Learn how to fine-tune LLMs on multiple GPUs using parallelism methods with Unsloth.
Unsloth currently supports multi-GPU setups through libraries like Accelerate and DeepSpeed. This means you can already leverage parallelism methods such as FSDP and DDP with Unsloth.
We know that the process can be complex and requires manual setup. We’re working hard to make multi-GPU support much simpler and more user-friendly, and we’ll be announcing official multi-GPU support for Unsloth soon.
In the meantime, to enable multi-GPU training with DDP, do the following:
Create your training script as train.py (or similar). For example, you can adapt a training script from one of our notebooks!
Run accelerate launch train.py or torchrun --nproc_per_node N_GPUS train.py, where N_GPUS is the number of GPUs you have.
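Under DDP, the launcher starts one copy of train.py per GPU and tells each copy who it is through environment variables, which Accelerate and PyTorch read internally. As a rough sketch of what happens (the helper name ddp_info is ours, not part of Unsloth or torchrun):

```python
import os

def ddp_info():
    """Read the per-process environment variables a DDP launcher sets.

    torchrun (and accelerate launch) start N_GPUS copies of the script,
    giving each copy its own RANK / LOCAL_RANK / WORLD_SIZE so it can
    pick its GPU and process its shard of the data.
    """
    return {
        "rank": int(os.environ.get("RANK", 0)),          # global process index
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),  # GPU index on this node
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),  # total number of processes
    }

# Simulate the values the launcher would set for process 1 of a 4-GPU run.
os.environ.update({"RANK": "1", "LOCAL_RANK": "1", "WORLD_SIZE": "4"})
print(ddp_info())  # {'rank': 1, 'local_rank': 1, 'world_size': 4}
```

You normally never read these variables yourself; the point is that the same train.py runs unchanged on every GPU, which is why no code changes are needed to go from one GPU to several.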
Pipeline / model splitting
If a single GPU does not have enough VRAM to load, say, Llama 70B, no worries: we will split the model across your GPUs for you! To enable this, use the device_map = "balanced" flag:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Llama-3.3-70B-Instruct",
    load_in_4bit = True,
    device_map = "balanced",
)
Stay tuned for our official announcement! For more details, check out our ongoing Pull Request discussing multi-GPU support.