Acceleration¶
LLaMA-Factory supports multiple acceleration techniques, including FlashAttention, Unsloth, and Liger Kernel.
FlashAttention¶
FlashAttention speeds up the attention computation while reducing memory usage.

To use FlashAttention, add the following parameter to the training configuration file before starting training:

```yaml
flash_attn: fa2
```

Unsloth¶
The Unsloth framework supports large language models such as Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, and Qwen, and provides 4-bit and 16-bit QLoRA/LoRA fine-tuning. It improves computation speed while reducing memory usage.

To use Unsloth, add the following parameter to the training configuration file before starting training:

```yaml
use_unsloth: True
```

Liger Kernel¶
Liger Kernel is a performance optimization framework for large language model training that can effectively improve throughput and reduce memory usage.
To use Liger Kernel, add the following parameter to the training configuration file before starting training:

```yaml
enable_liger_kernel: True
```
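Putting this together, a minimal fine-tuning configuration enabling two of these accelerations might look like the sketch below. The model name, dataset, and output directory here are illustrative placeholders, not values prescribed by this page; only the acceleration flags come from the sections above.

```yaml
### model (placeholder model identifier, shown for illustration)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
finetuning_type: lora

### acceleration flags from this page
flash_attn: fa2             # use FlashAttention-2 for attention computation
enable_liger_kernel: True   # enable Liger Kernel optimized ops
# use_unsloth: True         # alternatively, enable Unsloth for LoRA/QLoRA training

### dataset and output (placeholders)
dataset: identity
output_dir: saves/llama3-8b/lora/sft
```

Training is then launched by passing this file to the CLI, e.g. `llamafactory-cli train config.yaml`.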