Advanced Training Mechanics
Fine-tuning Specifics
The technical details of how fine-tuning actually works
What it is
Fine-tuning updates all or a subset of a pre-trained model's weights on a new dataset. Full fine-tuning updates every parameter; it is computationally expensive and risks catastrophic forgetting of pre-trained capabilities. LoRA (Low-Rank Adaptation) is the dominant parameter-efficient fine-tuning technique: it freezes the base model's weights and adds small trainable rank-decomposition matrices (typically to the attention projections), cutting trainable parameters by well over 90% while preserving most of full fine-tuning's performance.
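The rank decomposition is easy to see in code. Below is a minimal NumPy sketch of a LoRA-style linear layer (illustrative only, not the PEFT library's implementation): the base weight matrix stays frozen, and only two small factors A and B are trainable. B is zero-initialized so the adapted layer starts out exactly equal to the base layer.

```python
import numpy as np

class LoRALinear:
    """Sketch of a LoRA-adapted linear layer: h = x W^T + (alpha/r) x A^T B^T.

    W is frozen; only A (r x in) and B (out x r) would receive gradients.
    """

    def __init__(self, w_frozen, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w = w_frozen                              # frozen base weight (out x in)
        out_dim, in_dim = w_frozen.shape
        self.a = rng.normal(0.0, 0.02, (rank, in_dim)) # trainable, small random init
        self.b = np.zeros((out_dim, rank))             # trainable, zero init:
                                                       # update starts at exactly 0
        self.scale = alpha / rank

    def __call__(self, x):
        # Base path plus low-rank update; never materializes B @ A at full size.
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_params(self):
        return self.a.size + self.b.size

# Parameter savings for one 768x768 projection at rank 8:
out_dim, in_dim, rank = 768, 768, 8
w = np.random.default_rng(1).normal(size=(out_dim, in_dim))
layer = LoRALinear(w, rank=rank)
full, lora = w.size, layer.trainable_params()
print(f"full: {full:,}  LoRA: {lora:,}  ({100 * lora / full:.2f}% of full)")
```

At rank 8 the adapter holds about 2% of the layer's parameters, and because B starts at zero the model's outputs are unchanged until training begins.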
QLoRA extends this by quantizing the frozen base model to 4-bit precision while training LoRA adapters in higher precision on top of it, enabling fine-tuning of a 65B-parameter model on a single 48 GB GPU.
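The core idea of that quantization step can be sketched with simple blockwise absmax quantization to a 4-bit integer range. This is a simplification: QLoRA itself uses the NF4 data type from bitsandbytes, and a real implementation packs two 4-bit values per byte rather than storing int8 as done here.

```python
import numpy as np

def quantize_4bit(w, block=64):
    """Blockwise symmetric absmax quantization to the 4-bit range [-7, 7].

    Each block of `block` weights shares one float scale, so outliers in
    one block do not destroy precision everywhere else.
    """
    flat = w.reshape(-1, block)
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(flat / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize_4bit(q, scales, shape):
    # Weights are dequantized on the fly for the forward pass.
    return (q.astype(np.float32) * scales).reshape(shape)

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)
err = np.abs(w - w_hat).mean()
print(f"mean abs reconstruction error: {err:.4f}")
```

The frozen weights lose some precision, but gradients only ever flow into the full-precision LoRA adapters, so training itself is not quantized.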
Training data quality and format matter as much as the choice of fine-tuning method; garbage in, garbage out applies doubly to fine-tuning, because the model will faithfully learn the errors and inconsistencies in a small dataset.
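Format concretely means a consistent record layout. A common convention (used by several fine-tuning frameworks, though field names vary and are not a standard) is one JSON object per line in a .jsonl file, with a list of role-tagged messages:

```python
import json

# One instruction-tuning record in a common "messages" chat layout.
# The field names here are a widespread convention, not a fixed standard;
# check the expected schema of whichever training framework you use.
record = {
    "messages": [
        {"role": "user", "content": "What does LoRA freeze during training?"},
        {"role": "assistant", "content": "LoRA freezes the base model weights and trains only the low-rank adapter matrices."},
    ]
}

line = json.dumps(record)  # each record becomes one line of the .jsonl file
print(line)
```

Mixing layouts within one dataset, or leaving stray system prompts and truncated answers in the records, degrades results far more than small changes to rank or learning rate.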