Advanced Training Mechanics

Fine-tuning Specifics

The technical details of how fine-tuning actually works

What it is

Fine-tuning updates all or a subset of a pre-trained model's weights on a new dataset. Full fine-tuning updates every weight, which is computationally expensive and risks catastrophic forgetting. LoRA (Low-Rank Adaptation) is the dominant efficient fine-tuning technique: it freezes the base model weights and adds small trainable rank-decomposed matrices (typically to the attention layers), reducing trainable parameters by 90%+ while preserving most performance.
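The low-rank idea can be sketched in a few lines of numpy. This is an illustrative toy, not any library's API: the frozen weight `W` stays untouched, and the adapter path adds `(alpha/r) * B @ A`, where `A` and `B` are the only trainable matrices. The dimensions, rank, and scaling factor below are hypothetical example values.

```python
import numpy as np

d, r = 1024, 8  # hidden size and LoRA rank (illustrative values)
alpha = 16      # LoRA scaling hyperparameter

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pre-trained weight (not trained)
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection, random init
B = np.zeros((d, r))                     # trainable up-projection, zero init

def forward(x):
    # Base path uses the frozen W; the adapter path adds the low-rank update.
    # Because B starts at zero, the adapted model initially matches the base model.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d))
y = forward(x)

full_params = W.size            # 1024 * 1024 = 1,048,576
lora_params = A.size + B.size   # 2 * 8 * 1024 = 16,384
print(f"trainable fraction: {lora_params / full_params:.2%}")  # prints 1.56%
```

The zero initialization of `B` is the standard LoRA trick: training starts from exactly the base model's behavior, and the adapter learns a delta on top of it.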

QLoRA extends this by quantizing the frozen base model to 4-bit precision, enabling fine-tuning of a 65B-parameter model on a single 48 GB GPU.
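To make the memory/precision tradeoff concrete, here is a simplified absmax 4-bit quantization sketch in numpy. Real QLoRA uses the NF4 data type with per-block scales and double quantization; this toy uses a single global scale and is only meant to show the round-trip of quantizing frozen weights and dequantizing them for the forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)  # a frozen base weight

# Simplified absmax quantization to the signed 4-bit range -8..7.
# (QLoRA itself uses NF4 with per-block scales, not this scheme.)
scale = np.abs(W).max() / 7
q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)  # stored in 4 bits conceptually

# The quantized weights stay frozen; they are dequantized on the fly
# whenever a matmul against them is needed during the LoRA forward pass.
W_deq = q.astype(np.float32) * scale

err = np.abs(W - W_deq).max()
print(f"max abs error: {err:.4f}")  # bounded by scale / 2
```

The per-element error is at most half the quantization step, which is why keeping the LoRA adapters in higher precision (16-bit) on top of the 4-bit frozen base recovers most of the quality.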

Training data quality and format matter as much as the choice of technique; garbage in, garbage out applies doubly to fine-tuning.

Why it matters

LoRA and QLoRA make fine-tuning accessible beyond organizations with massive compute budgets. Understanding the tradeoffs between full fine-tuning, LoRA, and QLoRA, along with their practical requirements (data size, GPU memory, training time), helps you plan feasibly and evaluate fine-tuning services offered by model providers.
