Scaling Laws
The mathematical relationships that predict how AI models improve with scale
What it is
Scaling laws are empirical mathematical relationships that describe how model performance improves predictably with increases in model size, training data, and compute. A key finding from DeepMind's 2022 Chinchilla paper (Hoffmann et al.): given a fixed compute budget, there is an optimal allocation between model size and training-data quantity.
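As a sketch, the Chinchilla analysis fits a parametric loss of roughly this form, where $N$ is parameter count and $D$ is training tokens (the constants below are approximate fitted values reported in the paper, quoted here for illustration):

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% with roughly E \approx 1.69, A \approx 406.4, B \approx 410.7,
% \alpha \approx 0.34, \beta \approx 0.28
```

Minimizing this loss subject to a fixed compute budget is what yields the optimal split between model size and data.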
The Chinchilla-optimal ratio suggests training on approximately 20 tokens per parameter (e.g., a 7B model should train on ~140B tokens for compute-optimal training). Earlier models were significantly undertrained by this metric: GPT-3 trained its 175B parameters on only ~300B tokens, roughly 1.7 tokens per parameter.
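The arithmetic behind these numbers is simple enough to sketch directly. The snippet below encodes the ~20 tokens/parameter heuristic and the common ~6·N·D approximation for training FLOPs (both are rules of thumb, not exact values):

```python
def chinchilla_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal training tokens under the ~20 tokens/parameter heuristic."""
    return n_params * tokens_per_param

def training_flops(n_params: float, n_tokens: float) -> float:
    """Common approximation: training costs ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

# A 7B model is compute-optimal at ~140B tokens under this heuristic.
print(f"{chinchilla_tokens(7e9):.3e} tokens")      # 1.400e+11 tokens

# GPT-3: 175B parameters trained on ~300B tokens -> ~1.7 tokens/parameter,
# far below the ~20 suggested by Chinchilla.
print(f"{300e9 / 175e9:.2f} tokens per parameter")  # 1.71 tokens per parameter
```

The same heuristic runs in reverse: given a compute budget in FLOPs, dividing by ~6 × 20 per parameter squared gives a rough compute-optimal model size.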
Scaling laws are why AI labs can make confident predictions about what a training run will achieve before spending the compute.