Scaling & Data

Scaling Laws

The mathematical relationships that predict how AI models improve with scale

What it is

Scaling laws are empirical mathematical relationships that describe how model performance improves predictably with increases in model size, training data, and compute. The key finding from DeepMind's Chinchilla paper: given a fixed compute budget, there's an optimal allocation between model size and data quantity.
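The Chinchilla paper fit a parametric form for loss as a function of model size N and token count D, which is what makes the trade-off quantitative. A minimal sketch, using the published fitted constants as illustrative values (treat them as assumptions, not exact):

```python
def chinchilla_loss(n_params, n_tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted training loss under the Chinchilla parametric fit:
    L(N, D) = E + A / N^alpha + B / D^beta.
    E is irreducible loss; the other terms shrink as model size (N)
    and training tokens (D) grow. Constants here are the fitted values
    reported in the paper, used illustratively."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Holding model size fixed, more training data lowers predicted loss:
small_data = chinchilla_loss(70e9, 300e9)
large_data = chinchilla_loss(70e9, 1.4e12)
```

The two power-law terms are why performance improves smoothly and predictably: each term decays at a fixed rate as its resource scales up.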

The Chinchilla optimal ratio suggests training a model on approximately 20 tokens per parameter (e.g., a 7B model should train on ~140B tokens for compute-optimal training). Earlier models like GPT-3 were significantly undertrained by this metric.
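The ~20 tokens-per-parameter rule can be turned into a compute-budget allocator using the common approximation that training a dense transformer costs roughly 6 FLOPs per parameter per token. A hedged sketch (both rules of thumb are approximations, not exact laws):

```python
import math

def chinchilla_optimal(compute_flops):
    """Split a FLOP budget between parameters (N) and tokens (D)
    using the rules of thumb C ~ 6*N*D and D ~ 20*N.
    Substituting D = 20N into C = 6ND gives C = 120*N^2."""
    n_params = math.sqrt(compute_flops / 120)
    n_tokens = 20 * n_params
    return n_params, n_tokens

def training_flops(n_params, n_tokens):
    """Approximate training compute for a dense transformer."""
    return 6 * n_params * n_tokens

# A 7B-parameter model trained on ~140B tokens (20 tokens/param)
# costs roughly 6e9 * 7 * 140e9 ~ 5.9e21 FLOPs:
budget = training_flops(7e9, 140e9)
```

Running the allocator in reverse on that budget recovers roughly the same 7B / 140B split, which is the sense in which the ratio is "compute-optimal": for a fixed budget, this split minimizes predicted loss.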

Scaling laws are why AI labs can make confident predictions about what a training run will achieve before spending the compute.

Why it matters

Scaling laws are the intellectual foundation for the massive investment in AI compute. "We know spending more will make the model better, and we can predict by how much" is an unusually strong basis for capital allocation. They also explain why model development strategy changed significantly after the Chinchilla paper: labs shifted from training ever-bigger models to training smaller models on more data.

