Training vs. Inference
Building a model vs. running one: fundamentally different compute profiles
What it is
Training is the process of updating a model's parameters by running data through it and computing gradients: the mathematical recipe for how each parameter should change to reduce prediction error. Because backpropagation needs the intermediate activations from the forward pass, they must all be kept in memory, which makes training extremely memory-intensive.
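The training loop above can be sketched in miniature. This is a hypothetical toy model (a single linear layer with squared-error loss), not any production setup: the forward pass produces activations that must be kept around so the backward pass can turn the prediction error into a gradient, which is then used to update the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=4)               # parameters being trained (toy example)

def train_step(W, x, y_true, lr=0.1):
    # Forward pass: predictions; x and error are the "activations"
    # that must stay in memory for the backward pass.
    y_pred = x @ W
    error = y_pred - y_true
    loss = np.mean(error ** 2)
    # Backward pass: gradient of the loss w.r.t. W,
    # computed from the stored forward-pass values.
    grad_W = 2.0 * x.T @ error / x.shape[0]
    # Update: nudge each parameter to reduce prediction error.
    return W - lr * grad_W, loss

x = rng.normal(size=(8, 4))          # a tiny batch of training data
y = rng.normal(size=8)
_, loss_before = train_step(W, x, y)
for _ in range(50):
    W, loss = train_step(W, x, y)
# loss after 50 update steps is lower than at initialization
```

Frontier-scale training is this same loop with billions of parameters, which is why the stored activations, gradients, and optimizer state dominate GPU memory.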
Inference is using a trained model to generate outputs: a forward pass only, with no gradients computed and no activations stored. It's much cheaper: the same GPU used for training can serve many more inference requests, and multiple requests can be batched together for further efficiency gains.
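By contrast, a serving sketch needs only the forward pass. This toy example (hypothetical weights from a finished training run) also shows the batching trick: stacking requests into one matrix turns many small matrix-vector products into a single matrix-matrix product, which keeps the hardware better utilized.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))          # weights from a finished training run (toy)

def infer(batch):
    # Forward pass only: no gradients, and intermediate
    # values can be freed as soon as the output is computed.
    return batch @ W

# Batch 16 independent requests into one call.
requests = [rng.normal(size=4) for _ in range(16)]
outputs = infer(np.stack(requests))  # shape (16, 3): one row per request
```

Because nothing is stored for a backward pass, the memory per request is tiny compared to training, which is the source of the cost gap described below.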
For frontier models, training costs hundreds of millions of dollars and happens rarely. Inference happens billions of times per day, with each query costing only a minuscule fraction of what the training run did.