Key Concepts

Training vs. Inference

Building a model vs. running one, fundamentally different compute profiles

What it is

Training is the process of updating a model's parameters by running data through it and computing gradients: the mathematical recipe for how each parameter should change to reduce prediction error. It requires storing all intermediate activations for backpropagation, making it extremely memory-intensive.

Inference is using a trained model to generate outputs: a forward pass only, with no gradients computed or stored. It's much cheaper, so the same GPU used for training can serve many more inference requests. You can also batch multiple inference requests together, further improving efficiency.
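The asymmetry above can be sketched in a few lines of plain Python. This is a toy, not a real framework: a one-parameter linear model `y = w * x` trained by gradient descent on squared error (all names here are illustrative). The training step does everything the inference step does, plus it must keep the intermediate prediction around to compute a gradient and update the parameter; inference is just the forward pass.

```python
def forward(w, x):
    """Forward pass: the only computation inference needs."""
    return w * x

def train_step(w, x, target, lr=0.1):
    """One training step: forward pass PLUS gradient computation and update.
    loss = (w*x - target)^2, so dloss/dw = 2 * (w*x - target) * x."""
    pred = forward(w, x)             # intermediate value must be kept...
    grad = 2 * (pred - target) * x   # ...to compute the gradient
    return w - lr * grad             # parameter update

# Training: repeatedly adjust w so that forward(w, 2.0) approaches 6.0.
w = 0.0
for _ in range(50):
    w = train_step(w, x=2.0, target=6.0)

# Inference: just the forward pass, reusing the learned parameter.
print(round(forward(w, 2.0), 2))  # close to 6.0 (w has converged near 3.0)
```

In a real network the "intermediate value" is every layer's activations for every token in the batch, which is why training memory dwarfs inference memory even though both share the same forward pass.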

For frontier models, training costs hundreds of millions of dollars and happens rarely. Inference happens billions of times per day at a tiny fraction of the per-query cost.

Why it matters

This distinction directly informs cost conversations with clients and stakeholders. "Why is OpenAI so expensive?" relates to inference costs at scale. "Why can't I just train my own model?" relates to training costs. "Why is fine-tuning cheaper than pre-training?" comes down to the completely different compute profile. Understanding this split also explains why specialized inference hardware is a distinct market.

Resources

Deep Dive into LLMs like ChatGPT (full training pipeline)
youtube.com· The most comprehensive beginner-friendly walkthrough of the full pipeline: pre-training → supervised fine-tuning → RLHF → inference. Uses GPT-2 and Llama 3.1 as hands-on examples.
211 min
How LLMs Work: Pre-Training to Post-Training
towardsdatascience.com· Distills the key concepts from Karpathy's 3.5-hour video into a readable format. Perfect for recruits who want the concepts without the full time commitment.
10 min
AI Inference vs. Training
cloudflare.com· Clean, concise two-concept explainer. Good analogy: learning to recognize stop signs (training) → recognizing them on new roads (inference).
7 min
AI 101: Training vs. Inference
backblaze.com· Fun Sherlock Holmes analogy (Watson learning = training, Holmes deducing = inference). Covers neural network basics, training types, and real-world inference applications.
10 min
Training vs. Inference
bentoml.com· Concise side-by-side comparison table. Great quick reference. Addresses the practical reality that inference is often the bigger long-term cost.
8 min