Hardware & Compute

Training Costs

Why building frontier models costs hundreds of millions, and what that means

What it is

Pre-training a frontier model costs hundreds of millions of dollars and involves running data centers that consume megawatts of power for months. The costs break down into: hardware (thousands of H100 GPUs, expensive whether rented or purchased), energy (power and cooling), engineering time for training infrastructure, and the R&D compute spent on experiments before the final run.
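The arithmetic behind these figures can be sketched as a back-of-envelope estimate: total training FLOPs divided by utilization-adjusted GPU throughput gives GPU-hours, which a rental rate converts to dollars. Every number below is an illustrative assumption (the FLOP budget, H100 throughput, utilization, and hourly rate), not a figure from any specific lab or run:

```python
# Back-of-envelope estimate of final-run pre-training compute cost.
# All parameters are illustrative assumptions.

def training_cost_usd(total_flops, gpu_flops_per_sec, mfu, gpu_hour_rate):
    """Cost = total FLOPs / effective throughput, converted to GPU-hours * rate."""
    effective_flops_per_sec = gpu_flops_per_sec * mfu  # utilization-adjusted
    gpu_seconds = total_flops / effective_flops_per_sec
    gpu_hours = gpu_seconds / 3600
    return gpu_hours * gpu_hour_rate

# Assumed inputs: ~1e25 FLOP training run, H100 at ~1e15 FLOP/s dense BF16,
# 40% model FLOPs utilization (MFU), $2.50 per GPU-hour rental.
cost = training_cost_usd(1e25, 1e15, 0.40, 2.50)
print(f"${cost / 1e6:.1f}M")  # roughly $17M for the final run alone
```

Note that this covers only the final run's compute; as described below, R&D experiments typically multiply the total bill several times over, before counting energy, staff, and infrastructure.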

For major labs, the majority of spending goes to R&D compute (experiments, ablations, failed runs), not the final model run. A training failure partway through wastes enormous resources.
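The cost of a mid-run failure can be made concrete with a simple expected-loss sketch. The scenario below (a 90-day run on 10,000 GPUs, hourly checkpoints) is a hypothetical assumption chosen only to show the scale of the difference checkpointing makes:

```python
# Illustrative: expected GPU-hours lost if a run fails at a uniformly
# random point, with and without periodic checkpointing. All numbers
# are assumptions, not real training configurations.

def expected_loss_on_failure(run_hours, gpus, checkpoint_interval_hours=None):
    """Expected GPU-hours of work lost to a single mid-run failure."""
    if checkpoint_interval_hours is None:
        lost_wall_hours = run_hours / 2                 # on average, half the run
    else:
        lost_wall_hours = checkpoint_interval_hours / 2  # only work since last save
    return lost_wall_hours * gpus

# Assumed: a 90-day run on 10,000 GPUs.
no_ckpt = expected_loss_on_failure(90 * 24, 10_000)         # 10.8M GPU-hours lost
with_ckpt = expected_loss_on_failure(90 * 24, 10_000, 1.0)  # 5,000 GPU-hours lost
```

This is why long runs save state frequently: without checkpoints, a single hardware fault can erase millions of GPU-hours of progress.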

This is why only well-capitalized organizations (backed by Microsoft, Google, or significant venture capital) can train at the frontier.

Why it matters

Training cost context is essential for industry analysis. It explains why AI labs need massive investment, why the hyperscaler advantage is so significant, why open-weight models are a big deal (they let smaller organizations access frontier-class capability), and why many "AI startups" are actually wrapper businesses rather than model developers.
