Training Costs
Why building frontier models costs hundreds of millions, and what that means
What it is
Pre-training a frontier model costs hundreds of millions of dollars and involves running data centers that consume megawatts of power for months. The costs come from four sources: hardware (thousands to tens of thousands of H100 GPUs, rented or purchased at high cost), energy (power and cooling), engineering time for training infrastructure, and the R&D compute spent on experiments before the final run.
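To make the hardware and energy components concrete, here is a rough back-of-the-envelope sketch. Every number in it (GPU count, hourly rate, run length, power draw, cooling overhead, electricity price) is a hypothetical placeholder chosen for illustration, not a figure from any real training run, and the names `RunAssumptions`, `hardware_cost_usd`, `facility_power_mw`, and `energy_cost_usd` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class RunAssumptions:
    """Hypothetical inputs; none of these figures come from a real run."""
    gpu_count: int          # accelerators used for the run
    gpu_hourly_usd: float   # assumed rental price per GPU-hour
    days: float             # wall-clock length of the run
    watts_per_gpu: float    # assumed average draw per GPU, including its share of the server
    pue: float              # power usage effectiveness (cooling and facility overhead)
    usd_per_kwh: float      # assumed electricity price


def hardware_cost_usd(a: RunAssumptions) -> float:
    """GPU-hours multiplied by the hourly rate."""
    return a.gpu_count * a.gpu_hourly_usd * a.days * 24


def facility_power_mw(a: RunAssumptions) -> float:
    """Total facility draw in megawatts, including cooling overhead."""
    return a.gpu_count * a.watts_per_gpu * a.pue / 1e6


def energy_cost_usd(a: RunAssumptions) -> float:
    """Energy used over the run (kWh) multiplied by the electricity price."""
    kwh = facility_power_mw(a) * 1000 * a.days * 24
    return kwh * a.usd_per_kwh


# Illustrative values only.
example = RunAssumptions(
    gpu_count=20_000, gpu_hourly_usd=2.0, days=90,
    watts_per_gpu=1_000, pue=1.2, usd_per_kwh=0.08,
)
print(f"hardware: ${hardware_cost_usd(example):,.0f}")
print(f"power:    {facility_power_mw(example):.1f} MW")
print(f"energy:   ${energy_cost_usd(example):,.0f}")
```

With these made-up inputs, the GPU time comes to roughly $86 million, the facility draws about 24 MW, and the electricity costs roughly $4 million. Rental prices typically already bundle power, so the energy line is shown separately only to indicate its relative size. Engineering time and R&D compute are not modeled here.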
For major labs, most spending goes to R&D compute (experiments, ablations, and failed runs), not to the final training run. A training failure partway through a run wastes enormous resources, because the compute spent since the last usable checkpoint is lost.
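The two points above reduce to simple arithmetic, sketched below. The function names and all inputs are hypothetical: the 40% share for the final run is an assumed value consistent with "most spending goes to R&D", not a reported figure, and the checkpoint interval is likewise illustrative.

```python
def total_compute_spend_usd(final_run_usd: float, final_run_share: float) -> float:
    """Total compute spend implied if the final run is a given share of it."""
    if not 0 < final_run_share <= 1:
        raise ValueError("final_run_share must be in (0, 1]")
    return final_run_usd / final_run_share


def lost_work_usd(gpu_count: int, gpu_hourly_usd: float,
                  hours_since_checkpoint: float) -> float:
    """Cost of compute that must be redone after a failure."""
    return gpu_count * gpu_hourly_usd * hours_since_checkpoint


# Illustrative values only: an $80M final run that is 40% of compute spend.
total = total_compute_spend_usd(final_run_usd=80e6, final_run_share=0.4)
# Illustrative values only: a failure 6 hours after the last checkpoint.
lost = lost_work_usd(gpu_count=20_000, gpu_hourly_usd=2.0, hours_since_checkpoint=6)

print(f"implied total compute spend: ${total:,.0f}")
print(f"cost of one failure:         ${lost:,.0f}")
```

Under these assumptions, an $80 million final run implies about $200 million of total compute spending, and a single failure six hours past the last checkpoint costs about $240,000 in repeated work. The smaller the final run's share, the larger the gap between the headline training cost and what a lab actually spends.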
These costs are why only well-capitalized organizations (those backed by Microsoft, Google, or significant venture capital) can train at the frontier.