Practical Decision-Making

Cost and Deployment Tradeoffs

API vs. self-hosted, which model tier, and how to control AI costs

What it is

Key deployment decision axes:

API vs. self-hosted: Hosted APIs require no infrastructure but bring variable costs that scale with usage, potential data-privacy concerns, and vendor dependence. Self-hosted open-weight models offer a fixed infrastructure cost, full data privacy, and customization flexibility, but require ML engineering expertise to run well.
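The API-vs.-self-hosted choice often reduces to a break-even calculation: variable per-query cost against fixed infrastructure cost. A minimal sketch, where the $0.01/query API price and $2,000/month GPU-instance price are illustrative assumptions, not real vendor rates:

```python
def monthly_api_cost(queries_per_day: float, cost_per_query: float) -> float:
    """Variable cost: scales linearly with usage."""
    return queries_per_day * cost_per_query * 30


def monthly_self_hosted_cost(instances: int, instance_cost_per_month: float) -> float:
    """Fixed cost: independent of volume, until capacity is exceeded."""
    return instances * instance_cost_per_month


def break_even_queries_per_day(instance_cost_per_month: float,
                               cost_per_query: float) -> float:
    """Daily volume above which one self-hosted instance beats the API."""
    return instance_cost_per_month / (cost_per_query * 30)


# Hypothetical numbers: $0.01/query API vs. a $2,000/month GPU instance.
print(round(break_even_queries_per_day(2000, 0.01)))  # ≈ 6667 queries/day
```

Below the break-even volume the API is cheaper; above it, self-hosting wins on cost alone, though the expertise and privacy factors above still apply.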

Model tier selection: Larger models are more capable but more expensive and slower. Routing easy queries to cheaper small models and hard queries to capable large models (model routing) is a common optimization.
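Model routing can start as a simple classifier in front of two endpoints. A toy heuristic sketch, where the model names, keyword list, and length threshold are all illustrative assumptions:

```python
def route(query: str) -> str:
    """Send short, simple queries to a cheap small model and
    everything else to a capable large model.

    The signals and names here are placeholders; production routers
    typically use a trained classifier or a small LLM as the judge."""
    hard_signals = ("analyze", "prove", "refactor", "compare", "step by step")
    looks_hard = (len(query.split()) > 40
                  or any(s in query.lower() for s in hard_signals))
    return "large-model" if looks_hard else "small-model"


print(route("What is the capital of France?"))          # small-model
print(route("Analyze the tradeoffs between caching strategies."))  # large-model
```

The routing logic itself is cheap, so even a crude heuristic pays off if most traffic is easy queries.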

Cost controls: Prompt caching (billing a shared prompt prefix, such as a long system prompt, at a discounted rate on repeat requests), batch processing, output length caps, and smart routing are the primary levers.
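The effect of prompt caching is easy to estimate. A sketch under assumed prices ($3/M input tokens, $15/M output tokens, cached input billed at 10% of the input rate, which mirrors the roughly 90% cache discounts some providers advertise):

```python
def query_cost(prompt_tokens: int, cached_tokens: int, output_tokens: int,
               in_price: float, out_price: float,
               cache_discount: float = 0.1) -> float:
    """Per-query cost with prompt caching.

    cached_tokens is the portion of the prompt served from cache and
    billed at in_price * cache_discount. All prices are illustrative."""
    uncached = prompt_tokens - cached_tokens
    return (uncached * in_price
            + cached_tokens * in_price * cache_discount
            + output_tokens * out_price)


# A 2,000-token system prompt plus 500 fresh tokens, 300-token reply,
# at $3/M input and $15/M output:
no_cache = query_cost(2500, 0, 300, 3e-6, 15e-6)
with_cache = query_cost(2500, 2000, 300, 3e-6, 15e-6)
print(f"${no_cache:.4f} vs ${with_cache:.4f} per query")
```

The saving compounds at scale: when most of the prompt is a fixed prefix, caching can cut the input bill by nearly the full discount rate.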

Why it matters

AI costs can spiral quickly at production scale. A model costing $0.01 per query runs about $30/month at 100 queries/day but $30K/month at 100K queries/day. Thinking through AI economics before committing to an architecture prevents expensive rebuilds, and these tradeoffs come up in every serious AI product conversation.
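The scaling arithmetic above is worth making explicit, since the linear relationship between volume and spend is what catches teams off guard:

```python
def monthly_cost(cost_per_query: float, queries_per_day: int, days: int = 30) -> float:
    """Projected monthly spend for a fixed per-query cost."""
    return cost_per_query * queries_per_day * days


print(monthly_cost(0.01, 100))      # 30.0    -> $30/month at 100 queries/day
print(monthly_cost(0.01, 100_000))  # 30000.0 -> $30K/month at 100K queries/day
```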
