Cost and Deployment Tradeoffs
API vs. self-hosted, which model tier, and how to control AI costs
What it is
Key deployment decision axes:
API vs. self-hosted: APIs have zero infrastructure overhead but variable cost that scales with usage, potential data privacy issues, and vendor dependence. Self-hosted open-weight models have fixed infrastructure cost, full data privacy, and customization flexibility, but require ML engineering expertise.
Model tier selection: Larger models are more capable but more expensive and slower. Routing easy queries to cheaper small models and hard queries to capable large models (model routing) is a common optimization.
Cost controls: Prompt caching (reusing common system prompt tokens), batch processing, output length caps, and smart routing are the primary levers.