Reasoning Models
The current frontier: models that think before they answer
What it is
Reasoning models are instruction-tuned models further trained with reinforcement learning from verifiable rewards (RLVR) to develop extended chain-of-thought reasoning. They generate "thinking" tokens before the final answer, working through the problem step by step: trying approaches, catching errors, and backtracking when something doesn't work.
Examples include OpenAI's o1 and o3, Anthropic's Claude with extended thinking, and Google's Gemini Flash Thinking. These models dramatically outperform standard instruction-tuned models on math, coding, and complex multi-step reasoning tasks.
The tradeoff: reasoning tokens cost more (latency and API cost scale with thinking depth), and they're overkill for simple tasks. The best systems route simple queries to fast instruction-tuned models and hard queries to reasoning models.
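The routing idea above can be sketched as a small dispatcher. This is a minimal illustration, not a production classifier: the model names (`fast-instruct`, `deep-reasoner`) and the keyword heuristic are hypothetical placeholders; real systems typically use a trained classifier or a cheap LLM call to decide.

```python
# Sketch of a query router: simple queries go to a fast instruction-tuned
# model, while math/code/multi-step queries go to a reasoning model.
# Model names and keywords below are illustrative, not real APIs.

FAST_MODEL = "fast-instruct"       # hypothetical fast instruction-tuned model
REASONING_MODEL = "deep-reasoner"  # hypothetical reasoning model

# Crude signals that a query likely needs extended reasoning.
REASONING_HINTS = (
    "prove", "derive", "debug", "refactor",
    "step by step", "algorithm", "optimize",
)

def route(query: str) -> str:
    """Pick a model for the query using a keyword-and-length heuristic."""
    q = query.lower()
    if any(hint in q for hint in REASONING_HINTS) or len(q.split()) > 40:
        return REASONING_MODEL
    return FAST_MODEL

print(route("What's the capital of France?"))                    # fast-instruct
print(route("Prove that sqrt(2) is irrational, step by step."))  # deep-reasoner
```

The point of the split is economic: thinking tokens are billed like output tokens, so sending every query to the reasoning model multiplies cost and latency with no quality gain on easy requests.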