Reasoning Models
The current frontier: models that think before they answer
What it is
Reasoning models are instruction-tuned models further trained with reinforcement learning from verifiable rewards (RLVR) to develop extended chain-of-thought reasoning. They generate "thinking" tokens before the final answer, working through the problem step by step: trying approaches, catching errors, and backtracking when something doesn't work.
Examples include OpenAI's o1 and o3, Anthropic's Claude with extended thinking, and Google's Gemini Flash Thinking. These models dramatically outperform standard instruction-tuned models on math, coding, and complex multi-step reasoning tasks.
The tradeoff: reasoning tokens cost more (latency and API cost scale with thinking depth), and they're overkill for simple tasks. The best systems route simple queries to fast instruction-tuned models and hard queries to reasoning models.
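The routing idea above can be sketched as a small dispatcher. This is a minimal illustration, not a production classifier: the model names (`fast-instruct`, `deep-reasoner`) and the keyword heuristic are hypothetical placeholders; real systems typically use a trained classifier or a cheap LLM call to decide.

```python
# Sketch of a query router: simple queries go to a fast instruction-tuned
# model, while math/code/multi-step queries go to a reasoning model.
# Model names and keywords below are illustrative, not real APIs.

FAST_MODEL = "fast-instruct"       # hypothetical fast instruction-tuned model
REASONING_MODEL = "deep-reasoner"  # hypothetical reasoning model

# Crude signals that a query likely needs extended reasoning.
REASONING_HINTS = (
    "prove", "derive", "debug", "refactor",
    "step by step", "algorithm", "optimize",
)

def route(query: str) -> str:
    """Pick a model for the query using a keyword-and-length heuristic."""
    q = query.lower()
    if any(hint in q for hint in REASONING_HINTS) or len(q.split()) > 40:
        return REASONING_MODEL
    return FAST_MODEL

print(route("What's the capital of France?"))                    # fast-instruct
print(route("Prove that sqrt(2) is irrational, step by step."))  # deep-reasoner
```

The point of the split is economic: thinking tokens are billed like output tokens, so sending every query to the reasoning model multiplies cost and latency with no quality gain on easy requests.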