Instruction-tuned Models
The chatbots you know, RLHF-trained to follow instructions reliably
What it is
Instruction-tuned models are base models that have been fine-tuned through RLHF (or related techniques) to reliably follow user instructions and behave like assistants. These are the models behind ChatGPT, Claude, and Gemini as most people experience them.
Post-training teaches the model to: interpret prompts as instructions rather than text to continue, format responses appropriately, apply safety training, and maintain a consistent assistant persona. The training data is relatively small compared to pre-training (often millions of examples rather than trillions of tokens) but it dramatically changes the model's behavior.
The "instruct" or "chat" suffix in model names (e.g., Llama-3-8B-Instruct) signals this post-training.