Advanced Training Mechanics

Continual Learning

The unsolved problem of teaching models without forgetting what they know

What it is

Continual learning is the challenge of training a model on new data without erasing what it has already learned. Neural networks exhibit catastrophic forgetting: when trained on a new task or dataset, the weight updates that accommodate the new information overwrite the weights encoding previous knowledge.
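The effect is easy to reproduce even in a one-parameter model. A minimal sketch (toy data and illustrative hyperparameters, not a real training setup): two tasks demand conflicting values of the single weight, so plain gradient descent on the second task destroys the solution to the first.

```python
import numpy as np

# One-parameter model y = w * x. Task A wants w ≈ 2, task B wants w ≈ -1;
# the tasks conflict, so fine-tuning on B overwrites the solution for A.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y_a, y_b = 2.0 * x, -1.0 * x

def train(w, x, y, lr=0.1, steps=200):
    for _ in range(steps):
        w -= lr * 2 * np.mean((w * x - y) * x)  # gradient of MSE loss
    return w

def mse(w, x, y):
    return np.mean((w * x - y) ** 2)

w = train(0.0, x, y_a)            # learn task A
err_a_before = mse(w, x, y_a)     # near zero: task A is solved

w = train(w, x, y_b)              # fine-tune on task B...
err_a_after = mse(w, x, y_a)      # ...and task A performance collapses
```

Nothing in the update rule protects the old solution: the loss being minimized mentions only the new data, so the weight moves wherever the new task pulls it.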

For LLMs, this means you cannot simply keep training a deployed model on new information as the world changes. You must either retrain from scratch (expensive), fine-tune carefully with regularization techniques that limit forgetting, or use retrieval-augmented generation (RAG) to supply current information at inference time.
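The retrieval option sidesteps weight updates entirely: new facts live in a document store, and the relevant ones are fetched and prepended to the prompt at inference time. A minimal sketch, assuming a made-up corpus and using naive word overlap in place of the embedding similarity real systems use:

```python
# Hypothetical document store; real RAG systems use vector embeddings
# and similarity search, not keyword overlap.
corpus = [
    "The latest model release added a longer context window.",
    "Catastrophic forgetting was described by McCloskey and Cohen in 1989.",
    "Elastic weight consolidation penalizes changes to important weights.",
]

def retrieve(query, docs):
    q = set(query.lower().split())
    # Pick the document sharing the most words with the query.
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

query = "who described catastrophic forgetting"
context = retrieve(query, corpus)
prompt = f"Context: {context}\nQuestion: {query}"
```

Because the model's weights never change, nothing is forgotten; keeping knowledge current becomes a matter of updating the document store.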

Active research approaches include elastic weight consolidation (EWC), which penalizes changes to weights important for earlier tasks; memory replay, which mixes old data into new training; and modular architectures, which add new capacity rather than overwriting old weights.
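The first two mitigations can be sketched on a one-weight toy model (y = w·x, where task A wants w ≈ 2 and task B wants w ≈ -1; the penalty strength and Fisher value are illustrative, not tuned):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y_a, y_b = 2.0 * x, -1.0 * x        # two conflicting tasks

def mse(w, x, y):
    return np.mean((w * x - y) ** 2)

def train(w, x, y, lr=0.05, steps=300, anchor=None, lam=0.0, fisher=1.0):
    for _ in range(steps):
        grad = 2 * np.mean((w * x - y) * x)   # MSE gradient
        if anchor is not None:
            # EWC-style quadratic penalty pulling w back toward the
            # old solution, weighted by Fisher information.
            grad += lam * fisher * (w - anchor)
        w -= lr * grad
    return w

w_a = train(0.0, x, y_a)                      # learn task A (w ≈ 2)

# 1) Naive fine-tuning on task B: task A is forgotten.
w_naive = train(w_a, x, y_b)

# 2) Memory replay: mix task-A examples into the task-B training data.
x_mix = np.concatenate([x, x])
y_mix = np.concatenate([y_a, y_b])
w_replay = train(w_a, x_mix, y_mix)

# 3) EWC: penalize movement away from the task-A solution. For this
#    linear-Gaussian model the Fisher information is E[x^2].
w_ewc = train(w_a, x, y_b, anchor=w_a, lam=10.0, fisher=np.mean(x**2))
```

Both mitigations keep the task-A error far below the naive fine-tune, at the cost of worse task-B performance: the stability-plasticity tradeoff in miniature. Replay settles on a compromise weight, while EWC's penalty strength λ directly controls how far the weight may drift.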

Why it matters

Continual learning explains why LLMs have knowledge cutoff dates, why "the model doesn't know about X that happened last month" is a fundamental limitation rather than an oversight, and why keeping models current is an engineering challenge. It also motivates RAG as the primary practical solution for current-information needs.


Resources

What is Catastrophic Forgetting?
ibm.com· IBM's explainer is clear, well-structured, and beginner-friendly. Covers the history (McCloskey & Cohen 1989), why neural networks forget, the stability-plasticity tradeoff, and key mitigation techniques (EWC, replay). Perfect starting resource.
8 min
Continual Learning and Catastrophic Forgetting
arxiv.org· Comprehensive 2024 book chapter reviewing the entire field. Covers replay methods, regularization (EWC), context-dependent processing, model expansion, and connections to neuroscience. More depth than IBM piece, but still readable. Best available survey-level resource.
20 min
Catastrophic Forgetting in Neural Networks
goml.io· Practical blog post that covers the problem clearly with real-world examples (autonomous vehicles forgetting pedestrian detection, medical AI losing tuberculosis accuracy after retraining). Good for motivating why this matters.
10 min