RAG (Retrieval-Augmented Generation)
Grounding AI responses in your own data: the go-to pattern for building on custom knowledge bases
What it is
RAG is a pattern that combines information retrieval with LLM generation to give models access to data beyond their training set. The basic flow: convert your documents into embeddings and store them in a vector database; at query time, embed the user's question, retrieve the most similar document chunks by vector similarity, inject those chunks into the LLM's context, and let the model generate an answer grounded in the retrieved content.
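The flow above can be sketched end to end in a few lines. This is a toy illustration, not a production pipeline: the bag-of-words "embedding", the hardcoded chunks, and the in-memory list standing in for a vector database are all stand-ins for a real embedding model and vector store.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" for illustration only; a real
    # pipeline uses a learned embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index step: embed document chunks and keep them in memory
# (standing in for a vector database).
chunks = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The API rate limit is 100 requests per minute per key.",
    "Support is available Monday through Friday, 9am to 5pm.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question, k=2):
    # Query step: embed the question and rank chunks by similarity.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question):
    # Inject the retrieved chunks into the LLM's context; the model
    # then answers grounded in that context.
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What is the API rate limit?")
print(prompt)
```

In a real deployment the `build_prompt` output would be sent to an LLM; everything before that call is the retrieval half of the pattern.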
RAG solves two key problems: the knowledge cutoff (the model doesn't know about events after its training date) and the lack of access to private data (the model has never seen your company's documents or databases).
More sophisticated RAG pipelines add query rewriting (reformulating the question before retrieval), reranking (re-scoring retrieved chunks with a stronger model), hybrid search (combining vector and keyword retrieval), and recursive retrieval.
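Hybrid search needs a way to merge the ranked lists produced by the vector and keyword retrievers; one common choice is Reciprocal Rank Fusion (RRF), which combines rankings without having to calibrate the two scoring scales. A minimal sketch, with hypothetical document IDs and retriever orderings:

```python
def rrf_fuse(rankings, k=60):
    # Reciprocal Rank Fusion: each document scores 1 / (k + rank)
    # in each ranked list it appears in; sum across lists and sort.
    # k=60 is a conventional smoothing constant.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from two retrievers over the same corpus.
vector_hits = ["doc_b", "doc_a", "doc_c"]    # vector-search order
keyword_hits = ["doc_a", "doc_d", "doc_b"]   # keyword (BM25) order

fused = rrf_fuse([vector_hits, keyword_hits])
print(fused)  # doc_a first: it ranks highly in both lists
```

Because RRF only looks at ranks, it works unchanged whatever the underlying retrievers are, which is why it is a popular default for fusing vector and keyword results.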