Retrieval-augmented generation

A pattern that pulls relevant documents into the prompt at inference time so a model can answer questions about information it was never trained on.

RAG combines a retriever — typically a vector search over a corpus of documents — with a generator (an LLM). The retriever finds the most relevant chunks for a user query; the LLM uses them as grounded context to write its answer.
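As a rough illustration of the retrieve-then-generate flow described above, here is a minimal Python sketch. It makes several assumptions: the corpus is a made-up three-sentence list, `embed` is a bag-of-words stand-in for a real embedding model, and the final LLM call is left out, so the code stops once the prompt is assembled. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; a real system would hold chunks of actual documents.
CORPUS = [
    "Our brand voice is warm, direct, and free of jargon.",
    "Invoices are sent on the first business day of each month.",
    "The support desk is open from 9am to 5pm on weekdays.",
]

def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Place the retrieved chunks into the prompt as grounding context."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

query = "When are invoices sent?"
top_chunks = retrieve(query, CORPUS, k=1)
prompt = build_prompt(query, top_chunks)
# `prompt` would then be passed to whichever LLM the system uses.
```

In a production setup the word counts would be replaced by dense embeddings and the sorted list by a vector index, but the shape of the pipeline stays the same: embed the query, rank chunks, and put the winners into the prompt.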

For agencies, RAG is the pattern behind most "AI that knows our brand voice" or "chatbot trained on our client docs" promises. The model is not actually trained on those documents: they are looked up live and stuffed into the prompt. Updating the system means re-indexing a document rather than retraining a model, which is why it can change in minutes instead of weeks.

The quality of a RAG system lives or dies by the retrieval step. Even a perfect model given the wrong context will produce confident, wrong answers. Many "the AI hallucinated" complaints are really "the retriever surfaced bad documents", so fix retrieval before blaming the model.
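One way to check retrieval separately from the model is to measure how often the expected document shows up in the top results for a set of labeled queries. The sketch below assumes a hand-labeled list of (query, expected document id) pairs and a deliberately naive keyword retriever; the documents, queries, and function names are all invented for illustration.

```python
import re

# Hypothetical document store keyed by id.
DOCS = {
    "shipping": "Standard shipping takes 3 to 5 business days.",
    "refunds": "Refunds are issued within 14 days of a return.",
    "warranty": "All devices carry a two year warranty.",
}

# Hand-labeled evaluation set: (query, id of the document that should be found).
LABELED = [
    ("How long does shipping take?", "shipping"),
    ("When are refunds issued?", "refunds"),
    ("Can I get my money back after a year?", "refunds"),
]

def tokens(text: str) -> set[str]:
    """Lowercased word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_retrieve(query: str, k: int) -> list[str]:
    """Toy retriever: rank document ids by number of words shared with the query."""
    q = tokens(query)
    ranked = sorted(DOCS, key=lambda doc_id: len(q & tokens(DOCS[doc_id])), reverse=True)
    return ranked[:k]

def hit_rate_at_k(retrieve_fn, labeled, k: int) -> float:
    """Fraction of queries whose expected document appears in the top-k results."""
    hits = sum(1 for query, expected in labeled if expected in retrieve_fn(query, k))
    return hits / len(labeled)

score_at_1 = hit_rate_at_k(keyword_retrieve, LABELED, k=1)
score_at_2 = hit_rate_at_k(keyword_retrieve, LABELED, k=2)
```

Here the third query is a paraphrase that shares more words with the warranty text than with the refunds text, so the keyword retriever ranks the wrong document first. A low hit rate on a set like this points at the retriever, before any model output has been inspected.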

Related terms

AI agent

Large language model

Context engineering

Embeddings