PaPoo

What is retrieval-augmented generation (RAG)?

Retrieval-augmented generation, or RAG, is a way to make a language model answer questions using relevant external documents it retrieves first, instead of relying only on what it memorized during training.

Why it matters

RAG helps when you want answers that draw on fresh or domain-specific information and that can be traced back to the sources they came from.

In practice, teams reach for RAG when a model needs to answer “what does our system do?” or “what does this policy say?” without fine-tuning the model on every document change.

How it works

  1. A user asks a question.
    The system turns that question into a search query.

  2. The system retrieves relevant context.
    It searches a knowledge source such as a document index, vector database, search engine, or other retrieval layer.

  3. The language model generates an answer using that context.
    The retrieved passages are inserted into the prompt, and the model writes a response based on both the question and the supplied text.

  4. Often, the answer includes citations or excerpts.
    This makes it easier to trace where the answer came from, though the exact format depends on the application.
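The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a production design: a toy bag-of-words cosine similarity stands in for a real document index or vector database, the knowledge source is a hard-coded list of strings, and the final call to a language model is not shown. The names `retrieve` and `build_prompt` are illustrative, not any library's API.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Steps 1-2: turn the question into a query vector and return the
    k most similar documents, skipping any with no word overlap at all.
    (Toy assumption: word counts instead of a real embedding or index.)"""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question: str, passages: list[str]) -> str:
    """Step 3: insert the retrieved passages into the prompt, numbered so
    the model can cite them (step 4)."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical knowledge source for illustration.
docs = [
    "Annual plans are refundable within 14 days of purchase.",
    "Monthly plans renew automatically each month.",
    "Support is available by email on weekdays.",
]
question = "What is our refund policy for annual plans?"

passages = retrieve(question, docs, k=1)
prompt = build_prompt(question, passages)
print(prompt)  # this string would then be sent to the language model
```

Real systems usually swap the word-count vectors for learned embeddings and a dedicated index, partly because exact word matching misses near-synonyms (here "refund" does not match "refundable"). The overall shape stays the same: score the candidates, keep the top k, and insert them into the prompt.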

The key idea is simple: retrieval supplies fresh or domain-specific evidence; generation turns that evidence into a readable answer.

Tiny concrete example

Question: “What is our refund policy for annual plans?”

Retrieval step: The system finds a policy page that says:

Annual plans are refundable within 14 days of purchase.

Generated answer:

Annual plans can be refunded, but only within 14 days of purchase.

Common pitfalls / when NOT to use it

If your problem is “the model needs to know my documents,” RAG is usually the first thing to try. If your problem is “the model needs a new skill or style,” fine-tuning may be a better fit.

Related terms
