PaPoo

What is RAG vs fine-tuning?

RAG (retrieval-augmented generation) and fine-tuning are two different ways to make a language model more useful: RAG gives the model external information at answer time, while fine-tuning changes the model itself using training data.

Why it matters

If your app needs accurate answers about fast-changing or private information, RAG is usually the first thing to try because it can use your documents without retraining the model. If you want the model to consistently follow a format, style, or task pattern, fine-tuning can be the better fit.

In practice, most teams use RAG to supply knowledge and fine-tuning to shape behavior.

How it works

RAG

  1. A user asks a question.
  2. The system retrieves relevant passages from a search index, vector database, or document store.
  3. Those passages are added to the model’s prompt.
  4. The model answers using that retrieved context.

The key idea is that the model is not “learning” the documents permanently; it is being fed the right information for this request.
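The four RAG steps can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the in-memory `DOCUMENTS` list, the word-overlap scoring, and the helper names `tokenize`, `retrieve`, and `build_prompt` are all assumptions made for the example. A real system would typically retrieve from a search index or vector database, and the final model call is left out.

```python
import re

# Hypothetical in-memory document store (assumption for this sketch).
# A real system would query a search index or vector database instead.
DOCUMENTS = [
    "Annual plans are refundable only within 14 days of purchase.",
    "Monthly plans can be cancelled at any time.",
    "Support is available by email on weekdays.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 2: rank documents by word overlap with the question, keep the top k."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Step 3: add the retrieved passages to the model's prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Step 1: a user asks a question.
question = "Can I refund an annual plan after 14 days?"
passages = retrieve(question, DOCUMENTS, k=1)
prompt = build_prompt(question, passages)

# Step 4 would send `prompt` to a language model; that call is omitted here.
print(prompt)
```

Nothing in this sketch modifies a model. Editing `DOCUMENTS` changes what the next request sees, which is the sense in which the model is fed information rather than taught it.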

Fine-tuning

  1. You prepare examples of desired inputs and outputs.
  2. You train the model further on that dataset.
  3. The model weights are updated so it tends to produce similar outputs in the future.

The key idea is that the model itself changes. That can improve consistency on a narrow task, but it does not automatically give the model access to a live knowledge base.
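Step 1 of fine-tuning, preparing examples, is the part most teams write themselves. The sketch below builds a small dataset and serializes it as JSON Lines, one example per line. The `prompt` and `completion` field names, the sample texts, and the helpers `validate` and `to_jsonl` are assumptions for illustration; the schema a given training tool expects may differ, and the training run itself (steps 2 and 3) is not shown.

```python
import json

# Step 1: examples of desired inputs and outputs.
# Field names are an assumption; match the schema your training tool expects.
examples = [
    {
        "prompt": "Can I cancel a monthly plan?",
        "completion": "Answer: Yes.\nReason: Monthly plans can be cancelled at any time.",
    },
    {
        "prompt": "Is support available on weekends?",
        "completion": "Answer: No.\nReason: Support is available by email on weekdays.",
    },
]


def validate(records: list[dict]) -> None:
    """Reject records with a missing or empty prompt or completion."""
    for i, record in enumerate(records):
        for field in ("prompt", "completion"):
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"record {i} has no usable '{field}'")


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


validate(examples)
jsonl = to_jsonl(examples)
print(jsonl)
```

Both completions follow the same "Answer / Reason" layout. That repeated pattern is what the training run would reinforce, which is why fine-tuning suits format and style consistency better than it suits keeping facts current.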

The practical tradeoff

Tiny concrete example

Suppose a support assistant must answer policy questions.

Example:

User: “Can I refund an annual plan after 14 days?”
RAG system: retrieves the current policy text and answers, “No, annual plans are refundable only within 14 days.”
Fine-tuned system: may answer in the right tone and format, but could still be wrong if the policy changed after training.
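The difference shows up when the policy changes. The simulation below uses no real model: `policy_store`, `rag_answer`, and `fine_tuned_answer` are hypothetical stand-ins, and the "fine-tuned" side is modeled crudely as an answer fixed at training time. It is only meant to show where each approach gets its facts.

```python
# Hypothetical policy store that can be edited at any time (assumption).
policy_store = {"annual_refund_days": 14}


def rag_answer(store: dict) -> str:
    """Stand-in for a RAG system: reads the current policy at answer time."""
    days = store["annual_refund_days"]
    return f"Annual plans are refundable only within {days} days."


# Stand-in for a fine-tuned model: whatever it learned about the policy
# was fixed when training finished, so it is captured once here.
ANSWER_LEARNED_AT_TRAINING = rag_answer(policy_store)


def fine_tuned_answer() -> str:
    """Returns the answer as it was at training time, regardless of the store."""
    return ANSWER_LEARNED_AT_TRAINING


before = (rag_answer(policy_store), fine_tuned_answer())

# The policy changes after training.
policy_store["annual_refund_days"] = 30

after = (rag_answer(policy_store), fine_tuned_answer())

print("Before change:", before)
print("After change: ", after)
```

Before the change both answers agree. Afterwards only the retrieval-based answer reflects the new 30-day window, while the training-time answer still says 14 days until the model is retrained.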

Common pitfalls / when NOT to use it

In short: use RAG to give the model the right information; use fine-tuning to change how the model behaves.

Related terms
