PaPoo

What is hallucination detection?

Hallucination detection is the process of spotting when an AI model’s answer is likely unsupported, false, or made up rather than grounded in the available evidence.

Why it matters

Large language models can produce fluent answers that sound right even when they are wrong. Hallucination detection helps reduce the risk of shipping misleading outputs in search, customer support, medical/legal workflows, internal assistants, and any system where users may trust the model too much.

In practice, teams use it when they want to:

How it works

There is no single standard method. “Hallucination detection” is usually a family of checks that ask: does this answer have evidence behind it?

Common approaches include:

  1. Source-grounding checks
    Compare the model’s answer against retrieved documents, tool outputs, or a known database. If the answer contains claims not supported by the source, it may be flagged.

  2. Consistency checks
    Ask the model the same question in different ways, or compare multiple model outputs. Major contradictions between the answers can be a sign of hallucination, though consistency alone does not guarantee correctness.

  3. Verifier or judge models
    A second model evaluates whether each claim is supported by context. This is common in research and evaluation pipelines, but it can itself make mistakes.

  4. Heuristics and confidence signals
    Systems may use uncertainty scores, citation presence, or rules like “answer must quote a retrieved passage.” These are practical, but none is a perfect detector.
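
To make the first approach (source-grounding checks) concrete, here is a minimal sketch. It assumes each sentence of the answer counts as one claim and that word overlap with a source is a rough stand-in for "supported". The function names, the 0.6 threshold, and the sample answer are all hypothetical choices for illustration, and real systems typically use entailment models or embeddings instead of raw word overlap.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_claims(answer: str) -> list[str]:
    """Naively treat each sentence of the answer as one claim (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", answer)
    return [p.strip() for p in parts if p.strip()]


def unsupported_claims(answer: str, sources: list[str], threshold: float = 0.6) -> list[str]:
    """Return the claims whose best word overlap with any source is below threshold.

    Overlap is the fraction of the claim's tokens that also appear in a source.
    The threshold is an arbitrary illustrative value, not a recommended setting.
    """
    source_tokens = [tokenize(s) for s in sources]
    flagged = []
    for claim in split_claims(answer):
        claim_tokens = tokenize(claim)
        if not claim_tokens:
            continue
        best = max(
            (len(claim_tokens & st) / len(claim_tokens) for st in source_tokens),
            default=0.0,
        )
        if best < threshold:
            flagged.append(claim)
    return flagged


# Hypothetical retrieved passage and model answer; the second sentence is
# deliberately invented so that it has no support in the source.
sources = ["Canberra is the capital city of Australia."]
answer = "Canberra is the capital of Australia. It was founded in 1850 by gold miners."
flagged = unsupported_claims(answer, sources)
print(flagged)
```

A check like this only says that a claim has no visible support in the retrieved text. It cannot tell whether the claim is actually false, and it will miss unsupported claims that happen to reuse the source's wording.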

A key limitation: hallucination detection is usually probabilistic, not absolute. It can say “this looks unsupported” more reliably than it can prove “this is definitely false.”
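
The probabilistic nature of these checks can be illustrated with a small consistency check that returns a score rather than a verdict. This is a sketch under assumptions: the answers are presumed to be short strings already sampled from a model (no model is called here), and `normalize` and `agreement_score` are hypothetical helper names.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase, trim, drop trailing punctuation, and collapse whitespace."""
    return " ".join(answer.lower().strip().rstrip(".!?").split())


def agreement_score(answers: list[str]) -> tuple[str, float]:
    """Return the most common normalized answer and the fraction of samples that match it."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(a) for a in answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


# Hypothetical samples for the same question asked several times.
samples = ["Canberra", "canberra.", "Sydney", "Canberra"]
top, score = agreement_score(samples)
print(top, score)
```

A caller would compare the score against a threshold of its own choosing and flag low-agreement answers for review. High agreement is still only a signal: a model can repeat the same wrong answer every time.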

Tiny concrete example

User asks: “What is the capital of Australia?”

In a retrieval-augmented app, the same idea looks like this:

Common pitfalls / when NOT to use it

Related terms
