Hallucination detection is the process of spotting when an AI model’s answer is likely unsupported, false, or made up rather than grounded in the available evidence.
Large language models can produce fluent answers that sound right even when they are wrong. Hallucination detection helps reduce the risk of shipping misleading outputs in search, customer support, medical/legal workflows, internal assistants, and any system where users may trust the model too much.
In practice, teams use it when they want to:
There is no single standard method. “Hallucination detection” is usually a family of checks that ask: does this answer have evidence behind it?
Common approaches include:
Source-grounding checks
Compare the model’s answer against retrieved documents, tool outputs, or a known database. Any claim in the answer that the source does not support can be flagged.
Consistency checks
Ask the model the same question in different ways, or compare multiple model outputs. Major contradictions can be a sign of hallucination, though consistency alone does not guarantee correctness: a model can be consistently wrong.
Verifier or judge models
A second model evaluates whether each claim is supported by the context. This is common in research and evaluation pipelines, but the judge model can itself make mistakes.
Heuristics and confidence signals
Systems may use uncertainty scores, citation presence, or rules like “answer must quote a retrieved passage.” These are practical, but none is a perfect detector.
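The consistency check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampled answers are made up, the helper names (normalize, agreement_score, looks_inconsistent) are hypothetical, and the 0.6 cutoff is an arbitrary placeholder that would need tuning on real data.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase and drop punctuation so trivially different answers match."""
    kept = [ch for ch in answer.lower() if ch.isalnum() or ch.isspace()]
    return " ".join("".join(kept).split())


def agreement_score(answers: list[str]) -> float:
    """Fraction of answers that equal the most common normalized answer."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(answers)


def looks_inconsistent(answers: list[str], min_agreement: float = 0.6) -> bool:
    """Flag the question when too few samples agree (assumed cutoff)."""
    return agreement_score(answers) < min_agreement


# Hypothetical answers sampled from the same model for one question.
samples = ["Canberra", "canberra.", "Sydney", "Canberra"]
print(agreement_score(samples))     # 0.75
print(looks_inconsistent(samples))  # False
```

Exact-match agreement only suits short factual answers; longer answers would need a semantic comparison. High agreement still says nothing about correctness, so a score like this works best as one signal among several.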
A key limitation: hallucination detection is usually probabilistic, not absolute. It can say “this looks unsupported” more reliably than it can prove “this is definitely false.”
User asks: “What is the capital of Australia?”
In a retrieval-augmented app, the same idea looks like this:
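A minimal sketch of a source-grounding check, assuming the app already has the retrieved passages as plain strings. Word overlap stands in here for a real entailment or judge model, and the function names, stopword list, sample passage, and 0.7 threshold are all illustrative assumptions rather than a standard method.

```python
import re

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "of", "is", "are", "was", "in", "on", "to", "and", "it"}


def content_words(text: str) -> set[str]:
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def support_score(claim: str, passages: list[str]) -> float:
    """Best fraction of the claim's content words found in a single passage."""
    claim_words = content_words(claim)
    if not claim_words:
        return 1.0  # nothing checkable in this sentence
    return max(
        (len(claim_words & content_words(p)) / len(claim_words) for p in passages),
        default=0.0,
    )


def flag_unsupported(answer: str, passages: list[str], threshold: float = 0.7) -> list[str]:
    """Return the answer sentences whose support falls below the threshold."""
    return [s for s in split_sentences(answer) if support_score(s, passages) < threshold]


# Hypothetical retrieved passage and model answer.
passages = ["Canberra is the capital city of Australia."]
answer = "The capital of Australia is Canberra. It was founded by Vikings in 1200."
print(flag_unsupported(answer, passages))
# ['It was founded by Vikings in 1200.']
```

The first sentence is fully covered by the passage and passes; the second shares no content words with it and is flagged as unsupported. Lexical overlap misses paraphrases and negations, so a flag here means "no evidence found in the retrieved text", not "false".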