PaPoo

What is self-consistency?

Self-consistency is a decoding strategy in which a language model samples several reasoning paths for the same question and returns the final answer that appears most often, instead of trusting a single chain of thought.

Why it matters

A single reasoning trace from an LLM can be brittle: one unlucky step can derail the final answer. Self-consistency helps when the task needs multi-step reasoning, such as math word problems, logic puzzles, or multi-hop question answering.

In practice, you reach for it when you want a reliability boost without retraining or otherwise changing the model itself. The price is extra inference: every additional sample is another model call. It is especially useful when the model can produce different valid reasoning paths that still lead to the same final answer.

How it works

The basic idea is simple:

  1. Ask the model the same question multiple times, usually with sampling enabled so the outputs vary.
  2. Each run produces a different reasoning path and a final answer.
  3. Extract the final answer from each run.
  4. Pick the answer that shows up most often.
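The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `sample_answer` is a hypothetical callable standing in for "call the model with sampling enabled and extract its final answer", and `make_scripted_sampler` is a made-up test double that replays canned answers so the sketch runs without a model.

```python
from collections import Counter
from typing import Callable, Iterable


def self_consistency(
    question: str,
    sample_answer: Callable[[str], str],
    n_samples: int = 5,
) -> str:
    """Ask the same question n_samples times and return the most common answer.

    sample_answer is an assumed interface: it takes the question and returns
    one extracted final answer from one sampled reasoning path.
    """
    answers = [sample_answer(question) for _ in range(n_samples)]
    # most_common(1) returns [(answer, count)]; on a tie, the answer that was
    # seen first wins, because Counter preserves insertion order.
    return Counter(answers).most_common(1)[0][0]


def make_scripted_sampler(canned: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: replays canned answers in order."""
    it = iter(canned)

    def sample(question: str) -> str:
        return next(it)

    return sample


if __name__ == "__main__":
    sampler = make_scripted_sampler(["9", "3", "9", "15", "9"])
    print(self_consistency("What is the final price?", sampler, n_samples=5))
```

In a real setup, `sample_answer` would wrap whatever client you use to call the model, with a nonzero sampling temperature so that the runs actually differ.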

The technique was introduced in the paper “Self-Consistency Improves Chain of Thought Reasoning in Language Models” (Wang et al., 2022). The intuition is that if many different reasoning paths converge on the same result, that result is more likely to be correct than the answer from just one sampled path.
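Written as a formula, the majority vote looks roughly like this. The notation is introduced here for illustration only: N is the number of sampled reasoning paths, a_i is the final answer extracted from the i-th path, and the indicator is 1 when a_i equals the candidate answer a and 0 otherwise.

```latex
\hat{a} \;=\; \arg\max_{a} \sum_{i=1}^{N} \mathbb{1}\left[\, a_i = a \,\right]
```

In words: count how many sampled paths end in each candidate answer and return the candidate with the highest count.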

It is not the same as majority voting over arbitrary text. The vote is taken over the final answer extracted from each sample, not over the full output, because the reasoning text can differ widely even when the final answers agree.
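Because the vote is over final answers, each completion has to be reduced to a comparable answer string first. The sketch below shows one simple heuristic for numeric tasks, under a loud assumption: the final answer is the last non-negative number in the completion. The function name `extract_final_answer` and the heuristic itself are illustrative choices, not something prescribed by the technique.

```python
import re
from typing import Optional

# Matches non-negative numbers such as "9", "9.00", or "1,200.50".
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number in the text in a normalized form, or None.

    Assumption: the model states its final answer last. Normalizing means
    "9", "9.0", and "$9.00" all map to the same vote, "9".
    """
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    text = matches[-1].replace(",", "")  # drop thousands separators
    if "." in text:
        # Strip trailing zeros, then a dangling decimal point: "9.00" -> "9".
        text = text.rstrip("0").rstrip(".")
    return text


if __name__ == "__main__":
    print(extract_final_answer("25% of 12 is 3, so the price is $9.00."))
    print(extract_final_answer("12 * 0.75 = 9"))
```

Both completions use different reasoning text but normalize to the same answer, so they count as two votes for one candidate. Prompting the model to end with a fixed marker such as "Final answer:" would make extraction more robust than this last-number heuristic.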

Tiny concrete example

Suppose you ask:

If a book costs $12 and you get a 25% discount, what is the final price?

You sample the model 5 times. The reasoning may differ from sample to sample, and the final answers need not all agree.

Self-consistency returns $9, because that is the most common final answer. It is also the correct one: 25% of $12 is $3, and $12 minus $3 is $9.
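To make the vote concrete, here is a small tally in Python. The five answers are illustrative placeholders chosen so that $9 wins, not real model output.

```python
from collections import Counter

# Hypothetical final answers from 5 samples (placeholders, not real output).
sampled_answers = ["9", "9", "3", "9", "15"]

tally = Counter(sampled_answers)
winner, votes = tally.most_common(1)[0]
share = votes / len(sampled_answers)

# Prints: winner=$9 votes=3/5 share=60%
print(f"winner=${winner} votes={votes}/{len(sampled_answers)} share={share:.0%}")
```

The two minority answers stand for typical slips, such as returning the discount amount instead of the final price. The vote simply outnumbers them.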

Common pitfalls / when NOT to use it

If you want one practical rule: start with a single well-prompted run, then add self-consistency when accuracy matters more than cost.
