Self-Consistency Voting

Sample the same prompt 3-5 times at non-zero temperature, then take the majority answer. Roughly +12-18 percentage points of accuracy on arithmetic reasoning benchmarks (Wang et al. 2022).

THE MINDSET SHIFT

“One sample, one answer = a guess. Five samples, one majority = an opinion. The math is straightforward: variance compresses faster than cost grows.”
— SHE · YOUR AI GUIDE

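Wang et al. 2022 ("Self-Consistency Improves Chain-of-Thought Reasoning in Language Models") showed that sampling multiple reasoning paths from the same prompt and taking the majority answer outperforms greedy chain-of-thought decoding. The paper reports absolute gains of +17.9 points on GSM8K, +12.2 on AQuA, and +11.0 on SVAMP, with smaller gains on commonsense benchmarks (+6.4 on StrategyQA, +3.9 on ARC-challenge).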

The mechanism is intuitive: complex reasoning is path-dependent. A single sample can lock onto an early mistake and follow it to a wrong answer. Sampling N times explores up to N paths, and the right answer tends to cluster: different correct paths converge on the same final answer, while mistakes tend to scatter across many different wrong answers.
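The sample-then-vote loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: sample_fn is a hypothetical stand-in for one model call at non-zero temperature that returns only the final answer, and the canned answers in the usage example are made up.

```python
from collections import Counter
from typing import Callable, Optional


def majority_vote(answers: list[str]) -> tuple[Optional[str], float]:
    """Return the most common answer and the share of votes it received."""
    if not answers:
        return None, 0.0
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)


def self_consistency(
    sample_fn: Callable[[str], str], prompt: str, n: int = 5
) -> tuple[Optional[str], float]:
    """Call sample_fn n times on the same prompt and vote on the answers.

    sample_fn is a hypothetical callable (an assumption of this sketch):
    it should wrap one model call and return the final answer as a string.
    """
    answers = [sample_fn(prompt) for _ in range(n)]
    return majority_vote(answers)


# Usage with a fake sampler that replays canned answers (illustrative data).
canned = iter(["42", "41", "42", "42", "17"])
answer, share = self_consistency(lambda _prompt: next(canned), "What is 6 * 7?")
print(answer, share)  # prints: 42 0.6
```

The vote share is worth keeping alongside the answer: a 3-of-5 win is a weaker signal than a 5-of-5 win, and a low share can be used to flag a question for human review.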

The technique is most valuable when: (1) you're doing multi-step reasoning, (2) the cost of a wrong answer is high, (3) the answer space is enumerable enough that you can vote. Use it for investment memos, legal analysis, root-cause diagnosis. Don't use it for creative tasks — voting averages out the variance you wanted.
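Condition (3) matters in practice because two samples can state the same answer in different words, and a naive string comparison would count them as different votes. One hedged way to handle numeric answers is to reduce each completion to a canonical number before voting. The function name and the "last number in the text is the answer" rule are assumptions of this sketch, not something the paper prescribes.

```python
import re
from decimal import Decimal
from typing import Optional

# Matches integers and decimals, with optional sign and thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(completion: str) -> Optional[str]:
    """Return the last number in a completion in canonical form, or None.

    Assumption: the final number mentioned is the model's answer. This is a
    heuristic and will misfire on completions that end with other figures.
    """
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    value = Decimal(matches[-1].replace(",", ""))
    # normalize() drops trailing zeros; the 'f' format avoids exponent notation.
    return format(value.normalize(), "f")


samples = [
    "She has 3 boxes of 6, so the answer is 18.",
    "Total cost: $18.00",
    "3 * 6 = 18",
]
print([extract_final_number(s) for s in samples])  # prints: ['18', '18', '18']
```

For free-form outputs such as memos or legal analysis, there is no such canonical form, so voting usually has to happen on a narrower extracted field (a yes/no conclusion, a named root cause) rather than on the full text.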

“+17.9% → +58.1% on GSM8K (math) when self-consistency stacks with CoT.”

Wang et al., "Self-Consistency Improves Chain-of-Thought Reasoning," 2022

“5 samples capture ~90% of the gain of 40 samples; diminishing returns are steep.”

Same paper, Table 3

“Cost grows linearly with sample count; accuracy grows logarithmically.”

Replicated, Anthropic eval cookbook 2024
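The shape of the cost/accuracy trade-off can be illustrated with a toy model. The sketch below assumes each sample is independently correct with probability p and that the vote succeeds only when more than half of the samples are correct. Both are simplifying assumptions: real samples are correlated, and when wrong answers scatter, a plurality can win with fewer than half the votes. The numbers it produces are not the paper's results.

```python
from math import comb


def majority_correct_probability(p: float, n: int) -> float:
    """P(more than half of n independent samples are correct).

    Toy binomial model: p is the per-sample accuracy, n must be odd so
    that a strict majority always exists.
    """
    if n % 2 == 0:
        raise ValueError("use an odd n to avoid ties")
    need = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1)
    )


# Cost is proportional to n; accuracy under the toy model flattens out.
for n in (1, 3, 5, 11, 21, 41):
    print(n, round(majority_correct_probability(0.6, n), 3))
```

With p = 0.6, the model gives about 0.60 for one sample, 0.65 for three, and 0.68 for five. Each additional pair of samples costs the same but buys a smaller improvement than the pair before it, which is the pattern the quotes above describe.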