TECHNIQUES · 9 min read

Chain-of-Thought Prompting: How and When to Use It

promptcorrectly.com · Updated 2026-06-20

Chain-of-thought (CoT) prompting asks the model to work through a problem in steps before it commits to an answer. It tends to lift accuracy on math, logic, and multi-step analysis. On simple lookups it adds cost without benefit, and on most creative writing it can make the output worse. The skill is knowing which bucket your task is in.

What chain-of-thought actually is

A language model generates one token at a time, and each token it has already written becomes context for the next one. When you let it answer in a single leap, it has to compress an entire multi-step problem into the very first tokens it produces. That is where many wrong answers come from: the model guesses a conclusion, then writes a justification for a guess it already made.

Chain-of-thought flips the order. You ask it to externalize the intermediate steps first, so each step becomes visible context that constrains the next one. The reasoning is not decoration. It is scratch paper the model gets to read back to itself while it works.

This connects to a hard limit. A model can only do a bounded amount of computation in a single forward pass, much as a person can only hold so much in working memory. A problem with four dependencies and three constraints may not fit. Writing the steps out moves that load onto the page, where it can be referenced instead of held. It is the same reason a person reaches for a napkin to split a restaurant bill.

Why it works (in one sentence)

Forcing intermediate reasoning stops the model from jumping to a conclusion and then rationalizing it, which is exactly the failure mode that breaks multi-step tasks. If you only remember one thing, remember that.

When to use it

Reach for chain-of-thought when the task has more than one dependent step or when a wrong intermediate value silently corrupts the final answer.

Arithmetic and quantitative word problems. Unit conversions, percentages, multi-part calculations, anything with a "then" in it.
Logic and constraint puzzles. Scheduling, eligibility rules, "who sits where," deductions from a set of facts.
Multi-step analysis. Comparing options across several weighted criteria, tracing a cause to a root, building an argument that depends on earlier claims.
Debugging and root-cause work. Reading a stack trace, reasoning about what state produced an error, ruling out hypotheses one at a time.
Anything you would slow down for yourself. If you would grab a pen, the model probably needs to as well.

A quick gut check: if you cannot answer the question correctly in your own head without intermediate notes, the model usually can't either.

When NOT to use it

Chain-of-thought has a real cost, and on the wrong task it buys you nothing but noise.

Simple lookups and recall. "What is the capital of Australia?" does not improve when you ask for reasoning. You just pay for extra tokens and waiting.
Direct extraction or formatting. Pulling a date out of a sentence, reformatting JSON, classifying into two buckets. The answer is mechanical.
Most creative writing. Asking a model to "think step by step" before a poem or a brand tagline tends to flatten the output. It reasons its way to the safe, average choice instead of the surprising one. Creativity wants range, not a single justified path.
Latency-sensitive UX. If a user is waiting on a live response, sitting through hundreds of reasoning tokens they never see makes for a worse experience.

The honest tradeoff: CoT adds verbosity and latency. On a reasoning task that is a bargain. On a lookup it is pure overhead, and on a creative task it can make the result worse.
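As a rough sketch, the two lists above can be turned into a lookup that decides whether to add a reasoning instruction. The category names and the function `should_use_cot` are hypothetical labels invented for this illustration. Real tasks rarely arrive pre-labeled, so treat this as a way to make the rule explicit, not as a classifier.

```python
# Hypothetical task categories, taken from the "when to use it" and
# "when NOT to use it" lists in this article.
COT_HELPS = {"arithmetic", "logic", "multi_step_analysis", "debugging"}
COT_SKIP = {"lookup", "extraction", "creative_writing", "latency_sensitive"}


def should_use_cot(task_type: str) -> bool:
    """Return True when the task type is one where step-by-step reasoning tends to help."""
    if task_type in COT_HELPS:
        return True
    if task_type in COT_SKIP:
        return False
    # Refuse to guess for categories the article does not cover.
    raise ValueError(f"unknown task type: {task_type!r}")


use_for_math = should_use_cot("arithmetic")
use_for_lookup = should_use_cot("lookup")
```

Raising on unknown categories is a deliberate choice here: a silent default would hide exactly the judgment call this section is about.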

Zero-shot vs structured vs few-shot CoT

There are three ways to trigger reasoning, in rising order of effort and control.

Zero-shot CoT is the famous one-liner: you append "Let's think step by step" (or "Work through this carefully before answering") and let the model decide how to reason. It is the cheapest upgrade in prompting and it works surprisingly often.

Weak: A bat and a ball cost 1.10 dollars in total. The bat costs 1.00 dollar more than the ball. How much is the ball?

Strong: A bat and a ball cost 1.10 dollars in total. The bat costs 1.00 dollar more than the ball. How much is the ball? Think step by step, then give the answer.

The weak version famously pulls the intuitive-but-wrong "10 cents." The strong version makes the model set up bat equals ball plus 1.00 and solve, landing on 5 cents.
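In code, zero-shot CoT is just string concatenation. The helper below is a minimal sketch: `add_zero_shot_cot` is a hypothetical name, and no model is called. The last lines check the algebra the strong prompt is meant to elicit, working in cents to avoid floating-point noise.

```python
def add_zero_shot_cot(
    question: str,
    trigger: str = "Think step by step, then give the answer.",
) -> str:
    """Append a zero-shot reasoning trigger to a plain question."""
    return f"{question.strip()} {trigger}"


question = (
    "A bat and a ball cost 1.10 dollars in total. "
    "The bat costs 1.00 dollar more than the ball. How much is the ball?"
)
prompt = add_zero_shot_cot(question)

# The reasoning the prompt should produce:
# ball + (ball + 100) = 110 cents, so 2 * ball = 10 and ball = 5.
total_cents = 110
difference_cents = 100
ball_cents = (total_cents - difference_cents) // 2
bat_cents = ball_cents + difference_cents
```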

Structured CoT goes further: instead of hoping the model picks good steps, you name the steps. This is the version that scales to real work, because you control the reasoning path instead of leaving it to chance.

Weak: Should we launch the feature in Q3 or Q4? Think step by step.

Strong: Decide whether to launch in Q3 or Q4. Reason in this order: (1) list the engineering dependencies and their risk; (2) estimate market timing for each quarter; (3) weigh team capacity; (4) state the tradeoff; (5) recommend one quarter with the single deciding reason. Then give the recommendation.

The strong version is no longer "think step by step." It is "think in these steps," which is what separates a reliable analysis from a rambling one.
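If you reuse structured CoT across many tasks, it can help to keep the steps as data and generate the prompt from them. This is one possible sketch. `build_structured_cot` is a hypothetical helper, and the wording "Reason in this order" simply mirrors the strong example above.

```python
def build_structured_cot(task: str, steps: list[str], final_instruction: str) -> str:
    """Turn a task and a list of named reasoning steps into one prompt string."""
    numbered = "; ".join(f"({i}) {step}" for i, step in enumerate(steps, start=1))
    return f"{task} Reason in this order: {numbered}. {final_instruction}"


launch_prompt = build_structured_cot(
    task="Decide whether to launch in Q3 or Q4.",
    steps=[
        "list the engineering dependencies and their risk",
        "estimate market timing for each quarter",
        "weigh team capacity",
        "state the tradeoff",
        "recommend one quarter with the single deciding reason",
    ],
    final_instruction="Then give the recommendation.",
)
```

Keeping the steps in a list means you can reorder, add, or drop a step without rewriting the whole prompt.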

Few-shot CoT shows rather than tells. You give one or two fully worked examples — problem, reasoning, answer — and the model imitates the shape of the reasoning on your new input. It is the most powerful and the most expensive, and it shines when your reasoning format is specific or unusual. If you reach for it often, the few-shot prompting guide goes deep on choosing good exemplars.
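A few-shot CoT prompt is a sequence of worked examples followed by the new problem, left open at the reasoning step. The sketch below assumes a simple "Problem / Reasoning / Answer" layout. The `WorkedExample` class, the `build_few_shot_cot` function, and the field labels are all hypothetical choices for illustration, and the single exemplar reuses the bat-and-ball problem from earlier.

```python
from dataclasses import dataclass


@dataclass
class WorkedExample:
    problem: str
    reasoning: str
    answer: str


def build_few_shot_cot(examples: list[WorkedExample], new_problem: str) -> str:
    """Render worked examples, then the new problem with the reasoning left open."""
    blocks = [
        f"Problem: {ex.problem}\nReasoning: {ex.reasoning}\nAnswer: {ex.answer}"
        for ex in examples
    ]
    # Ending on "Reasoning:" invites the model to continue in the same shape.
    blocks.append(f"Problem: {new_problem}\nReasoning:")
    return "\n\n".join(blocks)


bat_and_ball = WorkedExample(
    problem=(
        "A bat and a ball cost 1.10 dollars in total. "
        "The bat costs 1.00 dollar more than the ball. How much is the ball?"
    ),
    reasoning=(
        "Let the ball cost x. The bat costs x + 1.00, so x + (x + 1.00) = 1.10. "
        "That gives 2x = 0.10, so x = 0.05."
    ),
    answer="5 cents",
)

few_shot_prompt = build_few_shot_cot(
    [bat_and_ball],
    "A subscription is 18.99 dollars per month or 145.71 dollars per year. "
    "What percent does the annual plan save versus paying monthly for 12 months?",
)
```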

How to get reasoning AND a clean final answer

The common complaint about CoT is the wall of text. You wanted "5 cents" and got six paragraphs. The fix is to separate the thinking from the deliverable and tell the model exactly how to mark the boundary.

Weak: Calculate the blended margin across these three product lines and explain your reasoning.

Strong: Calculate the blended margin across these three product lines. First, under a heading "Working," show each line's revenue, cost, and margin. Then, under a heading "Answer," give only the blended margin as a single percentage rounded to one decimal. Put nothing else under "Answer."

Now the reasoning is auditable when you want it and trivially skippable when you don't. Three patterns that work well:

Labeled sections. "Working" then "Answer," as above. Easy to scan, easy to parse.
Reason-then-restate. Let it reason freely, then end with "Final answer:" on its own line. You read the last line; the audit trail is there if a number looks off.
Hidden reasoning. On models with a dedicated thinking mode, the step-by-step work happens in a separate channel and you receive only the conclusion. Same accuracy benefit, none of the clutter in your output.
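The first two patterns are easy to parse on the receiving end. Here is a minimal sketch that assumes the model followed the format exactly: "Working" and "Answer" each on their own line, or a "Final answer:" marker near the end. Both function names are hypothetical, and the sample responses are hand-written for illustration, not real model output.

```python
import re


def split_working_and_answer(response: str) -> tuple[str, str]:
    """Split a response that uses 'Working' and 'Answer' headings on their own lines."""
    match = re.search(r"^[ \t]*Answer[ \t]*:?[ \t]*$", response, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Answer' heading found")
    working = response[: match.start()]
    # Drop the leading 'Working' heading if it is present.
    working = re.sub(r"^\s*Working[ \t]*:?[ \t]*\n", "", working, count=1).strip()
    answer = response[match.end():].strip()
    return working, answer


def extract_final_answer(response: str) -> str:
    """Return the text after the last 'Final answer:' marker."""
    marker = "Final answer:"
    index = response.rfind(marker)
    if index == -1:
        raise ValueError("no 'Final answer:' marker found")
    return response[index + len(marker):].strip()


# Hand-written sample responses (not real model output).
labeled = "Working\n12 x 18.99 = 227.88\n227.88 - 145.71 = 82.17\n\nAnswer\n36%"
working, answer = split_working_and_answer(labeled)
final = extract_final_answer("Set up the equation and solve for the ball.\nFinal answer: 5 cents")
```

Both helpers raise when the marker is missing, so a response that ignored the format fails loudly instead of being silently misread.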

This is also why CoT pairs naturally with review. Once the steps are on the page, you can ask the model to check its own work — see self-critique prompting for turning visible reasoning into a second-pass correction.

A worked example

Take a realistic prompt and watch the steps do the heavy lifting.

Weak: A subscription is 18.99 dollars per month or 145.71 dollars per year. What percent does the annual plan save?

A direct answer with no intermediate steps often divides the wrong pair of numbers, or compares to the wrong baseline, and confidently returns something like "23 percent."

Strong: A subscription is 18.99 dollars per month or 145.71 dollars per year. What percent does the annual plan save versus paying monthly for 12 months? Reason step by step: (1) compute the cost of 12 monthly payments; (2) subtract the annual price to get the dollar saving; (3) divide the saving by the 12-month cost; (4) convert to a percent rounded to a whole number. Then, under "Answer," give only the percentage.

The structured version forces the chain: 12 times 18.99 is 227.88; minus 145.71 is 82.17 saved; 82.17 divided by 227.88 is about 0.36; so roughly 36 percent. Each step is checkable, and step three pins down the baseline that the weak prompt left ambiguous. That single clarification — saving against the 12-month cost, not against the annual price — is what the reasoning surfaced.
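The same four steps can be checked in a few lines. This sketch uses Python's `decimal` module so the currency arithmetic is exact. The function name `annual_saving_percent` is a hypothetical choice. The final lines show what happens if you divide by the annual price instead, the wrong baseline the structured prompt rules out.

```python
from decimal import Decimal, ROUND_HALF_UP


def annual_saving_percent(monthly: Decimal, annual: Decimal) -> tuple[Decimal, Decimal, Decimal]:
    """Return (12-month cost, dollar saving, percent saved versus paying monthly)."""
    twelve_month_cost = monthly * 12                      # step 1
    saving = twelve_month_cost - annual                   # step 2
    fraction = saving / twelve_month_cost                 # step 3
    percent = (fraction * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)  # step 4
    return twelve_month_cost, saving, percent


cost, saving, percent = annual_saving_percent(Decimal("18.99"), Decimal("145.71"))

# Wrong baseline: dividing the saving by the annual price inflates the figure.
wrong_percent = (saving / Decimal("145.71") * 100).quantize(
    Decimal("1"), rounding=ROUND_HALF_UP
)

print(cost, saving, percent, wrong_percent)  # 227.88 82.17 36 56
```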

Wiring it into a real prompt

Chain-of-thought is one component, not a whole prompt. It lives inside the larger Role, Context, Task, Constraints, Output structure that governs any strong request — the reasoning instruction is part of Task and Output, while Role and Context still set up the model. If that scaffold is new to you, start with how to prompt and the Role-Context-Task-Constraints-Output breakdown, then layer reasoning on top.

A compact template you can adapt:

Role: who the model is acting as (analyst, tutor, reviewer).
Context: the inputs and any constraints the reasoning must respect.
Task: the question, plus the named reasoning steps.
Output: "Working" then "Answer," with a strict format for the answer.
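The template above can be sketched as a small function that assembles the four parts into one prompt. The name `build_prompt`, its parameters, and the exact wording of each line are assumptions made for illustration. Adapt the phrasing to your own house style.

```python
def build_prompt(
    role: str,
    context: str,
    task: str,
    steps: list[str],
    answer_format: str,
) -> str:
    """Assemble a Role / Context / Task / Output prompt with named reasoning steps."""
    numbered = "; ".join(f"({i}) {step}" for i, step in enumerate(steps, start=1))
    lines = [
        f"Role: {role}",
        f"Context: {context}",
        f"Task: {task} Reason in this order: {numbered}.",
        f'Output: Under "Working," show each step. Under "Answer," give only {answer_format}.',
    ]
    return "\n".join(lines)


pricing_prompt = build_prompt(
    role="You are a pricing analyst.",
    context="A subscription is 18.99 dollars per month or 145.71 dollars per year.",
    task="What percent does the annual plan save versus paying monthly for 12 months?",
    steps=[
        "compute the cost of 12 monthly payments",
        "subtract the annual price to get the dollar saving",
        "divide the saving by the 12-month cost",
        "convert to a percent rounded to a whole number",
    ],
    answer_format="the percentage",
)
```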

In Studio this maps to nodes you can wire visually: a Context node feeding a dedicated chain-of-thought step ahead of the executor, so reasoning is a deliberate stage of the canvas rather than a phrase you keep retyping. If you would rather drill the instinct for when reasoning helps, Cortex has courses built around exactly this judgment call, and the Library has ready-made reasoning prompts to adapt. For the broader foundation, the how to prompt AI correctly guide ties these techniques together.

The short version

Use chain-of-thought when the problem has dependent steps and a wrong middle ruins the end — math, logic, multi-step analysis, debugging. Skip it for lookups, extraction, and creative work, where it adds latency and dampens range. When you do use it, name the steps instead of hoping, and always split visible reasoning from a clean final answer so you get the accuracy without the wall of text.

Ready to make reasoning a reusable step instead of a phrase you retype? Build it on the canvas in Studio.

Keep reading

Few-Shot Prompting: Teaching AI by Example

Few-shot prompting shows the model 2–5 input/output examples so it copies your format and standard. How to pick, format, and count your shots.

8 min read

Self-Critique Prompting: Get AI to Improve Its Own Output

Self-critique prompting makes AI grade its own draft against named criteria, then revise. The generate-critique-revise loop, with before/after prompts.

9 min read

How to Prompt AI Correctly: The Complete 2026 Guide

Prompt AI correctly by specifying role, context, task, constraints, and output. A practical 2026 guide with before/after examples and named techniques.

11 min read
