TECHNIQUES · 9 min read

Self-Critique Prompting: Get AI to Improve Its Own Output

promptcorrectly.com · Updated 2026-06-20

Self-critique prompting is when you ask a model to produce a draft, evaluate that draft against explicit criteria, and then rewrite it to fix what it found — all in one pass or across a few. It works because judging a piece of writing is easier than producing it, so the second look catches weaknesses the first pass shipped without noticing.

What self-critique prompting actually is

Most people use a model in one move: ask, receive, accept. Self-critique adds a second move the model was always capable of but rarely volunteers: it stops and grades its own work before handing it over.

The technique has a research name, self-refine, and a simple shape: generate, critique, revise. You ask for a draft, then ask the model to evaluate that draft against named standards, then ask it to produce a version that fixes every weakness it found. The whole thing can live in a single prompt or be split across turns.

It is nearly free, which is what makes it one of prompting's highest-leverage habits. You are not buying a better model or hunting for a magic phrase — you are using the model twice, once as a writer and once as an editor, and editors catch things writers miss.

The mental model: a first draft is the model thinking out loud. A critique is the model reading what it just said with fresh, skeptical eyes. You would not ship your own first draft of anything that mattered. Stop letting the model ship its first draft.

Why it works: evaluation is easier than generation

There is a real asymmetry behind this, and it tells you when the technique will help and when it won't.

Generating good output means holding the goal, the constraints, the structure, and the actual words in working memory all at once and getting them all right simultaneously. Evaluating existing output is narrower: the text is already on the page, so the model can read one sentence and ask a single focused question — is this clause doing any work? — without juggling everything else. Checking one property of a finished thing is a smaller task than producing the whole thing correctly the first time.

This is the same reason code review catches bugs the author stared past, and the same reason you spot the typo the moment you hit send. Switching from making to judging changes what you attend to.

There is a second reason, and it is about the prompt, not the model. Forcing a critique usually means you have to name the criteria, and naming the criteria is most of the value. A vague "make it better" gives the model nothing to push against. "Check whether the opening line is about the reader or about us" gives it a specific test to run. Much of the lift in self-critique comes from the discipline of writing down what good actually means.

The generate-critique-revise loop

Here is the canonical structure as a single prompt. Notice the three labelled phases.

You are an editor. First, write a 150-word product description for a noise-cancelling travel pillow aimed at long-haul business travellers. Then, critique your own draft against these criteria: (1) Does the first sentence name a specific pain a tired traveller feels, or does it open with the product? (2) Is there exactly one concrete benefit per sentence, with no stacked adjectives? (3) Does it avoid the words "perfect", "ultimate", and "revolutionary"? (4) Could any sentence be deleted without losing information? Finally, write the improved version that fixes every issue you found. Label the three parts Draft, Critique, Final.

Run that and the Critique section typically comes back specific (the opening sentence leads with the product, not the traveller; sentence three stacks "soft, plush, luxurious"), and the Final version fixes those things. The model could have written the Final version first. It didn't, because nothing asked it to look.

The labels matter because they separate the two roles cleanly. When generate and evaluate are tangled into one instruction, the model splits its attention and does both at half strength. Pulling them apart — write, then judge, then rewrite — lets each phase get full focus. Separating the generator from the evaluator is the core move. If you take one thing from this article, take that.
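The single-pass structure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `build_single_pass_prompt` and `split_phases` are hypothetical, and the parser assumes the model writes each label at the start of a line followed by a colon, which is why the prompt's last sentence asks for exactly that.

```python
import re
from typing import Dict, List

LABELS = ("Draft", "Critique", "Final")


def build_single_pass_prompt(task: str, criteria: List[str]) -> str:
    """Assemble one prompt with the three labelled phases.

    `task` should be a full sentence ending in a period,
    e.g. "write a 150-word product description for a travel pillow."
    """
    numbered = " ".join(f"({i}) {c}" for i, c in enumerate(criteria, start=1))
    return (
        f"You are an editor. First, {task} "
        f"Then, critique your own draft against these criteria: {numbered} "
        "Finally, write the improved version that fixes every issue you found. "
        'Label the three parts "Draft:", "Critique:", and "Final:", '
        "each label at the start of its own line."
    )


def split_phases(response: str) -> Dict[str, str]:
    """Split a labelled response into its Draft, Critique, and Final sections."""
    # re.split with a capturing group returns
    # [preamble, label, body, label, body, ...]
    parts = re.split(r"^(Draft|Critique|Final):[ \t]*", response, flags=re.MULTILINE)
    sections = {label: body.strip() for label, body in zip(parts[1::2], parts[2::2])}
    missing = [label for label in LABELS if label not in sections]
    if missing:
        raise ValueError(f"response is missing sections: {missing}")
    return sections
```

Keeping the three sections separate in code mirrors the point of the labels: the critique can be read, logged, or checked on its own instead of being buried in one block of text.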

Give it a real rubric, not a vibe

The difference between a self-critique that transforms the output and one that produces a polite shrug is the quality of the criteria. Compare these.

Weak critique instruction: Now review your draft and improve anything that could be better.

Strong critique instruction: Score your draft from 1 to 5 on each of these, and for any score below 4, quote the exact sentence and say what is wrong: (a) Specificity — does every claim include a number, name, or concrete detail rather than an adjective? (b) Audience fit — would a CFO, not a marketer, find this credible? (c) Structure — does the most important point come first? (d) Cuttability — is there a single sentence that could be removed with zero loss? Then list every change you will make before you make it.

The weak version invites the model to nod and tweak a comma. The strong version forces a per-criterion verdict, demands evidence (quote the sentence), and makes the model commit to a change list before rewriting. That structure is what turns critique from theatre into editing.

A few rubric ingredients that consistently earn their place:

Make criteria binary or scored, not open-ended. "Is the CTA singular — yes or no?" beats "is the CTA good?"
Demand evidence. Require the model to quote the offending sentence. This stops it from inventing problems or hand-waving past real ones.
Separate must-haves from nice-to-haves. A rubric of fifteen equal points produces fifteen shallow edits. Three weighted criteria produce three real ones.
Tie criteria to the goal, not to generic "quality." For a sales email they are about conversion; for a legal summary, accuracy and hedging. There is no universal rubric — if you have an internal standard, paste it.

This is where the RCTCO prompt structure (Role, Context, Task, Constraints, Output) pays off twice: the constraints you wrote for the generation phase double as the rubric for the critique phase. Write your constraints once, then say "grade the draft against the constraints above."
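One way to keep a rubric honest is to store it as data and render the critique instruction from it. The sketch below is illustrative only: the `Criterion` class and `render_critique_instruction` function are hypothetical names, and the rendered wording is adapted from the strong critique instruction earlier in this section. Must-have criteria are listed first, and nice-to-haves are tagged so the model can weigh them accordingly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Criterion:
    name: str
    question: str
    must_have: bool = True  # nice-to-haves are rendered after must-haves


def render_critique_instruction(rubric: List[Criterion], threshold: int = 4) -> str:
    """Turn a rubric into a scored, evidence-demanding critique instruction."""
    # sorted() is stable, so criteria keep their order within each group
    ordered = sorted(rubric, key=lambda c: not c.must_have)
    lines = [
        "Score your draft from 1 to 5 on each of these, and for any score "
        f"below {threshold}, quote the exact sentence and say what is wrong:"
    ]
    for letter, c in zip("abcdefghijklmnopqrstuvwxyz", ordered):
        tag = "" if c.must_have else " (nice-to-have)"
        lines.append(f"({letter}) {c.name}{tag}: {c.question}")
    lines.append('If the draft already meets a criterion, say "no change needed".')
    lines.append("Then list every change you will make before you make it.")
    return "\n".join(lines)
```

The same list of `Criterion` objects can feed both the generation prompt (as constraints) and the critique prompt (as the rubric), which is the "write once, grade against it" idea in code form.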

Single-pass vs multi-pass

You have two ways to run the loop, and they trade convenience against control.

Single-pass packs generate-critique-revise into one prompt, as in the travel-pillow example. It is fast, cheap, and good enough for most everyday writing. The downside is that the model sometimes rushes the critique to get to the rewrite, and you cannot inspect or correct the critique before it acts on it.

Multi-pass splits the loop across turns: get the draft, read the critique, then send a separate instruction to revise — and you can edit the critique first, add criteria the model missed, or run a second critique on the revision. This is slower but gives you a control point at the most valuable moment, when the problems have been named but not yet fixed.

A practical rule: use single-pass for low-stakes volume (product blurbs, replies, first drafts) and multi-pass when the cost of a miss is high (a contract clause, a launch announcement, anything a stranger will judge you by). For genuinely important work, two critique passes beat one — the second pass catches what the first pass introduced.
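The multi-pass version can be sketched as a small loop with a hook at the control point. Everything here is an assumption made for illustration: `model` stands in for whatever function sends a prompt to your model and returns its text, `review` is an optional callback where a human (or another check) can edit the critique before it is acted on, and the prompt wording is a placeholder, not a tested template.

```python
from typing import Callable, Dict, List, Optional, Union

Model = Callable[[str], str]  # hypothetical: prompt in, model text out


def multi_pass(
    model: Model,
    task: str,
    rubric: str,
    review: Optional[Callable[[str], str]] = None,
    critique_passes: int = 1,
) -> Dict[str, Union[str, List[str]]]:
    """Run generate, critique, revise as separate model calls."""
    draft = model(task)
    current = draft
    critiques: List[str] = []
    for _ in range(critique_passes):
        critique = model(
            "Critique the draft below against these criteria.\n\n"
            f"Criteria:\n{rubric}\n\nDraft:\n{current}"
        )
        if review is not None:
            # Control point: problems are named but not yet fixed.
            critique = review(critique)
        critiques.append(critique)
        current = model(
            "Rewrite the draft to fix every issue in the critique.\n\n"
            f"Draft:\n{current}\n\nCritique:\n{critique}"
        )
    return {"draft": draft, "critiques": critiques, "final": current}
```

Setting `critique_passes=2` gives the second critique a chance to catch what the first revision introduced, at the cost of two extra model calls.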

In Studio this maps directly onto the canvas. There is a dedicated Self-Critique node you wire in after your generation node: it holds the rubric, runs the evaluation as a discrete step, and feeds the result into a revision node. Seeing the loop as actual boxes and arrows makes the structure obvious in a way a paragraph never does — you can watch the draft flow into the critic and the critique flow into the rewrite.

Where it helps and where it hurts

This technique is not universal — know its range.

It shines on:

Writing. Tone, redundancy, weak openings, buried calls to action — all are easy to spot on a second read and hard to avoid on the first.
Code review. "Critique this function for unhandled edge cases, off-by-one errors, and inputs that would throw" finds real bugs, because checking code against a named failure mode is exactly the kind of narrow evaluation models do well.
Reasoning and analysis. "Check your argument for unstated assumptions and steps that don't follow from the previous one" catches logical gaps the first pass glossed over.
Anything with explicit standards — a style guide, a rubric, a checklist, a brand voice. If you can write down "good," self-critique can enforce it.

It hurts when:

There are no real criteria. With nothing concrete to check against, the model rationalizes — it defends the draft it already wrote and calls it improved. A critique with no rubric tends to validate, not challenge.
The draft is already good. Forced to find problems where none exist, the model over-edits: it adds hedges, swaps clean words for fancier ones, and pads. Always allow the escape hatch: if the draft already meets a criterion, say "no change needed" and leave it alone.
The task is factual recall. Self-critique improves reasoning and craft, not knowledge. A model that doesn't know a fact will critique its way to a more confident wrong answer, not a right one. It cannot fact-check itself against information it never had.

The honest failure mode is sycophancy toward its own work. A model asked to critique will sometimes write a glowing review of a mediocre draft. The fix is structural: demand a numeric score, demand a quoted example for every flaw, and explicitly permit "no change needed" so it isn't pressured to invent edits. Make honest critique the path of least resistance.
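The structural fix can be partly enforced in code if you ask for the critique in a machine-readable form. The sketch below assumes, purely for illustration, that the model was told to return a JSON list of objects with `criterion`, `score`, `quote`, and `problem` keys; that format and the `check_critique` name are not from any particular tool. It keeps only findings whose quoted sentence really appears in the draft, and treats any score at or above the threshold as "no change needed".

```python
import json
from typing import Dict, List


def check_critique(draft: str, critique_json: str, threshold: int = 4) -> Dict[str, List[dict]]:
    """Sort critique items into actionable findings and rejected ones."""
    items = json.loads(critique_json)
    actionable: List[dict] = []
    rejected: List[dict] = []
    for item in items:
        score = item.get("score")
        valid = isinstance(score, int) and not isinstance(score, bool) and 1 <= score <= 5
        if not valid:
            rejected.append({**item, "reason": "score must be an integer from 1 to 5"})
        elif score >= threshold:
            continue  # criterion met: no change needed
        elif not item.get("quote") or item["quote"] not in draft:
            # The flaw has no verbatim evidence, so it may be invented.
            rejected.append({**item, "reason": "quoted sentence not found in draft"})
        else:
            actionable.append(item)
    return {"actionable": actionable, "rejected": rejected}
```

If `actionable` comes back empty, the simplest policy is to skip the revision step and keep the draft, which avoids pressuring the model into inventing edits.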

Combine it with chain-of-thought

Self-critique and chain-of-thought prompting are natural partners, and stacking them compounds the gain.

Chain-of-thought improves the generation step by letting the model reason through intermediate steps before committing to an answer. Self-critique improves the evaluation step by checking that answer against criteria. Use both and you get reasoning on the way in and reasoning on the way back.

The strongest version applies chain-of-thought to the critique itself — instead of "is this good," you ask the model to reason about why each criterion passes or fails before it scores. Here is the combined pattern on a reasoning task.

A startup has 18 months of runway and is deciding whether to raise now at a flat valuation or wait 9 months to raise at a higher one. Step 1 — reason it through: think step by step about the trade-offs (dilution, market timing, execution risk, the cost of running low on runway) before stating a recommendation. Step 2 — critique your reasoning: review your own analysis against these tests — did you consider the downside scenario where the next round is harder, not easier? Did you weigh the cost of distraction during a raise? Is any step asserted rather than argued? Quote any weak step. Step 3 — revise: give the final recommendation, strengthened to address every gap you found, and state your confidence and what would change your mind.

Step 1 is chain-of-thought. Step 2 is self-critique with a rubric. Step 3 is the refined answer. Each technique covers the other's blind spot: chain-of-thought can reason confidently down a wrong path, and the critique step is what catches it. The broader habit of treating a prompt as a specification, of which both techniques are instances, is covered in "How to Prompt AI Correctly: The Complete 2026 Guide."
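If you reuse this three-step pattern often, it can be templated. The following is a rough sketch under stated assumptions: `build_reason_critique_revise` is a hypothetical helper, and its wording is lifted from the runway example above, with one added sentence asking the model to reason about each test before scoring it.

```python
from typing import List


def build_reason_critique_revise(question: str, tradeoffs: List[str], tests: List[str]) -> str:
    """Build a prompt that reasons step by step, critiques that reasoning, then revises."""
    step1 = (
        "Step 1, reason it through: think step by step about the trade-offs "
        f"({', '.join(tradeoffs)}) before stating a recommendation."
    )
    step2 = (
        "Step 2, critique your reasoning: review your own analysis against these tests. "
        + " ".join(tests)
        + " For each test, explain why it passes or fails before you score it."
        + " Quote any weak step."
    )
    step3 = (
        "Step 3, revise: give the final recommendation, strengthened to address "
        "every gap you found, and state your confidence and what would change your mind."
    )
    return "\n".join([question, step1, step2, step3])
```

Passing the tests in as a list keeps the critique criteria explicit and easy to swap per task, which matters more than the exact phrasing of the steps.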

Putting it into practice

Self-critique is a habit, and habits form through reps. A few ways to build it:

Add one critique line to your next ten prompts. The smallest version ("then critique your draft against [your two top criteria] and fix what you find") already beats accepting the first output. Start there.
Reuse your constraints as your rubric. You already wrote what "good" means when you set the constraints. Point the critique at them.
Build the loop visually. Wire a generation node into the Self-Critique node in Studio, give it a real rubric, and watch the draft, the critique, and the revision as separate steps. Seeing the evaluator as its own box is the fastest way to internalize that generating and judging are different jobs.
Train it deliberately. Cortex runs 36 hands-on courses, several of which drill self-critique and self-refine with graded feedback, so you practise writing rubrics that actually bite instead of just reading about them.
Start from prompts that already do it. The Library has 2,750+ forkable prompts; search for ones with built-in critique loops, open them up, and study how the criteria are written. The fastest way to learn good rubrics is to take apart ones that work. For the one-screen reference on all of this, see how to prompt.

The summary is short. A model's first draft is a starting point, not a deliverable. Ask it to grade its own work against criteria you actually wrote down, then rewrite — and most of the gap between mediocre and good closes for free.

Ready to try it on real work? Open Studio, drop a Self-Critique node after your generation step, and give it a rubric with teeth. You will watch the second pass catch what the first one shipped — and you will start writing better first drafts because you finally know what the editor is looking for.
