Data · 5.0 · 156 ratings

Experiment Design — A/B Test

Design an A/B test that's powered, falsifiable, and shippable. Hypothesis to readout.

Role-Based · Chain-of-Thought · Output-Format

Prompt

**Role:** Senior analyst who has shipped 100+ A/B tests at a high-traffic consumer product. You know how to design tests that ACTUALLY tell you something — and which kinds of tests are theater.

**Context:** Decision the test will inform: [what we'll do differently based on the result]. Hypothesis: [the specific claim — "X causes Y"]. Surface: [where the test runs — checkout, signup, etc.]. Traffic available: [N users/week]. Effect size we care about: [the smallest lift that would change our decision].

**Task:** Design the experiment.

1. Hypothesis (1-2 sentences): "We believe that [change] will cause [metric] to [direction] by [magnitude]."
2. Primary metric: ONE metric. Operationalized — exactly how it's computed. Not "conversion" — "users who reach the success page within 24h of signup."
3. Secondary metrics + guardrails: 2-3 things you'll also watch. Guardrails are things you DON'T want to hurt.
4. Sample size calculation: given the effect size you care about + baseline + significance level (e.g., α = 0.05, two-sided) + power 80%, how many users per arm? How many days at current traffic?
5. Randomization unit: user, session, account, account+browser? Be explicit about why.
6. Pre-registration: what would make us call this a "winning" test? A "losing" test? An "inconclusive" test?
7. Readout plan: when we'll look at the data + the decision rule. Don't peek before the planned end.
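
As a rough illustration of step 4, the sketch below uses the standard normal-approximation formula for comparing two proportions. It assumes a binary conversion metric, equal allocation across arms, and a two-sided test at α = 0.05; the function names and the example numbers are hypothetical, not part of the prompt.

```python
import math
from statistics import NormalDist


def users_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    baseline:      control conversion rate, e.g. 0.10
    relative_lift: smallest relative lift worth detecting, e.g. 0.05 for +5%
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def days_to_run(n_per_arm, weekly_users, arms=2):
    """Days needed to enroll every arm, assuming all weekly traffic is eligible."""
    daily_users = weekly_users / 7
    return math.ceil(arms * n_per_arm / daily_users)


if __name__ == "__main__":
    # Hypothetical inputs: 10% baseline, +5% relative lift, 50,000 users/week.
    n = users_per_arm(baseline=0.10, relative_lift=0.05)
    print(n, "users per arm,", days_to_run(n, weekly_users=50_000), "days")
```

Halving the detectable lift roughly quadruples the required sample, which is why the effect size in the Context block has to be fixed before the traffic question can be answered.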

**Constraints:**
- ONE primary metric, fixed in advance; never choose among metrics after seeing the results
- Sample size must be computed, not guessed
- Pre-register the decision rule — what we'll do at each outcome
- No "peeking" — the readout date is the readout date
- Guardrails matter — name what you'll roll back for
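
One way to make the pre-registered decision rule concrete is to write it as code before the test starts. The sketch below is a hypothetical rule, not the only reasonable one: it assumes a binary primary metric, a normal-approximation confidence interval on the absolute difference, and a single boolean guardrail flag computed elsewhere.

```python
import math
from statistics import NormalDist


def diff_ci(conv_control, n_control, conv_treatment, n_treatment, alpha=0.05):
    """Two-sided confidence interval for (treatment rate - control rate)."""
    p_c = conv_control / n_control
    p_t = conv_treatment / n_treatment
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_t - p_c
    return diff - z * se, diff + z * se


def decide(ci, guardrail_breached):
    """Map the readout to one of the pre-registered outcomes.

    Hypothetical rule: a guardrail breach always wins; otherwise the
    outcome depends on whether the interval excludes zero.
    """
    low, high = ci
    if guardrail_breached:
        return "rollback"
    if low > 0:
        return "win"
    if high < 0:
        return "loss"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical counts, evaluated once on the planned readout date.
    ci = diff_ci(conv_control=1000, n_control=10_000,
                 conv_treatment=1200, n_treatment=10_000)
    print(decide(ci, guardrail_breached=False))
```

Writing the rule down this way forces each outcome to have a named action, and running it only once on the readout date is what keeps the stated significance level honest.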

**Output format:** 7 sections · with explicit numbers · ≤700 words · 1-paragraph "common pitfall" callout.
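
For the randomization unit in step 5, a common approach is deterministic hashing of the unit ID, so the same user (or account) always lands in the same arm without storing an assignment table. This is a minimal sketch under that assumption; `assign_arm` and the experiment name are hypothetical.

```python
import hashlib


def assign_arm(unit_id, experiment, arms=("control", "treatment")):
    """Deterministically map a randomization unit to an arm.

    Salting the hash with the experiment name keeps assignments
    independent across experiments that share the same unit IDs.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[bucket]


if __name__ == "__main__":
    # Hypothetical experiment name and user ID.
    print(assign_arm("user-12345", "checkout-button-test"))
```

Whatever unit is hashed here must match the unit the primary metric is computed on; hashing sessions while measuring per-user conversion would expose one user to both arms.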

Recommended models

claude · gpt-4o
