Data · 5.0 · 156 ratings
Experiment Design — A/B Test
Design an A/B test that's powered, falsifiable, and shippable. Hypothesis to readout.
Role-Based · Chain-of-Thought · Output-Format
Prompt
**Role:** Senior analyst who has shipped 100+ A/B tests at a high-traffic consumer product. You know how to design tests that ACTUALLY tell you something, and which kinds of tests are theater.

**Context:**
Decision the test will inform: [what we'll do differently based on the result].
Hypothesis: [the specific claim — "X causes Y"].
Surface: [where the test runs — checkout, signup, etc.].
Traffic available: [N users/week].
Effect size we care about: [the smallest lift that would change our decision].

**Task:** Design the experiment.
1. Hypothesis (1-2 sentences): "We believe that [change] will cause [metric] to [direction] by [magnitude]."
2. Primary metric: ONE metric, operationalized, with exactly how it's computed. Not "conversion" but "users who reach the success page within 24h of signup."
3. Secondary metrics + guardrails: 2-3 things you'll also watch. Guardrails are things you DON'T want to hurt.
4. Sample size calculation: given the effect size you care about, the baseline rate, 80% power, and a stated significance level, how many users per arm? How many days at current traffic?
5. Randomization unit: user, session, account, account+browser? Be explicit about why.
6. Pre-registration: what would make us call this a "winning" test? A "losing" test? An "inconclusive" test?
7. Readout plan: when we'll look at the data, plus the decision rule. Don't peek before the planned end.

**Constraints:**
- ONE primary metric; do not pre-arrange a tie-breaker metric
- Sample size must be computed, not guessed
- Pre-register the decision rule: what we'll do at each outcome
- No "peeking": the readout date is the readout date
- Guardrails matter: name what you'll roll back for

**Output format:** 7 sections · with explicit numbers · ≤700 words · 1-paragraph "common pitfall" callout.
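As a sanity check on the model's answer to step 4, the sample size arithmetic can be sketched in a few lines. This is a minimal sketch, not part of the prompt: it assumes the primary metric is a conversion rate, uses the standard normal approximation for comparing two proportions, and assumes a two-sided significance level of 0.05, which the prompt itself does not specify. The function names `users_per_arm` and `days_needed` and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def users_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-proportion test.

    baseline      -- current conversion rate of the primary metric (e.g. 0.10)
    relative_lift -- smallest relative lift worth detecting (e.g. 0.10 for +10%)
    alpha         -- two-sided significance level (assumed; not given in the prompt)
    power         -- probability of detecting the lift if it is real
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # summed Bernoulli variances
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def days_needed(n_per_arm, users_per_week, arms=2):
    """Days of traffic required, assuming all weekly users enter the test."""
    users_per_day = users_per_week / 7
    return math.ceil(n_per_arm * arms / users_per_day)


if __name__ == "__main__":
    # Illustrative inputs only: 10% baseline, +10% relative lift, 50,000 users/week.
    n = users_per_arm(baseline=0.10, relative_lift=0.10)
    print(f"users per arm: {n}")
    print(f"days at current traffic: {days_needed(n, users_per_week=50_000)}")
```

Halving the lift you care about roughly quadruples the required sample, which is why the "effect size we care about" field in the context matters more than any other input. In practice, round the duration up to whole weeks so that every day of the week is represented equally.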
Recommended models
claude · gpt-4o