AI Engineering · 5.0 · 50 ratings

Synthetic Test Set Generator

Role-Based · Chain-of-Thought

Prompt

**Role:** AI engineer specialized in evals. You generate synthetic test sets for products with no ground-truth labels yet.

**Context:** A team is launching an LLM feature for [USE CASE] with no labeled data. They want an evals set TODAY.

**Task:** Generate a synthetic test set:
1. List 8-12 input archetypes (different user intents, edge cases, hostile inputs).
2. For each: generate 5 example inputs (synthetic but realistic).
3. For each input: write the IDEAL output.
4. For each input: write 2-3 ACCEPTABLE outputs (graded as "good enough").
5. For each input: write 2-3 UNACCEPTABLE outputs (the failure modes you'll grade against).
6. Provide a grading rubric: how a human grader would score outputs.
7. Provide an LLM-as-judge prompt that approximates the human grader.
8. Provide the calibration plan (when the LLM judge diverges from human graders, which one wins).

**Constraints:**
- Inputs must look real, not template-y.
- Include 2 hostile inputs (red-team).
- Include 2 ambiguous inputs (where reasonable answers diverge).

**Output format:** YAML or JSON test set + the grading rubric + the calibration plan.
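
The counts the prompt asks for (8-12 archetypes, 5 inputs each, 2-3 acceptable and 2-3 unacceptable outputs per input, 2 hostile and 2 ambiguous inputs) can be checked mechanically once the model returns JSON. The following is a minimal sketch of such a check, not part of the prompt itself. The JSON layout and every field name in it (`archetypes`, `cases`, `user_input`, `ideal`, `acceptable`, `unacceptable`, `tags`) are assumptions made for illustration; the prompt does not fix a schema, so adjust them to whatever the model actually emits.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    # Hypothetical field names; the prompt does not prescribe a schema.
    user_input: str
    ideal: str
    acceptable: list[str]
    unacceptable: list[str]
    tags: list[str] = field(default_factory=list)  # e.g. ["hostile"] or ["ambiguous"]


@dataclass
class Archetype:
    name: str
    cases: list[EvalCase]


def load_test_set(raw: str) -> list[Archetype]:
    """Parse JSON shaped as {"archetypes": [{"name": ..., "cases": [...]}]}."""
    data = json.loads(raw)
    return [
        Archetype(a["name"], [EvalCase(**c) for c in a["cases"]])
        for a in data["archetypes"]
    ]


def validate(archetypes: list[Archetype]) -> list[str]:
    """Return readable violations of the prompt's counts; an empty list means OK."""
    problems = []
    if not 8 <= len(archetypes) <= 12:
        problems.append(f"expected 8-12 archetypes, got {len(archetypes)}")
    for a in archetypes:
        if len(a.cases) != 5:
            problems.append(f"{a.name}: expected 5 inputs, got {len(a.cases)}")
        for i, c in enumerate(a.cases):
            if not 2 <= len(c.acceptable) <= 3:
                problems.append(f"{a.name}[{i}]: expected 2-3 acceptable outputs")
            if not 2 <= len(c.unacceptable) <= 3:
                problems.append(f"{a.name}[{i}]: expected 2-3 unacceptable outputs")
    # The hostile/ambiguous constraints are read here as "at least 2 overall".
    all_cases = [c for a in archetypes for c in a.cases]
    for tag in ("hostile", "ambiguous"):
        count = sum(tag in c.tags for c in all_cases)
        if count < 2:
            problems.append(f"expected at least 2 {tag} inputs, got {count}")
    return problems
```

Calling `validate(load_test_set(model_output))` and feeding any reported problems back to the model is one way to catch a short or lopsided test set before anyone starts grading against it.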

How to use this prompt

1. Copy the prompt above and paste it into ChatGPT, Claude, or Gemini, or open it in the visual Studio to edit each part on a canvas and run it with your own key.
2. Replace any bracketed placeholders with your specifics. The more concrete your context and constraints, the sharper the result (see the 5-part prompt structure).
3. Run it, then refine. Ask the model to critique and improve its own answer with self-critique prompting.
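
Part of refining is checking how far the generated LLM-as-judge prompt drifts from human graders, which is what the calibration plan in step 8 of the prompt is about. As a rough sketch, one way to quantify that drift is raw agreement plus Cohen's kappa over a sample of outputs scored by both. The function name `agreement_report` and the label values are hypothetical, and choosing kappa is an assumption here; the prompt itself does not name a metric.

```python
from collections import Counter


def agreement_report(human: list[str], judge: list[str]) -> dict:
    """Compare human and LLM-judge labels for the same outputs, in the same order."""
    if not human or len(human) != len(judge):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    # Fraction of items where both graders gave the same label.
    observed = sum(h == j for h, j in zip(human, judge)) / n
    # Agreement expected by chance, from each grader's label frequencies.
    h_counts, j_counts = Counter(human), Counter(judge)
    labels = set(human) | set(judge)
    expected = sum(h_counts[label] * j_counts[label] for label in labels) / (n * n)
    # Cohen's kappa; defined as 1.0 when both graders always use one identical label.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    disagreements = [i for i, (h, j) in enumerate(zip(human, judge)) if h != j]
    return {"agreement": observed, "kappa": kappa, "disagreements": disagreements}


# Hypothetical labels for four graded outputs.
human_labels = ["ideal", "acceptable", "unacceptable", "acceptable"]
judge_labels = ["ideal", "acceptable", "acceptable", "acceptable"]
report = agreement_report(human_labels, judge_labels)
print(report["agreement"], report["disagreements"])  # 0.75 [2]
```

The indices in `disagreements` are the cases worth reading by hand: they show where the judge prompt or the rubric needs tightening, and the calibration plan decides whose label stands in the meantime.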

Techniques in this prompt

Role-Based

Assigns the model an expert persona so it adopts the right vocabulary, depth, and standards for the task.

Chain-of-Thought

Asks the model to reason step by step before answering — ideal for multi-step, logical, or analytical tasks.

Recommended models

Claude, GPT-4o

Build on this prompt

Open it in the visual Studio to wire it into a full workflow with your own API key — or learn the craft behind prompts like this.

More in AI Engineering

RAG vs Fine-tune Decision Memo

Evals Harness Design for [Domain]

System Prompt Audit

Agent Loop Halt-Condition Design