AI Engineering · 5.0 · 50 ratings

Synthetic Test Set Generator


Role-Based · Chain-of-Thought

Prompt

**Role:** AI engineer specialized in evals. You generate synthetic test sets for products with no ground-truth labels yet.

**Context:** A team is launching an LLM feature for [USE CASE] and has no labeled data. They want an eval set TODAY.

**Task:** Generate a synthetic test set:
1. List 8-12 input archetypes (different user intents, edge cases, hostile inputs).
2. For each archetype: generate 5 example inputs (synthetic but realistic).
3. For each input: write the IDEAL output.
4. For each input: write 2-3 ACCEPTABLE outputs (graded as "good enough").
5. For each input: write 2-3 UNACCEPTABLE outputs (the failure modes you'll grade against).
6. Provide a grading rubric: how a human grader would score outputs.
7. Provide an LLM-as-judge prompt that approximates the human grader.
8. Provide a calibration plan: when the LLM judge diverges from human graders, state whose verdict wins.
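
As a rough illustration of step 8, the sketch below compares human and LLM-judge labels on the same cases and reports where they diverge. The function name, the `pass`/`fail` labels, and the rule that the human label wins on disagreement are all assumptions made for the example. The prompt itself leaves the tie-break policy to the generated plan.

```python
def calibration_report(human: dict[str, str], judge: dict[str, str]) -> dict:
    """Compare human and LLM-judge labels keyed by case id.

    Assumption for this sketch: human labels are the reference, so the
    human verdict becomes the final label wherever the two disagree.
    """
    shared = sorted(human.keys() & judge.keys())
    if not shared:
        raise ValueError("no overlapping case ids to compare")

    # Cases where the judge's label differs from the human's label.
    disagreements = [cid for cid in shared if human[cid] != judge[cid]]
    agreement = 1 - len(disagreements) / len(shared)

    return {
        "n": len(shared),
        "agreement": agreement,
        "disagreements": disagreements,
        # Human label wins (assumed policy, see docstring).
        "final": {cid: human[cid] for cid in shared},
    }
```

The list of disagreeing case ids is the useful part in practice: reviewing those cases is one way to decide whether the rubric or the judge prompt needs revising.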

**Constraints:**
- Inputs must look real, not template-y.
- Include 2 hostile inputs (red-team).
- Include 2 ambiguous inputs (where reasonable answers diverge).

**Output format:** YAML or JSON test set + the grading rubric + the calibration plan.
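
A generated test set can be checked mechanically against the counts the prompt asks for. The sketch below is one possible validator for a JSON test set. Its schema (`archetypes`, `name`, `examples`, `input`, `ideal`, `acceptable`, `unacceptable`, `tags`) is hypothetical, since the prompt does not fix field names, and it reads "Include 2 hostile inputs" and "Include 2 ambiguous inputs" as "at least 2".

```python
import json

# Hypothetical schema: these field names are assumptions for illustration,
# not something the prompt mandates.
REQUIRED_EXAMPLE_KEYS = {"input", "ideal", "acceptable", "unacceptable"}


def validate_test_set(test_set: dict) -> list[str]:
    """Return human-readable problems; an empty list means the set passes."""
    problems = []
    archetypes = test_set.get("archetypes", [])
    if not 8 <= len(archetypes) <= 12:
        problems.append(f"expected 8-12 archetypes, found {len(archetypes)}")

    tag_counts = {"hostile": 0, "ambiguous": 0}
    for arch in archetypes:
        name = arch.get("name", "<unnamed>")
        examples = arch.get("examples", [])
        if len(examples) != 5:
            problems.append(f"{name}: expected 5 examples, found {len(examples)}")
        for i, ex in enumerate(examples):
            missing = REQUIRED_EXAMPLE_KEYS - ex.keys()
            if missing:
                problems.append(f"{name}[{i}]: missing {sorted(missing)}")
                continue
            # Each input needs 2-3 acceptable and 2-3 unacceptable outputs.
            for key in ("acceptable", "unacceptable"):
                if not 2 <= len(ex[key]) <= 3:
                    problems.append(f"{name}[{i}]: expected 2-3 {key} outputs")
            for tag in ex.get("tags", []):
                if tag in tag_counts:
                    tag_counts[tag] += 1

    # Assumption: the constraints are read as "at least 2" of each kind.
    for tag, count in tag_counts.items():
        if count < 2:
            problems.append(f"expected at least 2 {tag} inputs, found {count}")
    return problems


def validate_json_text(text: str) -> list[str]:
    """Parse a JSON document and validate it."""
    return validate_test_set(json.loads(text))
```

Running `validate_json_text` on the model's raw output before any grading gives a quick signal that the set is structurally complete. It says nothing about whether the inputs look realistic, which still needs a human read.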

Recommended models

claude · gpt-4o
