AI Engineering · 5.0 · 50 ratings
Synthetic Test Set Generator
Role-Based · Chain-of-Thought
Prompt
**Role:** AI engineer specialized in evals. You generate synthetic test sets for products with no ground-truth labels yet.

**Context:** A team is launching an LLM feature for [USE CASE] with no labeled data. They want an evals set TODAY.

**Task:** Generate a synthetic test set:

1. List 8-12 input archetypes (different user intents, edge cases, hostile inputs).
2. For each archetype: generate 5 example inputs (synthetic but realistic).
3. For each input: write the IDEAL output.
4. For each input: write 2-3 ACCEPTABLE outputs (graded as "good enough").
5. For each input: write 2-3 UNACCEPTABLE outputs (the failure modes you'll grade against).
6. Provide a grading rubric: how a human grader would score outputs.
7. Provide an LLM-as-judge prompt that approximates the human grader.
8. Provide the calibration plan (when the LLM judge diverges from humans, what wins).

**Constraints:**

- Inputs must look real, not template-y.
- Include 2 hostile inputs (red-team).
- Include 2 ambiguous inputs (where reasonable answers diverge).

**Output format:** YAML or JSON test set + the grading rubric + the calibration plan.
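Once the model returns a test set, the counts in the task and constraints can be checked mechanically before anyone grades against it. The following is a minimal sketch, not part of the prompt itself. It assumes the YAML or JSON has already been parsed into a dict, and every field name (`archetypes`, `cases`, `tags`, `ideal`, `acceptable`, `unacceptable`) is a hypothetical schema chosen for illustration. It also reads "include 2 hostile inputs" as "at least 2". The second helper computes a simple agreement rate between human and LLM-judge labels, which is one possible input to the calibration plan.

```python
from collections import Counter


def validate_test_set(test_set: dict) -> list[str]:
    """Return constraint violations; an empty list means the set passes.

    Field names are hypothetical; adapt them to whatever schema the model emits.
    """
    problems = []
    archetypes = test_set.get("archetypes", [])
    if not 8 <= len(archetypes) <= 12:
        problems.append(f"expected 8-12 archetypes, got {len(archetypes)}")

    tags = Counter()
    for arch in archetypes:
        name = arch.get("name", "<unnamed>")
        cases = arch.get("cases", [])
        if len(cases) != 5:
            problems.append(f"{name}: expected 5 inputs, got {len(cases)}")
        for i, case in enumerate(cases):
            tags.update(case.get("tags", []))
            if not case.get("ideal"):
                problems.append(f"{name}[{i}]: missing ideal output")
            for key in ("acceptable", "unacceptable"):
                count = len(case.get(key, []))
                if not 2 <= count <= 3:
                    problems.append(f"{name}[{i}]: expected 2-3 {key} outputs, got {count}")

    # Assumption: "include 2" is treated as a minimum, not an exact count.
    for tag in ("hostile", "ambiguous"):
        if tags[tag] < 2:
            problems.append(f"expected at least 2 {tag} inputs, got {tags[tag]}")
    return problems


def agreement_rate(human_labels: list[str], judge_labels: list[str]) -> float:
    """Fraction of cases where the LLM judge gave the same label as the human."""
    if not human_labels or len(human_labels) != len(judge_labels):
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(h == j for h, j in zip(human_labels, judge_labels))
    return matches / len(human_labels)
```

Running `validate_test_set` on the parsed output gives a quick reject-and-regenerate signal. The agreement threshold below which the judge prompt gets revised is left to the team's calibration plan.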
Recommended models
claude · gpt-4o