RAG Test-Set Generator From Documents

Generates a graded evaluation set of question-answer-source triples for testing a retrieval pipeline.

Structured-Output, Few-Shot, Step-by-Step

Prompt

ROLE: You are an evaluation engineer building a ground-truth test set for a RAG system.

CONTEXT:
Source documents with IDs: [DOCUMENTS]
Difficulty mix wanted (easy/medium/hard): [DIFFICULTY_MIX]
Number of items to generate: [N]

TASK:
1. From the documents, generate diverse QA pairs whose answers are fully supported by the text.
2. Include a mix of types: single-passage factual, multi-hop (requires combining two passages), and negative cases (questions the corpus cannot answer, expecting a refusal).
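3. For each answerable item, record the gold source ID(s) and the exact supporting span, quoted verbatim from the source. For negative cases, leave gold_source_ids as an empty array and supporting_span as an empty string.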
4. Label each item with difficulty and the reasoning type required.

OUTPUT FORMAT (JSON array):
[{"question": "...", "expected_answer": "...", "gold_source_ids": [...], "supporting_span": "...", "type": "single|multi-hop|negative", "difficulty": "easy|medium|hard"}]

CONSTRAINTS:
- Every answerable item's answer must be verifiable from the cited span; no invented facts.
- Negative-case questions must be realistic and clearly unanswerable from the corpus.
- Avoid trivially keyword-matchable questions for the hard tier; require paraphrase or synthesis.
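
The constraints above can be checked mechanically before the generated set is used. The following is a minimal Python sketch, not part of the prompt itself. It assumes the model's reply is a bare JSON array in the format shown, and that the source documents are available as a dict mapping source ID to full text. The function name `validate_test_set` and the `documents` parameter are hypothetical names chosen for illustration.

```python
import json

ALLOWED_TYPES = {"single", "multi-hop", "negative"}
ALLOWED_DIFFICULTIES = {"easy", "medium", "hard"}
REQUIRED_KEYS = {"question", "expected_answer", "gold_source_ids",
                 "supporting_span", "type", "difficulty"}


def validate_test_set(raw_json, documents):
    """Return a list of (item_index, problem) tuples; empty means all checks passed.

    documents: assumed dict mapping source ID -> full document text.
    """
    problems = []
    items = json.loads(raw_json)
    for i, item in enumerate(items):
        missing = REQUIRED_KEYS - set(item)
        if missing:
            problems.append((i, f"missing keys: {sorted(missing)}"))
            continue
        if item["type"] not in ALLOWED_TYPES:
            problems.append((i, f"unknown type: {item['type']}"))
        if item["difficulty"] not in ALLOWED_DIFFICULTIES:
            problems.append((i, f"unknown difficulty: {item['difficulty']}"))
        if item["type"] == "negative":
            continue  # negative cases have no gold source to verify
        ids = item["gold_source_ids"]
        if not ids:
            problems.append((i, "answerable item has no gold_source_ids"))
            continue
        unknown = [d for d in ids if d not in documents]
        if unknown:
            problems.append((i, f"unknown source ids: {unknown}"))
        # The span must appear verbatim in at least one cited document.
        cited_text = [documents[d] for d in ids if d in documents]
        span = item["supporting_span"]
        if not span or not any(span in text for text in cited_text):
            problems.append((i, "supporting_span not found verbatim in cited sources"))
    return problems
```

This sketch treats the supporting span as one contiguous quote. A multi-hop item whose span stitches together text from two passages would be flagged and would need a looser check, such as splitting the span and matching each part separately. It also does not judge whether the expected answer actually follows from the span, which still needs human or model review.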

Recommended models

claude, gpt-4o, gemini
