AI Engineering · 5.0 · 50 ratings

Alignment Red-Team Prompt Set

Role-Based · Chain-of-Thought

Prompt

**Role:** AI alignment researcher applied to product red-teaming.

**Context:** Team wants to stress-test an LLM product against alignment failures: jailbreaks, persona escapes, instruction-following violations.

**Task:** Produce a 25-item red-team prompt set:
- 1-5: Direct jailbreaks (roleplay overrides, hypothetical framing, language switching).
- 6-10: Indirect injections via user data.
- 11-15: Persona destabilization (philosophical, emotional, identity).
- 16-20: Format / tool / structured-output corruption.
- 21-25: Policy-edge cases (ambiguity around forbidden content).

For each item, provide the prompt, the expected behavior (refusal / clarification / output), and the severity if it fails.
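
The per-item fields and the five number ranges in the task can be captured in a small record type, which makes it easier to check that a generated set is complete. The following is a minimal sketch, not part of the original prompt: the names `RedTeamItem`, `Expected`, `category_for`, and the short category labels are hypothetical, and the severity labels and user-impact field are assumed from the constraints section of the prompt.

```python
from dataclasses import dataclass
from enum import Enum


class Expected(Enum):
    """Expected model behavior, as listed in the prompt."""
    REFUSAL = "refusal"
    CLARIFICATION = "clarification"
    OUTPUT = "output"


# Item-number ranges come from the task list; the labels are hypothetical shorthand.
CATEGORIES = [
    (range(1, 6), "direct_jailbreak"),
    (range(6, 11), "indirect_injection"),
    (range(11, 16), "persona_destabilization"),
    (range(16, 21), "format_tool_corruption"),
    (range(21, 26), "policy_edge_case"),
]

VALID_SEVERITIES = {"S1", "S2", "S3", "S4"}  # S1 (catastrophic) to S4 (cosmetic)


def category_for(item_number: int) -> str:
    """Map an item number (1-25) to its category label."""
    for numbers, label in CATEGORIES:
        if item_number in numbers:
            return label
    raise ValueError(f"item number {item_number} is outside 1-25")


@dataclass(frozen=True)
class RedTeamItem:
    number: int
    prompt: str
    expected: Expected
    severity_if_failed: str  # one of S1..S4
    user_impact: str         # user-facing impact of a failure

    def __post_init__(self) -> None:
        # Reject items that fall outside the rubric or the 25-item range.
        if self.severity_if_failed not in VALID_SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity_if_failed}")
        category_for(self.number)

    @property
    def category(self) -> str:
        return category_for(self.number)
```

Deriving the category from the item number, instead of storing it separately, keeps the two from drifting apart when items are renumbered.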

**Constraints:**
- Test prompts must be reproducible.
- Severity rubric: S1 (catastrophic) to S4 (cosmetic).
- Include the user-facing impact of each failure.

**Output format:** Numbered prompt set + grading rubric + reporting template.
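
One possible shape for the reporting template is a plain-text summary that counts failures per severity level and lists them worst first. This is a sketch under assumptions, not the format the prompt mandates: the function name `render_report` and the result keys `number`, `passed`, `severity`, and `impact` are hypothetical.

```python
from collections import Counter

SEVERITY_ORDER = ["S1", "S2", "S3", "S4"]  # S1 (catastrophic) to S4 (cosmetic)


def render_report(results):
    """Render a plain-text report.

    `results` is a list of dicts with the (hypothetical) keys
    number, passed, severity, and impact.
    """
    failures = [r for r in results if not r["passed"]]
    counts = Counter(r["severity"] for r in failures)

    lines = [f"Items run: {len(results)}", f"Failures: {len(failures)}"]
    for severity in SEVERITY_ORDER:
        lines.append(f"{severity}: {counts.get(severity, 0)}")

    # List failures with the most severe first, then by item number.
    ordered = sorted(
        failures,
        key=lambda r: (SEVERITY_ORDER.index(r["severity"]), r["number"]),
    )
    for r in ordered:
        lines.append(f"- #{r['number']} [{r['severity']}] {r['impact']}")
    return "\n".join(lines)
```

Sorting by severity before item number puts any S1 failure at the top of the list, which is usually what a reviewer needs to see first.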

Recommended models

claude, gpt-4o
