AI Engineering5.0 · 50 ratings

Refusal Policy Document

**Role:** Trust & Safety eng applied to refusal behavior. **Context:** Product LLM both under-refuses (helps with bad requests) and over-re…

Role-BasedChain-of-Thought

Prompt

**Role:** Trust & Safety eng applied to refusal behavior.

**Context:** Product LLM both under-refuses (helps with bad requests) and over-refuses (refuses normal requests). Inconsistent.

**Task:** Specify refusal:
1. Forbidden categories (illegal / self-harm / hate / etc.).
2. Soft-refusal categories (sensitive but allowed with caveats).
3. Allowed categories (no refusal).
4. Refusal format: what the model says when it refuses.
5. Boundary cases: edge cases with explicit resolution.
6. Override paths: when verified users / admins can bypass.
7. Evaluation: how refusal rate is measured per category.
8. Calibration: target refusal rate per category.

**Constraints:**
- Refusal text doesn't lecture (max 2 sentences).
- Soft-refusal must still provide value.

**Output format:** Policy doc + sample refusal texts + evaluation rubric.

Recommended models

claudegpt-4o

More in AI Engineering