AI Engineering · 5.0 · 50 ratings
LLM Failure-Mode Taxonomy
Role-Based · Chain-of-Thought
Prompt
**Role:** AI safety researcher building the team's shared vocabulary for LLM bugs.

**Context:** Team is shipping LLM features fast and producing inconsistent bug reports. Engineering and QA don't have a shared vocabulary for "what went wrong."

**Task:** Build the taxonomy:

1. **Hallucination** types: factual confabulation, citation invention, format invention.
2. **Refusal** types: over-refusal, under-refusal, miscalibrated refusal.
3. **Drift** types: persona drift, format drift, scope drift.
4. **Reasoning** failures: shallow CoT, math errors, contradiction tolerance.
5. **Tool-use** failures: wrong tool, wrong args, ignored output.
6. **Format** failures: invalid JSON, broken markdown, encoding mismatch.
7. **Latency / cost** failures: token waste, slow tool calls, over-reasoning.
8. **Safety** failures: PII leakage, jailbreak success, copyright leak.

For each: definition, example, observable signal in logs, who's responsible for fixing.

**Constraints:**

- Every category has a CONCRETE EXAMPLE from real production.
- Each failure has a single "owner" team.
- Avoid academic terms when ops terms exist.

**Output format:** Taxonomy doc + bug-template (Jira / Linear / GitHub) using these labels.
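One way to picture the requested output is as a small data structure that a bug template can validate against. The sketch below is only an illustration under stated assumptions: the category and subtype names come from the task list above, but the `llm/<category>/<subtype>` label format, the `BugReport` fields, and the `render_template` helper are hypothetical and not part of the prompt or of any tracker's API.

```python
from dataclasses import dataclass

# Categories and subtypes are taken from the prompt's task list.
# The slug spelling and the label scheme are assumptions for illustration.
TAXONOMY = {
    "hallucination": ["factual-confabulation", "citation-invention", "format-invention"],
    "refusal": ["over-refusal", "under-refusal", "miscalibrated-refusal"],
    "drift": ["persona-drift", "format-drift", "scope-drift"],
    "reasoning": ["shallow-cot", "math-error", "contradiction-tolerance"],
    "tool-use": ["wrong-tool", "wrong-args", "ignored-output"],
    "format": ["invalid-json", "broken-markdown", "encoding-mismatch"],
    "latency-cost": ["token-waste", "slow-tool-call", "over-reasoning"],
    "safety": ["pii-leakage", "jailbreak-success", "copyright-leak"],
}


@dataclass(frozen=True)
class BugReport:
    """Hypothetical bug record carrying the four fields the prompt asks for."""
    category: str
    subtype: str
    owner: str        # exactly one owning team, per the constraints
    log_signal: str   # observable signal in logs
    example: str      # concrete example of the failure

    def __post_init__(self):
        # Reject labels that are not part of the shared vocabulary.
        if self.subtype not in TAXONOMY.get(self.category, []):
            raise ValueError(f"unknown label: {self.category}/{self.subtype}")
        if not self.owner.strip():
            raise ValueError("each failure needs a single owner team")

    @property
    def label(self) -> str:
        return f"llm/{self.category}/{self.subtype}"


def render_template(report: BugReport) -> str:
    """Render a plain-text issue body that can be pasted into a tracker."""
    return "\n".join([
        f"Label: {report.label}",
        f"Owner: {report.owner}",
        f"Log signal: {report.log_signal}",
        f"Example: {report.example}",
    ])
```

Validating the label at construction time keeps free-text categories out of bug reports, which is the inconsistency the prompt's context section describes. The owner and example values would come from the team's own incidents; none are supplied here.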
Recommended models
claude, gpt-4o