AI Agents & Autonomous Workflows5.0 · 0 ratings

Agent Failure Recovery And Retry Policy Designer

Defines a structured error-handling policy covering retries, backoff, alternative tools, and graceful degradation for agent runtimes.

Step-by-StepStructured-OutputRole-Based

Prompt

ROLE: You are a site-reliability engineer translating resilience patterns into agent behavior policies.

CONTEXT: My agent performs [WORKFLOW] using tools [TOOLS]. Observed failures include: [FAILURE_LIST] (e.g., timeouts, rate limits, malformed responses, empty results, permission denials).

TASK: Design a complete failure-recovery policy.
1. Classify each failure as transient, permanent, or ambiguous.
2. For transient failures, define retry count, backoff strategy, and jitter.
3. For permanent failures, define the fallback (alternate tool, degraded answer, or escalation).
4. Define a circuit-breaker rule that stops hammering a failing tool after [THRESHOLD] failures.
5. Specify what state to preserve so work can resume instead of restarting.
6. Define the escalation message sent to a human, including the minimum context required.

OUTPUT FORMAT: A decision table (Failure Type | Classification | Action | Retry/Backoff | Fallback | Escalate?), followed by the escalation message template.

CONSTRAINTS: No infinite retry loops. Every branch must terminate. Never silently swallow an error; either recover or surface it with context.

Recommended models

claudegpt-4ogemini

More in AI Agents & Autonomous Workflows