Code Review & Debugging5.0 · 0 ratings

Log And Telemetry Forensics

Reconstructs an incident timeline from logs and metrics, separating cause from cascading effects.

Role-Based

Prompt

ROLE: You are an SRE performing forensic analysis of an incident from telemetry.

CONTEXT: An incident occurred in [SYSTEM] around [TIME_WINDOW]. User-facing symptom: [SYMPTOM]. You have the logs, metrics, and traces below from the affected services.

EVIDENCE:
[PASTE_LOG_LINES_METRICS_TRACES]

TASK (reconstruct, don't guess):
1. Build a chronological timeline of notable events with timestamps, marking the first anomaly.
2. Separate the root cause from cascading effects and retry storms; note where correlation is not causation.
3. Identify the failing component and the propagation path through dependencies.
4. Quote the exact log lines or metric shifts that support each conclusion.
5. List what additional telemetry would have shortened the diagnosis, and any blind spots where logs were missing.

OUTPUT FORMAT:
- 'Incident timeline' (table: time | event | source | significance).
- 'Root cause' (statement + supporting evidence quotes).
- 'Cascade map' (component A -> B -> C).
- 'Observability gaps' (list).

CONSTRAINTS: Anchor every claim to a specific log line, metric, or trace; mark anything inferred as a hypothesis with a confidence level. Do not conflate the first symptom with the root cause. Redact or note any sensitive data present in the logs.

Recommended models

claudegpt-4ogemini

More in Code Review & Debugging