
Token Budget Audit

Role-Based · Chain-of-Thought

Prompt

**Role:** AI engineer focused on cost optimization. You've cut LLM bills 50-70% at multiple companies by identifying token waste.

**Context:** A product's LLM bill is [$X/month]. Traffic: [Y queries/day]. Current models: [LIST]. The team thinks it's "just expensive" but hasn't audited.

**Task:** Produce the audit:
1. Per-query token breakdown: system prompt, user input, retrieval-context, output. Average + p95.
2. Identify the largest line item.
3. Compression opportunities per line item (prompt compression, summarization, caching, smaller models for sub-tasks).
4. Caching analysis: % of queries cacheable, current hit rate, target hit rate.
5. Model routing opportunity: % of traffic that could go to a smaller/cheaper model with no quality loss.
6. Retrieval optimization: chunk-size + top-k tuning to reduce context tokens.
7. Output length: where outputs are longer than needed.
8. Projected savings per intervention, ranked by leverage.

**Constraints:**
- Every recommendation states an expected $ savings and an implementation cost in eng-weeks.
- "Use a smaller model" is acceptable only with a quality test that proves no regression.
- Explicitly mark any change that could degrade quality.

**Output format:** Cost-breakdown table + ranked recommendations + projected savings + risks.
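
**Example (illustrative sketch):** The arithmetic behind the cost-breakdown table and the ranked recommendations could look roughly like the Python below. It is a minimal sketch, not a prescribed implementation: the function names, the `Intervention` fields, the nearest-rank percentile, and the "savings per eng-week" definition of leverage are all assumptions made for illustration, and the prices and token counts in the usage section are placeholders rather than real vendor pricing or measured traffic.

```python
import math
from dataclasses import dataclass

# The four line items named in the audit task.
COMPONENTS = ("system_prompt", "user_input", "retrieval_context", "output")


def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def token_breakdown(queries):
    """Average and p95 token count per component.

    `queries` is a list of dicts mapping component name -> token count,
    e.g. parsed from request logs (the log format is an assumption).
    """
    stats = {}
    for name in COMPONENTS:
        counts = [q[name] for q in queries]
        stats[name] = {
            "avg": sum(counts) / len(counts),
            "p95": percentile(counts, 95),
        }
    return stats


def monthly_cost(stats, queries_per_day, input_price, output_price, days=30):
    """Estimated monthly $ per component from average token counts.

    Prices are $ per 1M tokens; output tokens are priced separately
    from input tokens. Both prices are caller-supplied placeholders.
    """
    costs = {}
    for name, s in stats.items():
        price = output_price if name == "output" else input_price
        costs[name] = s["avg"] * queries_per_day * days * price / 1_000_000
    return costs


@dataclass
class Intervention:
    name: str
    monthly_savings: float  # expected $ saved per month
    eng_weeks: float        # implementation cost
    quality_risk: bool      # True if the change could degrade quality

    @property
    def leverage(self):
        # Assumed definition: monthly $ saved per eng-week invested.
        return self.monthly_savings / self.eng_weeks


def rank_by_leverage(interventions):
    """Highest savings-per-eng-week first."""
    return sorted(interventions, key=lambda i: i.leverage, reverse=True)


if __name__ == "__main__":
    # Placeholder sample; a real audit would use thousands of logged queries.
    sample = [
        {"system_prompt": 800, "user_input": 50, "retrieval_context": 3000, "output": 400},
        {"system_prompt": 800, "user_input": 150, "retrieval_context": 5000, "output": 600},
    ]
    stats = token_breakdown(sample)
    costs = monthly_cost(stats, queries_per_day=1000, input_price=3.0, output_price=15.0)
    print("Largest line item:", max(costs, key=costs.get))
    for name in COMPONENTS:
        print(name, stats[name], round(costs[name], 2))
```

Ranking by savings per eng-week is only one possible reading of "leverage"; a team might instead rank by absolute savings or discount any intervention flagged with `quality_risk` until its quality test passes.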

Recommended models

claude, gpt-4o
