AI Pilot — Cost Optimization (multi-provider)
Audience: FinOps, platform engineers, AI governance owners
Time: ~10 min read
A single AI Pilot bouncer can hold many cost rules, each scoped to a specific target. This page explains the rule shape and the merge contract, and includes worked enterprise examples.
Rule shape
Each rule in Settings -> AI Pilot -> Cost Optimization has these fields:
| Field | Required | Description |
|---|---|---|
| target_type | yes | llm / mcp / rag / agent / api |
| provider | yes for llm/rag/agent | e.g. openai, azure-openai, bedrock |
| model_or_server | yes | LLM model id, MCP server id, or RAG/agent id |
| application_id | no | Limit rule to a specific app (e.g. marketing-portal) |
| user_role | no | Limit rule to a role (e.g. marketing, legal, intern) |
| tokens_per_min_per_user | no | Per-user token budget per minute |
| tokens_per_min_total | no | Aggregate token budget per minute across all users |
| usd_per_day_cap | no | Hard cost cap per day (estimated from tokens × pricing) |
| burst_tokens | no | Bucket size for spikes |
| window_seconds | no | Counter window (default 60) |
| fallback_route_id | no | Where traffic goes when this rule denies |
| action_on_breach | no | block / fallback / degrade (default fallback) |
How rules merge
Rules apply in priority order. The most specific rule wins; less specific rules continue to apply for any unbounded field.
(application_id + user_role + provider + model) ← most specific
(application_id + provider + model)
(provider + model)
(provider)
(target_type) ← least specific
Example: a rule "OpenAI / gpt-4o limited to 5k tok/min/user" plus a rule "Marketing app limited to 50k tok/min total" both apply when Marketing calls GPT-4o.
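The merge contract above, as an added Python example that is not AI Pilot's own code. It assumes specificity is the number of scope fields a rule sets, and that each limit field comes from the most specific matching rule that sets it; the rules are constructed from the examples on this page.

```python
# Added illustration, not AI Pilot source code.
# Assumptions: specificity = number of scope fields a rule sets; limits merge
# field by field, with the most specific rule that sets a field winning it.
SCOPE = ("application_id", "user_role", "provider", "model_or_server")
LIMITS = ("tokens_per_min_per_user", "tokens_per_min_total", "usd_per_day_cap",
          "burst_tokens", "window_seconds", "fallback_route_id", "action_on_breach")

# Constructed rules based on this page's GPT-4o and Marketing examples.
RULES = [
    {"target_type": "llm", "provider": "openai", "model_or_server": "gpt-4o",
     "tokens_per_min_per_user": 5000, "usd_per_day_cap": 50.0},
    {"target_type": "llm", "provider": "openai", "model_or_server": "gpt-4o",
     "application_id": "marketing-portal", "tokens_per_min_total": 50000},
    {"target_type": "llm", "provider": "openai", "model_or_server": "gpt-4o",
     "application_id": "marketing-portal", "user_role": "marketing",
     "usd_per_day_cap": 5.0, "action_on_breach": "block"},
]

def matches(rule, request):
    """A rule matches when every scope field it sets equals the request's value."""
    if rule["target_type"] != request["target_type"]:
        return False
    return all(rule[f] == request.get(f) for f in SCOPE if f in rule)

def specificity(rule):
    """Count the scope fields a rule pins down (0 = target_type only)."""
    return sum(1 for f in SCOPE if f in rule)

def effective_limits(rules, request):
    """Merge all matching rules into the limits that govern one request."""
    merged = {"window_seconds": 60, "action_on_breach": "fallback"}  # page defaults
    # Apply least specific first so more specific rules overwrite shared fields,
    # while fields they leave unset keep the broader rule's value.
    for rule in sorted((r for r in rules if matches(r, request)), key=specificity):
        for field in LIMITS:
            if field in rule:
                merged[field] = rule[field]
    return merged

if __name__ == "__main__":
    base = {"target_type": "llm", "provider": "openai", "model_or_server": "gpt-4o"}
    for app, role in [("marketing-portal", "marketing"),
                      ("marketing-portal", "intern"), ("hr-portal", "intern")]:
        print(app, role, effective_limits(RULES, {**base, "application_id": app,
                                                  "user_role": role}))
```

Printing the effective limits per application and role makes it easy to spot a narrowly scoped rule that silently relaxes or tightens a broader cap.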
Worked examples
"GPT-4o capped at $50/day total but $5/user/day for Marketing"
```yaml
- target_type: llm
  provider: openai
  model_or_server: gpt-4o
  usd_per_day_cap: 50.0
- target_type: llm
  provider: openai
  model_or_server: gpt-4o
  application_id: marketing-portal
  user_role: marketing
  usd_per_day_cap: 5.0
  action_on_breach: block
```
"Claude on Bedrock allowed only for Legal"
```yaml
- target_type: llm
  provider: bedrock
  model_or_server: anthropic.claude-3-haiku-20240307-v1:0
  user_role: legal
  tokens_per_min_per_user: 8000
- target_type: llm
  provider: bedrock
  model_or_server: anthropic.claude-3-haiku-20240307-v1:0
  action_on_breach: block  # everyone else is blocked
```
"Internal RAG no per-user cap but global 10k tok/min"
```yaml
- target_type: rag
  provider: internal
  model_or_server: kb-finance
  tokens_per_min_total: 10000
  burst_tokens: 2000
```
"MCP weather_lookup 100 calls/min"
```yaml
- target_type: mcp
  model_or_server: external-tools
  mcp_tool: weather_lookup
  tokens_per_min_total: 100  # interpreted as RPS for MCP
  window_seconds: 60
```
"GPT-4o falls back to GPT-4o-mini above 200k tok/min"
```yaml
- target_type: llm
  provider: openai
  model_or_server: gpt-4o
  tokens_per_min_total: 200000
  fallback_route_id: route-gpt-4o-mini
  action_on_breach: fallback
```
Where the counters live
Token and request counters live in Redis when the cache topology is bundled or external. With the cache topology disabled, counters are per-process only, so each replica counts and enforces its limits independently. That is useful for a sandbox but unsuitable for global enforcement across replicas.
The compiler emits one Envoy rate-limit descriptor per rule. The RLS sidecar enforces them atomically.
Cost estimation
USD costs are computed from tokens_in × in_price + tokens_out × out_price using a per-provider/model price table maintained in the Control Plane AI Pilot → Cost Optimization pricing service. Override prices in Settings -> AI Pilot -> Cost Optimization -> Pricing overrides when you have a contracted rate.
Reporting and alerts
The /pilot -> Analytics tab and the Token Ledger both filter by every dimension in Observability Dimensions. Set up alerts in your SIEM keyed on AI_COST_RULE_BREACH events.