AI Pilot — Cost Optimization (multi-provider)

Audience: FinOps, platform engineers, AI governance owners Time: ~10 min read

A single AI Pilot bouncer can hold many cost rules, each scoped to a specific target. This page explains the rule shape, the merge contract, and includes worked enterprise examples.

Rule shape

Each rule in Settings -> AI Pilot -> Cost Optimization has these fields:

FieldRequiredDescription
target_typeyesllm / mcp / rag / agent / api
provideryes for llm/rag/agente.g. openai, azure-openai, bedrock
model_or_serveryesLLM model id, MCP server id, or RAG/agent id
application_idnoLimit rule to a specific app (e.g. marketing-portal)
user_rolenoLimit rule to a role (e.g. marketing, legal, intern)
tokens_per_min_per_usernoPer-user token budget per minute
tokens_per_min_totalnoAggregate token budget per minute across all users
usd_per_day_capnoHard cost cap per day (estimated from tokens × pricing)
burst_tokensnoBucket size for spikes
window_secondsnoCounter window (default 60)
fallback_route_idnoWhere traffic goes when this rule denies
action_on_breachnoblock / fallback / degrade (default fallback)

How rules merge

Rules apply in priority order. The most specific rule wins; less specific rules continue to apply for any unbounded field.

(application_id + user_role + provider + model)  ← most specific
(application_id + provider + model)
(provider + model)
(provider)
(target_type)                                     ← least specific

Example: a rule "OpenAI / gpt-4o limited to 5k tok/min/user" plus a rule "Marketing app limited to 50k tok/min total" both apply when Marketing calls GPT-4o.

Worked examples

"GPT-4o capped at $50/day total but $5/user/day for Marketing"

- target_type: llm
  provider: openai
  model_or_server: gpt-4o
  usd_per_day_cap: 50.0

- target_type: llm
  provider: openai
  model_or_server: gpt-4o
  application_id: marketing-portal
  user_role: marketing
  usd_per_day_cap: 5.0
  action_on_breach: block
- target_type: llm
  provider: bedrock
  model_or_server: anthropic.claude-3-haiku-20240307-v1:0
  user_role: legal
  tokens_per_min_per_user: 8000

- target_type: llm
  provider: bedrock
  model_or_server: anthropic.claude-3-haiku-20240307-v1:0
  action_on_breach: block       # everyone else is blocked

"Internal RAG no per-user cap but global 10k tok/min"

- target_type: rag
  provider: internal
  model_or_server: kb-finance
  tokens_per_min_total: 10000
  burst_tokens: 2000

"MCP weather_lookup 100 calls/min"

- target_type: mcp
  model_or_server: external-tools
  mcp_tool: weather_lookup
  tokens_per_min_total: 100      # interpreted as RPS for MCP
  window_seconds: 60

"GPT-4o falls back to GPT-4o-mini above 200k tok/min"

- target_type: llm
  provider: openai
  model_or_server: gpt-4o
  tokens_per_min_total: 200000
  fallback_route_id: route-gpt-4o-mini
  action_on_breach: fallback

Where the counters live

Token and request counters live in Redis when the cache topology is bundled or external. With disabled, counters are per-process only — useful for sandbox but unsuitable for global enforcement across replicas.

The compiler emits one Envoy rate-limit descriptor per rule. The RLS sidecar enforces them atomically.

Cost estimation

USD costs are computed from tokens_in × in_price + tokens_out × out_price using a per-provider/model price table maintained in the Control Plane AI Pilot → Cost Optimization pricing service. Override prices in Settings -> AI Pilot -> Cost Optimization -> Pricing overrides when you have a contracted rate.

Reporting and alerts

The /pilot -> Analytics tab and the Token Ledger both filter by every dimension in Observability Dimensions. Set up alerts in your SIEM keyed on AI_COST_RULE_BREACH events.