AI Pilot — Cost Optimization (multi-provider)

Audience: FinOps, platform engineers, AI governance owners Time: ~10 min read

A single AI Pilot bouncer can hold many cost rules, each scoped to a specific target. This page explains the rule shape, the merge contract, and includes worked enterprise examples.

Rule shape

Each rule in Settings -> AI Pilot -> Cost Optimization has these fields:

Field	Required	Description
`target_type`	yes	`llm` / `mcp` / `rag` / `agent` / `api`
`provider`	yes for `llm`/`rag`/`agent`	e.g. `openai`, `azure-openai`, `bedrock`
`model_or_server`	yes	LLM model id, MCP server id, or RAG/agent id
`application_id`	no	Limit rule to a specific app (e.g. `marketing-portal`)
`user_role`	no	Limit rule to a role (e.g. `marketing`, `legal`, `intern`)
`tokens_per_min_per_user`	no	Per-user token budget per minute
`tokens_per_min_total`	no	Aggregate token budget per minute across all users
`usd_per_day_cap`	no	Hard cost cap per day (estimated from tokens × pricing)
`burst_tokens`	no	Bucket size for spikes
`window_seconds`	no	Counter window (default 60)
`fallback_route_id`	no	Where traffic goes when this rule denies
`action_on_breach`	no	`block` / `fallback` / `degrade` (default `fallback`)

How rules merge

Rules apply in priority order. The most specific rule wins; less specific rules continue to apply for any unbounded field.

(application_id + user_role + provider + model)  ← most specific
(application_id + provider + model)
(provider + model)
(provider)
(target_type)                                     ← least specific

Example: a rule "OpenAI / gpt-4o limited to 5k tok/min/user" plus a rule "Marketing app limited to 50k tok/min total" both apply when Marketing calls GPT-4o.

Worked examples

"GPT-4o capped at $50/day total but $5/user/day for Marketing"

- target_type: llm
  provider: openai
  model_or_server: gpt-4o
  usd_per_day_cap: 50.0

- target_type: llm
  provider: openai
  model_or_server: gpt-4o
  application_id: marketing-portal
  user_role: marketing
  usd_per_day_cap: 5.0
  action_on_breach: block

"Claude on Bedrock allowed only for Legal"

- target_type: llm
  provider: bedrock
  model_or_server: anthropic.claude-3-haiku-20240307-v1:0
  user_role: legal
  tokens_per_min_per_user: 8000

- target_type: llm
  provider: bedrock
  model_or_server: anthropic.claude-3-haiku-20240307-v1:0
  action_on_breach: block       # everyone else is blocked

"Internal RAG no per-user cap but global 10k tok/min"

- target_type: rag
  provider: internal
  model_or_server: kb-finance
  tokens_per_min_total: 10000
  burst_tokens: 2000

"MCP `weather_lookup` 100 calls/min"

- target_type: mcp
  model_or_server: external-tools
  mcp_tool: weather_lookup
  tokens_per_min_total: 100      # interpreted as RPS for MCP
  window_seconds: 60

"GPT-4o falls back to GPT-4o-mini above 200k tok/min"

- target_type: llm
  provider: openai
  model_or_server: gpt-4o
  tokens_per_min_total: 200000
  fallback_route_id: route-gpt-4o-mini
  action_on_breach: fallback

Where the counters live

Token and request counters live in Redis when the cache topology is bundled or external. With disabled, counters are per-process only — useful for sandbox but unsuitable for global enforcement across replicas.

The compiler emits one Envoy rate-limit descriptor per rule. The RLS sidecar enforces them atomically.

Cost estimation

USD costs are computed from tokens_in × in_price + tokens_out × out_price using a per-provider/model price table maintained in the Control Plane AI Pilot → Cost Optimization pricing service. Override prices in Settings -> AI Pilot -> Cost Optimization -> Pricing overrides when you have a contracted rate.

Reporting and alerts

The /pilot -> Analytics tab and the Token Ledger both filter by every dimension in Observability Dimensions. Set up alerts in your SIEM keyed on AI_COST_RULE_BREACH events.