AI Pilot

Use AI Pilot (Settings -> AI Pilot) to configure per-bouncer behavior for:

  • Observability (token, cost, TTFT/ITL visibility)
  • Cost Optimization (token-aware limits, fallback routing, prompt cache)
  • Content Safety (harm thresholds, prompt controls, blocklists, actions)

This page focuses on local bouncer overrides on top of global AI Pilot defaults.


What is being intercepted?

AI Pilot controls are enforced on traffic for resources that are bound to a bouncer in Settings -> Resources.

  • If a bouncer runs as a sidecar, resources attached to that sidecar are in scope.
  • If a bouncer runs as a gateway/edge, resources routed through that bouncer are in scope.
  • LLM, MCP, AI Agent, API/App, and Data endpoints are all treated as resources and can be filtered in the AI Pilot UI.

Use Settings -> AI Pilot -> Intercept Scope to see exactly which targets are currently monitored/managed for the selected bouncer.


Global vs local model

  • Global defaults define the enterprise baseline.
  • A local override tailors behavior for one bouncer/application.
  • The effective config is compiled and pushed to the bouncer runtime.
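
The layering above can be pictured as a per-section merge. The following is a minimal sketch only: the section names, field names, the use_global flag, and the field-level merge rule are all assumptions made for illustration, not the actual AI Pilot schema or compile logic.

```python
from copy import deepcopy

# Hypothetical global baseline; real section and field names may differ.
GLOBAL_DEFAULTS = {
    "observability": {"enabled": True},
    "cost_optimization": {"tokens_per_min_per_user": 20000, "prompt_cache_ttl_s": 600},
    "content_safety": {"violation_action": "block"},
}

def effective_config(global_defaults, local_override):
    """Return a copy of the global defaults with local overrides applied per section."""
    result = deepcopy(global_defaults)
    for section, settings in local_override.items():
        if settings.get("use_global", False):
            continue  # this section inherits the global baseline unchanged
        merged = result.setdefault(section, {})
        # Assumed rule: override individual fields, keep the rest from global.
        merged.update({k: v for k, v in settings.items() if k != "use_global"})
    return result

local = {
    "cost_optimization": {"use_global": False, "tokens_per_min_per_user": 5000},
    "content_safety": {"use_global": True, "violation_action": "annotate"},
}
config = effective_config(GLOBAL_DEFAULTS, local)
```

In this sketch the cost section picks up the local token limit while keeping the global cache TTL, and the content safety section ignores its local value because it is flagged to use the global state.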

Configure Observability

In Observability, monitor:

  • Token consumption trends
  • Cost burn trends
  • TTFT (Time To First Token)
  • ITL (Inter-Token Latency)
  • Recent anonymized payload stream with redaction indicators
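
For reference, TTFT and ITL are commonly derived from the request send time and the arrival times of streamed tokens. The sketch below shows that general calculation; the function name and inputs are illustrative, and how AI Pilot aggregates these values internally is not specified here.

```python
def ttft_and_itl(request_sent_at, token_times):
    """Return (TTFT, mean ITL) in seconds from a send time and token arrival times."""
    if not token_times:
        return None, None
    ttft = token_times[0] - request_sent_at
    # ITL is the gap between consecutive tokens; report the mean gap.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else None
    return ttft, mean_itl

ttft, mean_itl = ttft_and_itl(10.0, [10.5, 10.75, 11.0, 11.25])
```

With these sample timestamps the first token arrives 0.5 s after the request and subsequent tokens arrive 0.25 s apart.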

What to validate

  • Charts update for selected bouncer
  • Metrics window aligns to expected traffic period
  • Payload stream shows redaction flags where applicable

Forensics intelligence

The AI Pilot Observability section now includes a forensics summary and trace log sourced from control-plane audit and bouncer telemetry.

Use it to investigate:

  • Deny and failure spikes
  • Prompt shield and safety-triggered actions
  • Correlated request decisions over a selected time window
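
When trace data is exported for offline analysis, a deny spike can be spotted by bucketing decisions per minute. This is a sketch under assumptions: the event shape (a ts field in epoch seconds and a decision field) is hypothetical and may not match the actual trace log format.

```python
from collections import Counter

def deny_counts_per_minute(events, start, end):
    """Count deny decisions per one-minute bucket within [start, end)."""
    counts = Counter()
    for event in events:
        ts = event["ts"]
        if start <= ts < end and event["decision"] == "deny":
            counts[int(ts // 60) * 60] += 1  # key is the bucket start time
    return dict(counts)

# Hypothetical trace events for illustration only.
events = [
    {"ts": 5, "decision": "allow"},
    {"ts": 61, "decision": "deny"},
    {"ts": 75, "decision": "deny"},
    {"ts": 130, "decision": "deny"},
    {"ts": 400, "decision": "deny"},
]
per_minute = deny_counts_per_minute(events, start=0, end=300)
```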

Configure Cost Optimization

In Cost Optimization, configure:

  1. Token-aware rate limits
    • tokens/min/user
    • burst tokens
    • window seconds
  2. Model fallback routing (ordered)
    • Primary -> fallback chain
  3. Prompt caching
    • enable/disable
    • TTL in seconds
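
Ordered fallback routing can be pictured as trying each model in the chain until one succeeds. The sketch below is illustrative only: ProviderError, route_with_fallback, and the model names are hypothetical, and the conditions that actually trigger fallback in the bouncer runtime are not described on this page.

```python
class ProviderError(Exception):
    """Raised by a provider call that should trigger fallback (assumed condition)."""

def route_with_fallback(chain, prompt):
    """Try each (name, call) pair in order; return (name, response) from the first success."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))

def primary(prompt):
    raise ProviderError("rate limited")  # simulate an unavailable primary

def secondary(prompt):
    return f"echo: {prompt}"

chain = [("primary-model", primary), ("fallback-model", secondary)]
used, reply = route_with_fallback(chain, "hello")
```

Here the primary call fails, so the request is served by the second entry in the chain.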

Control              Baseline
Tokens/min/user      Start conservative per app profile
Burst                20-40% of per-minute budget
Fallback             Keep at least two provider tiers
Prompt cache TTL     300-900s, tune by response freshness
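
To see how tokens/min/user, burst tokens, and window seconds can interact, here is a minimal token-bucket sketch. It assumes that burst tokens act as extra headroom on top of the per-window budget, which may not be how the bouncer interprets the setting; the class and parameter names are illustrative.

```python
class TokenBudget:
    """Token bucket for one user; keep one instance per user in practice."""

    def __init__(self, tokens_per_window, burst_tokens, window_seconds, now=0.0):
        self.rate = tokens_per_window / window_seconds   # refill rate, tokens per second
        self.capacity = tokens_per_window + burst_tokens  # assumed: burst adds headroom
        self.level = float(self.capacity)
        self.updated_at = now

    def allow(self, tokens, now):
        """Refill for elapsed time, then admit the request if enough tokens remain."""
        elapsed = max(0.0, now - self.updated_at)
        self.level = min(self.capacity, self.level + elapsed * self.rate)
        self.updated_at = now
        if tokens > self.level:
            return False
        self.level -= tokens
        return True

# 6000 tokens/min with a 30% burst, inside the 20-40% baseline range.
budget = TokenBudget(tokens_per_window=6000, burst_tokens=1800, window_seconds=60)
first = budget.allow(7000, now=0.0)    # fits within budget plus burst
second = budget.allow(1000, now=1.0)   # rejected: too little refilled after 1 s
third = budget.allow(1000, now=3.0)    # admitted once refill catches up
```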

Configure Content Safety

In Content Safety, configure:

  • Harm threshold sliders: Hate, Sexual, Violence, Self-Harm (0-3)
  • Prompt shields and security controls
  • Custom blocklists (exact and regex)
  • Allowlist terms
  • Trusted domains
  • Violation action (block, redact, annotate)
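
The interplay of exact and regex blocklists, allowlist terms, and the violation action can be sketched as follows. The terms, patterns, matching order, and return shape are all hypothetical; this illustrates the general idea rather than the enforcement logic the bouncer actually uses.

```python
import re

# Hypothetical policy values for illustration only.
BLOCKLIST_EXACT = {"project-zeus"}
BLOCKLIST_REGEX = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # SSN-like pattern
ALLOWLIST = {"000-00-0000"}

def apply_blocklists(text, action="redact"):
    """Return (resulting text or None, matched terms) for the given violation action."""
    hits = []
    for term in BLOCKLIST_EXACT:
        hits.extend(m.group(0) for m in re.finditer(re.escape(term), text, re.IGNORECASE))
    for pattern in BLOCKLIST_REGEX:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    hits = [h for h in hits if h.lower() not in ALLOWLIST]  # allowlisted terms pass
    if not hits:
        return text, []
    if action == "block":
        return None, hits          # drop the payload entirely
    if action == "redact":
        for hit in hits:
            text = text.replace(hit, "[REDACTED]")
        return text, hits
    return text, hits              # annotate: pass through with findings attached

redacted, found = apply_blocklists("ssn 123-45-6789 and 000-00-0000", action="redact")
```

In this example only the non-allowlisted match is redacted; the allowlisted placeholder value passes through unchanged.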

For detailed control semantics, see:


Verification checklist

  • Selected bouncer is correct
  • The "Use global" state is intentional for each section
  • Override saved successfully
  • Effective behavior observed in runtime logs/audit
  • AI Control dashboard metrics align with test traffic