🤖 AI Pilot

Use the AI Pilot page (Settings -> AI Pilot) to configure per-bouncer behavior for:

  • Observability (token, cost, TTFT/ITL visibility)
  • Cost Optimization (token-aware limits, fallback routing, prompt cache)
  • Content Safety (harm thresholds, prompt controls, blocklists, actions)

This page focuses on local bouncer overrides on top of global AI Pilot defaults.


📌 What is being intercepted?

AI Pilot controls are enforced on traffic for resources that are bound to a bouncer in Settings -> Resources.

  • If a bouncer runs as a sidecar, resources attached to that sidecar are in scope.
  • If a bouncer runs as a gateway/edge, resources routed through that bouncer are in scope.
  • LLM, MCP, AI Agent, API/App, and Data endpoints are all treated as resources and can be filtered from AI Pilot UI.

Use Settings -> AI Pilot -> Intercept Scope to see exactly which targets are currently monitored/managed for the selected bouncer.


📌 Global vs local model


  • Global defaults define enterprise baseline.
  • Local override tailors behavior for one bouncer/application.
  • Effective config is compiled and pushed to the bouncer runtime.
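The merge model above can be sketched as follows. This is an illustrative example of how an effective config might be compiled from global defaults plus a local override; the section and key names are assumptions, not the actual AI Pilot schema.

```python
# Hypothetical sketch: the effective config a bouncer receives is the
# global baseline with any local per-bouncer overrides applied on top.

def compile_effective_config(global_defaults: dict, local_override: dict) -> dict:
    """Merge a local override on top of global defaults.

    Settings present in the override win; settings absent from the
    override fall back to the global baseline.
    """
    effective = dict(global_defaults)
    for section, settings in local_override.items():
        base = dict(effective.get(section, {}))
        base.update(settings)  # local value wins per setting
        effective[section] = base
    return effective

# Illustrative values only
global_defaults = {
    "cost": {"tokens_per_min_per_user": 2000, "cache_ttl_s": 600},
    "safety": {"violence": 1, "action": "block"},
}
local_override = {"cost": {"tokens_per_min_per_user": 500}}

effective = compile_effective_config(global_defaults, local_override)
# The compiled `effective` config is what gets pushed to the bouncer runtime.
```

Note that sections the override does not touch (here, `safety`) remain at their global values.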

👁️ Configure Observability

In Observability, monitor:

  • Token consumption trends
  • Cost burn trends
  • TTFT (Time To First Token)
  • ITL (Inter-Token Latency)
  • Recent anonymized payload stream with redaction indicators

What to validate

  • Charts update for selected bouncer
  • Metrics window aligns to expected traffic period
  • Payload stream shows redaction flags where applicable
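TTFT and ITL can be derived from per-token timestamps of a streamed response. A minimal sketch of the metric definitions (not the product's internal computation):

```python
# Illustrative computation of TTFT and ITL for one streamed LLM response.
# TTFT = time from request send to first token.
# ITL  = average gap between consecutive tokens.

def ttft_and_itl(request_ts: float, token_ts: list[float]) -> tuple[float, float]:
    """Return (TTFT, mean ITL) in seconds from a request timestamp
    and the arrival timestamps of each streamed token."""
    if not token_ts:
        raise ValueError("no tokens received")
    ttft = token_ts[0] - request_ts
    gaps = [b - a for a, b in zip(token_ts, token_ts[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, itl

# Example: request sent at t=0, first token at 0.8 s, then one every 50 ms
ttft, itl = ttft_and_itl(0.0, [0.8, 0.85, 0.9, 0.95])
```

A high TTFT with a normal ITL typically points at queueing or prompt-processing delay rather than slow generation.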

Forensics intelligence

The AI Pilot Observability section now includes a forensics summary and trace log sourced from control-plane audit and bouncer telemetry.

Use it to investigate:

  • Deny and failure spikes
  • Prompt shield and safety-triggered actions
  • Correlated request decisions over a selected time window

⚡ Configure Cost Optimization

In Cost Optimization, configure:

  1. Token-aware rate limits
    • tokens/min/user
    • burst tokens
    • window seconds
  2. Model fallback routing (ordered)
    • Primary -> fallback chain
  3. Prompt caching
    • enable/disable
    • TTL in seconds
Control             Baseline
Tokens/min/user     Start conservative per app profile
Burst               20-40% of per-minute budget
Fallback            Keep at least two provider tiers
Prompt cache TTL    300-900s; tune by response freshness
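A token-aware limit with burst can be modeled as a token bucket: the budget refills at the per-minute rate and may accumulate up to the base budget plus the configured burst. The sketch below is a generic illustration of that mechanism, not AI Pilot's enforcement code; the class and parameter names are assumptions.

```python
class TokenBudget:
    """Per-user token bucket: refills at tokens_per_min, allows a burst."""

    def __init__(self, tokens_per_min: int, burst: int):
        self.rate = tokens_per_min / 60.0       # tokens refilled per second
        self.capacity = tokens_per_min + burst  # max tokens the bucket can hold
        self.available = float(self.capacity)
        self.last = 0.0

    def allow(self, tokens: int, now: float) -> bool:
        """Charge `tokens` at time `now` (seconds); True if within budget."""
        self.available = min(self.capacity,
                             self.available + (now - self.last) * self.rate)
        self.last = now
        if tokens <= self.available:
            self.available -= tokens
            return True
        return False

# 600 tokens/min with a 200-token burst -> an 800-token ceiling
budget = TokenBudget(tokens_per_min=600, burst=200)
ok = budget.allow(800, now=0.0)  # full burst passes once
```

Per the baseline table above, keeping burst at 20-40% of the per-minute budget bounds how far a single spike can overshoot the steady-state rate.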

📌 Configure Content Safety

In Content Safety, configure:

  • Harm threshold sliders: Hate, Sexual, Violence, Self-Harm (0-3)
  • Prompt shields and security controls
  • Custom blocklists (exact and regex)
  • Allowlist terms
  • Trusted domains
  • Violation action (block, redact, annotate)
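To make the interaction of these controls concrete, here is a hedged sketch of blocklist evaluation with an allowlist taking precedence. The matching order, action names, and function signature are assumptions for illustration, not the product's documented semantics.

```python
import re

def check_prompt(text: str, exact: set[str], patterns: list[str],
                 allowlist: set[str], action: str = "block") -> str:
    """Return the decision for a prompt: 'allow' or the configured action.

    Illustrative order: allowlisted terms bypass blocking, then exact
    blocklist terms are checked, then regex blocklist patterns.
    """
    words = set(text.lower().split())
    if words & allowlist:       # allowlisted term present -> pass through
        return "allow"
    if words & exact:           # exact-term blocklist hit
        return action
    for p in patterns:          # regex blocklist hit
        if re.search(p, text, re.IGNORECASE):
            return action
    return "allow"

decision = check_prompt("show me the secret recipe",
                        exact={"secret"}, patterns=[r"\brecipe\b"],
                        allowlist=set(), action="redact")
```

Choosing `redact` over `block` as the violation action lets traffic continue while stripping the offending content, which is often preferable for observability during rollout.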

For detailed control semantics, see:


📌 Verification checklist

  • Selected bouncer is correct
  • The "Use global" toggle state is intentional in each section
  • Override saved successfully
  • Effective behavior observed in runtime logs/audit
  • AI Control dashboard metrics align with test traffic