🤖 AI Pilot
Use AI Pilot (in Settings -> AI Pilot) to configure per-bouncer behavior for:
- Observability (token, cost, TTFT/ITL visibility)
- Cost Optimization (token-aware limits, fallback routing, prompt cache)
- Content Safety (harm thresholds, prompt controls, blocklists, actions)
This page focuses on local bouncer overrides on top of global AI Pilot defaults.
📌 What is being intercepted?
AI Pilot controls are enforced on traffic for resources that are bound to a bouncer in Settings -> Resources.
- If a bouncer runs as sidecar, resources attached to that sidecar are in scope.
- If a bouncer runs as gateway/edge, resources routed through that bouncer are in scope.
- LLM, MCP, AI Agent, API/App, and Data endpoints are all treated as resources and can be filtered from AI Pilot UI.
Use Settings -> AI Pilot -> Intercept Scope to see exactly which targets are currently monitored/managed for the selected bouncer.
📌 Global vs local model
- Global defaults define enterprise baseline.
- Local override tailors behavior for one bouncer/application.
- Effective config is compiled and pushed to the bouncer runtime.
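Conceptually, the compile step can be sketched as a shallow merge of the local override onto the global baseline, with override keys winning. The field names below are illustrative assumptions, not the product's actual schema:

```python
def compile_effective_config(global_defaults: dict, local_override: dict) -> dict:
    """Merge a bouncer's local override onto the enterprise baseline.

    Keys present in the override win; everything else falls back to the
    global default. Field names here are hypothetical, for illustration.
    """
    effective = dict(global_defaults)
    effective.update(local_override)
    return effective


global_defaults = {"tokens_per_min_per_user": 1000, "prompt_cache_ttl_s": 600}
local_override = {"tokens_per_min_per_user": 250}

print(compile_effective_config(global_defaults, local_override))
# {'tokens_per_min_per_user': 250, 'prompt_cache_ttl_s': 600}
```

The effective result is what gets pushed to the bouncer runtime; changing a global default later affects every bouncer that has not overridden that key.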
👁️ Configure Observability
In Observability, monitor:
- Token consumption trends
- Cost burn trends
- TTFT (Time To First Token)
- ITL (Inter-Token Latency)
- Recent anonymized payload stream with redaction indicators
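To make the two latency metrics concrete: TTFT measures the delay from sending the request to receiving the first streamed token, and ITL measures the average gap between subsequent tokens. A minimal sketch of how they are derived from token arrival timestamps:

```python
def ttft_and_itl(request_start: float, token_times: list[float]) -> tuple[float, float]:
    """Compute TTFT (request start to first token) and ITL (mean gap
    between consecutive tokens) from arrival timestamps in seconds."""
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, itl


# Request sent at t=0.0s; first token at 0.8s, then one every 50 ms.
ttft, itl = ttft_and_itl(0.0, [0.8, 0.85, 0.9, 0.95])
print(round(ttft, 3), round(itl, 3))  # 0.8 0.05
```

A rising TTFT usually points at queueing or cold-start cost at the provider, while a rising ITL points at generation throughput.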
What to validate
- Charts update for selected bouncer
- Metrics window aligns to expected traffic period
- Payload stream shows redaction flags where applicable
Forensics intelligence
The AI Pilot Observability section now includes a forensics summary and trace log sourced from control-plane audit and bouncer telemetry.
Use it to investigate:
- Deny and failure spikes
- Prompt shield and safety-triggered actions
- Correlated request decisions over a selected time window
⚡ Configure Cost Optimization
In Cost Optimization, configure:
- Token-aware rate limits
  - tokens/min/user
  - burst tokens
  - window seconds
- Model fallback routing (ordered)
  - Primary -> fallback chain
- Prompt caching
  - enable/disable
  - TTL in seconds
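A token-aware limit of this shape is commonly implemented as a token bucket: the per-window budget refills continuously, and the burst setting adds headroom above it. The sketch below assumes that model; it is illustrative, not the bouncer's actual implementation:

```python
import time


class TokenBudget:
    """Per-user token-aware limiter: a refilling per-window budget plus
    burst headroom. Parameters mirror the UI fields (tokens/min/user,
    burst tokens, window seconds); this class is an illustrative sketch."""

    def __init__(self, tokens_per_window: int, burst_tokens: int, window_s: int = 60):
        self.capacity = tokens_per_window + burst_tokens
        self.refill_rate = tokens_per_window / window_s  # tokens per second
        self.available = float(self.capacity)
        self.last = time.monotonic()

    def allow(self, requested_tokens: int) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.available = min(self.capacity,
                             self.available + (now - self.last) * self.refill_rate)
        self.last = now
        if requested_tokens <= self.available:
            self.available -= requested_tokens
            return True
        return False


budget = TokenBudget(tokens_per_window=1000, burst_tokens=300, window_s=60)
print(budget.allow(1200))  # True: burst headroom covers the spike
print(budget.allow(200))   # False: budget exhausted until it refills
```

Note that the burst setting only buys one-off headroom; sustained throughput is still bounded by tokens/min/user.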
Recommended enterprise baseline
| Control | Baseline |
|---|---|
| Tokens/min/user | Start conservative per app profile |
| Burst | 20-40% of per-minute budget |
| Fallback | Keep at least two provider tiers |
| Prompt cache TTL | 300-900s, tune by response freshness |
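The TTL baseline in the table trades cost against freshness: a cached response is served until its TTL expires, after which the next request pays for a fresh model call. A minimal sketch of that behavior, assuming a simple prompt-keyed store (not the bouncer's actual cache):

```python
import time


class PromptCache:
    """Minimal TTL cache keyed by prompt text (illustrative sketch)."""

    def __init__(self, ttl_s: float = 600):
        self.ttl_s = ttl_s
        self._store: dict[str, tuple[float, str]] = {}

    def put(self, prompt: str, response: str) -> None:
        self._store[prompt] = (time.monotonic(), response)

    def get(self, prompt: str):
        entry = self._store.get(prompt)
        if entry is None:
            return None
        stored_at, response = entry
        if time.monotonic() - stored_at > self.ttl_s:
            del self._store[prompt]  # expired: force a fresh model call
            return None
        return response


cache = PromptCache(ttl_s=600)
cache.put("What is our refund policy?", "Refunds are accepted within 30 days.")
print(cache.get("What is our refund policy?") is not None)  # True (fresh hit)
print(cache.get("Unrelated prompt") is None)                # True (miss)
```

Shorter TTLs suit prompts whose correct answer changes often; longer TTLs maximize savings for stable FAQ-style traffic.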
📌 Configure Content Safety
In Content Safety, configure:
- Harm threshold sliders: Hate, Sexual, Violence, Self-Harm (0-3)
- Prompt shields and security controls
- Custom blocklists (exact and regex)
- Allowlist terms
- Trusted domains
- Violation action (block, redact, annotate)
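Putting the sliders and the violation action together: each category's harm score is compared against its configured threshold, and crossing any threshold triggers the selected action. The sketch below assumes that evaluation model; category keys and the scoring scale (0-3) follow the sliders above, everything else is illustrative:

```python
def safety_action(scores: dict[str, int], thresholds: dict[str, int],
                  violation_action: str = "block") -> str:
    """Compare per-category harm scores (0-3) against configured thresholds.

    Returns the configured violation action ('block', 'redact', or
    'annotate') if any category meets or exceeds its threshold,
    otherwise 'allow'. Field names are illustrative assumptions.
    """
    for category, score in scores.items():
        if score >= thresholds.get(category, 3):
            return violation_action
    return "allow"


thresholds = {"hate": 2, "sexual": 2, "violence": 2, "self_harm": 1}
print(safety_action({"hate": 0, "violence": 3}, thresholds))            # block
print(safety_action({"hate": 1, "violence": 1}, thresholds, "redact"))  # allow
```

Lower slider values therefore mean stricter enforcement: a threshold of 1 acts on even mildly scored content, while 3 acts only on the most severe.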
For detailed control semantics, see:
📌 Verification checklist
- Selected bouncer is correct
- "Use global" state is intentional per section
- Override saved successfully
- Effective behavior observed in runtime logs/audit
- AI Control dashboard metrics align with test traffic