AI Pilot
Use AI Pilot (Settings -> AI Pilot) to configure per-bouncer behavior for:
- Observability (token, cost, TTFT/ITL visibility)
- Cost Optimization (token-aware limits, fallback routing, prompt cache)
- Content Safety (harm thresholds, prompt controls, blocklists, actions)
This page focuses on local bouncer overrides on top of global AI Pilot defaults.
What is being intercepted?
AI Pilot controls are enforced on traffic for resources that are bound to a bouncer in Settings -> Resources.
- If a bouncer runs as a sidecar, resources attached to that sidecar are in scope.
- If a bouncer runs as a gateway/edge, resources routed through that bouncer are in scope.
- LLM, MCP, AI Agent, API/App, and Data endpoints are all treated as resources and can be filtered in the AI Pilot UI.
Use Settings -> AI Pilot -> Intercept Scope to see exactly which targets are currently monitored/managed for the selected bouncer.
Global vs local model
- Global defaults define the enterprise baseline.
- A local override tailors behavior for one bouncer/application.
- The effective config (global defaults plus local overrides) is compiled and pushed to the bouncer runtime.
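As a rough illustration of how a local override could layer on top of global defaults, here is a minimal Python sketch. The section names, field names, and the `use_global` flag are hypothetical and chosen only for this example; the real compilation happens inside the product and may behave differently.

```python
from copy import deepcopy


def compile_effective_config(global_defaults: dict, local_override: dict) -> dict:
    """Overlay one bouncer's local override on the global defaults, per section."""
    effective = deepcopy(global_defaults)
    for section, settings in local_override.items():
        # Hypothetical flag: when set, the section inherits the global values as-is.
        if settings.get("use_global", False):
            continue
        merged = dict(effective.get(section, {}))
        merged.update({k: v for k, v in settings.items() if k != "use_global"})
        effective[section] = merged
    return effective


# Hypothetical field names, for illustration only.
global_defaults = {
    "cost_optimization": {"tokens_per_min_per_user": 20000, "prompt_cache_ttl_s": 600},
    "content_safety": {"violation_action": "block"},
}
local_override = {
    "cost_optimization": {"use_global": False, "tokens_per_min_per_user": 5000},
    "content_safety": {"use_global": True, "violation_action": "annotate"},
}

effective = compile_effective_config(global_defaults, local_override)
print(effective)
```

In this sketch the cost section picks up the local token limit while keeping the global cache TTL, and the content safety section ignores the local value because it is marked as using the global setting.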
Configure Observability
In Observability, monitor:
- Token consumption trends
- Cost burn trends
- TTFT (Time To First Token)
- ITL (Inter-Token Latency)
- Recent anonymized payload stream with redaction indicators
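To make the two latency metrics concrete, the sketch below derives TTFT and ITL from the arrival timestamps of a streamed response. It assumes ITL is reported as the mean gap between consecutive tokens; how AI Pilot aggregates these values is not stated here, and the function name is hypothetical.

```python
def ttft_and_itl(request_sent_at: float, token_times: list[float]) -> tuple[float, float]:
    """Return (TTFT, mean ITL) in seconds from token arrival timestamps."""
    if not token_times:
        raise ValueError("at least one token timestamp is required")
    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_sent_at
    # ITL: average gap between consecutive tokens (0.0 for a single-token response).
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, itl


ttft, itl = ttft_and_itl(0.0, [0.5, 0.75, 1.0, 1.25])
print(ttft, itl)  # 0.5 0.25
```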
What to validate
- Charts update for selected bouncer
- Metrics window aligns to expected traffic period
- Payload stream shows redaction flags where applicable
Forensics intelligence
The AI Pilot Observability section now includes a forensics summary and trace log sourced from control-plane audit and bouncer telemetry.
Use it to investigate:
- Deny and failure spikes
- Prompt shield and safety-triggered actions
- Correlated request decisions over a selected time window
Configure Cost Optimization
In Cost Optimization, configure:
- Token-aware rate limits
  - tokens/min/user
  - burst tokens
  - window seconds
- Model fallback routing (ordered)
  - Primary -> fallback chain
- Prompt caching
  - enable/disable
  - TTL in seconds
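The three rate-limit settings can be pictured as a per-user token bucket. The sketch below is one possible interpretation, not the bouncer's actual algorithm: it assumes the bucket capacity is the per-window budget plus the burst allowance, and that the bucket refills continuously at the tokens/min rate. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenRateLimiter:
    tokens_per_min: int
    burst_tokens: int
    window_seconds: int
    _buckets: dict = field(default_factory=dict)  # user -> (available, last_seen)

    @property
    def capacity(self) -> float:
        # Assumption: per-window budget plus the burst allowance.
        return self.tokens_per_min * self.window_seconds / 60 + self.burst_tokens

    def allow(self, user: str, tokens: int, now: float) -> bool:
        """Return True and charge the bucket if the user can spend `tokens` at time `now`."""
        available, last_seen = self._buckets.get(user, (self.capacity, now))
        # Refill at the per-minute rate for the time elapsed since the last request.
        refill = (now - last_seen) * self.tokens_per_min / 60
        available = min(self.capacity, available + refill)
        if tokens > available:
            self._buckets[user] = (available, now)
            return False
        self._buckets[user] = (available - tokens, now)
        return True


limiter = TokenRateLimiter(tokens_per_min=600, burst_tokens=200, window_seconds=60)
print(limiter.allow("alice", 700, now=0.0))   # True: within the 800-token capacity
print(limiter.allow("alice", 200, now=5.0))   # False: only 150 tokens available
print(limiter.allow("alice", 200, now=10.0))  # True: refilled to 200 tokens
```

The clock is passed in as `now` so the behavior is easy to test; a real deployment would read a monotonic clock instead.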
Recommended enterprise baseline
| Control | Baseline |
|---|---|
| Tokens/min/user | Start conservative per app profile |
| Burst | 20-40% of per-minute budget |
| Fallback | Keep at least two provider tiers |
| Prompt cache TTL | 300-900s, tune by response freshness |
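The fallback and prompt-cache rows of the baseline can be sketched together: check the cache first, then walk an ordered chain of provider tiers until one succeeds. This is an illustrative sketch under stated assumptions, not AI Pilot's implementation. `PromptCache`, `route_with_fallback`, and the two provider functions are hypothetical names, and the cache is keyed on the raw prompt for simplicity.

```python
class PromptCache:
    """In-memory prompt cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds: int):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # prompt -> (response, stored_at)

    def get(self, prompt: str, now: float):
        entry = self._entries.get(prompt)
        if entry is None:
            return None
        response, stored_at = entry
        if now - stored_at >= self.ttl_seconds:
            del self._entries[prompt]  # entry is stale, drop it
            return None
        return response

    def put(self, prompt: str, response: str, now: float) -> None:
        self._entries[prompt] = (response, now)


def route_with_fallback(prompt, chain, cache, now):
    """Return (response, source): a cache hit or the first tier in the chain that succeeds."""
    cached = cache.get(prompt, now)
    if cached is not None:
        return cached, "cache"
    last_error = None
    for name, call in chain:
        try:
            response = call(prompt)
        except Exception as exc:  # any failure moves on to the next tier
            last_error = exc
            continue
        cache.put(prompt, response, now)
        return response, name
    raise RuntimeError("all models in the fallback chain failed") from last_error


def flaky_primary(prompt):
    raise TimeoutError("primary tier unavailable")


def steady_secondary(prompt):
    return f"echo: {prompt}"


chain = [("primary", flaky_primary), ("secondary", steady_secondary)]
cache = PromptCache(ttl_seconds=300)
first = route_with_fallback("hi", chain, cache, now=0.0)     # served by the secondary tier
second = route_with_fallback("hi", chain, cache, now=100.0)  # served from the cache
third = route_with_fallback("hi", chain, cache, now=300.0)   # cache expired, secondary again
print(first, second, third)
```

A shorter TTL keeps responses fresher at the cost of more upstream calls, which is the trade-off behind the 300-900s range in the table.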
Configure Content Safety
In Content Safety, configure:
- Harm threshold sliders: Hate, Sexual, Violence, Self-Harm (0-3)
- Prompt shields and security controls
- Custom blocklists (exact and regex)
- Allowlist terms
- Trusted domains
- Violation action (`block`, `redact`, `annotate`)
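The sketch below shows one way these settings could combine into a single decision. It is an illustration under several assumptions: harm severities are supplied by an upstream classifier, a category is flagged when its severity meets or exceeds the configured threshold, allowlisted terms exempt blocklist matches, and `SafetyPolicy` and `apply_policy` are hypothetical names. Trusted domains and prompt shields are not modeled.

```python
import re
from dataclasses import dataclass


@dataclass
class SafetyPolicy:
    thresholds: dict       # category -> threshold on the 0-3 scale
    exact_blocklist: set   # literal terms, matched case-insensitively
    regex_blocklist: list  # regular expression patterns
    allowlist: set         # terms exempt from blocklist matching
    action: str            # "block", "redact", or "annotate"


def apply_policy(text: str, severities: dict, policy: SafetyPolicy) -> dict:
    redacted, hits = text, 0
    for term in sorted(policy.exact_blocklist - policy.allowlist):
        redacted, count = re.subn(re.escape(term), "[REDACTED]", redacted, flags=re.IGNORECASE)
        hits += count
    for pattern in policy.regex_blocklist:
        matches = [m.group(0) for m in re.finditer(pattern, redacted)]
        hits += sum(1 for match in matches if match not in policy.allowlist)
        redacted = re.sub(
            pattern,
            lambda m: m.group(0) if m.group(0) in policy.allowlist else "[REDACTED]",
            redacted,
        )
    # Assumption: a category is flagged when severity >= its configured threshold.
    flagged = sorted(c for c, s in severities.items() if s >= policy.thresholds.get(c, 3))
    if hits == 0 and not flagged:
        return {"action": "allow", "text": text, "hits": 0, "flagged": []}
    if policy.action == "block":
        out = None        # drop the payload entirely
    elif policy.action == "redact":
        out = redacted    # forward with blocklist matches masked
    else:
        out = text        # annotate: forward unchanged, flags attached
    return {"action": policy.action, "text": out, "hits": hits, "flagged": flagged}


policy = SafetyPolicy(
    thresholds={"Hate": 2, "Sexual": 2, "Violence": 2, "Self-Harm": 1},
    exact_blocklist={"project-x"},
    regex_blocklist=[r"\b\d{3}-\d{2}-\d{4}\b"],
    allowlist=set(),
    action="redact",
)
result = apply_policy("Project-X owner SSN 123-45-6789", {"Violence": 0}, policy)
print(result["text"])  # [REDACTED] owner SSN [REDACTED]
```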
For detailed control semantics, see:
Verification checklist
- Selected bouncer is correct
- `Use global` state is intentional per section
- Override saved successfully
- Effective behavior observed in runtime logs/audit
- AI Control dashboard metrics align with test traffic