Control Core Docs

AI Pilot

Use AI Pilot (Settings -> AI Pilot) to configure per-bouncer behavior in three areas:

Observability (token, cost, TTFT/ITL visibility)
Cost Optimization (token-aware limits, fallback routing, prompt cache)
Content Safety (harm thresholds, prompt controls, blocklists, actions)

This page focuses on local bouncer overrides on top of global AI Pilot defaults.

What is being intercepted?

AI Pilot controls are enforced on traffic for resources that are bound to a bouncer in Settings -> Resources.

If a bouncer runs as a sidecar, resources attached to that sidecar are in scope.
If a bouncer runs as a gateway or edge, resources routed through that bouncer are in scope.
LLM, MCP, AI Agent, API/App, and Data endpoints are all treated as resources and can be filtered in the AI Pilot UI.

Use Settings -> AI Pilot -> Intercept Scope to see exactly which targets are currently monitored/managed for the selected bouncer.

Global vs local model

Global defaults define the enterprise baseline.
A local override tailors behavior for one bouncer or application.
The effective config is compiled from both and pushed to the bouncer runtime.
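
The layering above can be pictured as a merge in which local values win over global ones. The following is a minimal sketch, assuming a recursive overlay; the actual compilation rules are not described on this page, and every field name shown (cost, prompt_cache, ttl_seconds, and so on) is hypothetical.

```python
from copy import deepcopy


def compile_effective_config(global_defaults: dict, local_override: dict) -> dict:
    """Overlay local override values on global defaults, recursing into nested sections."""
    effective = deepcopy(global_defaults)
    for key, value in local_override.items():
        if isinstance(value, dict) and isinstance(effective.get(key), dict):
            # Both sides are sections: merge them key by key.
            effective[key] = compile_effective_config(effective[key], value)
        else:
            # Scalars and lists from the local override replace the global value.
            effective[key] = deepcopy(value)
    return effective


# Hypothetical field names, for illustration only.
global_defaults = {
    "cost": {
        "tokens_per_min_per_user": 20000,
        "prompt_cache": {"enabled": True, "ttl_seconds": 600},
    },
    "safety": {"violation_action": "block"},
}
local_override = {"cost": {"prompt_cache": {"ttl_seconds": 300}}}

effective = compile_effective_config(global_defaults, local_override)
```

In this sketch the override changes only the cache TTL for one bouncer; every other setting is inherited from the global baseline, and the global defaults themselves are left unmodified.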

Configure Observability

In Observability, monitor:

Token consumption trends
Cost burn trends
TTFT (Time To First Token)
ITL (Inter-Token Latency)
Recent anonymized payload stream with redaction indicators
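
To make the two latency metrics concrete, here is a minimal sketch of how TTFT and ITL can be derived from the arrival timestamps of a streamed response. It assumes ITL is reported as the mean gap between consecutive tokens; the dashboard may aggregate differently (for example, as percentiles), and the function name is illustrative.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def ttft_and_itl(request_sent_at: float,
                 token_times: Sequence[float]) -> Tuple[float, Optional[float]]:
    """Return (TTFT, mean ITL) in seconds from token arrival timestamps."""
    if not token_times:
        raise ValueError("no tokens were received")
    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_sent_at
    # ITL: gaps between consecutive tokens; undefined for a single-token response.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else None
    return ttft, itl


ttft, itl = ttft_and_itl(0.0, [0.5, 0.75, 1.0, 1.25])
```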

What to validate

Charts update for the selected bouncer
The metrics window aligns with the expected traffic period
The payload stream shows redaction flags where applicable

Forensics intelligence

The AI Pilot Observability section now includes a forensics summary and trace log sourced from control-plane audit and bouncer telemetry.

Use it to investigate:

Deny and failure spikes
Prompt shield and safety-triggered actions
Correlated request decisions over a selected time window
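
As an illustration of the kind of question the trace log answers, the sketch below counts deny decisions per time bucket inside a selected window and flags buckets that reach a threshold. The event shape (ts, decision) and the fixed threshold are assumptions made for this example, not the product's export format or spike definition.

```python
from collections import Counter
from typing import Dict, Iterable, List


def deny_counts(events: Iterable[dict], start: float, end: float,
                bucket_seconds: int = 60) -> Dict[int, int]:
    """Count deny decisions per bucket for events with start <= ts < end."""
    counts: Counter = Counter()
    for event in events:
        if start <= event["ts"] < end and event["decision"] == "deny":
            counts[int((event["ts"] - start) // bucket_seconds)] += 1
    return dict(counts)


def spike_buckets(counts: Dict[int, int], threshold: int) -> List[int]:
    """Return the bucket indexes whose deny count reaches the threshold."""
    return sorted(bucket for bucket, n in counts.items() if n >= threshold)


# Hypothetical trace records; timestamps are seconds from the window start.
events = [
    {"ts": 5, "decision": "deny"},
    {"ts": 20, "decision": "allow"},
    {"ts": 70, "decision": "deny"},
    {"ts": 75, "decision": "deny"},
    {"ts": 80, "decision": "deny"},
    {"ts": 130, "decision": "allow"},
]
counts = deny_counts(events, start=0, end=180)
spikes = spike_buckets(counts, threshold=3)
```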

Configure Cost Optimization

In Cost Optimization, configure:

Token-aware rate limits
- tokens/min/user
- burst tokens
- window seconds
Model fallback routing (ordered)
- Primary -> fallback chain
Prompt caching
- enable/disable
- TTL in seconds
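
The sketch below shows one way the three rate-limit settings could interact. It assumes a fixed window per user whose budget is the per-minute rate scaled to the window length plus the burst allowance; the bouncer's actual algorithm is not documented here, and the class and method names are illustrative.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TokenRateLimiter:
    tokens_per_min_per_user: int
    burst_tokens: int
    window_seconds: int
    clock: Callable[[], float] = time.monotonic
    # user -> (window index, tokens consumed in that window)
    _usage: Dict[str, Tuple[int, int]] = field(default_factory=dict, init=False)

    def window_budget(self) -> int:
        # Sustained rate scaled to the window length, plus the burst allowance.
        return self.tokens_per_min_per_user * self.window_seconds // 60 + self.burst_tokens

    def allow(self, user: str, tokens: int) -> bool:
        """Return True and record usage if the request fits the user's window budget."""
        window = int(self.clock() // self.window_seconds)
        seen_window, used = self._usage.get(user, (window, 0))
        if seen_window != window:
            used = 0  # A new window has started, so the counter resets.
        if used + tokens > self.window_budget():
            self._usage[user] = (window, used)
            return False
        self._usage[user] = (window, used + tokens)
        return True


# Usage with a controllable clock so the behavior is easy to follow.
now = [0.0]
limiter = TokenRateLimiter(1000, 200, 60, clock=lambda: now[0])
first = limiter.allow("alice", 900)   # fits: 900 of 1200
second = limiter.allow("alice", 400)  # rejected: 1300 would exceed 1200
third = limiter.allow("alice", 300)   # fits exactly: 1200 of 1200
now[0] = 61.0
fourth = limiter.allow("alice", 900)  # new window, counter reset
```

Counting tokens rather than requests means a single large prompt can consume most of a user's budget, which is the behavior a token-aware limit is meant to capture.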

Recommended enterprise baseline

Control	Baseline
Tokens/min/user	Start conservative per app profile
Burst	20-40% of per-minute budget
Fallback	Keep at least two provider tiers
Prompt cache TTL	300-900s, tune by response freshness
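
The baseline can be turned into a quick pre-save check. This is a sketch only: the parameter names are hypothetical, and "at least two provider tiers" is interpreted here as at least two distinct entries in the fallback chain, which is an assumption.

```python
from typing import List, Sequence


def check_baseline(tokens_per_min: int, burst_tokens: int,
                   fallback_chain: Sequence[str], cache_ttl_seconds: int) -> List[str]:
    """Return a list of findings; an empty list means the values match the baseline."""
    findings = []
    # Burst should be 20-40% of the per-minute budget (integer math avoids float rounding).
    if not 20 * tokens_per_min <= 100 * burst_tokens <= 40 * tokens_per_min:
        findings.append("burst is outside 20-40% of the per-minute budget")
    # Assumption: two distinct chain entries stand in for two provider tiers.
    if len(set(fallback_chain)) < 2:
        findings.append("fallback chain has fewer than two tiers")
    if not 300 <= cache_ttl_seconds <= 900:
        findings.append("prompt cache TTL is outside 300-900s")
    return findings


# With a 10,000 tokens/min budget, the recommended burst range is 2,000-4,000 tokens.
ok = check_baseline(10000, 3000, ["primary-model", "fallback-model"], 600)
bad = check_baseline(10000, 5000, ["primary-model"], 60)
```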

Configure Content Safety

In Content Safety, configure:

Harm threshold sliders: Hate, Sexual, Violence, Self-Harm (0-3)
Prompt shields and security controls
Custom blocklists (exact and regex)
Allowlist terms
Trusted domains
Violation action (block, redact, annotate)
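
To show how these controls might combine at decision time, here is a minimal sketch. It rests on several assumptions that this page does not confirm: a classifier supplies a 0-3 severity per harm category, a severity strictly above the configured threshold counts as a violation, exact blocklist terms match case-insensitively, and an allowlisted term suppresses a blocklist hit. Prompt shields and trusted domains are not modeled, and all names are illustrative.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SafetyPolicy:
    thresholds: Dict[str, int]                      # category -> 0..3
    exact_blocklist: List[str] = field(default_factory=list)
    regex_blocklist: List[str] = field(default_factory=list)
    allowlist: List[str] = field(default_factory=list)
    action: str = "block"                           # block, redact, or annotate


def evaluate(policy: SafetyPolicy, text: str, severities: Dict[str, int]) -> dict:
    """Apply harm thresholds and blocklists, then the configured violation action."""
    # Assumption: a severity strictly above the threshold is a violation.
    reasons = [f"harm:{category}" for category, limit in policy.thresholds.items()
               if severities.get(category, 0) > limit]
    patterns = [(term, re.compile(re.escape(term), re.IGNORECASE))
                for term in policy.exact_blocklist]
    patterns += [(rx, re.compile(rx, re.IGNORECASE)) for rx in policy.regex_blocklist]
    allowed = {term.lower() for term in policy.allowlist}

    def mask(match: "re.Match") -> str:
        # Allowlisted matches are kept; everything else is replaced.
        return match.group(0) if match.group(0).lower() in allowed else "[REDACTED]"

    output = text
    for label, compiled in patterns:
        hits = [m.group(0) for m in compiled.finditer(text)
                if m.group(0).lower() not in allowed]
        if hits:
            reasons.append(f"blocklist:{label}")
            if policy.action == "redact":
                output = compiled.sub(mask, output)
    if not reasons:
        return {"action": "allow", "text": text, "reasons": []}
    if policy.action == "block":
        return {"action": "block", "text": "", "reasons": reasons}
    # redact returns the masked text; annotate returns the text unchanged with reasons attached.
    return {"action": policy.action, "text": output, "reasons": reasons}


policy = SafetyPolicy(
    thresholds={"hate": 1, "sexual": 1, "violence": 2, "self_harm": 1},
    exact_blocklist=["project falcon"],
    regex_blocklist=[r"\b\d{3}-\d{2}-\d{4}\b"],
    action="redact",
)
result = evaluate(policy, "Status of Project Falcon for 123-45-6789", {"hate": 0})
```

In this sketch, redact masks only blocklist matches; a harm-threshold violation under redact is reported in the reasons but leaves the text untouched, so block is the stricter choice when harm categories matter most.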

For detailed control semantics, see:

Verification checklist

Selected bouncer is correct
The "Use global" state is intentional for each section
Override saved successfully
Effective behavior observed in runtime logs/audit
AI Control dashboard metrics align with test traffic