Notifications & Alert Channels
Audience: Platform administrators, SREs, on-call engineers
Time: ~12 min
Prerequisites: Control Plane deployed and reachable; admin role in the Control Plane UI; NOTIFICATION_ENCRYPTION_KEY (Fernet) configured in your secret store; SMTP relay reachable (for Email), or webhook/OAuth credentials for Slack/Teams/PagerDuty/ServiceNow.
The Notifications page (Settings → Notifications) is the single
owner of alert channel configuration in Control Core. Everything that
delivers a message to a human — Email, Slack, Microsoft Teams,
PagerDuty, ServiceNow, generic Webhook — is created, edited, tested,
enabled, and deleted from this page.
Action Destinations are separate. Profile-based destinations (webhook_generic, api_gateway, workflow_router, policy_engine, approval_queue, break_glass_bridge) and the SIEM forwarder live under Settings → Action Destinations. The two surfaces share the same encrypted credential store but have independent CRUD, audit, and runtime delivery paths. Channel CRUD has a single home: this page.
Page layout
The Notifications page renders three sections, environment-scoped via the header environment selector (Sandbox / Production):
| Section | Purpose |
|---|---|
| General System Alerts | Built-in alert types (failed logins, policy errors, license/subscription warnings, bouncer offline, anomaly detection). For each alert type, toggle whether it sends to Email / Slack / Teams / PagerDuty / ServiceNow / Webhook. Severity threshold per environment. |
| Custom Alert Rules | User-defined rules with trigger conditions (metric / event filter), frequency caps, and target channel selections. Useful for "DM me on Slack when denial rate spikes for resource X". |
| Alert Channels | CRUD for the actual channel connections — endpoint URL, credential, environment scope, severity threshold, max attempts. This is the single source of truth for channel configuration. |
The environment badge in the header (yellow=Sandbox, red=Production) tells you which environment the current channels apply to. Channels are stored per-environment so a Slack misconfiguration in Sandbox cannot spam your Production incident channel.
Supported channel types
| Channel | Auth | Used by |
|---|---|---|
| Email (SMTP) | SMTP server + credentials, optional STARTTLS | Built-in alerts, custom rules, notification and email post-decision actions |
| Slack | Incoming webhook URL or OAuth bot token + channel | Built-in alerts, custom rules, notification actions |
| Microsoft Teams | Incoming webhook URL (Teams connector) | Built-in alerts, custom rules, notification actions |
| PagerDuty | Events API v2 integration key (Routing Key) | Built-in alerts, custom rules, break_glass_notify actions |
| ServiceNow | Instance URL + basic auth or OAuth; creates incidents on policy denials | Built-in alerts, custom rules, notification actions |
| Webhook | HTTPS URL, optional HMAC signature secret, optional bearer token | Built-in alerts, custom rules, notification and webhook actions referenced as channel:<id> |
Troubleshooting: If the Test button on a channel returns connection refused, verify that the Control Plane can reach the channel endpoint. Common causes: corporate egress proxy not configured in the Control Plane container, DNS resolution failures inside the cluster, firewall blocking outbound traffic on the channel's port. Exec into the Control Plane API container and run curl -v <endpoint> to isolate transport from configuration.
Add a channel
This is the canonical CRUD flow — applies to every channel type.
- Navigate to Settings → Notifications. (~10 sec)
- Confirm the environment badge shows the environment you want (Sandbox / Production). Switch using the header environment selector if needed. (~5 sec)
- Scroll to Alert Channels and click Add Channel. (~5 sec)
- Choose the channel type and fill in the type-specific dialog: (~1–3 min)
  - Email — SMTP host, port, username, password (encrypted at rest), From address, optional Reply-To, STARTTLS toggle, default recipient list.
  - Slack — Either Incoming Webhook URL or Bot Token + Channel ID / Name + Username + Icon emoji.
  - Teams — Incoming Webhook URL (from the Teams Connectors flow); optional default themeColor.
  - PagerDuty — Events API v2 Integration Key (Routing Key), default severity (info / warning / error / critical), default source, optional dedup_key template.
  - ServiceNow — Instance URL (https://yourcorp.service-now.com), auth method (basic / OAuth), credentials, default assignment_group, default category / subcategory.
  - Webhook — HTTPS URL, HTTP method (POST recommended), optional Authorization: Bearer <token>, optional HMAC signature secret (HMAC-SHA256 over the JSON body, sent as X-Cc-Signature).
- Set Severity threshold — only events at this level or above will be delivered to this channel (e.g. set to warning to filter out info chatter). (~10 sec)
- Set Max attempts — how many times the delivery scheduler will retry on failure (default 5, capped at 10). (~10 sec)
- Toggle Enabled ON. (~5 sec)
- Click Save. (~5 sec)
- Click Test to send a probe payload. The page shows the HTTP status / SMTP response code. (~10 sec)
Troubleshooting: If Test succeeds but real alerts never arrive, check Settings → Notifications → Delivery Log (next section) to see whether the delivery was attempted and what response the channel returned. Common causes: severity threshold higher than the alert severity (lower the threshold), channel disabled at the environment level, alert rule pointing to a different channel ID, or the alert event was suppressed by frequency capping.
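On the receiving end of a Webhook channel, the HMAC signature mentioned in the channel dialog can be checked before the payload is trusted. The following is a minimal sketch, not the product's reference implementation. It assumes the X-Cc-Signature header carries the hex-encoded HMAC-SHA256 digest of the raw request body with no prefix; the encoding and the function names are assumptions, so confirm them against a real Test delivery.

```python
import hashlib
import hmac


def sign_body(secret: bytes, raw_body: bytes) -> str:
    """Return the HMAC-SHA256 digest of the raw body as hex (hex is an assumption)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Check a received X-Cc-Signature value against the expected digest.

    compare_digest runs in constant time, which avoids leaking how many
    leading characters of the signature matched.
    """
    expected = sign_body(secret, raw_body).encode()
    return hmac.compare_digest(expected, header_value.strip().encode())
```

Verify against the exact bytes received on the wire. Re-serializing the parsed JSON can change whitespace or key order and make a valid signature fail.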
Delivery log
Each delivery attempt is recorded in notification_delivery_log with
status (pending → sent / failed / dlq), HTTP response code (or
SMTP response), retry count, and the redacted payload. View it under
the Delivery Log tab on this page (or query
/v1/notifications/deliveries?channel_id=<id> from the API).
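A small sketch of building that API query with the Python standard library. The base URL, the bearer-token authentication, and the function name are assumptions for illustration; this page does not describe how the API authenticates callers or what the response body looks like.

```python
import urllib.parse
import urllib.request


def build_deliveries_request(base_url: str, channel_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one channel's delivery log."""
    query = urllib.parse.urlencode({"channel_id": channel_id})
    url = f"{base_url.rstrip('/')}/v1/notifications/deliveries?{query}"
    headers = {
        "Authorization": f"Bearer {token}",  # assumed auth scheme
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")
```

Pass the returned object to urllib.request.urlopen to send it, and inspect the response before relying on any particular field names.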
The background delivery scheduler retries failed deliveries every
15 minutes with exponential backoff (max 1 hour between attempts,
capped by the channel's max_attempts). Once max_attempts is
exhausted, the row is moved to dlq and a notification_dlq audit
event is emitted.
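To get a feel for how long a failing delivery stays in the retry loop before it reaches dlq, the schedule can be approximated as below. This is a sketch under stated assumptions, not the scheduler's actual code: it assumes the first retry waits 15 minutes, each later wait doubles, waits are capped at 60 minutes, and max_attempts counts the initial attempt. The doubling factor and the function name are hypothetical.

```python
def retry_delays_minutes(max_attempts: int, base: int = 15, cap: int = 60) -> list[int]:
    """Return the wait before each retry, in minutes.

    The first attempt is immediate, so there are max_attempts - 1 waits.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return [min(base * 2 ** n, cap) for n in range(max_attempts - 1)]


# With the default of 5 attempts: waits of 15, 30, 60 and 60 minutes.
default_schedule = retry_delays_minutes(5)
```

Under these assumptions a channel with the default max_attempts of 5 would exhaust its retries roughly 165 minutes after the first failure.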
General System Alerts
Built-in alert types fire on platform events. For each one, toggle the channels it should reach in the General System Alerts card:
| Alert type | Default severity | Typical use |
|---|---|---|
| Failed login attempts | warning | 5+ failed logins from same source IP within 10 minutes |
| Policy validation error | error | Rego compile failure during policy sync |
| License / subscription warning | warning | License expiring in < 30 days, telemetry connection lost |
| Bouncer offline | error | A registered bouncer missed > 3 heartbeats |
| Anomaly detected | warning / critical | Denial rate spike, suspicious source IP, unusual user activity, latency anomaly |
| Audit pipeline error | critical | Audit ingest dropped events, SIEM outbox stuck > 5 min |
For each row, click the channel icon (Email / Slack / Teams /
PagerDuty / ServiceNow / Webhook) to toggle delivery to that channel
type. Channel-level severity thresholds still apply — toggling Slack
ON for Failed login does nothing if your Slack channel is set to
error minimum.
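The interaction between an alert's severity and a channel's threshold can be sketched as a simple ordered comparison. The ordering info < warning < error < critical is inferred from the severity values listed on this page, and the function name is hypothetical.

```python
# Severity levels from least to most severe (ordering inferred, see above).
SEVERITY_ORDER = ("info", "warning", "error", "critical")


def meets_threshold(alert_severity: str, channel_threshold: str) -> bool:
    """True when the alert is at or above the channel's severity threshold."""
    return SEVERITY_ORDER.index(alert_severity) >= SEVERITY_ORDER.index(channel_threshold)
```

For example, meets_threshold("warning", "error") is False, which matches the Failed login case described above: the toggle is on, but the channel filters the alert out.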
Custom Alert Rules
For domain-specific alerts (e.g. "page on-call when denial rate on
/api/billing exceeds 10/min in production"), create a custom rule:
- Click Add Custom Rule. (~5 sec)
- Step 1 — Rule Details: name, description, severity. (~30 sec)
- Step 2 — Trigger Condition: metric (e.g. denial_rate, latency_p99, audit_event_count), filter (resource, decision, user attribute), threshold and operator (>, >=, ==). (~1 min)
- Step 3 — Frequency: evaluation window (e.g. last 5 min), cooldown (don't re-fire within 30 min), maximum fires per day. (~30 sec)
- Step 4 — Channels: pick one or more channels created above. (~15 sec)
- Click Save Rule and toggle Enabled. (~10 sec)
The rule scheduler evaluates custom rules on the same cadence as the
metric ingest pipeline (every minute for hot metrics, every 5 minutes
for slow rollups). Audit events for fired rules are tagged
AI_ANOMALY_DETECTED or CUSTOM_ALERT_FIRED in the audit feed.
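The way the trigger condition, cooldown, and daily cap combine can be illustrated with a short sketch. It is a simplified model, not the rule scheduler's code: the class, field, and function names are hypothetical, and the order in which the checks run is an assumption.

```python
import operator
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Operators offered in Step 2 of the rule wizard.
OPERATORS = {">": operator.gt, ">=": operator.ge, "==": operator.eq}


@dataclass
class RuleConfig:
    threshold: float
    op: str
    cooldown: timedelta
    max_fires_per_day: int


def should_fire(rule: RuleConfig, value: float, now: datetime,
                last_fired: Optional[datetime], fires_today: int) -> bool:
    """Decide whether a rule fires for the metric value seen in this window."""
    if not OPERATORS[rule.op](value, rule.threshold):
        return False  # trigger condition not met
    if fires_today >= rule.max_fires_per_day:
        return False  # daily cap reached
    if last_fired is not None and now - last_fired < rule.cooldown:
        return False  # still inside the cooldown
    return True
```

A rule with threshold 10, operator >, and a 30 minute cooldown fires on a value of 12 only if it has not fired in the last 30 minutes and is still under its daily cap.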
How notification actions resolve to channels
A control author can attach a notification action with
destination_ref: "channel:<id>":
post_decision_actions:
  - action_type: notification
    trigger: on_deny
    destination_ref: "channel:42"  # id from /v1/notifications/channels
    payload:
      message: "Denied request to {{ resource.path }} from {{ subject.email }}"
      severity: warning
At runtime:
Bouncer → Control Plane (audit event)
│
▼
Post-Decision Action Dispatcher
│
▼
resolve "channel:42" against
notification_channels table
│
▼
Notification Delivery Service
│
▼
Email / Slack / Teams / PagerDuty / ServiceNow / Webhook
(via channel-specific transport adapter)
│
▼
notification_delivery_log row updated
with status, response, retry count
The control bundle stores only the channel id — the actual credential never leaves the encrypted store. Rotating a channel's credential under Alert Channels automatically applies to every control that references it.
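The resolution step can be pictured as parsing the reference and looking the id up in the channel store. The sketch below uses an in-memory dict in place of the notification_channels table. The function names and the assumption that channel ids are integers (as in channel:42) are illustrative, not taken from the dispatcher.

```python
def parse_channel_ref(destination_ref: str) -> int:
    """Extract the numeric id from a reference of the form 'channel:<id>'."""
    prefix, sep, raw_id = destination_ref.partition(":")
    if prefix != "channel" or not sep or not (raw_id.isascii() and raw_id.isdigit()):
        raise ValueError(f"not a channel reference: {destination_ref!r}")
    return int(raw_id)


def resolve_channel(destination_ref: str, channels: dict) -> dict:
    """Look up the channel record for a reference; 'channels' stands in for the table."""
    channel_id = parse_channel_ref(destination_ref)
    if channel_id not in channels:
        raise LookupError(f"no channel with id {channel_id}")
    return channels[channel_id]
```

Because only the id travels with the control, swapping the record behind that id (for example after a credential rotation) changes what every referencing control delivers to.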
Credential storage
- Channel credentials are encrypted at rest with NOTIFICATION_ENCRYPTION_KEY (Fernet), stored in notification_channels.encrypted_credentials.
- API responses surface only credential_set: true|false — credentials are never returned by GET /v1/notifications/channels.
- To rotate a credential, open the channel, paste the new value, click Save, then click Test to verify.
- To clear a credential, use Delete Channel — disabling alone preserves the encrypted blob.
- Only demo deployments (DEMO_MODE=true) accept an ephemeral key — production deployments must supply NOTIFICATION_ENCRYPTION_KEY or the encryption service refuses to start.
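The redaction rule for API responses can be sketched as follows: drop the encrypted column and expose only a boolean. This is an illustration of the behaviour described above, not the platform's redaction helper; the function name and the example row fields other than encrypted_credentials are hypothetical.

```python
def to_api_view(row: dict) -> dict:
    """Return a copy of a channel row that is safe to serialize in an API response."""
    view = {key: value for key, value in row.items() if key != "encrypted_credentials"}
    # Expose only whether a credential exists, never the ciphertext itself.
    view["credential_set"] = bool(row.get("encrypted_credentials"))
    return view
```

A client can use credential_set to decide whether to prompt for a credential, without the ciphertext ever leaving the Control Plane.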
Multi-environment routing
Channels are environment-scoped to prevent cross-environment leakage:
| Practice | Sandbox | Production |
|---|---|---|
| Slack channel | #alerts-sandbox | #alerts-prod (PagerDuty-bridged) |
| Email recipients | dev team distribution list | on-call rota + ops manager |
| PagerDuty integration key | low-urgency service | high-urgency on-call service |
| Webhook URL | https://staging.alerts.corp/... | https://prod.alerts.corp/... |
| Severity threshold | info (verbose for debugging) | warning (signal only) |
Switch environments using the header selector. Channels created under Sandbox do not appear when you switch to Production, and vice versa. Credentials are stored per-row, so rotating a Production credential does not affect Sandbox.
Backend isolation (for SREs)
Notification Channels and Action Destinations are physically separated in the data layer. This is a ship-blocker invariant — they must never be merged.
| Concern | Notification Channels | Action Destinations |
|---|---|---|
| DB table | notification_channels | integrations (filtered by type='action_destination') |
| Control Plane API path | /v1/notifications/channels | /integrations/ |
| Encrypted credential field | notification_channels.encrypted_credentials (dedicated column) | configuration._encrypted_credential (per-row JSON blob) |
| Credential redaction helper | Operates on NotificationChannel rows only | Operates on Integration rows only |
| Runtime delivery path | Notification Delivery Service → channel adapters + notification_delivery_log | Post-Decision Action Dispatcher → profile transports / SIEM outbox |
| UI owner | /settings/notifications | /settings/action-destinations |
The platform ships a regression suite that asserts the data-layer disjointness on every release build. Any change that merges or cross-writes the two tables fails CI.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Test button returns 401/403 | Wrong credential | Re-paste the credential and save; retry Test. |
| Test succeeds but real alerts never arrive | Severity threshold higher than alert severity | Lower the channel's threshold, or raise the alert severity. |
| Slack OAuth flow fails with 501 | SLACK_CLIENT_ID / SLACK_CLIENT_SECRET env vars not set on the Control Plane API | Set the env vars and restart the Control Plane API, or use the Incoming Webhook URL option instead. |
| Teams webhook returns 410 Gone | Webhook URL was rotated or the connector was deleted | Recreate the connector in the Teams channel settings, paste the new URL, click Save. |
| PagerDuty incidents never created | Wrong Events API integration key, or routing key for a deleted service | Verify the integration key in PagerDuty → Service → Integrations; replace and save. |
| ServiceNow incidents created but assigned to wrong group | assignment_group mismatch with ServiceNow CMDB | Update the channel's default assignment_group to a value that exists in your ServiceNow instance. |
| Webhook delivery failed with 403 and signature complaint | HMAC secret on Control Core does not match the receiver | Rotate the secret on both ends and click Save. |
| Channel disabled mid-day, deliveries stop | An operator turned the channel's Enabled toggle off | Settings → Audit Logs filtered on NOTIFICATION_CHANNEL_UPDATED shows who disabled it and when. Re-enable the channel. |
| notification_delivery_log rows stuck pending for > 5 min | Delivery scheduler not running on the Control Plane | Check Control Plane scheduler logs for notification_delivery_processor. Restart the Control Plane API container if necessary. |
| "Where did the in-page channel tab on Action Destinations go?" | The legacy Notification Channels tab on /settings/action-destinations was consolidated here | Channel CRUD lives only on this page now. The legacy tab can be temporarily re-enabled via the control_plane_paths.action_destinations_channels_tab feature flag. |
Next steps
- Action Destinations — destination profiles for webhook, workflow, approval_gate, policy_trigger, break_glass_notify, and the SIEM forwarder.
- Actions & Post-Decision Flows — author-side guide to action types, triggers, and JSON shapes.
- Approval Gates — gates pair an approval_queue destination with a notification channel for approver pings.
- Audit Logs — filter on NOTIFICATION_CHANNEL_* and NOTIFICATION_DELIVERY_* event types.