Notifications & Alert Channels

Audience: Platform administrators, SREs, on-call engineers
Time: ~12 min
Prerequisites: Control Plane deployed and reachable; admin role in the Control Plane UI; NOTIFICATION_ENCRYPTION_KEY (Fernet) configured in your secret store; SMTP relay reachable (for Email), or webhook/OAuth credentials for Slack/Teams/PagerDuty/ServiceNow.

The Notifications page (Settings → Notifications) is the single owner of alert channel configuration in Control Core. Everything that delivers a message to a human — Email, Slack, Microsoft Teams, PagerDuty, ServiceNow, generic Webhook — is created, edited, tested, enabled, and deleted from this page.

Action Destinations are separate. Profile-based destinations (webhook_generic, api_gateway, workflow_router, policy_engine, approval_queue, break_glass_bridge) and the SIEM forwarder live under Settings → Action Destinations. The two surfaces share the same encrypted credential store but have independent CRUD, audit, and runtime delivery paths. Channel CRUD has a single home — this page.

Page layout

The Notifications page renders three sections, environment-scoped via the header environment selector (Sandbox / Production):

| Section | Purpose |
| --- | --- |
| General System Alerts | Built-in alert types (failed logins, policy errors, license/subscription warnings, bouncer offline, anomaly detection). For each alert type, toggle whether it sends to Email / Slack / Teams / PagerDuty / ServiceNow / Webhook. Severity threshold per environment. |
| Custom Alert Rules | User-defined rules with trigger conditions (metric / event filter), frequency caps, and target channel selections. Useful for "DM me on Slack when denial rate spikes for resource X". |
| Alert Channels | CRUD for the actual channel connections: endpoint URL, credential, environment scope, severity threshold, max attempts. This is the single source of truth for channel configuration. |

The environment badge in the header (yellow=Sandbox, red=Production) tells you which environment the current channels apply to. Channels are stored per-environment so a Slack misconfiguration in Sandbox cannot spam your Production incident channel.

Supported channel types

| Channel | Auth | Used by |
| --- | --- | --- |
| Email (SMTP) | SMTP server + credentials, optional STARTTLS | Built-in alerts, custom rules, notification and email post-decision actions |
| Slack | Incoming webhook URL or OAuth bot token + channel | Built-in alerts, custom rules, notification actions |
| Microsoft Teams | Incoming webhook URL (Teams connector) | Built-in alerts, custom rules, notification actions |
| PagerDuty | Events API v2 integration key (Routing Key) | Built-in alerts, custom rules, break_glass_notify actions |
| ServiceNow | Instance URL + basic auth or OAuth; creates incidents on policy denials | Built-in alerts, custom rules, notification actions |
| Webhook | HTTPS URL, optional HMAC signature secret, optional bearer token | Built-in alerts, custom rules, notification and webhook actions referenced as channel:<id> |
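
For orientation, the PagerDuty row corresponds to a "trigger" event in PagerDuty's public Events API v2. The sketch below builds such an event body with the Python standard library. The field names follow the Events API v2; the helper `build_pagerduty_event`, the placeholder routing key, and the way Control Core maps an alert onto these fields are assumptions made for illustration only.

```python
import json

# Severities accepted by the PagerDuty Events API v2.
PAGERDUTY_SEVERITIES = {"info", "warning", "error", "critical"}


def build_pagerduty_event(routing_key, summary, source, severity, dedup_key=None):
    """Build an Events API v2 trigger event (hypothetical helper, not Control Core code)."""
    if severity not in PAGERDUTY_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    event = {
        "routing_key": routing_key,   # the channel's integration key
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }
    if dedup_key is not None:
        # Events sharing a dedup_key are grouped into one incident.
        event["dedup_key"] = dedup_key
    return event


example_event = build_pagerduty_event(
    routing_key="ROUTING-KEY-PLACEHOLDER",   # placeholder, not a real key
    summary="Bouncer offline: missed more than 3 heartbeats",
    source="control-plane",
    severity="error",
    dedup_key="bouncer-offline-sandbox",
)
print(json.dumps(example_event, indent=2))
```

A body of this shape is what PagerDuty expects on a POST to https://events.pagerduty.com/v2/enqueue, which can help when comparing a failed delivery's redacted payload against what PagerDuty accepts.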

Troubleshooting: If the Test button on a channel returns connection refused, verify that the Control Plane can reach the channel endpoint. Common causes: corporate egress proxy not configured in the Control Plane container, DNS resolution failures inside the cluster, firewall blocking outbound on the channel's port. Exec into the Control Plane API container and run curl -v <endpoint> to isolate transport from configuration.

Add a channel

This is the canonical CRUD flow; it applies to every channel type.

  1. Navigate to Settings → Notifications. (~10 sec)
  2. Confirm the environment badge shows the environment you want (Sandbox / Production). Switch using the header environment selector if needed. (~5 sec)
  3. Scroll to Alert Channels and click Add Channel. (~5 sec)
  4. Choose the channel type and fill in the type-specific dialog: (~1–3 min)
    • Email — SMTP host, port, username, password (encrypted at rest), From address, optional Reply-To, STARTTLS toggle, default recipient list.
    • Slack — Either Incoming Webhook URL or Bot Token + Channel ID / Name + Username + Icon emoji.
    • Teams — Incoming Webhook URL (from the Teams Connectors flow); optional default themeColor.
    • PagerDuty — Events API v2 Integration Key (Routing Key), default severity (info/warning/error/critical), default source, optional dedup_key template.
    • ServiceNow — Instance URL (https://yourcorp.service-now.com), auth method (basic / OAuth), credentials, default assignment_group, default category / subcategory.
    • Webhook — HTTPS URL, HTTP method (POST recommended), optional Authorization: Bearer <token>, optional HMAC signature secret (HMAC-SHA256 over the JSON body, sent as X-Cc-Signature).
  5. Set Severity threshold — only events at this level or above will be delivered to this channel (e.g. set to warning to filter out info chatter). (~10 sec)
  6. Set Max attempts — how many times the delivery scheduler will retry on failure (default 5, capped at 10). (~10 sec)
  7. Toggle Enabled ON. (~5 sec)
  8. Click Save. (~5 sec)
  9. Click Test to send a probe payload. The page shows the HTTP status / SMTP response code. (~10 sec)
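
On the receiving side of a Webhook channel, the HMAC option in step 4 can be checked with a few lines of standard-library Python. This is a minimal sketch: the source states only that the signature is HMAC-SHA256 over the JSON body, sent as X-Cc-Signature. The lowercase hex encoding, the helper names `sign_body` and `verify_signature`, and the sample secret are assumptions; confirm the actual encoding against a Test delivery before relying on it.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """HMAC-SHA256 of the raw request body as lowercase hex (hex encoding is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check an X-Cc-Signature header value against a locally computed signature."""
    expected = sign_body(secret, body).encode("ascii")
    received = header_value.strip().lower().encode("utf-8")
    # compare_digest runs in constant time, so it does not leak where the values differ.
    return hmac.compare_digest(expected, received)


secret = b"shared-webhook-secret"   # placeholder, not a real secret
raw_body = b'{"severity": "warning", "message": "probe"}'
signature = sign_body(secret, raw_body)

print(verify_signature(secret, raw_body, signature))          # True
print(verify_signature(secret, raw_body + b" ", signature))   # False: body was altered
```

Verify against the raw bytes of the request body, before any JSON parsing or re-serialization, because even a whitespace change produces a different digest.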

Troubleshooting: If Test succeeds but real alerts never arrive, check Settings → Notifications → Delivery Log (next section) to see whether the delivery was attempted and what response the channel returned. Common causes: severity threshold higher than the alert severity (lower the threshold), channel disabled at the environment level, alert rule pointing to a different channel ID, or the alert event was suppressed by frequency capping.

Delivery log

Each delivery attempt is recorded in notification_delivery_log with status (pending / sent / failed / dlq), HTTP response code (or SMTP response), retry count, and the redacted payload. View it under the Delivery Log tab on this page (or query /v1/notifications/deliveries?channel_id=<id> from the API).
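
When querying the delivery log from a script, it helps to build the URL once and summarize rows by status. The sketch below uses only the endpoint path and status values named above. The base URL, the helper names, and the assumption that each row is a dictionary with a `status` key are illustrative; the real response schema is not documented here, and authentication is omitted.

```python
from collections import Counter
from urllib.parse import urlencode


def deliveries_url(base_url: str, channel_id: int) -> str:
    """Build the delivery-log query URL for one channel."""
    query = urlencode({"channel_id": channel_id})
    return f"{base_url.rstrip('/')}/v1/notifications/deliveries?{query}"


def count_by_status(rows):
    """Count delivery rows per status (assumes each row is a dict with a 'status' key)."""
    return Counter(row["status"] for row in rows)


# Hypothetical base URL and hand-written sample rows, for illustration only.
url = deliveries_url("https://control-plane.example.internal/", 42)
sample_rows = [
    {"status": "sent"},
    {"status": "failed"},
    {"status": "sent"},
    {"status": "dlq"},
]

print(url)
print(dict(count_by_status(sample_rows)))
```

A growing `failed` or `dlq` count for one channel usually points at that channel's endpoint or credential rather than at the alert rules.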

The background delivery scheduler retries failed deliveries every 15 minutes with exponential backoff: the wait between attempts never exceeds 1 hour, and the number of attempts is capped by the channel's max_attempts. Once max_attempts is exhausted, the row is moved to dlq and a notification_dlq audit event is emitted.
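
To estimate how long a delivery can stay in retry before it reaches dlq, the backoff can be sketched as follows. The exact schedule is not specified above, so this assumes a 15-minute base delay that doubles after each failure, a 1-hour ceiling, and that max_attempts counts the first attempt as well as the retries. The function name `retry_delays` is hypothetical.

```python
from datetime import timedelta


def retry_delays(max_attempts: int,
                 base: timedelta = timedelta(minutes=15),
                 cap: timedelta = timedelta(hours=1)):
    """Return the wait before each retry; the first attempt is assumed to run immediately."""
    delays = []
    for retry in range(max_attempts - 1):
        # Double the delay on every retry, but never exceed the cap.
        delays.append(min(base * (2 ** retry), cap))
    return delays


delays = retry_delays(max_attempts=5)
print([int(d.total_seconds() // 60) for d in delays])   # [15, 30, 60, 60]
print(sum(delays, timedelta()))                          # 2:45:00
```

Under these assumptions, a channel with the default of 5 attempts would reach dlq roughly 2 hours 45 minutes after the first failure.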

General System Alerts

Built-in alert types fire on platform events. For each one, toggle the channels it should reach in the General System Alerts card:

| Alert type | Default severity | Typical use |
| --- | --- | --- |
| Failed login attempts | warning | 5+ failed logins from same source IP within 10 minutes |
| Policy validation error | error | Rego compile failure during policy sync |
| License / subscription warning | warning | License expiring in < 30 days, telemetry connection lost |
| Bouncer offline | error | A registered bouncer missed > 3 heartbeats |
| Anomaly detected | warning / critical | Denial rate spike, suspicious source IP, unusual user activity, latency anomaly |
| Audit pipeline error | critical | Audit ingest dropped events, SIEM outbox stuck > 5 min |

For each row, click the channel icon (Email / Slack / Teams / PagerDuty / ServiceNow / Webhook) to toggle delivery to that channel type. Channel-level severity thresholds still apply — toggling Slack ON for Failed login does nothing if your Slack channel is set to error minimum.
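
The interaction between an alert's severity and a channel's threshold can be expressed as a simple ordering check. This sketch assumes the ordering info < warning < error < critical, inferred from the severity names used on this page; `meets_threshold` is a hypothetical helper, not the platform's implementation.

```python
# Assumed ordering, lowest to highest.
SEVERITY_ORDER = ["info", "warning", "error", "critical"]


def meets_threshold(event_severity: str, channel_threshold: str) -> bool:
    """True when the event is at or above the channel's minimum severity."""
    return SEVERITY_ORDER.index(event_severity) >= SEVERITY_ORDER.index(channel_threshold)


# A failed-login alert (warning) sent to a Slack channel with an error minimum is dropped.
print(meets_threshold("warning", "error"))    # False
# A bouncer-offline alert (error) sent to the same channel is delivered.
print(meets_threshold("error", "error"))      # True
print(meets_threshold("critical", "warning")) # True
```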

Custom Alert Rules

For domain-specific alerts (e.g. "page on-call when denial rate on /api/billing exceeds 10/min in production"), create a custom rule:

  1. Click Add Custom Rule. (~5 sec)
  2. Step 1 — Rule Details: name, description, severity. (~30 sec)
  3. Step 2 — Trigger Condition: metric (e.g. denial_rate, latency_p99, audit_event_count), filter (resource, decision, user attribute), threshold and operator (>, >=, ==). (~1 min)
  4. Step 3 — Frequency: evaluation window (e.g. last 5 min), cooldown (don't re-fire within 30 min), maximum fires per day. (~30 sec)
  5. Step 4 — Channels: pick one or more channels created above. (~15 sec)
  6. Click Save Rule and toggle Enabled. (~10 sec)

The rule scheduler evaluates custom rules on the same cadence as the metric ingest pipeline (every minute for hot metrics, every 5 minutes for slow rollups). Audit events for fired rules are tagged AI_ANOMALY_DETECTED or CUSTOM_ALERT_FIRED in the audit feed.
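
The frequency settings from Step 3 (cooldown and maximum fires per day) combine as two independent gates. The sketch below shows one way to model them. The class `FrequencyCap`, the in-memory history, and the reading of "per day" as a calendar day are assumptions for illustration; the rule scheduler's actual bookkeeping is not described here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class FrequencyCap:
    cooldown: timedelta
    max_fires_per_day: int
    fired_at: list = field(default_factory=list)   # timestamps of past fires

    def should_fire(self, now: datetime) -> bool:
        """Record and allow a fire only if both the cooldown and the daily cap permit it."""
        if self.fired_at and now - self.fired_at[-1] < self.cooldown:
            return False   # still inside the cooldown window
        fires_today = sum(1 for t in self.fired_at if t.date() == now.date())
        if fires_today >= self.max_fires_per_day:
            return False   # daily cap reached
        self.fired_at.append(now)
        return True


cap = FrequencyCap(cooldown=timedelta(minutes=30), max_fires_per_day=2)
start = datetime(2024, 1, 1, 9, 0)
results = [
    cap.should_fire(start),                          # first fire
    cap.should_fire(start + timedelta(minutes=10)),  # inside cooldown
    cap.should_fire(start + timedelta(minutes=45)),  # cooldown elapsed
    cap.should_fire(start + timedelta(hours=2)),     # daily cap of 2 reached
]
print(results)   # [True, False, True, False]
```

This also illustrates why a rule can match its trigger condition yet produce no delivery: suppressed fires never reach the channel.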

How notification actions resolve to channels

A control author can attach a notification action with destination_ref: "channel:<id>":

post_decision_actions:
  - action_type: notification
    trigger: on_deny
    destination_ref: "channel:42"     # id from /v1/notifications/channels
    payload:
      message: "Denied request to {{ resource.path }} from {{ subject.email }}"
      severity: warning

At runtime:

Bouncer                 →  Control Plane (audit event)
                                  │
                                  ▼
                  Post-Decision Action Dispatcher
                                  │
                                  ▼
                  resolve "channel:42" against
                  notification_channels table
                                  │
                                  ▼
                  Notification Delivery Service
                                  │
                                  ▼
                  Email / Slack / Teams / PagerDuty / ServiceNow / Webhook
                  (via channel-specific transport adapter)
                                  │
                                  ▼
                  notification_delivery_log row updated
                  with status, response, retry count

The control bundle stores only the channel id — the actual credential never leaves the encrypted store. Rotating a channel's credential under Alert Channels automatically applies to every control that references it.
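
The resolution step in the diagram can be sketched as parsing the reference and looking up the id. This is an illustration only: the helper names, the in-memory dictionary standing in for the notification_channels table, and the assumption that channel ids are plain decimal integers (as in "channel:42") are not confirmed by the platform's code.

```python
def parse_channel_ref(destination_ref: str) -> int:
    """Extract the channel id from a reference of the form 'channel:<id>'."""
    prefix, sep, raw_id = destination_ref.partition(":")
    if prefix != "channel" or not sep or not (raw_id.isascii() and raw_id.isdigit()):
        raise ValueError(f"not a channel reference: {destination_ref!r}")
    return int(raw_id)


def resolve_channel(destination_ref: str, channels: dict) -> dict:
    """Look up the referenced channel; only the id travels with the control."""
    channel_id = parse_channel_ref(destination_ref)
    if channel_id not in channels:
        raise LookupError(f"channel {channel_id} does not exist in this environment")
    return channels[channel_id]


# Stand-in for the notification_channels table (hypothetical rows).
channels = {42: {"type": "slack", "enabled": True, "credential_set": True}}

print(parse_channel_ref("channel:42"))                   # 42
print(resolve_channel("channel:42", channels)["type"])   # slack
```

Because the control carries only the id, replacing the credential on channel 42 changes nothing in the control itself.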

Credential storage

  • Channel credentials are encrypted at rest with NOTIFICATION_ENCRYPTION_KEY (Fernet), stored in notification_channels.encrypted_credentials.
  • API responses surface only credential_set: true|false — credentials are never returned by GET /v1/notifications/channels.
  • To rotate a credential, open the channel, paste the new value, click Save, then click Test to verify.
  • To clear a credential, use Delete Channel — disabling alone preserves the encrypted blob.
  • Only demo deployments (DEMO_MODE=true) accept an ephemeral key; production deployments must supply NOTIFICATION_ENCRYPTION_KEY, or the encryption service refuses to start.
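
A Fernet key is 32 random bytes encoded as URL-safe base64, so a malformed NOTIFICATION_ENCRYPTION_KEY can be caught before deployment with the standard library alone. The sketch below is a pre-flight check under that definition; the helper names are hypothetical, and it does not reproduce how the Control Plane itself loads or validates the key.

```python
import base64
import binascii
import os


def generate_fernet_key() -> bytes:
    """Generate a key in Fernet's format: 32 random bytes, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(os.urandom(32))


def is_valid_fernet_key(key: bytes) -> bool:
    """Check that a candidate key decodes to exactly 32 bytes."""
    try:
        decoded = base64.urlsafe_b64decode(key)
    except (binascii.Error, ValueError):
        return False
    return len(decoded) == 32


key = generate_fernet_key()
print(len(key))                                               # 44
print(is_valid_fernet_key(key))                               # True
print(is_valid_fernet_key(base64.urlsafe_b64encode(b"short")))  # False
```

If the `cryptography` package is available, `cryptography.fernet.Fernet.generate_key()` produces keys of the same shape. Keep the generated value in your secret store rather than in shell history or version control.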

Multi-environment routing

Channels are environment-scoped to prevent cross-environment leakage:

| Practice | Sandbox | Production |
| --- | --- | --- |
| Slack channel | #alerts-sandbox | #alerts-prod (PagerDuty-bridged) |
| Email recipients | dev team distribution list | on-call rota + ops manager |
| PagerDuty integration key | low-urgency service | high-urgency on-call service |
| Webhook URL | https://staging.alerts.corp/... | https://prod.alerts.corp/... |
| Severity threshold | info (verbose for debugging) | warning (signal only) |

Switch environments using the header selector. Channels created under Sandbox do not appear when you switch to Production, and vice versa. Credentials are stored per-row, so rotating a Production credential does not affect Sandbox.

Backend isolation (for SREs)

Notification Channels and Action Destinations are physically separated in the data layer. This is a ship-blocker invariant — they must never be merged.

| Concern | Notification Channels | Action Destinations |
| --- | --- | --- |
| DB table | notification_channels | integrations (filtered by type='action_destination') |
| Control Plane API path | /v1/notifications/channels | /integrations/ |
| Encrypted credential field | notification_channels.encrypted_credentials (dedicated column) | configuration._encrypted_credential (per-row JSON blob) |
| Credential redaction helper | Operates on NotificationChannel rows only | Operates on Integration rows only |
| Runtime delivery path | Notification Delivery Service → channel adapters + notification_delivery_log | Post-Decision Action Dispatcher → profile transports / SIEM outbox |
| UI owner | /settings/notifications | /settings/action-destinations |

The platform ships a regression suite that asserts the data-layer disjointness on every release build. Any change that merges or cross-writes the two tables fails CI.

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Test button returns 401/403 | Wrong credential | Re-paste the credential and save; retry Test. |
| Test succeeds but real alerts never arrive | Severity threshold higher than alert severity | Lower the channel's threshold, or raise the alert severity. |
| Slack OAuth flow fails with 501 | SLACK_CLIENT_ID / SLACK_CLIENT_SECRET env vars not set on the Control Plane API | Set the env vars and restart the Control Plane API, or use the Incoming Webhook URL option instead. |
| Teams webhook returns 410 Gone | Webhook URL was rotated or the connector was deleted | Recreate the connector in the Teams channel settings, paste the new URL, click Save. |
| PagerDuty incidents never created | Wrong Events API integration key, or routing key for a deleted service | Verify the integration key in PagerDuty → Service → Integrations; replace and save. |
| ServiceNow incidents created but assigned to wrong group | assignment_group mismatch with ServiceNow CMDB | Update the channel's default assignment_group to a value that exists in your ServiceNow instance. |
| Webhook delivery failed with 403 and signature complaint | HMAC secret on Control Core does not match the receiver | Rotate the secret on both ends and click Save. |
| Channel disabled mid-day, deliveries stop | An operator disabled the channel | Settings → Audit Logs filtered on NOTIFICATION_CHANNEL_UPDATED shows who disabled it and when. Re-enable. |
| notification_delivery_log rows stuck pending for > 5 min | Delivery scheduler not running on the Control Plane | Check Control Plane scheduler logs for notification_delivery_processor. Restart the Control Plane API container if necessary. |
| "Where did the in-page channel tab on Action Destinations go?" | The legacy Notification Channels tab on /settings/action-destinations was consolidated here | Channel CRUD lives only on this page now. The legacy tab can be temporarily re-enabled via the control_plane_paths.action_destinations_channels_tab feature flag. |

Next steps

  • Action Destinations — destination profiles for webhook, workflow, approval_gate, policy_trigger, break_glass_notify, and the SIEM forwarder.
  • Actions & Post-Decision Flows — author-side guide to action types, triggers, and JSON shapes.
  • Approval Gates — gates pair an approval_queue destination with a notification channel for approver pings.
  • Audit Logs — filter on NOTIFICATION_CHANNEL_* and NOTIFICATION_DELIVERY_* event types.