Auto-Classification

Audience: Platform engineers, governance owners
Time: ~6 min read
Prerequisites: Familiarity with Resource Enrichment.

Auto-classification proposes enrichment values for a resource using URL, name, and business-context heuristics. The classifier runs without an LLM call by default; SCCA fallback is opt-in. The proposed values feed directly into the control suggestions surfaced by the Policy Builder, so getting them right shortens the path from a newly registered resource to an enforced control.

Endpoints

Endpoint	What it does
`POST /resources/{id}/auto-classify`	Read-only. Returns a `ProposedEnrichment` payload. The operator confirms before anything is written.
`POST /resources/{id}/apply-classification`	Write. Persists an operator-confirmed payload, sets `classification_source` (defaults to `manual` if not provided), updates `last_classified_at`, and writes a `RESOURCE_UPDATED` audit row.
`POST /resources/auto-classify-bulk`	Batch write. Heuristic-classifies many resources at once. Operator-set values are preserved.

ProposedEnrichment shape

{
  "resource_kind": "llm_endpoint",
  "ai_provider": "openai",
  "ai_model_family": "gpt-4*",
  "mcp_protocol_version": null,
  "agent_capabilities": [],
  "pii_categories": ["emails"],
  "egress_destinations": ["api.openai.com"],
  "suggested_data_classification": "internal",
  "suggested_compliance_tags": ["GDPR"],
  "confidence": 0.55,
  "source": "heuristic",
  "rationale": [
    "resource_kind=llm_endpoint from URL/name heuristic",
    "ai_provider=openai",
    "ai_model_family=gpt-4*",
    "pii_categories=emails",
    "suggested_data_classification=internal"
  ]
}
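For client code that consumes the proposal, a typed model can catch malformed payloads early. The following is a minimal sketch, not the service's own schema: the field names come from the sample above, while the choice of which fields are required, the defaults, and the `parse_proposal` helper are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProposedEnrichment:
    # Assumed to be always present in a proposal.
    resource_kind: str
    source: str
    confidence: float
    # Assumed optional; null or missing when the heuristic finds no signal.
    ai_provider: Optional[str] = None
    ai_model_family: Optional[str] = None
    mcp_protocol_version: Optional[str] = None
    agent_capabilities: List[str] = field(default_factory=list)
    pii_categories: List[str] = field(default_factory=list)
    egress_destinations: List[str] = field(default_factory=list)
    suggested_data_classification: Optional[str] = None
    suggested_compliance_tags: List[str] = field(default_factory=list)
    rationale: List[str] = field(default_factory=list)


def parse_proposal(raw: str) -> ProposedEnrichment:
    """Parse a JSON proposal; unknown keys raise TypeError."""
    proposal = ProposedEnrichment(**json.loads(raw))
    if not 0.0 <= proposal.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {proposal.confidence}")
    return proposal
```

Rejecting unknown keys is a deliberately strict choice for the sketch; a client that must tolerate new fields would filter the dictionary before constructing the dataclass.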

How the heuristic decides

The classifier scans the lower-cased concatenation of `name`, `url`, `original_host`, and `business_context` against fixed substring tables:

Signal	What it sets
`/v1/chat/completions`, `/inference`, `/generate`	`resource_kind = llm_endpoint`
`/mcp/`, `mcp.`, `model-context-protocol`	`resource_kind = mcp_server`
`/agent/`, `/copilot`	`resource_kind = agent`
`/rag/`, `/retrieval`, `/vector`	`resource_kind = rag_index`
`openai`, `anthropic`, `bedrock`, `vertex`, `azure_openai`	`ai_provider` (only when kind is AI)
`email`, `card`, `medical`, `ssn`	`pii_categories`
`/upload`, `/run`, `/exec`, `/browse`, `/memory`	`agent_capabilities`

The first kind match wins; `api` is the safe default when nothing matches. Confidence accumulates from the strength of evidence, capped at 1.0.
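The kind lookup described above can be sketched as a small ordered table scan. This is an illustration under stated assumptions, not the classifier's actual code: the substrings are copied from the table, but the row order used for "first match wins", the space separator in the concatenation, and the confidence weights (0.4, 0.1, 0.15) are hypothetical.

```python
from typing import List, Optional, Tuple

# Substrings from the signal table. Row order is assumed to define
# which kind wins when several rows match.
KIND_SIGNALS: List[Tuple[str, Tuple[str, ...]]] = [
    ("llm_endpoint", ("/v1/chat/completions", "/inference", "/generate")),
    ("mcp_server", ("/mcp/", "mcp.", "model-context-protocol")),
    ("agent", ("/agent/", "/copilot")),
    ("rag_index", ("/rag/", "/retrieval", "/vector")),
]
DEFAULT_KIND = "api"


def build_haystack(name: str, url: str, original_host: str = "",
                   business_context: str = "") -> str:
    # Joining with a space is an assumption; it keeps a signal from
    # matching across the boundary between two fields.
    return " ".join([name, url, original_host, business_context]).lower()


def classify_kind(haystack: str) -> Tuple[str, Optional[str]]:
    """Return (kind, matched_signal); first table row to match wins."""
    for kind, signals in KIND_SIGNALS:
        for signal in signals:
            if signal in haystack:
                return kind, signal
    return DEFAULT_KIND, None


def confidence(kind_signal: Optional[str], extra_signal_count: int) -> float:
    # Hypothetical weights: a kind match contributes more than the
    # default, each extra signal (provider, PII, capability) adds a bit.
    base = 0.4 if kind_signal is not None else 0.1
    return min(1.0, base + 0.15 * extra_signal_count)
```

Calling `classify_kind(build_haystack("Chat Gateway", "https://api.openai.com/v1/chat/completions"))` selects `llm_endpoint`, while a resource with no matching substring falls through to `api`.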

Troubleshooting: Confidence below 0.4? The heuristic did not find strong signals in the resource's name, URL, host, or business context. Either (a) accept the proposal and refine it in the Enrich modal, or (b) opt into SCCA fallback (next section).

SCCA fallback (optional)

When the feature flag `flags.resources.scca_enrichment` is on and heuristic confidence is below a threshold, the classifier can call SCCA's LLM service with a strict, schema-validated prompt. The output replaces the heuristic proposal, and `source` is set to `scca` so you can audit it later.

This is off by default. Enable it in Settings → Feature Flags. SCCA fallback never runs without operator opt-in because it adds latency and an LLM call per classification.
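The gating described in this section can be sketched as follows. It is an illustration only: the flag name comes from the text, but the threshold value (0.4, borrowed from the troubleshooting note), the dictionary-based flag store, and the `call_scca` callable are assumptions standing in for the real service.

```python
from typing import Any, Callable, Dict

SCCA_FLAG = "flags.resources.scca_enrichment"
CONFIDENCE_THRESHOLD = 0.4  # assumption; the actual threshold is not documented here

Proposal = Dict[str, Any]


def choose_proposal(heuristic: Proposal,
                    flags: Dict[str, bool],
                    call_scca: Callable[[Proposal], Proposal]) -> Proposal:
    """Return the heuristic proposal unless fallback is enabled and needed."""
    # Off by default: a missing flag is treated as disabled.
    if not flags.get(SCCA_FLAG, False):
        return heuristic
    # Confident heuristic results never trigger an LLM call.
    if heuristic["confidence"] >= CONFIDENCE_THRESHOLD:
        return heuristic
    # call_scca is a hypothetical stand-in for the SCCA LLM service.
    refined = dict(call_scca(heuristic))
    refined["source"] = "scca"  # mark the origin for later audit
    return refined
```

Checking the flag before the confidence keeps the default path free of any LLM dependency, which matches the opt-in behaviour described above.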

What the classifier never does

Never overwrites operator-set values — when used in bulk mode, only empty cells are filled.
Never writes during auto-classify (it's a pure proposal).
Never includes PII in prompts when SCCA fallback runs — owner_email and similar fields are excluded.
Never branches on tenant ID, vendor names beyond the public substring tables, or framework paths — the classifier is generic; the substrings are public conventions.
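The "only empty cells are filled" guarantee for bulk mode amounts to a one-directional merge. The sketch below shows one way to express it; the definition of "empty" (`None`, empty string, or empty list) and the function names are assumptions for illustration, not the service's implementation.

```python
from typing import Any, Dict


def is_empty(value: Any) -> bool:
    # Assumed definition of an empty cell.
    return value is None or value == "" or value == []


def fill_empty_cells(current: Dict[str, Any],
                     proposed: Dict[str, Any]) -> Dict[str, Any]:
    """Copy proposed values into a new dict, never replacing set values."""
    merged = dict(current)  # leave the caller's record untouched
    for key, value in proposed.items():
        if is_empty(merged.get(key)):
            merged[key] = value
    return merged
```

With this merge, a resource whose operator already set `data_classification` to `confidential` keeps that value even if the heuristic proposes `internal`.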

Workflow: classify a single resource

# 1. Get the proposal
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://controlplane.example.com/api/resources/42/auto-classify

# 2. Review and adjust the JSON, then apply
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "resource_kind":"llm_endpoint",
    "ai_provider":"openai",
    "data_classification":"confidential",
    "compliance_tags":["GDPR","PCI-DSS"],
    "pii_categories":["emails","card_numbers"],
    "classification_source":"manual"
  }' \
  https://controlplane.example.com/api/resources/42/apply-classification

Troubleshooting: apply-classification returns 404? Check the resource ID exists in the current environment. Resources are environment-scoped — a sandbox ID won't resolve in production.
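The same two-step workflow can be scripted. The sketch below converts a proposal into an apply payload and builds the request with the standard library; it assumes that `suggested_data_classification` and `suggested_compliance_tags` map to `data_classification` and `compliance_tags` (as the curl example suggests) and sends only the fields shown in that example. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://controlplane.example.com/api"


def to_apply_payload(proposal: Dict[str, Any],
                     overrides: Optional[Dict[str, Any]] = None,
                     classification_source: str = "manual") -> Dict[str, Any]:
    """Map a proposal to the apply body, then layer operator edits on top."""
    payload = {
        "resource_kind": proposal.get("resource_kind"),
        "ai_provider": proposal.get("ai_provider"),
        # Assumed mapping from suggested_* fields to the applied fields.
        "data_classification": proposal.get("suggested_data_classification"),
        "compliance_tags": list(proposal.get("suggested_compliance_tags", [])),
        "pii_categories": list(proposal.get("pii_categories", [])),
        "classification_source": classification_source,
    }
    payload.update(overrides or {})
    return payload


def build_apply_request(resource_id: int, payload: Dict[str, Any],
                        token: str) -> urllib.request.Request:
    """Build (but do not send) the apply-classification request."""
    return urllib.request.Request(
        f"{BASE_URL}/resources/{resource_id}/apply-classification",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` sends it; a 404 surfaces as `urllib.error.HTTPError` with `code == 404`, which is the case the troubleshooting note above covers.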