Four built-in runtime classifiers cover PII detection, hallucination, toxicity, and prompt injection. Each runs inline on the telemetry stream, returns a structured verdict, and operates independently so results combine via precedence rules.
The native classifier bundle ships as part of the native edge shipper, currently in development. The specifications below describe the target state and preview implementation. Accuracy figures come from KoraSafe™ internal evaluation datasets. Independent third-party evaluation is planned before GA release.
| Detector | Version | Technique | Accuracy | P99 latency | What it detects |
|---|---|---|---|---|---|
| PII-detect | 3.2.0 | Pattern + ML | 99.4% | 1.4 ms | Personally identifiable information in prompts and responses: names, emails, phone numbers, SSNs, financial identifiers, health data markers, and configurable custom patterns. |
| Hallucination | 2.1.4 | NLI ensemble | 91.8% | 8.2 ms | Factual inconsistencies and unsupported claims in model output. Natural language inference scores entailment between claims and grounded context. Highest latency due to the NLI pass. |
| Toxicity | 4.0.1 | Small LM | 96.2% | 6.1 ms | Harmful, hateful, or policy-violating content in generated output. Runs a small fine-tuned language model classifier. Policy thresholds are configurable per system and sector pack. |
| Prompt-injection | 2.3.1 | Pattern + ML | 96.8% | 2.1 ms | Instruction override attempts in user input: jailbreaks, role-hijack prompts, indirect injection via documents or tool outputs. Combined pattern matching and ML scoring. |
Accuracy is measured on KoraSafe™ internal evaluation sets. P99 latency is measured at single-classifier throughput on reference hardware. Combined latency depends on which classifiers are enabled per policy.
Every classifier produces a structured verdict object. Multiple classifiers can fire on the same event; verdicts combine via the precedence rules described below.
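As a rough illustration of what such a verdict object could look like, here is a minimal sketch. The field names (`classifier`, `version`, `action`, `confidence`, `span`) and the `allow` action are assumptions made for this example; the source does not publish the actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple

# Hypothetical action levels; the source only names "flag" and "block".
ACTIONS = ("allow", "flag", "block")


@dataclass(frozen=True)
class Verdict:
    """Illustrative verdict record; not the actual KoraSafe schema."""
    classifier: str                         # e.g. "prompt-injection"
    version: str                            # classifier version that produced it
    action: str                             # one of ACTIONS
    confidence: float                       # score in [0.0, 1.0]
    span: Optional[Tuple[int, int]] = None  # assumed (start, end) character offsets

    def __post_init__(self) -> None:
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0.0, 1.0]")


# One verdict, serialized the way it might travel on a telemetry stream.
verdict = Verdict("prompt-injection", "2.3.1", "flag", 0.87, (12, 48))
verdict_json = json.dumps(asdict(verdict))
```

Keeping the record immutable (`frozen=True`) means a combined verdict has to be built as a new object, which leaves each classifier's original output intact for audit.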
When multiple classifiers fire on the same telemetry event, verdicts are combined using a deterministic precedence order:
A block verdict from any classifier takes precedence over a flag from another. The combined action is block, using the highest-confidence block verdict.
When two classifiers return the same action level (both block, or both flag), the one with higher confidence is used as the primary verdict record.
When two independent classifiers detect evidence on the same span, confidence is boosted on the combined verdict to reflect corroborating signal from different detection techniques.
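The three rules above can be sketched as a single combining function. This is only an illustration under stated assumptions: the `Verdict` fields, the exact-match test for "same span", and the `CORROBORATION_BOOST` value of 0.05 are hypothetical, since the source does not specify how much confidence is boosted or how spans are compared.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

ACTION_RANK = {"flag": 1, "block": 2}  # block outranks flag
CORROBORATION_BOOST = 0.05             # assumed value, not from the source


@dataclass(frozen=True)
class Verdict:
    classifier: str
    action: str                             # "flag" or "block"
    confidence: float                       # score in [0.0, 1.0]
    span: Optional[Tuple[int, int]] = None  # assumed (start, end) offsets


def combine(verdicts: List[Verdict]) -> Optional[Verdict]:
    """Pick the primary verdict for one event using the precedence rules."""
    if not verdicts:
        return None
    # Rules 1 and 2: the highest action level wins; ties go to higher confidence.
    primary = max(verdicts, key=lambda v: (ACTION_RANK[v.action], v.confidence))
    # Rule 3: a different classifier reporting the same span corroborates it.
    corroborated = primary.span is not None and any(
        v.classifier != primary.classifier and v.span == primary.span
        for v in verdicts
    )
    if corroborated:
        boosted = min(1.0, primary.confidence + CORROBORATION_BOOST)
        primary = replace(primary, confidence=boosted)
    return primary


mixed = combine([
    Verdict("toxicity", "flag", 0.99, (0, 10)),
    Verdict("pii-detect", "block", 0.80, (20, 30)),
    Verdict("prompt-injection", "block", 0.90, (40, 50)),
])
agreeing = combine([
    Verdict("pii-detect", "flag", 0.70, (5, 9)),
    Verdict("prompt-injection", "flag", 0.60, (5, 9)),
])
```

In the first call the 0.99 flag loses to both blocks, and the higher-confidence block becomes the primary record. In the second, both classifiers flag the same span, so the stronger flag is kept and its confidence is raised by the assumed boost.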
K-value tracking enforces differential-privacy anonymity on egress. Events flagged with k < 5 are held pending cohort growth or withheld from aggregate outputs.
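A minimal sketch of the k < 5 gate follows. It assumes that k is the number of events sharing a cohort key, and that held events are simply set aside; `partition_by_k` and the cohort-key representation are hypothetical names for this example, and only the threshold of 5 comes from the text.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

K_MIN = 5  # threshold stated in the text: events with k < 5 are not released

Event = Tuple[Hashable, Dict]  # (cohort key, payload); an assumed shape


def partition_by_k(
    events: Iterable[Event], k_min: int = K_MIN
) -> Tuple[List[Dict], List[Dict]]:
    """Split events into (released, held) based on cohort size k."""
    events = list(events)
    # k for each event is the size of the cohort it belongs to.
    cohort_sizes = Counter(key for key, _ in events)
    released: List[Dict] = []
    held: List[Dict] = []
    for key, payload in events:
        if cohort_sizes[key] >= k_min:
            released.append(payload)
        else:
            held.append(payload)  # held pending cohort growth
    return released, held


sample = [("finance", {"id": i}) for i in range(5)]
sample += [("health", {"id": i}) for i in range(5, 7)]
released, held = partition_by_k(sample)
```

Here the five `finance` events reach the threshold and are released, while the two `health` events stay held until their cohort grows.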
Document version: classifiers-preview-v1
Published by: KoraSafe™ Research
Last reviewed: 2026 Q2
Corresponds to: Native edge shipper, in development (pre-GA)