Detect and stop AI violations.

Six detection guardians watch every prompt and response (PII, prompt injection, content safety, hallucination, fairness, behavioral drift). Four agentic guardians evaluate runtime actions before they execute (anomaly, authority, human approval, shadow agent discovery). All ten guardians write findings into the same audit chain your auditor reads from. Federation over replacement: KoraSafe normalizes the detectors you already run instead of forcing a swap.

Detection guardians

Catch the violation in the prompt or the response.

Six detection guardians watch every prompt and response across your AI surfaces. Findings carry the rule that fired and the regulation clause it cites, and they route into the same audit chain your auditor reads from.

PII Sentinel

Presidio plus native PII checks across prompts, responses, and tool calls. GDPR, HIPAA, CCPA evidence pre-mapped.

Prompt Injection Guard

Heuristic and pattern-based detection for prompt injection attempts. Blocks before the malicious payload reaches the agent.

Content Safety Monitor

Native classifiers for moderation across hate, harassment, self-harm, sexual content, and violence categories.

Hallucination Detector

Eval-driven scoring of response groundedness against retrieved context. Flags responses unanchored from source material.

Fairness guardian

Live-traffic disparity monitoring on protected attributes plus an eval-mode track for labeled sets. Flags decisions where disparate impact crosses configured thresholds.

Behavioral Drift Detector

Watches output distribution shift over time against a baseline window. Flags model behavior moving outside calibrated bounds.

Agentic guardians

Stop the action before it runs.

Four agentic guardians evaluate runtime actions before they execute. The SDK calls a checkpoint before an agent spends money, writes data, or contacts a target; the checkpoint returns allow, hold, or block per request. Every decision writes evidence into the audit chain.

Anomaly Killer

Watches agent action frequency, scale, targets, geography, and time-of-day against per-system baselines. Auto-pause is opt-in.

Authority Limiter

Action-class RBAC before an agent spends, writes, or contacts. SDK gate at /api/action/checkpoint resolves allow / hold / block per request.

Human Approval Gate

Blocks configured action classes until a matching workflow approval is present. Maps to EU AI Act Article 14 oversight evidence.

Shadow Agent Sentinel

Discovers AI agents and tools running without registration. Browser, code, identity, and procurement signals feed a triage inbox.

Federation over replacement

Your existing detectors, now with governance context.

KoraSafe does not force a detector swap. The platform normalizes the tools you already run (Presidio, Portkey, LangSmith, MCP-native deployments) into one finding schema, then adds native detection where you have gaps. Evidence lineage, regulatory routing, and coverage reporting layer over what's already in your stack.

Connector mesh

Normalize detection feeds from seventeen live partner connectors (Presidio, Portkey, LangSmith, Lakera Guard, Bedrock Guardrails, Datadog, Fiddler, watsonx.governance, Holistic AI, Arize Phoenix, Azure CS, Galileo Luna, Vectara HEM, HiddenLayer, Credo AI, Arize AI, WhyLabs) into one finding schema without changing customer runtime stacks.

Guardian mapping

Route every finding (from your stack or KoraSafe's natives) into the right detection guardian domain. Mapped findings inherit the right regulatory citations automatically.
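
To make the normalization step concrete, here is a minimal sketch of mapping one connector's output into a shared finding shape. It is not KoraSafe's actual schema: the `Finding` fields, the `GUARDIAN_BY_SOURCE` table, the severity thresholds, and `normalize_presidio` are all hypothetical. The input is assumed to be a dict shaped like a Presidio analyzer result (`entity_type`, `start`, `end`, `score`).

```python
from dataclasses import dataclass

# Hypothetical mapping from connector name to guardian domain (illustrative only).
GUARDIAN_BY_SOURCE = {
    "presidio": "pii_sentinel",
    "lakera_guard": "prompt_injection_guard",
    "whylabs": "behavioral_drift_detector",
}


@dataclass(frozen=True)
class Finding:
    source: str    # connector that produced the finding
    guardian: str  # guardian domain the finding routes to
    category: str  # detector-specific label, e.g. an entity type
    severity: str  # "low", "medium", or "high"
    span: tuple    # (start, end) character offsets


def severity_from_score(score: float) -> str:
    # Assumed thresholds for this sketch; real thresholds would be configured.
    if score >= 0.85:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


def normalize_presidio(result: dict) -> Finding:
    # Translate one Presidio-shaped result into the shared finding shape.
    return Finding(
        source="presidio",
        guardian=GUARDIAN_BY_SOURCE["presidio"],
        category=result["entity_type"],
        severity=severity_from_score(result["score"]),
        span=(result["start"], result["end"]),
    )
```

Each connector would get its own small adapter like `normalize_presidio`, so downstream routing only ever sees one shape.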

Coverage gaps surfaced

Pick two frameworks and see which obligations share a control in your environment. Cells color by detection coverage strength. Coverage delta diffs across framework versions, so when the EU AI Act adds an annex you see which obligations gained or lost coverage in your mesh.

Swap-ready config per system

Set defaults per guardian and override by system. When a new system registers, the guardians inherit org-level defaults and can be tuned without re-onboarding.
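
One way to picture org-level defaults with per-system overrides is a two-layer merge. The sketch below assumes a plain dictionary config; the key names (`frequency_sigma`, `scale_multiplier`, `auto_pause`, `enabled`) are hypothetical, and only the 4σ and 10x values come from the Anomaly Killer defaults described on this page.

```python
from copy import deepcopy

# Hypothetical org-level defaults; key names are illustrative only.
ORG_DEFAULTS = {
    "anomaly_killer": {"frequency_sigma": 4.0, "scale_multiplier": 10.0, "auto_pause": False},
    "hallucination_detector": {"enabled": True},
}


def effective_config(org_defaults: dict, system_overrides: dict) -> dict:
    """Return org defaults with per-system overrides applied on top."""
    merged = deepcopy(org_defaults)  # never mutate the shared defaults
    for guardian, overrides in system_overrides.items():
        merged.setdefault(guardian, {}).update(overrides)
    return merged
```

A newly registered system with no overrides gets the org defaults unchanged; a system that opts in to auto-pause changes only that one key.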

All ten guardians, in depth

Expand any guardian for capabilities, regulatory mapping, and honest state.

Each guardian below carries the same shape: what it does, which frameworks it provides evidence for, and what's live versus roadmap.

PII Sentinel

Live data exposure guard for PII and PHI across prompts, responses, and tool calls.

Catches PII and PHI in prompts, responses, and tool calls. Microsoft Presidio plus native checks back the live detector; redaction policies and framework mapping ship through the same evidence stream as the rest of the platform.

  • Regex plus ML. Presidio's analyzers cover the standard entity types; KoraSafe layers context-aware checks for account numbers and internal IDs that drift across customers.
  • Redaction policies. Findings route into the redaction pipeline. Customer-cloud redaction outputs ship as a separate evidence channel so PHI never leaves the trust boundary.
  • PHI context. HIPAA minimum-necessary detection (§164.502(b)) tags evidence with the regulatory clause, not a generic confidence score.
  • Framework mapping. Each finding carries control references to GDPR Art. 5/6, HIPAA Safeguards, and CCPA right-to-know triggers so evidence packages compose without manual stitching.

Regulatory evidence for: GDPR Art. 5 / 25 / 32, HIPAA Privacy Rule §164.502(b), CCPA / CPRA §1798.100, EU AI Act Art. 10, ISO 27001 A.5.34.

Honest state: Live in opt-in Preview. Presidio analyzers, native context checks, redaction routing, and framework cross-mapping all run today. Coming next: multilingual entity coverage beyond English, customer-tuned context rules through the admin UI, per-customer detector profile editor.

Prompt Injection Guard

Catches jailbreaks, instruction overrides, and system-prompt leaks before the model acts.

Heuristic detection runs today in Preview; partner-backed classifier slots are on the roadmap. Customer-flagged patterns feed back into the eval set and grow the heuristic table over time.

  • Jailbreak patterns. Heuristics cover the public jailbreak catalog plus customer-flagged patterns.
  • Instruction override. Detects attempts to rewrite the system prompt, escalate tool privileges, or redirect agent goals mid-trace.
  • System-prompt leaks. Flags responses that echo internal system-prompt text. Useful when agents serve customers and prompts encode sensitive policy logic.
  • Partner roadmap. Partner integrations ship once partner agreements complete and pass internal review.
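
As an illustration of the system-prompt leak check, the sketch below flags responses that repeat a run of consecutive words from the system prompt. This is an assumed heuristic for illustration, not the guardian's actual rule set; `leaked_ngrams` and the six-word window are hypothetical.

```python
def leaked_ngrams(system_prompt: str, response: str, n: int = 6) -> list[str]:
    """Return word n-grams from the system prompt that appear verbatim in the response."""
    words = system_prompt.lower().split()
    # Collapse whitespace and pad with spaces so matches align on word boundaries.
    haystack = " " + " ".join(response.lower().split()) + " "
    hits = []
    for i in range(len(words) - n + 1):
        gram = " ".join(words[i:i + n])
        if f" {gram} " in haystack:
            hits.append(gram)
    return hits
```

An empty result means no six-word run from the system prompt was echoed; any hit is a candidate finding for review.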

Regulatory evidence for: EU AI Act Art. 15, NIST AI RMF MANAGE 4.1, ISO 42001 A.6.2.4, SR 11-7 §IV.

Honest state: Live in opt-in Preview. Inline budget under 50 ms p99. Coming next: partner-backed classifier integration, multilingual coverage, per-tenant rule editor.

Content Safety Monitor

Flags toxicity, self-harm signals, and hate-speech vectors across AI traffic.

Native text classifiers run today in Preview. Severity routing lets customer-support flows route self-harm signals to human help rather than a generic block. Multi-modal coverage and partner-backed slots are on the roadmap.

  • Violence and harm. Detects explicit instructions for physical harm, weapons assembly, and violence advocacy in prompts and responses.
  • Self-harm signals. Surfaces self-harm and crisis language with a separate severity track so customer support flows can route to human help.
  • Hate and harassment. Catches targeted harassment, protected-class slurs, and identity-based attacks. Customer-specific allowlists configurable.
  • Multi-modal roadmap. Images and audio inputs land in a follow-up phase; the guardian schema already shapes the multi-modal evidence row.

Regulatory evidence for: EU AI Act Art. 5 (prohibited manipulative content), EU AI Act Art. 50 (transparency for generative output), CFPB UDAAP 12 USC §5531, COPPA 16 CFR Part 312.

Honest state: Live in opt-in Preview (text only). Coming next: multi-modal coverage, multilingual rollout beyond English, partner-backed slot for vendor-attested classifier.

Hallucination Detector

Checks model output against cited sources, retrieval context, and trace consistency.

Native RAG-aware grounding ships today in Preview. The detector plugs into existing retrieval steps with no additional retrieval call. Broader factual-claim verification is on the roadmap.

  • Citation grounding. When the agent emits a citation, the detector checks the cited passage exists and supports the claim. Drift routes to trace lifecycle for review.
  • Trace consistency. Compares the final response against the in-trace retrieved context. Claims that drift from retrieval get flagged with the diff.
  • RAG-aware. Plugs into existing retrieval steps; reads what the agent already pulled and compares.
  • Claim verification roadmap. Broader factual-claim verification against external knowledge sources lands in a follow-up phase.
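
The trace-consistency idea can be sketched with a simple lexical stand-in: compare each response sentence against the context the agent already retrieved, and flag sentences with little overlap. Token overlap is an assumption made here for illustration and is far cruder than eval-driven scoring; `ungrounded_sentences` and the 0.6 threshold are hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ungrounded_sentences(response: str, retrieved: list[str], min_overlap: float = 0.6) -> list[str]:
    """Return response sentences whose tokens mostly do not appear in the retrieved context."""
    context: set[str] = set()
    for passage in retrieved:
        context |= _tokens(passage)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        toks = _tokens(sentence)
        if not toks:
            continue
        overlap = len(toks & context) / len(toks)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged
```

Note that the sketch makes no additional retrieval call: it reads only the passages already present in the trace.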

Regulatory evidence for: EU AI Act Art. 13 (transparency), EU AI Act Art. 15 (accuracy + robustness), NIST AI RMF MEASURE 2.7, FCRA 15 USC §1681e(b), SR 11-7 §V.

Honest state: Live in opt-in Preview. Inline budget under 200 ms p99. Source verification against trusted regulatory sources runs today; coming next: broader claim attribution graphs and per-customer source-trust profile.

Fairness guardian

Monitors disparate impact, demographic parity, and individual-fairness drift across protected attributes.

Live-traffic disparity monitoring runs today without labeled eval sets; severity routes by regulatory regime and writes evidence into the audit chain with NYC LL 144 and EU AI Act Annex IV mappings attached. Eval-mode hooks still ship for teams that maintain labeled eval sets. Partner-backed metric coverage is on the roadmap.

  • Disparate impact. Tracks the 4/5 rule and statistical parity gaps across configured protected attributes.
  • Demographic parity. Surfaces parity violations per cohort. Severity weights tune to the regulatory regime.
  • Individual fairness. Catches near-duplicate inputs that get materially different outputs. Useful for catching local instability that aggregate parity misses.
  • Evaluation hooks. Plugs into existing eval pipelines. Results land in the evidence stream with regulatory mappings attached.
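
The 4/5 rule itself is simple arithmetic: a group's selection rate divided by the highest group's selection rate should not fall below 0.8. The sketch below shows that calculation only; the function names and input shape are hypothetical, and it does not represent how the guardian computes or weights its findings.

```python
def selection_rates(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """outcomes maps group name -> (favorable_count, total_count)."""
    return {g: fav / total for g, (fav, total) in outcomes.items() if total > 0}


def four_fifths_violations(outcomes: dict[str, tuple[int, int]], threshold: float = 0.8) -> dict[str, float]:
    """Return groups whose impact ratio against the best-treated group is below the threshold."""
    rates = selection_rates(outcomes)
    if not rates:
        return {}
    best = max(rates.values())
    if best == 0:
        return {}  # no favorable outcomes anywhere, so no ratio to compare
    return {g: r / best for g, r in rates.items() if r / best < threshold}
```

For example, with 50 of 100 favorable outcomes in one group and 30 of 100 in another, the impact ratio is 0.3 / 0.5 = 0.6, which falls below the 0.8 line.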

Regulatory evidence for: Civil Rights Act Title VII, ECOA 15 USC §1691, FHA 42 USC §3601, EEOC AI guidance, NYC Local Law 144 §20-870, DOJ ADA AI Hiring guidance, HUD Tenant Screening FHEO guidance, EU AI Act Art. 10.

Honest state: Live-traffic and eval-mode coverage both run today in opt-in Preview. Coming next: partner-backed metric integrations, per-customer protected-attribute editor.

Behavioral Drift Detector

Catches agent behavior changes, output distribution shift, and prompt-template drift over time.

Native statistical detection runs today on a weekly schedule in Preview. Findings carry the prior-period baseline pinned for the human reviewer. Continuous near-real-time scoring and deeper distributional tests are on the roadmap.

  • Agent behavior changes. Compares aggregate behavior week over week. Anomalies route into the finding lifecycle for review rather than firing as raw alerts.
  • Output drift. Tracks distribution of output classes (refusals, tool calls, completion lengths). Sudden swings surface as drift findings with the diff pinned.
  • Prompt-template drift. Detects when a deployed prompt template silently changes (template version, embedded variable, or wrapping logic).
  • Replay and alert. Drift findings trigger an evidence-backed replay request to confirm or dismiss the signal.
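
A minimal sketch of week-over-week output drift, assuming each response has already been labeled with an output class (refusal, tool call, completion, and so on). Total variation distance is used here purely as a simple stand-in statistic; the page does not state which test the detector runs, and `drift_finding` and the 0.2 threshold are hypothetical.

```python
from collections import Counter


def class_distribution(labels: list[str]) -> dict[str, float]:
    """Relative frequency of each output class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def total_variation(baseline: dict[str, float], current: dict[str, float]) -> float:
    """Half the sum of absolute differences between two distributions (0 = identical, 1 = disjoint)."""
    classes = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(c, 0.0) - current.get(c, 0.0)) for c in classes)


def drift_finding(baseline_labels: list[str], current_labels: list[str], threshold: float = 0.2):
    """Return a finding with the per-class diff pinned, or None when the shift is under the threshold."""
    base = class_distribution(baseline_labels)
    cur = class_distribution(current_labels)
    distance = total_variation(base, cur)
    if distance < threshold:
        return None
    diff = {c: cur.get(c, 0.0) - base.get(c, 0.0) for c in set(base) | set(cur)}
    return {"distance": distance, "diff": diff}
```

Pinning the per-class diff in the finding gives the reviewer the prior-period comparison alongside the alert.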

Regulatory evidence for: EU AI Act Art. 72 (post-market monitoring), NIST AI RMF MANAGE 4, SR 11-7 §VII, ISO 42001 A.6.2.6.

Honest state: Weekly scheduled scan live in opt-in Preview. Coming next: continuous near-real-time drift scoring, deeper distributional tests (KS, MMD), per-customer baseline calibration UI.

Anomaly Killer

Watches agent action frequency, scale, targets, geography, and time-of-day against per-system baselines.

Runtime statistical checkpoint live in opt-in Preview. Outliers above the configured sigma threshold trip a finding; auto-pause is opt-in per system. Default frequency threshold 4σ, scale threshold 10x.

  • Frequency check. Flags action counts above the configured sigma threshold against a rolling per-system baseline. Tunable per system.
  • Scale check. Flags amounts, batch sizes, or write volumes far above median historical scale.
  • Target check. Flags new vendors, countries, or activity hours outside baseline. Geography and time-of-day windows learned from the last thirty days of action history.
  • Finding detail. Anomaly findings carry the action class, contributing baseline, and deviation magnitude so reviewers see why a behavior tripped.
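
The frequency and scale checks can be sketched as two small statistical tests. The defaults below reuse the stated 4σ and 10x values; treating 10x as a multiple of the historical median is an assumption, as are the function names, the finding fields, and the choice to skip flat baselines.

```python
from statistics import mean, median, pstdev


def frequency_check(history: list[int], current: int, sigma_threshold: float = 4.0):
    """Flag an action count that sits more than sigma_threshold deviations above the baseline mean."""
    mu, sd = mean(history), pstdev(history)
    if sd == 0:
        return None  # flat baseline: no z-score can be computed in this sketch
    z = (current - mu) / sd
    if z <= sigma_threshold:
        return None
    return {"check": "frequency", "baseline_mean": mu, "deviation_sigma": z}


def scale_check(history_amounts: list[float], amount: float, multiplier: float = 10.0):
    """Flag an amount more than `multiplier` times the historical median (assumed reading of 10x)."""
    med = median(history_amounts)
    if med <= 0 or amount <= multiplier * med:
        return None
    return {"check": "scale", "baseline_median": med, "ratio": amount / med}
```

Both functions return the contributing baseline and the deviation magnitude, so a reviewer can see why a behavior tripped.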

Regulatory evidence for: EU AI Act Art. 14 (human oversight with automated stop conditions), SR 11-7 §VI (effective challenge + automated kill switches).

Honest state: Live in opt-in Preview. Frequency, scale, and target checks all evaluate today. Coming next: per-action-class custom sigma thresholds in the admin UI, baseline carve-outs for seasonal spikes, break-glass workflow integration.

Authority Limiter

Action-class RBAC enforced before an agent spends, writes, or contacts a target.

The SDK calls a checkpoint at POST /api/action/checkpoint before each action. Authority Limiter loads the per-system authority profile and returns allow, hold, or block. In-profile actions continue; over-threshold actions create a review task; unknown scope is rejected.

  • Spending limits. Hold for approval when an action amount exceeds the configured threshold. Per-system, per-action-class. Approval thresholds and reviewer roles are policy-controlled.
  • Target allowlists. Block actions pointed at vendors, accounts, or systems outside the authorized profile. Allowlists support exact match plus pattern rules.
  • Read or write boundaries. Block action modes outside the agent's authorized scope. Read-only profiles cannot promote to write; sandbox profiles cannot promote to production.
  • Unified evidence. Authority violations carry the action signal class plus the workflow-task context that authorized the request.
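
The allow / hold / block logic above can be sketched as a small decision function. This is an illustration of the described behavior, not the service behind POST /api/action/checkpoint: `AuthorityProfile`, `Action`, and `checkpoint` are hypothetical names, and glob-style matching is an assumed form for the allowlist pattern rules.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class AuthorityProfile:
    allowed_classes: set[str]     # action classes this system may perform
    allowed_modes: set[str]       # e.g. {"read"} or {"read", "write"}
    target_patterns: list[str]    # exact names or glob-style patterns (assumed syntax)
    spend_hold_threshold: float   # amounts above this are held for approval


@dataclass
class Action:
    action_class: str
    mode: str
    target: str
    amount: float = 0.0


def checkpoint(profile: AuthorityProfile, action: Action) -> str:
    """Return "allow", "hold", or "block" for one requested action."""
    if action.action_class not in profile.allowed_classes:
        return "block"  # unknown scope is rejected
    if action.mode not in profile.allowed_modes:
        return "block"  # e.g. a read-only profile attempting a write
    if not any(fnmatchcase(action.target, p) for p in profile.target_patterns):
        return "block"  # target outside the allowlist
    if action.amount > profile.spend_hold_threshold:
        return "hold"   # over threshold: create a review task
    return "allow"
```

Blocking checks run before the spending check so that an out-of-scope action is never parked in a review queue.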

Regulatory evidence for: EU AI Act Art. 14 (human oversight + scope limitation), NIST AI RMF MANAGE 3.2, ISO 42001 A.6.2.3.

Honest state: Live in opt-in Preview. Allow / hold / block decisions, spending limits, target allowlists, and read/write boundaries all enforce today. Coming next: per-action-class custom approval routing, multi-step approval chains, external entitlement integration.

Human Approval Gate

Blocks configured action classes until a matching workflow approval is present.

Pairs the runtime checkpoint to an approved task record and an audited reviewer decision. Missing or pending approvals generate a finding in the findings queue; an approved task lets the action through.

  • Policy-defined classes. A policy table declares which action classes require human review. Classes can be scoped per system, per agent, or fleet-wide.
  • Approval binding. An approved task record must be bound to the same action signature. Reviewer role and approval timestamp are captured in the binding.
  • Audited rejection. Missing or pending approvals generate a finding in the findings queue. Each finding carries the policy that fired, the action context, and the reviewer role expected.
  • Regulatory alignment. Maps to EU AI Act Article 14 human oversight requirements for high-risk systems. Each decision is captured as evidence the oversight requirement was honored at runtime.
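
Approval binding can be pictured as matching a stable fingerprint of the action against approved task records. The sketch below assumes a SHA-256 hash over canonical JSON as the action signature; that choice, the record fields, and the `gate` function are hypothetical and stand in for whatever binding the platform actually uses.

```python
import hashlib
import json


def action_signature(action: dict) -> str:
    """Stable fingerprint of an action: SHA-256 over key-sorted, compact JSON."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def gate(action: dict, approvals: list[dict], gated_classes: set[str]) -> str:
    """Allow ungated classes; otherwise require an approved record bound to this exact action."""
    if action["action_class"] not in gated_classes:
        return "allow"
    sig = action_signature(action)
    for approval in approvals:
        if approval["signature"] == sig and approval["status"] == "approved":
            return "allow"
    return "block"  # missing or pending approval
```

Because the signature covers every field, an approval for a 5,000 transfer does not unlock a 9,000 transfer to the same target.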

Regulatory evidence for: EU AI Act Art. 14 (meaningful human review), GDPR Art. 22 (right to human review of automated decisions), NYC Local Law 144 §20-870, HIPAA Privacy Rule §164.530.

Honest state: Live in opt-in Preview. Policy classes, approval bindings, and audited rejection all run today. Reviewers receive notifications and complete the approval inside the existing workflow surface. Coming next: per-class custom reviewer-routing rules, expiring approvals with auto-revoke, SLA reporting on approval cycle time.

Shadow Agent Sentinel

Discovers AI agents and tools running in your stack without registration.

Browser activity, code commits, identity provider events, and procurement records feed a discovery inbox. Analysts review the matched evidence, register the agent into the inventory, dismiss with reason, escalate, or mark experimental. Every transition writes an audit entry.

  • Browser signals. The KoraSafe Chrome extension records access to supported AI tools and LLM provider endpoints. Discoveries land in the inbox with the user, the surface, and the access timestamp.
  • Code commit signals. Repository scans flag new AI provider SDKs, model file paths, and prompt template files at commit time.
  • Identity provider events. Okta and similar feeds surface OAuth grants and SSO events for AI tools. Shadow grants surface before the tool processes customer data.
  • Procurement records. Vendor procurement and SaaS expense feeds flag AI-tool spend not yet in the AI inventory.
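
As one illustration of a code commit signal, the sketch below scans the text of a Python requirements file for AI provider SDKs. The watchlist and `flag_ai_sdks` are hypothetical; a real scanner would cover more ecosystems and file types than this.

```python
import re

# Hypothetical watchlist of AI provider SDK package names.
AI_SDK_WATCHLIST = {"openai", "anthropic", "langchain", "cohere"}


def flag_ai_sdks(requirements_text: str) -> list[str]:
    """Return watchlisted package names declared in a requirements file, in order of appearance."""
    flagged = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # The package name ends at the first version specifier, extras bracket, or marker.
        name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0].lower()
        if name in AI_SDK_WATCHLIST:
            flagged.append(name)
    return flagged
```

Each flagged name would become a discovery in the triage inbox, where an analyst registers, dismisses, or escalates it.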

Regulatory evidence for: EU AI Act Art. 49 (registration of high-risk AI systems), ISO 42001 A.6.2.1 (AI system inventory), NIST AI RMF MAP 4.2.

Honest state: Discovery and inventory triage live in opt-in Preview. All four signal sources flow into the discovery inbox today. Coming next: per-source signal weighting in the admin UI, auto-classification heuristics for high-confidence discoveries, bulk action on triage queues above a configurable size.

Honest state

What ships now, what your team owns, what's still coming

All ten guardians are wired into the platform today. Detection guardians (PII Sentinel, Prompt Injection Guard, Content Safety Monitor, Hallucination Detector, Fairness guardian, Behavioral Drift Detector) run at sub-50ms p99 on the fast path with the native classifiers. Agentic guardians (Anomaly Killer, Authority Limiter, Human Approval Gate, Shadow Agent Sentinel) are in opt-in Preview. Your team owns the per-guardian tuning thresholds, the policy decisions on what each guardian blocks vs warns on, and the response workflows when a guardian fires; KoraSafe captures the evidence on every decision.

Six detection guardians at sub-50ms inline p99 plus the federation mesh against seventeen production connectors.

In progress

Four agentic guardians (anomaly, authority, human approval, shadow discovery) gated for cohort onboarding.

Coming next

A native classifier rev across the detection set, plus deeper distributional tests for behavioral drift.

Federation connectors, honest tiering

Federation over replacement: KoraSafe normalizes the detector stack you already run. The Live column lists production connectors with severity scoring and normalization; the Roadmap column names what's committed to ship next.

Production connector with severity scoring + normalization

  • Presidio (PII)
  • Portkey (gateway)
  • LangSmith (hallucination)
  • Lakera Guard (prompt injection)
  • AWS Bedrock Guardrails (content safety)
  • Datadog AI Observability (telemetry)
  • Galileo Luna (hallucination)
  • Vectara HEM (hallucination grounding)
  • Arize AI commercial observability (drift, fairness)
  • Arize Phoenix (observability)
  • HiddenLayer (prompt injection)
  • Holistic AI (fairness)
  • Credo AI (fairness)
  • Azure Content Safety (content safety)
  • WhyLabs (drift)
  • Fiddler (fairness)
  • IBM watsonx.governance (fairness)

How the tiers move

Connectors move to Live only when the adapter sits in the source tree, the partner API runs under test, and an end-to-end production deployment has cleared the customer-runtime smoke test. Roadmap names commit KoraSafe to ship; they do not commit a date.

Source of truth: lib/connectors/ in the platform repo. If your detection stack runs on something not in either column, the federation adapter pattern accepts new connectors without changing the customer runtime.

Deployment shapes

Inline shapes, sidecar recommended

Same guardian code, different data path. Pick the trust boundary that fits your industry. Most teams start with the sidecar; gateway and embedded SDK exist for the architectural patterns where the sidecar is the wrong shape. For agents you cannot wire to the SDK (vendor SaaS, Microsoft Copilot deployments, browser-based copilots), the fourth access mode is black-box probe testing, which writes the same audit-grade evidence without touching the agent runtime.

Recommended

Sidecar

A pod that runs next to your application in Kubernetes. Your agent code calls the local sidecar over loopback; the sidecar evaluates all ten guardians and writes findings to the audit chain. No agent code change beyond the LLM client wrapper. Familiar pattern for any team that already runs Envoy, Linkerd, or Istio. Scales horizontally with your application.

Pick this when: you run on Kubernetes, you want zero changes to existing agent logic, and you want guardian rollout decoupled from agent deploys.

Supported

API gateway

A central proxy that fronts your LLM provider URLs. Every request to OpenAI, Anthropic, Bedrock, or a self-hosted model goes through the gateway first; guardians evaluate prompts on the way in and responses on the way out. One audit point for every agent across every team, no per-agent install. Requires you to reroute your LLM base URL from api.openai.com to the gateway endpoint.

Pick this when: you run on bare metal or non-K8s infrastructure, you want a single chokepoint for governance, or you have many small agents and centralized auditing matters more than per-pod isolation.

Embedded SDK

A library you import directly into your agent process. Guardians run in-process on the same thread as your inference call. Sub-10ms inline overhead because there's no network hop. The trade-off is coupling: guardian updates ship as library version bumps, and a runaway guardian can affect your agent's memory and CPU footprint.

Pick this when: you serve hyper-latency-sensitive workloads (real-time agent loops, voice, trading), you control the agent runtime end to end, and the operational coupling is acceptable.

Where your data lives

All three deployment shapes ship in two data postures. Pick the trust boundary; the guardian code is identical either way.

Managed cloud

KoraSafe-hosted data plane

Findings, evidence, and audit records persist in KoraSafe-managed cloud infrastructure. Simpler setup, faster onboarding, no infrastructure to operate. Per-tenant isolation enforced at the data layer. Suits most teams in non-regulated and lightly regulated industries.

Self-hosted edge

Customer-owned data plane

Findings, evidence, and audit records persist in customer-owned storage (your S3, Azure Blob, GCS, or on-prem MinIO). Only metadata telemetry crosses the trust boundary to KoraSafe; raw content stays in your network. Suits regulated industries (financial services, healthcare, public sector) and any team whose policy prohibits prompt or response content leaving the perimeter.

Deeper detail (install commands, network diagrams, latency budgets, failure modes) lives in docs/deployment/topology.md. Air-gap mode for SCIF and classified-network environments is documented separately in docs/deployment/air-gap.md. For third-party agents that cannot land in either data posture (vendor SaaS chatbots, Copilot fleets), see black-box probe testing.

In the product

See the guardians in the product

Kora agents watching every system for PII, prompt injection, hallucination, fairness drift, and behavioral profile change.

Talk to your security + compliance teams

Catch the violation. Stop the action. Federate the detection stack you already run.

Start your free trial to begin onboarding. All ten guardians are configurable from the guardian portal, and all write into the same audit chain.