Govern agents you didn't build.

Black-box testing for third-party AI: Microsoft Copilot, Salesforce Einstein, ServiceNow Now Assist, vendor SaaS chatbots, browser-based copilots, and anything else your employees are using. Most enterprise AI is unsanctioned shadow AI deployed by other people; you cannot put an SDK inside it, you cannot route its traffic through a sidecar, and you still own the governance evidence when something goes wrong. KoraSafe probes the surface that's exposed to your workforce and writes the same audit-grade evidence the rest of the platform writes.

Access modes

Bring an agent under KoraSafe governance.

Pick the access mode that matches what you actually control. Agents you built with the SDK ride the inline path. Agents your detection vendors already watch flow in through the federation mesh. Agents you have no code access to (vendor SaaS, Copilot deployments, employee browser tools) come under governance through black-box probes.

This page

Black-box probe

Schedule synthetic conversations against the agent's exposed surface (chat UI, API, browser plugin). Probes assert behavior across six risk categories. Findings carry the prompt, the response, and the regulatory citation for the violation. Use this when you cannot install code in the agent and cannot route its traffic.

Connector-based monitoring

If a detection vendor already watches the agent (Lakera, Portkey, Datadog AI Observability, Bedrock Guardrails, Presidio, LangSmith), normalize their findings into the same guardian schema KoraSafe uses everywhere else. No new probe traffic, no SDK install, evidence flows in real time.

See federation connectors

Inline gateway, sidecar, or SDK

For agents your team builds and operates. The inline path enforces guardians in the request loop and writes evidence inline. Three deployment shapes (sidecar recommended, API gateway supported, embedded SDK) and two data postures (managed cloud, self-hosted edge). All ten guardians run on the inline path today.

See deployment topologies

What we test

Probe categories, one evidence schema.

Each probe category fires a curated set of synthetic prompts against the target agent. Responses are scored against the rule that the probe asserts. Every finding lands in the same audit chain your guardian findings land in, so evidence cross-references work without a second tool.

Prompt injection

Jailbreak prompts, instruction overrides, system-prompt leak attempts, indirect injection through shared documents and tool outputs. Catches the agent treating attacker text as trusted input.

Regulatory evidence for: EU AI Act Art. 15, NIST AI RMF MANAGE 4.1, ISO 42001 A.6.2.4.

PII leakage

Probes that trick the agent into echoing training data, system context, or other users' conversations. Detects exfiltration through summary, translation, format-change, and code-execution side channels.

Regulatory evidence for: GDPR Art. 5 / 25 / 32, HIPAA Privacy Rule §164.502(b), CCPA / CPRA §1798.100.

Hallucination

Probes assert claim accuracy against retrievable ground truth. Targets that ship a "cite your sources" feature get checked against the cited passages; ungrounded factual claims trip the probe.

Regulatory evidence for: EU AI Act Art. 13, EU AI Act Art. 15, NIST AI RMF MEASURE 2.7, FCRA 15 USC §1681e(b).

Autonomy

Probes the action scope an agent is willing to take. Tests whether the agent agrees to spend, write, contact, or commit outside the scope its documentation promises. Catches scope creep on agents marketed as read-only or advisory.

Regulatory evidence for: EU AI Act Art. 14, NIST AI RMF MANAGE 3.2, ISO 42001 A.6.2.3, SR 11-7 §IV.

Fairness

Probes near-duplicate prompts that differ only on a protected attribute. Measures disparate output, refusal-rate parity, and treatment-recommendation parity. Useful for vendor agents serving HR, finance, and customer-service workflows.

Regulatory evidence for: Civil Rights Act Title VII, ECOA 15 USC §1691, NYC Local Law 144 §20-870, EEOC AI guidance, EU AI Act Art. 10.

Content safety

Probes for toxicity, hate speech, self-harm signal generation, and dangerous-instruction compliance. Asserts the vendor's stated safety posture matches its actual response behavior on red-team prompts.

Regulatory evidence for: EU AI Act Art. 5, EU AI Act Art. 50, CFPB UDAAP 12 USC §5531, COPPA 16 CFR Part 312.

Evidence-grade output

Every probe writes the artifact your auditor reads.

Probe runs are not screenshots and not chat logs. Each run produces a signed evidence pack that maps to the regulatory clause the probe asserts. Auditors get the same evidence shape they get from inline guardians and federation findings.

Signed evidence pack per probe

JWS-signed PDF per probe run. Contains the prompt set, the agent's responses, the assertion that fired or passed, and a content hash chained to the audit ledger. Hand the pack to an auditor, a board, or a regulator without re-assembly.

Regulatory citation map

Every probe assertion carries the obligation it tests against (article, paragraph, framework version). Maps cover EU AI Act, GDPR, HIPAA, NYC Local Law 144, NIST AI RMF, ISO 42001, SR 11-7, ECOA, FCRA, and additional sector frameworks. New probes inherit the citation map automatically.

Scheduled monitoring

Pick the cadence per agent: weekly, monthly, quarterly, or on regulatory-delta trigger. When a citation in the framework changes, the affected probes re-run on the next scheduled tick and the evidence pack regenerates against the new clause. No manual re-test cycles.

Audit ledger integration

Probe findings land in the same hash-chained audit ledger the inline guardians and federation findings land in. One auditor portal serves all three governance access modes; no separate evidence silo for vendor agents.

Honest scope today

What ships now, what's coming.

The black-box probe substrate runs today: probe runner, scoring engine, signed evidence packs, scheduled monitoring, OAuth flows, multimodal trace ingestion, and customer telemetry forwarders all ship in opt-in Preview against third-party agents your team cannot wire to the SDK. Connector-based monitoring is live across fifteen federation connectors. The inline path (sidecar, API gateway, embedded SDK) runs across all ten guardians and both data postures. Continuous monitoring cadence is the remaining roadmap item.

In development

Black-box probe runner

Six probe categories run today against third-party agents. Probe library, scoring engine, signed evidence packs, regulatory citation mapping, OAuth flows, multimodal trace ingestion, customer telemetry forwarders, and scheduled cadences (weekly, monthly, quarterly, or regulatory-delta trigger) all ship today.

Connector-based monitoring

Seventeen federation connectors live in production (Presidio, Portkey, LangSmith, Lakera Guard, AWS Bedrock Guardrails, Datadog AI Observability, Galileo Luna, Vectara HEM, Arize, Arize Phoenix, HiddenLayer, Holistic AI, Credo AI, Azure Content Safety, WhyLabs, Fiddler, IBM watsonx.governance). Normalize findings from your existing detection stack into one schema.

See connector tiering

Inline gateway, sidecar, SDK

Three deployment shapes (sidecar recommended, API gateway supported, embedded SDK) and two data postures (managed cloud, self-hosted edge). All ten guardians evaluate on the inline path today.

Read the topology doc

In the product

See black-box testing in the product

Registered third-party agents under black-box monitoring. Endpoint registration, vendor + criticality, and probe status live in one inventory.

Black-box agent registration and inventory in the product
How to start

Ways in.

Start your free trial or get in touch if you want a guided probe run against a specific agent fleet. Read the topology doc if you want to evaluate the inline path in parallel. Or book a demo for the probe runner if you need the black-box route as the primary on-ramp.