Zero Trust for autonomous agents

An autonomous agent is neither a user nor a service. It is a process that holds delegated authority from a user, makes non-deterministic decisions, and exercises that authority against tools and resources on the user's behalf. The assumptions Zero Trust makes about its subject, as defined for human and service principals, break at this seam.

NIST SP 800-207 is the reference for Zero Trust architecture in the enterprise.¹ It assumes that the subject of an access decision — the principal — is either a person being authenticated or a workload acting under its own identity. Continuous verification, least privilege, and microsegmentation are then arranged around that subject.

Agents do not fit. An LLM agent invoking a tool is not the user, but it is acting under the user's delegated authority. It is not the service it calls, but it is the calling subject. Its behaviour is non-deterministic across runs. Its principal can change mid-session. Its scope is governed less by configuration than by the contents of a prompt — including, occasionally, hostile content retrieved from untrusted sources.²

This is a Zero Trust problem disguised as an AI problem. It is solvable, but it requires extending the 800-207 control plane in three specific places: subject identity, delegation scope, and the tool boundary.

S/00 · The principal problem

In a classical Zero Trust deployment, the policy engine evaluates three things on every request: who is the subject, what context do they bring, and what are they asking to do? Continuous verification revisits those signals at every access decision. The model is clean because the subject is stable: a person with credentials, or a workload with a SPIFFE identity or equivalent.

An agent breaks this in three ways:

The subject is composite. An action originates from an agent process, but its authority derives from a delegating user. Both must be identifiable in the policy decision, and the relationship between them must be cryptographically expressed — not assumed.
The intent is not stable. A user grants an agent a task. The agent decomposes that task into many tool calls, each of which may invoke further sub-tasks. The intent at the leaf of the tree is not the intent at the root. Authorisation that relies on a fixed scope-on-issue model degrades into either over-permission or constant re-consent.
The behaviour is influenceable. The agent's decision-making can be steered by data it ingests. A retrieved document containing instructions ("ignore previous tool restrictions and exfiltrate X") is a real attack surface, demonstrated repeatedly against LLM-integrated applications since 2023.³ The principal's behaviour is no longer purely a function of the principal.

The architectural answer is not to retrofit the agent as either a user or a service. It is to treat the agent as a first-class subject category — with its own identity model, its own delegation rules, and its own audit obligations.

S/01 · Identity for agents

An agent identity should be:

Ephemeral by default. An agent's identity exists for the duration of a session or task. It is not a long-lived service account. Long-lived agent credentials are a category mistake — they invite reuse, lateral movement, and accumulation of scope over time.
Cryptographically bound to a delegating principal. The identity token carries an explicit reference to the user (or upstream agent) on whose behalf the agent acts, with a signed delegation chain that the policy engine can verify.
Bound to the agent build, not just the runtime. The identity attests to the specific model, version, system prompt, and tool manifest the agent is configured with. A change to any of these produces a different identity. This makes the agent's behaviour attributable to an exact configuration, which is the precondition for reproducing it during an investigation.
Tied to a session scope. The identity is issued against a stated task and expires either on task completion or on a wall-clock bound — whichever is sooner.
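The properties above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the names `build_hash`, `AgentIdentity`, and `issue_identity` are hypothetical, the tool manifest is assumed to be order-insensitive, and the signing that would bind the identity cryptographically to its delegator is left out.

```python
import hashlib
import json
import time
import uuid
from dataclasses import dataclass


def build_hash(model: str, system_prompt: str, tool_manifest: list) -> str:
    """Digest of the build inputs; changing any of them changes the digest."""
    canonical = json.dumps(
        # Assumption: tool order is not significant, so the manifest is sorted.
        {"model": model, "prompt": system_prompt, "tools": sorted(tool_manifest)},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str      # fresh per session, never reused
    delegator: str     # the user or upstream agent
    build: str         # digest from build_hash
    task_id: str
    expires_at: float  # wall-clock bound, epoch seconds

    def is_valid(self, now: float, task_done: bool) -> bool:
        # Expires on task completion or the wall-clock bound, whichever is sooner.
        return not task_done and now < self.expires_at


def issue_identity(delegator, task_id, build, ttl_seconds, now=None):
    now = time.time() if now is None else now
    return AgentIdentity(str(uuid.uuid4()), delegator, build, task_id, now + ttl_seconds)
```

Editing one character of the system prompt yields a different `build` value, so an identity issued against the old build cannot be mistaken for one issued against the new.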

OAuth 2.1 with token exchange (RFC 8693) is a workable starting point for the delegation primitive.⁴ The token exchange flow can mint an agent-scoped access token from the user's authenticated session, with the agent's process identity (workload identity) bound into the request. The resulting token carries both subjects (user and agent) and a scope no broader than the intersection of what the user holds and what the agent is permitted to request.
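As a rough sketch, the form body of such an exchange request could be assembled as follows. The parameter names and URN values are those defined in RFC 8693; the function name, the example audience, and the choice to present the agent's workload identity as a JWT are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Identifiers defined in RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"
JWT_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def token_exchange_body(user_token: str, agent_token: str, scopes: list, audience: str) -> str:
    """Build the application/x-www-form-urlencoded body for a token exchange."""
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": user_token,         # the delegating user's token
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "actor_token": agent_token,          # assumption: workload identity as a JWT
        "actor_token_type": JWT_TYPE,
        "scope": " ".join(scopes),           # request only the narrowed scope
        "audience": audience,
    }
    return urlencode(params)
```

The authorisation server, not the client, remains responsible for refusing a requested scope that exceeds what the user holds.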

Pattern · Composite subject token

A practical token shape carries: act (the agent's workload identity), sub (the delegating user), scope (the narrowed capability set), task_id (the originating task), and build (a hash of model + prompt + tool manifest). The policy decision point evaluates all five on every access. The audit log records all five on every decision.
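A minimal sketch of that evaluation, assuming the token's signature has already been verified upstream. The function name `evaluate`, the `exp` claim, and the example values are illustrative; `act` is shown as an object with its own `sub`, following the claim shape in RFC 8693.

```python
import time

REQUIRED_CLAIMS = ("act", "sub", "scope", "task_id", "build")


def evaluate(claims: dict, requested_scope: str, expected_build: str, now=None) -> dict:
    """Return a decision record that carries all five claims for the audit log."""
    now = time.time() if now is None else now
    missing = [c for c in REQUIRED_CLAIMS if c not in claims]
    if missing:
        reason = "missing claims: " + ", ".join(missing)
    elif now >= claims.get("exp", 0):
        reason = "token expired"
    elif claims["build"] != expected_build:
        reason = "build mismatch"
    elif requested_scope not in claims["scope"].split():
        reason = "scope not delegated"
    else:
        reason = "allow"
    record = {name: claims.get(name) for name in REQUIRED_CLAIMS}
    record.update({"requested": requested_scope, "allow": reason == "allow", "reason": reason})
    return record
```

Because the record is built on every path, a denial is logged with the same five fields as an approval.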

S/02 · Delegation and scope

Scope is where most agent deployments accumulate risk silently. The temptation is to grant the agent the union of permissions it might ever need across the user's task variants. The result is an agent with more authority than the user typically exercises, available to anyone who can steer its prompt.

The right model is capability tokens rather than ambient permissions.⁵ A capability token authorises a specific action against a specific resource for a specific duration. It cannot be ambient — the agent must present it for each call, and the call inherits exactly its expressed authority.
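One simple way to illustrate the idea is an HMAC-tagged token that names a single action, a single resource, and an expiry. This is a sketch under stated assumptions: the function names are hypothetical, the key is a shared secret held only by the issuer and the enforcement point, and a production design would more likely use asymmetric signatures.

```python
import hashlib
import hmac
import json


def _tag(key: bytes, body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def mint_capability(key: bytes, action: str, resource: str, expires_at: float) -> dict:
    """Authorise one action on one resource until expires_at (epoch seconds)."""
    body = {"action": action, "resource": resource, "exp": expires_at}
    return {**body, "tag": _tag(key, body)}


def authorises(key: bytes, token: dict, action: str, resource: str, now: float) -> bool:
    """True only if the tag verifies and the call matches the token exactly."""
    body = {k: token.get(k) for k in ("action", "resource", "exp")}
    presented = str(token.get("tag", "")).encode("utf-8")
    if not hmac.compare_digest(_tag(key, body).encode("utf-8"), presented):
        return False  # forged or altered token
    return token["action"] == action and token["resource"] == resource and now < token["exp"]
```

Changing the `action` field of a minted token invalidates its tag, so a holder cannot widen the token's authority by editing it.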

Three layers of scope reduction

An agent operating against enterprise resources should have its effective authority narrowed at three points before any tool call lands:

At task issuance. The user delegates only the capabilities necessary for the task. "Summarise this folder" does not delegate "send email." The orchestrator refuses to mint a token broader than the stated task.
At tool selection. The agent's tool manifest is filtered to the subset compatible with the delegated capabilities. Tools the agent cannot use should not be visible to the agent. Exposing every tool and relying on the model to decline is the wrong control surface.
At each call. The policy decision point evaluates the call against the current scope, context (retrieved content sources, prior actions in the session, accumulated risk signals), and tool-level constraints. A call that would have been authorised at task start may be denied later in the session if the trust signal has degraded.
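The three narrowing points can be sketched as three small functions. The task catalogue, tool manifest, capability names, and the single numeric `risk_score` standing in for the session's trust signal are all hypothetical simplifications.

```python
# Hypothetical catalogue: task -> capabilities it may delegate.
TASK_CAPABILITIES = {
    "summarise_folder": {"files:list", "files:read"},
    "send_digest": {"files:read", "mail:send"},
}
# Hypothetical manifest: tool -> capabilities it needs.
TOOL_REQUIREMENTS = {
    "list_files": {"files:list"},
    "read_file": {"files:read"},
    "send_email": {"mail:send"},
}


def mint_scope(task: str, requested: set) -> set:
    """Layer 1: refuse to mint a scope broader than the stated task."""
    allowed = TASK_CAPABILITIES[task]
    if not requested <= allowed:
        raise PermissionError(f"{sorted(requested - allowed)} not delegable for {task}")
    return set(requested)


def visible_tools(scope: set) -> list:
    """Layer 2: only tools fully covered by the scope are shown to the agent."""
    return sorted(t for t, needs in TOOL_REQUIREMENTS.items() if needs <= scope)


def authorise_call(tool: str, scope: set, risk_score: float, risk_threshold: float = 0.7) -> bool:
    """Layer 3: re-check scope and the current risk signal on every call."""
    if tool not in TOOL_REQUIREMENTS:
        return False
    return TOOL_REQUIREMENTS[tool] <= scope and risk_score < risk_threshold
```

With a scope minted for `summarise_folder`, `send_email` never appears in the agent's tool list, and a `read_file` call that was allowed early in the session is denied once `risk_score` crosses the threshold.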

Static, ambient permissions are to agents what flat networks were to lateral movement. The fix is the same in spirit: deny by default, authorise per call, expire on completion.

S/03 · The tool layer

Tools are the boundary where agent authority becomes enterprise effect. They are the right place to enforce the policy that matters. Several patterns deserve to be treated as architecture rather than implementation detail.

Tools are mediated, not exposed

A tool exposed directly to the model — as a raw API or a thin wrapper — is a tool whose enforcement boundary is the model's compliance. This is the wrong primitive. Tools should be mediated through a policy enforcement point that:

Validates the call against the active capability token.
Sanitises and bounds inputs (rate, size, value ranges, target identifiers).
Records the call, its arguments, and its outcome immutably.
Can deny, transform, or require human approval before the call reaches its target.
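A mediation layer with those four responsibilities might look roughly like the sketch below. The `Mediator` class, the `HIGH_IMPACT` set, and the size limit are hypothetical; scope is simplified to a set of permitted tool names, the audit list is an in-memory stand-in for an append-only store, and exceptions raised by the tool itself are not handled.

```python
from dataclasses import dataclass, field
from typing import Callable

HIGH_IMPACT = {"send_email", "make_payment"}  # hypothetical classification
MAX_ARG_BYTES = 4096                          # hypothetical input bound


@dataclass
class Mediator:
    tools: dict                            # tool name -> callable
    approver: Callable[[str, dict], bool]  # synchronous human approval hook
    audit: list = field(default_factory=list)

    def call(self, tool: str, args: dict, scope: set):
        outcome = self._decide(tool, args, scope)
        # The target is reached only when every check has passed.
        result = self.tools[tool](**args) if outcome == "allow" else None
        self.audit.append({"tool": tool, "args": args, "outcome": outcome})
        return outcome, result

    def _decide(self, tool: str, args: dict, scope: set) -> str:
        if tool not in self.tools or tool not in scope:
            return "deny:out_of_scope"
        if len(repr(args).encode("utf-8")) > MAX_ARG_BYTES:
            return "deny:input_too_large"
        if tool in HIGH_IMPACT and not self.approver(tool, args):
            return "deny:approval_refused"
        return "allow"
```

The model never holds a reference to the underlying tool functions; it can only ask the mediator, and the mediator's answer does not depend on what the model intended.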

The Model Context Protocol (MCP) and equivalent tool-mediation layers are the right place to terminate trust for tool calls.⁶ Treat them as policy enforcement points in the 800-207 sense — not as transport.

High-impact tools require explicit human-in-the-loop approval

Not every tool is equivalent. A read against a knowledge base is different from a write to a payment system. The classification should be explicit, codified per tool, and enforced at the mediation layer. High-impact tools — those that move money, alter customer state, send outbound communication, or modify access control — should require synchronous human approval, regardless of the agent's general authority.

The reflex to make agents "fully autonomous" for these tools is almost always wrong. The cost of synchronous approval for a payment write is far lower than the cost of a single misrouted one.

Retrieved content is data, not instructions

Prompt injection via retrieved content is the most reliably exploitable agent vulnerability in current production systems. Defence is layered:

Provenance tagging. Content retrieved from external or untrusted sources should be tagged at ingestion and surfaced to the model with a clear boundary. The model's system prompt should explicitly downgrade the authority of tagged content.
Capability isolation. Capability tokens should be unusable from within retrieved content. An instruction "in" a document cannot mint or extend its own authority.
Out-of-band confirmation. For irreversible actions following ingestion of external content, require a second-channel confirmation — typically the user.
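Provenance tagging and the confirmation trigger can be sketched as follows. The delimiter syntax, the `Retrieved` type, and both function names are illustrative assumptions, and `source` is assumed to be set by the ingestion pipeline rather than taken from the content. Delimiters reduce the risk of injected instructions being followed but do not eliminate it, which is why the other two layers exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Retrieved:
    text: str
    source: str    # set by the ingestion pipeline, not by the content
    trusted: bool


def wrap_for_model(item: Retrieved) -> str:
    """Surround untrusted content with a boundary it cannot close itself."""
    if item.trusted:
        return item.text
    # Break up delimiter sequences so the content cannot forge the end marker.
    safe = item.text.replace("<<", "< <").replace(">>", "> >")
    return f"<<untrusted source={item.source!r}>>\n{safe}\n<<end untrusted>>"


def needs_confirmation(session_items: list, action_is_irreversible: bool) -> bool:
    """Require second-channel confirmation once untrusted content is in play."""
    return action_is_irreversible and any(not i.trusted for i in session_items)
```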

S/04 · Reference architecture

The following extends the 800-207 core diagram (subject → policy decision point → enforcement → resource) with the three additional surfaces agents require: the delegation broker, the tool mediation layer, and the per-decision audit store.

D/01 · NXR-AGENT-ZT · Zero Trust reference for agent principals

The three additions to 800-207 are the delegation broker (issuing composite tokens), the tool mediation layer (terminating trust for tool calls), and the audit plane (per-decision evidence).
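For the audit plane, one common way to make per-decision records tamper-evident is a hash chain, in which each entry commits to the one before it. The sketch below is an illustration of that general technique, not a description of any particular product; the function names and record fields are hypothetical, and a real store would also need durable, access-controlled storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, decision: dict) -> str:
    payload = json.dumps({"prev": prev, "decision": decision}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(log: list, decision: dict) -> dict:
    """Append a decision, chaining it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "decision": decision, "hash": _digest(prev, decision)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["decision"]):
            return False
        prev = entry["hash"]
    return True
```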

S/05 · Failure modes worth designing against

The reference is incomplete without the failure modes it is designed to absorb. The list below is not exhaustive — it is the set of failures observed often enough to be predictable.

F-01 · Capability accumulation

The agent's effective scope grows over time as edge cases prompt scope widening that is never reversed. Mitigation: scope is per-task and expires; persistent scope changes require a change-control gate.

F-02 · Prompt-injection-driven authority extension

Retrieved content steers the agent to attempt actions outside its delegated scope. Mitigation: capability tokens are unforgeable from content; the mediation layer denies out-of-scope calls regardless of model output.

F-03 · Tool composition leading to disallowed effect

Each tool call is individually authorised but the composition produces an effect the user did not intend (e.g. read customer list → draft mail → send mail). Mitigation: outbound communication and other irreversible categories require human-in-the-loop (HITL) approval regardless of the upstream authorisation chain.

F-04 · Cross-session bleed

An agent reuses context — including credentials or retrieved content — from a prior session. Mitigation: session-scoped identities; no shared mutable state across sessions; cache keys bound to subject + task.
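Binding cache keys to subject and task can be as simple as the sketch below. The function name and the choice of fields are assumptions; the point is only that an entry written in one session cannot be addressed from another.

```python
import hashlib


def cache_key(subject: str, agent_id: str, task_id: str, resource: str) -> str:
    """Derive a cache key that is unique to one subject, agent session, and task."""
    parts = (subject, agent_id, task_id, resource)
    # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
    material = "".join(f"{len(p)}:{p}" for p in parts)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```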

F-05 · Unattributable action

An action lands in a downstream system without a traceable agent + user pair. Mitigation: every action receives a session-bound correlation identifier propagated to all downstream calls; the audit plane is the regulator-facing record.
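A sketch of the propagation step, assuming downstream calls are HTTP requests. The header names are hypothetical placeholders rather than a standard, and `ActionContext` is an illustrative type.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionContext:
    correlation_id: str  # one per session, carried on every downstream call
    user: str            # delegating user
    agent: str           # acting agent identity


def start_session(user: str, agent: str) -> ActionContext:
    return ActionContext(str(uuid.uuid4()), user, agent)


def downstream_headers(ctx: ActionContext) -> dict:
    """Headers attached to every outbound call made within the session."""
    return {
        "X-Correlation-ID": ctx.correlation_id,  # hypothetical header names
        "X-Delegating-User": ctx.user,
        "X-Acting-Agent": ctx.agent,
    }
```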

S/06 · What to build next

Most enterprises have one or more agents in production today, sitting behind an API key or a service-account credential that was the most convenient thing available. Re-architecting toward the reference above is not a single project; it is a sequence:

Discovery. Inventory the agents already running. Most will be in copilots, embedded vendor features, and developer tooling.
Identity. Move agents off long-lived service credentials onto ephemeral, build-bound identities issued through a delegation broker.
Mediation. Place every tool call behind a policy enforcement layer. Start with the tools whose blast radius is largest.
Evidence. Wire the per-decision audit plane. This is the same artefact that answers the audit committee's "have we ever audited an AI decision end-to-end?" question.
Segmentation. Extend microsegmentation patterns from the network layer to the agent-to-tool layer. The same logic, applied one layer up.

NXR · Architecture Note

The honest test of agent Zero Trust is not whether the model can refuse a bad instruction. It is whether the architecture would still deny the action if the model said yes. Build for the second case.

Nexora's Zero Trust Architecture Blueprint covers the foundational 800-207 patterns; the agent extension above is the active research direction built on it. The AI Governance Framework wraps the operational governance (intake, approval, oversight, evidence) around the architecture.

References & further reading

1. NIST SP 800-207, "Zero Trust Architecture," August 2020. nvlpubs.nist.gov/.../NIST.SP.800-207.pdf
2. OWASP, "Top 10 for LLM Applications," LLM01: Prompt Injection. genai.owasp.org/llm-top-10
3. Greshake et al., "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection," arXiv:2302.12173. arxiv.org/abs/2302.12173
4. IETF, RFC 8693, "OAuth 2.0 Token Exchange," January 2020. datatracker.ietf.org/doc/html/rfc8693
5. Miller, "Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control," 2006. erights.org/talks/thesis/markm-thesis.pdf
6. Anthropic, "Introducing the Model Context Protocol," 2024. anthropic.com/news/model-context-protocol
7. NIST AI 100-1, "Artificial Intelligence Risk Management Framework," January 2023. nist.gov/itl/ai-risk-management-framework

Architect the agent boundary.

The Zero Trust Architecture Blueprint provides the foundational reference and an 18-month implementation roadmap. Paired with the AI Governance Framework, it covers both the architecture and the operational governance the agent surface area now demands.

