Engineering Blog · Security · April 2025 · 10 min read

What We Learned Building a Zero-Trust Control Plane for AI Agents

Enterprise AI agents do not fail first at reasoning. They fail at control. The moment an agent can touch real systems, the question stops being “Can the model answer?” and becomes “Who is allowed to do what, with which data, under which conditions, and how will we prove it later?”

That is the security problem Orchestrik was built to solve. Orchestrik is a governed control layer between AI agents and enterprise systems. It enforces connector-level access control, keeps a tamper-evident audit trail, resolves credentials at runtime so the model never sees them, supports human approval gates, and can run on-premise or in air-gapped environments. This article is not a sales pitch. It is the set of lessons we learned building that control plane.

01

Prompt-level access control fails

This is the first hard truth. Telling a model in the prompt, “Do not access restricted systems,” is not security. It is instruction. And instruction is not enforcement.

OWASP now treats prompt injection as a primary LLM application risk because crafted inputs can manipulate model behaviour, bypass intended restrictions, and lead to unauthorised actions or data exposure. (OWASP GenAI Security Project) The underlying reason is simple: natural-language instructions and untrusted input get mixed in the same channel. OWASP's prevention guidance explicitly calls this out as a design weakness.

That is why Orchestrik does not rely on prompts as the final access boundary. The enforcement point sits at the connector and infrastructure layer, not inside model instructions. An agent cannot exceed what the control plane allows, regardless of what the prompt says or what a user tries to induce.
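The shape of that boundary can be sketched in a few lines. This is an illustrative policy enforcement point, not Orchestrik's actual API: names like `AgentPolicy` and `execute_connector_call` are assumptions. The point is that the allow/deny check lives in infrastructure code that runs regardless of what the prompt contains.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentPolicy:
    # Explicit allowlist: connector -> set of permitted actions.
    # Anything not listed is denied by default.
    allowed: dict = field(default_factory=dict)

class PolicyViolation(Exception):
    pass

def enforce(policy: AgentPolicy, connector: str, action: str) -> None:
    # Runs in the control plane, outside the model, so no prompt
    # content or injected instruction can route around it.
    if action not in policy.allowed.get(connector, set()):
        raise PolicyViolation(f"{connector}.{action} denied by policy")

def execute_connector_call(policy, connector, action, payload):
    enforce(policy, connector, action)  # deny-by-default gate before execution
    return {"connector": connector, "action": action, "status": "executed"}

policy = AgentPolicy(allowed={"crm": {"read_contact"}})
execute_connector_call(policy, "crm", "read_contact", {})      # permitted
# execute_connector_call(policy, "crm", "delete_contact", {})  # raises PolicyViolation
```

Because the gate sits between the agent and the connector, a jailbroken prompt changes nothing: the model can ask for `delete_contact` all it wants, and the call still fails at the enforcement point.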

This design is closer to how zero trust is supposed to work. NIST defines zero trust as shifting security focus from broad perimeters to users, assets, and resources, with access decisions made per resource and enforced by dedicated control components such as policy engines and policy enforcement points. (NIST SP 800-207) That was one of our core conclusions: AI agents need a policy enforcement point outside the model.

For broader design context, see How to Design AI Agents, Agentic AI Readiness Evaluation Framework, and Agentic AI vs Automation.

02

The model should never see the credential

A second mistake we saw repeatedly in early agent systems was giving the model too much visibility into execution. If the agent can directly read API keys, database passwords, or SaaS tokens, then the system has already violated least privilege — even if the model behaves perfectly. Zero trust is built around minimising implicit trust and enforcing least-privilege access decisions per request. (NIST SP 800-207)

So Orchestrik uses a vault-mediated credential resolution flow. The agent decides it needs to call a connector. The call is intercepted by the control plane. The control plane resolves the required credential from the vault and injects it at execution time. The connector executes the action. The result is returned to the agent. The secret itself is never surfaced to the model at any point.
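A minimal sketch of that interception flow, assuming a hypothetical `Vault` lookup and a stand-in connector function (neither reflects Orchestrik's real interfaces): the secret is resolved and injected inside the control plane, and only the action's outcome crosses back to the agent.

```python
class Vault:
    """Stand-in secret store; a real deployment would use a managed vault."""
    def __init__(self, secrets):
        self._secrets = secrets

    def resolve(self, ref: str) -> str:
        return self._secrets[ref]

def control_plane_call(vault, connector_fn, credential_ref, request):
    secret = vault.resolve(credential_ref)  # resolved at execution time
    result = connector_fn(request, secret)  # injected into the connector call
    # Only the outcome is returned; the secret never enters the
    # agent-visible payload, prompt, or trace.
    return {"result": result}

def crm_connector(request, api_key):
    # Placeholder for an HTTP call authenticated with api_key.
    return f"fetched {request['record']}"

vault = Vault({"crm/api_key": "s3cr3t"})
out = control_plane_call(vault, crm_connector, "crm/api_key", {"record": "acct-42"})
# out contains the result only; "s3cr3t" appears nowhere in it
```

The design choice here is that the credential's lifetime is bounded by the call: it exists in the control plane's memory for one execution and is never serialized into anything the model reads.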

That separation matters. The agent gets the outcome of an authorised action, not the raw capability to act outside policy. This reduces credential sprawl, cuts accidental leakage into prompts or traces, and prevents the model from becoming a secondary secret store. It is a cleaner boundary between reasoning and execution.

03

Approvals must be logged when the decision happens

A surprising number of enterprise systems still reconstruct approvals later by joining workflow tables, email threads, chat messages, or operator notes. That is weak design. If an approval matters enough to gate an agent action, then the approval itself is part of the security event and must be captured at the time of decision. Later reconstruction is fragile. It creates ambiguity about timing, actor intent, exact scope, and whether the approval really existed before execution.

In Orchestrik, human-in-the-loop gates can be defined at the task level. The approver sees the original request and the agent's proposed action. The approval or rejection is written to an immutable target the moment the decision is made — not queued for later logging.
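One way to make "written to an immutable target at decision time" concrete is a hash-chained, append-only log, where each record commits to the previous one. This is a generic tamper-evidence sketch, not Orchestrik's storage format; the field names are illustrative.

```python
import hashlib
import json
import time

class ApprovalLog:
    def __init__(self):
        self._entries = []
        self._prev_hash = "0" * 64  # genesis value

    def record(self, approver: str, request_id: str, decision: str) -> dict:
        entry = {
            "approver": approver,
            "request_id": request_id,
            "decision": decision,     # "approved" or "rejected"
            "ts": time.time(),        # captured when the decision happens
            "prev": self._prev_hash,  # links this record to the one before it
        }
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = digest
        self._entries.append(entry)
        self._prev_hash = digest
        return entry

log = ApprovalLog()
decision = log.record("alice", "req-7", "approved")
if decision["decision"] == "approved":
    pass  # execution proceeds only after the record exists
```

Because each entry embeds the previous entry's hash, rewriting any past approval breaks every hash after it, which is exactly the property later reconstruction from workflow tables and email threads cannot give you.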

This also aligns with broader enterprise auditability requirements. NIST's zero-trust architecture guidance assumes that access decisions and enforcement need to be tied to explicit policy evaluation and observable control points, not inferred later from loose telemetry. (NIST CSWP 20)

04

Synchronous and async approval gates solve different problems

Approval gates sound simple until they hit production workloads.

Synchronous approval gates are right when the action is high-risk, immediately consequential, or user-facing. The system pauses, requests approval, and executes only after an explicit decision. This works well for destructive changes, financial-impacting updates, privileged access actions, or sensitive record modifications.

Asynchronous approval gates are better when the action can wait, the approver is not immediately available, or the workflow is batch-oriented. The task is queued, approval happens out of band, and execution resumes only after the decision is recorded.

The tradeoff is straightforward: synchronous gives stronger immediate control but adds latency and can hurt throughput. Asynchronous scales better operationally but introduces queueing, state management, retries, and timeout handling. There is no universal winner — the right choice depends on business criticality, expected response times, and whether the workflow is front-office or back-office. The key lesson was not "pick one." It was to make approval a first-class execution primitive rather than an afterthought bolted onto logs and tickets.
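The two shapes can be sketched side by side. These are schematic stand-ins (the approver callback and request IDs are assumptions, and a production async gate would also need the retries and timeouts mentioned above), but they show the structural difference: one blocks the workflow, the other parks it.

```python
def sync_gate(action, ask_approver):
    # Synchronous: the workflow blocks here until an explicit decision arrives.
    if ask_approver(action) != "approved":
        raise PermissionError(f"{action} rejected")
    return f"executed {action}"

class AsyncGate:
    # Asynchronous: the task is parked; execution resumes only after
    # a decision is recorded out of band.
    def __init__(self):
        self.pending = {}  # request_id -> queued action

    def submit(self, request_id, action):
        self.pending[request_id] = action  # approval happens later

    def on_decision(self, request_id, decision):
        action = self.pending.pop(request_id)
        if decision == "approved":
            return f"executed {action}"
        return f"dropped {action}"

# Synchronous: caller waits on the approver.
sync_gate("update-record", lambda a: "approved")

# Asynchronous: submit now, resume when the decision lands.
gate = AsyncGate()
gate.submit("r1", "batch-export")
gate.on_decision("r1", "approved")
```

Notice that in both shapes the executable path only exists after a decision object exists, which is what "first-class execution primitive" means in practice.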

05

Policy has to exist at multiple layers

A secure agent is not defined only by what tools it has. It is defined by the intersection of user permissions, data permissions, agent permissions, and write scope.

In Orchestrik, policy control exists at the user level, data level, and agent level simultaneously. It defines which agents a user may access, which files or data scopes users and agents may read or write, what inputs an agent should accept, what outputs should be filtered, and whether an agent is read-only or allowed to write within explicitly permitted scopes.
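The intersection of those layers can be sketched as a conjunction of independent checks. The policy tables below are illustrative stand-ins, not Orchestrik's policy model; the point is that every layer must allow the request on its own, and any single failing layer denies it.

```python
# Illustrative stand-in policy tables; a real system would evaluate
# stored policy documents rather than in-memory dicts.
USER_AGENTS = {"alice": {"report-bot"}}             # user -> agents she may use
DATA_SCOPES = {"report-bot": {"finance/summary"}}   # agent -> readable scopes
AGENT_ACTIONS = {"report-bot": {"read"}}            # read-only agent
WRITE_SCOPES = {"report-bot": set()}                # no write scope granted

def authorize(user, agent, resource, action):
    checks = [
        agent in USER_AGENTS.get(user, set()),       # user-level policy
        resource in DATA_SCOPES.get(agent, set()),   # data-level policy
        action in AGENT_ACTIONS.get(agent, set()),   # agent-level policy
        action != "write"
        or resource in WRITE_SCOPES.get(agent, set()),  # write scope
    ]
    return all(checks)  # each layer must hold independently

authorize("alice", "report-bot", "finance/summary", "read")   # True
authorize("alice", "report-bot", "finance/summary", "write")  # False: no write scope
```

The conjunction is the point: a correct user-level grant cannot compensate for a missing data scope, and a correct data scope cannot grant write access that was never given.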

That was another practical lesson: one boundary is not enough. An agent that is correctly scoped at the user level but has unconstrained write access is still a risk. An agent with correct data scope but no output filtering is still a risk. Real enterprise security for agents is layered — and each layer needs to hold independently.

This is how Orchestrik is built

Enterprise AI agents do not become secure because the prompt is clever. They become secure when the model is surrounded by a control plane that assumes the model is fallible. When someone asks, “How do you secure enterprise AI agents?” — this is the architecture answer, not the marketing one.

  • Connector-layer enforcement instead of prompt-only restrictions
  • Vault-mediated credential resolution — the model never sees a secret
  • Immutable approval logging at decision time, not reconstructed later
  • Synchronous and asynchronous approval patterns depending on workload criticality
  • Layered policy controls across users, data, agents, and write scopes