Secure Agentic AI Frameworks
Context
Agentic AI is the step change from AI that answers questions to AI that takes actions. An agentic system plans workflows, calls tools, makes decisions, and keeps going until the objective is met -- without waiting for human input at each step. The frameworks enabling this (LangChain, CrewAI, AutoGen, and others) have moved from prototypes to enterprise adoption in under two years. The security challenge is fundamentally different from securing a web application. An agent is an autonomous decision-maker that can reach enterprise tools, databases, and APIs. The attack surface is not the agent itself -- it is everything the agent can touch. A single over-privileged tool connection can expose the entire organisation.Summary
This pattern fills the gap between SP-027 (individual agent security) and SP-045 (AI governance). It provides the enterprise infrastructure and programme to run agentic AI safely at scale -- answering the question: what are the baseline security requirements? Seven control areas: (1) where agents run and how they are isolated, (2) which tools agents may use and how those are governed, (3) what guardrails prevent agents from causing harm, (4) how data flows through agent pipelines, (5) how agents communicate in multi-agent systems, (6) how costs and autonomous decision-making are bounded, and (7) how agent failures are detected and contained. Three governance principles cut across all seven areas. First, business owners -- not developers -- must decide what agents may do independently and what requires human approval; without this, agents default to full autonomy. Second, agents behave differently on every run, so security validation cannot rely on pass/fail tests alone -- it requires ongoing monitoring. Third, ungoverned agent deployments are the new shadow IT: when teams run agents outside the managed platform, they bypass every control this pattern establishes.Click any control badge to view its details. Download SVG
Key Control Areas
- Agent Execution Isolation and Environment Hardening (SC-07, CM-07, SC-39, AC-06): The most fundamental security decision is where agents run and what they can reach. SC-07 (Boundary Protection) is the anchor: every agent execution environment must be a defined security zone with explicit ingress and egress rules. Agents should run in ephemeral containers with read-only root filesystems, no persistent storage beyond what is explicitly provisioned, and network policies that whitelist only the specific endpoints the agent needs. CM-07 (Least Functionality) requires stripping the agent runtime to the minimum required capabilities: no shell access unless the agent's function requires it, no outbound internet unless specific URLs are whitelisted, no access to the container orchestration API. SC-39 (Process Isolation) ensures each agent session runs in its own process space -- one compromised agent session must not be able to access the memory, filesystem, or network connections of another. AC-06 (Least Privilege) applies at every layer: the container runs as a non-root user, the service account has minimal IAM permissions, API tokens are scoped to specific operations and expire after the session. For high-risk agent deployments, consider gVisor or Firecracker-level isolation rather than standard container boundaries. The execution environment must be reproducible and auditable: infrastructure-as-code definitions, immutable images, and cryptographic verification of the runtime stack.
- Tool Registry and Plugin Governance (CM-08, CM-03, SA-04, SA-11, AC-03): Agentic AI frameworks are built on tools -- functions that agents invoke to interact with the world. LangChain tools, CrewAI tools, AutoGen functions, and custom MCP (Model Context Protocol) servers are the enterprise's new API surface, and they require the same governance rigour as any other integration. CM-08 (System Component Inventory) requires a centralised tool registry: every tool available to any agent must be catalogued with its name, description, owner, risk rating, data classification, approved use cases, and the set of agents authorised to invoke it. CM-03 (Configuration Change Control) governs tool lifecycle: new tools go through a security review before being added to the registry; tool updates are tested in staging before promotion to production; deprecated tools are disabled on a defined timeline. SA-04 (Acquisition Process) applies to third-party tools and MCP servers: evaluate the security posture of tool providers, review source code for community tools, and assess what data the tool processes and where. SA-11 (Developer Security Testing) requires that every custom tool undergoes security testing: input validation, authorisation checks, error handling, and injection resistance. AC-03 (Access Enforcement) ensures tools enforce their own access controls and do not rely solely on the agent's claimed permissions. The OWASP ASI-02 (Tool Misuse) risk is addressed by combining tool-level access controls with agent-level authorisation -- the agent must be authorised to use the tool, and the tool must independently verify the request is valid.
- Guardrails Architecture (SI-10, AC-04, CM-02, SC-07): Guardrails are the technical enforcement layer that prevents agents from causing harm, distinct from the policy layer (SP-045) that defines what harm means. SI-10 (Information Input Validation) applies to both agent inputs and outputs: every message entering an agent must be screened for prompt injection, jailbreak attempts, and adversarial payloads; every action an agent proposes must be validated against a policy before execution. AC-04 (Information Flow Enforcement) prevents data from flowing where it should not: PII must not be sent to external APIs, confidential documents must not be summarised in logs, credentials must not appear in agent responses. CM-02 (Baseline Configuration) defines the guardrail ruleset: what topics the agent may discuss, what actions it may take, what data classifications it may process, what output formats are permitted, and what escalation paths exist when the agent encounters a boundary. SC-07 (Boundary Protection) enforces the guardrails at the infrastructure level: network policies prevent the agent from reaching disallowed endpoints even if the guardrail software is bypassed. Implement guardrails at multiple layers -- input filtering, output filtering, action approval, and infrastructure enforcement -- because no single layer is sufficient. NVIDIA NeMo Guardrails, Guardrails AI, and custom policy engines can provide the application-layer enforcement, but these must be complemented by infrastructure controls that the agent cannot circumvent. The OWASP ASI-01 (Agent Goal Hijack) and ASI-11 (Guardrail Bypass) risks require that guardrails are tested adversarially: red team the guardrails to verify they resist sophisticated bypass attempts, not just obvious ones.
- RAG Pipeline and Data Store Security (SC-28, AC-04, SI-10, CM-08): Retrieval-Augmented Generation (RAG) is the dominant architecture for grounding agents in enterprise data: documents are chunked, embedded into vectors, stored in a vector database, and retrieved at query time to provide context for the agent's responses. Every step in this pipeline is an attack surface. SC-28 (Protection of Information at Rest) requires encryption of vector stores, document stores, and embedding caches. The vector database is not just a cache -- it is a structured representation of enterprise knowledge and must be protected accordingly. AC-04 (Information Flow Enforcement) controls what data enters the RAG pipeline: documents must be classified before ingestion, and retrieval must be filtered by the querying agent's authorisation level -- an agent assisting a junior analyst must not retrieve board-level strategy documents even if they exist in the same vector store. SI-10 (Information Input Validation) addresses RAG poisoning: an attacker who can inject or modify documents in the source corpus can influence agent behaviour at scale. Validate document provenance, implement integrity checks on the ingestion pipeline, and monitor for unexpected changes to the document corpus. CM-08 (System Component Inventory) requires a data source registry: every document collection, database, API, and knowledge base feeding into RAG pipelines must be catalogued with its data classification, owner, and refresh cadence. Embedding models are themselves a supply chain dependency -- an embedding model that produces subtly biased vectors can influence retrieval without any visible change to the source documents.
- Multi-Agent Communication and Trust (AC-04, AC-05, IA-04, AU-03): Multi-agent systems -- where a planner agent delegates tasks to specialist agents, or a team of agents collaborates on a complex workflow -- introduce trust boundaries that single-agent architectures do not have. AC-04 (Information Flow Enforcement) governs what data agents can share: a code-generation agent should not have access to the context of a financial-analysis agent operating in the same framework instance. AC-05 (Separation of Duties) ensures that the agent proposing an action is not the same agent approving it -- in CrewAI's terminology, the agent that writes code should not be the agent that approves the pull request. IA-04 (Identifier Management) requires that every agent in a multi-agent system has a distinct identity that persists across interactions and can be correlated in audit trails -- anonymous agents in a swarm cannot be held accountable. AU-03 (Content of Audit Records) must capture the full delegation chain: which agent initiated the workflow, which sub-agents were spawned, what context was passed between them, and what each agent's contribution was to the final output. Delegation chain security requires specific controls: sub-agents must inherit at most the permissions of their parent orchestrator -- never exceeding them. Privilege cannot accumulate as tasks move through the agent pipeline: an orchestrator with read-only file access cannot spawn a sub-agent with write access. Delegation events must be logged with the orchestrator identity, sub-agent identity, and the context passed between them, enabling forensic reconstruction of how permissions flowed. For separation of duties in multi-agent pipelines, the agent that proposes an action should never be the agent that approves it -- this is the agentic equivalent of four-eyes approval, enforced at the identity and context level. Individual LLM-layer identity controls (API key scoping per agent, token rotation) are addressed in SP-027; this pattern addresses the trust relationships between agents in an orchestrated system. The OWASP ASI-10 (Cross-Agent Trust Exploitation) risk is acute: if one agent in a multi-agent system is compromised through prompt injection in a tool output, it can potentially inject malicious instructions into the shared context that influence all other agents. Implement context isolation between agents -- each agent should receive only the inputs explicitly passed to it, not the full conversation history of the orchestrating agent. Treat inter-agent messages as crossing a trust boundary, with the same validation applied to them as to external inputs.
- Cost, Resource, and Autonomy Governance (AC-06, AU-02, PM-09, CA-07): Agentic AI systems can consume resources in ways that traditional applications cannot. An agent caught in a reasoning loop can make thousands of API calls in minutes. A multi-agent system spawning sub-agents recursively can exhaust compute budgets. An agent with code execution capabilities can allocate cloud resources. AC-06 (Least Privilege) must include resource constraints: maximum API calls per session, maximum tokens per request, maximum execution time per task, maximum cost per workflow. AU-02 (Event Logging) must capture resource consumption: token usage, API call counts, execution duration, and estimated cost per agent session. PM-09 (Risk Management Strategy) must define the organisation's risk appetite for autonomous agent actions -- what is the maximum financial exposure from a single agent workflow before human approval is required? CA-07 (Continuous Monitoring) must include real-time alerting on resource consumption: an agent exceeding its expected token budget by 3x triggers investigation; an agent making API calls to an endpoint it has never previously called triggers review. The OWASP ASI-09 (Excessive Agent Autonomy) risk is addressed by defining explicit autonomy boundaries for each agent class: what decisions the agent can make independently, what requires notification, and what requires approval. Circuit breakers must be implemented at the infrastructure level: if an agent exceeds defined thresholds for cost, duration, API calls, or error rate, the runtime automatically pauses the workflow and alerts the operator.
- Agent Lifecycle and Deployment Management (CM-03, CM-04, SA-11, RA-03, CM-02): Agentic AI workflows are software and must follow software lifecycle governance -- but with additional considerations for non-deterministic behaviour. CM-03 (Configuration Change Control) governs agent deployment: agent prompts, tool configurations, guardrail rules, and model selections are versioned and deployed through the same CI/CD pipeline as other enterprise software. CM-04 (Impact Analysis) requires that every agent change is assessed for downstream impact: a prompt change may alter the agent's tool selection behaviour in ways that are not obvious from reading the prompt. SA-11 (Developer Security Testing) must include agent-specific testing: adversarial prompt testing, tool interaction fuzzing, guardrail bypass testing, and multi-step workflow validation. Standard unit tests are necessary but insufficient -- agent behaviour is probabilistic, so testing must include statistical validation across multiple runs. RA-03 (Risk Assessment) must be performed before deploying any new agent workflow to production: what data can this agent access, what actions can it take, what is the worst-case outcome if the agent behaves unexpectedly, and what compensating controls are in place? CM-02 (Baseline Configuration) defines the approved agent configurations: pinned model versions, locked tool sets, validated prompts, and tested guardrail rules. Promote configurations through environments (development, staging, production) with approval gates at each transition. Agent retirement is equally important: when a workflow is decommissioned, revoke the agent's credentials, remove its tool access, and archive its audit trail.
- Agent Incident Response and Failure Containment (IR-04, IR-06, IR-01, SI-04): When an agentic AI system fails -- producing incorrect outputs, taking unintended actions, leaking data, or being exploited through prompt injection -- the response must be swift and specific. IR-04 (Incident Handling) must include agent-specific runbooks: how to stop a running agent workflow immediately (kill switch), how to identify what actions the agent has already taken (audit trail review), how to assess the blast radius (what data was accessed, what tools were invoked, what external systems were contacted), and how to remediate (roll back changes, revoke credentials, notify affected parties). IR-06 (Incident Reporting) must include criteria for when an agent failure constitutes a reportable incident: an agent accessing data above its clearance, an agent contacting an external endpoint not on its allowlist, an agent producing outputs that violate the organisation's content policy, or an agent bypassing guardrails. IR-01 (Incident Response Policy and Procedures) must be updated to address agent-specific scenarios that traditional IR playbooks do not cover: what happens when an agent modifies production data incorrectly, when an agent sends an email or message on behalf of the organisation, or when a multi-agent system produces a cascading failure? SI-04 (System Monitoring) provides the detection layer: anomaly detection on agent behaviour patterns, real-time comparison of agent actions against expected workflows, and automated alerting when agents deviate from their baseline. Every agent deployment must have a documented kill switch: a mechanism to immediately halt the agent, revoke its access, and preserve its state for forensic analysis.
When to Use
The organisation is evaluating or deploying agentic AI frameworks (LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel, Bedrock Agents) for production use cases. AI agents will have access to enterprise tools, APIs, databases, or code execution environments. Multi-agent architectures are planned where agents collaborate, delegate, or compete on complex workflows. The organisation needs to demonstrate to regulators, auditors, or customers that agentic AI deployments are governed and controlled. Customer-facing AI agents will take actions on behalf of customers (placing orders, modifying accounts, processing transactions). Internal AI agents will have access to sensitive data (financial records, customer PII, intellectual property, source code). The organisation wants to enable rapid experimentation with agentic AI while maintaining enterprise security baselines.
When NOT to Use
Organisation uses AI only for simple prompt-response interactions with no tool access, no autonomous actions, and no multi-step workflows -- SP-027 alone provides sufficient coverage. All AI usage is confined to managed SaaS products (ChatGPT Enterprise, Microsoft Copilot, GitHub Copilot) where the agent infrastructure is the provider's responsibility. The organisation has no plans to build custom agentic workflows or deploy open-source agent frameworks. Note: if the organisation uses any tool-augmented AI (even a single agent with file access or API access), a subset of this pattern's controls becomes relevant.
Typical Challenges
The most common challenge is the pace of framework evolution: LangChain, CrewAI, and AutoGen release breaking changes frequently, and security controls built against one version may not function correctly after an upgrade. Framework supply chain risk is significant -- these frameworks have deep dependency trees with hundreds of transitive packages, and vulnerability scanning of agentic framework dependencies reveals a higher CVE density than mature enterprise middleware. Non-deterministic agent behaviour makes testing fundamentally harder than deterministic software: the same prompt and tools may produce different action sequences on different runs, requiring statistical testing approaches rather than binary pass/fail assertions. Multi-agent communication lacks standardised protocols -- each framework implements inter-agent messaging differently, making it difficult to apply uniform security controls across heterogeneous agent architectures. Vector database security is immature: most vector databases prioritise performance over access control, and fine-grained retrieval authorisation (ensuring an agent only retrieves documents it is authorised to see) must typically be implemented at the application layer. Developer enthusiasm for agentic AI often outpaces security review capacity, leading to shadow agent deployments that bypass the governed platform. Cost unpredictability is a practical concern: agent workflows that reason extensively before acting can consume 10-100x more tokens than expected, and without circuit breakers, a single misbehaving workflow can generate significant API costs.
Threat Resistance
This pattern directly addresses the OWASP Top 10 for Agentic Applications. Agent Goal Hijack (ASI-01) is mitigated through multi-layer guardrails with adversarial testing, ensuring that prompt injection in tool outputs or retrieved documents cannot redirect the agent's objective. Tool Misuse (ASI-02) is prevented through the centralised tool registry with per-tool access controls and input validation, ensuring agents can only invoke approved tools with valid parameters. Cross-Agent Trust Exploitation (ASI-10) is addressed through context isolation between agents, treating inter-agent messages as trust boundaries with the same validation applied to external inputs. RAG Poisoning is mitigated through document provenance validation, ingestion pipeline integrity checks, and retrieval-level authorisation filters on the vector store. Cascading Hallucination (ASI-04) is addressed through output validation at each step of multi-agent workflows, preventing one agent's incorrect output from propagating unchecked through the pipeline. Runaway resource consumption is controlled through cost circuit breakers, token budgets, and execution time limits enforced at the infrastructure level. Framework supply chain compromise is mitigated through dependency scanning, pinned framework versions, and reproducible agent runtime builds. Shadow agent deployment is detected through the agent inventory requirement and network monitoring for unauthorised agent-to-model-provider traffic.
Role Responsibilities
Agentic AI uniquely blurs traditional role boundaries. A developer writing an agent prompt is simultaneously defining security policy (guardrail rules), making architectural decisions (tool selection, data access), and encoding business logic (autonomy limits, escalation paths). A business owner who fails to define action tiers leaves the developer to guess what the agent should do autonomously — and the developer will default to maximum autonomy because it is easier to build. The following matrix maps primary accountability for each control area. Bold items highlight responsibilities most commonly missed or misassigned.
Agent Execution Isolation (SC-07, CM-07, SC-39, AC-06)
Implements container configurations and infrastructure-as-code. Builds and signs runtime images. Integrates sandbox tooling (gVisor, Firecracker).
Selects isolation model and tier. Defines network policies, egress allowlists, container lifetime, and resource limits.
No direct accountability. Isolation tier may affect cost and performance trade-offs that require business input for high-assurance deployments.
Tool Registry & Governance (CM-08, CM-03, SA-04, SA-11, AC-03)
Builds custom tools and submits to registry with documentation. Writes and executes security tests for each tool.
Defines tool risk classification scheme. Reviews tool security posture before registry approval. Sets per-agent tool access policies.
Approves which business systems and data classifications agents may access through tools. Owns the decision of whether a tool exposing customer data or financial systems is acceptable for agent use.
Guardrails Architecture (SI-10, AC-04, CM-02, SC-07)
Implements guardrail rules in code. Integrates guardrail engines (NeMo, Guardrails AI). Conducts adversarial testing of guardrail effectiveness.
Designs multi-layer guardrail architecture (input, output, action, infrastructure). Defines enforcement points and bypass-resistant layering.
Defines what agents must never do. The guardrail ruleset encodes business policy — prohibited topics, prohibited actions, acceptable data handling, escalation thresholds. Without explicit business owner input, developers encode their own assumptions about acceptable behaviour.
RAG Pipeline Security (SC-28, AC-04, SI-10, CM-08)
Builds ingestion and retrieval pipelines. Implements document-level access filtering, provenance tracking, and integrity checks.
Designs access control model for vector stores. Defines classification-to-agent-role mapping and data flow rules.
Classifies source data and approves which document collections may feed which agent workflows. An unclassified corpus ingested into a shared vector store is an uncontrolled data exposure.
Multi-Agent Trust (AC-04, AC-05, IA-04, AU-03)
Implements inter-agent authentication, context isolation, and delegation chain logging.
Designs the delegation and trust model — permission inheritance rules, separation of duties enforcement, context boundary definitions.
Approves delegation chains for high-risk workflows — which agents may delegate to which others, and what actions the delegation chain may ultimately perform.
Cost & Autonomy Governance (AC-06, AU-02, PM-09, CA-07)
Implements circuit breakers, token budgets, cost tracking, and alerting integrations.
Sets resource limits per agent class. Defines alerting thresholds and circuit breaker triggers.
Defines the autonomy tiers — which decisions agents make independently, which require notification, which require human approval before execution. Sets cost appetite per workflow and maximum financial exposure before human approval is required. This is the single most important business owner responsibility in agentic AI: without explicit tier definitions, agents default to full autonomy.
Lifecycle & Deployment (CM-03, CM-04, SA-11, RA-03, CM-02)
Owns CI/CD pipeline for agent artefacts (prompts, tool configs, guardrails, model selections). Runs adversarial and statistical testing. Manages agent retirement.
Defines promotion gates and approval requirements per environment. Sets baseline configuration standards and testing coverage requirements.
Signs off on production deployment of new agent capabilities. Reviews risk assessment outputs for agent workflows accessing sensitive data or acting on behalf of customers.
Incident Response (IR-04, IR-06, IR-01, SI-04)
Implements kill switches. Preserves agent state for forensic analysis. Maintains and tests incident runbooks.
Designs agent-specific incident response procedures. Defines blast radius assessment methodology. Integrates agent monitoring with SOC workflows.
Defines when an agent failure becomes a reportable incident — criteria for regulatory notification, customer communication, and escalation to executive leadership. Owns the decision of what constitutes acceptable vs. unacceptable agent behaviour post-incident.
The business owner's most critical responsibilities are defining what agents may do (autonomy tiers), what data they may access (classification), and what constitutes failure (incident criteria). These are business decisions that cannot be delegated to developers or architects. If the business owner does not make them explicitly, the developer makes them implicitly — and will default to maximum autonomy and broadest data access because these are easier to implement.
Assumptions
The organisation has decided to adopt one or more agentic AI frameworks (LangChain, CrewAI, AutoGen, LangGraph, Semantic Kernel, Amazon Bedrock Agents, or similar) for business-critical or business-supporting workflows. A cloud or on-premises container platform is available for agent execution (Kubernetes, ECS, or equivalent). SP-027 (Secure AI Integration) controls are implemented or being implemented for individual agent security. SP-045 (AI Governance) is established or planned for AI management governance. The organisation has software engineering and DevOps capability to implement infrastructure-as-code, CI/CD pipelines, and monitoring for agent workloads. Budget exists for vector databases, guardrail tooling, monitoring infrastructure, and agent-specific security tooling. The organisation's security team has or is developing competence in AI/ML security -- this is a specialist domain that cannot be fully addressed by traditional application security skills alone.
Developing Areas
- Model Context Protocol (MCP) security: Anthropic's MCP has become the de facto standard for connecting AI agents to external tools and data sources, with thousands of MCP servers deployed across enterprises. The protocol defines a structured interface for tool discovery, invocation, and result handling, and its security model has matured significantly since initial release — but critical gaps remain that enterprises must address. The MCP threat landscape divides along a critical architectural boundary: local versus remote servers. Local MCP servers (stdio transport) run on the user's machine with the user's privileges, creating a code execution risk if the server source is untrusted — enterprises must display exact commands before execution, require explicit consent with clear warnings, highlight dangerous patterns (sudo, rm -rf, network operations), and execute in sandboxed environments. Remote MCP servers (HTTP/SSE transport) face the full spectrum of web application threats plus MCP-specific attack vectors: tool poisoning via metadata manipulation (T-AAF-013), confused deputy attacks through credential reuse (T-AAF-014), SSRF via OAuth discovery URLs (T-AAF-015), and token passthrough that breaks accountability (T-AAF-016). For remote MCP servers, the specification mandates OAuth 2.1 with PKCE, but critical controls remain implementation-dependent. Enterprises must enforce: per-client consent before third-party authorisation flows, exact redirect URI validation (no wildcards, no pattern matching), audience-bound tokens (tokens issued for one MCP server must be rejected by all others), and SSRF protection on all OAuth discovery and callback URLs — block private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.169.254), enforce HTTPS for all non-loopback connections, and pin DNS resolution between check and use. Tool poisoning (T-AAF-013) is an MCP-specific threat with no direct analogue in traditional API security: tool descriptions and parameter schemas are consumed directly by the LLM, and an attacker controlling an MCP server can embed adversarial instructions invisible in the UI but influential in the model's reasoning. Scope minimisation is critical: request minimal baseline scopes and implement incremental elevation via WWW-Authenticate challenges rather than requesting all permissions at connection time. Auto-run behaviour — where clients automatically execute tool calls without user confirmation — must be disabled by default in enterprise deployments, as it implicitly trusts all responses and maximises blast radius from a compromised server.
- Agent-to-agent protocol standardisation: The AI industry lacks a standard protocol for secure multi-agent communication. LangGraph uses graph-based state passing, CrewAI uses role-based delegation, AutoGen uses conversation-based messaging, and Amazon Bedrock uses event-driven orchestration. This fragmentation means security controls must be framework-specific, increasing the cost and complexity of securing heterogeneous agent environments. Proposals for standardised agent communication protocols (Google's A2A, Anthropic's MCP extensions) are emerging but not yet mature enough for enterprise adoption.
- Agentic AI in regulated industries: Financial services, healthcare, and government agencies face additional constraints when deploying agentic AI. DORA (EU Digital Operational Resilience Act) requires that ICT third-party risk management covers AI agent infrastructure. FINMA expects Swiss banks to apply operational risk controls to autonomous AI systems. The FDA is developing guidance for AI agents in clinical decision support. These regulatory requirements add a compliance dimension to the security architecture that generic framework guidance does not address. Organisations in regulated industries should expect to demonstrate that their agentic AI deployments satisfy industry-specific resilience, auditability, and transparency requirements.
- Hardware-level agent isolation: Current agent isolation relies on container boundaries, which provide process-level separation but share the host kernel. For high-assurance environments, hardware-level isolation using confidential computing (AMD SEV, Intel TDX, ARM CCA) can provide cryptographic guarantees that agent workloads are isolated from each other and from the infrastructure operator. This is particularly relevant for multi-tenant agent platforms where agents from different customers or business units run on shared infrastructure. The performance overhead of confidential computing is decreasing but remains non-trivial for latency-sensitive agent workflows.
- Agent observability and tracing standards: Distributed tracing for agentic AI (tracking a request through planning, tool invocation, sub-agent delegation, and response generation) is emerging but not standardised. OpenTelemetry is being extended for AI agent workflows, and frameworks like LangSmith (LangChain), AgentOps, and Arize provide agent-specific observability, but interoperability between these tools is limited. Enterprises operating multiple agent frameworks need a unified observability layer to correlate agent actions across frameworks, detect anomalies, and perform forensic analysis of agent incidents.