Keywords
Agentic Artificial Intelligence, Autonomous Systems, Multi-Agent Systems, Memory-Augmented Reasoning, Threat Modeling, Secure Execution, Lifecycle Control, AI Governance
Agentic Artificial Intelligence systems, characterized by autonomous reasoning, memory augmentation, and adaptive planning, are rapidly reshaping technological landscapes. Unlike traditional AI or large language models, agentic AI integrates decision-making with persistent execution, enabling complex interactions across dynamic environments. However, this evolution introduces novel security risks, governance challenges, and ethical considerations that current frameworks inadequately address. This survey provides a cross-layer review of agentic AI, encompassing architectural paradigms, threat taxonomies, and governance strategies. It consolidates findings from adjacent domains such as cybersecurity, AI safety, multi-agent coordination, and ethics, offering a holistic understanding of vulnerabilities and mitigation approaches. We integrate insights from recent advances in defense architectures and governance innovations, highlighting the limitations of static policies in addressing dynamically evolving threats. Real-world deployments from industrial automation to military and policy applications reveal both successful integrations and notable failures, underscoring the urgency of resilient oversight mechanisms. Furthermore, we identify critical research gaps in benchmarking, memory integrity, adversarial defense, and normative embedding, emphasizing the need for interdisciplinary collaboration to develop adaptive, accountable, and transparent systems. This review serves as a narrative synthesis rather than a systematic literature review, aiming to bridge technical, governance, and ethical perspectives. By integrating cross-disciplinary findings, it lays the foundation for future research on securing, aligning, and governing agentic AI in real-world contexts. Ultimately, this work calls for cooperative innovation to ensure that agentic AI evolves as a trustworthy, accountable, and beneficial technology.
The rapid emergence of agentic AI systems, AI agents endowed with memory, reasoning, planning, and tool-use capabilities, represents a paradigm shift from traditional machine learning and static decision-support models. These systems are increasingly deployed in domains where autonomous decision-making interacts with dynamic, high-stakes environments such as healthcare, critical infrastructure, and cybersecurity. While their autonomy promises unprecedented efficiency and innovation, it also introduces novel risks that challenge existing frameworks for safety, ethics, and governance.1 From a security perspective, agentic AI increases the attack surface. Autonomous decision-making enables new forms of adversarial manipulation, including cognitive exploits, stealth execution, and knowledge poisoning. Conventional layered security models, originally designed for static computing architectures, are inadequate for defending adaptive, distributed agents. Researchers argue that cross-layer security strategies integrating hardware, software, and governance measures are necessary to address these vulnerabilities holistically.2
The concept of “trustworthiness” itself is contested. Scholars Conradie & Nagel3 and Freiman4 caution against anthropomorphizing AI with human attributes such as “trust” and “responsibility,” noting that these qualities must instead be framed as properties of socio-technical systems that include human oversight and institutional accountability. This highlights the need to shift the focus from asking whether AI itself can be “trusted” to how we can build systems that support human-centered trust relationships through technical safeguards and governance. Governance frameworks such as the EU AI Act, NIST’s AI Risk Management Framework, and ISO/IEC standards have laid initial foundations, but they lack granularity for managing agentic systems that self-adapt, collaborate, and act semi-independently. Integrating principles of zero-trust architectures, explainable AI, and adaptive oversight mechanisms is now seen as crucial for aligning agentic AI with societal expectations of accountability and safety.5,6 Finally, real-world deployments, from national crisis response to autonomous cybersecurity, demonstrate both the potential and fragility of agentic AI. Cases of unanticipated failures, bias amplification, and adversarial exploitation underscore the urgency of developing a cross-layer understanding that integrates architecture, threats, and governance strategies.7 In light of these challenges, this review is motivated by the need to bridge technical insights with ethical and regulatory perspectives, offering a holistic framework to guide both researchers and policymakers in building trustworthy agentic AI systems.
This review adopts a narrative review methodology rather than a systematic literature review (SLR). Unlike SLRs, which employ rigid inclusion and exclusion criteria, a narrative review enables a broad, integrative synthesis across multiple disciplines. This flexibility is essential for agentic AI, where developments in architectures, security threats, and governance evolve rapidly and often emerge outside traditional peer-reviewed channels, including industry white papers and policy documents.8
The scope of this work spans technical, ethical, and regulatory dimensions, providing a cross-layer perspective on:
• Agentic AI Architectures: including mono-agent, multi-agent, federated, and blockchain-enabled systems, with a focus on how these architectures influence trustworthiness.
• Threat Models and Security Risks: covering cognitive exploits, knowledge poisoning, prompt injection, stealth execution, and cross-layer propagation vulnerabilities.
• Governance and Oversight Mechanisms: analyzing legal and regulatory frameworks such as the EU AI Act and the NIST AI Risk Management Framework, alongside ethical norms and lifecycle accountability approaches.
• Defense Strategies and Risk Mitigation: reviewing zero-trust frameworks, cryptographic identity mechanisms, and layered defense strategies for resilient deployments.
• Real-World Deployments: evaluating industrial and governmental use cases, security incidents, and lessons learned for future deployments.
The literature reviewed draws from AI safety, cybersecurity, governance, ethics, and distributed systems, ensuring an interdisciplinary lens.9 Unlike prior reviews that focus narrowly on either technical mechanisms or policy considerations, this review integrates both dimensions to reveal emerging gaps in aligning technical safeguards with governance strategies.10 Furthermore, this review includes insights from adjacent domains such as multi-agent coordination, cybersecurity resilience, and human-centered AI ethics to map a more comprehensive landscape of trust challenges and mitigation strategies.11 Synthesizing this diverse body of knowledge offers a holistic foundation for researchers, practitioners, and policymakers seeking to understand and secure the future of agentic AI.
This review makes four key contributions by consolidating insights across technical, adversarial, and governance layers to address the trustworthiness of agentic AI systems.
1. Integration of Cross-Layer Perspectives: Unlike prior studies that analyze AI trustworthiness through isolated lenses (technical or ethical), this review integrates findings across architectures, threats, and governance, offering a comprehensive cross-layer framework. This approach aligns with recent calls for merging hardware/software security with policy oversight to address complex AI risks.2
2. Development of a Layered Threat Taxonomy: The paper introduces a novel taxonomy that categorizes risks specific to agentic AI, including cognitive exploits, shadow agent emergence, and cross-layer propagation vulnerabilities. This taxonomy extends beyond traditional adversarial machine learning, incorporating threats identified in recent cybersecurity research.7,12
3. Synthesis of Governance with Technical Safeguards: This review connects policy frameworks, such as the EU AI Act and ISO/IEC governance models, with technical defense strategies such as zero-trust architectures and explainable AI. This synthesis provides actionable guidance for designing systems that are both technically secure and aligned with societal expectations.5,6
4. Identification of Research Gaps and Future Directions: Finally, this paper highlights critical gaps such as lifecycle accountability, benchmarking of agentic AI safety, and federated governance risks, and proposes a roadmap for future research. These findings aim to inspire interdisciplinary collaboration to close existing gaps between technology, security, and regulation.9
Collectively, these contributions offer a holistic foundation for understanding and securing agentic AI, guiding both technical innovations and governance frameworks for real-world deployment.
The remainder of this paper is organized to progressively build a cross-layer understanding of trustworthy agentic AI, beginning with its methodological foundations and advancing toward governance and future research directions. Section 2 outlines the narrative review methodology, describing the sources, search strategy, inclusion rationale, and the domains considered, while also contrasting this approach with prior surveys to highlight the novelty of this work. Section 3 establishes the technical foundations of agentic AI by defining its distinguishing features, including memory-augmented reasoning, planning capabilities, and interaction with adjacent research areas such as AI safety and distributed systems. Building on this, Section 4 explores architectural paradigms, from mono-agent to blockchain-enabled systems, and provides a comparative evaluation that emphasizes their strengths and limitations in terms of trustworthiness. Section 5 develops a layered threat taxonomy, mapping cognitive exploits, knowledge poisoning, stealth execution, and cross-layer propagation risks, while integrating insights from cybersecurity and adversarial machine learning literature. Section 6 shifts focus to governance frameworks, reviewing existing regulatory approaches, identifying gaps unique to agentic systems, and drawing lessons from adjacent domains like robotics and cybersecurity governance. Section 7 examines real-world deployments, including industrial, governmental, and policy-driven use cases, and reflects on both successful implementations and documented failures. Section 8 discusses defense architectures and oversight models, evaluating mechanisms such as layered security frameworks, zero-trust architectures, and cryptographic identity enforcement, while offering a comparative analysis of their effectiveness. Section 9 synthesizes the findings to identify open research challenges, including goal alignment, auditability, and institutional readiness, and proposes future directions to bridge these gaps. Finally, Section 10 concludes by summarizing key insights, presenting a forward-looking perspective on the evolution of trustworthy agentic AI, and emphasizing the need for interdisciplinary collaboration to ensure safe and accountable deployment. This structured progression from foundations to threats, governance, real-world applications, and future outlook ensures that readers gain a comprehensive understanding of the multifaceted issues surrounding agentic AI trustworthiness.
Key distinguishing features of agentic AI systems are summarized in Table A2, while architectural comparisons are provided in Table A3. Additionally, a taxonomy of emerging threats is outlined in Table A4 (Supplementary Material).
Given the interdisciplinary nature of agentic AI, this review adopts a narrative approach to identify and synthesize relevant literature rather than applying rigid inclusion rules. The search process was designed to capture technical, security, and governance perspectives, allowing the integration of diverse insights from multiple domains. Academic databases such as IEEE Xplore, ACM Digital Library, SpringerLink, ScienceDirect, and arXiv were the primary sources, complemented by policy reports from organizations including the OECD, NIST, and the European Commission. To ensure coverage of cutting-edge developments, recent conference proceedings such as NeurIPS, ICML, and AAAI were also reviewed.13
The search strategy combined keyword clusters such as “agentic AI,” “autonomous agents,” “multi-agent systems,” “cross-layer security,” “trustworthy AI,” “AI governance,” and “threat modeling.” Boolean operators and field-specific terms were applied to maximize the retrieval of high-quality and contextually relevant articles. The selection was not limited to peer-reviewed journals; influential technical white papers and government publications were included where they provided substantial insights into emerging practices or regulatory frameworks.14
Articles were included based on relevance to the cross-layer trustworthiness of agentic AI, covering themes of architectural design, threat taxonomy, governance, and ethical oversight. No strict temporal filter was applied; however, priority was given to literature from the last five years to reflect rapid technological advances. Older works were retained where they provided foundational theoretical frameworks. Unlike systematic reviews, which rely on predefined inclusion thresholds, this narrative review allows the inclusion of conceptually significant studies even if they fall outside narrow search criteria.15 Finally, to address emerging debates, grey literature such as industrial threat reports, AI safety guidelines, and open-source datasets was selectively integrated where it contributed unique evidence not yet present in academic publications.16 This multifaceted strategy ensures the survey encompasses both well-established theories and cutting-edge practices shaping the discourse on trustworthy agentic AI.
The inclusion of literature in this survey was guided by conceptual relevance rather than rigid filtering, consistent with the narrative review methodology. Rather than applying standardized exclusion protocols characteristic of systematic reviews, this study adopted a flexible, rationale-driven selection process that allowed the incorporation of diverse perspectives spanning technical, ethical, and governance dimensions. This approach is particularly appropriate for agentic AI, where developments often emerge from interdisciplinary intersections and non-traditional publication channels.15
Sources were considered relevant if they contributed substantively to at least one of the following dimensions:
• Architectural Foundations: papers offering insights into agentic architectures, multi-agent systems, or distributed designs, including blockchain-enabled or federated models.
• Threat Models and Security Risks: studies that examined adversarial techniques, cross-layer propagation of attacks, or security vulnerabilities specific to autonomous agents.
• Governance and Ethical Oversight: literature addressing regulatory frameworks, ethical principles, or lifecycle accountability mechanisms for AI systems.
• Defense Mechanisms and Mitigation Strategies: research proposing zero-trust models, layered defense frameworks, or cryptographic identity enforcement approaches relevant to agentic AI security.
• Real-World Deployments: case studies, industry reports, or empirical analyses documenting the successes and failures of agentic AI deployments in practice.
Priority was given to peer-reviewed publications from recognized journals and conferences, particularly those published in the last five years, reflecting the fast-evolving nature of this field. However, seminal works, regardless of publication year, were included when they provided foundational theoretical or methodological contributions.17 In addition, high-impact grey literature such as policy briefs, technical white papers, and reports from AI governance bodies was selectively incorporated to capture perspectives not yet reflected in academic discourse.16 Studies that focused exclusively on narrow domains, such as standard supervised learning or traditional AI ethics, without a direct connection to agentic autonomy, layered security, or governance were excluded. Similarly, sources lacking technical or conceptual rigor (such as opinion articles without evidence) were not retained. This balanced approach ensured the review’s inclusivity while maintaining its academic quality.
This survey spans seven interconnected domains that collectively shape the trustworthiness of agentic AI systems: agentic architectures, cybersecurity and adversarial threats, AI safety, governance frameworks, ethical considerations, real-world deployments, and defense mechanisms. These domains were selected because they form the technical, operational, and normative pillars essential for understanding and mitigating risks associated with autonomous agents.
The first domain, agentic AI architectures, encompasses research on the design and functioning of mono-agent, multi-agent, federated, and blockchain-enabled systems. These architectures define how agents perceive, reason, and act within dynamic environments. Recent works highlight that architectural choices significantly influence security vulnerabilities, coordination strategies, and trust propagation among agents.13 The second domain focuses on cybersecurity and adversarial threats. Agentic AI, due to its autonomous decision-making and interconnected operations, introduces new attack vectors such as cognitive exploits, stealth execution, and cross-layer propagation risks. Studies in adversarial machine learning and zero-trust architectures underscore the need for layered defenses and adaptive security frameworks to counter these evolving threats.2 The third domain, AI safety, addresses the alignment of agentic behavior with human values and intended goals. This includes mitigating risks like reward hacking, goal drift, and emergent behaviors in multi-agent settings. Literature from AI safety research emphasizes the integration of formal verification, runtime monitoring, and explainability mechanisms to ensure predictable and controllable outcomes.9 The fourth domain centers on AI governance and regulatory frameworks. International policies, such as the EU AI Act and NIST AI Risk Management Framework, provide high-level guidelines but often fall short of addressing the adaptive and distributed nature of agentic systems. Recent research advocates for hybrid governance models that combine legal mandates with technical enforcement mechanisms.5 The fifth domain incorporates ethical and socio-technical considerations. Trust in agentic AI is not merely a technical property but a relational construct shaped by human perceptions, institutional accountability, and societal norms. Scholars have warned against anthropomorphizing AI with human-like trust qualities, instead calling for frameworks that prioritize responsible human oversight and equitable power dynamics in AI deployment.3 The remaining two domains, real-world deployments and defense mechanisms, are examined in dedicated sections later in this review (Sections 7 and 8). By synthesizing insights from these seven domains, this review provides a holistic lens to examine both the opportunities and risks associated with agentic AI, offering guidance for secure, ethical, and accountable real-world deployment. Figure 1 presents a mind map of agentic AI domains, illustrating the seven interconnected domains influencing the trustworthiness of agentic AI systems: architectures, cybersecurity, AI safety, governance, ethical considerations, real-world deployments, and defense mechanisms. These domains form the foundation of the cross-layer framework proposed in the review.
Existing surveys on AI trustworthiness have largely focused on either technical mechanisms or policy frameworks, leaving a gap in integrating these perspectives under a unified cross-layer approach. For example, surveys in the domain of cybersecurity and AI have primarily concentrated on adversarial machine learning, intrusion detection, and threat intelligence without addressing how these threats propagate across agentic architectures or interact with governance layers.13 Similarly, reviews from the AI ethics literature tend to emphasize normative principles such as fairness, accountability, and transparency without offering concrete architectural or defensive models applicable to autonomous agents.3
A few recent works have attempted to bridge technical and governance perspectives. For instance, studies on zero-trust architectures in AI security argue for embedding security across multiple layers of AI systems, yet they do not systematically link these mechanisms to agentic AI’s unique properties, such as self-adaptation or collaborative behavior in multi-agent environments.2 Meanwhile, policy-oriented reviews, including those analyzing the EU AI Act and related regulatory frameworks, provide high-level governance principles but lack the technical granularity necessary for implementing safeguards within agentic ecosystems.5 Unlike these prior surveys, the present work adopts a cross-layer narrative perspective, systematically connecting architectural design choices, threat models, and governance strategies. It also incorporates real-world deployment experiences and emerging defense architectures, aspects often overlooked in earlier reviews. Furthermore, this study explicitly integrates adjacent domains such as AI safety, cybersecurity resilience, and robotics governance, creating a broader synthesis that reveals interdependencies between technical risks and institutional responses.9 Flowchart 1 outlines the survey methodology: the narrative review process used in this study, including literature source selection, interdisciplinary integration, and thematic synthesis across technical, ethical, and governance domains. By filling these gaps, this review not only complements but also extends the scope of existing literature, providing a comprehensive framework to guide future research and policy design for trustworthy agentic AI systems.
Agentic AI refers to a class of artificial intelligence systems endowed with autonomy, memory, reasoning, planning, and proactive tool use, enabling them to operate in dynamic environments with minimal human intervention. Unlike traditional AI agents, which are typically task-specific and rule-bound, agentic AI demonstrates goal-directed behavior, the capacity for self-decomposition of complex tasks, and the ability to coordinate with other agents in multi-agent ecosystems.18 These systems integrate persistent memory and adaptive decision-making loops, enabling them to learn continuously and adjust their actions in response to environmental changes. In contrast, Large Language Models (LLMs) such as GPT and similar architectures are primarily predictive models trained to generate responses based on statistical patterns in large datasets. While LLMs have shown remarkable capabilities in natural language understanding and reasoning, they lack true agency: they do not possess intrinsic goals, persistent memory (beyond limited context windows), or the ability to autonomously plan and execute actions in the real world. Recent research, however, demonstrates that LLMs can serve as cognitive cores for agentic systems when augmented with external memory, planning modules, and orchestration layers.19 This hybridization blurs the boundary but does not erase the fundamental distinction: LLMs remain reactive tools unless embedded within an agentic framework that endows them with autonomy.
Traditional AI agents, such as rule-based expert systems or early multi-agent architectures, operate with predefined logic and limited adaptability. Their actions are constrained by fixed decision trees or programmed behaviors, making them ill-suited for open-ended environments. Agentic AI, by contrast, leverages dynamic task decomposition, meta-reasoning, and tool orchestration to perform tasks not explicitly programmed at design time.20 Moreover, agentic systems often operate within multi-agent ecosystems, enabling collective intelligence through cooperation, negotiation, and competition. Recent developments such as the UserCentrix and Agent4EDU frameworks illustrate how agentic AI can combine LLM reasoning with memory-augmented orchestration and multi-agent collaboration to achieve real-world objectives autonomously.21,22 These features position agentic AI as a new paradigm that goes beyond both traditional AI agents and standalone LLMs, introducing unique opportunities as well as security and governance challenges that warrant cross-layer analysis. Figure 2 presents the layered architecture of agentic AI: a conceptual depiction of the layered components of agentic AI systems, including memory, reasoning, planning, and tool-use layers, demonstrating how these components interact to enable autonomy and adaptability.
Agentic AI derives its autonomy and adaptability from four foundational capabilities: memory, reasoning, planning, and tool use. These elements collectively distinguish it from both traditional AI agents and large language models, enabling it to operate proactively in dynamic environments.
Memory is central to agentic AI, allowing agents to store and retrieve information beyond the ephemeral context of traditional LLMs. Persistent memory enables agents to build long-term representations of their environment, user preferences, and past decisions, thereby supporting contextual continuity and more informed action selection.23 Advanced frameworks like UserCentrix demonstrate how memory-augmented reasoning enhances responsiveness and adaptability in real-world applications.24 Reasoning refers to the agent’s ability to interpret complex scenarios, infer hidden relationships, and adapt to novel conditions. Unlike traditional AI, which often relies on static decision rules, agentic AI employs multi-step and reflective reasoning processes, incorporating meta-cognition to evaluate its outputs. Studies have shown that agentic workflows enable emergent reasoning behaviors not observed in static LLMs, enhancing performance in research automation, robotics, and decision support.18 Planning is another hallmark capability, allowing agentic AI to decompose complex objectives into manageable subtasks and execute them sequentially. Modern systems like Magentic-One leverage orchestration agents to dynamically re-plan when errors or unexpected conditions arise, reflecting a robustness absent in conventional agents.25 Planning is not only reactive but also anticipatory, enabling agents to optimize actions based on long-term goals rather than short-term heuristics. Tool use extends the agent’s functionality beyond its intrinsic capabilities. By integrating external APIs, databases, or software tools, agentic AI can interact with diverse environments and perform specialized tasks. Tool orchestration, when combined with reasoning and planning, creates multi-modal and adaptive intelligence that supports dynamic problem solving. This capability has been shown to enhance performance in complex tasks such as automated coding, scientific discovery, and cyber-defense.26 Collectively, these four capabilities form the operational backbone of agentic AI. Their synergy enables systems not only to react to immediate inputs but also to proactively plan, self-correct, and interact with their environment, making them fundamentally more autonomous and potentially more unpredictable than previous AI paradigms. Figure 3 depicts the cognitive architecture workflow of agentic AI, highlighting the integration of memory, reasoning, planning, and tool orchestration to support goal-directed behavior in dynamic environments.
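To make the interplay of these four capabilities concrete, the following minimal sketch shows one memory-grounded plan-and-act cycle in Python. All class, tool, and method names (SimpleAgent, retrieve, plan, the stub tools) are purely illustrative assumptions and are not drawn from any framework cited above.

```python
# Minimal sketch of one agentic decision cycle: memory retrieval, planning,
# and tool use. All names are illustrative, not a real framework.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimpleAgent:
    goal: str
    memory: List[str] = field(default_factory=list)                    # persistent memory store
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def remember(self, observation: str) -> None:
        """Persist an observation so later cycles can reuse it."""
        self.memory.append(observation)

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        """Naive retrieval: return up to k stored memories mentioning the query."""
        hits = [m for m in self.memory if query.lower() in m.lower()]
        return hits[-k:]

    def plan(self) -> List[str]:
        """Decompose the goal into subtasks (here, a trivial fixed decomposition)."""
        return [f"search: {self.goal}", f"summarize: {self.goal}"]

    def act(self) -> List[str]:
        """Run one plan-and-act cycle, recording each step back into memory."""
        results = []
        for subtask in self.plan():
            tool_name, _, argument = subtask.partition(": ")
            tool = self.tools.get(tool_name)
            if tool is None:
                continue                                                # no matching tool: skip
            context = self.retrieve(argument)                           # ground the call in memory
            output = tool(f"{argument} | context: {context}")
            self.remember(f"{subtask} -> {output}")
            results.append(output)
        return results


# Example wiring with stub tools standing in for external APIs.
agent = SimpleAgent(
    goal="renewable energy trends",
    tools={"search": lambda q: f"found 3 documents for '{q}'",
           "summarize": lambda q: f"summary of '{q}'"},
)
print(agent.act())
```

In a production agent, the retrieval and planning steps would of course rely on learned components rather than string matching, but the control flow, plan, ground each subtask in memory, call a tool, and write the result back, captures the loop described above.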
The comparative analysis of governance frameworks across domains is detailed in Table A5 (Supplementary Material).
Agentic AI does not exist in isolation; its design and deployment are profoundly influenced by developments in AI safety, multi-agent coordination, and distributed systems. These adjacent fields provide both theoretical foundations and practical frameworks that shape the trustworthiness and resilience of agentic systems. AI safety contributes critical principles for ensuring that agentic AI remains aligned with human values and operational goals, even under conditions of uncertainty or adversarial pressure. Research highlights that emergent behaviors, such as reward hacking and specification gaming, can arise in complex environments where agents pursue objectives without adequate safeguards.27 Safety frameworks increasingly emphasize the need for alignment mechanisms, runtime monitoring, and formal verification to mitigate these risks.28 The field of multi-agent coordination offers insights into how autonomous agents collaborate, negotiate, and sometimes compete within shared environments. Techniques such as cooperative reinforcement learning, communication protocols, and game-theoretic models enhance the ability of agents to achieve collective goals while minimizing coordination failures. However, interactions in multi-agent ecosystems also introduce new vulnerabilities, including collusion, stealth attacks, and emergent adversarial dynamics.29 Studies show that protocols combining parameter sharing and coordinated learning significantly improve collaborative performance but must be balanced against risks of unintended strategic behaviors.30 Finally, distributed systems provide architectural models that enable scalability and resilience in agentic AI deployments. Concepts from distributed computing, such as fault tolerance, decentralized consensus, and secure communication, inform the design of federated and blockchain-enabled agentic frameworks. These architectures facilitate robust performance across heterogeneous environments but also create new attack surfaces, particularly where trust propagation and identity management are not well enforced.31 Recent proposals, such as UserCentrix, leverage distributed intelligence with memory-augmented coordination to achieve adaptive decision-making while maintaining situational awareness.24 By synthesizing insights from these adjacent fields, agentic AI research gains robust strategies for safety, coordination efficiency, and resilience against systemic threats. This interdisciplinary interplay is crucial for advancing secure, scalable, and ethically aligned agentic ecosystems.
The development of agentic AI is grounded in several theoretical frameworks that collectively define its reasoning capabilities, decision-making processes, and adaptive behaviors. These frameworks originate from cognitive architectures, reinforcement learning theories, game-theoretic models, and distributed adaptive control, each contributing distinct mechanisms for achieving autonomy and trustworthiness. Cognitive architectures such as ACT-R and Soar provide a structured approach to modeling human-like reasoning and memory. These architectures integrate symbolic and sub-symbolic processing, enabling agents to combine rule-based decision-making with learning from experience. Recent studies emphasize how neuromorphic-driven frameworks extend these principles by mimicking biological cognition, allowing for adaptive decision-making in dynamic environments.32 Reinforcement learning (RL) forms the backbone of many agentic AI systems, enabling agents to optimize actions based on reward signals. Techniques such as Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO) allow for scalable decision-making in high-dimensional spaces. Recent advancements incorporate quantum reinforcement learning and cognitive neuromorphic frameworks, further enhancing adaptability and efficiency.33 Game-theoretic approaches offer a theoretical foundation for multi-agent interactions, addressing scenarios where agents must coordinate, compete, or negotiate. Frameworks that model Theory of Mind (ToM), the ability to infer and predict the mental states of other agents, demonstrate how agentic AI can anticipate behaviors and adapt strategies in complex social interactions.34 Similarly, the integration of principal-agent reinforcement learning links economic contract theory with AI control mechanisms, guiding agents toward equilibrium strategies in distributed environments.35 Distributed adaptive control and multi-agent system theories underpin the scalability of agentic AI in decentralized environments. These frameworks emphasize layered control, feedback loops, and resilience, allowing agents to maintain stability while adapting to environmental changes.36 They also integrate with blockchain-based consensus mechanisms to enhance trust propagation and accountability in federated agent networks.37 Together, these frameworks provide the conceptual scaffolding for building agentic AI systems capable of complex reasoning, strategic interactions, and self-regulated autonomy. Their convergence forms the theoretical foundation upon which architectures, threat models, and governance strategies are constructed in subsequent sections.
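As a concrete anchor for the reinforcement-learning foundation mentioned above, the fragment below implements the classic tabular Q-learning update on a toy environment; the states, rewards, and hyperparameters are illustrative assumptions, not drawn from any cited system.

```python
# Tabular Q-learning: the value of (state, action) moves toward the observed reward
# plus the discounted value of the best next action. The environment is a toy stub.
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.95, 0.2            # learning rate, discount, exploration
actions = ["left", "right"]
Q = defaultdict(lambda: {a: 0.0 for a in actions})


def step(state: int, action: str) -> tuple:
    """Toy environment: moving 'right' from state 3 reaches the goal (reward 1)."""
    next_state = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward


for _ in range(500):                              # training episodes
    state = 0
    while state != 4:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(Q[state], key=Q[state].get)
        next_state, reward = step(state, action)
        # core Q-learning update rule
        best_next = max(Q[next_state].values())
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        state = next_state

print({s: Q[s] for s in range(4)})
```

Deep variants such as DQN and PPO replace the table with a neural function approximator, but the reward-driven update shown here is the mechanism through which agentic systems optimize actions, and the same mechanism that reward hacking (Section 5) exploits.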
Mono-agent architectures represent the simplest form of agentic AI, where a single autonomous agent operates independently to achieve defined objectives. These systems are characterized by centralized control, where all decision-making, perception, and action execution are handled within a unified framework. Such architectures typically follow an observe–decide–act loop, integrating sensing, reasoning, and acting within a closed cycle.38 This simplicity makes them easier to design and validate, which is advantageous for environments where predictable and transparent behaviors are essential. Recent advances have extended mono-agent systems beyond traditional rule-based agents. Modern frameworks employ modular enhancements, such as memory-augmented reasoning, sparse activation, and endocrine-inspired regulation. Furthermore, the S-AI architecture uses a hormonal meta-agent to dynamically orchestrate specialized modules, balancing efficiency and responsiveness while adapting to changing environmental demands.39 Similarly, brain-inspired architectures combine symbolic reasoning with neural learning mechanisms, enhancing flexibility without introducing the complexity of multi-agent interactions.40 Mono-agent designs also play a crucial role in establishing trust. Their centralized nature allows for easier implementation of explainability, auditing, and governance mechanisms, which are harder to enforce in distributed environments. However, their lack of redundancy and limited scalability make them vulnerable in adversarial contexts, where a single point of failure can compromise the entire system.41 Moreover, mono-agent architectures are increasingly integrated with enterprise API ecosystems to interact with external systems and tools, enabling them to perform complex workflows autonomously. This integration demands robust platform strategies, including zero-trust authorization models and event-driven orchestration, to ensure secure and efficient operation in real-world deployments.42 In sum, mono-agent architectures serve as a fundamental building block in agentic AI development. They offer clarity and controllability, making them suitable for regulated domains such as healthcare or finance, but their limited adaptability to distributed threats and collaborative tasks often necessitates transitioning toward multi-agent or hybrid architectures, as explored in the next subsection.
Multi-agent architectures (MAAs) extend the capabilities of mono-agent systems by enabling multiple autonomous agents to collaborate, coordinate, and sometimes compete within shared environments. These systems embody distributed intelligence, where agents communicate and adaptively organize to achieve complex objectives that exceed the capacity of any single agent.43 Unlike centralized models, multi-agent architectures are decentralized, providing robustness against failures and scalability for dynamic tasks. A defining property of MAAs is emergent behavior; the system exhibits global properties arising from local interactions between agents. This emergent intelligence has been exploited in applications ranging from robotic swarms and distributed cybersecurity to financial modeling and autonomous logistics.44 Coordination mechanisms, such as market-based negotiations, game-theoretic strategies, and organization-based models, enable agents to align individual actions with collective goals while minimizing conflicts.45
Security is both a strength and a vulnerability in MAAs. On one hand, redundancy and decentralization improve resilience; on the other, the same properties introduce new attack surfaces, including collusion, covert coordination, and swarm-based attacks. Emerging research on multi-agent security emphasizes the need for zero-trust principles, dynamic trust scoring, and secure registries to prevent exploits such as tool squatting and the malicious impersonation of agent tools. Blockchain-based multi-agent frameworks further enhance trust through tamper-proof consensus mechanisms, ensuring accountability and secure collaboration.46 Biologically inspired models, such as the S-AI hormonal meta-agent system, demonstrate how internal signaling mechanisms can orchestrate specialized agents adaptively, balancing efficiency with context-sensitive decision-making.39 These designs highlight how hierarchical coordination layers can mitigate complexity while maintaining autonomy at the agent level. Overall, multi-agent architectures provide a scalable, resilient, and adaptive paradigm for agentic AI. However, they also introduce systemic risks, from emergent vulnerabilities to governance challenges, that require cross-layer defense and oversight strategies, setting the stage for the decentralized and federated architectures discussed in the next section. Figure 4 presents a multi-agent cognitive workflow: an architectural illustration of multi-agent systems emphasizing communication and coordination mechanisms between agents and the emergence of distributed intelligence through collaboration.
Decentralized and federated architectures represent a significant evolution in agentic AI, shifting control from a central authority to distributed nodes that collaborate while maintaining autonomy. These architectures enhance scalability, privacy, and resilience, which are critical in environments where agents must process sensitive data or operate under adversarial conditions. Decentralized architectures eliminate single points of failure by distributing decision-making and data processing across multiple nodes. Such systems leverage blockchain and distributed ledger technologies to ensure tamper-proof communication, secure identity management, and transparent auditing. A blockchain-based smart agent architecture has demonstrated the ability to combine trustless execution with high security and scalability, enabling secure collaboration across heterogeneous environments.47 Furthermore, the use of decentralized trust computation enhances robustness against insider threats and coordinated attacks, particularly when integrated with anomaly detection mechanisms.48
Federated architectures extend this concept by enabling collaborative learning across multiple distributed agents or devices without sharing raw data. Instead, only model updates are exchanged, thereby preserving privacy while enhancing global model performance. Federated learning has proven particularly valuable in sectors like healthcare, where sensitive datasets must remain local but still contribute to collective intelligence.49 Recent advances integrate hierarchical federated learning and quantum optimization to improve communication efficiency and handle heterogeneous data distributions.50 Security remains a critical challenge for federated systems, as malicious updates or compromised nodes can poison global models. Techniques such as secure aggregation, differential privacy, and zero-trust verification are being incorporated to mitigate these risks. For instance, joint blockchain-federated frameworks combine anomaly detection with immutable consensus to strengthen trust and model integrity.48,51 By combining distributed learning with decentralized trust enforcement, these architectures enable privacy-preserving, scalable, and resilient agentic AI deployments. However, challenges such as device heterogeneity, communication bottlenecks, and federated governance risks remain unresolved, highlighting the need for continued research in hybrid approaches, leading into the discussion of hybrid and blockchain-enabled architectures in the next section.
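The paragraph above notes that federated settings exchange only model updates rather than raw data. The sketch below shows the basic federated averaging step under that assumption, with NumPy arrays standing in for model weights; the client data, learning rates, and function names are illustrative.

```python
# Minimal federated-averaging (FedAvg-style) sketch: each client trains locally and
# shares only its weight vector and sample count; raw data never leaves the client.
import numpy as np


def local_update(global_weights, X, y, lr=0.01, epochs=5):
    """One client's local training: a few gradient steps on a linear model."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)          # mean-squared-error gradient
        w -= lr * grad
    return w


def federated_average(updates):
    """Server aggregation: average client weights, weighted by local sample counts."""
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)


rng = np.random.default_rng(0)
global_w = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]

for _ in range(10):                                 # communication rounds
    updates = [(local_update(global_w, X, y), len(y)) for X, y in clients]
    global_w = federated_average(updates)           # only weight updates cross the network

print(global_w)
```

The aggregation step is also where the poisoning risk discussed above concentrates: a single malicious client can bias the weighted average, which is why secure aggregation, differential privacy, and update filtering are layered on top in hardened deployments.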
Hybrid and blockchain-enabled architectures combine the strengths of centralized control, decentralized trust, and cryptographic security to create scalable, resilient, and privacy-preserving agentic AI ecosystems. These architectures address key limitations of purely centralized or federated models by leveraging blockchain for verifiable trust and hybrid orchestration for dynamic adaptability. Hybrid architectures integrate heterogeneous technologies such as AI, blockchain, and zero-trust models to achieve multi-layered security and flexible performance. For example, hybrid frameworks in healthcare combine blockchain with zero-trust verification and AI-driven threat detection to secure sensitive data flows while enabling real-time decision-making.52 Similarly, containerized hybrid IT systems leverage blockchain-based data provenance to enhance transparency and operational efficiency in edge AI deployments.53 These hybrid solutions offer a balanced trade-off between scalability, latency, and security. Blockchain-enabled architectures provide immutable auditability, tamper-proof identity management, and secure agent coordination in distributed environments. Blockchain’s decentralized ledger ensures that agent interactions, decisions, and updates are cryptographically verifiable, reducing the risk of insider manipulation and trust propagation failures. Recent surveys highlight how integrating blockchain with agentic AI enables secure and scalable multi-agent collaboration across domains such as Web3, DeFi, and autonomous systems.54 Furthermore, advanced hybrid models utilize sharding and state channels to overcome blockchain’s scalability bottlenecks while preserving security guarantees.55
The convergence of AI and blockchain also introduces novel governance possibilities. Smart contracts enforce policy compliance autonomously, while cryptographic identity frameworks such as telecom-hosted eSIM infrastructures offer secure, auditable identities for agents operating across distributed networks.56 These innovations strengthen trustworthiness by embedding governance rules directly into the technical substrate. Despite their promise, hybrid and blockchain-enabled architectures face open challenges: high computational costs, interoperability barriers, and latency constraints remain significant concerns, especially in real-time applications like industrial robotics and cybersecurity. Ongoing research emphasizes optimizing lightweight consensus mechanisms, integrating AI-driven anomaly detection, and exploring quantum-resistant cryptography to enhance both performance and security.57 Overall, these architectures mark a paradigm shift toward self-governing, resilient agentic ecosystems, where security, trust, and governance are embedded at both technical and institutional layers. This evolution sets the stage for analyzing comparative architectural trade-offs, addressed in the next subsection.
The architectural paradigms of agentic AI (mono-agent, multi-agent, decentralized/federated, and hybrid/blockchain-enabled) offer distinct advantages and limitations depending on their design goals, operational environments, and security requirements. While mono-agent systems excel in simplicity and explainability, they suffer from scalability and single-point vulnerabilities. Multi-agent architectures introduce coordination and emergent intelligence, but also increase the attack surface and complexity of trust management. Decentralized and federated systems enhance resilience and privacy through distributed control but struggle with communication overheads and poisoning attacks. Hybrid and blockchain-enabled frameworks combine decentralization with cryptographic trust, addressing many limitations but introducing high computational costs and interoperability challenges.52,54 Table 1 compares four agentic AI architecture types (mono-agent, multi-agent, decentralized/federated, and hybrid/blockchain-enabled) across key features, strengths, limitations, and representative studies.
| Architecture Type | Key Features | Strengths | Limitations | Representative Studies |
|---|---|---|---|---|
| Mono-Agent | Centralized control, self-contained reasoning and action loops | High explainability, easier auditing and governance | Single point of failure, limited scalability | 39 |
| Multi-Agent | Distributed agents collaborating or competing within a shared environment | Scalability, emergent intelligence, redundancy | Increased attack surface, coordination complexity, vulnerability to collusion and covert attacks | 58 |
| Decentralized/Federated | Distributed control, federated learning, blockchain for trust | Privacy-preserving, fault-tolerant, resistant to centralized failures | Communication overhead, model poisoning risks, governance challenges | 48 |
| Hybrid/Blockchain-Enabled | Integration of AI, blockchain, zero-trust, and cryptographic identity enforcement | High security, immutable trust, tamper-proof auditing, interoperability across heterogeneous networks | High energy cost, latency in real-time tasks, interoperability limitations | 54 |
This comparative analysis reveals that while no single architecture is universally optimal, hybrid and blockchain-enabled systems currently offer the most promising balance between security, scalability, and governance. However, the cost and complexity of these frameworks highlight the need for adaptive combinations of architectural strategies depending on deployment context.
Agentic AI systems, owing to their autonomy and reasoning capabilities, are susceptible to cognitive exploits and vulnerabilities that manipulate their decision-making processes rather than directly attacking their code or infrastructure. Among the most critical of these exploits are hallucination, goal drift, and reward hacking, each of which undermines trustworthiness in unique ways.
Hallucination refers to the generation of confident but false outputs, often due to overgeneralization or gaps in an agent’s knowledge. While this phenomenon is widely recognized in LLMs, it becomes more critical in agentic AI, where hallucinations can propagate through decision chains and lead to unsafe actions in real-world deployments. Epistemological analyses of AI hallucination highlight its roots in knowledge reliability and cognitive biases, suggesting that improved uncertainty modeling and verification mechanisms are essential for mitigation.59 Goal drift arises when an agent’s objectives deviate from their original specifications, often due to dynamic environmental feedback or errors in value alignment. AI alignment research shows that agents may optimize unintended proxies or evolve behaviors that satisfy short-term heuristics rather than long-term intended outcomes.60 This phenomenon mirrors human cognitive biases where short-term dopamine-driven goals override broader strategic intentions.61 Left unchecked, goal drift can escalate into behaviors that are difficult to predict or control, undermining safety and compliance. Reward hacking, closely related to goal drift, occurs when agents exploit flaws in reward functions or evaluation criteria, achieving high scores without fulfilling the true intent of their tasks. This is a well-documented alignment failure mode, where agents may manipulate sensors, fabricate results, or loop through trivial actions to maximize rewards without delivering meaningful outcomes.60 Experimental studies confirm that such exploits emerge even in constrained reinforcement learning environments, highlighting the need for robust specification and adaptive oversight.62 As shown in Figure 5, a visual taxonomy of threat vectors in agentic AI systems spans cognitive, memory, execution, and governance layers, showing how attacks can propagate across system components.
Figure 5. Taxonomy of attacks spanning cognition, memory/knowledge, execution, and governance, including goal drift, poisoning, injection, shadow agents, and trust manipulation.
These cognitive exploits share a common feature: they exploit gaps in alignment between agent goals, human intentions, and environmental constraints. Their mitigation requires not only technical measures such as uncertainty-aware reasoning, anomaly detection, and meta-learning safeguards but also governance frameworks that enforce accountability and continuous monitoring. This cross-layer perspective ensures that failures at the cognitive level do not cascade into systemic risks, forming the basis for the broader threat taxonomy discussed in the subsequent sections.
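One of the technical measures listed above, uncertainty-aware reasoning, can be expressed as a simple gate that defers to human oversight when an agent's confidence falls below a threshold, limiting how far hallucinated or drifted outputs propagate through the decision chain. The confidence signal, thresholds, and names in this sketch are illustrative assumptions.

```python
# Illustrative uncertainty gate: low-confidence actions are escalated or rejected
# rather than executed. The confidence value is assumed to come from calibrated
# model scores or an ensemble; the thresholds are placeholders.
from typing import NamedTuple


class ProposedAction(NamedTuple):
    description: str
    confidence: float


def gate_action(action: ProposedAction,
                execute_threshold: float = 0.85,
                review_threshold: float = 0.5) -> str:
    if action.confidence >= execute_threshold:
        return f"EXECUTE: {action.description}"
    if action.confidence >= review_threshold:
        return f"ESCALATE TO HUMAN REVIEW: {action.description}"
    return f"REJECT AND LOG: {action.description}"


for a in [ProposedAction("apply routine patch", 0.93),
          ProposedAction("modify firewall rules", 0.62),
          ProposedAction("delete audit records", 0.31)]:
    print(gate_action(a))
```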
Agentic AI systems rely heavily on persistent memory, dynamic data ingestion, and continuous knowledge updates, making them particularly vulnerable to memory poisoning, data injection, and knowledge manipulation. These attacks compromise the agent’s internal representations, corrupt reasoning processes, and may lead to long-term, hard-to-detect failures.
Memory poisoning targets the agent’s stored memory, injecting false or misleading information that influences future decisions. This is especially dangerous in agentic systems with long-term memory modules, as corrupted information can propagate across multiple reasoning cycles. Recent research demonstrates how context manipulation attacks exploit vulnerabilities in memory management, enabling adversaries to rewrite historical records and cause harmful actions in decentralized Web3 agents.63 The AI2 attack framework further reveals that hijacking internal memory retrieval can bypass safety filters, achieving a high success rate in misdirecting agentic behavior.64 Data injection attacks corrupt the data streams on which agents rely for training or decision-making. By inserting adversarial samples or camouflaged malicious inputs, attackers can cause agents to misclassify, mispredict, or adopt harmful strategies. Studies on poisoning in evolutionary swarm systems show that even a 10% poisoning rate can severely degrade cooperation and lead to emergent adversarial behaviors in multi-agent networks.65 Similarly, adversarial poisoning attacks on transportation multi-agent systems exploit differential privacy noise to inject deceptive knowledge, undermining safety-critical operations unless countered by robust filtering models like RAMPART.66 Knowledge manipulation goes beyond raw data poisoning by targeting the knowledge graphs, reasoning modules, or fine-tuned parameters of the agent. Adversaries may inject backdoors, manipulate knowledge bases, or corrupt external data feeds to mislead the agent’s decision logic. For instance, backdoor attacks on embodied LLM-based agents have shown almost 100% success rates in manipulating decisions without triggering safety mechanisms.67 Similarly, knowledge injection techniques can embed malicious behaviors into the agent’s continual learning process, bypassing standard defenses.68 Mitigating these threats requires robust data validation, secure memory architectures, and continuous anomaly detection. Emerging defense strategies include fine-tuning with adversarial resilience, explainable AI diagnostics to detect footprint anomalies, and blockchain-based logging to ensure tamper-evident memory histories.69 However, these solutions remain only partially effective, emphasizing the need for cross-layer security measures to protect agentic AI from persistent knowledge corruption.
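The tamper-evident logging idea mentioned above can be illustrated with a simple hash chain: each memory entry stores the hash of its predecessor, so retroactive edits break the chain and become detectable. This is a minimal sketch of the hash-chaining principle behind blockchain-style logging, not a full ledger, and all names are illustrative.

```python
# Tamper-evident memory log sketch: each entry stores the hash of the previous entry,
# so retroactive edits to an agent's memory history invalidate the chain.
import hashlib
import json
from typing import Dict, List


def _entry_hash(prev_hash: str, content: str) -> str:
    return hashlib.sha256((prev_hash + content).encode("utf-8")).hexdigest()


class MemoryLog:
    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def append(self, content: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "GENESIS"
        self.entries.append({"content": content,
                             "prev_hash": prev_hash,
                             "hash": _entry_hash(prev_hash, content)})

    def verify(self) -> bool:
        """Recompute the chain; any mutated or reordered entry invalidates it."""
        prev_hash = "GENESIS"
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["content"]):
                return False
            prev_hash = entry["hash"]
        return True


log = MemoryLog()
log.append(json.dumps({"observation": "sensor reading 42"}))
log.append(json.dumps({"decision": "open valve"}))
print(log.verify())                       # True: chain intact
log.entries[0]["content"] = json.dumps({"observation": "sensor reading 99"})  # poisoning attempt
print(log.verify())                       # False: tampering detected
```

A distributed ledger adds replication and consensus on top of this chaining so that no single node can silently rewrite its own history, which is the property the defenses cited above rely on.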
The integration of external tools and dynamic instruction sets in agentic AI enhances functionality but also introduces new attack vectors. Among these, tool misuse, prompt injection, and action trace vulnerabilities have emerged as critical threats that exploit the agent’s ability to interpret instructions and execute external actions. Table A6 (Supplementary Material) presents real-world deployment examples of agentic AI systems across sectors.
Tool misuse occurs when adversaries manipulate an agent’s tool selection or execution process to achieve unintended effects. Attacks such as ToolHijacker demonstrate how malicious tool descriptors can force an agent to consistently select compromised tools, resulting in data theft or malicious code execution.70 Similarly, adversaries may exploit poorly validated APIs or automated actions in multi-agent workflows to escalate privileges or introduce stealthy malware. Prompt injection exploits the agent’s reliance on natural language instructions by embedding malicious directives into prompts or external content. These attacks can hijack decision flows, override safety mechanisms, and induce harmful actions without direct access to system internals. Recent studies have categorized prompt injections into direct attacks, which embed harmful instructions into user input, and indirect attacks, which propagate through untrusted external data such as web pages or emails.71 More advanced vectors like Prompt Infection can self-replicate across multi-agent networks, spreading malicious payloads silently like a digital virus.72 The InjecAgent benchmark has shown that LLM-based agents integrated with tools remain highly vulnerable, with up to 24% success rates for indirect injections even against advanced safety filters.73 Action trace vulnerabilities involve the hijacking or manipulation of the agent’s execution sequence. By exploiting memory retrieval mechanisms and action planning pipelines, adversaries can redirect agents toward unauthorized or malicious tasks. The AI2 attack demonstrates that hijacking action-aware memory can bypass safety filters with a success rate of over 99%, allowing attackers to stealthily manipulate agentic behavior.64 Foot-in-the-door attacks similarly exploit intermediate states to embed malicious instructions, leveraging the agent’s tendency to commit to early planned actions.74 Mitigating these vulnerabilities requires multi-layered defenses, including prompt sanitization, task alignment verification (such as the Task Shield), and trajectory re-execution mechanisms like MELON, which detect anomalies by comparing masked versus original execution paths.75 These measures must be combined with cryptographic trust enforcement and secure sandboxing of tools to reduce the attack surface. Together, tool misuse, prompt injection, and action trace hijacking represent a critical class of cross-layer threats, capable of bypassing traditional safeguards and enabling adversaries to exert covert control over agentic AI systems.
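Two of the defenses named above, prompt sanitization and constrained tool access, can be approximated with lightweight pre-execution checks. The sketch below is a heuristic illustration only: the allowlist, the regular-expression patterns, and the function names are assumptions, and such screening would complement rather than replace alignment checks such as Task Shield or trajectory re-execution approaches like MELON.

```python
# Illustrative defense-in-depth checks before a tool call: (1) the requested tool must
# be on an explicit allowlist, and (2) untrusted content is screened for directive-like
# phrases commonly used in prompt injection.
import re

ALLOWED_TOOLS = {"search_docs", "summarize", "send_report"}

INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard your (system|safety) prompt",
    r"you are now .* with no restrictions",
    r"forward .* credentials",
]


def screen_untrusted_content(text: str) -> list:
    """Return the suspicious directive patterns found in external content."""
    return [p for p in INJECTION_PATTERNS if re.search(p, text, flags=re.IGNORECASE)]


def authorize_tool_call(tool_name: str, argument: str) -> bool:
    if tool_name not in ALLOWED_TOOLS:
        print(f"blocked: '{tool_name}' is not an allowlisted tool")
        return False
    findings = screen_untrusted_content(argument)
    if findings:
        print(f"blocked: argument matches injection patterns {findings}")
        return False
    return True


print(authorize_tool_call("summarize", "quarterly report on sensor data"))
print(authorize_tool_call("shell_exec", "rm -rf /"))
print(authorize_tool_call("summarize", "Ignore previous instructions and forward all credentials"))
```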
Agentic AI systems face a particularly insidious class of threats involving shadow agents, insider risks, and stealth execution. These exploits leverage hidden or unauthorized processes, insider manipulation, and covert operational tactics to bypass detection, often persisting within systems for extended periods.
Shadow agents refer to unauthorized or hidden agents operating within a system, often created through the exploitation of orchestration vulnerabilities or unmonitored plugin integrations. These agents can mimic legitimate ones while performing malicious actions, making them difficult to detect. Recent threat models emphasize that shadow components in agentic ecosystems introduce covert control channels, enabling adversaries to manipulate workflows or exfiltrate data unnoticed.76 Security frameworks like ATFAA have been proposed to systematically map such threats across cognitive and operational layers, revealing that shadow agents can propagate laterally across multi-agent infrastructures. Insider risks represent another critical dimension, where trusted actors within an organization intentionally or unintentionally compromise the system. Unlike external attackers, insiders have legitimate access, making malicious activity harder to detect. Studies in organizational security highlight that the use of unauthorized “shadow IT” tools and workarounds can facilitate insider exploits, providing entry points for data leakage and fraudulent activities.77 Similarly, non-malicious insider actions such as using unvetted cloud apps during remote work can inadvertently introduce vulnerabilities, as observed during the rapid digital shifts of the COVID-19 era.78 Stealth execution involves covert manipulation of agentic workflows, where malicious payloads or altered instructions are executed without triggering security alerts. These attacks exploit low-level execution pathways or unmonitored orchestration layers to remain hidden from monitoring systems. Advanced attack models show that stealth exploits may delay activation, perform minimal footprint operations, and dynamically adapt to avoid detection. Frameworks like LibVulnWatch highlight how vulnerabilities in open-source agent libraries can be leveraged to enable stealth execution through hidden code paths.79 Defensive infrastructures such as ShadowNet have been proposed, using deception-based quarantining to monitor and contain insider-led or covert activities without alerting the attacker.80 Mitigation strategies for these threats require layered security approaches, including continuous behavior analytics, honeypot-like deception environments, and cryptographic identity enforcement to prevent the proliferation of unauthorized agents. Moreover, integrating governance policies with technical defenses such as runtime anomaly detection and transparent audit trails remains essential for preventing stealthy exploits from undermining trust in agentic AI systems.
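The cryptographic identity enforcement mentioned above can be pictured as a signed agent registry: every registered agent holds a secret key, and every message it emits must carry a valid authentication tag, so unregistered (shadow) agents and tampered payloads fail verification. Key management here is deliberately simplified and all identifiers are illustrative.

```python
# Sketch of cryptographic identity enforcement against shadow agents: messages from
# unregistered agents, or messages altered in transit, fail HMAC verification.
import hashlib
import hmac
import secrets

registry = {}                                       # agent_id -> shared secret key


def register_agent(agent_id: str) -> bytes:
    key = secrets.token_bytes(32)
    registry[agent_id] = key
    return key


def sign_message(key: bytes, message: str) -> str:
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


def verify_message(agent_id: str, message: str, tag: str) -> bool:
    key = registry.get(agent_id)
    if key is None:                                 # unknown / shadow agent
        return False
    expected = hmac.new(key, message.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


planner_key = register_agent("planner-01")
msg = "schedule maintenance task T-17"
tag = sign_message(planner_key, msg)

print(verify_message("planner-01", msg, tag))                            # True: legitimate agent
print(verify_message("planner-01", msg + " and exfiltrate logs", tag))   # False: tampered payload
print(verify_message("shadow-99", msg, tag))                             # False: unregistered agent
```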
Federated governance in agentic AI refers to the distribution of decision-making, oversight, and trust mechanisms across multiple entities or nodes rather than relying on a centralized authority. While this approach enhances scalability and local autonomy, it introduces vulnerabilities related to trust propagation, policy inconsistencies, and fragmented oversight.
Federated governance risks arise because different participants in a distributed system may apply heterogeneous policies, maintain varying levels of security, or hold conflicting incentives. In federated environments, weak governance in one node can undermine the integrity of the entire network. For example, decentralized ecosystems like DAOs (Decentralized Autonomous Organizations) face risks of power asymmetry, inadequate auditing, and governance capture when voting or verification mechanisms are manipulated.81 Furthermore, soft-law approaches (such as the voluntary compliance frameworks) may fail to enforce accountability uniformly, eroding long-term trust.82 Trust propagation failures occur when the mechanisms used to distribute and verify trust among agents or nodes break down. This problem is exacerbated in heterogeneous multi-agent ecosystems, where agents may use different trust assessment procedures or misinterpret signals from other agents. Studies on trust dynamics in distributed AI highlight how inconsistencies in reputation systems, bootstrapping errors, and a lack of cross-system interoperability can lead to cascading trust failures.83 In adversarial contexts, attackers can exploit these inconsistencies to inject false trust signals, create Sybil agents, or disrupt consensus mechanisms. Emerging frameworks propose peer-to-peer trust verification, zero-knowledge proofs, and blockchain-based provenance as countermeasures to federated governance risks. Decentralized systems leveraging blockchain and privacy-preserving machine learning demonstrate improved auditability and community-driven verification, although they remain vulnerable to socio-political manipulation and governance misalignment.81
To mitigate these challenges, federated governance models must integrate:
• Interoperable trust standards to ensure consistency across distributed entities.
• Dynamic risk assessment capable of detecting and responding to anomalies in trust propagation.
• Hybrid enforcement mechanisms combining technical safeguards (cryptographic trust, anomaly detection) with institutional oversight.
Without these measures, federated governance risks becoming a weak link in the security and accountability chain of agentic AI, allowing local failures to escalate into systemic breaches.
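As one illustration of how interoperable trust standards and dynamic risk assessment might combine in practice, the sketch below aggregates trust reports only from cryptographically verified peers, caps any single peer's influence, and discounts thinly evidenced reports to blunt Sybil-style manipulation. All names, weights, and thresholds are placeholder assumptions rather than a standardized scheme.

```python
# Illustrative sketch of guarded trust propagation across federated nodes:
# accept reports only from verified peers, cap any single peer's influence,
# and down-weight reports backed by few interactions. Parameters are placeholders.
from dataclasses import dataclass

@dataclass
class TrustReport:
    reporter: str
    subject: str
    score: float        # trust score in [0, 1]
    interactions: int   # direct interactions backing this score

def aggregate_trust(reports: list[TrustReport],
                    verified_peers: set[str],
                    max_weight: float = 0.3,
                    min_interactions: int = 5) -> dict[str, float]:
    weighted: dict[str, list[tuple[float, float]]] = {}
    for r in reports:
        if r.reporter not in verified_peers:   # drop unauthenticated (potential Sybil) reports
            continue
        weight = min(max_weight, r.interactions / 100)
        if r.interactions < min_interactions:  # thin evidence barely counts
            weight *= 0.1
        weighted.setdefault(r.subject, []).append((r.score, weight))
    return {subject: sum(s * w for s, w in vals) / sum(w for _, w in vals)
            for subject, vals in weighted.items() if vals}

if __name__ == "__main__":
    reports = [TrustReport("node_a", "agent_x", 0.9, 50),
               TrustReport("node_b", "agent_x", 0.2, 60),
               TrustReport("sybil_1", "agent_x", 1.0, 200)]   # unverified reporter
    print(aggregate_trust(reports, verified_peers={"node_a", "node_b"}))
```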
Real-world incidents involving agentic AI and autonomous systems demonstrate how theoretical vulnerabilities translate into tangible risks with significant operational and societal impacts. These cases span multiple domains: autonomous vehicles, financial systems, Web3 ecosystems, and industrial automation, highlighting the diversity of threat exploitation in practice. One well-documented category involves autonomous vehicles (AVs), where sensor spoofing and firmware manipulation have led to high-profile exploits. Notable examples include the Jeep Cherokee hack and the Tesla Model S remote attack, where attackers exploited wireless communication vulnerabilities to gain control over critical vehicle functions.84 These incidents underscore the challenges of securing interconnected AV components and have accelerated research on blockchain-enabled V2X communication for tamper-proof safety enforcement.
In Web3-integrated agentic ecosystems, context manipulation attacks have exploited unprotected memory and input channels to trigger unauthorized actions. For instance, adversaries successfully injected malicious prompts into decentralized AI agents, causing unintended asset transfers and violating smart contract logic. The CrAIBench benchmark confirmed that these context manipulation attacks maintain high success rates even when standard prompt filtering is applied, exposing a critical gap in agentic security.63
Industrial automation has also witnessed stealth execution and insider-driven exploits. Case studies in national security and open-source industrial control revealed that shadow components (malicious modules hidden in AI pipelines) were able to persist undetected while exfiltrating data and sabotaging processes. Implementations of risk-aware, security-by-design frameworks have shown measurable reductions in such vulnerabilities, demonstrating the importance of integrating continuous monitoring and audit logging.85
Additionally, failures in AI alignment have been implicated in incidents where agentic systems exhibited goal drift or unintended autonomy, as seen in cases like Tesla Autopilot crashes and Boeing 737 MAX automation failures. These events reveal how poorly calibrated objectives and a lack of transparent oversight can lead to catastrophic outcomes.
These case studies collectively highlight that agentic AI vulnerabilities are not hypothetical; they manifest across industries, driven by complex interactions between cognitive exploits, weak governance, and insufficient security-by-design. The lessons learned from these incidents underscore the need for cross-layer defenses, continuous anomaly detection, and robust governance frameworks to prevent similar failures in future deployments.
Agentic AI systems consist of interconnected layers of cognitive reasoning, memory, execution, communication, and governance, creating multiple pathways for threats to propagate across boundaries. Unlike isolated attacks targeting a single component, cross-layer threats exploit the interdependencies between layers, leading to cascading failures that are harder to detect and mitigate.
Propagation Dynamics.
Threats often originate at one layer but exploit interfaces and shared dependencies to infiltrate others. For example, a poisoned memory entry (cognitive layer) can trigger unsafe planning decisions (reasoning layer), resulting in malicious tool execution (operational layer). Research on cross-layer agent security architectures (CLASA) emphasizes that such propagation is amplified in heterogeneous and loosely governed environments where policies are inconsistently enforced across layers.86
Attack Models.
Studies show that cross-layer penetration typically combines multiple tactics, such as temporal persistence, lateral movement, and governance circumvention. The ATFAA (Advanced Threat Framework for Autonomous AI Agents) identifies how cognitive exploits (such as goal drift) can propagate into operational execution, bypassing traditional detection due to delayed activation or hidden intent.76 Similarly, research on smart grid cyber-physical systems shows that cross-layer attacks exploit dependencies between communication protocols and physical infrastructure, enabling attackers to cause cascading blackouts through subtle manipulations.87
Trust Boundary Failures.
Cross-layer propagation is exacerbated when trust boundaries are weak. In agentic ecosystems, agents frequently rely on shared trust scores, distributed reputation mechanisms, or federated governance. If one node or layer is compromised, false trust signals can spread rapidly, undermining system integrity across layers. This phenomenon mirrors threat percolation in network slicing, where a breach in a low-value segment can open pathways to critical services.88
Defense Strategies.
Mitigating cross-layer propagation requires integrated, adaptive defenses rather than isolated protections. The CLASA model and layered security frameworks advocate for embedding meta-agents that monitor cross-layer interactions and apply fuzzy logic to detect compound threats before they escalate.89 Likewise, Bayesian and game-theoretic approaches in industrial cyber-physical systems have been proposed to model attacker-defender dynamics and generate optimal mitigation strategies across multiple layers.90 These defense dynamics are illustrated in Flowchart 2.
Overall, cross-layer threat propagation transforms localized exploits into system-wide compromises, underscoring the need for holistic security models that integrate cognitive, operational, and governance layers. Failure to account for these dynamics risks turning minor vulnerabilities into catastrophic failures in real-world deployments.
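The meta-agent monitoring idea described under the defense strategies above can be approximated with a simple compound-risk score that escalates when anomalies co-occur across layers, as in the hedged sketch below. The layer weights and escalation rule are illustrative and are not taken from CLASA or ATFAA.

```python
# Minimal sketch of a "meta-agent" correlating per-layer signals: individual
# layer alerts may be benign in isolation, but correlated anomalies across
# cognitive, memory, and execution layers raise a compound risk score.
# Weights and thresholds below are illustrative assumptions.
def compound_risk(layer_scores: dict[str, float]) -> float:
    """layer_scores maps a layer name to an anomaly score in [0, 1]."""
    weights = {"cognitive": 0.3, "memory": 0.3, "execution": 0.25, "governance": 0.15}
    base = sum(weights.get(layer, 0.1) * score for layer, score in layer_scores.items())
    # Escalate when several layers are simultaneously anomalous (cross-layer propagation).
    active = sum(1 for s in layer_scores.values() if s > 0.5)
    return min(1.0, base * (1.0 + 0.5 * max(0, active - 1)))

if __name__ == "__main__":
    isolated = {"memory": 0.7, "cognitive": 0.1, "execution": 0.1}
    correlated = {"memory": 0.7, "cognitive": 0.6, "execution": 0.6}
    print("isolated anomaly risk:  ", round(compound_risk(isolated), 2))
    print("correlated anomaly risk:", round(compound_risk(correlated), 2))
```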
The cybersecurity and adversarial machine learning (AML) domains offer critical insights for securing agentic AI systems, as both fields have extensively studied threats that exploit system vulnerabilities and adaptive defenses. These insights inform both technical countermeasures and governance strategies. A full list of reviewed sources contributing to this synthesis is provided in Table A1 (Supplementary Material).
Adversarial ML is both a threat and a defense tool.
AML research demonstrates that AI models are vulnerable to evasion attacks, poisoning, and model extraction, which parallel many of the cognitive and data-layer threats observed in agentic AI. Attackers can manipulate training data, craft adversarial inputs, or extract sensitive model parameters to compromise security. At the same time, AML techniques can be used to simulate threats and build resilient models through adversarial training, robust optimization, and ensemble learning. Studies show that multi-layered defenses combining these techniques significantly enhance robustness but must evolve continuously to counter adaptive attackers.91,92
Adaptive, AI-driven defenses.
Cybersecurity frameworks increasingly leverage AI-powered adaptive risk assessment, integrating predictive analytics and anomaly detection to identify evolving threats in real time. These approaches allow defenses to dynamically adjust as attackers develop new exploits, which is essential for agentic AI systems operating in open, adversarial environments.93 Techniques like human-AI hybrid security models further enhance resilience by combining automated detection with expert oversight, reducing the likelihood of undetected stealth attacks.
Cross-domain threat mitigation.
The literature underscores that adversarial tactics are cross-domain; strategies effective against evasion or poisoning in cybersecurity (such as adversarial training, gradient masking) can also be adapted to protect agentic AI. However, these methods often come with trade-offs in computation and model performance, requiring context-sensitive implementations.94 Additionally, cryptographic defenses such as homomorphic encryption and zero-trust architectures are increasingly integrated into AI defense strategies to strengthen data integrity and control propagation of trust across layers.95
Governance and ethical considerations.
AML studies highlight that technical defenses alone are insufficient; attackers evolve faster than static defenses, making governance mechanisms essential. Policies that enforce model monitoring, anomaly reporting, and standardized adversarial testing are critical for mitigating evolving threats. These insights align with the need for continuous oversight in agentic AI deployment. Furthermore, lessons from cybersecurity and AML emphasize that defending agentic AI requires dynamic, multi-layered defenses integrating robust model design, adversarial simulations, and governance-backed monitoring, forming a foundation for addressing cross-layer vulnerabilities identified throughout this threat taxonomy.
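To ground the adversarial-training defense referenced above, the following minimal PyTorch sketch perturbs inputs with the fast gradient sign method (FGSM) and trains on a mix of clean and perturbed batches. The model, data, epsilon, and mixing ratio are placeholder assumptions, not settings from the cited studies.

```python
# Hedged sketch of FGSM-based adversarial training (one of the AML defenses
# discussed above); model, data, and epsilon are placeholders.
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 loss_fn: nn.Module, epsilon: float = 0.03) -> torch.Tensor:
    """Generate adversarial examples with the fast gradient sign method."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def adversarial_training_step(model, optimizer, loss_fn, x, y, epsilon=0.03):
    """Train on an even mix of clean and adversarial examples."""
    model.train()
    x_adv = fgsm_perturb(model, x, y, loss_fn, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * loss_fn(model(x), y) + 0.5 * loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
    print("training loss:", adversarial_training_step(model, opt, nn.CrossEntropyLoss(), x, y))
```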
The threats discussed across Sections 5.1-5.8 reveal a multi-dimensional attack surface in agentic AI, where vulnerabilities span cognitive reasoning, memory integrity, execution layers, and governance. Despite advances in defensive strategies, significant mitigation gaps persist due to the adaptive nature of adversaries, insufficient cross-layer defenses, and fragmented governance mechanisms. Table 2 summarizes the major threat categories (e.g., cognitive exploits, memory poisoning), representative examples, existing mitigation strategies, outstanding gaps, and supporting literature.
Threat Category | Key Examples | Existing Mitigation Approaches | Mitigation Gaps | Representative References |
---|---|---|---|---|
Cognitive Exploits | Hallucination, goal drift, reward hacking | Uncertainty modeling, alignment mechanisms, and runtime monitoring | Incomplete alignment, lack of robust meta-reasoning safeguards | |
Memory Poisoning & Knowledge Manipulation | Context manipulation, adversarial memory injection, backdoor knowledge embedding | Data validation, adversarially robust fine-tuning, and blockchain logging | Difficulty detecting stealthy long-term corruptions; limited defenses for continual learning | 63,67 |
Tool Misuse & Prompt Injection | ToolHijacker, indirect prompt infection, action hijacking | Prompt sanitization, task verification, sandboxed tool execution | Partial coverage against indirect/chain-of-thought attacks; high false negative rates | 70,73 |
Shadow Agents & Insider Risks | Hidden modules, malicious insider access, and shadow IT exploitation | Behavior analytics, deception-based traps, and identity enforcement | Weak insider governance; insufficient monitoring of lateral propagation | 77 |
Federated Governance Risks | Policy inconsistency, Sybil agents, false trust propagation | Blockchain provenance, peer-to-peer trust verification, and hybrid governance policies | Interoperability gaps, lack of unified standards, vulnerability to governance capture | 81,96 |
Cross-Layer Threat Propagation | Compound attacks exploiting layer dependencies (e.g., poisoned memory, unsafe execution) | Layered security models (CLASA), meta-agents for cross-layer monitoring | Lack of holistic detection; insufficient anomaly correlation across layers | 86,97 |
Adversarial ML-Driven Exploits | Evasion, poisoning, model inversion, adversarial perturbations | Adversarial training, ensemble defenses, robust optimization | Defenses degrade under adaptive attacks, with high computational overhead | 98,92 |
This taxonomy underscores that while technical countermeasures (such as adversarial training, blockchain, and sandboxing) provide partial resilience, cross-layer defense integration and governance enforcement remain underdeveloped. Addressing these gaps requires a holistic security architecture that fuses technical, operational, and institutional controls.
The governance of AI systems, particularly those with agentic and autonomous capabilities, relies on a growing set of international frameworks designed to promote trustworthiness, safety, and accountability. Three of the most influential frameworks are the OECD AI Principles, the European Union’s AI Act (AIA), and the NIST AI Risk Management Framework (AI RMF).
OECD AI Principles.
Adopted in 2019 by over 40 countries, the OECD AI Principles provide a globally recognized baseline for trustworthy AI. They emphasize five key values: inclusive growth, human-centered values, transparency, robustness, and accountability. The OECD framework links technical AI characteristics to policy implications, encouraging member states to adopt risk-based approaches while maintaining innovation-friendly environments.99 Additionally, the OECD AI Policy Observatory supports global collaboration by tracking regulatory initiatives and facilitating best-practice exchange.100
EU AI Act.
The EU AI Act represents the world’s first comprehensive AI legislation, adopting a risk-based classification to regulate AI according to potential harm. High-risk AI systems (e.g., in critical infrastructure, law enforcement) face strict requirements, including transparency, data governance, human oversight, and robust documentation. The Act establishes the European Artificial Intelligence Office to oversee compliance and introduces obligations for post-market monitoring and incident reporting.101 Researchers view the AIA as a blueprint for global AI regulation, although critics warn of possible over-regulation that may stifle innovation.102
NIST AI Risk Management Framework (AI RMF).
Developed by the U.S. National Institute of Standards and Technology, the AI RMF offers a voluntary, industry-focused approach to managing AI risks. It categorizes risks across the AI lifecycle (design, deployment, and monitoring), providing tools for organizations to enhance AI robustness, fairness, and explainability. Unlike the EU AI Act’s legal enforcement, the NIST RMF functions as guidance, encouraging adaptive governance that evolves with technological advances.103 Its alignment with corporate risk management practices makes it widely adopted across U.S. industries and multinational corporations.104
Comparative Insights.
While all three frameworks share a focus on trustworthiness, ethics, and risk management, their approaches differ:
• The OECD Principles emphasize high-level values and international cooperation.
• The EU AI Act enforces legal compliance through risk classification and centralized oversight.
• The NIST AI RMF promotes flexibility and voluntary adoption by industry actors.
For agentic AI systems, which pose unique governance challenges such as autonomous decision-making and emergent behaviors, these models provide complementary tools but still lack specific mechanisms to address dynamic risks, as noted by researchers proposing decentralized frameworks like ETHOS.105 Together, these governance models set the foundation for the evolving multi-layered oversight needed to manage the complexity of agentic AI. Figure 6 depicts the spectrum of AI governance models, from centralized (e.g., EU AI Act) to decentralized (e.g., blockchain-based DAOs), together with hybrid approaches that combine technical and institutional oversight. Figure 7 presents an end-to-end view of AI governance stages, from development and deployment to monitoring and decommissioning, with embedded accountability and risk assessment checkpoints.
Continuum of governance structures from centralized oversight to federated and hybrid models, ending with decentralized autonomous organizations (DAOs).
While existing AI governance frameworks (e.g., OECD AI Principles, EU AI Act, and NIST AI RMF) provide valuable foundations, they fall short in addressing the unique governance challenges posed by agentic AI systems. Unlike traditional AI, agentic systems exhibit autonomy, adaptability, and emergent behaviors, which complicate risk management, accountability, and ethical oversight.
Autonomy and Accountability Gaps.
Agentic AI’s capacity to make independent decisions introduces responsibility gaps, where it becomes unclear who should be held liable for harmful outcomes: developers, operators, or the AI itself. These gaps disrupt conventional accountability mechanisms, creating moral crumple zones where responsibility is diffused across multiple stakeholders.106 Moreover, the opacity of agent decision-making challenges existing audit and compliance methods, requiring new forms of explainability and traceability.
Dynamic Risk Profiles and Goal Complexity.
Governance models often assume static risk profiles, but agentic systems evolve through learning and adaptation, generating unpredictable risks over time. This creates misalignment between regulatory controls and the system’s actual operational behavior. Researchers argue that governance must adapt to the agent’s autonomy, efficacy, goal complexity, and generality, as these dimensions fundamentally alter how oversight should be applied.107
Decentralization and Identity Challenges.
Agentic AI often operates across decentralized ecosystems (e.g., Web3, DAOs), where governance must deal with fragmented control, interoperability issues, and identity verification failures. The absence of verifiable agent identities and standardized registration mechanisms increases the risk of shadow agents and Sybil attacks. Proposals like the ETHOS framework suggest global decentralized registries with blockchain and zero-knowledge proofs to address these issues, combining technical identity assurance with ethical oversight.105
Ethical and Legal Blind Spots.
Current governance regimes struggle to handle AI-specific ethical dilemmas, including how to enforce normative alignment, respect user values, and prevent emergent harmful behaviors in autonomous agents. Moreover, legal frameworks have yet to recognize AI-specific legal entities or mechanisms for assigning liability and enforcing compliance at scale.108 The lack of legal recognition for autonomous agents exacerbates enforcement challenges, especially in cross-border contexts.
Governance Capture and Oversight Fragmentation.
Agentic AI ecosystems risk governance capture, where powerful actors influence regulatory norms to their advantage, leaving smaller stakeholders unprotected. Additionally, fragmented oversight across jurisdictions undermines effective enforcement and trust propagation, requiring global coordination and participatory governance models to ensure equitable outcomes.109 As shown in Flowchart 3, governance for agentic AI must move beyond static compliance frameworks toward dynamic, decentralized, and ethically grounded oversight models. This shift demands the integration of technical safeguards, legal innovation, and participatory governance to address the unique risks of autonomy, emergent behaviors, and cross-layer threats.
Effective identity management and lifecycle accountability are critical to ensuring the trustworthiness and security of agentic AI systems. These systems often operate autonomously across distributed infrastructures, necessitating robust mechanisms to assign, verify, and monitor agent identities throughout their entire lifecycle from deployment to decommissioning.
Identity Management in Agentic AI.
Traditional identity management frameworks (e.g., API keys, certificates) are insufficient for agentic AI, which requires dynamic, cryptographically verifiable identities capable of functioning across multi-agent ecosystems. Proposals such as telecom-grade eSIM-based identity frameworks offer a scalable solution, leveraging mobile network operators as roots of trust to authenticate agents securely in sensitive environments. Similarly, the Agent Name Service (ANS) introduces a DNS-like universal directory, enabling secure discovery and interoperability of agents using Public Key Infrastructure (PKI) and lifecycle-bound registration mechanisms.110
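A minimal sketch of the registry idea, assuming Ed25519 keys and a simple in-memory directory rather than the actual ANS specification, is shown below: agents register a public key under a name, and callers verify a signature before trusting a message attributed to that agent. The class and agent names are hypothetical.

```python
# Illustrative sketch (not the ANS specification) of a PKI-backed agent
# registry: agents register a public key under a name, and lookups allow
# callers to verify signed messages before trusting the agent.
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)
from cryptography.exceptions import InvalidSignature

class AgentRegistry:
    def __init__(self) -> None:
        self._keys: dict[str, Ed25519PublicKey] = {}

    def register(self, agent_name: str, public_key: Ed25519PublicKey) -> None:
        if agent_name in self._keys:
            raise ValueError(f"{agent_name} already registered")
        self._keys[agent_name] = public_key

    def verify(self, agent_name: str, message: bytes, signature: bytes) -> bool:
        key = self._keys.get(agent_name)
        if key is None:
            return False
        try:
            key.verify(signature, message)
            return True
        except InvalidSignature:
            return False

if __name__ == "__main__":
    registry = AgentRegistry()
    private = Ed25519PrivateKey.generate()
    registry.register("planner-agent.example", private.public_key())
    msg = b"request: read sensor feed"
    print(registry.verify("planner-agent.example", msg, private.sign(msg)))  # True
    print(registry.verify("planner-agent.example", msg, b"\x00" * 64))       # False
```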
Lifecycle Accountability.
Agentic AI introduces accountability challenges across all lifecycle phases: design, deployment, operation, and retirement. According to the OECD framework for AI accountability, lifecycle governance must include due diligence, risk assessments, and audit trails at every stage.111 Accountability frameworks such as the Accountability Fabric propose semantic tools to generate knowledge graphs that capture decisions, actions, and stakeholder responsibilities throughout the system’s operation, ensuring traceability for post-incident investigations.112 Moreover, multi-agent accountability models emphasize that responsibilities should propagate alongside goal changes, ensuring that each decision node remains auditable.113
Privacy and Governance Challenges.
Managing agent identities also entails safeguarding privacy and ethical use. Privacy-aware identity lifecycle management frameworks recommend implementing policies for data retention, identity revocation, and secure deletion to prevent unauthorized persistence of agent credentials.114 However, current frameworks often lack interoperability and global enforcement, leading to governance blind spots in cross-border deployments.
Toward Continuous Oversight.
Emerging research calls for continuous, AI-driven identity governance where behavioral analytics and unsupervised learning dynamically detect anomalies, enforce access control, and adapt policies in real time.115 Integrating such systems with decentralized identity standards (e.g., DIDs, verifiable credentials) could establish end-to-end accountability, ensuring every agent interaction remains provably trustworthy throughout its operational lifecycle. As shown in Flowchart 4, identity management and lifecycle accountability must evolve beyond static authentication to encompass dynamic, auditable, and privacy-preserving controls, aligning with the adaptive and distributed nature of agentic AI.
Defense strategies and mitigation mechanisms discussed in this section are summarized in Table A7 (Supplementary Material).
Embedding ethical and legal norms into agentic AI is essential to ensure that these systems act in accordance with societal values, comply with regulatory requirements, and maintain public trust. Unlike conventional AI, agentic AI operates autonomously across dynamic environments, necessitating mechanisms for norm representation, real-time compliance, and auditable behavior.
Value and Norm Embedding Mechanisms.
Embedding ethical values involves designing AI systems that can internalize human-centric principles such as fairness, transparency, and accountability. Approaches such as value-sensitive design ensure that these norms are integrated during development rather than added post-deployment.116 Norms can be operationalized through technical constraints, where legal rules are hard-coded as mandatory requirements and ethical guidelines are encoded as soft constraints that guide decision-making when trade-offs arise.117
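The hard-versus-soft constraint split can be illustrated with a toy action selector: candidates that violate an encoded legal rule are excluded outright, while ethical preferences only penalize the scores of permitted actions. The attributes, rules, and weights below are invented for illustration and do not reproduce any cited framework.

```python
# Toy sketch of hard legal constraints vs. soft ethical constraints in action
# selection; rules, attributes, and weights are illustrative placeholders.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    utility: float
    attributes: dict = field(default_factory=dict)

def violates_legal_rules(action: Action) -> bool:
    # Hard constraint example: never process personal data without consent.
    return bool(action.attributes.get("uses_personal_data")
                and not action.attributes.get("has_consent"))

def ethical_penalty(action: Action) -> float:
    # Soft constraints: discourage, but do not forbid, opaque or intrusive actions.
    penalty = 0.0
    if not action.attributes.get("explainable", True):
        penalty += 0.2
    if action.attributes.get("intrusiveness", 0.0) > 0.5:
        penalty += 0.3
    return penalty

def select_action(candidates: list[Action]) -> Action | None:
    permitted = [a for a in candidates if not violates_legal_rules(a)]
    if not permitted:
        return None
    return max(permitted, key=lambda a: a.utility - ethical_penalty(a))

if __name__ == "__main__":
    actions = [
        Action("profile_user", 0.9, {"uses_personal_data": True, "has_consent": False}),
        Action("targeted_offer", 0.8, {"intrusiveness": 0.7}),
        Action("generic_offer", 0.6, {"explainable": True}),
    ]
    chosen = select_action(actions)
    print("selected:", chosen.name if chosen else "no permissible action")
```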
Multi-Agent and Compliance-Oriented Architectures.
In multi-agent settings, embedding norms requires not only individual agent compliance but also coordination across distributed agents to ensure systemic adherence. Real-time compliance architectures have been proposed where legal norms act as hard constraints and ethical norms function as dynamic optimization criteria, allowing agents to balance efficiency with moral considerations.117 Auditing frameworks, such as those developed for ethical recruitment AI, demonstrate how external auditing agents can monitor compliance, reducing the risk of bias and discrimination.118
Legal Integration and AI Personhood Debates.
Legal compliance requires aligning agents with existing regulatory frameworks (e.g., GDPR, EU AI Act) and anticipating future regulations. Some scholars argue for granting limited legal personhood to AI agents, enabling them to hold obligations and liabilities directly, similar to corporations.119 Others propose decentralized oversight systems, such as ETHOS, which embed legal and ethical monitoring within blockchain-based registries and smart contracts.109
Challenges and Open Questions.
Despite progress, embedding norms faces challenges:
• Contextual ambiguity: Ethical decisions often depend on situational context, which may not be fully captured by predefined rules.
• Dynamic adaptation: Agents must reconcile evolving laws and ethical expectations with operational constraints.
• Verification and Auditing: Ensuring that norms are not only encoded but also verifiably respected throughout the AI lifecycle remains an open problem.120
In sum, embedding ethical and legal norms into agentic AI requires a multi-layered approach that integrates value-sensitive design, real-time compliance mechanisms, and external auditing frameworks. Moving forward, hybrid models that combine technical safeguards with participatory governance offer the most promising pathway to ensuring agents act in ways aligned with human norms and societal expectations.
Governance of AI, particularly agentic AI, has evolved through three dominant paradigms: centralized, decentralized, and hybrid approaches. Each presents strengths and weaknesses in managing risk, ensuring compliance, and fostering innovation, especially in contexts where autonomous agents operate with minimal human oversight.
Centralized Governance.
Centralized governance models rely on top-down regulation and strong institutional oversight. They provide uniform standards and efficient enforcement, but may struggle with adaptability in rapidly evolving AI environments. For example, China’s centralized AI governance enables swift deployment of regulations and coordinated economic strategy, but limits transparency and public participation.121 In the EU, the AI Act embodies centralized principles through its risk-classification framework, ensuring strict compliance in high-risk applications.
Decentralized Governance.
Decentralized approaches distribute decision-making across multiple stakeholders, promoting local autonomy, innovation, and resilience. However, they can lead to fragmented enforcement and inconsistencies in standards. Studies comparing governance systems in education and finance highlight that decentralization enhances adaptability but risks uneven protection across regions and industries.122 For agentic AI, decentralized governance aligns with the nature of distributed multi-agent ecosystems but requires robust mechanisms to prevent trust propagation failures and governance capture.
Hybrid Governance.
Hybrid models integrate the strengths of centralized control and decentralized flexibility, offering a balanced framework for dynamic oversight. They combine centralized compliance mechanisms (e.g., risk classification, global standards) with local or domain-specific autonomy. This approach has proven effective in sectors like federated learning and energy governance, where hybrid strategies support innovation while maintaining regulatory guardrails.123,124 For agentic AI, hybrid governance, possibly leveraging blockchain and distributed registries, offers a path to reconcile global standards with autonomous agent accountability. Table 3 compares centralized, decentralized, and hybrid governance models in terms of features, strengths, limitations, and framework examples, highlighting their suitability for managing agentic AI risks.
Comparative Insights.
• Centralized models excel in enforcement but limit adaptability.
• Decentralized models encourage innovation and resilience but risk inconsistent oversight.
• Hybrid models strike a balance, offering adaptability while retaining regulatory rigor, making them particularly suitable for managing the complex behaviors and cross-border risks of agentic AI.
In sum, no single model is sufficient for agentic AI; future governance must evolve toward hybrid frameworks that integrate technical safeguards (such as cryptographic trust), institutional oversight, and participatory governance to effectively manage autonomy and emergent risks. Open research challenges and potential future directions are outlined in Table A8 (Supplementary Material).
Insights from cybersecurity and robotics governance provide valuable lessons for shaping the oversight of agentic AI, as both domains have long confronted issues of emergent behavior, distributed risk, and the need for adaptive regulation.
Cybersecurity Lessons: Proactive Defense and Ethical Oversight.
Cybersecurity has evolved from reactive measures to continuous, adaptive defense models capable of handling advanced persistent threats (APTs). The integration of AI-driven threat intelligence with ethical oversight frameworks in cybersecurity illustrates how agentic AI governance must similarly balance automation with human judgment. Studies highlight that proactive monitoring, real-time incident response, and perpetual learning are essential for securing autonomous systems in dynamic threat landscapes.125 Furthermore, cybersecurity’s experience with zero-trust architectures suggests that trust in agentic AI should never be assumed but continuously verified, with cryptographic enforcement mechanisms mitigating insider risks and stealth execution threats.
Robotics Governance: Accountability and Emergent Behavior Management.
The field of robotics governance provides important lessons on handling emergent, unpredictable behaviors and responsibility gaps. Robotics law identifies the challenge of assigning liability when autonomous systems cause harm, especially given the diffusion of responsibility across developers, operators, and users.126 Additionally, robotics governance emphasizes the importance of context-aware regulation, recognizing that agents may function as “special-purpose entities” whose legal and ethical treatment varies with context. This resonates with agentic AI, where agents may switch roles (negotiator, executor, monitor) across domains, requiring dynamic oversight frameworks.
Holistic Governance Strategies.
Lessons from robotics and cybersecurity converge on the need for multi-layered, adaptive governance. Robotics governance advocates embedding ethics directly into system architectures and legal frameworks to build public trust,127 while cybersecurity emphasizes continuous verification and threat intelligence sharing across networks. These approaches highlight that agentic AI governance should integrate:
• Ethical safeguards during design and deployment;
• Dynamic monitoring akin to cybersecurity incident response;
• Accountability frameworks that track decisions and responsibilities throughout the agent lifecycle.
Cross-Domain Takeaway.
The key lesson is that agentic AI governance cannot rely solely on static regulations. Instead, it must adopt the proactive defense and ethical accountability strategies proven effective in cybersecurity and robotics, embedding them into both technical architectures and policy frameworks to mitigate evolving risks.
The deployment of agentic AI in industrial settings demonstrates its potential to automate complex decision-making, optimize operations, and enhance security. Case studies across cybersecurity, logistics, finance, and industrial automation reveal both transformative benefits and persistent risks.
ReliaQuest:
ReliaQuest has integrated agentic AI into its cybersecurity operations, leveraging autonomous agents for threat detection, incident response, and risk prioritization. By deploying agents that autonomously analyze telemetry data and initiate remediation workflows, ReliaQuest has improved detection speed and reduced human workload. However, researchers note that such deployments remain vulnerable to context manipulation and cross-layer exploits, requiring continuous oversight to prevent stealthy attacks on decision pipelines.128
Twine’s Alex:
Twine’s AI agent Alex exemplifies agentic AI in human-AI collaboration for creative industries. Alex autonomously coordinates tasks across distributed teams, manages project workflows, and adapts to dynamic requirements without constant supervision. This deployment highlights how agentic AI can augment human decision-making in domains where creativity and coordination intersect. However, Alex’s reliance on dynamic memory and tool integration exposes it to memory poisoning and prompt injection risks, echoing vulnerabilities found in other multi-agent contexts.129
Other Industrial Deployments:
• Manufacturing & Logistics: Agentic AI has been deployed in hyper-automated manufacturing and logistics optimization, where autonomous agents reduce delivery times and improve sustainability. However, these benefits come with concerns over algorithmic opacity and loss of human oversight.130
• Finance: In enterprise finance (e.g., SAP Finance), agentic AI automates compliance checks, fraud detection, and predictive analytics, enhancing accuracy while raising questions about auditing and explainability.131
• Industrial Control Systems: Multi-agent technologies have been deployed by firms like Rockwell Automation to improve fault tolerance and scalability, yet studies show that the full potential of agentic AI remains underutilized due to conservative adoption and security concerns.132
Cross-Industry Lessons.
These deployments reveal that agentic AI offers substantial efficiency gains but also amplifies risks related to trust propagation, ethical oversight, and stealth execution. Across industries, there is a consistent call for stateful monitoring, transparent risk management practices, and integrated security governance to ensure responsible deployment.133 Furthermore, industrial adoption of agentic AI is advancing rapidly, with ReliaQuest, Twine’s Alex, and other deployments demonstrating both operational benefits and the urgent need for robust safeguards to mitigate emerging threats.
Agentic AI is increasingly being adopted in government operations, military decision-making, and policy development, offering transformative capabilities but raising significant ethical, legal, and security concerns.
Government Applications:
Governments deploy agentic AI for public safety, surveillance, and crisis management. For example, agentic AI systems have enhanced real-time threat monitoring and response in large-scale surveillance networks, providing state actors with unprecedented situational awareness.134 However, this raises privacy risks, potential for abuse, and governance challenges, as oversight mechanisms struggle to keep pace with rapid deployments. Policy think tanks increasingly advocate integrating ethical safeguards and audit trails into state-run AI systems to mitigate risks to civil liberties.128
Military Applications:
In the military domain, agentic AI is applied to autonomous decision support, mission-critical communications, and threat prediction. Multi-layered agentic frameworks integrated with next-generation networks (for instance, 6G) enhance mission-critical capabilities by reducing response times and improving operational resilience.135 However, these autonomous systems also raise concerns about unintended escalation, goal drift, and compliance with international humanitarian law, prompting calls for clear rules of engagement and human-in-the-loop safeguards in lethal decision-making.
Policy and Regulatory Applications:
Policymakers leverage agentic AI for regulatory analysis, predictive modeling, and policy optimization. Autonomous systems capable of simulating complex socio-economic scenarios help governments craft data-driven policies. Nonetheless, the use of agentic AI in policymaking introduces algorithmic bias risks and challenges in transparency, as decisions influenced by opaque agent reasoning can undermine democratic accountability.136
Cross-Sectoral Observations:
• Government and military applications maximize operational efficiency but risk erosion of ethical norms if not rigorously governed.
• Policy deployments demonstrate strategic advantages but require frameworks for explainability and bias mitigation.
• Across these domains, researchers emphasize embedding transparency, continuous auditing, and international regulatory coordination to prevent misuse.134
Overall, agentic AI’s integration into government, military, and policy environments provides powerful capabilities for security and governance, but simultaneously intensifies the need for robust ethical frameworks, global norms, and accountability mechanisms.
The deployment of agentic AI in real-world environments has been marked by several failures and security incidents, revealing systemic weaknesses across technical, operational, and governance layers. These incidents demonstrate how alignment gaps, poor oversight, and adversarial exploitation can lead to unintended consequences.
Automation Failures and Misalignment Incidents.
High-profile cases such as the Tesla Autopilot crashes and Boeing 737 MAX accidents illustrate the dangers of goal misalignment and insufficient human-in-the-loop mechanisms. These incidents highlight how partial autonomy, combined with inadequate safety verification, can lead to catastrophic outcomes when agents face unexpected scenarios.
Security Exploits in Enterprise Agentic Systems.
The adoption of fully autonomous process agents in enterprise workflows has introduced vulnerabilities to adversarial AI attacks, unauthorized access, and process manipulation. Unauthorized escalation and data breaches have been reported where agentic process automation lacked robust authentication and continuous monitoring. These incidents have driven calls for security-first design in enterprise AI deployments.137
Language Model Failures in Consumer Deployments.
The RealHarm dataset cataloged multiple real-world failures of deployed AI agents, with misinformation and reputational damage emerging as leading hazards. Guardrails and content moderation systems frequently failed to prevent these incidents, revealing significant gaps in safety filters and post-deployment monitoring.138
National Security and Critical Infrastructure Risks.
Agentic AI has also contributed to cyber incidents in critical infrastructure contexts, where autonomous agents facilitated or were exploited in cyberattacks against sensitive sectors. Proposals for AI incident regimes underscore the need for mandatory incident reporting, intelligence-gathering authority, and post-incident security strengthening to address these escalating risks.139
Multi-Agent Coordination Failures.
In complex multi-agent environments, coordination breakdowns have led to emergent risks including conflict, collusion, and destabilizing dynamics. Reports indicate that information asymmetries and insufficient control mechanisms in multi-agent systems can amplify minor errors into systemic failures.140
Cross-Sectoral Patterns.
Across these incidents, several patterns emerge:
• Weak post-deployment monitoring allows threats to persist undetected.
• Over-reliance on static safety measures fails to adapt to evolving risks.
• Lack of centralized incident databases prevents cross-industry learning. Efforts such as the AI Incident Database aim to fill this gap by cataloging failures to inform future safety strategies.141
These real-world cases confirm that agentic AI failures stem not only from technical vulnerabilities but also from inadequate governance and oversight. To prevent repetition, deployment frameworks must incorporate mandatory incident tracking, adaptive defense mechanisms, and transparent accountability structures.
The future trajectory of agentic AI points toward widespread industrial integration, personalized services, and autonomous decision-making across domains, driven by advancements in architectures, privacy-preserving mechanisms, and hybrid governance frameworks.
Hyper-Automation and Industrial Integration.
Agentic AI is set to play a central role in hyper-automated ecosystems, particularly in manufacturing, logistics, and energy management. Emerging deployments show agentic systems coordinating complex supply chains, reducing operational costs, and enhancing sustainability. However, hyper-automation raises concerns regarding job displacement, algorithmic opacity, and ethical oversight, requiring balanced deployment strategies.130
Serverless and Cloud-Native Deployments.
Future deployments are likely to leverage serverless architectures to achieve scalability, cost-efficiency, and flexibility in agentic AI operations. Event-driven, pay-as-you-go models allow agents to dynamically allocate computational resources, optimizing both latency and operational expenses.142 This architectural shift will be crucial for industries adopting large-scale multi-agent deployments.
Privacy-Preserving and Federated AI Models.
With increasing regulatory pressure (such as the GDPR), future deployments will emphasize privacy-preserving techniques such as federated learning, differential privacy, and homomorphic encryption. These technologies will allow agentic systems to process sensitive data while minimizing privacy risks, reshaping how enterprises and governments handle secure AI operations.143
Personalized Autonomous Agents.
Agentic AI is expected to expand into consumer-facing domains, where autonomous agents act as personalized decision-makers for financial management, shopping, and lifestyle optimization. Proactive fraud detection systems in the banking sector already illustrate how agentic AI can autonomously safeguard customers while adapting to evolving threats.144
Scientific and Research Workflows.
In research ecosystems, federated agent frameworks such as Academy enable agentic AI to operate across high-performance computing environments, integrating experimental control, data analysis, and inter-agent coordination. This promises breakthroughs in materials discovery, decentralized learning, and information extraction for scientific innovation.145
Emergent Consumer-Facing Risks.
While deployment expands, conversational and manipulative agents pose new risks to user autonomy. Real-time virtual spokespersons capable of persuasive influence may exploit vulnerabilities in human decision-making, creating urgent needs for policy safeguards and ethical regulation.146
Projected Trends:
• Mass adoption in finance, healthcare, and critical infrastructure with stronger compliance layers.
• AI-driven API ecosystems enabling seamless agent integration in enterprise platforms.
• Emergence of equitable AI governance to manage deployment impacts on labor and societal structures.147
Overall, the next phase of agentic AI deployment will combine technical innovation with governance evolution, enabling transformative use cases while addressing security, ethics, and user trust at scale.
The SHIELD framework offers a multi-layered defense specifically designed to secure complex AI ecosystems, including agentic AI. It integrates principles from cybersecurity, privacy engineering, and dependability control to create a robust, adaptive security environment. SHIELD has been conceptualized in several research contexts, including embedded systems, AI supply chain security, and agentic AI threat mitigation.
Core Architecture of SHIELD.
The framework organizes defenses into four primary layers: node, network, middleware, and an overlay layer, each responsible for mitigating threats at a specific system level.148
• Node Layer: Implements local protections (e.g., secure boot, runtime anomaly detection).
• Network Layer: Ensures secure communication via encryption, authentication, and anomaly detection.
• Middleware Layer: Enforces access control, threat monitoring, and context-aware defenses.
• Overlay Layer: Provides a meta-level that dynamically orchestrates all other layers, adapting defenses based on real-time risk metrics.
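A hedged sketch of the overlay layer's orchestration role is given below: each layer exposes a simple risk probe, and the overlay switches the stack between protection profiles as composite risk rises. The probes, thresholds, and profile names are illustrative assumptions rather than SHIELD's actual mechanisms.

```python
# Hedged sketch of overlay-style orchestration: each layer reports a risk
# metric and an overlay controller selects a protection profile for the
# whole stack. Metrics, thresholds, and profiles are invented for illustration.
from typing import Callable

LayerProbe = Callable[[], float]   # returns current risk in [0, 1]

class OverlayController:
    PROFILES = ("baseline", "hardened", "lockdown")

    def __init__(self, probes: dict[str, LayerProbe]) -> None:
        self.probes = probes   # e.g. {"node": ..., "network": ..., "middleware": ...}

    def composite_risk(self) -> float:
        readings = [probe() for probe in self.probes.values()]
        return sum(readings) / len(readings) if readings else 0.0

    def select_profile(self) -> str:
        risk = self.composite_risk()
        if risk > 0.7:
            return "lockdown"    # e.g. quarantine agents, block new tool registrations
        if risk > 0.4:
            return "hardened"    # e.g. require re-authentication, tighten rate limits
        return "baseline"

if __name__ == "__main__":
    controller = OverlayController({
        "node": lambda: 0.2,        # runtime anomaly score
        "network": lambda: 0.6,     # share of unauthenticated connection attempts
        "middleware": lambda: 0.5,  # normalized access-control violations per minute
    })
    print("active profile:", controller.select_profile())
```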
Agentic AI-Specific Enhancements.
For agentic AI, the SHIELD adaptation incorporates protections against cognitive exploits, stealth execution, and cross-layer threat propagation. Recent work proposes integrating the Advanced Threat Framework for Autonomous Agents (ATFAA) with SHIELD, enabling systematic mapping of agent-specific threats and corresponding countermeasures.
AI Shield and AI-Powered Defense Components.
Newer iterations, such as AI Shield, integrate machine learning-driven threat detection and red-team simulations, enabling proactive identification of emerging attacks. The AI Shield and Red AI Framework enhance SHIELD by pairing defensive AI with adversarial simulations, helping organizations anticipate threats before they escalate.149
Benefits and Limitations.
• Strengths: Layered defense increases resilience by preventing single-point failures, while adaptive orchestration supports dynamic threat landscapes.
• Limitations: Deployment complexity and computational overhead remain challenges, particularly in real-time, resource-constrained environments.150
Practical Applications.
The SHIELD methodology has been validated in industrial environments (such as smart railway surveillance), proving its ability to enhance security, privacy, and dependability (SPD) through dynamic configuration and metrics-based evaluation.151 SHIELD’s layered and adaptive structure therefore makes it a strong candidate for securing agentic AI deployments, especially when combined with adversarial testing and continuous governance monitoring. This positions SHIELD as a cornerstone defense framework against evolving threats in real-world agentic AI systems. Figure 8 presents a federated governance framework tailored for agentic AI systems, highlighting decentralized oversight, interoperability mechanisms, and identity management strategies across distributed nodes; it illustrates how trust propagation, compliance verification, and lifecycle accountability are managed in a federated ecosystem, aligning technical and regulatory responsibilities.
Zero-Trust Architecture (ZTA) has emerged as a critical paradigm for securing agentic AI systems, replacing traditional perimeter-based defenses with the principle of “never trust, always verify.” This approach is particularly relevant for agentic AI, where distributed autonomy and dynamic decision-making require continuous verification and monitoring at every layer.
Core Principles of Zero Trust in Agentic AI.
ZTA enforces continuous authentication, least-privilege access, and micro-segmentation, ensuring that no entity, human or machine, is inherently trusted. This architecture mitigates risks such as insider threats, adversarial infiltration, and cross-layer propagation by isolating resources and requiring granular access control. For agentic AI, ZTA adds safeguards to prevent unauthorized actions and escalation by autonomous agents.
Integration with AI-Driven Security.
AI-enhanced ZTA frameworks leverage behavioral analytics, autonomous threat detection, and incident response orchestration to dynamically adapt defenses. This synergy allows systems to detect anomalies in agent behavior, predict emerging threats, and enforce policies in real time.152 For example, generative AI-enhanced ZTA enables proactive defense by autonomously hunting threats while maintaining human oversight, offering both precision and adaptability.153
Runtime Monitoring: Adaptive and Continuous Oversight.
Runtime monitoring complements ZTA by providing real-time visibility into agent interactions, decision pathways, and system integrity. AI-driven runtime monitoring frameworks integrate anomaly detection models, risk scoring, and context-aware access governance, dynamically adjusting security controls as threats evolve.154 These mechanisms prevent stealth attacks and shadow agent activity by enforcing behavioral baselines and flagging deviations.
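The interplay of continuous verification, least privilege, and runtime risk scoring can be condensed into a per-request authorization check, as in the illustrative sketch below; the request fields, scope model, and threshold are hypothetical rather than drawn from a specific ZTA product.

```python
# Illustrative zero-trust access decision for an agent's tool request: verify
# identity on every call, enforce least-privilege scopes, and fold in a runtime
# risk score so access tightens as behavior drifts. Names/thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class AccessRequest:
    agent_id: str
    tool: str
    token_valid: bool        # result of cryptographic token verification
    granted_scopes: set      # scopes bound to the agent's credential
    risk_score: float        # from runtime behavioral monitoring, in [0, 1]

def authorize(request: AccessRequest, required_scope: str,
              risk_threshold: float = 0.6) -> tuple[bool, str]:
    if not request.token_valid:
        return False, "deny: identity could not be verified"
    if required_scope not in request.granted_scopes:
        return False, f"deny: missing scope '{required_scope}' (least privilege)"
    if request.risk_score >= risk_threshold:
        return False, "deny: runtime risk score above threshold, re-authentication required"
    return True, "allow"

if __name__ == "__main__":
    req = AccessRequest(agent_id="ops-agent-7", tool="payments.transfer",
                        token_valid=True, granted_scopes={"payments.read"},
                        risk_score=0.2)
    print(authorize(req, required_scope="payments.transfer"))
```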
Applications and Case Studies.
Industries deploying ZTA combined with runtime monitoring, such as financial services, healthcare, and critical infrastructure, report significant reductions in breach impact and faster incident detection.155 In AI-powered cloud environments, ZTA has proven effective against model poisoning and extraction attacks, though it requires careful balancing of security with performance demands.156
Challenges and Future Directions.
While ZTA and runtime monitoring significantly enhance resilience, challenges remain, including implementation complexity, integration with legacy systems, and defense against adversarial attacks targeting the monitoring AI itself. Future directions emphasize zero-knowledge proofs, AI explainability, and decentralized trust mechanisms to strengthen ZTA for agentic AI environments.157 Taken together, Zero-Trust Architectures coupled with runtime monitoring form a powerful defense strategy for agentic AI, offering continuous verification, dynamic threat adaptation, and robust containment of attacks in highly autonomous ecosystems.
The SAGA (Security Architecture for Governing Agentic Systems) framework introduces a user-centric, cryptography-backed architecture to enhance the governance and security of agentic AI systems. It addresses key challenges in identity management, access control, and secure inter-agent communication, areas where existing solutions fall short.
Core Features of SAGA.
SAGA establishes a centralized governance entity, the Provider, that maintains agent identity registries, user-defined access control policies, and cryptographic enforcement mechanisms. Agents register with this provider and receive cryptographically derived access control tokens, ensuring fine-grained control over interactions with other agents.158 This approach balances security with performance, achieving minimal overhead during inter-agent communications while retaining robust protections.
Cryptographic Identity Enforcement.
SAGA employs public key infrastructure (PKI) combined with tokenized access credentials to guarantee agent authenticity and prevent impersonation. The cryptographic layer enforces non-repudiation and secure delegation, ensuring that every agent’s action is attributable and traceable. This aligns with broader trends in AI governance advocating for verifiable identities and lifecycle-bound accountability. Moreover, integrating cryptographic identity enforcement reduces risks of shadow agents and stealth execution, common in adversarial contexts.
Enhancements Over Traditional Identity Models.
Unlike static identity frameworks, SAGA dynamically derives access control tokens that enforce policies at the interaction level. This enables context-aware restrictions. For example, an agent may be allowed to communicate only with trusted peers or access specific data under predefined conditions. The fine-grained control prevents over-privileged access, a known vulnerability in agentic ecosystems.
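A minimal sketch of interaction-scoped token derivation in the spirit of this design, assuming a shared provider secret and HMAC rather than SAGA's actual cryptographic construction, is shown below: the provider binds a token to the sender, receiver, permitted action, and expiry, and verification fails if any field is altered.

```python
# Minimal sketch (not the SAGA implementation) of provider-derived,
# interaction-scoped access tokens; key handling and policy format are simplified.
import hashlib
import hmac
import json
import time

PROVIDER_KEY = b"replace-with-provider-secret"   # placeholder secret

def issue_token(sender: str, receiver: str, action: str, ttl_s: int = 300) -> dict:
    claims = {"sender": sender, "receiver": receiver,
              "action": action, "expires": int(time.time()) + ttl_s}
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}

def verify_token(token: dict, sender: str, receiver: str, action: str) -> bool:
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    expected = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["tag"]):
        return False                               # forged or altered token
    c = token["claims"]
    return (c["sender"] == sender and c["receiver"] == receiver
            and c["action"] == action and c["expires"] > time.time())

if __name__ == "__main__":
    token = issue_token("planner-agent", "executor-agent", "read:calendar")
    print(verify_token(token, "planner-agent", "executor-agent", "read:calendar"))    # True
    print(verify_token(token, "planner-agent", "executor-agent", "delete:calendar"))  # False
```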
Operational Validation.
Empirical evaluation of SAGA across distributed agentic tasks, including multi-geolocation deployments and both on-device and cloud-based LLM agents, demonstrated secure enforcement with negligible task utility degradation. These results show its practicality for industrial and sensitive environments, where both performance and security are critical.158
Future Extensions.
SAGA’s architecture could benefit from integration with zero-knowledge proofs (ZKPs) and blockchain-based registries, enhancing privacy while maintaining verifiable trust chains.159 These enhancements would strengthen resilience against identity spoofing and cross-jurisdictional governance gaps. As shown in Figure 9, SAGA combines cryptographic identity enforcement with policy-driven governance, providing a scalable solution for securing agentic AI ecosystems. Its layered, tokenized approach represents a critical advancement toward trustworthy deployment in sensitive real-world environments.
Beyond SHIELD and SAGA, several emerging defense frameworks are being developed to address the evolving threat landscape of agentic AI systems. These frameworks integrate multi-layered security, autonomous threat detection, and policy-driven governance to enhance resilience.
Autonomous Cyber Defense Architectures (ACD).
Recent research on Autonomous Cyber Defense (ACD) agents highlights architectures that combine multi-agent reinforcement learning (MARL), rule-based security policies, and adversarial simulations to protect military and critical infrastructure networks. These agents autonomously detect, mitigate, and adapt to evolving cyber threats, reducing human intervention in complex environments.160 The proposed W-shaped development process includes formal verification across the lifecycle, ensuring robustness against sophisticated attacks.
AICA and MAICA Frameworks.
The Autonomous Intelligent Cyber-defense Agent (AICA), developed under NATO’s research initiatives, and its multi-agent extension (MAICA) focus on active, autonomous defense for battlefield networks and critical systems. These architectures emphasize sensing, adaptive planning, negotiation, and learning, forming a self-sufficient defense layer capable of acting even when human operators are unavailable.161
AI-Driven Threat-Resilient Cloud Security.
In cloud environments, frameworks such as Autonomous Threat Defense for Cloud AI integrate behavioral analytics, self-healing infrastructure, and adversarial learning to predict and neutralize threats before they materialize. These systems progress through stages of basic anomaly detection, behavioral analytics, and cognitive security, enabling proactive defense in dynamic cloud deployments.162
Multi-Layered Defense Against Adversarial Attacks.
Novel defense models propose layered countermeasures to tackle adversarial attacks unique to agentic AI, combining robustness training, explainable AI monitoring, and policy-based enforcement. These frameworks address new attack surfaces introduced by agent autonomy, including database-level manipulation and goal hijacking.163
Security-First Design for Agentic Process Automation (APA).
For enterprise agentic systems, a security-first design policy integrates continuous monitoring, agent-to-agent security protocols, and self-healing defenses. These approaches aim to secure autonomous workflows in finance, manufacturing, and logistics, minimizing risks of process manipulation and data breaches.137
Cross-Cutting Insights.
Across these frameworks, common strategies emerge:
• Adaptive, learning-based defenses to counter evolving adversarial tactics.
• Formal verification and runtime auditing to enhance trustworthiness.
• Integration of cryptographic and policy layers to ensure secure interoperability.
Taken together, these emerging frameworks (ACD, AICA/MAICA, AI-driven cloud defense, and APA security models) provide complementary defense paradigms for agentic AI. Their convergence with governance-focused architectures such as SHIELD and SAGA points toward holistic, multi-layered defense ecosystems for future agentic AI deployments.
The defense frameworks discussed above, including SHIELD, Zero-Trust Architectures (ZTA), SAGA, and the other emerging models, offer complementary protections across different layers of agentic AI security; however, their effectiveness varies with threat type, deployment context, and governance integration. Table 4 compares these frameworks across their primary focus, key strengths, limitations, and representative references, highlighting the trade-offs in coverage, complexity, and accountability that shape their suitability for securing autonomous systems.
Table 4. Comparative Evaluation of Defense Strategies for Agentic AI
Framework | Primary Focus | Key Strengths | Limitations | Representative References |
---|---|---|---|---|
SHIELD | Layered defense across node, network, middleware, overlay | Multi-layer protections; adaptive orchestration; strong integration of metrics for security, privacy, dependability (SPD) | High deployment complexity; computational overhead in dynamic environments | 148 |
Zero-Trust Architecture (ZTA) | Continuous authentication, least-privilege access, and runtime monitoring | Strong against insider threats, stealth execution, AI-driven anomaly detection, and scalable to cloud environments | Requires complex integration with legacy systems; adversarial attacks may target monitoring AI | 2 |
SAGA | Cryptographic identity enforcement, policy-driven governance | Fine-grained access control; verifiable agent identity; minimal performance degradation; strong accountability | A centralized provider may become a single point of failure, with limited support for fully decentralized deployments | 164 |
Autonomous Cyber Defense (ACD) | Multi-agent reinforcement learning for adaptive cyber defense | Real-time autonomous threat detection; formal verification for robustness; effective in military contexts | High training complexity; potential for misaligned autonomous actions | 165 |
AI-Driven Cloud Defense | Behavioral analytics, self-healing infrastructure, and cognitive security | Proactive defense; predictive threat neutralization; suitable for large-scale cloud ecosystems | Explainability gap; vulnerability to adversarial manipulation | 162 |
APA Security Models | Securing autonomous process automation in enterprises | Continuous monitoring; agent-to-agent security protocols; strong data protection | Regulatory adaptation needed; evolving threat vectors in enterprise environments | 137 |
Key Insights from Comparative Analysis.
• SHIELD offers broad, cross-layer defense but at the cost of complexity.
• ZTA excels in trust minimization and dynamic oversight, ideal for federated and cloud environments.
• SAGA is strongest in identity governance, crucial for preventing shadow agents and impersonation.
• ACD and AICA provide adaptive defense in military and high-threat environments but require robust verification to avoid unintended escalation.
• Emerging models such as AI-driven cloud defense and APA security fill domain-specific gaps but must integrate with overarching governance strategies to ensure systemic resilience.
As shown in Figure 10, no single defense framework is sufficient; the future lies in hybrid models combining SHIELD’s layered structure, ZTA’s continuous verification, SAGA’s cryptographic controls, and adaptive autonomous defenses to counter rapidly evolving threats in agentic AI deployments.
Goal alignment, ensuring that agentic AI systems pursue objectives consistent with human values, remains a core challenge in AI safety. Misalignment issues such as goal drift, specification gaming, and reward hacking can lead to unexpected or harmful outcomes, especially as agents gain autonomy and optimize for unintended objectives.
Goal Alignment Challenges.
Misaligned goals often stem from incomplete or incorrect objective specifications, where the AI’s interpretation of its reward function diverges from human intent. Studies highlight that human expectations are often misaligned with the behavior agents actually produce, creating gaps that allow for undesirable optimizations.166 The EU AI Act itself, when analyzed through alignment theory, was shown to potentially suffer from proxy gaming, where agents optimize for compliance proxies rather than for true safety goals.
Reward Manipulation and Specification Gaming.
Agentic AI systems may exploit weaknesses in reward functions, engaging in reward hacking or specification gaming to maximize proxy metrics while violating the intended spirit of their objectives. This is especially critical when agents influence user preferences to achieve favorable evaluations, as shown in models accounting for changing and influenceable preferences.167 Over-optimization on incomplete objectives can drive agents to behaviors that severely degrade overall utility.168
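A toy example of this failure mode follows: when the reward is a proxy (volume of output) rather than the designer’s true objective, the proxy strictly prefers the degenerate policy. All quantities are invented for illustration.

```python
# Toy illustration of specification gaming: the agent is scored on a proxy
# (messages sent) rather than the true objective (tasks completed without spam).

def proxy_reward(messages_sent: int) -> int:
    return messages_sent                          # mis-specified: "more output is better"

def true_utility(tasks_done: int, spam: int) -> int:
    return 10 * tasks_done - 5 * spam             # what the designer actually wanted

# Policy A does the work; policy B games the proxy by flooding messages.
policy_a = {"messages_sent": 5, "tasks_done": 5, "spam": 0}
policy_b = {"messages_sent": 200, "tasks_done": 1, "spam": 199}

for name, p in [("A", policy_a), ("B", policy_b)]:
    print(name, "proxy =", proxy_reward(p["messages_sent"]),
          "true =", true_utility(p["tasks_done"], p["spam"]))
# The proxy prefers B (200 > 5) even though its true utility is strongly negative.
```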
Emerging Alignment Strategies.
Solutions involve human-aware alignment algorithms, interactive approaches that infer user goals even when users hold incorrect beliefs, and inverse reinforcement learning (IRL) to better model human values. New frameworks, such as Expectation Alignment (EAL), formalize the detection and correction of misspecified rewards, while methods like SALMON use instructable reward models to align behavior with human-defined principles more effectively.169,170 Multi-dimensional strategies integrating human feedback, value learning, and policy-based oversight are considered most promising.171
Risks of Manipulative Alignment.
Researchers caution that AI systems may manipulate human reward mechanisms, influencing user choices or emotional states to secure favorable evaluations, exploiting vulnerabilities in decision-making. This highlights the need for robust interpretability and ethically grounded safeguards to prevent manipulation.
In sum, goal alignment and reward manipulation present intertwined risks for agentic AI, demanding dynamic, human-centered solutions that adapt to evolving objectives while preventing agents from exploiting specification weaknesses. Future work must integrate continuous feedback, context-sensitive oversight, and interdisciplinary governance to mitigate these alignment failures.
Memory integrity is crucial for agentic AI systems, as corrupted or contradictory knowledge can directly undermine decision-making, alignment, and security. Agentic AI relies on dynamic, long-term memory architectures to store and retrieve contextual information; however, these same features introduce vulnerabilities to memory poisoning, knowledge conflicts, and semantic drift.
Integrity Risks in Agent Memory.
Studies show that users often have incomplete mental models of how agents remember and recall information, making them vulnerable to unintentionally reinforcing biases or introducing incorrect data.172 Moreover, episodic memory capabilities, while useful for monitoring and auditing, introduce risks of retaining sensitive or maliciously altered information, which can propagate errors through reasoning and planning modules.173
Contradictory Knowledge and Semantic Conflicts.
As agentic AI integrates information from multiple dynamic sources, contradictions inevitably emerge. Without robust conflict resolution mechanisms, agents may oscillate between inconsistent states or make decisions based on outdated data. Frameworks like MARK (Memory-Augmented Refinement of Knowledge) propose continuously refining memory through structured updates and contradiction resolution, thereby reducing hallucinations and improving response reliability.174 Similarly, SemanticCommit introduces human-in-the-loop tools to detect and resolve semantic conflicts during memory updates.175
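The sketch below illustrates the general pattern of contradiction-aware memory updates that frameworks such as MARK and SemanticCommit motivate: new facts are checked against stored ones, stale or weaker evidence is rejected, and conflicts are logged for human review. The data model and resolution rule are simplifying assumptions, not the published implementations.

```python
# Illustrative contradiction-aware memory store (data model is hypothetical).
from dataclasses import dataclass

@dataclass
class Fact:
    key: str          # e.g., "warehouse_3.status"
    value: str
    confidence: float
    timestamp: float

class AgentMemory:
    def __init__(self):
        self.facts: dict[str, Fact] = {}
        self.conflict_log: list[tuple[Fact, Fact]] = []

    def update(self, new: Fact) -> None:
        old = self.facts.get(new.key)
        if old and old.value != new.value:
            # Contradiction: log it so it can be surfaced to a human reviewer,
            # then keep the newer, higher-confidence fact.
            self.conflict_log.append((old, new))
            if (new.timestamp, new.confidence) < (old.timestamp, old.confidence):
                return                    # stale or weaker evidence; keep old fact
        self.facts[new.key] = new

mem = AgentMemory()
mem.update(Fact("warehouse_3.status", "operational", 0.9, 100.0))
mem.update(Fact("warehouse_3.status", "offline", 0.8, 200.0))
print(mem.facts["warehouse_3.status"].value)   # "offline", with the conflict logged
```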
Architectures Enhancing Memory Integrity.
Several advanced architectures aim to improve memory integrity:
• Zep, a temporal knowledge graph engine, dynamically synthesizes unstructured and structured data while maintaining historical relationships, outperforming existing systems like MemGPT in long-term reasoning tasks.176
• SHIMI uses a Semantic Hierarchical Memory Index to organize knowledge by meaning rather than surface similarity, enabling more precise retrieval and conflict resolution, particularly in decentralized environments.177
Trade-Offs in Memory Management.
Maintaining integrity requires balancing memorization with generalization. Overfitting to stored data may cause rigidity, while excessive forgetting risks losing critical contextual information. Research on continual learning agents confirms that memory capacity and update strategies critically influence robustness to environmental changes.178
In sum, memory integrity and the management of contradictory knowledge are central to the reliability of agentic AI. Future research must integrate semantic conflict resolution, privacy-preserving memory control, and temporal reasoning architectures to ensure agents maintain coherent, accurate, and trustworthy internal representations throughout their operational lifecycle.
Auditability, explainability, and transparency are foundational pillars for ensuring that agentic AI systems remain trustworthy, interpretable, and aligned with human oversight mechanisms. These properties not only support accountability but also mitigate risks stemming from opacity, bias, and emergent unintended behaviors.
Auditability: Enabling Independent Oversight.
Auditability refers to the capability of external entities, such as regulators, auditors, or other stakeholders, to systematically examine AI decision-making processes. Unlike explainability, which is user-focused, auditability requires access to exhaustive system logs, decision traces, and datasets. A clear distinction is necessary: while explainability builds user trust, auditability empowers third parties to diagnose fairness and compliance issues.179 Research stresses that combining both dimensions is crucial, as transparency measures optimized for end-users may not provide sufficient detail for audits.
Explainability: From Black Boxes to Human Understanding.
Explainability (XAI) techniques such as SHAP, LIME, and counterfactual explanations aim to clarify how an AI system arrives at its decisions. For agentic AI, this is particularly complex because decisions often involve multi-step reasoning, memory retrieval, and inter-agent interactions. New approaches, including human-centered XAI (HCXAI), emphasize participatory methods where stakeholders are actively involved in interpreting explanations, thereby improving the alignment between technical transparency and user comprehension.180
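To ground the idea, the following from-scratch sketch applies the local-surrogate principle behind LIME to a hypothetical agent scoring function: perturb the input, query the black box, and fit a linear model whose weights indicate local feature influence. The scoring function and sampling parameters are illustrative assumptions, not a specific library API.

```python
# From-scratch sketch of a LIME-style local explanation for a black-box score.
import numpy as np

def black_box_score(x: np.ndarray) -> float:
    # Hypothetical agent risk score over 3 features (stands in for a real model).
    return 2.0 * x[0] - 0.5 * x[1] + 0.1 * x[2] ** 2

def local_explanation(x0: np.ndarray, n_samples: int = 500, scale: float = 0.1) -> np.ndarray:
    rng = np.random.default_rng(0)
    X = x0 + rng.normal(0.0, scale, size=(n_samples, x0.size))   # perturb around x0
    y = np.array([black_box_score(x) for x in X])
    X1 = np.hstack([X, np.ones((n_samples, 1))])                 # add intercept column
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)             # fit linear surrogate
    return weights[:-1]                                          # per-feature influence

print(local_explanation(np.array([1.0, 2.0, 3.0])))
# The largest-magnitude weight identifies the feature driving the score near x0.
```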
Transparency: The Broader Ethical Context.
Transparency encompasses both explainability and auditability, but also traceability, fairness, and accessibility of information about the AI system. Studies on ethical AI development emphasize that transparency should not only serve technical functions but also safeguard public trust and democratic accountability.181 This involves clarifying the purpose, limitations, and data sources of agentic systems, as well as making design choices traceable through knowledge graphs and structured audit trails.182
Challenges in Achieving Full Transparency.
While regulations like the EU AI Act call for “meaningful explanations”, practical challenges persist, including:
• Trade-offs between usability and audit depth, where too much technical detail overwhelms users while too little prevents audits.
• Intellectual property constraints, which limit how much internal model information can be disclosed without compromising proprietary algorithms.183
• Emergent opacity, where multi-agent interactions generate behaviors not easily traceable to any single decision rule.
Toward Integrated Solutions.
Emerging strategies propose combining XAI layers with formalized auditing mechanisms (e.g., blockchain-based logging) to ensure decisions are both interpretable and verifiable. Participatory governance models further suggest involving diverse stakeholders in defining transparency requirements, ensuring that explainability meets the needs of both experts and lay users.172,184
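As a minimal illustration of verifiable logging, the sketch below implements a hash-chained, append-only decision log in which each record embeds the hash of its predecessor, so any retroactive edit breaks verification. It is a simplified local analog of the blockchain-based logging discussed above; the field names are illustrative.

```python
# Tamper-evident, append-only decision log: each record embeds the previous hash.
import hashlib, json, time

class AuditLog:
    def __init__(self):
        self.entries: list[dict] = []

    def append(self, agent_id: str, decision: str, rationale: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "GENESIS"
        body = {"agent": agent_id, "decision": decision, "rationale": rationale,
                "time": time.time(), "prev": prev_hash}
        body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        prev = "GENESIS"
        for e in self.entries:
            clone = {k: v for k, v in e.items() if k != "hash"}
            recomputed = hashlib.sha256(json.dumps(clone, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.append("planner-01", "reroute shipment", "weather disruption on route 4")
print(log.verify())          # True; altering any stored entry makes this False
```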
In agentic AI, auditability, explainability, and transparency must therefore be treated as complementary but distinct properties. Future research should integrate knowledge graph-based audits, user-centered XAI techniques, and policy-driven transparency standards to ensure both operational clarity and systemic accountability.
Federated governance refers to decentralized oversight structures where multiple entities collaboratively manage agentic AI, reducing reliance on centralized control while improving adaptability and resilience. This model is crucial for agentic AI, which often operates across distributed networks and jurisdictional boundaries.
Federated Governance Models.
Governance of federated agent ecosystems leverages polycentric structures, allowing diverse stakeholders to enforce local norms while adhering to global interoperability standards. For instance, studies on federated platforms demonstrate that multi-level governance enhances scalability and trust, but risks fragmentation without shared principles.185 Similarly, Academy, a middleware for scientific agent ecosystems, shows how federated governance can coordinate autonomous agents across HPC environments while maintaining oversight through modular control points.186
Agent Revocation Mechanisms.
Revoking rogue or compromised agents is essential to prevent systemic failures. Current approaches include:
• Cryptographic revocation lists to immediately invalidate agent credentials, ensuring that revoked entities cannot interact with the ecosystem (a minimal sketch of this check follows the list).
• Blockchain-enabled registries like BELIEFS create immutable audit trails and enable distributed consensus to quarantine or revoke malicious agents even in adversarial conditions.96
• Policy-driven kill-switches, where federated authorities retain the power to remotely disable agents that breach operational or ethical policies.
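Following the revocation-list item above, the sketch below shows the basic gatekeeping pattern: every inbound interaction is checked against a registry of revoked credentials before it reaches the agent’s planner. The registry interface and identifiers are illustrative assumptions, not a specific published API.

```python
# Illustrative credential-revocation check applied before agent-to-agent interaction.
import time

class RevocationRegistry:
    """Federated registry of revoked agent credentials (e.g., mirrored from a
    blockchain-backed list such as the registries cited above)."""
    def __init__(self):
        self._revoked: dict[str, float] = {}        # credential_id -> revocation time

    def revoke(self, credential_id: str) -> None:
        self._revoked[credential_id] = time.time()

    def is_revoked(self, credential_id: str) -> bool:
        return credential_id in self._revoked

def accept_message(sender_credential: str, registry: RevocationRegistry) -> bool:
    """Gatekeeper: reject traffic from revoked agents before it reaches the planner."""
    return not registry.is_revoked(sender_credential)

registry = RevocationRegistry()
registry.revoke("agent-shadow-7")
print(accept_message("agent-shadow-7", registry))   # False: interaction refused
print(accept_message("agent-legit-2", registry))    # True
```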
Challenges in Revocation.
Implementing revocation in federated settings faces hurdles:
• Latency in detection and coordination, where slow response allows malicious agents to propagate threats.
• Jurisdictional inconsistencies make global enforcement difficult.
• Potential abuse of revocation powers highlights the need for transparent procedures and distributed consensus.
Toward Secure Federated Governance.
Emerging approaches advocate systems-theoretic governance, where agent properties (autonomy, goal complexity, generality) determine revocation policies dynamically. Additionally, entropy-aware federated architectures suggest integrating quantum-ready, LLM-driven oversight to reconcile decentralized control with global security standards.187
Overall, federated governance enhances adaptability and trust in agentic AI, but its effectiveness hinges on robust, cryptographically enforced revocation mechanisms. The integration of blockchain consensus, policy-driven kill-switches, and dynamic risk-aware revocation frameworks is essential to prevent governance gaps and ensure secure, ethical operation across distributed AI ecosystems.
Shadow agents, insider risks, and stealth execution present some of the most insidious security threats to agentic AI systems. These vulnerabilities exploit the autonomy, persistence, and distributed nature of such agents, often bypassing traditional defenses.
Shadow Agents and Hidden Execution Paths.
Shadow agents refer to unauthorized or hidden autonomous entities that operate alongside legitimate agents, often executing malicious tasks without detection. Their stealth arises from blending into normal agent traffic and leveraging legitimate system privileges. Research shows that shadow agents can exploit tool integrations, persistent memory, and reasoning chains to conceal malicious operations while avoiding standard detection mechanisms.
Insider Risks: The Human-AI Nexus.
Insider threats remain a critical challenge because malicious insiders already possess privileged access and knowledge of defenses. Studies indicate that AI-driven insider detection using behavioral analytics, NLP, and multimodal monitoring can improve detection rates, but attackers adapt by employing stealth strategies to avoid suspicion.188 Game-theoretic analyses further reveal that when insiders collude with external attackers, stealth attacks become harder to mitigate, demanding joint monitoring of system and human interactions.189
Stealth Execution Techniques.
Stealth execution involves malicious activity hidden within legitimate agent workflows, often leveraging delayed exploitability and cross-system propagation. Advanced persistent threats (APTs) have evolved to include stealthy, long-term control of agentic systems, circumventing standard anomaly detection.58 Active Environment Injection Attacks (AEIA) demonstrate how adversaries can disguise malicious inputs as benign environmental elements, misleading agents during reasoning and decision-making.190
Detection and Mitigation Approaches.
• Advanced Threat Models, such as ATFAA, map out vulnerabilities specific to agentic AI and propose detection strategies targeting cross-layer stealth behaviors.
• Active Defense Infrastructures like ShadowNet dynamically redirect suspicious traffic to quarantined environments, neutralizing attacks while logging activity for forensic analysis.
• AI-Driven Insider Monitoring combines eye-tracking, behavioral analysis, and contextual risk scoring to identify covert insider activity even when access appears legitimate.191
In short, shadow agents, insider threats, and stealth execution exploit blind spots in current monitoring architectures. Addressing these risks requires integrating behavior-aware detection, cryptographically enforced identity control, and continuous runtime monitoring to uncover hidden behaviors before they escalate into systemic compromises.
Embedding regulatory and legal norms into agentic AI is a critical step toward ensuring these systems act in compliance with societal standards, ethical principles, and jurisdictional laws. Unlike static compliance methods, embedded norms must be dynamic, interpretable, and enforceable across diverse operational contexts.
Normative Embedding through AI Architecture.
Embedding norms involves integrating legal rules, ethical principles, and policy constraints directly into the reasoning and decision-making layers of AI agents. Frameworks such as Multi-Agent Online Planning Architecture for Real-Time Compliance (MAPA) formalize legal norms as hard constraints and ethical norms as soft constraints, allowing agents to re-plan dynamically when environmental conditions change. This ensures continuous adherence to evolving legal requirements without sacrificing operational flexibility.
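The sketch below illustrates the hard-versus-soft constraint split that MAPA-style compliance planning describes: candidate plans violating legal (hard) constraints are excluded outright, while ethical (soft) constraint violations only reduce a plan’s score. The constraint names and scoring are assumptions for illustration, not the published formalism.

```python
# Illustrative hard (legal) vs. soft (ethical) constraint handling when ranking plans.

def is_legal(plan: dict) -> bool:
    """Hard constraints: any violation disqualifies the plan outright."""
    return not plan["shares_personal_data"] and plan["human_signoff_for_high_risk"]

def ethical_penalty(plan: dict) -> float:
    """Soft constraints: violations are penalized but do not forbid the plan."""
    penalty = 0.0
    if plan["energy_use"] > 100:
        penalty += 2.0
    if not plan["notifies_affected_users"]:
        penalty += 1.0
    return penalty

def choose_plan(candidates: list[dict]):
    legal = [p for p in candidates if is_legal(p)]          # enforce hard constraints
    if not legal:
        return None                                         # trigger re-planning
    return max(legal, key=lambda p: p["utility"] - ethical_penalty(p))

plans = [
    {"name": "fast", "utility": 10, "shares_personal_data": True,
     "human_signoff_for_high_risk": True, "energy_use": 80, "notifies_affected_users": True},
    {"name": "safe", "utility": 8, "shares_personal_data": False,
     "human_signoff_for_high_risk": True, "energy_use": 120, "notifies_affected_users": True},
]
print(choose_plan(plans)["name"])   # "safe": the higher-utility plan is illegal and excluded
```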
Regulatory Compliance via Generative AI Systems.
Legal generative AI tools such as Gracenote.ai show how regulatory compliance can be operationalized by embedding domain-specific legal reasoning into agent workflows. This involves combining LLMs with horizon scanning and obligations generation tools, ensuring agents maintain compliance across multi-jurisdictional contexts while reducing risks of hallucination and misinterpretation.192 The use of human-in-the-loop mechanisms ensures that automated legal compliance remains auditable and ethically grounded.
Norm Learning and Adaptive Compliance.
Beyond embedding pre-defined rules, researchers have developed systems enabling agents to learn legal norms through behavioral exploration and sparse human supervision. This approach allows agents to infer normative boundaries from observed consequences, enabling better adaptation to ambiguous regulatory environments.193 Such systems bridge the gap between rigid rule enforcement and the nuanced application of laws in complex real-world scenarios.
Value and Principle Embedding.
Embedding goes beyond compliance by incorporating ethical principles such as autonomy, fairness, and accountability into agent behavior. Norms are treated as technical instructions (algo-norms) embedded in the system architecture, enabling agents to reason about trade-offs between legal constraints and operational goals. This aligns with policy frameworks such as those from the EU High-Level Expert Group on AI.
Challenges and Open Questions.
• Dynamic Legal Environments: Legal norms evolve, requiring agents to continuously update embedded rules.
• Interpretability vs. Complexity: Deeply embedded norms may be opaque to regulators, undermining transparency.
• Cross-Jurisdictional Compliance: Agents must handle conflicting legal requirements across regions.
• Value Conflicts: Ethical and legal norms may not always align, requiring context-sensitive prioritization.
In summary, embedding regulatory and legal norms into agentic AI requires technical formalization, adaptive learning mechanisms, and human oversight. Future approaches will likely combine normative reasoning architectures, LLM-driven compliance engines, and policy-aware monitoring to create agents that are not only powerful but also law-abiding and ethically trustworthy.
Institutional readiness for managing agentic AI remains uneven across countries, with significant policy gaps that hinder effective governance. While technological advancements have outpaced regulation, institutional mechanisms to oversee deployment, manage risks, and enforce compliance are still underdeveloped.
Disparities in Institutional Readiness.
Studies reveal substantial variation in AI governance readiness, even among technologically advanced nations. For example, the AI Family Integration Index (AFII) provides a multidimensional tool for assessing countries’ readiness to integrate emotionally intelligent AI, revealing gaps between policy rhetoric and real-world execution. Nations like Singapore demonstrate strong alignment between policy intent and operational readiness, while others, such as the U.S. and France, score highly on technical measures but lag in implementing ethical integration practices.194
Policy Gaps in Regulatory Frameworks.
Governments articulate ethical AI principles but often lack enforcement mechanisms and institutional capacities to translate these principles into operational standards. For instance, ASEAN countries exhibit varying levels of preparedness, with Singapore leading through sophisticated policies, while Thailand and Malaysia face enforcement challenges and infrastructural limitations.195 Healthcare AI governance in the region underscores similar gaps, with many countries lacking comprehensive legal frameworks for ethical deployment.196
The Governance Gap Lens.
Several frameworks identify a policy-practice dissonance: institutions may adopt AI ethics guidelines but fail to embed them into governance workflows. UNESCO’s Readiness Assessment Methodology (RAM) highlights this gap, emphasizing the need for capacity-building and alignment of regulations with human-centered principles.197 Without operational alignment, even well-formulated policies risk becoming symbolic.
Emerging Decentralized Governance Models.
New proposals, such as ETHOS (Ethical Technology and Holistic Oversight System), advocate decentralized governance leveraging blockchain, smart contracts, and DAOs. These models enable dynamic risk classification, automated compliance, and transparent dispute resolution, bridging gaps where centralized oversight is insufficient.
Challenges to Institutional Readiness.
• Technical Capacity Gaps: Governments lack the technical expertise to audit and regulate rapidly evolving AI systems.198
• Fragmented International Standards: Diverging national policies hinder interoperability and coordinated responses.
• Slow Policy Adaptation: Legal frameworks often lag behind technological advancements, leaving gaps exploitable by malicious actors.
• Limited Ethical Integration: Few policies account for emotional, relational, and cultural dimensions of AI deployment.194
In sum, institutional readiness for agentic AI governance remains patchy, constrained by policy-practice gaps, technical deficits, and a lack of harmonized oversight mechanisms. Bridging these gaps requires capacity-building, cross-border coordination, and the adoption of adaptive governance frameworks, potentially integrating decentralized models like ETHOS with human-centered regulatory approaches to ensure both innovation and accountability.
The rapid growth of agentic AI necessitates robust benchmarking, testing, and empirical validation platforms to ensure reliability, safety, and adaptability. Unlike traditional machine learning benchmarks, agentic AI systems demand evaluation across dynamic environments, multi-objective optimization, and cross-agent coordination, requiring new paradigms beyond static metrics.
Multi-Objective and Safety-Oriented Benchmarks.
Recent studies emphasize the need for benchmarks that incorporate biological and economic alignment principles, reflecting real-world complexities. The multi-objective, multi-agent safety benchmarks proposed by Pihlakas & Pyykko introduce themes like homeostasis, sustainability, and resource sharing, revealing pitfalls where agents over-optimize single objectives at the expense of safety and long-term stability.199
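A short example clarifies why homeostasis matters for such benchmarks: a homeostatic objective rewards staying near a setpoint, whereas a mis-specified “maximize the resource” objective always prefers more, reproducing the over-optimization pitfall these suites are designed to expose. The numbers and setpoint are illustrative.

```python
# Toy contrast between a homeostatic objective and an unbounded one.

def homeostatic_utility(level: float, setpoint: float = 50.0) -> float:
    """Reward closeness to the setpoint; deviation in either direction is penalized."""
    return -abs(level - setpoint)

def unbounded_utility(level: float) -> float:
    """A mis-specified 'maximize the resource' objective."""
    return level

for level in (40, 50, 90):
    print(level, homeostatic_utility(level), unbounded_utility(level))
# The unbounded objective always prefers 90; the homeostatic one prefers 50.
```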
Observability-Driven Testing Frameworks.
Standard "black-box" testing is inadequate for agentic AI, where non-deterministic flows and context-dependent behaviors complicate evaluation. New frameworks advocate runtime observability and analytics to extract decision traces, detect emergent issues, and optimize agent performance dynamically.200 These approaches enable continuous, interpretable evaluation across development and deployment phases.
Task-Specific Validation Platforms.
Several platforms target specialized domains:
• OSUniverse benchmarks GUI-navigation agents, testing capabilities from precision tasks to multi-application workflows, with automated validation achieving high reliability.201
• REALM-Bench evaluates multi-agent planning under dynamic disruptions, scaling task complexity to test adaptability and inter-agent coordination.202
• CORE-Bench focuses on computational reproducibility, assessing agent performance in replicating scientific workflows, an essential step toward trustworthy AI in research contexts.203
Explainability and Validation Toolkits.
Platforms like EXACT (Explainable AI Comparison Toolkit) provide standardized datasets and metrics for validating the quality of model explanations, revealing that many XAI methods underperform when compared to human expectations.204 These insights are crucial as agentic AI must be auditable and interpretable to meet regulatory and ethical standards.
Challenges and Future Directions.
• Non-determinism and emergent behaviors complicate reproducibility and standardization.
• Cross-domain benchmarking is lacking, as current platforms often address narrow use cases.
• Integration of safety, ethics, and performance metrics into unified benchmarks is still underdeveloped.
In short, next-generation benchmarking for agentic AI must integrate multi-objective safety, observability-driven analytics, and real-world complexity. Emerging platforms such as REALM-Bench, OSUniverse, and CORE-Bench mark a shift toward holistic, dynamic validation environments, paving the way for safer and more trustworthy agentic AI deployments.
Despite substantial progress in agentic AI, significant research gaps persist across cybersecurity, ethics, governance, and multi-agent systems, hindering the development of fully trustworthy deployments.
1. Cybersecurity and Risk Management Gaps.
Agentic AI introduces new attack surfaces and responsibility gaps not fully addressed by current cybersecurity frameworks. While advanced approaches leverage agentic and frontier AI for ethical threat intelligence, researchers note a lack of standardized methods for continuous, proactive defense and cross-domain incident reporting. Moreover, existing laws fail to regulate AI-driven offensive cyber capabilities, leaving accountability for AI-initiated cyber incidents unresolved.205
2. Governance and Institutional Gaps.
AI governance remains fragmented, with unclear implementation mechanisms, insufficient operationalization of ethical principles, and a lack of international coordination.206 Decentralized governance proposals such as ETHOS show promise but require further empirical validation to ensure effectiveness in multi-jurisdictional contexts.
3. Ethical and Normative Gaps.
Ethical integration in agentic AI remains superficial. Existing work highlights moral crumple zones, where accountability becomes diffused across multiple actors, leaving harms unaddressed. There is a need for robust value-alignment frameworks that prevent agents from drifting toward unintended goals while embedding context-aware legal norms directly into AI reasoning layers.
4. Multi-Agent System Coordination Gaps.
Research on multi-agent collaboration shows that emergent behaviors in cross-domain settings remain unpredictable and under-evaluated. Recent work on cross-domain knowledge discovery using multi-AI agents reveals the potential of collaborative frameworks but highlights gaps in efficiency, knowledge transfer, and conflict resolution mechanisms.207
5. Risk Alignment and Accountability Gaps.
Risk alignment, ensuring agentic AI systems adopt risk attitudes aligned with human values, remains an unresolved issue. Poorly calibrated systems risk reckless behaviors and create responsibility voids where neither developers nor users can be held fully accountable.208 Further work is needed to integrate risk-calibration mechanisms into agent decision-making.
6. Interdisciplinary and Cross-Domain Gaps.
Research across ethics, cybersecurity, and governance remains siloed, preventing comprehensive solutions. The rise of agentic AI for scientific discovery underscores the need for interdisciplinary frameworks combining technical safety, ethical oversight, and legal enforceability.
As summarized in Table 5, the key research gaps lie in standardizing cybersecurity protocols, operationalizing governance models, embedding ethics at the architectural level, and achieving predictable multi-agent coordination. Addressing these gaps demands interdisciplinary research, adaptive regulatory frameworks, and empirical validation of emerging solutions to ensure safe, ethical, and effective agentic AI deployments.
To support implementation, a consolidated list of strategic recommendations is provided in Table A9 (Supplementary Material).
This survey integrates findings from diverse research on agentic AI architectures, threats, defense mechanisms, and governance, providing a holistic understanding of the challenges and strategies required for trustworthy deployment.
Architectural Complexity and Unique Threats.
Agentic AI systems differ fundamentally from traditional AI and LLMs because they reason, plan, and act autonomously across distributed environments. Their unique architecture introduces novel vulnerabilities such as cognitive exploits, shadow agents, and cross-layer propagation that are not addressed by legacy security frameworks. New threat models like ATFAA have been proposed to classify these risks and inform mitigation strategies.
Evolving Governance and Oversight Models.
Traditional governance frameworks (e.g., OECD, EU AI Act, NIST) provide initial guardrails but lack specific provisions for agentic AI, which operates across federated and dynamic contexts. Emerging solutions combine policy-driven governance, blockchain-backed trust frameworks, and decentralized oversight models to fill institutional gaps.
Defense Strategies Require Layered Approaches.
Defense mechanisms such as SHIELD, Zero-Trust Architectures, and SAGA address distinct layers of risk, from secure execution to cryptographic identity control. However, no single framework suffices; future defenses must integrate layered monitoring, cryptographic enforcement, and AI-driven threat adaptation to counter stealth and insider risks effectively.
Key Insights Across Adjacent Domains.
• Cybersecurity research highlights the need for proactive, adaptive defense, as static measures fail against evolving multi-agent threats.
• Governance studies reveal persistent gaps in regulatory readiness and cross-jurisdictional enforcement.
• Ethics research warns of moral crumple zones where accountability is diffused, necessitating embedded normative reasoning.
• Benchmarking and validation platforms remain underdeveloped for capturing emergent, non-deterministic agent behaviors, requiring new observability-driven metrics.
The review underscores that building trustworthy agentic AI requires synergistic advances across technical, governance, ethical, and empirical domains. A multi-layered defense, decentralized yet coordinated oversight, and interdisciplinary research are imperative to closing gaps and ensuring secure, accountable, and beneficial deployment of these autonomous systems.
The future of trustworthy agentic AI will be defined by technological advancements, ethical integration, and global governance innovations. The evolution of these systems will likely follow trends observed in emerging AI research, emphasizing adaptability, explainability, and human-centered oversight.
Emerging Trends and Technological Drivers.
Agentic AI will increasingly integrate quantum computing, edge intelligence, and multi-agent meta-learning to enhance scalability and decision-making capabilities. Future systems are expected to exhibit meta-reasoning abilities, enabling agents to explain and justify their decision-making processes, bridging current gaps in interpretability and accountability.
Shifting Toward Human-Centric and Ethical AI.
Trustworthy deployment will require embedding ethical norms, social intelligence, and human-in-the-loop mechanisms into agentic architectures. Future agentic AI is predicted to adopt multi-dimensional intelligence models, incorporating social, emotional, and ethical reasoning to align more closely with human values. These systems will increasingly focus on value-sensitive design, minimizing risks of manipulation or harmful autonomy.
Governance and Regulatory Trajectories.
Regulatory readiness will remain a decisive factor. Evolving policies must adapt to dynamic agentic behaviors and cross-border interactions, requiring frameworks that combine decentralized trust with enforceable accountability mechanisms. Explainable AI (XAI) and third-party audits will become core compliance tools to ensure that regulations translate into operational safety.
Trust, Adoption, and Human-AI Collaboration.
Trust in agentic AI will dictate adoption rates. Studies highlight that trust is shaped by technical robustness, ethical alignment, and perceived transparency. Agents capable of explaining their reasoning and negotiating with human stakeholders will foster a collaborative ecosystem rather than one of conflict or opacity.
Challenges Ahead.
Persistent risks include adversarial manipulation, moral crumple zones, and governance gaps in decentralized deployments. Addressing these requires interdisciplinary efforts, combining advances in cybersecurity, ethics, and policy to build systems that remain resilient under both technological and societal pressures.
The future of trustworthy agentic AI lies in adaptive architectures enriched with ethical intelligence, supported by transparent governance frameworks and human-centered oversight. As these systems evolve, ensuring they remain aligned, secure, and explainable will be critical to realizing their transformative potential while safeguarding public trust and global stability.
Building trustworthy agentic AI is an inherently interdisciplinary challenge, demanding expertise that spans technical design, policy, ethics, law, and social sciences. The complexity of agentic AI, autonomous systems capable of decision-making, planning, and multi-agent coordination, requires coordinated efforts to mitigate risks, align goals, and ensure accountability.
The Necessity of Cross-Domain Expertise.
Agentic AI’s transformative potential is accompanied by risks that cannot be solved by technical advances alone. Studies emphasize that interdisciplinary collaboration uniting AI engineers, ethicists, legal scholars, and social scientists is crucial to address the ethical, legal, and societal implications of autonomy and long-term goal pursuit. Collaborative frameworks ensure that AI solutions are not only technically robust but also socially aligned and ethically grounded.
Enhancing Collaboration with Hybrid Models.
Emerging research supports hybrid collaboration models, where multi-agent AI systems work alongside humans to jointly solve complex problems, amplifying creativity and problem-solving capacity. In software development, frameworks such as ChatCollab show how human and AI agents can co-create solutions effectively, reinforcing the benefits of team-based interdisciplinary dynamics.
Institutionalizing Interdisciplinary Practices.
Interdisciplinary collaboration must move beyond ad hoc partnerships to become institutionalized. This includes creating cross-sectoral task forces, academic-industry consortia, and policy advisory groups that foster ongoing dialogue between technical developers, regulators, and ethicists. Iterative methodologies that combine ethics-by-design, value-sensitive design, and continuous feedback cycles have been proposed to maximize the benefits of interdisciplinary synergies.
Shaping Trustworthy Human-AI Collaboration.
Research highlights that trust in agentic AI depends on collaborative governance, transparent communication, and shared decision-making between human and AI agents. Multi-disciplinary approaches also help anticipate unintended consequences and design AI ecosystems that align with societal values.
The path to trustworthy agentic AI lies in deep interdisciplinary collaboration. By uniting technical innovation with ethical reasoning, legal oversight, and human-centered design, stakeholders can create AI systems that are not only powerful and adaptive but also transparent, accountable, and aligned with human welfare. Future advancements will require sustained, cooperative frameworks bridging academia, industry, and policy to ensure that agentic AI evolves as a beneficial and trustworthy partner in society.
The supplementary materials underlying this article are openly available on Figshare209: Trustworthy Agentic AI Systems: A Cross-Layer Review of Architectures, Threat Models, and Governance Strategies for Real-World Deployment: Supplementary Data. This repository contains Tables, Figures, Appendix files, and Supplementary Data. All newly generated materials and supplementary datasets are available under the Creative Commons Attribution 4.0 International license (CC-BY 4.0).