Keywords
Agentic Artificial Intelligence, Autonomous Systems, Multi-Agent Systems, Memory-Augmented Reasoning, Threat Modeling, Secure Execution, Lifecycle Control, AI Governance
Agentic Artificial Intelligence systems, characterized by autonomous reasoning, memory augmentation, and adaptive planning, are rapidly reshaping technological landscapes. Unlike traditional AI or large language models, agentic AI integrates decision-making with persistent execution, enabling complex interactions across dynamic environments. However, this evolution introduces novel security risks, governance challenges, and ethical considerations that current frameworks inadequately address. This survey provides a cross-layer review of agentic AI, encompassing architectural paradigms, threat taxonomies, and governance strategies. It consolidates findings from adjacent domains such as cybersecurity, AI safety, multi-agent coordination, and ethics, offering a holistic understanding of vulnerabilities and mitigation approaches. We integrate insights from recent advances in defense architectures and governance innovations, highlighting the limitations of static policies in addressing dynamically evolving threats. Real-world deployments from industrial automation to military and policy applications reveal both successful integrations and notable failures, underscoring the urgency of resilient oversight mechanisms. Furthermore, we identify critical research gaps in benchmarking, memory integrity, adversarial defense, and normative embedding, emphasizing the need for interdisciplinary collaboration to develop adaptive, accountable, and transparent systems. This review serves as a narrative synthesis rather than a systematic literature review, aiming to bridge technical, governance, and ethical perspectives. By integrating cross-disciplinary findings, it lays the foundation for future research on securing, aligning, and governing agentic AI in real-world contexts. Ultimately, this work calls for cooperative innovation to ensure that agentic AI evolves as a trustworthy, accountable, and beneficial technology.
The rapid emergence of agentic AI systems, AI agents endowed with memory, reasoning, planning, and tool-use capabilities, represents a paradigm shift from traditional machine learning and static decision-support models. These systems are increasingly deployed in domains where autonomous decision-making interacts with dynamic, high-stakes environments such as healthcare, critical infrastructure, and cybersecurity. While their autonomy promises unprecedented efficiency and innovation, it also introduces novel risks that challenge existing frameworks for safety, ethics, and governance.1 From a security perspective, agentic AI increases the attack surface. Autonomous decision-making enables new forms of adversarial manipulation, including cognitive exploits, stealth execution, and knowledge poisoning. Conventional layered security models, originally designed for static computing architectures, are inadequate for defending adaptive, distributed agents. Researchers argue that cross-layer security strategies integrating hardware, software, and governance measures are necessary to address these vulnerabilities holistically.2
The concept of “trustworthiness” itself is contested. Scholars Conradie & Nagel3 and Freiman4 caution against anthropomorphizing AI with human attributes such as “trust” and “responsibility,” noting that these qualities must instead be framed as properties of socio-technical systems that include human oversight and institutional accountability. This highlights the need to shift the focus from asking whether AI itself can be “trusted” to how we can build systems that support human-centered trust relationships through technical safeguards and governance. Governance frameworks such as the EU AI Act, NIST’s AI Risk Management Framework, and ISO/IEC standards have laid initial foundations, but they lack granularity for managing agentic systems that self-adapt, collaborate, and act semi-independently. Integrating principles of zero-trust architectures, explainable AI, and adaptive oversight mechanisms is now seen as crucial for aligning agentic AI with societal expectations of accountability and safety.5,6 Finally, real-world deployments, from national crisis response to autonomous cybersecurity, demonstrate both the potential and fragility of agentic AI. Cases of unanticipated failures, bias amplification, and adversarial exploitation underscore the urgency of developing a cross-layer understanding that integrates architecture, threats, and governance strategies.7 In light of these challenges, this review is motivated by the need to bridge technical insights with ethical and regulatory perspectives, offering a holistic framework to guide both researchers and policymakers in building trustworthy agentic AI systems.
This review adopts a narrative review methodology rather than a systematic literature review (SLR). Unlike SLRs, which employ rigid inclusion and exclusion criteria, a narrative review enables a broad, integrative synthesis across multiple disciplines. This flexibility is essential for agentic AI, where developments in architectures, security threats, and governance evolve rapidly and often emerge outside traditional peer-reviewed channels, including industry white papers and policy documents.8
The scope of this work spans technical, ethical, and regulatory dimensions, providing a cross-layer perspective on:
• Agentic AI Architectures: including mono-agent, multi-agent, federated, and blockchain-enabled systems, with a focus on how these architectures influence trustworthiness.
• Threat Models and Security Risks: covering cognitive exploits, knowledge poisoning, prompt injection, stealth execution, and cross-layer propagation vulnerabilities.
• Governance and Oversight Mechanisms: analyzing legal and regulatory frameworks such as the EU AI Act and the NIST AI Risk Management Framework, alongside ethical norms and lifecycle accountability approaches.
• Defense Strategies and Risk Mitigation: reviewing zero-trust frameworks, cryptographic identity mechanisms, and layered defense strategies for resilient deployments.
• Real-World Deployments: evaluating industrial and governmental use cases, security incidents, and lessons learned for future deployments.
The literature reviewed draws from AI safety, cybersecurity, governance, ethics, and distributed systems, ensuring an interdisciplinary lens.9 Unlike prior reviews that focus narrowly on either technical mechanisms or policy considerations, this review integrates both dimensions to reveal emerging gaps in aligning technical safeguards with governance strategies.10 Furthermore, this review includes insights from adjacent domains such as multi-agent coordination, cybersecurity resilience, and human-centered AI ethics to map a more comprehensive landscape of trust challenges and mitigation strategies.11 Synthesizing this diverse body of knowledge offers a holistic foundation for researchers, practitioners, and policymakers seeking to understand and secure the future of agentic AI.
This review makes four key contributions by consolidating insights across technical, adversarial, and governance layers to address the trustworthiness of agentic AI systems.
1. Integration of Cross-Layer Perspectives: Unlike prior studies that analyze AI trustworthiness through isolated lenses (technical or ethical), this review integrates findings across architectures, threats, and governance, offering a comprehensive cross-layer framework. This approach aligns with recent calls for merging hardware/software security with policy oversight to address complex AI risks.2
2. Development of a Layered Threat Taxonomy: The paper introduces a novel taxonomy that categorizes risks specific to agentic AI, including cognitive exploits, shadow agent emergence, and cross-layer propagation vulnerabilities. This taxonomy extends beyond traditional adversarial machine learning, incorporating threats identified in recent cybersecurity research.7,12
3. Synthesis of Governance with Technical Safeguards: This review connects policy frameworks, such as the EU AI Act and ISO/IEC governance models, with technical defense strategies such as zero-trust architectures and explainable AI. This synthesis provides actionable guidance for designing systems that are both technically secure and aligned with societal expectations.5,6
4. Identification of Research Gaps and Future Directions: Finally, this paper highlights critical gaps such as lifecycle accountability, benchmarking of agentic AI safety, and federated governance risks, and proposes a roadmap for future research. These findings aim to inspire interdisciplinary collaboration to close existing gaps between technology, security, and regulation.9
Collectively, these contributions offer a holistic foundation for understanding and securing agentic AI, guiding both technical innovations and governance frameworks for real-world deployment.
The remainder of this paper is organized to progressively build a cross-layer understanding of trustworthy agentic AI, beginning with its methodological foundations and advancing toward governance and future research directions. Section 2 outlines the narrative review methodology, describing the sources, search strategy, inclusion rationale, and the domains considered, while also contrasting this approach with prior surveys to highlight the novelty of this work. Section 3 establishes the technical foundations of agentic AI by defining its distinguishing features, including memory-augmented reasoning, planning capabilities, and interaction with adjacent research areas such as AI safety and distributed systems. Building on this, Section 4 explores architectural paradigms, from mono-agent to blockchain-enabled systems, and provides a comparative evaluation that emphasizes their strengths and limitations in terms of trustworthiness. Section 5 develops a layered threat taxonomy, mapping cognitive exploits, knowledge poisoning, stealth execution, and cross-layer propagation risks, while integrating insights from cybersecurity and adversarial machine learning literature. Section 6 shifts focus to governance frameworks, reviewing existing regulatory approaches, identifying gaps unique to agentic systems, and drawing lessons from adjacent domains like robotics and cybersecurity governance. Section 7 examines real-world deployments, including industrial, governmental, and policy-driven use cases, and reflects on both successful implementations and documented failures. Section 8 discusses defense architectures and oversight models, evaluating mechanisms such as layered security frameworks, zero-trust architectures, and cryptographic identity enforcement, while offering a comparative analysis of their effectiveness. Section 9 synthesizes the findings to identify open research challenges, including goal alignment, auditability, and institutional readiness, and proposes future directions to bridge these gaps. Finally, Section 10 concludes by summarizing key insights, presenting a forward-looking perspective on the evolution of trustworthy agentic AI, and emphasizing the need for interdisciplinary collaboration to ensure safe and accountable deployment. This structured progression from foundations to threats, governance, real-world applications, and future outlook ensures that readers gain a comprehensive understanding of the multifaceted issues surrounding agentic AI trustworthiness.
Key distinguishing features of agentic AI systems are summarized in Table A2, while architectural comparisons are provided in Table A3. Additionally, a taxonomy of emerging threats is outlined in Table A4 (Supplementary Material).
Given the interdisciplinary nature of agentic AI, this review adopts a narrative approach to identify and synthesize relevant literature rather than applying rigid inclusion rules. The search process was designed to capture technical, security, and governance perspectives, allowing the integration of diverse insights from multiple domains. Academic databases such as IEEE Xplore, ACM Digital Library, SpringerLink, ScienceDirect, and arXiv were the primary sources, complemented by policy reports from organizations including the OECD, NIST, and the European Commission. To ensure coverage of cutting-edge developments, recent conference proceedings such as NeurIPS, ICML, and AAAI were also reviewed.13
The search strategy combined keyword clusters such as “agentic AI,” “autonomous agents,” “multi-agent systems,” “cross-layer security,” “trustworthy AI,” “AI governance,” and “threat modeling.” Boolean operators and field-specific terms were applied to maximize the retrieval of high-quality and contextually relevant articles. The selection was not limited to peer-reviewed journals; influential technical white papers and government publications were included where they provided substantial insights into emerging practices or regulatory frameworks.14
Articles were included based on relevance to the cross-layer trustworthiness of agentic AI, covering themes of architectural design, threat taxonomy, governance, and ethical oversight. No strict temporal filter was applied; however, priority was given to literature from the last five years to reflect rapid technological advances. Older works were retained where they provided foundational theoretical frameworks. Unlike systematic reviews, which rely on predefined inclusion thresholds, this narrative review allows the inclusion of conceptually significant studies even if they fall outside narrow search criteria.15 Finally, to address emerging debates, grey literature such as industrial threat reports, AI safety guidelines, and open-source datasets was selectively integrated where it contributed unique evidence not yet present in academic publications.16 This multifaceted strategy ensures the survey encompasses both well-established theories and cutting-edge practices shaping the discourse on trustworthy agentic AI.
The inclusion of literature in this survey was guided by conceptual relevance rather than rigid filtering, consistent with the narrative review methodology. Rather than applying standardized exclusion protocols characteristic of systematic reviews, this study adopted a flexible, rationale-driven selection process that allowed the incorporation of diverse perspectives spanning technical, ethical, and governance dimensions. This approach is particularly appropriate for agentic AI, where developments often emerge from interdisciplinary intersections and non-traditional publication channels.15
Sources were considered relevant if they contributed substantively to at least one of the following dimensions:
• Architectural Foundations: papers offering insights into agentic architectures, multi-agent systems, or distributed designs, including blockchain-enabled or federated models.
• Threat Models and Security Risks: studies that examined adversarial techniques, cross-layer propagation of attacks, or security vulnerabilities specific to autonomous agents.
• Governance and Ethical Oversight: literature addressing regulatory frameworks, ethical principles, or lifecycle accountability mechanisms for AI systems.
• Defense Mechanisms and Mitigation Strategies: research proposing zero-trust models, layered defense frameworks, or cryptographic identity enforcement approaches relevant to agentic AI security.
• Real-World Deployments: case studies, industry reports, or empirical analyses documenting the successes and failures of agentic AI deployments in practice.
Priority was given to peer-reviewed publications from recognized journals and conferences, particularly those published in the last five years, reflecting the fast-evolving nature of this field. However, seminal works, regardless of publication year, were included when they provided foundational theoretical or methodological contributions.17 In addition, high-impact grey literature such as policy briefs, technical white papers, and reports from AI governance bodies was selectively incorporated to capture perspectives not yet reflected in academic discourse.16 Studies that focused exclusively on narrow domains, such as standard supervised learning or traditional AI ethics, without a direct connection to agentic autonomy, layered security, or governance were excluded. Similarly, sources lacking technical or conceptual rigor (such as opinion articles without evidence) were not retained. This balanced approach ensured the review’s inclusivity while maintaining its academic quality.
This survey spans seven interconnected domains that collectively shape the trustworthiness of agentic AI systems: agentic architectures, cybersecurity and adversarial threats, AI safety, governance frameworks, ethical considerations, real-world deployments, and defense mechanisms. These domains were selected because they form the technical, operational, and normative pillars essential for understanding and mitigating risks associated with autonomous agents.
The first domain, agentic AI architectures, encompasses research on the design and functioning of mono-agent, multi-agent, federated, and blockchain-enabled systems. These architectures define how agents perceive, reason, and act within dynamic environments. Recent works highlight that architectural choices significantly influence security vulnerabilities, coordination strategies, and trust propagation among agents.13 The second domain focuses on cybersecurity and adversarial threats. Agentic AI, due to its autonomous decision-making and interconnected operations, introduces new attack vectors such as cognitive exploits, stealth execution, and cross-layer propagation risks. Studies in adversarial machine learning and zero-trust architectures underscore the need for layered defenses and adaptive security frameworks to counter these evolving threats.2 The third domain, AI safety, addresses the alignment of agentic behavior with human values and intended goals. This includes mitigating risks like reward hacking, goal drift, and emergent behaviors in multi-agent settings. Literature from AI safety research emphasizes the integration of formal verification, runtime monitoring, and explainability mechanisms to ensure predictable and controllable outcomes.9 The fourth domain centers on AI governance and regulatory frameworks. International policies, such as the EU AI Act and NIST AI Risk Management Framework, provide high-level guidelines but often fall short of addressing the adaptive and distributed nature of agentic systems. Recent research advocates for hybrid governance models that combine legal mandates with technical enforcement mechanisms.5 The fifth domain incorporates ethical and socio-technical considerations. Trust in agentic AI is not merely a technical property but a relational construct shaped by human perceptions, institutional accountability, and societal norms. Scholars have warned against anthropomorphizing AI with human-like trust qualities, instead calling for frameworks that prioritize responsible human oversight and equitable power dynamics in AI deployment.3 The remaining two domains, real-world deployments and defense mechanisms, are examined in dedicated sections later in this review (Sections 7 and 8). By synthesizing insights from these seven domains, this review provides a holistic lens to examine both the opportunities and risks associated with agentic AI, offering guidance for secure, ethical, and accountable real-world deployment. Figure 1 presents a mind map of agentic AI domains, illustrating the seven interconnected domains influencing the trustworthiness of agentic AI systems: architectures, cybersecurity, AI safety, governance, ethical considerations, real-world deployments, and defense mechanisms. These domains form the foundation of the cross-layer framework proposed in the review.
Existing surveys on AI trustworthiness have largely focused on either technical mechanisms or policy frameworks, leaving a gap in integrating these perspectives under a unified cross-layer approach. For example, surveys in the domain of cybersecurity and AI have primarily concentrated on adversarial machine learning, intrusion detection, and threat intelligence without addressing how these threats propagate across agentic architectures or interact with governance layers.13 Similarly, reviews from the AI ethics literature tend to emphasize normative principles such as fairness, accountability, and transparency without offering concrete architectural or defensive models applicable to autonomous agents.3
A few recent works have attempted to bridge technical and governance perspectives. For instance, studies on zero-trust architectures in AI security argue for embedding security across multiple layers of AI systems, yet they do not systematically link these mechanisms to agentic AI’s unique properties, such as self-adaptation or collaborative behavior in multi-agent environments.2 Meanwhile, policy-oriented reviews, including those analyzing the EU AI Act and related regulatory frameworks, provide high-level governance principles but lack the technical granularity necessary for implementing safeguards within agentic ecosystems.5 Unlike these prior surveys, the present work adopts a cross-layer narrative perspective, systematically connecting architectural design choices, threat models, and governance strategies. It also incorporates real-world deployment experiences and emerging defense architectures, aspects often overlooked in earlier reviews. Furthermore, this study explicitly integrates adjacent domains such as AI safety, cybersecurity resilience, and robotics governance, creating a broader synthesis that reveals interdependencies between technical risks and institutional responses.9 Flowchart 1 outlines the survey methodology: the narrative review process used in this study, including literature source selection, interdisciplinary integration, and thematic synthesis across technical, ethical, and governance domains. By filling these gaps, this review not only complements but also extends the scope of existing literature, providing a comprehensive framework to guide future research and policy design for trustworthy agentic AI systems.
Agentic AI refers to a class of artificial intelligence systems endowed with autonomy, memory, reasoning, planning, and proactive tool use, enabling them to operate in dynamic environments with minimal human intervention. Unlike traditional AI agents, which are typically task-specific and rule-bound, agentic AI demonstrates goal-directed behavior, the capacity for self-decomposition of complex tasks, and the ability to coordinate with other agents in multi-agent ecosystems.18 These systems integrate persistent memory and adaptive decision-making loops, enabling them to learn continuously and adjust their actions in response to environmental changes. In contrast, Large Language Models (LLMs) such as GPT and similar architectures are primarily predictive models trained to generate responses based on statistical patterns in large datasets. While LLMs have shown remarkable capabilities in natural language understanding and reasoning, they lack true agency: they do not possess intrinsic goals, persistent memory (beyond limited context windows), or the ability to autonomously plan and execute actions in the real world. Recent research, however, demonstrates that LLMs can serve as cognitive cores for agentic systems when augmented with external memory, planning modules, and orchestration layers.19 This hybridization blurs the boundary but does not erase the fundamental distinction: LLMs remain reactive tools unless embedded within an agentic framework that endows them with autonomy.
Traditional AI agents, such as rule-based expert systems or early multi-agent architectures, operate with predefined logic and limited adaptability. Their actions are constrained by fixed decision trees or programmed behaviors, making them ill-suited for open-ended environments. Agentic AI, by contrast, leverages dynamic task decomposition, meta-reasoning, and tool orchestration to perform tasks not explicitly programmed at design time.20 Moreover, agentic systems often operate within multi-agent ecosystems, enabling collective intelligence through cooperation, negotiation, and competition. Recent developments such as the UserCentrix and Agent4EDU frameworks illustrate how agentic AI can combine LLM reasoning with memory-augmented orchestration and multi-agent collaboration to achieve real-world objectives autonomously.21,22 These features position agentic AI as a new paradigm that goes beyond both traditional AI agents and standalone LLMs, introducing unique opportunities as well as security and governance challenges that warrant cross-layer analysis. Figure 2 presents the layered architecture of agentic AI: a conceptual depiction of the layered components of agentic AI systems, including memory, reasoning, planning, and tool-use layers, demonstrating how these components interact to enable autonomy and adaptability.
Agentic AI derives its autonomy and adaptability from four foundational capabilities: memory, reasoning, planning, and tool use. These elements collectively distinguish it from both traditional AI agents and large language models, enabling it to operate proactively in dynamic environments.
Memory is central to agentic AI, allowing agents to store and retrieve information beyond the ephemeral context of traditional LLMs. Persistent memory enables agents to build long-term representations of their environment, user preferences, and past decisions, thereby supporting contextual continuity and more informed action selection.23 Advanced frameworks like UserCentrix demonstrate how memory-augmented reasoning enhances responsiveness and adaptability in real-world applications.24 Reasoning refers to the agent’s ability to interpret complex scenarios, infer hidden relationships, and adapt to novel conditions. Unlike traditional AI, which often relies on static decision rules, agentic AI employs multi-step and reflective reasoning processes, incorporating meta-cognition to evaluate its outputs. Studies have shown that agentic workflows enable emergent reasoning behaviors not observed in static LLMs, enhancing performance in research automation, robotics, and decision support.18 Planning is another hallmark capability, allowing agentic AI to decompose complex objectives into manageable subtasks and execute them sequentially. Modern systems like Magentic-One leverage orchestration agents to dynamically re-plan when errors or unexpected conditions arise, reflecting a robustness absent in conventional agents.25 Planning is not only reactive but also anticipatory, enabling agents to optimize actions based on long-term goals rather than short-term heuristics. Tool use extends the agent’s functionality beyond its intrinsic capabilities. By integrating external APIs, databases, or software tools, agentic AI can interact with diverse environments and perform specialized tasks. Tool orchestration, when combined with reasoning and planning, creates multi-modal and adaptive intelligence that supports dynamic problem solving. This capability has been shown to enhance performance in complex tasks such as automated coding, scientific discovery, and cyber-defense.26 Collectively, these four capabilities form the operational backbone of agentic AI. Their synergy enables systems not only to react to immediate inputs but also to proactively plan, self-correct, and interact with their environment, making them fundamentally more autonomous and potentially more unpredictable than previous AI paradigms. Figure 3 depicts the cognitive architecture workflow of agentic AI, highlighting the integration of memory, reasoning, planning, and tool orchestration to support goal-directed behavior in dynamic environments.
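To make the interplay of these four capabilities concrete, the following minimal sketch shows one memory-grounded plan-and-act cycle in Python. All class, tool, and method names (SimpleAgent, retrieve, plan, the stub tools) are purely illustrative assumptions and are not drawn from any framework cited above.

```python
# Minimal sketch of one agentic decision cycle: memory retrieval, planning,
# and tool use. All names are illustrative, not a real framework.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimpleAgent:
    goal: str
    memory: List[str] = field(default_factory=list)                    # persistent memory store
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def remember(self, observation: str) -> None:
        """Persist an observation so later cycles can reuse it."""
        self.memory.append(observation)

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        """Naive retrieval: return up to k stored memories mentioning the query."""
        hits = [m for m in self.memory if query.lower() in m.lower()]
        return hits[-k:]

    def plan(self) -> List[str]:
        """Decompose the goal into subtasks (here, a trivial fixed decomposition)."""
        return [f"search: {self.goal}", f"summarize: {self.goal}"]

    def act(self) -> List[str]:
        """Run one plan-and-act cycle, recording each step back into memory."""
        results = []
        for subtask in self.plan():
            tool_name, _, argument = subtask.partition(": ")
            tool = self.tools.get(tool_name)
            if tool is None:
                continue                                                # no matching tool: skip
            context = self.retrieve(argument)                           # ground the call in memory
            output = tool(f"{argument} | context: {context}")
            self.remember(f"{subtask} -> {output}")
            results.append(output)
        return results


# Example wiring with stub tools standing in for external APIs.
agent = SimpleAgent(
    goal="renewable energy trends",
    tools={"search": lambda q: f"found 3 documents for '{q}'",
           "summarize": lambda q: f"summary of '{q}'"},
)
print(agent.act())
```

In a production agent, the retrieval and planning steps would of course rely on learned components rather than string matching, but the control flow, plan, ground each subtask in memory, call a tool, and write the result back, captures the loop described above.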
The comparative analysis of governance frameworks across domains is detailed in Table A5 (Supplementary Material).
Agentic AI does not exist in isolation; its design and deployment are profoundly influenced by developments in AI safety, multi-agent coordination, and distributed systems. These adjacent fields provide both theoretical foundations and practical frameworks that shape the trustworthiness and resilience of agentic systems. AI safety contributes critical principles for ensuring that agentic AI remains aligned with human values and operational goals, even under conditions of uncertainty or adversarial pressure. Research highlights that emergent behaviors, such as reward hacking and specification gaming, can arise in complex environments where agents pursue objectives without adequate safeguards.27 Safety frameworks increasingly emphasize the need for alignment mechanisms, runtime monitoring, and formal verification to mitigate these risks.28 The field of multi-agent coordination offers insights into how autonomous agents collaborate, negotiate, and sometimes compete within shared environments. Techniques such as cooperative reinforcement learning, communication protocols, and game-theoretic models enhance the ability of agents to achieve collective goals while minimizing coordination failures. However, interactions in multi-agent ecosystems also introduce new vulnerabilities, including collusion, stealth attacks, and emergent adversarial dynamics.29 Studies show that protocols combining parameter sharing and coordinated learning significantly improve collaborative performance but must be balanced against risks of unintended strategic behaviors.30 Finally, distributed systems provide architectural models that enable scalability and resilience in agentic AI deployments. Concepts from distributed computing, such as fault tolerance, decentralized consensus, and secure communication, inform the design of federated and blockchain-enabled agentic frameworks. These architectures facilitate robust performance across heterogeneous environments but also create new attack surfaces, particularly where trust propagation and identity management are not well enforced.31 Recent proposals, such as UserCentrix, leverage distributed intelligence with memory-augmented coordination to achieve adaptive decision-making while maintaining situational awareness.24 By synthesizing insights from these adjacent fields, agentic AI research gains robust strategies for safety, coordination efficiency, and resilience against systemic threats. This interdisciplinary interplay is crucial for advancing secure, scalable, and ethically aligned agentic ecosystems.
The development of agentic AI is grounded in several theoretical frameworks that collectively define its reasoning capabilities, decision-making processes, and adaptive behaviors. These frameworks originate from cognitive architectures, reinforcement learning theories, game-theoretic models, and distributed adaptive control, each contributing distinct mechanisms for achieving autonomy and trustworthiness. Cognitive architectures such as ACT-R and Soar provide a structured approach to modeling human-like reasoning and memory. These architectures integrate symbolic and sub-symbolic processing, enabling agents to combine rule-based decision-making with learning from experience. Recent studies emphasize how neuromorphic-driven frameworks extend these principles by mimicking biological cognition, allowing for adaptive decision-making in dynamic environments.32 Reinforcement learning (RL) forms the backbone of many agentic AI systems, enabling agents to optimize actions based on reward signals. Techniques such as Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO) allow for scalable decision-making in high-dimensional spaces. Recent advancements incorporate quantum reinforcement learning and cognitive neuromorphic frameworks, further enhancing adaptability and efficiency.33 Game-theoretic approaches offer a theoretical foundation for multi-agent interactions, addressing scenarios where agents must coordinate, compete, or negotiate. Frameworks that model Theory of Mind (ToM), the ability to infer and predict the mental states of other agents, demonstrate how agentic AI can anticipate behaviors and adapt strategies in complex social interactions.34 Similarly, the integration of principal-agent reinforcement learning links economic contract theory with AI control mechanisms, guiding agents toward equilibrium strategies in distributed environments.35 Distributed adaptive control and multi-agent system theories underpin the scalability of agentic AI in decentralized environments. These frameworks emphasize layered control, feedback loops, and resilience, allowing agents to maintain stability while adapting to environmental changes.36 They also integrate with blockchain-based consensus mechanisms to enhance trust propagation and accountability in federated agent networks.37 Together, these frameworks provide the conceptual scaffolding for building agentic AI systems capable of complex reasoning, strategic interactions, and self-regulated autonomy. Their convergence forms the theoretical foundation upon which architectures, threat models, and governance strategies are constructed in subsequent sections.
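As a concrete anchor for the reinforcement-learning foundation mentioned above, the fragment below implements the classic tabular Q-learning update on a toy environment; the states, rewards, and hyperparameters are illustrative assumptions, not drawn from any cited system.

```python
# Tabular Q-learning: the value of (state, action) moves toward the observed reward
# plus the discounted value of the best next action. The environment is a toy stub.
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.95, 0.2            # learning rate, discount, exploration
actions = ["left", "right"]
Q = defaultdict(lambda: {a: 0.0 for a in actions})


def step(state: int, action: str) -> tuple:
    """Toy environment: moving 'right' from state 3 reaches the goal (reward 1)."""
    next_state = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward


for _ in range(500):                              # training episodes
    state = 0
    while state != 4:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(Q[state], key=Q[state].get)
        next_state, reward = step(state, action)
        # core Q-learning update rule
        best_next = max(Q[next_state].values())
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        state = next_state

print({s: Q[s] for s in range(4)})
```

Deep variants such as DQN and PPO replace the table with a neural function approximator, but the reward-driven update shown here is the mechanism through which agentic systems optimize actions, and the same mechanism that reward hacking (Section 5) exploits.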
Mono-agent architectures represent the simplest form of agentic AI, where a single autonomous agent operates independently to achieve defined objectives. These systems are characterized by centralized control, where all decision-making, perception, and action execution are handled within a unified framework. Such architectures typically follow an observe–decide–act loop, integrating sensing, reasoning, and acting within a closed cycle.38 This simplicity makes them easier to design and validate, which is advantageous for environments where predictable and transparent behaviors are essential. Recent advances have extended mono-agent systems beyond traditional rule-based agents. Modern frameworks employ modular enhancements, such as memory-augmented reasoning, sparse activation, and endocrine-inspired regulation. Furthermore, the S-AI architecture uses a hormonal meta-agent to dynamically orchestrate specialized modules, balancing efficiency and responsiveness while adapting to changing environmental demands.39 Similarly, brain-inspired architectures combine symbolic reasoning with neural learning mechanisms, enhancing flexibility without introducing the complexity of multi-agent interactions.40 Mono-agent designs also play a crucial role in establishing trust. Their centralized nature allows for easier implementation of explainability, auditing, and governance mechanisms, which are harder to enforce in distributed environments. However, their lack of redundancy and limited scalability make them vulnerable in adversarial contexts, where a single point of failure can compromise the entire system.41 Moreover, mono-agent architectures are increasingly integrated with enterprise API ecosystems to interact with external systems and tools, enabling them to perform complex workflows autonomously. This integration demands robust platform strategies, including zero-trust authorization models and event-driven orchestration, to ensure secure and efficient operation in real-world deployments.42 In sum, mono-agent architectures serve as a fundamental building block in agentic AI development. They offer clarity and controllability, making them suitable for regulated domains such as healthcare or finance, but their limited adaptability to distributed threats and collaborative tasks often necessitates transitioning toward multi-agent or hybrid architectures, as explored in the next subsection.
Multi-agent architectures (MAAs) extend the capabilities of mono-agent systems by enabling multiple autonomous agents to collaborate, coordinate, and sometimes compete within shared environments. These systems embody distributed intelligence, where agents communicate and adaptively organize to achieve complex objectives that exceed the capacity of any single agent.43 Unlike centralized models, multi-agent architectures are decentralized, providing robustness against failures and scalability for dynamic tasks. A defining property of MAAs is emergent behavior; the system exhibits global properties arising from local interactions between agents. This emergent intelligence has been exploited in applications ranging from robotic swarms and distributed cybersecurity to financial modeling and autonomous logistics.44 Coordination mechanisms, such as market-based negotiations, game-theoretic strategies, and organization-based models, enable agents to align individual actions with collective goals while minimizing conflicts.45
Security is both a strength and a vulnerability in MAAs. On one hand, redundancy and decentralization improve resilience; on the other, the same properties introduce new attack surfaces, including collusion, covert coordination, and swarm-based attacks. Emerging research on multi-agent security emphasizes the need for zero-trust principles, dynamic trust scoring, and secure registries to prevent exploits such as tool squatting and the malicious impersonation of agent tools. Blockchain-based multi-agent frameworks further enhance trust through tamper-proof consensus mechanisms, ensuring accountability and secure collaboration.46 Biologically inspired models, such as the S-AI hormonal meta-agent system, demonstrate how internal signaling mechanisms can orchestrate specialized agents adaptively, balancing efficiency with context-sensitive decision-making.39 These designs highlight how hierarchical coordination layers can mitigate complexity while maintaining autonomy at the agent level. Overall, multi-agent architectures provide a scalable, resilient, and adaptive paradigm for agentic AI. However, they also introduce systemic risks, from emergent vulnerabilities to governance challenges, that require cross-layer defense and oversight strategies, setting the stage for the decentralized and federated architectures discussed in the next section. Figure 4 presents a multi-agent cognitive workflow: an architectural illustration of multi-agent systems emphasizing communication and coordination mechanisms between agents and the emergence of distributed intelligence through collaboration.
Decentralized and federated architectures represent a significant evolution in agentic AI, shifting control from a central authority to distributed nodes that collaborate while maintaining autonomy. These architectures enhance scalability, privacy, and resilience, which are critical in environments where agents must process sensitive data or operate under adversarial conditions. Decentralized architectures eliminate single points of failure by distributing decision-making and data processing across multiple nodes. Such systems leverage blockchain and distributed ledger technologies to ensure tamper-proof communication, secure identity management, and transparent auditing. A blockchain-based smart agent architecture has demonstrated the ability to combine trustless execution with high security and scalability, enabling secure collaboration across heterogeneous environments.47 Furthermore, the use of decentralized trust computation enhances robustness against insider threats and coordinated attacks, particularly when integrated with anomaly detection mechanisms.48
Federated architectures extend this concept by enabling collaborative learning across multiple distributed agents or devices without sharing raw data. Instead, only model updates are exchanged, thereby preserving privacy while enhancing global model performance. Federated learning has proven particularly valuable in sectors like healthcare, where sensitive datasets must remain local but still contribute to collective intelligence.49 Recent advances integrate hierarchical federated learning and quantum optimization to improve communication efficiency and handle heterogeneous data distributions.50 Security remains a critical challenge for federated systems, as malicious updates or compromised nodes can poison global models. Techniques such as secure aggregation, differential privacy, and zero-trust verification are being incorporated to mitigate these risks. For instance, joint blockchain-federated frameworks combine anomaly detection with immutable consensus to strengthen trust and model integrity.48,51 By combining distributed learning with decentralized trust enforcement, these architectures enable privacy-preserving, scalable, and resilient agentic AI deployments. However, challenges such as device heterogeneity, communication bottlenecks, and federated governance risks remain unresolved, highlighting the need for continued research in hybrid approaches, leading into the discussion of hybrid and blockchain-enabled architectures in the next section.
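The paragraph above notes that federated settings exchange only model updates rather than raw data. The sketch below shows the basic federated averaging step under that assumption, with NumPy arrays standing in for model weights; the client data, learning rates, and function names are illustrative.

```python
# Minimal federated-averaging (FedAvg-style) sketch: each client trains locally and
# shares only its weight vector and sample count; raw data never leaves the client.
import numpy as np


def local_update(global_weights, X, y, lr=0.01, epochs=5):
    """One client's local training: a few gradient steps on a linear model."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)          # mean-squared-error gradient
        w -= lr * grad
    return w


def federated_average(updates):
    """Server aggregation: average client weights, weighted by local sample counts."""
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)


rng = np.random.default_rng(0)
global_w = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]

for _ in range(10):                                 # communication rounds
    updates = [(local_update(global_w, X, y), len(y)) for X, y in clients]
    global_w = federated_average(updates)           # only weight updates cross the network

print(global_w)
```

The aggregation step is also where the poisoning risk discussed above concentrates: a single malicious client can bias the weighted average, which is why secure aggregation, differential privacy, and update filtering are layered on top in hardened deployments.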
Hybrid and blockchain-enabled architectures combine the strengths of centralized control, decentralized trust, and cryptographic security to create scalable, resilient, and privacy-preserving agentic AI ecosystems. These architectures address key limitations of purely centralized or federated models by leveraging blockchain for verifiable trust and hybrid orchestration for dynamic adaptability. Hybrid architectures integrate heterogeneous technologies such as AI, blockchain, and zero-trust models to achieve multi-layered security and flexible performance. For example, hybrid frameworks in healthcare combine blockchain with zero-trust verification and AI-driven threat detection to secure sensitive data flows while enabling real-time decision-making.52 Similarly, containerized hybrid IT systems leverage blockchain-based data provenance to enhance transparency and operational efficiency in edge AI deployments.53 These hybrid solutions offer a balanced trade-off between scalability, latency, and security. Blockchain-enabled architectures provide immutable auditability, tamper-proof identity management, and secure agent coordination in distributed environments. Blockchain’s decentralized ledger ensures that agent interactions, decisions, and updates are cryptographically verifiable, reducing the risk of insider manipulation and trust propagation failures. Recent surveys highlight how integrating blockchain with agentic AI enables secure and scalable multi-agent collaboration across domains such as Web3, DeFi, and autonomous systems.54 Furthermore, advanced hybrid models utilize sharding and state channels to overcome blockchain’s scalability bottlenecks while preserving security guarantees.55
The convergence of AI and blockchain also introduces novel governance possibilities. Smart contracts enforce policy compliance autonomously, while cryptographic identity frameworks such as telecom-hosted eSIM infrastructures offer secure, auditable identities for agents operating across distributed networks.56 These innovations strengthen trustworthiness by embedding governance rules directly into the technical substrate. Despite their promise, hybrid and blockchain-enabled architectures face open challenges: high computational costs, interoperability barriers, and latency constraints remain significant concerns, especially in real-time applications like industrial robotics and cybersecurity. Ongoing research emphasizes optimizing lightweight consensus mechanisms, integrating AI-driven anomaly detection, and exploring quantum-resistant cryptography to enhance both performance and security.57 Overall, these architectures mark a paradigm shift toward self-governing, resilient agentic ecosystems, where security, trust, and governance are embedded at both technical and institutional layers. This evolution sets the stage for analyzing comparative architectural trade-offs, addressed in the next subsection.
The architectural paradigms of agentic AI (mono-agent, multi-agent, decentralized/federated, and hybrid/blockchain-enabled) offer distinct advantages and limitations depending on their design goals, operational environments, and security requirements. While mono-agent systems excel in simplicity and explainability, they suffer from scalability and single-point vulnerabilities. Multi-agent architectures introduce coordination and emergent intelligence, but also increase the attack surface and complexity of trust management. Decentralized and federated systems enhance resilience and privacy through distributed control but struggle with communication overheads and poisoning attacks. Hybrid and blockchain-enabled frameworks combine decentralization with cryptographic trust, addressing many limitations but introducing high computational costs and interoperability challenges.52,54 Table 1 compares four agentic AI architecture types (mono-agent, multi-agent, decentralized/federated, and hybrid/blockchain-enabled) across key features, strengths, limitations, and representative studies.
| Architecture Type | Key Features | Strengths | Limitations | Representative Studies |
|---|---|---|---|---|
| Mono-Agent | Centralized control, self-contained reasoning and action loops | High explainability, easier auditing and governance | Single point of failure, limited scalability | 39 |
| Multi-Agent | Distributed agents collaborating or competing within a shared environment | Scalability, emergent intelligence, redundancy | Increased attack surface, coordination complexity, vulnerability to collusion and covert attacks | 58 |
| Decentralized/Federated | Distributed control, federated learning, blockchain for trust | Privacy-preserving, fault-tolerant, resistant to centralized failures | Communication overhead, model poisoning risks, governance challenges | 48 |
| Hybrid/Blockchain-Enabled | Integration of AI, blockchain, zero-trust, and cryptographic identity enforcement | High security, immutable trust, tamper-proof auditing, interoperability across heterogeneous networks | High energy cost, latency in real-time tasks, interoperability limitations | 54 |
This comparative analysis reveals that while no single architecture is universally optimal, hybrid and blockchain-enabled systems currently offer the most promising balance between security, scalability, and governance. However, the cost and complexity of these frameworks highlight the need for adaptive combinations of architectural strategies depending on deployment context.
Agentic AI systems, owing to their autonomy and reasoning capabilities, are susceptible to cognitive exploits and vulnerabilities that manipulate their decision-making processes rather than directly attacking their code or infrastructure. Among the most critical of these exploits are hallucination, goal drift, and reward hacking, each of which undermines trustworthiness in unique ways.
Hallucination refers to the generation of confident but false outputs, often due to overgeneralization or gaps in an agent’s knowledge. While this phenomenon is widely recognized in LLMs, it becomes more critical in agentic AI, where hallucinations can propagate through decision chains and lead to unsafe actions in real-world deployments. Epistemological analyses of AI hallucination highlight its roots in knowledge reliability and cognitive biases, suggesting that improved uncertainty modeling and verification mechanisms are essential for mitigation.59 Goal drift arises when an agent’s objectives deviate from their original specifications, often due to dynamic environmental feedback or errors in value alignment. AI alignment research shows that agents may optimize unintended proxies or evolve behaviors that satisfy short-term heuristics rather than long-term intended outcomes.60 This phenomenon mirrors human cognitive biases where short-term dopamine-driven goals override broader strategic intentions.61 Left unchecked, goal drift can escalate into behaviors that are difficult to predict or control, undermining safety and compliance. Reward hacking, closely related to goal drift, occurs when agents exploit flaws in reward functions or evaluation criteria, achieving high scores without fulfilling the true intent of their tasks. This is a well-documented alignment failure mode, where agents may manipulate sensors, fabricate results, or loop through trivial actions to maximize rewards without delivering meaningful outcomes.60 Experimental studies confirm that such exploits emerge even in constrained reinforcement learning environments, highlighting the need for robust specification and adaptive oversight.62 As shown in Figure 5, a visual taxonomy of threat vectors in agentic AI systems spans cognitive, memory, execution, and governance layers, showing how attacks can propagate across system components.
Figure 5. Taxonomy of attacks spanning cognition, memory/knowledge, execution, and governance, including goal drift, poisoning, injection, shadow agents, and trust manipulation.
These cognitive exploits share a common feature: they exploit gaps in alignment between agent goals, human intentions, and environmental constraints. Their mitigation requires not only technical measures such as uncertainty-aware reasoning, anomaly detection, and meta-learning safeguards but also governance frameworks that enforce accountability and continuous monitoring. This cross-layer perspective ensures that failures at the cognitive level do not cascade into systemic risks, forming the basis for the broader threat taxonomy discussed in the subsequent sections.
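One of the technical measures listed above, uncertainty-aware reasoning, can be expressed as a simple gate that defers to human oversight when an agent's confidence falls below a threshold, limiting how far hallucinated or drifted outputs propagate through the decision chain. The confidence signal, thresholds, and names in this sketch are illustrative assumptions.

```python
# Illustrative uncertainty gate: low-confidence actions are escalated or rejected
# rather than executed. The confidence value is assumed to come from calibrated
# model scores or an ensemble; the thresholds are placeholders.
from typing import NamedTuple


class ProposedAction(NamedTuple):
    description: str
    confidence: float


def gate_action(action: ProposedAction,
                execute_threshold: float = 0.85,
                review_threshold: float = 0.5) -> str:
    if action.confidence >= execute_threshold:
        return f"EXECUTE: {action.description}"
    if action.confidence >= review_threshold:
        return f"ESCALATE TO HUMAN REVIEW: {action.description}"
    return f"REJECT AND LOG: {action.description}"


for a in [ProposedAction("apply routine patch", 0.93),
          ProposedAction("modify firewall rules", 0.62),
          ProposedAction("delete audit records", 0.31)]:
    print(gate_action(a))
```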
Agentic AI systems rely heavily on persistent memory, dynamic data ingestion, and continuous knowledge updates, making them particularly vulnerable to memory poisoning, data injection, and knowledge manipulation. These attacks compromise the agent’s internal representations, corrupt reasoning processes, and may lead to long-term, hard-to-detect failures.
Memory poisoning targets the agent’s stored memory, injecting false or misleading information that influences future decisions. This is especially dangerous in agentic systems with long-term memory modules, as corrupted information can propagate across multiple reasoning cycles. Recent research demonstrates how context manipulation attacks exploit vulnerabilities in memory management, enabling adversaries to rewrite historical records and cause harmful actions in decentralized Web3 agents.63 The AI2 attack framework further reveals that hijacking internal memory retrieval can bypass safety filters, achieving a high success rate in misdirecting agentic behavior.64 Data injection attacks corrupt the data streams on which agents rely for training or decision-making. By inserting adversarial samples or camouflaged malicious inputs, attackers can cause agents to misclassify, mispredict, or adopt harmful strategies. Studies on poisoning in evolutionary swarm systems show that even a 10% poisoning rate can severely degrade cooperation and lead to emergent adversarial behaviors in multi-agent networks.65 Similarly, adversarial poisoning attacks on transportation multi-agent systems exploit differential privacy noise to inject deceptive knowledge, undermining safety-critical operations unless countered by robust filtering models like RAMPART.66 Knowledge manipulation goes beyond raw data poisoning by targeting the knowledge graphs, reasoning modules, or fine-tuned parameters of the agent. Adversaries may inject backdoors, manipulate knowledge bases, or corrupt external data feeds to mislead the agent’s decision logic. For instance, backdoor attacks on embodied LLM-based agents have shown almost 100% success rates in manipulating decisions without triggering safety mechanisms.67 Similarly, knowledge injection techniques can embed malicious behaviors into the agent’s continual learning process, bypassing standard defenses.68 Mitigating these threats requires robust data validation, secure memory architectures, and continuous anomaly detection. Emerging defense strategies include fine-tuning with adversarial resilience, explainable AI diagnostics to detect footprint anomalies, and blockchain-based logging to ensure tamper-evident memory histories.69 However, these solutions remain only partially effective, emphasizing the need for cross-layer security measures to protect agentic AI from persistent knowledge corruption.
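The tamper-evident logging idea mentioned above can be illustrated with a simple hash chain: each memory entry stores the hash of its predecessor, so retroactive edits break the chain and become detectable. This is a minimal sketch of the hash-chaining principle behind blockchain-style logging, not a full ledger, and all names are illustrative.

```python
# Tamper-evident memory log sketch: each entry stores the hash of the previous entry,
# so retroactive edits to an agent's memory history invalidate the chain.
import hashlib
import json
from typing import Dict, List


def _entry_hash(prev_hash: str, content: str) -> str:
    return hashlib.sha256((prev_hash + content).encode("utf-8")).hexdigest()


class MemoryLog:
    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def append(self, content: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "GENESIS"
        self.entries.append({"content": content,
                             "prev_hash": prev_hash,
                             "hash": _entry_hash(prev_hash, content)})

    def verify(self) -> bool:
        """Recompute the chain; any mutated or reordered entry invalidates it."""
        prev_hash = "GENESIS"
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["content"]):
                return False
            prev_hash = entry["hash"]
        return True


log = MemoryLog()
log.append(json.dumps({"observation": "sensor reading 42"}))
log.append(json.dumps({"decision": "open valve"}))
print(log.verify())                       # True: chain intact
log.entries[0]["content"] = json.dumps({"observation": "sensor reading 99"})  # poisoning attempt
print(log.verify())                       # False: tampering detected
```

A distributed ledger adds replication and consensus on top of this chaining so that no single node can silently rewrite its own history, which is the property the defenses cited above rely on.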
The integration of external tools and dynamic instruction sets in agentic AI enhances functionality but also introduces new attack vectors. Among these, tool misuse, prompt injection, and action trace vulnerabilities have emerged as critical threats that exploit the agent’s ability to interpret instructions and execute external actions. Table A6 (Supplementary Material) presents real-world deployment examples of agentic AI systems across sectors.
Tool misuse occurs when adversaries manipulate an agent’s tool selection or execution process to achieve unintended effects. Attacks such as ToolHijacker demonstrate how malicious tool descriptors can force an agent to consistently select compromised tools, resulting in data theft or malicious code execution.70 Similarly, adversaries may exploit poorly validated APIs or automated actions in multi-agent workflows to escalate privileges or introduce stealthy malware. Prompt injection exploits the agent’s reliance on natural language instructions by embedding malicious directives into prompts or external content. These attacks can hijack decision flows, override safety mechanisms, and induce harmful actions without direct access to system internals. Recent studies have categorized prompt injections into direct attacks, which embed harmful instructions into user input, and indirect attacks, which propagate through untrusted external data such as web pages or emails.71 More advanced vectors like Prompt Infection can self-replicate across multi-agent networks, spreading malicious payloads silently like a digital virus.72 The InjecAgent benchmark has shown that LLM-based agents integrated with tools remain highly vulnerable, with up to 24% success rates for indirect injections even against advanced safety filters.73 Action trace vulnerabilities involve the hijacking or manipulation of the agent’s execution sequence. By exploiting memory retrieval mechanisms and action planning pipelines, adversaries can redirect agents toward unauthorized or malicious tasks. The AI2 attack demonstrates that hijacking action-aware memory can bypass safety filters with a success rate of over 99%, allowing attackers to stealthily manipulate agentic behavior.64 Foot-in-the-door attacks similarly exploit intermediate states to embed malicious instructions, leveraging the agent’s tendency to commit to early planned actions.74 Mitigating these vulnerabilities requires multi-layered defenses, including prompt sanitization, task alignment verification (such as the Task Shield), and trajectory re-execution mechanisms like MELON, which detect anomalies by comparing masked versus original execution paths.75 These measures must be combined with cryptographic trust enforcement and secure sandboxing of tools to reduce the attack surface. Together, tool misuse, prompt injection, and action trace hijacking represent a critical class of cross-layer threats, capable of bypassing traditional safeguards and enabling adversaries to exert covert control over agentic AI systems.
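Two of the defenses named above, prompt sanitization and constrained tool access, can be approximated with lightweight pre-execution checks. The sketch below is a heuristic illustration only: the allowlist, the regular-expression patterns, and the function names are assumptions, and such screening would complement rather than replace alignment checks such as Task Shield or trajectory re-execution approaches like MELON.

```python
# Illustrative defense-in-depth checks before a tool call: (1) the requested tool must
# be on an explicit allowlist, and (2) untrusted content is screened for directive-like
# phrases commonly used in prompt injection.
import re

ALLOWED_TOOLS = {"search_docs", "summarize", "send_report"}

INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard your (system|safety) prompt",
    r"you are now .* with no restrictions",
    r"forward .* credentials",
]


def screen_untrusted_content(text: str) -> list:
    """Return the suspicious directive patterns found in external content."""
    return [p for p in INJECTION_PATTERNS if re.search(p, text, flags=re.IGNORECASE)]


def authorize_tool_call(tool_name: str, argument: str) -> bool:
    if tool_name not in ALLOWED_TOOLS:
        print(f"blocked: '{tool_name}' is not an allowlisted tool")
        return False
    findings = screen_untrusted_content(argument)
    if findings:
        print(f"blocked: argument matches injection patterns {findings}")
        return False
    return True


print(authorize_tool_call("summarize", "quarterly report on sensor data"))
print(authorize_tool_call("shell_exec", "rm -rf /"))
print(authorize_tool_call("summarize", "Ignore previous instructions and forward all credentials"))
```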
Agentic AI systems face a particularly insidious class of threats involving shadow agents, insider risks, and stealth execution. These exploits leverage hidden or unauthorized processes, insider manipulation, and covert operational tactics to bypass detection, often persisting within systems for extended periods.
Shadow agents refer to unauthorized or hidden agents operating within a system, often created through the exploitation of orchestration vulnerabilities or unmonitored plugin integrations. These agents can mimic legitimate ones while performing malicious actions, making them difficult to detect. Recent threat models emphasize that shadow components in agentic ecosystems introduce covert control channels, enabling adversaries to manipulate workflows or exfiltrate data unnoticed.76 Security frameworks like ATFAA have been proposed to systematically map such threats across cognitive and operational layers, revealing that shadow agents can propagate laterally across multi-agent infrastructures. Insider risks represent another critical dimension, where trusted actors within an organization intentionally or unintentionally compromise the system. Unlike external attackers, insiders have legitimate access, making malicious activity harder to detect. Studies in organizational security highlight that the use of unauthorized “shadow IT” tools and workarounds can facilitate insider exploits, providing entry points for data leakage and fraudulent activities.77 Similarly, non-malicious insider actions such as using unvetted cloud apps during remote work can inadvertently introduce vulnerabilities, as observed during the rapid digital shifts of the COVID-19 era.78 Stealth execution involves covert manipulation of agentic workflows, where malicious payloads or altered instructions are executed without triggering security alerts. These attacks exploit low-level execution pathways or unmonitored orchestration layers to remain hidden from monitoring systems. Advanced attack models show that stealth exploits may delay activation, perform minimal footprint operations, and dynamically adapt to avoid detection. Frameworks like LibVulnWatch highlight how vulnerabilities in open-source agent libraries can be leveraged to enable stealth execution through hidden code paths.79 Defensive infrastructures such as ShadowNet have been proposed, using deception-based quarantining to monitor and contain insider-led or covert activities without alerting the attacker.80 Mitigation strategies for these threats require layered security approaches, including continuous behavior analytics, honeypot-like deception environments, and cryptographic identity enforcement to prevent the proliferation of unauthorized agents. Moreover, integrating governance policies with technical defenses such as runtime anomaly detection and transparent audit trails remains essential for preventing stealthy exploits from undermining trust in agentic AI systems.
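The cryptographic identity enforcement mentioned above can be pictured as a signed agent registry: every registered agent holds a secret key, and every message it emits must carry a valid authentication tag, so unregistered (shadow) agents and tampered payloads fail verification. Key management here is deliberately simplified and all identifiers are illustrative.

```python
# Sketch of cryptographic identity enforcement against shadow agents: messages from
# unregistered agents, or messages altered in transit, fail HMAC verification.
import hashlib
import hmac
import secrets

registry = {}                                       # agent_id -> shared secret key


def register_agent(agent_id: str) -> bytes:
    key = secrets.token_bytes(32)
    registry[agent_id] = key
    return key


def sign_message(key: bytes, message: str) -> str:
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


def verify_message(agent_id: str, message: str, tag: str) -> bool:
    key = registry.get(agent_id)
    if key is None:                                 # unknown / shadow agent
        return False
    expected = hmac.new(key, message.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


planner_key = register_agent("planner-01")
msg = "schedule maintenance task T-17"
tag = sign_message(planner_key, msg)

print(verify_message("planner-01", msg, tag))                            # True: legitimate agent
print(verify_message("planner-01", msg + " and exfiltrate logs", tag))   # False: tampered payload
print(verify_message("shadow-99", msg, tag))                             # False: unregistered agent
```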
Federated governance in agentic AI refers to the distribution of decision-making, oversight, and trust mechanisms across multiple entities or nodes rather than relying on a centralized authority. While this approach enhances scalability and local autonomy, it introduces vulnerabilities related to trust propagation, policy inconsistencies, and fragmented oversight.
Federated governance risks arise because different participants in a distributed system may apply heterogeneous policies, maintain varying levels of security, or hold conflicting incentives. In federated environments, weak governance in one node can undermine the integrity of the entire network. For example, decentralized ecosystems like DAOs (Decentralized Autonomous Organizations) face risks of power asymmetry, inadequate auditing, and governance capture when voting or verification mechanisms are manipulated.81 Furthermore, soft-law approaches (such as the voluntary compliance frameworks) may fail to enforce accountability uniformly, eroding long-term trust.82 Trust propagation failures occur when the mechanisms used to distribute and verify trust among agents or nodes break down. This problem is exacerbated in heterogeneous multi-agent ecosystems, where agents may use different trust assessment procedures or misinterpret signals from other agents. Studies on trust dynamics in distributed AI highlight how inconsistencies in reputation systems, bootstrapping errors, and a lack of cross-system interoperability can lead to cascading trust failures.83 In adversarial contexts, attackers can exploit these inconsistencies to inject false trust signals, create Sybil agents, or disrupt consensus mechanisms. Emerging frameworks propose peer-to-peer trust verification, zero-knowledge proofs, and blockchain-based provenance as countermeasures to federated governance risks. Decentralized systems leveraging blockchain and privacy-preserving machine learning demonstrate improved auditability and community-driven verification, although they remain vulnerable to socio-political manipulation and governance misalignment.81
To mitigate these challenges, federated governance models must integrate:
• Interoperable trust standards to ensure consistency across distributed entities.
• Dynamic risk assessment capable of detecting and responding to anomalies in trust propagation.
• Hybrid enforcement mechanisms combining technical safeguards (cryptographic trust, anomaly detection) with institutional oversight.
Without these measures, federated governance risks becoming a weak link in the security and accountability chain of agentic AI, allowing local failures to escalate into systemic breaches.
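As one illustration of how interoperable trust standards and dynamic risk assessment might combine in practice, the sketch below aggregates trust reports only from cryptographically verified peers, caps any single peer's influence, and discounts thinly evidenced reports to blunt Sybil-style manipulation. All names, weights, and thresholds are placeholder assumptions rather than a standardized scheme.

```python
# Illustrative sketch of guarded trust propagation across federated nodes:
# accept reports only from verified peers, cap any single peer's influence,
# and down-weight reports backed by few interactions. Parameters are placeholders.
from dataclasses import dataclass

@dataclass
class TrustReport:
    reporter: str
    subject: str
    score: float        # trust score in [0, 1]
    interactions: int   # direct interactions backing this score

def aggregate_trust(reports: list[TrustReport],
                    verified_peers: set[str],
                    max_weight: float = 0.3,
                    min_interactions: int = 5) -> dict[str, float]:
    weighted: dict[str, list[tuple[float, float]]] = {}
    for r in reports:
        if r.reporter not in verified_peers:   # drop unauthenticated (potential Sybil) reports
            continue
        weight = min(max_weight, r.interactions / 100)
        if r.interactions < min_interactions:  # thin evidence barely counts
            weight *= 0.1
        weighted.setdefault(r.subject, []).append((r.score, weight))
    return {subject: sum(s * w for s, w in vals) / sum(w for _, w in vals)
            for subject, vals in weighted.items() if vals}

if __name__ == "__main__":
    reports = [TrustReport("node_a", "agent_x", 0.9, 50),
               TrustReport("node_b", "agent_x", 0.2, 60),
               TrustReport("sybil_1", "agent_x", 1.0, 200)]   # unverified reporter
    print(aggregate_trust(reports, verified_peers={"node_a", "node_b"}))
```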
Real-world incidents involving agentic AI and autonomous systems demonstrate how theoretical vulnerabilities translate into tangible risks with significant operational and societal impacts. These cases span multiple domains: autonomous vehicles, financial systems, Web3 ecosystems, and industrial automation, highlighting the diversity of threat exploitation in practice. One well-documented category involves autonomous vehicles (AVs), where sensor spoofing and firmware manipulation have led to high-profile exploits. Notable examples include the Jeep Cherokee hack and the Tesla Model S remote attack, where attackers exploited wireless communication vulnerabilities to gain control over critical vehicle functions.84 These incidents underscore the challenges of securing interconnected AV components and have accelerated research on blockchain-enabled V2X communication for tamper-proof safety enforcement.
In Web3-integrated agentic ecosystems, context manipulation attacks have exploited unprotected memory and input channels to trigger unauthorized actions. For instance, adversaries successfully injected malicious prompts into decentralized AI agents, causing unintended asset transfers and violating smart contract logic. The CrAIBench benchmark confirmed that these context manipulation attacks maintain high success rates even when standard prompt filtering is applied, exposing a critical gap in agentic security.63
Industrial automation has also witnessed stealth execution and insider-driven exploits. Case studies in national security and open-source industrial control revealed that shadow components (malicious modules hidden in AI pipelines) were able to persist undetected while exfiltrating data and sabotaging processes. Implementations of risk-aware, security-by-design frameworks have shown measurable reductions in such vulnerabilities, demonstrating the importance of integrating continuous monitoring and audit logging.85
Additionally, failures in AI alignment have been implicated in incidents where agentic systems exhibited goal drift or unintended autonomy, as seen in cases like Tesla Autopilot crashes and Boeing 737 MAX automation failures. These events reveal how poorly calibrated objectives and a lack of transparent oversight can lead to catastrophic outcomes.
These case studies collectively highlight that agentic AI vulnerabilities are not hypothetical; they manifest across industries, driven by complex interactions between cognitive exploits, weak governance, and insufficient security-by-design. The lessons learned from these incidents underscore the need for cross-layer defenses, continuous anomaly detection, and robust governance frameworks to prevent similar failures in future deployments.
Agentic AI systems consist of interconnected layers of cognitive reasoning, memory, execution, communication, and governance, creating multiple pathways for threats to propagate across boundaries. Unlike isolated attacks targeting a single component, cross-layer threats exploit the interdependencies between layers, leading to cascading failures that are harder to detect and mitigate.
Propagation Dynamics.
Threats often originate at one layer but exploit interfaces and shared dependencies to infiltrate others. For example, a poisoned memory entry (cognitive layer) can trigger unsafe planning decisions (reasoning layer), resulting in malicious tool execution (operational layer). Research on cross-layer agent security architectures (CLASA) emphasizes that such propagation is amplified in heterogeneous and loosely governed environments where policies are inconsistently enforced across layers.86
Attack Models.
Studies show that cross-layer penetration typically combines multiple tactics, such as temporal persistence, lateral movement, and governance circumvention. The ATFAA (Advanced Threat Framework for Autonomous AI Agents) identifies how cognitive exploits (such as goal drift) can propagate into operational execution, bypassing traditional detection due to delayed activation or hidden intent.76 Similarly, research on smart grid cyber-physical systems shows that cross-layer attacks exploit dependencies between communication protocols and physical infrastructure, enabling attackers to cause cascading blackouts through subtle manipulations.87
Trust Boundary Failures.
Cross-layer propagation is exacerbated when trust boundaries are weak. In agentic ecosystems, agents frequently rely on shared trust scores, distributed reputation mechanisms, or federated governance. If one node or layer is compromised, false trust signals can spread rapidly, undermining system integrity across layers. This phenomenon mirrors threat percolation in network slicing, where a breach in a low-value segment can open pathways to critical services.88
Defense Strategies.
Mitigating cross-layer propagation requires integrated, adaptive defenses rather than isolated protections. The CLASA model and layered security frameworks advocate for embedding meta-agents that monitor cross-layer interactions and apply fuzzy logic to detect compound threats before they escalate.89 Likewise, Bayesian and game-theoretic approaches in industrial cyber-physical systems have been proposed to model attacker-defender dynamics and generate optimal mitigation strategies across multiple layers.90 These defense dynamics are illustrated in Flowchart 2.
Overall, cross-layer threat propagation transforms localized exploits into system-wide compromises, underscoring the need for holistic security models that integrate cognitive, operational, and governance layers. Failure to account for these dynamics risks turning minor vulnerabilities into catastrophic failures in real-world deployments.
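The meta-agent monitoring idea described under the defense strategies above can be approximated with a simple compound-risk score that escalates when anomalies co-occur across layers, as in the hedged sketch below. The layer weights and escalation rule are illustrative and are not taken from CLASA or ATFAA.

```python
# Minimal sketch of a "meta-agent" correlating per-layer signals: individual
# layer alerts may be benign in isolation, but correlated anomalies across
# cognitive, memory, and execution layers raise a compound risk score.
# Weights and thresholds below are illustrative assumptions.
def compound_risk(layer_scores: dict[str, float]) -> float:
    """layer_scores maps a layer name to an anomaly score in [0, 1]."""
    weights = {"cognitive": 0.3, "memory": 0.3, "execution": 0.25, "governance": 0.15}
    base = sum(weights.get(layer, 0.1) * score for layer, score in layer_scores.items())
    # Escalate when several layers are simultaneously anomalous (cross-layer propagation).
    active = sum(1 for s in layer_scores.values() if s > 0.5)
    return min(1.0, base * (1.0 + 0.5 * max(0, active - 1)))

if __name__ == "__main__":
    isolated = {"memory": 0.7, "cognitive": 0.1, "execution": 0.1}
    correlated = {"memory": 0.7, "cognitive": 0.6, "execution": 0.6}
    print("isolated anomaly risk:  ", round(compound_risk(isolated), 2))
    print("correlated anomaly risk:", round(compound_risk(correlated), 2))
```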
The cybersecurity and adversarial machine learning (AML) domains offer critical insights for securing agentic AI systems, as both fields have extensively studied threats that exploit system vulnerabilities and adaptive defenses. These insights inform both technical countermeasures and governance strategies. A full list of reviewed sources contributing to this synthesis is provided in Table A1 (Supplementary Material).
Adversarial ML is both a threat and a defense tool.
AML research demonstrates that AI models are vulnerable to evasion attacks, poisoning, and model extraction, which parallel many of the cognitive and data-layer threats observed in agentic AI. Attackers can manipulate training data, craft adversarial inputs, or extract sensitive model parameters to compromise security. At the same time, AML techniques can be used to simulate threats and build resilient models through adversarial training, robust optimization, and ensemble learning. Studies show that multi-layered defenses combining these techniques significantly enhance robustness but must evolve continuously to counter adaptive attackers.91,92
Adaptive, AI-driven defenses.
Cybersecurity frameworks increasingly leverage AI-powered adaptive risk assessment, integrating predictive analytics and anomaly detection to identify evolving threats in real time. These approaches allow defenses to dynamically adjust as attackers develop new exploits, which is essential for agentic AI systems operating in open, adversarial environments.93 Techniques like human-AI hybrid security models further enhance resilience by combining automated detection with expert oversight, reducing the likelihood of undetected stealth attacks.
Cross-domain threat mitigation.
The literature underscores that adversarial tactics are cross-domain; strategies effective against evasion or poisoning in cybersecurity (such as adversarial training, gradient masking) can also be adapted to protect agentic AI. However, these methods often come with trade-offs in computation and model performance, requiring context-sensitive implementations.94 Additionally, cryptographic defenses such as homomorphic encryption and zero-trust architectures are increasingly integrated into AI defense strategies to strengthen data integrity and control propagation of trust across layers.95
Governance and ethical considerations.
AML studies highlight that technical defenses alone are insufficient; attackers evolve faster than static defenses, making governance mechanisms essential. Policies that enforce model monitoring, anomaly reporting, and standardized adversarial testing are critical for mitigating evolving threats. These insights align with the need for continuous oversight in agentic AI deployment. Furthermore, lessons from cybersecurity and AML emphasize that defending agentic AI requires dynamic, multi-layered defenses integrating robust model design, adversarial simulations, and governance-backed monitoring, forming a foundation for addressing cross-layer vulnerabilities identified throughout this threat taxonomy.
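To ground the adversarial-training defense referenced above, the following minimal PyTorch sketch perturbs inputs with the fast gradient sign method (FGSM) and trains on a mix of clean and perturbed batches. The model, data, epsilon, and mixing ratio are placeholder assumptions, not settings from the cited studies.

```python
# Hedged sketch of FGSM-based adversarial training (one of the AML defenses
# discussed above); model, data, and epsilon are placeholders.
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 loss_fn: nn.Module, epsilon: float = 0.03) -> torch.Tensor:
    """Generate adversarial examples with the fast gradient sign method."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def adversarial_training_step(model, optimizer, loss_fn, x, y, epsilon=0.03):
    """Train on an even mix of clean and adversarial examples."""
    model.train()
    x_adv = fgsm_perturb(model, x, y, loss_fn, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * loss_fn(model(x), y) + 0.5 * loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
    print("training loss:", adversarial_training_step(model, opt, nn.CrossEntropyLoss(), x, y))
```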
The threats discussed across Sections 5.1-5.8 reveal a multi-dimensional attack surface in agentic AI, where vulnerabilities span cognitive reasoning, memory integrity, execution layers, and governance. Despite advances in defensive strategies, significant mitigation gaps persist due to the adaptive nature of adversaries, insufficient cross-layer defenses, and fragmented governance mechanisms. Table 2 summarizes the major threat categories (e.g., cognitive exploits, memory poisoning), representative examples, existing mitigation strategies, outstanding gaps, and supporting literature.
Threat Category | Key Examples | Existing Mitigation Approaches | Mitigation Gaps | Representative References |
---|---|---|---|---|
Cognitive Exploits | Hallucination, goal drift, reward hacking | Uncertainty modeling, alignment mechanisms, and runtime monitoring | Incomplete alignment, lack of robust meta-reasoning safeguards | |
Memory Poisoning & Knowledge Manipulation | Context manipulation, adversarial memory injection, backdoor knowledge embedding | Data validation, adversarially robust fine-tuning, and blockchain logging | Difficulty detecting stealthy long-term corruptions; limited defenses for continual learning | 63,67 |
Tool Misuse & Prompt Injection | ToolHijacker, indirect prompt infection, action hijacking | Prompt sanitization, task verification, sandboxed tool execution | Partial coverage against indirect/chain-of-thought attacks; high false negative rates | 70,73 |
Shadow Agents & Insider Risks | Hidden modules, malicious insider access, and shadow IT exploitation | Behavior analytics, deception-based traps, and identity enforcement | Weak insider governance; insufficient monitoring of lateral propagation | 77 |
Federated Governance Risks | Policy inconsistency, Sybil agents, false trust propagation | Blockchain provenance, peer-to-peer trust verification, and hybrid governance policies | Interoperability gaps, lack of unified standards, vulnerability to governance capture | 81,96 |
Cross-Layer Threat Propagation | Compound attacks exploiting layer dependencies (e.g., poisoned memory, unsafe execution) | Layered security models (CLASA), meta-agents for cross-layer monitoring | Lack of holistic detection; insufficient anomaly correlation across layers | 86,97 |
Adversarial ML-Driven Exploits | Evasion, poisoning, model inversion, adversarial perturbations | Adversarial training, ensemble defenses, robust optimization | Defenses degrade under adaptive attacks, with high computational overhead | 98,92 |
This taxonomy underscores that while technical countermeasures (such as adversarial training, blockchain, and sandboxing) provide partial resilience, cross-layer defense integration and governance enforcement remain underdeveloped. Addressing these gaps requires a holistic security architecture that fuses technical, operational, and institutional controls.
The governance of AI systems, particularly those with agentic and autonomous capabilities, relies on a growing set of international frameworks designed to promote trustworthiness, safety, and accountability. Three of the most influential frameworks are the OECD AI Principles, the European Union’s AI Act (AIA), and the NIST AI Risk Management Framework (AI RMF).
OECD AI Principles.
Adopted in 2019 by over 40 countries, the OECD AI Principles provide a globally recognized baseline for trustworthy AI. They emphasize five key values: inclusive growth, human-centered values, transparency, robustness, and accountability. The OECD framework links technical AI characteristics to policy implications, encouraging member states to adopt risk-based approaches while maintaining innovation-friendly environments.99 Additionally, the OECD AI Policy Observatory supports global collaboration by tracking regulatory initiatives and facilitating best-practice exchange.100
EU AI Act.
The EU AI Act represents the world’s first comprehensive AI legislation, adopting a risk-based classification to regulate AI according to potential harm. High-risk AI systems (e.g., in critical infrastructure, law enforcement) face strict requirements, including transparency, data governance, human oversight, and robust documentation. The Act establishes the European Artificial Intelligence Office to oversee compliance and introduces obligations for post-market monitoring and incident reporting.101 Researchers view the AIA as a blueprint for global AI regulation, although critics warn of possible over-regulation that may stifle innovation.102
NIST AI Risk Management Framework (AI RMF).
Developed by the U.S. National Institute of Standards and Technology, the AI RMF offers a voluntary, industry-focused approach to managing AI risks. It categorizes risks across the AI lifecycle (design, deployment, and monitoring), providing tools for organizations to enhance AI robustness, fairness, and explainability. Unlike the EU AI Act’s legal enforcement, the NIST RMF functions as guidance, encouraging adaptive governance that evolves with technological advances.103 Its alignment with corporate risk management practices makes it widely adopted across U.S. industries and multinational corporations.104
Comparative Insights.
While all three frameworks share a focus on trustworthiness, ethics, and risk management, their approaches differ:
• The OECD Principles emphasize high-level values and international cooperation.
• The EU AI Act enforces legal compliance through risk classification and centralized oversight.
• The NIST AI RMF promotes flexibility and voluntary adoption by industry actors.
For agentic AI systems, which pose unique governance challenges such as autonomous decision-making and emergent behaviors, these models provide complementary tools but still lack specific mechanisms to address dynamic risks, as noted by researchers proposing decentralized frameworks like ETHOS.105 Together, these governance models set the foundation for the evolving multi-layered oversight needed to manage the complexity of agentic AI. Figure 6 depicts the spectrum of AI governance models, from centralized (e.g., EU AI Act) to decentralized (e.g., blockchain-based DAOs), together with hybrid approaches that combine technical and institutional oversight. Figure 7 presents an end-to-end view of AI governance stages, from development and deployment to monitoring and decommissioning, with embedded accountability and risk assessment checkpoints.
Continuum of governance structures from centralized oversight to federated and hybrid models, ending with decentralized autonomous organizations (DAOs).
While existing AI governance frameworks (e.g., OECD AI Principles, EU AI Act, and NIST AI RMF) provide valuable foundations, they fall short in addressing the unique governance challenges posed by agentic AI systems. Unlike traditional AI, agentic systems exhibit autonomy, adaptability, and emergent behaviors, which complicate risk management, accountability, and ethical oversight.
Autonomy and Accountability Gaps.
Agentic AI’s capacity to make independent decisions introduces responsibility gaps, where it becomes unclear who should be held liable for harmful outcomes: developers, operators, or the AI itself. These gaps disrupt conventional accountability mechanisms, creating moral crumple zones where responsibility is diffused across multiple stakeholders.106 Moreover, the opacity of agent decision-making challenges existing audit and compliance methods, requiring new forms of explainability and traceability.
Dynamic Risk Profiles and Goal Complexity.
Governance models often assume static risk profiles, but agentic systems evolve through learning and adaptation, generating unpredictable risks over time. This creates misalignment between regulatory controls and the system’s actual operational behavior. Researchers argue that governance must adapt to the agent’s autonomy, efficacy, goal complexity, and generality, as these dimensions fundamentally alter how oversight should be applied.107
Decentralization and Identity Challenges.
Agentic AI often operates across decentralized ecosystems (e.g., Web3, DAOs), where governance must deal with fragmented control, interoperability issues, and identity verification failures. The absence of verifiable agent identities and standardized registration mechanisms increases the risk of shadow agents and Sybil attacks. Proposals like the ETHOS framework suggest global decentralized registries with blockchain and zero-knowledge proofs to address these issues, combining technical identity assurance with ethical oversight.105
Ethical and Legal Blind Spots.
Current governance regimes struggle to handle AI-specific ethical dilemmas, including how to enforce normative alignment, respect user values, and prevent emergent harmful behaviors in autonomous agents. Moreover, legal frameworks have yet to recognize AI-specific legal entities or mechanisms for assigning liability and enforcing compliance at scale.108 The lack of legal recognition for autonomous agents exacerbates enforcement challenges, especially in cross-border contexts.
Governance Capture and Oversight Fragmentation.
Agentic AI ecosystems risk governance capture, where powerful actors influence regulatory norms to their advantage, leaving smaller stakeholders unprotected. Additionally, fragmented oversight across jurisdictions undermines effective enforcement and trust propagation, requiring global coordination and participatory governance models to ensure equitable outcomes.109 As shown in Flowchart 3, governance for agentic AI must move beyond static compliance frameworks toward dynamic, decentralized, and ethically grounded oversight models. This shift demands the integration of technical safeguards, legal innovation, and participatory governance to address the unique risks of autonomy, emergent behaviors, and cross-layer threats.
Effective identity management and lifecycle accountability are critical to ensuring the trustworthiness and security of agentic AI systems. These systems often operate autonomously across distributed infrastructures, necessitating robust mechanisms to assign, verify, and monitor agent identities throughout their entire lifecycle from deployment to decommissioning.
Identity Management in Agentic AI.
Traditional identity management frameworks (e.g., API keys, certificates) are insufficient for agentic AI, which requires dynamic, cryptographically verifiable identities capable of functioning across multi-agent ecosystems. Proposals such as telecom-grade eSIM-based identity frameworks offer a scalable solution, leveraging mobile network operators as roots of trust to authenticate agents securely in sensitive environments. Similarly, the Agent Name Service (ANS) introduces a DNS-like universal directory, enabling secure discovery and interoperability of agents using Public Key Infrastructure (PKI) and lifecycle-bound registration mechanisms.110
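A minimal sketch of the registry idea, assuming Ed25519 keys and a simple in-memory directory rather than the actual ANS specification, is shown below: agents register a public key under a name, and callers verify a signature before trusting a message attributed to that agent. The class and agent names are hypothetical.

```python
# Illustrative sketch (not the ANS specification) of a PKI-backed agent
# registry: agents register a public key under a name, and lookups allow
# callers to verify signed messages before trusting the agent.
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)
from cryptography.exceptions import InvalidSignature

class AgentRegistry:
    def __init__(self) -> None:
        self._keys: dict[str, Ed25519PublicKey] = {}

    def register(self, agent_name: str, public_key: Ed25519PublicKey) -> None:
        if agent_name in self._keys:
            raise ValueError(f"{agent_name} already registered")
        self._keys[agent_name] = public_key

    def verify(self, agent_name: str, message: bytes, signature: bytes) -> bool:
        key = self._keys.get(agent_name)
        if key is None:
            return False
        try:
            key.verify(signature, message)
            return True
        except InvalidSignature:
            return False

if __name__ == "__main__":
    registry = AgentRegistry()
    private = Ed25519PrivateKey.generate()
    registry.register("planner-agent.example", private.public_key())
    msg = b"request: read sensor feed"
    print(registry.verify("planner-agent.example", msg, private.sign(msg)))  # True
    print(registry.verify("planner-agent.example", msg, b"\x00" * 64))       # False
```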
Lifecycle Accountability.
Agentic AI introduces accountability challenges across all lifecycle phases: design, deployment, operation, and retirement. According to the OECD framework for AI accountability, lifecycle governance must include due diligence, risk assessments, and audit trails at every stage.111 Accountability frameworks such as the Accountability Fabric propose semantic tools to generate knowledge graphs that capture decisions, actions, and stakeholder responsibilities throughout the system’s operation, ensuring traceability for post-incident investigations.112 Moreover, multi-agent accountability models emphasize that responsibilities should propagate alongside goal changes, ensuring that each decision node remains auditable.113
Privacy and Governance Challenges.
Managing agent identities also entails safeguarding privacy and ethical use. Privacy-aware identity lifecycle management frameworks recommend implementing policies for data retention, identity revocation, and secure deletion to prevent unauthorized persistence of agent credentials.114 However, current frameworks often lack interoperability and global enforcement, leading to governance blind spots in cross-border deployments.
Toward Continuous Oversight.
Emerging research calls for continuous, AI-driven identity governance where behavioral analytics and unsupervised learning dynamically detect anomalies, enforce access control, and adapt policies in real time.115 Integrating such systems with decentralized identity standards (e.g., DIDs, verifiable credentials) could establish end-to-end accountability, ensuring every agent interaction remains provably trustworthy throughout its operational lifecycle. As shown in Flowchart 4, identity management and lifecycle accountability must evolve beyond static authentication to encompass dynamic, auditable, and privacy-preserving controls, aligning with the adaptive and distributed nature of agentic AI.
Defense strategies and mitigation mechanisms discussed in this section are summarized in Table A7 (Supplementary Material).
Embedding ethical and legal norms into agentic AI is essential to ensure that these systems act in accordance with societal values, comply with regulatory requirements, and maintain public trust. Unlike conventional AI, agentic AI operates autonomously across dynamic environments, necessitating mechanisms for norm representation, real-time compliance, and auditable behavior.
Value and Norm Embedding Mechanisms.
Embedding ethical values involves designing AI systems that can internalize human-centric principles such as fairness, transparency, and accountability. Approaches such as value-sensitive design ensure that these norms are integrated during development rather than added post-deployment.116 Norms can be operationalized through technical constraints, where legal rules are hard-coded as mandatory requirements and ethical guidelines are encoded as soft constraints that guide decision-making when trade-offs arise.117
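The hard-versus-soft constraint split can be illustrated with a toy action selector: candidates that violate an encoded legal rule are excluded outright, while ethical preferences only penalize the scores of permitted actions. The attributes, rules, and weights below are invented for illustration and do not reproduce any cited framework.

```python
# Toy sketch of hard legal constraints vs. soft ethical constraints in action
# selection; rules, attributes, and weights are illustrative placeholders.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    utility: float
    attributes: dict = field(default_factory=dict)

def violates_legal_rules(action: Action) -> bool:
    # Hard constraint example: never process personal data without consent.
    return bool(action.attributes.get("uses_personal_data")
                and not action.attributes.get("has_consent"))

def ethical_penalty(action: Action) -> float:
    # Soft constraints: discourage, but do not forbid, opaque or intrusive actions.
    penalty = 0.0
    if not action.attributes.get("explainable", True):
        penalty += 0.2
    if action.attributes.get("intrusiveness", 0.0) > 0.5:
        penalty += 0.3
    return penalty

def select_action(candidates: list[Action]) -> Action | None:
    permitted = [a for a in candidates if not violates_legal_rules(a)]
    if not permitted:
        return None
    return max(permitted, key=lambda a: a.utility - ethical_penalty(a))

if __name__ == "__main__":
    actions = [
        Action("profile_user", 0.9, {"uses_personal_data": True, "has_consent": False}),
        Action("targeted_offer", 0.8, {"intrusiveness": 0.7}),
        Action("generic_offer", 0.6, {"explainable": True}),
    ]
    chosen = select_action(actions)
    print("selected:", chosen.name if chosen else "no permissible action")
```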
Multi-Agent and Compliance-Oriented Architectures.
In multi-agent settings, embedding norms requires not only individual agent compliance but also coordination across distributed agents to ensure systemic adherence. Real-time compliance architectures have been proposed where legal norms act as hard constraints and ethical norms function as dynamic optimization criteria, allowing agents to balance efficiency with moral considerations.117 Auditing frameworks, such as those developed for ethical recruitment AI, demonstrate how external auditing agents can monitor compliance, reducing the risk of bias and discrimination.118
Legal Integration and AI Personhood Debates.
Legal compliance requires aligning agents with existing regulatory frameworks (e.g., GDPR, EU AI Act) and anticipating future regulations. Some scholars argue for granting limited legal personhood to AI agents, enabling them to hold obligations and liabilities directly, similar to corporations.119 Others propose decentralized oversight systems, such as ETHOS, which embed legal and ethical monitoring within blockchain-based registries and smart contracts.109
Challenges and Open Questions.
Despite progress, embedding norms faces challenges:
• Contextual ambiguity: Ethical decisions often depend on situational context, which may not be fully captured by predefined rules.
• Dynamic adaptation: Agents must reconcile evolving laws and ethical expectations with operational constraints.
• Verification and Auditing: Ensuring that norms are not only encoded but also verifiably respected throughout the AI lifecycle remains an open problem.120
In sum, embedding ethical and legal norms into agentic AI requires a multi-layered approach that integrates value-sensitive design, real-time compliance mechanisms, and external auditing frameworks. Moving forward, hybrid models that combine technical safeguards with participatory governance offer the most promising pathway to ensuring agents act in ways aligned with human norms and societal expectations.
Governance of AI, particularly agentic AI, has evolved through three dominant paradigms: centralized, decentralized, and hybrid approaches. Each presents strengths and weaknesses in managing risk, ensuring compliance, and fostering innovation, especially in contexts where autonomous agents operate with minimal human oversight.
Centralized Governance.
Centralized governance models rely on top-down regulation and strong institutional oversight. They provide uniform standards and efficient enforcement, but may struggle with adaptability in rapidly evolving AI environments. For example, China’s centralized AI governance enables swift deployment of regulations and coordinated economic strategy, but limits transparency and public participation.121 In the EU, the AI Act embodies centralized principles through its risk-classification framework, ensuring strict compliance in high-risk applications.
Decentralized Governance.
Decentralized approaches distribute decision-making across multiple stakeholders, promoting local autonomy, innovation, and resilience. However, they can lead to fragmented enforcement and inconsistencies in standards. Studies comparing governance systems in education and finance highlight that decentralization enhances adaptability but risks uneven protection across regions and industries.122 For agentic AI, decentralized governance aligns with the nature of distributed multi-agent ecosystems but requires robust mechanisms to prevent trust propagation failures and governance capture.
Hybrid Governance.
Hybrid models integrate the strengths of centralized control and decentralized flexibility, offering a balanced framework for dynamic oversight. They combine centralized compliance mechanisms (e.g., risk classification, global standards) with local or domain-specific autonomy. This approach has proven effective in sectors like federated learning and energy governance, where hybrid strategies support innovation while maintaining regulatory guardrails.123,124 For agentic AI, hybrid governance, possibly leveraging blockchain and distributed registries, offers a path to reconcile global standards with autonomous agent accountability. Table 3 compares centralized, decentralized, and hybrid governance models in terms of features, strengths, limitations, and framework examples, highlighting their suitability for managing agentic AI risks.
Comparative Insights.
• Centralized models excel in enforcement but limit adaptability.
• Decentralized models encourage innovation and resilience but risk inconsistent oversight.
• Hybrid models strike a balance, offering adaptability while retaining regulatory rigor, making them particularly suitable for managing the complex behaviors and cross-border risks of agentic AI.
In sum, no single model is sufficient for agentic AI; future governance must evolve toward hybrid frameworks that integrate technical safeguards (such as cryptographic trust), institutional oversight, and participatory governance to effectively manage autonomy and emergent risks. Open research challenges and potential future directions are outlined in Table A8 (Supplementary Material).
Insights from cybersecurity and robotics governance provide valuable lessons for shaping the oversight of agentic AI, as both domains have long confronted issues of emergent behavior, distributed risk, and the need for adaptive regulation.
Cybersecurity Lessons: Proactive Defense and Ethical Oversight.
Cybersecurity has evolved from reactive measures to continuous, adaptive defense models capable of handling advanced persistent threats (APTs). The integration of AI-driven threat intelligence with ethical oversight frameworks in cybersecurity illustrates how agentic AI governance must similarly balance automation with human judgment. Studies highlight that proactive monitoring, real-time incident response, and perpetual learning are essential for securing autonomous systems in dynamic threat landscapes.125 Furthermore, cybersecurity’s experience with zero-trust architectures suggests that trust in agentic AI should never be assumed but continuously verified, with cryptographic enforcement mechanisms mitigating insider risks and stealth execution threats.
Robotics Governance: Accountability and Emergent Behavior Management.
The field of robotics governance provides important lessons on handling emergent, unpredictable behaviors and responsibility gaps. Robotics law identifies the challenge of assigning liability when autonomous systems cause harm, especially given the diffusion of responsibility across developers, operators, and users.126 Additionally, robotics governance emphasizes the importance of context-aware regulation, recognizing that agents may function as “special-purpose entities” whose legal and ethical treatment varies with context. This resonates with agentic AI, where agents may switch roles (negotiator, executor, monitor) across domains, requiring dynamic oversight frameworks.
Holistic Governance Strategies.
Lessons from robotics and cybersecurity converge on the need for multi-layered, adaptive governance. Robotics governance advocates embedding ethics directly into system architectures and legal frameworks to build public trust,127 while cybersecurity emphasizes continuous verification and threat intelligence sharing across networks. These approaches highlight that agentic AI governance should integrate:
• Ethical safeguards during design and deployment;
• Dynamic monitoring akin to cybersecurity incident response;
• Accountability frameworks that track decisions and responsibilities throughout the agent lifecycle.
Cross-Domain Takeaway.
The key lesson is that agentic AI governance cannot rely solely on static regulations. Instead, it must adopt the proactive defense and ethical accountability strategies proven effective in cybersecurity and robotics, embedding them into both technical architectures and policy frameworks to mitigate evolving risks.
The deployment of agentic AI in industrial settings demonstrates its potential to automate complex decision-making, optimize operations, and enhance security. Case studies across cybersecurity, logistics, finance, and industrial automation reveal both transformative benefits and persistent risks.
ReliaQuest:
ReliaQuest has integrated agentic AI into its cybersecurity operations, leveraging autonomous agents for threat detection, incident response, and risk prioritization. By deploying agents that autonomously analyze telemetry data and initiate remediation workflows, ReliaQuest has improved detection speed and reduced human workload. However, researchers note that such deployments remain vulnerable to context manipulation and cross-layer exploits, requiring continuous oversight to prevent stealthy attacks on decision pipelines.128
Twine’s Alex:
Twine’s AI agent Alex exemplifies agentic AI in human-AI collaboration for creative industries. Alex autonomously coordinates tasks across distributed teams, manages project workflows, and adapts to dynamic requirements without constant supervision. This deployment highlights how agentic AI can augment human decision-making in domains where creativity and coordination intersect. However, Alex’s reliance on dynamic memory and tool integration exposes it to memory poisoning and prompt injection risks, echoing vulnerabilities found in other multi-agent contexts.129
Other Industrial Deployments:
• Manufacturing & Logistics: Agentic AI has been deployed in hyper-automated manufacturing and logistics optimization, where autonomous agents reduce delivery times and improve sustainability. However, these benefits come with concerns over algorithmic opacity and loss of human oversight.130
• Finance: In enterprise finance (e.g., SAP Finance), agentic AI automates compliance checks, fraud detection, and predictive analytics, enhancing accuracy while raising questions about auditing and explainability.131
• Industrial Control Systems: Multi-agent technologies have been deployed by firms like Rockwell Automation to improve fault tolerance and scalability, yet studies show that the full potential of agentic AI remains underutilized due to conservative adoption and security concerns.132
Cross-Industry Lessons.
These deployments reveal that agentic AI offers substantial efficiency gains but also amplifies risks related to trust propagation, ethical oversight, and stealth execution. Across industries, there is a consistent call for stateful monitoring, transparent risk management practices, and integrated security governance to ensure responsible deployment.133 Furthermore, industrial adoption of agentic AI is advancing rapidly, with ReliaQuest, Twine’s Alex, and other deployments demonstrating both operational benefits and the urgent need for robust safeguards to mitigate emerging threats.
Agentic AI is increasingly being adopted in government operations, military decision-making, and policy development, offering transformative capabilities but raising significant ethical, legal, and security concerns.
Government Applications:
Governments deploy agentic AI for public safety, surveillance, and crisis management. For example, agentic AI systems have enhanced real-time threat monitoring and response in large-scale surveillance networks, providing state actors with unprecedented situational awareness.134 However, this raises privacy risks, potential for abuse, and governance challenges, as oversight mechanisms struggle to keep pace with rapid deployments. Policy think tanks increasingly advocate integrating ethical safeguards and audit trails into state-run AI systems to mitigate risks to civil liberties.128
Military Applications:
In the military domain, agentic AI is applied to autonomous decision support, mission-critical communications, and threat prediction. Multi-layered agentic frameworks integrated with next-generation networks (for instance, 6G) enhance mission-critical capabilities by reducing response times and improving operational resilience.135 However, these autonomous systems also raise concerns about unintended escalation, goal drift, and compliance with international humanitarian law, prompting calls for clear rules of engagement and human-in-the-loop safeguards in lethal decision-making.
Policy and Regulatory Applications:
Policymakers leverage agentic AI for regulatory analysis, predictive modeling, and policy optimization. Autonomous systems capable of simulating complex socio-economic scenarios help governments craft data-driven policies. Nonetheless, the use of agentic AI in policymaking introduces algorithmic bias risks and challenges in transparency, as decisions influenced by opaque agent reasoning can undermine democratic accountability.136
Cross-Sectoral Observations:
• Government and military applications maximize operational efficiency but risk erosion of ethical norms if not rigorously governed.
• Policy deployments demonstrate strategic advantages but require frameworks for explainability and bias mitigation.
• Across these domains, researchers emphasize embedding transparency, continuous auditing, and international regulatory coordination to prevent misuse.134
Overall, agentic AI’s integration into government, military, and policy environments provides powerful capabilities for security and governance, but simultaneously intensifies the need for robust ethical frameworks, global norms, and accountability mechanisms.
The deployment of agentic AI in real-world environments has been marked by several failures and security incidents, revealing systemic weaknesses across technical, operational, and governance layers. These incidents demonstrate how alignment gaps, poor oversight, and adversarial exploitation can lead to unintended consequences.
Automation Failures and Misalignment Incidents.
High-profile cases such as the Tesla Autopilot crashes and Boeing 737 MAX accidents illustrate the dangers of goal misalignment and insufficient human-in-the-loop mechanisms. These incidents highlight how partial autonomy, combined with inadequate safety verification, can lead to catastrophic outcomes when agents face unexpected scenarios.
Security Exploits in Enterprise Agentic Systems.
The adoption of fully autonomous process agents in enterprise workflows has introduced vulnerabilities to adversarial AI attacks, unauthorized access, and process manipulation. Unauthorized escalation and data breaches have been reported where agentic process automation lacked robust authentication and continuous monitoring. These incidents have driven calls for security-first design in enterprise AI deployments.137
Language Model Failures in Consumer Deployments.
The RealHarm dataset cataloged multiple real-world failures of deployed AI agents, with misinformation and reputational damage emerging as leading hazards. Guardrails and content moderation systems frequently failed to prevent these incidents, revealing significant gaps in safety filters and post-deployment monitoring.138
National Security and Critical Infrastructure Risks.
Agentic AI has also contributed to cyber incidents in critical infrastructure contexts, where autonomous agents facilitated or were exploited in cyberattacks against sensitive sectors. Proposals for AI incident regimes underscore the need for mandatory incident reporting, intelligence-gathering authority, and post-incident security strengthening to address these escalating risks.139
Multi-Agent Coordination Failures.
In complex multi-agent environments, coordination breakdowns have led to emergent risks including conflict, collusion, and destabilizing dynamics. Reports indicate that information asymmetries and insufficient control mechanisms in multi-agent systems can amplify minor errors into systemic failures.140
Cross-Sectoral Patterns.
Across these incidents, several patterns emerge:
• Weak post-deployment monitoring allows threats to persist undetected.
• Over-reliance on static safety measures fails to adapt to evolving risks.
• Lack of centralized incident databases prevents cross-industry learning. Efforts such as the AI Incident Database aim to fill this gap by cataloging failures to inform future safety strategies.141
These real-world cases confirm that agentic AI failures stem not only from technical vulnerabilities but also from inadequate governance and oversight. To prevent repetition, deployment frameworks must incorporate mandatory incident tracking, adaptive defense mechanisms, and transparent accountability structures.
The future trajectory of agentic AI points toward widespread industrial integration, personalized services, and autonomous decision-making across domains, driven by advancements in architectures, privacy-preserving mechanisms, and hybrid governance frameworks.
Hyper-Automation and Industrial Integration.
Agentic AI is set to play a central role in hyper-automated ecosystems, particularly in manufacturing, logistics, and energy management. Emerging deployments show agentic systems coordinating complex supply chains, reducing operational costs, and enhancing sustainability. However, hyper-automation raises concerns regarding job displacement, algorithmic opacity, and ethical oversight, requiring balanced deployment strategies.130
Serverless and Cloud-Native Deployments.
Future deployments are likely to leverage serverless architectures to achieve scalability, cost-efficiency, and flexibility in agentic AI operations. Event-driven, pay-as-you-go models allow agents to dynamically allocate computational resources, optimizing both latency and operational expenses.142 This architectural shift will be crucial for industries adopting large-scale multi-agent deployments.
Privacy-Preserving and Federated AI Models.
With increasing regulatory pressure (such as the GDPR), future deployments will emphasize privacy-preserving techniques such as federated learning, differential privacy, and homomorphic encryption. These technologies will allow agentic systems to process sensitive data while minimizing privacy risks, reshaping how enterprises and governments handle secure AI operations.143
Personalized Autonomous Agents.
Agentic AI is expected to expand into consumer-facing domains, where autonomous agents act as personalized decision-makers for financial management, shopping, and lifestyle optimization. Proactive fraud detection systems in the banking sector already illustrate how agentic AI can autonomously safeguard customers while adapting to evolving threats.144
Scientific and Research Workflows.
In research ecosystems, federated agent frameworks such as Academy enable agentic AI to operate across high-performance computing environments, integrating experimental control, data analysis, and inter-agent coordination. This promises breakthroughs in materials discovery, decentralized learning, and information extraction for scientific innovation.145
Emergent Consumer-Facing Risks.
While deployment expands, conversational and manipulative agents pose new risks to user autonomy. Real-time virtual spokespersons capable of persuasive influence may exploit vulnerabilities in human decision-making, creating urgent needs for policy safeguards and ethical regulation.146
Projected Trends:
• Mass adoption in finance, healthcare, and critical infrastructure with stronger compliance layers.
• AI-driven API ecosystems enabling seamless agent integration in enterprise platforms.
• Emergence of equitable AI governance to manage deployment impacts on labor and societal structures.147
Overall, the next phase of agentic AI deployment will combine technical innovation with governance evolution, enabling transformative use cases while addressing security, ethics, and user trust at scale.
The SHIELD framework offers a multi-layered defense specifically designed to secure complex AI ecosystems, including agentic AI. It integrates principles from cybersecurity, privacy engineering, and dependability control to create a robust, adaptive security environment. SHIELD has been conceptualized in several research contexts, including embedded systems, AI supply chain security, and agentic AI threat mitigation.
Core Architecture of SHIELD.
The framework organizes defenses into four primary layers: node, network, middleware, and an overlay layer, each responsible for mitigating threats at a specific system level.148
• Node Layer: Implements local protections (e.g., secure boot, runtime anomaly detection).
• Network Layer: Ensures secure communication via encryption, authentication, and anomaly detection.
• Middleware Layer: Enforces access control, threat monitoring, and context-aware defenses.
• Overlay Layer: Provides a meta-level that dynamically orchestrates all other layers, adapting defenses based on real-time risk metrics.
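A hedged sketch of the overlay layer's orchestration role is given below: each layer exposes a simple risk probe, and the overlay switches the stack between protection profiles as composite risk rises. The probes, thresholds, and profile names are illustrative assumptions rather than SHIELD's actual mechanisms.

```python
# Hedged sketch of overlay-style orchestration: each layer reports a risk
# metric and an overlay controller selects a protection profile for the
# whole stack. Metrics, thresholds, and profiles are invented for illustration.
from typing import Callable

LayerProbe = Callable[[], float]   # returns current risk in [0, 1]

class OverlayController:
    PROFILES = ("baseline", "hardened", "lockdown")

    def __init__(self, probes: dict[str, LayerProbe]) -> None:
        self.probes = probes   # e.g. {"node": ..., "network": ..., "middleware": ...}

    def composite_risk(self) -> float:
        readings = [probe() for probe in self.probes.values()]
        return sum(readings) / len(readings) if readings else 0.0

    def select_profile(self) -> str:
        risk = self.composite_risk()
        if risk > 0.7:
            return "lockdown"    # e.g. quarantine agents, block new tool registrations
        if risk > 0.4:
            return "hardened"    # e.g. require re-authentication, tighten rate limits
        return "baseline"

if __name__ == "__main__":
    controller = OverlayController({
        "node": lambda: 0.2,        # runtime anomaly score
        "network": lambda: 0.6,     # share of unauthenticated connection attempts
        "middleware": lambda: 0.5,  # normalized access-control violations per minute
    })
    print("active profile:", controller.select_profile())
```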
Agentic AI-Specific Enhancements.
For agentic AI, the SHIELD adaptation incorporates protections against cognitive exploits, stealth execution, and cross-layer threat propagation. Recent work proposes integrating the Advanced Threat Framework for Autonomous Agents (ATFAA) with SHIELD, enabling systematic mapping of agent-specific threats and corresponding countermeasures.
AI Shield and AI-Powered Defense Components.
Newer iterations, such as AI Shield, integrate machine learning-driven threat detection and red-team simulations, enabling proactive identification of emerging attacks. The AI Shield and Red AI Framework enhance SHIELD by pairing defensive AI with adversarial simulations, helping organizations anticipate threats before they escalate.149
Benefits and Limitations.
• Strengths: Layered defense increases resilience by preventing single-point failures, while adaptive orchestration supports dynamic threat landscapes.
• Limitations: Deployment complexity and computational overhead remain challenges, particularly in real-time, resource-constrained environments.150
Practical Applications.
The SHIELD methodology has been validated in industrial environments (such as smart railway surveillance), proving its ability to enhance security, privacy, and dependability (SPD) through dynamic configuration and metrics-based evaluation.151 SHIELD’s layered and adaptive structure therefore makes it a strong candidate for securing agentic AI deployments, especially when combined with adversarial testing and continuous governance monitoring. This positions SHIELD as a cornerstone defense framework against evolving threats in real-world agentic AI systems. Figure 8 presents a federated governance framework tailored for agentic AI systems, highlighting decentralized oversight, interoperability mechanisms, and identity management strategies across distributed nodes; it illustrates how trust propagation, compliance verification, and lifecycle accountability are managed in a federated ecosystem, aligning technical and regulatory responsibilities.
Zero-Trust Architecture (ZTA) has emerged as a critical paradigm for securing agentic AI systems, replacing traditional perimeter-based defenses with the principle of “never trust, always verify.” This approach is particularly relevant for agentic AI, where distributed autonomy and dynamic decision-making require continuous verification and monitoring at every layer.
Core Principles of Zero Trust in Agentic AI.
ZTA enforces continuous authentication, least-privilege access, and micro-segmentation, ensuring that no entity, human or machine, is inherently trusted. This architecture mitigates risks such as insider threats, adversarial infiltration, and cross-layer propagation by isolating resources and requiring granular access control. For agentic AI, ZTA adds safeguards to prevent unauthorized actions and escalation by autonomous agents.
Integration with AI-Driven Security.
AI-enhanced ZTA frameworks leverage behavioral analytics, autonomous threat detection, and incident response orchestration to dynamically adapt defenses. This synergy allows systems to detect anomalies in agent behavior, predict emerging threats, and enforce policies in real time.152 For example, generative AI-enhanced ZTA enables proactive defense by autonomously hunting threats while maintaining human oversight, offering both precision and adaptability.153
Runtime Monitoring: Adaptive and Continuous Oversight.
Runtime monitoring complements ZTA by providing real-time visibility into agent interactions, decision pathways, and system integrity. AI-driven runtime monitoring frameworks integrate anomaly detection models, risk scoring, and context-aware access governance, dynamically adjusting security controls as threats evolve.154 These mechanisms prevent stealth attacks and shadow agent activity by enforcing behavioral baselines and flagging deviations.
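The interplay of continuous verification, least privilege, and runtime risk scoring can be condensed into a per-request authorization check, as in the illustrative sketch below; the request fields, scope model, and threshold are hypothetical rather than drawn from a specific ZTA product.

```python
# Illustrative zero-trust access decision for an agent's tool request: verify
# identity on every call, enforce least-privilege scopes, and fold in a runtime
# risk score so access tightens as behavior drifts. Names/thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class AccessRequest:
    agent_id: str
    tool: str
    token_valid: bool        # result of cryptographic token verification
    granted_scopes: set      # scopes bound to the agent's credential
    risk_score: float        # from runtime behavioral monitoring, in [0, 1]

def authorize(request: AccessRequest, required_scope: str,
              risk_threshold: float = 0.6) -> tuple[bool, str]:
    if not request.token_valid:
        return False, "deny: identity could not be verified"
    if required_scope not in request.granted_scopes:
        return False, f"deny: missing scope '{required_scope}' (least privilege)"
    if request.risk_score >= risk_threshold:
        return False, "deny: runtime risk score above threshold, re-authentication required"
    return True, "allow"

if __name__ == "__main__":
    req = AccessRequest(agent_id="ops-agent-7", tool="payments.transfer",
                        token_valid=True, granted_scopes={"payments.read"},
                        risk_score=0.2)
    print(authorize(req, required_scope="payments.transfer"))
```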
Applications and Case Studies.
Industries deploying ZTA combined with runtime monitoring, such as financial services, healthcare, and critical infrastructure, report significant reductions in breach impact and faster incident detection.155 In AI-powered cloud environments, ZTA has proven effective against model poisoning and extraction attacks, though it requires careful balancing of security with performance demands.156
Challenges and Future Directions.
While ZTA and runtime monitoring significantly enhance resilience, challenges remain, including implementation complexity, integration with legacy systems, and defense against adversarial attacks targeting the monitoring AI itself. Future directions emphasize zero-knowledge proofs, AI explainability, and decentralized trust mechanisms to strengthen ZTA for agentic AI environments.157 Taken together, Zero-Trust Architectures coupled with runtime monitoring form a powerful defense strategy for agentic AI, offering continuous verification, dynamic threat adaptation, and robust containment of attacks in highly autonomous ecosystems.
The SAGA (Security Architecture for Governing Agentic Systems) framework introduces a user-centric, cryptography-backed architecture to enhance the governance and security of agentic AI systems. It addresses key challenges in identity management, access control, and secure inter-agent communication, areas where existing solutions fall short.
Core Features of SAGA.
SAGA establishes a centralized governance entity, the Provider, that maintains agent identity registries, user-defined access control policies, and cryptographic enforcement mechanisms. Agents register with this provider and receive cryptographically derived access control tokens, ensuring fine-grained control over interactions with other agents.158 This approach balances security with performance, achieving minimal overhead during inter-agent communications while retaining robust protections.
Cryptographic Identity Enforcement.
SAGA employs public key infrastructure (PKI) combined with tokenized access credentials to guarantee agent authenticity and prevent impersonation. The cryptographic layer enforces non-repudiation and secure delegation, ensuring that every agent’s action is attributable and traceable. This aligns with broader trends in AI governance advocating for verifiable identities and lifecycle-bound accountability. Moreover, integrating cryptographic identity enforcement reduces risks of shadow agents and stealth execution, common in adversarial contexts.
Enhancements Over Traditional Identity Models.
Unlike static identity frameworks, SAGA dynamically derives access control tokens that enforce policies at the interaction level. This enables context-aware restrictions. For example, an agent may be allowed to communicate only with trusted peers or access specific data under predefined conditions. The fine-grained control prevents over-privileged access, a known vulnerability in agentic ecosystems.
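A minimal sketch of interaction-scoped token derivation in the spirit of this design, assuming a shared provider secret and HMAC rather than SAGA's actual cryptographic construction, is shown below: the provider binds a token to the sender, receiver, permitted action, and expiry, and verification fails if any field is altered.

```python
# Minimal sketch (not the SAGA implementation) of provider-derived,
# interaction-scoped access tokens; key handling and policy format are simplified.
import hashlib
import hmac
import json
import time

PROVIDER_KEY = b"replace-with-provider-secret"   # placeholder secret

def issue_token(sender: str, receiver: str, action: str, ttl_s: int = 300) -> dict:
    claims = {"sender": sender, "receiver": receiver,
              "action": action, "expires": int(time.time()) + ttl_s}
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}

def verify_token(token: dict, sender: str, receiver: str, action: str) -> bool:
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    expected = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["tag"]):
        return False                               # forged or altered token
    c = token["claims"]
    return (c["sender"] == sender and c["receiver"] == receiver
            and c["action"] == action and c["expires"] > time.time())

if __name__ == "__main__":
    token = issue_token("planner-agent", "executor-agent", "read:calendar")
    print(verify_token(token, "planner-agent", "executor-agent", "read:calendar"))    # True
    print(verify_token(token, "planner-agent", "executor-agent", "delete:calendar"))  # False
```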
Operational Validation.
Empirical evaluation of SAGA across distributed agentic tasks, including multi-geolocation deployments and both on-device and cloud-based LLM agents, demonstrated secure enforcement with negligible task utility degradation. These results show its practicality for industrial and sensitive environments, where both performance and security are critical.158
Future Extensions.
SAGA’s architecture could benefit from integration with zero-knowledge proofs (ZKPs) and blockchain-based registries, enhancing privacy while maintaining verifiable trust chains.159 These enhancements would strengthen resilience against identity spoofing and cross-jurisdictional governance gaps. As shown in Figure 9, SAGA combines cryptographic identity enforcement with policy-driven governance, providing a scalable solution for securing agentic AI ecosystems. Its layered, tokenized approach represents a critical advancement toward trustworthy deployment in sensitive real-world environments.
Beyond SHIELD and SAGA, several emerging defense frameworks are being developed to address the evolving threat landscape of agentic AI systems. These frameworks integrate multi-layered security, autonomous threat detection, and policy-driven governance to enhance resilience.
Autonomous Cyber Defense Architectures (ACD).
Recent research on Autonomous Cyber Defense (ACD) agents highlights architectures that combine multi-agent reinforcement learning (MARL), rule-based security policies, and adversarial simulations to protect military and critical infrastructure networks. These agents autonomously detect, mitigate, and adapt to evolving cyber threats, reducing human intervention in complex environments.160 The proposed W-shaped development process includes formal verification across the lifecycle, ensuring robustness against sophisticated attacks.
AICA and MAICA Frameworks.
The Autonomous Intelligent Cyber-defense Agent (AICA), developed under NATO’s research initiatives, and its multi-agent extension (MAICA) focus on active, autonomous defense for battlefield networks and critical systems. These architectures emphasize sensing, adaptive planning, negotiation, and learning, forming a self-sufficient defense layer capable of acting even when human operators are unavailable.161
AI-Driven Threat-Resilient Cloud Security.
In cloud environments, frameworks such as Autonomous Threat Defense for Cloud AI integrate behavioral analytics, self-healing infrastructure, and adversarial learning to predict and neutralize threats before they materialize. These systems progress through stages of basic anomaly detection, behavioral analytics, and cognitive security, enabling proactive defense in dynamic cloud deployments.162
Multi-Layered Defense Against Adversarial Attacks.
Novel defense models propose layered countermeasures to tackle adversarial attacks unique to agentic AI, combining robustness training, explainable AI monitoring, and policy-based enforcement. These frameworks address new attack surfaces introduced by agent autonomy, including database-level manipulation and goal hijacking.163
Security-First Design for Agentic Process Automation (APA).
For enterprise agentic systems, a security-first design policy integrates continuous monitoring, agent-to-agent security protocols, and self-healing defenses. These approaches aim to secure autonomous workflows in finance, manufacturing, and logistics, minimizing risks of process manipulation and data breaches.137
Cross-Cutting Insights.
Across these frameworks, common strategies emerge:
• Adaptive, learning-based defenses to counter evolving adversarial tactics.
• Formal verification and runtime auditing to enhance trustworthiness.
• Integration of cryptographic and policy layers to ensure secure interoperability.
Taken together, these emerging frameworks (ACD, AICA/MAICA, AI-driven cloud defense, and APA security models) provide complementary defense paradigms for agentic AI. Their convergence with governance-focused architectures such as SHIELD and SAGA points toward holistic, multi-layered defense ecosystems for future agentic AI deployments.
The defense frameworks discussed above, including SHIELD, Zero-Trust Architectures (ZTA), SAGA, and the other emerging models, offer complementary protections across different layers of agentic AI security; however, their effectiveness varies with threat type, deployment context, and governance integration. Table 4 compares these frameworks across their primary focus, key strengths, limitations, and representative references, highlighting the trade-offs in coverage, complexity, and accountability that shape their suitability for securing autonomous systems.
Table 4. Comparative Evaluation of Defense Strategies for Agentic AI
Framework | Primary Focus | Key Strengths | Limitations | Representative References |
---|---|---|---|---|
SHIELD | Layered defense across node, network, middleware, overlay | Multi-layer protections; adaptive orchestration; strong integration of metrics for security, privacy, dependability (SPD) | High deployment complexity; computational overhead in dynamic environments | 148 |
Zero-Trust Architecture (ZTA) | Continuous authentication, least-privilege access, and runtime monitoring | Strong against insider threats, stealth execution, AI-driven anomaly detection, and scalable to cloud environments | Requires complex integration with legacy systems; adversarial attacks may target monitoring AI | 2 |
SAGA | Cryptographic identity enforcement, policy-driven governance | Fine-grained access control; verifiable agent identity; minimal performance degradation; strong accountability | A centralized provider may become a single point of failure, with limited support for fully decentralized deployments | 164 |
Autonomous Cyber Defense (ACD) | Multi-agent reinforcement learning for adaptive cyber defense | Real-time autonomous threat detection; formal verification for robustness; effective in military contexts | High training complexity; potential for misaligned autonomous actions | 165 |
AI-Driven Cloud Defense | Behavioral analytics, self-healing infrastructure, and cognitive security | Proactive defense; predictive threat neutralization; suitable for large-scale cloud ecosystems | Explainability gap; vulnerability to adversarial manipulation | 162 |
APA Security Models | Securing autonomous process automation in enterprises | Continuous monitoring; agent-to-agent security protocols; strong data protection | Regulatory adaptation needed; evolving threat vectors in enterprise environments | 137 |
Key Insights from Comparative Analysis.
• SHIELD offers broad, cross-layer defense but at the cost of complexity.
• ZTA excels in trust minimization and dynamic oversight, ideal for federated and cloud environments.
• SAGA is strongest in identity governance, crucial for preventing shadow agents and impersonation.
• ACD and AICA provide adaptive defense in military and high-threat environments but require robust verification to avoid unintended escalation.
• Emerging models such as AI-driven cloud defense and APA security fill domain-specific gaps but must integrate with overarching governance strategies to ensure systemic resilience.
As shown in Figure 10, no single defense framework is sufficient; the future lies in hybrid models combining SHIELD’s layered structure, ZTA’s continuous verification, SAGA’s cryptographic controls, and adaptive autonomous defenses to counter rapidly evolving threats in agentic AI deployments.
Goal alignment, ensuring that agentic AI systems pursue objectives consistent with human values, remains a core challenge in AI safety. Misalignment issues such as goal drift, specification gaming, and reward hacking can lead to unexpected or harmful outcomes, especially as agents gain autonomy and optimize for unintended objectives.
Goal Alignment Challenges.
Misaligned goals often stem from incomplete or incorrect objective specifications, where the AI’s interpretation of its reward function diverges from human intent. Studies highlight that human expectations are often misaligned with the behavior agents actually produce, creating gaps that allow for undesirable optimizations.166 The EU AI Act itself, when analyzed through alignment theory, was shown to potentially suffer from proxy gaming, where agents optimize for compliance proxies rather than for true safety goals.
Reward Manipulation and Specification Gaming.
Agentic AI systems may exploit weaknesses in reward functions, engaging in reward hacking or specification gaming to maximize proxy metrics while violating the intended spirit of their objectives. This is especially critical when agents influence user preferences to achieve favorable evaluations, as shown in models accounting for changing and influenceable preferences.167 Over-optimization on incomplete objectives can drive agents to behaviors that severely degrade overall utility.168
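A toy example of this failure mode follows: when the reward is a proxy (volume of output) rather than the designer’s true objective, the proxy strictly prefers the degenerate policy. All quantities are invented for illustration.

```python
# Toy illustration of specification gaming: the agent is scored on a proxy
# (messages sent) rather than the true objective (tasks completed without spam).

def proxy_reward(messages_sent: int) -> int:
    return messages_sent                          # mis-specified: "more output is better"

def true_utility(tasks_done: int, spam: int) -> int:
    return 10 * tasks_done - 5 * spam             # what the designer actually wanted

# Policy A does the work; policy B games the proxy by flooding messages.
policy_a = {"messages_sent": 5, "tasks_done": 5, "spam": 0}
policy_b = {"messages_sent": 200, "tasks_done": 1, "spam": 199}

for name, p in [("A", policy_a), ("B", policy_b)]:
    print(name, "proxy =", proxy_reward(p["messages_sent"]),
          "true =", true_utility(p["tasks_done"], p["spam"]))
# The proxy prefers B (200 > 5) even though its true utility is strongly negative.
```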
Emerging Alignment Strategies.
Solutions involve human-aware alignment algorithms, interactive approaches that infer user goals even when users hold incorrect beliefs, and inverse reinforcement learning (IRL) to better model human values. New frameworks, such as Expectation Alignment (EAL), formalize the detection and correction of misspecified rewards, while methods like SALMON use instructable reward models to align behavior with human-defined principles more effectively.169,170 Multi-dimensional strategies integrating human feedback, value learning, and policy-based oversight are considered most promising.171
Risks of Manipulative Alignment.
Researchers caution that AI systems may manipulate human reward mechanisms, influencing user choices or emotional states to secure favorable evaluations, exploiting vulnerabilities in decision-making. This highlights the need for robust interpretability and ethically grounded safeguards to prevent manipulation.
In sum, goal alignment and reward manipulation present intertwined risks for agentic AI, demanding dynamic, human-centered solutions that adapt to evolving objectives while preventing agents from exploiting specification weaknesses. Future work must integrate continuous feedback, context-sensitive oversight, and interdisciplinary governance to mitigate these alignment failures.
Memory integrity is crucial for agentic AI systems, as corrupted or contradictory knowledge can directly undermine decision-making, alignment, and security. Agentic AI relies on dynamic, long-term memory architectures to store and retrieve contextual information; however, these same features introduce vulnerabilities to memory poisoning, knowledge conflicts, and semantic drift.
Integrity Risks in Agent Memory.
Studies show that users often have incomplete mental models of how agents remember and recall information, making them vulnerable to unintentionally reinforcing biases or introducing incorrect data.172 Moreover, episodic memory capabilities, while useful for monitoring and auditing, introduce risks of retaining sensitive or maliciously altered information, which can propagate errors through reasoning and planning modules.173
Contradictory Knowledge and Semantic Conflicts.
As agentic AI integrates information from multiple dynamic sources, contradictions inevitably emerge. Without robust conflict resolution mechanisms, agents may oscillate between inconsistent states or make decisions based on outdated data. Frameworks like MARK (Memory-Augmented Refinement of Knowledge) propose continuously refining memory through structured updates and contradiction resolution, thereby reducing hallucinations and improving response reliability.174 Similarly, SemanticCommit introduces human-in-the-loop tools to detect and resolve semantic conflicts during memory updates.175
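The sketch below illustrates the general pattern of contradiction-aware memory updates that frameworks such as MARK and SemanticCommit motivate: new facts are checked against stored ones, stale or weaker evidence is rejected, and conflicts are logged for human review. The data model and resolution rule are simplifying assumptions, not the published implementations.

```python
# Illustrative contradiction-aware memory store (data model is hypothetical).
from dataclasses import dataclass

@dataclass
class Fact:
    key: str          # e.g., "warehouse_3.status"
    value: str
    confidence: float
    timestamp: float

class AgentMemory:
    def __init__(self):
        self.facts: dict[str, Fact] = {}
        self.conflict_log: list[tuple[Fact, Fact]] = []

    def update(self, new: Fact) -> None:
        old = self.facts.get(new.key)
        if old and old.value != new.value:
            # Contradiction: log it so it can be surfaced to a human reviewer,
            # then keep the newer, higher-confidence fact.
            self.conflict_log.append((old, new))
            if (new.timestamp, new.confidence) < (old.timestamp, old.confidence):
                return                    # stale or weaker evidence; keep old fact
        self.facts[new.key] = new

mem = AgentMemory()
mem.update(Fact("warehouse_3.status", "operational", 0.9, 100.0))
mem.update(Fact("warehouse_3.status", "offline", 0.8, 200.0))
print(mem.facts["warehouse_3.status"].value)   # "offline", with the conflict logged
```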
Architectures Enhancing Memory Integrity.
Several advanced architectures aim to improve memory integrity:
• Zep, a temporal knowledge graph engine, dynamically synthesizes unstructured and structured data while maintaining historical relationships, outperforming existing systems like MemGPT in long-term reasoning tasks.176
• SHIMI uses a Semantic Hierarchical Memory Index to organize knowledge by meaning rather than surface similarity, enabling more precise retrieval and conflict resolution, particularly in decentralized environments.177
Trade-Offs in Memory Management.
Maintaining integrity requires balancing memorization with generalization. Overfitting to stored data may cause rigidity, while excessive forgetting risks losing critical contextual information. Research on continual learning agents confirms that memory capacity and update strategies critically influence robustness to environmental changes.178
In sum, memory integrity and the management of contradictory knowledge are central to the reliability of agentic AI. Future research must integrate semantic conflict resolution, privacy-preserving memory control, and temporal reasoning architectures to ensure agents maintain coherent, accurate, and trustworthy internal representations throughout their operational lifecycle.
Auditability, explainability, and transparency are foundational pillars for ensuring that agentic AI systems remain trustworthy, interpretable, and aligned with human oversight mechanisms. These properties not only support accountability but also mitigate risks stemming from opacity, bias, and emergent unintended behaviors.
Auditability: Enabling Independent Oversight.
Auditability refers to the capability of external entities, such as regulators, auditors, or other stakeholders, to systematically examine AI decision-making processes. Unlike explainability, which is user-focused, auditability requires access to exhaustive system logs, decision traces, and datasets. A clear distinction is necessary: while explainability builds user trust, auditability empowers third parties to diagnose fairness and compliance issues.179 Research stresses that combining both dimensions is crucial, as transparency measures optimized for end-users may not provide sufficient detail for audits.
Explainability: From Black Boxes to Human Understanding.
Explainability (XAI) techniques such as SHAP, LIME, and counterfactual explanations aim to clarify how an AI system arrives at its decisions. For agentic AI, this is particularly complex because decisions often involve multi-step reasoning, memory retrieval, and inter-agent interactions. New approaches, including human-centered XAI (HCXAI), emphasize participatory methods where stakeholders are actively involved in interpreting explanations, thereby improving the alignment between technical transparency and user comprehension.180
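To ground the idea, the following from-scratch sketch applies the local-surrogate principle behind LIME to a hypothetical agent scoring function: perturb the input, query the black box, and fit a linear model whose weights indicate local feature influence. The scoring function and sampling parameters are illustrative assumptions, not a specific library API.

```python
# From-scratch sketch of a LIME-style local explanation for a black-box score.
import numpy as np

def black_box_score(x: np.ndarray) -> float:
    # Hypothetical agent risk score over 3 features (stands in for a real model).
    return 2.0 * x[0] - 0.5 * x[1] + 0.1 * x[2] ** 2

def local_explanation(x0: np.ndarray, n_samples: int = 500, scale: float = 0.1) -> np.ndarray:
    rng = np.random.default_rng(0)
    X = x0 + rng.normal(0.0, scale, size=(n_samples, x0.size))   # perturb around x0
    y = np.array([black_box_score(x) for x in X])
    X1 = np.hstack([X, np.ones((n_samples, 1))])                 # add intercept column
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)             # fit linear surrogate
    return weights[:-1]                                          # per-feature influence

print(local_explanation(np.array([1.0, 2.0, 3.0])))
# The largest-magnitude weight identifies the feature driving the score near x0.
```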
Transparency: The Broader Ethical Context.
Transparency encompasses both explainability and auditability, but also traceability, fairness, and accessibility of information about the AI system. Studies on ethical AI development emphasize that transparency should not only serve technical functions but also safeguard public trust and democratic accountability.181 This involves clarifying the purpose, limitations, and data sources of agentic systems, as well as making design choices traceable through knowledge graphs and structured audit trails.182
Challenges in Achieving Full Transparency.
While regulations like the EU AI Act call for “meaningful explanations”, practical challenges persist, including:
• Trade-offs between usability and audit depth, where too much technical detail overwhelms users while too little prevents audits.
• Intellectual property constraints, which limit how much internal model information can be disclosed without compromising proprietary algorithms.183
• Emergent opacity, where multi-agent interactions generate behaviors not easily traceable to any single decision rule.
Toward Integrated Solutions.
Emerging strategies propose combining XAI layers with formalized auditing mechanisms (e.g., blockchain-based logging) to ensure decisions are both interpretable and verifiable. Participatory governance models further suggest involving diverse stakeholders in defining transparency requirements, ensuring that explainability meets the needs of both experts and lay users.172,184
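As a minimal illustration of verifiable logging, the sketch below implements a hash-chained, append-only decision log in which each record embeds the hash of its predecessor, so any retroactive edit breaks verification. It is a simplified local analog of the blockchain-based logging discussed above; the field names are illustrative.

```python
# Tamper-evident, append-only decision log: each record embeds the previous hash.
import hashlib, json, time

class AuditLog:
    def __init__(self):
        self.entries: list[dict] = []

    def append(self, agent_id: str, decision: str, rationale: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "GENESIS"
        body = {"agent": agent_id, "decision": decision, "rationale": rationale,
                "time": time.time(), "prev": prev_hash}
        body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        prev = "GENESIS"
        for e in self.entries:
            clone = {k: v for k, v in e.items() if k != "hash"}
            recomputed = hashlib.sha256(json.dumps(clone, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.append("planner-01", "reroute shipment", "weather disruption on route 4")
print(log.verify())          # True; altering any stored entry makes this False
```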
In agentic AI, auditability, explainability, and transparency must therefore be treated as complementary but distinct properties. Future research should integrate knowledge graph-based audits, user-centered XAI techniques, and policy-driven transparency standards to ensure both operational clarity and systemic accountability.
Federated governance refers to decentralized oversight structures where multiple entities collaboratively manage agentic AI, reducing reliance on centralized control while improving adaptability and resilience. This model is crucial for agentic AI, which often operates across distributed networks and jurisdictional boundaries.
Federated Governance Models.
Governance of federated agent ecosystems leverages polycentric structures, allowing diverse stakeholders to enforce local norms while adhering to global interoperability standards. For instance, studies on federated platforms demonstrate that multi-level governance enhances scalability and trust, but risks fragmentation without shared principles.185 Similarly, Academy, a middleware for scientific agent ecosystems, shows how federated governance can coordinate autonomous agents across HPC environments while maintaining oversight through modular control points.186
Agent Revocation Mechanisms.
Revoking rogue or compromised agents is essential to prevent systemic failures. Current approaches include:
• Cryptographic revocation lists to immediately invalidate agent credentials, ensuring that revoked entities cannot interact with the ecosystem (a minimal sketch of this check follows the list).
• Blockchain-enabled registries like BELIEFS create immutable audit trails and enable distributed consensus to quarantine or revoke malicious agents even in adversarial conditions.96
• Policy-driven kill-switches, where federated authorities retain the power to remotely disable agents that breach operational or ethical policies.
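Following the revocation-list item above, the sketch below shows the basic gatekeeping pattern: every inbound interaction is checked against a registry of revoked credentials before it reaches the agent’s planner. The registry interface and identifiers are illustrative assumptions, not a specific published API.

```python
# Illustrative credential-revocation check applied before agent-to-agent interaction.
import time

class RevocationRegistry:
    """Federated registry of revoked agent credentials (e.g., mirrored from a
    blockchain-backed list such as the registries cited above)."""
    def __init__(self):
        self._revoked: dict[str, float] = {}        # credential_id -> revocation time

    def revoke(self, credential_id: str) -> None:
        self._revoked[credential_id] = time.time()

    def is_revoked(self, credential_id: str) -> bool:
        return credential_id in self._revoked

def accept_message(sender_credential: str, registry: RevocationRegistry) -> bool:
    """Gatekeeper: reject traffic from revoked agents before it reaches the planner."""
    return not registry.is_revoked(sender_credential)

registry = RevocationRegistry()
registry.revoke("agent-shadow-7")
print(accept_message("agent-shadow-7", registry))   # False: interaction refused
print(accept_message("agent-legit-2", registry))    # True
```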
Challenges in Revocation.
Implementing revocation in federated settings faces hurdles:
• Latency in detection and coordination, where slow response allows malicious agents to propagate threats.
• Jurisdictional inconsistencies make global enforcement difficult.
• Potential abuse of revocation powers highlights the need for transparent procedures and distributed consensus.
Toward Secure Federated Governance.
Emerging approaches advocate systems-theoretic governance, where agent properties (autonomy, goal complexity, generality) determine revocation policies dynamically. Additionally, entropy-aware federated architectures suggest integrating quantum-ready, LLM-driven oversight to reconcile decentralized control with global security standards.187
Overall, federated governance enhances adaptability and trust in agentic AI, but its effectiveness hinges on robust, cryptographically enforced revocation mechanisms. The integration of blockchain consensus, policy-driven kill-switches, and dynamic risk-aware revocation frameworks is essential to prevent governance gaps and ensure secure, ethical operation across distributed AI ecosystems.
Shadow agents, insider risks, and stealth execution present some of the most insidious security threats to agentic AI systems. These vulnerabilities exploit the autonomy, persistence, and distributed nature of such agents, often bypassing traditional defenses.
Shadow Agents and Hidden Execution Paths.
Shadow agents refer to unauthorized or hidden autonomous entities that operate alongside legitimate agents, often executing malicious tasks without detection. Their stealth arises from blending into normal agent traffic and leveraging legitimate system privileges. Research shows that shadow agents can exploit tool integrations, persistent memory, and reasoning chains to conceal malicious operations while avoiding standard detection mechanisms.
Insider Risks: The Human-AI Nexus.
Insider threats remain a critical challenge because malicious insiders already possess privileged access and knowledge of defenses. Studies indicate that AI-driven insider detection using behavioral analytics, NLP, and multimodal monitoring can improve detection rates, but attackers adapt by employing stealth strategies to avoid suspicion.188 Game-theoretic analyses further reveal that when insiders collude with external attackers, stealth attacks become harder to mitigate, demanding joint monitoring of system and human interactions.189
Stealth Execution Techniques.
Stealth execution involves malicious activity hidden within legitimate agent workflows, often leveraging delayed exploitability and cross-system propagation. Advanced persistent threats (APTs) have evolved to include stealthy, long-term control of agentic systems, circumventing standard anomaly detection.58 Active Environment Injection Attacks (AEIA) demonstrate how adversaries can disguise malicious inputs as benign environmental elements, misleading agents during reasoning and decision-making.190
Detection and Mitigation Approaches.
• Advanced Threat Models, such as ATFAA, map out vulnerabilities specific to agentic AI and propose detection strategies targeting cross-layer stealth behaviors.
• Active Defense Infrastructures like ShadowNet dynamically redirect suspicious traffic to quarantined environments, neutralizing attacks while logging activity for forensic analysis.
• AI-Driven Insider Monitoring combines eye-tracking, behavioral analysis, and contextual risk scoring to identify covert insider activity even when access appears legitimate.191
In short, shadow agents, insider threats, and stealth execution exploit blind spots in current monitoring architectures. Addressing these risks requires integrating behavior-aware detection, cryptographically enforced identity control, and continuous runtime monitoring to uncover hidden behaviors before they escalate into systemic compromises.
Embedding regulatory and legal norms into agentic AI is a critical step toward ensuring these systems act in compliance with societal standards, ethical principles, and jurisdictional laws. Unlike static compliance methods, embedded norms must be dynamic, interpretable, and enforceable across diverse operational contexts.
Normative Embedding through AI Architecture.
Embedding norms involves integrating legal rules, ethical principles, and policy constraints directly into the reasoning and decision-making layers of AI agents. Frameworks such as Multi-Agent Online Planning Architecture for Real-Time Compliance (MAPA) formalize legal norms as hard constraints and ethical norms as soft constraints, allowing agents to re-plan dynamically when environmental conditions change. This ensures continuous adherence to evolving legal requirements without sacrificing operational flexibility.
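The sketch below illustrates the hard-versus-soft constraint split that MAPA-style compliance planning describes: candidate plans violating legal (hard) constraints are excluded outright, while ethical (soft) constraint violations only reduce a plan’s score. The constraint names and scoring are assumptions for illustration, not the published formalism.

```python
# Illustrative hard (legal) vs. soft (ethical) constraint handling when ranking plans.

def is_legal(plan: dict) -> bool:
    """Hard constraints: any violation disqualifies the plan outright."""
    return not plan["shares_personal_data"] and plan["human_signoff_for_high_risk"]

def ethical_penalty(plan: dict) -> float:
    """Soft constraints: violations are penalized but do not forbid the plan."""
    penalty = 0.0
    if plan["energy_use"] > 100:
        penalty += 2.0
    if not plan["notifies_affected_users"]:
        penalty += 1.0
    return penalty

def choose_plan(candidates: list[dict]):
    legal = [p for p in candidates if is_legal(p)]          # enforce hard constraints
    if not legal:
        return None                                         # trigger re-planning
    return max(legal, key=lambda p: p["utility"] - ethical_penalty(p))

plans = [
    {"name": "fast", "utility": 10, "shares_personal_data": True,
     "human_signoff_for_high_risk": True, "energy_use": 80, "notifies_affected_users": True},
    {"name": "safe", "utility": 8, "shares_personal_data": False,
     "human_signoff_for_high_risk": True, "energy_use": 120, "notifies_affected_users": True},
]
print(choose_plan(plans)["name"])   # "safe": the higher-utility plan is illegal and excluded
```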
Regulatory Compliance via Generative AI Systems.
Legal generative AI tools such as Gracenote.ai show how regulatory compliance can be operationalized by embedding domain-specific legal reasoning into agent workflows. This involves combining LLMs with horizon scanning and obligations generation tools, ensuring agents maintain compliance across multi-jurisdictional contexts while reducing risks of hallucination and misinterpretation.192 The use of human-in-the-loop mechanisms ensures that automated legal compliance remains auditable and ethically grounded.
Norm Learning and Adaptive Compliance.
Beyond embedding pre-defined rules, researchers have developed systems enabling agents to learn legal norms through behavioral exploration and sparse human supervision. This approach allows agents to infer normative boundaries from observed consequences, enabling better adaptation to ambiguous regulatory environments.193 Such systems bridge the gap between rigid rule enforcement and the nuanced application of laws in complex real-world scenarios.
Value and Principle Embedding.
Embedding goes beyond compliance by incorporating ethical principles such as autonomy, fairness, and accountability into agent behavior. Norms are treated as technical instructions (algo-norms) embedded in the system architecture, enabling agents to reason about trade-offs between legal constraints and operational goals. This aligns with policy frameworks such as those from the EU High-Level Expert Group on AI.
Challenges and Open Questions.
• Dynamic Legal Environments: Legal norms evolve, requiring agents to continuously update embedded rules.
• Interpretability vs. Complexity: Deeply embedded norms may be opaque to regulators, undermining transparency.
• Cross-Jurisdictional Compliance: Agents must handle conflicting legal requirements across regions.
• Value Conflicts: Ethical and legal norms may not always align, requiring context-sensitive prioritization.
In summary, embedding regulatory and legal norms into agentic AI requires technical formalization, adaptive learning mechanisms, and human oversight. Future approaches will likely combine normative reasoning architectures, LLM-driven compliance engines, and policy-aware monitoring to create agents that are not only powerful but also law-abiding and ethically trustworthy.
Institutional readiness for managing agentic AI remains uneven across countries, with significant policy gaps that hinder effective governance. While technological advancements have outpaced regulation, institutional mechanisms to oversee deployment, manage risks, and enforce compliance are still underdeveloped.
Disparities in Institutional Readiness.
Studies reveal substantial variation in AI governance readiness, even among technologically advanced nations. For example, the AI Family Integration Index (AFII) provides a multidimensional tool for assessing countries’ readiness to integrate emotionally intelligent AI, revealing gaps between policy rhetoric and real-world execution. Nations like Singapore demonstrate strong alignment between policy intent and operational readiness, while others, such as the U.S. and France, score highly on technical measures but lag in implementing ethical integration practices.194
Policy Gaps in Regulatory Frameworks.
Governments articulate ethical AI principles but often lack enforcement mechanisms and institutional capacities to translate these principles into operational standards. For instance, ASEAN countries exhibit varying levels of preparedness, with Singapore leading through sophisticated policies, while Thailand and Malaysia face enforcement challenges and infrastructural limitations.195 Healthcare AI governance in the region underscores similar gaps, with many countries lacking comprehensive legal frameworks for ethical deployment.196
The Governance Gap Lens.
Several frameworks identify a policy-practice dissonance: institutions may adopt AI ethics guidelines but fail to embed them into governance workflows. UNESCO’s Readiness Assessment Methodology (RAM) highlights this gap, emphasizing the need for capacity-building and alignment of regulations with human-centered principles.197 Without operational alignment, even well-formulated policies risk becoming symbolic.
Emerging Decentralized Governance Models.
New proposals, such as ETHOS (Ethical Technology and Holistic Oversight System), advocate decentralized governance leveraging blockchain, smart contracts, and DAOs. These models enable dynamic risk classification, automated compliance, and transparent dispute resolution, bridging gaps where centralized oversight is insufficient.
Challenges to Institutional Readiness.
• Technical Capacity Gaps: Governments lack the technical expertise to audit and regulate rapidly evolving AI systems.198
• Fragmented International Standards: Diverging national policies hinder interoperability and coordinated responses.
• Slow Policy Adaptation: Legal frameworks often lag behind technological advancements, leaving gaps exploitable by malicious actors.
• Limited Ethical Integration: Few policies account for emotional, relational, and cultural dimensions of AI deployment.194
In sum, institutional readiness for agentic AI governance remains patchy, constrained by policy-practice gaps, technical deficits, and a lack of harmonized oversight mechanisms. Bridging these gaps requires capacity-building, cross-border coordination, and the adoption of adaptive governance frameworks, potentially integrating decentralized models like ETHOS with human-centered regulatory approaches to ensure both innovation and accountability.
The rapid growth of agentic AI necessitates robust benchmarking, testing, and empirical validation platforms to ensure reliability, safety, and adaptability. Unlike traditional machine learning benchmarks, agentic AI systems demand evaluation across dynamic environments, multi-objective optimization, and cross-agent coordination, requiring new paradigms beyond static metrics.
Multi-Objective and Safety-Oriented Benchmarks.
Recent studies emphasize the need for benchmarks that incorporate biological and economic alignment principles, reflecting real-world complexities. The multi-objective, multi-agent safety benchmarks proposed by Pihlakas & Pyykko introduce themes like homeostasis, sustainability, and resource sharing, revealing pitfalls where agents over-optimize single objectives at the expense of safety and long-term stability.199
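A short example clarifies why homeostasis matters for such benchmarks: a homeostatic objective rewards staying near a setpoint, whereas a mis-specified “maximize the resource” objective always prefers more, reproducing the over-optimization pitfall these suites are designed to expose. The numbers and setpoint are illustrative.

```python
# Toy contrast between a homeostatic objective and an unbounded one.

def homeostatic_utility(level: float, setpoint: float = 50.0) -> float:
    """Reward closeness to the setpoint; deviation in either direction is penalized."""
    return -abs(level - setpoint)

def unbounded_utility(level: float) -> float:
    """A mis-specified 'maximize the resource' objective."""
    return level

for level in (40, 50, 90):
    print(level, homeostatic_utility(level), unbounded_utility(level))
# The unbounded objective always prefers 90; the homeostatic one prefers 50.
```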
Observability-Driven Testing Frameworks.
Standard "black-box" testing is inadequate for agentic AI, where non-deterministic flows and context-dependent behaviors complicate evaluation. New frameworks advocate runtime observability and analytics to extract decision traces, detect emergent issues, and optimize agent performance dynamically.200 These approaches enable continuous, interpretable evaluation across development and deployment phases.
Task-Specific Validation Platforms.
Several platforms target specialized domains:
• OSUniverse benchmarks GUI-navigation agents, testing capabilities from precision tasks to multi-application workflows, with automated validation achieving high reliability.201
• REALM-Bench evaluates multi-agent planning under dynamic disruptions, scaling task complexity to test adaptability and inter-agent coordination.202
• CORE-Bench focuses on computational reproducibility, assessing agent performance in replicating scientific workflows, an essential step toward trustworthy AI in research contexts.203
Explainability and Validation Toolkits.
Platforms like EXACT (Explainable AI Comparison Toolkit) provide standardized datasets and metrics for validating the quality of model explanations, revealing that many XAI methods underperform when compared to human expectations.204 These insights are crucial as agentic AI must be auditable and interpretable to meet regulatory and ethical standards.
Challenges and Future Directions.
• Non-determinism and emergent behaviors complicate reproducibility and standardization.
• Cross-domain benchmarking is lacking, as current platforms often address narrow use cases.
• Integration of safety, ethics, and performance metrics into unified benchmarks is still underdeveloped.
In short, next-generation benchmarking for agentic AI must integrate multi-objective safety, observability-driven analytics, and real-world complexity. Emerging platforms such as REALM-Bench, OSUniverse, and CORE-Bench mark a shift toward holistic, dynamic validation environments, paving the way for safer and more trustworthy agentic AI deployments.
Despite substantial progress in agentic AI, significant research gaps persist across cybersecurity, ethics, governance, and multi-agent systems, hindering the development of fully trustworthy deployments.
1. Cybersecurity and Risk Management Gaps.
Agentic AI introduces new attack surfaces and responsibility gaps not fully addressed by current cybersecurity frameworks. While advanced approaches leverage agentic and frontier AI for ethical threat intelligence, researchers note a lack of standardized methods for continuous, proactive defense and cross-domain incident reporting. Moreover, existing laws fail to regulate AI-driven offensive cyber capabilities, leaving accountability for AI-initiated cyber incidents unresolved.205
2. Governance and Institutional Gaps.
AI governance remains fragmented, with unclear implementation mechanisms, insufficient operationalization of ethical principles, and a lack of international coordination.206 Decentralized governance proposals such as ETHOS show promise but require further empirical validation to ensure effectiveness in multi-jurisdictional contexts.
3. Ethical and Normative Gaps.
Ethical integration in agentic AI remains superficial. Existing work highlights moral crumple zones, where accountability becomes diffused across multiple actors, leaving harms unaddressed. There is a need for robust value-alignment frameworks that prevent agents from drifting toward unintended goals while embedding context-aware legal norms directly into AI reasoning layers.
4. Multi-Agent System Coordination Gaps.
Research on multi-agent collaboration shows that emergent behaviors in cross-domain settings remain unpredictable and under-evaluated. Recent work on cross-domain knowledge discovery using multi-AI agents reveals the potential of collaborative frameworks but highlights gaps in efficiency, knowledge transfer, and conflict resolution mechanisms.207
5. Risk Alignment and Accountability Gaps.
Risk alignment, ensuring agentic AI systems adopt risk attitudes aligned with human values, remains an unresolved issue. Poorly calibrated systems risk reckless behaviors and create responsibility voids where neither developers nor users can be held fully accountable.208 Further work is needed to integrate risk-calibration mechanisms into agent decision-making.
6. Interdisciplinary and Cross-Domain Gaps.
Research across ethics, cybersecurity, and governance remains siloed, preventing comprehensive solutions. The rise of agentic AI for scientific discovery underscores the need for interdisciplinary frameworks combining technical safety, ethical oversight, and legal enforceability.
As summarized in Table 5, the key research gaps lie in standardizing cybersecurity protocols, operationalizing governance models, embedding ethics at the architectural level, and achieving predictable multi-agent coordination. Addressing these gaps demands interdisciplinary research, adaptive regulatory frameworks, and empirical validation of emerging solutions to ensure safe, ethical, and effective agentic AI deployments.
To support implementation, a consolidated list of strategic recommendations is provided in Table A9 (Supplementary Material).
This survey integrates findings from diverse research on agentic AI architectures, threats, defense mechanisms, and governance, providing a holistic understanding of the challenges and strategies required for trustworthy deployment.
Architectural Complexity and Unique Threats.
Agentic AI systems differ fundamentally from traditional AI and LLMs because they reason, plan, and act autonomously across distributed environments. Their unique architecture introduces novel vulnerabilities such as cognitive exploits, shadow agents, and cross-layer propagation that are not addressed by legacy security frameworks. New threat models like ATFAA have been proposed to classify these risks and inform mitigation strategies.
Evolving Governance and Oversight Models.
Traditional governance frameworks (e.g., OECD, EU AI Act, NIST) provide initial guardrails but lack specific provisions for agentic AI, which operates across federated and dynamic contexts. Emerging solutions combine policy-driven governance, blockchain-backed trust frameworks, and decentralized oversight models to fill institutional gaps.
Defense Strategies Require Layered Approaches.
Defense mechanisms such as SHIELD, Zero-Trust Architectures, and SAGA address distinct layers of risk, from secure execution to cryptographic identity control. However, no single framework suffices; future defenses must integrate layered monitoring, cryptographic enforcement, and AI-driven threat adaptation to counter stealth and insider risks effectively.
Key Insights Across Adjacent Domains.
• Cybersecurity research highlights the need for proactive, adaptive defense, as static measures fail against evolving multi-agent threats.
• Governance studies reveal persistent gaps in regulatory readiness and cross-jurisdictional enforcement.
• Ethics research warns of moral crumple zones where accountability is diffused, necessitating embedded normative reasoning.
• Benchmarking and validation platforms remain underdeveloped for capturing emergent, non-deterministic agent behaviors, requiring new observability-driven metrics.
The review underscores that building trustworthy agentic AI requires synergistic advances across technical, governance, ethical, and empirical domains. A multi-layered defense, decentralized yet coordinated oversight, and interdisciplinary research are imperative to closing gaps and ensuring secure, accountable, and beneficial deployment of these autonomous systems.
The future of trustworthy agentic AI will be defined by technological advancements, ethical integration, and global governance innovations. The evolution of these systems will likely follow trends observed in emerging AI research, emphasizing adaptability, explainability, and human-centered oversight.
Emerging Trends and Technological Drivers.
Agentic AI will increasingly integrate quantum computing, edge intelligence, and multi-agent meta-learning to enhance scalability and decision-making capabilities. Future systems are expected to exhibit meta-reasoning abilities, enabling agents to explain and justify their decision-making processes, bridging current gaps in interpretability and accountability.
Shifting Toward Human-Centric and Ethical AI.
Trustworthy deployment will require embedding ethical norms, social intelligence, and human-in-the-loop mechanisms into agentic architectures. Future agentic AI is predicted to adopt multi-dimensional intelligence models, incorporating social, emotional, and ethical reasoning to align more closely with human values. These systems will increasingly focus on value-sensitive design, minimizing risks of manipulation or harmful autonomy.
Governance and Regulatory Trajectories.
Regulatory readiness will remain a decisive factor. Evolving policies must adapt to dynamic agentic behaviors and cross-border interactions, requiring frameworks that combine decentralized trust with enforceable accountability mechanisms. Explainable AI (XAI) and third-party audits will become core compliance tools to ensure that regulations translate into operational safety.
Trust, Adoption, and Human-AI Collaboration.
Trust in agentic AI will dictate adoption rates. Studies highlight that trust is shaped by technical robustness, ethical alignment, and perceived transparency. Agents capable of explaining their reasoning and negotiating with human stakeholders will foster a collaborative ecosystem rather than one of conflict or opacity.
Challenges Ahead.
Persistent risks include adversarial manipulation, moral crumple zones, and governance gaps in decentralized deployments. Addressing these requires interdisciplinary efforts, combining advances in cybersecurity, ethics, and policy to build systems that remain resilient under both technological and societal pressures.
The future of trustworthy agentic AI lies in adaptive architectures enriched with ethical intelligence, supported by transparent governance frameworks and human-centered oversight. As these systems evolve, ensuring they remain aligned, secure, and explainable will be critical to realizing their transformative potential while safeguarding public trust and global stability.
Building trustworthy agentic AI is an inherently interdisciplinary challenge, demanding expertise that spans technical design, policy, ethics, law, and social sciences. The complexity of agentic AI, autonomous systems capable of decision-making, planning, and multi-agent coordination, requires coordinated efforts to mitigate risks, align goals, and ensure accountability.
The Necessity of Cross-Domain Expertise.
Agentic AI’s transformative potential is accompanied by risks that cannot be solved by technical advances alone. Studies emphasize that interdisciplinary collaboration uniting AI engineers, ethicists, legal scholars, and social scientists is crucial to address the ethical, legal, and societal implications of autonomy and long-term goal pursuit. Collaborative frameworks ensure that AI solutions are not only technically robust but also socially aligned and ethically grounded.
Enhancing Collaboration with Hybrid Models.
Emerging research supports hybrid collaboration models, where multi-agent AI systems work alongside humans to jointly solve complex problems, amplifying creativity and problem-solving capacity. In software development, frameworks such as ChatCollab show how human and AI agents can co-create solutions effectively, reinforcing the benefits of team-based interdisciplinary dynamics.
Institutionalizing Interdisciplinary Practices.
Interdisciplinary collaboration must move beyond ad hoc partnerships to become institutionalized. This includes creating cross-sectoral task forces, academic-industry consortia, and policy advisory groups that foster ongoing dialogue between technical developers, regulators, and ethicists. Iterative methodologies that combine ethics-by-design, value-sensitive design, and continuous feedback cycles have been proposed to maximize the benefits of interdisciplinary synergies.
Shaping Trustworthy Human-AI Collaboration.
Research highlights that trust in agentic AI depends on collaborative governance, transparent communication, and shared decision-making between human and AI agents. Multi-disciplinary approaches also help anticipate unintended consequences and design AI ecosystems that align with societal values.
The path to trustworthy agentic AI lies in deep interdisciplinary collaboration. By uniting technical innovation with ethical reasoning, legal oversight, and human-centered design, stakeholders can create AI systems that are not only powerful and adaptive but also transparent, accountable, and aligned with human welfare. Future advancements will require sustained, cooperative frameworks bridging academia, industry, and policy to ensure that agentic AI evolves as a beneficial and trustworthy partner in society.
The supplementary materials underlying this article are openly available on Figshare209: Trustworthy Agentic AI Systems: A Cross-Layer Review of Architectures, Threat Models, and Governance Strategies for Real-World Deployment: Supplementary Data. This repository contains Tables, Figures, Appendix files, and Supplementary Data. All newly generated materials and supplementary datasets are available under the Creative Commons Attribution 4.0 International license (CC-BY 4.0).