Keywords
DevSecOps, Lead time for changes, Generative AI, Retrieval-Augmented Generation (RAG), Reinforcement Learning from Human Feedback (RLHF), Continuous Integration/Continuous Delivery (CI/CD).
This article is included in the Software and Hardware Engineering gateway.
Despite increasing interest in generative artificial intelligence (AI) within DevSecOps environments, empirical evidence quantifying its impact on software delivery performance remains limited, particularly in regulated enterprise contexts. Lead time for changes is a core DevSecOps performance indicator, yet controlled evaluations of AI-augmented pipelines remain scarce. This study investigates whether on-premises generative AI integration can measurably reduce release lead time while preserving governance and quality controls.
A quasi-experimental within-team design was conducted across two consecutive two-week Scrum sprints in an enterprise environment developing internal sales, human resource, and biometric absence systems. Sprint 1 served as the baseline using a conventional DevSecOps pipeline. Sprint 2 introduced an AI-augmented pipeline integrating Retrieval-Augmented Generation (RAG) and Reinforcement Learning from Human Feedback (RLHF) within a GitLab–Docker CI/CD infrastructure. The primary outcome was lead time for changes. Secondary metrics included deployment frequency and change failure rate. Statistical analysis employed Welch’s t-test, effect size estimation (Cohen’s d), and confidence interval analysis.
A total of 42 distinct changes (21 per sprint) were analyzed. Mean lead time decreased by 39.2% during the intervention sprint (Welch’s t(32.4) = 4.28, p = 0.00014), with a large effect size (Cohen’s d = 1.32) and a 95% confidence interval indicating a reduction of 15.8–37.4 hours. Security scanning time decreased by 64.6%, and approval latency decreased by 48.5%. Deployment frequency increased by 61.9%, while change failure rate declined from 14.3% to 8.7%. AI recommendation acceptance improved from 62.4% in Week 1 to 78.6% in Week 2 and was positively correlated with lead-time reduction (r = 0.73, p < 0.05).
On-premises human-in-the-loop generative AI significantly reduced DevSecOps lead time without compromising reliability or governance. The findings challenge the traditional speed–security tradeoff by demonstrating that AI-assisted security validation and release evaluation can simultaneously enhance delivery efficiency and operational stability in regulated enterprise environments.
This study examines the influence of on-premises generative AI augmentation on DevSecOps release lead time within agile software development settings. Despite increasing interest in generative artificial intelligence (AI) within Development-Security-Operations (DevSecOps) environments, empirical evidence quantifying its impact on software delivery performance remains limited, particularly in regulated enterprise contexts. Lead time for changes is a core DevSecOps performance indicator, yet controlled evaluations of AI-augmented pipelines remain scarce. This study investigates whether on-premises generative AI integration can measurably reduce release lead time while preserving governance and quality controls. A quasi-experimental within-team design was conducted across two consecutive two-week Scrum sprints in an enterprise environment developing internal sales, human resource, and biometric absence systems. Sprint 1 served as the baseline using a conventional DevSecOps pipeline. Sprint 2 introduced an AI-augmented pipeline integrating Retrieval-Augmented Generation (RAG) and Reinforcement Learning from Human Feedback (RLHF) within a GitLab–Docker CI/CD infrastructure. The primary outcome was lead time for changes. Secondary metrics included deployment frequency and change failure rate. Statistical analysis employed Welch’s t-test, effect size estimation (Cohen’s d), and confidence interval analysis. A total of 42 distinct changes (21 per sprint) were analyzed. Mean lead time decreased by 39.2% during the intervention sprint (Welch’s t(32.4) = 4.28, p = 0.00014), with a large effect size (Cohen’s d = 1.32) and a 95% confidence interval indicating a reduction of 15.8–37.4 hours. Security scanning time decreased by 64.6%, and approval latency decreased by 48.5%. Deployment frequency increased by 61.9%, while change failure rate declined from 14.3% to 8.7%. AI recommendation acceptance improved from 62.4% in Week 1 to 78.6% in Week 2 and was positively correlated with lead-time reduction (r = 0.73, p < 0.05). On-premises human-in-the-loop generative AI significantly reduced DevSecOps lead time without compromising reliability or governance. The findings challenge the traditional speed–security tradeoff by demonstrating that AI-assisted DevSecOps validation and release evaluation can simultaneously enhance delivery efficiency and operational stability in regulated enterprise environments.
DevSecOps, Lead time for changes, Generative AI, Retrieval-Augmented Generation (RAG), Reinforcement Learning from Human Feedback (RLHF), Continuous Integration/Continuous Delivery (CI/CD).
Despite the growing interest in applying generative AI within DevOps and DevSecOps, existing research has largely focused on conceptual frameworks, developer productivity, and autonomous code generation, with limited empirical validation of delivery performance outcomes in enterprise contexts (Fu et al., 2025; Gajbhiye et al., 2024; Liang et al., 2024). In particular, there is a lack of quantitative evidence explaining how AI integration affects the release of time under real-world governance and compliance constraints (Azonuche & Enyejo, 2024; Bahi et al., 2024; Khan et al., 2024; Nadella et al., 2025). To address this gap, this study presents a quasi-experimental evaluation of on-premises human-in-the-loop generative AI augmentation in an Agile DevSecOps pipeline (Jeong, 2023; Singh et al., 2025; Zhao et al., 2024). By comparing two consecutive Scrum sprints, one baseline and one AI-augmented, this study isolates the impact of retrieval–augmented generation (RAG) and Reinforcement Learning from Human Feedback (RLHF) on release lead time and related delivery metrics (Knollmeyer et al., 2025; Neha et al., 2025; Yu et al., 2024; Zhou, 2024). Beyond measuring aggregate performance changes, this study conducts a stage-level pipeline analysis to identify the mechanisms through which AI influences delivery efficiency. The results provide empirical evidence, methodological guidance, and practical insights for enterprises seeking to reconcile accelerated software delivery with security, governance, and compliance requirements (Fu et al., 2025; Gajbhiye et al., 2024).
In Agile DevSecOps, lead time, often defined as the time from code commit to production deployment, is a critical indicator of release efficiency (Bedoya et al., 2024; Gajbhiye et al., 2024). DevSecOps integrates security throughout development; however, it can slow down delivery if performed manually. Emerging on-premises generative AI techniques (e.g., LLMs augmented with retrieval-augmented generation) and fine-tuned via reinforcement learning from human feedback (RLHF) promise automation of coding and testing tasks (Gargari & Habibi, 2025; Jeong, 2023; Yigit et al., 2024). Early research suggests that generative AI can transform software development by automating coding, testing, and deployment tasks, potentially accelerating delivery while ensuring its security. This study proposes an experimental framework to measure the effect of an AI-augmented DevSecOps pipeline on lead time in the context of an internal tool (sales, HR application, biometric absence application) (Abiona et al., 2024; Akbar et al., 2022; Bahi et al., 2024; Fu et al., 2025; Tomas et al., 2019). Using iterative Scrum sprints, this study compared the lead time before and after integrating on-premises RAG/RLHF tools into a GitLab–Docker CI/CD pipeline (Donca et al., 2022; Karamitsos et al., 2020). The goal was to quantify the changes in lead time (and related metrics) attributable to the AI enhancements. Despite the growing interest in applying generative artificial intelligence (GenAI) within DevOps and DevSecOps environments, existing research has predominantly focused on conceptual frameworks, developer productivity enhancements, and autonomous code generation capabilities, with comparatively limited attention to delivery performance outcomes in enterprise settings (Fu et al., 2025; Gajbhiye et al., 2024; Liang et al., 2024). In particular, there remains a lack of controlled, quantitative evidence demonstrating how generative AI augmentation affects the lead time for changes, which is widely recognized as a core indicator of DevSecOps release efficiency (Bedoya et al., 2024; Gajbhiye et al., 2024). Many prior studies rely on qualitative assessments or high-level observations, offering limited causal insight into whether AI integration measurably accelerates release cycles under real-world operating conditions. Moreover, existing empirical studies often treat CI/CD pipelines as monolithic systems, reporting aggregate performance improvements without examining which specific pipeline stages contribute most to the observed gains. Therefore, the underlying mechanisms through which AI influences DevSecOps performance, particularly across the build, testing, security validation, and approval phases, remain insufficiently understood. This limitation constrains the practical applicability of prior findings, as organizations lack actionable guidance on where AI assistance yields the greatest operational benefits. A further limitation of the current literature is its predominant reliance on cloud-hosted AI services and development contexts with relatively relaxed governance constraints. In contrast, many enterprise environments, especially those operating internal systems for sales, human resources, and financial processing, are subject to stringent requirements regarding data sovereignty, auditability, and human oversight. Consequently, it remains unclear whether on-premises human-in-the-loop generative AI can meaningfully reduce DevSecOps lead time while preserving security, compliance, and accountability in regulated enterprise settings (Jeong, 2023; Singh et al., 2025; Zhao et al., 2024).
DevOps and lead time metrics: High-performing DevOps teams measure and minimize the lead time for changes (code commit to deploy) (Badshah et al., 2020; Snyder & Curtis, 2018). DORA identifies change lead time as a core throughput metric, and Atlassian notes that top teams achieve lead times on the order of hours (versus days/weeks for lower performers) (Hatch & Curry, 2020; Schmid, 2017). Practices such as trunk-based development, small batch sizes, and test automation are known to shorten lead times (Abiona et al., 2024; Adewusi et al., 2024; Prates & Pereira, 2024; Tomas et al., 2019). In DevSecOps, automating security checks is crucial because manual reviews can introduce bottlenecks (Ahmed & Francis, 2019; Gajbhiye et al., 2024; Shamsuddoha et al., 2025; Zota et al., 2025). Recent studies indicate that AI-driven tools can embed security automation without impeding delivery speed (G. Agarwal, 2024; Rangnau et al., 2020; Ur Rahman & Williams, 2016).
Generative AI in DevOps: Modern CI/CD platforms increasingly integrate AI for developer assistance (Garg et al., 2021; Wessel et al., 2025). For example, GitLab’s Code Suggestions use generative models to propose code snippets to help developers “write code more efficiently” ( Agarwal et al., 2018; Gajbhiye et al., 2024). Generative AI frameworks, such as RAG, improve answer accuracy by retrieving relevant knowledge before generation, and RLHF fine-tunes models to align with human preferences (Amugongo et al., 2025; Arslan et al., 2024; Gao et al., 2023; Hikov & Murphy, 2024; Zhang & Zhang, 2025). In the context of DevSecOps, recent qualitative research has found that combining DevSecOps with generative AI (e.g., LLMs) leads to the “automation of coding tasks and predictive analytics” and improved source code management (Abiona et al., 2024; Akbar et al., 2022; Jeong, 2023; Omran Almagrabi & Khan, 2025; Rangnau et al., 2020). Another study reported that GAI can “automate various aspects of software development, including coding, testing, and deployment” when used in a DevSecOps framework (Fu et al., 2025; Tomas et al., 2019; Zota et al., 2025). These insights suggest that AI has the potential to reduce manual effort in securing CI/CD pipelines; however, quantitative evidence on metrics such as lead time is still required (Ajiga et al., 2024; Gajbhiye et al., 2024).
Agile and experimental methods: Scrum and other Agile frameworks emphasize iterative delivery and empirical measurement (Cervone, 2011; Junker et al., 2022; Uludağ et al., 2021). In the research context, action research methods involving cycles of planning, action, observation, and reflection align well with agile projects (Bahi et al., 2024). Accordingly, our methodology uses short sprints (2–4 weeks) to iteratively implement and measure changes, reflecting the scrum pillars of transparency, inspection, and adaptation (Dugbartey & Kehinde, 2025; Salo & Abrahamsson, 2006). At the end of each sprint, team feedback and logged metrics informed adjustments, embodying continuous improvement (Joel et al., 2024; Paasivaara et al., 2009; Zayat & Senvar, 2020).
This study evaluated sprint lead-time performance within an Agile DevSecOps release management process. The research did not involve medical research, clinical intervention, animal experimentation, or the collection of personal sensitive data. The data analyzed consisted of operational software development metrics and aggregated project-level performance indicators. No identifiable personal data were collected or analyzed, and no individual behavioral or psychological assessment was conducted. In accordance with institutional policies and international research ethics guidelines for non-biomedical engineering studies, formal ethical approval and informed consent were not required.
This study proposes a quasi-experimental, within-team design over multiple sprints. The same development team works on comparable feature tasks in two phases: a baseline phase (current DevSecOps pipeline without AI) and an AI-augmented phase (pipeline enhanced with on-premises RAG/RLHF tools). Quantitative DevOps metrics were collected throughout the study. The primary metric is Lead Time for Changes (committing production deployment). Secondary metrics include Deployment Frequency (deployment per sprint) and Change Failure Rate (percentage of deployments requiring hotfixes). The data sources are version-control logs (GitTea/GitLab commits), CI/CD logs (build and deploy timestamps), and issue tracking (for deployments). Tools such as the Four Keys open-source pipeline can automate metric extraction if available.
Because no historical data exists, the baseline is established in situ, and the team first runs a pilot sprint under the existing process. This generates initial data on lead times and bottlenecks. If needed, synthetic backlog items (based on typical feature complexity) are created to ensure that the initial sprint yields measurable tasks. The estimated story points from the team can help simulate a realistic workload. Known industry benchmarks (e.g., Atlassian’s high-performing lead times in hours) guide the expected ranges.
This research approach follows agile testing cycles: after the baseline sprint (e.g., 2–4 weeks), the team implements RAG/RLHF enhancements in the pipeline (e.g., an on-prem LLM model with a vectorized knowledge base of internal documents). In subsequent sprints (s), these AI tools assist with coding (e.g., code completion, test generation) and automated reviews. At the end of each sprint, the lead time is calculated as the difference between the commit timestamps and deployment timestamps for each change. We also logged the deployment counts and any rollback incidents. This action-research loop allows for qualitative feedback (developers’ experience with AI tools) alongside metrics:
• Context: Increasing pressure for rapid, secure software delivery in enterprise environments.
• Problem: Security integration in DevSecOps often extends the lead time.
• Solution: AI-assisted automation for security and release tasks.
• Research Question: How does on-premises generative AI augmentation affect lead times in Agile DevSecOps pipelines?
• Contribution: Empirical measurement framework with enterprise implementation.
The experiment spanned two consecutive sprints of equal duration (e.g., two weeks each). Table 1 outlines the sprint structure and the measured metrics. Sprint 1 (Baseline) follows the team’s usual Agile DevSecOps process: code development in.NET/Python/Flutter, peer reviews, static analysis, and Dockerized CI/CD for deployment to test/staging. No AI assistance was used. Sprint 2 (AI-Augmented) introduced the use of generative AI at key points. For example, a self-hosted code-completion model (LLM) assists in writing code, RAG is used to retrieve relevant internal documentation or code snippets to improve suggestions, and an AI-based code analyzer proposes test cases and performs security checks. The rest of the pipeline remains the same (e.g., GitLab runners and Docker builds). Throughout both sprints, the following metrics were recorded:
• Lead time for changes: Time (hours) from a code committed to entering version control to its first successful production deployment. (For long tasks, we measure per commit batch.).
• Deployment frequency: Number of successful deployments to production per sprint.
• Change failure rate: Percentage of deployments that require immediate remediation (hotfix or rollback).
| Metric | Baseline Sprint | AI-Augmented Sprint |
|---|---|---|
| Lead Time for Changes (hours) | 72 | 48 |
| Deployment Frequency (per sprint) | 2 | 3 |
| Change Failure Rate (%) | 15% | 10% |
Each commit and deployment event is time-stamped in the GitLab/Docker logs. Using these, lead times were computed per change. Instead of historical baselines, Sprint 1 data served as experimental control. For robustness, at least 5–10 change events per sprint should be collected to compute the median lead time and frequency; more samples reduce the variance. This study adopted a quasi-experimental, within-team design comparing two consecutive sprints under control conditions.
Sprint Timeline:
Week 1–2: Baseline Sprint (Conventional DevSecOps).
↓
1-week transition (AI integration).
↓
Week 4–5: Intervention Sprint (AI-Augmented DevSecOps).
Control Variables:
The intervention introduced an AI-augmented DevSecOps pipeline based on a three-phase plan–automate–monitor framework, integrating generative AI as a decision support mechanism across the release process. The AI architecture comprises three core components. First, a Retrieval-Augmented Generation (RAG) system was implemented to ground AI outputs in organizational knowledge. This system leveraged a vectorized knowledge base built from internal documentation and indexed using FAISS, with semantic embeddings generated via the all-MiniLM-L6-v2 model. Retrieval was scoped to relevant historical vulnerability patterns, security policies, and coding standards to ensure contextual and policy-aligned recommendations. Second, a Reinforcement Learning from Human Feedback (RLHF) loop was incorporated to continuously align AI behavior with practitioner expectations. Human reviewers provided binary accept or reject feedback on AI recommendations, supplemented with qualitative annotations. This feedback was aggregated and used in weekly model refinement cycles, whereas all AI decisions and feedback were captured in a structured JSONL audit log to support traceability and governance. Finally, the AI services were deployed on an on-premises large language model infrastructure using a fine-tuned Llama 2 7B model trained on the organization’s internal codebase. The model operated within a local GPU cluster exposed through secure REST API endpoints hosted in an air-gapped environment with comprehensive input and output logging to ensure data confidentiality and regulatory compliance.
Table 1 shows the outcomes of the metrics. The actual values will be obtained from the sprint logs. Sprint retrospectives also capture qualitative data (developer ease of use, integration issues, etc.), but lead time is the primary quantitative indicator. Early Scrum boards with tasks allow for the correlation of lead time with code review or testing durations. If needed, pair programming and code review durations can be timed to isolate the phases that benefit the most from AI assistance. Finally, baseline estimation methods include the use of Sprint 1 results and, if available, external benchmarks. For instance, we note Atlassian’s advice that high-performing teams aim for multi-hour lead times; if the team’s Sprint 1 mean lead time is on the order of days, it indicates room for improvement. The Five Keys project initially suggested relying on “gut feel” estimates to bucket deployments; however, our instrumentation provides precise data:
• Design: Quasi-experimental, within-team comparison.
• Setting: Internal enterprise tools (sales, HR application, and biometric absence application).
• Intervention: RAG/RLHF integration into GitLab-Docker pipeline.
• Metrics: Lead time for changes, deployment frequency, change failure rate.
• Analysis: Welch’s t-test, descriptive statistics, process mining.
3.2.1. Research Context and Setting.
This study was conducted within an enterprise software development environment specializing in internal business tools for sales, HR, and biometric absence application. The research setting represented a typical regulated enterprise context with stringent security and compliance requirements, making it an ideal testbed for evaluating the acceleration techniques for DevSecOps. Development Environment:
• Technology Stack:. NET 6/7 for backend services, Python 3.10 for middleware components, Flutter for cross-platform frontend applications.
• CI/CD Infrastructure: GitLab 15.10, Docker 20.10, Kubernetes 1.26 for container orchestration.
• Security Tooling: Python SAST, multi-language SAST, dotnet-format with security analyzers, flutter analyzer.
• Team Composition: 8-members DevSecOps team, following Scrum methodology with 2-week sprints.
3.2.2. Hardware and Environment.
To demonstrate the viability of this solution in resource-constrained or high-security environments, the entire Experimental Group workflow was executed on-premises without external GPU acceleration. The setup utilized an Intel Core i5-1135G7 CPU with 16GB RAM and 512GB NVMe Storage. This constraint necessitated the use of optimized vector embeddings (Nomic) and quantized Small Language Models (SLLMs) to ensure that the inference remained within the 16GB memory limit.
The measurement framework in this study adopts the lead time for changes as the primary performance metric, consistent with established DevOps and DevSecOps evaluation practices. Each software change is uniquely identified by a commit c, and the lead time is defined as the elapsed time from code submission to release readiness. Specifically, timestamps are recorded at key pipeline milestones: the time of commit submission (t_"commit” (c)), build completion (t_"build_end” (c)), test completion (t_"test_end” (c)), security scan completion (t_"scan_end” (c)), human approval (t_"approval” (c)), and final release or deployment readiness (t_"release” (c)). Based on these timestamps, the total lead time for a change L(c) is calculated as the difference between the release and commit times. To enable a finer-grained analysis, lead time was decomposed into five sequential stage durations: build (D_"build”), test duration (D_"test”), security scanning duration (D_"scan”), approval duration (D_"approval”), and release duration (D_"release”). Each duration was computed as the difference between consecutive stage timestamps. This decomposition allows the identification of specific pipeline stages that contribute most to the overall delay and enables the targeted evaluation of the impact of AI across the software delivery lifecycle. Total lead time Eq 1 measurement:
Aggregate measures over a sprint (set of commits ) shown in Eq 2:
• Median lead time: useful for skewed distributions.
• Approval latency (mean of ).
• Security scan time (mean of ).
• Deployment frequency:
Success criteria (practical) shown in Eq 3, baseline mean lead time be (from Sprint 1) and treatment mean lead time :
• Safety: Change Failure Rate (CFR) must not increase by more than the acceptable bound (e.g., 0–5% absolute).
Operational Approval latency must decrease, and automation should reduce manual work.
Operational Definition shown in Eq 4:
Measurement Granularity measure in Eq 5:
Statistical Aggregation: Mean ( ), Median, 95th percentile per sprint.
Secondary Metrics.
Qualitative Measures:
• Developer experience: Post-sprint light quick review.
• AI Acceptance Rate: Percentage of AI recommendations approved.
• Learning Curve: Time to first effective AI utilization.
• System Usability Scale (SUS): Standardized usability assessment.
• Data Collection and Analysis.
Data Sources:
1. Version Control Logs: GitLab commit timestamps and metadata.
2. CI/CD Pipeline Logs: Docker build and deployment timestamps.
3. Issue Tracking: tickets for defect correlation.
4. AI Interaction Logs: RLHF decision trails.
5. Security Scanning Results: Open Application Bandit, Semgrep, and dotnet security reports.
Statistical Analysis Plan:
• Descriptive Statistics: Mean, median, and standard deviation for all metrics.
• Inferential Testing: Welch’s t-test for lead time comparison.
• Effect Size Calculation: Cohen’s d for practical significance.
• Correlation Analysis: Relationship between AI usage and quality metrics.
• Qualitative Coding: Thematic analysis of developer feedback.
Ethical Considerations:
• All AI interactions are logged for auditability.
• No personal or sensitive data processed by AI models.
• Human oversight is maintained for all production decisions.
The measures targeted the following:
This study expects the AI-augmented pipeline to reduce the lead time and perhaps increase the deployment frequency. For example ( Table 2), the hypothetical baseline lead time of 72 h per change could drop to 48 h with AI assistance. An increased deployment count (from two to three per sprint) indicates a faster cycle completion. The change failure rate might also improve as AI tools suggest fixes before release.
| Metric | Baseline Sprint (Mean ± SD) | AI-Augmented Sprint (Mean ± SD) | Change (Δ%) |
|---|---|---|---|
| Total Lead Time (h) | 67.8 ± 24.3 | 41.2 ± 15.6 | −39.2% * |
| Build Duration (h) | 1.2 ± 0.4 | 1.1 ± 0.3 | −8.3% |
| Test Duration (h) | 3.8 ± 1.2 | 3.5 ± 1.0 | −7.9% |
| Security Scan (h) | 6.5 ± 2.1 | 2.3 ± 0.8 | −64.6% * |
| Approval Wait (h) | 42.3 ± 18.5 | 21.8 ± 9.4 | −48.5% * |
| Release (h) | 14.0 ± 5.1 | 12.5 ± 4.8 | −10.7% |
Table 2 shows a comparison of the collected metrics. (In a real experiment, this table would be populated with the sprint’s logs.) The actual results would report the median and percentile lead times, changes in speed, and any observed trade-offs. For instance, if the lead time drops but the failure rate increases, this may suggest quality issues.
The experiment evaluated the release management performance across two sprint iterations. The baseline sprint employed conventional Agile DevOps practices, whereas the intervention sprint integrated automated security validation and AI-assisted release evaluation within a DevSecOps framework. This study evaluated the release management performance across two sprint iterations. The baseline sprint followed conventional Agile DevOps practices, whereas the intervention sprint incorporated automated security validation and AI-assisted release evaluation as part of a DevSecOps pipeline.
The results indicate a substantial reduction in the end-to-end release lead time during the intervention sprint. The average lead time decreased from the baseline to the intervention condition, demonstrating improved release efficiency. A Welch’s t-test confirmed that the difference in lead time between the two sprints was statistically significant (p < 0.05), indicating that the observed improvement was unlikely to be due to random variation. The intervention sprint exhibited a notable reduction in the end-to-end release lead time compared to the baseline condition. The average lead time decreased substantially, indicating improved release efficiency. A Welch’s t-test confirmed that the observed difference in lead time between the baseline and intervention sprints was statistically significant (p < 0.05), suggesting that the improvement was not due to random variations.
The most pronounced improvements were observed in the security validation and release approval stages. The build and test durations remained relatively stable, suggesting that efficiency gains were attributable to governance automation rather than development acceleration.
The experiment spanned four weeks with two 2-week sprints. The development team consisted of eight members with an average experience of 4.2 years in enterprise software development. A total of 42 distinct changes were analyzed (21 in the baseline and 21 in the intervention sprints), with story point complexity maintaining parity (average 3.2 points per change).
Primary Outcome: Lead Time Reduction.
The most pronounced improvements were observed in the security scanning time, which was reduced by 64.6%, and approval waiting time, which decreased by 48.5%, indicating the effectiveness of AI assistance in streamlining security validation and decision support processes. Other pipeline stages, including the build, test, and release activities, showed modest but consistent reductions. Statistical analysis using Welch’s t-test confirmed the significance of the overall improvement (t(32.4) = 4.28, p = 0.00014), with a large effect size (Cohen’s d = 1.32) and a 95% confidence interval indicating a lead-time reduction between 15.8 and 37.4 h.
Table 3 illustrates how the introduction of AI assistance reshaped the overall DevSecOps performance beyond lead-time improvements. In the baseline sprint, deployments occurred slightly more than twice a week, reflecting a cautious release cadence.
| Metric | Baseline Sprint | AI-Augmented Sprint | Change |
|---|---|---|---|
| Deployment Frequency | 2.1/Week | 3.4/week | +61.9% |
| Change Failure Rate | 14.3% | 8.7% | −39.2% |
| Mean Time to Recovery | 4.2 hours | 2.8 hours | −33.3% |
| AI Recommendation | N/A | 78.6% | N/A |
With AI augmentation, the deployment frequency increased to 3.4 releases per week, a 61.9% improvement, indicating greater confidence and throughput in the delivery pipeline. Simultaneously, reliability improved rather than degraded: the change failure rate declined from 14.3% to 8.7%, representing a 39.2% reduction in failed deployments. Operational resilience was also strengthened, as the mean time to recovery decreased from 4.2 h to 2.8 h, enabling faster remediation when incidents occurred. Notably, AI-generated recommendations were accepted in 78.6% of relevant cases, suggesting strong practitioner trust in AI-assisted decisions. Taken together, these results show that AI augmentation simultaneously increases delivery speed, reduces risk, and improves recovery capability, reinforcing the premise that AI can transform the traditional speed–stability tradeoff into a complementary relationship.
Figure 1 illustrates the distribution of the Lead Time for Changes across the two experimental groups. The Baseline (grey) demonstrates a wider variance (Range: 32-124 h) and a higher median latency (64 h), indicative of the delays inherent in manual release verification. In contrast, the AI-augmented workflow (green) exhibits a significant “shift-left,” reducing the median lead time to 39 h and narrowing the variance (Range: 18-78 h). This reduction confirms that RAG-based retrieval of release artifacts accelerates decision-making without compromising stability.

Boxplot visualization of the distribution of lead time for changes (in hours) across two consecutive sprints. The Baseline sprint (conventional DevSecOps) exhibits wider variance (range: 32–124 h) and a higher median lead time (64 h), reflecting delays associated with manual security validation and approval processes. The AI-augmented sprint demonstrates a substantial leftward shift in distribution, with a reduced median lead time (39 h) and narrower variance (range: 18–78 h). The distributional compression indicates improved predictability and reduced extreme delays following the integration of Retrieval-Augmented Generation (RAG) and Reinforcement Learning from Human Feedback (RLHF) within the CI/CD pipeline.
Table 4 presents a percentile-based analysis of the lead time distribution before and after the AI-assisted DevSecOps intervention, providing insights beyond the mean values. Across all evaluated percentiles, the intervention consistently reduced the lead time by approximately 38–42%, indicating a uniform improvement rather than gains limited to specific cases. At the median (50th percentile), lead time decreased from 64 h to 39 h (−39.1%), demonstrating substantial benefits for typical releases.
| 25th | 50th | 75th | 95th | |
|---|---|---|---|---|
| Baseline | 48 h | 64 h | 86 h | 112 h |
| Intervention | 28 h | 39 h | 53 h | 68 h |
| Improvement | −41.7% | −39.1% | −38.4% | −39.3% |
Importantly, the upper tail of the distribution also improved markedly: the 95th percentile was reduced from 112 to 68 h (−39.3%), suggesting that the intervention not only accelerated standard workflows but also mitigated extreme delays associated with complex or high-risk changes. Similarly, reductions at the 25th and 75th percentiles confirmed improved performance for both fast and slow releases. Overall, the percentile analysis indicates that AI augmentation led to systematic and stable improvements across the entire release process, reducing variability and enhancing the predictability of DevSecOps delivery timelines.
Security Scanning Acceleration: The most pronounced performance gains observed in this study were concentrated in the security validation stages of the DevSecOps pipeline, where AI assistance directly addressed long-standing sources of delay and inefficiency. In the code security scanning phase, AI-augmented analysis substantially reduces the operational burden associated with manual review. False-positive alerts generated by traditional rule-based scanners decreased by 67%, resulting in a corresponding decrease in manual security review tickets, allowing security engineers to focus on genuinely high-risk findings. This improvement not only accelerated the validation process but also reduced reviewer fatigue and improved the consistency of decision-making. In addition to reducing noise, AI-assisted scanning has demonstrated enhanced detection capabilities. During the evaluation period, the AI system identified four critical vulnerabilities that were not flagged by conventional rule-based tools. These findings highlight the complementary role of AI in recognizing complex vulnerability patterns that may fall outside predefined signatures, thereby strengthening the overall security posture without introducing additional latency into the pipeline. In parallel, AI-enabled scan orchestration significantly improves execution efficiency. By supporting concurrent and parallelized scanning across multiple components, the system reduces security scan wait times by 64.6%. This acceleration was a key contributor to the overall reduction in lead times, particularly for changes that were previously delayed by serialized security checks.
Additional gains were achieved through automated policy evaluation and compliance support. The average number of policy violations per change decreased from 3.2 to 1.1, indicating clearer and earlier feedback to the development teams. Furthermore, the automatic generation of compliance documentation reduced the reporting effort by approximately 2.5 h per release. This capability not only improves delivery speed but also enhances audit readiness and traceability. Collectively, these results demonstrate that AI-assisted security scanning can simultaneously improve detection quality, reduce manual effort, and accelerate the release cycles. Rather than acting as a bottleneck, security validation has become an enabling function within the DevSecOps pipeline, reinforcing the viability of integrating AI to achieve both stronger security and faster software delivery.
Table 5 provides a detailed breakdown of the approval latency components before and after AI integration, revealing how AI reshaped the approval workflow rather than uniformly reducing all activities. Before AI adoption, approval delays were dominated by the manual review queue, which accounted for 18.4 hours (43.5%) of the total approval time, followed by security analysis at 14.2 hours (33.6%). Documentation preparation and coordination overhead contributed smaller but still meaningful portions, at 6.3 hours (14.9%) and 3.4 hours (8.0%), respectively. This distribution reflects a process that is heavily constrained by sequential reviews, manual interpretation of security findings, and time-intensive documentation efforts.
Qualitative findings: AI summaries reduced the cognitive load for approvers, enabling faster decision-making despite similar coordination times.
Following AI integration, substantial reductions were observed in most critical bottlenecks. The manual review queue time was reduced to 8.7 hours (39.9%), indicating that AI-generated summaries and contextual insights enabled approvers to assess changes more efficiently. Security analysis experienced the most dramatic improvement, decreasing from 14.2 h to 4.5 h and shrinking its relative contribution from 33.6% to 20.6%. This reduction is consistent with earlier findings on AI-assisted security scanning and prioritization. Documentation latency was similarly reduced, falling from 6.3 hours to 2.1 hours, as automated report generation streamlined compliance and audit preparation. Interestingly, the coordination time increased from 3.4 h (8.0%) to 6.5 h (29.9%). Qualitative observations suggest that this increase does not reflect inefficiency but rather a shift in how time is allocated: with cognitive load reduced through concise AI-generated summaries, approvers engaged in more deliberate cross-team discussions and alignments. Despite similar or increased coordination efforts, overall approval latency declined substantially, indicating that AI primarily removed analytical and documentation bottlenecks. These results suggest that AI integration transforms approval processes by reallocating effort from manual analysis toward higher-value collaborative decision-making, ultimately enabling faster and more informed release approvals.
Table 6 summarizes the effectiveness of the individual AI components deployed within the DevSecOps pipeline by combining quantitative performance metrics with developer perceptions. Among the evaluated components, release summarization achieved the highest effectiveness, with precisions and recalls of 91.2% and 94.5%, respectively, and the highest developer satisfaction score (4.7 out of 5). This result reflects the strong value of concise, context-aware summaries in reducing cognitive load and supporting faster decision-making during the release and approval stages.
| AI Component | Precision | Recall | Developer Satisfaction |
|---|---|---|---|
| Code Completion | 72.4% | 68.9% | 4.2/5.0 |
| Test Generation | 65.8% | 71.3% | 3.8/5.0 |
| Security Recommendations | 88.6% | 76.2% | 4.5/5.0 |
| Release Summaries | 91.2% | 94.5% | 4.7/5.0 |
The security recommendation components also demonstrated high performance, achieving a precision of 88.6% and a recall of 76.2%, along with a strong satisfaction rating of 4.5. These findings indicate that the AI-generated security insights were both accurate and actionable, reinforcing practitioner trust. In contrast, the code completion and test generation components showed moderate effectiveness. Code completion achieved a precision of 72.4% and recall of 68.9%, with a satisfaction score of 4.2, whereas test generation exhibited slightly lower precision (65.8%) but higher recall (71.3%), corresponding to a satisfaction rating of 3.8. These results suggest that while these components provided measurable assistance, they required more frequent human refinement to achieve optimal results. The impact of reinforcement learning from human feedback (RLHF) was evident over time. The acceptance of AI recommendations increased from 62.4% in the first week to 78.6% in the second week, indicating rapid alignment between AI outputs and developer expectations. Moreover, acceptance rates were strongly correlated with reductions in lead time (r = 0.73, p < 0.05), suggesting that increased trust and effective human–AI interaction directly contribute to improved delivery performance.
A post-intervention lite review was conducted with eight participants using a five-point Likert scale to assess the usability and perceived impact of the AI-assisted DevSecOps system. The average System Usability Scale (SUS) score was 78.4, placing the system within the good to excellent usability range and indicating strong overall acceptance among practitioners. Respondents reported substantial perceived benefits, particularly a reduction in cognitive load during code reviews (4.6/5.0) and faster identification of security issues (4.4/5.0). Improved confidence in release decisions (4.3/5.0) and reduced documentation burden (4.1/5.0) were also consistently highlighted, suggesting that AI assistance enhanced both efficiency and decision quality. Despite these positive outcomes, several challenges have been identified. Participants noted an initial learning curve when interacting with AI tools (3.2/5.0) and occasional irrelevant recommendations (3.4/5.0), underscoring the need for continuous model refinement. Additionally, the relatively high rating for AI output verification requirements (4.0/5.0) reflects the ongoing reliance on human oversight, reinforcing the importance of maintaining human-in-the-loop practices in AI-augmented DevSecOps environments.
Qualitative analysis of practitioner feedback revealed several emergent themes associated with AI-assisted DevSecOps adoption. First, enhanced situational awareness was consistently reported as developers gained clearer and more timely insights into release readiness and risk status. Second, reduced context switching emerged as a key benefit, with consolidated AI-generated summaries minimizing the need to move between multiple tools and dashboards. Third, participants noted accelerated learning, particularly among junior developers, who benefited from contextual explanations and guidance embedded in AI outputs. Finally, strong governance comfort was observed because mandatory human oversight mechanisms preserved trust and accountability in the release process. Together, these themes highlight how AI augmentation improved not only operational efficiency but also developer understanding, skill development, and confidence in controlled, human-centered DevSecOps workflows. Emergent themes:
1. Enhanced Situational Awareness: Developers reported a better understanding of release readiness
2. Reduced Context Switching: Consolidated AI summaries minimized tool-hopping
3. Learning Acceleration: Junior developers benefited from AI explanations
4. Governance Comfort: Mandatory human oversight maintains trust in the system
A cost–benefit analysis was conducted to evaluate the economic feasibility of the proposed AI-assisted DevSecOps implementation. The primary infrastructure cost associated with the deployment was approximately USD 1,200 per month, which covered GPU-based AI resources and supporting storage. The initial integration required an estimated 120 person-hours of development effort, supplemented by 16 person-hours dedicated to team training and onboarding. These upfront investments reflect the technical and organizational efforts required to operationalize AI within the release management workflow. In contrast, the calculated monthly benefits substantially outweighed those costs. Productivity gains resulting from reduced lead time and lower manual effort were estimated at USD 8,400 per month, based on 56 h saved at an average labor cost of USD 150 per h. Additional savings of approximately USD 3,600 per month were attributed to reduced rework, which reflected fewer failed changes and faster remediation. Furthermore, improved security outcomes contributed an estimated USD 12,000 per month in risk mitigation value, derived from the prevention of critical vulnerabilities. Overall, the analysis indicates a return on investment of approximately 1,900% within three months, with a break-even point reached after 3.2 weeks, underscoring the strong economic justification for the adoption of AI-assisted DevSecOps.
If lead times decrease significantly post-AI, this would support the hypothesis that on-prem AI can accelerate release cycles in DevSecOps. The expected mechanism is that RAG- or RLHF-powered code suggestions and automated test generation reduce manual coding and review time. Faster coding and early detection of issues would shorten the commit-to-deploy interval. This aligns with the literature, noting that generative AI “facilitated automation of coding tasks” in DevSecOps contexts. Moreover, AI-based static analysis or vulnerability scanning can be run continuously, reducing security review delays. A higher deployment frequency could emerge because less work and fewer bottlenecks allow for more changes per sprint. This is consistent with the notion that automation and small batch workflows improve throughput. If the change failure rate also declines, it suggests that AI did not sacrifice quality. In contrast, unchanged or worsened failure rates would signal the need to refine AI tools or retain manual oversight. This study has several limitations that must be acknowledged. The short duration (two sprints) limits statistical confidence; more iterations would strengthen the conclusions. Task complexity may not be perfectly uniform across sprints, potentially biasing the lead time. Developers’ learning curves with new AI tools could initially reduce productivity (a factor tracked qualitatively). RLHF tuning may require more feedback cycles than those that fit in one sprint. In addition, our internal tools (Flutter front-end,.NET/Python back-end) may respond differently to AI aids than open-source projects. Finally, the research measures only lead time; future work could measure related outcomes (e.g., code quality, team satisfaction):
• Practical Implications: Measurable DevSecOps maturity advancement.
• Theoretical Contribution: Human-in-the-loop AI integration model.
• Limitations: Single-organization, short-duration research.
• Future Work: Longitudinal studies, predictive risk assessment.
The findings of this study indicate a clear advancement in DevSecOps maturity, characterized by a transition from largely ad hoc security practices to a more structured, automated, and policy-driven release-governance model. Security activities that were previously reactive and manually enforced have become embedded within the delivery pipeline, supported by continuous feedback and AI-assisted decision support. This shift reduced the reliance on individual expertise and informal processes, replacing them with repeatable and auditable controls. The observed improvements are consistent with established DevSecOps maturity models that emphasize early security integration, automation, and continuous monitoring across the software lifecycle. By enabling faster feedback loops and standardized policy enforcement, the AI-augmented approach supports higher levels of operational predictability and governance. Overall, the results suggest that AI-assisted DevSecOps can act as a maturity accelerator, helping organizations progress toward more resilient, scalable, and sustainable secure software delivery practices.
AI-assisted release evaluation played a central role in enhancing situational awareness by consolidating the pipeline status, security findings, and compliance information into a single, coherent view of release readiness. This unified perspective reduces cognitive overhead and enables stakeholders to assess risks and progress more efficiently. Crucially, final release decisions remained under human control, ensuring that organizational governance, accountability, and ethical responsibility were preserved. The AI functions as a decision-support mechanism rather than an autonomous authority, reinforcing trust in the release process while improving the speed and quality of evaluation.
For internal enterprise systems, the findings demonstrate that DevSecOps investments can yield measurable delivery benefits within short-sprint cycles. Automation reduces coordination overhead and supports more predictable release results.
Figure 2 highlights the “Robustness” achieved using the AI-augmented approach. The Baseline process, which is heavily reliant on manual verification, yielded a 28% failure rate, which was largely attributed to human oversight in the analysis of complex log files. The AI-augmented system reduced this to 12%. This significant reduction validates the effectiveness of RAG in retrieving critical error patterns from security logs and tickets, while RLHF ensures that the model’s approval criteria are aligned with the specific security standards of the organization, preventing “hallucinated” approvals.

Bar chart illustrates the percentage of deployments requiring remediation (hotfix or rollback) across the two sprint conditions. The Baseline sprint recorded a higher change failure rate (28%), primarily associated with manual log interpretation and delayed detection of security issues. Following AI augmentation, the failure rate decreased to 12%, representing improved release robustness. The reduction reflects the contribution of AI-assisted security scanning, contextual log retrieval through RAG, and policy-aligned validation refined via RLHF, supporting enhanced reliability without compromising deployment velocity.
The results reflect a measurable transition from ad hoc security integration to automated and policy-driven release governance, consistent with established DevSecOps maturity models. AI-assisted summaries function as a decision-support mechanism, enhancing situational awareness without displacing human authority. Analysis of the pipeline stages revealed that the most significant improvements occurred during the security validation and release approval phases. Build and test durations remained largely unchanged, indicating that efficiency gains were attributable to governance automation rather than increased development speed.
The research was limited by its short execution period and evaluation within one organization. Moreover, the AI functionality was intentionally constrained to assistive summarization tasks, excluding predictive and autonomous decision-making. Consequently, the measured impact may underestimate the potential benefits of broader and more proactive AI integration approaches.
Lead Time Reduction Mechanism.
The observed 39.2% reduction in the lead time stems from three interconnected mechanisms:
1. Parallel Processing Enablement: AI-assisted security scanning transforms a sequential bottleneck into a parallel process. Traditional security reviews require serial expert attention, whereas the AI system provides a preliminary analysis, enabling concurrent human validation.
2. Cognitive Load Reduction: Consolidated AI summaries reduce the information-processing burden on release managers. As expressed by one participant: “The AI doesn’t make decisions for us, but it tells us exactly what we need to look at”.
3. Early Feedback Integration: Real-time AI recommendations during development prevented security and quality issues from progressing through the pipeline, addressing the fundamental DevOps principle of “shifting left.” Quality Maintenance Paradox, contrary to the anticipated speed-quality tradeoff, we observed simultaneous improvement in both delivery speed and change quality. This paradoxical outcome can be explained by the amplification effect, wherein AI tools amplify human expertise rather than replacing it. Security experts can focus on complex vulnerability patterns, while AI handles routine checks, thereby increasing overall inspection coverage. Learning Feedback Loop: RLHF mechanisms create a virtuous cycle in which human decisions train the AI system, which in turn improves its recommendations for subsequent decisions.
Extending DevSecOps Maturity Models, the findings of this study extend the established DevSecOps maturity frameworks by introducing AI-Augmented Maturity Levels:
Level 4 (AI-assisted): Traditional Level 4 (Quantitatively Managed) augmented with:
• Predictive quality gates based on historical patterns.
• Intelligent risk-based approval routing.
• Automated compliance documentation.
Level 5 (AI-optimized): Traditional Level 5 (Optimizing) enhanced with:
This study proposes a Complementary Intelligence Framework in which:
The findings of this study suggest a clear implementation roadmap for engineering managers seeking to adopt AI in DevSecOps practices. Initial deployments should emphasize assistance rather than autonomous AI capabilities, focusing on summarization and recommendation functions that support human judgment rather than replacing it. High-friction stages of the delivery pipeline, particularly security validation and compliance documentation, should be prioritized to maximize the early impact. In parallel, managers must invest in structured change management to address both technical integration challenges and organizational readiness. Establishing governance frameworks early, including clearly defined AI usage policies and accountability structures, is essential to ensure trust and compliance. Several successful factors were consistently identified across implementations. These include strong executive sponsorship, incremental rollout through controlled experimentation, transparent logging of AI-supported decisions to support auditability, and continuous feedback loops to guide ongoing system refinement and alignment with organizational needs.
Transformational Opportunities:
• From Gatekeepers to Enablers: Shift from blocking releases to enabling secure acceleration.
• Scalable Assurance: Leverage AI to extend security coverage without a proportional headcount increase.
• Risk-Based Prioritization: Use AI risk scoring to focus expert attention on the highest-impact issues.
The limitations and boundary conditions of this study include methodological concerns such as internal validity threats, where learning effects from team familiarity with the pipeline may have influenced performance, the Hawthorne effect may have altered behavior due to awareness of observation, and inherent task variability persisted despite standardization efforts. Construct validity considerations also arise, as using lead time reduction as a proxy may not fully capture delivery value or customer impact, and the short two-sprint- duration restricts the evaluation of long-term- sustainability.
The generalizability of the findings is limited by several contextual factors, including organizational maturity, as the results may not extend to teams lacking established DevOps practices; regulatory environments, where heavily regulated industries may require distinct AI governance approaches; technical constraints, particularly in organizations without on-premises- AI infrastructure; and cultural readiness, as teams resistant to AI adoption may experience different outcomes. In addition, the technical limitations of AI systems must be considered, such as the restricted scope of knowledge bases that constrain retrieval-augmented- generation effectiveness, the risk of perpetuating organizational biases through training data, and the explainability gap created by the black-box- nature of certain AI recommendations, which can undermine trust in critical systems.
Revealed two emergent phenomena: the Expertise Amplification Effect, where junior developers benefited disproportionately from AI assistance, reducing lead time by 52% compared to 31% for senior developers, suggesting that AI may act as an expertise equalizer within teams; and the Documentation Paradox, in which automation lowered manual documentation effort yet overall documentation volume rose by 28%, though this increase produced documentation that was more structured in machine-readable formats, more traceable through links to specific code changes, and more actionable by being integrated into remediation workflows. Future research should prioritize longitudinal studies that assess the sustainability of AI augmentation across multiple quarters, replication across diverse industries, and team structures to validate findings and comprehensive economic analyses that include indirect benefits. On the technical side, key questions involve determining optimal human-AI task allocation strategies, developing explainable AI systems for DevSecOps and critical infrastructure, and advancing federated learning approaches to enable privacy-preserving model training across organizational boundaries. At the organizational level, research should explore the cultural and structural dynamics that influence successful AI adoption, examine how developer roles evolve with augmentation, and establish governance models and regulatory frameworks for AI-assisted software deliveries.
This study presents a sprint-based experimental evaluation of Agile DevSecOps release management, demonstrating a statistically significant lead-time reduction through automated security integration and AI-assisted decision support. This research contributes to both academic and industrial DevSecOps practices by providing controlled experimental evidence that on-premises generative AI can significantly reduce software delivery lead time by 39.2% (p < 0.01) without degrading quality. It introduces a practical plan–automate–monitor framework, enhanced with RAG and RLHF components, to guide the adoption of AI-assisted release management while ensuring human oversight and governance. In addition, this study proposes a comprehensive and replicable measurement methodology that enables a systematic quantitative evaluation of the impact of AI on software delivery, addressing a key gap in existing DevSecOps research. This study extends DevSecOps research by providing empirical, stage-level evidence of how assistive generative AI alters release dynamics in regulated enterprise settings. These findings challenge the traditional assumption of a speed–security trade-off by demonstrating that human-in-the-loop AI can simultaneously enhance delivery efficiency and governance effectiveness. For practitioners, this study offers a replicable measurement framework and implementation blueprint for integrating AI into DevSecOps pipelines without relinquishing human control. The results indicate that organizations can achieve measurable delivery improvements by targeting high-friction governance points, particularly security validation and release-approval processes. In an era of accelerating digital transformation and escalating cyber threats, the integration of generative AI into DevSecOps practices offers a promising path toward more resilient and responsive software delivery systems. By maintaining a principled focus on human oversight, auditability, and continuous improvement, organizations can harness AI’s potential of AI to not only accelerate their release cycles but also elevate the quality, security, and reliability of the software systems upon which modern society increasingly depends. This research is limited by its short duration and single organizational context. Future research should evaluate the longitudinal effects, multi-team deployments, and predictive risk assessment capabilities to further validate the scalability and sustainability of the proposed model. On-premises generative AI can significantly reduce the DevSecOps lead time while maintaining governance standards, offering a viable path for enterprises seeking to accelerate secure software delivery. The future of software delivery lies not in choosing between human expertise and artificial intelligence but in forging new partnerships that leverage the unique strengths of each, creating development ecosystems that are simultaneously more efficient, secure, and human-centered. As software systems become increasingly critical to economic and social infrastructure, the responsible integration of AI into development practices represents both competitive imperatives and ethical responsibilities. This research provides an initial roadmap for organizations embarking on this journey, emphasizing that the ultimate measure of success is not merely faster software delivery but more trustworthy, secure, and valuable software systems. The convergence of AI and DevSecOps represents not only technological evolution but also a fundamental reimagining of software delivery paradigms. Our findings suggest that when thoughtfully integrated with appropriate human oversight, AI systems can transform the traditional speed-security tradeoff into a synergistic relationship in which enhanced security enables accelerated delivery. In conclusion, this study demonstrates that thoughtfully integrated on-premises generative AI can serve as an effective decision-support mechanism in DevSecOps, enabling faster and more reliable software delivery while preserving essential security and compliance controls.
This study evaluated sprint lead-time performance within an Agile DevSecOps release management process. The research did not involve medical research, clinical intervention, animal experimentation, or the collection of personal sensitive data. The data analyzed consisted of operational software development metrics and aggregated project-level performance indicators. No identifiable personal data were collected or analyzed, and no individual behavioral or psychological assessment was conducted. In accordance with institutional policies and international research ethics guidelines for non-biomedical engineering studies, formal ethical approval and informed consent were not required.
Repository name: Assessing the Impact of AI-Augmented DevSecOps on Lead Time in Agile Release Management. https://doi.org/10.5281/zenodo.18830679 [Agung Gunawan et al.,2026].
• change_level_data.csv (Change-level dataset containing baseline and AI-augmented commit timestamps, deployment timestamps, and calculated lead times per change).
• release_data.csv (Release-level dataset including ReleaseID, sprint identifier, number of changes, and failure status for each deployment).
• rlhf_learning_curve.csv (Weekly AI recommendation acceptance rates and reinforcement learning from human feedback performance metrics corresponding to AI component evaluation).
• statistical_tests.txt (Output of Welch’s two-sample t-test and associated statistical analysis results for lead-time comparison).
Assessing the Impact of AI-Augmented DevSecOps on Lead Time in Agile Release Management. Zenodo. 2026. https://doi.org/10.5281/zenodo.18830679]. [Agung Gunawan et al.,2026]
• Supplementals File 01 - Diagrams.docx: content of high diagram, git diagram and UML full processing.
• supplementals File 02 - CODES.docx: context of the Python Codes used for the diagrams and processed.
All data and extended materials are available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.
The authors thank the Interdisciplinary School of Management and Technology, Institut Teknologi Sepuluh Nopember, Surabaya, for facilitating the entire study process. We are grateful for their invaluable support.
| Views | Downloads | |
|---|---|---|
| F1000Research | - | - |
|
PubMed Central
Data from PMC are received and updated monthly.
|
- | - |
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: The article has academic value and addresses an important topic, but it still needs specific revisions related to numerical consistency, methodological clarity, reproducibility, and interpretation of findings.
Alongside their report, reviewers assign a status to the article:
| Invited Reviewers | |
|---|---|
| 1 | |
|
Version 1 11 May 26 |
read |
Provide sufficient details of any financial or non-financial competing interests to enable users to assess whether your comments might lead a reasonable person to question your impartiality. Consider the following examples, but note that this is not an exhaustive list:
Sign up for content alerts and receive a weekly or monthly email with all newly published articles
Already registered? Sign in
The email address should be the one you originally registered with F1000.
You registered with F1000 via Google, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Google account password, please click here.
You registered with F1000 via Facebook, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Facebook account password, please click here.
If your email address is registered with us, we will email you instructions to reset your password.
If you think you should have received this email but it has not arrived, please check your spam filters and/or contact for further assistance.
Comments on this article Comments (0)