Systematic Review

A Systematic Review of Immersive Technologies in L2 Skill Improvement: Cognitive and Affective Pathways

[version 1; peer review: awaiting peer review]
PUBLISHED 08 Jun 2026

Abstract

Background

No post-2020 systematic review has simultaneously examined all four core second language (L2) skills (listening, speaking, reading, and writing) within a unified corpus of immersive technology research, nor has any prior review theorized the distinct mediating pathways through which these environments operate. This review addresses both gaps by introducing a dual-pathway mediation framework that reconceptualizes immersive metaverse technologies as active mechanisms shaping cognitive engagement and affective regulation in L2 communicative contexts.

Methods

Following PRISMA guidelines, systematic searches were conducted across Scopus, ScienceDirect, and Taylor & Francis. From 1,317 candidate articles (2020–2025), 31 peer-reviewed empirical studies met the inclusion criteria. Mediating mechanisms were inductively coded through open, axial, and selective stages into two pathways: affective (anxiety reduction, motivational enhancement, self-regulation) and cognitive (embodied presence, multimodal scaffolding, situated meaning negotiation, schema activation).

Results and Discussion

Speaking was the most studied skill (n = 17), followed by listening (n = 9), reading (n = 5), and writing (n = 4). VR dominated (n = 21) over AR (n = 10); XR was entirely absent. Of 31 studies, 14 showed predominantly cognitive mediation, 15 demonstrated concurrent dual-pathway activation, and 2 were primarily affective. The cognitive pathway (present in 29 studies) supports situated cognition theory; the affective pathway, concentrated in VR speaking interventions, aligns with Krashen’s Affective Filter Hypothesis. VR and AR emerged as complementary: VR activates embodied presence and anxiety reduction for speaking and listening, while AR supports cognitive scaffolding for reading and vocabulary. Affordance mapping revealed differential influences of HMD immersion, avatar interaction, real-time AI feedback, and 360° spatial audio across skills.

Conclusions

This review contributes a simultaneous analysis of all four L2 skills under metaverse conditions, the dual-pathway mediation framework as an original theoretical contribution, and a systematic technology-skill affordance map for instructional design. Future research should prioritize longitudinal studies, underrepresented productive skills, and ethical dimensions of AI-enabled metaverse environments.

Keywords

Affective Factors, Cognitive Mediation, Immersive Technology, L2 Learning, Metaverse, Systematic Review.

Introduction

The global transformation of education over the past decade has been marked by the emergence of immersive metaverse technologies, including augmented reality (AR), virtual reality (VR), mixed reality (MR), and extended reality (XR).1,2 These technologies have reshaped the paradigm of second language learning by enabling more authentic interaction and contextually grounded learning experiences.3 This transformation has been further accelerated by the COVID-19 pandemic, which prompted a rapid shift from face-to-face instruction to more flexible and adaptive digital and hybrid learning models.4,5 In this context, the integration of metaverse technologies reflects an epistemological shift in conceptualizing language acquisition as fundamentally shaped by social interaction, immersive experiences, and language use in meaningful contexts.6

From a theoretical perspective, metaverse-based language learning is grounded in social constructivism and situated learning approaches that position learners as active participants in the learning process.6,7 Three-dimensional virtual environments offer pedagogical affordances that enable the simulation of real-world communicative contexts, thereby supporting meaning negotiation, social collaboration, and functional, context-sensitive language use.8,9 A growing body of prior research has consistently demonstrated that the use of multi-user metaverse environments contributes to various aspects of second language learning. Metaverse technologies have been shown to support the improvement of both receptive and productive language skills through realistic communication simulations and the provision of immediate and sustained feedback.10,11

This study makes several distinctive contributions that set it apart from existing systematic reviews. First, unlike prior reviews that examined one or two language skills or combined findings indiscriminately across skills, this review simultaneously and comprehensively analyzes all four core L2 skills (listening, speaking, reading, writing) within a post-2020 empirical corpus. Second, and most critically, this review introduces and empirically substantiates a dual-pathway mediation framework, demonstrating that metaverse environments promote L2 improvement through two distinct yet interacting channels: affective processes and cognitive processes. This conceptual contribution is absent from prior meta-analyses, which positioned technology solely as an instructional tool. Third, by systematically coding specific technology features alongside immersion specifications across 31 studies, this review provides the first granular mapping of how individual technological affordances differentially influence each language skill.

Previous systematic reviews on the use of metaverse technologies in language learning

Several systematic reviews and meta-analyses have shown the substantial impact of immersive technologies such as XR, VR, and AR on language education. For example, one meta-analysis found that XR tools significantly improve vocabulary, grammar, and pronunciation, reporting a large effect size (g = 0.825) for XR in enhancing L2 linguistic improvement.12 Similarly, a review of the effects of AR and VR on language skills emphasizes their significant positive impact on speaking and fluency.13 However, there remains limited research on the impact of immersive technologies on reading and writing skills, with some studies indicating inconsistent or insufficient evidence for these domains.14

In exploring the focus of research on immersive technologies, several studies have highlighted their impact on specific language skills and communicative competence. VR has particularly strengthened speaking and listening skills, alongside promoting autonomy in learners.15,16 Further research indicated that high-immersion VR led to higher engagement and greater language proficiency across all levels, with intermediate learners benefiting the most.16 In contrast, Schorr et al. (2024) reported that AR is particularly effective for vocabulary acquisition and also facilitates multimodal linguistic support.17 However, AR’s effectiveness for improving grammar was found to be inconsistent across studies, with some reporting significant gains while others showed little to no improvement.

While the reviewed studies show promising outcomes, research still faces gaps, particularly regarding classroom-based research and learner-centred perspectives. Christou et al. (2025) and Yudintseva (2023) emphasize that most existing reviews combine findings from teachers and learners without adequately accounting for learners’ experiences in classroom settings.18,19 Yudintseva (2023) argues that VR has helped reduce speaking anxiety and enhance oral communication, yet student-centred empirical research on immersive technologies remains underdeveloped.19 Additionally, Weng et al. (2024) and Qi & Chen (2025) stress the need for more research on how immersive technologies support the improvement of pragmatic competence, a critical area for language fluency.20,21 Both studies suggest that while immersive technologies improve cognitive skills, their role in higher-order skills such as critical thinking and writing remains inadequately explored.

Taken together, these systematic reviews reveal three critical, unresolved gaps. First, no post-2020 review has simultaneously examined all four L2 skills within a unified analytical corpus. Second, existing reviews treat technology primarily as an instructional tool, without accounting for the distinct mediating pathways (cognitive and affective) through which immersive environments operate. Third, prior reviews aggregate findings across technology types without granular mapping of how specific affordances (e.g., HMD immersion, avatar interaction, AI feedback) differentially influence each skill. The present review directly addresses these three gaps through the analytical approaches described below.

This systematic review examines the use of metaverse-based immersive technologies in language learning, with a particular focus on the improvement of listening, speaking, reading, and writing skills among second-language learners. The goal is to analyze how these immersive technologies are integrated into language instruction and their impact on various aspects of language proficiency. This review draws on multiple studies to answer the following research questions (RQs):

RQ1: How does the use of metaverse-based immersive technologies influence the improvement of L2 skills?

RQ2: What learning processes mediated by metaverse environments contribute to the improvement of L2 skills?

RQ3: How do technology features and immersion specifications influence the improvement of L2 listening, speaking, reading, and writing skills?

Methods

Search strategy

We followed the PRISMA guidelines to collect data for the current review. Searches were conducted using the databases Scopus, ScienceDirect, and Taylor & Francis. The search was designed to cover a wide range of articles on metaverse technologies, including VR, AR, and XR, and their applications to L2 learning, specifically targeting the four language skills of listening, speaking, reading, and writing.

The following Boolean query was used to search for relevant articles in Scopus and Taylor & Francis: TITLE-ABS-KEY ((Metaverse OR “virtual reality” OR vr OR “augmented reality” OR ar OR “extended reality” OR xr) AND (“second language” OR “foreign language” OR l2 OR “language learning”) AND (listening OR speaking OR reading OR writing)). For ScienceDirect, which limits a query to a maximum of eight Boolean operators, the query was adjusted to (Metaverse OR “virtual reality” OR “augmented reality” OR “extended reality”) AND (“second language”) AND (listening OR speaking OR reading OR writing). The adjusted query removed redundant synonyms so that it fits within platform constraints while still capturing relevant results. Searches were last conducted on 30 November 2025.
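The two search strings can be assembled from their term groups, which makes it easy to check how many Boolean operators each one contains. The following is a minimal sketch, not the authors' tooling: the term lists are copied from the queries above (with straight quotation marks), while the helper names `or_group`, `build_query`, and `count_operators` are illustrative assumptions.

```python
# Term groups copied from the search strategy; all helper names are hypothetical.
TECH_FULL = ['Metaverse', '"virtual reality"', 'vr', '"augmented reality"',
             'ar', '"extended reality"', 'xr']
CONTEXT_FULL = ['"second language"', '"foreign language"', 'l2',
                '"language learning"']
SKILLS = ['listening', 'speaking', 'reading', 'writing']

# Reduced groups used for the ScienceDirect query.
TECH_SHORT = ['Metaverse', '"virtual reality"', '"augmented reality"',
              '"extended reality"']
CONTEXT_SHORT = ['"second language"']


def or_group(terms):
    """Join the terms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine the concept groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


def count_operators(*groups):
    """Count the OR operators inside groups plus the AND operators between them."""
    ors = sum(len(group) - 1 for group in groups)
    ands = len(groups) - 1
    return ors + ands


full_query = "TITLE-ABS-KEY (" + build_query(TECH_FULL, CONTEXT_FULL, SKILLS) + ")"
short_query = build_query(TECH_SHORT, CONTEXT_SHORT, SKILLS)

print(full_query)
print(short_query)
print(count_operators(TECH_FULL, CONTEXT_FULL, SKILLS))    # operators in the full query
print(count_operators(TECH_SHORT, CONTEXT_SHORT, SKILLS))  # operators in the adjusted query
```

Counting operators this way shows that the full query uses 14 operators and the adjusted query uses 8.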

Inclusion/exclusion criteria

Inclusion and exclusion criteria were applied to narrow the pool to the studies most relevant to this review (see Table 1). Our search was limited to research articles published between January 2020 and December 2025, considering the public release of Metaverse, XR, VR, or AR in late 2020. Using “title,” “abstract,” and “keywords” as filtering criteria, we initially identified 1,317 English-language articles across the three databases.

Table 1. Inclusion and exclusion criteria.

| Inclusion criteria | Exclusion criteria |
| --- | --- |
| a. Studies published in 2020–2025 | a. Studies not published between 2020 and 2025 |
| b. Studies written in English | b. Studies written in languages other than English |
| c. Open access or full text is legally available | c. No full text available or closed access |
| d. Peer-reviewed journal articles and e. empirical studies | d. Conference proceedings, book chapters, non-peer-reviewed manuscripts, reviews, and meta-analysis papers |
| f. Studies using immersive metaverse technologies (XR, VR, AR) | e. Studies not using immersive metaverse technologies (VR, AR, XR) |
| g. Studies conducted in an L2 learning context | f. Studies conducted outside of an L2 learning context |
| h. Studies addressing language skills | g. Studies focusing on non-skill aspects of L2 learning |

Data were extracted from the 31 selected studies using a custom-designed form by six researchers (see Figure 1). The extracted data included study title, authors, journal, research method, target skills, and language focus. Additionally, information was collected on the type of Metaverse used (e.g., VR, AR, or XR) and the country where the research was conducted. Effect sizes and impact magnitudes were extracted for studies that reported them. A blank version of the data extraction form used is available in the supplementary material.

Figure 1. Article selection flow.

Analytical framework development

Beyond descriptive coding, we developed an original analytical framework to explain how metaverse technologies mediate L2 skill improvement. This dual-pathway mediation framework emerged inductively through iterative thematic analysis of mediating processes reported across studies. We systematically coded learning mechanisms into two broad categories:

  • a. Affective processes: anxiety reduction, motivational enhancement, and self-regulation support.

  • b. Cognitive processes: embodied presence, multimodal scaffolding, situated meaning negotiation, and schema activation.

Framework development proceeded through three iterative stages. First, open coding identified 47 distinct mediating mechanisms across the 31 studies. Second, axial coding collapsed these into 12 intermediate categories representing conceptually similar processes. Third, selective coding organized these categories into two overarching pathways based on their primary function in supporting language improvement.
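The final selective-coding step, assigning each study to a pathway, can be illustrated with a short sketch. It assumes a simplified presence-based rule (a study is "Both" if it reports at least one process from each pathway). The review itself coded the primary reported mechanism, so this is an approximation rather than the authors' procedure. The process labels come from the two lists above; the function name `classify_pathway` and the example inputs are hypothetical.

```python
# Process labels taken from the framework description; the rule below is a
# simplified, presence-based assumption, not the authors' exact coding scheme.
AFFECTIVE = {
    "anxiety reduction",
    "motivational enhancement",
    "self-regulation support",
}
COGNITIVE = {
    "embodied presence",
    "multimodal scaffolding",
    "situated meaning negotiation",
    "schema activation",
}


def classify_pathway(processes):
    """Map a study's coded mediating processes to a pathway label."""
    coded = set(processes)
    unknown = coded - AFFECTIVE - COGNITIVE
    if unknown:
        # Reject labels that are not part of the seven-process coding scheme.
        raise ValueError(f"Uncoded process labels: {sorted(unknown)}")
    has_affective = bool(coded & AFFECTIVE)
    has_cognitive = bool(coded & COGNITIVE)
    if has_affective and has_cognitive:
        return "Both"
    if has_affective:
        return "Affective"
    if has_cognitive:
        return "Cognitive"
    return "Uncoded"


# Hypothetical example: a VR speaking study reporting one process of each kind.
print(classify_pathway(["anxiety reduction", "embodied presence"]))
```

A presence-based rule is easy to audit, but it cannot distinguish a study where one pathway dominates from one where both contribute equally, which is why a primary-mechanism judgment still requires human coders.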

Processes for data coding and analysis

Data coding and analysis were performed based on the research questions (RQs) and predetermined categories. We coded and analyzed the 31 selected studies based on publication details, language, and result. The coding process was guided by the key areas highlighted in each research question: Skill Improvement, Mediating Processes, Technology Features, and Immersive Specification.

RQ1: How does the use of metaverse-based immersive technologies influence the improvement of L2 listening, speaking, reading, and writing skills?

For RQ1, we focused on effect sizes and impact magnitudes to determine the extent to which immersive technologies, such as VR, AR, or XR, improved L2 skills. The data included pre-test and post-test scores, highlighting the average post-test scores for each technology type (e.g., AR vs. VR), and the effect size was reported using methods such as ANCOVA/ANOVA. We also documented the duration of the interventions and the skill improvements.

RQ2: What learning processes mediated by metaverse environments contribute to the improvement of L2 listening, speaking, reading, and writing skills?

For RQ2, we coded the processes that were mediated by metaverse technologies. We also focused on mediating processes, such as interactive tasks in virtual spaces, that helped learners improve their listening and speaking.

RQ3: How do technology features and immersion specifications influence the improvement of L2 listening, speaking, reading, and writing skills?

For RQ3, we analyzed technology features (e.g., immersive AR/VR environments) that contributed to improved L2 performance. Studies were evaluated for immersion specifications, such as VR scenarios for speaking or AR overlays for reading comprehension.

To ensure the reliability of the coding process, five researchers (out of ten authors) independently analyzed the data to minimize bias. While four authors were involved in the study, only two were responsible for the data analysis, which enabled a more focused and unbiased evaluation. The consistency of coding was ensured by periodically cross-checking analysis results, comparing interpretations, and resolving discrepancies through discussion. This approach aimed to reduce the potential for individual bias and ensure that the coding adhered to the established criteria.

Technology-skill affordance mapping

To identify differential effects of specific technological features across language skills, we developed a systematic technology-skill affordance mapping methodology. For each study, we coded: (1) primary technology features (HMD-based VR, 360° video VR, mobile AR, marker-based AR, avatar interaction, real-time AI feedback, gamification), (2) target skills (listening, speaking, reading, writing), and (3) reported effectiveness (high/medium/low impact). Cross-tabulation of these dimensions enabled identification of technology-skill interaction patterns not evident in studies examined individually. This affordance mapping represents a methodological contribution distinct from prior reviews, which typically aggregated findings across technology types or skills without systematic examination of their intersection.
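The cross-tabulation step can be sketched as follows. The records below are hypothetical placeholders and not data from the reviewed studies; the field names (`features`, `skills`, `impact`) and the function `cross_tabulate` are likewise assumptions made for illustration. Each study contributes one count to every feature-skill pair it covers, broken down by its coded impact level.

```python
from collections import Counter, defaultdict

# Hypothetical coded records (NOT data from the reviewed studies).
records = [
    {"features": ["HMD-based VR", "avatar interaction"],
     "skills": ["speaking"], "impact": "high"},
    {"features": ["360° video VR"],
     "skills": ["listening"], "impact": "medium"},
    {"features": ["mobile AR"],
     "skills": ["reading"], "impact": "high"},
    {"features": ["HMD-based VR"],
     "skills": ["listening", "speaking"], "impact": "medium"},
]


def cross_tabulate(records):
    """Count impact levels for every (feature, skill) pair across studies."""
    table = defaultdict(Counter)
    for rec in records:
        for feature in rec["features"]:
            for skill in rec["skills"]:
                table[(feature, skill)][rec["impact"]] += 1
    return table


crosstab = cross_tabulate(records)
for (feature, skill), impacts in sorted(crosstab.items()):
    print(f"{feature} x {skill}: {dict(impacts)}")
```

Because a study can target several skills and use several features, the cell counts in such a table can sum to more than the number of studies.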

Results

Publication information

Table 2 displays the distribution of the selected studies across journals. In total, 31 studies were reviewed, published in various high-impact journals, including Computer Assisted Language Learning (n = 5), Sustainability (Switzerland) (n = 3), and Heliyon (n = 2).

Table 2. Journals that published articles included in the review.

| Journal | K | Journal | K |
| --- | --- | --- | --- |
| Computer Assisted Language Learning | 5 | International Journal of Human–Computer Interaction | 1 |
| Sustainability (Switzerland) | 3 | Innovation in Language Learning and Teaching | 1 |
| Arab World English Journal | 2 | Computers and Education: Artificial Intelligence | 1 |
| Computers & Education: X Reality | 1 | International Journal of Engineering Pedagogy | 1 |
| Heliyon | 2 | Journal of Curriculum and Teaching | 1 |
| The Language Learning Journal | 1 | Smart Learning Environments | 1 |
| Interactive Learning Environments | 1 | Educational Process: International Journal | 1 |
| ReCALL | 1 | Journal of Advanced Research in Applied Sciences and Engineering Technology | 1 |
| Virtual Reality | 1 | IEEE Access | 1 |
| Journal of Computer Assisted Learning | 1 | International Journal of Evaluation and Research in Education | 1 |
| International Journal of Interactive Mobile Technologies | 1 | | |

Overview of included studies

Analysis of the 31 studies that met the inclusion criteria revealed systematic patterns in research focus and technological implementation, as summarized in Table 3.22–51 Speaking was the most common target skill with 17 studies, followed by listening with 9 studies, reading with 5 studies, and writing with only 4 studies (see Figure 2). Virtual Reality was employed in 21 studies, Augmented Reality in 10 studies, while Extended Reality was absent (see Figure 3). Across all reviewed studies, English was consistently identified as the target language. Geographically, studies were concentrated in Asia (n = 19), followed by the Middle East (n = 7), Europe (n = 3), and Latin America (n = 2).

Table 3. Summary of included studies.

| No | Author | Target skills | Target language | Location (Country) | Type of metaverse | Primary pathway | Skill improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | AlAli & Al-Barakat (2024) | Reading | English | Jordan | AR | Cognitive | Fluency, originality, and flexibility in creative reading skills. |
| 2 | Asadi & Ebadi (2024) | Reading | English | Iran | AR | Cognitive | Reading comprehension and vocabulary knowledge. |
| 3 | Carrión-Robles et al. (2023) | Writing | English | Rome, New York, Madrid | AR | Cognitive | Writing structure and grammar significantly. |
| 4 | Chang et al. (2020) | 4 Skills | English | Taiwan | AR | Cognitive | Language performance, focusing on speaking and contextual understanding. |
| 5 | Chu et al. (2023) | Speaking | English | Taiwan | VR | Cognitive | Speaking performance and fluency. |
| 6 | Ding (2024) | Speaking | English | China | VR | Affective | No significant improvement in speaking anxiety; low VR advantage. |
| 7 | Ebadi & Ebadijalal (2022) | Speaking | English | Iran | VR | Both | Oral proficiency, especially fluency and lexical resources, and increased willingness to communicate. |
| 8 | Ebadijalal & Yousofi (2024) | Writing | English | Iran | VR | Both | Motivation and writing fluency. |
| 9 | Elhambakhsh et al. (2024) | Listening, Speaking | English | Iran | VR | Cognitive | Speaking skills but required pedagogical support to be effective. |
| 10 | Ho-Minh & Suppasetseree (2025) | Speaking | English | Vietnam | AR | Both | Fluency and pronunciation, reducing speaking anxiety. |
| 11 | Hoter et al. (2025) | Listening, Speaking | English | Israel | VR | Both | Vocabulary, listening, and speaking confidence. |
| 12 | Hsu (2024) | Listening | English | Taiwan | VR | Cognitive | Cognitive absorption and retention of listening content. |
| 13 | P. Huang et al. (2025) | Listening, Speaking | English | Taiwan | VR | Both | Reduced speaking and interview anxiety, improving fluency and accuracy. |
| 14 | G.-J. Hwang et al. (2025) | Listening | English | Iran | VR | Both | Listening comprehension and significantly motivated behaviour. |
| 15 | Kaplan-Rakowski & Gruber (2023) | Speaking | English | Germany | VR | Affective | Reduced speaking anxiety compared to Zoom. |
| 16 | Lin et al. (2022) | Writing | English | Taiwan | AR | Both | Writing organization, but showed varying improvements across task criteria. |
| 17 | Liu et al. (2023) | Reading | English | China | VR | Cognitive | Enhanced reading comprehension compared to conventional SVVR. |
| 18 | Lobanova et al. (2024) | Listening, Speaking, Writing | English | Not Specified | VR | Both | Enhanced speaking and listening, but had no significant impact on writing. |
| 19 | Mansor et al. (2023) | Speaking | English | Malaysia | AR | Both | AR showed potential for speaking practice with younger learners. |
| 20 | Parlar & Sütçü (2025) | Listening | English | Türkiye | AR | Cognitive | Listening, vocabulary, and cultural knowledge. |
| 21 | Peixoto et al. (2023) | Listening | English | Portugal | VR | Cognitive | Interactive VR is more effective than passive VR for listening comprehension. |
| 22 | Raman et al. (2024) | Speaking | English | Malaysia | VR | Cognitive | Fluency, accuracy, and communicative competence. |
| 23 | Shadiev et al. (2025) | Speaking | English | China | VR | Both | Speaking performance significantly compared to traditional methods. |
| 24 | Shen et al. (2025) | Writing | English | China | VR | Cognitive | Writing performance, especially content and language use. |
| 25 | Soto et al. (2020) | Speaking, Reading | English | Colombia | VR | Both | Speaking fluency and reading support. |
| 26 | Sulaiman et al. (2023) | Reading | English | Malaysia | AR | Cognitive | Reading comprehension. |
| 27 | Y. Wang et al. (2022) | Writing | English | China | VR | Both | Writing organization and originality. |
| 28 | Y. Wang (2025) | Speaking | English | China | AR | Both | Speaking performance, especially in fluency and pronunciation. |
| 29 | Z. Wang et al. (2021) | Reading | English | China | VR | Cognitive | Reading comprehension and engagement. |
| 30 | Yan et al. (2024) | Speaking | English | China | VR | Both | All speaking sub-skills significantly. |
| 31 | Yudintseva (2024) | Speaking | English | Canada | VR | Both | VR has a small impact on speaking fluency, but it significantly reduces anxiety. |

Figure 2. Distribution of articles by metaverse technology.

Figure 3. Distribution of articles by language skills.

Figure 2 presents a consolidated overview of article distribution across two analytical dimensions.

Mediating processes: Dual-pathway framework findings

The following analysis addresses RQ2 by mapping the mediating processes identified across all 31 included studies onto the dual-pathway mediation framework. Table 3 presents the condensed classification of each study’s primary mediating mechanism. Each study was coded for whether the primary reported learning mechanism was affective (e.g., anxiety reduction, motivational enhancement, self-regulation), cognitive (e.g., schema activation, multimodal scaffolding, situated meaning negotiation, embodied presence), or both. Of the 31 studies, 2 exhibited predominantly affective mediation, 14 exhibited predominantly cognitive mediation, and 15 demonstrated concurrent activation of both pathways. Considered cumulatively, the affective pathway was represented in 17 studies and the cognitive pathway in 29 studies. Studies involving the affective pathway predominantly employed high-immersion HMD-based VR, whereas cognitive scaffolding was distributed across both VR and AR modalities, reflecting the broader applicability of context-sensitive instructional strategies regardless of immersion level.
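The cumulative figures follow directly from the three mutually exclusive categories: each "both" study counts toward both pathways. As a quick check, the sketch below tallies the primary-pathway column of Table 3 in study order. The single-letter encoding and the variable names are conventions assumed here for brevity.

```python
from collections import Counter

# Primary-pathway column of Table 3, studies 1-31 in order.
# Encoding (an assumption for brevity): C = cognitive, A = affective, B = both.
PATHWAYS = "CCCCCABBCBBCBBABCBBCCCBCBCBBCBB"

counts = Counter(PATHWAYS)

# Studies coded "both" contribute to each pathway's cumulative total.
affective_total = counts["A"] + counts["B"]
cognitive_total = counts["C"] + counts["B"]

print(len(PATHWAYS), dict(counts))
print("affective pathway represented in", affective_total, "studies")
print("cognitive pathway represented in", cognitive_total, "studies")
```

The tally reproduces the reported split (2 affective, 14 cognitive, 15 both) and the cumulative totals of 17 and 29.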

Technology-skill affordance patterns

This subsection presents the distribution and comparison of VR and AR technologies in language learning across different skills, as shown in Table 4.

Table 4. Dominant technologies and pedagogical mechanisms in VR and AR-Based L2 learning.

| Skill | Dominant technology | Studies (n/total) | Key features |
| --- | --- | --- | --- |
| Speaking (n = 17) | VR High-Immersion HMD | 10/17 | Avatar interaction; immersive environments |
| Speaking (n = 17) | Avatar Interaction | 8/17 | NPC/peer interaction; role-play |
| Speaking (n = 17) | Real-time AI Feedback | 5/17 | Pronunciation correction; fluency tracking |
| Listening (n = 9) | 360° VR Video | 3/9 | Spatial audio; contextual visualization |
| Listening (n = 9) | VR HMD Immersion | 6/9 | Reduced distractions; multimodal input |
| Reading (n = 5) | Mobile AR Overlays | 2/5 | Vocabulary support; text annotation |
| Reading (n = 5) | VR Contextualization | 3/5 | Immersive narrative spaces |
| Writing (n = 4) | 360° VR for Ideation | 2/4 | Virtual trips; immersive contexts |
| Writing (n = 4) | AR Multimodal Support | 1/4 | Visual-textual integration |

In summary, the results demonstrate clear technology-skill affordance patterns: VR predominantly supports speaking and listening through immersive and interactive features, while AR facilitates reading and writing through contextual and multimodal scaffolding.

Comparative technology effectiveness: VR versus AR

As Table 5 shows, the systematic comparison reveals that VR and AR offer complementary rather than competing affordances.

Table 5. Comparison of VR and AR effectiveness in L2 learning.

| Dimension | VR (21 studies) | AR (10 studies) |
| --- | --- | --- |
| Skill Focus | Speaking: 13/21; Listening: 8/21; Writing: 3/21; Reading: 3/21 | Speaking: 4/10; Reading: 3/10; Listening: 1/10; Writing: 1/10 |
| Pedagogical Approach | Self-regulated learning: 17/21; Peer interaction: 7/21 | Collaborative learning: 8/10; Teacher-facilitated: 9/10 |
| Primary Mechanism | Affective: Anxiety reduction (13); Cognitive: Immersive presence (16) | Cognitive: Just-in-time scaffolding (8); Affective: Motivation (6) |
| Optimal Use Cases | Speaking anxiety reduction; Immersive listening practice; Individual skill improvement | Reading comprehension; Vocabulary acquisition; Classroom collaboration |

Overall, the comparison confirms that VR and AR serve distinct yet complementary roles in L2 learning. VR demonstrates stronger effectiveness in immersive, individual-focused skill improvement, particularly for speaking and listening, while AR is better suited for structured, classroom-based learning that supports reading and vocabulary acquisition.

Discussion

Our review identified a striking disparity in technology adoption: VR accounted for 21 studies, AR for 10 studies, and extended reality (XR) received no representation. This finding diverges from recent meta-analytic evidence suggesting that AR and VR are equally effective in language-learning contexts. A comprehensive systematic review by Zhang et al. (2025) analyzing 37 ARVL and VRVL studies found that, while VR studies more frequently employed head-mounted displays (HMDs), AR studies outnumbered their VR counterparts.52 Their results indicated a predominant academic interest in non-wearable AR applications.

The complete absence of XR-labeled studies in this corpus, despite XR being explicitly targeted in the search strategy, reflects converging structural realities rather than a methodological gap. Terminologically, XR functions as an umbrella construct subsuming VR, AR, and mixed reality rather than a distinct implementable technology, meaning researchers routinely index their work under VR or AR even when using XR-capable platforms.53 Christou et al. (2025) documented this pattern, finding XR-specific terminology systematically underrepresented across Scopus, Web of Science, and ERIC despite growing adoption of XR devices in CALL contexts.18 Infrastructurally, high-fidelity XR systems such as Microsoft HoloLens impose cost and access barriers that most educational institutions, particularly in Global South settings that dominate L2 research output, cannot overcome. Burke et al. (2023) identified this constraint as a primary reason empirical XR studies remain scarce even as theoretical literature on XR affordances expands.54

This absence signals that the evidence base for XR as a unified construct in L2 education remains effectively pre-empirical. Yan et al. (2024), reviewing AI-XR integration across four databases (2017–2024), similarly found that studies explicitly self-identifying as XR-based constituted a small minority even among studies using technologies that technically qualify as such.55 The theoretical promises of XR, particularly its defining affordance of seamlessly blending physical and digital environments in ways that neither VR nor AR independently achieves, have yet to be systematically tested in ecologically valid L2 contexts. Future systematic reviews should therefore adopt search strategies that capture MR and XR-capable platforms irrespective of author-assigned labels, while the field at large must prioritize purpose-built XR interventions that move beyond the VR/AR binary that currently structures both research design and database indexing.

Beyond the absence of XR, the broader technology distribution within the corpus itself reveals patterns shaped more by disciplinary convention and historical trajectory than by comparative pedagogical evidence. This discrepancy may reflect disciplinary boundaries rather than pedagogical efficacy; AR research tends to focus on vocabulary acquisition through marker-based technologies and mobile devices, whereas VR research encompasses broader improvement of communicative competence. Furthermore, recent evidence from Schorr et al. (2024) demonstrates that AR’s primary application in language education is vocabulary acquisition, with clear trends toward marker-based technology, potentially explaining its underrepresentation in studies examining comprehensive language skills.17 The dominance of VR in our corpus may also reflect the field’s historical trajectory, as immersive VR environments have been positioned as more conducive to authentic communicative practice than AR’s reality-overlay approach, despite AR’s documented benefits in motivation, enjoyment, and anxiety reduction.

Our findings reveal a substantial research bias toward speaking skills (n = 17), with comparatively limited attention to listening (n = 9), reading (n = 5), and writing (n = 4). This asymmetry aligns with broader trends in immersive technology research but raises concerns about holistic language improvement. A recent systematic review examining technology’s impact on foreign language anxiety found that speaking was the most extensively studied language skill, with VR and AR technologies comprising (n = 13) of all anxiety-reduction interventions specifically targeting oral proficiency and presentation skills.56 Implementation studies examining XR for language learning for specific purposes identified only minimal attention to written communication skills, despite the critical role of reading and writing in academic and professional language use.57 This imbalance suggests that future research must deliberately prioritize receptive and productive written skills to provide a comprehensive evidence base for metaverse-enhanced language pedagogy.

The dual pathway mediation framework constitutes the central theoretical contribution of this review, and its empirical substantiation across 31 studies provides insights that extend beyond prior systematic reviews, which have treated technology primarily as a delivery vehicle rather than as a mechanism shaping interpersonal communication and learner interaction. The near-universal presence of the cognitive pathway (n = 29) aligns with the Cognitive Affective Model of Immersive Learning,58 which identifies embodied presence and agency as key psychological affordances of immersive environments that activate deeper cognitive processing through interaction with situated three-dimensional contexts. This finding provides empirical support for situated cognition theory by Brown et al. (1989) in L2 acquisition, indicating that learners develop language skills most effectively by using language in contexts that replicate authentic communicative demands and interpersonal exchange.59

However, the cognitive pathway alone does not fully explain the strong VR outcomes for speaking skills in interpersonal communication tasks. The affective pathway, which was the primary mediator in only 2 studies but was co-activated with the cognitive pathway in a further 15, is concentrated in VR speaking interventions and operates through a mechanism consistent with Krashen’s (1984) Affective Filter Hypothesis.60 By creating low-stakes interaction spaces that reduce foreign language anxiety during peer and teacher communication, VR lowers psychological barriers and allows more comprehensible input to be processed effectively. Recent evidence refines this explanation. Gu (2025) showed through large-scale mediation analysis (n = 1,086) that VR does not directly reduce anxiety but does so indirectly by increasing communicative confidence and perceived fluency, reflecting the motivational and self-regulation processes identified in this review.61 Puente-Torre et al. (2025) similarly found that technology reduces anxiety through psychological safety and personalized feedback in interaction, while increasing it when these conditions are absent.62

The concurrent activation of both pathways in 15 studies is theoretically significant, suggesting that affective and cognitive processes are mutually reinforcing within interpersonal communication. Reduced anxiety facilitates engagement in interaction, while immersive cognitive scaffolding builds contextual confidence that further reduces performance anxiety. This interdependence aligns with Kim et al.’s (2025) argument that cognitive load theory must incorporate affective factors to explain learning in emotionally rich environments.63 The differentiated roles of VR and AR further indicate that these technologies are complementary rather than competing in supporting interpersonal language learning. VR primarily activates affective processes, while AR supports cognitive scaffolding. Instructional design can therefore use VR to reduce speaking anxiety before transitioning to AR-based tasks that consolidate communicative gains through contextual support.
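The pathway counts reported in this review can be cross-checked with simple arithmetic. The sketch below is illustrative only: the three category counts (14 predominantly cognitive, 15 dual-pathway, 2 primarily affective) are taken from the review's reported results, while the variable names and the derived figure for overall affective presence are assumptions introduced here rather than values stated by the authors.

```python
# Study counts by dominant mediation category, as reported in this review.
# The dictionary keys are hypothetical labels chosen for this sketch.
pathway_counts = {
    "cognitive_only": 14,   # predominantly cognitive mediation
    "dual": 15,             # concurrent cognitive and affective activation
    "affective_only": 2,    # primarily affective mediation
}

# Every included study falls into exactly one category.
total_studies = sum(pathway_counts.values())

# A pathway is "present" when it is dominant or co-activated.
cognitive_present = pathway_counts["cognitive_only"] + pathway_counts["dual"]
affective_present = pathway_counts["affective_only"] + pathway_counts["dual"]

print(total_studies, cognitive_present, affective_present)  # prints: 31 29 17
```

Under this reading, the cognitive pathway is present in 29 of 31 studies, matching the reported figure, and the affective pathway is present in 17, most often alongside the cognitive pathway rather than on its own.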

VR’s potential for literacy improvement may lie in its capacity to contextualize written texts within meaningful scenarios, reducing the abstraction that challenges many L2 readers, or in providing multimodal scaffolding through combined visual, auditory, and textual information.64 For writing, metaverse environments support composing processes by enabling writers to inhabit the communicative contexts for which they are writing, facilitating more authentic audience awareness and rhetorical decision-making. Recent work exploring AR-based context-aware ubiquitous writing applications suggests that linking writing tasks to physical locations and authentic communicative purposes enhances both motivation and performance.65,66 Despite these promising findings, the field lacks a comprehensive understanding of how immersive technologies can systematically support literacy improvement across diverse learner populations and text types.

Our review finds that both AR and VR have a high impact on language learning outcomes, yet this aggregate result masks important nuances in their respective pedagogical affordances. AR’s strength lies in bridging reality and virtuality, promoting collaborative learning through shared physical-digital spaces, and visualising abstract linguistic content in contextually meaningful ways.67 In contrast, VR excels at creating fully immersive environments that facilitate self-regulated learning through embodied presence and spatial engagement.68 Furthermore, practical considerations of cost, accessibility, and cognitive load favour AR for certain contexts: AR applications typically require less specialized equipment, place lower cognitive demands on first-time users, and integrate more seamlessly into classroom environments than HMD-based VR systems.

As metaverse technologies mature and become more accessible, the field must address several critical priorities to advance both research rigour and practical implementation. First, researchers should prioritize longitudinal studies examining whether initial learning gains from metaverse interventions transfer to real-world language use and whether these gains persist over extended periods. Such studies require collaboration between researchers and educational institutions to track learners across semesters or academic years. Second, while promising, the integration of AI and metaverse technologies demands careful ethical consideration. Emerging scholarship on AI-enabled metaverse education identifies challenges, including data privacy, accessibility barriers, algorithmic bias, and potential threats to learner agency and critical thinking.69 Third, research should explore low-cost alternatives, such as smartphone-based AR or WebXR platforms accessible through standard devices, to ensure equitable access across diverse socioeconomic contexts.

While our review identified social constructivism and situated learning as dominant frameworks, the field would benefit from more explicit theorization of how immersive presence, embodied cognition, and spatial navigation contribute to language acquisition. Emerging work on sociomaterial perspectives offers promising directions for understanding the complex interplay between human learners, AI agents, avatars, and digital environments.70 Finally, research must move beyond technology-focused questions (“Does VR/AR work?”) toward pedagogically grounded inquiries about optimal task design, scaffolding strategies, feedback mechanisms, and integration of metaverse experiences with broader curriculum objectives. The vision of a humanizing, sustainable metaverse for language learning requires researchers, educators, and technology developers to collaborate to create inclusive, ethically sound, and pedagogically effective implementations that enhance rather than replace human-centred language learning experiences.

Conclusion

This study set out to examine how immersive metaverse technologies influence second language skill improvement, with particular attention to the mediating learning processes and the differential affordances of virtual reality and augmented reality across the four core language skills. Drawing on a systematic analysis of 31 empirical studies, the findings demonstrate that immersive technologies support L2 improvement through two interrelated pathways, namely cognitive processes such as embodied presence and contextual scaffolding, and affective processes including anxiety reduction, motivation, and self-regulation. These pathways were found to co-occur in a substantial proportion of studies, indicating that language learning in immersive environments is shaped by the dynamic interplay between cognitive engagement and affective conditions within communicative contexts.

A key contribution of this review lies in the development and empirical grounding of the dual-pathway mediation framework, which advances existing literature by positioning technology not merely as an instructional tool but as a mechanism that shapes interpersonal communication and interaction in language learning. The findings further reveal distinct yet complementary roles of immersive technologies, with virtual reality showing strong effectiveness in supporting speaking and listening through affective engagement and reduced anxiety, while augmented reality primarily facilitates reading and vocabulary improvement through cognitive scaffolding in contextualized environments. This technology-skill mapping provides a more precise understanding of how specific features of immersive environments align with different dimensions of language learning.

Overall, the study underscores the importance of integrating cognitive and affective considerations in the design of immersive language learning environments, particularly in relation to interpersonal communication and interaction. These findings carry important implications for both theory and practice, suggesting that instructional design should strategically align technological affordances with targeted learning processes to optimize outcomes. By offering a comprehensive and empirically grounded framework, this review contributes to the growing body of research on immersive learning and provides a foundation for future studies that seek to explore sustainable, accessible, and interaction-focused applications of metaverse technologies in language education.

Limitations

This review has several limitations that qualify its findings. First, the corpus was restricted to English-language peer-reviewed articles indexed in three databases (Scopus, ScienceDirect, and Taylor & Francis), excluding grey literature, conference proceedings, and non-English sources; future reviews should adopt broader search strategies to enhance generalizability and capture developments in non-English-medium contexts. Second, while quality appraisal guided interpretation, the 31 studies varied considerably in methodological rigor, sample size (ranging from pilot studies with fewer than 20 participants to large-scale trials), and outcome measurement approaches, limiting the direct comparability of findings across studies. Third, this review was not prospectively registered in PROSPERO prior to data collection, which limits pre-registration transparency. Addressing these limitations will be essential for building a more complete, globally representative evidence base for metaverse-enhanced L2 education.

Ethics statement

This systematic review synthesizes findings exclusively from peer-reviewed empirical studies published in indexed academic journals. The review did not involve direct recruitment of human participants, the collection of primary data, or any personal data processing. Accordingly, no ethical approval from an institutional review board or ethics committee was required for this review. All studies included in this analysis had independently obtained the necessary ethical clearances from their respective institutions before publication, and no personal or sensitive data were accessed or reanalyzed in the course of this review. This study complies with the ethical principles of transparency and reproducibility, as all data extraction procedures and decision criteria are documented in the supplementary materials and the dataset is openly archived at Zenodo (DOI: 10.5281/zenodo.20347363) under a Creative Commons Attribution 4.0 International (CC-BY 4.0) license.

How to cite this article
Dewi DP, Maulidiya AR, Haryono A et al. A Systematic Review of Immersive Technologies in L2 Skill Improvement: Cognitive and Affective Pathways [version 1; peer review: awaiting peer review]. F1000Research 2026, 15:892 (https://doi.org/10.12688/f1000research.182929.1)