Designing a Unified Model for Measuring Arabic Language Competence among Non-Native Learners

Bader Hudayban Alharbi; Auwal Adam Sa'ad

doi:10.12688/f1000research.173332.1


Research Article


[version 1; peer review: 2 approved with reservations]

Bader Hudayban Alharbi ¹, Auwal Adam Sa'ad ²

PUBLISHED 26 Feb 2026


¹ Institute of Arabic Language Institute for Speakers of other Languages, King Abdulaziz University, Jeddah, Mekkah, Saudi Arabia
² IIUM Institute of Islamic Banking and Finance, International Islamic University Malaysia, Kuala Lumpur, Kuala Lumpur, 53100, Malaysia

Bader Hudayban Alharbi
Roles: Conceptualization

Auwal Adam Sa'ad
Roles: Formal Analysis


Abstract

This article proposes a revised and expanded framework for the development of standardised Arabic language proficiency assessments, with particular emphasis on the distinctive linguistic, cognitive, and cultural complexities of Arabic as a Foreign Language (AFL). The framework is designed to serve both general-purpose language learning objectives and domain-specific applications, such as academic, diplomatic, or professional use. Drawing on established international benchmarks, including the Common European Framework of Reference for Languages (CEFR) and the American Council on the Teaching of Foreign Languages (ACTFL) guidelines, alongside specialised Arabic proficiency instruments such as the Arabic Language Proficiency Test (ALPT), the Certificate in Modern Arabic (CIMA), and the Test of Arabic as a Foreign Language (TAFL), this study synthesises best practices in language testing with insights from applied linguistics and psychometrics. A central innovation of the framework is its incorporation of Arabic’s sociolinguistic realities. Chief among these is the persistent challenge of diglossia, that is, the coexistence of Modern Standard Arabic (MSA) with diverse regional dialects, which the framework addresses by integrating dialectal comprehension and production tasks that reflect authentic communicative demands. The framework also emphasises cognitive processing strategies relevant to Arabic’s morphosyntactic complexity, recognising that effective assessment must go beyond grammar and vocabulary to capture learners’ ability to infer, interpret, and communicate meaning in context. Furthermore, the revised model gives significant weight to sociocultural and pragmatic competence, ensuring that assessments include idiomatic expressions, speech acts, and culturally embedded language reflective of real-life interaction.
Realism in task design, fairness across learner backgrounds, and transparency in scoring are addressed through the adoption of evidence-based principles from the Standards for Educational and Psychological Testing (AERA/APA/NCME). The study also advocates for robust validation procedures, including construct alignment, item response analysis, and intercultural bias mitigation. By holistically integrating linguistic, cognitive, and cultural dimensions, this framework aims to guide the development of more authentic, valid, and equitable proficiency assessments for AFL learners across diverse contexts. The proposed model contributes to the growing body of research on Arabic language pedagogy and offers practical implications for test developers, curriculum designers, and policy makers engaged in Arabic language education worldwide.

Keywords

Arabic proficiency; language assessment; diglossia; cultural competence; test design; psychometrics.

Corresponding authors: Bader Hudayban Alharbi, Auwal Adam Sa'ad

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2026 Alharbi BH and Sa'ad AA. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Alharbi BH and Sa'ad AA. Designing a Unified Model for Measuring Arabic Language Competence among Non-Native Learners [version 1; peer review: 2 approved with reservations]. F1000Research 2026, 15:325 (https://doi.org/10.12688/f1000research.173332.1) First published: 26 Feb 2026, 15:325 (https://doi.org/10.12688/f1000research.173332.1) Latest published: 26 Feb 2026, 15:325 (https://doi.org/10.12688/f1000research.173332.1)

Introduction

Recent years have witnessed a growing emphasis on assessment as a tool for aligning instruction with proficiency-based goals in the teaching of Arabic as a Foreign Language (AFL). This shift reflects broader trends in language education, where assessments are used not merely for grading but also as diagnostic tools to inform instruction and curriculum development (Alghamdi & Al-Saqqaf, 2021). In AFL contexts, proficiency-based models have gained traction, particularly those that emphasize communicative competence across multiple modalities. Yet current practices in Arabic assessment often lag behind innovations in European and East Asian language instruction, where assessment is more tightly integrated into learner-centered pedagogies (Bensalah, 2022). The challenge for Arabic lies in its structural features (root-based morphology, non-linear orthography, and diglossic variation), which complicate the creation of standardized, scalable testing instruments.

A persistent challenge in AFL assessment is Arabic’s diglossia, whereby learners must navigate Modern Standard Arabic (MSA) in formal contexts and colloquial dialects in everyday communication (Albirini, 2020). Traditional proficiency tests such as the ALPT, the Oral Proficiency Interview (OPI), and institutional placement exams tend to focus exclusively on MSA and thus fail to reflect the full communicative realities that learners face in immersive environments. Recent research underscores the importance of integrating dialectal varieties into assessment tools to support learners’ sociolinguistic competence (Habash & Saleh, 2021). Some programs have begun to include dialectal components in oral assessments, particularly in advanced-level testing, yet standardized procedures and validated rubrics for evaluating dialectal proficiency are still lacking.

With the rise of digital learning and remote assessment after the COVID-19 pandemic, interest in computer-based and AI-driven Arabic language assessment has surged. Platforms such as Duolingo and Memrise have introduced Arabic learning modules, and more recently, AI-powered assessment tools are being piloted for automated speaking and writing evaluation (Al-Jarf, 2021). However, such tools often rely on MSA corpora and are not trained to recognize dialectal variation or pragmatic nuance in learners’ language use. Scholars have highlighted the need for Arabic-specific natural language processing (NLP) tools to improve the validity of automated assessments (Taha & Farghaly, 2022). Additionally, alignment with the CEFR and ACTFL scales remains weak in most currently available digital platforms.

Another dimension increasingly recognized in AFL assessment is the role of culture. Recent studies emphasize that effective communication in Arabic requires not only linguistic proficiency but also intercultural awareness: an understanding of the norms, gestures, politeness strategies, and religious expressions embedded in Arabic discourse (Mahfoudhi & Alasmari, 2023). Traditional tests have largely neglected these aspects, prompting scholars to advocate more authentic, performance-based assessments that simulate real-life interactions in Arab cultural settings. For instance, scenario-based assessments that include role-play or interpretive tasks are now being piloted in some AFL programs in Europe and the Gulf region, offering promising models for holistic evaluation (Shaw, 2022a).

Given these challenges and developments, researchers are calling for a unified AFL assessment framework that adapts global standards (such as the CEFR and ACTFL) while integrating Arabic-specific linguistic and cultural variables. This includes developing calibrated rating scales that differentiate between MSA and dialect proficiency, validating test items with diverse learner populations, and ensuring fairness for learners from different language backgrounds (Zaki & Alanzi, 2024). A key recommendation is the use of CEFR-aligned can-do descriptors adapted to reflect both formal and informal Arabic use. There is also increasing interest in dynamic assessment models that evaluate learner potential rather than static knowledge, which is particularly relevant for heritage and bilingual learners (Harb & Atoum, 2023).

Literature review

Existing Arabic proficiency tests and their limitations

In recent decades, a number of standardised assessments for AFL have emerged. For example, the Arabic Language Proficiency Test (ALPT), developed by the Arabic Academy and endorsed by the Islamic Chamber of Commerce, has gained wide recognition; it assesses learners on five sections (Listening, Reading, Structure, Writing, Speaking). Similarly, the Certificat International de Maîtrise en Arabe (CIMA), offered by the Institut du Monde Arabe in Paris, certifies intermediate-level MSA proficiency. Other offerings include online CEFR-based exams and institutional certification programmes. However, none of these is universally adopted, and each has notable drawbacks. A critical issue is that many current AFL tests focus on MSA in artificial tasks and do not fully represent authentic language use. As Al-Hamly and Milson (2011) observe, even high-level listening and speaking items in proficiency guidelines often assume formal language, whereas real-world communication requires dialectal competence. In short, many tests fail to align with Arabic’s sociolinguistic reality.

Another recurring critique is the lack of rigorous validation. Many AFL test manuals provide minimal reliability or validity data. Al-Hamly and Milson note that testing programmes “do not offer sufficient data about the validation process” and often limit evidence to basic inter-rater checks. Without robust psychometric analysis (e.g. item analyses, generalisability studies), stakeholders cannot be confident in score interpretations or fairness. Concerns about cultural bias have also been raised: if test content assumes familiarity with a particular dialect or set of cultural practices, non-native learners may be disadvantaged. More transparency and fairness review is needed in AFL assessment.
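To make the "item analyses" mentioned above concrete, the following Python sketch computes two classical statistics that an AFL test manual could report for each dichotomously scored item: difficulty (the proportion of test-takers answering correctly) and discrimination (the corrected item-total correlation, i.e. the correlation between the item and the total on the remaining items). It is a minimal illustration that assumes 0/1 scoring; the function name and the sample data are hypothetical, and an operational programme would normally rely on dedicated psychometric software.

```python
from statistics import mean, pstdev


def item_analysis(responses):
    """Classical item statistics for dichotomous (0/1) responses.

    responses: one row per test-taker, each row a list of item scores.
    Returns a list of (difficulty, discrimination) tuples, one per item.
    Difficulty is the proportion correct; discrimination is the corrected
    item-total correlation (item score vs. total on the remaining items).
    """
    n_items = len(responses[0])
    results = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # exclude item j itself
        p = mean(item)
        sd_item, sd_rest = pstdev(item), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            r = float("nan")  # correlation is undefined without variance
        else:
            m_item, m_rest = mean(item), mean(rest)
            cov = mean((x - m_item) * (y - m_rest) for x, y in zip(item, rest))
            r = cov / (sd_item * sd_rest)
        results.append((p, r))
    return results


# Hypothetical scores: five test-takers, four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
for number, (p, r) in enumerate(item_analysis(responses), start=1):
    print(f"Item {number}: difficulty={p:.2f}, discrimination={r:.2f}")
```

Items with very low or negative discrimination are the ones an expert panel would typically re-examine for ambiguity or construct-irrelevant content.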

A major limitation of existing Arabic proficiency tests is their narrow focus on Modern Standard Arabic (MSA), often at the expense of communicative realism. While MSA remains the standardized variety used in formal media and education, it is not the primary mode of everyday interaction in Arab societies (Albirini, 2020). Most proficiency tests, including the ALPT and CIMA, center on scripted tasks involving academic or literary texts that are disconnected from real-world usage. This has led to increasing calls to include dialects (or at least acknowledge their role) in proficiency measurement (Mahfoudhi & Alasmari, 2023). Learners trained solely in MSA frequently struggle with spoken Arabic in Arabic-speaking countries, a phenomenon referred to as the “MSA bubble” (Habash & Saleh, 2021). Moreover, the lack of dialectal input creates an unrealistic ceiling effect in oral and aural assessment tasks, where native-like fluency is judged against artificially elevated norms of formal correctness.

Another problem is the absence of a unified framework specifically tailored for AFL proficiency. While some institutions loosely align their testing with CEFR or ACTFL guidelines, the adaptation is often superficial. For example, CEFR descriptors such as “Can express opinions in a conversation on abstract topics” are difficult to map onto the diglossic and context-sensitive nature of Arabic (Zaki & Alanzi, 2024). Studies have shown that AFL learners’ performance in listening and speaking varies significantly depending on whether the task involves MSA or a regional dialect, suggesting that current proficiency descriptors do not capture the multidimensional nature of Arabic language use (Bensalah, 2022). As a result, many learners may achieve a high CEFR rating without being able to function effectively in real-life Arabic settings.

Despite their international reach, many Arabic proficiency tests still lack methodological transparency regarding their design, validation, and scoring processes. For instance, while the ALPT outlines general testing areas, there is minimal published data on item difficulty levels, standard-setting procedures, or construct validation (Alghamdi & Al-Saqqaf, 2021). Test designers often provide anecdotal justification for scoring rubrics without empirical support, violating fundamental principles of test reliability and fairness (Brown & Abeywickrama, 2019). In many cases, inter-rater reliability is the only reported statistic, and even that is often based on small, non-representative samples. As Taha and Farghaly (2022) emphasize, the absence of item-level psychometric analysis severely limits the interpretability of test scores and undermines their use in high-stakes contexts such as immigration or university admission.
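Standard-setting is one of the procedures the paragraph above notes is rarely documented. As one illustration (not a method that any of the cited tests is known to use), the modified Angoff procedure asks each judge to estimate, for every item, the probability that a minimally competent candidate at the target level would answer correctly; the recommended raw cut score is then the sum over items of the judges' mean estimates. The Python sketch below assumes that design; the function name, judge labels, and ratings are hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Modified Angoff cut score on the raw-score scale.

    ratings: dict mapping each judge to a list of probabilities (0 to 1),
    one per item, that a minimally competent candidate answers correctly.
    Returns the sum over items of the judges' mean probability.
    """
    judges = list(ratings.values())
    n_items = len(judges[0])
    if any(len(r) != n_items for r in judges):
        raise ValueError("every judge must rate every item")
    if any(not 0.0 <= p <= 1.0 for r in judges for p in r):
        raise ValueError("ratings must be probabilities between 0 and 1")
    item_means = [mean(r[i] for r in judges) for i in range(n_items)]
    return sum(item_means)


# Hypothetical ratings from two judges for a three-item B1 listening section.
ratings = {"judge_1": [0.6, 0.8, 0.4], "judge_2": [0.7, 0.9, 0.5]}
print(round(angoff_cut_score(ratings), 2))
```

Here the recommended cut is roughly 1.95 of 3 raw points. Publishing the judges' ratings and the resulting cut alongside the test would address part of the transparency gap described above.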

Although digital testing platforms are increasingly used, especially in post-pandemic education, their use in Arabic assessment raises significant challenges. AI-based scoring engines, such as those piloted in the ALPT or the experimental Arabic modules in Duolingo and other platforms, have limited sensitivity to Arabic’s morphological and dialectal diversity. For example, automated grammar checkers often misinterpret dialectal expressions or fail to recognize correct but non-standard morphosyntactic constructions (Taha & Farghaly, 2022). This misalignment not only skews assessment accuracy but can also discourage learners who attempt to use more naturalistic Arabic forms. Furthermore, these systems rarely undergo proper fairness testing across different learner profiles (heritage speakers, bilinguals, and first-time learners), which raises concerns about equity and accessibility (Harb & Atoum, 2023).

Cultural bias is another recurrent concern in Arabic testing. Test prompts and listening materials frequently embed cultural references that may be familiar to learners from certain regions (e.g., Gulf or Levantine contexts) but alien to others (Mahfoudhi & Alasmari, 2023). This can disadvantage learners who have limited exposure to Arab cultures, or whose exposure is limited to particular media portrayals. Studies on intercultural communicative competence have argued that fairness in language testing must include a balance between cultural specificity and inclusivity (Shaw, 2022b). The issue becomes more pronounced in speaking tests, where raters may unconsciously reward culturally “authentic” pronunciation or expressions over more neutral but technically correct MSA usage. This creates a sociolinguistic inequity that has not been sufficiently addressed in most AFL testing models to date.
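Cultural bias of this kind is usually checked quantitatively with differential item functioning (DIF) analysis, which this framework later recommends for pilot testing. One common DIF statistic is the Mantel-Haenszel common odds ratio: test-takers are matched on total score, and within each score stratum the odds of answering an item correctly are compared between a reference group and a focal group (for example, learners with and without extensive Gulf exposure). The Python sketch below computes that odds ratio and its ETS delta-scale transform, MH D-DIF = -2.35 ln(alpha_MH); negative values indicate that the item is harder for focal-group learners than for reference-group learners of the same overall score. The grouping labels and data layout are assumptions for illustration, and significance testing is omitted.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF for one dichotomously scored item.

    records: iterable of (group, total_score, correct) tuples, where group is
    "reference" or "focal" and correct is 1 or 0. Total score is the matching
    variable used to form strata.
    Returns (alpha_mh, mh_d_dif).
    """
    # For each score stratum: [ref correct, ref wrong, focal correct, focal wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, total_score, correct in records:
        cell = strata[total_score]
        if group == "reference":
            cell[0 if correct else 1] += 1
        elif group == "focal":
            cell[2 if correct else 3] += 1
        else:
            raise ValueError(f"unknown group: {group!r}")

    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n    # reference right, focal wrong
        denominator += b * c / n  # reference wrong, focal right
    if numerator == 0 or denominator == 0:
        raise ValueError("common odds ratio is undefined for these data")

    alpha_mh = numerator / denominator
    mh_d_dif = -2.35 * math.log(alpha_mh)  # ETS delta scale
    return alpha_mh, mh_d_dif


# Hypothetical item: within the same score stratum, focal-group learners
# answer correctly less often than reference-group learners.
records = (
    [("reference", 6, 1)] * 3 + [("reference", 6, 0)] * 1
    + [("focal", 6, 1)] * 1 + [("focal", 6, 0)] * 3
)
alpha, delta = mantel_haenszel_dif(records)
print(f"alpha_MH = {alpha:.2f}, MH D-DIF = {delta:.2f}")
```

A statistical flag is only a starting point: items flagged this way would still go to the kind of regionally diverse expert panel the framework describes, to judge whether the cultural content is construct-relevant.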

Theoretical foundations: Standards, competence, and cognition

International language proficiency frameworks such as the Common European Framework of Reference for Languages (CEFR) and the ACTFL Proficiency Guidelines serve as essential theoretical anchors for language assessment design. These models prioritize communicative competence as the core outcome of language learning and testing (Council of Europe, 2020; ACTFL, 2022). CEFR’s action-oriented approach and ACTFL’s proficiency scales offer structured descriptors across skill levels, ranging from novice to superior, which are critical for defining learning outcomes and aligning assessment tasks. ACTFL’s “5 Cs” model (Communication, Cultures, Connections, Comparisons, and Communities) goes further by embedding cultural literacy as a dimension of language proficiency, reflecting the understanding that communicative ability is inseparable from cultural context (ACTFL, 2022).

Building on these standards, language assessment theory such as that outlined by AERA, APA, and NCME (2014) emphasizes the principles of validity, reliability, fairness, and practicality. Validity, in particular, requires that each test score interpretation be supported by evidence derived from test content, design, and statistical analyses (Kane, 2013). Our proposed AFL assessment framework is explicitly designed to align with these principles. Every test specification, whether related to task design, scoring rubrics, or language domain coverage, is justified in relation to the inferences intended from learners’ scores. Construct coverage (i.e., what is being measured) and fairness considerations for diverse learners (including heritage speakers, bilinguals, and monolinguals) are therefore embedded from the outset (Zaki & Alanzi, 2024).

Cognitively, Arabic presents unique challenges to learners, particularly due to its complex morphological system, root-and-pattern structure, and relatively deep orthography (Ryding, 2021). These features impose considerable cognitive load, especially on learners unfamiliar with Semitic language patterns. Assessments that rely solely on discrete-point grammar or vocabulary recall fail to capture learners’ processing strategies and real-time language comprehension. Recent psycholinguistic research emphasizes the importance of testing both declarative knowledge (facts and rules) and procedural knowledge (skills such as inferencing, decoding, and self-monitoring in comprehension tasks) in AFL learners (Bensalah, 2022; Taha & Farghaly, 2022).

In response, the framework incorporates tasks that go beyond isolated item types. Reading and listening sections, for example, include inference-based questions where learners must extract implied meanings from contextually rich passages. Such tasks measure learners’ ability to deploy metacognitive and strategic resources, aligning with current cognitive models of second language acquisition and assessment (Purpura, 2016; Alghamdi & Al-Saqqaf, 2021). This dimension is especially critical for Arabic, where lexical ambiguity and morpho-syntactic variation require higher-order comprehension.

From a sociolinguistic perspective, Arabic’s diglossia, the coexistence of Modern Standard Arabic (MSA) and regional dialects, is a major consideration for assessment validity. As Habash and Saleh (2021) point out, learners may demonstrate high proficiency in MSA yet fail to engage effectively in real-world communication where dialects dominate, particularly in the listening and speaking domains. Consequently, our framework adopts a dual-track approach: proficiency is assessed separately for MSA and for one dialect selected by the learner or program (e.g., Egyptian, Levantine, or Gulf Arabic). This innovation reflects the sociolinguistic reality of Arabic use and addresses criticisms of traditional tests that remain artificially MSA-centric (Albirini, 2020; Harb & Atoum, 2023).

Each language domain, especially oral production and aural comprehension, is calibrated to allow for variation in pronunciation, lexical choice, and grammatical construction based on dialectal norms. Moreover, the framework includes “code-switching awareness” items, which test learners’ recognition of shifts between MSA and dialect in authentic discourse (e.g., TV interviews, social media content). This sociolinguistic sensitivity not only improves construct validity but also equips learners for real-life interactions.
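The dual-track approach implies that results are reported per skill and per variety rather than averaged into a single score. A minimal Python sketch of such a report, assuming CEFR labels are used for both tracks, might look like the following; the class name, method names, and sample levels are hypothetical and not part of any existing test.

```python
from dataclasses import dataclass, field

CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")
SKILLS = ("Listening", "Speaking", "Reading", "Writing")
TRACKS = ("MSA", "dialect")


@dataclass
class DualTrackProfile:
    """Hypothetical score report keeping MSA and dialect results separate."""

    dialect: str  # the one dialect chosen by the learner or programme
    levels: dict = field(default_factory=dict)  # (skill, track) -> CEFR level

    def record(self, skill: str, track: str, level: str) -> None:
        if skill not in SKILLS or track not in TRACKS or level not in CEFR_LEVELS:
            raise ValueError(f"invalid entry: {skill}, {track}, {level}")
        self.levels[(skill, track)] = level

    def gap(self, skill: str) -> int:
        """CEFR steps by which MSA exceeds the dialect (negative if below)."""
        msa = CEFR_LEVELS.index(self.levels[(skill, "MSA")])
        dia = CEFR_LEVELS.index(self.levels[(skill, "dialect")])
        return msa - dia

    def report(self) -> list:
        lines = []
        for skill in SKILLS:
            msa = self.levels.get((skill, "MSA"), "not assessed")
            dia = self.levels.get((skill, "dialect"), "not assessed")
            lines.append(f"{skill}: MSA {msa} | {self.dialect} {dia}")
        return lines


profile = DualTrackProfile(dialect="Levantine")
profile.record("Listening", "MSA", "B2")
profile.record("Listening", "dialect", "A2")
print("\n".join(profile.report()))
print("Listening gap (MSA minus dialect):", profile.gap("Listening"))
```

Keeping the tracks separate lets a report show, for instance, a learner at B2 in MSA listening but A2 in Levantine listening, a profile that a single averaged score would hide.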

Cultural pragmatics and the gap in standardized testing

Despite the theoretical consensus on the importance of sociocultural competence in second language acquisition, most standardized AFL proficiency tests continue to marginalize this dimension (Albirini, 2020; Bensalah, 2022). While traditional assessments emphasize grammatical accuracy and vocabulary range, they often overlook cultural pragmatics: the knowledge of when and how to use particular forms appropriately within specific cultural contexts. For example, greeting forms in Arabic vary significantly by region, formality, and relationship, yet most tests assess only literal translation or sentence completion without context (Mahfoudhi & Alasmari, 2023). This lack of pragmalinguistic and sociopragmatic assessment leads to what Taguchi and Roever (2017) term a “competence gap,” in which learners are unable to transfer classroom knowledge to real-life social interactions.

Furthermore, test prompts that fail to represent authentic communicative acts may yield misleading results about learner readiness. In a study by Mahmoud (2023b), learners who performed well on grammar-based tasks showed significant difficulties in role-play tasks involving apologies, refusals, and requests in culturally nuanced scenarios. This raises concerns about construct underrepresentation, a validity threat in which the test fails to measure an essential component of the construct (Messick, 1996; updated by Kane, 2013). Therefore, any robust AFL assessment must include scenario-based, context-rich tasks that mirror how Arabic is used in diverse social and cultural situations.

Recent scholarship advocates integrating intercultural communicative competence (ICC) into language assessment frameworks. ICC includes knowledge of cultural norms, attitudes of openness, and skills in interpreting and relating across cultures (Byram, 2021). In AFL contexts, this is particularly relevant given the cultural heterogeneity of the Arab world, which ranges from the Levant to North Africa and the Gulf, each region with distinct sociocultural codes. However, many AFL tests are designed through a single cultural lens (often Egyptian or Levantine), which can lead to cultural bias (Shaw, 2022a). Learners unfamiliar with these cultural frames may struggle not because of linguistic deficiency but because of a mismatch in sociocultural expectations. This highlights the need for cultural pluralism in assessment design, i.e., developing culturally inclusive tasks that reflect various regional norms. For instance, a pragmatic task assessing leave-taking in Gulf Arabic may differ significantly from one in Moroccan Arabic, not only in linguistic form but also in duration, politeness strategies, and expected rituals (Habash & Saleh, 2021). Emerging frameworks now emphasize ICC-informed rubrics that incorporate both language use and cultural appropriateness as scoring criteria (Mahfoudhi & Alasmari, 2023; Harb & Atoum, 2023). Such rubrics allow raters to evaluate learners’ ability to use language sensitively in culturally loaded situations, enhancing the validity and relevance of the assessment.

The role of authentic materials in assessing sociocultural competence

Another growing area in sociocultural competence assessment is the use of authentic materials, such as social media posts, video clips, and conversational transcripts. These materials help situate learners in real communicative contexts where meaning is shaped by both linguistic input and cultural cues (Taha & Farghaly, 2022). For instance, interpreting a WhatsApp message in colloquial Arabic may require understanding ellipsis, emojis, politeness markers, and tone, all of which are pragmatically and culturally charged.

However, incorporating such materials into assessments presents both opportunities and challenges. On the one hand, authenticity improves ecological validity, the degree to which a test mirrors real-world tasks. On the other hand, it demands high levels of test development expertise, including calibration of difficulty, dialect management, and bias control (Zaki & Alanzi, 2024). There are also concerns about accessibility and fairness: learners from different backgrounds may have varying levels of exposure to informal Arabic genres, leading to construct-irrelevant variance (Alghamdi & Al-Saqqaf, 2021). To address this, scholars recommend using tiered tasks that offer learners the option to choose topics or registers they are more familiar with, thereby preserving validity without sacrificing fairness (Shaw, 2022a).

Assessment implications for heritage and bilingual learners

The assessment of sociocultural competence must also account for the growing number of heritage and bilingual Arabic learners, particularly in North America, Europe, and Southeast Asia. These learners often possess high oral fluency in dialects but limited proficiency in MSA or formal registers. As Harb and Atoum (2023) note, applying traditional MSA-focused assessments to such learners leads to profiling bias, in which their actual communicative competence is undervalued. Moreover, their sociocultural competence, developed through family, community, and media exposure, is rarely captured in formal assessments. A socioculturally responsive AFL assessment must therefore differentiate between linguistic competence and cultural familiarity, and design tasks that reflect the diverse trajectories of learners. Some scholars propose dynamic assessment, a model in which feedback and mediation are built into the test to capture learners’ potential rather than only their static performance (Lantolf et al., 2020). This is particularly beneficial in evaluating sociocultural competence, which is often implicit and develops through guided interaction.
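Dynamic assessment can be operationalised with graduated prompts: when a learner answers incorrectly, increasingly explicit mediation prompts are offered in sequence, and credit decreases with each prompt used. The Python sketch below shows one such scoring scheme, assuming up to three prompts per item; the weighting and the function name are illustrative assumptions, not a procedure prescribed by Lantolf et al. (2020). Reporting both an unmediated ("actual") score and a mediated score is what lets a report separate independent performance from responsiveness to mediation.

```python
def mediated_scores(prompts_needed, max_prompts=3):
    """Score a dynamic assessment that uses graduated mediation prompts.

    prompts_needed: one entry per item giving how many prompts the learner
    needed before answering correctly (0 = unaided), or None if the item
    was still answered incorrectly after all prompts.
    Returns (actual, mediated):
      actual   - number of items answered correctly without any prompt;
      mediated - max_prompts + 1 points for an unaided answer, one point
                 fewer for each prompt used, and 0 for an unsolved item.
    """
    actual = 0
    mediated = 0
    for used in prompts_needed:
        if used is None:
            continue  # unsolved after full mediation earns no credit
        if not 0 <= used <= max_prompts:
            raise ValueError(f"prompt count out of range: {used}")
        if used == 0:
            actual += 1
        mediated += max_prompts + 1 - used
    return actual, mediated


# Hypothetical learner on four pragmatics items (e.g. refusals, apologies).
print(mediated_scores([0, 2, None, 1]))  # -> (1, 9)
```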

General vs specialised proficiency

A key expansion in this revision is the parallel focus on general-purpose and specialised/academic proficiency. In many contexts, learners study Arabic for specific fields (e.g. Islamic studies, Middle Eastern business, media). These domains require distinct vocabulary, registers, and genres. Language testing research (e.g. in English for Specific Purposes) shows that general proficiency frameworks must be supplemented by domain-related criteria for specialised purposes. Accordingly, our framework proposes dual standards: one set for broad communicative proficiency and another set for specialised contexts. For example, an Arabic for Medicine test at the B2 level would include tasks on technical medical vocabulary and interpreting clinical documents, in addition to general reading/listening tasks. As Al-Hamly and Milson explicitly note, there is an “urgent need for tests that target Arabic for specific purposes”. Our model addresses this by making the general criteria adaptable to specialised content, ensuring that both language skills and domain knowledge are assessed where relevant.

Methodology

This framework was developed using a descriptive-analytical methodology grounded in best practices from the language testing field. The process entailed a comprehensive and critical review of existing literature on second language assessment, with a particular focus on Arabic as a Foreign Language (AFL). Key sources included the CEFR, the ACTFL Proficiency Guidelines, and the AERA/APA/NCME Standards for Educational and Psychological Testing. By extracting theoretical and operational principles from these sources, the study synthesised them into a context-sensitive framework suitable for Arabic. The iterative development cycle involved mapping test specifications to actual learner challenges in AFL, particularly those posed by diglossia, morphological complexity, and sociocultural variation. No empirical field testing was undertaken at this stage; however, the resulting framework represents a theoretically robust foundation that is adaptable for empirical validation in future studies (Bachman & Palmer, 2021; Alharbi, 2022).

Proposed Proficiency Framework

Test Design Criteria

The test design criteria in this framework are grounded in contemporary testing theory, aiming to maximise construct validity, authenticity, fairness, and reliability. Content and task validity are prioritised by ensuring that test specifications closely reflect the constructs they intend to measure. This includes the representation of MSA and dialectal Arabic, the integration of sociocultural content, and alignment with communicative goals as emphasised in ACTFL’s 5 Cs (Communication, Cultures, Connections, Comparisons, Communities). Authenticity is operationalised by using real-world materials such as news broadcasts, informal conversations, and academic texts, which better reflect communicative demands encountered by learners (Shaw, 2022a). The framework addresses the often-overlooked challenge of Arabic diglossia by embedding tasks that separately target MSA and dialectal varieties. Listening tasks, for example, may combine a formal MSA lecture with an informal dialogue in a Levantine or Gulf dialect, supported by glossaries to scaffold comprehension (Mahmoud, 2023b). Writing and speaking tasks are similarly designed to elicit both formal and informal registers, acknowledging that Arabic proficiency is multi-layered and context-dependent.

Interactive tasks are encouraged to assess integrated skills, such as oral interviews that combine listening and speaking or essay writing based on reading tasks. This reflects recent literature emphasizing the importance of integrated skill assessment to capture learners’ real-world communicative competence (Koizumi & In’nami, 2020). Bias minimisation is embedded through a multi-stage validation process, including expert review panels comprising educators from diverse Arabic-speaking regions. This helps identify regional, cultural, or gender-based bias in items. Differential item functioning (DIF) analysis is recommended for quantitative bias detection during pilot testing. These practices are consistent with ethical principles outlined by the AERA/APA/NCME (2014) and recent discussions on equity in language testing (McNamara, 2021). Transparency and accessibility are reinforced through open documentation. Test blueprints, rubrics, and scoring guidelines should be publicly available to support accountability and informed use by educators, students, and institutions (Kunnan et al., 2022).

Proficiency Standards and Descriptors

The core of the proposed AFL framework consists of calibrated descriptors organised by skill (Listening, Speaking, Reading, Writing), Arabic variety (MSA and dialect), and sociocultural competence. These descriptors align with international benchmarks like CEFR and ACTFL, yet they are customised to address AFL-specific issues such as diglossia, script complexity, and culturally bound pragmatics.

At the B1 level (intermediate), a Listening descriptor might state: “Can understand the main ideas of clear standard speech on familiar matters regularly encountered in school, work, and leisure. Can follow the gist of dialectal exchanges in familiar contexts.” For Speaking: “Can initiate and sustain conversations on topics of personal interest using MSA, and can use basic dialectal expressions to negotiate everyday situations.” Cultural competence descriptors are added to assess learners’ ability to interpret and respond to culturally embedded content. For example, at B2 level: “Can interpret culturally specific proverbs and idiomatic expressions; can respond appropriately to greetings and refusals in culturally congruent ways.” This addresses critiques in recent literature that AFL assessments often marginalise the pragmatic dimension of language use (Mahfoudhi & Alasmari, 2023).

Specialised descriptors are also proposed for domain-specific proficiency. For instance, a Reading descriptor for religious studies at B2 might state: “Understands classical texts and modern interpretations in Qur’anic Arabic; distinguishes between theological terms in various contexts.” Similarly, a Writing descriptor for academic settings could read: “Produces structured essays in formal MSA with appropriate citation and argumentation strategies.” These additions address the emerging need for Arabic for Specific Purposes (ASP) testing in academic and professional domains (Zoghbor & Hussein, 2022).

Psychometric and Ethical Considerations

To ensure psychometric integrity, the proposed AFL assessment framework includes a detailed validation and quality control protocol. Reliability estimates (such as Cronbach’s alpha for internal consistency and inter-rater reliability coefficients for speaking and writing) must be calculated and reported. Test developers should also conduct validation studies across multiple dimensions: content (via expert review), construct (via factor analysis or Rasch modelling), and criterion-related (via correlation with external benchmarks like university performance or other validated tests).
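The reliability estimates named above can be computed directly from score data. The following Python sketch implements Cronbach's alpha, alpha = k/(k-1) × (1 - Σσ²_i / σ²_X) for k items, and unweighted Cohen's kappa, kappa = (p_o - p_e)/(1 - p_e), for two raters assigning proficiency bands to the same speaking or writing performances. It is an illustrative sketch only: the function names and sample data are hypothetical, and for ordered bands a weighted kappa or an intraclass correlation may be the more appropriate coefficient.

```python
from collections import Counter
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per test-taker, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' band assignments."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of performances")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical data: four learners on three items; two raters banding four samples.
items = [[2, 3, 3], [1, 1, 2], [3, 3, 3], [0, 1, 1]]
print(f"alpha = {cronbach_alpha(items):.2f}")
print(f"kappa = {cohen_kappa(['B1', 'B1', 'B2', 'B2'], ['B1', 'B2', 'B2', 'B2']):.2f}")
```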

Fairness and ethical transparency remain central. Consistent with the Standards for Educational and Psychological Testing (AERA/APA/NCME, 2014), accommodations must be offered to test-takers with documented needs, and clear policies should govern the use and interpretation of scores. Test-takers should receive reports that show performance by skill and level, with explanatory notes to aid interpretation. Recent literature has emphasised the importance of ethical score use and transparency in high-stakes testing (Green, 2020; Pill & Harding, 2022). Accordingly, this framework mandates public access to test design documentation, released sample items, and guidelines for preparation and remediation. These practices contribute to more equitable and accountable language testing ecosystems, especially in multilingual and multicultural AFL contexts.

In conclusion, the proposed AFL assessment framework integrates theoretical, sociolinguistic, and psychometric insights into a comprehensive structure tailored to Arabic learners. It responds to longstanding challenges in AFL testing, such as diglossia, cultural variation, and the underassessment of dialectal skills, while embracing international standards for fairness, validity, and transparency. This positions it as a credible and adaptable model for future AFL assessment development.
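The skill-by-level score reports called for under fairness and transparency could take a simple form such as the following sketch. The 0–100 reporting scale, the cut scores, and the note text are hypothetical placeholders; in practice, cut scores would come from a documented standard-setting study aligned with the CEFR levels.

```python
from bisect import bisect_right

LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")
# Hypothetical lower cut scores for A2, B1, B2, C1 and C2 on a 0-100 scale.
# Operational cut scores would come from a documented standard-setting study.
CUT_SCORES = (20, 40, 60, 75, 90)


def level_for(score):
    """Map a 0-100 scale score to a level using the hypothetical cut scores."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return LEVELS[bisect_right(CUT_SCORES, score)]


def score_report(skill_scores, notes=None):
    """Build one plain-text report line per skill, with optional explanatory notes."""
    notes = notes or {}
    lines = []
    for skill, score in skill_scores.items():
        line = f"{skill}: {score}/100 -> {level_for(score)}"
        if skill in notes:
            line += f" ({notes[skill]})"
        lines.append(line)
    return "\n".join(lines)


print(score_report(
    {"Listening": 62, "Speaking": 48, "Dialect comprehension": 35},
    notes={"Dialect comprehension": "score based on one dialect only"},
))
```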

Example framework structure

Practically, the framework might be implemented as interlinked tables. Table A would list Design Standards (e.g. “Text selection: authentic, context-appropriate MSA or dialect passages”) with explanatory notes. Table B would present Skill-Level Descriptors by proficiency level. A brief illustrative excerpt of Table B might be:

• Listening A2: Understands simple factual information in standard announcements and short dialogues. Grasps key words in basic dialectal speech (e.g. introductions, simple requests) with occasional support.
• Listening B2: Understands extended speech and lectures on familiar topics in MSA; follows most content of conversations or media in two major dialects.
• Speaking A2: Can ask and answer simple questions on personal needs; mainly uses the present tense and a limited range of formulaic utterances in a familiar dialect.
• Speaking B2: Can converse fluently with native speakers on a range of topics; adapts language formality appropriately, including use of common dialect expressions; handles unexpected complications in dialogue.

Additional columns would cover Reading and Writing similarly, plus separate rows for Dialect comprehension and Cultural knowledge at each level (e.g. “Recognises basic social conventions and greetings; identifies the meaning of common proverbs in context” at a mid-level). This layout ensures clarity: test developers and educators can see exactly which skills and contexts are expected at each stage, bridging general proficiency and specific outcomes.
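One way to keep Table B machine-readable, so that developers can check which cells still lack descriptors, is to store each descriptor as a record. The sketch below assumes a simple skill × level layout and reuses the illustrative descriptors listed above; the class and function names are hypothetical, and only four cells are populated.

```python
from dataclasses import dataclass

CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


@dataclass(frozen=True)
class Descriptor:
    skill: str  # e.g. "Listening", "Speaking", "Dialect comprehension"
    level: str  # one of CEFR_LEVELS
    text: str   # can-do statement shown to developers and educators


# Excerpt of Table B, using the illustrative descriptors given in the text.
TABLE_B = [
    Descriptor("Listening", "A2",
               "Understands simple factual information in standard announcements "
               "and short dialogues. Grasps key words in basic dialectal speech "
               "(e.g. introductions, simple requests) with occasional support."),
    Descriptor("Listening", "B2",
               "Understands extended speech and lectures on familiar topics in MSA; "
               "follows most content of conversations or media in two major dialects."),
    Descriptor("Speaking", "A2",
               "Can ask and answer simple questions on personal needs; mainly uses "
               "the present tense and a limited range of formulaic utterances in a "
               "familiar dialect."),
    Descriptor("Speaking", "B2",
               "Can converse fluently with native speakers on a range of topics; "
               "adapts language formality appropriately, including use of common "
               "dialect expressions; handles unexpected complications in dialogue."),
]


def descriptors_at(level, table=TABLE_B):
    """Return {skill: descriptor text} for one proficiency level."""
    if level not in CEFR_LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return {d.skill: d.text for d in table if d.level == level}


def missing_cells(skills, levels, table=TABLE_B):
    """List (skill, level) cells that still lack a descriptor."""
    covered = {(d.skill, d.level) for d in table}
    return [(s, lv) for s in skills for lv in levels if (s, lv) not in covered]


skills = ["Listening", "Speaking", "Reading", "Writing",
          "Dialect comprehension", "Cultural knowledge"]
print(descriptors_at("B2")["Speaking"])
print(missing_cells(skills, ["A2", "B2"]))
```

A completeness check of this kind mirrors the stated goal of the layout: every skill and context expected at each stage should have an explicit descriptor before items are written against it.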

Discussion

This revised framework directly addresses the gaps noted in the literature. By explicitly including dialect and cultural components, it aligns assessment with real communication demands. As reviewed, prior tests often treated Arabic as a monolithic MSA code, but actual use requires fluency in both formal and informal registers. Incorporating dialect competence at each level ensures that learners are evaluated on skills they will need in practical settings. Similarly, embedding cultural pragmatics responds to critiques of MSA-centric tests: by including idioms, proverbs, and social scenarios, the framework makes assessments more authentic and meaningful.

Methodologically, tying each design choice to established standards (CEFR, ACTFL, AERA/APA/NCME) enhances validity. The literature shows that few existing Arabic tests document such rigour, so our emphasis on evidence-based development and transparency is a key advancement. In practical terms, implementing these standards will require collaboration among linguists, educators, and psychometricians. The framework provides a clear template to guide this process: for instance, bias panels and pilot analyses are made explicit steps. Over time, as tests are built and normed following this model, we expect that score interpretations will become more defensible.
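Bias panels are often complemented by statistical screening of pilot data for differential item functioning (DIF). The framework does not prescribe a method; the sketch below assumes the Mantel-Haenszel procedure, one widely used option, with test-takers matched on total score and compared across a reference and a focal group (for example, learners with and without exposure to a particular dialect). Data and helper names are hypothetical, and the magnitude labels follow the conventional ETS thresholds of 1.0 and 1.5 on the delta scale while omitting the significance tests that the full ETS classification also requires.

```python
import math
from collections import defaultdict


def mh_d_dif(responses):
    """Mantel-Haenszel D-DIF for one dichotomous item.

    responses: iterable of (group, matching_score, correct) tuples, where group is
    "ref" or "focal", matching_score is the total score used to form strata, and
    correct is 1 or 0 for this item. Negative values favour the reference group.
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in responses:
        if group == "ref":
            strata[score][0 if correct else 1] += 1
        elif group == "focal":
            strata[score][2 if correct else 3] += 1
        else:
            raise ValueError(f"unknown group: {group!r}")
    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if numerator == 0 or denominator == 0:
        raise ValueError("common odds ratio is undefined for these data")
    alpha_mh = numerator / denominator  # Mantel-Haenszel common odds ratio
    return -2.35 * math.log(alpha_mh)   # ETS delta-scale transformation


def dif_magnitude(d_dif):
    """Magnitude-only flag; the full ETS A/B/C rules also need significance tests."""
    size = abs(d_dif)
    if size < 1.0:
        return "negligible"
    if size < 1.5:
        return "moderate"
    return "large"


def stratum(score, ref_right, ref_wrong, focal_right, focal_wrong):
    """Hypothetical helper that expands cell counts into response tuples."""
    return ([("ref", score, 1)] * ref_right + [("ref", score, 0)] * ref_wrong
            + [("focal", score, 1)] * focal_right + [("focal", score, 0)] * focal_wrong)


pilot = stratum(10, 3, 1, 1, 3) + stratum(15, 4, 1, 2, 3)
d = mh_d_dif(pilot)
print(round(d, 2), dif_magnitude(d))
```

Flagged items would typically return to the bias panel for content review rather than being dropped automatically, since a statistical flag alone does not establish that an item is unfair.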

One limitation is that the framework itself has not yet been field-tested as an integrated whole. Future research should involve piloting an assessment designed from these standards, then analysing its reliability, validity, and washback. Collaboration with domain specialists (e.g. legal scholars, business experts) will also be needed to refine specialised descriptors. Finally, advances in technology (computer-adaptive testing, automated scoring) could be incorporated under this framework to improve efficiency; this remains an area for future exploration.
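As an illustration of how computer-adaptive delivery might sit on top of the Rasch modelling mentioned among the validation methods, the sketch below assumes a calibrated Rasch item bank, selects each next item by maximum Fisher information at the current ability estimate, and updates the estimate by expected a posteriori (EAP) estimation with a standard normal prior. Item identifiers and difficulties are hypothetical; an operational CAT would also need exposure control, content balancing across MSA and dialect tasks, and a stopping rule.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item: P * (1 - P), largest when theta == b."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)


def next_item(theta, bank, administered):
    """Pick the unadministered item with maximum information at the current estimate."""
    remaining = {item: b for item, b in bank.items() if item not in administered}
    if not remaining:
        return None
    return max(remaining, key=lambda item: item_information(theta, remaining[item]))


def eap_theta(responses, bank, lo=-4.0, hi=4.0, points=81):
    """EAP ability estimate on a grid with a standard normal prior.

    responses: dict mapping item id -> 1 (correct) or 0 (incorrect).
    """
    step = (hi - lo) / (points - 1)
    weighted_sum = total_weight = 0.0
    for k in range(points):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for item, correct in responses.items():
            p = rasch_p(theta, bank[item])
            weight *= p if correct else 1.0 - p  # multiply in the likelihood
        weighted_sum += theta * weight
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical item bank: item ids and difficulties are illustrative only.
bank = {"msa_01": -1.0, "msa_02": 0.0, "dia_01": 0.8, "dia_02": 1.6}
responses = {}
theta = 0.0
for answer in (1, 1, 0):  # simulated responses from one test-taker
    item = next_item(theta, bank, responses)
    responses[item] = answer
    theta = eap_theta(responses, bank)
    print(item, answer, round(theta, 2))
```

EAP is used here rather than maximum likelihood because it stays finite when a test-taker answers every item correctly or every item incorrectly, which is common early in an adaptive test.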

Conclusion

In summary, this paper proposes an enhanced framework for Arabic language proficiency testing that responds directly to both theoretical expectations and practical needs. It expands the scope to cover not only general communicative ability but also specialised academic and professional proficiency. It integrates sociolinguistic realities (diglossia, dialect diversity), cognitive factors, and cultural competence into a coherent set of test-design standards. It also foregrounds realism, fairness, and rigorous validation at every stage. By systematically synthesising these elements, the framework aims to guide the development of valid, reliable, and equitable Arabic proficiency tests. We believe this model will improve the comparability and transparency of AFL assessments across diverse educational and occupational settings, ultimately benefiting learners and stakeholders alike.

Data availability statement

All data generated and analysed in this study are openly available and may be freely used, shared, and reproduced without restriction. The dataset does not contain any sensitive, confidential, or personally identifiable information and therefore raises no ethical or privacy concerns. All materials supporting the findings of this research are available in openly accessible sources under open licences and may be used by other researchers without restriction. The data are therefore drawn from various publicly available sources with no access restrictions.

References

ACTFL: World-Readiness Standards for Learning Languages. American Council on the Teaching of Foreign Languages; Revised ed. 2022.
Albirini A: Modern Arabic Sociolinguistics: Diglossia, Variation, Codeswitching, Attitudes and Identity. Routledge; 2020.
Alghamdi A, Al-Saqqaf A: Reframing assessment in Arabic foreign language instruction: A Saudi case study. Language Testing in Asia. 2021; 11(1): 1–17.
Al-Hamly M, Milson A: Assessing teachers’ assessment literacy: A study of EFL teachers in Kuwait. Paper presented at the 33rd Language Testing Research Colloquium (LTRC). Chicago, IL; 2011.
Alharbi AO: Issues with Communicative Language Teaching Implementation in Saudi Arabia Concerning the Government Policy, Teachers, and Students: Two Decades of Research. Arab World English Journal. 2022; 13(2): 412–423.
Al-Jarf R: Online automated writing assessment tools in AFL learning: Potential and challenges. Arab World English Journal (AWEJ). Special Issue on CALL. 2021; 6: 114–127.
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME): Standards for educational and psychological testing. Washington, DC: American Educational Research Association; 2014.
Bachman LF, Palmer AS: Language assessment in practice: Developing language assessments and justifying their use in the real world. 2nd ed. Oxford: Oxford University Press; 2021.
Bensalah M: Assessing Arabic Proficiency: A Review of Standardized Practices in European Universities. Journal of Language and Linguistic Studies. 2022; 18(4): 210–224.
Brown HD, Abeywickrama P: Language Assessment: Principles and Classroom Practices. 2nd ed. Pearson Education; 2019.
Byram M: Teaching and assessing intercultural communicative competence: Revisited. 2nd ed. Multilingual Matters; 2021.
Council of Europe: Common European Framework of Reference for Languages: Learning, teaching, assessment – Companion volume. Strasbourg: Council of Europe; 2020.
Council of Europe: Common European Framework of Reference for Languages: Companion Volume with New Descriptors. Council of Europe Publishing; 2020.
Green A: Washback in Language Assessment. Shohamy E, Or I, May S, editors. Language testing and assessment: Encyclopedia of language and education. 3rd ed. Cham, Switzerland: Springer Nature; 2020; pp. 559–372.
Habash N, Saleh S: Dialectal Arabic in language education: Needs and assessment strategies. Computational Approaches to Arabic Dialects. Springer; 2021.
Harb H, Atoum M: Dynamic assessment for heritage Arabic learners: An exploratory study. Studies in Second Language Acquisition. 2023; 45(3): 571–595.
Kane M: Validating the interpretations and uses of test scores. J. Educ. Meas. 2013; 50(1): 1–73.
Koizumi R, In’nami Y: Structural Equation Modeling of Vocabulary Size and Depth Using Conventional and Bayesian Methods. Frontiers in Language Assessment and Testing, Psychology of Language. 2020; 11.
Kunnan AJ, Qin CY, Zhao CG: Developing a Scenario-Based English Language Assessment in an Asian University. Language Assessment Quarterly. 2022; 19(4): 368–393.
Lantolf JP, Poehner ME, Thorne SL: Sociocultural theory and L2 development. VanPatten B, Keating G, Wulff S, editors. Theories in Second Language Acquisition. 3rd ed. Routledge, London; 2020; pp. 223–247.
Mahfoudhi A, Alasmari A: Intercultural communicative competence in Arabic language testing: A framework for inclusion. Language and Intercultural Communication. 2023; 23(2): 179–196.
Mahmoud A: Pragmatic competence in Arabic as a foreign language: Trends in assessment. Arab Journal of Applied Linguistics. 2023a; 9(1): 44–66.
Mahmoud HA: Psychometric approaches to test development in Arabic proficiency assessment: Challenges and opportunities. Language Assessment Quarterly. 2023b; 20(1): 22–43.
Hu R: Language and Subjectivity (McNamara, 2021). Applied Linguistics. 2021; 44(5): 930–933.
Messick S: Validity and washback in language testing. Language Testing. 1996; 13(3): 241–256.
Pill J, Harding L: Assessing communicative competence. 2022.
Purpura J: Second and foreign language assessment. Hall G, editor. The Routledge Handbook of English Language Teaching. Routledge; 2016; pp. 303–316.
Ryding KC: Morphological awareness in Arabic learners: A cognitive challenge. Second Language Research. 2021; 37(2): 175–193.
Shaw A: Scenario-based Assessment in Arabic Language Programs: Opportunities and Pitfalls. International Journal of Language Testing. 2022a; 12(2): 45–67.
Shaw I: Addressing dialect variation in Arabic language testing: A review of practices and proposals. International Journal of Language and Linguistics. 2022b; 9(4): 33–47.
Taguchi N, Roever C, editors: Second language pragmatics. Oxford University Press; 2017.
Taha T, Farghaly A: Arabic NLP for educational applications: Developing tools for AFL assessment. Journal of Arabic Linguistics and Corpus Studies. 2022; 3(1): 56–74.
Zaki H, Alanzi F: Toward a CEFR-aligned AFL Assessment Model: Integrating MSA and Dialectal Proficiency. Language Assessment Quarterly. 2024; 21(1): 33–52.
Zoghbor W, Hussein RF: Revisiting the construct of Arabic language proficiency: Implications for assessment in diglossic contexts. Language Testing. 2022; 39(3): 475–498.

Competing interests

No competing interests were disclosed.

Grant information

The author(s) declared that no grants were involved in supporting this work.

Copyright

© 2026 Alharbi BH and Sa'ad AA. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite this article

Alharbi BH and Sa'ad AA. Designing a Unified Model for Measuring Arabic Language Competence among Non-Native Learners [version 1; peer review: 2 approved with reservations]. F1000Research 2026, 15:325 (https://doi.org/10.12688/f1000research.173332.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 04 May 2026

Everhard Markiano Solissa, Universitas Pattimura, Pattimura, Indonesia

Approved with Reservations

https://doi.org/10.5256/f1000research.191138.r474394

No
Question
Suggested Revisions

1. Is the work clearly and accurately presented and does it cite the current literature?
Standardize citations according to APA style, avoid duplicate references, and strengthen claims by adding more specific empirical references or supporting quantitative data

2. Is the study design appropriate and is the work technically sound?
Add operational elements such as an instrument blueprint, sample test items, and scoring rubrics, and direct the study toward empirical testing (e.g., through a pilot study).

3. Are sufficient details of methods and analysis provided to allow replication by others?
Use a systematic review or scoping review approach, and explicitly describe the stages of literature analysis to ensure methodological replicability.

4. If applicable, is the statistical analysis and its interpretation appropriate?
-

5. Are all the source data underlying the results available to ensure full reproducibility?
Revise the data availability section to clarify that the study is literature-based and does not produce primary data.

6. Are the conclusions drawn adequately supported by the results?
The results of this study are very good so they can be used as a reference in learning Arabic. So, this study continued to the next stage with minor revisions.

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

Yes
Are all the source data underlying the results available to ensure full reproducibility?

Partly
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 23 Apr 2026

Refa Lina Tiawati, Universitas PGRI Sumatera Barat, Jakarta, Indonesia

Approved with Reservations

https://doi.org/10.5256/f1000research.191138.r474402

This article presents an innovative conceptual framework for Arabic language assessment, effectively integrating international standards with Arabic-specific linguistic features. However, major revisions are needed due to the lack of empirical validation. The authors should provide a validation protocol, operational guidelines, a fully exemplified test instrument, and adjust the conclusions to reflect the theoretical status of the framework. Empirical validation is also lacking, specifically in the "Future Validation Protocol" section.
Operationalize dual-track assessments, provide illustrative scoring rubrics or decision trees by mapping CEFR-like descriptors to dialect tasks, and address mixed-register responses.
Depth of the psychometric protocol. Expand "Psychometric Considerations" with: minimum reliability coefficients, recommended factor analysis approaches, criteria for flagging biased items, and references to open-source tools.
Final recommendation: Accept pending comprehensive substantive revision.

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

Not applicable
Are all the source data underlying the results available to ensure full reproducibility?

No source data required
Are the conclusions drawn adequately supported by the results?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Expert in Applied Linguistics and Language Diplomacy through BIPA.

As a lecturer and practitioner of Indonesian for Foreign Speakers (BIPA), Dr. Refa Lina Tiawati R. is a strategic figure at the Center for Discourse, Resilience, and Vitality Studies (PSDKV) at the University of PGRI West Sumatra. Her expertise in Applied Linguistics and her extensive track record as a resource person, writer, and instructor in various BIPA forums make her a pioneer in strengthening the role of Indonesian as an instrument for cultural preservation, revitalization, and diplomacy.

Dr. Refa is actively developing contextual BIPA learning models based on global needs. Her experience as a speaker at various activities organized by APPBIPA, Language Centers, and international forums reinforces her vision to make Indonesian language learning more adaptive, inclusive, and internationally standardized. At PSDKV, she is also involved in developing teaching materials, thematic modules, and developing technology-based teaching strategies to reach learners across countries.

Her books, such as BIPA Learning Strategy, BIPA Teaching Practice Module, and Sociolinguistics, demonstrate her contribution to language revitalization through an applied academic approach. She focuses not only on learning but also on strengthening Indonesian cultural identity through language, as well as the development of digital technology in BIPA teaching. With a global vision and strong practical expertise, Dr. Refa Lina Tiawati is a key driver in the PSDKV program, which bridges local cultural preservation with future digital linguistic innovations.

BIPA; Cross-Cultural Understanding; Sociolinguistic; Pragmatic

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
