Research Article

Designing a Unified Model for Measuring Arabic Language Competence among Non-Native Learners

[version 1; peer review: 2 approved with reservations]
PUBLISHED 26 Feb 2026

Abstract

This article proposes a revised and expanded framework for the development of standardised Arabic language proficiency assessments, with a particular emphasis on addressing the unique linguistic, cognitive, and cultural complexities of Arabic as a Foreign Language (AFL). The framework is designed to serve both general-purpose language learning objectives and domain-specific applications, such as academic, diplomatic, or professional use. Drawing on established international benchmarks, including the Common European Framework of Reference for Languages (CEFR) and the American Council on the Teaching of Foreign Languages (ACTFL) guidelines, alongside specialised Arabic proficiency instruments such as the Arabic Language Proficiency Test (ALPT), the Certificat International de Maîtrise en Arabe (CIMA), and the Test of Arabic as a Foreign Language (TAFL), this study synthesises best practices in language testing with insights from applied linguistics and psychometrics. A central innovation of this framework is its incorporation of Arabic’s sociolinguistic realities: it addresses the persistent challenge of diglossia, that is, the coexistence of Modern Standard Arabic (MSA) and diverse regional dialects, and it integrates dialectal comprehension and production tasks to reflect authentic communicative demands. The framework also emphasises cognitive processing strategies relevant to Arabic’s morphosyntactic complexity, recognising that effective assessment must go beyond grammar and vocabulary to capture learners’ ability to infer, interpret, and communicate meaning in context. Furthermore, the revised model places significant weight on sociocultural and pragmatic competence, ensuring that assessments include idiomatic expressions, speech acts, and culturally embedded language reflective of real-life interaction.
Realism in task design, fairness across learner backgrounds, and transparency in scoring are addressed through the adoption of evidence-based principles from the Standards for Educational and Psychological Testing (AERA/APA/NCME). The study also advocates robust validation procedures, including construct alignment, item response analysis, and intercultural bias mitigation. By holistically integrating linguistic, cognitive, and cultural dimensions, this framework aims to guide the development of more authentic, valid, and equitable proficiency assessments for AFL learners across diverse contexts. The proposed model contributes to the growing body of research on Arabic language pedagogy and offers practical implications for test developers, curriculum designers, and policymakers engaged in Arabic language education worldwide.

Keywords

Arabic proficiency; language assessment; diglossia; cultural competence; test design; psychometrics.

Introduction

Recent years have witnessed a growing emphasis on assessment as a tool for aligning instruction with proficiency-based goals in the teaching of Arabic as a Foreign Language (AFL). This shift reflects broader trends in language education, where assessments are used not merely for grading but also as diagnostic tools to inform instruction and curriculum development (Alghamdi & Al-Saqqaf, 2021). In AFL contexts, proficiency-based models have gained traction, particularly those that emphasize communicative competence across multiple modalities. Yet current practices in Arabic assessment often lag behind innovations seen in European and East Asian language instruction, where assessment is more tightly integrated into learner-centered pedagogies (Bensalah, 2022). The challenge in Arabic lies in its structural features (root-based morphology, non-linear orthography, and diglossic variation), which complicate the creation of standardized, scalable testing instruments.

A persistent challenge in AFL assessment is Arabic’s diglossia, in which learners must navigate Modern Standard Arabic (MSA) for formal contexts and colloquial dialects for everyday communication (Albirini, 2020). Traditional proficiency tests such as the ALPT, the OPI (Oral Proficiency Interview), and institutional placement exams tend to focus exclusively on MSA, thus failing to reflect the full communicative realities faced by learners in immersive environments. Recent research underscores the importance of integrating dialectal varieties into assessment tools to support learners’ sociolinguistic competence (Habash & Saleh, 2021). Some programs have begun to include dialectal components in oral assessments, particularly in advanced-level testing, yet standardized and validated rubrics for evaluating dialectal proficiency are still lacking.

With the rise of digital learning and remote assessment after COVID-19, there has been a surge of interest in computer-based and AI-driven Arabic language assessments. Platforms like Duolingo and Memrise have introduced Arabic learning modules, and more recently, AI-powered assessment tools are being piloted for automated speaking and writing evaluation (Al-Jarf, 2021). However, such tools often rely on MSA corpora and are not trained to recognize dialectal variation or pragmatic nuances in learners’ language use. Scholars have highlighted the need for Arabic-specific natural language processing (NLP) tools to improve the validity of automated assessments (Taha & Farghaly, 2022). Additionally, alignment with CEFR and ACTFL scales remains weak in most digital platforms currently available.

Another dimension increasingly recognized in AFL assessment is the role of culture. Recent studies emphasize that effective communication in Arabic requires not only linguistic proficiency but also intercultural awareness: understanding norms, gestures, politeness strategies, and religious expressions embedded in Arabic discourse (Mahfoudhi & Alasmari, 2023). Traditional tests have largely neglected these aspects, prompting scholars to advocate for more authentic, performance-based assessments that simulate real-life interactions in Arab cultural settings. For instance, scenario-based assessments that include role-play or interpretive tasks are now being piloted in some AFL programs in Europe and the Gulf region, offering promising models for holistic evaluation (Shaw, 2022a).

Given these challenges and developments, researchers are calling for a unified AFL assessment framework that adapts global standards (such as CEFR and ACTFL) while integrating Arabic-specific linguistic and cultural variables. This includes developing calibrated rating scales that differentiate between MSA and dialect proficiency, validating test items with diverse learner populations, and ensuring fairness for learners from different language backgrounds (Zaki & Alanzi, 2024). A key recommendation is the use of can-do descriptors aligned to the CEFR and adapted to reflect both formal and informal Arabic use. There is also increasing interest in dynamic assessment models that evaluate learner potential rather than static knowledge, an approach particularly relevant for heritage and bilingual learners (Harb & Atoum, 2023).

Literature review

Existing Arabic proficiency tests and their limitations

In recent decades, a number of standardised assessments for AFL have emerged. For example, the Arabic Language Proficiency Test (ALPT), developed by the Arabic Academy and endorsed by the Islamic Chamber of Commerce, has gained wide recognition; it assesses learners across five sections (Listening, Reading, Structure, Writing, Speaking). Similarly, the Certificat International de Maîtrise en Arabe (CIMA), offered by the Institut du Monde Arabe in Paris, certifies intermediate-level MSA proficiency. Other offerings include online CEFR-based exams and institutional certification programmes. However, none of these is universally adopted, and each has notable drawbacks. A critical issue is that many current AFL tests focus on MSA in artificial tasks and do not fully represent authentic language use. As Al-Hamly and Milson (2011) observe, even high-level listening and speaking items in proficiency guidelines often assume formal language, whereas real-world communication requires dialectal competence. In short, many tests fail to align with Arabic’s sociolinguistic reality.

Another recurring critique is the lack of rigorous validation. Many AFL test manuals provide minimal reliability or validity data. Al-Hamly and Milson (2011) note that testing programmes “do not offer sufficient data about the validation process” and often limit evidence to basic inter-rater checks. Without robust psychometric analysis (e.g. item analyses, generalisability studies), stakeholders cannot be confident in score interpretations or fairness. Concerns about cultural bias have also been raised: if test content assumes familiarity with a particular dialect or set of cultural practices, non-native learners may be disadvantaged. More transparency and fairness review is needed in AFL assessment.

A major limitation of existing Arabic proficiency tests is their narrow focus on Modern Standard Arabic (MSA), often at the expense of communicative realism. While MSA remains the standardized variety used in formal media and education, it is not the primary mode of interaction in daily life across Arab societies (Albirini, 2020). Most proficiency frameworks, including ALPT and CIMA, center on scripted tasks involving academic or literary texts that are disconnected from real-world usage. This has led to increasing calls to include dialects—or at least acknowledge their role—in proficiency measurement (Mahfoudhi & Alasmari, 2023). Learners trained solely in MSA frequently encounter difficulty navigating spoken Arabic in Arabic-speaking countries, a phenomenon referred to as the “MSA bubble” (Habash & Saleh, 2021). Moreover, the lack of dialectal input creates an unrealistic ceiling effect in oral and aural assessment tasks, where native-like fluency is judged by artificially elevated norms of formal correctness.

Another problem is the absence of a unified framework specifically tailored for AFL proficiency. While some institutions loosely align their testing with CEFR or ACTFL guidelines, the adaptation is often superficial. For example, CEFR descriptors such as “Can express opinions in a conversation on abstract topics” are difficult to map onto the diglossic and context-sensitive nature of Arabic (Zaki & Alanzi, 2024). Studies have shown that AFL learners’ performance in listening and speaking varies significantly depending on whether the task involves MSA or a regional dialect, suggesting that current proficiency descriptors do not capture the multidimensional nature of Arabic language use (Bensalah, 2022). As a result, many learners may achieve a high CEFR rating without being able to function effectively in real-life Arabic settings.

Despite their international reach, many Arabic proficiency tests still lack methodological transparency regarding their design, validation, and scoring processes. For instance, while the ALPT outlines general testing areas, there is minimal published data on item difficulty levels, standard-setting procedures, or construct validation (Alghamdi & Al-Saqqaf, 2021). Test designers often provide anecdotal justification for scoring rubrics without empirical support, violating fundamental principles of test reliability and fairness (Brown & Abeywickrama, 2019). In many cases, inter-rater reliability is the only reported statistic, and even that is often based on small, non-representative samples. As Taha and Farghaly (2022) emphasize, the absence of item-level psychometric analysis severely limits the interpretability of test scores and undermines their use in high-stakes contexts such as immigration or university admission.
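To make concrete the kind of item-level evidence this paragraph finds missing, the following Python sketch computes two classical statistics for dichotomously scored items: item difficulty (the proportion of test-takers answering correctly) and corrected item-total discrimination (the correlation between an item and the total score on the remaining items). It is a minimal illustration that assumes 0/1 scoring and a complete response matrix; the function names are hypothetical and the procedure is not taken from any of the tests discussed.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def item_analysis(responses):
    """Classical item statistics for 0/1-scored items.

    responses: one row per test-taker, one 0/1 score per item.
    Returns a (difficulty, discrimination) pair per item, where difficulty
    is the proportion correct and discrimination is the corrected
    item-total correlation (item vs. total score on the other items).
    """
    n_items = len(responses[0])
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        stats.append((mean(item), pearson(item, rest)))
    return stats


if __name__ == "__main__":
    data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
    for j, (p, r) in enumerate(item_analysis(data), start=1):
        print(f"item {j}: difficulty={p:.2f}, discrimination={r:.2f}")
```

Reporting statistics of this kind for each item, alongside inter-rater figures, is one way a test manual could document item quality rather than relying on anecdotal justification.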

Although digital testing platforms are increasingly used, especially in post-pandemic education, their use in Arabic assessment raises significant challenges. AI-based scoring engines, such as those piloted in the ALPT or in experimental Arabic modules on Duolingo and other platforms, have limited sensitivity to Arabic’s morphological and dialectal diversity. For example, automated grammar checkers often misinterpret dialectal expressions or fail to recognize correct but non-standard morphosyntactic constructions (Taha & Farghaly, 2022). This misalignment not only skews assessment accuracy but can also discourage learners who are attempting to use more naturalistic Arabic forms. Furthermore, these systems rarely undergo proper fairness testing across different learner profiles (heritage speakers, bilinguals, and first-time learners), which raises concerns about equity and accessibility (Harb & Atoum, 2023).

Cultural bias is another recurrent concern in Arabic testing. Test prompts and listening materials frequently embed cultural references that may be familiar to learners from certain regions (e.g., Gulf or Levantine contexts) but alien to others (Mahfoudhi & Alasmari, 2023). This can disadvantage learners who have limited exposure to Arab cultures, or whose exposure is limited to particular media portrayals. Studies on intercultural communicative competence have argued that fairness in language testing must include a balance between cultural specificity and inclusivity (Shaw, 2022b). The issue becomes more pronounced in speaking tests, where raters may unconsciously reward culturally “authentic” pronunciation or expressions over more neutral but technically correct MSA usage. This creates a sociolinguistic inequity that has not been sufficiently addressed in most AFL testing models to date.
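One common quantitative check for the cultural bias described above is differential item functioning (DIF) analysis, which the framework recommends for pilot testing. The Python sketch below implements the Mantel-Haenszel procedure: test-takers from a reference group and a focal group (for example, learners with and without extensive Gulf exposure; this grouping is an illustrative assumption) are matched on total score, and a common odds ratio for answering the item correctly is estimated across score levels. The result is also expressed as MH D-DIF = -2.35 ln(alpha) on the ETS delta scale. Significance testing and the ETS A/B/C classification are omitted for brevity, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    records: iterable of (group, total_score, correct), where group is
    "ref" or "focal", total_score is the matching variable, and correct
    is 1 for a correct answer, else 0.
    Returns (alpha_mh, mh_d_dif).
    """
    # Per score level: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        if group not in ("ref", "focal"):
            raise ValueError(f"unknown group: {group!r}")
        offset = 0 if group == "ref" else 2
        strata[score][offset + (0 if correct else 1)] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these data")
    alpha = num / den  # 1.0 means equal odds of success for matched learners
    return alpha, -2.35 * math.log(alpha)
```

An odds ratio near 1 (D-DIF near 0) suggests the item behaves similarly for matched learners in both groups; negative D-DIF values indicate that the item favours the reference group, which would flag it for expert review rather than automatic removal.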

Theoretical foundations: Standards, competence, and cognition

International language proficiency frameworks such as the Common European Framework of Reference for Languages (CEFR) and the ACTFL Proficiency Guidelines serve as essential theoretical anchors for language assessment design. These models prioritize communicative competence as the core outcome of language learning and testing (Council of Europe, 2020; ACTFL, 2022). CEFR’s action-oriented approach and ACTFL’s proficiency scales offer structured descriptors across proficiency levels, ranging from novice to superior, that are critical in defining learning outcomes and aligning assessment tasks. ACTFL’s “5 Cs” model (Communication, Cultures, Connections, Comparisons, and Communities) goes further by embedding cultural literacy as a dimension of language proficiency, reflecting the understanding that communicative ability is inseparable from cultural context (ACTFL, 2022).

Building upon these standards, language assessment theory, such as that outlined by AERA, APA, and NCME (2014), emphasizes the principles of validity, reliability, fairness, and practicality. Validity, in particular, requires that each test score interpretation be supported by evidence derived from test content, design, and statistical analyses (Kane, 2013). Our proposed AFL assessment framework is explicitly designed to align with these principles. Every test specification, whether related to task design, scoring rubrics, or language domain coverage, is justified in relation to the inferences intended from learners’ scores. Construct coverage (i.e., what is being measured), as well as fairness considerations for diverse learners (including heritage speakers, bilinguals, and monolinguals), is therefore embedded from the outset (Zaki & Alanzi, 2024).

Cognitively, Arabic presents unique challenges to learners, particularly due to its complex morphological system, root-and-pattern structure, and relatively deep orthography (Ryding, 2021). These features impose considerable cognitive load, especially on learners unfamiliar with Semitic language patterns. Assessments that rely solely on discrete-point grammar or vocabulary recall fail to capture learners’ processing strategies and real-time language comprehension. Recent psycholinguistic research emphasizes the importance of testing both declarative knowledge (facts and rules) and procedural knowledge (skills such as inferencing, decoding, and self-monitoring in comprehension tasks) in AFL learners (Bensalah, 2022; Taha & Farghaly, 2022).

In response, the framework incorporates tasks that go beyond isolated item types. Reading and listening sections, for example, include inference-based questions where learners must extract implied meanings from contextually rich passages. Such tasks measure learners’ ability to deploy metacognitive and strategic resources, aligning with current cognitive models of second language acquisition and assessment (Purpura, 2016; Alghamdi & Al-Saqqaf, 2021). This dimension is especially critical for Arabic, where lexical ambiguity and morpho-syntactic variation require higher-order comprehension.

From a sociolinguistic perspective, Arabic’s diglossia, the coexistence of Modern Standard Arabic (MSA) and regional dialects, presents a major consideration for assessment validity. As Habash and Saleh (2021) point out, learners may demonstrate high proficiency in MSA but fail to engage effectively in real-world communication where dialects dominate, particularly in listening and speaking. Consequently, our framework adopts a dual-track approach: proficiency is assessed separately for MSA and for one dialect selected by the learner or program (e.g., Egyptian, Levantine, or Gulf Arabic). This innovation reflects the sociolinguistic reality of Arabic use and addresses criticisms of traditional tests that remain artificially MSA-centric (Albirini, 2020; Harb & Atoum, 2023).
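The dual-track approach implies that a learner's results form a profile rather than a single level. The Python sketch below maps scaled scores for each (skill, variety) pair onto CEFR-style levels using cut scores. The 0-100 scale, the cut-score values, and the function names are hypothetical placeholders; in an operational test the cut scores would come from a documented standard-setting study, which the framework does not specify.

```python
from bisect import bisect_right

# Hypothetical 0-100 scale and cut scores, for illustration only.
LEVELS = ["Below A1", "A1", "A2", "B1", "B2", "C1", "C2"]
CUTS = [10, 25, 40, 55, 70, 85]  # minimum score for A1, A2, B1, B2, C1, C2


def level_for(score):
    """Return the highest level whose cut score the score meets or exceeds."""
    return LEVELS[bisect_right(CUTS, score)]


def score_report(scores):
    """scores: dict mapping (skill, variety) to a scaled score."""
    return [
        f"{skill} ({variety}): {score} -> {level_for(score)}"
        for (skill, variety), score in sorted(scores.items())
    ]


if __name__ == "__main__":
    profile = {
        ("Listening", "MSA"): 60,
        ("Listening", "Levantine"): 48,
        ("Speaking", "MSA"): 52,
        ("Speaking", "Levantine"): 41,
    }
    for line in score_report(profile):
        print(line)
```

Keeping MSA and dialect scores under separate keys lets a report show, for example, B2 listening in MSA alongside B1 listening in Levantine Arabic, instead of collapsing both into one figure.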

Each language domain, especially oral production and aural comprehension, is calibrated to allow for variation in pronunciation, lexical choice, and grammatical constructions based on dialectal norms. Moreover, the framework includes “code-switching awareness” items, which test learners’ recognition of shifts between MSA and dialect in authentic discourse (e.g., TV interviews, social media content). This sociolinguistic sensitivity not only improves construct validity but also equips learners for real-life interactions.

Cultural pragmatics and the gap in standardized testing

Despite the theoretical consensus around the importance of sociocultural competence in second language acquisition, most standardized AFL proficiency tests continue to marginalize this dimension (Albirini, 2020; Bensalah, 2022). While traditional assessments emphasize grammatical accuracy and vocabulary range, they often overlook cultural pragmatics: the knowledge of when and how to use certain forms appropriately within specific cultural contexts. For example, greeting forms in Arabic vary significantly by region, formality, and relationship, yet most tests assess only literal translation or sentence completion without context (Mahfoudhi & Alasmari, 2023). This lack of pragmalinguistic and sociopragmatic assessment leads to what Taguchi and Roever (2017) term a “competence gap,” where learners are unable to transfer classroom knowledge to real-life social interactions.

Furthermore, test prompts that fail to represent authentic communicative acts may yield misleading results about learner readiness. In a study by Mahmoud (2023b), learners who performed well on grammar-based tasks showed significant difficulties in role-play tasks involving apologies, refusals, and requests in culturally nuanced scenarios. This raises questions about construct underrepresentation, a validity threat in which the test fails to measure an essential component of the construct (Messick, 1996; updated by Kane, 2013). Therefore, any robust AFL assessment must include scenario-based, context-rich tasks that mirror how Arabic is used in diverse social and cultural situations.

Recent scholarship advocates for integrating intercultural communicative competence (ICC) into language assessment frameworks. ICC includes knowledge of cultural norms, attitudes of openness, and skills in interpreting and relating across cultures (Byram, 2021). In AFL contexts, this is particularly relevant given the cultural heterogeneity of the Arab world, which ranges from the Levant to North Africa to the Gulf, each region with distinct sociocultural codes. However, many AFL tests are designed from a single cultural lens (often Egyptian or Levantine), which can lead to cultural bias (Shaw, 2022a). Learners unfamiliar with these cultural frames may struggle not because of linguistic deficiency but because of a mismatch in sociocultural expectations. This highlights the need for cultural pluralism in assessment design, i.e., developing culturally inclusive tasks that reflect various regional norms. For instance, a pragmatic task assessing leave-taking in Gulf Arabic may differ significantly from one in Moroccan Arabic, not only in language form but also in duration, politeness strategies, and expected rituals (Habash & Saleh, 2021). Emerging frameworks now emphasize ICC-informed rubrics that incorporate both language use and cultural appropriateness as scoring criteria (Mahfoudhi & Alasmari, 2023; Harb & Atoum, 2023). Such rubrics allow raters to evaluate learners’ ability to use language sensitively in culturally loaded situations, enhancing the validity and relevance of the assessment.

The role of authentic materials in assessing sociocultural competence

Another growing area in sociocultural competence assessment is the use of authentic materials, such as social media posts, video clips, and conversational transcripts. These materials help situate learners in real communicative contexts where meaning is shaped by both linguistic input and cultural cues (Taha & Farghaly, 2022). For instance, interpreting a WhatsApp message in colloquial Arabic may require understanding ellipsis, emojis, politeness markers, and tone, all of which are pragmatically and culturally charged.

However, incorporating such materials into assessments presents both opportunities and challenges. On the one hand, authenticity improves ecological validity, the degree to which a test mirrors real-world tasks. On the other hand, it demands high levels of test development expertise, including calibration of difficulty, dialect management, and bias control (Zaki & Alanzi, 2024). There are also concerns about accessibility and fairness: learners from different backgrounds may have varying levels of exposure to informal Arabic genres, leading to construct-irrelevant variance (Alghamdi & Al-Saqqaf, 2021). To address this, scholars recommend using tiered tasks that offer learners the option to choose topics or registers they are more familiar with, thereby preserving validity without sacrificing fairness (Shaw, 2022a).

Assessment implications for heritage and bilingual learners

The assessment of sociocultural competence must also account for the growing number of heritage and bilingual Arabic learners, particularly in North America, Europe, and Southeast Asia. These learners often possess high oral fluency in dialects but limited proficiency in MSA or formal registers. As Harb and Atoum (2023) note, applying traditional MSA-focused assessments to such learners leads to profiling bias, where their actual communicative competence is undervalued. Moreover, their sociocultural competence, developed through family, community, and media exposure, is rarely captured in formal assessments. A socioculturally responsive AFL assessment must therefore differentiate between linguistic competence and cultural familiarity and design tasks that reflect the diverse trajectories of learners. Some scholars propose dynamic assessment, a model in which feedback and mediation are built into the test to capture learners’ potential rather than only their static performance (Lantolf et al., 2020). This is particularly beneficial in evaluating sociocultural competence, which is often implicit and develops through guided interaction.

General vs specialised proficiency

A key expansion in this revision is the parallel focus on general-purpose and specialised/academic proficiency. In many contexts, learners study Arabic for specific fields (e.g. Islamic studies, Middle Eastern business, media). These domains require distinct vocabulary, registers, and genres. Language testing research (e.g. in English for Specific Purposes) shows that general proficiency frameworks must be supplemented by domain-related criteria for specialised purposes. Accordingly, our framework proposes dual standards: one set for broad communicative proficiency and another for specialised contexts. For example, an Arabic for Medicine test at the B2 level would include tasks on technical medical vocabulary and interpreting clinical documents, in addition to general reading and listening tasks. As Al-Hamly and Milson (2011) explicitly note, there is an “urgent need for tests that target Arabic for specific purposes”. Our model addresses this by making the general criteria adaptable to specialised content, ensuring that both language skills and domain knowledge are assessed where relevant.

Methodology

This framework was developed using a descriptive-analytical methodology grounded in best practices from the language testing field. The process entailed a comprehensive and critical review of existing literature on second language assessment, with a particular focus on Arabic as a Foreign Language (AFL). Key sources included the CEFR, the ACTFL Proficiency Guidelines, and the AERA/APA/NCME Standards for Educational and Psychological Testing. By extracting theoretical and operational principles from these sources, the study synthesised them into a context-sensitive framework suitable for Arabic. The iterative development cycle involved mapping test specifications to actual learner challenges in AFL, particularly those posed by diglossia, morphological complexity, and sociocultural variation. No empirical field testing was undertaken at this stage; however, the resulting framework represents a theoretically robust foundation that is adaptable for empirical validation in future studies (Bachman & Palmer, 2021; Alharbi, 2022).

Proposed proficiency framework

Test design criteria

The test design criteria in this framework are grounded in contemporary testing theory, aiming to maximise construct validity, authenticity, fairness, and reliability. Content and task validity are prioritised by ensuring that test specifications closely reflect the constructs they intend to measure. This includes the representation of MSA and dialectal Arabic, the integration of sociocultural content, and alignment with communicative goals as emphasised in ACTFL’s 5 Cs (Communication, Cultures, Connections, Comparisons, Communities). Authenticity is operationalised by using real-world materials such as news broadcasts, informal conversations, and academic texts, which better reflect the communicative demands encountered by learners (Shaw, 2022a).

The framework addresses the often-overlooked challenge of Arabic diglossia by embedding tasks that separately target MSA and dialectal varieties. Listening tasks, for example, may combine a formal MSA lecture with an informal dialogue in a Levantine or Gulf dialect, supported by glossaries to scaffold comprehension (Mahmoud, 2023b). Writing and speaking tasks are similarly designed to elicit both formal and informal registers, acknowledging that Arabic proficiency is multi-layered and context-dependent.

Interactive tasks are encouraged to assess integrated skills, such as oral interviews that combine listening and speaking or essay writing based on reading tasks. This reflects recent literature emphasizing the importance of integrated skill assessment to capture learners’ real-world communicative competence (Koizumi & In’nami, 2020).

Bias minimisation is embedded through a multi-stage validation process, including expert review panels comprising educators from diverse Arabic-speaking regions. This helps identify regional, cultural, or gender-based bias in items. Differential item functioning (DIF) analysis is recommended for quantitative bias detection during pilot testing. These practices are consistent with ethical principles outlined by the AERA/APA/NCME (2014) and recent discussions on equity in language testing (McNamara, 2021).

Transparency and accessibility are reinforced through open documentation. Test blueprints, rubrics, and scoring guidelines should be publicly available to support accountability and informed use by educators, students, and institutions (Kunnan et al., 2022).

Proficiency standards and descriptors

The core of the proposed AFL framework consists of calibrated descriptors organised by skill (Listening, Speaking, Reading, Writing), Arabic variety (MSA and dialect), and sociocultural competence. These descriptors align with international benchmarks like CEFR and ACTFL, yet they are customised to address AFL-specific issues such as diglossia, script complexity, and culturally bound pragmatics.

At the B1 level (intermediate), a Listening descriptor might state: “Can understand the main ideas of clear standard speech on familiar matters regularly encountered in school, work, and leisure. Can follow the gist of dialectal exchanges in familiar contexts.” For Speaking: “Can initiate and sustain conversations on topics of personal interest using MSA, and can use basic dialectal expressions to negotiate everyday situations.” Cultural competence descriptors are added to assess learners’ ability to interpret and respond to culturally embedded content. For example, at B2 level: “Can interpret culturally specific proverbs and idiomatic expressions; can respond appropriately to greetings and refusals in culturally congruent ways.” This addresses critiques in recent literature that AFL assessments often marginalise the pragmatic dimension of language use (Mahfoudhi & Alasmari, 2023).

Specialised descriptors are also proposed for domain-specific proficiency. For instance, a Reading descriptor for religious studies at B2 might state: “Understands classical texts and modern interpretations in Qur’anic Arabic; distinguishes between theological terms in various contexts.” Similarly, a Writing descriptor for academic settings could read: “Produces structured essays in formal MSA with appropriate citation and argumentation strategies.” These additions address the emerging need for Arabic for Specific Purposes (ASP) testing in academic and professional domains (Zoghbor & Hussein, 2022).

Psychometric and ethical considerations

To ensure psychometric integrity, the proposed AFL assessment framework includes a detailed validation and quality control protocol. Reliability estimates, such as Cronbach’s alpha for internal consistency and inter-rater reliability coefficients for speaking and writing, must be calculated and reported. Test developers should also conduct validation studies across multiple dimensions: content (via expert review), construct (via factor analysis or Rasch modelling), and criterion-related (via correlation with external benchmarks such as university performance or other validated tests).
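The two reliability statistics named here can be sketched in a few lines of Python. Cronbach's alpha is computed as k/(k - 1) × (1 - sum of item variances / total-score variance) over a test-taker × item score matrix; Cohen's kappa measures agreement between two raters beyond what chance would produce, which suits level ratings on speaking and writing tasks. This is an illustrative sketch assuming complete data and hypothetical function names; for ordinal rating scales a weighted kappa or an intraclass correlation is often preferred, and neither is shown here.

```python
from collections import Counter
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a test-taker x item score matrix (k >= 2 items)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    # Population variances are used throughout; the n/(n-1) factor cancels.
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: sum over categories of both raters' marginal proportions.
    expected = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(cronbach_alpha([[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 2]]))
    print(cohen_kappa(["B1", "B2", "B1", "A2"], ["B1", "B2", "B2", "A2"]))
```

Both functions divide by quantities that can be zero (constant total scores, or perfect chance agreement), so a production version would need to guard those cases and report sample sizes alongside the coefficients.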

Fairness and ethical transparency remain central. Consistent with the Standards for Educational and Psychological Testing (AERA/APA/NCME, 2014), accommodations must be offered to test-takers with documented needs, and clear policies should govern the use and interpretation of scores. Test-takers should receive reports that include performance by skill and level, with explanatory notes to aid interpretation. Recent literature has emphasised the importance of ethical score use and transparency in high-stakes testing (Green, 2020; Pill & Harding, 2022). Accordingly, this framework mandates public access to test design documentation, item release samples, and guidelines for preparation and remediation. These practices contribute to more equitable and accountable language testing ecosystems, especially in multilingual and multicultural AFL contexts.

In conclusion, the proposed AFL assessment framework integrates theoretical, sociolinguistic, and psychometric insights into a comprehensive structure tailored for Arabic learners. It responds to longstanding challenges in AFL testing, such as diglossia, cultural variation, and the underassessment of dialectal skills, while embracing international standards for fairness, validity, and transparency. This positions it as a credible and adaptable model for future AFL assessment development.

Example framework structure

Practically, the framework might be implemented as interlinked tables. Table A would list Design Standards (e.g. “Text selection: authentic, context-appropriate MSA or dialect passages”) with explanatory notes. Table B would present Skill-Level Descriptors by proficiency level. A brief illustrative excerpt of Table B might be:

  • Listening A2: Understands simple factual information in standard announcements and short dialogues. Grasps key words in basic dialectal speech (e.g. introductions, simple requests) with occasional support.

  • Listening B2: Understands extended speech and lectures on familiar topics in MSA; follows most content of conversations or media in two major dialects.

  • Speaking A2: Can ask and answer simple questions on personal needs; mainly uses the present tense and a limited range of formulaic utterances in a familiar dialect.

  • Speaking B2: Can converse fluently with native speakers on a range of topics; adapts language formality appropriately, including use of common dialect expressions; handles unexpected complications in dialogue.

Additional columns would cover Reading and Writing similarly, plus separate rows for Dialect comprehension and Cultural knowledge at each level (e.g. “Recognises basic social conventions and greetings; identifies the meaning of common proverbs in context” at an intermediate level). This layout ensures clarity: test developers and educators can see exactly which skills and contexts are expected at each stage, bridging general proficiency and specific outcomes.
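
One way to keep Table B consistent across test forms is to store it as data rather than prose. The sketch below encodes the four illustrative descriptors (abridged) and adds a check that lists skill-by-level cells still lacking a descriptor; the `Descriptor` class, the function names, and the skill and level lists are assumptions for illustration, not part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """One cell of Table B: a can-do statement for a skill at a level."""
    skill: str
    level: str
    text: str


# Abridged descriptors taken from the illustrative excerpt of Table B.
TABLE_B = [
    Descriptor("Listening", "A2", "Understands simple factual information in "
               "standard announcements and short dialogues."),
    Descriptor("Listening", "B2", "Understands extended speech and lectures on "
               "familiar topics in MSA."),
    Descriptor("Speaking", "A2", "Can ask and answer simple questions on personal needs."),
    Descriptor("Speaking", "B2", "Can converse fluently with native speakers on a "
               "range of topics."),
]


def descriptors_for(level: str, table: list[Descriptor] = TABLE_B) -> dict[str, str]:
    """Return every skill's descriptor at one level, keyed by skill."""
    return {d.skill: d.text for d in table if d.level == level}


def missing_cells(skills: list[str], levels: list[str],
                  table: list[Descriptor] = TABLE_B) -> list[tuple[str, str]]:
    """List (skill, level) cells that do not yet have a descriptor."""
    filled = {(d.skill, d.level) for d in table}
    return [(s, lv) for s in skills for lv in levels if (s, lv) not in filled]
```

Calling `missing_cells(["Listening", "Dialect comprehension"], ["A2", "B2"])` returns the two Dialect comprehension cells, which is the kind of gap a development team would fill before piloting.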

Discussion

This revised framework directly addresses the gaps noted in the literature. By explicitly including dialect and cultural components, it aligns assessment with real communication demands. As reviewed, prior tests often treated Arabic as a monolithic MSA code, but actual use requires fluency in both formal and informal registers. Incorporating dialect competence at each level ensures that learners are evaluated on skills they will need in practical settings. Similarly, embedding cultural pragmatics responds to critiques of MSA-centric tests: by including idioms, proverbs, and social scenarios, the framework makes assessments more authentic and meaningful.

Methodologically, tying each design choice to established standards (CEFR, ACTFL, AERA/APA/NCME) strengthens the validity argument. The literature shows that few existing Arabic tests document such rigour, so our emphasis on evidence-based development and transparency is a key advancement. In practical terms, implementing these standards will require collaboration among linguists, educators, and psychometricians. The framework provides a clear template to guide this process: for instance, bias panels and pilot analyses are made explicit steps. Over time, as tests are built and normed following this model, we expect that score interpretations will become more defensible.
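
The paper does not specify which statistical method the pilot analyses supporting bias review should use. One widely used option is the Mantel-Haenszel differential item functioning (DIF) procedure, sketched below in Python as an illustration rather than as the framework’s prescribed method. It assumes that candidates from a reference group and a focal group (for example, two hypothetical L1 backgrounds) have been matched on total score, giving one 2x2 table per score band.

```python
import math


def mantel_haenszel_dif(strata: list[tuple[int, int, int, int]]) -> tuple[float, float]:
    """Mantel-Haenszel common odds ratio and ETS MH D-DIF for one item.

    Each stratum is a 2x2 table for candidates matched on total score:
    (reference correct, reference incorrect, focal correct, focal incorrect).
    alpha_MH = sum(A * D / N) / sum(B * C / N); MH D-DIF = -2.35 * ln(alpha_MH).
    """
    numerator = denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        if n == 0:
            continue  # an empty score band carries no information
        numerator += ref_right * foc_wrong / n
        denominator += ref_wrong * foc_right / n
    if numerator == 0 or denominator == 0:
        raise ValueError("odds ratio undefined for these tables")
    alpha_mh = numerator / denominator
    return alpha_mh, -2.35 * math.log(alpha_mh)
```

A value near zero suggests negligible DIF. Under the ETS sign convention, a negative MH D-DIF means the item is harder for the focal group than for matched reference-group candidates, which would flag it for the bias panel. Operational use would also need a significance test, which this sketch omits.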

One limitation is that the framework itself has not yet been field-tested as an integrated whole. Future research should involve piloting an assessment designed from these standards, then analysing its reliability, validity, and washback. Collaboration with domain specialists (e.g. legal scholars, business experts) will also be needed to refine specialised descriptors. Finally, advances in technology (computer-adaptive testing, automated scoring) could be incorporated under this framework to improve efficiency; this remains an area for future exploration.
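
Computer-adaptive testing is listed above as future work. As an illustration only, the sketch below shows maximum-information item selection under the Rasch model, one common adaptive design, together with a Newton-Raphson ability update. The item bank, its difficulty values, and the clamping range of -4 to 4 logits are hypothetical choices, not part of the framework.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Rasch model: P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def next_item(theta: float, bank: dict[str, float], used: set[str]) -> str:
    """Choose the unused item (id -> difficulty) most informative at theta."""
    remaining = [item for item in bank if item not in used]
    if not remaining:
        raise ValueError("item bank exhausted")
    return max(remaining, key=lambda item: item_information(theta, bank[item]))


def update_theta(responses: list[tuple[float, int]], theta: float = 0.0,
                 iters: int = 20, bound: float = 4.0) -> float:
    """Newton-Raphson maximum-likelihood ability estimate.

    responses holds (item difficulty, score 0/1) pairs. The estimate is
    clamped to [-bound, bound] because all-correct or all-incorrect
    patterns have no finite maximum-likelihood estimate.
    """
    if not responses:
        raise ValueError("no responses to score")
    for _ in range(iters):
        probs = [rasch_prob(theta, b) for b, _ in responses]
        gradient = sum(x - p for (_, x), p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta = max(-bound, min(bound, theta + gradient / information))
    return theta
```

Because Rasch information peaks where item difficulty equals ability, this rule picks the unused item whose difficulty is closest to the current estimate; a real adaptive test would add exposure control and content balancing across skills and dialects.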

Conclusion

In summary, this paper proposes an enhanced framework for Arabic language proficiency testing that responds directly to both theoretical expectations and practical needs. It expands the scope to cover not only general communicative ability but also specialised academic and professional proficiency. It integrates sociolinguistic realities (diglossia, dialect diversity), cognitive factors, and cultural competence into a coherent set of test-design standards. It also foregrounds realism, fairness, and rigorous validation at every stage. By systematically synthesising these elements, the framework aims to guide the development of valid, reliable, and equitable Arabic proficiency tests. We believe this model will improve the comparability and transparency of AFL assessments across diverse educational and occupational settings, ultimately benefiting learners and stakeholders alike.

Version 1, published 26 Feb 2026

How to cite this article
Alharbi BH and Sa'ad AA. Designing a Unified Model for Measuring Arabic Language Competence among Non-Native Learners [version 1; peer review: 2 approved with reservations]. F1000Research 2026, 15:325 (https://doi.org/10.12688/f1000research.173332.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Reviewer Report 04 May 2026
Everhard Markiano Solissa, Universitas Pattimura, Pattimura, Indonesia 
Approved with Reservations
1. Is the work clearly and accurately presented and does it cite the current literature?
Suggested revisions: Standardize citations according to APA style, avoid duplicate references, and strengthen claims by adding more ...
How to cite this report
Solissa EM. Reviewer Report For: Designing a Unified Model for Measuring Arabic Language Competence among Non-Native Learners [version 1; peer review: 2 approved with reservations]. F1000Research 2026, 15:325 (https://doi.org/10.5256/f1000research.191138.r474394)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Reviewer Report 23 Apr 2026
Refa Lina Tiawati, Universitas PGRI Sumatera Barat, jakatra, Indonesia 
Approved with Reservations
This article presents an innovative conceptual framework for Arabic language assessment, effectively integrating international standards with Arabic-specific linguistic features. However, major revisions are needed due to the lack of empirical validation. The authors should provide a validation protocol, operational guidelines, ...
How to cite this report
Tiawati RL. Reviewer Report For: Designing a Unified Model for Measuring Arabic Language Competence among Non-Native Learners [version 1; peer review: 2 approved with reservations]. F1000Research 2026, 15:325 (https://doi.org/10.5256/f1000research.191138.r474402)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
