<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.174830.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Sentence Embedding Using Multimodal Approach: Combining FastText with AraBERT for Arabic Text Representation</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 not approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Almayyali</surname>
                        <given-names>Hind</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Aliwy</surname>
                        <given-names>Ahmed</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8032-8185</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Computer Science, University of Kufa, Kufa, Najaf Governorate, Iraq</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:ahmedh.almajidy@uokufa.edu.iq">ahmedh.almajidy@uokufa.edu.iq</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>6</day>
                <month>2</month>
                <year>2026</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2026</year>
            </pub-date>
            <volume>15</volume>
            <elocation-id>206</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>29</day>
                    <month>1</month>
                    <year>2026</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Almayyali H and Aliwy A</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/15-206/pdf"/>
            <abstract>
                <sec>
                    <title>Background</title>
                    <p>Sentence-embedding models transform sentences into dense vector representations that capture their semantic meanings. These representations enable deep learning to perform many tasks efficiently, such as similarity measurement, retrieval, and summarization, with improved semantic understanding. Existing sentence embedding models often struggle to capture the semantic richness and morphological complexity of Arabic, limiting their effectiveness in tasks such as semantic similarity, question answering, summarization, and information retrieval.</p>
                </sec>
                <sec>
                    <title>Objectives</title>
                    <p>This study aims to develop a novel sentence-embedding framework tailored for Arabic that addresses the shortcomings of current models by integrating contextual and linguistic features.</p>
                </sec>
                <sec>
                    <title>Methods</title>
                    <p>We propose a multimodal architecture that combines a fine-tuned Sentence-AraBERT (SAraBERT) model with pre-trained FastText embeddings. The model is evaluated on standard Arabic Semantic Textual Similarity (STS) benchmarks using the Mean Squared Error (MSE) and Pearson Correlation Coefficient.</p>
                </sec>
                <sec>
                    <title>Results</title>
                    <p>Experimental results show that the proposed model outperforms existing baselines, achieving lower MSE values (0.0355) and higher correlation scores (0.8053), indicating a stronger alignment with human-annotated similarity judgments on the ATrD dataset.</p>
                </sec>
                <sec>
                    <title>Conclusion</title>
                    <p>The findings demonstrate the effectiveness of multimodal SAraBERT-based embeddings in enhancing sentence-level semantic understanding of Arabic. This study advances Natural Language Processing (NLP) capabilities for underrepresented languages and provides a foundation for future research on Arabic language understanding using deep learning techniques.</p>
                </sec>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Sentence Embedding</kwd>
                <kwd>Sentence Transformer</kwd>
                <kwd>AraBERT</kwd>
                <kwd>FastText for Arabic</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec6" sec-type="intro">
            <title>Introduction</title>
            <p>Word embedding, which represents a word as a numerical vector in a multi-dimensional space, was the first practical form of embedding and lies at the core of recent Natural Language Processing (NLP) tasks and applications. The quality of the embedding affects every NLP pipeline built on it; a good embedding must therefore capture both the semantic and syntactic properties of words.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> Subsequent research extended word embedding to whole sentences, producing semantically meaningful sentence embeddings.
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>,
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup> These approaches face several challenges, such as capturing sentence semantics in a vector space while accounting for word order, syntactic structure, and context. A useful property of sentence embedding is that the similarity between two sentences can be measured by comparing their two fixed-size vectors.</p>
            <p>Traditional representations such as one-hot Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) have long been used to represent a sentence, phrase, or whole document. One of the first learned sentence-level representations came in 2014, when Doc2Vec was introduced.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup> However, the most impactful shift came with the transformer architecture
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> by Vaswani et al., in which the model attends to different parts of a sentence simultaneously, leading to better contextual understanding while preserving semantic and syntactic relevance. Building on the transformer, BERT was introduced by Devlin et al.,
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> in which the same word receives different representations depending on its semantics and context. In 2019, Sentence-BERT
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> was introduced as an extension of BERT that represents the whole sentence as a single dense vector optimized for similarity comparisons and usable in multiple NLP tasks.</p>
            <p>Arabic is a morphologically rich language with nonconcatenative morphology and complexity at the morphological, syntactic, and semantic levels, so it requires more preprocessing, different techniques, or specialized approaches.
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> One of the early approaches used for Arabic was the AraVec Project.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup> This was one of the first attempts to represent Arabic words as embedding vectors. Following progress in English, AraBERT was introduced for Arabic and became a major milestone in Arabic NLP. After AraBERT, the research community developed further Arabic-focused models such as CAMeL-BERT,
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup> which focused on dialectal Arabic, and MARBERT,
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>
                </sup> which targets slang and dialects and is optimized for social media text. Each of these models addresses specific challenges in Arabic NLP.</p>
            <p>Some attempts were made to create universal (multilingual) language models, yet their performance on Arabic was not promising, and they underperformed dedicated Arabic models because of the challenges mentioned above. Examples include XLM-R
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup> and Multilingual BERT (mBERT).
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup>
            </p>
            <p>Despite this progress in sentence embedding, finding a model that faithfully captures meaning in Arabic remains an open problem that requires further study. In this study, we develop an approach that concatenates the outputs of two models: a sentence-transformer-based model and a classical word-embedding model. Each sentence is therefore represented as a combination of the two vectors.</p>
        </sec>
        <sec id="sec7">
            <title>Related works</title>
            <p>Sentence embedding is usually treated not as a standalone task but as a component of other tasks such as information retrieval (IR), question answering (QA), and summarization. We therefore review word embeddings, which can be pooled into sentence embeddings, then contextual embeddings, and finally hybrid approaches for producing sentence embeddings.</p>
            <p>For word embedding, many approaches have been used and tested for different languages, such as Word2Vec
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup> and GloVe,
                <sup>
                    <xref ref-type="bibr" rid="ref13">13</xref>
                </sup> whereas Barhoumi
                <sup>
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup> presented an Arabic version of these models. FastText
                <sup>
                    <xref ref-type="bibr" rid="ref15">15</xref>
                </sup> extended these models by incorporating sub-word information, addressing key limitations in morphologically rich languages, and handling out-of-vocabulary (OOV) terms, making it suitable for the Arabic language.
                <sup>
                    <xref ref-type="bibr" rid="ref16">16</xref>
                </sup> All of the above word embeddings can be turned into sentence embeddings by pooling the word vectors, for example with max or average pooling.</p>
            <p>BERT
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> introduced contextual embedding using transformer-based pre-trained language models. Sentence BERT (SBERT)
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> is an extension of BERT that produces a sentence embedding of a fixed size.</p>
            <p>In the Arabic NLP domain, several BERT variants have achieved notable progress. AraBERT
                <sup>
                    <xref ref-type="bibr" rid="ref17">17</xref>
                </sup> was trained on approximately 24 GB of Arabic text from news and other sources. AraSBERT, a Siamese BERT architecture, improves performance on Arabic Semantic Textual Similarity (STS) tasks. Other models include multilingual BERT (mBERT), which supports multiple languages but often underperforms on Arabic because of a limited Arabic-specific vocabulary of around 2,000 tokens versus AraBERT&#x2019;s 60,000,
                <sup>
                    <xref ref-type="bibr" rid="ref18">18</xref>
                </sup> and CAMeLBERT-MSA, specialized for Modern Standard Arabic (MSA).
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup> These models capture the contextual nuances essential for disambiguating the inherent linguistic complexities of Arabic.</p>
            <p>For multimodal text embeddings, the existing literature features common fusion strategies, including the simple concatenation of embedding vectors or the use of shallow neural networks. Hengle
                <sup>
                    <xref ref-type="bibr" rid="ref19">19</xref>
                </sup> introduced a hybrid model for Arabic sarcasm detection and sentiment identification. Their approach concatenates the &#x2018;[CLS]&#x2019; token vector from AraBERT with a feature vector from a CNN-BiLSTM ensemble. This combined vector is fed into the classification layer. Relying on the &#x2018;[CLS]&#x2019; token vector limits this embedding to a few applications, such as next-sentence prediction, and makes it unsuitable as a general-purpose Arabic sentence embedding.</p>
            <p>In addition, several studies have been performed using the AraBERT model, one of which is ArabBert-LSTM by AlOsaimi,
                <sup>
                    <xref ref-type="bibr" rid="ref20">20</xref>
                </sup> who showed that hybrid architectures combining transformer-based AraBERT embeddings with LSTM networks can be very successful in Arabic sentiment analysis and outperform classical machine learning and deep learning methods. In addition, Jefry
                <sup>
                    <xref ref-type="bibr" rid="ref21">21</xref>
                </sup> utilized AraBERT with BiLSTM to improve Arabic sentiment analysis. Similarly, Khachfeh
                <sup>
                    <xref ref-type="bibr" rid="ref22">22</xref>
                </sup> designed a hybrid model to classify Arabic news based on the BERT-BiLSTM model. They showed that the performance of AraBERT on morphologically complex Arabic texts could be significantly improved by fine-tuning the final layer of the model and combining it with a downstream task that applies bidirectional processing.</p>
            <p>Our proposed sentence-embedding approach combines pre-trained FastText and fine-tuned Sentence-AraBERT to bridge the representational gap between the two models by leveraging their respective strengths.</p>
        </sec>
        <sec id="sec8">
            <title>Methodology</title>
            <p>Models that rely on embeddings for Arabic often suffer from low accuracy, especially in applications such as classification, summarization, and question answering. This is primarily because the embedding values do not fully reflect the meaning of the sentence. Therefore, in this study, we combine more than one model to improve the sentence embedding.</p>
            <p>The proposed model combines FastText and Sentence-AraBERT for Arabic sentence embedding to create a multimodal architecture. In the first step, the AraBERT model is fine-tuned on an Arabic triplet dataset to produce the Sentence-AraBERT (SAraBERT) model. Each row in the triplet dataset consists of three columns: anchor, positive, and negative. In the second step, the outputs of the SAraBERT model and the pre-trained Arabic FastText model are concatenated to produce the final sentence embedding. 
                <xref ref-type="fig" rid="f1">Figure 1</xref> shows a block diagram of the proposed model, whose components are explained in more detail in the following sections.</p>
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>
Figure 1. </label>
                <caption>
                    <title>Block diagram of the proposed Arabic sentence embedding model.</title>
                </caption>
                <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/192766/93c59f02-5ff5-44e8-a130-2efa6dedd467_figure1.gif"/>
            </fig>
            <sec id="sec9">
                <title>Sentence-BERT (SBERT) and Sentence-AraBERT (SAraBERT)</title>
                <p>Sentence-BERT (SBERT), by Reimers &amp; Gurevych,
                    <sup>
                        <xref ref-type="bibr" rid="ref2">2</xref>
                    </sup> is based on a Siamese neural network and generates semantically meaningful sentence embeddings by fine-tuning BERT models on sentence data suited to applications such as semantic textual similarity, sentence classification, and question answering. Because BERT produces its output at the token level, an aggregation mechanism is needed to obtain sentence-level representations. The key method for obtaining fixed-size sentence vectors is mean pooling, which sums all token vectors and divides by the number of tokens. Because each sentence is encoded once into a fixed vector, comparing many sentence pairs reduces to fast vector-space operations instead of repeated model evaluations. The architecture consists of two identical BERT encoders with shared weights, each processing one sentence independently, followed by a pooling layer, usually mean pooling, which performs better than max pooling and pooling on the [CLS] token. In this study, we used the same methodology as Reimers and Gurevych.
                    <sup>
                        <xref ref-type="bibr" rid="ref2">2</xref>
                    </sup>
                </p>
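                <p>The mean-pooling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name <monospace>mean_pool</monospace>, the toy token vectors, and the use of an attention mask to skip padding tokens are assumptions made for illustration, and a real model would produce 768-dimensional token vectors.</p>
                <code language="python">
```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for j in range(dim):
                total[j] += vector[j]
    if count == 0:
        raise ValueError("no unmasked tokens to pool")
    return [value / count for value in total]


# Toy input (assumption): three tokens of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)
```
                </code>
                <p>Whatever the sentence length, the pooled vector has the same dimensionality as a single token vector, which is what allows two sentences to be compared directly.</p>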
                <p>We propose Sentence-AraBERT (SAraBERT), which combines the SBERT
                    <sup>
                        <xref ref-type="bibr" rid="ref2">2</xref>
                    </sup> methodology with AraBERT.
                    <sup>
                        <xref ref-type="bibr" rid="ref17">17</xref>
                    </sup> Developing SAraBERT entails adopting this Siamese structure with the BERT encoders replaced by AraBERT encoders, which were pre-trained specifically on Arabic corpora and therefore incorporate built-in knowledge of Arabic syntax, morphology, and semantics. Fine-tuning trains the Siamese AraBERT model on Arabic sentence-pair data, such as Arabic NLI triplet data or Arabic semantic textual similarity data, while retaining the same pooling and concatenation strategies used by SBERT. This adaptation retains the Arabic language knowledge of AraBERT and adds the ability to encode sentence-level semantics efficiently, resulting in a sentence transformer model suited to Arabic text processing and semantic similarity applications. 
                    <xref ref-type="fig" rid="f2">Figure 2</xref> shows the SAraBERT architecture, which follows the methodology of Reimers and Gurevych.
                    <sup>
                        <xref ref-type="bibr" rid="ref2">2</xref>
                    </sup>
                </p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>
Figure 2. </label>
                    <caption>
                        <title>SAraBERT architecture.</title>
                        <p>Parameter n refers to the dimensionality of the embeddings (768 by default for AraBERT base).</p>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/192766/93c59f02-5ff5-44e8-a130-2efa6dedd467_figure2.gif"/>
                </fig>
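                <p>To make the role of the anchor, positive, and negative columns concrete, the following sketch computes a triplet objective on toy vectors. This section does not state which loss was used for fine-tuning, so the formula here is an assumption: it follows the triplet objective described in the SBERT paper (Euclidean distance with a margin of 1). The function names and the two-dimensional vectors are hypothetical.</p>
                <code language="python">
```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(gap, 0.0)


# Toy sentence embeddings (assumption): the positive lies close to the
# anchor and the negative lies far from it.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

loss_satisfied = triplet_loss(anchor, positive, negative)
# Swapping the roles simulates a model that ranks the pair the wrong way.
loss_violated = triplet_loss(anchor, negative, positive)
print(loss_satisfied, loss_violated)
```
                </code>
                <p>The loss is zero once the positive is closer to the anchor than the negative by at least the margin, so training only updates the encoder on triplets that are still ranked incorrectly or too narrowly.</p>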
            </sec>
            <sec id="sec10">
                <title>FastText model</title>
                <p>FastText
                    <sup>
                        <xref ref-type="bibr" rid="ref15">15</xref>
                    </sup> is a word-embedding model that learns from sub-word information, making it robust to out-of-vocabulary (OOV) words. For the Arabic language, the pretrained Arabic FastText model of Grave
                    <sup>
                        <xref ref-type="bibr" rid="ref23">23</xref>
                    </sup> was used. It was trained on a Common Crawl dataset. The model aims to produce word embeddings that represent the semantics of Arabic words as accurately as possible, and it can be applied to most natural language processing (NLP) tasks. Its output is a 300-dimensional vector for each word. To embed a sentence, the FastText vector of each word in the sentence is retrieved, and the word vectors are averaged in a pooling step to form the sentence vector, which therefore has the same 300 dimensions as the word vectors.</p>
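                <p>The averaging step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: a small dictionary of three-dimensional vectors stands in for the pretrained 300-dimensional FastText model, the transliterated words are placeholders, and whitespace tokenization is assumed because the tokenizer is not specified here.</p>
                <code language="python">
```python
def average_word_vectors(sentence, word_vectors):
    """Average the vectors of all whitespace-separated words in a sentence."""
    words = sentence.split()
    if not words:
        raise ValueError("empty sentence")
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in words:
        # A real FastText model would also build vectors for unseen words
        # from their sub-word units; this toy lookup table cannot do that.
        for j, value in enumerate(word_vectors[word]):
            total[j] += value
    return [value / len(words) for value in total]


# Hypothetical stand-in for the pretrained model (3 dimensions, not 300).
toy_vectors = {
    "alwalad": [1.0, 0.0, 2.0],
    "yaqra": [3.0, 2.0, 0.0],
    "kitaban": [2.0, 4.0, 4.0],
}
sentence_vector = average_word_vectors("alwalad yaqra kitaban", toy_vectors)
print(sentence_vector)
```
                </code>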
            </sec>
            <sec id="sec11">
                <title>Combination of the two outputs</title>
                <p>We have two vectors of different sizes: the FastText sentence vector of size 300 and the SAraBERT sentence vector of size 768. This mismatch makes it difficult to combine them with traditional pooling methods, such as max or average pooling, which require vectors of equal size. We therefore concatenate the two embeddings to produce a new embedding of 1068 dimensions (768 + 300), as shown in 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>.</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>
Figure 3. </label>
                    <caption>
                        <title>Concatenation of the two sentence embeddings to produce a 1068-dimensional sentence embedding.</title>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/192766/93c59f02-5ff5-44e8-a130-2efa6dedd467_figure3.gif"/>
                </fig>
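                <p>A minimal sketch of the concatenation is given below. The constant placeholder vectors, the function name, and the ordering (SAraBERT part first, FastText part second) are assumptions for illustration; no scaling or normalization is applied, because none is described in this section.</p>
                <code language="python">
```python
def concatenate_embeddings(contextual_vector, static_vector):
    """Join two sentence vectors end to end into one longer vector."""
    return list(contextual_vector) + list(static_vector)


# Placeholder vectors (assumption) with the dimensions given in the text.
sarabert_vector = [0.1] * 768
fasttext_vector = [0.2] * 300

combined = concatenate_embeddings(sarabert_vector, fasttext_vector)
print(len(combined))
```
                </code>
                <p>Concatenation keeps every component of both vectors unchanged, so a downstream similarity measure sees the contextual and the sub-word information side by side.</p>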
            </sec>
        </sec>
        <sec id="sec12">
            <title>Experimental results and evaluations</title>
            <p>All experiments were implemented in Python in a Kaggle environment. For fine-tuning, we used PyTorch 2.4.1+cu121 with the transformer model
                <sup>
                    <xref ref-type="bibr" rid="ref17">17</xref>
                </sup> in Python 3.10.12, and the experiments were run in a multi-GPU setting with P100 GPUs. AraBERT (BERT-base-Arabertv02, 136 million parameters) was used as the base model to build the SAraBERT model.</p>
            <p>For evaluation, the Mean Squared Error (MSE) and the Pearson correlation coefficient were computed against gold-standard similarity annotations. MSE is the average of the squared differences between the predicted values (
                <inline-formula>

                    <mml:math display="inline">
                        <mml:mover accent="true">
                            <mml:msub>
                                <mml:mi>y</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo stretchy="true">&#x0302;</mml:mo>
                        </mml:mover>
                    </mml:math>
</inline-formula>) and real values (y
                <sub>i</sub>). 
                <xref ref-type="disp-formula" rid="e1">Eq. (1)</xref> shows the formula used for MSE for n examples.
                <disp-formula id="e1">

                    <mml:math display="block">
                        <mml:mi mathvariant="italic">MSE</mml:mi>
                        <mml:mo>=</mml:mo>
                        <mml:mfrac>
                            <mml:mn>1</mml:mn>
                            <mml:mi>n</mml:mi>
                        </mml:mfrac>
                        <mml:munderover>
                            <mml:mo>&#x2211;</mml:mo>
                            <mml:mrow>
                                <mml:mi>i</mml:mi>
                                <mml:mo>=</mml:mo>
                                <mml:mn>1</mml:mn>
                            </mml:mrow>
                            <mml:mi>n</mml:mi>
                        </mml:munderover>
                        <mml:msup>
                            <mml:mrow>
                                <mml:mo stretchy="true">(</mml:mo>
                                <mml:mover accent="true">
                                    <mml:msub>
                                        <mml:mi>y</mml:mi>
                                        <mml:mi>i</mml:mi>
                                    </mml:msub>
                                    <mml:mo stretchy="true">&#x0302;</mml:mo>
                                </mml:mover>
                                <mml:mo>&#x2212;</mml:mo>
                                <mml:msub>
                                    <mml:mi>y</mml:mi>
                                    <mml:mi>i</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="true">)</mml:mo>
                            </mml:mrow>
                            <mml:mn>2</mml:mn>
                        </mml:msup>
                    </mml:math>

                    <label>(1)</label>
</disp-formula>
            </p>
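            <p>Eq. (1) can be checked on a small example. In the sketch below, the predicted score for a sentence pair is taken to be the cosine similarity of the two sentence embeddings, and the gold scores are assumed to lie on a 0 to 1 scale; the function names, the toy vectors, and the gold values are hypothetical.</p>
            <code language="python">
```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_squared_error(predicted, gold):
    """Average of the squared differences, as in Eq. (1)."""
    if len(predicted) != len(gold) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(predicted)


# Two toy sentence pairs (assumption): identical and orthogonal embeddings.
predicted = [
    cosine_similarity([1.0, 0.0], [1.0, 0.0]),
    cosine_similarity([1.0, 0.0], [0.0, 1.0]),
]
gold = [1.0, 0.5]  # hypothetical human similarity scores
mse = mean_squared_error(predicted, gold)
print(predicted, mse)
```
            </code>
            <p>A lower MSE means the predicted similarities sit closer to the human scores on average; squaring the differences penalizes large individual errors more heavily than small ones.</p>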
            <p>
The second evaluation measure, the Pearson correlation coefficient (r), was used to measure the strength of the linear correlation between the gold-standard semantic similarity scores and the predicted cosine similarities. It ranges from -1 (perfect negative linear correlation) to +1 (perfect positive linear correlation), where 0 corresponds to no linear correlation. It is defined in 
                <xref ref-type="disp-formula" rid="e2">Eq. (2)</xref>.
                <disp-formula id="e2">

                    <mml:math display="block">
                        <mml:mi>r</mml:mi>
                        <mml:mo>=</mml:mo>
                        <mml:mfrac>
                            <mml:mrow>
                                <mml:munderover>
                                    <mml:mo>&#x2211;</mml:mo>
                                    <mml:mrow>
                                        <mml:mi>i</mml:mi>
                                        <mml:mo>=</mml:mo>
                                        <mml:mn>1</mml:mn>
                                    </mml:mrow>
                                    <mml:mi>n</mml:mi>
                                </mml:munderover>
                                <mml:mrow>
                                    <mml:mo stretchy="true">(</mml:mo>
                                    <mml:msub>
                                        <mml:mi>x</mml:mi>
                                        <mml:mi>i</mml:mi>
                                    </mml:msub>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mover accent="true">
                                        <mml:mi mathvariant="italic">x</mml:mi>
                                        <mml:mo>&#x20d0;</mml:mo>
                                    </mml:mover>
                                    <mml:mo stretchy="true">)</mml:mo>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mo stretchy="true">(</mml:mo>
                                    <mml:msub>
                                        <mml:mi>y</mml:mi>
                                        <mml:mi>i</mml:mi>
                                    </mml:msub>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mover accent="true">
                                        <mml:mi mathvariant="italic">y</mml:mi>
                                        <mml:mo>&#x20d0;</mml:mo>
                                    </mml:mover>
                                    <mml:mo stretchy="true">)</mml:mo>
                                </mml:mrow>
                            </mml:mrow>
                            <mml:mrow>
                                <mml:msqrt>
                                    <mml:mrow>
                                        <mml:munderover>
                                            <mml:mo>&#x2211;</mml:mo>
                                            <mml:mrow>
                                                <mml:mi>i</mml:mi>
                                                <mml:mo>=</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                            <mml:mi>n</mml:mi>
                                        </mml:munderover>
                                        <mml:msup>
                                            <mml:mrow>
                                                <mml:mo stretchy="true">(</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>x</mml:mi>
                                                    <mml:mi>i</mml:mi>
                                                </mml:msub>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mover accent="true">
                                                    <mml:mi mathvariant="italic">x</mml:mi>
                                                    <mml:mo>&#x00AF;</mml:mo>
                                                </mml:mover>
                                                <mml:mo stretchy="true">)</mml:mo>
                                            </mml:mrow>
                                            <mml:mn>2</mml:mn>
                                        </mml:msup>
                                    </mml:mrow>
                                </mml:msqrt>
                                <mml:msqrt>
                                    <mml:mrow>
                                        <mml:munderover>
                                            <mml:mo>&#x2211;</mml:mo>
                                            <mml:mrow>
                                                <mml:mi>i</mml:mi>
                                                <mml:mo>=</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                            <mml:mi>n</mml:mi>
                                        </mml:munderover>
                                        <mml:msup>
                                            <mml:mrow>
                                                <mml:mo stretchy="true">(</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>y</mml:mi>
                                                    <mml:mi>i</mml:mi>
                                                </mml:msub>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mover accent="true">
                                                    <mml:mi mathvariant="italic">y</mml:mi>
                                                    <mml:mo>&#x00AF;</mml:mo>
                                                </mml:mover>
                                                <mml:mo stretchy="true">)</mml:mo>
                                            </mml:mrow>
                                            <mml:mn>2</mml:mn>
                                        </mml:msup>
                                    </mml:mrow>
                                </mml:msqrt>
                            </mml:mrow>
                        </mml:mfrac>
                    </mml:math>

                    <label>(2)</label>
</disp-formula>
            </p>
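            <p>Eq. (2) has the form of the Pearson correlation coefficient. As a minimal sketch of how it could be computed, the following Python function follows the formula term by term. The function name, the variable names, and the sample values are illustrative assumptions and are not taken from the authors' implementation.</p>
            <code language="python">
```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length sequences, as in Eq. (2)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of the products of the deviations from the two means.
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the square roots of the summed squared deviations.
    spread_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return numerator / (spread_x * spread_y)


# Hypothetical gold scores and predicted similarities, for illustration only.
gold = [0.1, 0.4, 0.6, 0.9]
predicted = [0.2, 0.35, 0.7, 0.8]
r = pearson_correlation(gold, predicted)
print(round(r, 4))
```
            </code>
            <p>In this sketch, x would hold the annotated similarity scores and y the similarities predicted from the sentence embeddings; that pairing is an assumption about how the metric is applied.</p>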
            <p>The following subsections describe the datasets used and present the results and their analysis.</p>
            <sec id="sec13">
                <title>Datasets</title>
                <p>Two types of datasets were used: one for training and fine-tuning the SAraBERT model, and one for evaluating the final sentence embedding output. FastText was used as a pre-trained model
                    <sup>
                        <xref ref-type="bibr" rid="ref23">23</xref>
                    </sup>; therefore, it did not require a training dataset.</p>
                <p>The first dataset was the Arabic triplet dataset (ATrD)
                    <sup>
                        <xref ref-type="bibr" rid="ref24">24</xref>
                    </sup> of one million triplets (342.53 MB), split 70%, 20%, and 10% into training, validation, and test sets, respectively. Each triplet contains an anchor, a positive example that is semantically similar to the anchor, and a negative example that is semantically different. This structure enables the model to learn which semantic variations of the anchor are acceptable. The dataset was used to fine-tune the proposed SAraBERT model.</p>
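                <p>To illustrate how anchor, positive, and negative examples can drive training, the following is a minimal sketch of a hinge-style triplet loss on already-computed embedding vectors. The Euclidean distance, the margin value, and the toy vectors are assumptions made for illustration; the loss configuration actually used for SAraBERT may differ.</p>
                <code language="python">
```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is farther from the anchor than the positive by `margin`."""
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Toy 2-D embeddings (hypothetical values).
anchor = [0.0, 0.0]
close_sentence = [0.0, 1.0]   # distance 1 from the anchor
far_sentence = [3.0, 4.0]     # distance 5 from the anchor

# Well-separated triplet: the loss vanishes.
print(triplet_loss(anchor, close_sentence, far_sentence))  # 0.0
# Roles swapped: the positive is farther than the negative, so the loss is large.
print(triplet_loss(anchor, far_sentence, close_sentence))  # 5.0
```
                </code>
                <p>Minimizing such a loss pulls the positive example toward the anchor and pushes the negative example away, which is the behavior the triplet structure is meant to encourage.</p>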
                <p>The second dataset was the Arabic version of the Semantic Textual Similarity Benchmark (STSB),
                    <sup>
                        <xref ref-type="bibr" rid="ref24">24</xref>
                    </sup> which is based on the English version
                    <sup>
                        <xref ref-type="bibr" rid="ref25">25</xref>
                    </sup> and consists of Arabic sentence pairs annotated for semantic similarity. It is a heterogeneous collection of sentence pairs from many different domains, such as news headlines, video and image captions, and natural-language inference data. Each sentence pair in the dataset was manually annotated with a similarity score on a scale of 0&#x2013;5. In this Arabic variant, we normalized the similarity scores to a range of 0&#x2013;1, which allows them to be compared directly with the similarities predicted by the models. This dataset was used to evaluate the final sentence embedding.</p>
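                <p>The normalization of similarity scores to the range 0&#x2013;1 can be sketched as a linear rescaling. The function below is illustrative only: its name, the default bounds, and the sample scores are assumptions, and the bounds should be set to the raw annotation range of the dataset actually used.</p>
                <code language="python">
```python
def normalize_score(score, low=0.0, high=5.0):
    """Linearly rescale `score` from the range [low, high] to [0, 1]."""
    if low >= high:
        raise ValueError("low must be smaller than high")
    if score > high or low > score:
        raise ValueError("score lies outside the declared range")
    return (score - low) / (high - low)


# Hypothetical raw annotations, for illustration only.
raw_scores = [0.0, 1.25, 2.5, 5.0]
normalized = [normalize_score(s) for s in raw_scores]
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```
                </code>
                <p>Keeping the bounds as parameters makes the assumed raw range explicit, so the same function can be reused if a dataset is annotated on a different scale.</p>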
            </sec>
            <sec id="sec14">
                <title>Results and analyses</title>
                <p>We conducted experiments to compare our contextualized embedding-based transformer encoder (CETE) with state-of-the-art methods. To evaluate the effectiveness of our methodology, we also compared our model with several baseline configurations.</p>
                <p>In our feature-based method, we used a hybrid embedding scheme that combines SAraBERT contextualized embeddings with FastText word embeddings. This combination takes advantage of both AraBERT&#x2019;s contextual capabilities and the sub-word information captured by FastText. SAraBERT is built upon AraBERT, a transformer-based language model pre-trained on a large corpus of Arabic text.</p>
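                <p>One simple way to combine a contextual sentence vector with FastText word vectors is to mean-pool the word vectors and concatenate the result with the sentence vector. The sketch below illustrates that idea with random placeholder vectors. The choice of concatenation, the L2 normalization of each part, the vector sizes (768 and 300), and all function names are assumptions made for illustration rather than a description of the authors' implementation.</p>
                <code language="python">
```python
import math
import random


def mean_pool(word_vectors):
    """Average a list of equal-length word vectors into a single vector."""
    count = len(word_vectors)
    return [sum(column) / count for column in zip(*word_vectors)]


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return list(vec) if norm == 0 else [v / norm for v in vec]


def hybrid_embedding(sentence_vec, word_vectors):
    """Concatenate a contextual sentence vector with mean-pooled word vectors."""
    return l2_normalize(sentence_vec) + l2_normalize(mean_pool(word_vectors))


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


random.seed(0)
# Random placeholders standing in for real model outputs (assumed sizes).
sentence_vec = [random.gauss(0, 1) for _ in range(768)]
word_vectors = [[random.gauss(0, 1) for _ in range(300)] for _ in range(5)]

combined = hybrid_embedding(sentence_vec, word_vectors)
print(len(combined))  # 1068
```
                </code>
                <p>Normalizing each part before concatenation keeps either component from dominating the cosine similarity purely because of its scale; whether this step is needed depends on the models used.</p>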
                <p>First, AraBERT was fine-tuned with a Siamese BERT architecture on the ATrD dataset to produce SAraBERT; FastText was used as a pre-trained model without further training. Five tests were then performed on the STSB dataset, and the MSE and correlation were estimated for each: FastText alone, AraBERT v2, SAraBERT, AraBERT v2 + FastText, and SAraBERT + FastText. 
                    <xref ref-type="table" rid="T1">
Table 1</xref> shows the MSE and correlation for these tests, whereas 
                    <xref ref-type="fig" rid="f4">Figure 4</xref> visualizes four of these models, excluding the AraBERT v2 + FastText combination.</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>
Table 1. </label>
                    <caption>
                        <title>MSE and correlation for five models.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Model</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">MSE</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Correlation</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">FastText</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.1109</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.5679</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">AraBERT v2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.1774</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.3401</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">SAraBERT</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.0466</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.7869</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">AraBERT v2 + FastText</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.1292</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.5674</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">SAraBERT + FastText</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.0355</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.8053</bold>
</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>
Figure 4. </label>
                    <caption>
                        <title>Comparison of Arabic sentence embedding models (four of the five tests).</title>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/192766/93c59f02-5ff5-44e8-a130-2efa6dedd467_figure4.gif"/>
                </fig>
                <p>As shown in 
                    <xref ref-type="table" rid="T1">
Table 1</xref>, SAraBERT achieved a mean squared error (MSE) of 0.0466, an improvement over AraBERT v2 (0.1774). This suggests that fine-tuning with a Siamese architecture yields better Arabic sentence embeddings. The best MSE and correlation values were obtained with the proposed SAraBERT + FastText combination: 0.0355 (minimum) and 0.8053 (maximum), respectively. Compared with the nearest result (SAraBERT, 0.0466), the MSE decreased by 0.0111; compared with AraBERT v2 (0.1774), it decreased by 0.1419. This indicates improved accuracy and is consistent with a benefit from integrating contextual and morphological information.</p>
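                <p>The differences between the MSE values reported in Table 1 can be checked with a few lines of arithmetic. The snippet below is only a worked calculation on the published numbers; the helper name is hypothetical.</p>
                <code language="python">
```python
# MSE values copied from Table 1.
mse = {
    "AraBERT v2": 0.1774,
    "SAraBERT": 0.0466,
    "SAraBERT + FastText": 0.0355,
}


def mse_drop(baseline, best):
    """Return the absolute and relative reduction in MSE from `baseline` to `best`."""
    absolute = baseline - best
    return absolute, absolute / baseline


best = mse["SAraBERT + FastText"]
for name in ("SAraBERT", "AraBERT v2"):
    absolute, relative = mse_drop(mse[name], best)
    # Absolute drops round to 0.0111 (vs. SAraBERT) and 0.1419 (vs. AraBERT v2).
    print(f"{name}: absolute drop {absolute:.4f}, relative drop {relative:.1%}")
```
                </code>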
                <p>Despite AraBERT&#x2019;s strength in modeling context and disambiguating polysemous terms, it struggles with out-of-vocabulary words and dialectal expressions that are underrepresented in its training corpus. Informal social media slang, for example, may be fragmented by WordPiece tokenization, which reduces the semantic clarity of such words.</p>
                <p>The results also show that FastText complements AraBERT by modeling subword information, which enhances robustness in morphologically rich languages such as Arabic.</p>
                <p>We used the baseline AraBERT model as our main point of comparison because it is a widely used standard for Arabic language processing tasks. To allow a fair comparison, the improved architecture keeps the sizes of the hidden layers and feed-forward networks unchanged from the baseline. The only difference is the use of the sentence-transformer approach to capture deep contextual understanding and intricate semantic relationships within sentences.</p>
                <p>For further comparison, we also fine-tuned the multilingual DistilBERT base model on the same fine-tuning dataset and with the same parameters used for AraBERT to produce S-DistilBERT. 
                    <xref ref-type="table" rid="T2">
Table 2</xref> presents the results obtained. We chose DistilBERT because, according to a previous comparison,
                    <sup>
                        <xref ref-type="bibr" rid="ref26">26</xref>
                    </sup> it is considered one of the best sentence-embedding transformer models for Arabic text classification. However, according to the results shown in 
                    <xref ref-type="table" rid="T1">
Tables 1</xref> and 
                    <xref ref-type="table" rid="T2">2</xref>, SAraBERT produces better embeddings than S-DistilBERT (MSE 0.0466 versus 0.0491; correlation 0.7869 versus 0.7047), so we chose to combine it with FastText.</p>
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>
Table 2. </label>
                    <caption>
                        <title>Results of S-DistilBERT and the S-DistilBERT + FastText combination.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Model</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">MSE</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Correlation</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">S-DistilBERT</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.0491</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.7047</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">S-DistilBERT + FastText</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.0413</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.7550</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
        </sec>
        <sec id="sec15">
            <title>Conclusions and future work</title>
            <p>This work improves Arabic sentence embeddings by combining contextualized Sentence-AraBERT representations with pooled FastText word vectors. Our experiments on the Arabic STSB benchmark indicate that the hybrid embedding outperforms the AraBERT baselines (MSE of 0.0355 and correlation of 0.8053, versus 0.0466 and 0.7869 for SAraBERT alone). Moreover, the hybrid approach incorporates both contextual semantics via AraBERT and morphological features via FastText, and it yields stronger sentence representations.</p>
            <p>Sentence-AraBERT (SAraBERT) extends the original AraBERT architecture with a Siamese network architecture (as in SBERT) and a triplet loss function to generate sentence embeddings that preserve semantic meaning.</p>
            <p>We found that our joint SAraBERT-FastText model achieved the best results among the Arabic sentence embedding models compared. The hybrid approach is particularly useful for Arabic because of the language&#x2019;s rich morphological structure: the sub-word information provided by FastText complements the contextual knowledge provided by AraBERT.</p>
            <p>These results suggest that multi-paradigm embedding can bring benefits over single-paradigm methods, even without fine-tuning or training on additional large corpora. Finally, we make our implementation and trained models publicly available to support further research and the reproducibility of our experiments.</p>
            <p>At first glance, the combined vector may seem high-dimensional, but many practical models produce text embeddings with more than 1068 dimensions. For example, the OpenAI text-embedding-3 models produce vectors of 1536 or 3072 dimensions, while E5-Mistral-7B-Instruct produces vectors of 4096 dimensions. Moreover, with processing units such as GPUs, the processing time is very small.</p>
            <p>In future work, we will explore how our hybrid embedding method performs on other Arabic NLP problems, including information retrieval, sentiment analysis, named-entity recognition, text classification on imbalanced datasets, and document summarization. We also plan to test how well other embedding techniques can be incorporated and whether the approach transfers to other morphologically rich languages.</p>
        </sec>
    </body>
    <back>
        <sec id="sec18" sec-type="data-availability">
            <title>Data availability statement</title>
            <p>The datasets used in this study are publicly available third-party datasets. The Arabic Natural Language Inference Triplet (Arabic-NLI-Triplet) (Omer Nacar: 
                <email xlink:href="mailto:onajar@psu.edu.sa">onajar@psu.edu.sa</email>) dataset can be accessed via Zenodo at 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.18169892">https://doi.org/10.5281/zenodo.18169892</ext-link>. The Arabic Semantic Textual Similarity (Arabic STS) benchmark dataset (Omer Nacar: 
                <email xlink:href="mailto:onajar@psu.edu.sa">onajar@psu.edu.sa</email>) is available at 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.18170487">https://doi.org/10.5281/zenodo.18170487</ext-link>.</p>
            <p>Each dataset contains training, validation, and test files (train.csv, validation.csv, and test.csv). Both datasets are publicly distributed, enabling readers and reviewers to access and reuse the data under the same conditions as the authors.</p>
        </sec>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gong</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bhat</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Viswanath</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <chapter-title>Embedding Syntax and Semantics of Prepositions via Tensor Decomposition.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.</italic>
</source>
                    <year>2018</year>; Volume<volume>1</volume>(<issue>Long Papers</issue>): pp.<fpage>896</fpage>&#x2013;<lpage>906</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/N18-1082</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Reimers</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gurevych</surname>
                            <given-names>I</given-names>
                        </name>
</person-group>:
                    <chapter-title>Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.</chapter-title>
                    <source>

                        <italic toggle="yes">Conference on Empirical Methods in Natural Language Processing.</italic>
</source>
                    <publisher-loc>Hong Kong, China</publisher-loc>:<year>2019</year>.
                    <pub-id pub-id-type="doi">10.18653/v1/D19-1410</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Le</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mikolov</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <chapter-title>Distributed Representations of Sentences and Documents.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 31st International Conference on Machine Learning.</italic>
</source>
                    <year>2014</year>; vol.<volume>32</volume>: pp.<fpage>1188</fpage>&#x2013;<lpage>1196</lpage>.
                    <pub-id pub-id-type="doi">10.48550/arXiv.1405.4053</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vaswani</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shazeer</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Attention is all you need.</article-title>
                    <source>

                        <italic toggle="yes">Adv. Neural Inf. Proces. Syst.</italic>
</source>
                    <year>2017</year>;<volume>30</volume>.
                    <pub-id pub-id-type="doi">10.48550/arXiv.1706.03762</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Devlin</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chang</surname>
                            <given-names>M-W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lee</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>BERT: Pre-training of deep bidirectional transformers for language understanding.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies.</italic>
</source>
                    <year>2019</year>; Volume<volume>1</volume>(<issue>long and short papers</issue>): pp.<fpage>4171</fpage>&#x2013;<lpage>4186</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/N19-1423</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Matrane</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Benabbou</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sael</surname>
                            <given-names>N</given-names>
                        </name>
</person-group>:
                    <article-title>A systematic literature review of Arabic dialect sentiment analysis.</article-title>
                    <source>

                        <italic toggle="yes">Journal of King Saud University - Computer and Information Sciences.</italic>
</source>
                    <year>2023</year>;<volume>35</volume>(<issue>6</issue>):<fpage>101570</fpage>.
                    <pub-id pub-id-type="doi">10.1016/j.jksuci.2023.101570</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Soliman</surname>
                            <given-names>AB</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Eissa</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>El-Beltagy</surname>
                            <given-names>SR</given-names>
                        </name>
</person-group>:
                    <article-title>AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP.</article-title>
                    <source>

                        <italic toggle="yes">Procedia Computer Science.</italic>
</source>
                    <year>2017</year>;<volume>117</volume>:<fpage>256</fpage>&#x2013;<lpage>265</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.procs.2017.10.117</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Inoue</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Alhafni</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Baimukan</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the Sixth Arabic Natural Language Processing Workshop.</italic>
</source>
                    <year>2021</year>; pp.<fpage>92</fpage>&#x2013;<lpage>104</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/2021.wanlp-1.10</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Abdul-Mageed</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Elmadany</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nagoudi</surname>
                            <given-names>EMB</given-names>
                        </name>
</person-group>:
                    <chapter-title>ARBERT &amp; MARBERT: Deep Bidirectional Transformers for Arabic.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</italic>
</source>
                    <year>2021</year>; Volume<volume>1</volume>(<issue>Long Papers</issue>): pp.<fpage>7088</fpage>&#x2013;<lpage>7105</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/2021.acl-long.551</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ruder</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>S&#x00f8;gaard</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vuli&#x0107;</surname>
                            <given-names>I</given-names>
                        </name>
</person-group>:
                    <chapter-title>Unsupervised cross-lingual representation learning.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts.</italic>
</source>
                    <year>2019</year>; pp.<fpage>31</fpage>&#x2013;<lpage>38</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/P19-4007</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dredze</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <chapter-title>Are All Languages Created Equal in Multilingual BERT?</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 5th Workshop on Representation Learning for NLP.</italic>
</source>
                    <year>2020</year>; pp.<fpage>120</fpage>&#x2013;<lpage>130</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/2020.repl4nlp-1.16</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mikolov</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sutskever</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Distributed Representations of Words and Phrases and their Compositionality.</chapter-title>
                    <source>

                        <italic toggle="yes">Advances in Neural Information Processing Systems.</italic>
</source>
                    <year>2013</year>; vol.<volume>26</volume>.
                    <pub-id pub-id-type="doi">10.48550/arXiv.1310.4546</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pennington</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Socher</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Manning</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <chapter-title>GloVe: Global Vectors for Word Representation.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).</italic>
</source>
                    <year>2014</year>; pp.<fpage>1532</fpage>&#x2013;<lpage>1543</lpage>.
                    <pub-id pub-id-type="doi">10.3115/v1/D14-1162</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Barhoumi</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Est&#x00e8;ve</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Aloulou</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Document embeddings for Arabic Sentiment Analysis.</chapter-title>
                    <source>

                        <italic toggle="yes">The First Conference on Language Processing and Knowledge Management (LPKM 2017).</italic>
</source>
                    <publisher-loc>Sfax, Tunisia</publisher-loc>:<year>2017</year>.
                    <pub-id pub-id-type="doi">10.1109/LPKM.2017.8103994</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bojanowski</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Grave</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Joulin</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Enriching Word Vectors with Subword Information.</article-title>
                    <source>

                        <italic toggle="yes">Transactions of the Association for Computational Linguistics.</italic>
</source>
                    <year>2017</year>; vol.<volume>5</volume>: pp.<fpage>135</fpage>&#x2013;<lpage>146</lpage>.
                    <pub-id pub-id-type="doi">10.1162/tacl_a_00051</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Almandouh</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Alrahmawy</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Eisa</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Ensemble based high performance deep learning models for fake news detection.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Rep.</italic>
</source>
                    <year>2024</year>;<volume>14</volume>:<fpage>26591</fpage>.
                    <pub-id pub-id-type="doi">10.1038/s41598-024-77761-1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Antoun</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Baly</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hajj</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <chapter-title>AraBERT: Transformer-based Model for Arabic Language Understanding.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection.</italic>
</source>
                    <year>2020</year>; pp.<fpage>9</fpage>&#x2013;<lpage>15</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/2020.osact-1.2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ahmed</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Alfasly</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wen</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>AlcLaM: Arabic Dialect Language Model.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the Second Arabic Natural Language Processing Conference.</italic>
</source>
                    <year>2024</year>; pp.<fpage>153</fpage>&#x2013;<lpage>159</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/2024.arabicnlp-1.14</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hengle</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kshirsagar</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Desai</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Combining Context-Free and Contextualized Representations for Arabic Sarcasm Detection and Sentiment Identification.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the Sixth Arabic Natural Language Processing Workshop.</italic>
</source>
                    <year>2021</year>; pp.<fpage>357</fpage>&#x2013;<lpage>363</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/2021.wanlp-1.46</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Alosaimi</surname>
                            <given-names>W</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ArabBert-LSTM: improving Arabic sentiment analysis based on transformer model and Long Short-Term Memory.</article-title>
                    <source>

                        <italic toggle="yes">Frontiers in Artificial Intelligence.</italic>
</source>
                    <year>2024</year>;<volume>7</volume>.
                    <pub-id pub-id-type="pmid">39015364</pub-id>
                    <pub-id pub-id-type="doi">10.3389/frai.2024.1408845</pub-id>
                    <pub-id pub-id-type="pmcid">PMC11250580</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jefry</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Al-Doghman</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hussain</surname>
                            <given-names>FK</given-names>
                        </name>
</person-group>:
                    <chapter-title>BERT-LA: Leveraging BERT and AraBERT With Bi-LSTM for Cross-Lingual Sentiment Analysis of English and Arabic Texts.</chapter-title>
                    <source>

                        <italic toggle="yes">17th International Conference on Security of Information and Networks, SIN 2024, Sydney, Australia, December 2-4, 2024.</italic>
</source>
                    <year>2024</year>; pp.<fpage>1</fpage>&#x2013;<lpage>10</lpage>.
                    <pub-id pub-id-type="doi">10.1109/SIN63213.2024.10871432</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Khachfeh</surname>
                            <given-names>RA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>El Kabani</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Osman</surname>
                            <given-names>Z</given-names>
                        </name>
</person-group>:
                    <chapter-title>An Enhanced Hybrid BERT-BiLSTM Learning Model for Arabic News Classification.</chapter-title>
                    <source>

                        <italic toggle="yes">2025 International Conference on Machine Intelligence and Smart Innovation (ICMISI).</italic>
</source>
                    <year>2025</year>; pp.<fpage>201</fpage>&#x2013;<lpage>206</lpage>.
                    <pub-id pub-id-type="doi">10.1109/ICMISI65108.2025.11115581</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Grave</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bojanowski</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gupta</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Learning Word Vectors for 157 Languages.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).</italic>
</source>
                    <publisher-loc>Miyazaki, Japan</publisher-loc>:<year>2018</year>.
                    <pub-id pub-id-type="doi">10.48550/arXiv.1802.06893</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nacar</surname>
                            <given-names>O</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Koubaa</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <chapter-title>Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning.</chapter-title>
                    <source>

                        <italic toggle="yes">Generative AI and Large Language Models: Opportunities, Challenges, and Applications: Volume 1.</italic>
</source>
                    <person-group person-group-type="editor">

                        <name name-style="western">
                            <surname>Koubaa</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ammar</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ghouti</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>, editors.
                    <publisher-loc>Cham</publisher-loc>:
                    <publisher-name>Springer Nature Switzerland</publisher-name>;<year>2025</year>; pp.<fpage>179</fpage>&#x2013;<lpage>216</lpage>.
                    <pub-id pub-id-type="doi">10.1007/978-3-031-90573-5_6</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cer</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Diab</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Agirre</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017).</italic>
</source>
                    <year>2017</year>; pp.<fpage>1</fpage>&#x2013;<lpage>14</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/S17-2001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Elbeltagi</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Comparing Arabic Sentence Transformers.</italic>
</source>
                    <publisher-name>GitHub</publisher-name>;<year>2024</year>. Retrieved September 5, 2025.
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/m-elbeltagi/Comparing_Arabic_Sentence_Transformers?tab=readme-ov-file">Reference Source</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report463535">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.192766.r463535</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Sibaee</surname>
                        <given-names>Serry</given-names>
                    </name>
                    <xref ref-type="aff" rid="r463535a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r463535a1">
                    <label>1</label>Prince Sultan University, Riyadh, Saudi Arabia</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>2</day>
                <month>4</month>
                <year>2026</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Sibaee S</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport463535" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.174830.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Summary</p>
            <p>The paper proposes combining a fine-tuned Sentence-AraBERT (SAraBERT) model with pre-trained FastText embeddings for Arabic sentence representation. The two vectors (300-dim from FastText, 768-dim from AraBERT) are concatenated to produce a 1068-dimensional sentence embedding. Evaluation on the ATrD dataset shows that the combined model achieves an MSE of 0.0355 and a Pearson correlation of 0.8053, outperforming each component in isolation.</p>
            <p> Evaluation</p>
            <p> 
                <bold>Is the work clearly and accurately presented and does it cite the current literature? Partly</bold>
            </p>
            <p> The paper is readable but contains a significant terminological error: calling this a "multimodal" approach is incorrect. Both FastText and AraBERT operate on text. This must be corrected to "hybrid" or "fusion-based" throughout. The literature review is adequate but omits relevant recent Arabic sentence embedding work such as AraSBERT, which is directly comparable.</p>
            <p> 
                <bold>Is the study design appropriate and is the work technically sound? Partly</bold>
            </p>
            <p>The core idea is sound, but the experimental design is narrow. Five models are tested on a single dataset. There is no ablation of the concatenation strategy itself: why not learned fusion, attention-weighted combination, or dimensionality-normalized pooling? The choice of simple concatenation is not experimentally justified beyond reporting that it works.</p>
            <p> 
                <bold>Are sufficient details provided to allow replication? No</bold>
            </p>
            <p> The following are missing or underspecified: training hyperparameters (learning rate, batch size, number of epochs, warmup steps), tokenization details, how similarity scores were computed from the 1068-dim vector (cosine? dot product?), and the exact train/validation/test split applied to the evaluation STSB dataset. Without these, replication is not possible.</p>
            <p> 
                <bold>Is the statistical analysis appropriate? No</bold>
            </p>
            <p> Results are from a single run with no confidence intervals, standard deviations, or significance tests. Given the sensitivity of fine-tuned transformer models to random seeds, this is insufficient. A minimum of three runs with reported variance is required.</p>
            <p> 
                <bold>Are source data available for reproducibility? Partly</bold>
            </p>
            <p> Training datasets are publicly linked, which is good. However, the trained SAraBERT model weights and the exact fine-tuning code are not provided, despite the authors stating they will make them public. A working repository link must be included before acceptance.</p>
            <p> 
                <bold>Are the conclusions supported by the results? No</bold>
            </p>
            <p>The paper claims that the approach "advances NLP capabilities for underrepresented languages" and establishes new standards for Arabic sentence embedding; both claims go beyond what a single-dataset evaluation supports. The improvement over SAraBERT alone (an MSE drop of ~0.011) is modest, and its practical significance is not discussed.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>No</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>No</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>No</p>
            <p>Reviewer Expertise:</p>
            <p>Arabic NLP</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report463539">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.192766.r463539</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Elnagar</surname>
                        <given-names>Ashraf</given-names>
                    </name>
                    <xref ref-type="aff" rid="r463539a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r463539a1">
                    <label>1</label>University of Sharjah, Sharjah, United Arab Emirates</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>18</day>
                <month>3</month>
                <year>2026</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Elnagar A</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport463539" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.174830.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The article presents a hybrid Arabic sentence embedding approach combining SAraBERT and FastText. While the paper is readable and the general goal is relevant, I do not find the current contribution sufficiently strong for indexing in its present form.</p>
            <p> The main concern is that the work appears 
                <underline>incremental and insufficiently novel</underline>. The proposed method is essentially a combination of two existing text-based embedding techniques, without a clearly new modeling contribution or sufficiently broad empirical validation to demonstrate a substantial advance. The manuscript does not convincingly establish why this should be considered more than a straightforward hybrid baseline.</p>
            <p> A second issue is that the paper appears to use 
                <bold>&#x201c;multimodal&#x201d; incorrectly</bold>. This is 
                <bold>not a multimodal approach</bold>, since both SAraBERT and FastText operate on the same modality, namely text. The method is better described as a hybrid text embedding approach or a fusion of textual representations. Referring to it as multimodal is misleading and should be corrected.</p>
            <p> The study design is also too limited to support the broader claims. The evaluation is narrow, the baseline set is not strong enough to establish clear superiority, and the conclusions extend beyond what the reported experiments justify. In addition, the manuscript lacks enough implementation detail for full replication, and the statistical analysis is not adequate: results are reported without repeated runs, uncertainty estimates, or significance testing.</p>
            <p> The paper has a reasonable idea, but in its current form it is too limited in novelty, validation, and reproducibility. The conclusions are stronger than the evidence supports, and the framing of the method should be corrected.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>No</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>No</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>No</p>
            <p>Reviewer Expertise:</p>
            <p>NLP</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
    </sub-article>
</article>
