Background

F1000Research

2046-1402

F1000 Research Limited

London, UK

10.12688/f1000research.174830.1

Research Article

Articles

Sentence Embedding Using Multimodal Approach: Combining FastText with AraBERT for Arabic Text Representation

[version 1; peer review: 2 not approved]

Hind Almayyali ¹
Roles: Formal Analysis, Methodology, Project Administration, Resources, Software, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Ahmed Aliwy ¹ ᵃ (https://orcid.org/0000-0001-8032-8185)
Roles: Formal Analysis, Methodology, Supervision, Validation, Writing – Original Draft Preparation, Writing – Review & Editing

¹ Computer Science, University of Kufa, Kufa, Najaf Governorate, Iraq

ᵃ ahmedh.almajidy@uokufa.edu.iq

No competing interests were disclosed.

6 2 2026

2026

206

29 1 2026

2026

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Sentence-embedding models transform sentences into dense vector representations that capture their semantic meanings. These representations enable deep learning to perform many tasks efficiently, such as similarity measurement, retrieval, and summarization, with improved semantic understanding. Existing sentence embedding models often struggle to capture the semantic richness and morphological complexity of Arabic, limiting their effectiveness in tasks such as semantic similarity, question answering, summarization, and information retrieval.

Objectives

This study aims to develop a novel sentence-embedding framework tailored for Arabic that addresses the shortcomings of current models by integrating contextual and linguistic features.

Methods

We propose a multimodal architecture that combines a fine-tuned Sentence-AraBERT (SAraBERT) model with pre-trained FastText embeddings. The model is evaluated on standard Arabic Semantic Textual Similarity (STS) benchmarks using the Mean Squared Error (MSE) and Pearson Correlation Coefficient.

Results

Experimental results show that the proposed model outperforms existing baselines, achieving lower MSE values (0.0355) and higher correlation scores (0.8053), indicating a stronger alignment with human-annotated similarity judgments on the ATrD dataset.

Conclusion

The findings demonstrate the effectiveness of multimodal SAraBERT-based embeddings in enhancing sentence-level semantic understanding of Arabic. This study advances Natural Language Processing (NLP) capabilities for underrepresented languages and provides a foundation for future research on Arabic language understanding using deep learning techniques.

Sentence Embedding; Sentence Transformer; AraBERT; FastText for Arabic

The author(s) declared that no grants were involved in supporting this work.

Introduction

Word embedding, which represents a word as a numerical vector in a multi-dimensional space, was the first widely used form of embedding. It lies at the core of recent Natural Language Processing (NLP) tasks and applications. The quality of the embedding affects every NLP pipeline built on it; a good embedding needs to capture both the semantic and syntactic properties of words. ¹ Subsequent research extended the embedding of words to the embedding of whole sentences and produced semantically meaningful sentence embeddings. ²,³ These attempts face several challenges, such as capturing sentence semantics in a vector space while taking into account word order, syntactic structure, and the context of the sentence. A useful property of sentence embedding is that the similarity between two sentences can be checked by comparing two fixed-size vectors.
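As a minimal illustration of this comparison, the sketch below computes the cosine similarity of two fixed-size vectors in plain Python. Cosine similarity is one common choice of comparison; the function name `cosine_similarity` and the toy four-dimensional vectors are assumptions made for this example and are not taken from the proposed model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 4-dimensional "sentence embeddings" (made-up numbers, for illustration only).
sentence_a = [0.2, 0.1, 0.7, 0.0]
sentence_b = [0.4, 0.2, 1.4, 0.0]   # same direction as sentence_a
sentence_c = [0.0, 0.9, 0.0, 0.4]   # mostly different direction

print(cosine_similarity(sentence_a, sentence_b))  # close to 1
print(cosine_similarity(sentence_a, sentence_c))  # much lower
```

Because both vectors have the same fixed size, the comparison costs one pass over the vector entries, regardless of how long the original sentences were.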

Sentence representation has a long history. Traditional methods, such as the one-hot Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (tf-idf), were used to represent a sentence, phrase, or whole document. One of the first learned approaches was Doc2Vec, introduced in 2014. ³ However, the most impactful change came with the transformer architecture ⁴ of Vaswani et al., in which the model attends to different parts of a sentence simultaneously, leading to better contextual understanding while preserving semantic and syntactic relevance. Building on the transformer mechanism, BERT was introduced by Devlin et al., ⁵ allowing different representations of the same word according to its semantics and context. In 2019, Sentence-BERT ² was introduced as an extension of BERT that represents the whole sentence as a single dense vector optimized for similarity comparison, which can be used in multiple NLP tasks.

Arabic is a morphologically rich, nonconcatenative language with complexity at the morphological, syntactic, and semantic levels, so it requires more preprocessing, different techniques, or specialized approaches. ⁶ One of the early approaches for Arabic was the AraVec project, ⁷ one of the first attempts to represent Arabic words as embedding vectors. Following progress in the English language, AraBERT was introduced for Arabic and became a major milestone in Arabic NLP. After AraBERT, the research community produced further Arabic-focused models such as CAMeL-BERT, ⁸ which focused on dialectal Arabic, and MARBERT, ⁹ which addressed slang and dialects and was optimized for social media text. Each of these models addresses specific challenges in Arabic NLP.

Some attempts were made to create universal language models, such as XLM-R ¹⁰ and Multilingual BERT (mBERT). ¹¹ However, their performance on Arabic was not promising, and they underperformed dedicated Arabic models because of the challenges mentioned above.

Despite the progress in sentence embedding, finding a model that best captures meaning in the Arabic language remains an open problem that requires further study. In this study, we developed an approach that concatenates the outputs of two models: a sentence-transformer-based model and a classical word-embedding model. Each sentence is therefore represented as a combination of the two vectors.

Related works

Sentence embedding is usually not treated as a standalone task but as part of other tasks, such as information retrieval (IR), question answering (QA), summarization, and many others. Therefore, we first review word embedding, which can be used to produce sentence embeddings, then contextual embedding, and finally hybrid approaches for producing sentence embeddings.

For word embedding, many approaches have been used and tested for different languages, such as Word2Vec ¹² and GloVe, ¹³ and Barhoumi et al. ¹⁴ presented Arabic versions of these models. FastText ¹⁵ extended these models by incorporating sub-word information, which addresses key limitations in morphologically rich languages and handles out-of-vocabulary (OOV) terms, making it suitable for the Arabic language. ¹⁶ All of the above word embeddings can be turned into sentence embeddings by pooling the word vectors, for example with max or average pooling.
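To make the idea of sub-word information concrete, the sketch below extracts the character n-grams of a word in the way described for FastText: ¹⁵ the word is wrapped in the boundary symbols `<` and `>`, and all n-grams within a length range are collected. The function name `char_ngrams` and the default range of 3 to 6 characters are assumptions for this illustration; the sketch omits the hashing and vector summation that a real FastText model performs, and the n-gram range of any particular pre-trained model may differ.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word` after adding boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


# English example with 3-grams only.
print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']

# An Arabic word works the same way, one Unicode character at a time.
print(char_ngrams("كتاب", 3, 3))
```

Because an unseen word still shares many n-grams with known words, a vector can be composed for it from those shared pieces, which is the property that makes the approach attractive for morphologically rich languages.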

BERT ⁵ introduced contextual embedding using transformer-based pre-trained language models. Sentence BERT (SBERT) ² is an extension of BERT that produces a sentence embedding of a fixed size.

Several BERT variants have been developed in the Arabic NLP domain and have achieved notable progress. AraBERT ¹⁷ was trained on approximately 24 GB of Arabic text from news and other sources. AraSBERT, a Siamese BERT architecture, enhances the performance on Arabic Semantic Textual Similarity (STS) tasks. Other models include multilingual BERT (mBERT), which supports multiple languages but often underperforms on Arabic because its Arabic-specific vocabulary is limited to around 2,000 tokens versus AraBERT’s 60,000, ¹⁸ and CAMeLBERT-MSA, which is specialized for Modern Standard Arabic (MSA). ⁸ These models capture the contextual nuances essential for handling the inherent linguistic complexities of Arabic.

For multimodal text embeddings, the existing literature features common fusion strategies, including the simple concatenation of embedding vectors and the use of shallow neural networks. Hengle et al. ¹⁹ introduced a hybrid model for Arabic sarcasm detection and sentiment identification. Their approach concatenates the ‘[CLS]’ token vector from AraBERT with a feature vector from a CNN-BiLSTM ensemble, and the combined vector is fed into the classification layer. Relying on the ‘[CLS]’ token vector restricts this embedding to a few applications, such as next sentence prediction, and makes it unsuitable as a general-purpose Arabic sentence embedding.

In addition, several studies have built on the AraBERT model. One of them is ArabBert-LSTM by AlOsaimi, ²⁰ who showed that hybrid architectures combining transformer-based AraBERT embeddings with LSTM networks can be very successful in Arabic sentiment analysis and outperform classical machine learning and deep learning methods. Jefry et al. ²¹ used AraBERT with BiLSTM to improve Arabic sentiment analysis. Similarly, Khachfeh et al. ²² designed a hybrid BERT-BiLSTM model to classify Arabic news. They showed that the performance of AraBERT on morphologically complex Arabic texts could be significantly improved by fine-tuning the final layer of the model and combining it with a downstream component that applies bidirectional processing.

Our proposed sentence-embedding approach combines pre-trained FastText and a fine-tuned Sentence-AraBERT to bridge the representational gap between the two models by leveraging their respective strengths.

Methodology

Embedding models for the Arabic language often suffer from low accuracy, especially when used in applications such as classification, summarization, and question answering. This is primarily because the embedding values do not fully reflect the meaning of the sentence. Therefore, in this study, we hypothesized that combining more than one model would improve the extracted sentence embedding.

The proposed model combines FastText and Sentence AraBERT for Arabic sentence embedding to create a multimodal architecture. In the first step, the AraBERT model is fine-tuned on an Arabic triplet dataset to produce the Sentence Arabic BERT (SAraBERT) model. Each row in the triplet dataset consists of three columns: anchor, positive, and negative. In the second step, the outputs of the SAraBERT model and the pre-trained Arabic FastText model are concatenated to produce the final sentence embedding. Figure 1 shows a block diagram of the proposed model, whose components are explained in more detail in the following sections.

Figure 1. Block diagram of the proposed Arabic sentence embedding model.

Sentence-BERT (SBERT) and Sentence-AraBERT (SAraBERT)

Sentence-BERT (SBERT), by Reimers & Gurevych, ² is based on a Siamese neural network and generates semantically meaningful sentence embeddings by fine-tuning BERT models on sentence data suited to applications such as semantic textual similarity, sentence classification, question answering, and many others. Because BERT produces its output at the token level, an aggregation mechanism is needed to obtain sentence-level representations. The key method for obtaining fixed-size sentence vectors is mean pooling, which sums all the token vectors and divides by their number. With fixed-size vectors, comparing sentences no longer requires a separate model evaluation for every sentence pair; it reduces to fast vector-space operations. The architecture consists of two identical BERT encoders that process the sentences independently, followed by a pooling layer, usually mean pooling, which performs better than max pooling and pooling on the [CLS] token. In this study, we used the same methodology as Reimers & Gurevych. ²
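The mean-pooling step can be sketched in a few lines of plain Python. The sketch assumes that each sentence arrives as a list of token vectors together with an attention mask that marks padding positions with 0, which is a common convention in transformer toolkits but is not described in the text; the name `mean_pool` and the toy two-dimensional vectors are likewise hypothetical.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token vector")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    # Sum each dimension over the kept tokens, then divide by their count.
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Three real tokens and one padding position, with toy 2-dimensional vectors.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [3.0, 4.0]
```

Whatever the number of tokens, the pooled output has the same dimensionality as a single token vector, which is what makes the resulting sentence vectors directly comparable.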

We propose Sentence-AraBERT (SAraBERT), which combines the SBERT ² methodology with AraBERT. ¹⁷ Developing SAraBERT entails adopting the Siamese structure and replacing the BERT encoders with AraBERT encoders, which were trained specifically on Arabic corpora and therefore incorporate built-in knowledge of Arabic syntax, morphology, and semantics. Fine-tuning involves training the Siamese AraBERT model on Arabic sentence-pair data, such as Arabic NLI triplet data or Arabic semantic textual similarity data, while retaining the same pooling and concatenation strategies used by SBERT. This adaptation retains AraBERT’s knowledge of the Arabic language and adds the ability to encode sentence-level semantics efficiently, resulting in a sentence-transformer model suited to Arabic text processing and semantic similarity applications. Figure 2 shows the SAraBERT architecture, which follows the methodology of Reimers and Gurevych. ²

Figure 2. SAraBERT architecture.

Parameter n refers to the dimensionality of the embeddings (768 by default for AraBERT base).

FastText model

FastText ¹⁵ is a word-embedding model that learns from sub-word information, making it robust to out-of-vocabulary (OOV) words. For the Arabic language, we used the pre-trained Arabic FastText model of Grave et al., ²³ which was trained on the Common Crawl dataset. This model is intended to produce word embeddings that represent the semantics of Arabic words as accurately as possible, and it can be applied to most natural language processing (NLP) tasks. Its output is a 300-dimensional vector for each word. To obtain a sentence embedding, the FastText vector of each word in the sentence is retrieved, and the word vectors are averaged (mean pooling) to form the sentence vector. The sentence vector therefore has 300 dimensions, the same size as a word vector.

Combination of the two outputs

We have two vectors of different sizes: one of size 300 and the other of size 768. Because element-wise combination methods, such as max or average pooling, require vectors of equal size, this difference prevents them from being applied directly. We therefore concatenate the two embeddings to produce a new embedding of 1068 dimensions, as shown in Figure 3.
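The concatenation itself is a simple operation, sketched below with placeholder vectors of the two reported sizes. The constant values stand in for real embeddings, and the helper name `concatenate` is an assumption made for this example; whether the two parts are rescaled before being joined is not stated in the text, so no rescaling is shown.

```python
def concatenate(contextual_vec, static_vec):
    """Join two embeddings end to end; the result has len(a) + len(b) entries."""
    return list(contextual_vec) + list(static_vec)


# Placeholder vectors with the sizes reported in the text.
sarabert_vec = [0.01] * 768   # stands in for a SAraBERT sentence embedding
fasttext_vec = [0.02] * 300   # stands in for an averaged FastText embedding

combined = concatenate(sarabert_vec, fasttext_vec)
print(len(combined))  # 1068
```

One consequence worth keeping in mind is that the dot product of two concatenated vectors equals the sum of the dot products of their parts, so in a cosine comparison the part with the larger norm contributes more to the final score.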

Figure 3. Concatenation of sentence embedding to produce 1068 sentence embedding.

Experimental results and evaluations

All experiments were implemented in Python 3.10.12 with supporting libraries in a Kaggle environment. For fine-tuning the transformer model, ¹⁷ we used PyTorch 2.4.1+cu121, and the experiments were run in a multi-GPU setting with P100 GPUs. AraBERT (bert-base-arabertv02, 136 million parameters) was used as the base model for building the SAraBERT model.

For the evaluation process, the Mean Squared Error (MSE) and the Pearson correlation coefficient were computed against the gold-standard similarity annotations. MSE is the average of the squared differences between the predicted values ($\hat{y}_i$) and the real values ($y_i$). Eq. (1) shows the formula used for MSE over n examples.

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2 \tag{1}$$
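Eq. (1) translates directly into code. The following sketch assumes that the predicted and gold similarity scores are given as two equal-length lists; the function name `mean_squared_error` and the example numbers are illustrative and are not results from the experiments.

```python
def mean_squared_error(predicted, gold):
    """Average of the squared differences between paired values, as in Eq. (1)."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(gold)


# Made-up predicted and gold similarity scores on a 0-1 scale.
predicted = [0.9, 0.4, 0.1]
gold = [1.0, 0.5, 0.0]
print(mean_squared_error(predicted, gold))  # approximately 0.01
```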

The second evaluation measure, the Pearson correlation coefficient (r), was used to determine the strength of the linear correlation between the gold-standard semantic similarity and the cosine similarity. It ranges between -1 (perfect negative linear correlation) and +1 (perfect positive linear correlation), where 0 corresponds to a lack of linear correlation. It is defined in Eq. (2), where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and $y_i$ values.

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{2}$$
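The correlation in Eq. (2) can be computed as sketched below, under the assumption that the two series are equal-length lists of numbers. The name `pearson_r` and the sample values are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two paired series, as in Eq. (2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up gold scores and model similarities.
gold_scores = [1.0, 2.0, 3.0, 4.0]
model_scores = [1.0, 3.0, 2.0, 4.0]
print(pearson_r(gold_scores, model_scores))  # approximately 0.8
```

Unlike MSE, this measure is unaffected by a constant shift or positive rescaling of either series, so the two metrics capture different aspects of agreement with the gold scores.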

In the next subsections, a description of the datasets used, the results, and the analysis are presented.

Datasets

Two types of datasets were used: one for training and fine-tuning the SAraBERT model, and one for evaluating the final sentence embedding output. FastText was used as a pre-trained model ²³; therefore, it did not require a training dataset.

The first dataset was the Arabic triplet dataset (ATrD) ²⁴ of one million triplets (342.53 MB). The split ratios used for training, validation, and testing were 70%, 20%, and 10%, respectively. A triplet contains an anchor, a positive example that is semantically similar to the anchor, and a negative example that is semantically different. This structure enables the model to learn which semantic variations are acceptable. The dataset was used for fine-tuning the proposed SAraBERT model.
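To show how an (anchor, positive, negative) triplet can drive training, the sketch below evaluates a triplet loss on toy embeddings. It assumes Euclidean distance and a margin of 1, following the triplet objective described for SBERT; ² the exact loss settings used for SAraBERT are not given in the text, and the names `euclidean` and `triplet_loss` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the positive is closer to the anchor than the negative by `margin`."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(gap, 0.0)


# Toy 2-dimensional embeddings (illustrative values only).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

print(triplet_loss(anchor, positive, negative))  # 0.0
print(triplet_loss(anchor, negative, positive))  # 5.0
```

The loss vanishes for a well-separated triplet and grows when the negative example sits closer to the anchor than the positive one, which pushes the encoder to pull similar sentences together and push dissimilar ones apart.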

The second dataset is the Arabic version of the Semantic Textual Similarity Benchmark (STSB), ²⁴ which is based on the English version ²⁵ and consists of Arabic sentence pairs annotated for semantic similarity. It is a heterogeneous collection of sentence pairs from many different domains, such as news headlines, video and image captions, and natural-language inference data. Each sentence pair in the dataset was annotated manually with a similarity score on a scale of 1–5. In this Arabic variant, we normalized the similarity scores to a range of 0–1, allowing us to compare and analyze semantic similarity throughout the dataset. This dataset was used to evaluate the final sentence embedding.
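A score normalization of this kind can be sketched with a linear (min-max) rescaling. The exact mapping used for the dataset is not given in the text, so the function below is an assumption: it takes the lower and upper ends of the annotation scale as parameters, with defaults matching the 1–5 scale mentioned above, and the name `rescale_score` is hypothetical.

```python
def rescale_score(score, low=1.0, high=5.0):
    """Map a score from the interval [low, high] linearly onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= score <= high:
        raise ValueError("score lies outside the annotation scale")
    return (score - low) / (high - low)


# Lowest, middle, and highest points of the assumed scale.
print([rescale_score(s) for s in (1.0, 3.0, 5.0)])  # [0.0, 0.5, 1.0]
```

Putting the gold scores on the 0–1 scale makes them directly comparable with similarity values produced by the embeddings, which is needed for the MSE in particular.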

Results and analyses

We performed extensive experiments to evaluate the effectiveness of our methodology, comparing our contextualized embedding-based transformer encoder (CETE) with modern state-of-the-art methods and with several baseline configurations.

In our feature-based method, we used a hybrid embedding scheme that combines SAraBERT contextualized embeddings with FastText word embeddings. This combination takes advantage of AraBERT’s contextual capabilities and FastText’s sub-word information. The model is built upon AraBERT, a transformer-based language model specifically pre-trained on a large corpus of Arabic text.

First, AraBERT was fine-tuned using the STSB dataset with the Siamese BERT architecture to produce SAraBERT, whereas FastText was used as a pre-trained model without further training. Five configurations were then tested using the ATrD dataset: FastText alone, AraBERT v2, SAraBERT, AraBERT + FastText, and SAraBERT + FastText. The MSE and correlation were computed for each configuration. Table 1 shows the MSE and correlation for these tests, whereas Figure 4 visualizes four of these models, excluding the AraBERT v2 + FastText combination.

Table 1. MSE and correlation for five models.

Model	MSE	Correlation
FastText	0.1109	0.5679
AraBERT v2	0.1774	0.3401
SAraBERT	0.0466	0.7869
AraBERT v2 + FastText	0.1292	0.5674
SAraBERT v2 + FastText	0.0355	0.8053

Figure 4. Visual representation of results of Arabic sentence embedding models comparison (4 types of tests).

As shown in Table 1, SAraBERT achieved a Mean Squared Error (MSE) of 0.0466, which is an improvement over AraBERT v2 (0.1774). This indicates that fine-tuning with the Siamese architecture yields stronger Arabic sentence embeddings. The best MSE and correlation values were obtained by the proposed combination, SAraBERT v2 + FastText: 0.0355 (minimum) and 0.8053 (maximum), respectively. Compared with the nearest result (SAraBERT alone), the MSE decreased by 0.0111 (from 0.0466 to 0.0355); compared with AraBERT v2, it decreased by 0.1419. This suggests improved precision and robustness and underscores the benefits of integrating contextual and morphological information.

Despite AraBERT’s strength in modeling context and disambiguating polysemous terms, it faces challenges with out-of-vocabulary words and dialectal expressions that are underrepresented in its training corpus. Informal social media slang, for example, may be fragmented by word-piece tokenization, reducing semantic clarity for such words.

The results also show that FastText complements AraBERT by modeling subword information, which enhances robustness in morphologically rich languages such as Arabic.

We used the baseline AraBERT model as our main point of comparison because it is the current standard for Arabic language processing tasks. To allow a fair comparison, the improved architecture keeps the sizes of the hidden layers and feed-forward networks the same as in the baseline. The only difference is the use of the sentence-transformer approach to capture deep contextual understanding and intricate semantic relationships within sentences.

For further comparison, we also fine-tuned the multilingual DistilBERT base model on the same dataset (STSB) and with the same parameters used for AraBERT fine-tuning to produce S-DistilBERT. Table 2 presents the results obtained. We chose the DistilBERT model because, according to the comparison by Elbeltagi, ²⁶ it is considered one of the best sentence-embedding transformers for Arabic text classification. However, according to the results shown in Tables 1 and 2, the SAraBERT model gives better embedding results than S-DistilBERT, so we chose to combine SAraBERT with the FastText model.

Table 2. Results of S-DistilBERT and the combination S-DistilBERT + FastText.

Model	MSE	Correlation
S-DistilBERT	0.0491	0.7047
S-DistilBERT + FastText	0.0413	0.7550

Conclusions and future work

The proposed work improves sentence embeddings by combining contextualized Sentence AraBERT representations with pooled FastText word vectors for better Arabic text processing. Our experiments indicate that the hybrid embedding method performs better than the AraBERT baselines. Moreover, we found that the hybrid approach successfully incorporates both contextual semantics via AraBERT and morphological features via FastText and yields stronger sentence representations.

Sentence-AraBERT (SAraBERT) extends the original AraBERT architecture to include the Siamese network architecture (as in SBERT) in addition to triplet loss functions to generate sentence embeddings that preserve semantic meaning.

We found that our joint AraBERT-FastText model sets new standards for Arabic sentence embedding strategies. The hybrid approach is particularly useful in Arabic language processing because Arabic has a rich morphological structure. In addition, the sub-word information provided by FastText complements the contextual knowledge provided by AraBERT.

Our results show that multi-paradigm embedding can bring significant benefits over single-paradigm methods, even when it is not fine-tuned or trained on additional large corpora. Finally, we provide our implementation and trained models to the public so that others can conduct further research and reproduce our experiments.

At first glance, the 1068-dimensional vector may seem large, but many practical models produce text embeddings with more dimensions. For example, in OpenAI Text-Embedding v3, the vector embedding has a size of 1536 or 3072, while E5-Mistral-7B-Instruct has a vector embedding of 4096 dimensions. In addition, with processing units such as GPUs, the processing time is very small.

In future work, we will also explore how our hybrid embedding method performs on other Arabic NLP problems, including Arabic information retrieval applications, Arabic sentiment analysis, Arabic named-entity recognition, Arabic text classification on imbalanced datasets, and Arabic document summarization. Another area that we are planning to expand and test is how well we can incorporate other embedding techniques and how we can apply our approach to other morphologically rich languages.

Data availability statement

The datasets used in this study are publicly available and distributed as Third-party datasets. The Arabic Natural Language Inference Triplet (Arabic-NLI-Triplet) (Omer Nacar: onajar@psu.edu.sa) dataset can be accessed via Zenodo at https://doi.org/10.5281/zenodo.18169892. The Arabic Semantic Textual Similarity (Arabic STS) benchmark dataset (Omer Nacar: onajar@psu.edu.sa) is available at https://doi.org/10.5281/zenodo.18170487.

Each dataset contains training, validation, and test files (train.csv, validation.csv, and test.csv). Both datasets are distributed, enabling readers and reviewers to access and reuse the data under the same conditions as the authors.

References

1. Gong, Bhat, Viswanath: Embedding Syntax and Semantics of Prepositions via Tensor Decomposition. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2018; Volume 1 (Long Papers): pp.896–906. doi: 10.18653/v1/N18-1082

2. Reimers, Gurevych: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Conference on Empirical Methods in Natural Language Processing. Hong Kong, China: 2019. doi: 10.18653/v1/D19-1410

3. Mikolov: Distributed Representations of Sentences and Documents. Proceedings of the 31st International Conference on Machine Learning. 22–24 Jun 2014; vol.32: pp.1188–1196. doi: 10.48550/arXiv.1405.4053

4. Vaswani, Shazeer: Attention is all you need. Adv. Neural Inf. Proces. Syst. 2017; 30. doi: 10.48550/arXiv.1706.03762

5. Devlin, Chang M-W, Lee: Bert: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies. 2019; Volume 1 (long and short papers): pp.4171–4186. doi: 10.18653/v1/N19-1423

6. Matrane, Benabbou, Sael: A systematic literature review of Arabic dialect sentiment analysis. Journal of King Saud University - Computer and Information Sciences. 2023; 35(6): 101570. doi: 10.1016/j.jksuci.2023.101570

7. Soliman, Eissa, El-Beltagy: AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP. Procedia Computer Science. 2017; 117: 256–265. doi: 10.1016/j.procs.2017.10.117

8. Inoue, Alhafni, Baimukan: The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models. Proceedings of the Sixth Arabic Natural Language Processing Workshop. 2021; pp.92–104. doi: 10.18653/v1/2021.wanlp-1.10

9. Abdul-Mageed, Elmadany, Nagoudi EMB: ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021; Volume 1 (Long Papers): pp.7088–7105. doi: 10.18653/v1/2021.acl-long.551

10. Ruder, Søgaard, Vulić: Unsupervised cross-lingual representation learning. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts. 2019; pp.31–38. doi: 10.18653/v1/P19-4007

11. Dredze: Are All Languages Created Equal in Multilingual BERT? Proceedings of the 5th Workshop on Representation Learning for NLP. 2020; pp.120–130. doi: 10.18653/v1/2020.repl4nlp-1.16

12. Mikolov, Sutskever, Chen: Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems. 2013; vol.26. doi: 10.48550/arXiv.1310.4546

13. Pennington, Socher, Manning: GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014; pp.1532–1543. doi: 10.3115/v1/D14-1162

14. Barhoumi, Estève, Aloulou: Document embeddings for Arabic Sentiment Analysis. The First Conference on Language Processing and Knowledge Management (LPKM 2017). Sfax, Tunisia: 2017. doi: 10.1109/LPKM.2017.8103994

15. Bojanowski, Grave, Joulin: Enriching Word Vectors with Subword Information. Transactions of the Association for Comput. Linguist. 2017; vol.5: pp.135–146. doi: 10.1162/tacl_a_00051

16. Almandouh, Alrahmawy, Eisa: Ensemble based high performance deep learning models for fake news detection. Sci. Rep. 2024; 14: 26591. doi: 10.1038/s41598-024-77761-1

17. Antoun, Baly, Hajj: AraBERT: Transformer-based Model for Arabic Language Understanding. Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection. 2020; pp.9–15. doi: 10.18653/v1/2020.osact-1.2

18. Ahmed, Alfasly, Wen: AlclaM: Arabic Dialect Language Model. Proceedings of the Second Arabic Natural Language Processing Conference. 2024; pp.153–159. doi: 10.18653/v1/2024.arabicnlp-1.14

19. Hengle, Kshirsagar, Desai: Combining Context-Free and Contextualized Representations for Arabic Sarcasm Detection and Sentiment Identification. Proceedings of the Sixth Arabic Natural Language Processing Workshop. 2021; pp.357–363. doi: 10.18653/v1/2021.wanlp-1.46

20. Alosaimi: ArabBert-LSTM: improving Arabic sentiment analysis based on transformer model and Long Short-Term Memory. Frontiers in Artificial Intelligence. 2024; 7. PMID: 39015364. doi: 10.3389/frai.2024.1408845. PMCID: PMC11250580

21. Jefry, Al-Doghman, Hussain: BERT-LA: Leveraging BERT and AraBERT With Bi-LSTM for Cross-Lingual Sentiment Analysis of English and Arabic Texts. 17th International Conference on Security of Information and Networks, SIN 2024, Sydney, Australia, December 2-4, 2024. 2024; pp.1–10. doi: 10.1109/SIN63213.2024.10871432

22. Khachfeh, El Kabani, Osman: An Enhanced Hybrid BERT-BiLSTM Learning Model for Arabic News Classification. 2025 International Conference on Machine Intelligence and Smart Innovation (ICMISI). 2025; pp.201–206. doi: 10.1109/ICMISI65108.2025.11115581

23. Grave, Bojanowski, Gupta: Learning Word Vectors for 157 Languages. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan: 2018. doi: 10.48550/arXiv.1802.06893

24. Nacar, Koubaa: Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning. Generative AI and Large Language Models: Opportunities, Challenges, and Applications: Volume 1. Koubaa, Ammar, Ghouti, editors. Cham: Springer Nature Switzerland; 2025; pp.179–216. doi: 10.1007/978-3-031-90573-5_6

25. Cer, Diab, Agirre: SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). 2017; pp.1–14. doi: 10.18653/v1/S17-2001

26. Elbeltagi: Comparing Arabic Sentence Transformers. GitHub; 2024. Retrieved September 5, 2025.

10.5256/f1000research.192766.r463535

Reviewer response for version 1

Serry Sibaee, Referee ¹

¹ Prince Sultan University, Riyadh, Saudi Arabia

Competing interests: No competing interests were disclosed.

2 4 2026

2026

This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

recommendation

reject

Summary

The paper proposes combining a fine-tuned Sentence-AraBERT (SAraBERT) model with pre-trained FastText embeddings for Arabic sentence representation. The two vectors (300-dim from FastText, 768-dim from AraBERT) are concatenated to produce a 1068-dimensional sentence embedding. Evaluation on the ATrD dataset shows the combined model achieves MSE of 0.0355 and Pearson correlation of 0.8053, outperforming each component in isolation.

Evaluation

Is the work clearly and accurately presented and does it cite the current literature? Partly

The paper is readable but contains a significant terminological error: calling this a "multimodal" approach is incorrect. Both FastText and AraBERT operate on text. This must be corrected to "hybrid" or "fusion-based" throughout. The literature review is adequate but omits relevant recent Arabic sentence embedding work such as AraSBERT, which is directly comparable.

Is the study design appropriate and is the work technically sound? Partly

The core idea is sound but the experimental design is narrow. Five models are tested on a single dataset. There is no ablation of the concatenation strategy itself: why not learned fusion, attention-weighted combination, or dimensionality-normalized pooling? The choice of simple concatenation is not experimentally justified beyond reporting that it works.

Are sufficient details provided to allow replication? No

The following are missing or underspecified: training hyperparameters (learning rate, batch size, number of epochs, warmup steps), tokenization details, how similarity scores were computed from the 1068-dim vector (cosine? dot product?), and the exact train/validation/test split applied to the evaluation STSB dataset. Without these, replication is not possible.

Is the statistical analysis appropriate? No

Results are from a single run with no confidence intervals, standard deviations, or significance tests. Given the sensitivity of fine-tuned transformer models to random seeds, this is insufficient. A minimum of three runs with reported variance is required.
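A minimal sketch of the kind of reporting requested here is given below. The seed values and Pearson scores are hypothetical placeholders, not results from the manuscript; the point is only that each metric should be reported as a mean with its spread over several seeded runs.

```python
import statistics

# Hypothetical Pearson scores from three runs with different random seeds.
# These are placeholders, not results from the manuscript.
pearson_by_seed = {13: 0.80, 42: 0.81, 77: 0.79}

scores = list(pearson_by_seed.values())
mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)  # sample standard deviation (n - 1)

print(f"Pearson: {mean_score:.4f} +/- {std_score:.4f} over {len(scores)} runs")
```

The same summary would apply to MSE. With only three runs the standard deviation is itself a rough estimate, so more seeds or a paired significance test would strengthen the comparison between models.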

Are source data available for reproducibility? Partly

Training datasets are publicly linked, which is good. However, the trained SAraBERT model weights and the exact fine-tuning code are not provided, despite the authors stating they will make them public. A working repository link must be included before acceptance.

Are the conclusions supported by the results? No

The paper claims the approach "advances NLP capabilities for underrepresented languages" and establishes new standards for Arabic sentence embedding. Both claims go beyond what a single-dataset evaluation supports. The improvement over SAraBERT alone (MSE drop of ~0.011) is modest, and its practical significance is not discussed.

Is the work clearly and accurately presented and does it cite the current literature?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Are all the source data underlying the results available to ensure full reproducibility?

Partly

Is the study design appropriate and is the work technically sound?

Partly

Are the conclusions drawn adequately supported by the results?

Are sufficient details of methods and analysis provided to allow replication by others?

Reviewer Expertise:

Arabic NLP

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

10.5256/f1000research.192766.r463539

Reviewer response for version 1

Elnagar

Ashraf

1 Referee 1 University of Sharjah, Sharjah, United Arab Emirates

Competing interests: No competing interests were disclosed.

18 3 2026

2026

recommendation

reject

The article presents a hybrid Arabic sentence embedding approach combining SAraBERT and FastText. While the paper is readable and the general goal is relevant, I do not find the current contribution sufficiently strong for indexing in its present form.

The main concern is that the work appears incremental and insufficiently novel. The proposed method is essentially a combination of two existing text-based embedding techniques, without a clearly new modeling contribution or sufficiently broad empirical validation to demonstrate a substantial advance. The manuscript does not convincingly establish why this should be considered more than a straightforward hybrid baseline.

A second issue is that the paper appears to use “multimodal” incorrectly. This is not a multimodal approach, since both SAraBERT and FastText operate on the same modality, namely text. The method is better described as a hybrid text embedding approach or a fusion of textual representations. Referring to it as multimodal is misleading and should be corrected.

The study design is also too limited to support the broader claims. The evaluation is narrow, the baseline set is not strong enough to establish clear superiority, and the conclusions extend beyond what the reported experiments justify. In addition, the manuscript lacks enough implementation detail for full replication, and the statistical analysis is not adequate: results are reported without repeated runs, uncertainty estimates, or significance testing.

The paper has a reasonable idea, but in its current form it is too limited in novelty, validation, and reproducibility. The conclusions are stronger than the evidence supports, and the framing of the method should be corrected.

Is the work clearly and accurately presented and does it cite the current literature?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

Are all the source data underlying the results available to ensure full reproducibility?

Partly

Is the study design appropriate and is the work technically sound?

Partly

Are the conclusions drawn adequately supported by the results?

Are sufficient details of methods and analysis provided to allow replication by others?

Reviewer Expertise:

NLP