Research Article

A Hybrid Anomaly Detection Framework Combining Supervised and Unsupervised Learning for Credit Card Fraud Detection

[version 1; peer review: 1 not approved]
PUBLISHED 07 Jul 2025

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Background

Credit card fraud detection remains a major challenge because of the highly imbalanced nature of transaction data. Conventional supervised models often suffer from low recall or high false positive rates, whereas unsupervised methods lack precision.

Methods

In this study, we propose a hybrid anomaly detection framework that combines an unsupervised autoencoder trained on normal transactions to capture reconstruction error patterns with a supervised XGBoost classifier trained on the same dataset. The hybrid system integrates both scores via an optimized thresholding mechanism to balance sensitivity and specificity. We evaluated the model on the publicly available Kaggle creditcard.csv dataset comprising 284,807 transactions, with only 492 labelled fraudulent.

Results

The proposed model achieved superior performance, with a recall of 0.9250, precision of 0.9569, F1-score of 0.9407, Matthews Correlation Coefficient (MCC) of 0.9407, and accuracy of 0.9998, surpassing the results of similar published models on the same dataset.

Conclusions

This framework provides a practical, reproducible, high-performance solution for detecting financial fraud. The code, model configuration, and data-processing pipeline were made available to support transparency and future research.

Keywords

Fraud Detection, Autoencoder, Isolation Forest, XGBoost, Random Forest, Hybrid Model, Anomaly Detection, Imbalanced Dataset.

Introduction

Credit card fraud remains one of the most persistent and damaging threats to the digital financial ecosystem. As the volume of online transactions continues to grow, so too does the complexity of fraudulent activities. Global losses are projected to exceed $40 billion annually by 2025, driven by the increasing digitalization of financial services and the constant evolution of fraud tactics.1 The core challenge in this domain lies in accurately detecting fraudulent transactions that are rare (less than 1% of all transactions), adaptive, and often indistinguishable from legitimate user behavior. This imbalance between legitimate and fraudulent transactions significantly impairs the performance of both conventional and machine learning-based detection systems, often leading to biased predictions and poor generalizability across datasets.2,3 Traditional fraud detection methods struggle to scale effectively in such dynamic and imbalanced environments, frequently resulting in missed fraud cases or excessive false positives. Detection systems frequently encounter difficulties in balancing sensitivity and specificity; enhancing fraud detection (true positives) often leads to an increase in false positives, thereby disrupting the customer experience and straining resources. Conversely, conservative models may fail to identify fraudulent activities, leading to financial losses and reputational harm.

Recent research has highlighted the potential of hybrid models that combine supervised classification techniques with unsupervised anomaly detection to enhance both the precision and robustness of fraud detection. For instance, studies integrating techniques, such as autoencoders, isolation-based methods, and gradient boosting classifiers, have demonstrated improved performance in identifying complex and evolving fraud patterns.4 However, many of these models still lack generalizability or require substantial computational resources, which limits their practical application in real-time financial environments.

The aim of this study is to develop and evaluate a hybrid anomaly detection framework that integrates both supervised and unsupervised learning techniques to improve the accuracy, robustness, and generalizability of credit card fraud-detection systems. This study specifically targets the challenges posed by imbalanced data, evolving fraud patterns, and limitations of single-model detection strategies.

Our approach is empirically validated using the publicly available European credit card fraud dataset, which presents realistic challenges including severe class imbalance. We conducted comprehensive experiments to measure the performance of the model across standard evaluation metrics and benchmarked its results against state-of-the-art techniques. Using this approach, this study aims to demonstrate the practical value and academic contribution of hybrid learning models in improving credit card fraud detection.

This study makes the following contributions:

  • 1. A novel hybrid anomaly detection framework that integrates supervised (XGBoost, Random Forest) and unsupervised (Autoencoder, Isolation Forest) models is proposed to address the challenges of data imbalance and concept drift in credit card fraud detection.

  • 2. Comparative analysis of the hybrid model against state-of-the-art models using the publicly available and widely adopted Kaggle creditcard.csv dataset.

  • 3. A reproducible pipeline suitable for adaptation in real-world applications that balances detection accuracy with computational efficiency.

Related work

Credit card fraud detection has become increasingly critical with the rapid expansion of online transactions and growing sophistication of fraudulent activities. Contemporary trends underscore the adoption of advanced machine learning (ML) techniques, which have shown considerable promise in enhancing both the accuracy and efficiency of fraud-detection systems. Nevertheless, these advancements have introduced several challenges, particularly the limitations of traditional anomaly detection methods and the constraints inherent in current ML-based models.

Traditional approaches to anomaly detection, including rule-based systems and statistical models, have long served as the foundation for fraud detection. However, these techniques frequently struggle to address the dynamic and adaptive nature of fraudulent behavior, which often mimics legitimate transaction patterns. Consequently, they tend to exhibit high false positive rates.5,6 Moreover, such approaches generally fail to scale effectively with the vast and continuously growing volume of transaction data, rendering them less viable in real-time fraud detection scenarios.7,8 As a result, there has been an increasing shift toward machine learning models that are better equipped to manage large datasets and adapt to evolving fraud strategies.9,10

Despite their advantages, the existing ML models are not without limitations. A primary concern is the class imbalance inherent in credit card transaction datasets, where legitimate transactions overwhelmingly outnumber fraudulent transactions. This imbalance often leads to skewed model performance, resulting in a high rate of false negatives in which fraudulent transactions remain undetected.2,3 Additionally, many ML models demand extensive feature engineering and frequently struggle to generalize across datasets because of variations in consumer behavior and transaction patterns.11,12 The scarcity of accurately labelled fraudulent transactions further complicates the training process, as acquiring such labels is challenging in real-world settings.13

Hybrid approaches have emerged as promising solutions for mitigating these issues. By combining different methodologies, researchers have been able to enhance the detection accuracy and reduce false positives.14,15 For example, hybrid models that integrate convolutional neural networks with support vector machines have demonstrated improved performance in identifying anomalies in financial datasets.15 These methods exploit the strengths of diverse algorithms and contribute more robust and generalizable detection capabilities. Moreover, similar hybrid strategies have shown effectiveness in other domains facing anomaly detection challenges, including healthcare and cybersecurity.14

In the context of fraud detection research, several benchmark datasets are frequently used, notably the European Credit Card Transactions dataset and the Kaggle Credit Card Fraud Detection dataset. These datasets are distinguished by their high dimensionality and extreme class imbalance, with fraudulent instances often comprising less than 1% of the total records.2,3 In particular, the European dataset includes anonymized transaction features derived from Principal Component Analysis (PCA) to ensure user privacy, making it suitable for academic use.12,16 Such datasets are instrumental in training and evaluating fraud-detection models because they closely reflect the complexities encountered in real-world applications.

In summary, although traditional anomaly detection techniques have laid the foundational framework for credit card fraud detection, the adoption of machine learning and hybrid methodologies opens new possibilities for improving the detection efficacy. Nonetheless, persistent challenges necessitate ongoing research in this field. The advancement of more sophisticated hybrid models and the utilization of comprehensive real-world datasets will be essential to overcome these hurdles and further progress in this critical area.

In the domain of credit card fraud detection, unsupervised learning methods have garnered increasing attention owing to their capacity to identify anomalies without relying on labelled data. Among these, clustering algorithms such as DBSCAN and HDBSCAN have demonstrated considerable potential. For instance, one study1 reported that combining HDBSCAN with UMAP and SMOTE enables the identification of previously unseen fraud patterns while significantly reducing false positives. Similarly, deep-learning-based anomaly detection frameworks, such as the attentional anomaly detection network proposed in16, show promise for capturing behavioral transaction anomalies without the need for predefined class labels. These approaches are particularly advantageous in real-world contexts where labelled fraudulent data are limited, allowing the detection of novel fraud patterns that traditional supervised models may overlook.17

Conversely, supervised learning techniques, particularly gradient boosting methods such as XGBoost, have been widely adopted owing to their robustness and interpretability. One study2 highlighted the effectiveness of XGBoost when paired with data augmentation strategies, such as SMOTE-ENN, achieving high accuracy with low false-positive rates. Further evidence18 demonstrated that integrating XGBoost with resampling methods enhanced the overall performance across a range of machine learning models. Notably, the inherent capability of XGBoost to handle imbalanced datasets makes it particularly well-suited for credit card fraud detection, where fraudulent transactions comprise only a small fraction of the total dataset.10

Hybrid approaches integrating supervised and unsupervised learning have emerged as promising strategies. One study,14 for example, presented a deep learning model combined with SMOTE oversampling, which effectively addressed the class imbalance issue while improving detection accuracy. Similarly, another study19 illustrated the benefits of combining neural networks with traditional machine learning techniques to enhance the overall detection efficacy. These hybrid models exploit the complementary strengths of each learning paradigm, resulting in more adaptive and accurate systems.

Despite these advancements, several persistent challenges continue to hinder optimal fraud detection performance. A primary issue is class imbalance, wherein the overwhelming dominance of legitimate transactions can bias models and reduce their sensitivity to fraudulent instances.11 Additionally, the constantly evolving tactics of fraudsters necessitate frequent model retraining and updates, which can be both computationally and operationally demanding.11 Scalability is also a concern, as many models exhibit performance degradation when deployed in large-scale or real-time transaction streams.20

The performance metrics across existing models vary significantly in terms of scalability, accuracy, and operational efficiency. Research indicates that ensemble techniques that combine multiple classifiers tend to outperform individual models in terms of their robustness and accuracy.21 However, the increased computational requirements of ensemble models may limit their applicability in time-sensitive scenarios.20 In contrast, XGBoost has often been identified as a suitable compromise, offering a favorable balance between predictive performance and computational efficiency, which makes it attractive for real-world fraud detection systems.2,22

Research into hybrid anomaly detection models typically seeks to fulfil several key objectives, including enhancing detection accuracy, improving robustness against emerging fraud patterns, and integrating both supervised and unsupervised learning techniques to capitalize on the strengths of each approach. Hybrid models are particularly advantageous in scenarios where labelled data are limited because they enable the use of unsupervised methods to identify anomalies, whereas supervised models refine and validate these detections.23–25 For example, integrating supervised models that learn from historical transaction data with unsupervised models capable of detecting novel anomalies facilitates a more comprehensive detection framework, addressing the limitations of methods that rely solely on a single learning paradigm.23,24

The literature highlights notable gaps in existing anomaly detection frameworks, particularly their limited adaptability to evolving fraud patterns and poor generalizability across diverse datasets. Hybrid models offer a promising solution to these issues by leveraging various data sources and learning strategies, thereby increasing their effectiveness in real-world deployment.26,27 For instance, studies incorporating Generative Adversarial Networks (GANs) into traditional machine learning workflows have demonstrated improved detection of complex fraud patterns that may elude conventional models.4 Moreover, the flexibility of hybrid models supports continuous learning and adaptation, which are essential features of the constantly evolving fraud landscape.23,24

Success in fraud detection research is typically measured using performance metrics such as accuracy, precision, recall, and F1-score, which collectively evaluate a model’s capability to correctly identify fraudulent transactions while maintaining operational efficiency.28,29 Minimizing false positives and effectively identifying previously unseen fraud cases are also critical indicators of success.23,24 Models that strike a balance between high accuracy and low false positive rates are particularly valued, as they reduce the burden of manual transaction reviews and minimize disruption to legitimate users.23,24,29

Both supervised and unsupervised learning play an integral role in addressing the research challenges in fraud detection. Supervised learning is particularly effective when sufficient labelled data are available, enabling the model to learn the distinctions between fraudulent and non-fraudulent transactions.30,31 By contrast, unsupervised learning excels in scenarios where labels are unavailable, identifying novel or emerging fraud patterns without prior examples.23–25 The integration of both techniques enhances not only the model’s detection capacity but also the interpretability and adaptability of the fraud detection framework, as evidenced by research that underscores their complementary nature.24,25

In the literature, “success” in fraud detection is frequently defined in terms of balancing detection performance with operational efficiency. This includes the ability to accurately detect fraudulent transactions with minimal false positives, thereby ensuring that genuine users are not adversely affected.23–25 Furthermore, a model’s adaptability to new fraud typologies and its performance across various datasets are equally important for assessing its practical applicability and overall robustness.23–25

Unsupervised methods

Autoencoders

Autoencoders have emerged as powerful tools for feature extraction in anomaly detection, particularly fraud detection. By leveraging their ability to learn compressed representations of data, autoencoders can effectively identify anomalies by reconstructing the input data and measuring the reconstruction error. This process allows for the extraction of relevant features that distinguish normal data from anomalies, as the model learns to ignore noise and irrelevant information during training.32–34 The architecture of autoencoders, which typically consists of an encoder and decoder, facilitates dimensionality reduction, making them suitable for high-dimensional datasets often encountered in fraud detection scenarios.35,36

Despite their advantages, autoencoders have limitations when applied to unsupervised-learning tasks. A significant challenge is determining an appropriate reconstruction error threshold, which is crucial for distinguishing between normal and anomalous instances. This threshold can be influenced by the distribution of reconstruction errors, and improper selection may lead to high false positive rates or missed detections.33,37,38 Moreover, autoencoders can struggle with class imbalances because they are typically trained on predominantly normal data, making it difficult to generalize to rare fraudulent instances.37,39 Additionally, the complexity of the model can lead to overfitting, particularly when the training dataset is small or lacks diversity.40,41
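
To make the threshold discussion concrete, the following minimal sketch (not the authors' released code) trains a small autoencoder on normal transactions only and sets the anomaly cutoff from a percentile of the reconstruction errors; the layer sizes, epoch count, and the 99th-percentile choice are illustrative assumptions.

```python
# Hypothetical sketch: autoencoder trained on normal transactions, with the
# anomaly threshold taken from the distribution of reconstruction errors.
import numpy as np
from tensorflow import keras

def build_autoencoder(n_features: int) -> keras.Model:
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),    # encoder
        keras.layers.Dense(8, activation="relu"),     # bottleneck
        keras.layers.Dense(16, activation="relu"),    # decoder
        keras.layers.Dense(n_features, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def fit_and_threshold(X_normal: np.ndarray, percentile: float = 99.0):
    ae = build_autoencoder(X_normal.shape[1])
    ae.fit(X_normal, X_normal, epochs=20, batch_size=256, shuffle=True, verbose=0)
    recon = ae.predict(X_normal, verbose=0)
    errors = np.mean((X_normal - recon) ** 2, axis=1)  # per-sample reconstruction error
    threshold = np.percentile(errors, percentile)      # largest errors flagged as anomalies
    return ae, threshold
```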

When comparing autoencoders to other unsupervised methods in fraud detection, such as clustering and traditional statistical methods, autoencoders often demonstrate superior performance because of their ability to learn complex, non-linear relationships in the data.35,39,42 For example, while clustering methods may struggle with high-dimensional data, autoencoders can effectively reduce dimensionality and capture intricate patterns that signify fraudulent behavior.35,39,42 Furthermore, ensemble methods that combine autoencoders with other algorithms, such as Random Forests or Gradient Boosting, have shown promising results in improving detection accuracy and robustness against class imbalance.40,41

In summary, autoencoders are effective for feature extraction in anomaly detection, particularly fraud detection. Variants such as variational autoencoders (VAEs) and LSTM autoencoders are suitable for various data types. However, issues such as threshold determination and class imbalance require further investigation. In this study, we combined autoencoders with other models to enhance the results and address these challenges.

Isolation forest

The Isolation Forest algorithm is a powerful tool for anomaly detection, particularly in financial datasets. It operates based on the principle of isolating anomalies, instead of profiling normal data points. This is achieved by constructing a random forest of isolation trees, where each tree is built by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of that feature. Anomalies are identified as instances that require fewer splits to be isolated because they are often located far from the majority of the data points in the feature space.43,44 This characteristic makes isolation forests particularly effective in high-dimensional datasets, where traditional methods may struggle owing to the curse of dimensionality. Studies have shown that isolation forests maintain robust performance in high-dimensional settings, effectively identifying outliers, even when dimensionality increases significantly.43,45

Parameter tuning is crucial for optimizing the performance of an isolation-forest algorithm. Common techniques include adjusting the number of trees in the forest and subsampling size, which can influence the sensitivity of the model to anomalies. For instance, increasing the number of trees generally improves the robustness of the model, while the subsampling size can be tuned to balance between computational efficiency and detection accuracy.45,46 In terms of computational advantages, the Isolation Forest algorithm is highly efficient and requires linear time complexity relative to the number of data points, making it scalable for large datasets.44,47
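
As an illustration of the tuning knobs mentioned above, the sketch below builds a scikit-learn Isolation Forest; the n_estimators, max_samples, and contamination values are examples, not the settings used in this study.

```python
# Illustrative Isolation Forest configuration (values are assumptions).
import numpy as np
from sklearn.ensemble import IsolationForest

def isolation_forest_scores(X: np.ndarray) -> np.ndarray:
    """Return anomaly scores where larger values mean 'more anomalous'."""
    iso = IsolationForest(
        n_estimators=200,     # more trees stabilize the anomaly scores
        max_samples=256,      # subsample size trades accuracy for speed
        contamination=0.002,  # rough prior on the anomaly rate (~0.17% fraud in creditcard.csv)
        random_state=42,
    )
    iso.fit(X)
    # sklearn's score_samples returns higher values for normal points, so negate.
    return -iso.score_samples(X)
```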

Isolation Forests can also be integrated into hybrid models to enhance anomaly detection capabilities. For example, they can be combined with supervised learning techniques to refine the detection process by leveraging labelled data for training. This integration allows for improved feature selection and anomaly characterization, leading to better overall performance in detecting complex patterns in financial datasets.48,49 Such hybrid approaches can utilize the strengths of multiple algorithms, thereby improving the robustness and accuracy of anomaly detection frameworks in various applications, including fraud detection in banking and finance.49

In summary, the Isolation Forest algorithm is a robust method for detecting anomalies in financial datasets, and is particularly effective in high-dimensional spaces. Parameter tuning plays a critical role in optimizing its performance, whereas its computational efficiency makes it suitable for large datasets. Despite some limitations, integrating Isolation Forests with other methods in hybrid models can significantly enhance anomaly detection capabilities.

Supervised methods

XGBoost

Extreme gradient boosting (XGBoost) has emerged as a powerful tool for fraud detection, particularly in the context of imbalanced datasets. The algorithm’s inherent ability to handle imbalanced data stems from its gradient boosting framework, which optimizes the model by focusing on misclassified instances, thereby enhancing its sensitivity to minority classes, such as fraudulent transactions. This characteristic is crucial in fraud detection, where fraudulent cases are vastly outnumbered by legitimate ones.50,51 Furthermore, XGBoost incorporates regularization techniques that help mitigate overfitting, which is a common challenge in machine learning models trained on imbalanced datasets.50,51

Hyperparameter tuning is essential for optimizing the performance of XGBoost in fraud detection tasks. Techniques such as grid search, random search, and more advanced methods such as Bayesian optimization have been employed to identify the most effective hyperparameters. For instance, the use of Bayesian optimization has been shown to enhance the model’s ability to balance training weights for asymmetric examples, which is particularly beneficial in fraud-detection scenarios.52,53
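
A hedged sketch of how such tuning might look in practice is shown below, using scikit-learn's randomized search around an XGBoost classifier (assuming a recent xgboost release); the search space and the scale_pos_weight heuristic are assumptions for illustration rather than the paper's exact configuration.

```python
# Illustrative hyperparameter search for XGBoost on an imbalanced fraud task.
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV

def tune_xgb(X_train, y_train, n_iter: int = 20):
    # Common heuristic: weight the positive (fraud) class by the imbalance ratio.
    pos_weight = (y_train == 0).sum() / max((y_train == 1).sum(), 1)
    base = XGBClassifier(
        eval_metric="aucpr",
        scale_pos_weight=pos_weight,
        tree_method="hist",
        random_state=42,
    )
    space = {
        "n_estimators": [200, 400, 800],
        "max_depth": [4, 6, 8],
        "learning_rate": [0.03, 0.1, 0.3],
        "subsample": [0.7, 0.9, 1.0],
        "colsample_bytree": [0.7, 0.9, 1.0],
    }
    search = RandomizedSearchCV(base, space, n_iter=n_iter, scoring="f1",
                                cv=3, random_state=42, n_jobs=-1)
    search.fit(X_train, y_train)
    return search.best_estimator_
```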

When comparing XGBoost with other supervised learning methods, it consistently demonstrates superior performance in fraud-detection tasks. Studies have shown that XGBoost outperforms traditional models such as logistic regression and decision trees as well as other ensemble methods such as random forests. This superiority is attributed to its ability to capture complex non-linear relationships and interactions between features, which are often present in fraud detection datasets.54,55 Moreover, XGBoost’s feature importance capabilities allow practitioners to gain insights into the most influential predictors of fraud, further enhancing model interpretability and decision-making processes.19,56

Researchers have also explored the integration of XGBoost with hybrid anomaly-detection models. For instance, combining XGBoost with unsupervised learning techniques allows for the extraction of patterns from data that can be used as new features, thereby improving the robustness of the model against noise and outliers.57

In conclusion, XGBoost’s optimization for fraud detection in imbalanced datasets is facilitated by its robust handling of misclassifications, effective hyperparameter tuning techniques, and superior performance compared to other supervised learning methods. The role of feature importance is critical in refining model performance, while hybrid approaches continue to expand the capabilities of XGBoost in anomaly detection scenarios.

Random forest

Random Forest (RF) is a versatile ensemble technique that has been broadly applied in anomaly detection for both supervised and semi-supervised learning tasks. In fully supervised settings, RF algorithms are trained with labelled examples covering both normal and anomalous classes, thereby enabling the model to learn complex non-linear decision boundaries that can reliably separate rare and abnormal events.58 In contrast, semi-supervised applications typically exploit RF’s ability to capture underlying data distributions by training exclusively on normal (or “positive”) samples and subsequently flagging deviations as anomalies.59

The performance of RF is particularly noteworthy in high-dimensional and large-scale datasets such as those encountered in credit card fraud detection. RF can naturally handle large numbers of features owing to its random feature subspace selection at each split, which mitigates overfitting and improves generalization.60 Empirical studies have demonstrated that RF-based methods perform competitively in scenarios characterized by rare events, such as fraud detection, by effectively identifying subtle patterns that differentiate fraudulent from legitimate behaviors.60 Nevertheless, the class imbalance inherent in such applications often calls for hybrid or improved approaches, for example, through combination with feature selection procedures or integration with unsupervised algorithms, to further boost detection accuracy.
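
For reference, a minimal Random Forest configuration for this rare-event setting might look as follows; the hyperparameters are illustrative, not those tuned in this study.

```python
# Sketch of a Random Forest set up for a rare-event (fraud) classification task.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=300,
    max_features="sqrt",       # random feature subspace at each split
    class_weight="balanced",   # up-weight the rare fraud class
    n_jobs=-1,
    random_state=42,
)
# rf.fit(X_train, y_train); rf.predict_proba(X_test)[:, 1] then gives a fraud score.
```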

Hybrid integration

Hybrid models, which combine unsupervised and supervised learning techniques, have gained traction in various fields owing to their ability to leverage the strengths of both approaches. The integration of unsupervised outputs with supervised methods can enhance the predictive performance, particularly in scenarios where labelled data are scarce. This synthesis typically involves several strategies, including feature extraction, ensemble methods, and model stacking, which can significantly improve the overall performance of the hybrid models.

One effective integration strategy is the use of unsupervised learning for feature extraction, which can reduce dimensionality and capture underlying patterns in the data. For instance, autoencoders or clustering algorithms can preprocess data before they are fed into a supervised learning model, thereby enhancing their predictive capabilities.61,62 In addition, ensemble methods that combine predictions from unsupervised and supervised models can lead to more robust outcomes. For example, a hybrid model that integrates predictions from a clustering algorithm with those from a regression model can yield a better accuracy than either model alone.63

Handling conflicting outputs from unsupervised and supervised models is a critical challenge in hybrid modelling. Researchers often employ conflict resolution strategies such as voting mechanisms, where the final decision is based on the majority output, or weighted averaging, where outputs are combined based on their reliability or performance metrics.64,65 This approach allows for more nuanced integration of the models, ensuring that the final output reflects the strengths of both methodologies. In this study, we utilized a weighting method to combine the outputs of supervised and unsupervised algorithms.
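
The weighted-averaging idea can be expressed compactly as below; the per-model weights shown are placeholders rather than the values used in this study.

```python
# Toy illustration of weighted averaging of heterogeneous model outputs.
import numpy as np

def weighted_fusion(scores, weights):
    """scores: dict of name -> 1-D array; weights: dict of name -> float."""
    total = sum(weights.values())
    fused = np.zeros_like(next(iter(scores.values())), dtype=float)
    for name, s in scores.items():
        s = np.asarray(s, dtype=float)
        s_norm = (s - s.min()) / (s.max() - s.min() + 1e-12)  # rescale each model to [0, 1]
        fused += (weights[name] / total) * s_norm
    return fused  # higher fused score = more suspicious

# Example: trust supervised probabilities more than unsupervised scores.
# fused = weighted_fusion({"xgb": p_xgb, "ae": ae_errors}, {"xgb": 0.7, "ae": 0.3})
```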

In summary, hybrid models that integrate unsupervised and supervised methods offer significant advantages in terms of predictive performance and robustness. By employing effective integration strategies, resolving conflicts between outputs, and utilizing appropriate benchmarks for evaluation, researchers can harness the strengths of both methodologies to address complex challenges across various domains.

Evaluation metrics

In fraud detection studies, various evaluation metrics are employed to assess model performance. Commonly used metrics include accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). Each metric provides unique insights into the effectiveness of a model in identifying fraudulent activity.

Precision, recall, and F1-score are particularly significant in the context of anomaly detection. Precision measures the proportion of true positive predictions among all positive predictions, indicating how many of the flagged instances were actually fraudulent. Recall, in contrast, assesses the proportion of true positives among all actual positives, reflecting the model’s ability to identify all relevant instances. The F1-score is the harmonic mean of precision and recall, providing a single metric that balances both concerns. In fraud detection, where false positives can lead to unnecessary investigations and false negatives can result in undetected fraud, these metrics are crucial for evaluating model performance.66,67

Precision = TP / (TP + FP)

Precision means: Of all predicted positive cases, how many were actually positive.

Recall = TP / (TP + FN)

Recall means: Of all actual positive cases, how many were correctly predicted.

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)

F1-Score means: the harmonic mean of Precision and Recall, a balance between the two.

Where:

TP = True Positives

TN = True Negatives

FP = False Positives

FN = False Negatives
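
A short worked example of these formulas, using assumed confusion-matrix counts, is given below.

```python
# Worked example of the metric formulas above (counts are hypothetical).
TP, FP, FN, TN = 37, 2, 3, 9958

precision = TP / (TP + FP)                                  # 37 / 39 ≈ 0.949
recall    = TP / (TP + FN)                                  # 37 / 40 = 0.925
f1        = 2 * precision * recall / (precision + recall)   # ≈ 0.937
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```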

The trade-off between accuracy and computational efficiency is critical for fraud detection. While accuracy provides a straightforward measure of overall correctness, it can be misleading in imbalanced datasets, which are common in fraud-detection scenarios where fraudulent cases are rare compared with legitimate ones. Computational efficiency, on the other hand, refers to the time and resources required to train and deploy models. Models that achieve high accuracy may require extensive computational resources, making them less practical for real-time fraud detection applications. Therefore, a balance must be struck between achieving high accuracy and maintaining computational efficiency to ensure that models can operate effectively in real-world environments.66,67

AUC-ROC curves are instrumental in assessing model performance, particularly in binary classification tasks such as fraud detection. The ROC (Receiver Operating Characteristic) curve plots the true positive rate against the false positive rate at various threshold settings, allowing for visualization of the trade-off between sensitivity and specificity. The AUC (Area Under the Curve) quantifies the overall ability of the model to discriminate between the positive and negative classes, with values closer to 1 indicating better performance. AUC-ROC is particularly useful in fraud detection because it provides a comprehensive view of the model’s performance across different decision thresholds, aiding in the selection of an optimal threshold for deployment.68–70
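
The sketch below shows how the ROC curve and AUC can be computed with scikit-learn; the labels and scores are synthetic stand-ins for a trained model's output.

```python
# Computing an ROC curve and AUC on synthetic, imbalanced example data.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.RandomState(0)
y_true = rng.binomial(1, 0.02, size=5000)  # ~2% positives, stand-in for fraud labels
y_score = (y_true * rng.uniform(0.5, 1.0, 5000)
           + (1 - y_true) * rng.uniform(0.0, 0.6, 5000))  # stand-in model scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.4f}")
# A deployment threshold can then be chosen from (fpr, tpr, thresholds),
# e.g. by maximizing tpr - fpr (Youden's J statistic).
```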

Several datasets and competitions serve as benchmarks for comparing model results in fraud detection. For instance, Kaggle competitions often provide standardized datasets for benchmarking machine-learning models. Additionally, the UCI Machine Learning Repository includes various datasets relevant to fraud detection, allowing researchers to compare their models with established baselines. These benchmarks facilitate the evaluation of new methods against existing approaches and promote advancements in the field.66,67

In summary, the evaluation metrics commonly used in fraud-detection studies include precision, recall, F1-score, and AUC-ROC. Each metric offers valuable insights into model performance, particularly in the context of imbalanced datasets. The trade-off between accuracy and computational efficiency highlights the need for practical solutions for real-time applications. AUC-ROC curves serve as vital tools for assessing model discrimination capabilities, whereas established benchmarks provide a framework for comparative analysis in the field. In this study, we used precision, recall, and F1-score to evaluate the performance of the hybrid model, and additionally calculated AUC-ROC and MCC (Matthews Correlation Coefficient) values to gain further insight into the model’s results.

Methods

Dataset description

The creditcard.csv dataset, which is widely utilized in fraud-detection research, is characterized by its focus on credit card transactions, specifically anonymized records from European cardholders. This dataset typically contains features such as transaction time, transaction amount, and various anonymized features derived from PCA (Principal Component Analysis) to protect user privacy. A notable aspect of this dataset is its significant class imbalance, where fraudulent transactions are vastly outnumbered by legitimate transactions, presenting a challenge for machine learning models.30,71,72 The dataset consists of 284,807 transactions, with only 492 labelled as fraudulent, highlighting the difficulty of detecting fraud owing to the rarity of positive instances.9,73

The quality of datasets significantly affects the performance of hybrid models in fraud detection. High-quality datasets enable more accurate feature extraction and model training, leading to improved detection rates and reduced false positives.20,74 Conversely, poor-quality datasets can result in overfitting, where models perform well on training data but fail to generalize to unseen data, ultimately undermining their effectiveness in real-world applications.9,75 Therefore, ensuring high-quality data is essential for developing reliable and efficient fraud-detection systems.

This study employs the publicly available creditcard.csv dataset from Kaggle, which contains anonymized credit card transaction data from European cardholders. The dataset consists of 284,807 transactions, of which 492 are labelled as fraudulent, representing approximately 0.17% of the total data. The features include 28 principal components (V1–V28) derived through Principal Component Analysis (PCA) to preserve privacy, along with the Amount, Time, and Class attributes. The Class variable serves as the binary target label, where 1 indicates fraud, and 0 represents a legitimate transaction.
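
For orientation, the class imbalance described above can be verified with a few lines of pandas (a sketch that assumes the Kaggle creditcard.csv file is available in the working directory):

```python
# Quick check of the class distribution in the Kaggle credit card fraud dataset.
import pandas as pd

df = pd.read_csv("creditcard.csv")
counts = df["Class"].value_counts()
print(counts)                                    # expected: ~284,315 zeros and 492 ones
print(f"fraud rate: {counts[1] / len(df):.4%}")  # roughly 0.17%
```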

Data preprocessing and class imbalance

The preprocessing challenges in real-world financial datasets are prevalent and multifaceted. Common issues include handling missing values, addressing class imbalances, and ensuring data privacy and security.2,6,76

As a preprocessing step, the MinMaxScaler technique was used. MinMaxScaler is a widely used data preprocessing technique that transforms numerical features by rescaling them to a specified range, typically between 0 and 1. This scaling method preserves the relationships between the original data values while ensuring that all features contribute proportionately to model training. It is particularly effective for distance-based algorithms and neural networks, which are sensitive to differences in feature magnitude. This helps standardize features such as transaction amounts or time-related attributes, enabling models such as autoencoders to converge more quickly and effectively.

Additionally, researchers have employed Principal Component Analysis (PCA) as a preprocessing tool for dimensionality reduction. PCA is a widely used technique for dimensionality reduction during anomaly detection. By transforming high-dimensional data into a lower-dimensional space, PCA helps identify patterns and anomalies more efficiently. This is achieved by projecting the data onto the directions of maximum variance, effectively filtering out noise and irrelevant features, which can obscure the detection of anomalies.77,78

Furthermore, class imbalance, where legitimate transactions far outnumber fraudulent transactions, complicates the training of machine-learning models, often leading to biased predictions that favor the majority class.72,79 To address class imbalance in the creditcard.csv dataset, we employed the BorderlineSMOTE method. However, this method was applied exclusively during the training of the supervised models, because it adversely affects the unsupervised algorithms.
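
The sketch below illustrates this preprocessing sequence: MinMax scaling, optional PCA, and BorderlineSMOTE applied only to the supervised training split. The split ratio and variance threshold are assumptions, not the study's exact settings.

```python
# Hypothetical preprocessing pipeline: scale, reduce dimensionality, then
# oversample only the supervised training data.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import BorderlineSMOTE

df = pd.read_csv("creditcard.csv")
X, y = df.drop(columns=["Class"]), df["Class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = MinMaxScaler().fit(X_train)              # fit on training data only
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

pca = PCA(n_components=0.95).fit(X_train_s)       # keep 95% of the variance
X_train_p, X_test_p = pca.transform(X_train_s), pca.transform(X_test_s)

# Oversample only the supervised training set; unsupervised models use the original data.
X_train_bal, y_train_bal = BorderlineSMOTE(random_state=42).fit_resample(X_train_p, y_train)
```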

Results and Discussion

In this study, XGBoost and Random Forest were employed as supervised learning algorithms, whereas the Autoencoder and Isolation Forest were utilized as unsupervised methods to detect anomalies. The data preprocessing pipeline includes MinMax normalization to standardize feature scales and the removal of statistical outliers to reduce noise and improve model stability. To address the high dimensionality of the dataset, Principal Component Analysis (PCA) was applied as a dimensionality reduction technique, preserving the most significant variance components.

In addition, BorderlineSMOTE was incorporated into the training process of the supervised models to address class imbalance and improve minority class learning. This technique was particularly beneficial in enhancing the sensitivity of classifiers to fraudulent transactions while also reducing the risk of overfitting to rare fraud instances. Moreover, BorderlineSMOTE contributes to increased robustness against boundary-region vulnerabilities and potential data-poisoning attacks, thereby strengthening the overall generalization capability of supervised components.

As an initial step, we analyzed the performance of each method. Table 1 presents the precision, recall, F1-score, and accuracy of each method for both normal cases (0) and fraud cases (1).

Table 1. Performance results for XGBoost, RandomForest, Autoencoder, and IsolationForest.

Method           Precision(0)  Precision(1)  Recall(0)  Recall(1)  F1-score(0)  F1-score(1)  Accuracy
XGBoost          0.9999        0.9407        0.9999     0.9250     0.9999       0.9328       0.9998
RandomForest     0.9998        0.9459        0.9999     0.8750     0.9999       0.9091       0.9997
Autoencoder      0.9993        0.5847        0.9994     0.5750     0.9993       0.5798       0.9987
IsolationForest  0.9999        0.0192        0.9230     0.9500     0.9599       0.0376       0.9230

Among the evaluated methods, XGBoost exhibited the best overall performance. It achieved near-perfect results for the majority class (Class 0), with Precision(0) = 0.9999 and Recall(0) = 0.9999, and maintained a high level of performance on the minority class (Class 1, i.e., fraud cases), with Precision(1) = 0.9407, Recall(1) = 0.9250, and F1-score(1) = 0.9328. This balance between precision and recall is crucial for fraud detection, indicating that XGBoost not only detects most fraudulent transactions but also minimizes false alarms. The overall accuracy of 0.9998 further confirms its robustness, although in imbalanced datasets accuracy alone is not a sufficient indicator. In conclusion, XGBoost is a top-performing supervised method that effectively manages both false positives and false negatives.

Moreover, Random Forest also demonstrates strong performance on the majority class, similar to XGBoost, with Precision(0) = 0.9998 and Recall(0) = 0.9999. However, it performs slightly lower on the minority class, with Recall(1) = 0.8750 and F1-score(1) = 0.9091. This suggests that while Random Forest is highly effective, it may miss a small number of fraud cases compared with XGBoost. Nevertheless, its accuracy of 0.9997 confirms its high reliability. In conclusion, Random Forest is an effective and reliable ensemble method, but slightly less optimal than XGBoost for fraud detection.

In contrast, the Autoencoder, an unsupervised learning method trained on normal data (Class 0), performs exceptionally well on the majority class, with Precision(0) = 0.9993 and Recall(0) = 0.9994. However, its fraud detection performance was significantly lower, with Precision(1) = 0.5847, Recall(1) = 0.5750, and F1-score(1) = 0.5798. Although it still detects some anomalies, the model generates a large number of false positives and fails to detect many frauds. In conclusion, the Autoencoder is moderately effective as a baseline anomaly detector but, used in isolation, lacks precision and recall for minority class identification.

The Isolation Forest produces poor results for fraud detection, with Precision(1) = 0.0192 and F1-score(1) = 0.0376, despite a relatively high Recall(1) = 0.9500. This suggests that while it flags nearly all frauds (high recall), it generates an extremely high number of false positives (very low precision), making it impractical for real-world fraud detection, where every alert carries a cost. The overall accuracy of 0.9230 is misleadingly high, inflated by the overwhelming presence of normal transactions. In conclusion, the Isolation Forest method on its own is overly sensitive and lacks practical usefulness for fraud detection in imbalanced datasets.

In high-stakes domains, such as credit card fraud detection, the cost of false positives (customer complaints) and false negatives (missed fraud) must be minimized. Among the models tested, XGBoost provided the best trade-off between fraud detection and noise minimization. Hybrid approaches that combine the sensitivity of unsupervised methods (such as autoencoders) with the precision of supervised learners (such as XGBoost or RF) may offer better results when properly tuned.

Hence, in this study, we tested a hybrid model that combines these four methods (XGBoost, RandomForest, Autoencoder, and IsolationForest) and applied a weighting mechanism that assigns different importance levels (weights) to the outputs of the individual models (e.g., Autoencoder, XGBoost, Isolation Forest) when combining their anomaly scores into a single decision score.
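
Conceptually, the weighted combination step can be sketched as follows; the weights and decision threshold shown are placeholders, not the tuned values used by XRAI, and each model is assumed to expose a score where higher means "more likely fraud".

```python
# Conceptual sketch of fusing supervised probabilities with unsupervised scores.
import numpy as np

def xrai_score(p_xgb, p_rf, ae_error, iso_score,
               w=(0.4, 0.3, 0.2, 0.1), threshold=0.5):
    """Combine four per-transaction scores into one decision score (illustrative)."""
    def minmax(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min() + 1e-12)

    # Supervised models already output probabilities; unsupervised scores are rescaled.
    parts = [np.asarray(p_xgb), np.asarray(p_rf), minmax(ae_error), minmax(iso_score)]
    fused = sum(wi * si for wi, si in zip(w, parts)) / sum(w)
    return fused, (fused >= threshold).astype(int)  # combined score and binary decision
```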

Table 2 presents the final performance results after combining the methods and applying the weights. We named the model XRAI, after the first letter of each constituent method (XGBoost, RandomForest, Autoencoder, and IsolationForest).

Table 2. Performance comparison between XRAI and other models.

Method           Precision(0)  Precision(1)  Recall(0)  Recall(1)  F1-score(0)  F1-score(1)  Accuracy
XRAI             0.9999        0.9569        0.9999     0.9250     0.9999       0.9407       0.9998
XGBoost          0.9999        0.9407        0.9999     0.9250     0.9999       0.9328       0.9998
RandomForest     0.9998        0.9459        0.9999     0.8750     0.9999       0.9091       0.9997
Autoencoder      0.9993        0.5847        0.9994     0.5750     0.9993       0.5798       0.9987
IsolationForest  0.9999        0.0192        0.9230     0.9500     0.9599       0.0376       0.9230

The hybrid XRAI model, which integrates the strengths of XGBoost, Random Forest, Autoencoder, and Isolation Forest using a weighted score, demonstrates outstanding anomaly detection capability. It effectively combines supervised and unsupervised methods to balance precision, recall, and generalization, which are crucial in high-stake fraud detection scenarios.

Performance on the majority class (Normal - Class 0)

  • Precision (0) = 0.9999 and recall (0) = 0.9999 indicate near-perfect classification of legitimate transactions.

    This means that the model is extremely reliable for minimizing false positives, which is critical for avoiding the disruption of normal customer activity.

  • The F1-score (0) = 0.9999 confirms that there is no trade-off between precision and recall for normal transactions.

Performance on the minority class (Fraud - Class 1)

  • Precision (1) = 0.9569 indicates that when the model flags a transaction as fraudulent, it is correct approximately 96% of the time, which is vital to avoid wasting resources on false alarms.

  • Recall (1) = 0.9250 shows that the model can capture over 92% of all fraudulent transactions, which is an impressive detection rate given the class imbalance and subtlety of the fraud patterns.

  • The F1-score (1) = 0.9407 demonstrates a strong harmonic balance between precision and recall, making the model highly effective for real-world deployment.

Figure 1 illustrates the Receiver Operating Characteristic (ROC) curve for the proposed hybrid anomaly detection model, XRAI (XGBoost, Random Forest, Autoencoder, Isolation Forest). The ROC curve plots the True Positive Rate (recall) against the False Positive Rate (1 - Specificity) across a range of classification thresholds.


Figure 1. Receiver Operating Characteristic (ROC) curve for the proposed model XRAI.

(XRAI: First letters of XGBoost, Random Forest, Autoencoder, Isolation Forest).

The curve shows a steep rise toward the upper-left corner of the plot, which is indicative of a high-performing classifier. The area under the ROC curve (AUC) is 0.9885, suggesting that the model had excellent discriminative capability. An AUC value closer to 1 indicates that the classifier is highly capable of distinguishing between the positive class (fraudulent transactions) and negative class (legitimate transactions).

In summary, the ROC curve and its corresponding AUC of 0.9885 provide strong empirical evidence of XRAI’s ability to effectively separate fraud from non-fraud, even under class imbalance conditions, a critical requirement for robust fraud-detection systems in the financial domain.

The proposed XRAI model, an ensemble combining XGBoost, Random Forest, Autoencoder, and Isolation Forest, achieved a Matthews Correlation Coefficient (MCC) of 94.07%, indicating a strong and balanced predictive performance, particularly in the context of imbalanced classification tasks such as credit card fraud detection.

The XRAI model demonstrates a highly optimized hybrid ensemble for credit card fraud detection. It achieves excellent detection of rare fraudulent cases, while maintaining ultralow false-positive rates. The combination of supervised precision and unsupervised anomaly sensitivity is managed through a weighted mechanism that positions XRAI as a practically deployable solution in real-time financial anomaly detection systems.

Comparison to other similar studies

To contextualize the performance of the proposed hybrid anomaly detection framework, a comparative analysis was conducted with recent studies on credit card fraud detection that utilized similar datasets and evaluation metrics. The objective of this comparison is to demonstrate the relative effectiveness of the proposed model in terms of precision, recall, F1-score, and MCC.

Several studies have explored both the single-model and hybrid approaches using the Kaggle credit card fraud dataset. These models include supervised methods such as Logistic Regression, Random Forest, and XGBoost as well as unsupervised techniques such as Isolation Forest and Autoencoder-based anomaly detectors. In more recent works, hybrid models combining deep learning and ensemble techniques have been proposed to address the limitations of detection accuracy and generalizability.

Table 3 summarizes a selection of comparable studies, outlining the key models used and their reported results. The evaluation metrics used in each study are also included to enable a standardized comparison. Where applicable, the performance of the proposed hybrid model is highlighted to illustrate the improvements over existing approaches.

Table 3. Comparative performance of proposed model vs. existing studies.

Method                                                       Accuracy  Precision  Recall (TPR)  F1-score  MCC     TNR
Our Proposed Method (XRAI)                                   0.9998    0.9569     0.9250        0.9407    0.9407  0.9999
Ding et al. (2024)80 - AE + LightGBM (AEELG)                 0.921     0.8875     0.3451        0.4722    0.4739  –
Du et al. (2024)81 - AE-XGB-SMOTE-CGAN                       0.9993    0.7839     0.8845        –         –       0.9997
Alshameri & Xia (2024)82 - VAE                               –         0.93       0.92          0.92      –       –
Wu & Wang (2022)83 - Autoencoder + Adversarial Net           0.9061    0.9216     0.8878        0.9044    0.8128  –
Lok et al. (2022)23 - Hybrid Kmeans-KNN                      –         0.9579     0.7215        0.8231    –       –
Ishak et al. (2022)84 - Enhanced Stacking Classifier System  0.9837    –          –             0.8841    –       –
Benchaji et al. (2021)85 - Attention + LSTM                  0.9672    0.9885     0.9191        –         –       –

(– indicates a metric not reported in the cited study.)

As shown in Table 3, the proposed hybrid model achieved superior performance across multiple metrics, attaining the highest accuracy with a value of 0.9998, a precision of 0.9569 (among the top three), and a recall of 0.9250 (the highest), resulting in an F1-score of 0.9407 (the highest) and MCC of 0.9407 (the highest). These results reflect significant advancements over earlier models, particularly in balancing the trade-off between sensitivity and specificity.

This comparison substantiates the effectiveness of the proposed framework and supports its relevance as a practical, high-performance solution for financial fraud detection.

Real-world applications of the model in financial fraud detection

The findings of this study have significant implications for real-world financial fraud detection, particularly in environments where data are imbalanced, adversarial, and evolving. The proposed hybrid model, XRAI, demonstrated exceptional accuracy and robustness in detecting anomalies in the widely used creditcard.csv dataset. By leveraging the strengths of XGBoost, Random Forest, Autoencoder, and Isolation Forest through a weighted scoring mechanism, XRAI offers a holistic and practical approach for identifying fraudulent financial transactions in real time.

One of the most critical applications of this model is early detection of credit card fraud. Financial institutions are facing increasing threats from sophisticated fraud schemes that are often hidden within massive volumes of transactional data. Traditional models that rely solely on supervised learning struggle with previously unseen and rare types of fraud. By incorporating unsupervised models, such as autoencoders and isolation forests, XRAI can detect previously unclassified anomalies, enabling systems to capture zero-day fraud attacks that evade conventional classifiers.

In addition to fraud detection, this hybrid approach can be adapted for anti-money laundering (AML) systems, insurance fraud detection, and transaction monitoring in e-commerce. Given the adaptability of the model to high-dimensional and noisy data, it can also be used in environments beyond banking, such as healthcare claim validation or cyber intrusion detection, where anomalous patterns are often rare and context dependent.

The practical benefits of this hybrid system extend beyond academic experimentation. It offers a deployable, scalable, and intelligent solution for industries facing complex fraud challenges. As financial crime continues to grow in scale and complexity, systems such as XRAI provide a promising blueprint for building more secure, proactive, and trustworthy fraud detection frameworks.

Challenges in implementation and model limitations

Although the XRAI hybrid model presents a strong case for fraud-detection performance, several limitations emerged during the development and evaluation that must be addressed to fully understand its practical applicability. These limitations can be grouped into three primary categories: data, models, and operational constraints.

First, the study relies on the creditcard.csv dataset, which has certain constraints despite its popularity. It is highly imbalanced, anonymized, and preprocessed, and does not fully reflect the diversity and noise found in real-world financial data. Features such as merchant category, transaction geolocation, and time-series behavior are not present in this dataset. This limits the generalizability of the model to broader financial environments. Moreover, the dataset lacks adversarial fraud samples that mimic legitimate behavior, which are increasingly common in real financial systems.

Second, the complexity of hybrid architecture introduces challenges in terms of interpretability, maintenance, and scalability. Although the ensemble combines multiple strengths, it also has its weaknesses. For example, autoencoders require careful tuning and are sensitive to reconstruction thresholds, whereas Isolation Forests tend to produce high false-positive rates unless precisely calibrated. Managing the balance of weights across all models adds an additional layer of complexity, particularly when adapting a system to new datasets or changing fraud patterns.

Another limitation is the requirement for labelled data for the supervised components, such as XGBoost and Random Forest. Labeling fraud in real-world data is often delayed or incomplete, which can limit the speed of retraining and adaptation. In rapidly changing environments, supervised models can become stale unless mechanisms are in place for online or incremental learning.

In summary, although XRAI provides strong fraud-detection performance in experimental settings, its real-world deployment requires careful consideration of data diversity, model manageability, latency, and compliance. Addressing these limitations can further enhance its reliability and adoption.

Conclusion and future work

This study introduced a novel hybrid model, XRAI, designed to enhance the performance and robustness of anomaly detection in credit card fraud-detection systems. By strategically integrating supervised learning algorithms such as XGBoost and Random Forest with unsupervised techniques such as autoencoders and isolation forests, the model effectively overcomes the limitations of single-classifier approaches in highly unbalanced and adversarial environments.

The XRAI model demonstrated strong predictive power across a range of performance metrics, achieving an accuracy of 99.98%, precision of 95.69%, recall of 92.50%, and F1-score of 94.07%. The Matthews Correlation Coefficient (MCC) of 94.07% and AUC of 0.9885 further indicate a high discriminative ability and balanced performance between the fraud and non-fraud classes. These results highlight the model’s potential for real-time deployment in financial institutions aimed at reducing operational risks and minimizing false alarms.

Despite these achievements, the study also acknowledged key limitations, including reliance on a single publicly available dataset (creditcard.csv), the computational cost of the hybrid architecture, and interpretability challenges. These limitations pave the way for further research in this area.

Building on the current findings, future research on the XRAI model can pursue several promising directions to enhance its applicability and robustness in real-world settings. A critical improvement involves incorporating temporal and contextual features, as fraudulent behaviors often manifest as sequential patterns over time. Leveraging techniques such as LSTM-based Autoencoders or Transformer-based architectures can enhance the detection of complex and evolving fraud strategies. Moreover, integrating contextual data, such as customer profiles, merchant categories, and geographic transaction information, can further improve classification accuracy and reduce false positives.

Future studies should focus on adaptive ensemble strategies, explainable AI techniques, and robustness against adversarial attacks. Testing the model across diverse datasets and domains is essential to validate its generalizability and scalability.

In conclusion, the XRAI model presents a scalable, intelligent, and highly accurate solution for credit-card fraud detection. With further refinements in temporal modelling, explainability, and robustness, hybrid models such as XRAI hold significant promise for building trustworthy and resilient fraud detection systems tailored to the ever-evolving landscape of financial crime.

Ethical considerations

Not applicable. This study does not involve human or animal subjects.

Contributions

The contributions of each author are described according to the CRediT (Contributor Roles Taxonomy) system:

  • Mohammad Shanaa: Conceptualization; Methodology; Data Curation; Formal Analysis; Software; Validation; Visualization; Writing – Original Draft; Writing – Review & Editing; Project Administration.

    Mohammad Shanaa led the design and execution of the research, conducted the data analysis and model development, and prepared the initial and revised versions of the manuscript.

  • Sherief Abdallah: Supervision; Conceptualization; Writing – Review & Editing.

    Sherief Abdallah supervised the research process, contributed to refining the methodology and framing the research direction, and provided critical revisions to the manuscript.
