ALL Metrics
-
Views
-
Downloads
Get PDF
Get XML
Cite
Export
Track
Research Article

Advances in Leukemia detection and classification: A Systematic review of AI and image processing techniques

[version 1; peer review: 1 approved with reservations]
PUBLISHED 19 Dec 2024
Author details Author details
OPEN PEER REVIEW
REVIEWER STATUS

Abstract

Background

Leukemia, a heterogeneous group of blood cancers, poses significant challenges to global health due to its complexity, diverse risk factors, and variable outcomes. Accurate and early diagnosis is critical but remains a significant hurdle, particularly in low-resource settings. Recent advancements in artificial intelligence (AI) and image processing offer transformative solutions to improve leukemia detection and classification, addressing limitations in traditional diagnostic methods.

Methods

This study systematically reviewed over 25,000 scientific articles sourced from Scopus, employing a PRISMA-guided methodology to ensure a comprehensive and rigorous analysis. The analysis focused on the application of AI, particularly convolutional neural networks (CNNs), in diagnosing four primary leukemia types: acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), and chronic myeloid leukemia (CML). It also examined global epidemiological trends, risk factors, and disparities in healthcare access.

Results

Key risk factors for leukemia include genetic syndromes like Down syndrome, environmental exposures to toxins such as benzene, ionizing radiation, and viral infections. Socio-economic disparities and geographical differences significantly impact leukemia incidence and outcomes. AI-based models, especially CNNs, demonstrated enhanced accuracy, speed, and reliability in diagnosing leukemia compared to traditional methods. However, challenges such as data variability, model scalability, and unequal access to AI technologies continue to hinder widespread adoption.

Conclusion

AI and image processing technologies hold immense potential to revolutionize leukemia diagnostics by enabling early detection, precise classification, and personalized treatment planning. Addressing critical challenges, including data standardization and equitable access to these technologies, will be vital for global application. This review highlights the transformative role of AI in improving leukemia outcomes and advancing precision medicine worldwide.

Keywords

Leukemia Classification, Artificial Intelligence in Medical Diagnostics, Machine Learning Algorithms, Hematologic Malignancies, Deep Learning Models, Convolutional Neural Networks, Cancer Epidemiology, Diagnostic Accuracy.

1. Introduction

Leukemia is a complex hematological malignancy defined by the abnormal proliferation of white blood cells, which disrupts the body’s blood-forming tissues and interferes with normal immune functions. This group of cancers encompasses both acute and chronic forms affecting either lymphoid or myeloid cells, resulting in various subtypes with distinct clinical features and progression rates.1,2 Key characteristics of leukemia include rapid, uncontrolled cell growth that impairs immune functions, leads to anemia, and creates other systemic impacts. Leukemia’s morphological diversity complicates diagnosis and subtype differentiation, as highlighted by Mirmohammadi et al.3 and Safuan et al.4 This diversity in presentation underscores the need for accurate classification, which is essential for effective prognosis and treatment planning.

Due to high mortality rates and diagnostic challenges associated with leukemia, accurate classification has become increasingly critical. Advances in artificial intelligence (AI) are transforming leukemia detection and classification in medical imaging. Studies demonstrate that machine learning and deep learning models, including convolutional neural networks (CNNs), enhance diagnostic precision and enable faster subtype identification, promising substantial potential for clinical applications.5,6 This study aims to address the specific challenges in leukemia classification by developing an AI-driven approach to accurately detect and classify leukemia subtypes.

This paper provides an in depth exploration of leukemia detection and classification, beginning with an analysis of global epidemiology to outline incidence, prevalence, and risk factors across regions and demographics, thereby highlighting disparities and risk factors. The second section delves into classification, focusing on primary leukemia subtypes: ALL, AML, CLL, and CML and examining morphological and genetic characteristics important for diagnosis. In the third section, we describe our methodology, which details our data collection, processing, and analytical techniques used to perform a robust bibliometric analysis. This bibliometric analysis identifies publication trends, key contributors, and influential studies, offering insights into the current landscape of leukemia research. Lastly, we present a comparison of state-of-the-art AI applications for leukemia, reviewing machine learning and deep learning techniques employed in diagnostics. Here, we discuss each approach’s limitations, emphasizing challenges related to data quality, generalizability, and integration into clinical workflows.

2. Global epidemiology and risk factors of Leukemia

The epidemiology of leukemia has been extensively analyzed through various studies over the years, revealing significant trends, risk factors, and geographical disparities. Bouchbika et al. conducted a study in Greater Casablanca from 2005 to 2007, using data from the Casablanca cancer registry. They found age-standardized rates (ASR) of 2.7 per 100,000 for men and 2.0 for women, with incidence peaks in children aged 59 and adults over 65. Comparatively, Morocco’s leukemia rates were lower than those in Tunisia and Algeria, and significantly lower than Western countries. Childhood leukemia (0-14 years) constituted 10.9% of all childhood cancers with an ASR of 1.4.7 Expanding the analysis globally, Miranda- Filho et al. used data from 290 cancer registries in 68 countries and national estimates from 2012. Their study found the highest ASRs in Australia, New Zealand (11.3 per 100,000 in males, 7.2 in females), Northern America (10.5 in males, 7.2 in females), and Western Europe (9.6 in males, 6.0 in females). Conversely, Western Africa had the lowest rates (1.4 in males, 1.2 in females). Acute lymphoblastic leukemia was predominant in children, while chronic lymphocytic leukemia was more common in adults in European and North American countries. Key risk factors identified included genetic syndromes such as Down syndrome, environmental exposures like benzene and ionizing radiation, and viral infections.8 In Arab countries, Al-Muftah and Al-Ejeh highlighted higher leukemia incidence rates among younger populations, particularly in the Arabian Gulf and Levant regions. Utilizing GLOBOCAN data from 2003 to 2016, the study noted elevated age-standardized incidence rates (ASIR) in children under 15 years compared to global averages. For adults aged 35 and older, the Levant region also showed higher rates than the global norm. These findings emphasized the need for targeted genetic epidemiology studies and improved clinical management strategies.9 Further, ElBakali, Abdellatif, and Smiri examined 8,851 cancer cases in the Souss Massa region of Morocco from 2014 to 2019. Hematological cancers, including leukemia, accounted for 14% of cases, with a female incidence rate of 47.74 per 100,000 and a male incidence rate of 45.71 per 100,000. The average age of leukemia patients was 47.81 years. Their analysis underscored the importance of reinforcing prevention efforts and developing specific screening strategies for hematological cancers.10 Baeker Bispo, Pinheiro, and Kobetz provided a comprehensive overview of leukemia and lymphoma in the US and globally, reporting higher incidence rates in developed regions and racial disparities in survival rates. They noted that leukemia is the 15th most common cancer worldwide, accounting for 437,033 cases and 309,006 deaths in 2018. The study highlighted genetic abnormalities, immunosuppression, ionizing radiation, carcinogenic chemicals, and oncogenic viruses as key risk factors. They emphasized the need for equitable access to diagnostic and treatment services to address these disparities.11 Smith Torres-Roman et al. analyzed leukemia motality trends in children from 15 Latin American countries between 2000 and 2017. They found the highest mortality rates in Venezuela, Ecuador, Nicaragua, Mexico, and Peru, with significant upward trends in Nicaragua and Peru. Conversely, Puerto Rico saw substantial declines. The study predicted that by 2030, leukemia mortality would increase in several countries, emphasizing the need for interventions to reduce inequalities and ensure universal healthcare coverage.12 Huang et al. conducted a global analysis using data from GLOBOCAN, CI5, WHO, NORDCAN, and SEER databases. They found that leukemia accounted for 2.5% of new cancer cases and 3.1% of cancer deaths, with an age-standardized incidence rate (ASIR) of 5.4 and mortality rate of 3.3 per 100,000 people. The study associated higher incidence and mortality rates with the Human Development Index, GDP per capita, and lifestyle factors such as smoking and obesity. They recommended lifestyle modifications and further research to understand these trends.13 Finally, Zhang et al. examined the global burden of hematologic malignancies from 1990 to 2019 using data from the Global Burden of Disease (GBD) study. They found a declining trend in leukemia incidence (ASIR of 8.22 per 100,000) and mortality (ASDR of 4.26 per 100,000), with significant decreases in age-standardized rates.

However, regions like Central Europe, Western Europe, and East Asia experienced increases. The study highlighted the influence of socio-economic factors, occupational exposures, and high BMI on leukemia burden, emphasizing the need for targeted prevention strategies and improved healthcare access.14 these studies collectively highlight the global variations in leukemia incidence and mortality, influenced by genetic, environmental, and socio-economic factors. While some regions have seen declining trends, others continue to face significant challenges, underscoring the need for comprehensive prevention and management strategies tailored to specific regional needs.

3. Classification of Leukemia

Leukemia is primarily classified into two types: acute, which progresses rapidly, and chronic, which develops more slowly. The four major forms of leukemia include Acute Lymphoblastic Leukemia (ALL), Acute Myeloid Leukemia (AML), Chronic Lymphocytic Leukemia (CLL), and Chronic Myeloid Leukemia (CML)

3.1 Acute Lymphoblastic Leukemia (ALL)

According to FAB classification, Acute Lymphoblastic Leukemia (ALL) is the type of leukemia that occurs most frequently. It is further classified into three morphological subtypes: L1, L2 and L3. L1 cells have small size with coarse granular appearance they are composed of uniform groups of cells. L2 cells have bigger sizes as compared to L1 cells but also show variation in the appearance of the nucleus. L3 are bigger than L1 cells too but they have more vacuoles in them, with homogeneity in their nuclei.15 The defining feature of ALL is the uncontrolled growth of atypical lymphocytes, or lymphoblast cells, which can be morphologically identified from other cell types by their irregular shape, small clefts within a body part and round objects in a nucleus consisting from one to many nuclear lobes.16 In fact, according to some, early diagnosis, especially in children, plays a critical role in the treatment process.

Diagnosis of ALL, however, is not an easy task because Leukemia cells L1, L2, L3, are morphologically very similar to those of normal and reactive lymphocytes and atypical lymphocytes, which are nonmalignant cells or noncancerous cells.

Figure 1 illustrates the microscopic appearance of Acute Lymphoblastic Leukemia (ALL) cells, while Table 1 presents a comparative analysis of the morphological and cytological characteristics of different ALL subtype.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure1.gif

Figure 1. Microscopic image of Acute Lymphoblastic Leukemia.

Table 1. Comparative analysis of ALL subtypes.

SubtypeMorphological characteristicsCytological characteristics
L1Small cells, uniform nuclei, rough chromatinHigh nuclear-to-cytoplasmic ratio, scanty cytoplasm
L2Larger cells, nuclear heterogeneity, irregular nucleiMore abundant cytoplasm, prominent nucleoli, granules
L3Large cells, homogeneous nuclei, prominent vacuolesBasophilic cytoplasm with prominent vacuoles

3.2 Acute Myeloid Leukemia (AML)

Acute myeloid leukemia (AML) is a type of cancer characterized by rapid growth, abnormal white blood cells that accumulate in the bone marrow, and inhibition of the production of normal blood cells. Using the French American-British system, AML is classified into several subtypes according to the morphology and immunophenotype of the leukemic cells. Each of these types has distinct features that assist in its diagnosis and the formulation of a line of treatment.17

Figure 2 shows the microscopic image of Acute Myeloid Leukemia (AML) cells, highlighting their distinct morphological features.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure2.gif

Figure 2. Microscopic image of Acute Myeloid Leukemia.

AML arises from the transformation of myeloid precursor cells, leading to the uncontrolled proliferation of immature white blood cells, known as myeloblasts. These blasts fail to differentiate into fully functional blood cells, resulting in a deficiency of red blood cells, platelets, and mature white blood cells. This disruption in normal blood cell production causes symptoms such as anemia, infection, and bleeding tendencies. The classification of AML subtypes is essentiel, as it guides the clinical management of the disease. For instance, acute promyelocytic leukemia (APL), classified as M3, is notable for its sensitivity to all-trans retinoic acid (ATRA) therapy, which can induce remission. Other subtypes require different treatment approaches based on their unique characteristics, making accurate classification essential for effective therapy. Subtypes, including M0 (acute myeloblastic leukemia with minimal differentiation), M1 (acute myeloblastic leukemia without maturation), M2 (acute myeloblastic leukemia with maturation), M3 (acute promyelocytic leukemia and its subtypes), M4 (acute myelomonocytic leukemia), M5 (acute monocytic leukemia), M6 (acute erythroid leukemia), and M7 (acute megakaryocytic leukemia), with their characteristics, are presented in the Table 2.

Table 2. Comparative analysis of AML subtypes.

SubtypeBlast sizeNucleus shapeChromatinNucleoliCytoplasm
M0MediumRoundedFineProminentBasophilic, non-granular
M1MediumRoundedImmature, dispersedProminentBasophilic, may contain fine azurophilic granules or isolated Auer rods
M2Small to mediumRounded, sometimes in a cornerDispersed, immatureOne or moreBasophilic, may contain primary azurophilic granules or isolated Auer rods
M3MediumMonocytic, irregular or bilobed with deep cleftProliferated azurophilic granulationScarcely basophilic due to intense granulation
M4LargeRounded or kidney-shaped Dispersed, immature1-3Moderately large, intensely basophilic, may contain Auer rods
M5LargeRounded or kidney-shaped Dispersed, immature1-3Intensely basophilic, more highly granulated than monoblasts, may contain vacuoles
M6LargeRounded, kidney-shaped or irregularVariable basophiliaProminentNon-granular basophilic, pseudopods or granulations
M7Highly immature, polymorphicEccentricReticulated1-3Non-granular, basophilic, platelet-like with pseudopods

3.3 Chronic Lymphocytic Leukemia (CLL)

Chronic lymphocytic leukemia (CLL) is primarily characterized by the proliferation of small, mature B-lymphocytes in the peripheral blood, bone marrow, and secondary lymphoid tissues. Morphologically, CLL lymphocytes are generally small with scant cytoplasm and dense nuclear chromatin, often referred to as” smudge cells” due to their fragility. However, variations in lymphocyte morphology have been identified, leading to the classification of CLL into distinct morphological sub- groups. According to Ghia et al., the typical lymphocytes in CLL are small and homogeneous, exhibiting fragile cell membranes which result in frequent cell rupture during blood smear preparation. These cells are predominantly characterized by a condensed nuclear chromatin and narrow rims of pale cytoplasm.18 In contrast, Peterson et al. identified three morphologic groups based on lymphocyte size and structure: Group I consists of small to medium-sized lymphocytes with coarsely clumped chromatin, Group II features larger lymphocytes with abundant cytoplasm resembling reactive lymphocytes, and Group III includes a heterogeneous mix of lymphocytes from both Groups I and II.

Notably, the identification of these morphologic subtypes helps refine the diagnosis and risk stratification of patients, guiding more personalized treatment strategies.

This classification is significant as it correlates with disease progression and patient prognosis, with larger, reactive-like lymphocytes associated with longer survival times.19

Figure 3 illustrates the microscopic appearance of Chronic Lymphocytic Leukemia (CLL) cells, highlighting their distinct characteristics.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure3.gif

Figure 3. Microscopic image of Chronic Lymphocytic Leukemia.

Table 3 presents a detailed comparison of CLL subtypes, emphasizing their unique morphological and cytological features.

Table 3. Comparative analysis of CLL subtypes.

GroupLymphocytesCytoplasm Median survivalFrequencyKey characteristics
ISmall to medium-sized Narrow26+ months42%Uniform appearance, occasional granulation, nucleoli
IILargeAbundant46+ months17%Reactive-like, radial basophilia, frequent vacuoles
IIIMix of small and largeFeatures of Groups I and II50+ months31%Heterogeneous population, cell type transitions

3.4 Chronic Myeloid Leukemia (CML)

CML represents a clonal hematopoietic stem cell disorder characterized by uncontrolled myeloid cell proliferation. CML usually develops in three distinct phases: the chronic phase, the accelerated phase, and finally, blast crisis. The chronic phase is characterized by an insidious onset of symptoms, and it is the longest of the three phases, it is the one during which most patients are stable, with relatively manageable symptoms. Advanced phase: An advance in the progress of the disease, with increasing symptoms and resistance to treatment, thus more ag gressive clinically. The accelerated phase is a progression of the disease, bringing with it more severe symptoms and resistance to treatment, making it clinically more aggressive. The blast crisis phase, however, is the most dramatic phase, with rapid proliferation of the very immature, similar-to-acute-leukemia cells, often refractory to treatment and carries a poor prognosis. The hallmark of CML is represented by the Ph chromosome, resulting from the reciprocal translocation between chromosomes 9 and 22, forming the BCR-ABL fusion gene. This fusion gene encodes for a constitutively active tyrosine kinase acting benignly to drive the leukemogenesis in CML.20

The CML cells can, therefore, be broadly divided into the developmental stages of myeloblasts, promyelocytes, myelocytes, metamyelocytes, bands, and neutrophils.

It has typical morphological features characteristic of its own variety, extremely vital in an accurate diagnosis and categorization. Myeloblasts are the most primitive cells, characterized by large nuclei with fine-type chromatin and basophilic cytoplasm.

During the development of cells into Promyelocytes, Myelocytes, Metamyelocytes, Bands, and finally Neutrophils, there are many conspicuous changes in nuclear shape, chromatin texture, color of cytoplasm, and presence of granules.

It is through the realization of these different characteristics that pathologists are able to reach a proper diagnosis with relation to CML and monitor the progression of the disease.21

Figure 4 illustrates the microscopic image of Chronic Myeloid Leukemia (CML) cells.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure4.gif

Figure 4. Microscopic image of Chronic Myeloid Leukemia.

The Table 4 provides a comparative analysis of the various subtypes of Chronic Myeloid Leukemia (CML), focusing on their specific cellular characteristics. It outlines differences in nucleus lobes, chromatin texture, nucleus shape, cytoplasm color, granules, and nucleoli for each cell type, ranging from immature myeloblasts to mature neutrophils.

Table 4. Comparative analysis of CML subtypes.

Cell type Nucleus lobes Chromatin texture Nucleus shape Cytoplasm color Granules Nucleoli
MyeloblastSingleFineRound or ovalBlueNone or few primaryOne or more
PromyelocyteSingleCoarser than MyeloblastRound or ovalLess basophilicPrimaryOne or more
MyelocyteSingleCoarser than PromyelocyteRound to ovalAcidophilicSecondaryNone
MetamyelocyteSingleClumpedIndentedPinkSecondaryNone
BandSingleCoarse and clumpyBand-shaped PinkSecondaryNone
NeutrophilMultipleClumpedMulti-lobed PinkFineNone

4. Overview of AI in medical diagnostics

world of medical diagnostics, offering a powerful set of tools to analyze and interpret vast amounts of medical data.19 Subfields of AI, like machine learning (ML) and deep learning (DL), are particularly adept at finding patterns and making predictions based on these large datasets. In healthcare, this translates to several key benefits: improved diagnostic accuracy, better prediction of patient outcomes, and the potential for personalized treatment plans. Machine learning algorithms, such as support vector machines (SVMs) and k-nearest neighbors (KNNs), have proven successful in diagnosing diseases. These algorithms can analyze medical images and patient data to identify patterns indicative of specific conditions.22 Deep learning, particularly convolutional neural networks (CNNs), has become a game-changer in medical imaging. CNNs are exceptionally skilled at detecting and classifying various diseases, including cancers and cardiovascular conditions.23,24 For example, CNNs have achieved high accuracy in detecting diabetic retinopathy, show- casing their potential to revolutionize ophthalmology.25 Additionally, AI-powered systems are being integrated into radiology workflows, acting as assistants to radiologists. These systems can identify abnormalities in X-rays and MRI scans, leading to fewer diagnostic errors and improved efficiency. These advancements highlight the immense potential of AI to transform healthcare delivery and ultimately improve patient care.

5. Methodology

In this study, we employed a comprehensive bibliometric analysis approach to explore various aspects of leukemia, leveraging data from Scopus. Our research spanned multiple queries tailored to specific sections of the study, including risk factors, classification, and the role of artificial intelligence in leukemia diagnosis and treatment. Initially, we conducted broad searches using relevant keywords in article titles, abstracts, and keywords, resulting in a cumulative total of over 25000 articles across different sections. For the section on risk factors, we identified and screened 12,218 articles, focusing on publications from 2019 to 2024. We further refined this selection based on relevance, leading to a final set of 2116 articles. The classification section included data from 12,514 articles, filtered similarly to ensure the inclusion of 2189 pertinent studies.

In our analysis of AI’s role in leukemia, we extracted data from 368 articles, highlighting key terms such as “artificial intelligence,” “machine learning,” and “deep learning.” Articles were included based on their focus on the application of AI in medical diagnostics and treatment.

We meticulously recorded detailed bibliographic information for each selected article, including authors, titles, journal names, publication dates, and citation counts. To visualize and analyze citation networks, we utilized tools like VOSviewer, which enabled us to identify central themes and emerging trends within the field. The study’s procedural flow and inclusion criteria are detailed in the PRISMA diagram, illustrating the rigorous selection and screening process undertaken to ensure a comprehensive review of the literature.

Figure 5 illustrates the PRISMA research methodology used in this study.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure5.gif

Figure 5. PRISMA research methodology.

5.1 Publications trend

Figure 6 depicts the publication trend over the years 2019 to 2024, showing the number of articles published per year and the corresponding total citations.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure6.gif

Figure 6. Publications trend.

The graph presents the trend of publications and total citations per year related to leukemia research from 2019 to 2024. The blue line indicates the number of articles published each year, while the orange line represents the total number of citations these articles received per year.

The data reveal an initial increase in both the number of publications and citations, peaking in 2021. This suggests a heightened interest and active engagement in leukemia research during this period. The rise in publications could be attributed to advancements in research methodologies, increased funding, or a growing awareness of leukemia-related issues.

The decrease in citations from 2022 can be attributed to the time lag that often occurs in academic publishing, where newly published articles take time to be discovered, read, and subsequently cited by other researchers. This lag is a common phenomenon, as it typically takes a few years for new research to become well-known and integrated into the broader academic discourse.

5.2 Risk factors

The keyword analysis of leukemia risk factors was conducted using data extracted from Scopus, encompassing a total of 12,218 articles. The query “risk” AND “factors” AND “leukemia” was employed to gather relevant literature between 2019 and 2024. The analysis visualized the associations and prevalence of key terms, highlighting the most frequently discussed topics in leukemia research. Central themes include genetic factors, environmental exposures, and demographic variables, with significant nodes like “human,” “acute myeloid leukemia,” “genetics,” and “risk factor” illustrating the interconnected nature of these elements. The network graph provides a comprehensive overview of the critical areas of focus within the field, offering valuable insights into the prevailing concerns and emerging trends in leukemia research. This foundational understanding informs the subsequent exploration of epidemiological patterns and risk factors as reported in various studies.

Figure 7 presents a keyword co-occurrence network highlighting the relationships between terms associated with leukemia and its risk factors.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure7.gif

Figure 7. Keyword co-occurrence with risk factors.

5.3 Types of Leukemia

The classification of leukemia was further explored through a comprehensive keyword analysis using data extracted from Scopus. This analysis encompassed 12,514 articles, focusing on keywords with more than 160 occurrences. The visualization provided a detailed mapping of the terminologies and concepts frequently associated with various types of leukemia. Prominent nodes in the network include terms like “acute myeloid leukemia,” “genetics,” “acute lymphoblastic leukemia,” and “treatment response,” among others. The graph illustrates the relationships and common themes in the literature, highlighting the intricate connections between different leukemia types, their genetic markers, treatment modalities, and related biological processes.

Figure 8 visualizes the keyword co-occurrence network, highlighting key terms and their relationships in the classification of leukemia.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure8.gif

Figure 8. Keyword co-occurrence with classification of leukemia.

5.4 The integration of AI in Leukemia research

The integration of artificial intelligence (AI) in leukemia research has been gaining significant traction, as evidenced by a comprehensive keyword analysis conducted on 368 articles from Scopus. This analysis focused on keywords with more than 20 occurrences, highlighting the prevalent themes and emerging trends within this interdisciplinary field. Key terms such as “artificial intelligence,” “machine learning,” “deep learning,” and specific medical terms like “chronic myeloid leukemia” and “diagnosis” emerged as central nodes in the network graph. The visualization showcases the increasing application of AI techniques in the medical domain, particularly in enhancing diagnostic accuracy, predicting patient outcomes, and enabling personalized treatment approaches. The centrality of AI-related keywords underscores a growing emphasis on advanced computational methods in contemporary leukemia studies.

Figure 9 illustrates the keyword co-occurrence network focusing on AI in leukemia research.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure9.gif

Figure 9. Keyword co-occurence with AI.

5.5 Countries

Beyond keyword analysis, we investigated collaboration pat- terns among countries to gain a broader perspective on the global landscape of leukemia research. VOS viewer allows us to analyze” citations” as the unit of analysis, with” countries” as the collaborating entities. This analysis focused on prominent collaborations, excluding documents with a vast number of co-authoring countries. The resulting map depicts the citation impact and collaborative networks of various countries from 2016 to 2022. The visualization reveals the United States, India, and the United Kingdom as central hubs, signifying their leadership and significant contributions to leukemia research. India, in particular, displays a notable network of connections with countries like Pakistan, South Korea, and Saudi Arabia. This highlights India’s active role in fostering collaborative re- search endeavors. The color gradient, transitioning from blue to yellow, represents the publication timeline. This trend visually depicts the growing intensity of international collaborations and citations in leukemia research over the past few years. This underscores the increasingly global nature of the field and emphasizes the critical role of multinational cooperation in driving advancements in leukemia research.

Figure 10 presents a citation analysis of AI in leukemia research based on countries, highlighting the collaborative networks and the leading contributions from nations.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure10.gif

Figure 10. Citation analysis AI in leukemia based on countries.

The bibliometric analysis revealed a clear trend: artificial intelligence (AI) is rapidly transforming leukemia research. AI offers significant advantages, including improved diagnostic accuracy, personalized treatment plans, and enhanced prediction capabilities.

These powerful techniques empower researchers and processing algorithms being applied. This in-depth examination aims to provide a clearer picture of the current landscape and identify key areas where innovation and improvement can revolutionize leukemia diagnostics and treatment.

6. Advancements in Leukemia detection using AI and image processing techniques

In the literature, one promising approach for leukemia detection relies on microscopic image processing through a multi-step pipeline to analyze blood smear images. The process begins with image acquisition, where high-quality blood smear images are obtained. These images then undergo preprocessing, including resizing, RGB conversion, contrast adjustment, and noise reduction, ensuring the data is optimized for further analysis. In the segmentation stage, individual cells, particularly white blood cells, are isolated from the background and neighboring cells to focus on relevant regions. The next step is feature extraction, where important characteristics like shape, size, color, texture, and morphology are identified from the segmented cells. These features are then analyzed in the leukemia detection phase, where machine learning or deep learning models process the data to identify potential abnormalities. Finally, in the classification stage, each cell is categorized as either normal or cancerous, supporting the diagnosis and treatment planning for leukemia.26,27 This structured workflow enhances accuracy by refining the data at each step, ensuring reliable detection and classification.

The Figure 11 illustrates the process as described in the literature.

49c40020-dbbf-4bc0-b20a-470c2cc81837_figure11.gif

Figure 11. Leukemia detection workflow.

Building on this workflow, the literature offers various innovative methodologies that further enhance the accuracy and efficiency of leukemia detection through advanced computational techniques. Recent studies demonstrate the potential of integrating machine learning, deep learning, and feature extraction methods to improve diagnostic outcomes.

Warnat-Herresthal et al. (2020) conducted a comprehensive study using 12,029 samples from 105 different studies to develop robust and scalable classifiers for Acute Myeloid Leukemia (AML). Their methodology involved rigorous preprocessing steps, including RMA normalization for microarray data and DESeq2 normalization for RNA-seq data, followed by trimming to common genes. The study employed L1-regularized logistic regression (lasso) and compared its performance with other machine learning methods such as k-nearest neighbors, linear SVM, linear discriminant analysis, random forests, and deep neural networks. The classifiers demonstrated high accuracy, sensitivity, and specificity, and were evaluated through extensive cross-study and cross-platform analyses, ensuring their robustness and generalizability. This work underscores the potential of machine learning combined with transcriptomics to enhance AML diagnostics, particularly in settings where traditional hematological expertise is limited. Despite the promising results, the study acknowledged some limitations. Performance variability due to cross-study and cross-platform differences was noted, highlighting the need for large, diverse training datasets to improve the model’s generalizability. Furthermore, the study identified the high reliance on the quality and diversity of the dataset as a potential weakness. Future directions for this research include integrating transcriptomic-based machine learning into clinical workflows for AML diagnostics, which could offer a scalable and cost-effective alternative to traditional diagnostic methods. The authors propose further refining these models and extending their application to other hematologic and non-hematologic diseases, potentially revolutionizing medical diagnostics. Additionally, future work aims to include prospective studies to assess the diagnostic utility of these models in real-world clinical settings, addressing cross-study and cross-platform variations, and exploring simple data transformations to enhance cross-platform generalizability.28

Loey et al. (2020) proposed two automated classification models utilizing transfer learning with the pre-trained deep convolutional neural network, AlexNet, to classify leukemia in blood microscopic images. The first model involves pre-processing the images, extracting features using a pre-trained AlexNet, and classifying these features using various classifiers such as SVM, linear discriminants (LD), decision trees (DT), and k-nearest neighbors (K-NN). The second model fine-tunes AlexNet for both feature extraction and classification. Experiments on a dataset of 2820 images show that the second model achieves a perfect classification accuracy of 100%, outperforming the first model. The study emphasizes the advantages of using transfer learning to overcome the challenges of designing and training deep neural networks from scratch. The pre-processing step includes converting images to RGB, resizing them to 227×227 pixels, and applying data augmentation techniques like translation, reflection, and rotation. For feature extraction, AlexNet’s architecture, which includes five convolutional layers, three fully connected layers, and max-pooling layers, is utilized. The classifiers tested in the first model include decision trees with a max-split of 20, linear discriminants, SVM with various kernel functions (linear, Gaussian, cubic), and K-NN with Euclidean distance. Performance metrics such as precision, recall, accuracy, and specificity were evaluated, with the second model demonstrating higher accuracy and robustness. Despite the promising results, the study acknowledged some limitations and future directions. Potential overfitting remains a concern despite the use of dropout and normalization techniques. Additionally, the models’ dependence on high-quality, well-annotated datasets for training poses challenges for broader applicability. Future research could focus on extending these models to differentiate between various types of leukemia, rather than just identifying the presence of leukemia. Furthermore, the use of larger and more diverse datasets could help validate the models’ robustness and accuracy. The authors propose integrating these automated systems into clinical workflows to aid in the early and accurate diagnosis of leukemia, ultimately improving patient outcomes through timely and appropriate treatment.29

Baig et al. (2022) developed a comprehensive approach using deep learning-based convolutional neural networks (CNN) to detect acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), and multiple myeloma (MM) from microscopic blood smear images. The study “Detecting Malignant Leukemia Cells Using Microscopic Blood Smear Images: A Deep Learning Approach” by Raheel Baig et al. investigates an advanced automated system for detecting various types of malignant leukemia cells using deep learning techniques. The proposed methodology integrates pre-processing, feature extraction, and classification to identify Acute Lymphoblastic Leukemia (ALL), Acute Myeloid Leukemia (AML), and Multiple Myeloma (MM) from blood smear images. The dataset, comprising 4150 images, is pre-processed to enhance image quality, including background elimination, noise reduction, and contrast enhancement using adaptive histogram equalization. Two Convolutional Neural Networks (CNN-1 and CNN-2) with 19 and 15 layers, respectively, are employed for feature extraction. The extracted features are then fused using Canonical Correlation Analysis (CCA) to improve the discriminative power of the features. Finally, the fused features are classified using various machine learning classifiers, including Bagging Ensemble, LPBoost, Total Boost, RUSBoost, Fine KNN, and SVM, with the Bagging Ensemble achieving the highest accuracy of 97.04%.The methodology involves multiple steps, starting with pre-processing, where RGB images are converted to grayscale, enhanced using intensity adjustment, and noise removed through area and closing operations. Feature extraction is performed using CNN models trained on pre-processed images. CNN-1 and CNN-2, containing convolutional, batch normalization, ReLU, max-pooling, fully connected, SoftMax, and classification layers, extract deep features, which are then combined using CCA fusion. The classification of leukemia subtypes is performed using machine learning classifiers, evaluated using metrics like accuracy, sensitivity, specificity, precision, and F1-score. The Bagging Ensemble classifier outperforms others, highlighting the system’s robustness and potential for clinical application. Despite the promising results, the study acknowledged some limitations and future directions. The method’s performance is dependent on high-quality, well-annotated datasets, which may limit its applicability in settings with varied data quality. Additionally, there is a potential risk of overfitting, despite the use of data augmentation techniques. Future research aims to extend the current methodology to include larger, more diverse datasets and refine the algorithms to improve robustness and accuracy. The authors propose further integration of this automated system into clinical workflows, aiding in the early and accurate diagnosis of leukemia, which could significantly enhance patient outcomes. Future research directions include differentiating between more subtypes of leukemia and exploring the applicability of the methodology to other hematologic and non-hematologic conditions.

This vision underscores the potential of advanced deep learning techniques to revolutionize medical diagnostics, improving efficiency and accuracy in detecting malignant conditions.30

Shafique and Tehsin (2018) proposed a robust and automated detection method for acute lymphoblastic leukemia (ALL) and its subtypes (L1, L2, and L3) using transfer learning with a pretrained deep convolutional neural network (DCNN) AlexNet. The ALL-IDB database, which contains images from both healthy patients and leukemia patients, was used for this research. The dataset was split into training and evaluation sets, with data augmentation applied to prevent overfitting. The authors utilized the AlexNet architecture, which includes five convolutional layers followed by three fully connected layers. The last layers were replaced with new ones to classify the input images into four classes: L1, L2, L3, and Normal. Preprocessing steps included image normalization, conversion into binary images, and noise reduction. The feature extraction was done using the pretrained layers of AlexNet, and classification was achieved using the softmax activation function. The model achieved an accuracy of 99.50% for ALL detection and 96.06% for subtype classification. The researchers highlighted that DCNNs are powerful enough to detect and classify leukemia without the need for manual image segmentation. The study also suggests further improvements by integrating the approach into a fully automated system and expanding the dataset for training from scratch to enhance diagnostic accuracy. The methodology involves several key steps. In preprocessing, images are normalized, converted into binary format, and subjected to noise reduction. Unlike traditional methods, no manual segmentation is needed due to the strength of the DCNN layers. Feature extraction is performed using the pretrained AlexNet layers, and the classifier is a modified AlexNet architecture with new fully connected layers for classification. The network includes five convolutional layers, three max-pooling layers, and three newly added fully connected layers for classification. The algorithm characteristics include ReLU activation, softmax for classification, and data augmentation techniques like image rotation and mirroring to enhance the model’s robustness. The study achieved notable performance metrics with an accuracy of 99.50% for ALL detection and 96.06% for subtype classification. Sensitivity was 100% for ALL detection and 96.74% for subtype classification, while specificity was 98.11% for ALL detection and 99.03% for subtype classification. These results underscore the high accuracy and effectiveness of using transfer learning with DCNNs for leukemia detection. Despite the promising results, the study acknowledges some limitations and future directions. The limited dataset could affect the robustness of the training, and further research is needed to integrate the system into a fully automated diagnostic tool. Future work should focus on training deep learning models from scratch with larger datasets to improve accuracy and reliability. The authors also propose integrating their approach into a fully automated system to assist pathologists and oncologists in the early and accurate diagnosis of leukemia, ultimately enhancing patient outcomes through timely treatment interventions.31

Chiaretti et al. (2014) outlined a comprehensive approach to diagnosing and subclassifying Acute Lymphoblastic Leukemia (ALL) using bone marrow morphology, multi-channel flow cytometry (MFC), and genetic/cytogenetic analysis, following the WHO classification of lymphoid neoplasms. Their work provides a detailed overview of current standards and methodologies. The approach integrates cell morphology, immunophenotyping, and genetic/cytogenetic studies as outlined in the 2008 WHO classification. Morphological assessment is the initial diagnostic step, distinguishing ALL from AML based on cell characteristics in bone marrow and peripheral blood. Immunophenotyping with multi-channel flow cytometry (MFC) is the gold standard for identifying cell lineage and defining subsets, utilizing specific markers for B- and T-lineage ALL. Cytogenetic and genetic analyses, including karyotyping, fluorescence in situ hybridization (FISH), array-CGH, and next-generation sequencing (NGS), further refine the diagnosis and provide critical prognostic information.

The article classifies ALL into B-lymphoblastic leukemia/lymphoma not otherwise specified, B-lymphoblastic leukemia/lymphoma with recurrent cytogenetic alterations, and T-lymphoblastic leukemia/lymphoma. Each subtype is characterized by specific immunophenotypic markers and genetic aberrations. For instance, B-lineage ALL markers include CD19, CD20, CD22, CD24, and CD79a, while T-lineage ALL markers include CD1a, CD2, CD3, CD4, CD5, CD7, and CD8. The review highlights the prognostic significance of various genetic lesions, such as IKZF1 deletions in BCR-ABL+ ALL and CRLF2 rearrangements in B-ALL, which are associated with poor outcomes. The authors advocate for prospective clinical trials to ensure accurate diagnosis and effective therapy, emphasizing that early diagnostic work should be performed by experienced personnel to optimize treatment outcomes.30 Despite its strengths, the study acknowledges certain limitations and future directions. The method heavily relies on high-quality samples and advanced diagnostic facilities, which may not be accessible everywhere. Additionally, diagnostic accuracy can vary across different laboratories. To address these issues, the authors stress the importance of conducting early and accurate diagnostic work-ups by experienced personnel. They recommend integrating advanced genetic and immunophenotyping techniques into clinical practice to optimize treatment strategies. Future research aims to identify novel subgroups with prognostic significance and refine diagnostic algorithms to enhance patient outcomes. Prospective clinical trials are also suggested to ensure diagnostic accuracy and therapeutic efficacy, ultimately improving the comprehensive understanding and treatment of ALL.32

Minal D. Joshi, Atul H. Karode, and S.R. Suralkar (2013) developed an automatic method for detecting acute leukemia in blood microscopic images using image processing techniques and a k-nearest neighbor (kNN) classifier. The methodology involves preprocessing, segmentation, feature extraction, and classification to differentiate between normal and leukemic cells. In the preprocessing step, RGB images are converted to grayscale, contrast is enhanced using histogram equalization, and linear contrast stretching is applied to adjust image intensity levels. The segmentation process employs Otsu’s thresholding method to convert grayscale images to binary images, followed by morphological operations to remove small pixel groups and noise. Feature extraction focuses on calculating three morphological features of lymphocyte cells: area, perimeter, and circularity. These features are then used by the kNN classifier to classify the cells as either normal or blast (leukemic). The proposed system was evaluated on 108 images from the ALL-IDB public dataset, achieving an overall accuracy of 93%. The methodology includes several key steps. Preprocessing involves converting RGB images to grayscale, applying histogram equalization for contrast enhancement, and using linear contrast stretching for image intensity adjustment. Noise reduction is achieved through area and closing operations. Segmentation is done using Otsu’s thresholding to convert grayscale images to binary images, followed by morphological operations to remove small pixel groups and noise, and connect neighboring pixels to form objects. Feature extraction calculates the area by counting the total number of non-zero pixels within the nucleus region, measures the perimeter by calculating the distance between successive boundary pixels of the nucleus, and calculates circularity. The kNN classifier then uses these features to classify lymphocyte cells as either normal or blast (leukemic). The performance metrics showed an overall accuracy of 93%, with high precision in classifying blast cells, high recall in identifying leukemic cells, and high specificity in distinguishing between normal and blast cells. Despite the promising results, the study identified some limitations and future directions. The system’s performance depends on high-quality, well-annotated datasets, and it may be sensitive to variations in image quality and staining. Future research aims to optimize image processing techniques to improve robustness and accuracy, integrate more advanced machine learning classifiers, and explore stain-independent segmentation methods to enhance reliability. Expanding the dataset and refining the algorithms to improve the classification of various leukemia subtypes are also recommended. Integrating this automated system into clinical workflows could significantly aid in early and accurate leukemia diagnosis, thereby improving patient outcomes.33

Tusneem A. Elhassan et al. (2023) propose a novel approach for classifying atypical white blood cells (WBCs) in Acute Myeloid Leukemia (AML) using a hybrid model that integrates a geometric transformation (GT) with a deep convolutional autoencoder (DCAE) and a convolutional neural network (CNN). The methodology involves preprocessing, augmentation, feature extraction, and classification to differentiate between typical and atypical WBCs, followed by further subclassification of atypical WBCs into eight distinct categories. The dataset used includes 18,365 single-cell images from AML patients and non-malignant controls, categorized into 15 different types of WBCs.

The preprocessing steps include random rotation, vertical and horizontal flipping, and augmentation using the GT-DCAE model. Feature extraction is performed using the DCAE, which compresses the input data into a low-dimensional latent representation, followed by a CNN for additional feature extraction. The classification involves a two-stage process: the first stage differentiates between typical and atypical WBCs, and the second stage classifies atypical WBCs into subtypes. The proposed model achieved an average accuracy of 97%, a sensitivity of 97%, and a precision of 98%, with an AUC of 99.7%, demonstrating exceptional discriminating abilities. The methodology includes several key steps. Preprocessing involves augmentation techniques like random rotation and flipping, with the GT-DCAE model generating synthetic images. Segmentation is carried out using the CMYK-Moment Localization-Feature Fusion Extraction framework. For feature extraction, the DCAE model employs an encoder network with three convolutional layers, a latent vector space, and a decoder network. Classification is conducted in two stages using CNNs with ReLU activation, L2 regularization, and the SoftMax function.

Performance metrics highlight the model’s effectiveness with an overall accuracy of 97%, precision of 98%, sensitivity of 97%, and an AUC of 99.7%. Class-wise performance indicates high precision and sensitivity for various WBC subtypes, such as myeloblasts and monoblasts, although there are areas for improvement in detecting certain subtypes like metamyelocytes and bilobed promyelocytes.

Despite the promising results, the study identifies some limitations and future directions. The model’s performance depends on high-quality, well-annotated datasets, and there is potential for misclassification in the intermediate stages of myelopoiesis. Future research aims to refine the hybrid model by increasing the dataset size and incorporating additional augmentation techniques. The authors also propose integrating this model into clinical workflows to aid in the early and accurate diagnosis of AML, thereby improving patient outcomes through timely treatment interventions. Furthermore, they suggest exploring the application of their model to other hematologic and non-hematologic conditions to expand its utility and effectiveness in medical diagnostics.34

Negm et al. (2017) present an advanced decision support system for acute leukemia classification using digital microscopic images, combining neural networks and decision tree models to achieve high efficiency and accuracy. The neural network model, favored for its sensitivity, comprises multiple convolutional layers, batch normalization, and dropout to prevent overfitting, while the decision tree model, noted for its speed, provides a robust comparison.

The methodology involves several steps. Preprocessing techniques such as histogram equalization and contrast enhancement are employed to improve image quality. Segmentation is performed using Otsu thresholding and fuzzy c-means clustering. Feature extraction involves capturing geometric, textural, and color features, with Principal Component Analysis (PCA) used for feature reduction. Classification is conducted using support vector machines (SVM) and k-nearest neighbors (KNN), achieving high accuracy rates. The performance metrics reveal that the neural network model achieves an accuracy of 99.74%, demonstrating superior performance in terms of sensitivity, specificity, and precision compared to the decision tree model. However, the study highlights the need for larger datasets and suggests future work on noise reduction and fully automated systems to further enhance the model’s robustness and accuracy.

Despite its strengths, the study identifies some limitations. The limited dataset size may affect the generalizability of the results, and noise reduction methods were not fully explored. Future research aims to develop a robust segmentation system independent of stains, expand the dataset to include more types of acute myeloid leukemia cells, and implement noise reduction methods such as median, mean, and Gaussian smoothing. Additionally, the authors propose enhancing deep learning models to learn from scratch with larger datasets, aiming to create practical, everyday diagnostic tools.35

Niranjana Sampathila et al. (2022) present an advanced method for detecting acute lymphoblastic leukemia (ALL) using a customized convolutional neural network (CNN) named ALLNET. The study utilizes the C_NMC_2019 dataset, containing 10,661 images, with 7272 images of blast cells and 3389 images of healthy cells. The methodology involves extensive preprocessing, including color space conversion, histogram equalization, and thresholding to enhance image quality and facilitate segmentation. Data augmentation techniques such as vertical and horizontal flipping, random rotation, brightness adjustments, and Gaussian blur are employed to increase the dataset’s size and robustness.

Feature extraction and classification are performed using the customized ALLNET architecture, which consists of four convolutional layers, four max-pooling layers, and three fully connected layers, with batch normalization and dropout layers to prevent overfitting. Training was conducted on Google Collaboratory using the Nvidia Tesla P-100 GPU, achieving a maximum accuracy of 95.54%, specificity of 95.81%, sensitivity of 95.91%, F1-score of 95.43%, and precision of 96%. This study underscores the importance of using deep learning models in medical diagnostics to enhance the accuracy and efficiency of leukemia detection. The methodology includes several key steps. Preprocessing involves converting images to the HSI color space for better contrast, applying histogram equalization, and using thresholding to convert images to binary form for segmentation. Noise reduction techniques, including Gaussian blur, are also used. Data augmentation further improves robustness with techniques such as vertical and horizontal flipping, random rotation, brightness adjustments, and Gaussian blur. Feature extraction uses the ALLNET architecture, which includes four convolutional layers with ReLU activation, four max-pooling layers to reduce dimensionality, three fully connected layers with batch normalization and dropout to prevent overfitting, and batch normalization to stabilize and accelerate training. The classification process employs the customized CNN (ALLNET), using the Adam optimizer and categorical cross-entropy loss function. Performance metrics highlight the model’s effectiveness with an accuracy of 95.54%, precision of 96%, sensitivity of 95.91%, specificity of 95.81%, and an F1-score of 95.43%. These results demonstrate the high classification accuracy and robustness of the ALLNET model in detecting leukemia. Despite the promising outcomes, the study identifies some limitations and future directions. The model’s performance depends on high-quality, well-annotated datasets, and it requires extensive preprocessing. Future research aims to enhance the ALLNET model by expanding the dataset to include more diverse and noisy images, reducing preprocessing steps, and integrating more advanced neural network architectures like YOLOv4, ResNet, and AlexNet. These improvements aim to develop a robust and reliable diagnostic tool for clinical use, enhancing the accuracy and efficiency of leukemia detection and reducing the time and error associated with manual diagnostics.36

Pałczyński et al. (2021) developed a hybrid artificial intelligence system for classifying Acute Lymphoblastic Leukemia (ALL) using an optimized, IoT-friendly neural network architecture. This system integrates transfer learning with MobileNet v2 as the encoder and machine learning algorithms like XGBoost, Random Forest, and Decision Tree as classifiers. Utilizing the ALL-IDB2 dataset, which consists of segmented images of lymphocyte cells from healthy patients and patients with ALL, the methodology involves preprocessing the images using data augmentation techniques, encoding them with MobileNet v2, and classifying the feature vectors with the machine learning models. The results demonstrate an average accuracy of over 90%, reaching up to 97.4%, showcasing the effectiveness of hybrid AI systems in handling tasks with low computational complexity. The methodology includes several steps, starting with preprocessing, where RGB images are converted to grayscale, contrast is enhanced using adaptive histogram equalization, and background noise is removed using area and closing operations. Data augmentation techniques such as color jitter, Gaussian blur, horizontal flip, vertical flip, and rotation are applied, followed by Z-score normalization. For feature extraction, MobileNet v2 is utilized, optimized for small processing units like mobile CPUs or IoT devices. The extracted features are then fed into classifiers such as XGBoost, Random Forest, and Decision Tree, trained using Adam optimizer with early stopping after 1000 epochs. The system achieved a classification accuracy of 97.4% when using MobileNet v2 with a fully connected layer, and over 90% accuracy for most models. Performance metrics including precision, recall, specificity, and AUC values were high, indicating robust model performance. The study highlights the system’s strengths in achieving high classification accuracy, efficient preprocessing, and effective use of transfer learning to prevent overfitting, making it suitable for low-resource environments like IoT. Despite these strengths, the study acknowledges some limitations and future directions. The system’s performance depends on high-quality, well-annotated datasets, and it may be sensitive to variations in image quality and staining. The Decision Tree algorithm showed lower performance compared to XGBoost and Random Forest. Future research aims to further optimize the hybrid AI system to enhance its robustness and accuracy, extend its application to other hematologic conditions, and refine the machine learning models. The authors suggest integrating this automated system into clinical workflows, particularly in low-resource environments, to aid in early and accurate leukemia diagnosis. By leveraging transfer learning and IoT-friendly architectures, the system aims to enhance the efficiency and reliability of medical diagnostics.37

Alexandra Bodzas et al. propose an automated method for detecting acute lymphoblastic leukemia (ALL) from microscopic blood smear images, leveraging image processing techniques and machine learning models.

The dataset used in their study consists of 18 images of normal blood smears and 13 images from patients diagnosed with ALL, all collected from the University Hospital of Ostrava. Each image has a resolution of 4,080 × 3,072 pixels, capturing diverse regions of the blood smear to ensure comprehensive representation of leukocytes.

The images underwent preprocessing to correct variations in lighting and staining inconsistencies, followed by segmentation through a three-phase filtering technique to separate leukocytes from the rest of the image. Sixteen morphological and statistical features, mimicking the criteria used by hematology experts, were extracted from the segmented cells. These features were then used to train both a Support Vector Machine (SVM) and an Artificial Neural Network (ANN). While the results were promising, with the SVM achieving 96.72% accuracy and the ANN 97.52%, the limited dataset raises concerns about potential overfitting and the generalizability of the models. Additionally, the method is focused exclusively on the detection of ALL, limiting its applicability to other subtypes of leukemia.38

Nimesh Patel and Ashutosh Mishra proposed an automated method for leukemia detection from microscopic images in their 2015 study. Their approach focuses on extracting textural and morphological features from cells using Gray Level Co-occurrence Matrices (GLCM) along with cell shape analysis. These extracted features are classified using a binary Support Vector Machine (SVM) classifier, enabling the differentiation between leukemic and normal cells with a reported accuracy of 88.24%, based on blood smear images from leukemia patients. The dataset used in the study consists of 100 images from leukemia patients and 100 images from healthy individuals. However, several limitations are associated with this method. The relatively small dataset (only 200 images) raises concerns about the model’s ability to generalize its results to more diverse datasets. Additionally, the approach is limited to general leukemia detection, without addressing specific subtypes such as Acute Lymphoblastic Leukemia (ALL) or Acute Myeloid Leukemia (AML). Another challenge lies in the reliance on GLCM-based feature extraction, which depends on accurate cell segmentation. Achieving precise segmentation can be difficult with variable-quality images, increasing the risk of classification errors.39

Following the discussion of the various studies, below is Table 5, a summarized comparative chart on these studies alongside their methods.

Table 5. Comparative overview of automated Leukemia detection techniques and models.

Author, YearWorkMethods (algorithms used)Dataset usedAccuracy evaluationRef
Warnat-Herresthal et al. (2020)Scalable Prediction of AML Using Machine LearningL1-regularized logistic regression (lasso), k-nearest neighbors, linear SVM, linear discriminant analysis, random forests, deep neural networks (DNN)12029 samples from 105 studies (HG-U133A microarray, HG-U133 2.0 microarray, RNA sequencing)High accuracy, sensitivity, and specificity; up to 100% in some scenarios28
Loey et al. (2020)Deep Transfer Learning for Leukemia DiagnosisTransfer learning with AlexNet, various classifiers (SVM, linear discriminants, decision trees, k-nearest neighbors), fine-tuned AlexNetDataset of 2820 imagesFirst model: 99.79% accuracy; Second model: 100% accuracy29
Baig et al. (2022)Deep Learning Approach to Detect Malignant Leukemia CellsCNN (two architectures: 19 layers and 15 layers), feature fusion using Canonical Correlation Analysis (CCA), Bagging Ensemble classifierDataset of 4150 images97.04% accuracy (Bagging Ensemble)30
Shafique and Tehsin (2018)ALL Detection and Classification Using AlexNetTransfer learning with AlexNet (five convolutional layers, three fully connected layers), data augmentationALL-IDB database99.50% accuracy (ALL detection), 96.06% accuracy (subtype classification)31
Chiaretti et al. (2014)Diagnosis and Subclassification of Acute Lymphoblastic LeukemiaMorphological assessment, immunophenotyping (multi-channel flow cytometry), genetic/cytogenetic analysis (karyotyping, FISH, array-CGH, NGS)Not specifiedHigh accuracy in differentiating ALL subtypes using immunophenotyping and genetic analyses32
Joshi et al. (2013)WBC Segmentation and Classification for Acute Leukemia Detectionk-nearest neighbor (kNN) classifier, Otsu's thresholding, morphological operationsALL-IDB public dataset (108 images)93% accuracy33
Elhassan et al. (2023)Hybrid Model for AML ClassificationTwo-stage hybrid model (GT-DCAE, CNN), augmentation (random rotation, vertical and horizontal flipping), deep convolutional autoencoderDataset of 18365 single-cell images from AML patients and non-malignant controls97% accuracy, 98% precision, 97% sensitivity, 99.7% AUC34
Negm et al. (2017)Decision Support System for Acute Leukemia ClassificationNeural networks, decision tree, histogram equalization, Otsu thresholding, fuzzy c-means clustering, PCANot specified99.74% accuracy (neural network model)35
Sampathila et al. (2022)Customized Deep Learning Classifier for ALL DetectionCustomized CNN (ALLNET), data augmentation (vertical and horizontal flipping, random rotation, brightness adjustments, Gaussian blur)C_NMC_2019 dataset (10661 images)95.54% accuracy, 95.81% specificity, 95.91% sensitivity, 95.43% F1-score, 96% precision36
Pałczyński et al. (2021)IoT Application of Transfer Learning for ALL ClassificationTransfer learning with MobileNet v2, XGBoost, Random Forest, Decision TreeALL-IDB2 dataset97.4% accuracy37
Bodzas et al.Automated Detection of ALL Using ANN and SVMANN, SVM, image preprocessing and segmentation31 images (18 normal, 13 ALL cases) from the University Hospital of Ostrava96.72% accuracy (SVM), 97.52% accuracy (ANN)38
Patel and Mishra (2015)General Leukemia Detection Using Image ProcessingGray Level Co-occurrence Matrix (GLCM) for feature extraction, binary SVM classifier200 images (100 healthy, 100 leukemia)88.24% accuracy39

7. Discussion

This article delves into the epidemiology and high prevalence of leukemia, highlighting the significant global research efforts focused on its early diagnosis and treatment. The critical need for advancements in this area is emphasized by the widespread participation of nations worldwide, each contributing to the evolving landscape of leukemia research. The integration of artificial intelligence (AI) into diagnostic workflows offers promising opportunities to enhance the detection and classification of leukemia, especially given the complexities associated with its primary subtypes—acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), and chronic myeloid leukemia (CML). However, current diagnostic methods and AI-based approaches face several limitations that must be addressed to realize their full potential.

A comprehensive computational pipeline for leukemia detection generally involves three essential phases: preprocessing, feature extraction, and classification. Preprocessing ensures the data is optimized for analysis by improving image quality through noise reduction, data augmentation, and color correction. This step is essentiel, as variations in sample preparation and staining techniques can introduce inconsistencies that affect diagnostic accuracy. The feature extraction stage identifies key cellular traits such as morphology, texture, and size critical for differentiating between normal and malignant cells. In the final phase, machine learning algorithms and deep learning models, including convolutional neural networks (CNNs), support vector machines (SVMs), and ensemble classifiers, analyze these features to deliver accurate diagnostic outcomes.

Despite these advancements, AI models in leukemia diagnostics encounter significant challenges. These include overfitting, where models trained on limited datasets struggle to generalize to unseen data, and variability in clinical images, which affects performance. Current diagnostic approaches also depend heavily on high-quality, well-annotated datasets, but access to such data is often limited, and inconsistencies across datasets can impair model reliability. Additionally, most AI models lack the scalability to perform equally well across diverse healthcare settings, as differences in equipment, patient demographics, and clinical protocols can impact outcomes. Addressing these challenges requires robust international collaboration, development of diverse multi-institutional datasets, and implementation of federated learning approaches to maintain data privacy while training models across institutions. Overcoming these limitations will be essential for creating AI tools that not only enhance diagnostic precision but also support personalized treatment planning, ultimately reducing the global burden of leukemia and improving patient outcomes.

8. Limitations

AI-powered models have advanced leukemia detection and classification significantly, yet several limitations must be addressed for reliable performance in clinical settings. A key factor influencing model robustness and accuracy is data quality and diversity. Many studies emphasize that high-quality, diverse datasets are essential to counteract variability across different clinical environments. When models are trained on homogeneous, limited datasets, they struggle to generalize well, especially across institutions that use different imaging equipment or patient demographics. This dependency on well-annotated data and the variability of samples across platforms introduce performance inconsistencies, underscoring the need for extensive and varied datasets to reduce overfitting and improve robustness.

Overfitting risks are particularly prominent in models that rely on deep learning, such as convolutional neural networks (CNNs). Despite the use of techniques like dropout layers and normalization, CNN models can tend to memorize training data rather than identify generalizable patterns. This issue is exacerbated when models are trained on data with limited diversity, impacting their effectiveness on new, unseen data. Addressing overfitting requires a combination of data augmentation, transfer learning, and regularization techniques to enhance model adaptability.

Another critical challenge is dataset size constraints. Limited datasets restrict a model’s ability to capture the full range of leukemia variations, impacting classification accuracy. Smaller datasets are also more prone to overfitting, as the model may “memorize” specific details of the training data instead of learning generalizable features. Without access to larger datasets, models may deliver inconsistent results, particularly when applied to data from different patient populations or clinical environments.

Sensitivity to image quality and staining variations presents another challenge, as models can be highly sensitive to inconsistencies in sample preparation. Variations in staining techniques or image quality can affect classification accuracy, and models trained on high-quality samples may struggle to perform in less controlled environments. This sensitivity limits the practical application of AI models in resource-limited settings where high-quality imaging and staining facilities may not be available. Furthermore, automation and intermediate stage classification are areas that require improvement for AI models to be seamlessly integrated into clinical workflows. Challenges in achieving fully automated, robust classification are particularly evident when classifying subtle differences in cell morphology. Models that can accurately distinguish intermediate cell stages are vital, especially for personalized diagnostics, where precise classification is critical for treatment planning. Thus, achieving automation that can handle these complexities is essential for clinical applicability.

Noise reduction and segmentation sensitivity are also important for consistent model performance. Effective noise reduction is essential, as noisy or low-quality images can hinder model accuracy. Segmentation-based approaches, which involve isolating cell features, are especially sensitive to variations in image quality, making them prone to error if the images do not meet certain standards. Optimization of noise reduction methods could therefore improve model reliability.

Finally, broad applicability and real-world implementation remain significant challenges. Models that perform well in research environments may underperform in real-world clinical applications due to differences in clinical workflows, patient demographics, and medical equipment. This dependency on high-quality, well-annotated data makes it difficult for models to adapt to broader clinical settings without compromising accuracy. To make AI models truly viable for clinical use, they must be tested and validated extensively across multiple institutions. Federated learning, which allows models to be trained on data from various sources while preserving privacy, offers a promising approach to enhancing scalability and adaptability.

Overall, addressing these limitations is essential to ensure that AI-driven leukemia diagnostics are reliable, scalable, and adaptable for real-world clinical environments. With improvements in dataset quality and diversity, robust techniques to mitigate overfitting, and optimized workflows for automation, these tools could substantially enhance leukemia diagnostics and patient care.

Table 6 summarizes the main limitations identified in leukemia-related studies.

Table 6. Limitations in AI-Based Leukemia Classification and Detection.

Author, YearLimitations
Warnat-Herresthal et al. (2020)Performance variability due to differences across studies and platforms; high dependence on data quality and diversity for generalizability.
Loey et al. (2020)Potential overfitting despite dropout and normalization; high dependence on well-annotated datasets, limiting broad applicability.
Baig et al. (2022)Dependence on high-quality, well-annotated data; risk of overfitting due to limited dataset variety; limited applicability in real-world settings.
Shafique and Tehsin (2018)Limited dataset, affecting the model’s robustness; further improvements needed for full automation; dependency on high-quality images.
Chiaretti et al. (2014)Requires high-quality samples and advanced diagnostic facilities; may not be feasible in resource-limited settings; variability across labs.
Joshi et al. (2013)Sensitivity to image quality and staining variations; limited dataset size limits generalization potential.
Elhassan et al. (2023)Dataset quality-dependent; challenges in intermediate stage classification; further refinement needed for broader clinical use.
Negm et al. (2017)Limited dataset size affects generalizability; noise reduction methods could be further optimized for better performance.
Sampathila et al. (2022)High reliance on preprocessing and well-annotated datasets; limited generalizability due to constrained dataset variety.
Pałczyński et al. (2021)Dependent on high-quality data; sensitive to variations in image quality and staining, limiting robustness.
Bodzas et al.Limited by small dataset size, increasing overfitting risk; model applicability restricted to detecting ALL only.
Patel and Mishra (2015)Small dataset size limiting generalizability; segmentation-based approach sensitive to variable image quality.

9. Conclusion

The integration of artificial intelligence (AI) and advanced image processing techniques has revolutionized the field of leukemia diagnostics, enabling more accurate, rapid, and comprehensive detection of different leukemia subtypes. By automating key processes such as feature extraction, segmentation, and classification, AI-driven models address the inherent challenges of traditional diagnostic methods, improving both efficiency and diagnostic precision. Furthermore, understanding the global epidemiology and associated risk factors of leukemia is vital for the development of targeted screening programs and tailored treatment strategies. Future research should focus on addressing current limitations, such as ensuring high-quality datasets, mitigating overfitting risks, and enhancing the generalizability of AI models. Emphasis should also be placed on international collaboration, ensuring equitable access to advanced diagnostic technologies across regions to bridge healthcare disparities. Ongoing innovation in AI-based diagnostics will play a pivotal role in reducing the global burden of leukemia, fostering early detection, personalized treatment, and ultimately improving patient outcomes. Through focused efforts and interdisciplinary cooperation, the promise of AI in hematologic diagnostics can be fully realized, paving the way for enhanced healthcare delivery and better quality of life for patients worldwide.

9.1 Future directions

Future research must focus on overcoming the limitations currently faced by AI-based leukemia diagnostics to develop more reliable, scalable, and generalizable systems. A key priority is the creation of diverse, multi-institutional datasets to address variability in image quality, staining techniques, and patient demographics, ensuring that AI models perform consistently across different clinical environments. Implementing strategies such as data augmentation, transfer learning, and regularization techniques will be essential to mitigate the risk of overfitting and enhance model generalization. Additionally, the integration of multi-modal data—combining genetic, clinical, and imaging information can improve diagnostic accuracy and enable personalized treatment strategies. Advancing federated learning frameworks will allow AI models to be trained across multiple institutions while maintaining patient privacy and data security.

Future systems should also emphasize model transparency and interpretability, ensuring that predictions are understandable and actionable by healthcare professionals to build trust and facilitate clinical adoption. Addressing the challenges of scalability is essential, particularly for deploying AI solutions in resource-limited settings, where computational efficiency must be balanced with diagnostic accuracy. Real-world testing and extensive clinical trials will be necessary to validate thes e models, refining them to accommodate variations in equipment, clinical protocols, and patient populations. International collaboration, supported by equitable access to advanced diagnostic technologies, will play a pivotal role in bridging healthcare disparities and ensuring that the benefits of AI-driven leukemia diagnostics are accessible globally. Through continuous innovation, interdisciplinary cooperation, and targeted efforts, future AI systems can revolutionize leukemia detection and treatment, ultimately improving patient outcomes and reducing the global burden of this hematologic malignancy.

Ethics and consent

Ethics and consent were not required.

Comments on this article Comments (0)

Version 1
VERSION 1 PUBLISHED 19 Dec 2024
Comment
Author details Author details
Competing interests
Grant information
Copyright
Download
 
Export To
metrics
Views Downloads
F1000Research - -
PubMed Central
Data from PMC are received and updated monthly.
- -
Citations
CITE
how to cite this article
Achir A, Debbarh I, Zoubir N et al. Advances in Leukemia detection and classification: A Systematic review of AI and image processing techniques [version 1; peer review: 1 approved with reservations]. F1000Research 2024, 13:1536 (https://doi.org/10.12688/f1000research.159318.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
track
receive updates on this article
Track an article to receive email alerts on any updates to this article.

Open Peer Review

Current Reviewer Status: ?
Key to Reviewer Statuses VIEW
ApprovedThe paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approvedFundamental flaws in the paper seriously undermine the findings and conclusions
Version 1
VERSION 1
PUBLISHED 19 Dec 2024
Views
29
Cite
Reviewer Report 14 Jan 2025
Huan Mo, National Institutes of Health, Bethesda, Maryland, USA 
Approved with Reservations
VIEWS 29
As a hematopathologist, this review will be focusing on the the Section 3 (Classification of Leukemia). 

The FAB morphological classifications used in this manuscript are very dated with limited therapeutic or prognostic values in the current standard ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Mo H. Reviewer Report For: Advances in Leukemia detection and classification: A Systematic review of AI and image processing techniques [version 1; peer review: 1 approved with reservations]. F1000Research 2024, 13:1536 (https://doi.org/10.5256/f1000research.175028.r354217)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Comments on this article Comments (0)

Version 1
VERSION 1 PUBLISHED 19 Dec 2024
Comment
Alongside their report, reviewers assign a status to the article:
Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions
Sign In
If you've forgotten your password, please enter your email address below and we'll send you instructions on how to reset your password.

The email address should be the one you originally registered with F1000.

Email address not valid, please try again

You registered with F1000 via Google, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Google account password, please click here.

You registered with F1000 via Facebook, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Facebook account password, please click here.

Code not correct, please try again
Email us for further assistance.
Server error, please try again.