Progress and challenges for the application of machine learning for neglected tropical diseases

ChungYuen Khew; Rahmad Akbar; Norfarhan Mohd-Assaad

doi:10.12688/f1000research.129064.3


Review


[version 3; peer review: 3 approved]

ChungYuen Khew¹, Rahmad Akbar², Norfarhan Mohd-Assaad¹,³

PUBLISHED 17 Jul 2025

Author details

¹ Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, 43600, Malaysia
² Department of Immunology, University of Oslo, Oslo, Oslo, 0372, Norway
³ Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia, Bangi, Selangor, 43600, Malaysia

ChungYuen Khew
Roles: Writing – Original Draft Preparation

Rahmad Akbar
Roles: Conceptualization, Project Administration, Supervision, Writing – Review & Editing

Norfarhan Mohd-Assaad
Roles: Funding Acquisition, Supervision, Writing – Review & Editing


This article is included in the Artificial Intelligence and Machine Learning gateway.

This article is included in the Machine Learning in Drug Discovery and Development collection.

This article is included in the Neglected Tropical Diseases collection.

Abstract

Neglected tropical diseases (NTDs) continue to affect the livelihood of individuals in countries in the Southeast Asia and Western Pacific region. These diseases have long existed and have caused devastating health problems and economic decline for people in low- and middle-income (developing) countries. An estimated 1.7 billion of the world’s population suffer from one or more NTDs annually, which puts approximately one in five individuals at risk of NTDs. In addition to their health and social impact, NTDs inflict a significant financial burden on patients and their close relatives, and are responsible for billions of dollars in lost revenue from reduced labor productivity in developing countries alone. There is an urgent need to improve control and eradication or elimination efforts against NTDs. This can be achieved by using machine learning tools to improve surveillance, prediction, and detection programs, and to combat NTDs through the discovery of new therapeutics against these pathogens. This review surveys the current application of machine learning tools for NTDs and the challenges in advancing the state of the art of NTD surveillance, management, and treatment.

Keywords

Neglected Tropical Diseases, Machine Learning, Drug Development, Drug Discovery.

Corresponding authors: Rahmad Akbar, Norfarhan Mohd-Assaad

Competing interests: No competing interests were disclosed.

Grant information: The authors acknowledge the Ministry of Higher Education, Malaysia for the financial support through Fundamental Research Grant Scheme (FRGS) funding (FRGS/1/2019/STG05/UKM/03/1) awarded to Norfarhan Mohd-Assaad. The APC was partially funded by Universiti Kebangsaan Malaysia (GGPM-2019-042).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2025 Khew C et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Khew C, Akbar R and Mohd-Assaad N. Progress and challenges for the application of machine learning for neglected tropical diseases [version 3; peer review: 3 approved]. F1000Research 2025, 12:287 (https://doi.org/10.12688/f1000research.129064.3) First published: 15 Mar 2023, 12:287 (https://doi.org/10.12688/f1000research.129064.1) Latest published: 17 Jul 2025, 12:287 (https://doi.org/10.12688/f1000research.129064.3)

Amendments from Version 2

This revised version incorporates substantial improvements in clarity, structure, and citation accuracy based on reviewer feedback. The literature review on the application of machine learning (ML) tools for neglected tropical diseases (NTDs) has been updated to reflect recent publications from 2019 onward, with the search and analysis conducted from 2023. We have also revised sections discussing the disease burden, current ML applications, and global data-sharing platforms to provide clearer context and alignment with recent developments. These revisions aim to enhance the overall readability, scientific rigor, and transparency of the manuscript.

See the authors' detailed response to the review by Karla P. Godinez-Macias
See the authors' detailed response to the review by Erma Sulistyaningsih

Introduction

Neglected tropical diseases (NTDs)

Communicable diseases are illnesses caused by pathogens such as bacteria, viruses, parasitic worms, and fungi that can be contracted easily from contaminated surfaces, water, air droplets, air, bites from vector organisms, and direct contact with infected individuals. The recent coronavirus disease (COVID-19) pandemic, caused by the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), with 623,470,447 confirmed cases and 6,551,678 reported deaths as of 20 October 2022 (https://covid19.who.int/), underlines the need for highly efficient approaches to manage, survey, and develop new treatments for communicable diseases. Unlike the COVID-19 pandemic, a ‘silent yet horrific endemic’ caused by a diverse group of diseases affecting more than 149 countries, with more than 1.7 billion infected individuals worldwide, does not receive the same amount of attention. These diseases are referred to as neglected tropical diseases (NTDs), a term that encompasses various infection categories and was first introduced by Peter Hotez and colleagues (Winkler et al. 2018; Parashar et al. 2024). The World Health Organization (WHO) first established a list of 17 “official” NTDs (World Health Organization 2010). Later, in 2017, three disease conditions were added at the end of the 10th meeting of the Strategic and Technical Advisory Group for Neglected Tropical Diseases (World Health Organization 2017). The full list of 20 NTDs currently recognized by the WHO is presented in Table 1.

Table 1. List of NTDs recognized by WHO.

Category	Disease
(i) Protozoan infections	1. Chagas disease
	2. Human African trypanosomiasis
	3. Leishmaniasis
(ii) Helminth infections	4. Taeniasis/Cysticercosis
	5. Dracunculiasis
	6. Echinococcosis
	7. Foodborne trematodes
	8. Lymphatic filariasis (LF)
	9. Onchocerciasis
	10. Schistosomiasis
	11. Soil-transmitted helminthiases - Ascariasis - Hookworm diseases - Trichuriasis
(iii) Bacterial infections	12. Buruli ulcer
	13. Leprosy
	14. Trachoma
	15. Yaws
(iv) Viral infections	16. Dengue and chikungunya fevers
	17. Rabies
(v) Fungal Infections*	18. Mycetoma, chromoblastomycosis, and other deep mycosis
(vi) Ectoparasitic infections*	19. Scabies, Myiasis
(vii) Venom*	20. Snakebite envenoming

* Disease conditions newly added to the NTD list following the 10th meeting of the Strategic and Technical Advisory Group for Neglected Tropical Diseases.

Disability-adjusted life year (DALY) impact of NTDs

High incidences of NTDs are commonly reported from tropical countries, whose humidity and climate are optimal for the pathogens to thrive. NTDs affect individuals across all age groups. However, limited access to clean water and inadequate waste management in low- to middle-income countries across Africa and Asia disproportionately increase the risk of exposure among women and children, who often face greater environmental and health vulnerabilities (Hotez & Lo 2020). To measure the extent of the devastation caused by NTDs, the disability-adjusted life year (DALY; one DALY represents the loss of the equivalent of one year of full health) metric was introduced to quantify the overall burden of disease borne by individuals (Mitra & Mawson 2017; Lin et al. 2022). DALYs for a disease or health condition are the sum of the years of life lost (YLLs) due to premature mortality and the years of healthy life lost due to disability (YLDs) from prevalent cases of the disease or health condition in a population (Vinkeles Melchers et al. 2021). Based on data collected by the WHO (WHO Global Health Estimate, GHE 2019), we summarize the global burden of 14 of the 20 NTDs, as estimated by DALYs, in Table 2. The five NTDs with the highest estimated DALY burden are rabies (2.635 million years), dengue (1.952 million years), soil-transmitted helminthiases (STHs) (1.943 million years), schistosomiasis (1.628 million years), and lymphatic filariasis (LF) (1.616 million years). In the Global Burden of Disease Study 2019 (GBD 2019), the estimated DALYs of NTDs totalled 15.142 million years, with the highest burden for dengue, followed by STHs, schistosomiasis, LF, and cysticercosis (Vos et al. 2020). When grouped by category, helminth infections are responsible for the highest DALY burden and widespread debilitating illnesses, with STHs as the leading cause of human helminthiases (Vos et al. 2020; World Health Organization 2020).

Table 2. Global burden for 14 of 20 NTDs estimated in Disability-Adjusted Life Years (DALYs).

Diseases	WHO Global Health Estimate (2019)ᵃ			Global Burden of Disease Study (2019)ᵇ
Diseases	YLLs	YLDs	DALYs	DALYs
Rabies	2,634,634	146	2,634,780	782,000
Dengue	1,413,126	539,243	1,952,369	2,380,000
Soil-transmitted helminthiasis:	170,570	1,772,444	1,943,014	1,970,000
a) Ascariasis	170,523	578,095	748,618	754,000
b) Trichuriasis	0	231,942	231,942	236,000
c) Hookworm disease	47	962,407	962,454	984,000
Schistosomiasis	428,141	1,199,703	1,627,844	1,640,000
Lymphatic filariasis	105	1,616,028	1,616,133	1,630,000
Onchocerciasis	12	1,209,707	1,209,720	1,230,000
Cysticercosis	348,191	639,618	987,809	1,370,000
Foodborne trematodes	0	805,406	805,406	780,000
Leishmaniasis	420,844	301,433	722,278	697,000
Echinococcosis	387,710	73,213	460,923	122,000
Chagas disease	159,632	57,482	217,113	275,000
Trachoma	22	194,369	194,391	181,000
Human African Trypanosomiasis	101,091	1,009	102,099	82,600
Leprosy	7,553	28,884	36,437	28,800
Total	6,242,201	10,211,129	16,453,330	15,142,400

a World Health Organization (2020).

b Vos et al. (2020).
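
The definition of a DALY as the sum of YLLs and YLDs can be checked against individual rows of Table 2. The following is a minimal sketch in Python; the function name `daly` and the dictionary layout are assumptions made for this illustration, and the figures are copied from the WHO Global Health Estimate (2019) columns of the table.

```python
# Minimal sketch: DALYs as the sum of YLLs and YLDs.
# Figures are copied from the WHO GHE (2019) columns of Table 2;
# the function and variable names are illustrative only.

def daly(yll: int, yld: int) -> int:
    """Return disability-adjusted life years as YLL + YLD."""
    return yll + yld

# disease -> (YLLs, YLDs)
who_ghe_2019 = {
    "Rabies": (2_634_634, 146),
    "Dengue": (1_413_126, 539_243),
    "Lymphatic filariasis": (105, 1_616_028),
}

for disease, (yll, yld) in who_ghe_2019.items():
    total = daly(yll, yld)
    share_yll = yll / total  # fraction of the burden due to premature death
    print(f"{disease}: {total:,} DALYs ({share_yll:.1%} from premature mortality)")
```

For these rows, the rabies burden comes almost entirely from YLLs, whereas the lymphatic filariasis burden comes almost entirely from YLDs, which is why the two diseases call for different kinds of intervention despite similar-scale totals.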

NTDs and other diseases that are of public health concern in SEA region

To provide a focused understanding of the disproportionate burden of NTDs in Southeast Asia (SEA), we summarize the DALYs for each WHO region in Table 3. Based on the DALY estimates by cause and WHO region in 2019, dengue has the highest burden in the WHO SEA Region at 1.510 million DALYs, followed by LF (1.029 million years), STHs (0.616 million years), rabies (0.455 million years), and cysticercosis (0.109 million years). As previously highlighted, the SEA region is endemic for several diseases, including vector-borne arboviral diseases, cysticercosis, and rabies, which pose significant health challenges if left undiagnosed and untreated. Although melioidosis, leptospirosis, and malaria (other diseases of public health concern) are not officially recognized as NTDs by the WHO and other international agencies, their persistent burden and neglect, specifically in Southeast Asia, make them major contributors to the public health burden in the region. In this paper, we focus on diseases from the WHO’s NTD list that are of alarming concern in the SEA region, as well as other diseases whose impact surpasses that of NTDs and that present substantial public health challenges.

Table 3. Disability-adjusted life year (DALY) estimates for 14 of 20 NTDs by WHO region.

Diseases	WHO Regions
Diseases	Africa	America	Eastern Mediterranean	Europe	Southeast Asia	Western Pacific
Rabies	1,328,185	5,548	107,891	10,279	454,966	727,911
Dengue	49,712	76,976	104,969	24	1,509,709	210,980
Soil-transmitted helminthiasis:	819,390	104,329	175,762	5,203	616,183	222,148
a) Ascariasis	243,250	35,320	111,253	930	303,462	54,403
b) Trichuriasis	57,841	23,285	4,445	100	74,341	71,931
c) Hookworm disease	518,299	45,724	60,064	4,173	238,380	95,814
Schistosomiasis	1,314,151	89,976	128,868	894	488	93,468
Lymphatic filariasis	412,847	17,432	22,406	0	1,028,642	134,805
Onchocerciasis	1,204,501	483	4,736	0	0	0
Cysticercosis	371,541	126,959	829	3,401	109,144	375,935
Foodborne trematodes	0	52,454	17,980	1,839	12,277	720,857
Leishmaniasis	271,033	36,327	341,843	3,464	69,590	21
Echinococcosis	156,691	23,966	132,135	39,846	47,016	61,269
Chagas disease	0	216,428	0	649	0	36
Trachoma	79,158	1,083	50,443	22	50,603	13,082
Human African Trypanosomiasis	102,066	34	0	0	0	0
Leprosy	6,059	8,786	1,712	61	16,988	2,831
Total	6,934,724	865,110	1,265,336	70,885	4,531,789	2,785,491

Dengue fever is a mosquito-borne viral disease spread to humans by the bites of infected female mosquitoes, primarily of the species Aedes aegypti (Malavige et al. 2023). The disease is caused by members of the genus Flavivirus, within the family Flaviviridae (Molyneux 2019). Members of Flavivirus are responsible for a wide range of other deadly infections; the dengue virus itself comprises four distinct but closely related serotypes: DENV-1, DENV-2, DENV-3 and DENV-4 (Côrtes et al. 2023). These viruses can cause dengue fever (DF), dengue haemorrhagic fever (DHF), and dengue shock syndrome (DSS). Over the past two decades, reported dengue cases increased roughly 10-fold, from 505,430 cases in 2000 to more than 5.2 million in 2019 (World Health Organization 2021). GBD 2019 estimated 2.38 million DALYs lost and an age-standardized rate of 32.1 DALYs per 100,000 (95% UI 11.1 – 44.1) (Vos et al. 2020).

Lymphatic filariasis (LF) is caused by a group of helminths (roundworms) from the superfamily Filarioidea that reside in the human lymphatic system (Bizhani et al. 2021). Wuchereria bancrofti remains the predominant causative agent of LF worldwide, followed by Brugia malayi and Brugia timori (Lin et al. 2022). Transmission of LF is mediated by mosquitoes of the genera Anopheles, Culex, Aedes, and Mansonia, which serve as biological vectors for the parasites (Kermelita et al. 2024). As in many mosquito-borne diseases, the mosquitoes ingest microfilariae during a blood meal from an infected individual. The parasites then develop within the mosquito and are transmitted to a new human host during subsequent feeding (Bizhani et al. 2021). Adult parasite nests in the lymphatic vessels, and the microfilariae they discharge into the circulation, cause disease morbidity. A functionally impaired lymphatic system leads to lymphoedema (elephantiasis), hence the enlarged state of the patient’s limbs. Physically impaired people experience years of disability, stigmatization, and mental health comorbidity (Abela-Ridder et al. 2020). GBD 2019 estimated a DALY burden of 1.630 million years, with an age-standardized rate of 20.7 DALYs per 100,000 (95% UI 12.2 – 34.7) (Vos et al. 2020).

Soil-transmitted helminths (STHs) typically infect the host’s gastrointestinal tract. The WHO focuses on three STH illnesses, namely ascariasis, trichuriasis, and hookworm disease. Ascariasis and trichuriasis are spread through ingestion of fecal-contaminated food or water (Muñoz-Antoli et al. 2022). Hookworm infection can be contracted by walking barefoot on contaminated ground, where the larvae develop into a form that enables them to penetrate human skin (Caldrer et al. 2022). Studies have attributed the high prevalence of STHs to poor sociodemographic and socioeconomic status, especially in rural areas with poor infrastructure, improper sewage and waste management, inadequate water supply, prolonged direct contact with soil (such as walking barefoot), and poor sanitation and personal hygiene (Hussein et al. 2022; Ali et al. 2020). GBD 2019 estimated figures similar to WHO GHE 2019, with a DALY burden of 1.970 million years and an age-standardized rate of 26.6 DALYs per 100,000 (95% UI 17 – 40.5) (Vos et al. 2020).

Rabies is a zoonotic disease caused by viruses of the genus Lyssavirus in the family Rhabdoviridae, which can infect all mammals (Condori et al. 2020). The disease is a lyssavirus-induced acute, progressive encephalitis that has caused high human mortality and economic consequences (World Organisation for Animal Health 2008; Alemar Ali 2022). Infection often results in inflammation of the brain tissue, causing headaches, stiff neck, light sensitivity, mental confusion, and seizures (Chen et al. 2025). The virus is transmitted primarily through bite wounds or contamination of open wounds with infected saliva (World Organisation for Animal Health 2008). Individuals in close contact with any suspected rabid animal should seek post-exposure prophylaxis (PEP) to prevent human rabies infection (Mohammad Basir et al. 2025). The virus proliferates in the central nervous system (CNS), and laboratory sample processing therefore focuses on CNS tissue (Chen et al. 2025). GBD 2019 estimated a DALY burden of 782,000 years, with an age-standardized rate of 10.6 DALYs per 100,000 (95% UI 4.4 – 14.7) (Vos et al. 2020).

Cysticercosis is a cestode infection of both humans and pigs, caused by the larval form (cysticercus) of Taenia solium and acquired by consuming food or water contaminated with feces containing T. solium eggs (faecal-oral contamination) (Kabululu et al. 2023). Once consumed, the eggs hatch in the intestine, releasing oncospheres that breach the intestinal wall and enter the bloodstream. From there, they migrate to various tissues and organs (including the muscles, skin, eyes, and central nervous system), where they develop into cysticerci (Galipó et al. 2021). Cysticercosis can still develop in communities that do not consume pork or share habitats with pigs, since the disease is spread by swallowing T. solium eggs shed in the feces of a human T. solium carrier. The development of parasitic cysts in the brain or central nervous system is referred to as neurocysticercosis (NCC) (Mlowe et al. 2024). GBD 2019 estimated a higher DALY burden than WHO GHE 2019, at 1.370 million years, with an age-standardized rate of 16.8 DALYs per 100,000 (95% UI 10.7 – 23.9) (Vos et al. 2020).

Melioidosis, or Whitmore’s disease, is a bacterial infection caused by Burkholderia pseudomallei, a soil- and water-borne Gram-negative bacterium capable of causing illness ranging from an acute or chronic localized infection to widespread septicaemia affecting multiple organs (Cheng & Currie 2005; Narne et al. 2024). Cases of melioidosis are frequently reported in endemic regions such as Africa, Australia, China, India, the Middle East, and Southeast Asia (typically Malaysia, Singapore, and Thailand) (Rout et al. 2025). Since its discovery in 1912, the bacterium has remained a topic of discussion among researchers owing to its zoonotic nature, limited therapeutic options, and the lack of an available vaccine to date (Nyanasegran et al. 2023). Moreover, B. pseudomallei was designated a Tier 1 select agent given its biothreat potential, including high morbidity and mortality rates at low infectious doses, multidrug antibiotic resistance, and amenability to aerosolization (Gassiep et al. 2021). Melioidosis can be acquired through many routes, with skin inoculation and inhalation or ingestion of contaminated water and air droplets being the leading ones. The disease mimics the signs and symptoms of other diseases (tuberculosis, malaria, dengue), which often complicates accurate diagnosis (Bzdyl et al. 2022). A study on the burden of melioidosis estimated approximately 165,000 cases globally in 2015, with a mortality rate greater than 50% (89,000 deaths) (Limmathurotsakul et al. 2016; Meumann et al. 2024). No DALY estimate for melioidosis is available from WHO GHE 2019 or GBD 2019. The most recent available estimates of the global burden of melioidosis, for 2015, by Birnie et al. (2019) and Meumann et al. (2024), describe 4.64 million DALYs, a burden that surpassed that of all 16 other NTDs.

Leptospirosis is a zoonotic disease caused by lethal bacteria of the genus Leptospira (Narkkul et al. 2021). The bacteria reside in the host’s kidneys, completing their lifecycle before being shed in the urine. Molecular serotyping studies have identified more than 250 serovars, which can be further grouped into 30 serogroups (Hagedoorn et al. 2024). Various wild and domesticated mammals are suitable host reservoirs for Leptospira spp. In cities, rodents are the most important source of leptospirosis infection, as they can persistently shed pathogenic Leptospira spp. into the environment throughout their lives without any clinical manifestations (Urbanskas, Karvelienė & Radzijevskaja 2022). According to Sayanthi and Susanna (2024), Leptospira spp. can survive for up to 20 months at 30°C and up to 10 months at 4°C. Humans can contract the illness through direct contact with Leptospira-contaminated urine, water, and wet soil (Sun, Liu & Yan 2020). Pathogenic Leptospira spp. infections may be asymptomatic or present with a variety of clinical symptoms, from acute febrile illness to severe multiple organ failure, mimicking the symptoms of other threatening diseases such as dengue, influenza, and malaria (Azevedo et al. 2023). No accurate data on the burden of leptospirosis are available from WHO GHE 2019 or GBD 2019; a modelling study instead estimated roughly 1.03 million cases of leptospirosis annually worldwide, of which 5.72% (58,900) result in death (Costa et al. 2015; Agampodi et al. 2023). Those figures were then used to estimate the global burden of leptospirosis at 2.90 million DALYs annually, corresponding to 41.8 DALYs per 100,000 population (UI 18.1 – 65.5) (Torgerson et al. 2015; Wainaina et al. 2024).
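
The fatality percentages quoted for leptospirosis (and for melioidosis in the preceding paragraph) follow from dividing estimated deaths by estimated cases. A small sketch of that arithmetic is shown here; the helper name `case_fatality_pct` is hypothetical and the inputs are the rounded figures quoted in the text.

```python
# Sketch of the case-fatality arithmetic behind the quoted percentages.
# The helper name is hypothetical; inputs are the rounded figures in the text.

def case_fatality_pct(deaths: float, cases: float) -> float:
    """Return deaths as a percentage of cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases

leptospirosis = case_fatality_pct(58_900, 1_030_000)
melioidosis = case_fatality_pct(89_000, 165_000)

print(f"Leptospirosis: {leptospirosis:.2f}% of estimated cases are fatal")
print(f"Melioidosis: {melioidosis:.1f}% of estimated cases are fatal")
```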

Malaria is a long-standing disease caused by parasites delivered through the bites of infected female Anopheles spp. mosquitoes. The causative agents are unicellular protozoan parasites of the genus Plasmodium (Sato 2021). All Plasmodium spp. can cause malaria, but each infects a specific range of hosts; P. falciparum, P. vivax, P. malariae, P. ovale, and P. knowlesi are the species that naturally cause malaria in humans (Boundenga et al. 2024). Furthermore, P. falciparum is the most lethal and prevalent malaria parasite in Africa, whereas P. vivax is the most common cause of malaria outside sub-Saharan Africa (Sato 2021). In 2020, there were an estimated 241 million cases of malaria, resulting in 627,000 deaths (Singh et al. 2022). According to the same report, malaria threatens about half of the world’s population, with sub-Saharan Africa reporting the greatest number of cases and fatalities. The burden of malaria in the WHO African Region is disproportionately high, accounting for 95% of malaria cases and 96% of malaria deaths in 2020. In addition, children under the age of five made up over 80% of all malaria deaths in the WHO African Region in 2020, making them the group most vulnerable to the disease (Saba, Balwan & Mushtaq 2022). GBD 2019 estimated a significantly higher DALY burden than WHO GHE 2019, at 46.4 million years, with an age-standardized rate of 667 DALYs per 100,000 (95% UI 337 – 1,150) (Vos et al. 2020).

Analysis of recent literature

Addressing these diseases requires up-to-date, robust, and comprehensive information on their presence, species-strain diversity, ecology, and environmental and geographical factors influencing their transmission. A targeted and region-specific approach is essential to effectively manage and mitigate their impact in Southeast Asia.

In this section, we explore the applications of ML tools in drug discovery and development, and surveillance and disease management for the aforementioned diseases. Next, we briefly discuss the advances of ML tools in adjacent fields of protein- and antibody-language models, cancer research, and computer vision that can be further leveraged for NTDs research and disease management. Lastly, we discuss steps taken for regional collaboration, data and infrastructure sharing within and around SEA and Western Pacific regions.

Application of machine learning tools for NTDs

The conventional approach to drug discovery is expensive and time-consuming. Computational approaches to drug discovery using Artificial Intelligence (AI) can address both concerns and speed up the discovery of novel drugs. Machine learning (ML) is a subfield of AI in which datasets and mathematical and statistical algorithms are used to find distinct patterns within the data, enabling more efficient and accurate downstream analysis (McComb, Bies & Ramanathan 2022). ML in drug discovery and development is carried out by looking for patterns in sets of molecules with drug-like and therapeutic properties in order to describe their biological activities in detail (Dara et al. 2022). Tasks such as classification (prediction of classes), clustering (grouping of similar data items), and regression (prediction of continuous values) can be performed using ML approaches (Oguike et al. 2022). The role of ML in limiting the spread of deadly diseases (such as disease forecasting, outbreak prediction, disease outbreak detection, and risk prediction) has been reviewed in detail by Alfred and Obit (2021).
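
To make the three task types concrete, the sketch below implements a toy version of each using only the Python standard library: a nearest-neighbour classifier, a least-squares line fit, and a one-dimensional two-cluster k-means. All function names and data values are invented for illustration and are not drawn from any of the cited studies; real applications would use far richer features and established ML libraries.

```python
# Toy illustration of classification, regression, and clustering.
# All names and data values are invented for demonstration.
from statistics import mean

def nearest_neighbour_class(train, x):
    """Classification: return the label of the closest training point."""
    return min(train, key=lambda item: abs(item[0] - x))[1]

def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def two_means(values, iterations=10):
    """Clustering: 1-D k-means with k = 2, seeded at the min and max."""
    centres = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearer_first = abs(v - centres[0]) <= abs(v - centres[1])
            groups[0 if nearer_first else 1].append(v)
        # Keep the old centre if a cluster happens to be empty.
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
    return groups

# Classification: one numeric feature per compound, with a class label.
labelled = [(0.2, "inactive"), (0.4, "inactive"), (1.8, "active"), (2.1, "active")]
predicted_class = nearest_neighbour_class(labelled, 1.9)

# Regression: predict a continuous value from one feature.
slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])

# Clustering: group unlabelled measurements by similarity.
clusters = two_means([1.0, 1.2, 0.9, 5.0, 5.3])

print(predicted_class, slope, intercept, clusters)
```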

We note that the literature search, conducted from 2023 onwards using Google Scholar and the Public Library of Science (PLOS) and focusing on studies published from 2019 to the present, returned articles on the application of ‘ML tools in drug discovery and development’ for dengue, cysticercosis, leptospirosis, and malaria. The search did not return relevant articles for four other diseases, namely STHs, rabies, LF, and melioidosis. Nevertheless, literature on ‘ML tools in surveillance and disease management’ was successfully retrieved for each disease. Moreover, six of the presently listed NTDs (dracunculiasis, lymphatic filariasis, onchocerciasis, schistosomiasis, soil-transmitted helminthiases, and trachoma) can be controlled, eliminated, and prevented through recommended strategic interventions. Hygiene education programs, innovative and intensified disease management, mass drug administration (also called preventive chemotherapy), provision of and education on safe water, sanitation and hygiene (WASH), vector control, and veterinary public health have helped speed up efforts to eliminate these diseases (Hotez & Lo 2020; Zeynudin et al. 2022). We also suspect that, because helminthic diseases (LF and STHs in this review) can be easily treated with long-established anthelmintic drugs, researchers have had little incentive to develop ML tools for discovering new drugs against them (Butala et al. 2021). Moreover, significant decreases in LF and STH infections have been reported through the combination of strategic interventions recommended by the WHO (Yajima & Ichimori 2021; Zeynudin et al. 2022). In the case of rabies, a vaccine is available to prevent infection in both humans and companion animals, along with comprehensive surveillance. Lastly, the number of melioidosis cases is lower than that of any of the NTDs despite the disease's high mortality rate, which may explain why no research employing ML tools for drug discovery has been directed at the disease. Despite that, the DALY burden estimates for each of these diseases underline a pressing need for researchers to develop accurate, sensitive, and specific tools for surveillance, clinical diagnosis, disease detection, prediction, and/or distribution modeling to reach the elimination targets for these diseases.

Dengue

Drug Discovery and Development ~ The DENV replication process requires the NS3 protease domain and its cofactor NS2b, which is vital for substrate recognition and complex stability. Together they form the NS2b-NS3 protease complex, a popular target for antiviral drug studies owing to its key role in viral replication. However, the inhibitors available at the time were unsatisfactory because of weak activity or a low selectivity index towards the NS3 active site. The work of Aguilera-Pesantes et al. (2017), as reviewed by dos Santos Nascimento et al. (2022), used ML methods to identify potential residues and sites for interaction with drug-like molecules, and bindable sites for drug development, through computational analysis of each amino acid in the DENV protease. They used four ML models, Random Forest (RF), Least absolute deviation tree (LAD Tree), voting feature interval (VFI), and multilayer perceptron (MLP), to classify their predicted data on (i) mutational susceptibility; (ii) residual binding site; (iii) physicochemical properties; and (iv) computational alanine scanning mutagenesis for protein binding affinity and stability predictions. Each model’s performance was measured using recall, precision, and the area under the receiver operating characteristic curve (AUC). The MLP-based models performed best in correctly classifying residue interactions with NS3 as causing a major change in activity, a moderate change in activity, or activity similar to that of wild-type residues (Aguilera-Pesantes et al. 2017; dos Santos Nascimento et al. 2022).
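
For readers less familiar with the evaluation metrics named above, the sketch below computes precision, recall, and AUC for a binary toy problem using only the Python standard library. It is an illustration under stated assumptions: the labels and scores are invented, the function names are hypothetical, and the cited study classified residues into three activity classes rather than two, so this is not a reproduction of its evaluation.

```python
# Illustrative metrics for a binary classifier (1 = positive, 0 = negative).
# Labels and scores are invented; this is not the cited study's data.

def precision_recall(y_true, y_pred):
    """Return (precision, recall) from true and predicted binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} AUC={auc:.3f}")
```

Precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason studies often report all three.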

The application of an Artificial Neural Network (ANN) to predict dengue-human protein interaction types, with a view to developing antiviral drugs, was reported by Jainul Fathima et al. (2019). They trained the model using the Feed Forward Back Propagation Neural Network (FFBPNN) technique on a dataset of 535 non-redundant interactions between 335 different human proteins and ten dengue proteins, comprising eight attributes and 550 instances. As many as 12 categories of human protein interaction with dengue proteins were generated for attribute selection and then ranked by weight. The prediction accuracy on the test dataset was 98.05%. Two human proteins, HBA1 and HSPA5, were found to interact more strongly with the dengue virus than the others, and the dengue proteins NS3 and NS5 were shown to have potential as therapeutic drug targets.
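
The general idea of a feed-forward network trained by back-propagation can be sketched in a few lines of plain Python. The example below is a generic illustration only: the network size, learning rate, seed, and the toy dataset of two binary attributes are all assumptions made here and do not reflect the architecture or data used by Jainul Fathima et al. (2019).

```python
# Generic feed-forward network with one hidden layer, trained by
# back-propagation on invented toy data. Not the cited study's model.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward pass; the last weight in each row is the bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1])
              for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, output

def train(samples, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    """Stochastic gradient descent on squared error; returns weights and
    the mean squared error recorded after each epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    losses = []
    for _ in range(epochs):
        squared_error = 0.0
        for x, target in samples:
            hidden, output = forward(x, w_hidden, w_out)
            error = output - target
            squared_error += error ** 2
            # Back-propagate: output delta first, then hidden deltas
            # (computed before any weight is changed).
            delta_out = error * output * (1.0 - output)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                w_hidden[j][-1] -= lr * delta_hidden[j]
            w_out[-1] -= lr * delta_out
        losses.append(squared_error / len(samples))
    return w_hidden, w_out, losses

# Toy data: two binary attributes; label 1 only when both are present.
toy = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w_hidden, w_out, losses = train(toy)
print(f"mean squared error: {losses[0]:.3f} -> {losses[-1]:.3f}")
```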

Khalid et al. (2020) reported a study that used ML methods to investigate the biological activities of anti-dengue inhibitor derivatives. They employed atom-based three-dimensional (3D) quantitative structure-activity relationship (QSAR) modeling with the machine learning software Schrödinger Drug Discovery Suite Phase™ to relate the compounds’ structural features to their anti-dengue activities. The dataset comprised a homologous series of 21 newly discovered 1,3,4-oxadiazole derivatives. Using the built-in random selection mechanism of the Schrödinger Phase™ software, the prepared dataset was then separated into training and test sets (75% and 25%, respectively). Based on how well the biological activities of the test molecules were predicted, the model’s predictive capacity was found to be adequate, with Q² (R² Training Set) = 0.73 and Q² (R² Test Set) = 0.78. The chemical structural features of the predicted models were used as a benchmark for developing novel 1,3,4-oxadiazole derivatives against the dengue virus (Khalid, Rao Avupati & Hussain 2020). Subsequently, Geoffrey and colleagues (2022) employed an ML-based AutoQSAR workflow, which encompasses feature selection, QSAR modeling, validation, and prediction, to generate drug leads from the PubChem database for dengue and West Nile virus. The ML-based AutoQSAR algorithm helps to expedite virtual drug screening and identification against dengue and West Nile virus and also performs automated in silico examination of the drug lead compounds. Readers interested in conducting 3D-QSAR modeling for their own research may wish to consider Py-CoMFA, an open source web-based alternative (Ragno 2019; Giordano et al. 2022).
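
The random 75%/25% partition and a coefficient-of-determination score of the kind reported above can be sketched as follows. This is a generic illustration and not the Schrödinger Phase™ procedure: the function names, the seed, and the placeholder compound identifiers are assumptions, the exact training and test counts used by the authors are not stated in the text, and Phase's own definitions of R² and Q² are not reproduced here.

```python
# Generic random train/test split and R^2 score (standard library only).
# Compound IDs are placeholders; this is not the Phase implementation.
import random

def split_dataset(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

compound_ids = [f"cmpd_{i:02d}" for i in range(1, 22)]  # 21 placeholder IDs
train_set, test_set = split_dataset(compound_ids)
print(len(train_set), "training compounds;", len(test_set), "test compounds")
```

With only 21 compounds, a quarter of the data is about five molecules, so a test-set statistic of this kind rests on very few points and is best read alongside the training-set value.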

Elakkiya Elumalai (2022) published a study in which ML techniques were used to classify peptides as either inhibitory or non-inhibitory against the dengue virus. A dataset of 100 peptides that have been experimentally verified to inhibit the dengue virus and 16 negative datasets from the antiviral peptides database (AVPdb) were both divided into training and testing sets at a 7:3 ratio. Eight ML algorithms, namely Adaptive Boosting (AdaBoost), Bagging, k-Nearest Neighbor (k-NN), Logistic Regression (LR), Multi-Layer Perceptron (MLP), Naïve Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM), were used to compare three amino acid descriptors: Amino Acid Composition (AAC), Grouped Amino Acid Composition (GAAC), and Composition, Transition and Distribution (CTD) features. The five best models reported accuracies greater than 85% on the training data. The same five models were then evaluated on the test data, where two models (AAC_RF_model and AAC_k-NN_model) reported an accuracy of 85.71%, whereas the remaining models scored below 80%. The k-NN and RF algorithms were therefore identified as the best performing for the research goal. In addition, the study found a higher frequency of glycine (G), phenylalanine (F), and tryptophan (W) in dengue virus inhibitory peptides (Elumalai 2022).
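To make the AAC descriptor concrete, the following is a minimal sketch of how such a feature vector could be computed: the fraction of each of the 20 standard residues in a peptide. The function name and the example peptide are hypothetical and are not taken from Elumalai (2022) or from AVPdb.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(peptide):
    """Return the fraction of each standard amino acid in a peptide."""
    # Non-standard symbols (e.g. 'X') are ignored rather than counted.
    counts = Counter(res for res in peptide.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("peptide contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical peptide for illustration only (not a real AVPdb entry).
features = amino_acid_composition("GFWGGKW")
print(round(features["G"], 3), round(features["W"], 3))
```

The resulting 20-value vector is the kind of input that classifiers such as RF or k-NN would then consume.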

Surveillance and Disease Management ~ Dengue surveillance is crucial for detecting outbreaks and monitoring disease incidences. One approach is to increase the number of surveillance traps that capture eggs (ovitraps) and ovipositing females (gravid traps), treated with appropriate larvicide and mosquitocide (Selvarajoo et al. 2022). The treatment prevents hatching of eggs or any subsequent production of mosquitoes inside the trap. This two-pronged approach allows authorities to survey the incidence and population of mosquitoes while also contributing to vector control. However, counting of both traps requires a group of individuals (insectaries) who possess a considerable degree of skill in counting eggs and sampling specific stages of mosquitoes (Tsheten et al. 2021). Other than vector surveillance, insecticide resistance in the vector population should be identified via susceptibility bioassays, which should be performed by the governing authorities. Observation of the phenotypic response of mosquitoes after exposure to insecticides should be a sufficient metric for determining the presence of insecticide resistance.

Control of dengue is mainly achieved through the cooperation of all walks of life. Common strategies include removal of mosquito breeding sources, elimination of container habitats that collect water and are favorable for oviposition and the development of mosquito larvae, killing of larval or pupal mosquitoes by applying environmentally friendly insecticides, and usage of spatial repellents (Srisawat et al. 2022). Various strategies of sterile insect technique (SIT), all aimed at causing a decline of the targeted insect population through the release of sterilized male insects, were proven to be a significant measure in controlling mosquito populations (De Castro Poncio et al. 2021; Hugo et al. 2022; Ranathunge et al. 2022). Despite these advances, dengue infection remains largely uncontrollable in both rich and poor populations due to several factors. This was seen in Singapore’s 15-year intensive vector surveillance from the mid-1970s to the late 1980s, during which low incidences of dengue were reported. From 1990 onwards, the country faced repeated cyclical epidemics, the largest of which occurred in 2013 with over 22,000 cases despite the continued investment of US$50 million in vector control annually (Molyneux 2019). Hence, there is a need for improving active surveillance aspects and developing drugs for better disease management.

With respect to the three types of dengue infection that may manifest among patients, Hoyos, Aguilar, and Toro (2022) employed various ML decision support systems to develop an autonomous cycle of data analysis tasks (ACODAT) to help medical personnel in the clinical management of dengue cases. A large population dataset of approximately 70,000 patients was utilized to train the tool in verifying patient data, classifying the type of dengue a patient has contracted, and listing the best treatment accordingly. The authors used an MLP with a single layer as the ANN and the classifier version of SVM. Both ML-based classification models achieved accurate dengue type classification (> 0.97). Then, a genetic algorithm (GA) used the information from the classification step to generate the best treatment plan for the patient.

Two regression-based ML models, multiple linear regression (MLR) and support vector regression (SVR), were employed to predict dengue incidences using data on hospitalized dengue patients together with meteorological and socioeconomic information (Dey et al. 2022). In the study, the SVR model showed a higher prediction accuracy of 75% with a mean absolute error (MAE) of 4.95, whereas the MLR model displayed an accuracy of 67% and an MAE of 4.57. Moreover, the models showed a positive correlation between the rainfall index and the incidences of dengue.

Panja et al. (2022) built an ensemble wavelet neural network with exogenous factor (XEWNet) dengue outbreak forecast model based on climatic conditions, capable of performing 75% better when forecasting short-term (26 weeks) and long-term (52 weeks) dengue incidences in subtropical regions that experience moderate to heavy rainfall throughout the year. In their study, a maximal overlap discrete wavelet transform (MODWT) algorithm was used to interpret the time-dependent wavelet and scaling coefficients of the rainfall and dengue interrelationship. The authors reported that the statistical model employing the auto-regressive neural network (ARNN), evaluated by root mean square error (RMSE) and MAE, gave the best forecast accuracies for short-term and long-term dengue outbreaks simultaneously.
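Because RMSE and MAE recur throughout the forecasting studies surveyed here, a short sketch of both metrics may be useful. The weekly case counts below are invented for illustration and do not come from any of the cited studies.

```python
import math


def mae(observed, predicted):
    """Mean absolute error: average size of the forecast errors."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: penalises large errors more than MAE does."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical weekly dengue case counts and forecasts.
observed = [12, 15, 30, 22]
predicted = [10, 18, 25, 22]
print(mae(observed, predicted), round(rmse(observed, predicted), 3))
```

Since RMSE squares each error before averaging, a model can rank better on MAE yet worse on RMSE if it occasionally misses by a wide margin, which is one reason studies tend to report both.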

Subsequently, Nguyen et al. (2022) developed a dengue fever (DF) prediction model employing the DL technique of attention-enhanced long short-term memory (LSTM-ATT). Compared with the work of Panja and colleagues, they integrated more environmental variables, irrespective of time lag (namely evaporation, humidity, rainfall, sunshine hours, and temperature), together with DF incidences in their forecast model. The authors proposed LSTM-ATT as the best performing model when compared to CNN and regular LSTM because it adds an attention mechanism layer right after the LSTM network step (Zhang, Yang & Zhou 2021). When compared against the modern (CNN and LSTM) and traditional prediction models, the LSTM-ATT displayed better prediction performance, with lower RMSE and MAE values in more than half of the geographical study locations (Nguyen et al. 2022). When evaluated for outbreak prediction, the LSTM-ATT model was able to distinguish outbreak months from normal months with an average accuracy of 0.99 and an average sensitivity in detecting dengue outbreak months of 0.70.

Soil-transmitted Helminths (STHs)

Surveillance and Disease Management ~ Since no vectors are involved in the transmission of STHs illnesses among humans, surveillance programs can be separated into (i) prediction- and mathematical-based models covering transmission modeling, estimation of the population at risk, and prediction of regions in need of MDA interventions, and (ii) an active surveillance approach (Chong et al. 2022; Mogaji et al. 2022). Important factors that determine a population’s sensitivity to STHs include soil and stool samples (Oyewole & Simon-Oke 2022). The model-based approach uses algorithms and large data inputs to make precise decisions about the incidence and prevalence of STHs in endemic areas. In contrast, the active surveillance approach relies on the expert skills of laboratory staff and technicians, who conduct microscopy-based and molecular-based methods to detect the presence of helminths, which serves disease surveillance at the same time. Microscopy-based methods, such as the Kato-Katz and McMaster techniques, are primary diagnostic tests that detect parasites by enumerating the eggs-per-gram (EPG) metric. Although these methods are cheaper, results may vary with an increase of sample size and different survey sites (Afolabi et al. 2022). Hence, molecular-based assays such as PCR, real-time PCR and digital PCR, loop-mediated isothermal amplification (LAMP), and cell-free DNA detection provide a more sensitive, less labor-intensive, and high-throughput detection method despite incurring additional costs for detection and surveillance programmes (Manuel, Ramanujam & Ajjampur 2021; Vegvari et al. 2021).
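The EPG metric mentioned above is a simple scaling of the raw slide count. The sketch below assumes the commonly used 41.7 mg Kato-Katz template, for which counts are conventionally multiplied by 24; the text above does not state a template size, so the default factor is an assumption and should be replaced for other templates or for the McMaster technique.

```python
def eggs_per_gram(egg_count, multiplication_factor=24.0):
    """Convert a single slide's egg count to eggs per gram of stool.

    The default factor of 24 assumes a 41.7 mg Kato-Katz template
    (an assumption; adjust for the template actually used).
    """
    if egg_count < 0:
        raise ValueError("egg_count must be non-negative")
    return egg_count * multiplication_factor


def mean_epg(slide_counts, multiplication_factor=24.0):
    """Average EPG over replicate slides from the same stool sample."""
    if not slide_counts:
        raise ValueError("at least one slide count is required")
    mean_count = sum(slide_counts) / len(slide_counts)
    return eggs_per_gram(mean_count, multiplication_factor)


# Hypothetical duplicate slides from one sample.
print(eggs_per_gram(10), mean_epg([10, 14]))
```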

Measures taken in managing cases of STHs are shared with other helminth infections. Implementing MDA programs targeting high-risk groups in endemic tropical and subtropical areas has been recognized to be effective in eliminating STH globally (Alemu et al. 2022). In underdeveloped areas with poor facilities but endemic with STHs, adopting the principles of WASH by providing adequate sanitation, improving waste management facilities, plus public education on hygiene practices and behavioral changes targeted to populations at risk would accelerate the elimination goal (Monnier et al. 2020). As for those diagnosed with STHs, proper provision of treatment (a stronger combination of anthelmintics or surgery to remove the worms) is essential in ensuring good universal health coverage (Abela-Ridder et al. 2020).

Disease predictors for frequent NTD co-infections (malaria and STHs) were identified through a gut microbiota study (Easton et al. 2020). RF models were employed to predict T. trichiura egg counts and to classify P. vivax parasitemia. A 10,000-tree RF regression model with 1,345 variables at each split was used to forecast T. trichiura egg counts, while a 10,000-tree RF classification model with 1,346 variables at each split was utilized to assess P. vivax parasitemia. Accordingly, the models reported transforming growth factor β and bacterial taxa as predictor variables for T. trichiura egg counts (intestinal helminth burden) and incidence of P. vivax parasitemia, respectively. The complexities caused by the co-infection are interesting, but the authors noted that longitudinal interventional studies (antimalarial and deworming treatments) are needed to further support or validate the reported results.

Dacal et al. (2021) developed a computer vision platform that aids in quantifying T. trichiura infection. Images from 51 Kato-Katz stool sample slides containing 949 Trichuris spp. eggs were used to train and test the CNN algorithm for automatic assessment. The algorithm showed a mean precision of 98.44% and mean recall of 80.94%. Expanding their model to identify other helminth eggs, they included positive egg samples for both Trichuris spp. and Ascaris spp. from a co-infected individual and obtained a mean precision of 94.66% and mean recall of 93.08%.
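Precision and recall for an egg detector follow directly from counts of correct detections, spurious detections, and missed eggs. The sketch below uses invented counts purely to show the arithmetic; it does not reproduce the evaluation pipeline of Dacal et al. (2021).

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from detection counts.

    precision = share of detections that are real eggs
    recall    = share of real eggs that were detected
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical slide: 100 eggs present, 80 found, 2 false alarms.
precision, recall = precision_recall(80, 2, 20)
print(round(precision, 4), round(recall, 4))
```

A detector with high precision but lower recall, as in the first result reported above, rarely mislabels debris as eggs but still misses some eggs, which would tend to underestimate egg counts.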

Subsequently, Ward et al. (2022) proposed an AI-based digital pathology (AI-DP) device for automated scanning and detection of helminth eggs in fecal samples prepared via the Kato-Katz technique. Images of Kato-Katz stool smear slides were collected and annotated for helminth eggs, and then used for AI training and evaluation. The authors likewise employed the CNN technique on their annotated training set. About 90% of the 16,990-image annotated STHs egg dataset was used to train the DL-based detection model, and the remaining 10% served as the test set. Overall, the AI-DP achieved both a weighted average precision and a weighted average recall of greater than 94%.

STHs epidemiological risk modeling employing Extreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP) has been proposed as a means of STHs surveillance (Scavuzzo et al. 2022). A dataset of hookworm infection, environmental variables, and socioeconomic characteristics was supplied to the XGBoost model in order to model the risk of STHs infection. SHAP was utilized to understand the importance of variables for predictions in the trained model. The final XGBoost model outperformed the conventional statistical models compared in the study on the performance metrics of R² and Mean Square Error (MSE).

To address the complex and time-consuming manual diagnosis of STHs infections, fuzzy c-means (FCM) and CNN segmentation techniques (ML- and DL-based, respectively) were applied to the segmentation of human intestinal parasite ova for surveillance (Lim et al. 2022). Under the direction of parasitologists, a total of 166 pictures for each species were assembled in order to train both ML-based and DL-based segmentation approaches in identifying intestinal STHs ova. The two segmentation models predicted helminth species with accuracies of 97% (FCM) and 100% (CNN). According to a further assessment of segmentation performance (Intersection over Union, IoU), the CNN segmentation technique yielded better results than the FCM segmentation technique.
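IoU compares a predicted segmentation mask with an expert-annotated one by dividing the overlapping area by the combined area. A minimal sketch on small binary masks follows; the masks are invented, and real ova images would be far larger.

```python
def intersection_over_union(pred_mask, true_mask):
    """IoU of two binary masks given as equally sized lists of 0/1 rows."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


# Hypothetical 3x3 masks: 1 marks pixels labelled as parasite ovum.
predicted_mask = [[0, 1, 1],
                  [0, 1, 1],
                  [0, 0, 0]]
annotated_mask = [[0, 0, 1],
                  [0, 1, 1],
                  [0, 0, 1]]
iou = intersection_over_union(predicted_mask, annotated_mask)
print(iou)
```

Unlike species-level accuracy, IoU is sensitive to how well the outline of each ovum is traced, so two models with similar classification accuracy can still differ noticeably on this metric.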

Rabies

Surveillance and Disease Management ~ The majority of rabies surveillance and monitoring is done by promptly identifying animals exhibiting possible rabies clinical indications, documenting the background of recently deceased companion animals, and keeping track of dog bite incidents (Jane Ling et al. 2023). In response to the 2017 rabies outbreak that occurred in Malaysia, the authors listed in detail rabies preventive measures and outbreak control procedures. Generally, the best method to control cases of rabies is to visit the nearest healthcare or veterinary services to obtain pre-exposure prophylaxis for both people and their companion animals. Rabies surveillance-diagnosis programs are challenging as the gold standard for rabies lyssavirus detection is direct diagnosis with brain tissue. In addition, efforts to control the disease are hampered where free-roaming stray dogs and cats are considered a norm and where communities live in close vicinity to wildlife habitat. Hence, most countries would only initiate a mass dog vaccination program to control the outbreak and to curb the transmission (Molyneux 2019).

Surveillance programs for rabies have proven to be challenging as rabies-infected (rabid) animals can only be identified when clinical symptoms manifest, which often lead to death. Hence, drastic actions are needed to reduce rabies mortality. As children are more vulnerable to animals, education is crucial in preventing mortalities from rabies. The death rate among children is significantly decreased by teaching people how to avoid getting bitten and what to do if they fear they have been bitten by a rabid animal. The viral load in the bite wound can be considerably reduced by cleaning it as soon as possible. To break the cycle of rabies transmission, mass vaccination of dogs by health authorities has been recognized to be more cost-effective and protects the well-being of livestock and humans at the same time. As mentioned before, people at high risk of exposure to the rabies virus, such as veterinarians, laboratory workers who work closely with the rabies virus, and those who have been bitten by a potentially rabid animal, are strongly advised to get vaccinated (pre- or post-exposure prophylaxis) (Abela-Ridder et al. 2020).

Saleh, Medang and Ibrahim (2020) conducted a comparison study on a rabies outbreak prediction model employing deep learning with long short-term memory (LSTM), a type of recurrent neural network, compared to the autoregressive integrated moving average (ARIMA) traditional algorithm model. Predictive capabilities of both models were tested against a dataset of one thousand rabies samples obtained from HealthData.com, and performance metrics were assessed based on RMSE and accuracy. The authors reported lower RMSE value (2.04) and greater accuracy (97.3%) displayed by the LSTM model, compared to the traditional ARIMA model performance of 3.12 RMSE value and only 72.1% accuracy.

In surveilling the zoonotic potential of novel viruses found in vampire bats, Bergner et al. (2021) utilized two ML models that were built by Mollentze, Babayan and Streicker (2021). The zoonotic potential of novel viruses was evaluated with a phylogenetic neighborhood model, followed by a genome composition-based model. The ML models developed by Mollentze and colleagues employed gradient boosted machine (GBM) classifiers to select the 100 best models out of 1000 iterations (Mollentze, Babayan & Streicker 2021). Bergner and colleagues reported the rabies virus as the only known zoonosis detected from the bats and suggested molecular surveillance as a measure against rabies outbreaks. They concluded that the genome composition-based ML model worked best (had greater accuracy) in predicting zoonotic potential among novel viruses found in bats (Astroviridae, Coronaviridae, Hepeviridae, Picornaviridae, and Reoviridae), as it provided valuable insights into viral information that allow researchers to prioritize potentially zoonotic novel viruses, compared to the phylogenetic neighborhood model.

As dogs (domestic and stray) are largely responsible for the transmission of rabies virus to humans, Thanapongtharm et al. (2021) conducted a surveillance study of the spatial distribution of dog populations using the ML-based random forest algorithm. The goal of the RF model was to determine how dog populations interacted with environmental and human population variables. Two RF models were developed by the authors. The first (quantitative RF model) used only grids with dogs present and modeled them quantitatively, with predictive power evaluated by correlation coefficient and RMSE. In the second (binary RF model), grids were labeled as dog presence or dog absence (represented by 1 and 0, respectively), and predictive power was assessed with a correlation coefficient and AUC. The models were able to accurately predict the distribution of owned dogs to stray dogs at a ratio of 6:1 (numerical figures; 12,027:1,868), with approximately 75% of the stray dogs being feral. Moreover, human population factors such as communities, human population density, and proximity to religious praying sites had a high correlation with the number of stray dogs. Association results on the population and spatial distribution of stray dogs can lighten the burden of governing bodies in better managing dog vaccination campaigns to achieve elimination targets of rabies.
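For a presence/absence model of this kind, AUC can be read as the probability that a randomly chosen grid with dogs receives a higher predicted score than a randomly chosen grid without dogs. The sketch below computes it by direct pairwise comparison; the labels and scores are invented and are not output from the cited RF model.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (presence, absence) pairs ranked correctly.

    Ties in score count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical grids: 1 = dogs present, 0 = dogs absent.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of grids and is meant for clarity only; a rank-based implementation would be preferable for large maps.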

Cysticercosis

Drug Discovery and Development ~ A multi-epitope chimeric vaccine design study which targets T. solium membrane proteins reported promising results of cellular and humoral immune response stimulation, plus providing protection against both taeniasis and neurocysticercosis (Kaur et al. 2020). Various ML algorithms were utilized in creating the vaccine. The authors used SVM modules to categorize allergic and antigenic proteins based on amino acid and dipeptide composition, the Hidden Markov model to predict B-cell epitope antigenic determinants, and ANN to detect proteasomal C terminal cleavage and T-cell epitopes. Five suitable cell membrane peptides were reported from their vaccine study, capable of stimulating required immune responses against taeniasis and neurocysticercosis.

Surveillance and Disease Management ~ A virtual meeting convened by WHO aimed at reviewing existing diagnostic tools for T. solium before implementing them in public health programs to control the disease (World Health Organization 2022a). In the context of programmatic surveillance, the WHO recommended inclusion of specific communities or villages across a wider geographical area. Purposive sampling should be implemented to target high-risk interactions between humans and pigs, where the community lives in close contact with pigs or where pigs roam freely and sanitation is inadequate. None of the methods used for diagnostic mapping and monitoring of T. solium in humans and pigs, namely microscopy (humans), meat inspection, and serology testing, achieved the required sensitivity. Existing diagnostic tests are not well-suited for public health programs as they are not commercially available and have unsatisfactory sensitivity and/or specificity. Due to the low sensitivity of currently available tests, the WHO urged an appropriate response even if only a weak signal of prevalence is detected. Preventive chemotherapy (PC) interventions should be initiated with confirmation of key risk factors of roaming pigs and inadequate sanitation.

According to WHO, it is possible to completely eradicate cysticercosis as a public health issue, as interventions in disease management are feasible and achievable. Beginning with T. solium in pigs, improving the welfare and management of pig husbandry can effectively break the transmission cycle. Such actions include vaccination programs, anthelmintic treatment for pigs, and proper set up of enclosures to prevent access to human feces. To curb incidences of taeniasis and cysticercosis among humans, practice of proper WASH principles along with improved food safety and hygiene standards, as well as sufficient sanitation for the safe disposal of excrement, would significantly reduce the chances of contracting the diseases. In addition, community health education, MDA interventions, and appropriate case management for taeniasis (medical prescriptions) and NCC (surgery) would aid in removing the disease status as a public health concern (Abela-Ridder et al. 2020).

A study on cysticercosis diagnosis in pigs used proteomic information from antigen-rich tissues to train an ML-based model, which was assessed using the leave-one-out cross validation method (Navarrete-Perea et al. 2017; Garcia et al. 2020). Protein extraction and purification experiments were conducted on T. solium cysts extracted from the central nervous system and skeletal muscles of infected pigs. The generated proteomic information was then used to train the ML-based model to distinguish between cysticercosis-infected and uninfected pigs. The ML-based model successfully classified cysticercosis-infected pigs from non-infected ones, with results similar to those of the complex crude cyst extract technique. When testing their model on human cysticercosis patients, the authors reported satisfactory performance but noted that the model’s sensitivity was only about 75%, and that its performance could be improved with a new protein mixture dataset suited for human diagnosis.
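Leave-one-out cross validation suits small sample sets such as this one: each sample is held out once, the model is fitted on the rest, and the held-out sample is predicted. The sketch below uses a nearest-centroid classifier purely as a stand-in, since the classifier used in the cited work is not described here; the two-feature "protein signal" values are invented.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class mean is closest to the sample."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(centroids, key=lambda lab: squared_distance(centroids[lab], sample))


def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, train on the rest, and score the hit rate."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical antigen signal intensities for six pigs.
features = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9],
            [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
labels = ["infected"] * 3 + ["uninfected"] * 3
loo_accuracy = leave_one_out_accuracy(features, labels)
print(loo_accuracy)
```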

A greater number of free-roaming pigs is positively correlated with the transmission rate of infectious diseases, yet the activity of these pigs usually goes unrecorded. To tackle this, a body harness monitoring device has been developed to report the location and activity of free-roaming pigs, which can provide substantial information on where and how the pigs might have acquired the infection (Haladjian et al. 2017). Here, three ML models (linear discriminant, KNN, and SVM) were employed to classify the activities (walking, eating, and resting) of free-roaming pigs and were then validated through the 10-fold cross validation technique. The authors noted that the SVM model outperformed the other two ML-based models, with accuracy, precision, and recall of 95.8%, 75.4%, and 86.6%, respectively.

Lymphatic filariasis (LF)

Surveillance and Disease Management ~ The establishment of GPELF by the WHO set the goals of managing the transmission of LF infections via MDA of anthelmintics and alleviating the suffering of affected people through morbidity management and disability prevention (MMDP). Surveillance programs for LF were conducted through sentinel and spot-check community surveys. The periodic transmission assessment survey (TAS) measures the impact of MDA interventions and determines whether the level of infection has decreased below a target threshold (World Health Organization 2022b).

MDA intervention to halt the transmission of infection through WHO-recommended prescriptions of albendazole, diethylcarbamazine, and ivermectin are strategies for managing LF disease. Usage of insecticide-treated bednets is recommended as a means of vector control in household environments. Application of WASH could be used to guarantee correct sanitation procedures to limit vector breeding sites and hygiene treatment of afflicted limbs for morbidity control. As LF is known to cause significant physical impairment, health authorities must ensure essential care is given to patients. Examples include skin care, exercise, and elevation to stop the progression and severity of lymphedema, treatment for adenolymphangitis flare-ups, hydrocele surgery, and encouraging community cooperation to finish the course of treatment and deal with its physical and psychological repercussions (Abela-Ridder et al. 2020).

Analysis of epidemiological and socio-economic data to predict LF was conducted by employing ML techniques such as classification and regression tree (RT), gradient boosting machine (GBM), J48 algorithm, JRip algorithm, logistic model tree (LMT), probabilistic neural network (PNN), and NB (Kondeti et al. 2019). In order to eliminate biases present in the dataset, the authors combined socioeconomic and epidemiological data before applying feature selection, including gain ratio feature selection, to pick out pertinent features for the prediction model. The data were then partitioned (training, testing, and validation) and evaluated under the 10-fold cross validation framework while applying oversampling and undersampling methods to balance the dataset. The performance of the ML-based prediction models was then assessed by sensitivity, specificity, AUC, and accuracy criteria. According to the authors, the J48-based prediction model produced an AUC value of 62% and 23 additional classification rules based on six features, whereas the NB-based prediction model produced the best sensitivity and AUC (64%) results when using gain ratio feature selection and 400% oversampling. The development of early warning systems to better apply prevention and control measures in managing LF disease within the community is one of the benefits of these ML-based prediction models.
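The oversampling step can be illustrated with a simple random-duplication sketch. It assumes that "400% oversampling" means adding four extra copies' worth of minority-class records (drawn with replacement) for every original minority record; the cited study may have used a different scheme, such as synthetic sample generation, so this is an interpretation rather than the authors' procedure.

```python
import random


def oversample_minority(rows, labels, minority_label, percent, seed=0):
    """Append randomly duplicated minority rows.

    percent=400 adds 4x the original number of minority rows
    (an assumed reading of '400% oversampling').
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    if not minority:
        raise ValueError("minority label not found")
    n_extra = round(len(minority) * percent / 100)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(rows) + extra, list(labels) + [minority_label] * n_extra


# Hypothetical records: 2 LF-positive (1) and 8 LF-negative (0).
rows = list(range(10))
labels = [1, 1] + [0] * 8
new_rows, new_labels = oversample_minority(rows, labels, 1, 400)
print(new_labels.count(1), new_labels.count(0))
```

To avoid leaking duplicated records into the evaluation folds, oversampling of this kind is normally applied to the training portion of each cross-validation fold only.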

Elvana and Suryanto (2022) trained a CNN-based model with image processing and data augmentation to identify parasitic worms on a dataset of 210 microfilarial images. The authors performed transfer learning using previously trained models such as VGG-16, ResNet-50, and Inception-v3, together with a simple CNN model of 8 convolutional layers. The CNN was able to recognise LF worms in digital photographs with a 70% accuracy rate, even with noisy blood cell images present during the training process.

A recent study by Dickson et al. (2022) used a Bayesian network framework to investigate how different diagnostic testing scenarios perform in detecting transmission and prevalence of LF during surveillance after MDA campaigns. The effectiveness of several infection markers in detecting signs of transmission was assessed using antigen- and antibody-based data (Wb123 Ab and Bm14 Ab). The authors chose a Bayesian network analysis because of its ability to compare the probability of a missed positive LF result across various diagnostic testing scenarios and to evaluate the impact of numerous participant characteristics. Network performance was evaluated by sensitivity, specificity, True Skill Statistic, and AUC. According to the Bayesian network model, a sizable fraction of LF-positive cases went undetected by antigen- and antibody-based tests on their own. The most sensitive indication of present or previous LF infection came from antigen-antibody combination testing (antigen plus Bm14 Ab). Hence, to increase the sensitivity of transmission surveys and prevent sudden and premature termination of MDA campaigns against LF, the combination of antigen plus Bm14 Ab was proposed for inclusion in post-MDA surveillance.
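Sensitivity, specificity, and the True Skill Statistic (TSS, defined as sensitivity plus specificity minus one) are all derived from the same confusion-matrix counts. The sketch below uses invented counts to show how a missed-positive rate translates into these metrics.

```python
def sensitivity_specificity_tss(tp, fn, tn, fp):
    """Return (sensitivity, specificity, TSS) from confusion-matrix counts.

    TSS = sensitivity + specificity - 1; 0 means no better than chance,
    1 means perfect discrimination.
    """
    sensitivity = tp / (tp + fn)  # share of true positives detected
    specificity = tn / (tn + fp)  # share of true negatives correctly cleared
    return sensitivity, specificity, sensitivity + specificity - 1


# Hypothetical survey: 40 infected (10 missed), 100 uninfected (10 false alarms).
sens, spec, tss = sensitivity_specificity_tss(tp=30, fn=10, tn=90, fp=10)
print(sens, spec, round(tss, 2))
```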

Melioidosis

Surveillance and Disease Management ~ An ML-based Raman spectroscopic assay was developed to identify B. pseudomallei and Burkholderia mallei strains (Moawad et al. 2019; Scoffone et al. 2021). To train the algorithm, 12 B. mallei strains, 13 B. pseudomallei strains, and 11 other Burkholderia spp. strains were prepared. The recorded Burkholderia spp. Raman spectra were analyzed by the SVM algorithm together with principal component information obtained during the Raman spectroscopic assay preprocessing step. The SVM Raman spectroscopic assay was trained to produce three Burkholderia spp. classification models: (i) distinguishing the pseudomallei-mallei-thailandensis complex from the cepacia-glathei-phytofirmans complex, (ii) distinguishing B. mallei, B. pseudomallei, and B. thailandensis from one another, and (iii) distinguishing the joined B. cepacia complex and B. glathei from B. phytofirmans. In each of the models, the SVM identified the assigned bacterial complex and species accurately (>90%), except for the cepacia-glathei-phytofirmans complex group and B. thailandensis (65%). When the performance of the SVM-based Raman spectroscopic assay was validated with unknown Burkholderia strains, sensitivities greater than 80% were obtained.

Xu et al. (2021) developed an SVM-based model for detecting clinical septicemic melioidosis infection. Using a human peripheral blood microarray dataset, 69 patients with septicemic melioidosis and a mixed total of 175 individuals without septicemic melioidosis (healthy, type 2 diabetes, recovered from melioidosis, and septicemic from other organisms) were used to train the SVM-based detection model. When tested on detecting B. pseudomallei in a mixed group of healthy, type 2 diabetes, and recovery datasets, the SVM classifier yielded sensitivity and specificity of 0.988 and 1.000, respectively. When tested on detecting B. pseudomallei against datasets of other infections, the SVM classifier yielded sensitivity ranging from 0.857 to 1.000 and specificity from 0.889 to 1.000. A final validation of B. pseudomallei detection on the combination of all health data plus a modified infection dataset generated mean sensitivity and specificity of 0.962 and 0.979, respectively.

Leptospirosis

Drug Discovery and Development ~ Abdullah et al. (2021) studied the identification of a suitable Leptospira spp. multiepitope-based vaccine candidate using two ML programs, namely Vaxign-ML and C-ImmSim. In their study, all protein antigens had a protegenicity score greater than 90%, marking them as effective antigens for vaccine development, and simulations from C-ImmSim showed diverse immune reactions to the vaccine construct, indicating promising subunits of a multiepitope vaccine candidate for immunity against Leptospira spp. infections.

Vaxign-ML is a reverse vaccinology (RV) tool that uses supervised ML to predict the rank score (also known as protegenicity) of bacterial protective antigens (BPAgs) using a training set of viral and bacterial antigens (Ong et al. 2020). Out of five additional ML techniques used to develop Vaxign-ML, extreme gradient boosting (XGBoost) was found to be the most effective when using nested 5-fold cross-validation (N5CV) and leave-one-pathogen-out validation (LOPOV) evaluation methods. Set as the benchmark against five other existing programs and methods, Vaxign-EGB-ML displayed satisfactory results, outperforming four programs. Final validation on external data sets of clinical trials or licensed vaccines reported a ranked calculation of the top 10% BPAg candidates for 20 proteins. Next, the immune simulation study server C-ImmSim uses position-specific scoring matrices and machine learning techniques to seek peptides with epitopes and other immunological interactions (Rapin et al. 2010; Ong et al. 2020). The program combines a mesoscopic scale simulator of the immune system with a set of agent-based class computational models to predict molecular-level major histocompatibility complex-peptide binding interactions, and uses neural networks for prediction of epitopes.
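Leave-one-pathogen-out validation differs from ordinary k-fold splitting in that all antigens from one pathogen are held out together, so the model is always tested on a pathogen it has never seen. A minimal sketch of how such splits could be generated is shown below; the function name and pathogen labels are hypothetical and this is not the Vaxign-ML implementation.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) per distinct group."""
    for held_out in sorted(set(groups)):
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical pathogen of origin for six antigen records.
pathogens = ["A", "A", "B", "C", "C", "C"]
splits = list(leave_one_group_out(pathogens))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Grouping by pathogen guards against an optimistic bias that arises when closely related proteins from the same organism appear in both the training and the test portion.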

Surveillance and Disease Management ~ To investigate the spatial distribution of human leptospirosis, Mohammadinia et al. (2019) employed ANN, geographically weighted regression (GWR), generalized linear model (GLM), and SVM approaches to model and predict the disease from the environmental parameters of temperature, precipitation, humidity, elevation, and vegetation; this work has been further reviewed and expanded upon by Guo et al. (2023). All four models were assessed using mean square error (MSE), mean absolute error (MAE), mean relative error (MRE), and R². The authors reported that the GWR-based model performed best in predicting leptospirosis, followed by SVM, GLM, and ANN. They also found that temperature and humidity had a strong influence on the distribution of leptospirosis among humans.
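The four error measures named above can be computed as in the sketch below. The observed and predicted values are invented for illustration, and the MRE is assumed here to be the mean of the absolute errors divided by the absolute observed values; the cited study may define it differently.

```python
def regression_metrics(observed, predicted):
    """Compute MSE, MAE, MRE and R^2 for paired observed/predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Assumed definition: mean of |error| / |observed| (observed must be non-zero).
    mre = sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "MRE": mre, "R2": r2}


# Toy incidence values per district (illustrative only).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Lower MSE, MAE, and MRE and a higher R² indicate a better fit, which is the sense in which the four models were ranked.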

Ahangarcani et al. (2019) produced predictive risk maps of leptospirosis distribution using SVM and MLPNN algorithms, with further analysis and review provided by Bradley & Lockaby (2023). They evaluated the models’ performance using the Kappa coefficient and AUC metrics, and looked at the association of altitude, average humidity, average temperature, days below 0°C, land cover, rainfall, and slope with leptospirosis incidents from the previous year. Incidences of leptospirosis in prior years were positively correlated with rainfall, average humidity, and average temperature, but negatively correlated with altitude, slope, land cover, and days below 0°C. Both the SVM- and MLPNN-based predictive models displayed satisfactory results, with Kappa coefficient and AUC greater than 83% and 0.84, respectively.
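The Kappa coefficient used above measures agreement between predicted and observed classes after correcting for agreement expected by chance. A minimal sketch of Cohen's kappa follows, using made-up labels rather than data from the cited study; the function name is hypothetical.

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement: fraction of samples where prediction matches truth.
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    p_expected = sum(
        (y_true.count(label) / n) * (y_pred.count(label) / n) for label in labels
    )
    # Undefined (division by zero) when chance agreement is exactly 1.
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy risk classes (1 = high risk cell, 0 = low risk cell); illustrative only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohens_kappa(y_true, y_pred)
print(f"kappa={kappa:.2f}")
```

Here the raw agreement is 0.75, but because half of that could arise by chance with balanced classes, kappa is lower than the raw accuracy.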

A subsequent study examined the relationship between the occurrence of leptospirosis and temperature, rainfall, and relative humidity using exploratory data analysis (Rahmat et al. 2020). The authors used an ANN trained with back-propagation, with optimization of the hidden layers and hidden nodes, to classify combinations of selected features into the presence or absence of disease. The ANN-based leptospirosis prediction was evaluated by the model’s accuracy, sensitivity, and specificity. The ANN model produced a maximum accuracy, sensitivity, and specificity of 84.0%, 86.4%, and 79.3% when the robustness of the model (AUC metric) was measured using a randomized dataset. Additionally, it was reported that using exploratory data methodologies improved the leptospirosis predictive model’s accuracy from 13.3% to 31.3%. The weekly average temperature and weekly total rainfall at lags of 16 weeks and 12 to 20 weeks, respectively, were found to have a significant link with the incidence of leptospirosis.

Malaria

Drug Discovery and Development ~ A recent study by Mswahili et al. (2021) developed and compared the performance of five ML models for predicting antimalarial bioactivity against P. falciparum. They trained artificial neural network (ANN), SVM, RF, extreme gradient boosting (XGB), and LR models on a data set of 4,794 antimalarial drug candidate compounds (2,070 active and 2,724 inactive molecules). Two feature selection algorithms were chosen for performance examination and comparison: Recursive Feature Elimination (RFE), a wrapper-based algorithm that treats feature selection as a search problem, and K-best, a filter-based algorithm that selects potential features according to a particular scoring function. K-best was adopted as an accuracy metric whereas RFE was viewed as an efficiency metric. Based on the two metrics, they found that the XGB, ANN, and RF models gave the three best accuracies in finding new antimalarial drug candidates without losing too much precision.
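To illustrate the filter-based idea behind K-best selection, the sketch below scores each feature independently against the activity label and keeps the k highest-scoring ones. The scoring function (absolute Pearson correlation), the descriptor names, and the values are all assumptions made for illustration; the cited study's actual scoring function is not specified here.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5 if var_x and var_y else 0.0


def select_k_best(feature_columns, labels, k):
    """Rank features by |correlation with the label| and return the top k names."""
    scores = {name: abs(pearson(col, labels)) for name, col in feature_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical molecular descriptors for four compounds (1 = active, 0 = inactive).
features = {
    "descriptor_a": [0.1, 0.2, 0.9, 1.0],
    "descriptor_b": [0.5, 0.4, 0.6, 0.5],
    "descriptor_c": [1.0, 0.8, 0.2, 0.1],
}
labels = [0, 0, 1, 1]
selected = select_k_best(features, labels, k=2)
print(selected)
```

A filter of this kind never trains the downstream model, which is what distinguishes it from a wrapper method such as RFE that repeatedly refits a model while discarding the weakest features.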

Apichat Suratanee and colleagues reported the use of four ML classification algorithms, namely NB, NN, RF, and SVM, to investigate protein-protein interaction (PPI) networks for human and malarial parasites retrieved from the STRING database (version 11.0), in order to identify new human proteins associated with malaria as a means of developing additional drugs against the disease (Suratanee, Buaboocha & Plaimas 2021). A total of 12,038 human proteins with 313,359 interactions and 1,787 P. vivax proteins with 11,477 interactions were used to train the ML models. Examining five topological features, namely (i) betweenness centrality, (ii) closeness centrality, (iii) degree, (iv) eccentricity, and (v) Kleinberg’s hub centrality, they built a heterogeneous network connecting human-human protein interactions and P. vivax-P. vivax protein interactions with the human-P. vivax protein associations. Next, they applied ten rounds of 10-fold cross-validation to each algorithm and evaluated performance by ROC curve and AUC, with the RF method coming out on top (AUC of 0.85), followed by the NN, SVM, and NB algorithms (AUCs of 0.79, 0.77, and 0.74, respectively). Using the best-performing RF classifier to score and rank each human protein, the authors obtained 411 top-ranking human proteins. Subsequent functional annotation of these proteins revealed previously reported promising candidates for multistage targets for malaria therapy.
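Three of the topological features listed above (degree, closeness centrality, and eccentricity) can be computed from shortest-path distances, as in this minimal sketch. The network is a hypothetical five-node toy graph with placeholder protein names, not STRING data, and the graph is assumed to be connected and undirected.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def topological_features(graph):
    """Degree, closeness centrality and eccentricity for each node."""
    features = {}
    for node in graph:
        others = [d for n, d in bfs_distances(graph, node).items() if n != node]
        features[node] = {
            "degree": len(graph[node]),
            # Closeness: number of reachable nodes divided by their total distance.
            "closeness": len(others) / sum(others) if others else 0.0,
            # Eccentricity: distance to the farthest reachable node.
            "eccentricity": max(others) if others else 0,
        }
    return features


# Hypothetical undirected PPI network (adjacency sets); names are placeholders.
toy_network = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}
features = topological_features(toy_network)
print(features["B"])
```

Per-protein feature vectors of this kind are what a classifier such as RF would take as input when ranking candidate proteins.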

Utilizing solely peptide sequence data, an interpretable scoring card method (SCM) was employed to identify antimalarial activity (Charoenkwan et al. 2022). The authors trained the SCM-based predictor iAMAP-SCM alongside eight conventional supervised classifiers, namely decision tree (DT), k-nearest neighbor (KNN), logistic regression (LR), multilayer perceptron (MLP), naïve Bayes (NB), partial least squares regression (PLSLR), random forest (RF), and support vector machine (SVM), with nine conventional feature descriptors, namely amino acid index (AAindex), PCP, amino acid composition (AAC), composition, transition and distribution (CTD), CTD-composition (CTDC), CTD-distribution (CTDD), CTD-transition (CTDT), dipeptide composition (DPC), and tripeptide composition (TPC). The training data set consisted of 139 positive and 2,135 negative molecules and the test set of 139 positive and 677 negative molecules. The 10-fold cross-validation approach was used to refine the program after initially estimating the propensities of 20 amino acids and 400 dipeptides. The estimated propensities were then used to choose significant physicochemical features, and the best propensities of the 400 dipeptides were used to construct the final prediction model (iAMAP-SCM). Scored on maximum accuracy (ACC) and Matthews correlation coefficient (MCC), iAMAP-SCM was reported to achieve 0.957 and 0.834, respectively, and outperformed the other three classifiers employed when the model was screened against independent test datasets for validation. The application of AI and ML methods to drug discovery and development for malaria has also been reviewed in several recent papers (Oguike et al. 2022; Winkler 2021).
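The sketch below illustrates the general shape of a dipeptide-based scoring card: a peptide's score is taken as the propensity-weighted sum of its dipeptide composition, and predictions are summarized with the MCC. The propensity values are invented, the weighted-sum form is an assumption about how such a card combines propensities, and none of this reproduces the iAMAP-SCM implementation.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_composition(sequence):
    """Fraction of each of the 400 dipeptides among overlapping residue pairs."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    valid = [p for p in pairs if p in counts]  # skip non-standard residues
    for pair in valid:
        counts[pair] += 1
    total = len(valid)
    return {dp: (c / total if total else 0.0) for dp, c in counts.items()}


def scoring_card_score(sequence, propensities):
    """Weighted sum of dipeptide frequencies and their propensity scores."""
    composition = dipeptide_composition(sequence)
    return sum(propensities.get(dp, 0.0) * freq for dp, freq in composition.items())


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical propensities (illustrative values, not learned from data).
toy_propensities = {"KK": 600.0, "AK": 300.0}
score = scoring_card_score("AKKA", toy_propensities)
print(round(score, 6), mcc(tp=5, tn=5, fp=0, fn=0))
```

A peptide would then be called active when its score exceeds a threshold chosen on the training data; because each dipeptide carries a single readable weight, the model stays interpretable.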

Surveillance and Disease Management ~ Classification of clinical malaria using ML approaches based on hematological parameters is beneficial in improving disease management (Morang’a et al. 2020). The authors employed six approaches to categorize malaria outcomes into uncomplicated malaria, non-malarial infections, and severe malaria. The ML algorithms in this study, namely ANN, DT, multiple adaptive regression splines (MARS), partial least squares logistic regression (PLSLR), RF, and SVM, were each fine-tuned to their best-performing state according to the algorithm’s kernel. The ANN algorithm was further developed into three models, one for multi-class classification (among all three malaria categories) and two for binary classification (between two malaria categories), followed by an RF analysis to confirm the clinical malaria category classification. To measure the performance of each model, metrics such as accuracy, AUC, confusion matrix, F1 score, precision, and recall were used. All the ML-based models were able to classify clinical malaria from non-malarial infections accurately (0.78 to 0.86), with SVM and ANN generating the best overall classification outcome. An assessment of the ANN classification models for clinical malaria showed that they can distinguish between uncomplicated malaria, non-malarial illnesses, and severe malaria with an accuracy better than 0.8 and diagnostic capacities (measured by AUC) greater than 0.86. The three models created by ANN were subjected to RF analysis, which revealed that all three models had accuracy levels higher than 0.76. Platelet counts and red blood cell counts were found to be the most crucial features for categorizing the clinical malaria categories.

Okagbue et al. (2021) employed six ML algorithms, namely Adaboost, DT, kNN, LR, NN, and RF, to build malaria diagnosis models using a sample of 337 records comprising age and sex data and 15 disease symptoms. The performance of the six models in correctly classifying positive-for-malaria from negative-for-malaria cases was as follows: 68.2% (LR), 71.8% (kNN), 89.6% (DT), 92.6% (RF), 95.8% (NN), and 100% (Adaboost). To investigate the effect of including age and sex data on the performance of the six classification models, the authors conducted a second run using the 15 symptoms only. A slight decrease in precision and accuracy was noted across all models. The ranking of the six models remained unchanged, with Adaboost performing best and LR worst. Here, the Adaboost-based model was still the best-performing model, with a classification accuracy of 98.2%, precision of 96.6%, and error rate of 1.8%. The authors noted that the performance of the Adaboost-based model agrees with the prediction accuracy reported in similar studies (Alzheimer’s diagnosis from MRI scans, breast cancer diagnosis, diabetes diagnosis, and prostate cancer diagnosis), and that including sex and age data yields better AUC metrics with a zero error rate (no misclassification of positive- or negative-for-malaria cases).

A recent study employing ML-based models to predict malaria risk based on mutation location was published by Tai and Dhaliwal (2022). The researchers examined genetic variation data from 20,817 people from the Malaria Genomic Epidemiology Network (MalariaGEN). Based on a set of 104 features derived from malaria genetic markers, three ML-based models (LightGBM, Ridge Regression, and SVR) were built to predict malaria risk. LightGBM was the top-performing model (MAE of 6.39E-01 on all 104 features), and it also outperformed the other two models when predicting with fewer features. Additionally, half as many features (52 instead of the 104 used in the previous malaria risk prediction) were reported to be sufficient for predicting malaria risk.

In summary, nationally representative survey programs suited to the geographical and environmental etiological factors of each country, such as demographic and health surveys (DHS), could provide a suitable platform for ongoing disease surveillance programs. In keeping with the proverb “prevention is better than cure”, core strategic interventions together with disease management will better facilitate the reduction of disease prevalence and transmission, and at the same time decrease the morbidity and mortality inflicted.

Comparisons with ML application in cancer research, computer vision, protein language models

Most, though not all, studies employing ML models to discover novel drug candidates for NTDs have been published in the past two decades. Similarly, applications of ML in cancer research have been in practice since the early 2000s (Bertsimas & Wiberg 2020). Research domains in cancer biology where ML-based methods can be employed include genomics, proteomics, metabolomics, epigenetics, transcriptomics, and systems biology (Kourou et al. 2021; You et al. 2022). In the computer science bibliography of the Digital Bibliography and Library Project (DBLP) and the biomedical repository PubMed, literature mining for ML-based studies on cancer diagnosis, patient classification, and prognosis (excluding reviews and technical reports) between 2016 and 2020 retrieved 921 and 165 studies, respectively.

Despite the advances in chemotherapy and immunotherapy, early detection of cancer still increases a patient’s chance of survival tremendously; at later stages the cancer may already have metastasized to vital organs, where surgery may not be feasible. Technological innovations have given rise to a branch of AI known as computer vision (CV), which can significantly lighten the burden on physicians and radiologists when interpreting an MRI or histology slide for the presence of a tumor. Successful early diagnosis of breast cancer using convolutional neural networks (CNNs) to analyze histopathological images has been reported, and other researchers have further validated the promising and accurate diagnostic capabilities of deep CNN architectures in analyzing imaging slides. As reviewed by Kourou et al. (2021), most ML-based studies on cancer detection and diagnosis centered on developing DL architectures for automated diagnostic models that help radiologists and physicians handle and better identify or characterize imaging data (input) from computed tomography (CT), magnetic resonance imaging (MRI), X-ray radiography, and positron-emission tomography (PET). Another example is the development of an image-based lung cancer detection model, whereby a region-based CNN model trained with 42,290 whole-CT lung scans outperformed the average radiologist at malignancy risk prediction and achieved an AUC greater than 95% when validated with 1,139 clinical cases (Ardila et al. 2019; Syed & Khan, 2022). The application of AI in identifying cancer targets and in drug discovery has been reviewed elsewhere (Alqahtani 2022; Shao et al. 2022; Taylor 2020; You et al. 2022).

Advances in the field of AI, particularly ML and DL methods, have been used to develop language models for proteins. Algorithms from these methods were originally employed to improve the efficiency and quality of natural language processing (NLP). To develop a protein language model (PLM), a large body of text (protein sequences from large databases) is given as input to train the prediction of masked or missing amino acids (Bepler & Berger 2021; Ofer, Brandes & Linial 2021; Rives et al. 2021). Protein embeddings have been reported to perform strongly in predicting secondary structure and subcellular location, at a level comparable to methods that employ evolutionary information from multiple sequence alignment (MSA) inputs, in substituting sequence similarity for homology-based annotation transfer, and in predicting mutational effects on protein-protein interactions (PPI) (Alley et al. 2019; Heinzinger et al. 2019; Littmann et al. 2021; Stärk et al. 2021; Zhou et al. 2020).
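The masked-prediction objective described above starts by hiding a fraction of residues and keeping the hidden residues as training targets. The following is a minimal sketch of that data-preparation step only; the mask symbol, the 15% masking fraction, and the example sequence are assumptions for illustration, and no model is trained here.

```python
import random

MASK = "#"  # hypothetical mask symbol, chosen because it is not an amino acid code


def mask_sequence(sequence, mask_fraction=0.15, seed=0):
    """Hide a random subset of residues; return the masked string and the targets."""
    rng = random.Random(seed)
    n_mask = min(len(sequence), max(1, round(len(sequence) * mask_fraction)))
    positions = sorted(rng.sample(range(len(sequence)), n_mask))
    masked = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]  # residue the model would be asked to recover
        masked[pos] = MASK
    return "".join(masked), targets


# Made-up protein fragment used purely as an example input.
sequence = "MKTAYIAKQRQISFVKSHFSRQ"
masked, targets = mask_sequence(sequence)
print(masked)
print(targets)
```

During training, a PLM receives the masked string and is penalized when its predicted residues at the hidden positions differ from the stored targets.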

Variant Effect Score Prediction without Alignments (VESPA) is able to predict sequence residue conservation and the effects of single amino acid variants (SAVs) almost as accurately as other existing methods (DeepSequence, ESM-1v, and GEMME) without using multiple sequence alignments (MSAs) to learn about the functional, structural, sequence conservation, and evolutionary information of the organism or gene (Marquet et al. 2022). Deep learning-based structure prediction models, such as AlphaFold2 (AF2) and RoseTTAFold, successfully addressed the atomic-resolution structure prediction problem by using MSAs and templates of related protein structures to provide the highest possible structural prediction performance (Baek et al. 2021; Jumper et al. 2021). Evaluating the performance of MSA-based LMs in separating coevolutionary signals that encode functional and structural constraints from phylogenetic correlations, Lupo, Sgarbossa, and Bitbol (2022) found that contact inference with the MSA Transformer suffered less phylogenetic deterioration and gave greater structural contact accuracy than the Potts model, despite the MSA Transformer being pre-trained on a dataset of minimized diversity. Moving beyond the MSA approach, Chowdhury et al. (2021) published an end-to-end differentiable recurrent geometric network (RGN2) for structure prediction from single protein sequences that outperformed MSA-based tools (AF2 and RoseTTAFold) and employed: (i) the AminoBERT PLM, which learns latent structural information from millions of unaligned proteins; and (ii) geometric modules representing the Cα backbone geometry. Subsequently, Lin et al. (2022) developed a similar program named ESMFold, capable of competing with AF2 and RoseTTAFold in atomic-level protein structure prediction accuracy using only the individual sequences of rare proteins. The absence of MSA-based elements has significantly shortened the process of protein structure prediction, making it up to six-fold faster than existing tools.

Antibodies represent a unique group of proteins, so an antibody language model (ALM) trained specifically on antibody sequences would be expected to outperform a protein language model trained on a holistic range of proteins. The ALM AbLang outperformed both IMGT germlines and the protein language model ESM-1b in restoring missing residues of antibody sequences retrieved from the Observed Antibody Space (OAS) database, and did so in less time (Olsen, Moal & Deane 2022). AbLang was able to separate the sequences by V-gene into smaller clusters, to distinguish accurately between naïve and memory B-cells, and to restore sequences with 15 missing residues at the N-terminus without any additional germline information. Another antibody LM, Antibody-specific Bidirectional Encoder Representation from Transformers (AntiBERTa), was reported to outperform two existing PLMs (ProtBERT and Sapiens) and exhibited better B cell receptor representation when compared to ProtBERT that was assigned a smaller dataset (Choi 2022; Leem et al. 2022). The authors described how the self-attention mechanism of AntiBERTa allowed it to correctly predict paratope positions in both CDR and non-CDR regions, to distinguish naïve and memory B-cells better than the two other PLMs, and to focus on what is functionally important for specific binding. Interested readers are referred to the design methods of a linguistic-based formalization of the antibody language (Vu et al. 2022), and to comprehensive reviews that discuss the progress, methods, and challenges of antibody language models (Akbar et al. 2021, 2022).

In summary, the applications of ML tools in neglected diseases and in cancer, computer vision, and protein language research are very similar to one another. Both areas have used ML tools for drug discovery and development, identification of novel drug target sites, disease surveillance, prediction, detection, and disease management. The main factor that separates them is that researchers are more invested in advancing breakthroughs in cancer, computer vision, and protein language. Moreover, research in these areas has been primarily driven by the accumulation of large training datasets and the development of highly sophisticated deep learning architectures. Typically, large attention-based models are trained on datasets in the order of 10⁶–10⁷ data points. In contrast, datasets in NTD research remain restricted to the scale of 10²–10³ data points (up to five orders of magnitude smaller), and shallow ML methods, namely LR, RF, and SVM, among others, are more prevalent. The lack of large datasets restricts the widespread application of large deep learning models for the discovery of new NTD therapeutics and thus hampers the potential for efficient management and eradication of these diseases.

On regional collaboration, data, and infrastructure sharing

The Southeast Asian countries are strategically located and exceptionally diverse in culture. Apart from the geographic proximity of the Member States of the ASEAN regional association, the countries share densely populated communities, mineral-rich economies that are open to the rest of the globe, and similar tropical and subtropical climates. Consequently, when a disease outbreak is reported in any of the SEA countries, the chance of the disease being imported into neighboring countries is very high. There is therefore a need to address the matter through regional collaboration and through data and infrastructure sharing among the SEA countries.

As previously described, the SEA region is endemic to vector-borne and zoonotic diseases such as dengue, LF, malaria, leptospirosis, cysticercosis, and rabies. These diseases require up-to-date, robust, and comprehensive information on the presence, species-strain diversity, ecology, and environmental and geographical distribution of the organisms that carry and transmit the infectious agents. As such, a few platforms have been established, all aimed at tackling these diseases at a regional scale, whether as open-access public databases or as online platforms for collecting and analyzing data and for providing technical skills and collaboration with both local and global stakeholders. Two notable and frequently visited resources for health metrics and disease-related data that have been mentioned throughout this review are WHO’s Global Health Observatory (GHO) and the Global Health Data Exchange (GHDx) data catalog, created and supported by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington; interested readers can retrieve the Global Health Estimates (GHE) 2019 data and the Global Burden of Disease Study (GBD) 2019 from them, respectively. Next, the European Centre for Disease Prevention and Control (ECDC), governed by the European Union, provides an open-access database on dengue surveillance, threats, and outbreaks. The database covers almost all NTDs and other diseases of public health concern. Like the Malaria Atlas Project (MAP), ECDC serves as a platform that collects, analyzes, shares, and provides infectious disease data and guidance as a means of assessing disease risks and of preventing and responding to outbreaks and other public health threats. Another publicly available resource, though less geographically relevant to the SEA region, is the WHO-driven Expanded Special Project for the Elimination of NTDs (ESPEN). The ESPEN portal only contains survey data sets of NTDs in Africa, in response to the 39% of the global NTD burden that occurs in Africa. In line with the WHO goal of accelerating the elimination and eradication of NTDs, ESPEN serves as a portal that helps governing bodies and health officials to rapidly strategize and deploy NTD interventions to reach key targeted communities.

The Global Alliance for Rabies Control (GARC) is the leading international non-profit organization dedicated to eradicating canine rabies. GARC collaborates with global stakeholders, governments, and local partners to increase public awareness of rabies, promote teamwork, and develop the data required to increase political commitment and funding. Their main team of nine members works across three established networks, ARACON (Asian Rabies Control Network), MERACON (Middle East, Eastern Europe, Central Asia and North Africa Rabies Control Network), and PARACON (Pan-African Rabies Control Network), in an effort to end rabies. The international body responsible for infrastructure sharing in combating LF is the Global Alliance to Eliminate Lymphatic Filariasis (GAELF). Building on the breakthroughs of the GPELF in eradicating LF mentioned previously, GAELF is a steering body that brings relevant partners together to support the WHO-established GPELF through the mobilization of political, financial, and technical resources. The Global Atlas of Helminth Infections (GAHI) is an open-access database that details the global and geographical distribution of three neglected tropical worm-borne diseases: LF, STHs, and schistosomiasis. The platform was developed by the London Applied & Spatial Epidemiology Research Group (LASER) at the University of London. All GAHI resources are available on an open-access basis, but only up to the year 2015.

In response to the neglect of melioidosis, the active website Melioidosis.info serves as an online platform for reporting melioidosis cases and for disseminating information on melioidosis to the public, researchers, and health policy makers. Researchers or health authorities with culture-confirmed melioidosis cases and deaths, backed by institutional support, can contribute to the platform by uploading melioidosis case and serological information. The team’s aim is to allow local and global policy makers to easily pinpoint incidences of melioidosis geographically and to identify institutions that are capable of diagnosing the disease. In response to outbreaks of leptospirosis, the Global Leptospirosis Environment Action Network (GLEAN) was established to lessen the impact of leptospirosis on the planet by improving knowledge of the connections between the disease’s occurrence and associated biological, demographic, environmental, ecological, and economic factors, delivering more prompt warnings of outbreaks, and identifying prevention and control strategies. The Malaria Atlas Project (MAP), an open-access database and WHO collaborative platform for geospatial disease modeling, aims to predict the geographical limits, prevalence, and endemicity of malaria in every region of the world. Through data handling, collaboration, analytics, and engagement, the members of MAP have successfully helped to (i) generate malaria risk maps (infection prevalence, incidence rates, and mortality estimates) at national and global levels; (ii) forecast the annual global malaria burden; (iii) track intervention coverage (malaria drugs, diagnostics, and vector control); (iv) employ statistical models to measure the effectiveness of currently available control strategies against malaria; (v) develop tools to support efficient commodities planning so that enough resources are prepared to protect a population; and (vi) strengthen skills among researchers and technicians in malaria analytics.

Data and infrastructure sharing is undeniably crucial for NTDs since the scale of publicly available datasets for NTD research is restricted. The efforts of each governing body in maintaining up-to-date open-access databases, infrastructure, and technical outreach organizations ensure that every country has the latest disease intelligence and technical skills, so that effective surveillance, prevention, and disease management can be carried out under each country’s governing leadership. The importance of centralized data sharing at a regional scale has been highlighted in a study by Alemu et al. (2022). With access to publicly available standardized survey and treatment coverage data, which was at first unavailable, probably because the country’s Ministry of Health had not reported it to the WHO, they were able to access ample collected evidence pointing to the advantages of school-based deworming programs and LF MDA campaigns.

Conclusion

NTDs impact nearly 2 billion people, especially in countries with developing economies such as those in the Southeast Asia region, causing reduced productivity and a substantial accumulation of disability-adjusted life years (DALYs). Machine learning has been widely applied to NTDs for drug discovery and development as well as for surveillance and disease management. However, the application of machine learning in NTD research is hampered by the limited amount of data, the absence of a centralized and standardized collaborative framework, and the general lack of attention from public and private stakeholders alike when compared to other fields. In Southeast Asia, endemic diseases such as dengue, leptospirosis, malaria, and melioidosis persist despite ongoing efforts. Platforms such as GHO, GHDx, GLEAN, MAP, and Melioidosis.info have been established to consolidate surveillance data and enhance cross-border collaboration. When integrated with ML tools, these platforms hold promise for predictive modelling and targeted intervention. However, realizing this potential will require greater investment in data infrastructure, open-access systems, and coordinated research frameworks tailored to resource-limited settings. Addressing NTDs alongside other regionally significant public health threats in this manner would enhance disease management and mitigate their longstanding burden in vulnerable communities.

Data availability

No data is associated with this article.

Acknowledgements

No acknowledgement declared.

References

Abdullah M, Kadivella M, Sharma R, et al.: Designing of multiepitope-based vaccine against Leptospirosis using Immuno-Informatics approaches. bioRxiv. 2021. 2021.02.22.431920.
Abela-Ridder B, Biswas G, Mbabazi PS, et al.: Ending the neglect to attain the sustainable development goals: a road map for neglected tropical diseases 2021–2030. WHO. 2020.
Afolabi MO, Adebiyi A, Cano J, et al.: Prevalence and distribution pattern of malaria and soil-transmitted helminth co-endemicity in sub-Saharan Africa, 2000–2018: A geospatial analysis. PLOS Neglected Tropical Diseases. 2022; 16(9): e0010321.
Agampodi S, Gunarathna S, Lee JS, et al.: Global, regional, and country-level cost of leptospirosis due to loss of productivity in humans. PLoS Neglected Tropical Diseases. 2023; 17(8 August): e0011291.
Aguilera-Pesantes D, Robayo LE, Méndez PE, et al.: Discovering key residues of dengue virus NS2b-NS3-protease: New binding sites for antiviral inhibitors design. Biochemical and Biophysical Research Communications. 2017; 492(4): 631–642.
Ahangarcani M, Farnaghi M, Shirzadi MR, et al.: Predictive risk mapping of human leptospirosis using support vector machine classification and multilayer perceptron neural network. Geospatial Health. 2019; 14(1).
Akbar R, Bashour H, Rawat P, et al.: Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies. mAbs. 2022; 14(1).
Akbar R, Robert PA, Pavlović M, et al.: A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding. Cell Reports. 2021; 34(11): 108856.
Alemar Ali J: Prevention and Control of Rabies in Animals and Humans in Ethiopia. 2022.
Alemu Y, Degefa T, Bajiro M, et al.: Prevalence and intensity of soil-transmitted helminths infection among individuals in model and non-model households, South West Ethiopia: A comparative cross-sectional community based study. PLoS One. 2022; 17(10): e0276137.
Alfred R, Obit JH: The roles of machine learning methods in limiting the spread of deadly diseases: A systematic review. Heliyon. 2021; 7(6): e07371.
Ali SA, Niaz S, Aguilar-Marcelino L, et al.: Prevalence of Ascaris lumbricoides in contaminated faecal samples of children residing in urban areas of Lahore, Pakistan. Scientific Reports. 2020; 10(1): 1–8.
Alley EC, Khimulya G, Biswas S, et al.: Unified rational protein engineering with sequence-based deep representation learning. Nature Methods. 2019; 16(12): 1315–1322.
Alqahtani A: Application of Artificial Intelligence in Discovery and Development of Anticancer and Antidiabetic Therapeutic Agents. Evidence-based Complementary and Alternative Medicine. 2022; 2022.
Andre-Fontaine G, Aviat F, Thorin C: Waterborne Leptospirosis: Survival and Preservation of the Virulence of Pathogenic Leptospira spp. in Fresh Water. Current Microbiology. 2015; 71(1): 136–142. PubMed Abstract | Publisher Full Text
Ardila D, Kiraly AP, Bharadwaj S, et al.: End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nature Medicine. 2019; 25(6): 954–961.
Azevedo IR, Amamura TA, Isaac L: Human leptospirosis: In search for a better vaccine. Scandinavian Journal of Immunology. 2023; 98(5). John Wiley and Sons Inc. Publisher Full Text
Baek M, DiMaio F, Anishchenko I, et al.: Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021; 373(6557): 871–876. PubMed Abstract | Publisher Full Text | Free Full Text
Bepler T, Berger B: Learning the protein language: Evolution, structure, and function. Cell Systems. 2021; 12(6): 654–669.e3.
Bergner LM, Mollentze N, Orton RJ, et al.: Characterizing and evaluating the zoonotic potential of novel viruses discovered in vampire bats. Viruses. 2021; 13(2). PubMed Abstract | Publisher Full Text | Free Full Text
Bertsimas D, Wiberg H: Machine Learning in Oncology: Methods, Applications, and Challenges. JCO Clinical Cancer Informatics. 2020; 4(4): 885–894. PubMed Abstract | Publisher Full Text | Free Full Text
Birnie E, Virk HS, Savelkoel J, et al.: Global burden of melioidosis in 2015: a systematic review and data synthesis. The Lancet Infectious Diseases. 2019; 19(8): 892–902. PubMed Abstract | Publisher Full Text | Free Full Text
Bizhani N, Hashemi Hafshejani S, Mohammadi N, et al.: Lymphatic filariasis in Asia: a systematic review and meta-analysis. Parasitology Research. 2021 Feb; 120(2): 411–422. Springer Science and Business Media Deutschland GmbH. PubMed Abstract | Publisher Full Text | Free Full Text
Boundenga L, Sima-Biyang YV, Longo-Pendy NM, et al.: Epidemiology and diversity of Plasmodium species in Franceville and their implications for malaria control. Scientific Reports. 2024; 14(1). PubMed Abstract | Publisher Full Text | Free Full Text
Bradley EA, Lockaby G: Leptospirosis and the Environment: A Review and Future Directions. Pathogens. 2023; 12(9). Multidisciplinary Digital Publishing Institute (MDPI). PubMed Abstract | Publisher Full Text | Free Full Text
Butala C, Brook TM, Majekodunmi AO, et al.: Neurocysticercosis: Current Perspectives on Diagnosis and Management. Frontiers in Veterinary Science. 2021; 8(May): 1–10.
Bzdyl NM, Moran CL, Bendo J, et al.: Pathogenicity and virulence of Burkholderia pseudomallei. Virulence. 2022; 13(1): 1945–1965. PubMed Abstract | Publisher Full Text | Free Full Text
Caldrer S, Ursini T, Santucci B, et al.: Soil-Transmitted Helminths and Anaemia: A Neglected Association Outside the Tropics. Microorganisms. 2022 May 13; 10(5): 1027. MDPI.PubMed Abstract | Publisher Full Text | Free Full Text
Charoenkwan P, Schaduangrat N, Lio P, et al.: iAMAP-SCM: A Novel Computational Tool for Large-Scale Identification of Antimalarial Peptides Using Estimated Propensity Scores of Dipeptides. ACS Omega. 2022.
Chen SJ, Rai CI, Wang SC, et al.: Infection and Prevention of Rabies Viruses. Microorganisms. 2025 Feb 9; 13(2): 380. Multidisciplinary Digital Publishing Institute (MDPI). PubMed Abstract | Publisher Full Text | Free Full Text
Cheng AC, Currie BJ: Melioidosis: Epidemiology, Pathophysiology, and Management. Clinical Microbiology Reviews. 2005; 18(2): 383–416. PubMed Abstract | Publisher Full Text | Free Full Text
Choi Y: Artificial intelligence for antibody reading comprehension: AntiBERTa. Patterns. 2022; 3(7): 100535. PubMed Abstract | Publisher Full Text | Free Full Text
Chong NS, Hardwick RJ, Smith SR, et al.: A prevalence-based transmission model for the study of the epidemiology and control of soil-transmitted helminthiasis. PLoS One. 2022; 17(8 August): 1–28. Publisher Full Text
Chowdhury R, Bouatta N, Biswas S, et al.: Single-sequence protein structure prediction using language models from deep learning. AIChE Annual Meeting, Conference Proceedings. 2021-Novem.
Christophers SR: History of Malaria. British Medical Journal. 1951; 1(4711): 865–866. Publisher Full Text
Condori RE, Niezgoda M, Lopez G, et al.: Using the LN34 pan-lyssavirus real-time RT-PCR assay for rabies diagnosis and rapid genetic typing from formalin-fixed human brain tissue. Viruses. 2020; 12(1). PubMed Abstract | Publisher Full Text | Free Full Text
Côrtes N, Lira A, Prates-Syed W, et al.: Integrated control strategies for dengue, Zika, and Chikungunya virus infections. Frontiers in Immunology. 2023; 14. Publisher Full Text
Costa F, Hagan JE, Calcagno J, et al.: Global Morbidity and Mortality of Leptospirosis: A Systematic Review. PLoS Neglected Tropical Diseases. 2015; 9(9): 0–1. Publisher Full Text
Dacal E, Bermejo-Peláez D, Lin L, et al.: Mobile microscopy and telemedicine platform assisted by deep learning for the quantification of Trichuris trichiura infection. PLoS Neglected Tropical Diseases. 2021; 15(9): 1–14. Publisher Full Text
Dara S, Dhamercherla S, Jadav SS, et al.: Machine Learning in Drug Discovery: A Review. Artificial Intelligence Review. Netherlands: Springer; 2022; 55. : 1947–1999. PubMed Abstract | Publisher Full Text | Free Full Text
De Castro Poncio L, Dos Anjos FA, De Oliveira DA, et al.: Novel Sterile Insect Technology Program Results in Suppression of a Field Mosquito Population and Subsequently to Reduced Incidence of Dengue. Journal of Infectious Diseases. 2021; 224(6): 1005–1014. PubMed Abstract | Publisher Full Text
Dey SK, Rahman MM, Howlader A, et al.: Prediction of dengue incidents using hospitalized patients, metrological and socioeconomic data in Bangladesh: A machine learning approach. PLoS One. 2022; 17(7 July): 1–17.
Dickson BFR, Masson JJR, Mayfield HJ, et al.: Bayesian Network Analysis of Lymphatic Filariasis Serology from Myanmar Shows Benefit of Adding Antibody Testing to Post-MDA Surveillance. Tropical Medicine and Infectious Disease. 2022; 7(7). PubMed Abstract | Publisher Full Text | Free Full Text
Douglass J, Graves P, Lindsay D, et al.: Lymphatic filariasis increases tissue compressibility and extracellular fluid in lower limbs of asymptomatic young people in central Myanmar. Tropical Medicine and Infectious Disease. 2017; 2(4): 1–14. Publisher Full Text
Easton AV, Raciny-Aleman M, Liu V, et al.: Immune Response and Microbiota Profiles during Coinfection with Plasmodium vivax and Soil-Transmitted Helminths. mBio. 2020; 11(5): 1–17. Publisher Full Text
Elumalai E: Characterization and Prediction of Dengue Virus targeting peptides based on three class of descriptors using k-NN and Random Forest.2022; pp. 1–17.
Elvana A, Suryanto ED: Lymphatic Filariasis Detection Using Image Analysis.2022.
Famakinde DO: Mosquitoes and the Lymphatic Filarial Parasites: Research Trends and Budding Roadmaps to Future Disease Eradication. Tropical Medicine and Infectious Disease. 2018; 3(1). PubMed Abstract | Publisher Full Text | Free Full Text
Galipó E, Dixon MA, Fronterrè C, et al.: Spatial distribution and risk factors for human cysticercosis in Colombia. Parasites and Vectors. 2021; 14(1): 1–15. Publisher Full Text
Garcia HH, Gonzalez AE, Gilman RH: Taenia solium Cysticercosis and Its Impact in Neurological Disease. 2020. Publisher Full Text
Gassiep I, Bauer MJ, Harris PNA, et al.: Laboratory Safety: Handling Burkholderia pseudomallei Isolates without a Biosafety Cabinet. 2021.
Geoffrey B, Sanker A, Madaj R, et al.: A program to automate the discovery of drugs for West Nile and Dengue virus—programmatic screening of over a billion compounds on PubChem, generation of drug leads and automated in silico modelling. Journal of Biomolecular Structure and Dynamics. 2022; 40(10): 4293–4300. PubMed Abstract | Publisher Full Text
Giordano D, Biancaniello C, Argenio MA, et al.: Drug Design by Pharmacophore and Virtual Screening Approach. Pharmaceuticals. 2022; 15(5). MDPI. PubMed Abstract | Publisher Full Text | Free Full Text
Guo W, Lv C, Guo M, et al.: Innovative applications of artificial intelligence in zoonotic disease management. Science in One Health. 2023; 2: 100045. Elsevier B.V. PubMed Abstract | Publisher Full Text | Free Full Text
Haake DA, Levett PN: Leptospirosis in Humans. Journal of Biological Education. 2015; 65–97. Publisher Full Text
Hagedoorn NN, Maze MJ, Carugati M, et al.: Global distribution of Leptospira serovar isolations and detections from animal host species: A systematic review and online database. Tropical Medicine and International Health. 2024; 29(3): 161–172. John Wiley and Sons Inc.PubMed Abstract | Publisher Full Text | Free Full Text
Haladjian J, Ermis A, Hodaie Z, et al.: IPig: Towards tracking the behavior of free-roaming pigs. ACM International Conference Proceeding Series Part F1325. 2017.
Heinzinger M, Elnaggar A, Wang Y, et al.: Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics. 2019; 20(1): 1–17. Publisher Full Text
Hotez PJ, Lo NC: Neglected Tropical Diseases: Public Health Control Programs and Mass Drug Administration. Hunter’s Tropical Medicine and Emerging Infectious Diseases. 10th ed.Elsevier Inc.; 2020. Publisher Full Text
Hoyos W, Aguilar J, Toro M: An autonomous cycle of data analysis tasks for the clinical management of dengue. Heliyon. 2022; 8(10): e10846. PubMed Abstract | Publisher Full Text | Free Full Text
Hugo LE, Rašić G, Maynard AJ, et al.: Wolbachia wAlbB inhibit dengue and Zika infection in the mosquito Aedes aegypti with an Australian background. bioRxiv. 2022. 2022.03.22.485408.
Hussein A, Alemu M, Ayehu A: Soil Contamination and Infection of School Children by Soil-Transmitted Helminths and Associated Factors at Kola Diba Primary School, Northwest Ethiopia: An Institution-Based Cross-Sectional Study. Journal of Tropical Medicine. 2022; 2022: 1–8. PubMed Abstract | Publisher Full Text | Free Full Text
Jainul Fathima A, Revathy R, Balamurali S, et al.: Prediction of Dengue-Human Protein Interaction Using Artificial Neural Network for Anti-Viral Drug Discovery. SSRN Electronic Journal. 2019. (January). Publisher Full Text
Jane Ling MY, Halim AFNA, Ahmad D, et al.: Rabies in Southeast Asia: A systematic review of its incidence, risk factors and mortality. BMJ Open. 2023; 13(5). PubMed Abstract | Publisher Full Text | Free Full Text
Jumper J, Evans R, Pritzel A, et al.: Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596(7873): 583–589. PubMed Abstract | Publisher Full Text | Free Full Text
Kabululu ML, Johansen MV, Lightowlers M, et al.: Aggregation of Taenia solium cysticerci in pigs: Implications for transmission and control. Parasite Epidemiology and Control. 2023; 22: e00307. Elsevier Ltd.PubMed Abstract | Publisher Full Text | Free Full Text
Kaur R, Arora N, Jamakhani MA, et al.: Development of multi-epitope chimeric vaccine against Taenia solium by exploring its proteome: an in silico approach. Expert Review of Vaccines. 2020; 19(1): 105–114. PubMed Abstract | Publisher Full Text
Kermelita D, Hadi UK, Soviana S, et al.: Species diversity, mosquito behavior, and microfilariae detection in vectors and reservoirs in filariasis-endemic areas of Bengkulu, Indonesia. Biodiversitas. 2024; 25(9): 3125–3131. Publisher Full Text
Khalid AQ, Rao Avupati V, Hussain H: Machine Learning Model For Predicting Anti-Dengue Drugs: A Three-Dimensional Quantitative Structure-Activity Relationship (3D QSAR) Study. International Journal of Science & Technology Research. 2020; 9(6): 1107–1115.
Kondeti PK, Ravi K, Mutheneni SR, et al.: Applications of machine learning techniques to predict filariasis using socio-economic factors. Epidemiology and Infection. 2019; 147: e260. PubMed Abstract | Publisher Full Text | Free Full Text
Kourou K, Exarchos KP, Papaloukas C, et al.: Applied machine learning in cancer research: A systematic review for patient diagnosis, classification and prognosis. Computational and Structural Biotechnology Journal. 2021; 19: 5546–5555.
Lalremruata A, Jeyaraj S, Engleitner T, et al.: Species and genotype diversity of Plasmodium in malaria patients from Gabon analysed by next generation sequencing. Malaria Journal. 2017; 16(1): 1–11. Publisher Full Text
Larsen JC, Johnson NH: Pathogenesis of Burkholderia pseudomallei and Burkholderia mallei. Military Medicine. 2009; 174(6): 647–651. Publisher Full Text
Leem J, Mitchell LS, Farmery JHR, et al.: Deciphering the language of antibodies using self-supervised learning. Patterns. 2022; 3(7): 100513. PubMed Abstract | Publisher Full Text | Free Full Text
Levett PN: Systematics of leptospiraceae. Current Topics in Microbiology and Immunology. 2015; 387: 11–20. Publisher Full Text
Lim CC, Khairudin NAA, Loke SW, et al.: Comparison of Human Intestinal Parasite Ova Segmentation Using Machine Learning and Deep Learning Techniques. Applied Sciences (Switzerland). 2022; 12(15).
Limmathurotsakul D, Golding N, Dance DAB, et al.: Predicted global distribution of Burkholderia pseudomallei and burden of melioidosis. Nature Microbiology. 2016; 1(1): 6–10. Publisher Full Text
Lin Y, Fang K, Zheng Y, et al.: Global burden and trends of neglected tropical diseases from 1990 to 2019. Journal of Travel Medicine. 2022; 29(3). PubMed Abstract | Publisher Full Text
Lin Z, Akin H, Rao R, et al.: Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv. 2022. 2022.07.20.500902.
Littmann M, Heinzinger M, Dallago C, et al.: Embeddings from deep learning transfer GO annotations beyond homology. Scientific Reports. 2021; 11(1): 1–14. Publisher Full Text
Lupo U, Sgarbossa D, Bitbol AF: Protein language models trained on multiple sequence alignments learn phylogenetic relationships. Nature Communications. 2022; 13(1): 1–23. Publisher Full Text
Malavige GN, Sjö P, Singh K, et al.: Facing the escalating burden of dengue: Challenges and perspectives PLOS Global Public Health. 2023 Dec 15; 3(12): e0002598. Public Library of Science. PubMed Abstract | Publisher Full Text | Free Full Text
Manuel M, Ramanujam K, Ajjampur SSR: Molecular Tools for Diagnosis and Surveillance of Soil-Transmitted Helminths in Endemic Areas. Parasitologia. 2021; 1(3): 105–118. Publisher Full Text
Marquet C, Heinzinger M, Olenyi T, et al.: Embeddings from protein language models predict conservation and variant effects. Human Genetics. 2022; 141(10): 1629–1647. PubMed Abstract | Publisher Full Text | Free Full Text
McComb M, Bies R, Ramanathan M: Machine learning in pharmacometrics: Opportunities and challenges. British Journal of Clinical Pharmacology. 2022; 88(4): 1482–1499. PubMed Abstract | Publisher Full Text
Meumann EM, Limmathurotsakul D, Dunachie SJ, et al.: Burkholderia pseudomallei and melioidosis. Nature Reviews Microbiology. 2024; 22(3): 155–169. Publisher Full Text
Mitra AK, Mawson AR: Neglected tropical diseases: Epidemiology and global burden. Tropical Medicine and Infectious Disease. 2017; 2(3). PubMed Abstract | Publisher Full Text | Free Full Text
Mlowe F, Mlangwa J, Mkupasi E, et al.: Taenia solium Cysticercosis and Taeniosis Reporting in the Current Medical and Veterinary Diseases Reporting Systems in Tanzania: A Cross-Sectional Study. Veterinary Medicine International. 2024; 2024. PubMed Abstract | Publisher Full Text | Free Full Text
Moawad AA, Silge A, Bocklitz T, et al.: A Machine Learning-Based Raman Spectroscopic Assay for the Identification of Burkholderia mallei and Related Species. Molecules. 2019; 24(24): 4516. PubMed Abstract | Publisher Full Text | Free Full Text
Mogaji HO, Johnson OO, Adigun AB, et al.: Estimating the population at risk with soil transmitted helminthiasis and annual drug requirements for preventive chemotherapy in Ogun State, Nigeria. Scientific Reports. 2022; 12(1): 1–12. Publisher Full Text
Mohammad Basir MF, Mohd Hairon S, Ibrahim MI, et al.: Development and Validation of Rabies Health Education Module (RaHEM) for Dog Owners in Kelantan, Malaysia: An ADDIE Model. Journal of Epidemiology and Global Health. 2025; 15(1): 12. PubMed Abstract | Publisher Full Text | Free Full Text
Mohammadinia A, Saeidian B, Pradhan B, et al.: Prediction mapping of human leptospirosis using ANN, GWR, SVM and GLM approaches. BMC Infectious Diseases. 2019; 19(1): 1–18. Publisher Full Text
Mollentze N, Babayan SA, Streicker DG: Identifying and prioritizing potential humaninfecting viruses from their genome sequences. PLoS Biology. 2021; 19(9). Publisher Full Text
Molyneux D: Neglected Tropical Diseases - East Asia.Utzinger J, Yap P, Bratschi M, et al., (Pnyt.) Community Eye Health Journal. Cham: Springer International Publishing; 2019.
Monnier N, Barth-Jaeggi T, Knopp S, et al.: Core components, concepts and strategies for parasitic and vector-borne disease elimination with a focus on schistosomiasis: A landscape analysis. PLoS Neglected Tropical Diseases. 2020; 14(10): e0008837. PubMed Abstract | Publisher Full Text | Free Full Text
Morang’a CM, Amenga-Etego L, Bah SY, et al.: Machine learning approaches classify clinical malaria outcomes based on haematological parameters. BMC Medicine. 2020; 18(1): 1–16. Publisher Full Text
Mswahili ME, Martin GL, Woo J, et al.: Antimalarial drug predictions using molecular descriptors and machine learning against plasmodium falciparum. Biomolecules. 2021; 11(12): 1–15. Publisher Full Text
Muñoz-Antoli C, Pérez P, Pavón A, et al.: High intestinal parasite infection detected in children from Región Autónoma Atlántico Norte (R.A.A.N.) of Nicaragua. Scientific Reports. 2022; 12(1): 1–10. Publisher Full Text
Narkkul U, Thaipadungpanit J, Srisawat N, et al.: Human, animal, water source interactions and leptospirosis in Thailand. Scientific Reports. 2021; 11(1): 3215. PubMed Abstract | Publisher Full Text | Free Full Text
Narne KG, Kakumani J, aishnavi KISN, et al.: Disseminated Melioidosis Complicated by Prostatic Abscess and Splenic Involvement: Diagnostic and Therapeutic Insights. Cureus. 2024; 16: e69961. PubMed Abstract | Publisher Full Text | Free Full Text
Navarrete-Perea J, Isasa M, Paulo JA, et al.: Quantitative multiplexed proteomics of Taenia solium cysts obtained from the skeletal muscle and central nervous system of pigs. PLoS Neglected Tropical Diseases. 2017; 11: e0005962. PubMed Abstract | Publisher Full Text | Free Full Text
Nguyen VH, Tuyet-Hanh TT, Mulhall J, et al.: Deep learning models for forecasting dengue fever based on climate data in Vietnam. PLoS Neglected Tropical Diseases. 2022; 16(6): 1–22. Publisher Full Text
Nyanasegran PK, Nathan S, Firdaus-Raih M, et al.: Biofilm Signaling, Composition and Regulation in Burkholderia pseudomallei.In Journal of Microbiology and Biotechnology.2023; 33(1): 15–27. Korean Society for Microbiolog and Biotechnology. PubMed Abstract | Publisher Full Text | Free Full Text
Ofer D, Brandes N, Linial M: The language of proteins: NLP, machine learning & protein sequences. Computational and Structural Biotechnology Journal. 2021; 19: 1750–1758. PubMed Abstract | Publisher Full Text | Free Full Text
Oguike OE, Ugwuishiwu CH, Asogwa CN, et al.: Systematic review on the application of machine learning to quantitative structure–activity relationship modeling against Plasmodium falciparum. Molecular Diversity. 2022; 26(6): 3447–3462. PubMed Abstract | Publisher Full Text | Free Full Text
Okagbue HI, Oguntunde PE, Obasi ECM, et al.: Diagnosing malaria from some symptoms: a machine learning approach and public health implications. Health and Technology. 2021; 11(1): 23–37. Publisher Full Text
Olsen TH, Moal IH, Deane CM: AbLang: an antibody language model for completing antibody sequences. Bioinformatics Advances. 2022; 2(1): 0–7. Publisher Full Text
Ong E, Wang H, Wong MU, et al.: Vaxign-ML: Supervised machine learning reverse vaccinology model for improved prediction of bacterial protective antigens. Bioinformatics. 2020; 36(10): 3185–3191. PubMed Abstract | Publisher Full Text | Free Full Text
Ong E, Wang H, Wong MU, et al.: Vaxign-ML: Supervised machine learning reverse vaccinology model for improved prediction of bacterial protective antigens. Bioinformatics. 2020; 36(10): 3185–3191. PubMed Abstract | Publisher Full Text | Free Full Text
Oyewole OE, Simon-Oke IA: Ecological risk factors of soil-transmitted helminths infections in Ifedore district, Southwest Nigeria. Bulletin of the National Research Centre. 2022; 46(1). Publisher Full Text
Panja M, Chakraborty T, Nadim SS, et al.: An ensemble neural network approach to forecast Dengue outbreak based on climatic condition.2022; 1–32.
Parashar R, Nanda S, Smith SL, et al.: Comparing priority received by global health issues: A measurement framework applied to tuberculosis, malaria, diarrhoeal diseases and dengue fever. BMJ Global Health. 2024; 9(7). PubMed Abstract | Publisher Full Text | Free Full Text
Ragno R: www.3d-qsar.com: a web portal that brings 3-D QSAR to all electronic devices—the Py-CoMFA web application as tool to build models from pre-aligned datasets. Journal of Computer-Aided Molecular Design. 2019; 33(9): 855–864. Publisher Full Text
Rahmat F, Zulkafli Z, Juraiza Ishak A, et al.: Exploratory Data Analysis and Artificial Neural Network for Prediction of Leptospirosis Occurrence in Seremban, Malaysia Based on Meteorological Data. Frontiers in Earth Science. 2020; 8(November): 1–14.
Ranathunge T, Harishchandra J, Maiga H, et al.: Development of the Sterile Insect Technique to control the dengue vector Aedes aegypti (Linnaeus) in Sri Lanka. PLoS One. 2022; 17(4 April): 1–15.
Rapin N, Lund O, Bernaschi M, et al.: Computational immunology meets bioinformatics: The use of prediction tools for molecular binding in the simulation of the immune system. PLoS One. 2010; 5(4): e9862. PubMed Abstract | Publisher Full Text | Free Full Text
Rives A, Meier J, Sercu T, et al.: Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences of the United States of America. 2021; 118(15). Publisher Full Text
Rout SS, Roy S, Mishra SN, et al.: Melioidosis and Burkholderia pseudomallei: Disease mechanisms, drug resistance, and treatment challenges. Journal of Integrative Medicine and Research. 2025; 3(1): 14–23. Publisher Full Text
Saba N, Balwan WK, Mushtaq F: Burden of Malaria - A Journey Revisited. Scholars Journal of Applied Medical Sciences. 2022; 10(6): 934–939. Publisher Full Text
Saleh AY, Medang SA, Ibrahim AO: Rabies Outbreak Prediction Using Deep Learning with Long Short-Term Memory. Advances in Intelligent Systems and Computing. 2020; 330–340.
dos Santos Nascimento IJ , da Silva Rodrigues ÉE , da Silva MF , et al.: Advances in Computational Methods to Discover New NS2B-NS3 Inhibitors Useful Against Dengue and Zika Viruses. Current Topics in Medicinal Chemistry. 2022; 22(29): 2435–2462. PubMed Abstract | Publisher Full Text
Sarkar-Tyson M, Titball RW: Burkholderia mallei and Burkholderia pseudomallei. Vaccines for Biodefense and Emerging and Neglected Diseases. Elsevier; 2009; 831–843. Publisher Full Text
Sato S: Plasmodium—a brief introduction to the parasites causing human malaria and their basic biology. Journal of Physiological Anthropology. 2021; 40(1): 1. BioMed Central Ltd. PubMed Abstract | Publisher Full Text | Free Full Text
Sayanthi Y, Susanna D: Pathogenic Leptospira contamination in the environment: a systematic review Infection Ecology and Epidemiology. 2024; 14(1). Taylor and Francis Ltd.PubMed Abstract | Publisher Full Text | Free Full Text
Scavuzzo CM, Scavuzzo JM, Campero MN, et al.: Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP. Infectious Disease Modelling. 2022; 7(1): 262–276. PubMed Abstract | Publisher Full Text | Free Full Text
Scoffone VC, Trespidi G, Barbieri G, et al.: Methodological tools to study species of the genus Burkholderia. Applied Microbiology and Biotechnology. 2021; 105(24): 9019–9034. Springer Science and Business Media Deutschland GmbH. PubMed Abstract | Publisher Full Text | Free Full Text
Selvarajoo S, Liew JWK, Chua TH, et al.: Dengue surveillance using gravid oviposition sticky (GOS) trap and dengue non-structural 1 (NS1) antigen test in Malaysia: randomized controlled trial. Scientific Reports. 2022; 12(1): 1–12. Publisher Full Text
Shao D, Dai Y, Li N, et al.: Artificial intelligence in clinical research of cancers. Briefings in Bioinformatics. 2022; 23(1): 1–12. Publisher Full Text
Singh J, Arora MS, Sharma S, et al.: Modeling the variable transmission rate and various discharges on the spread of Malaria. Electronic Research Archive. 2022; 31(1): 319–341. Publisher Full Text
Srisawat N, Thisyakorn U, Ismail Z, et al.: World Dengue Day: A call for action. PLoS Neglected Tropical Diseases. 2022; 16(8): 2–10. Publisher Full Text
Stärk H, Dallago C, Heinzinger M, et al.: Light attention predicts protein location from the language of life. Bioinformatics Advances. 2021; 1(1). PubMed Abstract | Publisher Full Text | Free Full Text
Sun AH, Liu XX, Yan J: Leptospirosis is an invasive infectious and systemic inflammatory disease. Biomedical Journal. 2020; 43(1): 24–31. PubMed Abstract | Publisher Full Text | Free Full Text
Suratanee A, Buaboocha T, Plaimas K: Prediction of Human-Plasmodium vivax Protein Associations From Heterogeneous Network Structures Based on Machine-Learning Approach. Bioinformatics and Biology Insights. 2021; 15: 117793222110133. PubMed Abstract | Publisher Full Text | Free Full Text
Syed AH, Khan T: Evolution of research trends in artificial intelligence for breast cancer diagnosis and prognosis over the past two decades: A bibliometric analysis. Frontiers in Oncology. 2022; 12. PubMed Abstract | Publisher Full Text | Free Full Text
Sykes JE, Reagan KL, Nally JE, et al.: Role of Diagnostics in Epidemiology, Management, Surveillance, and Control of Leptospirosis. Pathogens. 2022; 11(4): 1–24. Publisher Full Text
Tai KY, Dhaliwal J: Machine learning model for malaria risk prediction based on mutation location of large-scale genetic variation data. Journal of Big Data. 2022; 9(1). Publisher Full Text
Taylor B: Artificial Intelligence in Oncology Drug Discovery and Development. Artificial Intelligence in Oncology Drug Discovery and Development. 2020.
Thanapongtharm W, Kasemsuwan S, Wongphruksasoong V, et al.: Spatial Distribution and Population Estimation of Dogs in Thailand: Implications for Rabies Prevention and Control. Frontiers in Veterinary Science. 2021; 8(December): 1–12. Publisher Full Text
Torgerson PR, Hagan JE, Costa F, et al.: Global Burden of Leptospirosis: Estimated in Terms of Disability Adjusted Life Years. PLoS Neglected Tropical Diseases. 2015; 9(10): e0004122. PubMed Abstract | Publisher Full Text | Free Full Text
Torgerson PR, Hagan JE, Costa F, et al.: Global Burden of Leptospirosis: Estimated in Terms of Disability Adjusted Life Years. PLoS Neglected Tropical Diseases. 2015; 9(10): e0004122. PubMed Abstract | Publisher Full Text | Free Full Text
Tsheten T, Gray DJ, Clements ACA, et al.: Epidemiology and challenges of dengue surveillance in the WHO South-East Asia Region. Transactions of the Royal Society of Tropical Medicine and Hygiene. 2021; 115(6): 583–599. Publisher Full Text
Urbanskas E, Karvelienė B, Radzijevskaja J: Leptospirosis: classification, epidemiology, and methods of detection. A review. Biologija. 2022; 68(2): 129–136. Publisher Full Text
Vegvari C, Giardina F, Malizia V, et al.: Impact of Key Assumptions about the Population Biology of Soil-Transmitted Helminths on the Sustainable Control of Morbidity. Clinical Infectious Diseases. 2021; 72: S188–S194. PubMed Abstract | Publisher Full Text | Free Full Text
Vinkeles Melchers NVS, Stolk WA, van Loon W , et al.: The burden of skin disease and eye disease due to onchocerciasis in countries formerly under the african programme for onchocerciasis control mandate for 1990, 2020, and 2030. PLoS Neglected Tropical Diseases. 2021; 15(7): 1–18. PubMed Abstract | Publisher Full Text | Free Full Text
Vos T, Lim SS, Abbafati C, et al.: Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. The Lancet. 2020; 396(10258): 1204–1222. PubMed Abstract | Publisher Full Text | Free Full Text
Vu MH, Akbar R, Robert PA, et al.: Advancing protein language models with linguistics: a roadmap for improved interpretability. arXiv. 2022; 1–26.
Wainaina M, Wasonga J, Cook EAJ: Epidemiology of human and animal leptospirosis in Kenya: A systematic review and meta-analysis of disease occurrence, serogroup diversity and risk factors. PLOS Neglected Tropical Diseases. 2024; 18(9): e0012527. PubMed Abstract | Publisher Full Text | Free Full Text
Ward P, Dahlberg P, Lagatie O, et al.: Affordable artificial intelligence-based digital pathology for neglected tropical diseases: A proof-of-concept for the detection of soil-transmitted helminths and Schistosoma mansoni eggs in Kato-Katz stool thick smears. PLoS Neglected Tropical Diseases. 2022; 16(6): 1–16. Publisher Full Text
Winkler AS, Klohe K, Schmidt V, et al.: Neglected tropical diseases – the present and the future. Tidsskrift for Den Norske Legeforening (born 1971). 2018; 138. PubMed Abstract | Publisher Full Text | Free Full Text
World Health Organization: Global Health Estimates 2020: Disease burden by Cause, Age, Sex, by Country and by Region.2020; 2000–2019.
World Health Organization: Tenth report of the Strategic and Technical Advisory Group for Neglected Tropical Diseases (STAG-NTDs).2017.
World Health Organization: Virtual Meeting of Regional Technical Advisory Group for dengue and other arbovirus diseases (October).2021; pp. 4–6.
World Health Organization: First WHO report on neglected tropical diseases: working to overcome the global impact of neglected tropical diseases. World Health Organization; 2010; 1–184.
World Health Organization: Global programme to eliminate lymphatic filariasis: progress report, 2021. WHO Weekly Epidemiological Record; 2022b.
World Health Organization: Taenia solium - Use of existing diagnostic tools in public health programmes. World Health Organization; 2022a.
World Organisation for Animal Health: Rabies Technical Disease Information.2008; pp. 1–4.
Xu K, Lian F, Quan Y, et al.: Septicemic Melioidosis Detection Using Support Vector Machine with Five Immune Cell Types. Disease Markers. 2021; 2021: 1–9. PubMed Abstract | Publisher Full Text | Free Full Text
Yajima A, Ichimori K: Progress in the elimination of lymphatic filariasis in the Western Pacific Region: Successes and challenges. International Health. 2021; 13: S10–S16. Publisher Full Text
You Y, Lai X, Pan Y, et al.: Artificial intelligence in cancer target identification and drug discovery. Signal Transduction and Targeted Therapy. 2022; 7(1): 1–24. Publisher Full Text
Zeynudin A, Degefa T, Tesfaye M, et al.: Prevalence and intensity of soil-Transmitted helminth infections and associated risk factors among household heads living in the peri-urban areas of Jimma town, Oromia, Ethiopia: A community-based cross-sectional study. PLoS One. 2022; 17(9 September): 1–17.
Zhang Q, Yang L, Zhou F: Attention enhanced long short-term memory network with multi-source heterogeneous information fusion: An application to BGI Genomics. Information Sciences. 2021; 553: 305–330. Publisher Full Text
Zhou G, Chen M, Ju CJT, et al.: Mutation effect estimation on protein–protein interactions using deep contextualized representation learning. NAR Genomics and Bioinformatics. 2020; 2(2): 1–12. Publisher Full Text

Competing interests

No competing interests were disclosed.

Grant information

The authors acknowledge the Ministry of Higher Education, Malaysia for the financial support through Fundamental Research Grant Scheme (FRGS) funding (FRGS/1/2019/STG05/UKM/03/1) awarded to Norfarhan Mohd-Assaad. The APC was partially funded by Universiti Kebangsaan Malaysia (GGPM-2019-042).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Article Versions (3)

version 3

Revised

Published: 17 Jul 2025, 12:287

https://doi.org/10.12688/f1000research.129064.3

version 2

Revised

Published: 20 May 2025, 12:287

https://doi.org/10.12688/f1000research.129064.2

version 1

Published: 15 Mar 2023, 12:287

https://doi.org/10.12688/f1000research.129064.1

© 2025 Khew C et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

Khew C, Akbar R and Mohd-Assaad N. Progress and challenges for the application of machine learning for neglected tropical diseases [version 3; peer review: 3 approved]. F1000Research 2025, 12:287 (https://doi.org/10.12688/f1000research.129064.3)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Version 3

Published 17 Jul 2025 (revised)

Reviewer Report 23 Jul 2025

Erma Sulistyaningsih, Department of Parasitology, University of Jember, Jawa Timur, Indonesia

Approved

https://doi.org/10.5256/f1000research.184903.r398896

Dear authors,

Thank you ...

Reviewer Report 23 Jul 2025

Karla P. Godinez-Macias, University of California San Diego, San Diego, California, USA

Approved

https://doi.org/10.5256/f1000research.184903.r398897

The authors addressed this author's ...

Version 2

Published 20 May 2025 (revised)

Reviewer Report 09 Jul 2025

Karla P. Godinez-Macias, University of California San Diego, San Diego, California, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.179710.r394708

In this review, Khew et al. describe the use of machine learning in neglected tropical disease (NTD) and how it can be utilized to improve surveillance and treatment of neglected tropical diseases. This review provides a cohesive overview of neglected diseases defined by WHO, machine learning advances in various fields and how it has been applied to neglected diseases surveillance. Authors also discuss the current implications of machine learning, and thoroughly describe the various AI models available and what their outcomes are in terms of disease understanding and advances.

In this reviewer's opinion, the suggested minor changes are:

1. It would be helpful to provide a few examples to support some statements throughout the manuscript. For example, it is not clear which diseases contribute to “149 countries and with more than 1.7 billion of infected individuals”.
2. The line “thus greatly contributing to the spread of NTDs among women and children” (under the DALY impact section) suggests that NTDs only affect women and children. Please rephrase to improve clarity and provide the corresponding citation.
3. Please consider replacing some casual phrasing (e.g., runner-ups, picked up) with formal wording. This will drastically improve the tone and clarity throughout the text.
4. There are several minor grammatical errors throughout the text; please revise and correct them. Also, please check for consistency in the use of parentheses and other punctuation markers (e.g., roman bullet points).
5. The second paragraph under “application of machine learning tools for NTDs” describes the literature of the past five years but reports results from 2017-2022 (8 years ago). I suggest revisiting this paragraph because the literature has changed since that period. It will also be helpful to confirm the searches still hold; the use of machine learning has become very popular and some claims in this paragraph need to be adjusted.
6. Could you please mention which NTDs do not have published studies with ML-predicted drug targets (first paragraph under comparison with ML application in cancer research, …)? Also, could you please update the number of studies in the repositories to reflect the past 5 years (search between 2020 – 2025)?
7. Please add the missing citations to databases/datasets mentioned throughout the manuscript, if available. This is important to acknowledge the contributions of the original authors.
8. The conclusion describes the current implications of ML. It would be helpful to add a paragraph summarizing the main points of this review (e.g., current disease status, ongoing work to tackle them), as well as some of the current and potential limitations of using ML in neglected diseases.

Is the topic of the review discussed comprehensively in the context of the current literature? Yes

Are all factual statements correct and adequately supported by citations? Partly

Is the review written in accessible language? Yes

Are the conclusions drawn appropriate in the context of the current research literature? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: bioinformatics, neglected diseases, drug discovery

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 09 Aug 2025

Ethan Khew, Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Malaysia

Response to reviewer on the suggested minor changes.

1. We thank the reviewer for this important observation. We acknowledge that the statement on global burden lacked immediate clarity. The figure referring to “149 countries and more than 1.7 billion infected individuals” is further contextualized in the following sentence, where we introduce the term "neglected tropical diseases" (NTDs) as defined by Peter Hotez and colleagues.

2. We thank the reviewer for highlighting this important point. The sentence under the DALY impact section has been revised to clarify that while NTDs affect all populations, women and children face increased risk due to environmental and socioeconomic factors. The revised sentence also includes an appropriate citation. This change has been implemented in the updated manuscript to improve clarity and accuracy.

3. Thank you for your helpful feedback. We have revised the relevant sentences to remove informal phrasing such as “runner-ups” and “picked up,” replacing them with more formal and precise language. These changes have been made to improve the overall tone and clarity of the manuscript, as suggested.

4. Thank you for pointing this out. We have carefully revised the manuscript to correct minor grammatical errors throughout the text. Additionally, we have reviewed and standardized the use of parentheses, punctuation marks, and formatting for Roman numeral bullet points to ensure consistency and alignment with academic writing conventions. We appreciate your attention to these important details.

5. We appreciate the reviewer’s observation regarding the inconsistency in the time frame of the literature review. In response, we have revised the paragraph under the “Application of Machine Learning Tools for NTDs” section to reflect that literature mining was conducted from 2023 onwards, focusing on studies published from 2019 to the present. This updated review ensures that recent developments, particularly those related to the growing use of machine learning in biomedical research, are accurately represented. The revised paragraph now includes specific examples of relevant studies and clarifies the rationale for the selected timeframe.

6. Thank you for your thoughtful comment. Upon re-evaluation, we agree that the referenced sentence is no longer essential to the clarity or value of the discussion. To improve the focus and flow of the paragraph, we have removed this sentence entirely. We believe the paragraph reads more cohesively without it and better aligns with the updated scope and timeframe of our literature review. Also, literature mining and data extraction were conducted starting in 2023, using a framework designed to capture publications from 2019 onward. This time window was selected to ensure consistency between the search methodology, dataset compilation, and subsequent analysis. Updating the repository counts specifically for the 2020–2025 period would introduce a temporal mismatch with the data pipeline and may not accurately reflect the framework used throughout the study. We believe the current approach offers a representative and coherent snapshot of recent developments in machine learning applications for NTDs.

7. Thank you for your thoughtful comment. We have carefully reviewed the manuscript and ensured that all databases and datasets mentioned are now properly cited and acknowledged, where citations are available. We understand the importance of recognizing the contributions of original authors and data providers, and we have checked this aspect thoroughly across multiple rounds of revision. Please be assured that all relevant sources have been appropriately credited in the updated manuscript.

8. We thank the reviewer for this valuable suggestion. While the current disease status and ongoing efforts to address them were originally discussed in the section titled "On regional collaboration, data, and infrastructure sharing", we recognize the importance of reiterating these points in the conclusion for clarity and emphasis. In response, we have updated the conclusion section to include a concise summary of disease burden, ongoing regional initiatives, and the potential and limitations of applying machine learning to neglected tropical diseases. We appreciate the reviewer’s input in helping us improve the structure and completeness of the manuscript.

Competing Interests: No competing interests.

Version 1

Published 15 Mar 2023

Reviewer Report 15 May 2024

Erma Sulistyaningsih, Department of Parasitology, University of Jember, Jawa Timur, Indonesia

Approved with Reservations

https://doi.org/10.5256/f1000research.141719.r215469

I found that some references are publications more than five years old (2015, 2012, 2013, 2016).

The manuscript is about the application of machine learning for neglected tropical diseases (NTDs). The authors already list the 20 NTDs recognized by WHO (Table 1); however, they also write about malaria in the next section. Please focus the discussion on the 20 NTDs.

Is the topic of the review discussed comprehensively in the context of the current literature? Partly

Are all factual statements correct and adequately supported by citations? Yes

Is the review written in accessible language? Yes

Are the conclusions drawn appropriate in the context of the current research literature? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Molecular parasitology, molecular medicine

Author Response 28 Jun 2024

Ethan Khew, Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Malaysia

Dear Erma Sulistyaningsih,

Thank you very much for your valuable feedback on our manuscript. We appreciate the time and effort you have taken to review our work and provide constructive comments.

Regarding the use of references older than five years, we selected these sources intentionally as many of them are seminal works or original studies that first addressed key issues (e.g., global distribution burden studies). These foundational papers are frequently cited by more recent publications, and we believe it is more ethical and academically honest to acknowledge the original sources. However, we understand the importance of including up-to-date references, and we will review our citations to ensure a balanced representation of both seminal and recent works. We will also add perspectives from recent publications, but we have noted that there is a lack of new research or evaluation on certain topics (such as disease burden), leading us to rely on older, yet still relevant, publications.

Concerning the inclusion of malaria in our discussion, we recognize that this might appear to deviate from the list of 20 NTDs recognized by the WHO. Our review is specifically focused on the Southeast Asia and Western Pacific regions, where there is a limited number of machine learning-based studies on the recognized NTDs. To provide a comprehensive overview, we expanded our scope to include other significant diseases such as malaria and melioidosis, which are of considerable concern in these regions. Both diseases, we believe, merit attention and potential inclusion in the NTD list due to their impact. We apologize for any confusion this may have caused and are open to amending the manuscript to clarify our rationale and align more closely with the recognized NTDs if that would address the concern.

We hope this clarifies our decisions and are ready to make necessary adjustments to improve the manuscript once we hear back from you. Thank you once again for your insightful feedback.

Best regards,
Ethan Khew
Dear Erma Sulistyaningsih,

Thank you very much for your valuable feedback on our manuscript. We appreciate the time and effort you have taken to review our work and provide constructive comments.

Regarding the use of references older than five years, we selected these sources intentionally as many of them are seminal works or original studies that first addressed key issues (e.g., global distribution burden studies). These foundational papers are frequently cited by more recent publications, and we believe it is more ethical and academically honest to acknowledge the original sources. However, we understand the importance of including up-to-date references, and we will review our citations to ensure a balanced representation of both seminal and recent works. We will also add perspectives from recent publications, but we have noted that there is a lack of new research or evaluation on certain topics (such as disease burden), leading us to rely on older, yet still relevant, publications.

Concerning the inclusion of malaria in our discussion, we recognize that this might appear to deviate from the list of 20 NTDs recognized by the WHO. Our review is specifically focused on the Southeast Asia and Western Pacific regions, where there is a limited number of machine learning-based studies on the recognized NTDs. To provide a comprehensive overview, we expanded our scope to include other significant diseases such as malaria and melioidosis, which are of considerable concern in these regions. Both diseases, we believe, merit attention and potential inclusion in the NTD list due to their impact. We apologize for any confusion this may have caused and are open to amending the manuscript to clarify our rationale and align more closely with the recognized NTDs if that would address the concern.

We hope this clarifies our decisions and are ready to make necessary adjustments to improve the manuscript once we hear back from you. Thank you once again for your insightful feedback.

Best regards,
Ethan Khew
Competing Interests: No competing interests. Close
Report a concern
Respond or Comment

COMMENTS ON THIS REPORT

Author Response 28 Jun 2024

Ethan Khew, Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Malaysia

28 Jun 2024

Author Response

Dear Erma Sulistyaningsih,

Thank you very much for your valuable feedback on our manuscript. We appreciate the time and effort you have taken to review our work and provide ... Continue reading Dear Erma Sulistyaningsih,

Thank you very much for your valuable feedback on our manuscript. We appreciate the time and effort you have taken to review our work and provide constructive comments.

Regarding the use of references older than five years, we selected these sources intentionally as many of them are seminal works or original studies that first addressed key issues (e.g., global distribution burden studies). These foundational papers are frequently cited by more recent publications, and we believe it is more ethical and academically honest to acknowledge the original sources. However, we understand the importance of including up-to-date references, and we will review our citations to ensure a balanced representation of both seminal and recent works. We will also add perspectives from recent publications, but we have noted that there is a lack of new research or evaluation on certain topics (such as disease burden), leading us to rely on older, yet still relevant, publications.

Concerning the inclusion of malaria in our discussion, we recognize that this might appear to deviate from the list of 20 NTDs recognized by the WHO. Our review is specifically focused on the Southeast Asia and Western Pacific regions, where there is a limited number of machine learning-based studies on the recognized NTDs. To provide a comprehensive overview, we expanded our scope to include other significant diseases such as malaria and melioidosis, which are of considerable concern in these regions. Both diseases, we believe, merit attention and potential inclusion in the NTD list due to their impact. We apologize for any confusion this may have caused and are open to amending the manuscript to clarify our rationale and align more closely with the recognized NTDs if that would address the concern.

We hope this clarifies our decisions and are ready to make necessary adjustments to improve the manuscript once we hear back from you. Thank you once again for your insightful feedback.

Best regards,
Ethan Khew
Competing Interests: No competing interests.

Reviewer Report 10 Oct 2023

Anupam Nath Jha, Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam, India

Approved

https://doi.org/10.5256/f1000research.141719.r208370

The authors have written about different NTDs and machine learning techniques in the review article.

Various research works on neglected tropical diseases have been carried out and reviewed (e.g. Jha, 2023¹). Here the authors list all such diseases and explain their causative agents. Further, the application of advanced machine learning approaches to drug discovery and development for such diseases is elaborated.

It is good to read articles about the present status of NTDs, but the authors are encouraged to add the future scope of different computational approaches in the surveillance, management and treatment of these diseases. The limitations faced by people working in this field should also be mentioned, because limited data is not the only problem in handling NTDs.

Is the topic of the review discussed comprehensively in the context of the current literature?

Yes
Are all factual statements correct and adequately supported by citations?

Partly
Is the review written in accessible language?

Yes
Are the conclusions drawn appropriate in the context of the current research literature?

Partly

References

1. Jha A: Editorial: Computational approaches to build therapeutic paradigms targeting genes, proteins and pathways against neglected tropical diseases (NTDs). Frontiers in Genetics. 2023; 14.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Computational Biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.


Version 3

VERSION 3 PUBLISHED 15 Mar 2023

Open Peer Review

Reviewer Status

Reviewer Reports

Invited Reviewers

Version 3 (revision), 17 Jul 25: reports from reviewers 2 and 3
Version 2 (revision), 20 May 25: report from reviewer 3
Version 1, 15 Mar 23: reports from reviewers 1 and 2

1. Anupam Nath Jha, Tezpur University, Tezpur, India
2. Erma Sulistyaningsih, University of Jember, Jawa Timur, Indonesia
3. Karla P. Godinez-Macias, University of California San Diego, San Diego, USA


Reviewer Report

23 Jul 2025 | for Version 3

Erma Sulistyaningsih, Department of Parasitology, University of Jember, Jawa Timur, Indonesia

Approved

Dear authors,

Thank you for your clarification and added comprehensive discussion on NTDs.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

molecular tropical medicine especially in parasitology.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

23 Jul 2025 | for Version 3

Karla P. Godinez-Macias, University of California San Diego, San Diego, California, USA

Approved

The authors addressed this author's concerns. No additional changes are needed.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

bioinformatics, neglected diseases, drug discovery

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

09 Jul 2025 | for Version 2

Karla P. Godinez-Macias, University of California San Diego, San Diego, California, USA

Approved With Reservations

1. It would be helpful to provide a few examples to support some statements throughout the manuscript. For example, it is not clear which diseases contribute to “149 countries and with more than 1.7 billion of infected individuals”.
2. The line “thus greatly contributing to the spread of NTDs among women and children” (under the DALY impact section) suggests that NTDs only affect women and children. Please rephrase to improve clarity and provide the corresponding citation.
3. Please consider replacing casual phrasing (e.g., runner-ups, picked up) with formal wording. This will drastically improve the tone and clarity throughout the text.
4. There are several minor grammatical errors throughout the text; please revise and correct them. Also, please check for consistent use of parentheses and other punctuation markers (e.g., roman bullet points).
5. The second paragraph under “application of machine learning tools for NTDs” describes the literature of the past five years but reports results from 2017-2022 (8 years ago); I suggest revisiting this paragraph because the literature has changed since that period. It would also be helpful to confirm that the searches still hold; the use of machine learning has become very popular and some claims in this paragraph need to be adjusted.
6. Could you please mention which NTDs do not have published studies with ML-predicted drug targets (first paragraph under comparison with ML application in cancer research, …)? Also, could you please update the number of studies in the repositories to reflect the past 5 years (search between 2020 and 2025)?
7. Please add the missing citations to databases/datasets mentioned throughout the manuscript, if available. This is important to acknowledge the contributions of the original authors.
8. The conclusion describes the current implications of ML. It would be helpful to add a paragraph summarizing the main points of this review (e.g., current disease status, ongoing work to tackle these diseases), as well as some of the current and potential limitations of using ML in neglected diseases.

Is the topic of the review discussed comprehensively in the context of the current literature?

Yes
Are all factual statements correct and adequately supported by citations?

Partly
Is the review written in accessible language?

Yes
Are the conclusions drawn appropriate in the context of the current research literature?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

bioinformatics, neglected diseases, drug discovery


Author Response

09 Aug 2025

Ethan Khew, Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Malaysia

Response to reviewer on the suggested minor changes.

1. We thank the reviewer for this important observation. We acknowledge that the statement on global burden lacked immediate clarity. The figure referring to “149 countries and more than 1.7 billion infected individuals” is further contextualized in the following sentence, where we introduce the term "neglected tropical diseases" (NTDs) as defined by Peter Hotez and colleagues.

2. We thank the reviewer for highlighting this important point. The sentence under the DALY impact section has been revised to clarify that while NTDs affect all populations, women and children face increased risk due to environmental and socioeconomic factors. The revised sentence also includes an appropriate citation. This change has been implemented in the updated manuscript to improve clarity and accuracy.

3. Thank you for your helpful feedback. We have revised the relevant sentences to remove informal phrasing such as “runner-ups” and “picked up,” replacing them with more formal and precise language. These changes have been made to improve the overall tone and clarity of the manuscript, as suggested.

4. Thank you for pointing this out. We have carefully revised the manuscript to correct minor grammatical errors throughout the text. Additionally, we have reviewed and standardized the use of parentheses, punctuation marks, and formatting for Roman numeral bullet points to ensure consistency and alignment with academic writing conventions. We appreciate your attention to these important details.

5. We appreciate the reviewer’s observation regarding the inconsistency in the time frame of the literature review. In response, we have revised the paragraph under the “Application of Machine Learning Tools for NTDs” section to reflect that literature mining was conducted from 2023 onwards, focusing on studies published from 2019 to the present. This updated review ensures that recent developments, particularly those related to the growing use of machine learning in biomedical research, are accurately represented. The revised paragraph now includes specific examples of relevant studies and clarifies the rationale for the selected timeframe.

6. Thank you for your thoughtful comment. Upon re-evaluation, we agree that the referenced sentence is no longer essential to the clarity or value of the discussion. To improve the focus and flow of the paragraph, we have removed this sentence entirely. We believe the paragraph reads more cohesively without it and better aligns with the updated scope and timeframe of our literature review. Also, literature mining and data extraction were conducted starting in 2023, using a framework designed to capture publications from 2019 onward. This time window was selected to ensure consistency between the search methodology, dataset compilation, and subsequent analysis. Updating the repository counts specifically for the 2020–2025 period would introduce a temporal mismatch with the data pipeline and may not accurately reflect the framework used throughout the study. We believe the current approach offers a representative and coherent snapshot of recent developments in machine learning applications for NTDs.

7. Thank you for your thoughtful comment. We have carefully reviewed the manuscript and ensured that all databases and datasets mentioned are now properly cited and acknowledged, where citations are available. We understand the importance of recognizing the contributions of original authors and data providers, and we have checked this aspect thoroughly across multiple rounds of revision. Please be assured that all relevant sources have been appropriately credited in the updated manuscript.

8. We thank the reviewer for this valuable suggestion. While the current disease status and ongoing efforts to address them were originally discussed in the section titled "On regional collaboration, data, and infrastructure sharing", we recognize the importance of reiterating these points in the conclusion for clarity and emphasis. In response, we have updated the conclusion section to include a concise summary of disease burden, ongoing regional initiatives, and the potential and limitations of applying machine learning to neglected tropical diseases. We appreciate the reviewer’s input in helping us improve the structure and completeness of the manuscript.


Competing Interests

No competing interests.

Reviewer Report

15 May 2024 | for Version 1

Erma Sulistyaningsih, Department of Parasitology, University of Jember, Jawa Timur, Indonesia

Approved With Reservations

Is the topic of the review discussed comprehensively in the context of the current literature?

Partly
Are all factual statements correct and adequately supported by citations?

Yes
Is the review written in accessible language?

Yes
Are the conclusions drawn appropriate in the context of the current research literature?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Molecular parasitology, molecular medicine



Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - a number of small changes, and sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

[1] Abdullah M, Kadivella M, Sharma R, et al.: Designing of multiepitope-based vaccine against Leptospirosis using Immuno-Informatics approaches. bioRxiv. 2021. 2021.02.22.431920.

[2] Abela-Ridder B, Biswas G, Mbabazi PS, et al.: Ending the neglect to attain the sustainable development goals: a road map for neglected tropical diseases 2021–2030. Who. 2020.

[3] Afolabi MO, Adebiyi A, Cano J, et al.: Prevalence and distribution pattern of malaria and soil-transmitted helminth co-endemicity in sub-Saharan Africa, 2000–2018: A geospatial analysis. PLOS Neglected Tropical Diseases. 2022; 16(9): e0010321. PubMed Abstract | Publisher Full Text | Free Full Text

[4] Agampodi S, Gunarathna S, Lee JS, et al.: Global, regional, and country-level cost of leptospirosis due to loss of productivity in humans. PLoS Neglected Tropical Diseases. 2023; 17(8 August): e0011291. PubMed Abstract | Publisher Full Text | Free Full Text

[5] Aguilera-Pesantes D, Robayo LE, Méndez PE, et al.: Discovering key residues of dengue virus NS2b-NS3-protease: New binding sites for antiviral inhibitors design. Biochemical and Biophysical Research Communications. 2017; 492(4): 631–642. Publisher Full Text

[6] Ahangarcani M, Farnaghi M, Shirzadi MR, et al.: Predictive risk mapping of human leptospirosis using support vector machine classification and multilayer perceptron neural network. Geospatial Health. 2019; 14(1). PubMed Abstract | Publisher Full Text

[7] Akbar R, Bashour H, Rawat P, et al.: Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies. mAbs. 2022; 14(1). PubMed Abstract | Publisher Full Text | Free Full Text

[8] Akbar R, Robert PA, Pavlović M, et al.: A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding. Cell Reports. 2021; 34(11): 108856. PubMed Abstract | Publisher Full Text

[9] Alemar Ali J: Prevention and Control of Rabies in Animals and Humans in Ethiopia.2022. Publisher Full Text

[10] Alemu Y, Degefa T, Bajiro M, et al.: Prevalence and intensity of soil-transmitted helminths infection among individuals in model and non-model households, South West Ethiopia: A comparative cross-sectional community based study. PLoS One. 2022; 17(10): e0276137. PubMed Abstract | Publisher Full Text | Free Full Text

[11] Alfred R, Obit JH: The roles of machine learning methods in limiting the spread of deadly diseases: A systematic review. Heliyon. 2021; 7(6): e07371. PubMed Abstract | Publisher Full Text | Free Full Text

[12] Ali SA, Niaz S, Aguilar-Marcelino L, et al.: Prevalence of Ascaris lumbricoides in contaminated faecal samples of children residing in urban areas of Lahore, Pakistan. Scientific Reports. 2020; 10(1): 1–8. Publisher Full Text

[13] Alley EC, Khimulya G, Biswas S, et al.: Unified rational protein engineering with sequence-based deep representation learning. Nature Methods. 2019; 16(12): 1315–1322. PubMed Abstract | Publisher Full Text | Free Full Text

[14] Alqahtani A: Application of Artificial Intelligence in Discovery and Development of Anticancer and Antidiabetic Therapeutic Agents. Evidence-based Complementary and Alternative Medicine. 2022; 2022.

[15] Andre-Fontaine G, Aviat F, Thorin C: Waterborne Leptospirosis: Survival and Preservation of the Virulence of Pathogenic Leptospira spp. in Fresh Water. Current Microbiology. 2015; 71(1): 136–142. PubMed Abstract | Publisher Full Text

[16] Ardila D, Kiraly AP, Bharadwaj S, et al.: End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nature Medicine. 2019; 25(6): 954–961.

[17] Azevedo IR, Amamura TA, Isaac L: Human leptospirosis: In search for a better vaccine. Scandinavian Journal of Immunology. 2023; 98(5). John Wiley and Sons Inc. Publisher Full Text

[18] Baek M, DiMaio F, Anishchenko I, et al.: Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021; 373(6557): 871–876. PubMed Abstract | Publisher Full Text | Free Full Text

[19] Bepler T, Berger B: Learning the protein language: Evolution, structure, and function. Cell Systems. 2021; 12(6): 654–669.e3.

[20] Bergner LM, Mollentze N, Orton RJ, et al.: Characterizing and evaluating the zoonotic potential of novel viruses discovered in vampire bats. Viruses. 2021; 13(2). PubMed Abstract | Publisher Full Text | Free Full Text

[21] Bertsimas D, Wiberg H: Machine Learning in Oncology: Methods, Applications, and Challenges. JCO Clinical Cancer Informatics. 2020; 4(4): 885–894. PubMed Abstract | Publisher Full Text | Free Full Text

[22] Birnie E, Virk HS, Savelkoel J, et al.: Global burden of melioidosis in 2015: a systematic review and data synthesis. The Lancet Infectious Diseases. 2019; 19(8): 892–902. PubMed Abstract | Publisher Full Text | Free Full Text

[23] Bizhani N, Hashemi Hafshejani S, Mohammadi N, et al.: Lymphatic filariasis in Asia: a systematic review and meta-analysis. Parasitology Research. 2021 Feb; 120(2): 411–422. Springer Science and Business Media Deutschland GmbH. PubMed Abstract | Publisher Full Text | Free Full Text

[24] Boundenga L, Sima-Biyang YV, Longo-Pendy NM, et al.: Epidemiology and diversity of Plasmodium species in Franceville and their implications for malaria control. Scientific Reports. 2024; 14(1). PubMed Abstract | Publisher Full Text | Free Full Text

[25] Bradley EA, Lockaby G: Leptospirosis and the Environment: A Review and Future Directions. Pathogens. 2023; 12(9). Multidisciplinary Digital Publishing Institute (MDPI). PubMed Abstract | Publisher Full Text | Free Full Text

[26] Butala C, Brook TM, Majekodunmi AO, et al.: Neurocysticercosis: Current Perspectives on Diagnosis and Management. Frontiers in Veterinary Science. 2021; 8(May): 1–10.

[27] Bzdyl NM, Moran CL, Bendo J, et al.: Pathogenicity and virulence of Burkholderia pseudomallei. Virulence. 2022; 13(1): 1945–1965. PubMed Abstract | Publisher Full Text | Free Full Text

[28] Caldrer S, Ursini T, Santucci B, et al.: Soil-Transmitted Helminths and Anaemia: A Neglected Association Outside the Tropics. Microorganisms. 2022 May 13; 10(5): 1027. MDPI.PubMed Abstract | Publisher Full Text | Free Full Text

[29] Charoenkwan P, Schaduangrat N, Lio P, et al.: iAMAP-SCM: A Novel Computational Tool for Large-Scale Identification of Antimalarial Peptides Using Estimated Propensity Scores of Dipeptides. ACS Omega. 2022.

[30] Chen SJ, Rai CI, Wang SC, et al.: Infection and Prevention of Rabies Viruses. Microorganisms. 2025 Feb 9; 13(2): 380. Multidisciplinary Digital Publishing Institute (MDPI). PubMed Abstract | Publisher Full Text | Free Full Text

[31] Cheng AC, Currie BJ: Melioidosis: Epidemiology, Pathophysiology, and Management. Clinical Microbiology Reviews. 2005; 18(2): 383–416. PubMed Abstract | Publisher Full Text | Free Full Text

[32] Choi Y: Artificial intelligence for antibody reading comprehension: AntiBERTa. Patterns. 2022; 3(7): 100535. PubMed Abstract | Publisher Full Text | Free Full Text

[33] Chong NS, Hardwick RJ, Smith SR, et al.: A prevalence-based transmission model for the study of the epidemiology and control of soil-transmitted helminthiasis. PLoS One. 2022; 17(8 August): 1–28. Publisher Full Text

[34] Chowdhury R, Bouatta N, Biswas S, et al.: Single-sequence protein structure prediction using language models from deep learning. AIChE Annual Meeting, Conference Proceedings. 2021-Novem.

[35] Christophers SR: History of Malaria. British Medical Journal. 1951; 1(4711): 865–866. Publisher Full Text

[36] Condori RE, Niezgoda M, Lopez G, et al.: Using the LN34 pan-lyssavirus real-time RT-PCR assay for rabies diagnosis and rapid genetic typing from formalin-fixed human brain tissue. Viruses. 2020; 12(1). PubMed Abstract | Publisher Full Text | Free Full Text

[37] Côrtes N, Lira A, Prates-Syed W, et al.: Integrated control strategies for dengue, Zika, and Chikungunya virus infections. Frontiers in Immunology. 2023; 14. Publisher Full Text

[38] Costa F, Hagan JE, Calcagno J, et al.: Global Morbidity and Mortality of Leptospirosis: A Systematic Review. PLoS Neglected Tropical Diseases. 2015; 9(9): 0–1. Publisher Full Text

[39] Dacal E, Bermejo-Peláez D, Lin L, et al.: Mobile microscopy and telemedicine platform assisted by deep learning for the quantification of Trichuris trichiura infection. PLoS Neglected Tropical Diseases. 2021; 15(9): 1–14. Publisher Full Text

[40] Dara S, Dhamercherla S, Jadav SS, et al.: Machine Learning in Drug Discovery: A Review. Artificial Intelligence Review. Netherlands: Springer; 2022; 55. : 1947–1999. PubMed Abstract | Publisher Full Text | Free Full Text

[41] De Castro Poncio L, Dos Anjos FA, De Oliveira DA, et al.: Novel Sterile Insect Technology Program Results in Suppression of a Field Mosquito Population and Subsequently to Reduced Incidence of Dengue. Journal of Infectious Diseases. 2021; 224(6): 1005–1014. PubMed Abstract | Publisher Full Text

[42] Dey SK, Rahman MM, Howlader A, et al.: Prediction of dengue incidents using hospitalized patients, metrological and socioeconomic data in Bangladesh: A machine learning approach. PLoS One. 2022; 17(7 July): 1–17.

[43] Dickson BFR, Masson JJR, Mayfield HJ, et al.: Bayesian Network Analysis of Lymphatic Filariasis Serology from Myanmar Shows Benefit of Adding Antibody Testing to Post-MDA Surveillance. Tropical Medicine and Infectious Disease. 2022; 7(7). PubMed Abstract | Publisher Full Text | Free Full Text

[44] Douglass J, Graves P, Lindsay D, et al.: Lymphatic filariasis increases tissue compressibility and extracellular fluid in lower limbs of asymptomatic young people in central Myanmar. Tropical Medicine and Infectious Disease. 2017; 2(4): 1–14. Publisher Full Text

[45] Easton AV, Raciny-Aleman M, Liu V, et al.: Immune Response and Microbiota Profiles during Coinfection with Plasmodium vivax and Soil-Transmitted Helminths. mBio. 2020; 11(5): 1–17. Publisher Full Text

[46] Elumalai E: Characterization and Prediction of Dengue Virus targeting peptides based on three class of descriptors using k-NN and Random Forest.2022; pp. 1–17.

[47] Elvana A, Suryanto ED: Lymphatic Filariasis Detection Using Image Analysis.2022.

[48] Famakinde DO: Mosquitoes and the Lymphatic Filarial Parasites: Research Trends and Budding Roadmaps to Future Disease Eradication. Tropical Medicine and Infectious Disease. 2018; 3(1). PubMed Abstract | Publisher Full Text | Free Full Text

[49] Galipó E, Dixon MA, Fronterrè C, et al.: Spatial distribution and risk factors for human cysticercosis in Colombia. Parasites and Vectors. 2021; 14(1): 1–15. Publisher Full Text

[50] Garcia HH, Gonzalez AE, Gilman RH: Taenia solium Cysticercosis and Its Impact in Neurological Disease. 2020. Publisher Full Text

[51] Gassiep I, Bauer MJ, Harris PNA, et al.: Laboratory Safety: Handling Burkholderia pseudomallei Isolates without a Biosafety Cabinet. 2021.

[52] Geoffrey B, Sanker A, Madaj R, et al.: A program to automate the discovery of drugs for West Nile and Dengue virus—programmatic screening of over a billion compounds on PubChem, generation of drug leads and automated in silico modelling. Journal of Biomolecular Structure and Dynamics. 2022; 40(10): 4293–4300. PubMed Abstract | Publisher Full Text

[53] Giordano D, Biancaniello C, Argenio MA, et al.: Drug Design by Pharmacophore and Virtual Screening Approach. Pharmaceuticals. 2022; 15(5). MDPI. PubMed Abstract | Publisher Full Text | Free Full Text

[54] Guo W, Lv C, Guo M, et al.: Innovative applications of artificial intelligence in zoonotic disease management. Science in One Health. 2023; 2: 100045. Elsevier B.V. PubMed Abstract | Publisher Full Text | Free Full Text

[55] Haake DA, Levett PN: Leptospirosis in Humans. Journal of Biological Education. 2015; 65–97. Publisher Full Text

[56] Hagedoorn NN, Maze MJ, Carugati M, et al.: Global distribution of Leptospira serovar isolations and detections from animal host species: A systematic review and online database. Tropical Medicine and International Health. 2024; 29(3): 161–172. John Wiley and Sons Inc.PubMed Abstract | Publisher Full Text | Free Full Text

[57] Haladjian J, Ermis A, Hodaie Z, et al.: IPig: Towards tracking the behavior of free-roaming pigs. ACM International Conference Proceeding Series Part F1325. 2017.

[58] Heinzinger M, Elnaggar A, Wang Y, et al.: Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics. 2019; 20(1): 1–17. Publisher Full Text

[59] Hotez PJ, Lo NC: Neglected Tropical Diseases: Public Health Control Programs and Mass Drug Administration. Hunter’s Tropical Medicine and Emerging Infectious Diseases. 10th ed.Elsevier Inc.; 2020. Publisher Full Text

[60] Hoyos W, Aguilar J, Toro M: An autonomous cycle of data analysis tasks for the clinical management of dengue. Heliyon. 2022; 8(10): e10846. PubMed Abstract | Publisher Full Text | Free Full Text

[61] Hugo LE, Rašić G, Maynard AJ, et al.: Wolbachia wAlbB inhibit dengue and Zika infection in the mosquito Aedes aegypti with an Australian background. bioRxiv. 2022. 2022.03.22.485408.

[62] Hussein A, Alemu M, Ayehu A: Soil Contamination and Infection of School Children by Soil-Transmitted Helminths and Associated Factors at Kola Diba Primary School, Northwest Ethiopia: An Institution-Based Cross-Sectional Study. Journal of Tropical Medicine. 2022; 2022: 1–8.

[63] Jainul Fathima A, Revathy R, Balamurali S, et al.: Prediction of Dengue-Human Protein Interaction Using Artificial Neural Network for Anti-Viral Drug Discovery. SSRN Electronic Journal. 2019. (January).

[64] Jane Ling MY, Halim AFNA, Ahmad D, et al.: Rabies in Southeast Asia: A systematic review of its incidence, risk factors and mortality. BMJ Open. 2023; 13(5).

[65] Jumper J, Evans R, Pritzel A, et al.: Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596(7873): 583–589.

[66] Kabululu ML, Johansen MV, Lightowlers M, et al.: Aggregation of Taenia solium cysticerci in pigs: Implications for transmission and control. Parasite Epidemiology and Control. 2023; 22: e00307. Elsevier Ltd.

[67] Kaur R, Arora N, Jamakhani MA, et al.: Development of multi-epitope chimeric vaccine against Taenia solium by exploring its proteome: an in silico approach. Expert Review of Vaccines. 2020; 19(1): 105–114.

[68] Kermelita D, Hadi UK, Soviana S, et al.: Species diversity, mosquito behavior, and microfilariae detection in vectors and reservoirs in filariasis-endemic areas of Bengkulu, Indonesia. Biodiversitas. 2024; 25(9): 3125–3131.

[69] Khalid AQ, Rao Avupati V, Hussain H: Machine Learning Model For Predicting Anti-Dengue Drugs: A Three-Dimensional Quantitative Structure-Activity Relationship (3D QSAR) Study. International Journal of Science & Technology Research. 2020; 9(6): 1107–1115.

[70] Kondeti PK, Ravi K, Mutheneni SR, et al.: Applications of machine learning techniques to predict filariasis using socio-economic factors. Epidemiology and Infection. 2019; 147: e260.

[71] Kourou K, Exarchos KP, Papaloukas C, et al.: Applied machine learning in cancer research: A systematic review for patient diagnosis, classification and prognosis. Computational and Structural Biotechnology Journal. 2021; 19: 5546–5555.

[72] Lalremruata A, Jeyaraj S, Engleitner T, et al.: Species and genotype diversity of Plasmodium in malaria patients from Gabon analysed by next generation sequencing. Malaria Journal. 2017; 16(1): 1–11.

[73] Larsen JC, Johnson NH: Pathogenesis of Burkholderia pseudomallei and Burkholderia mallei. Military Medicine. 2009; 174(6): 647–651.

[74] Leem J, Mitchell LS, Farmery JHR, et al.: Deciphering the language of antibodies using self-supervised learning. Patterns. 2022; 3(7): 100513.

[75] Levett PN: Systematics of Leptospiraceae. Current Topics in Microbiology and Immunology. 2015; 387: 11–20.

[76] Lim CC, Khairudin NAA, Loke SW, et al.: Comparison of Human Intestinal Parasite Ova Segmentation Using Machine Learning and Deep Learning Techniques. Applied Sciences (Switzerland). 2022; 12(15).

[77] Limmathurotsakul D, Golding N, Dance DAB, et al.: Predicted global distribution of Burkholderia pseudomallei and burden of melioidosis. Nature Microbiology. 2016; 1(1): 6–10.

[78] Lin Y, Fang K, Zheng Y, et al.: Global burden and trends of neglected tropical diseases from 1990 to 2019. Journal of Travel Medicine. 2022; 29(3).

[79] Lin Z, Akin H, Rao R, et al.: Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv. 2022. 2022.07.20.500902.

[80] Littmann M, Heinzinger M, Dallago C, et al.: Embeddings from deep learning transfer GO annotations beyond homology. Scientific Reports. 2021; 11(1): 1–14.

[81] Lupo U, Sgarbossa D, Bitbol AF: Protein language models trained on multiple sequence alignments learn phylogenetic relationships. Nature Communications. 2022; 13(1): 1–23.

[82] Malavige GN, Sjö P, Singh K, et al.: Facing the escalating burden of dengue: Challenges and perspectives. PLOS Global Public Health. 2023; 3(12): e0002598. Public Library of Science.

[83] Manuel M, Ramanujam K, Ajjampur SSR: Molecular Tools for Diagnosis and Surveillance of Soil-Transmitted Helminths in Endemic Areas. Parasitologia. 2021; 1(3): 105–118.

[84] Marquet C, Heinzinger M, Olenyi T, et al.: Embeddings from protein language models predict conservation and variant effects. Human Genetics. 2022; 141(10): 1629–1647.

[85] McComb M, Bies R, Ramanathan M: Machine learning in pharmacometrics: Opportunities and challenges. British Journal of Clinical Pharmacology. 2022; 88(4): 1482–1499.

[86] Meumann EM, Limmathurotsakul D, Dunachie SJ, et al.: Burkholderia pseudomallei and melioidosis. Nature Reviews Microbiology. 2024; 22(3): 155–169.

[87] Mitra AK, Mawson AR: Neglected tropical diseases: Epidemiology and global burden. Tropical Medicine and Infectious Disease. 2017; 2(3).

[88] Mlowe F, Mlangwa J, Mkupasi E, et al.: Taenia solium Cysticercosis and Taeniosis Reporting in the Current Medical and Veterinary Diseases Reporting Systems in Tanzania: A Cross-Sectional Study. Veterinary Medicine International. 2024; 2024.

[89] Moawad AA, Silge A, Bocklitz T, et al.: A Machine Learning-Based Raman Spectroscopic Assay for the Identification of Burkholderia mallei and Related Species. Molecules. 2019; 24(24): 4516.

[90] Mogaji HO, Johnson OO, Adigun AB, et al.: Estimating the population at risk with soil transmitted helminthiasis and annual drug requirements for preventive chemotherapy in Ogun State, Nigeria. Scientific Reports. 2022; 12(1): 1–12.

[91] Mohammad Basir MF, Mohd Hairon S, Ibrahim MI, et al.: Development and Validation of Rabies Health Education Module (RaHEM) for Dog Owners in Kelantan, Malaysia: An ADDIE Model. Journal of Epidemiology and Global Health. 2025; 15(1): 12.

[92] Mohammadinia A, Saeidian B, Pradhan B, et al.: Prediction mapping of human leptospirosis using ANN, GWR, SVM and GLM approaches. BMC Infectious Diseases. 2019; 19(1): 1–18.

[93] Mollentze N, Babayan SA, Streicker DG: Identifying and prioritizing potential human-infecting viruses from their genome sequences. PLoS Biology. 2021; 19(9).

[94] Molyneux D: Neglected Tropical Diseases - East Asia. Utzinger J, Yap P, Bratschi M, et al. (eds.) Community Eye Health Journal. Cham: Springer International Publishing; 2019.

[95] Monnier N, Barth-Jaeggi T, Knopp S, et al.: Core components, concepts and strategies for parasitic and vector-borne disease elimination with a focus on schistosomiasis: A landscape analysis. PLoS Neglected Tropical Diseases. 2020; 14(10): e0008837.

[96] Morang’a CM, Amenga-Etego L, Bah SY, et al.: Machine learning approaches classify clinical malaria outcomes based on haematological parameters. BMC Medicine. 2020; 18(1): 1–16.

[97] Mswahili ME, Martin GL, Woo J, et al.: Antimalarial drug predictions using molecular descriptors and machine learning against Plasmodium falciparum. Biomolecules. 2021; 11(12): 1–15.

[98] Muñoz-Antoli C, Pérez P, Pavón A, et al.: High intestinal parasite infection detected in children from Región Autónoma Atlántico Norte (R.A.A.N.) of Nicaragua. Scientific Reports. 2022; 12(1): 1–10.

[99] Narkkul U, Thaipadungpanit J, Srisawat N, et al.: Human, animal, water source interactions and leptospirosis in Thailand. Scientific Reports. 2021; 11(1): 3215.

[100] Narne KG, Kakumani J, aishnavi KISN, et al.: Disseminated Melioidosis Complicated by Prostatic Abscess and Splenic Involvement: Diagnostic and Therapeutic Insights. Cureus. 2024; 16: e69961.

[101] Navarrete-Perea J, Isasa M, Paulo JA, et al.: Quantitative multiplexed proteomics of Taenia solium cysts obtained from the skeletal muscle and central nervous system of pigs. PLoS Neglected Tropical Diseases. 2017; 11: e0005962.

[102] Nguyen VH, Tuyet-Hanh TT, Mulhall J, et al.: Deep learning models for forecasting dengue fever based on climate data in Vietnam. PLoS Neglected Tropical Diseases. 2022; 16(6): 1–22.

[103] Nyanasegran PK, Nathan S, Firdaus-Raih M, et al.: Biofilm Signaling, Composition and Regulation in Burkholderia pseudomallei. Journal of Microbiology and Biotechnology. 2023; 33(1): 15–27. Korean Society for Microbiology and Biotechnology.

[104] Ofer D, Brandes N, Linial M: The language of proteins: NLP, machine learning & protein sequences. Computational and Structural Biotechnology Journal. 2021; 19: 1750–1758.

[105] Oguike OE, Ugwuishiwu CH, Asogwa CN, et al.: Systematic review on the application of machine learning to quantitative structure–activity relationship modeling against Plasmodium falciparum. Molecular Diversity. 2022; 26(6): 3447–3462.

[106] Okagbue HI, Oguntunde PE, Obasi ECM, et al.: Diagnosing malaria from some symptoms: a machine learning approach and public health implications. Health and Technology. 2021; 11(1): 23–37.

[107] Olsen TH, Moal IH, Deane CM: AbLang: an antibody language model for completing antibody sequences. Bioinformatics Advances. 2022; 2(1): 0–7.

[108] Ong E, Wang H, Wong MU, et al.: Vaxign-ML: Supervised machine learning reverse vaccinology model for improved prediction of bacterial protective antigens. Bioinformatics. 2020; 36(10): 3185–3191.

[110] Oyewole OE, Simon-Oke IA: Ecological risk factors of soil-transmitted helminths infections in Ifedore district, Southwest Nigeria. Bulletin of the National Research Centre. 2022; 46(1).

[111] Panja M, Chakraborty T, Nadim SS, et al.: An ensemble neural network approach to forecast Dengue outbreak based on climatic condition. 2022; 1–32.

[112] Parashar R, Nanda S, Smith SL, et al.: Comparing priority received by global health issues: A measurement framework applied to tuberculosis, malaria, diarrhoeal diseases and dengue fever. BMJ Global Health. 2024; 9(7).

[113] Ragno R: www.3d-qsar.com: a web portal that brings 3-D QSAR to all electronic devices—the Py-CoMFA web application as tool to build models from pre-aligned datasets. Journal of Computer-Aided Molecular Design. 2019; 33(9): 855–864.

[114] Rahmat F, Zulkafli Z, Juraiza Ishak A, et al.: Exploratory Data Analysis and Artificial Neural Network for Prediction of Leptospirosis Occurrence in Seremban, Malaysia Based on Meteorological Data. Frontiers in Earth Science. 2020; 8(November): 1–14.

[115] Ranathunge T, Harishchandra J, Maiga H, et al.: Development of the Sterile Insect Technique to control the dengue vector Aedes aegypti (Linnaeus) in Sri Lanka. PLoS One. 2022; 17(4 April): 1–15.

[116] Rapin N, Lund O, Bernaschi M, et al.: Computational immunology meets bioinformatics: The use of prediction tools for molecular binding in the simulation of the immune system. PLoS One. 2010; 5(4): e9862.

[117] Rives A, Meier J, Sercu T, et al.: Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences of the United States of America. 2021; 118(15).

[118] Rout SS, Roy S, Mishra SN, et al.: Melioidosis and Burkholderia pseudomallei: Disease mechanisms, drug resistance, and treatment challenges. Journal of Integrative Medicine and Research. 2025; 3(1): 14–23.

[119] Saba N, Balwan WK, Mushtaq F: Burden of Malaria - A Journey Revisited. Scholars Journal of Applied Medical Sciences. 2022; 10(6): 934–939.

[120] Saleh AY, Medang SA, Ibrahim AO: Rabies Outbreak Prediction Using Deep Learning with Long Short-Term Memory. Advances in Intelligent Systems and Computing. 2020; 330–340.

[121] dos Santos Nascimento IJ, da Silva Rodrigues ÉE, da Silva MF, et al.: Advances in Computational Methods to Discover New NS2B-NS3 Inhibitors Useful Against Dengue and Zika Viruses. Current Topics in Medicinal Chemistry. 2022; 22(29): 2435–2462.

[122] Sarkar-Tyson M, Titball RW: Burkholderia mallei and Burkholderia pseudomallei. Vaccines for Biodefense and Emerging and Neglected Diseases. Elsevier; 2009; 831–843.

[123] Sato S: Plasmodium—a brief introduction to the parasites causing human malaria and their basic biology. Journal of Physiological Anthropology. 2021; 40(1): 1. BioMed Central Ltd.

[124] Sayanthi Y, Susanna D: Pathogenic Leptospira contamination in the environment: a systematic review. Infection Ecology and Epidemiology. 2024; 14(1). Taylor and Francis Ltd.

[125] Scavuzzo CM, Scavuzzo JM, Campero MN, et al.: Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP. Infectious Disease Modelling. 2022; 7(1): 262–276.

[126] Scoffone VC, Trespidi G, Barbieri G, et al.: Methodological tools to study species of the genus Burkholderia. Applied Microbiology and Biotechnology. 2021; 105(24): 9019–9034. Springer Science and Business Media Deutschland GmbH.

[127] Selvarajoo S, Liew JWK, Chua TH, et al.: Dengue surveillance using gravid oviposition sticky (GOS) trap and dengue non-structural 1 (NS1) antigen test in Malaysia: randomized controlled trial. Scientific Reports. 2022; 12(1): 1–12.

[128] Shao D, Dai Y, Li N, et al.: Artificial intelligence in clinical research of cancers. Briefings in Bioinformatics. 2022; 23(1): 1–12.

[129] Singh J, Arora MS, Sharma S, et al.: Modeling the variable transmission rate and various discharges on the spread of Malaria. Electronic Research Archive. 2022; 31(1): 319–341.

[130] Srisawat N, Thisyakorn U, Ismail Z, et al.: World Dengue Day: A call for action. PLoS Neglected Tropical Diseases. 2022; 16(8): 2–10.

[131] Stärk H, Dallago C, Heinzinger M, et al.: Light attention predicts protein location from the language of life. Bioinformatics Advances. 2021; 1(1).

[132] Sun AH, Liu XX, Yan J: Leptospirosis is an invasive infectious and systemic inflammatory disease. Biomedical Journal. 2020; 43(1): 24–31.

[133] Suratanee A, Buaboocha T, Plaimas K: Prediction of Human-Plasmodium vivax Protein Associations From Heterogeneous Network Structures Based on Machine-Learning Approach. Bioinformatics and Biology Insights. 2021; 15: 117793222110133.

[134] Syed AH, Khan T: Evolution of research trends in artificial intelligence for breast cancer diagnosis and prognosis over the past two decades: A bibliometric analysis. Frontiers in Oncology. 2022; 12.

[135] Sykes JE, Reagan KL, Nally JE, et al.: Role of Diagnostics in Epidemiology, Management, Surveillance, and Control of Leptospirosis. Pathogens. 2022; 11(4): 1–24.

[136] Tai KY, Dhaliwal J: Machine learning model for malaria risk prediction based on mutation location of large-scale genetic variation data. Journal of Big Data. 2022; 9(1).

[137] Taylor B: Artificial Intelligence in Oncology Drug Discovery and Development. Artificial Intelligence in Oncology Drug Discovery and Development. 2020.

[138] Thanapongtharm W, Kasemsuwan S, Wongphruksasoong V, et al.: Spatial Distribution and Population Estimation of Dogs in Thailand: Implications for Rabies Prevention and Control. Frontiers in Veterinary Science. 2021; 8(December): 1–12.

[139] Torgerson PR, Hagan JE, Costa F, et al.: Global Burden of Leptospirosis: Estimated in Terms of Disability Adjusted Life Years. PLoS Neglected Tropical Diseases. 2015; 9(10): e0004122.

[141] Tsheten T, Gray DJ, Clements ACA, et al.: Epidemiology and challenges of dengue surveillance in the WHO South-East Asia Region. Transactions of the Royal Society of Tropical Medicine and Hygiene. 2021; 115(6): 583–599.

[142] Urbanskas E, Karvelienė B, Radzijevskaja J: Leptospirosis: classification, epidemiology, and methods of detection. A review. Biologija. 2022; 68(2): 129–136.

[143] Vegvari C, Giardina F, Malizia V, et al.: Impact of Key Assumptions about the Population Biology of Soil-Transmitted Helminths on the Sustainable Control of Morbidity. Clinical Infectious Diseases. 2021; 72: S188–S194.

[144] Vinkeles Melchers NVS, Stolk WA, van Loon W, et al.: The burden of skin disease and eye disease due to onchocerciasis in countries formerly under the African Programme for Onchocerciasis Control mandate for 1990, 2020, and 2030. PLoS Neglected Tropical Diseases. 2021; 15(7): 1–18.

[145] Vos T, Lim SS, Abbafati C, et al.: Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. The Lancet. 2020; 396(10258): 1204–1222.

[146] Vu MH, Akbar R, Robert PA, et al.: Advancing protein language models with linguistics: a roadmap for improved interpretability. arXiv. 2022; 1–26.

[147] Wainaina M, Wasonga J, Cook EAJ: Epidemiology of human and animal leptospirosis in Kenya: A systematic review and meta-analysis of disease occurrence, serogroup diversity and risk factors. PLOS Neglected Tropical Diseases. 2024; 18(9): e0012527.

[148] Ward P, Dahlberg P, Lagatie O, et al.: Affordable artificial intelligence-based digital pathology for neglected tropical diseases: A proof-of-concept for the detection of soil-transmitted helminths and Schistosoma mansoni eggs in Kato-Katz stool thick smears. PLoS Neglected Tropical Diseases. 2022; 16(6): 1–16.

[149] Winkler AS, Klohe K, Schmidt V, et al.: Neglected tropical diseases – the present and the future. Tidsskrift for Den Norske Legeforening (born 1971). 2018; 138.

[150] World Health Organization: Global Health Estimates 2020: Disease burden by Cause, Age, Sex, by Country and by Region. 2020; 2000–2019.

[151] World Health Organization: Tenth report of the Strategic and Technical Advisory Group for Neglected Tropical Diseases (STAG-NTDs). 2017.

[152] World Health Organization: Virtual Meeting of Regional Technical Advisory Group for dengue and other arbovirus diseases (October). 2021; pp. 4–6.

[153] World Health Organization: First WHO report on neglected tropical diseases: working to overcome the global impact of neglected tropical diseases. World Health Organization; 2010; 1–184.

[154] World Health Organization: Global programme to eliminate lymphatic filariasis: progress report, 2021. WHO Weekly Epidemiological Record; 2022b.

[155] World Health Organization: Taenia solium - Use of existing diagnostic tools in public health programmes. World Health Organization; 2022a.

[156] World Organisation for Animal Health: Rabies Technical Disease Information. 2008; pp. 1–4.

[157] Xu K, Lian F, Quan Y, et al.: Septicemic Melioidosis Detection Using Support Vector Machine with Five Immune Cell Types. Disease Markers. 2021; 2021: 1–9.

[158] Yajima A, Ichimori K: Progress in the elimination of lymphatic filariasis in the Western Pacific Region: Successes and challenges. International Health. 2021; 13: S10–S16.

[159] You Y, Lai X, Pan Y, et al.: Artificial intelligence in cancer target identification and drug discovery. Signal Transduction and Targeted Therapy. 2022; 7(1): 1–24.

[160] Zeynudin A, Degefa T, Tesfaye M, et al.: Prevalence and intensity of soil-transmitted helminth infections and associated risk factors among household heads living in the peri-urban areas of Jimma town, Oromia, Ethiopia: A community-based cross-sectional study. PLoS One. 2022; 17(9 September): 1–17.

[161] Zhang Q, Yang L, Zhou F: Attention enhanced long short-term memory network with multi-source heterogeneous information fusion: An application to BGI Genomics. Information Sciences. 2021; 553: 305–330.

[162] Zhou G, Chen M, Ju CJT, et al.: Mutation effect estimation on protein–protein interactions using deep contextualized representation learning. NAR Genomics and Bioinformatics. 2020; 2(2): 1–12.
