Risk prediction model for 24-hour mortality in preterm infants using lactate and blood gas analysis: A machine learning approach and retrospective cohort study

Felipe Yu Matsushita; Vera Lúcia Jornada Krebs; Werther Brunow de Carvalho

doi:10.12688/f1000research.110711.1


Research Article

[version 1; peer review: 2 not approved]

PUBLISHED 20 Apr 2022

Author details

¹ Pediatrics, Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo, São Paulo, São Paulo, 05403-000, Brazil

Felipe Yu Matsushita
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Software, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Vera Lúcia Jornada Krebs
Roles: Conceptualization, Supervision, Validation, Writing – Review & Editing

Werther Brunow de Carvalho
Roles: Conceptualization, Project Administration, Supervision, Validation, Writing – Review & Editing

This article is included in the Artificial Intelligence and Machine Learning gateway.

This article is included in the Python collection.

Abstract

Background: This study aimed to evaluate the performance of machine learning algorithms using lactate and arterial blood gas parameters to predict the imminent risk of death in extremely low birth weight infants.
Methods: We carried out a retrospective cohort study of preterm infants with birth weight less than 1000 grams in a single-center tertiary neonatal intensive care unit in São Paulo, Brazil, between 2012 and 2017. We included all infants with at least one arterial blood gas analysis with paired serum lactate. To assess 24-hour mortality risk, we compared three machine learning algorithms (Logistic Regression, Extreme Gradient Boosting, and AutoML Tables).
Results: We analyzed 1932 blood gas samples with matched lactate measurements. The study population had a median (IQR) gestational age of 27.1 (26–29.1) weeks and a median birth weight of 746 (600–880) grams. The Extreme Gradient Boosting model with lactate achieved the highest area under the receiver operating characteristic curve (AUROC), 0.898. Base excess, lactate, and pH were, in that order, the most important features associated with 24-hour mortality.
Conclusions: Incorporating lactate and blood gas measurements into real-time mortality prediction models may help identify preterm infants at higher risk of death.

Keywords

preterm, machine learning, artificial intelligence, prediction, mortality

Corresponding author: Felipe Yu Matsushita

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2022 Matsushita FY et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Matsushita FY, Krebs VLJ and de Carvalho WB. Risk prediction model for 24-hour mortality in preterm infants using lactate and blood gas analysis: A machine learning approach and retrospective cohort study [version 1; peer review: 2 not approved]. F1000Research 2022, 11:444 (https://doi.org/10.12688/f1000research.110711.1) First published: 20 Apr 2022, 11:444 (https://doi.org/10.12688/f1000research.110711.1) Latest published: 20 Apr 2022, 11:444 (https://doi.org/10.12688/f1000research.110711.1)

Introduction

Although child mortality has decreased significantly in recent decades, neonatal mortality remains a major contributor to it,¹ and prematurity is the leading direct cause of death in this population.² More than half of neonatal deaths occur within the first three days of life.³ In this context, several predictive models use prenatal and immediate postnatal factors to predict death. However, in the neonatal intensive care unit (NICU) the patient's condition changes over time, and more than 30% of neonatal deaths occur after the third day of life.³ Real-time features, rather than non-modifiable baseline variables, may therefore provide a more individualized evaluation of mortality risk.

Predictive models are becoming increasingly popular in clinical research. Machine learning techniques have been used to predict a range of outcomes, from mortality in intensive care units to morbidities such as acute kidney injury, septic shock, and heart failure.⁴ Currently available models for assessing neonatal mortality risk focus mostly on the time of NICU admission.⁵⁻⁹ Few models use objective criteria to evaluate mortality risk in real time.

To address this knowledge gap, we compared the performance of machine learning algorithms for predicting NICU mortality from objective laboratory features. The objective of this study was to evaluate serum lactate and blood gas analysis as predictors of mortality in extremely low birth weight infants using machine learning algorithms.

Methods

Ethics statement

The study protocol was approved by the institutional ethics committee – Comitê de Ética do Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo, approval number: CAAE 15762719.6.0000.0068. The requirement for informed consent was waived by the committee given the retrospective nature of the study.

Study population

We analyzed data from extremely low birth weight infants born at a single-center tertiary neonatal intensive care unit in São Paulo, Brazil, between 2012 and 2017. Data collection and methods have been described in our previous study.¹⁰ In December 2019, all data were retrieved from each patient's electronic medical record using keywords related to clinical and laboratory parameters and exported to a CSV file.²²

All neonates with birth weight below 1000 grams born between 2012 and 2017 who had at least one arterial blood gas analysis with a paired serum lactate level during their neonatal intensive care stay were included. Neonates with severe malformations or complex congenital heart disease, and those transferred to another unit before discharge, were excluded.

Baseline characteristics included gestational age at birth (weeks), birth weight (grams), CRIB II (Clinical Risk Index for Babies) score, small for gestational age (birth weight below the 10th percentile of the Fenton growth chart), female gender, 5-minute APGAR score, vaginal birth, twin birth, antenatal corticoid, endotracheal intubation in the delivery room, need for epinephrine in the delivery room, chorioamnionitis, and the lowest temperature in the first 12 hours of life. Serum lactate levels are presented in mmol/L.

Predictive parameters

Seven readily available parameters were used as inputs to the machine learning algorithms: blood gas analysis features (pH, pCO₂, HCO₃, and base excess), serum lactate, and general characteristics (days of life and corrected gestational age at the time of the laboratory measurement).
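As an illustration of the input layout, the sketch below packs one blood gas sample into the seven features listed above in a fixed order. The class name, field names, and the `include_lactate` switch (which mirrors the "without lactate" comparison in Table 3) are hypothetical and are not taken from the published dataset; the complete-case rule reflects the exclusion of records with missing values described in the Methods.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical record for one arterial blood gas sample with paired lactate.
# Field names are illustrative only; they are not the dataset's column names.
@dataclass
class BloodGasSample:
    ph: Optional[float]
    pco2: Optional[float]
    hco3: Optional[float]
    base_excess: Optional[float]
    lactate: Optional[float]              # mmol/L
    days_of_life: Optional[float]
    corrected_ga_weeks: Optional[float]


# Fixed feature order so that every model sees the same column layout.
FEATURE_ORDER = ("ph", "pco2", "hco3", "base_excess", "lactate",
                 "days_of_life", "corrected_ga_weeks")


def to_feature_vector(sample: BloodGasSample,
                      include_lactate: bool = True) -> Optional[List[float]]:
    """Return the features in FEATURE_ORDER (optionally without lactate),
    or None if any required value is missing (complete-case analysis)."""
    names = [n for n in FEATURE_ORDER if include_lactate or n != "lactate"]
    values = [getattr(sample, n) for n in names]
    if any(v is None for v in values):
        return None
    return [float(v) for v in values]
```

For example, `to_feature_vector(BloodGasSample(7.25, 48.0, 20.1, -6.0, 3.2, 5, 28.4))` yields a seven-element list, and passing `include_lactate=False` drops the lactate value.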

Machine learning model development

We compared the performance of three machine learning methods for assessing 24-hour mortality risk: Logistic Regression,¹¹ Extreme Gradient Boosting,¹² and AutoML Tables.¹³ Patient data were randomly divided into two subsets: a training subset (80%), used to fit the models and tune hyperparameters, and a validation subset (20%), used to evaluate model performance. All records with missing values were excluded from the analysis. To select the optimal model, we performed hyperparameter tuning for the Extreme Gradient Boosting model. Hyperparameters are settings that are fixed before the learning process begins, rather than learned from the data, and are chosen to improve the algorithm's performance.
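The text does not state whether the 80/20 split was made per infant or per blood gas sample. The sketch below assumes a per-infant split, so that samples from the same infant never appear in both subsets and the 80% refers to infants rather than samples. The `patient_id` key, the fixed seed, and the function name are hypothetical; this is one possible implementation, not the authors' procedure.

```python
import random
from typing import List, Sequence, Tuple


def split_by_patient(samples: Sequence[dict], patient_key: str = "patient_id",
                     train_fraction: float = 0.8,
                     seed: int = 42) -> Tuple[List[dict], List[dict]]:
    """Randomly assign whole infants (not individual samples) to the training
    or validation subset. Assumes each sample is a dict with an identifier
    under `patient_key`; that key name is hypothetical."""
    # Complete-case analysis: drop any sample that has a missing (None) value.
    complete = [s for s in samples if all(v is not None for v in s.values())]
    # Shuffle the unique infant identifiers rather than the samples themselves.
    patients = sorted({s[patient_key] for s in complete})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_train = round(train_fraction * len(patients))
    train_ids = set(patients[:n_train])
    train = [s for s in complete if s[patient_key] in train_ids]
    valid = [s for s in complete if s[patient_key] not in train_ids]
    return train, valid
```

Splitting by infant avoids optimistic estimates that can arise when correlated samples from one infant are used for both training and evaluation.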

Logistic regression is a classification technique that models the log-odds (logit) of a binary outcome as a linear function of the independent variables. It is frequently used as a baseline for binary classification because it is simple to construct and can still achieve high performance.¹¹

Extreme Gradient Boosting is an ensemble technique that combines weak prediction models. These weak models are added one at a time, each fitted to correct the prediction errors made by the preceding models.¹⁴ For Extreme Gradient Boosting, the following hyperparameters were used: LEARN_RATE = 0.1, BOOSTER_TYPE = ‘GBTREE’, MAX_TREE_DEPTH = 3, SUBSAMPLE = 0.85, EARLY_STOP = TRUE.

AutoML Tables is a Google Cloud Platform service that automatically trains multiple model architectures (linear, feedforward deep neural network, gradient boosted decision tree, AdaNet, and ensembles) and selects the best-performing model.

The primary outcome was death within 24 hours after collection of the arterial blood gas and lactate sample.
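To make the outcome definition concrete, the following sketch labels a single sample from its collection time and the infant's time of death. The function name is hypothetical, and treating a death at exactly 24 hours as a positive event is an assumption, since the text does not specify how the window boundary was handled.

```python
from datetime import datetime, timedelta
from typing import Optional

# Outcome window from the Methods: death within 24 hours of sample collection.
OUTCOME_WINDOW = timedelta(hours=24)


def died_within_24h(sample_time: datetime,
                    death_time: Optional[datetime]) -> int:
    """Return 1 if the infant died within 24 hours after the blood gas/lactate
    sample was collected, otherwise 0. `death_time` is None for survivors.
    The upper boundary is treated as inclusive (an assumption)."""
    if death_time is None:
        return 0
    elapsed = death_time - sample_time
    return int(timedelta(0) <= elapsed <= OUTCOME_WINDOW)
```

Under this definition, an infant who dies contributes positive labels only for samples drawn in the final 24 hours of life; all earlier samples from the same infant are labeled 0.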

Statistical analysis

Continuous variables were tested for normality using the Kolmogorov-Smirnov test. To compare demographic characteristics, we used the chi-square test for categorical variables and the Mann-Whitney test for continuous variables. After the optimal hyperparameters were determined for each ML algorithm, we calculated the area under the receiver operating characteristic curve (AUROC), accuracy, precision, and recall. All analyses were conducted using Python version 3.6.9¹⁵ and Google Cloud Platform. All patients with missing data were excluded from the study.
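The four reported metrics can be computed as in the minimal sketch below. The decision threshold of 0.5 is an assumption, because the threshold used to obtain accuracy, precision, and recall is not reported. AUROC is computed from its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Dict, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count 1/2); equivalent to the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true: Sequence[int], y_score: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision and recall at an assumed threshold, plus AUROC."""
    y_pred = [int(s >= threshold) for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "auroc": auroc(y_true, y_score),
    }
```

Unlike accuracy, precision, and recall, AUROC does not depend on the chosen threshold, which is why a model can combine a high AUROC with low recall at a given cut-off.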

Results

We identified a total of 257 neonates who had at least one blood gas analysis with a paired serum lactate during their neonatal intensive care unit stay from 2012 to 2017. Eleven patients were excluded because of missing data. Baseline characteristics are presented in Table 1. The median gestational age was 27.1 (26–29.1) weeks, and the median birth weight was 746 (600–880) grams. In total, 1932 blood gas samples with corresponding serum lactate levels were available.

Table 1. Baseline characteristics of the study sample.

Variable	Total (n = 257)	Survivors (n = 148)	Non-survivors (n = 109)	P-value
Gestational age (wk), median (IQR)	27.1 (26–29.1)	28 (26.5–30.1)	26.3 (25–27.4)	<0.001
Birth weight (g), median (IQR)	746 (600–880)	822 (700–940)	610 (530–760)	<0.001
CRIB II score, median (IQR)	12 (10–14)	11 (9–12)	14 (12–15)	<0.001
Small for gestational age, n (%)	125 (48.6)	73 (49.3)	52 (47.7)	0.798
Female gender, n (%)	123 (47.9)	74 (50)	49 (45)	0.424
5-minute APGAR score, median (IQR)	8 (6–9)	8 (7–9)	8 (5–8)	<0.001
Vaginal birth, n (%)	41 (16)	13 (8.8)	28 (25.7)	<0.001
Twin birth, n (%)	77 (30)	33 (22.3)	44 (40.4)	0.002
Antenatal corticoid, n (%)	129 (50.2)	79 (53.4)	51 (46.8)	0.234
Endotracheal intubation in the delivery room, n (%)	167 (65)	81 (54.7)	86 (78.9)	<0.001
Epinephrine in the delivery room, n (%)	28 (10.9)	11 (7.4)	17 (15.6)	0.038
Chorioamnionitis, n (%)	29 (11.3)	12 (8.1)	17 (15.6)	0.061
Lowest temperature in the first 12 hours of life (°C), median (IQR)	35 (34.1–35.8)	35 (34.5–35.8)	34.5 (33.8–35.2)	<0.001

All models achieved high accuracy (91–94%) and AUROC values from 0.807 to 0.898 when blood gas measurements and lactate were used as features. However, their recall was low (9–29%). The Extreme Gradient Boosting algorithm obtained the highest AUROC (0.898), accuracy (94.1%), and precision (87.5%) (Table 2). We then used AutoML Tables to determine the importance of each feature for 24-hour mortality; the top three features were, in order, base excess, lactate, and pH (Figure 1).

Table 2. 24-hour mortality prediction performance of machine learning models using blood gas features with lactate.

AUROC, area under the receiver operating characteristic curve.

Model	Accuracy	Precision	Recall	AUROC
Extreme Gradient Boosting	0.941	0.875	0.25	0.898
AutoML Tables	0.933	0.833	0.29	0.826
Logistic Regression	0.913	0.375	0.09	0.807

Figure 1. Feature importance.

BE: base excess; cGA: corrected gestational age; BIC: bicarbonate (HCO₃); DOL: days of life.
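The method AutoML Tables uses to rank features is not described here, and feature importance can also be estimated for the other models, including Extreme Gradient Boosting. As one model-agnostic alternative (not the method used in this study), the sketch below computes permutation importance: the mean drop in accuracy when one feature column is shuffled, which breaks that feature's link with the outcome while preserving its distribution. The `predict` callable stands in for any fitted model and is hypothetical.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(predict: Callable[[List[float]], int],
                           X: Sequence[Sequence[float]], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly shuffled.
    `predict` maps one feature vector to a 0/1 class (hypothetical model)."""
    def accuracy(rows: Sequence[Sequence[float]]) -> float:
        return sum(int(predict(list(r)) == t) for r, t in zip(rows, y)) / len(y)

    rng = random.Random(seed)
    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by its shuffled value.
            shuffled = [list(row[:j]) + [v] + list(row[j + 1:])
                        for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores receives an importance of zero, whereas a feature the model relies on shows a positive mean drop. Importances should ideally be computed on the validation subset rather than on the training data.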

When lactate was removed as a feature, the AUROC of the Extreme Gradient Boosting model dropped considerably, from 0.898 to 0.807 (Table 3).

Table 3. 24-hour mortality prediction performance of machine learning models using blood gas features without lactate.

AUROC, area under the receiver operating characteristic curve.

Model	Accuracy	Precision	Recall	AUROC
AutoML Tables	0.938	1.000	0.294	0.857
Logistic Regression	0.940	0.600	0.130	0.848
Extreme Gradient Boosting	0.942	0.666	0.173	0.807

Discussion

Our study found that the Extreme Gradient Boosting algorithm, using blood gas samples and lactate, may predict 24-hour mortality in extremely low birth weight infants. Adding lactate improved the discrimination of this model (AUROC 0.898 with lactate vs 0.807 without).

In the NICU, an accurate mortality estimate is a valuable tool for healthcare providers. However, most mortality prediction models for preterm infants are limited to the time of NICU admission. To assess mortality risk at NICU admission, the Score for Neonatal Acute Physiology with Perinatal Extension-II (SNAPPE-II) combines vital signs, laboratory values, and baseline characteristics.⁵ The Clinical Risk Index for Babies (CRIB-II) score considers sex, birth weight, gestational age, temperature, and base excess to assess mortality risk upon NICU admission.⁶ TRIPS-II,⁷ NMR-2000,⁸ and PISA⁹ are other recent models that predict mortality after admission. However, dynamic events in the critical care unit may alter the initial risk.

Combining baseline characteristics with real-time features provides a more individualized assessment of mortality risk that evolves as the patient's condition changes. To this end, Jaskari J et al.,¹⁶ Lee J et al.,¹⁷ and Feng J et al.¹⁸ developed mortality prediction models based on vital sign data collected during the NICU stay. However, vital signs such as respiratory rate, heart rate, and blood pressure vary among neonates, and there is no consensus on normal reference ranges.¹⁹,²⁰ Furthermore, in the study by Lee J et al., baseline variables (birth weight and gestational age at birth) were more important than vital sign readings.

Lactate and blood gas values can be obtained quickly, and their measurement is standard clinical practice in intensive care units. Because these values are not subject to examiner bias, they are more objective than clinical examination.²¹ Our approach therefore combines the predictive capability of machine learning models with the objectivity of blood gas analysis, a simple, readily available, and widely used test. Our findings suggest that machine learning models based on lactate and blood gas indicators may outperform logistic regression in predicting 24-hour mortality in extremely low birth weight infants. To our knowledge, this is the first study to use real-time objective laboratory features to predict death in preterm infants.

Our study has some limitations. First, this is a single-center retrospective study, which limits the generalizability of our findings. Second, machine learning models generally benefit from larger datasets, so a larger study is required. Third, predicting mortality from blood gas analyses is a highly imbalanced problem: there are far more blood gas samples than events (deaths). With imbalanced data, a model can achieve high accuracy simply by predicting that every observation belongs to the majority class, even though such a trivial classifier has zero recall and no discriminative ability (an AUROC of 0.5). Lastly, our best-performing models had high precision but low recall, so they should not be used for screening.
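The imbalance problem can be illustrated with a trivial classifier that scores every sample as a survivor. The 5% event rate below (50 deaths among 1000 samples) is a hypothetical figure chosen for illustration and is not taken from this study; the point is that accuracy looks high while recall is zero and AUROC sits at chance level.

```python
from typing import Dict, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def majority_class_baseline(y_true: Sequence[int]) -> Dict[str, float]:
    """Predict 'survivor' (0) for every sample and report the resulting
    accuracy, recall for deaths, and AUROC."""
    y_pred = [0] * len(y_true)
    y_score = [0.0] * len(y_true)        # identical scores for all samples
    accuracy = sum(int(p == t) for p, t in zip(y_pred, y_true)) / len(y_true)
    positives = sum(y_true)
    true_pos = sum(int(p == 1 and t == 1) for p, t in zip(y_pred, y_true))
    recall = true_pos / positives if positives else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "auroc": auroc(y_true, y_score)}


# Hypothetical example: 50 deaths among 1000 samples (not study data).
print(majority_class_baseline([1] * 50 + [0] * 950))
# {'accuracy': 0.95, 'recall': 0.0, 'auroc': 0.5}
```

This is why accuracy alone overstates performance in this setting, and why precision, recall, and AUROC are reported alongside it.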

Conclusions

Incorporating lactate and blood gas measurements into mortality prediction models may improve real-time risk stratification in preterm infants. In our cohort, more complex machine learning algorithms appeared to outperform traditional logistic regression when lactate was included. Extreme Gradient Boosting models could be used as a support tool for clinical risk stratification of extremely low birth weight infants in the neonatal intensive care unit.

Data availability

Underlying data

Because this study involved extracting data from patient records, these records are considered the raw source data. The raw data (patient files) are not available to readers or reviewers, for data protection reasons.

Harvard Dataverse: Blood Gas – Preterm. https://doi.org/10.7910/DVN/LFRNJE.²²

This project contains the following underlying data: Lactate_Mortality – Sheet1.csv (contains 1932 serum lactate and blood gas analyses, along with days of life and corrected gestational age at the time of measurement).

Extended data

Harvard Dataverse: Blood Gas – Preterm. https://doi.org/10.7910/DVN/LFRNJE.²²

This project contains the following extended data: Data key.docx (contains a key that makes the data more accessible).

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

References

1. United Nations Inter-agency Group for Child Mortality Estimation: Levels & Trends in Child Mortality: Report 2020. 2020; 1–56 p.
2. Hug L, Alexander M, You D, et al.: National, regional, and global levels and trends in neonatal mortality between 1990 and 2017, with scenario-based projections to 2030: a systematic analysis. Lancet Glob. Health. 2019; 7(6): e710–e720.
3. Sankar MJ, Natarajan CK, Das RR, et al.: When do newborns die? A systematic review of timing of overall and cause-specific neonatal deaths in developing countries. J. Perinatol. 2016; 36(S1): S1–S11.
4. Subudhi S, Verma A, Patel AB, et al.: Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19. npj Digit. Med. 2021; 4(1): 1–7.
5. Harsha SS, Archana BR: SNAPPE-II (score for neonatal acute physiology with perinatal extension-II) in predicting mortality and morbidity in NICU. J. Clin. Diagnostic Res. 2015; 9(10): SC10–SC12.
6. Parry G, Tucker J, Tarnow-Mordi W: CRIB II: an update of the clinical risk index for babies score. Lancet. 2003; 361: 1789–1791.
7. Lee S, Aziz K, Dunn M, et al.: Transport risk index of physiologic stability, version II (TRIPS-II): A simple and practical neonatal illness severity score. Am. J. Perinatol. 2013; 30(5): 395–400.
8. Medvedev MM, Brotherton H, Gai A, et al.: Development and validation of a simplified score to predict neonatal mortality risk among neonates weighing 2000 g or less (NMR-2000): an analysis using data from the UK and The Gambia. Lancet Child Adolesc. Health. 2020; 4(4): 299–311.
9. Podda M, Bacciu D, Micheli A, et al.: A machine learning approach to estimating preterm infants survival: development of the Preterm Infants Survival Assessment (PISA) predictor. Sci. Rep. 2018; 8(1): 1–9.
10. Matsushita F, Krebs V, Ferraro A, et al.: Early fluid overload is associated with mortality and prolonged mechanical ventilation in extremely low birth weight infants. Eur. J. Pediatr. 2020; 179(11): 1665–1671.
11. LaValley MP: Logistic regression. Circulation. 2008; 117(18): 2395–2399.
12. Chen T, He T, Benesty M: XGBoost: eXtreme Gradient Boosting. R package version 0.71-2. 2018; 1–4.
13. Google: Cloud AutoML.
14. Friedman JH: Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001; 29(5): 1189–1232.
15. Pedregosa F, Grisel O, Weiss R, et al.: Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011; 12: 2825–2830.
16. Jaskari J, Myllarinen J, Leskinen M, et al.: Machine Learning Methods for Neonatal Mortality and Morbidity Classification. IEEE Access. 2020; 8: 123347–123358.
17. Lee J, Cai J, Li F, et al.: Predicting mortality risk for preterm infants using random forest. Sci. Rep. 2021; 46(1): 1–6.
18. Feng J, Lee J, Vesoulis ZA, et al.: Predicting mortality risk for preterm infants using deep learning models with time-series vital sign data. npj Digit. Med. 2021; 4(1): 1–8.
19. Paliwoda M, New K, Davies M, et al.: Physiological vital sign ranges in newborns from 34 weeks gestation: A systematic review. Int. J. Nurs. Stud. 2018; 77: 81–90.
20. Manja V, Lakshminrusimha S, Cook DJ: Oxygen saturation target range for extremely preterm infants: A systematic review and meta-analysis. JAMA Pediatr. 2015; 169(4): 332–340.
21. von Auenmueller KI, Christ M, Sasko BM, et al.: The Value of Arterial Blood Gas Parameters for Prediction of Mortality in Survivors of Out-of-hospital Cardiac Arrest. J. Emerg. Trauma Shock. 2017; 10(3): 134–139.
22. Yu Matsushita F: Blood Gas - Preterm. 2022. Harvard Dataverse, V2, UNF:6:cVcTuPkuzNi4QItQBkW1RA==[fileUNF].

Open Peer Review

Reviewer Report 12 Dec 2023

Fu-Sheng Chou, Loma Linda University, CA, USA

Not Approved

https://doi.org/10.5256/f1000research.122344.r211184

In their research, Matsushita et al. created several machine learning models aiming to predict 24-hour mortality in extremely low birth weight infants, utilizing arterial blood gas and lactate data from 2012 to 2017.
While the study addresses a crucial topic, predicting severe outcomes in neonatal care, I find two significant issues with the research:

Lack of Validation and Incomplete Model Performance Reporting: One major concern lies in the absence of validation techniques and the incomplete reporting of model performance. Without validation, the reliability and accuracy of the developed models remain uncertain. Furthermore, the study lacks comprehensive details on how well these models performed, leaving gaps in understanding their effectiveness.
Inadequate Rationale for Feature Selection: Another concern pertains to the rationale behind selecting specific features. Notably, the inclusion of calculated values like bicarbonate and base excess in the model raises questions about their necessity. Additionally, the exclusion of demographic and perinatal features lacks explanation. Understanding why certain features were included while others were omitted is crucial for the transparency and credibility of the study. For instance, clarifying why calculated values were preferred over demographic and perinatal data would enhance the study's overall rationale.

Addressing these concerns would not only strengthen the study's methodology but also contribute significantly to the credibility and applicability of the findings in neonatal care.

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

Yes
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Neonatal respiratory outcome prediction. Growth chart development for preterm infants.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 29 Nov 2023

Thomas Wood, University of Washington, Seattle, Washington, USA

Not Approved

https://doi.org/10.5256/f1000research.122344.r201605

The authors address an important problem, but I have several concerns regarding how the methods were performed/reported:

The idea of “real-time” mortality risk is mentioned throughout the manuscript, but it is not clear if this analysis incorporated blood gas/lactate data as a time-dependent component of the prediction models. For example, if 1932 blood gas samples were included from 257 infants, how was the test:train split performed – by infant or by blood gas sample? How were multiple blood gas samples from the same infant handled, how did the prediction update with each new blood gas measurement from the same infant, and how did the prediction window update with each new blood gas? This needs to be more clearly explained. Time series data have been used for mortality risk prediction in preterm infants previously (e.g. Feng et al., npj Digital Medicine 2021), but I’m not sure that’s what was done here.
Considering the relatively low sample size, why not do cross-validated predictions so that each infant can be used for both training and testing across 5 or 10 folds?
Why were the most important features only extracted from AutoML Tables? This can also be done with Extreme Gradient Boosting models.
Why were some of the variables from Table 1 such as sex, multiple gestation, and lowest temperature not included as predictors? Especially considering that they were different between survivors and non-survivors.
How was chorioamnionitis determined?
Some more discussion of the utility of this kind of model is needed. For instance, it is probably worth mentioning that high precision is much more critical than high recall for this kind of model - if a high predicted risk of mortality is used to redirect care, then the most important thing is to avoid false positives. More detail on the goal of the study, how the methods align with the goal (e.g., is this truly a real-time prediction model?), and the context of the outcome is needed for both the introduction and discussion.

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

No
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

No
Are the conclusions drawn adequately supported by the results?

Partly

References

1. Feng J, Lee J, Vesoulis ZA, Li F: Predicting mortality risk for preterm infants using deep learning models with time-series vital sign data.NPJ Digit Med. 2021; 4 (1): 108 PubMed Abstract | Publisher Full Text

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Neonatal neuroscience, neonatal outcome prediction

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

CITE

Report a concern

Respond or Comment

Comments on this article Comments (0)

Version 1

VERSION 1 PUBLISHED 20 Apr 2022

Open Peer Review

Reviewer Status

Reviewer Reports

	Invited Reviewers
	1	2
Version 1 20 Apr 22	read	read

Thomas Wood, University of Washington, Seattle, USA
Fu-Sheng Chou, Loma Linda University, CA, USA

Comments on this article

All Comments(0)

Add a comment

Sign up for content alerts

Browse by related subjects

Back to all reports

Reviewer Report

2 Views

12 Dec 2023 | for Version 1

Fu-Sheng Chou, Loma Linda University, CA, USA

2 Views Cite this report Responses(0)

Not Approved

In their research, Matsushita et al. created several machine learning models aiming to predict 24-hour mortality in extremely low birth weight infants, utilizing arterial blood gas and lactate data from 2012 to 2017.
While the study addresses a crucial topic, predicting severe outcomes in neonatal care, I find two significant issues with the research:

Lack of Validation and Incomplete Model Performance Reporting: One major concern lies in the absence of validation techniques and the incomplete reporting of model performance. Without validation, the reliability and accuracy of the developed models remain uncertain. Furthermore, the study lacks comprehensive details on how well these models performed, leaving gaps in understanding their effectiveness.
Inadequate Rationale for Feature Selection: Another concern pertains to the rationale behind selecting specific features. Notably, the inclusion of calculated values like bicarbonate and base excess in the model raises questions about their necessity. Additionally, the exclusion of demographic and perinatal features lacks explanation. Understanding why certain features were included while others were omitted is crucial for the transparency and credibility of the study. For instance, clarifying why calculated values were preferred over demographic and perinatal data would enhance the study's overall rationale.

Addressing these concerns would not only strengthen the study's methodology but also contribute significantly to the credibility and applicability of the findings in neonatal care.

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

Yes
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Neonatal respiratory outcome prediction. Growth chart development for preterm infants.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.


Reviewer Report


29 Nov 2023 | for Version 1

Thomas Wood, University of Washington, Seattle, Washington, USA


Not Approved

The authors address an important problem, but I have several concerns regarding how the methods were performed and reported:

The idea of “real-time” mortality risk is mentioned throughout the manuscript, but it is not clear whether this analysis incorporated blood gas/lactate data as a time-dependent component of the prediction models. For example, if 1932 blood gas samples were included from 257 infants, how was the train/test split performed: by infant or by blood gas sample? How were multiple blood gas samples from the same infant handled, how did the prediction update with each new blood gas measurement from the same infant, and how did the prediction window update with each new blood gas? This needs to be explained more clearly. Time-series data have been used for mortality risk prediction in preterm infants previously (e.g., Feng et al., npj Digital Medicine 2021), but I’m not sure that’s what was done here.
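One way to answer these questions explicitly is to anchor a separate 24-hour outcome window at the draw time of each blood gas and to split the data by infant, so that no infant contributes samples to both the training and the test set. A minimal sketch with pandas and scikit-learn follows; the column names (`infant_id`, `sample_time`, `death_time`) and the toy rows are hypothetical, and this is not the authors' pipeline.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit


def label_death_within_24h(samples: pd.DataFrame) -> pd.Series:
    """1 if the infant died within 24 h after this blood gas was drawn, else 0.

    Survivors have death_time = NaT; comparisons involving NaT evaluate to False.
    """
    delta = samples["death_time"] - samples["sample_time"]
    within = (delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(hours=24))
    return within.astype(int)


def split_by_infant(samples: pd.DataFrame, test_size=0.2, seed=0):
    """Hold out whole infants so repeated samples cannot leak across sets."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(samples, groups=samples["infant_id"]))
    return samples.iloc[train_idx], samples.iloc[test_idx]


# Toy data: infant 1 has three blood gases and dies; infants 2 and 3 survive.
samples = pd.DataFrame({
    "infant_id": [1, 1, 1, 2, 2, 3],
    "sample_time": pd.to_datetime([
        "2015-01-01 06:00", "2015-01-02 06:00", "2015-01-03 05:00",
        "2015-02-01 08:00", "2015-02-01 20:00", "2015-03-05 10:00",
    ]),
    "death_time": pd.to_datetime([
        "2015-01-03 12:00", "2015-01-03 12:00", "2015-01-03 12:00",
        None, None, None,
    ]),
})
samples["death_24h"] = label_death_within_24h(samples)
train, test = split_by_infant(samples)
print(samples[["infant_id", "sample_time", "death_24h"]])
print("test infants:", sorted(test["infant_id"].unique()))
```

In this toy example only the last blood gas of infant 1, drawn 7 hours before death, is labelled positive; the earlier samples from the same infant are negatives for their own 24-hour windows.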
Considering the relatively low sample size, why not do cross-validated predictions so that each infant can be used for both training and testing across 5 or 10 folds?
Why were the most important features only extracted from AutoML Tables? This can also be done with Extreme Gradient Boosting models.
Why were some of the variables from Table 1, such as sex, multiple gestation, and lowest temperature, not included as predictors, especially considering that they differed between survivors and non-survivors?
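Infant-level variables of this kind can be added to a sample-level design by joining them onto every blood gas row from the same infant. A minimal pandas sketch; the table layouts, column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical tables: one row per blood gas sample, one row per infant.
samples = pd.DataFrame({
    "infant_id": [1, 1, 2, 3, 3, 3],
    "lactate": [2.1, 4.8, 1.6, 3.0, 6.2, 9.5],
    "ph": [7.31, 7.18, 7.35, 7.27, 7.12, 6.98],
})
infants = pd.DataFrame({
    "infant_id": [1, 2, 3],
    "male_sex": [1, 0, 1],
    "multiple_gestation": [0, 0, 1],
    "lowest_temp_c": [35.9, 36.4, 35.2],
})

# validate="many_to_one" raises a MergeError if an infant appears twice in
# `infants`, which would otherwise silently duplicate blood gas rows.
features = samples.merge(infants, on="infant_id", how="left", validate="many_to_one")
print(features)
```

Baseline variables are known from admission onward, so attaching them to every later blood gas does not leak future information into the prediction.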
How was chorioamnionitis determined?
More discussion of the utility of this kind of model is needed. For instance, it is probably worth mentioning that high precision is much more critical than high recall for this kind of model: if a high predicted risk of mortality is used to redirect care, then the most important thing is to avoid false positives. More detail on the goal of the study, how the methods align with that goal (e.g., is this truly a real-time prediction model?), and the context of the outcome is needed in both the introduction and the discussion.
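If precision (positive predictive value) is the priority, one option is to choose the decision threshold on validation data as the lowest cut-off that still reaches a required precision, and then report the recall that this costs. A minimal sketch using scikit-learn's `precision_recall_curve`; the helper name `threshold_for_precision` and the 0.9 target are hypothetical choices.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def threshold_for_precision(y_true, y_prob, min_precision=0.9):
    """Lowest cut-off whose precision on this (validation) data is at least
    min_precision; returns (threshold, precision, recall) or None."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # precision/recall carry one extra final point (precision 1, recall 0)
    # that has no threshold, so it is dropped before searching.
    meets = np.flatnonzero(precision[:-1] >= min_precision)
    if meets.size == 0:
        return None
    i = meets[0]  # thresholds increase, so the first hit keeps the most recall
    return float(thresholds[i]), float(precision[i]), float(recall[i])


# Toy validation set (made-up values, not study data)
y_val = [0, 0, 1, 0, 1, 1]
p_val = [0.10, 0.30, 0.35, 0.40, 0.80, 0.90]
print(threshold_for_precision(y_val, p_val, min_precision=0.9))
```

The threshold should be fixed on validation data and then applied unchanged to the test set; choosing it on the test set would overstate the achievable precision.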

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

No
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

No
Are the conclusions drawn adequately supported by the results?

Partly

References

1. Feng J, Lee J, Vesoulis ZA, Li F: Predicting mortality risk for preterm infants using deep learning models with time-series vital sign data. NPJ Digit Med. 2021; 4(1): 108.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Neonatal neuroscience, neonatal outcome prediction

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.


[1] United Nations Inter-agency Group for Child Mortality Estimation: Levels & Trends in child mortality: 2020 report. Report 2020. 2020; 1–56 p.

[2] Hug L, Alexander M, You D, et al.: National, regional, and global levels and trends in neonatal mortality between 1990 and 2017, with scenario-based projections to 2030: a systematic analysis. Lancet Glob. Heal. 2019; 7(6): e710–e720.

[3] Sankar MJ, Natarajan CK, Das RR, et al.: When do newborns die? A systematic review of timing of overall and cause-specific neonatal deaths in developing countries. J. Perinatol. 2016; 36(S1): S1–S11.

[4] Subudhi S, Verma A, Patel AB, et al.: Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19. npj Digit. Med. 2021; 4(1): 1–7.

[5] Harsha SS, Archana BR: SNAPPE-II (score for neonatal acute physiology with perinatal extension-II) in predicting mortality and morbidity in NICU. J. Clin. Diagnostic Res. 2015; 9(10): SC10–SC12.

[6] Parry G, Tucker J, Tarnow-Mordi W: CRIB II: an update of the clinical risk index for babies score. Lancet. 2003; 361: 1789–1791.

[7] Lee S, Aziz K, Dunn M, et al.: Transport risk index of physiologic stability, version II (TRIPS-II): A simple and practical neonatal illness severity score. Am. J. Perinatol. 2013; 30(5): 395–400.

[8] Medvedev MM, Brotherton H, Gai A, et al.: Development and validation of a simplified score to predict neonatal mortality risk among neonates weighing 2000 g or less (NMR-2000): an analysis using data from the UK and The Gambia. Lancet Child Adolesc. Heal. 2020; 4(4): 299–311.

[9] Podda M, Bacciu D, Micheli A, et al.: A machine learning approach to estimating preterm infants survival: development of the Preterm Infants Survival Assessment (PISA) predictor. Sci. Rep. 2018; 8(1): 1–9.

[10] Matsushita F, Krebs V, Ferraro A, et al.: Early fluid overload is associated with mortality and prolonged mechanical ventilation in extremely low birth weight infants. Eur. J. Pediatr. 2020; 179(11): 1665–1671.

[11] LaValley MP: Logistic regression. Circulation. 2008; 117(18): 2395–2399.

[12] Chen T, He T, Benesty M: XGBoost: eXtreme Gradient Boosting. R package version 071-2. 2018; 1–4.

[13] Google: Cloud AutoML.

[14] Friedman JH: Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001; 29(5): 1189–1232.

[15] Pedregosa F, Grisel O, Weiss R, et al.: Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011; 12: 2825–2830.

[16] Jaskari J, Myllarinen J, Leskinen M, et al.: Machine Learning Methods for Neonatal Mortality and Morbidity Classification. IEEE Access. 2020; 8: 123347–123358.

[17] Lee J, Cai J, Li F, et al.: Predicting mortality risk for preterm infants using random forest. Sci. Rep. 2021; 11(1): 1–6.

[18] Feng J, Lee J, Vesoulis ZA, et al.: Predicting mortality risk for preterm infants using deep learning models with time-series vital sign data. npj Digit. Med. 2021; 4(1): 1–8.

[19] Paliwoda M, New K, Davies M, et al.: Physiological vital sign ranges in newborns from 34 weeks gestation: A systematic review. Int. J. Nurs. Stud. 2018; 77: 81–90.

[20] Manja V, Lakshminrusimha S, Cook DJ: Oxygen saturation target range for extremely preterm infants: A systematic review and meta-analysis. JAMA Pediatr. 2015; 169(4): 332–340.

[21] von Auenmueller KI, Christ M, Sasko BM, et al.: The Value of Arterial Blood Gas Parameters for Prediction of Mortality in Survivors of Out-of-hospital Cardiac Arrest. J. Emerg. Trauma Shock. 2017; 10(3): 134–139.

[22] Yu Matsushita F: Blood Gas - Preterm. 2022. Harvard Dataverse, V2, UNF:6:cVcTuPkuzNi4QItQBkW1RA== [fileUNF].

