An interpretable machine learning model of biological age

Thomas R. Wood; Christopher Kelly; Megan Roberts; Bryan Walsh

doi:10.12688/f1000research.17555.1

Research Article

[version 1; peer review: 2 approved with reservations]

Thomas R. Wood¹⁻³, Christopher Kelly², Megan Roberts², Bryan Walsh⁴

PUBLISHED 04 Jan 2019

Author details

¹ Department of Pediatrics, University of Washington, Seattle, Washington, 98195, USA
² Nourish Balance Thrive, Redding, California, USA
³ Institute for Human and Machine Cognition, Pensacola, Florida, USA
⁴ University of Western States, Portland, Oregon, USA

Thomas R. Wood
Roles: Conceptualization, Data Curation, Formal Analysis, Methodology, Supervision, Writing – Original Draft Preparation

Christopher Kelly
Roles: Conceptualization, Data Curation, Formal Analysis, Methodology, Software, Visualization, Writing – Review & Editing

Megan Roberts
Roles: Conceptualization, Methodology, Writing – Review & Editing

Bryan Walsh
Roles: Conceptualization, Formal Analysis, Methodology, Writing – Review & Editing

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Background: Assessments of biological (rather than chronological) age derived from patient biochemical data have been shown to strongly predict both all-cause and disease-specific mortality. However, these population-based approaches have yet to be translated to the individual. As well as using biological age as a research tool, by being able to better answer the question “why did we get this result?”, clinicians may be able to apply personalised interventions that could improve the long-term health of individual patients.
Methods: Here, the boosted decision tree algorithm XGBoost was used to predict biological age using 39 commonly-available blood test results from the US National Health and Nutrition Examination Survey (NHANES) database.
Results: Interrogation of the algorithm produced a description of how each marker contributed to the final output in a single individual. Additive explanation plots were then used to determine biomarker ranges associated with a lower biological age. Importantly, a number of markers that are modifiable with lifestyle changes were found to have a significant effect on biological age, including fasting blood glucose, lipids, and markers of red blood cell production.
Conclusions: The combination of individualised outputs with target ranges could provide the ability to personalise interventions or recommendations based on an individual’s biochemistry and resulting predicted age. This would allow for the investigation of interventions designed to improve health and longevity in a targeted manner, many of which could be rooted in targeted lifestyle modifications.

Keywords

Aging, Machine Learning, Age

Corresponding author: Thomas R. Wood

Competing interests: The authors are all co-founders of an online commercial tool, bloodcalculator.com, developed to assist in the analysis of blood test results. The predicted age algorithm described in the manuscript is online and freely-available through this tool.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2019 Wood TR et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Wood TR, Kelly C, Roberts M and Walsh B. An interpretable machine learning model of biological age [version 1; peer review: 2 approved with reservations]. F1000Research 2019, 8:17 (https://doi.org/10.12688/f1000research.17555.1) First published: 04 Jan 2019, 8:17 (https://doi.org/10.12688/f1000research.17555.1) Latest published: 04 Jan 2019, 8:17 (https://doi.org/10.12688/f1000research.17555.1)

Introduction

One of the fastest-growing areas at the intersection of clinical medicine and data science is the investigation of human aging¹, with multiple avenues being explored to find biomarkers of aging that could be used to inform efforts to enhance human longevity^2–4. If robust and easily-accessible biomarkers of aging are identified, they could assist in the rapid assessment of promising interventions aimed at increasing longevity, without the need to perform clinical trials that last decades. For instance, epigenetic modifications on DNA are increasingly being used to determine biological (rather than chronological) age, including how environmental determinants may affect an epigenetic signal for longevity⁴.

An individual’s biological age can be described based on the assumption that cellular aging processes, which are highly influenced by the environment⁵, occur at different rates in different people with the same chronological age. As these aging processes are associated with changes in routine biochemical measures⁶, algorithmic determination of biological or phenotypic age using widely-available indices such as those from blood test results is becoming increasingly common. This has previously been done using both machine learning (ML) and statistical techniques^3,6.

One important aspect for the utility of biological age measures is that a given output can be interpreted in order to guide individualized interventions. ML-based predictions of biological age have the potential to elucidate and describe complex, non-linear, and unintuitive patterns in biochemical data, which may provide greater predictive power compared to other statistical techniques. To date, published approaches to generate predicted biological age from biochemical data have used deep neural networks (DNNs), with the output being directly associated with mortality risk³. However, while individual outputs from DNNs are interpretable⁷, it is currently not possible to interrogate the effects of the entire training dataset on the model output, which may be important for determining how one may intervene given an individual’s output.

As a result of the issues with interpreting certain ML algorithms, the field of explainable artificial intelligence is developing rapidly⁸. If such approaches can be successfully applied to determining biological age from commonly available data, biological signatures of aging could be more rapidly discovered and tracked, including the ability to personalise interventions based on the outputs of the model. Here, we describe the development of an explainable ML model using blood marker data from the National Health and Nutrition Examination Survey (NHANES) database to predict biological age, as well as provide individual weighting for how each biomarker affected the final output. By determining how markers affect the model globally, potential target reference ranges associated with lower biological age can also be determined.

Methods

Input data

Data from a total of 46,739 participants (n=22,545 males and n=24,194 females) in the NHANES database were included, with a mean (range) age of 48.5 (19.0–85.0) years. A total of 39 common blood markers were used: complete blood count (CBC) with differential, lipids, fasting glucose, iron panel, and a comprehensive metabolic panel (including electrolytes, and liver and kidney function). Descriptive data for the dataset are listed in Table 1.

Table 1. Demographic data from the entire NHANES dataset.

Variable	Females (n=24,194), Mean	Females (n=24,194), SD	Males (n=22,545), Mean	Males (n=22,545), SD
Chronological Age (years)	48.1	18.8	48.8	18.8
Red blood cells (×10⁶/µl)	4.4	0.4	4.9	0.5
Red blood cell distribution width (%)	13.2	1.5	13.0	1.1
Hematocrit (%)	39.2	3.6	44.2	3.8
Hemoglobin (g/dl)	13.3	1.3	15.0	1.3
Mean corpuscular hemoglobin (pg)	30.1	2.5	30.6	2.2
Mean corpuscular hemoglobin concentration (g/dl)	33.9	1.0	34.0	1.0
Mean corpuscular volume (fl)	88.9	6.1	90.1	5.3
Platelets (×10³/µl)	267.9	70.7	238.0	61.1
Mean platelet volume (fl)	8.2	0.9	8.2	0.9
Neutrophils (×10³/µl)	4.4	1.8	4.2	1.8
Lymphocytes (×10³/µl)	2.2	1.0	2.1	1.5
Monocytes (×10³/µl)	0.5	0.2	0.6	0.2
Eosinophils (×10³/µl)	0.2	0.2	0.2	0.2
Basophils (×10³/µl)	0.0	0.1	0.0	0.1
Total Cholesterol (mg/dl)	199.0	42.5	192.4	42.9
Low-density lipoprotein cholesterol (mg/dl)	115.7	35.8	115.7	36.1
High-density lipoprotein cholesterol (mg/dl)	57.5	16.4	47.8	14.2
Triglycerides (mg/dl)	129.0	101.6	146.9	136.6
Glucose (mg/dl)	98.8	37.1	103.4	39.5
Iron (µg/dl)	77.3	34.4	92.9	35.8
Total Iron Binding Capacity (µg/dl)	380.9	71.7	351.9	55.0
Ferritin (ng/ml)	78.1	103.6	183.5	180.3
Sodium (mmol/l)	138.9	2.4	139.3	2.3
Potassium (mmol/l)	3.9	0.3	4.1	0.3
Chloride (mmol/l)	103.8	3.0	103.3	2.9
Carbon Dioxide (mmol/l)	24.3	2.4	25.1	2.2
Calcium (mg/dl)	9.4	0.4	9.5	0.4
Phosphorus (mg/dl)	3.8	0.5	3.7	0.6
Creatinine (mg/dl)	0.8	0.4	1.0	0.5
Urea nitrogen (mg/dl)	12.6	6.0	14.4	6.1
Albumin (g/dl)	4.1	0.4	4.4	0.3
Globulins (g/dl)	3.0	0.5	2.9	0.5
Alanine transaminase (IU/l)	21.5	20.5	29.5	27.8
Aspartate transaminase (IU/l)	23.6	15.1	27.8	22.2
Alkaline Phosphatase (IU/l)	71.1	27.8	71.8	26.6
Bilirubin (mg/dl)	0.6	0.3	0.8	0.3
Gamma glutamyl-transferase (IU/l)	24.3	35.7	34.8	52.1
Lactate dehydrogenase (IU/l)	131.6	29.7	133.1	35.8
Uric Acid (mg/dl)	4.8	1.3	6.1	1.3

Model generation

NHANES data (all available individuals with the 39 markers listed in Table 1 from years 1999–2015) were downloaded as .xpt files from the NHANES website using its built-in web search engine. The data were then concatenated, cross-tabulated, and stratified by gender. A random split in the dataset was created to withhold 20% of participants (n=4,509 males and n=4,839 females) for model validation. The remaining 80% of the dataset was used to train an XGBRegressor model (XGBoost version 0.81) using chronological age and the 39 biochemical input markers. For the withheld 20% of the data, the 39 markers were provided to the algorithm⁹ with the chronological age withheld, and the resulting dependent variable “predicted age” was defined as a measure of biological age. Age predictions for the withheld data were plotted against actual age using jointplot from the seaborn Python library (version 0.9.0).
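The withholding step described above can be sketched with the Python standard library alone. This is a minimal illustration and not the code from the authors' notebook: the function name split_by_group, the fixed seed, and the integer participant IDs are hypothetical, and only the group sizes and the 20% fraction come from the text.

```python
import random


def split_by_group(ids_by_group, holdout_fraction=0.2, seed=0):
    """Randomly withhold a fraction of participants within each group."""
    rng = random.Random(seed)
    train, holdout = {}, {}
    for group, ids in ids_by_group.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        # Round to the nearest whole participant.
        n_holdout = round(len(shuffled) * holdout_fraction)
        holdout[group] = shuffled[:n_holdout]
        train[group] = shuffled[n_holdout:]
    return train, holdout


# Hypothetical participant IDs; only the group sizes are taken from the text.
participants = {
    "male": range(22_545),
    "female": range(24_194),
}
train, holdout = split_by_group(participants)

for group in participants:
    print(group, len(train[group]), len(holdout[group]))
```

Fixing the seed makes this sketch reproducible. The Notes under Software availability state that the published notebook draws a new random split on each run, so its outputs can differ slightly between runs.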

Model interrogation

For individual predictions, the weight of each marker was extracted using ELI5 (version 0.8.1), and graphed using a waterfall chart (version 3.8). For a given age prediction, each marker was individually weighted with regard to how it contributed to the final output. Shapley additive explanations plots (SHAP, version 0.26.0) were constructed to describe how each individual marker affects the predicted age output within the laboratory normal range.
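The additive idea behind Shapley explanations can be illustrated with a brute-force calculation on a toy model. The sketch below is not the algorithm used by the SHAP or ELI5 libraries, which rely on faster tree-specific methods; the function shapley_values, the two-marker toy_model, and all numeric values are hypothetical. Markers outside a coalition are simply set to a single baseline value, which is a simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction (exponential cost).

    Features that are absent from a coalition take their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_model(m):
    """Hypothetical age model with one interaction term (not the trained model)."""
    age = 48.0
    age += 0.5 * (m["bun"] - 13.0)
    age += 0.1 * (m["glucose"] - 100.0)
    if m["bun"] > 13.0 and m["glucose"] > 100.0:
        age += 2.0
    return age


baseline = {"bun": 13.0, "glucose": 100.0}
person = {"bun": 17.0, "glucose": 110.0}
phi = shapley_values(toy_model, person, baseline)

# The contributions sum to the gap between this prediction and the baseline.
print(phi, toy_model(baseline) + sum(phi.values()), toy_model(person))
```

The property worth noting is that the per-marker contributions add up exactly to the difference between the individual prediction and the baseline prediction, which is what makes a waterfall-style display possible.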

Worked example

To provide an individual output example based on data not seen by the algorithm⁹ previously, author C.K. had the necessary input markers measured by Quest Laboratories (Santa Cruz, CA). As C.K. is an author who ran his own data through the algorithm⁹ he trained during development of the manuscript, institutional ethical approval was not sought for publication of this data. C.K. approved the publication of his data in this manner.

Results

Differences between predicted (biological) age and chronological age

Linear regression analysis (Figure 1) showed a significant correlation between predicted (biological) and actual (chronological) age (r=0.77 and 0.75 in females and males, respectively; p<0.0001 for both). However, discrepancies between the biological and chronological age could be considered clinically relevant, as they would allow for the generation of a signature of premature biological aging.
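The comparison of predicted and chronological age can be made concrete with a short calculation. The sketch below computes the Pearson correlation coefficient, together with the mean absolute error and the per-person age gap. The four data points are hypothetical, and the mean absolute error is not reported in this article; it is included here only as a commonly used companion metric.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the same units as the inputs (years)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical withheld participants (years); not NHANES data.
chronological = [25.0, 40.0, 55.0, 70.0]
predicted = [30.0, 38.0, 50.0, 66.0]

r = pearson_r(chronological, predicted)
mae = mean_absolute_error(chronological, predicted)
# Positive gap: predicted (biological) age is older than chronological age.
age_gap = [p - c for p, c in zip(predicted, chronological)]

print(round(r, 3), mae, age_gap)
```

A positive age gap for an individual corresponds to the "premature biological aging" signature discussed above, and a negative gap to the opposite.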

Figure 1. Linear regression analysis comparing actual (chronological) and predicted (biological) age.

Data shown for women (A) and men (B) using the 20% withheld data (n=4,509 males and n=4,839 females). A significant correlation between predicted and actual age (r=0.77 and 0.75 in females and males, respectively) was seen in both sexes (p<0.0001).

SHAP plots of input markers

SHAP summary plots (Figure 2) were used to determine which markers have the greatest influence on predicted biological age. The top 20 markers in terms of importance are shown. In females, blood urea nitrogen (BUN) had the greatest influence on biological age, with albumin the most influential marker in men. Fasting glucose was the second most influential marker in both sexes (Figure 2). SHAP plots for each of the 20 most influential markers are available on GitHub and Zenodo⁹. Based on each of these 20 markers, the level at which an inflection point was seen in the SHAP plot (i.e. when a further change in a marker would result in a net increase in predicted biological age) was determined, as well as the estimated range over which each marker would be associated with the lowest biological age (Table 2 and Table 3). Using the five most influential markers as an example, the lowest predicted age in women would be associated with a BUN 6–11 mg/dl, fasting glucose 71–86 mg/dl, bicarbonate (carbon dioxide) 19–22 mmol/l, total cholesterol 130–150 mg/dl, and mean corpuscular volume (MCV) 80–85 fl. In men, the lowest predicted age would be associated with albumin 4.6–4.8 g/dl, fasting glucose 70–88 mg/dl, BUN 6–12 mg/dl, red blood cell (RBC) 5.0–5.7 ×10⁶/µl, and RBC distribution width (RDW) 11.0–12.5%.
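The inflection points in this study were estimated by visual examination of the SHAP plots. As a hedged illustration of the same idea done programmatically, the sketch below looks for the marker value at which the contribution to predicted age changes from negative to non-negative, using linear interpolation between neighbouring points. The function negative_to_positive_crossings and the smoothed curve are hypothetical and are not taken from the published plots.

```python
def negative_to_positive_crossings(points):
    """Marker values where the contribution moves from negative to >= 0.

    `points` is a list of (marker_value, contribution_in_years) pairs,
    for example a smoothed SHAP dependence curve.
    """
    ordered = sorted(points)
    crossings = []
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        if y0 < 0 <= y1:
            # Linear interpolation between the two bracketing points.
            crossings.append(x0 + (x1 - x0) * (0 - y0) / (y1 - y0))
    return crossings


# Hypothetical smoothed curve: marker value versus adjustment in years.
curve = [(6, -2.0), (9, -1.5), (11, -0.5), (13, 0.5), (16, 2.0), (20, 4.0)]

print(negative_to_positive_crossings(curve))
```

Real SHAP dependence plots are scatter clouds rather than single curves, so a smoothing or binning step (not shown) would be needed before a crossing could be located in this way.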

Figure 2. SHAP summary plots showing the adjustment to predicted age (x-axis) for each of the top 20 markers.

Data shown for women (A) and men (B). Each plot is made up of thousands of individual points from the training dataset, with a higher value being more red and a lower value being more blue. This is depicted by the “feature value” bar on the right of each plot. Therefore, if the dots on one side of the central line are increasingly red or blue, that suggests that increasing values or decreasing values, respectively, move the predicted age in that direction. For instance, lower BUN values (blue dots) are associated with lower predicted age in both men and women.

Table 2. Top 20 markers affecting predicted age in women.

Ranking of markers affecting predicted age in women, in order of importance, as determined by the SHAP summary outputs. Visual examination of the individual SHAP plots for each marker was used to estimate the range over which each marker would result in the lowest predicted age, and the magnitude of the adjustment in years. The final column is the value at which a marker changes from a net negative to net positive effect on biological age.

Marker Rank	Marker	Estimated range for lowest predicted age	Magnitude of effect (years)	Inflection point
1	BUN	6–11 mg/dl	-9 to -2	12 mg/dl
2	Glucose	71–86 mg/dl	-7.5 to -3	86 mg/dl
3	Carbon Dioxide	18–22 mmol/l	-6 to -2	25 mmol/l
4	Total Cholesterol	130–150 mg/dl	-5 to -1.5	195 mg/dl
5	MCV	80–85 fl	-3.5 to -1.0	90 fl
6	LDH	120–130 IU	-1 to 0	130 IU
7	Creatinine	0.62–0.78 mg/dl	-3 to 0	0.82 mg/dl
8	RDW	10–12 %	-4 to -0.5	0
9	Lymphocytes	2.3–3.0 ×10E3/µl	-1.2 to 0	1.9 ×10E3/µl
10	Sodium	137–139 mmol/l	-1.2 to 0	140 mmol/l
11	AST	13–17 IU	-2 to -0.5	22 IU
12	Chloride	103–106 mmol/l	-1.5 to 0	103 mmol/l
13	GGT	5–10 IU	-3 to -0.5	15 IU
14	ALT	42–44 IU	-4 to -1	21 IU
15	ALP	40–56 IU	-2 to -0.2	65 IU
16	Albumin	4.6–4.8 g/dl	-4 to -0.5	4.3 g/dl
17	Neutrophils	6.6–7 ×10E3/µl	-2.5 to -0.5	5 ×10E3/µl
18	Ferritin	30–50 ng/ml	-3 to 0	50 ng/ml
19	Phosphorus	5.2–7.1 mg/dl	-4.8 to -1	4.1 mg/dl
20	Potassium	3.5–3.9 mmol/l	-1 to 0	4.1 mmol/l

Table 3. Top 20 markers affecting predicted age in men.

Ranking of markers affecting predicted age in men, in order of importance, as determined by the SHAP summary outputs. Visual examination of the individual SHAP plots for each marker was used to estimate the range over which each marker would result in the lowest predicted age, and the magnitude of the adjustment in years. The final column is the value at which a marker changes from a net negative to net positive effect on biological age.

Marker Rank	Marker	Estimated range for lowest predicted age	Magnitude of effect (years)	Inflection point
1	Albumin	4.6–4.8 g/dl	-10 to -1	4.4 g/dl
2	Glucose	70–88 mg/dl	-7.0 to -1	96 mg/dl
3	BUN	6–12 mg/dl	-7.5 to -1	14 mg/dl
4	RBC	5.0–5.7 ×10E6/µl	-4.0 to -0.5	4.8 ×10E6/µl
5	RDW	11.0–12.5 %	-7.5 to -1	13%
6	MCV	79–87 fl	-4.0 to -1.5	90 fl
7	ALT	33–45 IU	-4.5 to -0.5	28 IU
8	Phosphorus	4.1–4.5 mg/dl	-4.0 to -0.5	3.8 mg/dl
9	Lymphocytes	1.9–3.0 ×10E3/µl	-2.0 to -0.5	1.8 ×10E3/µl
10	Total Cholesterol	100–160 mg/dl	-5.5 to -0.5	190 mg/dl
11	Platelets	250–400 ×10E3/µl	-3.0 to -0.2	210 ×10E3/µl
12	Potassium	3.5–4.1 mmol/l	-1.8 to -0.2	4.2 mmol/l
13	Creatinine	0.5–1.0 mg/dl	-1.5 to 0	1.0 mg/dl
14	LDH	80–120 IU	-3 to -0.5	130 IU
15	Triglycerides	40–60 mg/dl	-6 to -2	100 mg/dl
16	Monocytes	0.1–0.5 ×10E3/µl	-1.5 to 0	0.6 ×10E3/µl
17	Neutrophils	1.5–2.8 ×10E3/µl	-2.2 to -0.2	3.4 ×10E3/µl
18	MCHC	34.3–35.7 g/dl	-1 to 0	33.9 g/dl
19	GGT	6–15 IU	-2.0 to 0	21 IU
20	Total Bilirubin	0.1–0.6 mg/dl	-0.75 to 0	0.8 mg/dl

Fully interpretable personalised predictions

For a given individual, the model output allows for each marker to be individually weighted with regard to how it contributed to the final output (Figure 3). The average age in the training dataset (BIAS) is given as a starting point, with each marker subsequently increasing or decreasing predicted age by a number of years. This allows for the most influential markers for the individual to be determined. The example shown is for one of the study authors (C.K.), the data for whom are available on Zenodo⁹. The bias (48.3 years) is sequentially adjusted by each marker. The five markers contributing most to an increase in biological age were BUN (+3.5 years), total cholesterol (+2.8 years), potassium (+1.7 years), phosphorus (+1.2 years), and LDH (+0.9 years). The five markers contributing most to a decrease in biological age were lymphocytes (-1.2 years), RBCs (-2.3 years), albumin (-2.7 years), fasting glucose (-3.1 years), and triglycerides (-3.9 years). The final predicted biological age was 43.0 years.
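The arithmetic behind the waterfall chart can be checked directly from the figures quoted above. The sketch below sums the ten reported contributions and derives the net contribution of the markers that are not individually listed in the text. The variable names are illustrative, and the assumption that the remaining markers account for the rest of the difference follows from the additive structure described above rather than from a value reported in the article.

```python
bias = 48.3                 # mean age in the training data (years)
final_predicted_age = 43.0  # reported prediction for author C.K. (years)

# The ten contributions quoted in the text (years).
reported = {
    "BUN": 3.5,
    "total cholesterol": 2.8,
    "potassium": 1.7,
    "phosphorus": 1.2,
    "LDH": 0.9,
    "lymphocytes": -1.2,
    "RBC": -2.3,
    "albumin": -2.7,
    "fasting glucose": -3.1,
    "triglycerides": -3.9,
}

# Net effect of the ten listed markers, rounded to the reported precision.
reported_total = round(sum(reported.values()), 1)

# Whatever is left must come from the markers not listed individually.
remaining = round(final_predicted_age - bias - reported_total, 1)

print(reported_total, remaining)
```

Under these figures, the ten listed markers lower the prediction by 3.1 years in total, and the markers not listed account for a further net reduction of 2.2 years.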

Figure 3. Waterfall chart depicting how individual input markers contribute to a given predicted biological age (y-axis) for author C.K.

Bias (first column, 48.3 years) is the mean age in the input population. The five markers contributing most to an increase in biological age (columns 2–6 from the left) were BUN, total cholesterol, potassium, phosphorus, and LDH. The five markers contributing most to a decrease in biological age (columns 2–6 from the right) were lymphocytes, RBCs, albumin, glucose and triglycerides. The final predicted biological age (43.0 years) is in the last column.

Discussion

Biomarkers of aging are increasingly important in the development and investigation of interventions with which to slow aging processes, which may also have the ability to aid in the treatment or prevention of aging-associated chronic disease. One such marker is the individual’s biological or phenotypic age, as reflected by patterns of biochemical markers in the blood, which have previously been shown to be associated with risk of mortality^2,3,6. While there are a number of approaches to this problem in the published literature, we provide an alternative using a tree-based ML model that a) is fully interpretable, b) can be completely individualized for a given patient, and c) allows the development of target ranges associated with a potential signature for slowed biological aging.

One issue surrounding the utility of algorithmically-derived biological age is the response to any associated interventions or therapeutics. As this field is relatively new, it is uncertain how much an improvement in predicted biological age resulting from a given therapeutic approach will translate into improvements in longevity. Even if a given marker decreases predicted biological age, this also does not guarantee that manipulating the value will increase longevity. For instance, in our models, increasing ALT and decreasing total cholesterol were associated with lower predicted biological age; however, there are a number of scenarios where lower total cholesterol and higher ALT may be associated with increased mortality despite a lower predicted biological age^10,11. Despite this, these models are at least able to generate hypotheses that can be tested in both the preclinical and clinical setting. Our approach also provides an example that other groups may use to produce fully-interpretable and personalisable outputs.

Though the current analysis does not include confirmation of the ability to predict mortality risk, certain outputs from the algorithm⁹ do provide some confidence that the output is likely to be associated with individual health outcomes. For instance, the greatest increase in predicted age associated with fasting glucose level occurs in the range 90–100 mg/dl, which is strikingly similar to the blood glucose level associated with the largest increase in mortality risk in multiple population studies^12,13. Similar associations are seen with many of the target ranges derived from the algorithm⁹, such as for albumin, RDW, and ferritin (especially in men)^14–16.

If modulation of certain markers does indeed contribute to the reversal of cellular aging processes, the combination of an individual output with the population SHAP plots for a given marker could therefore allow for targeted therapeutic interventions aimed at improving biological age based on an individual’s specific output. For instance, elevated fasting blood glucose could be decreased by addressing diet, exercise, micronutrient deficiencies, and reducing inflammation or psychosocial stress¹⁷. Similar approaches are also likely to improve cholesterol, RDW, and MCV, confirming that lifestyle factors should play a key role in the pursuit of health and longevity^15,18,19. A personalised approach is important, because the markers contributing most strongly to biological age in the whole dataset are not necessarily the same markers that most strongly contribute to a prediction in a single individual (see example in Figure 3).

The current approach does have some limitations. The dataset may only be applicable in the United States, as different countries and ethnic backgrounds might display variations in both baseline biochemistry and predicted longevity³. Expanding available input data and allowing for stratification based on nationality and ethnic background will be the focus of future work. Larger and more expanded datasets will also allow for the analysis of biological aging in association with other potentially important factors such as genetics and the microbiota^20,21. It is also worth mentioning that NHANES is designed to capture data that is representative of the US population. Therefore, this data comes from participants that represent a population that has some of the highest metabolic and cardiovascular disease prevalence in the Western world^22,23, which may distort the results. Additionally, the current outputs would benefit from being correlated with disease outcomes or mortality in order to determine how well predicted biological age acts as an accurate biomarker of health and longevity.

By using well-understood and robust biomarkers that are available to almost any clinician, methods such as those described in this study can be used immediately as adjuncts to research investigating the outcomes of interventions designed to increase human longevity. As multiple methods are currently available with which to predict biological or phenotypic age, the field should also collaborate in an attempt to compare methods such that we can find the approach that results in an accurate output that can most easily be used in both the research and clinical settings.

Data availability

All NHANES data used to produce the models is accessible through the CDC website (listed by NHANES study year): https://wwwn.cdc.gov/nchs/nhanes/search/default.aspx.

Data access, tabulation, and concatenation is automated by the “01-download-preprocess” Jupyter notebook file within our Zenodo repository; DOI: https://doi.org/10.5281/zenodo.2440203⁹. This repository also includes the original Quest laboratory test results from author C.K., which were used to provide the worked example (Figure 3).

Software availability

The algorithm developed here, including the associated libraries and the necessary versions, are available on Zenodo: https://doi.org/10.5281/zenodo.2440203⁹.

License: GNU General Public License version 3

Notes: The algorithm itself can be trained and tested by running the “02-train-test-explain” Jupyter notebook. Note that each time the algorithm runs, a new random split in the dataset is generated in order to train and test the algorithm. Therefore, the resulting outputs might be slightly different.

References

1. Pyrkov TV, Slipensky K, Barg M, et al.: Extracting biological age from biomedical data via deep learning: too much of a good thing? Sci Rep. 2018; 8(1): 5210.
2. Liu Z, Kuo PL, Horvath S, et al.: Phenotypic Age: a novel signature of mortality and morbidity risk. bioRxiv. 2018: 363291.
3. Mamoshina P, Kochetov K, Putin E, et al.: Population Specific Biomarkers of Human Aging: A Big Data Study Using South Korean, Canadian, and Eastern European Patient Populations. J Gerontol A Biol Sci Med Sci. 2018; 73(11): 1482–1490.
4. Levine ME, Lu AT, Quach A, et al.: An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany NY). 2018; 10(4): 573–91.
5. Feil R, Fraga MF: Epigenetics and the environment: emerging patterns and implications. Nat Rev Genet. 2012; 13(2): 97–109.
6. Belsky DW, Caspi A, Houts R, et al.: Quantification of biological aging in young adults. Proc Natl Acad Sci U S A. 2015; 112(30): E4104–10.
7. Montavon G, Samek W, Müller K-R: Methods for interpreting and understanding deep neural networks. Digital Signal Processing. 2018; 73: 1–15.
8. Lundberg SM, Nair B, Vavilala MS, et al.: Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018; 2: 749–60.
9. Kelly C: cck197/ml-bio-age: Initial release (Version v1.0). Zenodo. 2018. http://www.doi.org/10.5281/zenodo.2440203
10. Petursson H, Sigurdsson JA, Bengtsson C, et al.: Is the use of cholesterol in mortality risk algorithms in clinical guidelines valid? Ten years prospective data from the Norwegian HUNT 2 study. J Eval Clin Pract. 2012; 18(1): 159–68.
11. Kunutsor SK, Apekey TA, Seddoh D, et al.: Liver enzymes and risk of all-cause mortality in general populations: a systematic review and meta-analysis. Int J Epidemiol. 2014; 43(1): 187–201.
12. Yi SW, Park S, Lee YH, et al.: Association between fasting glucose and all-cause mortality according to sex and age: a prospective cohort study. Sci Rep. 2017; 7(1): 8194.
13. Bjørnholt JV, Erikssen G, Aaser E, et al.: Fasting blood glucose: an underestimated risk factor for cardiovascular death. Results from a 22-year follow-up of healthy nondiabetic men. Diabetes Care. 1999; 22(1): 45–9.
14. Fulks M, Stout RL, Dolan VF: Albumin and all-cause mortality risk in insurance applicants. J Insur Med. 2010; 42(1): 11–7.
15. Zurauskaite G, Meier M, Voegeli A, et al.: Biological pathways underlying the association of red cell distribution width and adverse clinical outcome: Results of a prospective cohort study. PLoS One. 2018; 13(1): e0191280.
16. Kadoglou NPE, Biddulph JP, Rafnsson SB, et al.: The association of ferritin with cardiovascular and all-cause mortality in community-dwellers: The English longitudinal study of ageing. PLoS One. 2017; 12(6): e0178994.
17. Kolb H, Martin S: Environmental/lifestyle factors in the pathogenesis and prevention of type 2 diabetes. BMC Med. 2017; 15(1): 131.
18. Kelley GA, Kelley KS, Roberts S, et al.: Comparison of aerobic exercise, diet or both on lipids and lipoproteins in adults: a meta-analysis of randomized controlled trials. Clin Nutr. 2012; 31(2): 156–67.
19. Aslinia F, Mazza JJ, Yale SH: Megaloblastic anemia and other causes of macrocytosis. Clin Med Res. 2006; 4(3): 236–41.
20. Biagi E, Franceschi C, Rampelli S, et al.: Gut Microbiota and Extreme Longevity. Curr Biol. 2016; 26(11): 1480–5.
21. Govindaraju D, Atzmon G, Barzilai N: Genetics, lifestyle and longevity: Lessons from centenarians. Appl Transl Genom. 2015; 4: 23–32.
22. Benjamin EJ, Blaha MJ, Chiuve SE, et al.: Heart Disease and Stroke Statistics-2017 Update: A Report From the American Heart Association. Circulation. 2017; 135(10): e146–e603.
23. Bhupathiraju SN, Hu FB: Epidemiology of Obesity and Diabetes and Their Cardiovascular Complications. Circ Res. 2016; 118(11): 1723–35.

Comments on this article

Reader Comment 15 Jan 2019

Evgeny Izumchenko, Johns Hopkins University, USA

Similar work was performed and published in 2016:
https://www.ncbi.nlm.nih.gov/pubmed/27191382
So here it is the same study performed on a different dataset with just one method. And in that study, the comparison with multiple other machine learning performed. Multiple reviews on this type of aging biomarkers were published since then, not sure why the authors chose to ignore them. Many other "aging clocks" were published since 2013 and there are common metrics for these clocks. For example, Mean Absolute Error (MAE). Since the work is not novel, it should at least provide a few case studies. For example, in cancer, BMT, etc.
Competing Interests: None

Competing interests

The authors are all co-founders of an online commercial tool, bloodcalculator.com, developed to assist in the analysis of blood test results. The predicted age algorithm described in the manuscript is online and freely available through this tool.

Grant information

The author(s) declared that no grants were involved in supporting this work.

Article Versions (1)

version 1

Published: 04 Jan 2019, 8:17

https://doi.org/10.12688/f1000research.17555.1

Copyright

© 2019 Wood TR et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

Wood TR, Kelly C, Roberts M and Walsh B. An interpretable machine learning model of biological age [version 1; peer review: 2 approved with reservations]. F1000Research 2019, 8:17 (https://doi.org/10.12688/f1000research.17555.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 12 Feb 2019

Peter O. Fedichev, Moscow Institute of Physics and Technology, Moscow Region, Russian Federation; Gero LLC, Singapore, Singapore

Approved with Reservations

https://doi.org/10.5256/f1000research.19198.r43612

The manuscript concerns the quantification of aging by means of a biological age (BA) model trained as a predictor of chronological age from widely available blood markers (complete blood cell counts and biochemistry). Understanding the capabilities of such biomarkers, and the biology behind them, is among the key issues in fundamental aging studies and could be very helpful for practical applications.

The manuscript, however, falls short of providing the necessary characterization of the proposed BA model. I believe that the presentation could be improved by addressing the issues listed below, so that the results of the study could eventually be indexed in a revised form.

The authors introduced and documented the performance of a particular ML pipeline (the XGBoost flavor of decision tree algorithms) trained to predict chronological age from the blood markers provided by the National Health and Nutrition Examination Survey (NHANES). The rationale behind the approach was two-fold. First, the biological age predictor could be (at least according to previous studies) associated with all-cause and disease-specific mortality. Second, the proposed algorithm could produce a better interpretation of the biological age model output, in a form eventually suitable for personalized recommendations.

Unfortunately, the results presented in the manuscript are not sufficient to fully judge the merits of the model.

Major issues:

1. This is not the first work concerning biological age estimation from blood markers in general, or in NHANES in particular. I would expect more references to previous work and to different machine learning techniques (from principal components analysis to deep learning). I would take the log-linear mortality model from Levine 2018¹ and the deep learning model from Putin 2016⁵ as state-of-the-art modern implementations.
2. The results should be compared with a reference model. I would not expect anything sophisticated, but there must be a comparison. For example, would the novel XGBoost method perform better than a linear regression to chronological age?
3. What is the correct measure of the model's performance? A biological age should not be judged by the quality of the chronological age prediction only. It has been shown that improvements in the accuracy of this class of BA models may lead to a degradation of the association with chronic diseases and mortality (Levine 2018¹, Pyrkov 2018a²). The open-access part of the NHANES database contains enough death events and clinical diagnoses. I propose demonstrating how strongly the proposed BA is associated with the remaining lifespan (Cox regression significance test). Is there an association of the biological age (after adjustment for age and sex) with lifestyle factors (such as smoking; see Pyrkov 2018b³, Mamoshina 2019⁴, etc.)? Are the effects of smoking reversible in cohorts of individuals who quit smoking (see Pyrkov 2018b³)? What is the aging acceleration in years associated with smoking (see Mamoshina 2019⁴)? How is it related to the actual lifespan depreciation associated with smoking? Is the biological age associated with chronic diseases?
4. A linear model, such as a (regularized) regression to age or a log-linear proportional hazards model, would also provide a biological age estimate with contributions associated with the specific markers. Without comparison with a reference linear model, it would be difficult to argue that a more sophisticated approach is easier to interpret.
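
To make the point about linear reference models concrete: in a linear model, each marker's contribution to the predicted age is its coefficient times its deviation from the population mean, which has the same additive form as SHAP values (a base value plus one term per marker). The sketch below is only a minimal illustration of that decomposition. The marker names, coefficients, population means, and mean age are hypothetical placeholders, not values fitted to NHANES or taken from the manuscript.

```python
# Hypothetical linear age model: all numbers below are illustrative
# assumptions, not fitted values from NHANES or from the manuscript.
COEFFICIENTS = {          # years of predicted age per unit of each marker
    "albumin_g_dl": -6.0,
    "glucose_mg_dl": 0.10,
    "creatinine_mg_dl": 8.0,
}
POPULATION_MEANS = {      # assumed mean marker values in the training population
    "albumin_g_dl": 4.3,
    "glucose_mg_dl": 95.0,
    "creatinine_mg_dl": 0.9,
}
MEAN_AGE = 45.0           # assumed mean age; the model's base value


def linear_contributions(markers):
    """Return each marker's contribution (in years) relative to the population mean."""
    return {
        name: COEFFICIENTS[name] * (value - POPULATION_MEANS[name])
        for name, value in markers.items()
    }


def predicted_age(markers):
    """Base value plus the sum of the per-marker contributions."""
    return MEAN_AGE + sum(linear_contributions(markers).values())


if __name__ == "__main__":
    # Hypothetical individual, for illustration only.
    person = {"albumin_g_dl": 4.0, "glucose_mg_dl": 105.0, "creatinine_mg_dl": 1.0}
    for name, years in linear_contributions(person).items():
        print(f"{name}: {years:+.2f} years")
    print(f"predicted age: {predicted_age(person):.1f}")
```

Because the per-marker terms sum exactly to the difference between the prediction and the base value, such a model gives a direct reference against which the interpretability of a tree-based model could be compared.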

Let me list a number of minor points and recommendations for the discussion (not necessarily calculations!):

1. It would be reasonable to discuss the hyperparameters involved in tuning the XGBoost model. How were those parameters selected?
2. There is a log-linear proportional hazards model predicting mortality in NHANES (Levine 2018¹). Is there a way to see if the XGBoost model is better? Is it possible to produce a prophetic statement? Could the authors speculate on whether their model is more or less statistically powerful than PhenoAge?
3. In the authors' opinion, what are the advantages or disadvantages of XGBoost over deep learning models, such as Zhavoronkov's?
4. Is there a way to improve the biological age assessment by using XGBoost in combination with proportional hazards models?

Is the work clearly and accurately presented and does it cite the current literature?
Partly

Is the study design appropriate and is the work technically sound?
Partly

Are sufficient details of methods and analysis provided to allow replication by others?
Yes

If applicable, is the statistical analysis and its interpretation appropriate?
Yes

Are all the source data underlying the results available to ensure full reproducibility?
Yes

Are the conclusions drawn adequately supported by the results?
Yes

References

1. Levine M, Lu A, Quach A, Chen B, et al.: An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany NY). 2018; 10 (4): 573-591.
2. Pyrkov TV, Slipensky K, Barg M, Kondrashin A, et al.: Extracting biological age from biomedical data via deep learning: too much of a good thing? Sci Rep. 2018; 8 (1): 5210.
3. Pyrkov TV, Getmantsev E, Zhurov B, Avchaciov K, et al.: Quantitative characterization of biological age and frailty based on locomotor activity records. Aging (Albany NY). 2018; 10 (10): 2973-2990.
4. Mamoshina P, Kochetov K, Cortese F, Kovalchuk A, et al.: Blood Biochemistry Analysis to Detect Smoking Status and Quantify Accelerated Aging in Smokers. Sci Rep. 2019; 9 (1): 142.
5. Putin E, Mamoshina P, Aliper A, Korzinkin M, et al.: Deep biomarkers of human aging: Application of deep neural networks to biomarker development. Aging (Albany NY). 2016; 8 (5): 1021-33.

Competing Interests: PF is a founder and an employee of Gero LLC, a company involved in the development and commercialization of biomarkers of aging.

Reviewer Expertise: aging research, biomarkers of aging, theory of aging, aging therapeutics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 15 Jan 2019

Alex Zhavoronkov, Insilico Medicine, Inc., Baltimore, MD, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.19198.r42590

While the study is not novel and the technical sophistication is considerably low, the study addresses one of the most important challenges in biomedicine, and I recommend accepting it if the authors agree to make substantial improvements to the manuscript, try out other machine learning methods, research the prior art, and expand the methodology.

Firstly, the study does not provide an overview of the other interpretable biomarkers of aging developed using multiple data types. Some of the prior clocks are described here: Zhavoronkov et al.¹. The study is very similar to one published in 2016 (Putin et al.²), which not only introduced the concept but also provided a comparison with other machine learning methods, including GBM, RF, DT, LR, kNN, ElasticNet, SVM and DNNs, and an online testing platform for the hematological aging clocks.

All of these machine learning methods allow for various feature selection and feature importance techniques that provide very different results and pick the most important features differently. This paper explains the differences in how the different machine learning techniques prioritize different genes using the transcriptomic age predictor (Mamoshina et al.³). This is not a recommendation for citing these papers but an example of the work that needs to be done.

As it stands, the study looks like a student machine learning data processing exercise and an out-of-the-box application of a Python library to the NHANES dataset rather than a complete research paper. The conclusion that the SHAP library is a good tool for interpreting the results from a machine learning model is not surprising at all. The paper can hardly be called a methodological paper because it lacks both novelty in its methods of age prediction and a comparison with classical methods of age prediction using a common blood test.

There are a number of issues I noticed that need to be addressed:

1. The paper lacks information on how the training and test sets were selected, along with the age-by-sex distribution. Was the training and optimization of models performed without cross-validation? At the same time, NHANES data also contains people with various conditions, including diabetes and kidney disease. Were those individuals excluded from the training process? These important questions are not answered in the paper and need to be clarified.
2. Related to comment #1: how does the model perform on individuals with chronic diseases?
3. It is not clear why the predicted age is referred to as ‘biological age’. Biological age should be predictive of mortality. The observed difference between predicted and actual age should be associated with outcome in terms of morbidity or mortality. This should be explored in detail with respect to the interpretation of the age predictor results. NHANES data has information about mortality that can be used for this type of analysis. At this point, the analysis suggests that the selected blood parameters are associated with age and so predictive of chronological age. This type of analysis was performed in one of the referenced papers utilizing the NHANES dataset but not in this paper. It needs to be performed in order for the paper to be published.
4. The baseline is lacking. What would the performance be if you predicted all samples as the median age for the population? Would it be higher or would it be the same as the test set error?
5. In line with the above comments, because the performance evaluation is not rigorous and no hyperparameter selection was performed, it is not clear why this age prediction method was selected. One of the commonly used and extensively validated models is that of Klemera and Doubal (Klemera P, Doubal S. A new approach to the concept and computation of biological age⁴). I would suggest exploring the KD age prediction model in terms of interoperability of the blood test markers. Would the machine learning model be better? If so, why?
6. As mentioned above, there is no baseline model, comparison of different models, or hyperparameter tuning. Without an interpretation of the difference between the predicted and actual chronological age (association with mortality or diseases, for example), this difference is just an error of the model. How would this model error affect the results? Would the results change if the model were trained on samples that were initially predicted accurately? What about the samples predicted with a greater error? This needs to be explored.
7. Related to this point, age distribution plots of those randomly selected samples are needed. How would different age groups contribute to the results?
8. Instead of using k-fold cross-validation, the authors used just a random 80/20 train/test split, so the results presented in Figure 2 (SHAP summary plots) cannot be interpreted as stable. E.g. for men, the first 5 biomarkers are very similar in terms of importance for age prediction, so the order of these five biomarkers will probably change with a different random data split.
9. Preprocessing is rather scarce. E.g. outlier analysis was not provided. Were outliers excluded from the analysis? If not, why, and how would they contribute to the SHAP summary plots?
10. I would like to see a comparison of the estimated reference ranges with commonly accepted reference ranges.
11. The linear fit line in Figure 1 is barely visible because the dots and the line are plotted using the same color.
12. It is always good practice to provide figures optimized for color-blind readers. The colors in Figure 2 are hardly distinguishable.
13. Figure 3 is lacking the actual chronological age of the individual analyzed.
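
The median-age baseline (point 4) and k-fold cross-validation (point 8) can be combined in a short sketch: for each fold, the median age of the training portion is used as the prediction for every held-out sample, and the Mean Absolute Error (MAE) is recorded per fold. This is a minimal illustration using only the Python standard library. The function names and the example ages are hypothetical, and nothing here reproduces the authors' pipeline or the NHANES data.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def median_baseline_mae(ages, k=5, seed=0):
    """Per-fold MAE of predicting the training-set median age for every test sample."""
    fold_maes = []
    for test_idx in k_fold_indices(len(ages), k, seed):
        held_out = set(test_idx)
        train_ages = [age for i, age in enumerate(ages) if i not in held_out]
        median_age = statistics.median(train_ages)
        errors = [abs(ages[i] - median_age) for i in test_idx]
        fold_maes.append(statistics.fmean(errors))
    return fold_maes


if __name__ == "__main__":
    # Hypothetical chronological ages, for illustration only.
    example_ages = [float(age) for age in range(20, 80)]
    maes = median_baseline_mae(example_ages, k=5)
    print("per-fold MAE:", [round(m, 2) for m in maes])
    print("mean MAE:", round(statistics.fmean(maes), 2))
```

Any age predictor evaluated on the same folds would need a clearly lower MAE than this constant baseline to demonstrate that the blood markers add information, and the spread of the per-fold values gives a first indication of how stable the result is across data splits.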

My recommendation is to address these points and explore the prior art. Biological age prediction using machine learning is a very interesting and important field and the studies need to be consistent and comparable.

Is the work clearly and accurately presented and does it cite the current literature?
No

Is the study design appropriate and is the work technically sound?
No

Are sufficient details of methods and analysis provided to allow replication by others?
Yes

If applicable, is the statistical analysis and its interpretation appropriate?
Partly

Are all the source data underlying the results available to ensure full reproducibility?
Yes

Are the conclusions drawn adequately supported by the results?
No

References

1. Zhavoronkov A, Mamoshina P, Vanhaelen Q, Scheibye-Knudsen M, et al.: Artificial intelligence for aging and longevity research: Recent advances and perspectives. Ageing Res Rev. 2019; 49: 49-66.
2. Putin E, Mamoshina P, Aliper A, Korzinkin M, et al.: Deep biomarkers of human aging: Application of deep neural networks to biomarker development. Aging (Albany NY). 2016; 8 (5): 1021-33.
3. Mamoshina P, Volosnikova M, Ozerov IV, Putin E, et al.: Machine Learning on Human Muscle Transcriptomic Data for Biomarker Discovery and Tissue-Specific Drug Target Identification. Front Genet. 2018; 9: 242.
4. Klemera P, Doubal S: A new approach to the concept and computation of biological age. Mech Ageing Dev. 2006; 127 (3): 240-8.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: aging research, machine learning

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Comments on this article (1)

Reader Comment 15 Jan 2019

Evgeny Izumchenko, Johns Hopkins University, USA

Similar work was performed and published in 2016:
https://www.ncbi.nlm.nih.gov/pubmed/27191382
So here it is the same study performed on a different dataset with just one method. And in that study, a comparison with multiple other machine learning methods was performed. Multiple reviews on this type of aging biomarker have been published since then; not sure why the authors chose to ignore them. Many other "aging clocks" have been published since 2013, and there are common metrics for these clocks, for example Mean Absolute Error (MAE). Since the work is not novel, it should at least provide a few case studies, for example in cancer, BMT, etc.
Competing Interests: None

Back to all reports

Reviewer Report

56 Views

15 Jan 2019 | for Version 1

Alex Zhavoronkov, Insilico Medicine, Inc.,, Baltimore, MD, USA

56 Views Cite this report Responses(0)

Approved With Reservations

While the study is not novel and the technical sophistication is considerably low, the study addresses one of the most important challenges in biomedicine and I recommend accepting it if the authors agree to make substantial improvements to the manuscript, try out the other machine learning methods, research the prior art, and expand the methodology.

Firstly, the study does not provide an overview of the other interpretable biomarkers of aging developed using the multiple data types. Some of the prior clocks are described here: Zhavoronkov et al.¹. It is very similar to the study published in 2016 (Putin et al.²) which not only introduced the concept but also provided a comparison with the other machine learning methods including GBM, RF, DT, LR, kNN, ElasticNet, SVM and DNNs and an online testing platform for the hematological aging clocks.

All of these machine learning methods allow for the various feature selection and feature importance techniques that provide very different results and pick the most important features differently. This paper explains the differences in how the different machine learning techniques prioritize different genes using the transcriptomic age predictor (Mamoshina et al.³). This is not a recommendation for citing these papers but an example of the work that needs to be done.
As it stands, the study looks like a student machine learning data processing exercise and application of the out-of-the-box of python library on the NHANES dataset rather than a complete research paper. The conclusion that SHAP library is a good tool for interpreting the results from a machine learning model is not surprising at all. The paper can be hardly called a methodological paper because it lacks novelty of both methods of age prediction and comparison with classical methods of age prediction using a common blood test.

There is a number of issues I noticed that need to be addressed:

The paper is lacking the information on how the train and test set were selected along with the age by sex distribution. Was the training and optimization of models performed without cross-validation? At the same time, NHANES data also contains people with various conditions including diabetes and kidney disease. Were those individuals excluded from the training process? These important questions are not clear from the paper and need to be clarified.
Related to comment #1: how does the model perform on individuals with chronic diseases?
It is not clear why the predicted age is referred to as ‘biological age’. Biological age should be predictive of mortality. The observed difference between predicted and actual age should be associated with outcome in terms of morbidity or mortality. This should be explored in details with respect to the interpretation of the age predictor results. NHANES data has information about mortality that can be used for this type of analysis. At this point, the analysis suggests that selected blood parameters are associated with age and so predictive of chronological age. This type of analysis was performed in one of the referenced papers utilizing the NHANES dataset but not in this paper. It needs to be performed in order for the paper to be published.
The baseline is lacking. What would the performance be if you predict all samples as a median age for the population? Would it be higher or would it be the same as the test set error?
In line with the above comments, because the performance evaluation is not rigorous and no hyperparameter selection was performed, it is not clear why this age prediction method was selected. One of the commonly used and extensively validated models is Klemera and Doubal. (Klemera P, Doubal S. A new approach to the concept and computation of biological age ⁴). I would suggest exploring KD age prediction model in terms of interoperability of the blood test markers. Would be the machine learning model better? If so, why?
As mentioned above, there is no baseline model, comparison of different models or hyperparameters tuning. Without the interpretation of the difference between the predicted and actual chronological age (association with mortality or diseases for example), this difference is just an error of the model. How this error of the model would affect the results? Would the results change if the model is trained on samples that were initially predicted accurately? What about the samples predicted with a greater error? This need to be explored.
Related to the point, age distribution plots of those randomly selected samples are needed. How would different age groups contribute to the results?
Instead of using k-fold cross-validation authors used just random 80/20 train/test split, so results presented at figure 2 (SHAP summary plots) cannot be interpreted as stable. E.g. for men the first 5 biomarkers are very similar in terms of importance for age prediction, so the order of these five biomarkers probably will be changed using different random data split.
Preprocessing is rather scarce. E.g. outlier analysis was not provided. Were they excluded from the analysis? If not, why and how they would contribute the SHAP summary plots?
I would like to see the comparison of the estimated reference ranges with commonly accepted reference ranges.
A linear fit line on figure 1 is barely visible because dots and line are plotted using the same color
It is always a good practice to provide figures optimized color blind readers. Figure 2 colors are hardly distinguishable.
Figure 3 is lacking the actual chronological age of the individual analyzed.

My recommendation is to address these points and explore the prior art. Biological age prediction using machine learning is a very interesting and important field and the studies need to be consistent and comparable.

Is the work clearly and accurately presented and does it cite the current literature?

No

Is the study design appropriate and is the work technically sound?

No

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Partly

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Are the conclusions drawn adequately supported by the results?

No

References

1. Zhavoronkov A, Mamoshina P, Vanhaelen Q, Scheibye-Knudsen M, et al.: Artificial intelligence for aging and longevity research: Recent advances and perspectives. Ageing Res Rev. 2019; 49: 49-66.
2. Putin E, Mamoshina P, Aliper A, Korzinkin M, et al.: Deep biomarkers of human aging: Application of deep neural networks to biomarker development. Aging (Albany NY). 8 (5): 1021-33.
3. Mamoshina P, Volosnikova M, Ozerov IV, Putin E, et al.: Machine Learning on Human Muscle Transcriptomic Data for Biomarker Discovery and Tissue-Specific Drug Target Identification. Front Genet. 2018; 9: 242.
4. Klemera P, Doubal S: A new approach to the concept and computation of biological age. Mech Ageing Dev. 2006; 127 (3): 240-8.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

aging research, machine learning

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


