Methylation risk score in peripheral blood predictive of conversion from mild cognitive impairment to Alzheimer's Disease

Jarrett D. Morrow

doi:10.12688/f1000research.140403.1

Brief Report

[version 1; peer review: 1 approved with reservations, 1 not approved]

PUBLISHED 01 Sep 2023

Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA

Jarrett D. Morrow
Roles: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Validation, Writing – Original Draft Preparation, Writing – Review & Editing

This article is included in the Bioinformatics gateway.

This article is included in the Genomics and Genetics gateway.

Abstract

Background: Alzheimer’s disease (AD) is a neurodegenerative and heterogeneous disorder with complex etiology. Mild cognitive impairment (MCI) may represent an intermediate stage of AD, and the ability to identify MCI patients at greater risk of conversion to AD could guide personalized treatments. This study sought to develop a methylation risk score predictive of conversion from MCI to AD using publicly available blood DNA methylation (DNAm) data.
Methods: Using blood DNA methylation data from an epigenome-wide association study of AD that included 111 subjects with MCI, a methylation risk score of MCI conversion was created using an elastic-net framework. The elastic-net model was trained with a high-variance subset of the DNAm data, age and sex as predictors.
Results: The final model included four CpG sites: PNCK (cg01231576), SLC6A3 (cg09892121), and TRIM62 (cg25342005), with a fourth (cg17292662) near the genes ATP6V1H and RGS20. A significant difference (p < 0.0001, t-test) was observed in the scores for MCI stable subjects compared with MCI converters. No statistically significant difference was observed between AD subjects and controls, suggesting specificity of the risk score for susceptibility to conversion.
Conclusions: The ability to identify MCI patients at greater risk of progression could inform early interventions and is a critical component in mitigation strategies for AD. This study provides insight into a potential role for epigenetics in the development of a multi-omic risk score of conversion.

Keywords

Alzheimer’s disease, cognitive decline, risk score, DNA methylation, epigenetics

Corresponding author: Jarrett D. Morrow

Competing interests: No competing interests were disclosed.

Grant information: This work was supported by the National Heart, Lung, and Blood Institute (K25 HL136846).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2023 Morrow JD. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Morrow JD. Methylation risk score in peripheral blood predictive of conversion from mild cognitive impairment to Alzheimer's Disease [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2023, 12:1087 (https://doi.org/10.12688/f1000research.140403.1) First published: 01 Sep 2023, 12:1087 (https://doi.org/10.12688/f1000research.140403.1) Latest published: 15 Mar 2024, 12:1087 (https://doi.org/10.12688/f1000research.140403.2)

Introduction

Alzheimer’s disease (AD) is a neurodegenerative and heterogeneous disorder with complex etiology and devastating impact on individuals and families. Although genome-wide association studies continue to provide insight into the genetic susceptibility to AD,¹,² epigenome-wide association studies (EWAS) and, in particular, studies of DNA methylation (DNAm) have the potential to capture signals related to environmental contributions.³ Mild cognitive impairment (MCI) may represent an intermediate stage of AD,⁴ and the ability to identify MCI patients at greater risk of conversion to AD could guide personalized strategies for mitigation of decline, as the pathology of AD may be present years before onset of symptoms.⁵ Risk scores that provide predictive measures of AD susceptibility have been created using genetic⁶,⁷ and blood transcriptomic data.⁸ Epigenome-wide association studies in peripheral blood have identified differentially methylated CpG sites associated with AD⁹–¹¹ and AD progression.¹²–¹⁶ Associations between peripheral blood epigenetic age acceleration and cognitive function have also been examined.¹⁷,¹⁸

This study focused on development of a methylation risk score (MRS) predictive of conversion from MCI to AD, using publicly available DNA methylation data⁹ and machine learning methods. The score was evaluated in cross-sectional data from AD subjects and controls in both the primary and secondary datasets to help understand the relationship of the conversion risk score to overall disease severity. This study provides insight into the value epigenetics could provide in a multi-omic risk score of conversion.

Methods

Primary data

To create the risk score, blood DNA methylation (DNAm) data from an EWAS of AD in 300 subjects⁹ were obtained from the Gene Expression Omnibus (GEO: GSE144858). The cross-European AddNeuroMed study dataset includes 93 subjects with Alzheimer’s, 111 with MCI and 96 control subjects. Of the 111 MCI subjects, 68 were stable after one year, 39 converted to AD within one year and four converted at an unknown time. Roubroeks and colleagues⁹ extracted DNA from the blood samples and assayed DNA methylation levels using the Illumina Infinium Human Methylation 450K BeadChip array. After quality control analyses, they quantile-normalized the data using the dasen method from the R package wateRmelon to create a matrix of beta values (CpG sites in rows and subjects in columns). The data from the four subjects with unknown conversion date were excluded. Two MCI subjects less than 65 years of age, excluded from the study by Roubroeks et al.,⁹ were included in this study. Prior to the analyses in this study, any CpG site with a detection p-value >0 for any subject was excluded. Using the annotation from Zhou et al.,¹⁹ CpG sites with probe mapping issues or having a SNP with minor allele frequencies >1% within five bases were also removed. To reduce the influence of genetics on the prediction models, CpG sites with significant genetic associations (methylation quantitative trait loci: mQTL) in peripheral blood²⁰ were also removed. The beta values for the remaining CG-annotated sites were retained for analysis. To identify possible sex mismatches, multidimensional scaling (MDS) plots were created using the cmdscale function in R and the X and Y chromosome data. MDS plots were also created using high variance DNAm data to observe batch effects. No sex mismatches or batch effects were observed.

Secondary data

An independent set of blood DNA methylation data from an epigenome-wide meta-analysis of neurodegenerative disorders²¹ was obtained from the Gene Expression Omnibus. The Australian Imaging, Biomarker & Lifestyle Flagship Study of Aging (AIBL) dataset of 726 subjects included 161 subjects with Alzheimer’s, 94 with MCI and 471 control subjects. Longitudinal information regarding conversion to AD is not available in this study. Nabais and colleagues²¹ assayed DNAm using the Illumina HumanMethylationEPIC BeadChip Array. Although data processed using functional normalization were publicly available, the methylated and unmethylated intensities were used to create a dataset normalized using dasen in the R package wateRmelon²² to create a matrix of beta values. Prior to normalization, CpG sites with a detection p-value >0 for any subject were excluded. CpG sites with probe mapping issues, nearby SNPs with minor allele frequencies >1%¹⁹ or a significant mQTL in peripheral blood²⁰ were also removed. Following dasen normalization, no sex mismatches were observed in the MDS plot created using the X and Y chromosome data. The beta values for the remaining CG-annotated sites were retained for analysis.

Analysis

A methylation risk score of MCI conversion was created using an elastic-net binomial (classification) model via the R package glmnet.²³ This regularized regression method combines the L₁ and L₂ penalties of the lasso and ridge methods and provides the ability to retain correlated features. To adjust model and feature selection performance, the contribution of each penalty is selected using the hyperparameter alpha. Using DNAm beta values, age and sex as predictors and the MCI outcome (stable or conversion to AD) as the response, a model was trained using a 10-fold cross-validation approach. Given the limited number of MCI subjects, all data were used in the training process. The alpha hyperparameter was chosen to minimize the cross-validation misclassification error. To compare the performance with a score based on demographics only, a binomial risk score model was created using the glm function in R with age and sex as predictors.
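The objective described above can be sketched in plain Python. This is an illustration only, not the author's code (the study used the R package glmnet); the function names and toy values are assumptions. The penalty follows the glmnet parameterization, in which alpha = 1 gives the lasso and alpha = 0 gives ridge.

```python
import math

def elastic_net_penalty(coefs, lam, alpha):
    """lam * (alpha * L1 + (1 - alpha) / 2 * squared L2); intercept excluded."""
    l1 = sum(abs(b) for b in coefs)
    l2_sq = sum(b * b for b in coefs)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)

def penalized_binomial_loss(X, y, intercept, coefs, lam, alpha):
    """Mean negative log-likelihood of a logistic model plus the penalty.

    X: list of feature rows; y: list of 0/1 outcomes
    (assumed coding: 1 = converted to AD, 0 = stable).
    """
    nll = 0.0
    for row, label in zip(X, y):
        eta = intercept + sum(b * x for b, x in zip(coefs, row))
        # log(1 + exp(eta)) - y * eta is the per-subject binomial loss
        nll += math.log1p(math.exp(eta)) - label * eta
    return nll / len(y) + elastic_net_penalty(coefs, lam, alpha)
```

In this sketch, lam plays the role of the overall regularization strength and alpha the mixing hyperparameter that the study tuned by cross-validation.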

Receiver operating characteristic (ROC) curves were created for each fold using the function roc.glmnet and for the final model with all training data using the function roc in the R package pROC.²⁴ Risk scores were calculated using the final conversion risk score model via the predict function in the glmnet package.
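For readers unfamiliar with the AUC, the following minimal sketch computes it from risk scores and binary outcomes using the rank-based (Mann-Whitney) formulation. It is a stand-in for illustration, not the pROC implementation used in the study, and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels: 1 = converter, 0 = stable (assumed coding). A tie between a
    positive and a negative score counts as half a concordant pair.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one subject in each class")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

# Hypothetical risk scores, not values from the study.
toy_scores = [0.2, 0.4, 0.35, 0.8, 0.7, 0.5]
toy_labels = [0, 0, 1, 1, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to chance-level ranking of converters against stable subjects, and 1.0 to perfect separation.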

Results

After quality control procedures, peripheral blood DNAm data were available for 251,491 CpG sites and 296 samples, including 93 AD subjects, 107 MCI subjects (68 stable, 39 converters) and 96 controls in the Roubroeks et al. (GSE144858) dataset (Table 1). Approximately 60% of the European subjects were female (52% female among MCI subjects).

Table 1. Demographics of European study subjects in primary and secondary datasets.

	GSE144858⁹	GSE153712²¹
Age in years (mean ± sd)	75 ± 6.5	N/A
Sex (Female/Male)	177/119	400/326
Disease status
Control	96	471
Mild cognitive impairment	107 (68 stable, 39 converted*)	94
Alzheimer’s disease	93	161

* Within one year.

An elastic net model was trained using the DNAm beta values (range: 0 to 1) for the MCI subjects. The beta values, age, and sex were included as possible predictors and the MCI outcome (stable or conversion to AD) was the response. Blood cell distribution values were not included in the model, seeking an epigenetic prediction model that may capture biology including shifts in cell abundance with conversion to AD. With a focus on more robust signatures, only CpG sites with variance in the top quartile (62,873 CpG sites) were included in the training set. After executing cross-validation for values of alpha from 0 to 1, an alpha value of 0.8 was observed to minimize cross-validation classification error (error = 0.34).
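The variance filter described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data layout (a dict of CpG ID to beta values), the function name, and the rounding and tie-handling rules are all hypothetical, since the study does not describe them. With 251,491 sites, one quarter rounds to 62,873, which matches the count reported above.

```python
import statistics

def top_quartile_by_variance(beta, fraction=0.25):
    """Return the IDs of the CpG sites with the highest variance.

    beta: dict mapping CpG ID -> list of beta values across subjects.
    """
    variances = {cpg: statistics.variance(vals) for cpg, vals in beta.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]

# Hypothetical sites and beta values, for illustration only.
toy_beta = {
    "cgA": [0.1, 0.9, 0.5, 0.5],
    "cgB": [0.5, 0.5, 0.5, 0.5],
    "cgC": [0.4, 0.6, 0.5, 0.5],
    "cgD": [0.45, 0.55, 0.5, 0.5],
}
kept = top_quartile_by_variance(toy_beta)
```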

The final model from this process included four features, all CpG sites (Table 2). Three of the sites are annotated to the genes PNCK (cg01231576), SLC6A3 (cg09892121), and TRIM62 (cg25342005), with a fourth (cg17292662) near the genes ATP6V1H and RGS20. The distributions of the beta values for these sites are centered between 0.6 and 0.8 (Figures S1 to S4, Extended data). ROC curves were produced for each of the 10 folds (Figure S5, Extended data) with alpha = 0.8, and the highest observed area under the ROC curve (AUC) was 0.635. A ROC curve was also created using the training data with the final model (Figure S6, Extended data), and the AUC was 0.877.

Table 2. CpG sites included in the conversion risk score model.

CpG site	Chromosome	Gene annotation*	Relationship to gene	CpG island location
cg01231576	Chr X	PNCK	TSS1500	S_Shore
cg09892121	Chr 5	SLC6A3	Body	OpenSea
cg17292662	Chr 8	ATP6V1H#, RGS20#	N/A	S_Shelf
cg25342005	Chr 1	TRIM62	TSS1500	S_Shore

# Genes with nearby TSS.

* Illumina Infinium 450K BeadChip annotation from Zhou et al.¹⁹

The methylation risk score (predicted probability of conversion) was calculated using the final model and the DNAm beta values. A significant difference (p < 0.0001, t-test) in the score for MCI stable subjects compared with MCI converters may be observed in the box plots of the risk score (Figure 1). For MCI subjects, being above the mean MRS compared with below the mean equates to an odds ratio of 9.8 for conversion. Also included in Figure 1 are the scores for the AD subjects and controls. The seven controls and seven AD cases less than 65 years of age, excluded in the analyses by Roubroeks et al., were included in the MRS calculations in Figure 1. A nominal increase in MRS values may be observed with increased severity at baseline (Figure S7, Extended data), with no statistically significant difference between AD and controls (p = 0.83, t-test), perhaps suggesting specificity of the risk score for susceptibility to conversion. Box plots were created using the beta values for each of the four predictive sites across disease severity, stratified by sex (Figures S8 to S11, Extended data). The direction of effect for DNAm with respect to conversion in the MCI subjects is consistent across males and females.
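The odds ratio for being above versus below the mean MRS comes from a 2x2 table of score group by outcome. The sketch below shows the arithmetic only; the underlying counts are not reported in the article, so the scores and outcomes here are hypothetical and the function name is an assumption.

```python
def odds_ratio_above_mean(scores, converted):
    """Odds ratio of conversion for scores above vs. at or below the mean.

    scores: risk scores; converted: 1 = converter, 0 = stable (assumed coding).
    Returns (a * d) / (b * c) for the 2x2 table
        a = above mean, converted      b = above mean, stable
        c = not above mean, converted  d = not above mean, stable
    """
    mean = sum(scores) / len(scores)
    a = sum(1 for s, y in zip(scores, converted) if s > mean and y == 1)
    b = sum(1 for s, y in zip(scores, converted) if s > mean and y == 0)
    c = sum(1 for s, y in zip(scores, converted) if s <= mean and y == 1)
    d = sum(1 for s, y in zip(scores, converted) if s <= mean and y == 0)
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio undefined with an empty off-diagonal cell")
    return (a * d) / (b * c)

# Hypothetical data, not values from the study.
toy_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.25, 0.65]
toy_converted = [0, 0, 1, 1, 1, 0, 0, 1]
toy_or = odds_ratio_above_mean(toy_scores, toy_converted)
```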

Figure 1. Box plots of the MRS for all subjects across disease states including MCI outcomes.

After processing, peripheral blood DNAm data were available for 601,732 CpG sites and 726 samples, including 161 AD subjects, 94 MCI subjects and 471 controls in the Nabais et al. (GSE153712) dataset (Table 1). Approximately 55% of the subjects were female. The MRS was calculated using the final conversion risk score model and the DNAm beta values in this secondary dataset. In the box plots across disease severity (Figure S12, Extended data), the MRS values increase with severity, consistent with the discovery dataset. The MRS values are higher overall for the Nabais et al. data compared with Roubroeks et al. The secondary DNAm data were created using the Illumina HumanMethylationEPIC platform, in contrast to the use of the Illumina 450K platform by Roubroeks et al.⁹

For the risk score created using only age and sex as predictors in the Roubroeks et al. dataset (Figure S13, Extended data), a significant difference was not observed between MCI stable and MCI converter (p = 0.2, t-test) and the AUC was 0.579 with the training data (Figure S14, Extended data).

Discussion

In this study, a methylation risk score was created to quantify susceptibility to conversion from MCI to AD. Highly variable CpG sites were selected for development of the score, seeking information to inform identification of robust biomarkers. In addition, the effects of genetics were suppressed by excluding sites previously associated with an mQTL or located near a common SNP. Using a modest set of 107 subjects with MCI, the model demonstrated predictive capabilities. The limited set of selected features, an AUC in the training data of 0.877, and a maximum AUC during cross-validation of 0.635 suggest both higher variance and bias. An MRS may find better utility as a component in a comprehensive conversion susceptibility prediction strategy.

Two of the four predictive CpG sites (cg17292662, cg25342005) were identified in the previous study by Roubroeks and colleagues.⁹ These two sites were their top two findings in the EWAS of MCI conversion, though neither site was statistically significant. The CpG site cg17292662 is within 6,000 bases of the transcription start sites (TSS) for two genes: ATP6V1H and RGS20. A prior GWAS has identified a variant in ATP6V1H (ATPase H+ transporting V1 subunit H) influencing human cerebrospinal fluid (CSF) β-site APP cleaving enzyme (BACE) activity.²⁵ Previous studies have suggested that elevated beta-site amyloid precursor protein-cleaving enzyme 1 (BACE1) levels in CSF may be an indicator of MCI and early-stage AD (Zong Arch Gen Psychiatry), with BACE1 a possible therapeutic target.²⁶ The gene RGS20 (regulator of G protein signaling 20) has biased expression in the brain and has been found to be downregulated in AD astrocytes.²⁷ The CpG site cg25342005 is within 1500 bases of the TSS of the gene TRIM62 (tripartite motif containing 62), a gene expressed in the brain. The CpG site cg09892121 is located within the gene SLC6A3 (solute carrier family 6 member 3) and was among the top 500 findings in the study by Roubroeks et al.⁹ The gene SLC6A3 encodes a dopamine transporter, and a genetic variant in the gene was previously identified that may confer greater risk of dementia and cognitive decline.²⁸ The fourth predictive site (cg01231576) is within 1500 bases of the PNCK (pregnancy up-regulated nonubiquitous CaM kinase) transcription start site. This site was not among the findings of Roubroeks et al., as the authors of that study did not include the sex chromosome DNAm data. The gene PNCK has biased expression in the brain and, in a recent brain RNA-sequencing study, its expression was associated with cognitive trajectories in a sex-specific manner.²⁹ The MRS model leverages DNAm differences in the TSS of PNCK that are concordant across females and males.

Limitations of this study include the small population of MCI subjects with longitudinal outcomes for development of an MRS. Identifying a signature predictive of cognitive decline in peripheral blood presents many challenges with respect to signal and noise and a larger study population with longitudinal data would enhance future MRS creation efforts. The ability to effectively train an MRS model with 80% of the data, while holding out 20% of the data for validation, would provide an effective internal validation. Future efforts may also explore other machine learning frameworks, including deep learning, random forests and support vector machines. The secondary population for MRS evaluation lacked longitudinal outcome data. The disparity between MRS values for the training and secondary datasets may be due to the difference in assay platforms, as low correlations have been previously observed between Illumina 450K and EPIC DNA methylation data in blood for many CpG sites.³⁰
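The 80%/20% hold-out design mentioned above could be set up as a stratified split, so that stable subjects and converters appear in similar proportions in both partitions. The sketch below is one possible way to do this, not a procedure from the study; the function name, seed, and rounding rule are assumptions, while the class sizes (68 stable, 39 converters) come from the primary dataset.

```python
import random

def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split subject indices into train and test sets within each class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

# Class sizes from the primary dataset; the subject ordering is arbitrary.
labels = ["stable"] * 68 + ["converter"] * 39
train_idx, test_idx = stratified_holdout(labels)
```

With these class sizes the held-out set would contain only 22 subjects (14 stable, 8 converters), which illustrates why a larger cohort is needed for this design to give a stable estimate of performance.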

The ability to identify MCI patients at greater risk of progression could inform early interventions and is a critical component in mitigation strategies for AD. This study is the first to develop a blood-based methylation risk score of conversion from mild cognitive impairment to Alzheimer’s disease. Although the predictive ability of the score is limited, this study demonstrates the potential value epigenetics would add to a risk score based on multi-omic and phenotypic data collected from the same patients.

Author’s contributions

JDM: conceptualization, methodology, formal analysis, interpretation of data, manuscript preparation and approval of the final version

Data availability

Underlying data

Gene Expression Omnibus: An epigenome-wide association study of Alzheimer’s disease blood highlights robust DNA hypermethylation in the HOXB6 gene, https://identifiers.org/geo:GSE144858.⁹

Gene Expression Omnibus: Meta-analysis of genome-wide DNA methylation identifies shared associations across neurodegenerative disorders, https://identifiers.org/geo:GSE153712.²¹

Extended data

Zenodo: Methylation risk score in peripheral blood predictive of conversion from mild cognitive impairment to Alzheimer’s Disease, https://doi.org/10.5281/zenodo.8189746.

This project contains the following extended data:

- Methylation_Risk_Score_Supplemental.pdf

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

References

1. Bellenguez C, Küçükali F, Jansen IE, et al.: New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat. Genet. 2022 Apr; 54(4): 412–436.
2. Wightman DP, Jansen IE, Savage JE, et al.: A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer’s disease. Nat. Genet. 2021 Sep; 53(9): 1276–1282.
3. Hannon E, Lunnon K, Schalkwyk L, et al.: Interindividual methylomic variation across blood, cortex, and cerebellum: implications for epigenetic studies of neurological and neuropsychiatric phenotypes. Epigenetics. 2015 Oct 12; 10(11): 1024–1032.
4. Morris JC, Storandt M, Miller JP, et al.: Mild Cognitive Impairment Represents Early-Stage Alzheimer Disease. Arch. Neurol. 2001 Mar 1; 58(3): 397–405.
5. Markesbery WR: Neuropathologic Alterations in Mild Cognitive Impairment: A Review. J. Alzheimers Dis. JAD. 2010 Jan; 19(1): 221–228.
6. Leonenko G, Baker E, Stevenson-Hoare J, et al.: Identifying individuals with high risk of Alzheimer’s disease using polygenic risk scores. Nat. Commun. 2021 Jul 23; 12(1): 4506.
7. Manzali SB, Yu E, Ravona-Springer R, et al.: Alzheimer’s Disease Polygenic Risk Score Is Not Associated With Cognitive Decline Among Older Adults With Type 2 Diabetes. Front. Aging Neurosci. 2022 [cited 2023 Jul 26]; 14.
8. Park YH, Hodges A, Simmons A, et al.: Association of blood-based transcriptional risk scores with biomarkers for Alzheimer disease. Neurol Genet. 2020 Dec 1; 6(6): e517.
9. Roubroeks JAY, Smith AR, Smith RG, et al.: An epigenome-wide association study of Alzheimer’s disease blood highlights robust DNA hypermethylation in the HOXB6 gene. Neurobiol. Aging. 2020 Nov 1; 95: 26–45.
10. Vasanthakumar A, Davis JW, Idler K, et al.: Harnessing peripheral DNA methylation differences in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) to reveal novel biomarkers of disease. Clin. Epigenetics. 2020 Jun 15; 12(1): 84.
11. Chouliaras L, Pishva E, Haapakoski R, et al.: Peripheral DNA methylation, cognitive decline and brain aging: pilot findings from the Whitehall II imaging study. Epigenomics. 2018 May; 10(5): 585–595.
12. Li QS, Vasanthakumar A, Davis JW, et al.: Association of peripheral blood DNA methylation level with Alzheimer’s disease progression. Clin. Epigenetics. 2021 Oct 15; 13(1): 191.
13. Fransquet PD, Lacaze P, Saffery R, et al.: Blood DNA methylation signatures to detect dementia prior to overt clinical symptoms. Alzheimers Dement. Diagn. Assess. Dis. Monit. 2020; 12(1): e12056.
14. Pérez RF, Alba-Linares JJ, Tejedor JR, et al.: Blood DNA Methylation Patterns in Older Adults With Evolving Dementia. J. Gerontol. Ser. A. 2022 Sep 1; 77(9): 1743–1749.
15. Manzali S, Ravona-Springer R, Jacob-Hirsch J, et al.: Blood DNA methylation biomarkers for cognitive decline in older adults with type 2 diabetes. Alzheimers Dement. 2023; 19(S2): e065120.
16. Lunnon K, Smith RG, Cooper I, et al.: Blood methylomic signatures of presymptomatic dementia in elderly subjects with type 2 diabetes mellitus. Neurobiol. Aging. 2015 Mar 1; 36(3): 1600.e1–1600.e4.
17. Shadyab AH, McEvoy LK, Horvath S, et al.: Association of Epigenetic Age Acceleration With Incident Mild Cognitive Impairment and Dementia Among Older Women. J. Gerontol. Ser. A. 2022 Jun 1; 77(6): 1239–1244.
18. Fransquet PD, Lacaze P, Saffery R, et al.: Accelerated Epigenetic Aging in Peripheral Blood does not Predict Dementia Risk. Curr. Alzheimer Res. 2021; 18(5): 443–451.
19. Zhou W, Laird PW, Shen H: Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 2017 Feb 28; 45(4): e22.
20. McRae AF, Marioni RE, Shah S, et al.: Identification of 55,000 Replicated DNA Methylation QTL. Sci. Rep. 2018 Dec [cited 2019 Mar 25]; 8(1): 17605.
21. Nabais MF, Laws SM, Lin T, et al.: Meta-analysis of genome-wide DNA methylation identifies shared associations across neurodegenerative disorders. Genome Biol. 2021 Mar 26; 22(1): 90.
22. Pidsley R, Wong CCY, Volta M, et al.: A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013; 14(14).
23. Friedman JH, Hastie T, Tibshirani R: Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010; 33(1): 1–22.
24. Robin X, Turck N, Hainard A, et al.: pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011; 12: 77.
25. Hu H, Li H, Li J, et al.: Alzheimer’s Disease Neuroimaging Initiative. Genome-wide association study identified ATP6V1H locus influencing cerebrospinal fluid BACE activity. BMC Med. Genet. 2018 May 11; 19(1): 75.
26. Hampel H, Vassar R, De Strooper B, et al.: The β-Secretase BACE1 in Alzheimer’s Disease. Biol. Psychiatry. 2021 Apr 15; 89(8): 745–756.
27. Preman P, Alfonso-Triguero M, Alberdi E, et al.: Astrocytes in Alzheimer’s Disease: Pathological Significance and Molecular Pathways. Cells. 2021 Mar 4; 10(3): 540.
28. Roussotte FF, Gutman BA, Hibar DP, et al.: Carriers of a common variant in the dopamine transporter gene have greater dementia risk, cognitive decline, and faster ventricular expansion. Alzheimers Dement. J. Alzheimers Assoc. 2015 Oct; 11(10): 1153–1162.
29. Davis EJ, Solsberg CW, White CC, et al.: Sex-Specific Association of the X Chromosome With Cognitive Change and Tau Pathology in Aging and Alzheimer Disease. JAMA Neurol. 2021 Oct 1; 78(10): 1249–1254.
30. Logue MW, Smith AK, Wolf EJ, et al.: The correlation of methylation levels measured using Illumina 450K and EPIC BeadChips in blood samples. Epigenomics. 2017 Nov; 9(11): 1363–1371.

Open Peer Review

Reviewer Report 06 Feb 2024

Rachel Cavill, Department of Advanced Computing Sciences, Maastricht University, Maastricht, Limburg, The Netherlands

Not Approved

https://doi.org/10.5256/f1000research.153748.r235879

This paper presents an interesting analysis of using methylation data to predict cognitive decline in Alzheimer's disease. However, I have serious concerns about the results, in particular with regard to over-fitting of the model.

Potential indications of over-fitting:

* Figure 1 shows the box-plot of the methylation risk scores (MRS) for the different groups of subjects. If this score was a real predictor of cognitive impairment and its development into Alzheimer's disease (AD), the a priori hypothesis would be that the subjects with AD would have more extreme scores than the mild-converters. This is not shown to be the case, with the AD group having intermediate scores between the mild-converters and the controls.

* The MRS score is based off just 4 methylation sites. The initial full dataset contained >250,000 sites and therefore it would be very easy to find a small subset which is predictive by chance.

* The cross-validation AUC is significantly lower (0.635) than the training AUC (0.877), which is a sign of over-fitting the training set. Given that the methods describe optimising the cross-validation misclassification error, it seems likely that the cross-validation AUC is also over-fitted and that an independent test set would display an even lower AUC.

* The discussion that 2/4 CpG sites were previously identified is not evidence backing this up, as this result was obtained in a previous analysis of the same dataset. It would be unusual that two different analyses of the same dataset failed to find similar results (even when the analyses use different methods). A false positive CpG site in one analysis is likely to show up as a false positive in a different analysis on the same dataset.

Individually, each of these indications is not conclusive, but put together they indicate a very strong likelihood of overfitting occurring.

* The secondary dataset is not suitable to be used as an independent validation set, the study design is too different.

Additionally, I raise concerns about the description in the methods that "any CpG site with a detection p-value > 0 for any subject was excluded". This would surely exclude all detected CpGs, as a p-value of 0 for detection surely indicates an undetected CpG site in that sample, and p-values cannot be negative.

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

No
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

No

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Data science applied to biological data, in particular, omics data.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Author Response 04 Apr 2024

Jarrett Morrow, Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, 02115, USA

Reviewer 2:

Updates to the manuscript prompted by the comments and suggestions from Reviewer #2 have greatly helped to improve the manuscript, through both a revised approach and more clarification regarding the limitations of the study. I hope the Reviewer finds the revised manuscript suitable for approval.

C1. Figure 1 shows the box-plot of the methylation risk scores (MRS) for the different groups of subjects. If this score was a real predictor of cognitive impairment and its development into Alzheimer's disease (AD), the a priori hypothesis would be that the subjects with AD would have more extreme scores than the mild-converters. This is not shown to be the case, with the AD group having intermediate scores between the mild-converters and the controls.

R1. These comments by the Reviewer regarding an a priori hypothesis are much appreciated. However, with the elastic-net model trained using longitudinal outcomes of decline instead of cross-sectional disease severity, the specific hypothesis proposed by the reviewer was not supported by the findings of the study.

However, in the fourth paragraph of Results, a nominal increase in MRS values with increased severity at baseline was noted. While the observations suggest specificity of the risk score for susceptibility to conversion, a nominally higher score in AD provides evidence to support the spirit of the reviewer comment regarding AD and MCI scores.

C2. The MRS score is based off just 4 methylation sites. The initial full dataset contained >250,000 sites and therefore it would be very easy to find a small subset which is predictive by chance.

R2. This comment from the Reviewer is appreciated. An elastic-net model was the focus of the study, as this method seeks a model with fewer features to help avoid overfitting. In addition, to emphasize biological significance in the model, the variance of DNA methylation was considered. That is, the full set of methylation sites was filtered to approximately 63,000 sites based on variance, as low-variance sites may be less useful in practical applications.

A predictive model leveraging a small number of features is not typically considered of lesser value. For example, in a recent Nature Communications (2022) paper, van Breugel and colleagues (PMCID: PMC9715628) created a three-CpG-site predictor of allergic disease in a cohort of 348 subjects. This is one example of the predictive models based on a limited set of features that have been developed in various tissues and complex diseases.
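The variance-based filtering described in R2 can be sketched as follows. This is a minimal illustration under assumptions: the exact filtering rule used in the study is not stated here, so the sketch simply keeps the `n_keep` sites with the highest variance, and the site names and beta values are made up.

```python
from statistics import pvariance

# Toy beta values (site -> one value per subject); all names and numbers are hypothetical.
beta = {
    "site_1": [0.10, 0.12, 0.11, 0.10],
    "site_2": [0.20, 0.80, 0.35, 0.65],
    "site_3": [0.50, 0.52, 0.49, 0.51],
    "site_4": [0.05, 0.45, 0.25, 0.60],
}

def top_variance_sites(beta_by_site, n_keep):
    """Return the n_keep site names with the largest variance across subjects."""
    ranked = sorted(
        beta_by_site,
        key=lambda site: pvariance(beta_by_site[site]),
        reverse=True,
    )
    return ranked[:n_keep]

selected = top_variance_sites(beta, n_keep=2)
print(selected)
```

Filtering on variance does not use the outcome labels, so it reduces the candidate pool without itself leaking outcome information into feature selection.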

C3. The cross validation AUC is significantly lower (0.635) than the training AUC (0.877), which is a sign of over-fitting the training set. Given that the methods describe optimising the cross validation mis-classification error, it seems likely that the cross validation AUC is also over-fitted and an independent test set would display an even lower AUC.

R3. I completely agree with this comment by the Reviewer and thank the Reviewer for the insight. The last sentence in the first paragraph of the Discussion states: "The limited set of selected features, an AUC in the training data of 0.877, and a maximum AUC during cross-validation of 0.635 suggests both higher variance and bias. An MRS may find better utility as a component in a comprehensive conversion susceptibility prediction strategy."

However, while revisiting the model development and results, as prompted by the reviewer comments, the approach was modified to use mean squared error, rather than misclassification error, as the cross-validation measure. With this change, an alpha of 1 now produced the lower error. In addition, lambda was chosen to reduce overfitting, leading to a three-site model. As a result, the AUC for the training data was 0.843, the average AUC across the ten folds was 0.752 and the out-of-fold AUC was 0.653. Revisions were made to the manuscript to reflect these methods changes. I found the updated results demonstrating improved performance encouraging and I hope the Reviewer also views these revised findings favorably.
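The gap between the average per-fold AUC and the out-of-fold AUC reported above can arise purely from how the two summaries are computed. The sketch below is a toy illustration, assuming that "out-of-fold AUC" means a single AUC computed after pooling the held-out predictions from all folds; the fold contents and scores are made up and use two folds instead of ten.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out predictions: (labels, scores) for each fold.
# Within each fold the converter scores higher, but the score scale drifts between folds.
folds = [
    ([0, 1], [0.1, 0.2]),
    ([0, 1], [0.3, 0.4]),
]

# Summary 1: compute the AUC inside each fold, then average.
per_fold = [auc(y, s) for y, s in folds]
mean_fold_auc = sum(per_fold) / len(per_fold)

# Summary 2: pool all held-out predictions, then compute one AUC.
pooled_labels = [y for ys, _ in folds for y in ys]
pooled_scores = [s for _, ss in folds for s in ss]
pooled_auc = auc(pooled_labels, pooled_scores)

print(mean_fold_auc, pooled_auc)  # 1.0 0.75
```

The pooled value is lower here because a converter from one fold is compared against a stable subject from another fold whose model produced scores on a shifted scale. Small folds also make each per-fold AUC noisy, so the two summaries need not agree.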

C4. The discussion that 2/4 CpG sites were previously identified is not evidence backing this up, as this result was obtained in a previous analysis of the same dataset. It would be unusual that two different analyses of the same dataset failed to find similar results (even when the analyses use different methods). A false positive CpG site in one analysis is likely to show up as a false positive in a different analysis on the same dataset.

R4. These insightful comments from the Reviewer are appreciated. The discussion regarding the selected sites in the original study was not intended to justify the predictive model. However, the discussion of the two sites from the study by Roubroeks and colleagues is relevant to the current study, as it seemed unethical to omit these details and claim the sites as novel findings, particularly given the current study uses publicly available data linked to the Roubroeks et al. study.

C5. The secondary dataset is not suitable to be used as an independent validation set; the study design is too different.

R5. Although the particular aspects of the secondary data raising a concern were not noted by Reviewer #2, this appears to be similar to the second set of comments from Reviewer #1. Because the secondary dataset was not used to assess replication of the score trends observed in the longitudinal primary data, clarifying text was added in the fifth paragraph of the Results to note concordance of the score at baseline in the primary data with the cross-sectional secondary data.

C6. Additionally, I raise concerns about the description in the methods of "any CpG site with a detection p-value > 0 for any subject was excluded". This would surely exclude all detected CpGs, as a p-value of 0 for detection surely indicates an undetected CpG site in that sample, and p-values cannot go negative.

R6. The reviewer noticing this in the manuscript is appreciated. As outlined in my response to Reviewer 1 (response #1), a significant percentage of the overall p-values in the detection p-value matrix had a value of zero. Therefore, any site having a p-value > 0 for any subject was considered of lower quality. The exclusion of these 82,090 sites is mentioned in the supplemental table (Table S1) added to the revised extended data that outlines the quality control process for the primary data.
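The exclusion rule described in R6 can be written out explicitly. This is a minimal sketch with made-up site names and p-values; the 0.01 cutoff is included only as a commonly used threshold for comparison and is an assumption, not something taken from the manuscript.

```python
# Toy detection p-value matrix (site -> one p-value per subject); all values are hypothetical.
detection_p = {
    "cg_site_a": [0.0, 0.0, 0.0],
    "cg_site_b": [0.0, 1e-12, 0.0],
    "cg_site_c": [0.0, 0.0, 0.03],
}

def sites_passing(p_by_site, max_p):
    """Keep a site only if every subject's detection p-value is <= max_p."""
    return [site for site, ps in p_by_site.items() if all(p <= max_p for p in ps)]

# Rule described in the response: any p-value > 0 for any subject excludes the site.
kept_strict = sites_passing(detection_p, max_p=0.0)

# For comparison only: a conventional-style cutoff (assumed value).
kept_conventional = sites_passing(detection_p, max_p=0.01)

print(kept_strict)        # ['cg_site_a']
print(kept_conventional)  # ['cg_site_a', 'cg_site_b']
```

Under the strict rule a site survives only when every reported p-value is exactly zero, so the rule is workable only when, as the response notes, a large share of the matrix is zero.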
Competing Interests: No competing interests were disclosed.

Reviewer Report 19 Dec 2023

Adam Smith, University of Exeter Medical School, Exeter, England, UK

Approved with Reservations

https://doi.org/

Morrow develops a methylation risk score (MRS), based on publicly available blood DNA methylation data, with the aim to predict progression from mild cognitive impairment (MCI) to Alzheimer’s disease (AD). The author uses machine learning methodology and identifies methylation levels at four genomic loci that, in combination with age and sex, form the final MRS for conversion to AD. This study is novel and has the potential to elucidate a useful conversion susceptibility prediction method, either alone or in conjunction with other measures. However, at this stage there are considerable limitations and improvements that need to be addressed before I can recommend publication. Given the limitations including cohort size and lack of validation, the conclusions drawn are overstated.
Major Revisions:

Detection p value >0 led to exclusion of that probe. I would suggest that this is a typo or rounding error as a threshold of p>0 would yield 0 probes for downstream processing.
Secondary dataset does not have suitable data to replicate model derived from primary data. The application of MRS onto this cohort gives little to no validation of the model. Further work is needed using an alternative dataset or splitting the initial cohort into a design vs. test approach to better validate the score.
Para 13 – last line, “The MRS model leverages DNAm differences in the TSS of PNCK that are concordant across females and males.” This is incorrect: looking at the beta values presented in Figure S8, this site shows a beta difference of 0.05 (5%) in controls between males and females, with a similar difference seen across disease states. This difference is considerably larger than the differences seen between disease states and is a strong argument for the removal of sex chromosomes from the analysis.
Comment on the absence of age information in secondary cohort, was this information not needed for the final MRS calculation in this cohort?

Minor Revisions:

Para 1 – line 5, “stage of AD progression and the…” add "progression" to improve readability of this sentence.
Para 2 – line 1, “on the development” add "the" to improve readability of this sentence.
Para 7 – line 3, “seeking an epigenetic…” Suggest “To enable an epigenetic prediction model that is accurate despite shifts in cell abundance, blood cell distribution values were not included in the model.”
Para 13 – line 2, “These two sites were their top two findings in the EWAS of MCI conversion, though neither site was statistically significant.” Sites were significant but did not reach multiple testing correction threshold.
S6 ROC curve to be moved to main paper and legend expanded.
Table 1 moved to supplementary.
Table 2, “#genes with nearby TSS” – define nearby.
Figure 1, a suitable legend is needed, with statistical significance quoted for all analyses.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

Is the study design appropriate and is the work technically sound?

Partly

Are sufficient details of methods and analysis provided to allow replication by others?

Partly

If applicable, is the statistical analysis and its interpretation appropriate?

I cannot comment. A qualified statistician is required.

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Are the conclusions drawn adequately supported by the results?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Epigenetics, differential methylation.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 04 Apr 2024

Jarrett Morrow, Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, 02115, USA

Reviewer 1:

The comments and suggestions from Reviewer #1 have greatly helped to improve the manuscript and I hope the Reviewer finds the revised manuscript suitable for approval.

Major Revisions:

C1. Detection p value >0 led to exclusion of that probe. I would suggest that this is a typo or rounding error as a threshold of p>0 would yield 0 probes for downstream processing.

R1. The Reviewer noticing this in the manuscript is appreciated. A significant percentage of the p-values in the detection p-value matrix had a value of zero. Therefore, any site having a p-value > 0 for any subject was considered of lower quality. The exclusion of these 82,090 sites is mentioned in the supplemental table (Table S1) added to the revised extended data that outlines the quality control process for the primary data.

C2. Secondary dataset does not have suitable data to replicate model derived from primary data. The application of MRS onto this cohort gives little to no validation of the model. Further work is needed using an alternative dataset or splitting the initial cohort into a design vs. test approach to better validate the score.

R2. The Reviewer is correct. The secondary dataset was not used to identify replication of the trends observed in the longitudinal primary data. This was mentioned in the limitations: “The secondary population for MRS evaluation lacked longitudinal outcome data.” However, to clarify this further, the fifth paragraph of Results was edited to note concordance of the score at baseline in the primary data with the cross-sectional secondary data.

Given the smaller dataset, a cross-validation approach was used to create within-cohort validation. In future studies within larger cohorts, an 80/20 split would be feasible as outlined in the discussion. With the predictive ability of the score limited, multi-omic scores in larger populations may be a better approach as mentioned in the conclusions.

Please also note the updates to the methods and revised model described in the response to Reviewer #2.

C3. Para 13 – last line, “The MRS model leverages DNAm differences in the TSS of PNCK that are concordant across females and males.” This is incorrect: looking at the beta values presented in Figure S8, this site shows a beta difference of 0.05 (5%) in controls between males and females, with a similar difference seen across disease states. This difference is considerably larger than the differences seen between disease states and is a strong argument for the removal of sex chromosomes from the analysis.

R3. These insightful comments from the Reviewer are appreciated. Based on reviewer comments, the model was revised and now includes three of the same CpG sites; the PNCK site is no longer included. Please see the response to Reviewer #2 for additional details.

Constructing sex-stratified MRS models could be an effective approach to address similar issues in future studies.

C4. Comment on the absence of age information in secondary cohort, was this information not needed for the final MRS calculation in this cohort?

R4. The request for the clarification is appreciated. Subject age was not retained in the elastic-net model for the primary/discovery dataset, as mentioned in the third paragraph of the Results. Only the three CpG sites in Table 1 were included in the final MRS model.

Minor Revisions:

C1. Para 1 – line 5, “stage of AD progression and the…” add "progression" to improve readability of this sentence.

R1. The sentence has been improved by this suggested revision.

C2. Para 2 – line 1, “on the development” add "the" to improve readability of this sentence.

R2. The sentence has been improved by this suggested revision.

C3. Para 7 – line 3, “seeking an epigenetic…” Suggest “To enable an epigenetic prediction model that is accurate despite shifts in cell abundance, blood cell distribution values were not included in the model.”

R3. The sentence has been improved by this suggested revision.

C4. Para 13 – line 2, “These two sites were their top two findings in the EWAS of MCI conversion, though neither site was statistically significant.” Sites were significant but did not reach multiple testing correction threshold.

R4. The revised manuscript includes a mention of multiple testing correction.

C5. S6 ROC curve to be moved to main paper and legend expanded.

R5. The revised Supplemental Figure S6 is now Figure 1 in the main document with additional information in the caption.

C6. Table 1 moved to supplementary.

R6. Table 1 is now Supplemental Table S2.

C7. Table 2, “#genes with nearby TSS” – define nearby.

R7. This information has been added to Table 1 (formerly Table 2).

C8. Figure 1, a suitable legend is needed, with statistical significance quoted for all analyses.

R8. A more detailed caption for Figure 2 (formerly Figure 1 – now updated using revised model) has been included in the revised manuscript.

Competing Interests: No competing interests were disclosed.

Version 2

VERSION 2 PUBLISHED 01 Sep 2023

Open Peer Review

Reviewer Status

Reviewer Reports

	Invited Reviewers
	1	2	3
Version 2 (revision) 15 Mar 24	read		read
Version 1 01 Sep 23	read	read

Adam Smith, University of Exeter Medical School, Exeter, UK
Rachel Cavill, Maastricht University, Maastricht, The Netherlands
Lily Wang, University of Miami Miller School of Medicine, Miami, USA

Wei Zhang, University of Miami Miller School of Medicine, Miami, USA

Reviewer Report

01 Aug 2024 | for Version 2

Lily Wang, University of Miami Miller School of Medicine, Miami, FL, USA

Wei Zhang, University of Miami Miller School of Medicine, Miami, FL, USA

Approved With Reservations

Summary
Morrow et al. (2024) developed a methylation risk score to predict the conversion from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). While the study addresses a significant topic, several methodological issues should be addressed to support the validity and reproducibility of the findings.

Major Points

Normalization of DNA Methylation Data:
- It is crucial to exclude sex chromosomes during normalization because females are expected to have significantly higher methylation levels on the X chromosome due to X-chromosome inactivation.
Stringency in Filtering CpGs:
- The decision to remove CpGs with detection P > 0 seems overly stringent. A more reasonable threshold should be considered to ensure that potentially informative CpGs are not excluded unnecessarily.
Reproducibility and Code Availability:
- To promote transparency and reproducibility, the analysis code used in this study should be deposited in a public repository such as GitHub or Zenodo.
Appropriateness of the Testing Dataset:
- The testing dataset used in this study is cross-sectional, which limits its utility for predicting disease progression. A longitudinal dataset, such as the ADNI dataset, would be more suitable. The ADNI dataset, as described in Vasanthakumar et al. (2020) (ref. 1), is accessible at https://adni.loni.usc.edu/ (subject to data use approval).
CpGs with mQTL Associations:
- Among the three CpGs listed in Table 1, two CpGs appear to have methylation Quantitative Trait Loci (mQTL) associated with them, according to Min et al. (2021) (PMID: 34493871) (ref. 2). The relevant associations can be explored at http://mqtldb.godmc.org.uk/search.php?query=cg25342005 and http://mqtldb.godmc.org.uk/search.php?query=cg0989212
Overfitting Concerns:
- There is a concern about the potential overfitting of the prediction model. As a comparison, the author could randomly sample a set of three CpGs and build a prediction model using the AddNeuroMed dataset. By comparing the reported model with these randomly selected 3-CpG models, it would be possible to determine how many of these models exhibit a larger Area Under the Curve (AUC) than the one reported in the manuscript.
Replication of CpGs in Independent Studies:
- In the Introduction, the authors cite several references (ref 12-16). It would be valuable to know if any of the CpGs identified in this study replicate findings from these independent studies.
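
The random-triplet comparison suggested under Overfitting Concerns could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, a least-squares linear score stands in for whatever prediction model would actually be refitted, the helper names (`auc`, `fit_score`, `random_triplet_aucs`) are hypothetical, and the reference value 0.843 is the training AUC quoted in the author response to Reviewer 2.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Count converter/non-converter pairs ranked correctly; ties count one half.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))

def fit_score(X, y):
    """In-sample least-squares linear score with intercept (a stand-in model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return A @ coef

def random_triplet_aucs(beta, y, n_draws=1000, seed=0):
    """Training AUCs for models built on randomly drawn sets of three CpGs."""
    rng = np.random.default_rng(seed)
    aucs = np.empty(n_draws)
    for i in range(n_draws):
        cols = rng.choice(beta.shape[1], size=3, replace=False)
        aucs[i] = auc(fit_score(beta[:, cols], y), y)
    return aucs

# Synthetic stand-in data: 111 subjects and 500 hypothetical CpGs with no true signal.
rng = np.random.default_rng(1)
beta = rng.uniform(0.0, 1.0, size=(111, 500))
y = rng.integers(0, 2, size=111)  # hypothetical converter labels

null_aucs = random_triplet_aucs(beta, y, n_draws=200)
reported_auc = 0.843  # training AUC quoted in the author response (assumed comparator)
frac_better = float(np.mean(null_aucs >= reported_auc))
print(f"fraction of random triplets with AUC >= {reported_auc}: {frac_better:.3f}")
```

A small fraction would suggest the reported sites carry more signal than arbitrary triplets; the comparison is only fair if the random models are scored the same way as the reported one (training AUC against training AUC, or cross-validated against cross-validated).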

Minor Points

Workflow Figure:
- A workflow figure that outlines the various steps used in the analysis, both for training and testing samples, would greatly assist readers in understanding the methodology described in this manuscript.
Figure Legends:
- The legends for the figures in both the main manuscript and the Supplementary File should be comprehensive. They need to be understandable as standalone descriptions; please include details on the dataset used and the model applied, at a minimum.

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

Partly
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Partly

References

1. Vasanthakumar A, Davis JW, Idler K, Waring JF, et al.: Harnessing peripheral DNA methylation differences in the Alzheimer's Disease Neuroimaging Initiative (ADNI) to reveal novel biomarkers of disease. Clin Epigenetics. 2020; 12 (1): 84.
2. Min JL, Hemani G, Hannon E, Dekkers KF, et al.: Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation. Nat Genet. 2021; 53 (9): 1311-1321.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

statistical modeling, epigenomics analysis

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Reviewer Report

02 May 2024 | for Version 2

Adam Smith, University of Exeter Medical School, Exeter, England, UK

Approved With Reservations

Morrow has addressed some of the concerns raised by myself and other reviewers, however I still find the study needs some revisions.

Further work is needed to address the point that detection P value for methylation assays such as this cannot go below 0, therefore the statement “Prior to the analyses in this study, any CpG site with a detection p-value > 0 for any subject was removed” is fundamentally incorrect as it would result in 0 datapoints for downstream analysis. I imagine this is a rounding error taken from the GEO website as the detection P values are usually very close to 0. I would recommend using the same detection P value cutoff used by Roubroeks et al.
Secondary dataset does not have suitable data to replicate model derived from primary data. The application of MRS onto this cohort gives little to no validation of the model. I can accept the author's comment that the primary dataset is not of sufficient size to split and perform a design vs. test approach to better validate the score. However, given the current data the statement “This score was evaluated in cross sectional data in AD subjects and controls in both the primary and a secondary datasets to help understand the relationship of the conversion risk score to overall disease severity.” is inaccurate. The author's response that the values at baseline show concordance gives little to no justification given the differences in the cohorts.
Given the absence of MCI-AD progression information in the secondary cohort, Figure S8 reveals no extra information and I recommend that this is removed.
It would be beneficial to see the effect of adding Age and Sex information to the predictive ability of the MRS for the initial cohort, including the effect on ROC.
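
To make the detection p-value point concrete, the sketch below counts how many probes survive the "any value > 0" rule versus a conventional non-zero cutoff. It is only an illustration: the matrix is synthetic, the function name `probes_passing` is hypothetical, and the 0.01 cutoff is an assumed example value rather than the cutoff used by Roubroeks et al.

```python
import numpy as np

def probes_passing(detp, threshold):
    """Mask of probes whose detection p-value is <= threshold in every sample."""
    return (detp <= threshold).all(axis=1)

# Synthetic detection p-value matrix: rows are probes, columns are samples.
# Most entries are exactly zero, as can happen when stored values are rounded.
detp = np.zeros((1000, 20))
detp[:300, 0] = 1e-6     # tiny but non-zero in one sample: well detected
detp[300:350, 0] = 0.2   # clearly failed detection in one sample

strict = probes_passing(detp, 0.0)    # the "remove if p > 0 for any subject" rule
lenient = probes_passing(detp, 0.01)  # assumed conventional cutoff

print(int(strict.sum()))   # 650 probes kept
print(int(lenient.sum()))  # 950 probes kept
```

In this toy matrix the zero-threshold rule does retain probes, because many stored values are exactly zero, but it also discards the 300 probes whose only non-zero value is negligible, which is the over-stringency the reviewers describe.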

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Epigenetics, differential methylation

Reviewer Report

06 Feb 2024 | for Version 1

Rachel Cavill, Department of Advanced Computing Sciences, Maastricht University, Maastricht, Limburg, The Netherlands

Not Approved

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

No
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

No

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Data science applied to biological data, in particular, omics data.

Author Response

04 Apr 2024

Jarrett Morrow, Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, 02115, USA

Reviewer 2:

Updates to the manuscript prompted by the comments and suggestions from Reviewer #2 have greatly helped to improve the manuscript, through both a revised approach and more clarification regarding the limitations of the study. I hope the Reviewer finds the revised manuscript suitable for approval.

C1. Figure 1 shows the box-plot of the methylation risk scores (MRS) for the different groups of subjects. If this score were a real predictor of cognitive impairment and its development into Alzheimer's disease (AD), the a priori hypothesis would be that the subjects with AD would have more extreme scores than the mild-converters. This is not shown to be the case, with the AD group having intermediate scores between the mild-converters and the controls.

R1. These comments by the Reviewer regarding an a priori hypothesis are much appreciated. However, with the elastic-net model trained using longitudinal outcomes of decline instead of cross-sectional disease severity, the specific hypothesis proposed by the reviewer was not supported by the findings of the study.

However, in the fourth paragraph of Results, a nominal increase in MRS values with increased severity at baseline was noted. While the observations suggest specificity of the risk score for susceptibility to conversion, a nominally higher score in AD provides evidence to support the spirit of the reviewer comment regarding AD and MCI scores.

C2. The MRS score is based off just 4 methylation sites. The initial full dataset contained >250,000 sites and therefore it would be very easy to find a small subset which is predictive by chance.

R2. This comment from the Reviewer is appreciated. An elastic-net model was the focus of the study, as this method seeks a model with fewer features to help avoid overfitting. In addition, to emphasize biological significance in the model, the variance of DNA methylation was considered. That is, the full set of methylation sites was filtered to approximately 63,000 sites based on variance, as low-variance sites may be less useful in practical applications.

A predictive model leveraging a small number of features is not typically considered of lesser value. For example, a recent Nature Communications (2022) paper, van Breugel and colleagues (PMCID: PMC9715628) created a three CpG site predictor of allergic disease in a cohort of 348 subjects. This is one example of predictive models based on a limited set of features that have been developed in various tissues and complex diseases.

C3. The cross validation AUC is significantly lower (0.635) than the training AUC (0.877), which is a sign of over-fitting the training set, given that the methods describe optimising the cross validation mis-classification error, it seems likely that the cross validation AUC is also over-fitted and an independent test set would display an even lower AUC.

R3. I completely agree with this comment by the Reviewer and thank the Reviewer for the insight. The last sentence in the first paragraph of the Discussion states: "The limited set of selected features, an AUC in the training data of 0.877, and a maximum AUC during cross-validation of 0.635 suggests both higher variance and bias. An MRS may find better utility as a component in a comprehensive conversion susceptibility prediction strategy."

However, while revisiting the model development and results, prompted by the reviewer comments, the approach was modified to include mean squared error as the measure instead of misclassification error. With this change, an alpha of 1 now produced the lower error. In addition, lambda was chosen to reduce overfitting, leading to a three-site model. As a result, the AUC for the training data was 0.843, the average AUC across the ten folds was 0.752 and the out-of-fold AUC was 0.653. Revisions were made to the manuscript to reflect these methods changes. I found the updated results demonstrating improved performance encouraging and I hope the Reviewer also views these revised findings favorably.

C4. The discussion that 2/4 CpG sites were previously identified is not evidence backing this up, as this result was obtained in a previous analysis of the same dataset. It would be unusual that two different analyses of the same dataset failed to find similar results (even when the analyses use different methods). A false positive CpG site in one analysis is likely to show up as a false positive in a different analysis on the same dataset.

R4. These insightful comments from the Reviewer are appreciated. The discussion regarding the selected sites in the original study was not intended to justify the predictive model. However, the discussion of the two sites from the study by Roubroeks and colleagues is relevant to the current study, as it seemed unethical to omit these details and claim the sites as novel findings, particularly given the current study uses publicly available data linked to the Roubroeks et al. study.

C5. The secondary dataset is not suitable to be used as an independent validation set, the study design is too different.

R5. Although Reviewer #2 did not note which aspects of the secondary data raised the concern, this appears to be similar to the second set of comments from Reviewer #1. Because the secondary dataset was not used to replicate the score trends observed in the longitudinal primary data, clarifying text was added to the fifth paragraph of the Results to note the concordance of the score at baseline in the primary data with the cross-sectional secondary data.

C6. Additionally, I raise concerns about the description in the methods of "any CpG site with a detection p-value > 0 for any subject was excluded". This would surely exclude all detected CpGs, as a p-value of 0 for detection surely indicates an undetected CpG site in that sample, and p-values cannot go negative.

R6. The reviewer noticing this in the manuscript is appreciated. As outlined in my response to Reviewer 1 (response #1), a significant percentage of the overall p-values in the detection p-value matrix had a value of zero. Therefore, any site having a p-value > 0 for any subject was considered of lower quality. The exclusion of these 82,090 sites is mentioned in the supplemental table (Table S1) added to the revised extended data that outlines the quality control process for the primary data.
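The exclusion rule described in this response can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's pipeline: the function name `filter_sites_by_detection_p`, the `threshold` parameter, and the toy CpG identifiers and values are all hypothetical.

```python
def filter_sites_by_detection_p(detection_p, threshold=0.0):
    """Split CpG sites by their per-sample detection p-values.

    detection_p maps a CpG ID to a list of detection p-values, one per
    sample. A site is dropped if ANY sample exceeds the threshold, so with
    threshold=0.0 a single nonzero p-value removes the site.
    """
    kept, dropped = [], []
    for site, p_values in detection_p.items():
        if any(p > threshold for p in p_values):
            dropped.append(site)
        else:
            kept.append(site)
    return kept, dropped


# Hypothetical toy matrix: three sites measured in three samples.
toy = {
    "cg_example_1": [0.0, 0.0, 0.0],
    "cg_example_2": [0.0, 1e-12, 0.0],
    "cg_example_3": [0.0, 0.0, 0.03],
}

kept, dropped = filter_sites_by_detection_p(toy)
print(kept, dropped)
```

A rule of this kind retains sites only when the detection p-value matrix stores many values as exactly zero, as the response describes. Passing a conventional cutoff such as `threshold=0.01` to the same function would keep `cg_example_2` as well.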

Competing Interests

No competing interests were disclosed.

Reviewer Report

19 Dec 2023 | for Version 1

Adam Smith, University of Exeter Medical School, Exeter, England, UK

Approved With Reservations

Detection p value >0 led to exclusion of that probe. I would suggest that this is a typo or rounding error as a threshold of p>0 would yield 0 probes for downstream processing.
Secondary dataset does not have suitable data to replicate model derived from primary data. The application of MRS onto this cohort gives little to no validation of the model. Further work is needed using an alternative dataset or splitting the initial cohort into a design vs. test approach to better validate the score.
Para 13 – last line, “The MRS model leverages DNAm differences in the TSS of PNCK that are concordant across females and males.” This is incorrect. Looking at the beta values presented in Figure S8, this site shows a beta difference of 0.05 (5%) in controls between males and females, with a similar difference seen across disease states. This difference is considerably larger than the differences seen between disease states and is a strong argument for the removal of sex chromosomes from the analysis.
Comment on the absence of age information in secondary cohort, was this information not needed for the final MRS calculation in this cohort?

Minor Revisions:

Para 1 – line 5, “stage of AD progression and the…” add "progression" to improve readability of this sentence.
Para 2 – line 1, “on the development” add "the" to improve readability of this sentence.
Para 7 – line 3, “seeking an epigenetic…” Suggest “To enable an epigenetic prediction model that is accurate despite shifts in cell abundance, blood cell distribution values were not included in the model.”
Para 13 – line 2, “These two sites were their top two findings in the EWAS of MCI conversion, though neither site was statistically significant.” Sites were significant but did not reach multiple testing correction threshold.
S6 ROC curve to be moved to main paper and legend expanded.
Table 1 moved to supplementary.
Table 2, “#genes with nearby TSS” – define nearby.
Figure 1, a suitable legend is needed, with statistical significance quoted for all analyses.

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Is the study design appropriate and is the work technically sound?
Partly

Are sufficient details of methods and analysis provided to allow replication by others?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
I cannot comment. A qualified statistician is required.

Are all the source data underlying the results available to ensure full reproducibility?
Yes

Are the conclusions drawn adequately supported by the results?
Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Epigenetics, differential methylation.

Responses (1)

Author Response

04 Apr 2024

Jarrett Morrow, Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, 02115, USA

Reviewer 1:

The comments and suggestions from Reviewer #1 have greatly helped to improve the manuscript and I hope the Reviewer finds the revised manuscript suitable for approval.

Major Revisions:

C1. Detection p value >0 led to exclusion of that probe. I would suggest that this is a typo or rounding error as a threshold of p>0 would yield 0 probes for downstream processing.

R1. The Reviewer noticing this in the manuscript is appreciated. A significant percentage of the p-values in the detection p-value matrix had a value of zero. Therefore, any site having a p-value > 0 for any subject was considered of lower quality. The exclusion of these 82,090 sites is mentioned in the supplemental table (Table S1) added to the revised extended data that outlines the quality control process for the primary data.

C2. Secondary dataset does not have suitable data to replicate model derived from primary data. The application of MRS onto this cohort gives little to no validation of the model. Further work is needed using an alternative dataset or splitting the initial cohort into a design vs. test approach to better validate the score.

R2. The Reviewer is correct. The secondary dataset was not used to identify replication of the trends observed in the longitudinal primary data. This was mentioned in the limitations: “The secondary population for MRS evaluation lacked longitudinal outcome data.” However, to clarify this further, the fifth paragraph of the Results was edited to note concordance of the score at baseline in the primary data with the cross-sectional secondary data.

Given the smaller dataset, a cross-validation approach was used to create within-cohort validation. In future studies within larger cohorts, an 80/20 split would be feasible as outlined in the discussion. With the predictive ability of the score limited, multi-omic scores in larger populations may be a better approach as mentioned in the conclusions.

Please also note the updates to the methods and revised model described in the response to Reviewer #2.

C3. Para 13 – last line, “The MRS model leverages DNAm differences in the TSS of PNCK that are concordant across females and males.” This is incorrect. Looking at the beta values presented in Figure S8, this site shows a beta difference of 0.05 (5%) in controls between males and females, with a similar difference seen across disease states. This difference is considerably larger than the differences seen between disease states and is a strong argument for the removal of sex chromosomes from the analysis.

R3. These insightful comments from the Reviewer are appreciated. Based on reviewer comments, the model was revised and now includes three of the original CpG sites; the PNCK site is no longer included. Please see the response to Reviewer #2 for additional details.

Constructing sex-stratified MRS models could be an effective approach to address similar issues in future studies.

C4. Comment on the absence of age information in secondary cohort, was this information not needed for the final MRS calculation in this cohort?

R4. The request for the clarification is appreciated. Subject age was not retained in the elastic-net model for the primary/discovery dataset, as mentioned in the third paragraph of the Results. Only the three CpG sites in Table 1 were included in the final MRS model.
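As a minimal sketch of why age is not needed when scoring the secondary cohort, the Python example below assumes the MRS is the linear predictor of the fitted model: an intercept plus a weighted sum of DNAm beta values at the retained sites. That form is an assumption, as the response does not give the formula. The function name, site labels, weights, and intercept are all hypothetical and are NOT the published coefficients.

```python
def methylation_risk_score(beta_values, weights, intercept=0.0):
    """Linear score: intercept plus the weighted sum of DNAm beta values.

    Only sites that carry a weight contribute. Extra keys in beta_values,
    such as age or sex, are ignored, mirroring a penalized model that did
    not retain those predictors.
    """
    missing = [site for site in weights if site not in beta_values]
    if missing:
        raise KeyError(f"missing beta values for: {missing}")
    return intercept + sum(w * beta_values[site] for site, w in weights.items())


# Hypothetical weights and intercept, NOT the published coefficients.
weights = {"site_a": 2.0, "site_b": -1.0, "site_c": 0.5}

# Hypothetical subject record; "age" is present but carries no weight.
subject = {"site_a": 0.5, "site_b": 0.25, "site_c": 0.5, "age": 74}

score = methylation_risk_score(subject, weights, intercept=-0.5)
print(score)
```

Because the penalized model assigned no weight to age, a record with or without an age field yields the same score under this sketch.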

Minor Revisions:

C1. Para 1 – line 5, “stage of AD progression and the…” add "progression" to improve readability of this sentence.

R1. The sentence has been improved by this suggested revision.

C2. Para 2 – line 1, “on the development” add "the" to improve readability of this sentence.

R2. The sentence has been improved by this suggested revision.

C3. Para 7 – line 3, “seeking an epigenetic…” Suggest “To enable an epigenetic prediction model that is accurate despite shifts in cell abundance, blood cell distribution values were not included in the model.”

R3. The sentence has been improved by this suggested revision.

C4. Para 13 – line 2, “These two sites were their top two findings in the EWAS of MCI conversion, though neither site was statistically significant.” Sites were significant but did not reach multiple testing correction threshold.

R4. The revised manuscript includes a mention of multiple testing correction.

C5. S6 ROC curve to be moved to main paper and legend expanded.

R5. The revised Supplemental Figure S6 is now Figure 1 in the main document with additional information in the caption.

C6. Table 1 moved to supplementary.

R6. Table 1 is now Supplemental Table S2.

C7. Table 2, “#genes with nearby TSS” – define nearby.

R7. This information has been added to Table 1 (formerly Table 2).

C8. Figure 1, a suitable legend is needed, with statistical significance quoted for all analyses.

R8. A more detailed caption for Figure 2 (formerly Figure 1 – now updated using revised model) has been included in the revised manuscript.

Competing Interests

No competing interests were disclosed.

1. Bellenguez C, Küçükali F, Jansen IE, et al.: New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat. Genet. 2022 Apr; 54(4): 412–436.

2. Wightman DP, Jansen IE, Savage JE, et al.: A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer’s disease. Nat. Genet. 2021 Sep; 53(9): 1276–1282.

3. Hannon E, Lunnon K, Schalkwyk L, et al.: Interindividual methylomic variation across blood, cortex, and cerebellum: implications for epigenetic studies of neurological and neuropsychiatric phenotypes. Epigenetics. 2015 Oct 12; 10(11): 1024–1032.

4. Morris JC, Storandt M, Miller JP, et al.: Mild Cognitive Impairment Represents Early-Stage Alzheimer Disease. Arch. Neurol. 2001 Mar 1; 58(3): 397–405.

5. Markesbery WR: Neuropathologic Alterations in Mild Cognitive Impairment: A Review. J. Alzheimers Dis. JAD. 2010 Jan; 19(1): 221–228.

6. Leonenko G, Baker E, Stevenson-Hoare J, et al.: Identifying individuals with high risk of Alzheimer’s disease using polygenic risk scores. Nat. Commun. 2021 Jul 23; 12(1): 4506.

7. Manzali SB, Yu E, Ravona-Springer R, et al.: Alzheimer’s Disease Polygenic Risk Score Is Not Associated With Cognitive Decline Among Older Adults With Type 2 Diabetes. Front. Aging Neurosci. 2022 [cited 2023 Jul 26]; 14.

8. Park YH, Hodges A, Simmons A, et al.: Association of blood-based transcriptional risk scores with biomarkers for Alzheimer disease. Neurol Genet. 2020 Dec 1; 6(6): e517.

9. Roubroeks JAY, Smith AR, Smith RG, et al.: An epigenome-wide association study of Alzheimer’s disease blood highlights robust DNA hypermethylation in the HOXB6 gene. Neurobiol. Aging. 2020 Nov 1; 95: 26–45.

10. Vasanthakumar A, Davis JW, Idler K, et al.: Harnessing peripheral DNA methylation differences in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) to reveal novel biomarkers of disease. Clin. Epigenetics. 2020 Jun 15; 12(1): 84.

11. Chouliaras L, Pishva E, Haapakoski R, et al.: Peripheral DNA methylation, cognitive decline and brain aging: pilot findings from the Whitehall II imaging study. Epigenomics. 2018 May; 10(5): 585–595.

12. Li QS, Vasanthakumar A, Davis JW, et al.: Association of peripheral blood DNA methylation level with Alzheimer’s disease progression. Clin. Epigenetics. 2021 Oct 15; 13(1): 191.

13. Fransquet PD, Lacaze P, Saffery R, et al.: Blood DNA methylation signatures to detect dementia prior to overt clinical symptoms. Alzheimers Dement. Diagn. Assess. Dis. Monit. 2020; 12(1): e12056.

14. Pérez RF, Alba-Linares JJ, Tejedor JR, et al.: Blood DNA Methylation Patterns in Older Adults With Evolving Dementia. J. Gerontol. Ser. A. 2022 Sep 1; 77(9): 1743–1749.

15. Manzali S, Ravona-Springer R, Jacob-Hirsch J, et al.: Blood DNA methylation biomarkers for cognitive decline in older adults with type 2 diabetes. Alzheimers Dement. 2023; 19(S2): e065120.

16. Lunnon K, Smith RG, Cooper I, et al.: Blood methylomic signatures of presymptomatic dementia in elderly subjects with type 2 diabetes mellitus. Neurobiol. Aging. 2015 Mar 1; 36(3): 1600.e1–1600.e4.

17. Shadyab AH, McEvoy LK, Horvath S, et al.: Association of Epigenetic Age Acceleration With Incident Mild Cognitive Impairment and Dementia Among Older Women. J. Gerontol. Ser. A. 2022 Jun 1; 77(6): 1239–1244.

18. Fransquet PD, Lacaze P, Saffery R, et al.: Accelerated Epigenetic Aging in Peripheral Blood does not Predict Dementia Risk. Curr. Alzheimer Res. 2021; 18(5): 443–451.

19. Zhou W, Laird PW, Shen H: Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 2017 Feb 28; 45(4): e22.

20. McRae AF, Marioni RE, Shah S, et al.: Identification of 55,000 Replicated DNA Methylation QTL. Sci. Rep. 2018 Dec [cited 2019 Mar 25]; 8(1): 17605.

21. Nabais MF, Laws SM, Lin T, et al.: Meta-analysis of genome-wide DNA methylation identifies shared associations across neurodegenerative disorders. Genome Biol. 2021 Mar 26; 22(1): 90.

22. Pidsley R, Wong CCY, Volta M, et al.: A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013; 14(14).

23. Friedman JH, Hastie T, Tibshirani R: Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010; 33(1): 1–22.

24. Robin X, Turck N, Hainard A, et al.: pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011; 12: 77.

25. Hu H, Li H, Li J, et al.: Alzheimer’s Disease Neuroimaging Initiative. Genome-wide association study identified ATP6V1H locus influencing cerebrospinal fluid BACE activity. BMC Med. Genet. 2018 May 11; 19(1): 75.

26. Hampel H, Vassar R, De Strooper B, et al.: The β-Secretase BACE1 in Alzheimer’s Disease. Biol. Psychiatry. 2021 Apr 15; 89(8): 745–756.

27. Preman P, Alfonso-Triguero M, Alberdi E, et al.: Astrocytes in Alzheimer’s Disease: Pathological Significance and Molecular Pathways. Cell. 2021 Mar 4; 10(3): 540.

28. Roussotte FF, Gutman BA, Hibar DP, et al.: Carriers of a common variant in the dopamine transporter gene have greater dementia risk, cognitive decline, and faster ventricular expansion. Alzheimers Dement. J. Alzheimers Assoc. 2015 Oct; 11(10): 1153–1162.

29. Davis EJ, Solsberg CW, White CC, et al.: Sex-Specific Association of the X Chromosome With Cognitive Change and Tau Pathology in Aging and Alzheimer Disease. JAMA Neurol. 2021 Oct 1; 78(10): 1249–1254.

30. Logue MW, Smith AK, Wolf EJ, et al.: The correlation of methylation levels measured using Illumina 450K and EPIC BeadChips in blood samples. Epigenomics. 2017 Nov; 9(11): 1363–1371.

Table 1. Demographics of European study subjects in primary and secondary datasets.

Table 2. CpG sites included in the conversion risk score model.

Figure 1. Box plots of the MRS for all subjects across disease states including MCI outcomes.
