Keywords
Alzheimer’s disease, cognitive decline, risk score, DNA methylation, epigenetics
This article is included in the Bioinformatics gateway.
This article is included in the Genomics and Genetics gateway.
Alzheimer’s disease (AD) is a neurodegenerative and heterogeneous disorder with complex etiology. Mild cognitive impairment (MCI) may represent an intermediate stage of AD, and the ability to identify MCI patients at greater risk of conversion to AD could guide personalized treatments. This study sought to develop a methylation risk score predictive of conversion from MCI to AD using publicly available blood DNA methylation (DNAm) data.
Using blood DNA methylation data from an epigenome-wide association study of AD that included 111 subjects with MCI, a methylation risk score of MCI conversion was created using an elastic-net framework. The elastic-net model was trained with a high-variance subset of the DNAm data, age and sex as predictors.
The final model included three CpG sites: two annotated to the genes SLC6A3 (cg09892121) and TRIM62 (cg25342005), and a third (cg17292662) near the genes ATP6V1H and RGS20. A significant difference (p < 0.0001, t-test) was observed in the scores for MCI stable subjects compared with MCI converters. No statistically significant difference was observed between AD subjects and controls, suggesting specificity of the risk score for susceptibility to conversion.
The ability to identify MCI patients at greater risk of progression could inform early interventions and is a critical component in mitigation strategies for AD. This study provides insight into a potential role for epigenetics in the development of a multi-omic risk score of conversion.
Revisions were made to the manuscript to address the reviewer comments. As part of these revisions, the measure used in the elastic-net binomial model is now mean squared error instead of misclassification error. With this change, an alpha of 1 produced the lowest error. In addition, lambda was chosen to reduce overfitting, leading to a three-site model instead of four sites, with the PNCK site no longer included in the model. The AUC for the training data is now 0.843, with an average AUC across the ten folds of 0.752 and an out-of-fold AUC of 0.653. In the primary data, 82,090 sites were excluded from the analyses based on detection p-value filtering. To provide more clarity, this filtering is shown in a supplemental table (Table S1) added to the Extended data to outline the overall quality control process for the primary data. Clarifying text was added in the fifth paragraph of the Results to note concordance of the score at baseline in the primary data with the cross-sectional secondary data. The ROC curve previously in the Extended data (Figure S6) was moved to the main manuscript (Figure 1), with additional information added to the caption. The demographics summary (Table 1) was moved to the Extended data (Table S2). A more detailed caption for Figure 2 (formerly Figure 1) was included in the revised manuscript. This figure has also been updated based on the revised elastic-net model.
See the author's detailed response to the review by Adam Smith
See the author's detailed response to the review by Rachel Cavill
Alzheimer’s disease (AD) is a neurodegenerative and heterogeneous disorder with complex etiology and a devastating impact on individuals and families. Although genome-wide association studies continue to provide insight into the genetic susceptibility to AD,1,2 epigenome-wide association studies (EWAS) and, in particular, studies of DNA methylation (DNAm) have the potential to capture signals related to environmental contributions.3 Mild cognitive impairment (MCI) may represent an intermediate stage of AD progression4 and the ability to identify MCI patients at greater risk of conversion to AD could guide personalized strategies for mitigation of decline, as the pathology of AD may be present years before onset of symptoms.5 Risk scores that provide predictive measures of AD susceptibility have been created using genetic6,7 and blood transcriptomic data.8 Epigenome-wide association studies in peripheral blood have identified differentially methylated CpG sites associated with AD9–11 and AD progression.12–16 Associations between peripheral blood epigenetic age acceleration and cognitive function have also been examined.17,18
This study focused on the development of a methylation risk score (MRS) predictive of conversion from MCI to AD, using publicly available DNA methylation data9 and machine learning methods. The score was evaluated in cross-sectional data from AD subjects and controls in both the primary and secondary datasets to help understand the relationship of the conversion risk score to overall disease severity. This study provides insight into the value epigenetics could provide in a multi-omic risk score of conversion.
To create the risk score, blood DNA methylation (DNAm) data from an EWAS of AD in 300 subjects9 were obtained from the Gene Expression Omnibus (GEO: GSE144858). The cross-European AddNeuroMed study dataset includes 93 subjects with Alzheimer’s, 111 with MCI and 96 control subjects. Of the 111 MCI subjects, 68 were stable after one year, 39 converted to AD within one year and four converted at an unknown time. Roubroeks and colleagues9 extracted DNA from the blood samples and assayed DNA methylation levels using the Illumina Infinium Human Methylation 450K BeadChip array. After quality control analyses, they quantile-normalized the data using the dasen method from the R package wateRmelon to create a matrix of beta values (CpG sites in rows and subjects in columns). The data from the four subjects with unknown conversion date were excluded. Two MCI subjects less than 65 years of age, excluded from the study by Roubroeks et al.,9 were included in this study. Prior to the analyses in this study, any CpG site with a detection p-value >0 for any subject was excluded. Using the annotation from Zhou et al.,19 CpG sites with probe mapping issues or having a SNP with minor allele frequencies >1% within five bases were also removed. To reduce the influence of genetics on the prediction models, CpG sites with significant genetic associations (methylation quantitative trait loci: mQTL) in peripheral blood20 were also removed. The beta values for the remaining CG-annotated sites were retained for analysis. To identify possible sex mismatches, multidimensional scaling (MDS) plots were created using the cmdscale function in R and the X and Y chromosome data. MDS plots were also created using high variance DNAm data to observe batch effects. No sex mismatches or batch effects were observed.
An independent set of blood DNA methylation data from an epigenome-wide meta-analysis of neurodegenerative disorders21 was obtained from the Gene Expression Omnibus. The Australian Imaging, Biomarker & Lifestyle Flagship Study of Aging (AIBL) dataset of 726 subjects included 161 subjects with Alzheimer’s, 94 with MCI and 471 control subjects. Longitudinal information regarding conversion to AD is not available in this study. Nabais and colleagues21 assayed DNAm using the Illumina HumanMethylationEPIC BeadChip Array. Although data processed using functional normalization were publicly available, the methylated and unmethylated intensities were used to create a dataset normalized using dasen in the R package wateRmelon22 to create a matrix of beta values. Prior to normalization, CpG sites with a detection p-value >0 for any subject were excluded. CpG sites with probe mapping issues, nearby SNPs with minor allele frequencies >1%19 or a significant mQTL in peripheral blood20 were also removed. Following dasen normalization, no sex mismatches were observed in the MDS plot created using the X and Y chromosome data. The beta values for the remaining CG-annotated sites were retained for analysis.
A methylation risk score of MCI conversion was created using an elastic-net binomial (classification) model via the R package glmnet.23 This regularized regression method combines the L1 and L2 penalties of the lasso and ridge methods and provides the ability to retain correlated features. To adjust model and feature selection performance, the contribution of each penalty is selected using the hyperparameter alpha. Using DNAm beta values, age and sex as predictors and the MCI outcome (stable or conversion to AD) as the response, a model was trained using a 10-fold cross-validation approach. Given the limited number of MCI subjects, all data were used in the training process. The alpha hyperparameter was chosen to minimize the cross-validation mean squared error (MSE). To compare the performance with a score based on demographics only, a binomial risk score model was created using the glm function in R with age and sex as predictors.
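The way alpha blends the two penalties can be illustrated with a minimal sketch. It assumes the penalty parameterization documented for glmnet, lambda * [(1 - alpha)/2 * sum(beta^2) + alpha * sum(|beta|)]. The function name and the toy coefficients are hypothetical and are not taken from the fitted model.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic-net penalty for a coefficient vector (intercept excluded).

    alpha = 1 gives the lasso (L1) penalty, alpha = 0 the ridge (L2) penalty.
    Assumed form: lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * ((1.0 - alpha) / 2.0 * l2 + alpha * l1)


# Hypothetical coefficients, for illustration only.
toy_coefs = [1.0, -2.0, 0.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, elastic_net_penalty(toy_coefs, lam=0.5, alpha=alpha))
```

With alpha set to 1, only the absolute-value term remains, which is the setting that tends to shrink many coefficients exactly to zero.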
Receiver operating characteristic (ROC) curves were created using the function roc in the R package pROC.24 Risk scores were calculated using the final conversion risk score model via the predict function in the glmnet package.
After quality control procedures, peripheral blood DNAm data were available for 251,491 CpG sites and 296 samples, including 93 AD subjects, 107 MCI subjects (68 stable, 39 converters) and 96 controls in the Roubroeks et al. (GSE144858) dataset (Tables S1 and S2, Extended data). Approximately 60% of the European subjects were female (52% female among MCI subjects).
An elastic net model was trained using the DNAm beta values (range: 0 to 1) for the MCI subjects. The beta values, age, and sex were included as possible predictors and the MCI outcome (stable or conversion to AD) was the response. Blood cell distribution values were not included in the model, to enable an epigenetic prediction model that may capture biology including shifts in cell abundance with conversion to AD. With a focus on more robust signatures, only CpG sites with variance in the top quartile (62,873 CpG sites) were included in the training set (Table S1, Extended data). After executing cross-validation for values of alpha from 0 to 1, an alpha value of 1.0 was observed to minimize cross-validation MSE (MSE = 0.44).
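The variance filter described above can be sketched as follows. This is an illustration under assumptions, not the author's code: sites are ranked by sample variance and the top quarter (rounded up) is kept, and the site names and beta values are hypothetical.

```python
import math
from statistics import variance


def top_quartile_by_variance(betas_by_site):
    """Return the site IDs whose beta-value variance ranks in the top quarter.

    betas_by_site maps a site ID to its list of beta values across subjects.
    Rounding up the number of retained sites is an assumption of this sketch.
    """
    variances = {site: variance(values) for site, values in betas_by_site.items()}
    n_keep = math.ceil(len(variances) / 4)
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical beta values for four sites across four subjects.
toy = {
    "site_a": [0.10, 0.90, 0.20, 0.80],
    "site_b": [0.50, 0.51, 0.49, 0.50],
    "site_c": [0.30, 0.60, 0.35, 0.55],
    "site_d": [0.70, 0.71, 0.70, 0.69],
}
print(top_quartile_by_variance(toy))
```

Under the same rounding assumption, a quarter of the 251,491 retained sites rounds up to 62,873, which matches the count reported in the text.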
Selecting a lambda value of 0.164 to reduce variance, the final model from this process included three features, all CpG sites (Table 1). Two of the sites are annotated to the genes SLC6A3 (cg09892121) and TRIM62 (cg25342005), with a third (cg17292662) near the genes ATP6V1H and RGS20. The distributions of the beta values for these sites are centered between 0.6 and 0.8 (Figures S1 to S3, Extended data). A ROC curve was created using the training data with the final model (Figure 1) and the observed area under the ROC curve (AUC) was 0.843. The mean cross-validation AUC across the ten folds was 0.752, where AUC values were calculated using the test data for each fold. The out-of-fold prevalidation AUC, calculated using the aggregated test set predictions and outcomes from all ten folds, was observed to be 0.653.
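The difference between a mean per-fold AUC and a pooled out-of-fold AUC can be made concrete with a small sketch. It uses the rank-based (Mann-Whitney) definition of AUC rather than the pROC package. The two folds and their predictions are hypothetical, chosen only to show that pooling can lower the AUC when score scales differ between folds; this is one possible mechanism, not a claim about the data in this study.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out predictions from two folds: (outcomes, predicted scores).
folds = [
    ([1, 0, 0], [0.9, 0.8, 0.7]),
    ([1, 1, 0], [0.3, 0.2, 0.1]),
]

# Mean of the AUCs computed separately within each fold.
fold_aucs = [auc(y, s) for y, s in folds]
mean_fold_auc = sum(fold_aucs) / len(fold_aucs)

# Single AUC computed after aggregating all held-out predictions.
pooled_labels = [y for ys, _ in folds for y in ys]
pooled_scores = [s for _, ss in folds for s in ss]
pooled_auc = auc(pooled_labels, pooled_scores)

print(mean_fold_auc, pooled_auc)
```

In this toy case each fold ranks its own subjects perfectly, yet the pooled AUC is lower because the second fold's scores sit on a lower scale than the first fold's.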
CpG site | Chromosome | Gene annotation* | Relationship to gene | CpG island location |
---|---|---|---|---|
cg09892121 | Chr 5 | SLC6A3 | Body | OpenSea |
cg17292662 | Chr 8 | ATP6V1H #, RGS20 # | N/A | S_Shelf |
cg25342005 | Chr 1 | TRIM62 | TSS1500 | S_Shore |
* Illumina Infinium 450K BeadChip annotation from Zhou et al.19
The methylation risk score (predicted probability of conversion) was calculated using the final model and the DNAm beta values. A significant difference (p < 0.0001, t-test) in the score for MCI stable subjects compared with MCI converters may be observed in the box plots of the risk score (Figure 2). For MCI subjects, being above the mean MRS compared with below the mean equates to an odds ratio of 5.8 for conversion. Also included in Figure 2 are the scores for the AD subjects and controls. The seven controls and seven AD cases less than 65 years of age, excluded in the analyses by Roubroeks et al., were included in the MRS calculations in Figure 2. A nominal increase in MRS values may be observed with increased severity at baseline (Figure S4, Extended data), with no statistically significant difference between AD and controls (p = 0.88, t-test), perhaps suggesting specificity of the risk score for susceptibility to conversion. Box plots were created using the beta values for each of the three predictive sites across disease severity, stratified by sex (Figures S5 to S7, Extended data). The direction of effect for DNAm with respect to conversion in the MCI subjects is consistent across males and females.
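An odds ratio for scoring above versus below the mean can be computed from a 2x2 table, as in the sketch below. The text does not state how the reported odds ratio was derived, so the cross-product calculation, the handling of scores equal to the mean (grouped with "below"), and the toy scores and outcomes are all assumptions for illustration.

```python
def odds_ratio_above_vs_below_mean(scores, converted):
    """Odds ratio of conversion for scores above the mean versus at or below it.

    scores: risk scores; converted: matching 0/1 outcomes.
    Uses the cross-product of the 2x2 table: (a * d) / (b * c).
    """
    mean_score = sum(scores) / len(scores)
    a = b = c = d = 0
    for s, y in zip(scores, converted):
        if s > mean_score:
            if y:
                a += 1  # above mean, converted
            else:
                b += 1  # above mean, stable
        elif y:
            c += 1      # at or below mean, converted
        else:
            d += 1      # at or below mean, stable
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when a table cell is zero")
    return (a * d) / (b * c)


# Hypothetical scores and outcomes for eight subjects.
toy_scores = [0.8, 0.7, 0.6, 0.9, 0.2, 0.3, 0.1, 0.4]
toy_converted = [1, 1, 0, 1, 0, 0, 1, 0]
print(odds_ratio_above_vs_below_mean(toy_scores, toy_converted))
```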
After processing, peripheral blood DNAm data were available for 601,732 CpG sites and 726 samples, including 161 AD subjects, 94 MCI subjects and 471 controls in the Nabais et al. (GSE153712) dataset (Table S2, Extended data). Approximately 55% of the subjects were female. To examine concordance of the score at baseline in the primary data with the cross-sectional secondary data, the MRS was calculated using the final conversion risk score model and the DNAm beta values in this secondary dataset. In the box plots across disease severity (Figure S8, Extended data), the MRS values increase with severity, consistent with the discovery dataset. The MRS values are higher overall for the Nabais et al. data compared with Roubroeks et al. The secondary DNAm data were created using the Illumina HumanMethylationEPIC platform, in contrast to the use of the Illumina 450K platform by Roubroeks et al.9
For the risk score created using only age and sex as predictors in the Roubroeks et al. dataset (Figure S9, Extended data), a significant difference was not observed between MCI stable and MCI converter (p = 0.2, t-test) and the AUC was 0.579 with the training data (Figure S10, Extended data).
In this study, a methylation risk score to quantify susceptibility to conversion from MCI to AD was examined. Highly variable CpG sites were selected for development of the score, seeking information to inform identification of robust biomarkers. In addition, the effects of genetics were suppressed by excluding sites previously associated with an mQTL or located near a common SNP. Using a modest set of 107 subjects with MCI, the model demonstrated predictive capabilities. The limited set of selected features, an AUC in the training data of 0.843, a mean AUC during cross-validation of 0.752, and an out-of-fold AUC of 0.653 suggest both higher variance and bias. An MRS may find better utility as a component in a comprehensive conversion susceptibility prediction strategy.
Two of the three predictive CpG sites (cg17292662, cg25342005) were identified in the previous study by Roubroeks and colleagues.9 These two sites were their top two findings in the EWAS of MCI conversion, though neither site was statistically significant after multiple testing correction. The CpG site cg17292662 is within 6,000 bases of the transcription start sites (TSS) for two genes: ATP6V1H and RGS20. A prior GWAS identified a variant in ATP6V1H (ATPase H+ transporting V1 subunit H) influencing human cerebrospinal fluid (CSF) β-site APP cleaving enzyme (BACE) activity.25 Previous studies have suggested that elevated beta-site amyloid precursor protein-cleaving enzyme 1 (BACE1) levels in CSF may be an indicator of MCI and early-stage AD (Zong Arch Gen Psychiatry), with BACE1 a possible therapeutic target.26 The gene RGS20 (regulator of G protein signaling 20) has biased expression in the brain and has been found to be downregulated in AD astrocytes.27 The CpG site cg25342005 is within 1,500 bases of the TSS of the gene TRIM62 (tripartite motif containing 62), a gene expressed in the brain. The CpG site cg09892121 is located within the gene SLC6A3 (solute carrier family 6 member 3) and was among the top 500 findings in the study by Roubroeks et al.9 The gene SLC6A3 encodes a dopamine transporter, and a genetic variant in the gene was previously identified that may confer greater risk of dementia and cognitive decline.28
Limitations of this study include the small population of MCI subjects with longitudinal outcomes for development of an MRS. Identifying a signature predictive of cognitive decline in peripheral blood presents many challenges with respect to signal and noise and a larger study population with longitudinal data would enhance future MRS creation efforts. In a larger population, the ability to effectively train an MRS model with 80% of the data, while holding out 20% of the data for validation, would provide an effective internal validation. Future efforts may also explore other machine learning frameworks, including deep learning, random forests and support vector machines. The secondary population for MRS evaluation lacked longitudinal outcome data, limiting replication of the findings with respect to conversion. The disparity between MRS values for the baseline training and secondary datasets may be due to the difference in assay platforms, as low correlations have been previously observed between Illumina 450K and EPIC DNA methylation data in blood for many CpG sites.29
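The 80/20 internal validation mentioned above could be set up with a stratified split, sketched here under assumptions. The function name, the fixed seed and the per-class rounding are hypothetical choices. The class sizes (68 stable, 39 converters) are taken from the text only to show how small a holdout set would be at the current sample size.

```python
import random


def stratified_split(labels, holdout_fraction=0.2, seed=0):
    """Split sample indices into training and holdout sets within each class.

    Splitting per class keeps the proportion of converters similar in both sets.
    """
    rng = random.Random(seed)
    train, holdout = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_hold = round(len(idx) * holdout_fraction)
        holdout.extend(idx[:n_hold])
        train.extend(idx[n_hold:])
    return sorted(train), sorted(holdout)


# 0 = MCI stable, 1 = MCI converter; counts as reported for the primary dataset.
labels = [0] * 68 + [1] * 39
train_idx, holdout_idx = stratified_split(labels)
n_holdout_converters = sum(labels[i] for i in holdout_idx)
print(len(train_idx), len(holdout_idx), n_holdout_converters)
```

With 107 MCI subjects, a 20% holdout would contain only a handful of converters, which is consistent with the point that a larger population is needed for this design to be informative.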
The ability to identify MCI patients at greater risk of progression could inform early interventions and is a critical component in mitigation strategies for AD. This study is the first to examine a blood-based methylation risk score of conversion from mild cognitive impairment to Alzheimer’s disease. Although the predictive ability of the score is limited, this study demonstrates the potential value epigenetics would add to a risk score based on multi-omic and phenotypic data collected from the same patients.
JDM: conceptualization, methodology, formal analysis, interpretation of data, manuscript preparation and approval of the final version
Gene Expression Omnibus: An epigenome-wide association study of Alzheimer’s disease blood highlights robust DNA hypermethylation in the HOXB6 gene, https://identifiers.org/geo:GSE144858. 9
Gene Expression Omnibus: Meta-analysis of genome-wide DNA methylation identifies shared associations across neurodegenerative disorders, https://identifiers.org/geo:GSE153712. 21
Zenodo: Methylation risk score in peripheral blood predictive of conversion from mild cognitive impairment to Alzheimer’s Disease, https://doi.org/10.5281/zenodo.10802595. 30
This project contains the following extended data:
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
References
1. Vasanthakumar A, Davis JW, Idler K, Waring JF, et al.: Harnessing peripheral DNA methylation differences in the Alzheimer's Disease Neuroimaging Initiative (ADNI) to reveal novel biomarkers of disease. Clin Epigenetics. 2020; 12 (1): 84
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: statistical modeling, epigenomics analysis
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Epigenetics, differential methylation
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
No
Are sufficient details of methods and analysis provided to allow replication by others?
No
If applicable, is the statistical analysis and its interpretation appropriate?
No
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
No
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Data science applied to biological data, in particular, omics data.
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Epigenetics, differential methylation.
Version 2 (revision): 15 Mar 24
Version 1: 01 Sep 23