Investigation of gut microbiome association with inflammatory bowel disease and depression: a machine learning approach

Background: Inflammatory bowel disease (IBD) is a group of chronic diseases related to inflammatory processes in the digestive tract, generally associated with an immune response to an altered gut microbiome in genetically predisposed subjects. For years, both researchers and clinicians have been reporting increased rates of anxiety and depression disorders in IBD, and these disorders have also been linked to an altered microbiome. However, the underlying pathophysiological mechanisms of comorbidity are poorly understood at the gut microbiome level.
Methods: Metagenomic and metatranscriptomic data were retrieved from the Inflammatory Bowel Disease Multi-Omics Database. Samples from 70 individuals who had answered a self-reported depression and anxiety questionnaire were selected and classified by their IBD diagnosis and their questionnaire results, creating six different groups. The cross-validation random forest algorithm was used in 90% of the individuals (training set) to retain the most important species involved in discriminating the samples without losing predictive power. The validation set that represented the remaining 10% of the samples equally distributed across the six groups was used to train a random forest using only the species selected in order to evaluate their predictive power.
Results: A total of 24 species were identified as the most informative in discriminating the 6 groups. Several of these species were frequently described in dysbiosis cases, such as species from the genus Bacteroides and Faecalibacterium prausnitzii. Despite the different compositions among the groups, no common patterns were found between samples classified as depressed. However, distinct taxonomic profiles within patients of IBD depending on their depression status were detected.
Conclusions: The machine learning approach is a promising approach for investigating the role of microbiome in IBD and depression. Abundance and functional changes in these species suggest that depression should be considered as a factor in future research on IBD.


Introduction
Increased depression rates have been frequently reported in patients with inflammatory bowel disease (IBD) (Graff et al., 2009), which is a serious concern from a clinical standpoint, since increased levels of stress and anxiety are major drivers of IBD relapse and severity (Mawdsley & Rampton, 2006). Both IBD and depression are heavily influenced by the gut microbiome structure, which controls anti-inflammatory processes and permeability in the gut, and communicates with the brain through a complex and close relationship with the autonomic nervous system that is known as the brain-gut axis.
The availability of the large amount of data derived from the recent explosion in metagenomics and metatranscriptomics provides unique opportunities for investigation. However, it is sometimes difficult to identify informative species. Recently, machine learning algorithms have been successfully applied because they allow the identification of patterns in situations where large, multi-dimensional and heterogeneous datasets are available. Among the several machine learning approaches available, random forest is an algorithm used for classification and regression based on an ensemble that builds a population of decision tree classifiers, such that the result of a prediction from a given set of features is the most frequent result from the different trees of the "forest" (Breiman, 2001). This is an efficient and generalist algorithm that has already been applied in several metagenomic investigations in human diseases, such as IBS (Saulnier et al., 2011).
The aim of this work was to apply the random forest approach to identify the microbiome species that may be most involved in IBD and depression outcomes and that are responsible for the most relevant changes in the population structure between IBD, depression and patients comorbid for both conditions, and to provide insights into how the microbiome is involved in this comorbidity.

Amendments from Version 1
The main difference between the previous version and version 2 is that, as per Reviewer 1's comments, we have added a more thorough description of the dataset and a workflow illustrating the k-Fold Cross Validation approach for Random Forest (new Figure 1). As per the comments of Reviewer 2, we have rerun the species identification considering all IBD samples as a group and then classified by a depressed or not depressed state. We have furthermore added a small description of the set of species and expanded the introduction with some of the citations suggested. However, we cannot add covariation analysis, as the metadata is quite incomplete and heterogeneous and likely to give misleading results; hence, this was not performed.

Subject selection
From this dataset, the 70 unique participants who answered an additional self-reported depression and anxiety questionnaire during registration (the answers to which are listed in the HMP2 metadata, columns EC to EL) were selected. As the questionnaire model was not specified, only individuals with raw scores over 6 on this test were considered as showing "signs of depression". To calculate the raw scores, a severity scale was generated, with the following scores: 0, never; 1, rarely; 2, sometimes; 3, often; 4, always. The scores were then summed to give a final total. In the case of individuals undergoing multiple tests, the lower score was used. We selected a low threshold in order to be able to identify putative dysbiotic individuals that were not experiencing severe depression symptoms. All the others were classified as "no sign of depression". The combination of the test and the IBD diagnosis divided the dataset into six groups: Crohn's disease with no detectable sign of depression (CD; n=15), Crohn's disease with signs of depression (CDD; n=20), ulcerative colitis with no sign of depression (UC; n=4), ulcerative colitis with signs of depression (UCD; n=11), signs of depression but no inflammation (nonIBDD; n=7) and the control group: no inflammation/no depression (nonIBD; n=13). As the experimental design of the IBDMDB consisted of a longitudinal study, each subject contributed several times to this study, and all the samples used for this analysis were sequenced by shotgun sequencing as described in Schirmer et al. The resulting metagenomic and metatranscriptomic datasets consist of 1084 and 566 samples, respectively. The final tables after pre-processing consist of 1486 columns, including Participant ID, data type, diagnostic, sex, mental score, and nested columns on the relative values of the different taxa.
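The scoring and grouping rule described under Subject selection can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the answer labels, the function names and the example participants are assumptions, and the real HMP2 metadata columns may encode the answers differently.

```python
# Illustrative sketch only: answer labels, function names and the example
# participants are assumptions, not the HMP2 metadata schema.
SEVERITY = {"never": 0, "rarely": 1, "sometimes": 2, "often": 3, "always": 4}
THRESHOLD = 6  # raw scores strictly over 6 count as "signs of depression"


def raw_score(answers):
    """Sum the severity values of one completed questionnaire."""
    return sum(SEVERITY[a.lower()] for a in answers)


def participant_score(questionnaires):
    """When a participant completed several questionnaires, keep the lowest score."""
    return min(raw_score(q) for q in questionnaires)


def assign_group(diagnosis, questionnaires):
    """Combine the diagnosis (CD, UC or nonIBD) with the depression status."""
    if diagnosis not in {"CD", "UC", "nonIBD"}:
        raise ValueError(f"unknown diagnosis: {diagnosis}")
    depressed = participant_score(questionnaires) > THRESHOLD
    return diagnosis + "D" if depressed else diagnosis


# Hypothetical participant who completed the questionnaire twice.
example = [
    ["often", "sometimes", "rarely", "never", "sometimes"],      # 3+2+1+0+2 = 8
    ["sometimes", "sometimes", "rarely", "never", "sometimes"],  # 2+2+1+0+2 = 7
]
print(assign_group("CD", example))  # CDD, because the lowest score (7) is over 6
print(assign_group("nonIBD", [["never", "rarely", "rarely", "never", "sometimes"]]))  # nonIBD (score 4)
```

Taking the lowest score across repeated questionnaires, as the text specifies, makes the "signs of depression" label conservative for participants whose answers fluctuated.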

Data analysis
For each of the six groups, abundance matrices of the metagenomic data, metatranscriptomic data, and the combination of metagenomics and metatranscriptomics were used for random forest classification. Each of the datasets was divided randomly into a training set (90% of the individuals) and a validation set (10% of the individuals). Random forest analyses were performed using the library Scikit-learn 0.19.1 (Pedregosa et al., 2011) on the training sets to identify the most important species involved in discriminating the samples without losing predictive power. A 1000-fold cross-validation was performed for the combined dataset, and a 500-fold cross-validation for the metagenomic and metatranscriptomic data (see Figure 1), with one model built for each iteration, and only the most important species in the construction of each model were retained. Only models with a classification precision >80% were considered, and among these models, only species that appeared in more than one were selected. Afterwards, the validation sets were run with the selected species only, to measure the possible loss of predictive capability, and the area under the receiver operating characteristic curve (auROC) for the prediction of the validation set classes was computed as a performance metric.
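Two of the bookkeeping steps in this procedure, the split by individual and the selection of species across models, can be sketched without any machine learning library. The sketch below is an illustration under stated assumptions, not the authors' code: the function names, the rule of holding out at least one individual per group, and the example models are hypothetical, and fitting the random forests themselves (done with Scikit-learn in the study) is left out.

```python
import random
from collections import Counter


def split_individuals(groups, validation_fraction=0.1, seed=0):
    """Randomly hold out a fraction of the individuals of every group.

    `groups` maps a group label to its participant IDs. Splitting by
    participant keeps all longitudinal samples of one person on the same
    side of the split. Holding out at least one individual per group is
    an assumption of this sketch.
    """
    rng = random.Random(seed)
    train, validation = [], []
    for label in sorted(groups):
        ids = sorted(groups[label])
        rng.shuffle(ids)
        n_val = max(1, round(len(ids) * validation_fraction))
        validation.extend(ids[:n_val])
        train.extend(ids[n_val:])
    return train, validation


def select_species(models, min_precision=0.80, min_models=2):
    """Keep species that are top-ranked in several sufficiently precise models.

    `models` is a list of (precision, top_species) pairs, one pair per
    cross-validation iteration.
    """
    counts = Counter()
    for precision, top_species in models:
        if precision > min_precision:       # discard models at or below 80%
            counts.update(set(top_species))  # count each species once per model
    return sorted(s for s, n in counts.items() if n >= min_models)


# Hypothetical results of four cross-validation iterations.
models = [
    (0.91, ["Bacteroides ovatus", "Faecalibacterium prausnitzii"]),
    (0.85, ["Bacteroides ovatus", "Alistipes shahii"]),
    (0.72, ["Alistipes shahii", "Ruminococcus bromii"]),  # below the threshold
    (0.88, ["Faecalibacterium prausnitzii"]),
]
print(select_species(models))
# ['Bacteroides ovatus', 'Faecalibacterium prausnitzii']
```

Because the third model is discarded, Alistipes shahii is counted only once and is not selected, which mirrors the requirement that a species appear in more than one retained model.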

Statistical analysis
In order to assess the significance of the differences between the abundances of the selected species, we performed a one-way ANOVA (Scipy 1. The functional activity of the selected species was retrieved from the HUMAnN metatranscriptomic analyses described above. Only the pathways in which the selected species are involved and those that were different between the groups from the ANOVA test were selected and the correlation between these species was calculated using Spearman's correlation coefficient. A significance level of 0.05 was applied for all statistical tests.
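For readers unfamiliar with the two statistics named here, the following sketch computes the one-way ANOVA F statistic and Spearman's correlation coefficient from their definitions. It is illustrative only: the function names and the toy numbers are assumptions, p-values are not computed, and in practice a statistics library (for example `scipy.stats.f_oneway` and `scipy.stats.spearmanr`) would normally be used.

```python
def anova_f(groups):
    """One-way ANOVA F statistic; `groups` holds one list of values per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Ratio of the between-group to the within-group mean squares.
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy abundances of one species in three groups (made-up numbers).
print(anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
# Toy activities of two pathways across four samples (made-up numbers).
print(spearman([1, 2, 3, 4], [40, 30, 20, 10]))
```

The F statistic is then compared against the F distribution with (k - 1, N - k) degrees of freedom to obtain the p-value tested at the 0.05 level.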

Results and discussion
Species selection and model validation
The random forest cross-validation selection of the most informative species showed a combined list of 24 species, as can be seen in Figure Figure 5). This small loss of information suggests a relevant role of the selected species in the interaction of both conditions, while the capability of the model to classify the validation data with great accuracy shows that our model can generalize its results and is not overfitting.
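The auROC used as the validation metric can be read as a ranking probability, which the following sketch makes explicit for one class against all the others. It is a minimal illustration with hypothetical names and made-up scores; how the per-class curves were combined in the study is not specified here.

```python
def auc_one_vs_rest(scores, labels, positive):
    """Area under the ROC curve for one class against all the others.

    Equals the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one, counting ties as
    one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up predicted probabilities of class CDD for four validation samples.
scores = [0.9, 0.7, 0.4, 0.2]
labels = ["CDD", "CD", "CDD", "UC"]
print(auc_one_vs_rest(scores, labels, "CDD"))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the class from the rest.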
All species exhibited differences in at least one group in a one-way ANOVA (alpha=0.05, Supplementary Table 1), and no significant differences were found between DNA and RNA abundances for these species (Supplementary Table 2). This list of putative species is intended to be a trade-off between the all-relevant and minimal informative approaches. We chose this approach in order to get as broad a list as possible while avoiding artifacts related to the longitudinal nature of the dataset.
In order to assess the effect of the small sample size of group UC, the same procedure was repeated with all IBD samples grouped together. As expected, some differences in the selected species were observed. However, the species that showed the strongest differences in the previous classification were again the most discriminative, and most of the species overlapped. The interesting exception is Faecalibacterium prausnitzii, which was absent.

The non-dysbiotic microbiome
The analyses showed an increase in the number of species from the genus Bacteroides in dysbiotic groups compared with the control (nonIBD) (Figure 3), as has been reported in other dysbiotic samples (Bloom et al., 2011), with the exception of Bacteroides dorei, which is more abundant in non-IBD than in any other group. Aside from Bacteroides dorei, nonIBD samples had a higher abundance of Alistipes shahii and Ruminococcus bromii, while a typical species associated with nonIBD, Faecalibacterium prausnitzii, was significantly decreased in nonIBDD and CD.

Crohn's disease abundance changes in depression
Both of the Crohn's disease-related groups (CD and CDD) showed higher abundances of Bacteroides ovatus and Bacteroides uniformis. However, CD samples exhibited higher abundances for several specific species, including Bacteroides xylanisolvens, Parasutterella excrementihominis and Bacteroides fragilis, compared with CDD, but decreased abundance of Faecalibacterium prausnitzii, which did not differ significantly in abundance between nonIBD and CDD groups.

Ulcerative colitis changes in depression
Ulcerative colitis samples had the most distinctive microbiome profile. Several species, including Burkholderiales bacterium 1_1_47, Bacteroides eggerthii and Bacteroides finegoldii were characteristic of this group, and absent in the others, except for B. finegoldii, which was also present in a lower abundance in nonIBD samples. Only UCD samples exhibited an increased abundance of Bacteroides fragilis, Bacteroides vulgatus and Haemophilus pittmaniae, this last species being almost exclusive to the UCD group.

Non-IBD changes in depression
The nonIBDD was the group with the highest number of changes in microbiome diversity when compared with its non-depressed counterpart (Table 1). However, most of those changes followed a similar pattern in other dysbiotic groups.
A notable change was observed in Faecalibacterium prausnitzii, which was present in almost the same abundances in nonIBD, UCD and CDD samples, showed high variability in UC, and was significantly lower in CD and nonIBDD (Supplementary Table 3 and Supplementary Table 4). This is particularly interesting, since this species is considered to have anti-inflammatory activity. It seems counterintuitive to find a depleted population of one of the species most associated in the literature with a healthy microbiome compared to an IBD one in a group that does not show any inflammatory process. However, Parabacteroides goldsteinii was increased in non- Other than Parabacteroides goldsteinii, nonIBDD samples did not contain other characteristic groups, and, more notably, none of the selected species was specific for depressed or nondepressed phenotypes.

Microbial functional activity
Regarding the functional activity of these species, seven pathways that were more abundant in dysbiotic groups than in nonIBD were identified (Supplementary Figure 1) and were correlated with each other and inversely correlated with et al., 2010). However, even though nonIBDD had the highest activity for almost all of these pathways, CD and UC were also significantly increased, while functional activity in CDD was generally lower and non-significant in some pathways. Moreover, UCD did not differ from nonIBD in any of them.
This difference in functional activity again highlights the lack of a concrete pattern of gut microbiome abundance between depressed groups.

Supplementary Figure 1. Relative abundances of the pathways that showed significant differences between groups (alpha= 0.05).

Supplementary Figure 2. Correlation between the different pathways contributed by the selected species.
Color gradient shows positive (red) or negative (blue) correlation.

Supplementary Figure 3. Receiver operating characteristic curves for the validation model with combined metagenomic and metatranscriptomic data.

Supplementary Figure 4. Receiver operating characteristic curves for the validation model with metagenomic data.

Supplementary Figure 5. Receiver operating characteristic curves for the validation model with metatranscriptomic data.

Supplementary Table 1. ANOVA results for each of the selected species in metagenomic and metatranscriptomic data sets.

Supplementary Table 2. A t-test was used to assess the difference between DNA and RNA abundances per species and a nested column per group.

Conclusions
The random forest approach was able to successfully identify informative changes in abundance at the species level, revealing specific patterns for the depressed and non-depressed groups without losing predictive power. We believe that this approach, and machine learning in general, can be very useful in a field of research where high dimensionality is always an issue.
This work provided, to our knowledge for the first time, an overview of the differences in the bacterial communities of patients with signs of depression and of patients with both depression and inflammatory bowel disease. Our findings suggest a complex landscape of microbiome interactions, both at population structure and functional activity levels. However, the results showed that there are distinct taxonomic profiles within patients of IBD depending on their depression status, providing further input for future investigations.

Data availability
The datasets used for the analyses were retrieved from the Inflammatory Bowel Disease Multi-Omics Database (IBDMDB) (Schirmer et al., 2018).

Open Peer Review
Version 2

Department of Computer Science and Engineering (CSE) and Initiative for Biological Systems Engineering (IBSE), Robert Bosch Centre for Data Science and Artificial Intelligence (RBC-DSAI), Indian Institute of Technology (IIT) Madras, Chennai, Tamil Nadu, India
It is great that the authors have attempted to address major concerns from both reviewers, with additional analysis, figure and clarifications to text.
A couple follow-up points regarding the additions in version 2 remain to be addressed (as explained below); furthermore, some issues that are minor yet important for increasing the readability/impact of the manuscript, which have been already mentioned in previous reviewer reports, remain to be addressed. I believe all of these changes can be incorporated into a new version without any additional analyses, and instead only with clarifications to text/figures. Readers would find a new version incorporating the suggested changes below much more valuable, and I look forward to reading this new version and associated point-by-point response to all reviewer reports posted so far.
Regarding the new additions in version 2 compared to version 1, I've the following comments: It is good to know that the species with stronger differences continue to be the informative species when IBD samples are grouped together, but it would be valuable for readers to know the exact list of informative species with the grouped IBD samples. Hence, please provide a suppl. table of all species identified using this combined IBD samples, so that the readers can learn more about which species with stronger differences got replicated, find other interesting exceptions, etc.
The authors say that "This list of putative species pretends to be a trade-off between the all-relevant and minimal informative approaches.... ". Did the authors try out the all-relevant approach (i.e., classic t-test or ANOVA for each species)? If so, please provide the selected species from this all-relevant approach as a suppl. table. If not, please mention that other alternate approaches such as minimally informative or all-relevant approach to select species are also possible (and cite the Boruta paper for more information on these alternate approaches), but not tried out.
Typos have crept into some of the newly added text. Please fix these: "chose this approach >>ir<< order to get as broad of a list as possible" "only species that appeared >>more in more than one<<", etc.
Regarding concerns already raised in version 1 reviewer reports (of both reviewer 1 and 2), I would like to give a few example issues that were not addressed: The suppl figs 1-5 captions seems mixed up, as already reported by reviewer 1 in his report, and it doesn't appear to be fixed in version 2 (similarly the very small font size in this figure making it very hard to read has also not been fixed). While this is a minor issue that can be easily fixed, leaving it unfixed can negatively impact the readability of the article.
The 1000-fold cross-validation still needs some explanation, again as raised by reviewer 1 in his response. Figure 1 helps a lot to understand the data splits, and with additional text on longitudinal sampling of the same individual, it is easier to understand that 70 individuals actually give rise to ~1400 data points. What is not clear is: How are these ~1400 data points split into 1000 folds? Please clarify in text.
Please also clarify in text any issues/caveats associated with doing cross-validation on samples that are not independently distributed but are instead correlated due to several data points coming from the same individual.
There are several other, "simple-to-address" issues (i.e., issues that require no additional analyses, only clarifications to text/figures, to address), raised by reviewer 1 in his report. While a single paragraph summarizing all key changes to version 2 compared to version 1 is valuable, a point-by-point response to all reviewer reports submitted so far specifying which issues were addressed and which issues were beyond the scope of this work to address would make it easier for the reviewers to understand the reasoning of the authors in deciding which issues they decided to address when they prepared version 2. A quantitative data-driven analysis of the brain-gut-microbiome axis is an important topic to understand, and this paper makes a significant contribution in this area, and it would be valuable to readers once the above comments are addressed.

reviews. Then I wrote this report to express my opinions in the context of the previous reviewer's opinion.

Summary:
The authors attempt to unravel the complex interaction between IBD and depression that is potentially mediated by the gut microbiome. More specifically, they identify a subset of (24) microbial species that could discriminate between various single-disease and co-morbid cases of IBD and depression, and discuss the potential roles of these species and associated functional pathways in co-morbidity in the context of prior literature. Their methodology involves applying a combination of a machine learning approach (building a random forest to predict disease labels from species abundances and feature selection of informative species in the final model) and performing statistical tests (ANOVA-based test to identify which of the 24 selected/informative species differ between pairwise groups of interest such as the IBD with vs. without depression groups).

Strengths:
The authors use a systematic data-driven approach to address an important question of which microbial species and which functional pathways are involved in the "microbiome-brain-gut axis" (or specifically which species/pathways discriminate between different groups of individuals such as IBD disease with vs. without depression). The idea of using a random forest to select informative species with predictive power and then doing all subsequent analysis with the selected species is an interesting strategy to deal with the heterogeneity in the data and small sample size per strata/group (though I've some concerns with this approach/idea as mentioned below, which may be addressed with some additional analysis). The nice summary of key results in Table 1 (of species whose abundances have changed between IBD with vs. without depression) and the related discussion of identified species in the context of prior literature on single/comorbid cases of IBD/depression would be of immediate interest to researchers in this field.

Concerns:
I agree with the previous reviewer's concerns/feedback on improving presentation (such as by adding key details on data dimensions and cross-validation folds to the text) and improving interpretation (such as by comparing the 24 selected species to whatever species would be detected from applying a classic t-test or ANOVA test to all species rather than just the 24 species selected in the random forest analysis).
I now provide my feedback/comments not already covered by the previous reviewer below.
Statistical concerns due to small sample size and potentially confounding covariates: Though the overall population is a decent sample size of 70 individuals, the per-strata or per-group sample sizes are low to moderate, with some groups like UC having only 4 patients!! The authors are aware of this issue and use a random forest feature selection to tackle the heterogeneity in sparse human population data. But I am not fully convinced that a machine learning model can recover from insufficient sample sizes as low as n=4. One way to address this issue could be to merge together UC with CD and then label them as a single IBD group and then study this group with/without signs of depression. If the conclusions are similar before/after this merging of UC with CD, then the authors may mention that this additional test yielded similar results and keep the current results in the paper as is.
Another issue with heterogeneous and sparse human data is that covariates such as age, gender, BMI, genotype, etc. are more likely to confound the association of gut microbiota with disease status. Are these covariates available from the original cohort (I would assume so since the authors say that even host genomes are available)? Importantly, if available, are these covariates matched between the different groups being compared here? If not matched, is the data adjusted for these covariates and how do the results change before/after this adjustment? Providing such information would be critical to readers to properly interpret the detected microbiota associations with IBD/depression.

The all-relevant vs. minimal-informative set of species: Please provide clarification in the text on whether the 24 species are a minimal, non-redundant set of species that has predictive power to classify the disease labels, or whether the set includes all the relevant species that are associated with disease labels (or whether it is somewhere in-between in this spectrum). In other words, is any other species other than these 24 species associated with disease status? While a minimal set is sufficient to build a predictive model and simplifies further interpretation, the all-relevant set would be useful to understand the comprehensive role of all species and the overall mechanisms involved, as explained nicely in the Introduction of this paper on the Boruta feature selection package.
Based on the details provided on random forest based feature selection, the reported results may be closer to the all-relevant than a minimal-informative set, but the requirement of a species to be present in at least 2 models to be selected as an informative species is somewhat ad hoc (i.e., why not 3 or 4 models as a cutoff) and makes it unclear on whether all relevant species are selected. A more systematic way would be to assess the statistical significance of each species' association using a "wrapper method" around the random forest, such as the shuffling-based Boruta feature selection package (which is also used in the Saulnier et al. 2011 paper that the authors have cited). An alternative could be one of the methods in the paper "Statistical interpretation of machine learning-based feature importance scores for biomarker discovery".
bagging or feature bagging in the random forest? A more explicit explanation will make this comprehensible to more readers.
The supplementary figures and tables may have been mixed/corrupted. S. Figures 1 and 2 are described as corresponding to pathway analysis but are actually ROC curves. SF4 and 5 are supposed to be ROC curves but probably show pathway results. The resolution is too low to make out the text. The method of calculating the pathway abundances should also be described somewhere. Is it the total number of reads corresponding to genes in the pathway, does it depend on the species abundances or any other parameters?
Some of the ambiguity in the analysis may be removed by providing the code for any pre-processing, random forest analysis, feature selection, and pathway abundance analysis etc.

Interpretation and Conclusion
A machine learning algorithm can build an accurate prediction system, or generate hypotheses about the mechanisms at play or provide some other insight into the process. Here, I see two possible results of the ML analysis: The prediction accuracy can be a measure of the amount of information contained in the microbiome about the diseases. Alternatively, how predictive is the gut microbiome, and does this imply evidence for the causative effect of the microbiome on the disease? These would be comparatively harder claims to make, and would probably require a few more calculations.
The random forests are used to arrive at the most important features (bacterial species) affecting bowel disease and depression. I think this is the main claim/result of the analysis. In this case, how much more information does ML give us compared to simply finding the species whose abundance is most different between the disease and non-disease states (in terms of fold-change or p-value)? For the case of the multi-class problem, ANOVA can provide p-values for the non-random abundances in the different classes of patients. The article describes the results of such t-tests and ANOVA results. A sufficient and logical argument for the ML approach supported by any relevant calculations will strengthen the case for this analysis.
Overall, I feel the discussion of the possible role of some of the species and metabolic pathways in the etiology of the disease is the most interesting for biologists and clinicians. The article is important in this regard and further development of this discussion can only add to its strength.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly