Summary:
The purpose of this paper is to leverage publicly available data to investigate the association between chromatin state and Huntington’s Disease (HD). The authors do this by identifying genes that are differentially expressed in individuals with HD relative to healthy individuals, and then identifying the genomic locations of these genes and the biological processes associated with them. They find that the promoters of many of these genes are in CpG islands and are in the active chromatin state in healthy individuals. They also find that many of these genes are involved in biological processes related to HD and that some are involved in chromatin modification. Although this study suggests that there may be an association between chromatin state and HD, the nature of that association remains unclear.
Major Comments:
- I appreciate how the authors integrated existing literature with differential gene expression results to prioritize biological processes, diseases, genes, and molecular functions. In addition, defining the similarity between concepts based on the number of shared concepts is similar to approaches that have been used for community detection in social networks (Blondel et al., Journal of Statistical Mechanics: Theory and Experiment, 2008) and, more recently, for clustering cells based on protein expression (Levine et al., Cell, 2015) (I do not think that the authors need to cite these papers), so I am not surprised that it worked well. I hope that the authors’ use of this approach will inspire others to use such methods for comparing biological concepts in the literature and encourage future researchers to directly integrate literature with differential gene expression.
- I found many of the results difficult to interpret because the authors seem to have done all of the analyses on the set of all differentially expressed genes. My expectations for up-regulated genes are different from those for down-regulated genes. In the Minor Comments, I point out specific analyses for which I think that separating the genes based on the direction of the differential expression would be helpful. If the authors did use only down-regulated or only up-regulated genes, it would be great if they could make this clear in the Methods section and include what fold-change cutoff they used.
- I thought that some of the claims in the Discussion section were not well supported by the results; I point these out in the Minor Comments. Most of these concerns stem from the lack of separation between up-regulated and down-regulated differentially expressed genes in the analyses in this paper.
- Although no chromatin state data are available from any brain region of individuals with HD, there are H3K27ac and Pol II datasets from the striatum of HD and control mice (Achour et al., Human Molecular Genetics, 2015). This paper would be more convincing if it included a comparison between the genes that are differentially expressed in HD mice versus control mice and the differential H3K27ac regions from this dataset.
- I found much of the Methods section difficult to follow. In the Minor Comments, I point out specific parts that I think should be re-ordered and specific details that I think should be added to make the Methods clearer. The authors should also include the exact version and settings that they used for every publicly available software package so that others can reproduce the results.
Minor Comments:
Introduction:
Page 3: Although the authors clearly describe literature suggesting that epigenetic mechanisms may be involved in HD, there is also some evidence against the role of epigenetics in HD. For example, a recent study profiled methylation in the cortex of HD individuals and controls using the Illumina HumanMethylation450K BeadChip array and found no significantly differentially methylated regions between HD individuals and controls (De Souza et al., Human Molecular Genetics, 2016). The authors should cite this paper and explain why its findings do not rule out a role for epigenetics in HD (for example, the assay used was not genome-wide, and methylation is not the only component of transcriptional regulation).
Page 3: It is not clear to me why transcriptional dysregulation in HD would be associated with differentially expressed genes in regions that are not normally associated with active chromatin states. My understanding from the literature cited in the introduction is that many of the genes that are differentially expressed in HD individuals have lower expression in HD individuals than they do in control individuals. I would therefore expect these genes to fall in regions that are normally associated with active chromatin states but may not be associated with active chromatin states in individuals with HD. It would be great if the authors could clarify the motivation behind this hypothesis.
Page 3: At the end of the introduction, the computational method is introduced as an approach to intelligently select which experiments to do. However, I was not sure from the introduction what types of experiments this method is designed to guide. It would be great if the authors could add a more detailed explanation of this earlier in the manuscript.
Methods:
Page 3: It would be easier to understand the advantages of the vector space model if they were listed after the description of the vector space model instead of before it.
Page 3: It would be helpful if the authors could describe how the subset of PubMed records is selected for each gene or cite a previous paper that uses the same method.
Page 3: It would be helpful if the authors could define “symmetric uncertainty coefficient.”
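For reference, the symmetric uncertainty coefficient is commonly defined in terms of the entropies H of two variables X and Y and their mutual information I. I am assuming this standard form is the one the authors use; stating it explicitly in the Methods would remove any ambiguity:

```latex
U(X, Y) = 2\,\frac{H(X) + H(Y) - H(X, Y)}{H(X) + H(Y)} = \frac{2\, I(X; Y)}{H(X) + H(Y)}
```

U ranges from 0 (X and Y are independent) to 1 (each variable completely determines the other).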
Page 4: It would be helpful if the authors could list exactly what publicly available datasets were used for each differential expression test before describing the differential expression test.
Page 4: It would be helpful if the authors could state what microarrays were used to generate the gene expression data before describing how the differential expression analysis was done.
Page 4: It seems like the authors did not account for potential confounding factors that were available, such as sex, age, and brain tissue, in the differential expression analysis. I am concerned that these confounding factors may affect the results.
Page 4: The authors state which human and mouse assemblies they used, but they had not previously stated that their analysis included data from mouse. It would be helpful if the authors could state exactly what species they are using for each part of their analysis earlier in the manuscript.
Page 4: The authors state that they used a Kolmogorov-Smirnov test to compare p-value distributions. It was not clear to me where these p-values come from. Are they the p-values for differential expression of the genes corresponding to the promoters? It would be helpful if the authors could clarify this.
Page 4: It was not clear how many Kolmogorov-Smirnov tests were done. The authors said that they rejected the null hypothesis if the p-value was < 0.05. If they did more than one test, then they should do multiple hypothesis correction.
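If more than one KS test was run (for example, one per chromatin state and cell type), a standard correction such as Benjamini-Hochberg could be applied before thresholding at 0.05. Below is a minimal pure-Python sketch, assuming the raw KS p-values are collected in a list. The function name and the p-values are hypothetical and only illustrate the procedure. statsmodels' multipletests with method='fdr_bh' implements the same adjustment.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# One raw p-value per KS test (illustrative values, not from the paper).
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([q < 0.05 for q in adjusted])  # [True, False, False, False]
```

In this toy case, three raw p-values fall below 0.05, but only one survives correction.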
Page 4: It would be helpful if the authors could clarify the purpose of the concept ids.
Page 4: It would be helpful if the authors could provide a more detailed explanation of how the concept linking is done.
Page 5: It would be helpful if the authors could define the HTT concept and explain why they used it for prioritizing genes.
Page 5: It would be helpful if the authors could explain why they decided to prioritize genes based on the “epigene” concept. It seems like the authors are interested in genes that affect epigenetics, such as demethylases or histone modifiers. It is not clear to me how this selection relates to the hypothesis that was described in the introduction.
Page 5: It would be helpful if the authors could clarify exactly what differential expression tests were done with the human brain data and what the categories were for each test.
Page 5: It would be helpful if the authors could clarify whether the human brain data described here was the only human data used for differential expression analysis and, if it was not, what other data was used.
Page 5: It would be helpful if the authors could briefly describe how the CpG island data fits into the rest of the analysis.
Page 5: It would be helpful if the authors could explain in the Methods section why they selected the two cell types and four chromatin states that they used.
Page 5: I think that it might make sense to incorporate additional chromatin states, such as quiescent, weak repressed Polycomb, and enhancer, as strong repression is not always the cause of a promoter’s inactivity.
Page 5: It would be helpful if the authors could clarify why they used only the mouse data from animals treated with the vehicle. My intuition is that it would make more sense to use the animals that did not receive the HDAC inhibitor 4b, since the human subjects did not receive any kind of treatment. It is possible that I misunderstood the purpose of the mouse analysis.
Results:
Page 6: It is not clear to me why a difference in the distribution of expression levels between genes overlapping a chromatin state and genes not overlapping that chromatin state implies that chromatin state has an effect on HD. I think that the authors mean that, if the difference in gene expression between individuals with and without HD is larger for genes overlapping a specific chromatin state than for genes overlapping other chromatin states, then there is an association between that chromatin state and HD.
Page 6: It would be helpful to split Figure 1 into two parts, one for genes that have higher expression in people with HD and another for genes that have lower expression in people with HD. My intuition is that most of the differences in the p-value distributions come from the second category: since the chromatin state data come from people without HD, I would expect genes in an active chromatin state to have higher expression in healthy individuals. In addition, genes in closed chromatin cannot decrease in expression because they are not expressed to begin with, whereas genes in open chromatin could either increase or decrease in expression, potentially leading to more variability.
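A minimal sketch of the split analysis I have in mind is below. It assumes that each gene has a log2 fold change, a differential expression p-value, and a flag for whether its promoter overlaps the chromatin state of interest. The input format and function names are hypothetical. The sketch computes the two-sample Kolmogorov-Smirnov statistic separately for up-regulated and down-regulated genes; in practice, scipy.stats.ks_2samp would also give the corresponding p-values.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The maximum gap between the empirical CDFs occurs at an observed value.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


def ks_by_direction(genes, in_state):
    """Compare p-values of in-state vs. out-of-state genes, separately by direction.

    genes: iterable of (gene_id, log2_fold_change, p_value) tuples (hypothetical format).
    in_state: set of gene_ids whose promoters overlap the chromatin state of interest.
    Returns {"up": D, "down": D} for each direction where both groups are non-empty.
    """
    genes = list(genes)
    selectors = {"up": lambda lfc: lfc > 0, "down": lambda lfc: lfc < 0}
    results = {}
    for direction, keep in selectors.items():
        inside = [p for g, lfc, p in genes if keep(lfc) and g in in_state]
        outside = [p for g, lfc, p in genes if keep(lfc) and g not in in_state]
        if inside and outside:
            results[direction] = ks_statistic(inside, outside)
    return results
```

Reporting the two statistics (or two panels of Figure 1) side by side would show whether the overall difference is driven mainly by down-regulated genes.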
Page 6: It would be helpful to have a supplemental figure with all chromatin states because it is not clear from Figure 1 whether the differences occur for TSSs in all active chromatin states (including inactive genes that are acting as enhancers for other genes) or only for genes that are transcribed.
Page 6: It would be helpful if the authors could clarify if the overlaps in Figure 1 are done using the entire gene, only the TSS, or the gene’s promoter.
Page 7: For the biological process analyses, I think that using a tool for differential enrichment between the two groups of genes would provide more interpretable results than comparing the top hits from CPA, because such a tool looks for terms that are significantly enriched in one gene set relative to another. An example of such a tool is CompGO (Waardenberg et al., BMC Bioinformatics, 2015).
Page 8: It would be helpful if the authors clarified what they mean by “top novel protein.” Does novel mean that the gene had not been associated with HD in a previous paper?
Page 8: It was not clear how Figure 3 shows that CPA is able to prioritize true associations with HD as measured by a gene expression experiment, or why combining differential expression measurements and literature evidence enables the selection of even more specific HD signatures. It would be great if the authors could clarify this.
Page 8: It would be helpful if the authors could include the direction of the CPA score shifts for the different groups of differentially expressed genes.
Page 8: The authors say that “the top 100 and top 1000 differ significantly.” It would be helpful if they stated the way in which these gene sets differ.
Page 11: It would be helpful if the authors could clarify what x is in Figure 3.
Discussion:
Page 11: I am not sure that the paper provides evidence against a genome-wide re-localization of gene activity to repressed chromatin states. The paper combined all of the up-regulated and down-regulated genes instead of separating them. If the paper had shown that the genes that are up-regulated in people with HD are not found in repressive chromatin states in healthy individuals, then I would be more convinced that there is no such re-localization. Even then, I would not be fully convinced, because changes in chromatin state do not always cause changes in gene expression. For example, a previous study showed that most single nucleotide polymorphisms associated with histone modifications are not associated with transcription, suggesting that histone modification differences between individuals do not always correspond to gene expression differences (Grubert et al., Cell, 2015). Thus, there may be chromatin state differences between HD individuals and controls in parts of the genome that contain no differentially expressed genes.
Page 11: The authors suggest that HD is not associated with large-scale disruption of chromatin states. To investigate the association of HD with chromatin state using existing data, the authors would need to determine whether genes that are up-regulated in people with HD tend to fall in repressive chromatin states and whether those that are down-regulated in people with HD tend to fall in active chromatin states. Because there do not seem to be separate evaluations of up-regulated and down-regulated genes, I do not think that the results in this paper can be used to evaluate the relationship between chromatin state disruptions and HD.
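One way to run the comparison I describe is a pair of one-sided Fisher's exact tests on 2x2 tables of gene counts. For example, the first table would cross up-regulated versus down-regulated genes with whether the promoter is in a repressive chromatin state in healthy tissue, and the second would do the same for active states. A minimal pure-Python sketch follows. The function name and table layout are my own assumptions, and scipy.stats.fisher_exact with alternative='greater' should give the same one-sided p-value.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: e.g., up-regulated vs. down-regulated genes.
    Columns: e.g., promoter in a repressive state vs. not (in healthy tissue).
    Returns P(X >= a), where X is hypergeometric with the table's margins fixed.
    """
    row1 = a + b          # genes in the first row (e.g., up-regulated)
    col1 = a + c          # genes in the first column (e.g., repressive state)
    n = a + b + c + d     # all genes in the table
    total = comb(n, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / total
```

As a sanity check, fisher_greater(3, 0, 0, 3) returns 0.05 (that is, 1/20), the smallest one-sided p-value attainable for a 2x2 table with all margins equal to 3.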
Page 12: I think that CPA’s high ranking of chromatin-related concepts for differentially expressed genes suggests an association between chromatin reorganization and HD. If the differentially expressed genes near CpG islands include genes involved in chromatin structure, that suggests a cis-regulatory change in the regulation of those genes, which could have a downstream effect on chromatin organization.
Page 12: Although the paper shows that more of the differentially expressed genes are in the active chromatin state in healthy individuals, I am not sure that there is sufficient evidence to conclude that the most important changes in HD occur in the active chromatin state. For example, if the majority of differentially expressed genes are down-regulated in individuals with HD, then the findings in this paper would match my expectations, even if the most important differentially expressed genes are up-regulated and are not in the active chromatin state in healthy individuals.
Supplemental Datasets
Dataset 1: Some of the line breaks seem to be missing.
Dataset 2: The column breaks seem to be missing.
Dataset 8: The column breaks seem to be missing for the top 100 differentially expressed genes.
Supplementary File 1: The first word in the first figure caption seems like it should be “Illustration.”
Supplementary File 1: It would be great if the authors could clarify what they mean by “x2.”
Supplementary File 1: It would be great if the authors could explain why they are using the HMEC and NHEK cell lines.
Is the work clearly and accurately presented and does it cite the current literature?
The authors seem to clearly describe what they do and cite most of the relevant literature. However, as I mentioned in the fifth major comment, I found parts of the Methods section difficult to follow.
Is the study design appropriate and is the work technically sound?
As I mentioned in the second major comment, I do not think that the authors can test their hypothesis with their study design because they combine the up-regulated and down-regulated genes.
Are sufficient details of methods and analysis provided to allow replication by others?
The authors provide publicly available workflows for almost everything they did. However, as I mentioned in my fifth major comment, the lack of clarity in parts of the Methods section might make reproducing some of the results difficult.
If applicable, is the statistical analysis and its interpretation appropriate?
Most of the statistical analysis seems appropriate, but most of the interpretation does not make sense because the up-regulated and down-regulated genes were combined.
Are all the source data underlying the results available to ensure full reproducibility?
Yes.
Are the conclusions drawn adequately supported by the results?
As I mentioned in my third major comment, I think that many of the conclusions are not supported by the results.
No competing interests were disclosed.