Issues with data transformation in genome-wide association studies for phenotypic variability

Xia Shen; Lars Rönnegård

doi:10.12688/f1000research.2-200.v1


Correspondence


[version 1; peer review: 2 approved]

Xia Shen¹, Lars Rönnegård²,³

PUBLISHED 02 Oct 2013

Author details

¹ Division of Computational Genetics, Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, SE-750 07, Sweden
² Statistics, School of Technology and Business Studies, Dalarna University, Falun, SE-791 88, Sweden
³ Division of Quantitative Genetics, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, SE-750 07, Sweden


Abstract

The purpose of this correspondence is to discuss and clarify a few points about data transformation in genome-wide association studies, especially studies of phenotypic variability. Commenting on a recent publication by Sun et al. in the American Journal of Human Genetics, we emphasize the importance of statistical power in detecting functional loci and the real meaning of the phenotypic scale in practice.

Corresponding author: Xia Shen

Competing interests: The authors declare no competing interest.

Grant information: XS is funded by a Future Research Leaders grant from Swedish Foundation for Strategic Research (SSF) to Örjan Carlborg. LR is funded by the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (FORMAS).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2013 Shen X and Rönnegård L. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

How to cite: Shen X and Rönnegård L. Issues with data transformation in genome-wide association studies for phenotypic variability [version 1; peer review: 2 approved]. F1000Research 2013, 2:200 (https://doi.org/10.12688/f1000research.2-200.v1) First published: 02 Oct 2013, 2:200 (https://doi.org/10.12688/f1000research.2-200.v1) Latest published: 02 Oct 2013, 2:200 (https://doi.org/10.12688/f1000research.2-200.v1)

Correspondence

Recently, Sun et al.¹ made an interesting suggestion concerning the use of variance-stabilizing transformations in genome-wide association studies (GWAS) of phenotypic variability. Specifically, Sun et al. revisited Yang et al.’s² results on FTO, a locus controlling the variability of human body mass index (BMI), and argued that the underlying variability differences across genotypes might not be as large as Yang et al. had observed. Although Sun et al. raised an important point, particularly now that the quantitative study of phenotypic variability has become such an active topic, in our opinion there are several issues with the transformation approach they proposed.

First of all, if we apply Sun et al.’s transformation derived from Yang et al.’s phenotypic means and variances per FTO genotype class, i.e. a one-to-one map through an inverse hyperbolic sine function, the BMI scale becomes rather different from the ordinary measurement scale that we normally use (Figure 1). On the transformed scale, the difference between two persons with BMIs of 24 and 25 kg/m² is much larger than that between two persons with BMIs of 20 and 21 kg/m². This is strange in practice, since the original BMI scale is the one we commonly use and also the one we care about. Sun et al.’s main argument here is that nearly all measurement units are man-made. However, consider height, one of the traits of greatest interest: why should we regard the difference between 160cm and 170cm different from 170cm and 180cm? Although the definitions of most units may be arbitrary, some measurement scales do have a real-life meaning.
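To make the scale point concrete, the following Python sketch applies a generic inverse hyperbolic sine map, t(y) = asinh((y − center)/scale), to BMI values and compares the transformed distance between 20 and 21 kg/m² with that between 24 and 25 kg/m². The parameters `center` and `scale` and the function names are hypothetical; they are not the values Sun et al. fitted to Yang et al.’s FTO summary statistics, and with other parameters the size (and even the direction) of the distortion changes. The sketch only shows that equal steps on the original scale need not remain equal after such a map.

```python
import math


def asinh_transform(y: float, center: float, scale: float) -> float:
    """Hypothetical one-to-one map t(y) = asinh((y - center) / scale).

    `center` and `scale` are illustrative assumptions, NOT Sun et al.'s
    fitted parameters.
    """
    return math.asinh((y - center) / scale)


def transformed_gap(lo: float, hi: float, center: float, scale: float) -> float:
    """Distance between two BMI values after the transformation."""
    return asinh_transform(hi, center, scale) - asinh_transform(lo, center, scale)


# Hypothetical parameters, chosen only for illustration.
center, scale = 30.0, 2.0

# Two equal 1 kg/m^2 gaps on the original BMI scale ...
gap_20_21 = transformed_gap(20.0, 21.0, center, scale)
gap_24_25 = transformed_gap(24.0, 25.0, center, scale)

# ... are no longer equal on the transformed scale.
print(f"20->21: {gap_20_21:.3f}, 24->25: {gap_24_25:.3f}")
```

Because the map is monotone, the ranking of individuals is preserved; only the spacing between them changes, which is exactly what alters per-genotype variances.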

Figure 1. Comparison of the original scale of body mass index (BMI) and the transformed scale using Sun et al.’s¹ transformation.

The transformation was determined by the phenotypic distribution across FTO genotypes reported by Yang et al.².

Secondly, a key practical problem with Sun et al.’s transformation is that it is marker-specific. Namely, when performing a GWAS, one needs to transform the phenotypic records differently for each marker, according to the phenotypic distribution across that marker’s genotypes. This makes little sense in practical analyses, because if there is a "best" scale for the phenotype, it should be used for all markers across the genome before testing the association between the phenotype and the markers. Using the tested marker itself to determine the transformation of the phenotype is strange. If a marker-specific transformation can be estimated, one should instead estimate a single genome-wide transformation for the GWAS, rather than transforming the phenotype marker by marker.
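The contrast between a marker-specific and a genome-wide transformation can be sketched in Python. For brevity, the sketch swaps in the Box-Cox power family for Sun et al.’s inverse hyperbolic sine: for each marker it picks the exponent that minimizes the Brown–Forsythe (median-centred Levene) statistic across that marker’s genotype classes, whereas the genome-wide alternative picks one exponent from the phenotype alone (the Box-Cox maximum-likelihood estimate from `scipy.stats.boxcox`) and reuses it for every marker. The search grid, the simulated data and all function names are illustrative assumptions, not Sun et al.’s procedure.

```python
import numpy as np
from scipy import stats

# Hypothetical grid of Box-Cox exponents searched for each marker.
LAMBDAS = np.linspace(-2.0, 2.0, 81)


def box_cox(y, lam):
    """Box-Cox power transform of a strictly positive phenotype."""
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y ** lam - 1.0) / lam


def heterogeneity(y, genotype, lam):
    """Brown-Forsythe statistic for variance differences across the genotype
    classes of one marker, computed on the transformed phenotype."""
    z = box_cox(y, lam)
    groups = [z[genotype == g] for g in np.unique(genotype)]
    return stats.levene(*groups, center="median").statistic


def marker_specific_lambda(y, genotype):
    """Marker-specific choice: the exponent depends on the marker being tested."""
    stat_per_lam = [heterogeneity(y, genotype, lam) for lam in LAMBDAS]
    return float(LAMBDAS[int(np.argmin(stat_per_lam))])


def genome_wide_lambda(y):
    """Genome-wide choice: one exponent from the phenotype alone, reused for all markers."""
    _, lam = stats.boxcox(y)
    return float(lam)


# Simulated (hypothetical) data: a positive BMI-like phenotype and two markers,
# one with a mean effect and one with a variance effect on the log scale.
rng = np.random.default_rng(1)
n = 2000
marker_1 = rng.binomial(2, 0.3, size=n)
marker_2 = rng.binomial(2, 0.3, size=n)
log_y = 3.2 + 0.05 * marker_1 + rng.normal(0.0, 0.15 + 0.03 * marker_2, size=n)
y = np.exp(log_y)

lam_1 = marker_specific_lambda(y, marker_1)
lam_2 = marker_specific_lambda(y, marker_2)
lam_all = genome_wide_lambda(y)
print(f"marker 1: {lam_1:.2f}, marker 2: {lam_2:.2f}, genome-wide: {lam_all:.2f}")
```

Under the marker-specific scheme the same phenotype vector may be analysed on a different scale for every marker, whereas `lam_all` is computed once, before any marker is tested.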

Thirdly, if the phenotype is transformed according to one marker that shows a significant effect on phenotypic variability before the other markers are tested, the transformation itself might create a new significant effect on phenotypic variability at another marker. In such a situation, it is unclear which phenotypic scale should be chosen.
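The third point can be illustrated deterministically. In the sketch below, a hypothetical marker B has a pure mean effect: every genotype class shares exactly the same residual sample, so the within-class spread is identical on the original scale. A nonlinear map chosen from some other marker (here simply a log transform, used as a stand-in) then makes the within-class variances differ across B’s genotypes, i.e. it creates an apparent variability effect at B. The genotype means and residual standard deviation are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# One shared residual sample, so every class has exactly the same spread.
residual = rng.normal(loc=0.0, scale=2.0, size=1000)

# Hypothetical marker B: pure mean effect on the original scale.
means_b = {0: 20.0, 1: 25.0, 2: 30.0}
classes = {g: m + residual for g, m in means_b.items()}


def transform(y):
    """Stand-in for a transformation chosen using a DIFFERENT marker (marker A).

    A log transform is assumed purely for illustration.
    """
    return np.log(y)


var_original = {g: float(np.var(y)) for g, y in classes.items()}
var_transformed = {g: float(np.var(transform(y))) for g, y in classes.items()}

print("original:   ", {g: round(v, 3) for g, v in var_original.items()})
print("transformed:", {g: round(v, 5) for g, v in var_transformed.items()})
```

On the original scale the three variances are identical; after the concave log map the variance shrinks as the class mean grows, so marker B now appears to control variability.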

Fourthly, several recent studies have discussed how gene-gene or gene-environment interactions can cause significant variance heterogeneity across genotypes³⁻⁶, which makes testing for variance-controlling loci a powerful tool for revealing potential interaction effects. Reducing the differences in variance across genotypes with a marker-specific variance-stabilizing transformation would dramatically reduce this power. Regarding the biological meaning of genetically regulated variance heterogeneity, empirical evidence has shown that a single causal locus can have a much more significant effect on the variance than on the mean⁶. In a particular population, such a locus may be mappable only by testing the variability, rather than the magnitude, of the phenotype.
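The link between interactions and variance heterogeneity can be sketched with a small simulation. Here a hypothetical marker A acts only through a gene-environment interaction with a zero-mean environmental covariate, so the genotype means are equal in expectation while the within-genotype variance grows with the allele count. A Brown–Forsythe test (`scipy.stats.levene` with `center="median"`) targets exactly this signal, whereas a one-way ANOVA on the means has no expected signal. All effect sizes, frequencies and the sample size are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2013)
n = 3000
genotype = rng.binomial(2, 0.4, size=n)      # hypothetical marker A, coded 0/1/2
environment = rng.normal(0.0, 1.0, size=n)   # hypothetical covariate with mean 0
beta_gxe = 1.0                               # hypothetical G x E effect size

# No main effect of the marker: it only scales the environmental effect.
phenotype = 25.0 + beta_gxe * genotype * environment + rng.normal(0.0, 1.0, size=n)

groups = [phenotype[genotype == g] for g in (0, 1, 2)]

# Test of means: one-way ANOVA across genotype classes.
_, p_mean = stats.f_oneway(*groups)
# Test of variances: Brown-Forsythe (Levene with median centring).
_, p_var = stats.levene(*groups, center="median")

print("per-genotype variance:", [round(float(np.var(g, ddof=1)), 2) for g in groups])
print(f"mean test p = {p_mean:.3g}, variance test p = {p_var:.3g}")
```

In this setting the locus is found through its effect on variability, not on the mean, which is the kind of signal that a transformation designed to equalize per-genotype variances would remove.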

The above issues lead us to question the practical use of Sun et al.’s transformation. The scale of the phenotype is certainly an important concern when interpreting an effect on phenotypic variability⁷. However, the points above should be considered carefully before any transformation is applied to the data. In particular, the statistical power to detect functional loci and the real meaning of the scale used should be emphasized.

Author contributions

XS and LR initiated the study. XS performed the analysis. Both authors contributed to writing the report.


References

1. Sun X, Elston R, Morris N, et al.: What Is the Significance of Difference in Phenotypic Variability across SNP Genotypes? Am J Hum Genet. 2013; 93(2): 390–397.
2. Yang J, Loos RJ, Powell JE, et al.: FTO genotype is associated with phenotypic variability of body mass index. Nature. 2012; 490(7419): 267–272.
3. Rönnegård L, Valdar W: Detecting major genetic loci controlling phenotypic variability in experimental crosses. Genetics. 2011; 188(2): 435–447.
4. Struchalin MV, Dehghan A, Witteman JC, et al.: Variance heterogeneity analysis for detection of potentially interacting genetic loci: method and its limitations. BMC Genet. 2010; 11: 92.
5. Paré G, Cook NR, Ridker PM, et al.: On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the Women's Genome Health Study. PLoS Genet. 2010; 6(6): e1000981.
6. Shen X, Pettersson M, Rönnegård L, et al.: Inheritance Beyond Plain Heritability: Variance-Controlling Genes in Arabidopsis thaliana. PLoS Genet. 2012; 8(8): e1002839.
7. Rönnegård L, Valdar W: Recent developments in statistical methods for detecting genetic loci affecting phenotypic variability. BMC Genet. 2012; 13: 63.


Open Peer Review


Reviewer Report 01 Nov 2013

Yurii Aulchenko, Institute of Cytology and Genetics, Siberian Division of the Russian Academy of Sciences, Novosibirsk, Russian Federation

Yakov Tsepilov, Novosibirsk State University, Novosibirsk, Russian Federation

Sodbo Sharapov, Novosibirsk State University, Novosibirsk, Russian Federation

Approved

https://doi.org/10.5256/f1000research.2505.r1948


We agree with the criticism raised by Shen and Rönnegård in their points 2 and 3 concerning the application of the transformation of Sun et al. in the context of whole-genome scans. Indeed, applying this transformation in a SNP-specific manner is conceptually difficult to accept. Sun et al. rightly suggest that “the scales on which we measure interval-scale quantitative traits are man-made and have little intrinsic biological relevance”, but the underlying intrinsic scale, and the function mapping this scale onto the observed one, is likely to be unique and does not change with the SNP. Hence, the transformation applied to a trait should not change across the markers studied. Practically, this is not very difficult to implement: as the simplest option, one could estimate Sun’s transformation parameters from the upper, middle and lower tertiles of the total phenotypic distribution. A more general approach (not restricting the data to three groups, but modelling the variance as a function of the mean) should be straightforward to implement.
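A minimal sketch of the reviewers’ simplest option, assuming that Sun et al.’s inverse hyperbolic sine arises from a quadratic mean-variance relation, variance = c + d·mean² (our assumption; Sun et al.’s exact parameterization may differ). The phenotype is split into tertiles of its total distribution, a mean and variance are computed per tertile, c and d are fitted by least squares, and the resulting single map g(y) = asinh(y·√(d/c))/√d, whose derivative is 1/√(c + d·y²), can then be applied once before every marker is tested. Function names are hypothetical. Because the variances are computed within truncated tertiles, on real data the fit may fail the positivity check, in which case another variance function would be needed.

```python
import numpy as np


def tertile_summaries(y):
    """Mean and variance of the phenotype within its lower, middle and upper
    tertiles. The grouping depends only on the phenotype, so it is the same
    for every marker."""
    q1, q2 = np.quantile(y, [1.0 / 3.0, 2.0 / 3.0])
    groups = [y[y <= q1], y[(y > q1) & (y <= q2)], y[y > q2]]
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])
    return means, variances


def fit_quadratic_mean_variance(means, variances):
    """Least-squares fit of variance = c + d * mean**2 (assumed functional form)."""
    design = np.column_stack([np.ones_like(means), means ** 2])
    (c, d), *_ = np.linalg.lstsq(design, variances, rcond=None)
    if c <= 0 or d <= 0:
        raise ValueError("fitted c and d must be positive for the asinh form")
    return float(c), float(d)


def asinh_vst(c, d):
    """Variance-stabilizing map for variance = c + d * mean**2:
    g(y) = asinh(y * sqrt(d / c)) / sqrt(d), so g'(y) = 1 / sqrt(c + d * y**2)."""
    def g(y):
        return np.arcsinh(np.asarray(y, dtype=float) * np.sqrt(d / c)) / np.sqrt(d)
    return g


# Usage with hypothetical summaries that follow the assumed model exactly
# (c = 4, d = 0.01), standing in for tertile summaries of a real phenotype.
means = np.array([20.0, 25.0, 30.0])
variances = 4.0 + 0.01 * means ** 2
c, d = fit_quadratic_mean_variance(means, variances)
g = asinh_vst(c, d)
print(f"c = {c:.3f}, d = {d:.4f}, g(means) = {np.round(g(means), 3)}")
```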

We also understand the reasoning behind Shen and Rönnegård’s points 1 and 4, but here we are less certain that the problem raised can be easily addressed. Specifically, one could argue with point 1 (“why should we regard the difference between 160cm and 170cm different from 170cm and 180cm?”): it is not hard to imagine a biologically relevant model in which equal changes on an “intrinsic scale” lead to different changes on the observed scale as the mean advances (Michaelis–Menten kinetics would be an example). Moreover, both points 1 and 4 (loss of power after transformation) apply not only to Sun et al.’s transformation but to almost any transformation in wide use (e.g. log, Box-Cox, Gaussianization/inverse-normal). While it is true that analysis of a transformed trait may lead to reduced power (and specifically, it should do so when Sun’s transformation is applied in a marker-specific manner to the analysis of variance heterogeneity), we feel that one would still like to check whether the variance heterogeneity found can be modelled as a function of the mean (in which case any SNP affecting the mean is likely to show “control” of the variance as well).
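The Michaelis–Menten counter-example can be made concrete. Under v = V_max·s/(K_m + s), equal steps in an “intrinsic” quantity s produce progressively smaller steps in the observed rate v. The parameter values below (V_max = K_m = 1) are arbitrary and chosen only for illustration.

```python
def michaelis_menten(s, v_max=1.0, k_m=1.0):
    """Observed rate v = v_max * s / (k_m + s) for an intrinsic level s.

    v_max and k_m are illustrative values, not estimates for any trait.
    """
    return v_max * s / (k_m + s)


# Equal unit steps on the intrinsic scale ...
intrinsic = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
observed = [michaelis_menten(s) for s in intrinsic]

# ... map to shrinking steps on the observed scale.
observed_steps = [b - a for a, b in zip(observed, observed[1:])]
print([round(step, 3) for step in observed_steps])  # [0.5, 0.167, 0.083, 0.05, 0.033]
```

By the same argument, a constant spread on the intrinsic scale would appear as a smaller spread on the observed scale at higher means, tying the observed variance to the mean.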

Finally, we fully agree with the comment of William Hill and Ian White, who criticize Sun et al.’s statement that “In the absence of genotypic mean differences, we can hardly infer that differences in variances are per se of biological interest”. We think that differences in variance per se are biologically and genetically plausible and interesting.

Competing Interests: No competing interests were disclosed.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.


Reviewer Report 07 Oct 2013

William G. Hill, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK

Ian White, University of Edinburgh

Approved

https://doi.org/10.5256/f1000research.2505.r1949


Shen and Rönnegård (SR) comment critically and succinctly on the paper by Sun et al. published in AJHG, which advocates that, before any claim of differences in variance among genotypes in a GWAS or similar study, a check should first be made of whether these differences can be removed by a monotonic transformation. Each of SR’s four criticisms seems well justified.

As 10⁵ or more SNPs may be fitted in a GWAS, what biological interpretation could be given to that number of different transformations, or even to those for a limited subset of loci showing possible variance differences? If some loci give signals of mean but not variance differences, should these then be transformed to eliminate the scale effect on the mean and perhaps reveal variance differences? Any concept of an original scale of measurement is lost, as SR point out. It is not obvious why the mere existence of a transformation designed to minimise differences in variance should prevent discussion of variance heterogeneity on the chosen scale. Equivalently, if we considered the means of the three genotypes at a locus rather than just average effects, would our ability to transform the data at each locus such that heterozygotes were intermediate imply that there was no dominance, or only that there was none on a particular scale?

On a further point, Sun et al. (p. 395) comment: ‘In the absence of genotypic mean differences, we can hardly infer that differences in variances are per se of biological interest.’ That is to take too narrow a view: the mean and phenotypic variance (or CV) of a quantitative trait in any species take typical values, e.g. the CV for adult human height is ca. 4% and for BMI ca. 16%. There is direct evidence, from GWAS and other single-gene studies, of genetic differences within species in environmental variance that cannot be removed by a change of scale, so the level of environmental variance is subject to evolutionary forces (e.g. Hill & Mulder 2010 Genet. Res. 92:381). To view variance as a biological phenomenon that is just an adjunct to the mean seems simplistic, as SR argue. Indeed, one has to ask whether scale transformations have value unless there is a biological basis, such as a log transformation to account for multiplicative genetic effects; but such a transformation must then apply across all loci.
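The log-transformation example can be sketched with two hypothetical loci acting multiplicatively. On the original scale the two-locus interaction contrast is non-zero; after a single log transformation of the phenotype, applied identically for both loci, it vanishes. The baseline and per-allele multipliers are illustrative assumptions.

```python
import math

# Hypothetical multiplicative two-locus model: y = base * a**g_a * b**g_b
base, a, b = 20.0, 1.1, 1.2


def genotype_value(g_a, g_b):
    """Expected phenotype for allele counts g_a and g_b (illustrative model)."""
    return base * a ** g_a * b ** g_b


def interaction_contrast(f):
    """(11 - 10 - 01 + 00) contrast; zero means the loci act additively on this scale."""
    return f(1, 1) - f(1, 0) - f(0, 1) + f(0, 0)


original = interaction_contrast(genotype_value)
# One log transform of the phenotype, the same for both loci.
logged = interaction_contrast(lambda g_a, g_b: math.log(genotype_value(g_a, g_b)))
print(f"original scale: {original:.3f}, log scale: {logged:.1e}")
```

The transformation removes the interaction only because the same scale is used for both loci; a different scale per locus would have no such interpretation.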

Competing Interests: No competing interests were disclosed.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
