Keywords
Height; Evolution; Polygenic Selection
A recent GWAS (Wood et al., 2014) based on a very large sample (N=250K) identified common variants responsible for normal variation in human height within populations.
Over the last few years, researchers have started moving away from studying genetic evolution with a single-gene, Mendelian approach towards models that examine many genes together (polygenic). The more genes are involved in a given phenotype, the more the signal of natural selection will be “diluted” across different genomic regions (because each gene accounts for a tiny effect), making it difficult to detect with approaches focused on a single gene (Pritchard et al., 2010; Piffer, 2014). A first attempt at empirically identifying polygenic selection was made by Turchin et al. (2012) on two populations (Northern and Southern Europeans), providing evidence for a higher frequency of height-increasing alleles (obtained from GWAS) among Northern Europeans. Drawbacks of that study were its reliance on populations from a single continent and its use of crude pairwise comparisons (e.g. French vs. Italian) without correlating frequency differences with average population height. Moreover, the strength of selection was not determined.
Two different approaches to identify selection based on the correlation of allele frequencies across different populations have been recently developed by Piffer (2013) and Berg & Coop (2014).
Piffer’s method uses factor analysis of trait-increasing alleles (found by GWA studies) as a tool for finding a factor that represents the strength of selection on a phenotype and the underlying genetic variation (Piffer, 2014a). An additional methodology consists of computing the correlation between allele frequencies and the average phenotypes of different populations; the resulting correlation coefficients are then correlated with the corresponding alleles’ genome-wide significance (p value). If the alleles contain selection signals, more significant alleles will show stronger correlations with the phenotype, that is, a negative correlation between p value and correlation coefficient will be found, because alleles with a high p value (more likely to be false positives) have a weaker correlation with average population phenotype (Piffer, 2014a).
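The second procedure (the method of correlated vectors) can be sketched in a few lines. This is a minimal Python illustration and not the R code used in the study; the function names and the toy frequencies, heights, and p values are hypothetical and chosen only to show the two steps.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of midranks."""
    return pearson(midranks(x), midranks(y))

def correlated_vectors(freqs_by_snp, phenotype, p_values):
    """Step 1: correlate each SNP's frequencies with the population phenotype.
    Step 2: rank-correlate those coefficients with the GWAS p values."""
    r_by_snp = [pearson(f, phenotype) for f in freqs_by_snp]
    return r_by_snp, spearman(p_values, r_by_snp)

# Toy data (hypothetical): 4 populations, 3 SNPs.
heights = [165.0, 170.0, 175.0, 180.0]
freqs = [
    [0.30, 0.40, 0.50, 0.60],  # most significant SNP tracks height closely
    [0.40, 0.45, 0.42, 0.55],  # weaker tracking
    [0.50, 0.48, 0.52, 0.47],  # no clear relation
]
p_vals = [1e-100, 1e-30, 1e-9]

r_by_snp, rho = correlated_vectors(freqs, heights, p_vals)
print(r_by_snp, rho)
```

In this toy case the SNP with the lowest p value has the strongest correlation with height, so the rank correlation between p value and correlation coefficient is negative, which is the pattern the method treats as a selection signal.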
Piffer’s method (Piffer, 2013; Piffer, 2014a) for identifying signals of polygenic selection was used in this study and applied to the top five GWAS hits (ranked according to p value). Piffer (2014b) carried out a study on height SNPs, but it was based on a smaller GWAS sample and an older version (phase 1) of the 1000 Genomes data, containing data for only 14 populations. This paper uses the phase 3 1000 Genomes data, and the GWAS meta-analysis was carried out on a much larger sample, which produces more hits with better significance. The aim of this paper is to test the hypothesis that stature has undergone natural or sexual selection in populations after humans dispersed to different continents, giving rise to distinct genetic clusters.
Frequencies of alleles with a positive effect (height increasing) were obtained from 1000 Genomes phase 3 (http://browser.1000genomes.org/index.html), which comprises 26 populations belonging to five racial groups.
Average population height was obtained from the references listed at: http://en.wikipedia.org/wiki/Human_height, considering only statistics published after 2000 and young age groups (18–40). Only 11 populations met these criteria (see references in Table 1).
Population | Polygenic score (%) | Height (cm) | Reference (Height) |
---|---|---|---|
Afr.Car.Barbados | 48.94 | | |
US Blacks | 48.71 | 178.00 | McDowell et al., 2008 |
Esan Nigeria | 49.50 | | |
Gambian | 48.97 | | |
Luhya Kenya | 48.42 | | |
Mende Sierra Leo | 49.03 | | |
Yoruba | 48.52 | | |
Colombian | 46.05 | 170.60 | Meisel & Vega, 2004 |
Mexican LA | 46.95 | 170.6 | McDowell et al., 2008 |
Peruvian | 46.48 | | |
Puerto Rican | 46.79 | | |
Chinese Dai | 44.88 | | |
HanChineseBeijing | 44.76 | 170.2 | Yang et al., 2005 |
HanChineseSouth | 45.70 | 170.2 | Yang et al., 2005 |
Japanese | 44.85 | 172.00 | Ministry of Ed., 2004 |
Vietnam | 44.76 | 165.70 | Hung & Park, 2008 |
UtahWhites | 47.62 | 178.9 | McDowell et al., 2008 |
Finns | 48.09 | 180.70 | National Institute for Health and Welfare, 2011 |
British | 46.80 | 177.80 | Moody (2013). Health Survey for England |
Spanish | 46.77 | | |
TuscanItaly | 47.11 | 177.00 | Cacciari et al., 2002* |
Bengali Banglade | 46.09 | | |
Gujarati Ind. Tx | 47.12 | | |
Indian Telegu UK | 47.62 | | |
Punjabi Pakistan | 47.21 | | |
SriLankanUK | 46.98 | | |
For each chromosome, the three alleles with the lowest p values were selected, all of which were unlinked (>500 kb apart from each other), giving 66 SNPs in total. Only unlinked alleles were used to avoid the confounding influence of linkage on cross-population allele frequency. Selection was restricted to the alleles with the highest significance because these are less likely to be false positives. The same number of SNPs (3) from each chromosome was used to get a representative sample of the entire genome and to avoid bias due to chromosome location. The conventional threshold of p < 5×10⁻⁸ was used for genome-wide significance (Barsh et al., 2012).
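The selection rule described above can be sketched as a greedy filter. The paper does not state the exact procedure, so the greedy pass (take hits in order of increasing p value and skip any hit within 500 kb of one already chosen), the function name `select_top_snps`, the tuple layout, and the toy hits are all assumptions made for illustration.

```python
from collections import defaultdict

def select_top_snps(hits, per_chrom=3, min_dist=500_000, p_threshold=5e-8):
    """Pick up to `per_chrom` genome-wide significant SNPs per chromosome,
    lowest p value first, keeping only SNPs more than `min_dist` bp apart.

    `hits` is an iterable of (snp_id, chromosome, position_bp, p_value).
    """
    by_chrom = defaultdict(list)
    for snp_id, chrom, pos, p in hits:
        if p < p_threshold:  # genome-wide significance filter
            by_chrom[chrom].append((p, pos, snp_id))

    selected = {}
    for chrom, candidates in by_chrom.items():
        chosen = []
        for p, pos, snp_id in sorted(candidates):  # ascending p value
            if all(abs(pos - other) > min_dist for _, other, _ in chosen):
                chosen.append((p, pos, snp_id))
            if len(chosen) == per_chrom:
                break
        selected[chrom] = [snp_id for _, _, snp_id in chosen]
    return selected

# Hypothetical hits, not taken from the GWAS.
toy_hits = [
    ("snpA", 1, 1_000_000, 1e-50),
    ("snpB", 1, 1_200_000, 1e-40),  # within 500 kb of snpA, skipped
    ("snpC", 1, 2_000_000, 1e-30),
    ("snpD", 1, 3_000_000, 1e-20),
    ("snpE", 1, 4_000_000, 1e-10),  # a fourth unlinked hit, not needed
    ("snpF", 2, 5_000_000, 1e-6),   # fails the significance threshold
]
print(select_top_snps(toy_hits))
```

A physical-distance cutoff is only a proxy for linkage; an LD-based pruning step would be an alternative, but the text specifies distance, so the sketch follows that.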
A polygenic score was calculated as the mean frequency of height increasing alleles (defined as those with a positive Beta coefficient in the meta-analysis).
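As a minimal sketch of this definition (not the author's R code), the score for a population is the plain mean of the height-increasing allele frequencies, expressed in percent. The helper names and toy frequencies are hypothetical, and the allele-flipping helper assumes biallelic SNPs, so that the increasing allele's frequency is one minus the effect allele's frequency when the reported Beta is negative.

```python
def increasing_allele_freq(effect_allele_freq, beta):
    """Frequency of the height-increasing allele at a biallelic SNP (assumption):
    keep the effect allele if its Beta is positive, otherwise use the other allele."""
    return effect_allele_freq if beta > 0 else 1.0 - effect_allele_freq

def polygenic_score(increasing_freqs):
    """Unweighted mean frequency (in %) of height-increasing alleles."""
    return 100.0 * sum(increasing_freqs) / len(increasing_freqs)

# Hypothetical population with three SNPs: (effect allele frequency, Beta).
toy_snps = [(0.50, 0.03), (0.60, -0.02), (0.60, 0.04)]
freqs = [increasing_allele_freq(f, b) for f, b in toy_snps]
score = polygenic_score(freqs)
print(round(score, 2))
```

The score is unweighted: every SNP counts equally regardless of its effect size, which matches the definition in the text.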
Analyses were carried out using R.
Polygenic scores and average country height are reported in Table 1. The Pearson correlation between polygenic score and average country height was r=0.83 (N=11, p=0.002). Table 2 reports average frequencies by continental group.
Continent | Polygenic score (%) |
---|---|
AFR | 47.69 |
AMR | 45.92 |
ASN | 45.52 |
EUR | 46.65 |
SAS | 46.549 |
Frequencies in descending order are: 1) Africans (AFR); 2) Europeans (EUR); 3) South Asians (SAS); 4) Latin Americans/Hispanics (AMR); 5) East Asians (ASN).
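The correlation between polygenic score and average height reported above can be re-derived from the 11 rows of Table 1 that have a height value. The sketch below is a Python check written for illustration (the original analysis was run in R); the values are copied from Table 1, and the `pearson` helper is defined here rather than taken from a library. The result should come out close to the reported r=0.83.

```python
from math import sqrt

# (polygenic score %, mean height cm) for the populations with height data in Table 1
table1 = {
    "US Blacks": (48.71, 178.00),
    "Colombian": (46.05, 170.60),
    "Mexican LA": (46.95, 170.6),
    "HanChineseBeijing": (44.76, 170.2),
    "HanChineseSouth": (45.70, 170.2),
    "Japanese": (44.85, 172.00),
    "Vietnam": (44.76, 165.70),
    "UtahWhites": (47.62, 178.9),
    "Finns": (48.09, 180.70),
    "British": (46.80, 177.80),
    "TuscanItaly": (47.11, 177.00),
}

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

scores = [s for s, _ in table1.values()]
heights = [h for _, h in table1.values()]

n = len(table1)
r = pearson(scores, heights)
# t statistic for testing r against zero, with n - 2 degrees of freedom
t_stat = r * sqrt((n - 2) / (1 - r ** 2))
print(f"N={n}, r={r:.2f}, t={t_stat:.2f}")
```

With only 11 data points, a single outlying population can shift r noticeably, so the coefficient is best read alongside the table itself.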
Spearman’s rank-order correlations between each allele’s p value and its correlation with the polygenic score and with height were -0.26 and -0.34, respectively (N=66, p=0.037 and 0.0053). The “rcorr” and “cor” functions in R produced slightly different results due to differences in dealing with ties (equal values); “cor” produced slightly stronger coefficients (-0.28 and -0.37).
This provides evidence for the hypothesis that more significant GWAS hits (alleles) are enriched with natural selection signal. A similar phenomenon was observed in a previous analysis of genes affecting human height (Piffer, 2014b).
Factor analysis requires a satisfactory case-to-variable ratio; thus only a handful of SNPs could be used, and these necessarily had to be those with the lowest p values, as they are more likely to be genuine hits (see previous section, MCV).
The top 5 alleles (i.e. those with the lowest p value) all correlated with the polygenic score and with average height in the expected direction (positively), as shown in Table 3 (see Dataset 2). The average correlations were 0.58 and 0.69, respectively, which is a marked improvement over the average correlations with polygenic score and height across all 66 alleles (r=0.03 and 0.04, respectively; see Dataset 1, cells BP38–39).
(p value and r with polygenic (pol) score).
SNP | rs724016.G | rs1812175.G | rs42039.T | rs143384.G | rs8756.C |
---|---|---|---|---|---|
GWAS p value | 3.2E-158 | 2.1E-86 | 3.8E-88 | 1.2E-121 | 4.5E-90 |
r with pol. score | 0.78 | 0.26 | 0.22 | 0.84 | 0.78 |
r with average pop. height | 0.62 | 0.75 | 0.75 | 0.49 | 0.88 |
A factor analysis using minimum residuals was carried out. A single factor was extracted that explained 42% of the variance. Factor loadings are displayed in Table 4. These are all positive (in the expected direction).
Standardized loadings (pattern matrix) based upon correlation matrix.
Gen.coordinate | SNP ID | Factor loading |
---|---|---|
142.588.260 (Chr.3) | rs724016.G | 0.62 |
145.794.294 (Chr.4) | rs1812175.G | 0.33 |
92.082.358 (Chr.7) | rs42039.T | 0.62 |
33.489.170 (Chr.20) | rs143384.G | 0.48 |
64.646.019 (Chr.12) | rs8756.C | 1.00 |
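The proportion of variance explained by the single factor can be checked directly from the loadings in Table 4. For a one-factor solution based on a correlation matrix, it equals the sum of squared loadings divided by the number of variables. The short Python sketch below is an illustrative check, not the author's R output; the loadings are copied from Table 4.

```python
# Standardized loadings from Table 4
loadings = {
    "rs724016.G": 0.62,
    "rs1812175.G": 0.33,
    "rs42039.T": 0.62,
    "rs143384.G": 0.48,
    "rs8756.C": 1.00,
}

# Sum of squared loadings (the factor's share of total standardized variance)
ss_loadings = sum(value ** 2 for value in loadings.values())

# Each standardized variable contributes a variance of 1, so divide by the count
proportion_explained = ss_loadings / len(loadings)
print(round(ss_loadings, 4), round(proportion_explained, 2))
```

The result agrees with the 42% reported in the text. A loading of 1 for rs8756.C means the factor reproduces that SNP's frequencies almost exactly, so the factor is dominated by this one allele.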
Factor scores were extracted with the Thurstone method (Thurstone, 1947), and are reported in Table 5.
The Pearson correlation between average country height and the factor score was strongly positive (r=0.88, N=11, p=0.001). This factor was also significantly correlated to the polygenic score (r=0.78, N=26, p<0.001).
A polygenic score, created by averaging the frequencies in 26 populations of 66 height-increasing alleles identified by the largest and most recent human height GWAS, was positively correlated with the average height of 11 populations. The method of correlated vectors revealed that alleles with lower p values had a higher correlation with phenotypic height and polygenic score, suggesting that they tend to be enriched with signal of natural selection. A factor analysis of the top five GWAS hits produced a factor (whose loadings are all in the expected direction) which is significantly and strongly correlated both with population average height and with polygenic score. This showed an improvement over the correlation of the five single alleles with population height (Table 3, last row), which averaged 0.69, which in turn improved over the average correlation of the 66 alleles, which was near zero.
The rankings of polygenic scores match the folk perception of the stature of various racial groups: Africans > Europeans > South/Central Asians > Hispanics > East Asians (Table 2).
South East Asians had the lowest scores, a result which matches their anthropometric description.
Within Europe, northern Europeans (Finns and White Americans) had a higher genotypic stature than their southern counterparts (Italians and Spaniards), confirming the results of a previous study of GWAS loci which compared northern vs southern Europeans (Turchin et al., 2012).
A limitation was the unavailability of sound statistics on the average height of many populations. Moreover, although human height is largely heritable, it is also heavily influenced by nutrition and living conditions. The importance of environment is suggested by the dramatic secular trend which took place in the 20th century in developed countries (e.g. Arcaleni, 2006; Webb et al., 2008); an association with dietary intakes (i.e. milk consumption) and socioeconomic status has also been observed (Mamidi et al., 2011; Webb et al., 2008). Most of the missing data were for developing countries which likely have not reached their full growth potential or ethnic groups living in Western societies (Indian Telegu or Gujarati) for which anthropometric statistics are not easily available. If the allele frequency factor represents a genuine signal of natural selection, then the difference between it and current phenotypic height could be used as an indicator of the quality of diet and living conditions in general.
Factor analysis of allele frequencies is a promising method for detecting signals of recent selection on polygenic traits.
F1000Research: Dataset 1. Hits 1+2+3. 10.5256/f1000research.6002.d41833 (Piffer, 2014c).
F1000Research: Dataset 2. Method of correlated vectors (MCV). 10.5256/f1000research.6002.d41834 (Piffer, 2014d).
Competing Interests: No competing interests were disclosed.
Version 3 (revision): 25 Jan 16
Version 2 (revision): 23 Dec 15
Version 1: 16 Jan 15