Sequence variation at 8q24.21 and risk of back pain [version 1; peer review: 1 approved]

Back pain (BP) is a common condition of major social importance and poorly understood pathogenesis. Intervertebral lumbar disc degeneration in all its guises is one of the major biological risk factors for BP. Previously, we identified the locus at 8q24.21 associated with chronic BP, which has been found elsewhere associated with sciatica after surgery for lumbar disc herniation. In the current study we used co-localisation methods to identify the gene most likely to harbor the causal variant. We show that the same functional variant at the 8q24.21 locus is responsible for both lumbar disc degeneration and BP, and we also studied the effects of this locus on related phenotypes. Our results link the locus to intervertebral disc and bone mineral density, but not to anthropometric measurements, thus corroborating the epidemiological evidence. Moreover, the same functional variant at the locus is more likely to affect the expression of the nearby FAM49B gene, rather than the GSDMC gene, which was previously proposed as a causative one for BP.


Introduction
Back pain (BP) is a common debilitating condition with a lifetime prevalence of 40% and a major socioeconomic impact 1 . According to the Global Burden of Disease 2016 study, it leads the list of disabling conditions in many parts of the world 2 . The greatest biological risk for episodes of severe BP is thought to be lumbar disc degeneration (LDD) 3 . Rather than a discrete disease, LDD is now considered a continuous lifelong aging process that usually starts in the third decade of life but may also be seen in children 4 . It is highly heritable 5 , as well as being influenced by smoking 6 , age 7 and body mass index 8 . The overall burden of lumbar disc degeneration on magnetic resonance imaging is a strong predictor of BP episodes 9-11 . There is a clear genetic predisposition to both BP and LDD with estimates of heritability in the range of 30%-70% 12-15 . Also, these traits have been shown to share genetic risk factors 16 . However, little is known about the genetic architecture of BP and only three associated genetic loci have been identified 17,18 .
This work was focused on the locus at chromosome 8 (8q24.21), which was identified in our previous BP studies 17,18 . The same locus has been reported associated with having lumbar disc herniation (LDH) in Icelanders undergoing surgery for sciatica 19 . Bjornsdottir et al. additionally reported an association of this region to height, but concluded that the region does not influence LDH through height. They noted that the most highly associated SNP, rs6651255, was linked to expression of the nearby gasdermin (GSDMC) gene in GTEx data 20 . Further, the immune-related function of gasdermins implied that the inflammatory and immunological consequences of disk herniation may be linked to sciatica symptom severity. However, closely located SNPs associated with different traits do not always reflect pleiotropy 21 . Furthermore, in the case of eQTL studies, causal hypotheses generated through singlevariant look up are highly likely to be false positives 21 . Consequently, a variety of co-localization methods have been introduced 22-25 to distinguish the two common scenarios: pleiotropy (the same functional genetic variant is associated with both traits) and linkage (two different functional genetic variants, each associated with a different trait, that are in linkage disequilibrium).
The aim of this study was to (1) verify whether the same functional variant at 8q24.21 is responsible for BP and LDH and (2) to study the effects of this variation on related phenotypes. As a result, we offer an alternative explanation to the assertion by Bjornsdottir et al. 17 regarding the role of immunity and inflammation in generating symptoms of sciatica and BP due to variation in expression at the locus in between genes CCDC26 and GSDMC.

GWAS data
We used the BP GWAS performed in all genetically confirmed white British UK Biobank participants (N= 453,862) as target data for future analysis 17 . We used the summary level-data of LDH GWAS from the original paper of deCODE group 19 . They do not provide full GWAS scan, only the statistics for the significant SNPs from the 8q24.21 region. We used information about 40 SNPs from this locus for which information about effect size, effective and reference alleles and p-value was available (Supplementary Table 1,3 of the work 19 ).

SMR-HEIDI analysis
Summary-level mendelian randomization and heterogeneity in dependent instruments (SMR-HEIDI) 24 approach was used to test for potential pleiotropic effects of the 8q24.21 region on back pain (BP) and LDH and other complex traits including gene expression levels in certain tissues. SMR analysis provides evidence for pleiotropy, but is unable to define whether both traits are affected by the same underlying causal polymorphism. The latter is specified by a HEIDI test that distinguishes pleiotropy from linkage. Zhang F. and colleagues developed the software SMR tool implementing the SMR-HEIDI methods 24 .
SMR tool is suitable to analyze pleiotropic association using summary-level data from GWAS and eQTL studies. We conducted two groups of analysis: target trait vs eQTL and target trait vs other complex traits. Summary-level data from BP GWAS 17 was used as target (--gwas-summary option) in both cases. For eQTL analysis we used Westra Blood eQTL data (converted to suitable BSED format by developers) and GTEx v7, converted using --make-besd option (smr -eqtl-flist Gtex_tissue_1.txt --make-besd -out Gtex_tissue_1, where Gtex_tissue_1.txt file correspond to summary-level data of selected tissue). In order to apply SMR tool for complex traits we selected the region +-250kb around the target SNP from each complex trait GWAS. After that, we add new columns Probe_ID with trait_ name to corresponded SNP subset. Finally, we merge these SNP subsets into one file and converted it in BESD format using --make-besd option.
In both cases we set the minor allele frequency threshold as 0.03 and included in the HEIDI test SNPs inside +-250kb window centered around the target SNP; other parameters used default settings. The LD matrix for the HEIDI test was estimated using PLINK 1.9 software and individual data of white UK Biobank participants, the subset of individuals used to calculate BP GWAS (N=10,000).
Nominal P for SMR test was set at 6e-4 (0.05/89, where 89 is the total number of probes for the gene expression analysis) and 2e-5 (0.05/2,453, where 2,453 is the total number of complex traits used in analysis). For HEIDI analysis we used a conservative threshold of P = 0.05 (P < 0.05 corresponds to the rejection of pleiotropy hypothesis).

Results and discussion
We had previously performed the largest-to-date GWAS for chronic BP using the UK Biobank and CHARGE consortium cohorts, with total of 119384 cases and 334478 controls 17 . Cases were defined as those having BP of duration >3-6 months; controls included those without back pain or with back pain of <3-6 months. We found variants at 8q41.21 locus that showed a genome-wide significant association (OR=1.05, p = 4.4 × 10 -13 , after genomic control correction) 17 . The location chr8:130,717,716 is tagged by intergenic variant rs10956487 which is in perfect LD with rs6651255 (r 2 =1 according to 1000 Genomes EUR samples), the lead SNP in the Icelandic study of LDH by Bjornsdottir et al. 19 . The strongest association signal for BP is situated near GSDMC gene; yet another protein-coding gene, FAM49B, is located within a 500 kbp window. The summary of the previously reported results and our findings for locus 8q24.21 are shown in Figure 1.
We used the SMR-HEIDI approach 24 to test for potential pleiotropic effects of the 8q24.21 region (rs6651255) on BP and other complex traits including gene expression levels in relevant tissues. SMR analysis provides evidence for pleiotropy, but is unable to determine whether both traits are influenced by the same underlying causal polymorphism. The latter question was addressed by the HEIDI test which distinguishes pleiotropy from linkage. For SMR-HEIDI testing we used as the target trait the BP GWAS previously performed in all genetically confirmed white British UK Biobank participants 17 .
First of all, we checked relationships between BP and LDH. We found strong evidence that co-association of these traits to 8q24.21 is due to the pleiotropic action of the same functional variant (P SMR =2.8 × 10 -7 and P HEIDI = 0.87; extended data 29 ). We further applied SMR-HEIDI to test for pleiotropic effects of the region in the collection of 2,453 traits available from the UK biobank study and GeneAtlas. We found that the BP risk allele of rs6651255 was negatively associated with heel bone mineral density t-score and positively associated with selfreported prolapsed or slipped disc (both FDR SMR <0.01, underlying data 30 and extended data 29 ) and also strongly associated with standing height (underlying data 30 and extended data 29 ) and several measures of fat-free mass (all FDR SMR <0.001). However, HEIDI tests showed strong evidence against pleiotropic effects of the same functional variant in this region for BP and these anthropometric traits (all P HEIDI < 0.001), while associations with heel bone mineral density t-score and self-reported prolapsed or slipped disc were likely the result of pleiotropy, controlled by the same functional variant (all P HEIDI > 0.05). This suggests that genetic variants influencing structural spine features -the vertebral body and intervertebral disc -provide the link to BP symptoms, supporting previous work in twins highlighting shared genetic influence between spine degeneration and BP 16,31 .
The unanticipated results were obtained in the study of expression data. We have previously 17 demonstrated that variants in LD with the lead SNP at 8q24.1 were located in potential regulatory regions in chondrocytes and osteoblasts (Roadmap Epigenomics Consortium 32 ). In the current study, we tested expression of genes in the 8q24.21 region in all available tissues considered potentially relevant to development of pain (underlying data 33 ), including musculoskeletal, brain and nerve tissue (from GTEx, 20 ). Given the suggestion by Bjornsdottir et al. that the locus may act through regulation of immune function 19 , we also investigated expression in whole blood using the blood eQTL database 26 and five immune cell subsets from CEDAR 34 . Similar to the Bjornsdottir et al. study 19 , we observed that rs6651255 is co-associated with expression  of GSDMC in skeletal muscle (FDR SMR =0.01, P SMR =0.0001, underlying data 30 ), with the BP-associated C allele of rs6651255 being associated with increased expression of the gene. However, HEIDI results show this overlap to be unlikely due to pleiotropic action of the same functional variant (P HEIDI = 3×10 -7 , underlying data 30 and extended data 29 ). Our findings suggest that GSDMC is not the gene through which the functional variant acts to influence BP. The next strongest co-association was with increased expression of FAM49B in brain anterior cingulate cortex (P SMR =0.004, underlying data 30 and extended data 29 ) and decreased expression in cytotoxic T (CD8+) cells (P SMR =0.02, underlying data 30 and extended data 29 ), although these associations were only nominally significant (p<0.05), yet not statistically significant after multiple testing correction (FDR SMR =0.18 and 0.28, respectively). All corresponding P HEIDI were > 0.05, indicating these co-associations are likely due to pleiotropy. The product of FAM49B was found to be expressed in multiple tissues including bone, lymph node and spleen; according to Proteomics DB 35 , it has highest expression in lymphocytes, and was implicated as a key negative regulator of actin dynamics and T-cell activation 36 . The precise mechanism of which gene is involved in the development of BP is difficult to infer due to lack of eQTL data available for relevant tissues, such as intervertebral disc.

Conclusion
We hypothesize that the locus 8q24.21 exhibits its influence on BP phenotypes via disk and bone remodeling, and not through influence on anthropometrics or pain processing. Our findings suggest that FAM49B and not GSDMC is the most likely mechanism through which the functional variant leads to BP.