Discovery of functional non-coding conserved regions in the α-synuclein gene locus

Several single nucleotide polymorphisms (SNPs) and the Rep-1 microsatellite marker of the α-synuclein (SNCA) gene have consistently been shown to be associated with Parkinson’s disease, but their functional relevance is unclear. Based on these findings we hypothesized that conserved cis-regulatory elements in the SNCA genomic region regulate expression of SNCA, and that SNPs in these regions could functionally modulate the expression of SNCA, thus contributing to neuronal demise and predisposing to Parkinson’s disease. In a pair-wise comparison of a 206kb genomic region encompassing the SNCA gene, we identified 34 evolutionarily conserved DNA sequences between human and mouse. All elements were cloned into reporter vectors and assessed for expression modulation in dual luciferase reporter assays. We found that 12 out of 34 elements either enhanced or reduced the expression of the reporter gene. Three elements upstream of the SNCA gene displayed an approximately 1.5 fold (p<0.009) increase in expression. Of the intronic regions, three showed a 1.5 fold increase and two others showed a 2 and 2.5 fold increase in expression (p<0.002). Three elements downstream of the SNCA gene showed 1.5 fold and 2.5 fold increase (p<0.0009). One element downstream of SNCA reduced the expression of the reporter gene to 0.35 fold (p<0.0009) of normal activity. Our results demonstrate that the SNCA gene contains cis-regulatory regions that might regulate the transcription and expression of SNCA. Further studies in disease-relevant tissue types will be important to understand the functional impact of regulatory regions and specific Parkinson’s disease-associated SNPs and their function in the disease process.

An emerging hypothesis of increasing interest holds that subtle overexpression of α-synuclein (α-syn) over many decades can either predispose to or even cause the neurodegenerative changes that characterize Parkinson's disease (PD). Neurons subjected to higher, non-physiological levels of α-syn might be more likely to be damaged by oligomerization or aggregation of this protein, eventually leading to the formation of α-synuclein-based neuropathological features of the disease 1 .
It is now well established that both point mutations and large genomic multiplications of the α-syn (SNCA) gene can cause an autosomal-dominant form of PD 2-10 . Furthermore, several association studies investigating genetic variants in the SNCA gene have found an increased risk for PD [11][12][13][14][15][16][17][18][19] . The finding that both qualitative and quantitative alterations in the SNCA gene are associated with the development of a parkinsonian phenotype indicates that amino acid substitutions as well as overexpression of wild-type α-syn are capable of triggering a clinicopathological process that is very similar to sporadic PD. Nevertheless, the precise mechanisms leading to α-syn-related pathology in sporadic PD in the absence of any α-syn mutations remain elusive.
The best characterized polymorphism in the SNCA gene is the Rep-1 mixed dinucleotide repeat which has been shown to act as a modulator of SNCA transcription 14-16 . The DNA binding protein and transcriptional regulator PARP-1 showed specific binding to SNCA-Rep1. These data were confirmed by a transgenic mouse model and demonstrated regulatory translational activity 20 .
Functionally, SNCA expression levels in postmortem brains suggest that the Rep-1 allele and SNPs in the 3′ region of the SNCA gene have a significant effect on SNCA mRNA levels in the substantia nigra and the temporal cortex 21 .
The promoter region of the SNCA gene has recently been examined in more detail in cancer cell lines and also in rat cortical neurons. Regulatory regions in intron 1 and the 5′ region of exon 1 have been shown to exhibit transcriptional activation 22-24 , as has the NACP-Rep-1 region upstream of the SNCA gene [14][15][16]20,25 . Several transcription factors, such as PARP-1 16 , GATA 26 , ZIPRO1, and ZNF219 22 , have been identified as regulators of the SNCA promoter region.
There is mounting evidence that SNCA expression levels could be crucial for maintenance and survival of neurons and that their misregulation could play a key role in the development of PD. Thus, it is becoming increasingly clear that the SNCA gene must be investigated thoroughly, both to fully understand its cis- and trans-acting elements and factors and to interpret the function of PD-associated risk alleles.
The goal of this study was to investigate transcriptional regulation of the SNCA region using a complementary approach, under the hypothesis that conserved non-coding regions of the SNCA gene contain transcriptional enhancers or silencers and thus modulate gene expression. This would mean that single nucleotide polymorphisms (SNPs) in these regions could influence the transcriptional pattern of the SNCA gene 27 .

Comparative genomics
Using comparative genomics, we searched for highly conserved non-coding sequences between human and mouse and identified 34 evolutionarily conserved non-coding genomic regions (ncECRs) within the SNCA gene.
We utilized two complementary browsers, the Vista browser (http://pipeline.lbl.gov/cgi-bin/gateway2) and the ECR browser (http://ecrbrowser.dcode.org/), to generate a conservation profile by aligning the human SNCA gene with its mouse counterpart in a pair-wise fashion. We applied established selection parameters for our search: >100bp in length and >75% identity 28,29 . In addition to the 111.4kb SNCA gene region, we included a 44.5kb upstream and a 50kb downstream intergenic region to also capture surrounding regulatory elements.
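The selection criteria above can be illustrated with a minimal Python sketch. It is not the algorithm used by the Vista or ECR browsers; it only shows, under simplifying assumptions, how a sliding window of 100 alignment columns with more than 75% identity could be applied to a pair-wise alignment. The function name, its parameters, and the input format (two gapped, equal-length aligned strings) are hypothetical.

```python
def conserved_regions(aligned_human, aligned_mouse, window=100, min_identity=0.75):
    """Return merged (start, end) alignment-column intervals in which a sliding
    window of `window` columns has identity above `min_identity`.

    Assumption: both inputs are rows of the same pair-wise alignment,
    with '-' marking gaps. Coordinates are 0-based, end-exclusive.
    """
    if len(aligned_human) != len(aligned_mouse):
        raise ValueError("aligned sequences must have equal length")
    n = len(aligned_human)
    # 1 where both sequences carry the same base; gaps never count as matches
    match = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aligned_human.upper(), aligned_mouse.upper())
    ]
    regions = []
    if n < window:
        return regions
    matches_in_window = sum(match[:window])
    for start in range(n - window + 1):
        if start > 0:
            # slide the window by one column
            matches_in_window += match[start + window - 1] - match[start - 1]
        if matches_in_window / window > min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend an overlapping hit
            else:
                regions.append((start, end))
    return regions
```

Because overlapping windows are merged, every reported region is at least 100 columns long, which approximates the length criterion described in the text.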
We identified 34 ncECRs in the SNCA genomic region of 206kb on chromosome 4q21 (Chr.4: 90,961,056-91,167,082, UCSC Genome Browser on Human Mar. 2006 Assembly) by pair-wise comparison between human and mouse (Figure 1). Ten of these DNA sequences were located downstream of the SNCA gene, 17 were located in the intron between exons 4 and 5, which is 92kb in length, and five were upstream of the SNCA gene (Figure 1). None of the selected sequences overlapped with known expressed sequence tags (ESTs) or had an open reading frame of more than 20 amino acids in length, suggesting that these ncECRs are non-coding.
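The open reading frame criterion can be sketched as follows. The text does not state how open reading frames were defined or which tool was used, so this Python example assumes an ATG-initiated frame ending at the first in-frame stop codon (or at the end of the sequence), scanned over all six reading frames. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf_codons(seq):
    """Length in codons (stop codon excluded) of the longest ATG-initiated
    open reading frame over all six reading frames.

    Assumption: an ORF that runs off the end of the sequence is counted too,
    which is the conservative choice for short genomic fragments.
    """
    seq = seq.upper()
    best = 0
    # forward strand and reverse complement
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for frame in range(3):
            current = 0  # codons in the ORF being extended; 0 means not in an ORF
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if current == 0:
                    if codon == "ATG":
                        current = 1
                elif codon in STOP_CODONS:
                    best = max(best, current)
                    current = 0
                else:
                    current += 1
            best = max(best, current)  # ORF still open at the sequence end
    return best
```

Under these assumptions, an element would pass the filter described above when `longest_orf_codons(sequence) <= 20`.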

Cloning and luciferase assays
To test whether the ncECRs exhibit enhancer or silencer activity, we cloned all identified regions into reporter vectors and measured their luciferase activity after transfection into neuroblastoma cells. For our studies, we used the pGL3 luciferase reporter vectors (Promega, Cat. No. E1751, E1741, E1771, E1761) and the human neuroblastoma cell line SK-N-SH. ncECRs identified through the comparative analysis (Supplementary Table 1) were cloned upstream of an SV-40 promoter in the pGL3 promoter construct, transfected into SK-N-SH cells, and assayed with the Dual-Luciferase ® Reporter Assay System (Promega, Cat. No. E1910).
Some of these regions were combined in one vector because of their close proximity to each other. Primers with specific restriction sites (KpnI, BglII or XhoI from New England Biolabs Inc.) were designed to amplify the conserved elements, and PCR products with specific restriction sites were directly cloned into the pGL3 promoter vector to ensure correct orientation of the genomic elements (Supplementary Table 1). All constructs were sequenced to ensure that no point mutations were introduced through the amplification and/or cloning process.
For transfection experiments, we used a 96-well format (Nunc, Cat. No. 167008). Cells were plated one day before transfection at a

Amendments from Version 1
We made corrections and edits according to the reviewers' comments, addressed all questions and concerns in the body of the manuscript and the tables, and made changes in Figure 2 and Figure 3.

In this assay, activities of firefly and Renilla luciferases were measured sequentially in one sample. All assays were performed in quadruplicate and each experiment was repeated three times. Altogether, 12 data points were ascertained for each conserved region/construct.
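A minimal Python sketch of how such dual luciferase readings are commonly processed: the firefly signal of each well is divided by the Renilla signal of the same well, and the ratio is then expressed relative to the promoter-only construct. The exact processing pipeline is not described in the text, so the function name, the argument layout, and the use of the mean promoter-only ratio as baseline are assumptions.

```python
from statistics import mean


def relative_activity(firefly, renilla, promoter_ratios):
    """Fold change of each well relative to the promoter-only construct.

    firefly, renilla: luminometer readings from the same wells, in the same order.
    promoter_ratios: firefly/Renilla ratios measured for the promoter-only
    construct (assumed baseline).
    """
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    baseline = mean(promoter_ratios)
    # the per-well ratio corrects for differences in transfection efficiency
    return [(f / r) / baseline for f, r in zip(firefly, renilla)]
```

With quadruplicate wells in three independent experiments, each construct would yield 12 such fold-change values.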

Statistical analysis
Differences among means were analyzed using a two-sample Student's t-test. For differences in transcriptional activation of the luc+ gene, ncECRs were tested in quadruplicate in three independent experiments. Differences were considered statistically significant at p<0.05.
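For illustration, a two-sample Student's t-test can be sketched in Python using only the standard library. The text does not state whether equal variances were assumed or which software was used; this sketch assumes the classic pooled-variance form and obtains the two-sided p-value by numerically integrating the t density with Simpson's rule. The function name and the `steps` parameter are hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_t_test(a, b, steps=2000):
    """Pooled-variance two-sample t-test; returns (t, df, two-sided p).

    Assumptions: equal variances in both groups, at least two values per
    group, and a non-zero pooled variance. `steps` must be even.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # log of the normalizing constant of the t density with df degrees of freedom
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule for the probability mass between 0 and |t|
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = 2 * total * h / 3  # mass between -|t| and |t|
    p = max(0.0, 1.0 - central)
    return t, df, p
```

A construct would be called significant at the threshold given above when the returned p-value for its fold-change values against the promoter-only values is below 0.05.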

Bioinformatic search for transcription factor binding sites (TFBS) with MatInspector (Genomatix)
To estimate the number of potential TFBSs and the number of interacting transcription factors (TFs) that could represent potential candidate proteins for our positive ncECRs, we used MatInspector in an in silico approach. We chose two elements for this bioinformatic analysis. The MatInspector software utilizes a large library of matrices for TFBSs to locate matching DNA sequences. The program assigns a quality rating to matches and allows quality-based filtering and selection of matches. MatInspector can group similar or functionally related TFBSs into matrix families 30 .
In addition to the original human-mouse comparison, we added the sequences for dog and cow for comparisons. Only TFBSs that were present in all four species, in the same orientation, and at a similar distance to each other were considered. We ran two analyses with 10 and 15 nucleotides distance, respectively. We accepted only models in which at least four TFs can bind in a concerted way. Each TFBS can potentially bind several TFs.
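The cross-species filter described above can be sketched in Python. This is not MatInspector's implementation; it is a simplified illustration that assumes at most one predicted site per matrix family and species, and it interprets the distance criterion as the allowed variation in spacing between two sites across species. The function name, the input layout, and the family labels used with it are hypothetical, and the additional requirement of at least four concerted TFs is not modeled.

```python
from itertools import combinations


def conserved_site_pairs(hits, max_spacing_diff=10):
    """Return sorted pairs of TFBS families conserved across all species.

    hits maps species -> {family: (strand, position)}; at least one species
    is required. A pair is kept when both families occur in every species,
    each on the same strand in all species, and the spacing between the two
    sites varies by at most `max_spacing_diff` nucleotides.
    """
    species = list(hits)
    shared = set.intersection(*(set(hits[s]) for s in species))
    # keep families whose orientation is identical in all species
    same_strand = sorted(
        fam for fam in shared
        if len({hits[s][fam][0] for s in species}) == 1
    )
    pairs = []
    for fam_a, fam_b in combinations(same_strand, 2):
        spacings = [hits[s][fam_b][1] - hits[s][fam_a][1] for s in species]
        if max(spacings) - min(spacings) <= max_spacing_diff:
            pairs.append((fam_a, fam_b))
    return pairs
```

Running the function once with `max_spacing_diff=10` and once with `max_spacing_diff=15` would mirror the two analyses mentioned in the text.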
We also computationally tested all possible TFs for interactions with the SNCA promoter region, which were retrieved from the proprietary ElDorado database (Genomatix, Munich, Germany). In this database, promoters are defined and ranked by transcription start sites, corresponding known mRNA or EST sequences and by orthologous conservation.

Results
Functional non-coding conserved elements within the SNCA genomic locus
Overall, 12 of 34 conserved non-coding elements exhibited either an increase or reduction of the expression of the luciferase reporter gene (Figure 2 and Dataset 1). Three elements upstream of the SNCA gene (U3, U4-1, and U4-3) displayed a significant, approximately 1.5 fold (p<0.009) increase in expression (Figure 2A). Of the intronic regions, three showed a 1.5 fold increase (I2, I6, I8) and two others, I5 and I12, showed a 2 and 2.5 fold increase in expression (p<0.002), respectively (Figure 2B). Three elements downstream of the SNCA gene showed an approximately 2 fold (D1 and D2) and 2.5 fold (D3) increase (p<0.0009) (Figure 2C). One downstream element, D6, reduced the expression of the reporter gene to 0.35 fold (p<0.0009) of normal activity (Figure 2C, green), which was also confirmed after cloning the D6 element into a pGL3 control vector (Figure 2C, insert). The pGL3 control vector contains the SV-40 promoter and an SV-40 enhancer element. The D6 element reduced the expression of the pGL3 control construct by ~50%, confirming that this element represents a repressor. Between 4 and 12 replicates were performed per ncECR.
These data provide experimental evidence that a significant proportion of the ncECRs show a regulatory function in the luciferase reporter assay.
In silico analysis reveals potential binding of midbrain transcription factors to regulatory conserved regions
We performed MatInspector (Genomatix) analysis 30 on the two elements with the highest fold change in the luciferase assay (I12: chr4:90940532-90940786 and D6: chr4:90855871-90856339, Human Genome assembly NCBI36/hg18). In addition to the original human-mouse comparison to identify the ncECRs, we added the sequences from dog and cow. Only TFBSs that were present in all four species, in the same orientation, and at a similar distance to each other were considered. We ran two analyses with 10 and 15 nucleotides distance, respectively. We accepted only models in which at least four TFs can bind in a concerted way. Each TFBS can potentially bind several TFs. Interestingly, using this more restricted model, five factors showed an interaction with the SNCA promoter as well as with the ncECRs (Figure 3A). These factors were the Paired-like homeodomain transcription factor 3 (PITX3), the Homolog of Drosophila orthodenticle 2 (OTX2), the Nuclear receptor subfamily 3, group c, member 1 (NR3C1) or glucocorticoid receptor (GCCR), the Androgen receptor (AR), and the general transcription initiation factor TATA box-binding protein (TBP).
It is intriguing to note that by searching for TFs that bind to both the promoter and the functional ncECR, several DNA-binding proteins were found that are linked to dopaminergic regulation and susceptibility for nigrostriatal impairment. Two of these TFs (PITX3 and OTX2), which are implicated in determination of a dopaminergic phenotype in the substantia nigra, emerged from this preliminary search 31,32 . PITX3 has been shown to be regulated in a negative feedback circuit through the microRNA miR-133b to fine-tune maintenance of dopaminergic neurons 33 . In an association study, a SNP in the PITX3 promoter was reported to be associated with PD and might dysregulate expression of PITX3 34 , suggesting that transcription factors play a critical role not only in the development and differentiation of dopaminergic neurons, but also in their maintenance and survival.
GCCR and AR belong to a class of nuclear receptors called activated class I steroid receptors. GCCR is a cytosolic ligand-activated transcription factor that regulates the expression of glucocorticoid-responsive genes. GCCR shows strong anti-inflammatory and immunosuppressive effects. Interestingly, impaired GCCR expression in a mouse model shows a dramatic increase in the vulnerability of the nigrostriatal dopaminergic neurons to a toxic insult of MPTP 35 .
Taken together, this preliminary in silico screen resulted in very intriguing new candidates that might directly regulate SNCA expression and could play a role in the pathological processes that underlie PD.

Data are ratios of luminometer readings for firefly luciferase and Renilla luciferase. Ratios were normalized to Prom. Each non-coding element is labeled and data are presented under each element. Elements are organized according to Figure 2A-C.

Discussion
A major focus in PD research has been on post-translational modification of α-syn. The alterations seen in PD that were linked to disease pathogenesis were nitrated α-syn and α-syn phosphorylated at serine 129, identified in Lewy bodies and Lewy neurites 36,37 . However, gene transcription as a control point, and its regulation in particular cell types or in response to cellular signals, has only recently begun to be addressed for PD-relevant genes.
Our results show that potential regulatory regions are not restricted to the promoter of the SNCA gene as discussed in the introduction, but are likely to be located also in other intronic and intergenic regions ( Figure 3B). Comparing our results to similar screens, where conserved regions range from 8-45 elements 38,41 , we found a similar number of functional elements in our screen that show a high evolutionary conservation.
The promoter region is not the only driver of a gene's transcription and expression. Other cis-acting genomic regions of a given gene, up to several hundred kb away, can serve as enhancers, silencers, or modifiers to ensure the accurate temporal and spatial expression of the gene by recruiting transcription activating or silencing factors that bind to them 38 . There is ample precedent for this approach to analyzing genomic regions of genes implicated in human disease. Mutations in such conserved elements were found to cause human genetic syndromes, for example SALL1/Townes-Brocks syndrome 39 or SHH/preaxial polydactyly 40 . Other groups have investigated the non-coding regulatory elements within disease genes such as RET (Ret proto-oncogene) and MECP2 (Methyl-CpG binding protein 2) and found multiple regulatory enhancer and silencer elements 38,41 .

Conclusion
This screen of evolutionarily conserved genomic elements in the SNCA locus revealed a number of functional elements that modulated the expression of a reporter gene in an in vitro assay. Furthermore, we identified intriguing new candidate transcription factors that could directly regulate SNCA expression and could, if binding is altered by genetic variants, play a role in the pathological processes that underlie PD. This is the first step in a systematic analysis of the SNCA locus to understand its transcriptional regulation in more detail. Further studies are needed in neuronal tissues (e.g. dopaminergic neurons derived from patient-specific induced pluripotent stem cells) to confirm these findings and expand the analysis to identify SNCA-regulating transcription factors. By defining the transcription factors that regulate expression, and potentially the overexpression of α-synuclein that can lead to neurodegeneration, we will be able to identify targets for novel therapeutic approaches for α-synucleinopathies including Parkinson's disease.

Competing interests
No competing interests were disclosed.

The article by Sterling et al. has described the identification and functional analysis of evolutionarily conserved non-coding elements that might be involved in the transcriptional regulation of the SNCA gene, mutations in which were associated with Parkinson's disease. This is a very interesting, proof-of-concept article, with an attempt to provide pathogenic insight from the point of view of regulatory genomics for a complex human disease. I endorse the indexing of this manuscript.

Grant information
It is now well recognized that ~98% of the human genome does not code for proteins. Comparative genomics studies revealed that the majority of evolutionarily conserved regions consist of non-coding elements that might be involved in regulating gene expression. Genome-wide association studies (GWAS) have shown that the majority (~93%) of SNPs contributing to human diseases or susceptibility lie outside protein-coding regions, and many non-coding SNPs have been demonstrated to be associated with common diseases and traits.

8. Supp Table: there is a typo in the coordinates of D2. In the footnote include the human genome assembly of the coordinates.

The identification of Transcription Factor Binding Sites (TFBS) is an important step required in order to evaluate the transcriptional regulation network of the SNCA gene. To this end, the computational prediction of TFBS is a classic approach that gives preliminary data but should be interpreted with caution. Integration of the classic approach with new models described in Mathelier & Wasserman (2013) is highly recommended. The relation between TF motifs and in vivo binding sites is far from simple. The analysis lacks information about the context of the identified sequences. TFs are highly context-specific, and the same TF typically binds to different genomic binding sites in different conditions. Obtaining information about the context could be helpful in better understanding the possible involvement of the predicted sites as TFBS. While this is beyond the scope of this study, this topic should be thoroughly discussed in the discussion section.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.
No competing interests were disclosed.

Birgitt Schuele
We very much appreciate the careful review and excellent comments, suggestions and future directions of the reviewers. We hope to have addressed all of the comments to the reviewers' satisfaction.
There is some inconsistency regarding the number of the ncECRs identified in the initial screen between the different sections of the article (32, 34, 37). Please make the corrections where needed.

Thanks so much for the comment. We made changes to reflect the correct number of 34 ncECRs. We counted ncECRs that were located very close to each other as one ncECR in the luciferase assay; therefore different numbers appeared in the text. That has been addressed.
Additional necessary control for the Luciferase experiments is a pGL-(SV40) promoter vector harboring an insert of a scrambled sequence whose size mimics the average insert size of the tested ECRs. This is required to control for the 'spacer' effect of ECR lengths.

We have included in our analysis three controls: 1

What method was used for the statistical analysis? It is also not clear in the text whether all significant changes were calculated in comparison to the SV-40 promoter-only vector. That should be described in detail in the method section.

A description of the analysis of luciferase assays was lacking and has now been added as a paragraph at the end of the Method section Cloning and luciferase assays. It reads as follows: "Statistical analysis: Differences among means were analyzed using a two-sample Student's t-test. For differences in transcriptional activation of the luc+ gene, ncECRs were tested in quadruplicate in three independent experiments. Differences were considered statistically significant at p<0.05."

To demonstrate the important implication of this study the authors are recommended to follow up on an event as an example. That is to say, to evaluate the effect of a genetic variation, a PD-associated SNP, on the regulatory function of the corresponding ECR using the luciferase system established in this work. Figure 3 demonstrates overlap between PD associated SNPs and ncECR, connecting these dots will be of high significance.

This is an excellent suggestion and will definitely be pursued in future work with this system, as it is the basis for understanding the transcriptional regulation of the SNCA locus for potential translational applications. The presented study was intended to understand the basic changes in transcriptional regulation within the SNCA locus.
Supp Table: there is a typo in the coordinates of D2. In the footnote include the human genome assembly of the coordinates.

We corrected the coordinates for D2, which was a duplicate of D1, to the correct genomic location chr4:90844830-90845413 and added the corresponding Human Genome assembly NCBI36/hg18 (March 2006) in the header.

Figure 2A X-axis: modify title to 'upstream….'

Correction has been made. It now reads in Figure 2A "Upstream SNCA conserved elements". For consistency, we also changed Figure 2B to "Intronic SNCA conserved elements" and capitalized Figure 2C "Downstream SNCA conserved elements".

Omit Figure 3A. Instead include a new panel to figure 3B that indicates the position of the putative binding sites of these TFs within SNCA locus.

We have modified Figure 3 according to the MatInspector network view with respective changes in the legend. We also included in the text which genomic sequences have been analyzed. Since this is a preliminary in silico analysis, we feel that the overview is sufficient and has to be validated in functional studies. As pointed out below by the reviewer, these analyses have to be taken with care and a grain of salt.
The identification of Transcription Factor Binding Sites (TFBS) is an important step required in order to evaluate the transcriptional regulation network of the SNCA gene. To this end, the computational prediction of TFBS is a classic approach that gives preliminary data but should be interpreted with caution. Integration of the classic approach with new models described in