Molecular and in-silico analysis of single nucleotide polymorphism targeting human TP53 gene exon 5-8 in Sudanese esophageal cancer patients

Background: The protein product of the normal TP53 gene performs an essential function in cell cycle control and tumor suppression, and mutation of the TP53 gene is an essential step in the development of many cancers. Despite the reported association of TP53 gene mutations with many human cancers, comprehensive computational analysis of single nucleotide polymorphisms (SNPs) and their functional impacts remains rare. Methods: In this study, DNA was extracted from formalin fixed paraffin embedded samples, followed by conventional polymerase chain reaction and DNA sequencing. Computational analysis was performed using different algorithms to screen for deleterious SNPs. Results: The results demonstrate that there are synonymous SNPs (sSNPs) and non-synonymous SNPs (nsSNPs) in the TP53 gene that may be deleterious to p53 structure and function. Additionally, TP53 gene mutations were found in 40% of samples. Six of the ten TP53 gene mutations occurred in exon 5, two in exon 6, and two in exon 8. Only one SNP, at position E298Q, was predicted to have a neutral effect; the other SNPs were predicted to be disease related according to Mutation Taster software. A total of 37.2% of squamous cell carcinoma (SCC) samples were found to be mutated, of which 87.5% were in exon 5, 12.5% in exon 6 and 6.3% in exon 8, whereas adenocarcinoma (AC) showed a higher rate of mutation (57.1%) with 100% exon 5 involvement. Conclusions: Mutations of TP53 exon 5 were the most frequent in esophageal cancer patients. Genomic results identified a higher TP53 mutation rate in esophageal AC than in SCC.



Introduction
Esophageal cancer is considered one of the eight most common cancers throughout the world, and is also one of the most fatal, given its aggressiveness and reduced survival rate. Because of its poor prognosis, with 5-year survival rates ranging from 10% to 13%, it ranks sixth among all cancers in mortality rate [1][2][3][4].
Knockout of TP53 in mice leads to the development of different tumors, including lymphomas, sarcomas, adenocarcinoma and benign tumors such as hemangioma, before they reach 6 months of age [5].
The TP53 gene encodes a tumor suppressor protein which plays an important role inside the cell, especially in DNA transcription and repair, senescence, apoptosis, tumor suppression, treatment response and the response to changes in metabolism [6,7]. Protein domains are independently folding units of protein with a size of 40 to 200 amino acids. Human p53 protein contains three domains: transcriptional activation, DNA binding, and oligomerization domains. These domains are flanked by connecting regions. A proline-rich region links the transcriptional activation and DNA binding domains, a second proline-rich region links the DNA binding and oligomerization domains, and a basic region forms the C-terminus of the protein [8]. The evolutionarily highly conserved core domain (amino acids ~100 to ~300) is involved in sequence-specific binding to promoters of p53-regulated genes [9].
Single nucleotide polymorphisms (SNPs) are a significant type of genetic variation commonly detected in the human genome. SNPs occur in non-coding regions as well as in coding regions of the genome [10,11]. A total of 336,845,724 SNPs have been identified in humans so far and deposited in NCBI dbSNP. The human TP53 gene has 3115 identified SNPs. A SNP that arises in a coding region may cause an amino acid change in the corresponding protein, in which case it is called a non-synonymous SNP (nsSNP), or may leave the amino acid unchanged, in which case it is called a synonymous SNP (sSNP). nsSNPs can change the protein structure and hence its function, which may cause a specific disease [12,13].
Recently, a number of articles have demonstrated the association of SNPs in the TP53 gene with different cancer types, but in silico analysis of the functional, interactional and structural aspects of the different types of SNPs in this gene has not yet been discussed. In the current study, we used different bioinformatics prediction tools and databases to analyze these SNPs in the TP53 gene. As a significant number of mutations have an impact on protein stability and on interactions with the corresponding proteins, we also provide a structural model of the mutant protein. The main objective of this study is to detect mutations of the TP53 gene among esophageal cancer patients, focusing on exons 5 to 8, as these have been reported as the most mutated exons in this gene [14].

Sampling
Sections of 30-40 µm thickness from 50 formalin fixed paraffin embedded (FFPE) tissue samples were obtained from esophageal cancer patients representing different hospitals and clinics in Khartoum State, Sudan, from July 2013 to June 2017. All patients had previously been diagnosed with squamous cell carcinoma (SCC) or adenocarcinoma (AC).

DNA extraction
Genomic DNA for PCR analysis was extracted from FFPE tissue blocks using a commercial DNA extraction kit for fast isolation of genomic DNA from FFPE samples, as per the manufacturer's instructions (845-BP-0020250, black PREP FFPE DNA Kit, Analytik Jena Company). The extraction procedure combines an efficient lysis step with subsequent binding of genomic DNA to a spin filter surface, followed by washing of the bound DNA and, finally, elution of the DNA.

Dataset collection
The SNP information (SNP ID, mRNA accession number NM_000546, and protein accession number NP_000537) for the human TP53 gene used in our computational analysis was retrieved from the National Center for Biotechnology Information (NCBI) database and the Catalogue of Somatic Mutations in Cancer (COSMIC) database (TP53_ENST00000269305). The nucleotide and amino acid sequences of the p53 protein were obtained and investigated using the NCBI Nucleotide (NG_017013) and Gene (Gene ID: 7157) databases and the UniProt database (P04637).

Data analysis using different bioinformatics tools
Codon Code Aligner. Sequences were assembled into contigs, end-clipped and edited using Codon Code Aligner software version 8.0.1 (Dedham, MA, USA). Sequence data are available at GenBank under accession numbers MH366303 to MH366483 [17].

SIFT program. The SIFT (Sorting Intolerant from Tolerant) tool uses sequence homology to calculate the probability that an amino acid change affects protein function. It relies on the concept that evolutionarily conserved regions are less tolerant to mutations, so amino acid changes or frameshift mutations in these regions are expected to affect protein function the most. A query protein is submitted to the SIFT program and searched against a protein database to build an alignment of homologous protein sequences. The program then calculates a SIFT score based on the amino acid change at that position. A SIFT score ranges from 0 to 1. A score below 0.05 is predicted to affect protein function and is considered functionally deleterious, whereas any score greater than or equal to 0.05 represents a neutral substitution [18-20].
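The SIFT cutoff described above can be written as a small helper. This is a minimal sketch, not part of the SIFT software: the function name `classify_sift` and the example scores are hypothetical, and only the 0.05 threshold and the 0 to 1 range come from the text.

```python
SIFT_CUTOFF = 0.05  # threshold stated in the text


def classify_sift(score: float) -> str:
    """Label a SIFT score as 'deleterious' (< 0.05) or 'neutral' (>= 0.05)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores range from 0 to 1, got {score}")
    return "deleterious" if score < SIFT_CUTOFF else "neutral"


# Hypothetical scores, for illustration only (not results from this study).
example_scores = {"variant_a": 0.00, "variant_b": 0.05, "variant_c": 0.32}
labels = {name: classify_sift(s) for name, s in example_scores.items()}
```

Note that a score of exactly 0.05 falls on the neutral side of the cutoff, as stated in the text.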

PolyPhen-2. PolyPhen-2 (Polymorphism Phenotyping version 2) is a structural and functional prediction tool that predicts the effect of an amino acid change on protein characteristics based on SNP functional annotations and protein structural properties with sequence annotation, and finally predicts whether coding non-synonymous SNPs are damaging or not [21,22].
The PolyPhen-2 workflow requires the protein sequence, mutational position, and substitution. The PolyPhen output is a score that ranges from 0 to 1, with a score of zero indicating a neutral effect of the amino acid substitution on protein function and a higher score representing a mutation that is more likely to be damaging [23].

I-Mutant 3.0. I-Mutant 3.0 is a support vector machine (SVM) based tool, which was used to calculate the change in protein stability caused by a specific SNP. The wild-type and mutated residues, protein sequence, temperature, and pH were used as input parameters to this server, and the output reports whether a point mutation changes protein stability. The output is a Gibbs free energy change value (ΔΔG) of the protein before and after mutation. The program categorizes the prediction into: neutral mutation (-0.5 ≤ ΔΔG ≤ 0.5 kcal/mol), large decrease of stability (ΔΔG < -0.5 kcal/mol), and large increase of stability (ΔΔG > 0.5 kcal/mol) [24-27].
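The stability classes can be illustrated with a short function. This is a sketch under stated assumptions: it takes the three-class scheme to be a neutral band of ±0.5 kcal/mol around zero, with values below the band counted as a large decrease and values above it as a large increase. The name `classify_ddg` and the sample values are hypothetical.

```python
def classify_ddg(ddg_kcal_mol: float, band: float = 0.5) -> str:
    """Map a predicted ddG (kcal/mol) to a three-class stability label.

    Assumption: neutral if -band <= ddG <= band, otherwise a large
    decrease (ddG < -band) or a large increase (ddG > band).
    """
    if ddg_kcal_mol < -band:
        return "large decrease"
    if ddg_kcal_mol > band:
        return "large increase"
    return "neutral"


# Hypothetical ddG values, for illustration only.
sample = [-1.2, -0.5, 0.0, 0.5, 0.9]
sample_labels = [classify_ddg(v) for v in sample]
```

Values exactly at -0.5 or 0.5 kcal/mol are treated as neutral in this sketch.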

PhD-SNP. PhD-SNP (Predictor of Human Deleterious Single Nucleotide Polymorphisms) is a prediction tool that predicts the disease association of nsSNPs by dividing them into disease-related or neutral polymorphisms based on a score ranging from 0 to 1; SNPs with a score above 0.5 are considered disease associated according to the program algorithm. PhD-SNP outputs depend on the number of sequences aligned, the conservation index of the SNP position, and the frequencies of the wild-type and mutant residues [19,20,28].
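The PhD-SNP decision rule amounts to a single comparison. The sketch below assumes only what the text states (scores from 0 to 1, disease associated above 0.5); the function name and the example scores are hypothetical.

```python
PHD_SNP_CUTOFF = 0.5  # threshold stated in the text


def classify_phd_snp(score: float) -> str:
    """Label a PhD-SNP score as 'disease' (> 0.5) or 'neutral' (<= 0.5)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"PhD-SNP scores range from 0 to 1, got {score}")
    return "disease" if score > PHD_SNP_CUTOFF else "neutral"


# Hypothetical scores, for illustration only.
phd_labels = [classify_phd_snp(s) for s in (0.12, 0.50, 0.87)]
```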

Project HOPE. Structural and biochemical analysis of mutations was accomplished using Project HOPE, a web server that gives a comprehensive report on the effect of a specific mutation on the 3D structure of the native protein and the variant model using different software and sources. The user submits a protein sequence or the accession number of a specific protein, then specifies the wild-type residue and the new mutant form to create the report [29,30].

Mutation Taster. Mutation Taster calculates the pathogenic consequences of variations in DNA sequence. It predicts the functional impact of amino acid alterations and of intronic and synonymous substitutions, in addition to INDEL mutations and variants spanning intron-exon junctions. The Mutation Taster prediction system divides alterations into four classes: disease-causing (probably deleterious), disease-causing automatic (known to be deleterious), polymorphism (probably harmless), and polymorphism automatic (known to be harmless) [31-33].

FATHMM. FATHMM (Functional Analysis Through Hidden Markov Models) is a web server that predicts the functional significance of both coding and non-coding variants. We selected the cancer option to display predictions that can distinguish between cancer-promoting/driver mutations and other germline polymorphisms. It uses a default prediction threshold of -0.75. Predictions with scores less than this indicate that the mutation is potentially cancer associated [34].
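The FATHMM cancer threshold can be expressed in the same way. This is an illustrative sketch that uses only the default threshold of -0.75 given in the text; the function name, the label strings, and the example scores are hypothetical.

```python
FATHMM_CANCER_THRESHOLD = -0.75  # default threshold stated in the text


def classify_fathmm_cancer(score: float,
                           threshold: float = FATHMM_CANCER_THRESHOLD) -> str:
    """Return 'cancer-associated' for scores below the threshold."""
    return "cancer-associated" if score < threshold else "other"


# Hypothetical scores, for illustration only.
fathmm_labels = {s: classify_fathmm_cancer(s) for s in (-9.5, -0.75, 1.3)}
```

Unlike SIFT and PhD-SNP, FATHMM scores are not confined to the 0 to 1 range, so no range check is applied here.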

Results of TP53 gene mutations in esophageal carcinomas
Esophageal squamous cell carcinoma accounted for 43 (86%) of all cases, whereas adenocarcinoma accounted for 7 cases (14%). Mutation analysis demonstrated a higher TP53 mutation rate in esophageal adenocarcinoma than in squamous cell carcinoma. Table 2 describes the histopathological diagnosis, mutational status and exons affected in esophageal cancer patients.
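The case proportions, and the mutation rates quoted in the abstract, can be checked with simple arithmetic. In the sketch below the case counts (43 SCC, 7 AC) come from the text, while the mutated-case counts (16 SCC, 4 AC) are an assumption back-calculated from the reported 37.2% and 57.1% rates rather than values stated in this section.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


cases = {"SCC": 43, "AC": 7}    # from the text
mutated = {"SCC": 16, "AC": 4}  # assumption: back-calculated from 37.2% and 57.1%

total_cases = sum(cases.values())
case_share = {k: pct(n, total_cases) for k, n in cases.items()}
mutation_rate = {k: pct(mutated[k], cases[k]) for k in cases}
overall_rate = pct(sum(mutated.values()), total_cases)
```

Under these assumed counts the subtype rates and the overall 40% rate are mutually consistent.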
The percentage of deleterious nsSNPs predicted by SIFT and PolyPhen was 66.7% (Figure 2); these SNPs were A161D, K164E, R175P and S215N (Table 3-Table 4). The I-Mutant suite gave the same percentage of deleterious nsSNPs, 66.7% (Table 5), but the SNPs it predicted to be deleterious were A161D, R175P, S215N and M160V (Table 6). The PhD-SNP report defines 5/6 (83.3%) of nsSNPs as disease-related polymorphisms (Table 7).
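The overlap between the tools can be made explicit with set operations. This sketch uses only the variant lists given above and assumes a denominator of six nsSNPs, as implied by the "5/6" reported for PhD-SNP; the variable names are illustrative.

```python
# Deleterious nsSNPs as listed in the text for each tool.
sift_polyphen = {"A161D", "K164E", "R175P", "S215N"}
i_mutant = {"A161D", "R175P", "S215N", "M160V"}

TOTAL_NSSNPS = 6  # assumption: implied by the 5/6 reported for PhD-SNP

consensus = sorted(sift_polyphen & i_mutant)            # flagged by both
only_sift_polyphen = sorted(sift_polyphen - i_mutant)   # flagged by SIFT/PolyPhen only
only_i_mutant = sorted(i_mutant - sift_polyphen)        # flagged by I-Mutant only

share_per_tool = round(100 * len(sift_polyphen) / TOTAL_NSSNPS, 1)
share_phd_snp = round(100 * 5 / TOTAL_NSSNPS, 1)
```

Although both approaches flag four of six nsSNPs, only A161D, R175P and S215N are shared; K164E and M160V are each flagged by one approach only.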
The results reveal that the SNP at position E298Q was predicted to be a neutral polymorphism, representing 10% of all mutations detected. All other SNPs (M160V, A161D, A161A, Y163Y, K164E, R175P, S215N, P222P, and K305K), representing 90% of all SNPs, were predicted to be disease related according to MutationTaster software (Table 8).
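The ten variants listed above can be sorted into sSNPs and nsSNPs, and mapped to exons, directly from their labels. This is a sketch, not the authors' pipeline: the codon ranges per exon are an assumption (commonly cited approximate boundaries for TP53 exons 5-8, not taken from this text), and the helper names are hypothetical.

```python
import re
from collections import Counter

# Assumption: approximate codon ranges for TP53 exons 5-8 (not from the text).
EXON_CODONS = {5: (126, 186), 6: (187, 224), 7: (225, 261), 8: (262, 306)}

VARIANTS = ["M160V", "A161D", "A161A", "Y163Y", "K164E",
            "R175P", "S215N", "P222P", "E298Q", "K305K"]

_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_variant(label):
    """Split a label such as 'R175P' into (reference, codon, alternate)."""
    match = _PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"Unrecognised variant label: {label}")
    ref, codon, alt = match.groups()
    return ref, int(codon), alt


def snp_type(label):
    """'sSNP' if the amino acid is unchanged, otherwise 'nsSNP'."""
    ref, _, alt = parse_variant(label)
    return "sSNP" if ref == alt else "nsSNP"


def exon_of(label):
    """Exon number containing the codon, or None if outside exons 5-8."""
    _, codon, _ = parse_variant(label)
    for exon, (first, last) in EXON_CODONS.items():
        if first <= codon <= last:
            return exon
    return None


type_counts = Counter(snp_type(v) for v in VARIANTS)
exon_counts = Counter(exon_of(v) for v in VARIANTS)
```

With the assumed boundaries, the counts come out as six nsSNPs and four sSNPs, with six distinct variants in exon 5, two in exon 6 and two in exon 8, which matches the distribution given in the abstract.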
The frequency of SNPs among the different samples is shown in detail in Table 9. The FATHMM server cancer association predictions for the non-synonymous changes found in TP53 gene exons 5-8 are given in Table 10.
The different mutations, with their position, wild-type and mutant forms, alignment and chromatogram, are illustrated in Figure 3-Figure 8.

Discussion
Mutations of the TP53 gene can lead to loss of functional characteristics in tumor cells: mutant TP53 may not play its assigned role in repairing cellular machinery, leading to a loss of normal function, and cells with the mutant gene may subsequently show uncontrolled replication, which leads to accumulation of p53 protein [13].
Of the patients with TP53 gene mutations, 8 (40%) were male and 12 (60%) were female. TP53 mutations were found in 22.9% of esophageal SCCs and 14.3% of esophageal ACs in a study done by

Exon 5 of the TP53 gene was observed to be the most mutated of the exons investigated in this study, with 17/20 (85%) of all mutations detected found in exon 5, 2/20 (10%) in exon 6 and 1/20 (5%) in exon 8, while exon 7 showed no mutations. Uchino et al. [37] reported a mutation distribution of 39.3% in exon 5, 32.1% in exon 6, 17.9% in exon 7 and 10.7% in exon 8. This partly agrees with our finding that exon 5 has the highest mutation rate; the differences in the mutation rates of the other exons may be attributable to the small sample size used in this study.
A total of 10% of all detected mutations were classified as neutral polymorphisms, while 90% were considered disease-causing according to Mutation Taster, which is a high rate of disease-causing mutations. The rate is lower with the other tools, owing to differences in their algorithms, linked databases, and other characteristics.

Conclusions
Mutations of exon 5 of the TP53 gene were the most frequent in esophageal cancer. Genomic results identified a higher TP53 mutation rate in esophageal adenocarcinoma than in squamous cell carcinoma.

Ethical considerations
This study was approved by the Institutional Ethics Committee, Sudan University of Science and Technology (ethical committee reference number DSR-IEC-13-05). Patient consent could not be obtained because most of the patients had died and the rest could not be traced due to lack of contact data. Therefore, all samples and medical data used in this study have been irreversibly anonymized to ensure patient privacy.

Data availability
TP53 sequences for this study have been submitted to NCBI via BankIt and have been assigned the accession numbers MH366303 to MH366483. The sequences are available as 4 PopSet entries for exons 5-8: Exon 5: 1472901613; Exon 6: 1472901713; Exon 7: 1472901809; Exon 8: 1472901897. TP53 sequence results were submitted in a zipped file. These sequencing results, as received from BGI Company (China), cover the 50 esophageal cancer patients in this study and the four sets of primers for exons 5, 6, 7 and 8. The sequencing files can be viewed using FinchTV and Notepad.

Grant information
The author(s) declared that no grants were involved in supporting this work.

Open Peer Review
7. Table 2 shows 18 mutations in exon 5, two in exon 6 and one in exon 8, but these data do not agree with the text, and the percentages are calculated from 10 mutations; these data must be reviewed so that they agree.
The description of the mutated SCC cases is inconsistent in the manuscript: it is mentioned that there are 15 cases, but 16 cases are described (13 in exon 5, 2 in exon 6 and 1 in exon 8). Here, too, the authors must present the information more clearly.
Finally, in the discussion the authors compare their results on mutations in the TP53 gene only with studies from China; it would be very interesting if they also compared them with similar studies carried out in other populations (European, American, Hispanic American, etc.).

If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Partly

Are the conclusions drawn adequately supported by the results? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Human molecular genetics, cancer genetics, human cytogenetics.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.