SNPector: SNP inspecting tool to detect disease existence and drug response in direct naked sequencing depends on mutation index

In recent years, the number of disease cases of genetic origin has raised alarm worldwide and has sparked interest in precision medicine based on molecular biomarkers such as single nucleotide polymorphisms (SNPs). SNPs have drawn the attention of researchers because they can be used to diagnose diseases and to assess the effectiveness of medicines in patients, with the aid of advanced diagnostic computing systems that support early detection and prevention of disease. Extracting disease-associated SNPs is a bottleneck because it is difficult to select meaningful data and to discover SNPs without reference information to guide the software. The main objective of SNPector (https://github.com/peterhabib/SNPector) is to provide Python-based, user-friendly software that determines the existence of SNPs in a given sequence based on BLAST alignment and the NCBI ClinVar database, discovers the diseases associated with the detected SNPs, and calculates the linkage between the detected SNPs and other SNPs, while also listing the SNP chips used for SNP genotyping. An application program interface (API) is used to retrieve data, and several Python packages are used to visualize variants and their relationships to genes, diseases, phenotypes, and drugs.


Introduction
Single nucleotide polymorphisms (SNPs) are the most common genetic variations among individuals that have been experimentally confirmed in the human genome [1]. These random modifications of DNA bases may alter amino acid residues in protein sequences and thereby modify protein function, which can result in different disease conditions in individuals. Many SNPs have been reported to be disease-linked genetic markers in the human genome and have been used to uncover genes responsible for specific diseases. Identifying and characterizing a large number of markers will be important for relating significant mutations to SNPs and for discovering their involvement in the development of disease conditions. Clarifying the phenotype-association mechanisms of these variations is therefore vital for understanding the molecular details of disease onset and for creating novel therapeutic approaches [2][3]. So far, most research has concentrated on disease-associated SNPs (daSNPs) found in coding regions or exons, especially non-synonymous SNPs, which may change the biochemical function of the encoded proteins. Nevertheless, SNPs also occur in many other parts of genes, including promoters, introns, and 5′- and 3′-UTRs.
Therefore, the modifications in gene expression and their consequences for disease susceptibility and drug response vary depending on the location of the SNPs. SNPs in promoters influence gene expression by altering transcription, transcription-factor binding, DNA methylation and histone modifications [4][5][6][7][8][9][10][11]. Exonic SNPs affect cancer susceptibility by suppressing gene transcription and translation [12][13][14]. SNPs in intronic regions produce splice variants of transcripts and promote or disrupt the binding and function of non-coding RNAs [15][16][17]. SNPs in the 5′-UTR affect translation, whereas SNPs in the 3′-UTR affect microRNA (miRNA) binding [16][17][18][19]. SNPs in regions far from known genes decrease or increase the transcription of those genes through long-range cis effects [20]. The majority of daSNPs found so far, reportedly ninety-three percent, lie in non-coding regions such as introns, long terminal repeats (LTRs) and intergenic regions, which poses significant challenges for researchers trying to understand their role in disease.

Related work
With the increase in accessible variant data, software that processes the produced data can be used for the production of new

Aim of work
Current software and web applications depend on given SNP information, such as the SNP position, ID, allele, and the gene in which the SNP is located, to infer effects and phenotypes. Sometimes, however, the only available information is the sequence in which the SNPs are hidden, with no further SNP information and not even the name of the region from which the sequence was taken. SNPector detects each SNP together with its position on the gene, scaffold name, associated disease or phenotype, drugs linked to the disease, and adverse drug reaction annotation.

Running SNPector
SNPector runs BLAST locally to find out where the given sequence, or query, is located on the human genome. Once the query has been located on the genome, SNPector collects all SNPs within the query range from the NCBI ClinVar database, compares the allele of each collected SNP with the corresponding base in the query, and saves the record if the SNP's alternative (mutated) allele matches the query allele.
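The allele comparison step can be illustrated with a minimal Python sketch. It is not SNPector's actual code: the `Snp` record, the function name `detect_snps`, and the simplifying assumptions (an ungapped, forward-strand BLAST hit with 1-based genomic coordinates, single-base variants only) are hypothetical and chosen only to make the logic concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snp:
    """Hypothetical, simplified ClinVar-style record for a single-base variant."""
    rsid: str
    position: int  # 1-based genomic coordinate
    ref: str       # reference allele
    alt: str       # alternative (mutated) allele


def detect_snps(query: str, hit_start: int, candidates: List[Snp]) -> List[Snp]:
    """Return the candidates whose alternative allele appears in the query.

    Assumes an ungapped, forward-strand alignment, so query index i
    corresponds to genomic position hit_start + i.
    """
    hit_end = hit_start + len(query) - 1
    detected = []
    for snp in candidates:
        if not hit_start <= snp.position <= hit_end:
            continue  # SNP lies outside the aligned query range
        query_base = query[snp.position - hit_start].upper()
        if query_base == snp.alt.upper():
            detected.append(snp)  # query carries the mutated allele
    return detected


if __name__ == "__main__":
    query = "ACGTTGCA"  # made-up sequence, assumed to align at positions 100..107
    known = [
        Snp("rs_demo1", 102, "A", "G"),  # query has G at 102: alternative allele
        Snp("rs_demo2", 104, "T", "C"),  # query has T at 104: reference allele
        Snp("rs_demo3", 200, "A", "T"),  # outside the aligned range
    ]
    print([snp.rsid for snp in detect_snps(query, 100, known)])
```

A real alignment may contain gaps or map to the reverse strand, in which case the index arithmetic above would need the coordinate mapping reported by BLAST rather than a simple offset.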

Results preparation
At the end of SNP scanning, the output consists of three files, each containing the detected SNPs according to its source database, in addition to "BLAST_RESULT.txt", where the BLAST output is saved. The first file created is "FromNCBI.tsv", which contains only the detected SNPs. Using the SNP identification numbers in the NCBI file, SNPector then extracts the same SNPs from the other databases into the files "FromAwesom.tsv" and "FromPharmGKB.tsv".
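The cross-referencing of SNP identification numbers between result tables can be sketched as a simple filter. The column names (`rsid`, `drug`), the function name `filter_by_rsid`, and the sample rows are assumptions made for illustration; the actual layout of SNPector's TSV files may differ.

```python
import csv
import io
from typing import Dict, List


def filter_by_rsid(detected_tsv: str, other_tsv: str,
                   id_column: str = "rsid") -> List[Dict[str, str]]:
    """Keep the rows of other_tsv whose SNP ID also appears in detected_tsv.

    Both arguments are tab-separated text with a header row; the ID column
    name is a hypothetical choice for this sketch.
    """
    detected_reader = csv.DictReader(io.StringIO(detected_tsv), delimiter="\t")
    detected_ids = {row[id_column] for row in detected_reader}
    other_reader = csv.DictReader(io.StringIO(other_tsv), delimiter="\t")
    return [dict(row) for row in other_reader if row[id_column] in detected_ids]


if __name__ == "__main__":
    # Made-up stand-ins for the detected-SNP table and a second database table.
    detected = "rsid\tdisease\nrs_demo1\tdemo disease\n"
    other = "rsid\tdrug\nrs_demo1\tdrug_x\nrs_demo9\tdrug_y\n"
    for row in filter_by_rsid(detected, other):
        print(row)
```

Building the set of detected IDs first keeps the lookup cost constant per row, which matters when the second database is large.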

Result post-preparation
After generating the result files, SNPector prepares the data from which the Circos plot and network will be built. The Circos and network scripts accept only a specific format in order to work correctly.

Visualization
SNPector uses different Python packages to:

SNPector in General
SNPector can extract variant data from a given DNA sequence by inspecting the query nucleotide by nucleotide and comparing it with the NCBI ClinVar dataset. With different databases integrated into SNPector, it is possible to reveal variation in the abundance of SNPs in the query compared with human genome variants originating from next-generation sequencing projects.
These data can then be used to generate visualizations of all output data elements. Figure 1 provides the workflow of the tool.

Implementation
To achieve user-friendly usage, SNPector can be run from the terminal with a simple command line whose parameters indicate how it should work. In the command shown in Figure (1), python3 is the program interpreter, as SNPector is built with the Python programming language, and scan_dna.py is the Python file in which the program is written. The remaining parameters are: -blaston, which runs BLAST on the given sequence against the genome to find where the sequence is located, or -blastoff, which switches BLAST off and uses previously generated results; -modesearch, which tells SNPector to find only the SNPs located within the query range, or -modescan, which investigates in depth whether each SNP is present in the query; -circoson, which draws a Circos figure illustrating where SNPs with the same scores are located; -networkon, which activates the script that links SNPs, diseases and drugs to produce a network HTML file; -download, which activates the API to download data for the extracted SNPs from the LDlink database; -vis, which runs the visualization and plotting of data to produce figures such as those in Figure (2); and GivenSequence.fasta, the file into which the user pastes the sequence in FASTA format. Any of the previous parameters can be deactivated by replacing it with -off.
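One way such on/off parameters could be interpreted is sketched below. The flag names come from the description above, but everything else is an assumption: the function `parse_options`, the option keys, and the rule that `-off` placeholders are simply skipped are hypothetical and do not reproduce how scan_dna.py parses its arguments.

```python
from typing import Dict, List, Optional, Union

OptionValue = Union[bool, Optional[str]]


def parse_options(argv: List[str]) -> Dict[str, OptionValue]:
    """Interpret SNPector-style switches (hypothetical parsing rules).

    Every feature starts disabled; a '-off' placeholder leaves it disabled.
    """
    options: Dict[str, OptionValue] = {
        "blast": False, "mode": None, "circos": False,
        "network": False, "download": False, "visualize": False,
        "fasta": None,
    }
    simple_switches = {
        "-circoson": "circos",
        "-networkon": "network",
        "-download": "download",
        "-vis": "visualize",
    }
    for token in argv:
        if token == "-off":
            continue  # deactivated parameter: keep the default
        if token in ("-blaston", "-blastoff"):
            options["blast"] = token == "-blaston"
        elif token in ("-modesearch", "-modescan"):
            options["mode"] = token[len("-mode"):]  # "search" or "scan"
        elif token in simple_switches:
            options[simple_switches[token]] = True
        elif token.startswith("-"):
            raise ValueError(f"unknown parameter: {token}")
        else:
            options["fasta"] = token  # anything else is taken as the FASTA path
    return options


if __name__ == "__main__":
    # Illustrative argument list: Circos and download replaced with -off.
    args = ["-blaston", "-modescan", "-off", "-networkon", "-off", "-vis",
            "GivenSequence.fasta"]
    print(parse_options(args))
```

Rejecting unknown tokens that start with a dash is a deliberate choice in this sketch, so that a mistyped switch fails loudly instead of being mistaken for the input file.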

Comparability to SNP effect Tools
SNPector's strength lies in its reliance on different databases to annotate the discovered SNPs. Many tools provide SNP annotation, but they are limited to the given SNP information. SNPector, on the other hand, is a new tool that extracts SNPs from a naked sequence when nothing other than the order of nucleotides is known. And the following

Results Deep Investigation
SNPector provides the user with a deeper and more visual view of the results, Figure (1). The reported linkage statistics are: D prime (D′), for which a value of 1 indicates that at least one expected haplotype combination is not observed; R squared (R2), a measure of the correlation of alleles for two genetic variants; and minor allele frequency, the frequency at which the second most common allele occurs in a given population, which is widely used in population genetics studies because it helps differentiate between common and rare variants in the population. R2 values range from 0 to 1, with higher values indicating a higher degree of correlation. An R2 value of 0 indicates that the alleles are independent, whereas an R2 value of 1 indicates that an allele of one variant perfectly predicts an allele of the other variant. R2 is sensitive to allele frequency. Figure (1) shows examples of the output images that SNPector can generate.
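To make these statistics concrete, the following sketch computes R2, the minor allele frequency of each variant, and the normalized disequilibrium coefficient D′ from haplotype counts of two biallelic variants, using the standard population-genetics definitions. SNPector retrieves such values from LDlink rather than computing them, so the function `ld_stats`, the `LdStats` container, and the example counts are purely illustrative.

```python
from typing import NamedTuple


class LdStats(NamedTuple):
    d_prime: float     # |D'|, between 0 and 1
    r_squared: float   # R2, between 0 and 1
    maf_first: float   # minor allele frequency of the first variant
    maf_second: float  # minor allele frequency of the second variant


def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> LdStats:
    """Compute LD statistics from the four haplotype counts.

    A/a are the alleles of the first variant, B/b those of the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes observed")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    variance_product = p_A * (1 - p_A) * p_B * (1 - p_B)
    if variance_product == 0:
        raise ValueError("both variants must be polymorphic")
    d = p_AB - p_A * p_B  # raw disequilibrium coefficient D
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return LdStats(
        d_prime=abs(d) / d_max,
        r_squared=d * d / variance_product,
        maf_first=min(p_A, 1 - p_A),
        maf_second=min(p_B, 1 - p_B),
    )


if __name__ == "__main__":
    # Made-up counts: haplotype Ab is never observed, so |D'| reaches 1,
    # while R2 stays below 1 because the allele frequencies differ.
    print(ld_stats(n_AB=60, n_Ab=0, n_aB=20, n_ab=20))
```

The example illustrates why R2 is described as sensitive to allele frequency: a missing haplotype is enough to drive D′ to 1, but R2 reaches 1 only when the two variants also have matching allele frequencies.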

Summary and Conclusion
One of today's most rapidly growing and catastrophic health problems is the accumulation of genetic disease in our genomes, and what makes it worse is failing to detect it before it manifests so that its severity can be reduced or it can even be cured. This difficult part of diagnosis and treatment cannot be done well without both omics sciences and computer science. SNPector combines computer science and omics sciences to provide the user with: a detailed explanation of the disease-associated SNPs in a given query, a network of diseases and treatments, illustrations of the SNPs in linkage disequilibrium with the SNPs detected in the query, a list of SNPs with the minor allele frequency and R squared values linked to the SNPs discovered in the query, and a list of SNP genotyping chips for the detected SNPs.

in the DNMT3A1 promoter is associated with susceptibility to gastric cancer by modulating promoter activity." PLoS One 9.3 (2014): e92911.