BIG DATA genetic code
[version 1; not peer reviewed] No competing interests were disclosed.
cf. the DOI-indexed self-citation below:
Cite as:
Praharshit Sharma (2023). Stirling's Approximation explains Limiting Value to Napier Constant attained by Praharshit Sharma's HyperProteoGenomic Equation. Zenodo.
https://doi.org/10.5281/zenodo.8372281
https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm
https://en.wikipedia.org/wiki/Generalized_mean
Consider, for instance, a simple hypothetical 8-mer coding sequence, ACATGAAC [cf. http://users.fred.net/tds/lab/papers/primer/primer.pdf]. Its mononucleotide frequencies are A = 1/2, C = 1/4, G = 1/8 and T = 1/8. In the Elementary CA model (Elementary Cellular Automaton model, cf.
https://mathworld.wolfram.com/ElementaryCellularAutomaton.html
https://en.wikipedia.org/wiki/Elementary_cellular_automaton )
proposed for computing O-SCUO* (Overall Synonymous Codon Usage Bias) as a representative measure of CCL (Codome Core Length), the harmonic mean (HM) was supposed a suitable measure to near-Nepitize the CCL. Here we advance an enhancement to the same, incorporating the generalized (power) mean in place of the HM; this yields p = 0.267864 for the above CDS example, such that the above p-norm of the background mononucleotide frequencies, when fitted into the ECA-QCA equation while accounting for the KLD (Kullback-Leibler divergence),
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
leads to exactly Nepit/nat = e upon setting p ≈ 0.267864.
This may be verified with the computation below:
https://www.wolframalpha.com/input?i=Solve+%282%5E%28e%29%29*%28%282%5E%281+-+3+p%29+%2B+2%5E%28-p%29+%2B+4%5E%28-p%29%29%5E%281%2Fp%29%29+%3D+256
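For readers without WolframAlpha, the same root can be recovered numerically; a minimal Python sketch (a bisection over the p-norm expression, assuming the frequencies A = 1/2, C = 1/4, G = 1/8, T = 1/8 of the example CDS):

```python
import math

# Mononucleotide frequencies of the 8-mer ACATGAAC: A=1/2, C=1/4, G=1/8, T=1/8.
freqs = [1/2, 1/4, 1/8, 1/8]

def lhs(p):
    # 2^e times the p-norm of the background frequencies
    pnorm = sum(f**p for f in freqs) ** (1/p)
    return 2**math.e * pnorm

# Solve lhs(p) = 256 by bisection; lhs is strictly decreasing in p on this bracket.
lo, hi = 0.05, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if lhs(mid) > 256:
        lo = mid
    else:
        hi = mid
p = (lo + hi) / 2
print(round(p, 6))  # ≈ 0.267864
```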
It is surmised that the minute deviation so computed, just as in the case outlined above, might stand as a testament not only to Chargaff's second parity rule [cf. https://rosalind.info/glossary/chargaffs-rules/] but also to GC% bias in coding sequences [cf. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057000].
*Self-citation of the commenter (preprint stage of scientific dissemination):
Overall Synonymous Codon Usage Bias strongly Correlates with Codome Core Length anent Harmonic Mean based Computation of Kullback-Leibler Divergence in an ECA-QCA System (RJCTxpertReview). Zenodo. https://doi.org/10.5281/zenodo.5516971
{000, 001, 010, 011, 100, 101, 110, 111} form the 8 vertices of a three-dimensional cube with unit volume (1^3), just as the 4 two-bit binary patterns {00, 01, 10, 11} occupy the 4 vertices of a two-dimensional square with unit area (1^2), and the 1-bit patterns {0, 1} are the endpoints of a one-dimensional line with unit length (1^1). It may be noted that these exponents equal the 'neighborhood' of the automata in question.
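The vertex correspondence can be made concrete by enumerating the bit patterns; an illustrative Python sketch:

```python
from itertools import product

# The n-bit binary patterns are exactly the 2^n vertices of the n-dimensional
# unit hypercube (n=1: endpoints of a unit line; n=2: corners of a unit square;
# n=3: corners of a unit cube).
for n in (1, 2, 3):
    vertices = [''.join(bits) for bits in product('01', repeat=n)]
    print(n, len(vertices), vertices)
```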
It thus becomes interesting to interpret the Big_Data_Genetic_Codome's volume, whose neighborhood has been proven to be near-Nepit (an approximation to Napier's constant) by means of the HyperProteoGenomic equation, as a real solution for X below (click on Real solution, Approximate form):
https://www.wolframalpha.com/input?i=solve+4%5E%284%5EX%29+%3D+20%5E20
wherein we now invoke the "Alternative representation" of the RHS (right-hand side) of the equation,
https://www.wolframalpha.com/input?i=1%5Ee
replacing 'z' with V, the Big_Data_Genetic_Codome's volume, namely V = x*y*z, where
x corresponds to the 1st codon position's dimension,
y corresponds to the 2nd codon position's dimension,
z corresponds to the 3rd codon position's dimension.
Now subjecting this to multivariate calculus,
https://www.wolframalpha.com/input?i=%28d%2Fdx%29%28d%2Fdy%29%28d%2Fdz%29%28e%5E%28x*y*z%29%29
and solving for V below, we have (Real solution, Approximate form):
https://www.wolframalpha.com/input?i=Solve+e%5EV+%3D+log_4%2820*log_4%2820%29%29
and then substituting V ≈ 0.999455 into the RHS of the expression above:
https://www.wolframalpha.com/input?i=%28e%5E%28V%29%29*%28%28V%5E2%29+%2B+%283*V%29+%2B+1%29+for+V+%3D+0.999455
Result = 13.5766, which correlates closely with the mean "Major Groove Width" of all 16 dinucleotides, as per the table provided by the 2009 Nucleic Acids Research paper below:
(12.15+12.37+13.51+12.87+13.58+15.49+14.42+13.51+13.93+14.55+15.49+12.37+12.32+13.93+13.58+12.15)/16 =
13.51375
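Both WolframAlpha steps and the table average can be checked together in a few lines of Python (the groove-width values are copied from the DiProDB table referenced below):

```python
import math

# Step 1: solve e^V = log_4(20 * log_4(20)) for V.
log4 = lambda t: math.log(t, 4)
V = math.log(log4(20 * log4(20)))          # V ≈ 0.999455

# Step 2: evaluate e^V * (V^2 + 3V + 1).
result = math.exp(V) * (V**2 + 3*V + 1)    # ≈ 13.5766

# Mean major-groove width over the 16 dinucleotides (values from DiProDB).
widths = [12.15, 12.37, 13.51, 12.87, 13.58, 15.49, 14.42, 13.51,
          13.93, 14.55, 15.49, 12.37, 12.32, 13.93, 13.58, 12.15]
mean_width = sum(widths) / len(widths)     # 13.51375

print(round(V, 6), round(result, 4), mean_width)
```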
Reference: https://diprodb.fli-leibniz.de/ShowTable.php
Friedel M, Nikolajewa S, Suehnel J, Wilhelm T. DiProDB: a database for dinucleotide properties. Nucleic Acids Research (Database issue) 37 (2009) D37-D40.
The minuscule difference/error (13.5766 - 13.51375) = 0.06285 may be justified, and at the same time be subject to improvement, via an extension of BEC-DFT (Binary Erasure Channel using Discrete Fourier Transforms) to QEC-DFT (Quaternary Erasure Channel using Discrete Fourier Transforms), as per the abstract by Lakshmi Prasad Natarajan presented at the 2022 JTG IEEE-ITSoC Summer School:
https://iitmandi.ac.in/jtg2022/program.html
By the logical flow of the links https://bit.ly/homoyeast, https://bit.ly/GCgenHsRnaSc, https://youtu.be/B3dVuP0Kzg0 and https://bit.ly/gc1gc2gc3e, we obtain the modulus of the first pair of complex solutions for 'x' in the 4th link above, sqrt((0.109019^2) + (0.389158^2)) = 0.40414, i.e. ≈ 40.414% (approximately the human genomic GC%), whereas the modulus of the second pair of complex solutions is sqrt((0.356044^2) + (0.172223^2)) = 0.39551, i.e. ≈ 39.551% (approximately the yeast mRNA GC%, or GC4d% [https://bit.ly/gcebfungi]). This is a great leap towards a biologically significant underpinning of the Big Data Genetic Code.
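The quoted moduli can be recomputed directly with Python's built-in complex arithmetic (the real and imaginary parts below are copied from the solution pairs in the 4th link):

```python
# Moduli of the two pairs of complex solutions, expressed as percentages.
z1 = complex(0.109019, 0.389158)   # first solution pair
z2 = complex(0.356044, 0.172223)   # second solution pair

gc_human = abs(z1) * 100   # ≈ 40.414 %, compared to the human genomic GC%
gc_yeast = abs(z2) * 100   # ≈ 39.551 %, compared to the yeast mRNA GC%
print(round(gc_human, 3), round(gc_yeast, 3))
```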
In the context of differential-state cellular automaton mapping, we define CC (channel capacity) as the proportionality constant for which the ratio of radii/neighborhoods tends to one.
https://en.wikipedia.org/wiki/Channel_capacity
For instance, solving the general equation for x in [ 2^(2^(3x)) = 4^(4^x) ]: taking logs (base 2) of both sides, we have (2^(3x))*log_2(2) = (4^x)*log_2(4), that is, 2^(3x) = (2^1)*(2^(2x)).
Therefore [ 3x = 2x + 1 ], implying that x = 1. Hence "3" is indeed the CC in this case.
Extending this logic to the BDGC, we have [ 4^(4^(e*x)) = 20^(20^x) ]. Solving, x = 0.997344 → 1. Hence, in this case, the channel capacity (CC) of the BDGC is justifiably e, Napier's constant.
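Because taking logarithms twice linearizes the equation in x, the BDGC case can be solved in closed form; a short Python check:

```python
import math

# 4^(4^(e*x)) = 20^(20^x): taking logarithms twice gives a linear equation in x:
#   e*x*ln(4) + ln(ln(4)) = x*ln(20) + ln(ln(20))
e, ln4, ln20 = math.e, math.log(4), math.log(20)
x = (math.log(ln20) - math.log(ln4)) / (e * ln4 - ln20)
print(round(x, 6))  # ≈ 0.997344, i.e. close to 1
```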
Background:
A 1981 paper (PMID: 7280057), authored by three Polish Air Force scientists, rigorously proved using differential calculus that the universal genetic code is Nepit/nat-encoded, that is, encoded in the base/radix of natural logarithms, base e (Napier's constant) = 2.718... This result was proved again ~30 years later, using a discrete-mathematics (cellular automaton) approach, by the sole author of this paper. Beyond the concretely successful HyperProteoGenomic equation included in this author's originally hypothesized Big Data Genetic Coding {1}, and the Shannon entropy/information-content correlations for immunologically relevant nonamer core peptides in HIV/HLA-class-II/T-cell and B-cell epitopes and ZEBOV (Zaire Ebola virus) proteomes {2}, this work attempts a satisfactorily approximate Shannon-Fano encoding of the 2019-nCoV cDNA and proteome, from the two-pronged perspective of both 4 nt (cDNA state-space cardinality) and 20 aa (amino-acid state-space cardinality). Our ongoing collaborative efforts with the Polish Air Force may be accessed here: https://www.researchgate.net/project/Quantum-DNA
Results:
Shannon-Fano encoding (ShFe) of Nepit genetic coding in the context of the universal proteomic mutation space (base = 20) reveals a taxa-ubiquitous 8-fold pentapeptide tandem repeat {3} peptide-sequence architecture (leaving room for "forbidden pentapeptides" {4}, together totalling 3,200,000), accommodating the exhaustive set of 8,000 tripeptides {5} (at least 16 of which are biologically significant {6}, e.g. the "RGD" motif in the case of SARS-nCoV-2) and an 11-aa C-terminal tail that could potentially harbor LCRs/CBZs (low-complexity regions/compositionally biased zones) of remnant amino-acid residues. The ShFe (Shannon-Fano encoding) {7} is achieved thus:
In terms of 20 aa: [2.7 = (27/10) = (54/20) = ((5/20)*8) + ((3/20)*1) + ((1/20)*11) = 2.7], the factors clearly conveying 8 pentapeptide epitopes for SARS-nCoV-2 vaccine design, one tripeptide (intended to be the "RGD" motif in SARS-nCoV-2), followed by an 11-aa C-terminal tail (LCRs/CBZs). For reference (based on the IEDB/Immune Epitope Database), see {8}.
In terms of 4 nt: [2.7 = (1 + 1 + 0.6 + 0.1) = ((1/2)*2) + ((1/4)*4) + ((1/5)*3) + ((1/20)*2) = 2.7], thereupon implying a "differential open reading frameshift" for the CDS (coding sequence) of the COVID-19 cDNA {9}.
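Both decompositions reduce to exactly 27/10; this can be confirmed with exact rational arithmetic:

```python
from fractions import Fraction as F

# 20-aa decomposition: 8 pentapeptides + 1 tripeptide + an 11-aa tail, over 20 residues.
aa = F(5, 20) * 8 + F(3, 20) * 1 + F(1, 20) * 11
# 4-nt decomposition as quoted above.
nt = F(1, 2) * 2 + F(1, 4) * 4 + F(1, 5) * 3 + F(1, 20) * 2

print(aa, nt)  # both equal 27/10 = 2.7
```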
Conclusions and Further work:
COVID-19 being a rapidly evolving pandemic in human history, this work should transform into a community effort. Already, we have open-access availability of the coronavirus reference genomes in NCBI, and the SARS-nCoV-2 PROSITE patterns {10}. Further work involves fine-tuning the BLASTn word size (7-11) and BLASTp word size (2-3), ML (machine-learning) based k-mer analysis of ShFe-ORF overlapping windows via elementary cellular automata rule progressions with randomized bagging of initial conditions, and tripartite coding (observing that 2.7^3 ≈ 20).
REFERENCES:
{1} Sharma P. BIG DATA genetic code [version 1]. F1000Research 2016, 5:171 (slides) (https://doi.org/10.7490/f1000research.1111304.1)
{2} Kanchinadham SPS. Max. Proteome Entropy and CMBR. (https://doi.org/10.7490/f1000research.1116442.1) https://vimeo.com/319410981
{3} https://en.wikipedia.org/wiki/Pentapeptide_repeat
{4} Tuller T, Chor B, Nelson N. Forbidden penta-peptides. Protein Sci. 2007;16(10):2251‐2259. https://dx.doi.org/10.1110/ps.073067607
{5} http://www.au-kbc.org/research_areas/bio/projects/protein/tri.html
{6} J. Med. Chem. 2011, 54, 5, 1111–1125 Publication Date:January 28, 2011 https://doi.org/10.1021/jm1012984
{7} Shannon-Fano coding https://planetcalc.com/8168/
{8} Lucchese, G. Epitopes for a 2019-nCoV vaccine. Cell Mol Immunol 17, 539–540 (2020). https://www.nature.com/articles/s41423-020-0377-z
{9} Information Theory Primer http://users.fred.net/tds/lab/papers/primer/primer.pdf
{10} SARS-CoV-2 Relevant PROSITE Motifs https://prosite.expasy.org/sars-cov-2.html
Quantum electromechanical systems are nano- to micrometer (micron) scale mechanical resonators coupled to electronic devices of comparable dimensions, such that the mechanical resonator behaves in a manifestly quantum manner. Towards realising quantum electromechanical systems, one usually begins with the phononic quantum of thermal conductance for suspended dielectric wires. An important rule of thumb for observing quantum behaviour is hv = kT, where h is Planck's constant = 6.62607004 × 10^(-34) m^2 kg/s and k is the Boltzmann constant = 1.38064852 × 10^(-23) m^2 kg s^(-2) K^(-1). Substituting the CMBR characteristic temperature of 2.726 K yields frequency v = 56.79 GHz (gigahertz), which falls within the microwave regime (microwaves span the range between 0.3 and 300 gigahertz).
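The quoted frequency follows from v = kT/h; a quick Python check using the constants as given:

```python
# Rule of thumb h*v = k*T  =>  v = k*T / h
h = 6.62607004e-34   # Planck's constant, m^2 kg / s
k = 1.38064852e-23   # Boltzmann constant, m^2 kg s^-2 K^-1
T = 2.726            # CMBR characteristic temperature, K

v_ghz = k * T / h / 1e9
print(round(v_ghz, 2))  # ≈ 56.8 GHz, inside the 0.3-300 GHz microwave band
```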
Recent computational understanding of the CRISPR/Cas9 code (December 2018), in so far as to "precisely" predict the outcome of CRISPR-mediated editing in the human genome, by a group from the Francis Crick Institute (UK), has led me to apply the above HyperProteoGenomic (HPG) equation, also taking into account the effects of the length of the sgRNA (single-guide RNA) used, which ranges from 17-20 nt of {A/C/G/T}. The references involved are:
https://phys.org/news/2018-12-scientists-crispr-code-precise-human.html
https://www.sciencedirect.com/science/article/pii/S1097276518310013?via%3Dihub
https://www.nature.com/articles/srep28566
Continuing my research statement prior to (and including) the above URLs, and given my HPG equation (which bijectively/1-1 maps universal intra-genomic variations to the global space of inter-proteomic single-point mutations; cf. my slides at PAN/Polish Academy of Sciences, plus the abstract and page-89 poster presentation proceedings of the 2nd Annual ELIXIR Danish Bioinformatics Conference, August 2016, http://bit.ly/bdgcode [1332 views, 97 downloads]), the text representation of my equation may be re-expressed with respect to the RHS as:
4^(4^e) = 20^20 = (4*5)^20 = (4^20) * (5^20) = (4^(17+3)) * (4+1)^20
Therefore, the RHS of my originally ideated HPG equation takes the form
(4^17) * (4^3) * (x+1)^20, where 17 = the minimum CRISPR sgRNA length (thereby (4^17) being the entire set of minimal 17-mers of sgRNA) ...[1],
3 = the codon length (ATG, the start codon, etc.) as per the universal genetic code, and
(x+1)^20 is subject to a simple binomial-series expansion (where x = 4) [2].
(Moreover, (4+1)^20 = (1+4)^20, which agrees nicely with "The precision of DNA editing is mainly determined by the fourth nucleotide upstream of the PAM site", strictly as per the Molecular Cell paper above.)
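The factorisation of the RHS is an exact integer identity (the approximate step is only 4^(4^e) ≈ 20^20) and can be checked directly:

```python
# Check the RHS factorisation quoted above with exact integer arithmetic:
#   20^20 = (4*5)^20 = 4^20 * 5^20 = 4^(17+3) * (4+1)^20
rhs = 20**20
assert rhs == (4 * 5)**20
assert rhs == 4**20 * 5**20
assert rhs == 4**17 * 4**3 * (4 + 1)**20
print("20^20 factorisation verified")
```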
While [1] above can potentially address phenomena such as codon usage/bias and the CAI (codon adaptation index), including wobbling of the 3rd base, based on MIT (molecular information theory; a brief review is linked below),
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3220916/
given that "It turns out that it can be expressed as an average of the information of individual sequences by adding together weights of the form Ri(b, l) = 2 + log_2(f(b, l)) for a DNA sequence", invoking the abstract of the above MIT article with an addendum of 70% efficiency (0.7) gives <<Avg. Info.>> ≈ 2.7, which agrees with my genetic-coding work, https://bioinformer.github.io/
with due regard to the "Biological Information Theory" works, courtesy of Dr. Tom:
http://users.fred.net/tds/lab/
And [2], as a mathematical beauty, can very well (and directly) be abstracted to the correspondence between the binomial expansion and Fibonacci numbers, within the context of the golden number/ratio encountered in CRISPR/Cas9:
https://biomedres.us/pdfs/BJSTR.MS.ID.000324.pdf
https://proofwiki.org/wiki/Fibonacci_Number_as_Sum_of_Binomial_Coefficients
wherein, putting x = 4, we can expand as per the Wolfram computation engine:
https://www.wolframalpha.com/input/?i=(x%2B1)%5E20%3D
Lastly, to "probe" the FBC (Fibonacci binomial coefficients) above: when we put k = K-2, we have
FBC = C(n - (K-2) - 1, K-2), where C denotes combination:
http://mathworld.wolfram.com/BinomialCoefficient.html
Thus, the minimum integer offset of the index above is 2 (not 3, as for the traditional codon), which equals [e] = [2.71...] = 2, where [] is the greatest-integer function.
The seminal work and references may be accessed at http://bit.ly/nobelse
FBC = C(n - K + 2 - 1, K-2) = C(n - K + 1, K-2).
Interestingly, for a given K, (n - K + 1) = the number of K-mers in an n-nt sequence.
https://arxiv.org/pdf/1308.2012.pdf
(Also see the induction hypothesis for the Fibonacci term F(n+2):)
https://proofwiki.org/wiki/Fibonacci_Number_as_Sum_of_Binomial_Coefficients#Induction_Hypothesis
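The binomial-Fibonacci identity and the K-mer count invoked above can be verified directly; a sketch using the ProofWiki identity F(n+1) = Σ_k C(n-k, k):

```python
from math import comb

def fib(n):
    """Fibonacci numbers with F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# ProofWiki identity: F(n+1) = sum over k of C(n-k, k).
for n in range(20):
    assert fib(n + 1) == sum(comb(n - k, k) for k in range(n // 2 + 1))

# For a given K, (n - K + 1) is the number of K-mers in an n-nt sequence:
seq = "ACATGAAC"          # the 8-mer example used earlier in this thread
K = 3
kmers = [seq[i:i + K] for i in range(len(seq) - K + 1)]
assert len(kmers) == len(seq) - K + 1  # 6 three-mers in an 8-nt sequence
print(kmers)
```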
Recent computational understanding of the CRISPR/ Cas9 code (December 2018), in so far... READ MORE
Recent computational understanding of the CRISPR/ Cas9 code (December 2018), in so far as to "precisely" predict the Outcome of CRISPR-mediated editing in the Human
genome by a Group from Francis-Crick institute (UK) has led me to Apply above Hyper-Proteo-Genomic (HPG) equation-- also having taken into account the Effects pronounced by "Length of sgRNA / single-guided RNA" used, that ranges from 17-20nt of { A/ C/ G/ T }. The references involved being,
https://phys.org/news/2018-12-scientists-crispr-code-precise-human.html
https://www.sciencedirect.com/science/article/pii/S1097276518310013?via%3Dihub
https://www.nature.com/articles/srep28566
Continuing my research statement prior to (and including) above URLs, and
my HPG-equation (that 1-1/ Bijectively) maps Universal INTRA-genomic variations to Global-space of INTER-proteomic single-point mutations ( cf. my Slides at PAN/ Polish Academy of Sciences+ Abstract+ Page-89 poster
presentation procedings at 2nd Annual ELIXIR-Danish Bioinformatics conference, August- 2016) http://bit.ly/bdgcode [ 1332 Views, 97 Downloads]
the text-Representation of my Equation may be re-Expressed wrt "RHS" as,
4^(4^e) = 20^20 = (4*5)^20 = (4^20) * (5*20) = (4^(17+3)) * (4+1)^20
Therefore, RHS of my Originally ideated HPG- equation takes the form of,
(4^17)* (4^3) * (x+1)^20, where "17"= minimum CRISPR-sgRNA length (Thereby (4^17) being ENTIRE set of minimal 17-mers of sgRNA...[1]
3= codon-Length (ATG, start codon etc.,) as per Universal Genetic Code &,
(x+1)^20 is subject to a Simple "Binomial-series" expansion (where x=4) [2]
( Moreover, ( 4+1)^20 = (1+4)^20 , which has Nice agreement with
"The precision of DNA editing is mainly determined by the fourth nucleotide upstream of the PAM site" , strictly as per "Molecular-Cell" paper Above.
While [1] above can potentially address phenomena such as "Codon usage/ bias" and CAI (i.e,) "codon adaptation index" including Wobbling of 3rd base based on "MIT"/ molecular-information-theory ( a brief review is herein below)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3220916/
Given, "It turns out that it can be expressed as an average of the information of individual sequences by adding together weights of the form Ri(b, l) = 2 + log_2 (f(b,l)) for DNA sequence" therefore, invoking Abstract of above "MIT" article with Addendum = 70 % efficiency,or (0.7), << Avg. Info. >> ~ 2.7, that agrees with my Genetic-coding work, https://bioinformer.github.io/
With due Regard to "Biological Information Theory" works Courtesy of Dr Tom,
http://users.fred.net/tds/lab/
And [2] as a Mathematical-beauty, can very well (and Directly) be abstracted to the correspondence between Binomial expansion and Fibonacci numbers within the context of "Golden-number/ ratio" encountered in CRISPR/ Cas9.
https://biomedres.us/pdfs/BJSTR.MS.ID.000324.pdf
https://proofwiki.org/wiki/Fibonacci_Number_as_Sum_of_Binomial_Coefficients
wherein, putting (x=4), we can Expand as per Wolfram Computation Engine,
https://www.wolframalpha.com/input/?i=(x%2B1)%5E20%3D
Lastly, to "probe" the "FBC"/ Fibonacci-Binomial-Coefficients above, when we Put ( k = K-2), we have,
FBC= { n - (K-2) -1 } "C" { (K-2) } , where "C" denotes combination...
http://mathworld.wolfram.com/BinomialCoefficient.html
Thus, the "minimum" Integer-offset of Index above= "2" (not 3, for Traditional Codon), which is= [e] = [2.71...] = 2, where "[]" is Greatest Integer Function,
wrt Seminal-work+ References may be accessible at, http://bit.ly/nobelse
FBC= { n - K + 2 - 1 } "C" { ( K-2) } = { n - K + 1 } "C" { (K-2) }
Interestingly, for given "K", ( n-K+1) = Number of "K-mers" in n-nt sequence.
https://arxiv.org/pdf/1308.2012.pdf
( Also see "Induction-Hypothesis" for Fibonacci-term, F (n+2) ),
https://proofwiki.org/wiki/Fibonacci_Number_as_Sum_of_Binomial_Coefficients#Induction_Hypothesis
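To probe this correspondence numerically, a minimal Python sketch (function names are mine) verifies the ProofWiki identity F(n) = Σ_k C(n-k-1, k), the same "shallow diagonal" sum of binomial coefficients referenced above:

```python
from math import comb

def fib_from_binomials(n):
    """Fibonacci F(n) as the shallow-diagonal sum of binomial
    coefficients: F(n) = sum_k C(n-k-1, k) (ProofWiki identity)."""
    return sum(comb(n - k - 1, k) for k in range((n - 1) // 2 + 1))

def fib(n):
    """Reference Fibonacci, F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The identity holds term-for-term; e.g. n = 20 gives F(20) = 6765.
for n in range(1, 21):
    assert fib_from_binomials(n) == fib(n)
print(fib_from_binomials(20))  # 6765
```

Note that the upper index n - k - 1 is exactly the FBC form above, and with the substitution k = K - 2 it becomes C(n - K + 1, K-2), tying in the K-mer count (n - K + 1).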
http://elixir-node.cbs.dtu.dk/wp-content/uploads/2017/02/DKBiC-2016-Collection-poster-programme-participants.pdf
rely on an infallibly reliable equation, inspired by the Principle of Computational Equivalence put forth in his seminal book "A New Kind of Science" (Wolfram S., 2002); specifically, the mapping of rules from lesser-cardinality elements (DNA, 4 nucleotides) to greater-cardinality elements (proteins, 20 amino acids),
http://mathworld.wolfram.com/ElementaryCellularAutomaton.html
giving rise to the mathematical representation of EVOLUTION at the sequence level of the Universal Genome, constrained by the coding of the Universal Proteome; translating in purely mathematical terms to 4^(4^x) = 20^20, yielding x = e ≈ 2.718... This elegantly establishes that the "channel capacity" of cDNA genetic-information transfer to proteins is Napier's constant e (the nepit), from Shannon's information-theory perspective.
Also, consider the "MMDUMSAP": Minimal Most Divergent Ungapped Multiple Sequence Alignment of Peptides. First, arrange the 20 amino acids (for simplicity, in alphabetical order) around a circle, and recurrently read off, up to 20 times, a 20-aa peptide, skipping one aa each time (for the sake of convention), clockwise each time (Wenbing Hou et al., Physica A, 2015, Figure 1). Represent these 400 data points, 20 aa per row, as an MSA (multiple sequence alignment), which is intentionally maximally divergent, since there is 0% conserved domain or motif in such a case,
http://dx.doi.org/10.1016/j.physa.2015.10.067
Therefore, we have 20 × 20 = 400 data points, each aa occurring with frequency 1/20 = 0.05. Hence, per a Shannon-entropy calculation, S = -400 × 0.05 × log_4(0.05) = 43.2193..., which is the same as the exponent above the 4 on the LHS of the HPG equation, justifying that this simplistic data format, the minimal MSA, includes the most extreme possible divergence of proteins.
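The entropy arithmetic above, and the solution of the HPG equation for x, can be checked in a few lines of Python (a minimal sketch; the variable names are mine):

```python
import math

# Shannon entropy (base 4) of the MMDUMSAP: 400 positions, each of the
# 20 amino acids occurring with frequency 1/20 = 0.05.
S = -400 * 0.05 * math.log(0.05, 4)
print(S)  # ~43.2193

# The same number is the exponent above the 4 on the LHS of the HPG
# equation 4^(4^x) = 20^20: taking log base 4 of both sides gives
# 4^x = 20 * log_4(20).
lhs_exponent = 20 * math.log(20, 4)
assert abs(S - lhs_exponent) < 1e-9

# Solving for x: x = log_4(20 * log_4(20)) ~ 2.7168, close to e ~ 2.71828.
x = math.log(lhs_exponent, 4)
print(x)
```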
In a previous comment, it was inferred that the data-compression fraction achieved for a p-base n-tuple is (p*n)/(n*(p^n)) = p^(1-n).
Therefore, invoking the RHS (right-hand side) of the HPG/HyperProteoGenomic equation [4^(4^x) = 20^(20^1)], we have p = 20, n = 1.
This implies that our novel concept of the "CODOME" codes for 100% of the Universal Proteome, since for the proteome RHS, p^(1-n) = 20^(1-1) = 20^0 = 1 = 100%, i.e., 0% of the data is compressed.
With reference to the "Protein Moment of Inertia" paper, Fig. 1 ("The location of 20 amino acids on the circumference"), herein:
https://www.sciencedirect.com/science/article/pii/S0378437115009267
We aim to achieve a complete graph with 20 nodes spread around the circumference of a circle; for ease of representation, 18 deg. apart.
So each node (denoted by its amino-acid 1-letter IUPAC-IUB symbol) is connected to each of the other 19 nodes, and to itself, thus opening up the possibility of deducing ANY given primary structure or protein sequence as simply a PATH traversal on this "Complete Graph of the CODOME".
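As a sketch of the two claims above (the p^(1-n) compression fraction, and the complete graph K20 on the amino-acid alphabet), assuming the standard 20-letter IUPAC alphabet and 18-degree spacing:

```python
from itertools import combinations

def compression_fraction(p, n):
    """Fraction of data retained for a p-base n-tuple code:
    (p*n)/(n*p^n) = p^(1-n)."""
    return (p * n) / (n * p ** n)

# Proteome RHS of the HPG equation, 20^(20^1): p = 20, n = 1.
assert compression_fraction(20, 1) == 20 ** 0 == 1  # 100%: 0% compressed

# Complete graph K20 over the 20 IUPAC one-letter amino-acid symbols,
# placed 360/20 = 18 degrees apart on a circle.
aas = "ACDEFGHIKLMNPQRSTVWY"
angles = {aa: i * 18 for i, aa in enumerate(aas)}
edges = set(combinations(aas, 2))
assert len(edges) == 20 * 19 // 2  # 190 undirected edges (plus 20 self-loops)
# Any protein sequence is then a path on this graph, with self-loops
# covering repeated residues, e.g. "MKKV": M -> K -> K -> V.
```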
PRIMARY WEB REFERENCE-
“For optimum results Illumina recommends a minimum coverage of 30x for normal tissue and 60x coverage for tumor samples.”
https://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/all-in-one-day-medicine-cancer-patients-infographic.pdf
Let us take the reciprocal of the HyperProteoGenomic equation, 4^(4^x) = 20^20. We have,
4^(-(4^x)) = 20^(-20)
Clearly, we note that the "RRHS"/reciprocal RHS of the above equation, 20^(-20), is an infinitesimally small quantity (0 < RRHS < 1), allowing us to equate it with the Lander-Waterman model's probability of a base NOT being sequenced, e^(-C), where C denotes sequencing coverage.
Before we proceed to the logical next step: this nearly justifies the original HPG/HyperProteoGenomic equation in question, in that x = e (to within 99.9455%) might have a corollary in "exponential tumor growth" (in cancer lesions).
Therefore, to compute C (the hypothesized coverage for the RRHS above):
e^(-C) = 20^(-20), which yields
C = 20 ln(20) ≈ 59.9146... (≈ 60x),
which is 99.86% of the 60x coverage that is "optimal" (PE/paired-end) for tumor samples; cf. the Intel web reference already mentioned above,
https://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/all-in-one-day-medicine-cancer-patients-infographic.pdf
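The coverage computation can be reproduced directly: solving e^(-C) = 20^(-20) gives C = 20 ln 20 (a minimal check):

```python
import math

# Equating the reciprocal RHS, 20^(-20), with the Lander-Waterman
# probability that a base is NOT sequenced, e^(-C):
#   e^(-C) = 20^(-20)  =>  C = 20 * ln(20)
C = 20 * math.log(20)
print(C)             # ~59.9146, i.e. ~60x
print(100 * C / 60)  # ~99.86% of the 60x tumor-sample figure
```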
Observing that the data-compression fraction achieved (w.r.t. Slide [03/17]) in the case of base-2, 3-mer/3-tuple logic is
(2*3) / (3*(2^3)) = 2^(1-3) = 2^(-2) = 0.25 = 25%
http://www.wolframalpha.com/input/?i=(2*3)%2F8%3D
Strikingly, we find agreement with the "nepit"-scaled e-mer, 4-nt logic, because
e * 4^(1-e) ≈ 0.2511 ≈ 25.11%
http://www.wolframalpha.com/input/?i=e+*+4%5E(1-e)
Now, let us take the simple, straightforward case of 2-bit ("crumb") 4-nt (A|C|G|T) logic,
wherein we ignore the "central" nucleotide (see the i3-encoding publication URL below).
We see, 4^(1-2) = 4^(-1) = 1/4 = 0.25 = 25%
Hence, in ALL 3 cases, the BDGC data compression achieved = (100 - 25)% ≈ 75%.
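All three compression fractions can be verified together (a minimal sketch; variable names are mine):

```python
import math

# Case 1: base-2, 3-mer logic
case1 = (2 * 3) / (3 * 2 ** 3)      # = 2^(1-3) = 0.25
# Case 2: "nepit"-scaled e-mer, 4-nt logic
case2 = math.e * 4 ** (1 - math.e)  # ~ 0.2511
# Case 3: 2-bit ("crumb") 4-nt logic
case3 = 4 ** (1 - 2)                # = 4^(-1) = 0.25

for frac in (case1, case2, case3):
    print(f"retained {frac:.4f} -> compressed ~{100 * (1 - frac):.1f}%")
```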
The central-nt independence (except for the singular case of serine) is justified by the outer-totalistic logic at play in the i3-encoding paper by the same author,
http://ijcb.in/ijcb/v2/index.php/ijcb/article/view/7/8
(For an Example Model based upon Buy-Sell dynamics in Finance, please consult=
http://www.wolframscience.com/nksonline/page-432 )
Even otherwise, "silent" mutations of serine (the exception) are evolutionarily neutral:
https://en.wikipedia.org/wiki/Silent_mutation#Transfer_RNA
To summarize, the (de)methylation status (0/1) of the 8-histone octamer may be elegantly characterized within the context of chromatin architecture by considering the 3 cells in the ECA above to be 3 parallel processors (READER + WRITER + ERASER), which collectively and decisively determine the CpG-island-mapped methylation status of the 8 outputs (octamers),
elegantly on the basis of the 88 (of 256) fundamentally inequivalent Boolean/algebraic rules:
http://atlas.wolfram.com/01/01/views/173/TableView.html
http://atlas.wolfram.com/01/01/views/172/TableView.html
http://mathworld.wolfram.com/ElementaryCellularAutomaton.html
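The count of 88 fundamentally inequivalent rules can be reproduced by canonicalizing each of the 256 ECA rules under the mirror (left-right reflection) and complement (black-white exchange) symmetries (a sketch; function names are mine):

```python
def rule_table(rule):
    """Output bit for each of the 8 neighborhoods (l, c, r),
    using Wolfram's rule-number convention."""
    return {(l, c, r): (rule >> (l * 4 + c * 2 + r)) & 1
            for l in (0, 1) for c in (0, 1) for r in (0, 1)}

def to_number(table):
    return sum(bit << (l * 4 + c * 2 + r) for (l, c, r), bit in table.items())

def mirror(table):
    # Left-right reflection: swap the roles of l and r.
    return {(l, c, r): table[(r, c, l)] for (l, c, r) in table}

def complement(table):
    # Black-white exchange: complement inputs and output.
    return {(l, c, r): 1 - table[(1 - l, 1 - c, 1 - r)] for (l, c, r) in table}

def canonical(rule):
    t = rule_table(rule)
    variants = [t, mirror(t), complement(t), mirror(complement(t))]
    return min(to_number(v) for v in variants)

classes = {canonical(r) for r in range(256)}
print(len(classes))  # 88 fundamentally inequivalent rules
```

The same count follows from Burnside's lemma: (256 + 64 + 16 + 16) / 4 = 88.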
REFERENCES:
Functional coupling between writers, erasers and readers of histone and DNA methylation.
Torres IO, Fujimori DG.
https://www.ncbi.nlm.nih.gov/pubmed/26496625
Figurative= http://www.nature.com/nrd/journal/v13/n9/fig_tab/nrd4360_F1.html
Nepit, https://en.wikipedia.org/wiki/Nat_(unit)
Crumb, http://mathworld.wolfram.com/Crumb.html
Note: 3-cell ECA logic is proposed to be employed for chromatin architecture/the histone code, whereas the 2-cell "crumb codon" applies in the context of the genetic code.
Background: My single-authored "Big Data Genetic Code", presented at Polish Academy of Sciences: Nencki Institute of Experimental Biology (http://dx.doi.org/10.7490/f1000research.1111304.1) and subsequently followed up as the "HyperProteoGenome" at ELIXIR 2nd Annual Danish Bioinformatics Conference, actually validated an important 1981 Polish Air-force result (https://www.ncbi.nlm.nih.gov/pubmed/7280057).
Intuition: That x is a 99.9455% approximation to e (Napier's constant) motivates us to explore what spins off when we perform a Taylor-series expansion of the LHS of the BDGC "HyperProteoGenomic equation" [4^(4^x) = 20^(20^1)] at x = e. Let us see.
Confirmatory Results: That an 88% fraction of trait/disease-associated SNPs (https://dx.doi.org/10.1073/pnas.0903103106), based on the NHGRI meta-GWAS analysis, were NON-exonic (45% intronic + 43% intergenic) could be neatly matched by the percentage ratio of the RHS numerical value (20^20 = 1.048576e+26) to the order-4 Taylor approximation of the LHS about x = e (1.18591e+26, per Wolfram|Alpha): (1.048576e+26 / 1.18591e+26) * 100 ≈ 88.4195%, based upon the following web references:
http://www.wolframalpha.com/input/?i=Taylor+series+(4%5E4%5Ex)+at+x%3De
http://www.wolframalpha.com/input/?i=(1.048576e%2B26+%2F+1.18591e%2B26)+*+100%3D
Inference: This intricately confirmatory result clearly suggests an asymmetric tripartite segmentation of the HyperProteoGenomic equation in the context of SNP associations (45% intronic + 43% intergenic + 12% exonic), and further deepens our fundamental understanding of the notion of junk DNA, fortified by double-edged mathematical rigor: a differential-calculus/continuous-mathematics approach (Jerzy et al., 1981) and a 1-dimensional cellular-automaton/discrete-mathematics approach (Copyright 2017, Sharma P.).
From slide 9 [07/17], substituting the computed value of c,
LHS = 4^(4^c) = 4^(4^2.7168...) = RHS = 20^20 ≈ 1.048576 × 10^26
(which is strikingly of the order of the age of the Universe measured in nanoseconds!).
Thus, evidently, "BIG DATA" enters the picture of this novel formulation of the genetic code, since essentially we need to assign one-to-one rule correspondences between ~1.048576 × 10^26 neighbor-dependent DNA/genomic substitutions on the LHS and ~1.048576 × 10^26 neighbor-independent protein/proteomic substitutions on the RHS.
Inarguably, this involves super-computation and is data-critical, hence requiring "BIG DATA" approaches to address the issue and develop an original, fresh re-look at the genetic code.
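The magnitude claim can be checked directly: 20^20 as an exact integer, and the LHS with c = log_4(20 · log_4 20) substituted back (a minimal sketch):

```python
import math

rhs = 20 ** 20  # exact integer arithmetic
print(rhs)      # 104857600000000000000000000, i.e. ~1.048576e26

# Substituting c = log_4(20 * log_4(20)) back into the LHS recovers the RHS
# (up to floating-point precision):
c = math.log(20 * math.log(20, 4), 4)
print(c)        # ~2.7168
lhs = 4.0 ** (4.0 ** c)
assert abs(lhs / rhs - 1) < 1e-6
```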