BIG DATA genetic code
[version 1; not peer reviewed] No competing interests were disclosed.
cf. the DOI-indexed self-citation below:
Cite as:
Praharshit Sharma (2023). Stirling's Approximation explains Limiting Value to Napier Constant attained by Praharshit Sharma's HyperProteoGenomic Equation. Zenodo.
https://doi.org/10.5281/zenodo.8372281
https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm
https://en.wikipedia.org/wiki/Generalized_mean
Consider, for instance, a simple hypothetical 8-mer coding sequence, ACATGAAC [cf. http://users.fred.net/tds/lab/papers/primer/primer.pdf]. Its mononucleotide frequencies are A = 1/2, C = 1/4, G = 1/8 and T = 1/8. In the Elementary CA model (Elementary Cellular Automaton model, cf.
https://mathworld.wolfram.com/ElementaryCellularAutomaton.html
https://en.wikipedia.org/wiki/Elementary_cellular_automaton )
proposed for computing O-SCUO* (Overall Synonymous Codon Usage Bias) as a representative measure of CCL (Codome Core Length), the harmonic mean (HM) was supposed a suitable measure to near-Nepitize the CCL. Here we advance an enhancement to the same, incorporating the generalized (power) mean in place of the HM; this yields p = 0.267864 for the above CDS example, such that the above p-norm of the background mononucleotide frequencies, when fitted into the ECA-QCA equation while accounting for the KLD (Kullback-Leibler divergence),
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
leads to exactly Nepit/nat = e upon setting p ≈ 0.267864.
This may be verified with the computation below:
https://www.wolframalpha.com/input?i=Solve+%282%5E%28e%29%29*%28%282%5E%281+-+3+p%29+%2B+2%5E%28-p%29+%2B+4%5E%28-p%29%29%5E%281%2Fp%29%29+%3D+256
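For readers without WolframAlpha, the same root can be recovered numerically; a minimal Python sketch (a bisection over the p-norm expression, assuming the frequencies A = 1/2, C = 1/4, G = 1/8, T = 1/8 of the example CDS):

```python
import math

# Mononucleotide frequencies of the 8-mer ACATGAAC: A=1/2, C=1/4, G=1/8, T=1/8.
freqs = [1/2, 1/4, 1/8, 1/8]

def lhs(p):
    # 2^e times the p-norm of the background frequencies
    pnorm = sum(f**p for f in freqs) ** (1/p)
    return 2**math.e * pnorm

# Solve lhs(p) = 256 by bisection; lhs is strictly decreasing in p on this bracket.
lo, hi = 0.05, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if lhs(mid) > 256:
        lo = mid
    else:
        hi = mid
p = (lo + hi) / 2
print(round(p, 6))  # ≈ 0.267864
```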
It is surmised that the minute deviation so computed, just as in the case outlined above, might stand as a testament not only to Chargaff's second parity rule [cf. https://rosalind.info/glossary/chargaffs-rules/] but also to GC% bias in coding sequences [cf. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057000].
*Self-citation of the commenter (preprint stage of scientific dissemination):
Overall Synonymous Codon Usage Bias strongly Correlates with Codome Core Length anent Harmonic Mean based Computation of Kullback-Leibler Divergence in an ECA-QCA System (RJCTxpertReview). Zenodo. https://doi.org/10.5281/zenodo.5516971
{000, 001, 010, 011, 100, 101, 110, 111} form the 8 vertices of a three-dimensional cube with unit volume (1^3), just as the 4 two-bit binary patterns {00, 01, 10, 11} occupy the 4 vertices of a two-dimensional square with unit area (1^2), and the 1-bit patterns {0, 1} are the endpoints of a one-dimensional line with unit length (1^1). It may be noted that these exponents equal the 'neighborhood' of the automata in question.
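The vertex correspondence can be made concrete by enumerating the bit patterns; an illustrative Python sketch:

```python
from itertools import product

# The n-bit binary patterns are exactly the 2^n vertices of the n-dimensional
# unit hypercube (n=1: endpoints of a unit line; n=2: corners of a unit square;
# n=3: corners of a unit cube).
for n in (1, 2, 3):
    vertices = [''.join(bits) for bits in product('01', repeat=n)]
    print(n, len(vertices), vertices)
```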
It thus becomes interesting to interpret the Big_Data_Genetic_Codome's volume, whose neighborhood has been proven to be near-Nepit (an approximation to Napier's constant) by means of the HyperProteoGenomic equation, as a real solution for X below (click on Real solution, Approximate form):
https://www.wolframalpha.com/input?i=solve+4%5E%284%5EX%29+%3D+20%5E20
wherein we now invoke the "Alternative representation" of the RHS (right-hand side) of the equation,
https://www.wolframalpha.com/input?i=1%5Ee
replacing 'z' with V, the Big_Data_Genetic_Codome's volume, namely V = x*y*z, where
x corresponds to the 1st codon position's dimension,
y corresponds to the 2nd codon position's dimension,
z corresponds to the 3rd codon position's dimension.
Now subjecting this to multivariate calculus,
https://www.wolframalpha.com/input?i=%28d%2Fdx%29%28d%2Fdy%29%28d%2Fdz%29%28e%5E%28x*y*z%29%29
and solving for V below, we have (Real solution, Approximate form):
https://www.wolframalpha.com/input?i=Solve+e%5EV+%3D+log_4%2820*log_4%2820%29%29
and then substituting V ≈ 0.999455 into the RHS of the expression above:
https://www.wolframalpha.com/input?i=%28e%5E%28V%29%29*%28%28V%5E2%29+%2B+%283*V%29+%2B+1%29+for+V+%3D+0.999455
Result = 13.5766, which correlates closely with the mean "Major Groove Width" of all 16 dinucleotides, as per the table provided by the 2009 Nucleic Acids Research paper below:
(12.15+12.37+13.51+12.87+13.58+15.49+14.42+13.51+13.93+14.55+15.49+12.37+12.32+13.93+13.58+12.15)/16 =
13.51375
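Both WolframAlpha steps and the table average can be checked together in a few lines of Python (the groove-width values are copied from the DiProDB table referenced below):

```python
import math

# Step 1: solve e^V = log_4(20 * log_4(20)) for V.
log4 = lambda t: math.log(t, 4)
V = math.log(log4(20 * log4(20)))          # V ≈ 0.999455

# Step 2: evaluate e^V * (V^2 + 3V + 1).
result = math.exp(V) * (V**2 + 3*V + 1)    # ≈ 13.5766

# Mean major-groove width over the 16 dinucleotides (values from DiProDB).
widths = [12.15, 12.37, 13.51, 12.87, 13.58, 15.49, 14.42, 13.51,
          13.93, 14.55, 15.49, 12.37, 12.32, 13.93, 13.58, 12.15]
mean_width = sum(widths) / len(widths)     # 13.51375

print(round(V, 6), round(result, 4), mean_width)
```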
Reference: https://diprodb.fli-leibniz.de/ShowTable.php
Friedel M, Nikolajewa S, Suehnel J, Wilhelm T. DiProDB: a database for dinucleotide properties. Nucleic Acids Research (Database issue) 37 (2009) D37-D40.
The minuscule difference/error (13.5766 - 13.51375) = 0.06285 may be justified, and at the same time be subject to improvement, via an extension of BEC-DFT (Binary Erasure Channel using Discrete Fourier Transforms) to QEC-DFT (Quaternary Erasure Channel using Discrete Fourier Transforms), as per the abstract by Lakshmi Prasad Natarajan presented at the 2022 JTG IEEE-ITSoC Summer School:
https://iitmandi.ac.in/jtg2022/program.html
By the logical flow of the links https://bit.ly/homoyeast, https://bit.ly/GCgenHsRnaSc, https://youtu.be/B3dVuP0Kzg0 and https://bit.ly/gc1gc2gc3e, we obtain the modulus of the first pair of complex solutions for 'x' in the 4th link above, sqrt((0.109019^2) + (0.389158^2)) = 0.40414, i.e. ≈ 40.414% (approximately the human genomic GC%), whereas the modulus of the second pair of complex solutions is sqrt((0.356044^2) + (0.172223^2)) = 0.39551, i.e. ≈ 39.551% (approximately the yeast mRNA GC%, or GC4d% [https://bit.ly/gcebfungi]). This is a great leap towards a biologically significant underpinning of the Big Data Genetic Code.
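The quoted moduli can be recomputed directly with Python's built-in complex arithmetic (the real and imaginary parts below are copied from the solution pairs in the 4th link):

```python
# Moduli of the two pairs of complex solutions, expressed as percentages.
z1 = complex(0.109019, 0.389158)   # first solution pair
z2 = complex(0.356044, 0.172223)   # second solution pair

gc_human = abs(z1) * 100   # ≈ 40.414 %, compared to the human genomic GC%
gc_yeast = abs(z2) * 100   # ≈ 39.551 %, compared to the yeast mRNA GC%
print(round(gc_human, 3), round(gc_yeast, 3))
```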
In the context of differential-state cellular automaton mapping, we define CC (channel capacity) as the proportionality constant for which the ratio of radii/neighborhoods tends to one.
https://en.wikipedia.org/wiki/Channel_capacity
For instance, solving the general equation for x in [ 2^(2^(3x)) = 4^(4^x) ]: taking logs (base 2) of both sides, we have (2^(3x))*log_2(2) = (4^x)*log_2(4), that is, 2^(3x) = (2^1)*(2^(2x)).
Therefore [ 3x = 2x + 1 ], implying that x = 1. Hence "3" is indeed the CC in this case.
Extending this logic to the BDGC, we have [ 4^(4^(e*x)) = 20^(20^x) ]. Solving, x = 0.997344 → 1. Hence, in this case, the channel capacity (CC) of the BDGC is justifiably e, Napier's constant.
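Because taking logarithms twice linearizes the equation in x, the BDGC case can be solved in closed form; a short Python check:

```python
import math

# 4^(4^(e*x)) = 20^(20^x): taking logarithms twice gives a linear equation in x:
#   e*x*ln(4) + ln(ln(4)) = x*ln(20) + ln(ln(20))
e, ln4, ln20 = math.e, math.log(4), math.log(20)
x = (math.log(ln20) - math.log(ln4)) / (e * ln4 - ln20)
print(round(x, 6))  # ≈ 0.997344, i.e. close to 1
```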
Background:
A 1981 paper (PMID: 7280057), authored by three Polish Air Force scientists, rigorously proved using differential calculus that the universal genetic code is Nepit/nat-encoded, that is, encoded in the base/radix of natural logarithms, base e (Napier's constant) = 2.718... This result was proved again ~30 years later, using a discrete-mathematics (cellular automaton) approach, by the sole author of this paper. Beyond the concretely successful HyperProteoGenomic equation included in this author's originally hypothesized Big Data Genetic Coding {1}, and the Shannon entropy/information-content correlations for immunologically relevant nonamer core peptides in HIV/HLA-class-II/T-cell and B-cell epitopes and ZEBOV (Zaire Ebola virus) proteomes {2}, this work attempts a satisfactorily approximate Shannon-Fano encoding of the 2019-nCoV cDNA and proteome, from the two-pronged perspective of both 4 nt (cDNA state-space cardinality) and 20 aa (amino-acid state-space cardinality). Our ongoing collaborative efforts with the Polish Air Force may be accessed here: https://www.researchgate.net/project/Quantum-DNA
Results:
Shannon-Fano encoding (ShFe) of Nepit genetic coding in the context of the universal proteomic mutation space (base = 20) reveals a taxa-ubiquitous 8-fold pentapeptide tandem repeat {3} peptide-sequence architecture (leaving room for "forbidden pentapeptides" {4}, together totalling 3,200,000), accommodating the exhaustive set of 8,000 tripeptides {5} (at least 16 of which are biologically significant {6}, e.g. the "RGD" motif in the case of SARS-nCoV-2) and an 11-aa C-terminal tail that could potentially harbor LCRs/CBZs (low-complexity regions/compositionally biased zones) of remnant amino-acid residues. The ShFe (Shannon-Fano encoding) {7} is achieved thus:
In terms of 20 aa: [2.7 = (27/10) = (54/20) = ((5/20)*8) + ((3/20)*1) + ((1/20)*11) = 2.7], the factors clearly conveying 8 pentapeptide epitopes for SARS-nCoV-2 vaccine design, one tripeptide (intended to be the "RGD" motif in SARS-nCoV-2), followed by an 11-aa C-terminal tail (LCRs/CBZs). For reference (based on the IEDB/Immune Epitope Database), see {8}.
In terms of 4 nt: [2.7 = (1 + 1 + 0.6 + 0.1) = ((1/2)*2) + ((1/4)*4) + ((1/5)*3) + ((1/20)*2) = 2.7], thereupon implying a "differential open reading frameshift" for the CDS (coding sequence) of the COVID-19 cDNA {9}.
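Both decompositions reduce to exactly 27/10; this can be confirmed with exact rational arithmetic:

```python
from fractions import Fraction as F

# 20-aa decomposition: 8 pentapeptides + 1 tripeptide + an 11-aa tail, over 20 residues.
aa = F(5, 20) * 8 + F(3, 20) * 1 + F(1, 20) * 11
# 4-nt decomposition as quoted above.
nt = F(1, 2) * 2 + F(1, 4) * 4 + F(1, 5) * 3 + F(1, 20) * 2

print(aa, nt)  # both equal 27/10 = 2.7
```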
Conclusions and Further work:
COVID-19 being a rapidly evolving pandemic in human history, this work should transform into a community effort. Already, we have open-access availability of the coronavirus reference genomes in NCBI, and the SARS-nCoV-2 PROSITE patterns {10}. Further work involves fine-tuning the BLASTn word size (7-11) and BLASTp word size (2-3), ML (machine-learning) based k-mer analysis of ShFe-ORF overlapping windows via elementary cellular automata rule progressions with randomized bagging of initial conditions, and tripartite coding (observing that 2.7^3 ≈ 20).
REFERENCES:
{1} Sharma P. BIG DATA genetic code [version 1]. F1000Research 2016, 5:171 (slides) (https://doi.org/10.7490/f1000research.1111304.1)
{2} Kanchinadham SPS. Max. Proteome Entropy and CMBR. (https://doi.org/10.7490/f1000research.1116442.1) https://vimeo.com/319410981
{3} https://en.wikipedia.org/wiki/Pentapeptide_repeat
{4} Tuller T, Chor B, Nelson N. Forbidden penta-peptides. Protein Sci. 2007;16(10):2251‐2259. https://dx.doi.org/10.1110/ps.073067607
{5} http://www.au-kbc.org/research_areas/bio/projects/protein/tri.html
{6} J. Med. Chem. 2011, 54, 5, 1111–1125 Publication Date:January 28, 2011 https://doi.org/10.1021/jm1012984
{7} Shannon-Fano coding https://planetcalc.com/8168/
{8} Lucchese, G. Epitopes for a 2019-nCoV vaccine. Cell Mol Immunol 17, 539–540 (2020). https://www.nature.com/articles/s41423-020-0377-z
{9} Information Theory Primer http://users.fred.net/tds/lab/papers/primer/primer.pdf
{10} SARS-CoV-2 Relevant PROSITE Motifs https://prosite.expasy.org/sars-cov-2.html
Quantum electromechanical systems are nano- to micrometer (micron) scale mechanical resonators coupled to electronic devices of comparable dimensions, such that the mechanical resonator behaves in a manifestly quantum manner. Towards realising quantum electromechanical systems, one usually begins with the phononic quantum of thermal conductance for suspended dielectric wires. An important rule of thumb for observing quantum behaviour is hv = kT, where h is Planck's constant = 6.62607004 × 10^(-34) m^2 kg/s and k is the Boltzmann constant = 1.38064852 × 10^(-23) m^2 kg s^(-2) K^(-1). Substituting the CMBR characteristic temperature of 2.726 K yields frequency v = 56.79 GHz (gigahertz), which falls within the microwave regime (microwaves span the range between 0.3 and 300 gigahertz).
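The quoted frequency follows from v = kT/h; a quick Python check using the constants as given:

```python
# Rule of thumb h*v = k*T  =>  v = k*T / h
h = 6.62607004e-34   # Planck's constant, m^2 kg / s
k = 1.38064852e-23   # Boltzmann constant, m^2 kg s^-2 K^-1
T = 2.726            # CMBR characteristic temperature, K

v_ghz = k * T / h / 1e9
print(round(v_ghz, 2))  # ≈ 56.8 GHz, inside the 0.3-300 GHz microwave band
```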
Recent computational understanding of the CRISPR/Cas9 code (December 2018), in so far as to "precisely" predict the outcome of CRISPR-mediated editing in the human genome, by a group from the Francis Crick Institute (UK), has led me to apply the above HyperProteoGenomic (HPG) equation, also taking into account the effects of the length of the sgRNA (single-guide RNA) used, which ranges from 17-20 nt of {A/C/G/T}. The references involved are:
https://phys.org/news/2018-12-scientists-crispr-code-precise-human.html
https://www.sciencedirect.com/science/article/pii/S1097276518310013?via%3Dihub
https://www.nature.com/articles/srep28566
Continuing my research statement prior to (and including) the above URLs, and given my HPG equation (which bijectively/1-1 maps universal intra-genomic variations to the global space of inter-proteomic single-point mutations; cf. my slides at PAN/Polish Academy of Sciences, plus the abstract and page-89 poster presentation proceedings of the 2nd Annual ELIXIR Danish Bioinformatics Conference, August 2016, http://bit.ly/bdgcode [1332 views, 97 downloads]), the text representation of my equation may be re-expressed with respect to the RHS as:
4^(4^e) = 20^20 = (4*5)^20 = (4^20) * (5^20) = (4^(17+3)) * (4+1)^20
Therefore, the RHS of my originally ideated HPG equation takes the form
(4^17) * (4^3) * (x+1)^20, where 17 = the minimum CRISPR sgRNA length (thereby (4^17) being the entire set of minimal 17-mers of sgRNA) ...[1],
3 = the codon length (ATG, the start codon, etc.) as per the universal genetic code, and
(x+1)^20 is subject to a simple binomial-series expansion (where x = 4) [2].
(Moreover, (4+1)^20 = (1+4)^20, which agrees nicely with "The precision of DNA editing is mainly determined by the fourth nucleotide upstream of the PAM site", strictly as per the Molecular Cell paper above.)
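The factorisation of the RHS is an exact integer identity (the approximate step is only 4^(4^e) ≈ 20^20) and can be checked directly:

```python
# Check the RHS factorisation quoted above with exact integer arithmetic:
#   20^20 = (4*5)^20 = 4^20 * 5^20 = 4^(17+3) * (4+1)^20
rhs = 20**20
assert rhs == (4 * 5)**20
assert rhs == 4**20 * 5**20
assert rhs == 4**17 * 4**3 * (4 + 1)**20
print("20^20 factorisation verified")
```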
While [1] above can potentially address phenomena such as codon usage/bias and the CAI (codon adaptation index), including wobbling of the 3rd base, based on MIT (molecular information theory; a brief review is linked below),
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3220916/
given that "It turns out that it can be expressed as an average of the information of individual sequences by adding together weights of the form Ri(b, l) = 2 + log_2(f(b, l)) for a DNA sequence", invoking the abstract of the above MIT article with an addendum of 70% efficiency (0.7) gives <<Avg. Info.>> ≈ 2.7, which agrees with my genetic-coding work, https://bioinformer.github.io/
with due regard to the "Biological Information Theory" works, courtesy of Dr. Tom:
http://users.fred.net/tds/lab/
And [2], as a mathematical beauty, can very well (and directly) be abstracted to the correspondence between the binomial expansion and Fibonacci numbers, within the context of the golden number/ratio encountered in CRISPR/Cas9:
https://biomedres.us/pdfs/BJSTR.MS.ID.000324.pdf
https://proofwiki.org/wiki/Fibonacci_Number_as_Sum_of_Binomial_Coefficients
wherein, putting x = 4, we can expand as per the Wolfram computation engine:
https://www.wolframalpha.com/input/?i=(x%2B1)%5E20%3D
Lastly, to "probe" the FBC (Fibonacci binomial coefficients) above: when we put k = K-2, we have
FBC = C(n - (K-2) - 1, K-2), where C denotes combination:
http://mathworld.wolfram.com/BinomialCoefficient.html
Thus, the minimum integer offset of the index above is 2 (not 3, as for the traditional codon), which equals [e] = [2.71...] = 2, where [] is the greatest-integer function.
The seminal work and references may be accessed at http://bit.ly/nobelse
FBC = C(n - K + 2 - 1, K-2) = C(n - K + 1, K-2).
Interestingly, for a given K, (n - K + 1) = the number of K-mers in an n-nt sequence.
https://arxiv.org/pdf/1308.2012.pdf
(Also see the induction hypothesis for the Fibonacci term F(n+2):)
https://proofwiki.org/wiki/Fibonacci_Number_as_Sum_of_Binomial_Coefficients#Induction_Hypothesis
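The binomial-Fibonacci identity and the K-mer count invoked above can be verified directly; a sketch using the ProofWiki identity F(n+1) = Σ_k C(n-k, k):

```python
from math import comb

def fib(n):
    """Fibonacci numbers with F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# ProofWiki identity: F(n+1) = sum over k of C(n-k, k).
for n in range(20):
    assert fib(n + 1) == sum(comb(n - k, k) for k in range(n // 2 + 1))

# For a given K, (n - K + 1) is the number of K-mers in an n-nt sequence:
seq = "ACATGAAC"          # the 8-mer example used earlier in this thread
K = 3
kmers = [seq[i:i + K] for i in range(len(seq) - K + 1)]
assert len(kmers) == len(seq) - K + 1  # 6 three-mers in an 8-nt sequence
print(kmers)
```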
Recent computational understanding of the CRISPR/ Cas9 code (December 2018), in so far... READ MORE
Recent computational understanding of the CRISPR/ Cas9 code (December 2018), in so far as to "precisely" predict the Outcome of CRISPR-mediated editing in the Human
genome by a Group from Francis-Crick institute (UK) has led me to Apply above Hyper-Proteo-Genomic (HPG) equation-- also having taken into account the Effects pronounced by "Length of sgRNA / single-guided RNA" used, that ranges from 17-20nt of { A/ C/ G/ T }. The references involved being,
https://phys.org/news/2018-12-scientists-crispr-code-precise-human.html
https://www.sciencedirect.com/science/article/pii/S1097276518310013?via%3Dihub
https://www.nature.com/articles/srep28566
Continuing my research statement prior to (and including) above URLs, and
my HPG-equation (that 1-1/ Bijectively) maps Universal INTRA-genomic variations to Global-space of INTER-proteomic single-point mutations ( cf. my Slides at PAN/ Polish Academy of Sciences+ Abstract+ Page-89 poster
presentation procedings at 2nd Annual ELIXIR-Danish Bioinformatics conference, August- 2016) http://bit.ly/bdgcode [ 1332 Views, 97 Downloads]
the text-Representation of my Equation may be re-Expressed wrt "RHS" as,
4^(4^e) = 20^20 = (4*5)^20 = (4^20) * (5*20) = (4^(17+3)) * (4+1)^20
Therefore, RHS of my Originally ideated HPG- equation takes the form of,
(4^17)* (4^3) * (x+1)^20, where "17"= minimum CRISPR-sgRNA length (Thereby (4^17) being ENTIRE set of minimal 17-mers of sgRNA...[1]
3= codon-Length (ATG, start codon etc.,) as per Universal Genetic Code &,
(x+1)^20 is subject to a Simple "Binomial-series" expansion (where x=4) [2]
( Moreover, ( 4+1)^20 = (1+4)^20 , which has Nice agreement with
"The precision of DNA editing is mainly determined by the fourth nucleotide upstream of the PAM site" , strictly as per "Molecular-Cell" paper Above.
While [1] above can potentially address phenomena such as "Codon usage/ bias" and CAI (i.e,) "codon adaptation index" including Wobbling of 3rd base based on "MIT"/ molecular-information-theory ( a brief review is herein below)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3220916/
Given, "It turns out that it can be expressed as an average of the information of individual sequences by adding together weights of the form Ri(b, l) = 2 + log_2 (f(b,l)) for DNA sequence" therefore, invoking Abstract of above "MIT" article with Addendum = 70 % efficiency,or (0.7), << Avg. Info. >> ~ 2.7, that agrees with my Genetic-coding work, https://bioinformer.github.io/
With due Regard to "Biological Information Theory" works Courtesy of Dr Tom,
http://users.fred.net/tds/lab/
And [2] as a Mathematical-beauty, can very well (and Directly) be abstracted to the correspondence between Binomial expansion and Fibonacci numbers within the context of "Golden-number/ ratio" encountered in CRISPR/ Cas9.
https://biomedres.us/pdfs/BJSTR.MS.ID.000324.pdf
https://proofwiki.org/wiki/Fibonacci_Number_as_Sum_of_Binomial_Coefficients
wherein, putting (x=4), we can Expand as per Wolfram Computation Engine,
https://www.wolframalpha.com/input/?i=(x%2B1)%5E20%3D
Lastly, to "probe" the "FBC"/ Fibonacci-Binomial-Coefficients above, when we Put ( k = K-2), we have,
FBC= { n - (K-2) -1 } "C" { (K-2) } , where "C" denotes combination...
http://mathworld.wolfram.com/BinomialCoefficient.html
Thus, the "minimum" Integer-offset of Index above= "2" (not 3, for Traditional Codon), which is= [e] = [2.71...] = 2, where "[]" is Greatest Integer Function,
wrt Seminal-work+ References may be accessible at, http://bit.ly/nobelse
FBC= { n - K + 2 - 1 } "C" { ( K-2) } = { n - K + 1 } "C" { (K-2) }
Interestingly, for given "K", ( n-K+1) = Number of "K-mers" in n-nt sequence.
https://arxiv.org/pdf/1308.2012.pdf
( Also see "Induction-Hypothesis" for Fibonacci-term, F (n+2) ),
https://proofwiki.org/wiki/Fibonacci_Number_as_Sum_of_Binomial_Coefficients#Induction_Hypothesis
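To probe this correspondence numerically, a minimal Python sketch (function names are mine) verifies the ProofWiki identity F(n) = Σ_k C(n-k-1, k), the same "shallow diagonal" sum of binomial coefficients referenced above:

```python
from math import comb

def fib_from_binomials(n):
    """Fibonacci F(n) as the shallow-diagonal sum of binomial
    coefficients: F(n) = sum_k C(n-k-1, k) (ProofWiki identity)."""
    return sum(comb(n - k - 1, k) for k in range((n - 1) // 2 + 1))

def fib(n):
    """Reference Fibonacci, F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The identity holds term-for-term; e.g. n = 20 gives F(20) = 6765.
for n in range(1, 21):
    assert fib_from_binomials(n) == fib(n)
print(fib_from_binomials(20))  # 6765
```

Note that the upper index n - k - 1 is exactly the FBC form above, and with the substitution k = K - 2 it becomes C(n - K + 1, K-2), tying in the K-mer count (n - K + 1).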
http://elixir-node.cbs.dtu.dk/wp-content/uploads/2017/02/DKBiC-2016-Collection-poster-programme-participants.pdf
rely on an infallibly reliable equation, inspired by the Principle of Computational Equivalence put forth in his seminal book "A New Kind of Science" (Wolfram S., 2002); specifically, the mapping of rules from lesser-cardinality elements (DNA, 4 nucleotides) to greater-cardinality elements (proteins, 20 amino acids),
http://mathworld.wolfram.com/ElementaryCellularAutomaton.html
giving rise to the mathematical representation of EVOLUTION at the sequence level of the Universal Genome, constrained by the coding of the Universal Proteome; translating in purely mathematical terms to 4^(4^x) = 20^20, yielding x = e ≈ 2.718... This elegantly establishes that the "channel capacity" of cDNA genetic-information transfer to proteins is Napier's constant e (the nepit), from Shannon's information-theory perspective.
Also, consider the "MMDUMSAP": Minimal Most Divergent Ungapped Multiple Sequence Alignment of Peptides. First, arrange the 20 amino acids (for simplicity, in alphabetical order) around a circle, and recurrently read off, up to 20 times, a 20-aa peptide, skipping one aa each time (for the sake of convention), clockwise each time (Wenbing Hou et al., Physica A, 2015, Figure 1). Represent these 400 data points, 20 aa per row, as an MSA (multiple sequence alignment), which is intentionally maximally divergent, since there is 0% conserved domain or motif in such a case,
http://dx.doi.org/10.1016/j.physa.2015.10.067
Therefore, we have 20 × 20 = 400 data points, each aa occurring with frequency 1/20 = 0.05. Hence, per a Shannon-entropy calculation, S = -400 × 0.05 × log_4(0.05) = 43.2193..., which is the same as the exponent above the 4 on the LHS of the HPG equation, justifying that this simplistic data format, the minimal MSA, includes the most extreme possible divergence of proteins.
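The entropy arithmetic above, and the solution of the HPG equation for x, can be checked in a few lines of Python (a minimal sketch; the variable names are mine):

```python
import math

# Shannon entropy (base 4) of the MMDUMSAP: 400 positions, each of the
# 20 amino acids occurring with frequency 1/20 = 0.05.
S = -400 * 0.05 * math.log(0.05, 4)
print(S)  # ~43.2193

# The same number is the exponent above the 4 on the LHS of the HPG
# equation 4^(4^x) = 20^20: taking log base 4 of both sides gives
# 4^x = 20 * log_4(20).
lhs_exponent = 20 * math.log(20, 4)
assert abs(S - lhs_exponent) < 1e-9

# Solving for x: x = log_4(20 * log_4(20)) ~ 2.7168, close to e ~ 2.71828.
x = math.log(lhs_exponent, 4)
print(x)
```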
In a previous comment, it was inferred that the data-compression fraction achieved for a p-base n-tuple is (p*n)/(n*(p^n)) = p^(1-n).
Therefore, invoking the RHS (right-hand side) of the HPG/HyperProteoGenomic equation [4^(4^x) = 20^(20^1)], we have p = 20, n = 1.
This implies that our novel concept of the "CODOME" codes for 100% of the Universal Proteome, since for the proteome RHS, p^(1-n) = 20^(1-1) = 20^0 = 1 = 100%, i.e., 0% of the data is compressed.
With reference to the "Protein Moment of Inertia" paper, Fig. 1 ("The location of 20 amino acids on the circumference"), herein:
https://www.sciencedirect.com/science/article/pii/S0378437115009267
We aim to achieve a complete graph with 20 nodes spread around the circumference of a circle; for ease of representation, 18 deg. apart.
So each node (denoted by its amino-acid 1-letter IUPAC-IUB symbol) is connected to each of the other 19 nodes, and to itself, thus opening up the possibility of deducing ANY given primary structure or protein sequence as simply a PATH traversal on this "Complete Graph of the CODOME".
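As a sketch of the two claims above (the p^(1-n) compression fraction, and the complete graph K20 on the amino-acid alphabet), assuming the standard 20-letter IUPAC alphabet and 18-degree spacing:

```python
from itertools import combinations

def compression_fraction(p, n):
    """Fraction of data retained for a p-base n-tuple code:
    (p*n)/(n*p^n) = p^(1-n)."""
    return (p * n) / (n * p ** n)

# Proteome RHS of the HPG equation, 20^(20^1): p = 20, n = 1.
assert compression_fraction(20, 1) == 20 ** 0 == 1  # 100%: 0% compressed

# Complete graph K20 over the 20 IUPAC one-letter amino-acid symbols,
# placed 360/20 = 18 degrees apart on a circle.
aas = "ACDEFGHIKLMNPQRSTVWY"
angles = {aa: i * 18 for i, aa in enumerate(aas)}
edges = set(combinations(aas, 2))
assert len(edges) == 20 * 19 // 2  # 190 undirected edges (plus 20 self-loops)
# Any protein sequence is then a path on this graph, with self-loops
# covering repeated residues, e.g. "MKKV": M -> K -> K -> V.
```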
PRIMARY WEB REFERENCE-
“For optimum results Illumina recommends a minimum coverage of 30x for normal tissue and 60x coverage for tumor samples.”
https://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/all-in-one-day-medicine-cancer-patients-infographic.pdf
Let us take the reciprocal of the HyperProteoGenomic equation, 4^(4^x) = 20^20. We have,
4^(-(4^x)) = 20^(-20)
Clearly, we note that the "RRHS"/reciprocal RHS of the above equation, 20^(-20), is an infinitesimally small quantity (0 < RRHS < 1), allowing us to equate it with the Lander-Waterman model's probability of a base NOT being sequenced, e^(-C), where C denotes sequencing coverage.
Before we proceed to the logical next step: this nearly justifies the original HPG/HyperProteoGenomic equation in question, in that x = e (to within 99.9455%) might have a corollary in "exponential tumor growth" (in cancer lesions).
Therefore, to compute C (the hypothesized coverage for the RRHS above):
e^(-C) = 20^(-20), which yields
C = 20 ln(20) ≈ 59.9146... (≈ 60x),
which is 99.86% of the 60x coverage that is "optimal" (PE/paired-end) for tumor samples; cf. the Intel web reference already mentioned above,
https://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/all-in-one-day-medicine-cancer-patients-infographic.pdf
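The coverage computation can be reproduced directly: solving e^(-C) = 20^(-20) gives C = 20 ln 20 (a minimal check):

```python
import math

# Equating the reciprocal RHS, 20^(-20), with the Lander-Waterman
# probability that a base is NOT sequenced, e^(-C):
#   e^(-C) = 20^(-20)  =>  C = 20 * ln(20)
C = 20 * math.log(20)
print(C)             # ~59.9146, i.e. ~60x
print(100 * C / 60)  # ~99.86% of the 60x tumor-sample figure
```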
Observing that the data-compression fraction achieved (w.r.t. Slide [03/17]) in the case of base-2, 3-mer/3-tuple logic is
(2*3) / (3*(2^3)) = 2^(1-3) = 2^(-2) = 0.25 = 25%
http://www.wolframalpha.com/input/?i=(2*3)%2F8%3D
Strikingly, we find agreement with the "nepit"-scaled e-mer, 4-nt logic, because
e * 4^(1-e) ≈ 0.2511 ≈ 25.11%
http://www.wolframalpha.com/input/?i=e+*+4%5E(1-e)
Now, let us take the simple, straightforward case of 2-bit ("crumb") 4-nt (A|C|G|T) logic,
wherein we ignore the "central" nucleotide (see the i3-encoding publication URL below).
We see, 4^(1-2) = 4^(-1) = 1/4 = 0.25 = 25%
Hence, in ALL 3 cases, the BDGC data compression achieved = (100 - 25)% ≈ 75%.
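All three compression fractions can be verified together (a minimal sketch; variable names are mine):

```python
import math

# Case 1: base-2, 3-mer logic
case1 = (2 * 3) / (3 * 2 ** 3)      # = 2^(1-3) = 0.25
# Case 2: "nepit"-scaled e-mer, 4-nt logic
case2 = math.e * 4 ** (1 - math.e)  # ~ 0.2511
# Case 3: 2-bit ("crumb") 4-nt logic
case3 = 4 ** (1 - 2)                # = 4^(-1) = 0.25

for frac in (case1, case2, case3):
    print(f"retained {frac:.4f} -> compressed ~{100 * (1 - frac):.1f}%")
```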
The central-nt independence (except for the singular case of serine) is justified by the outer-totalistic logic at play in the i3-encoding paper by the same author,
http://ijcb.in/ijcb/v2/index.php/ijcb/article/view/7/8
(For an Example Model based upon Buy-Sell dynamics in Finance, please consult=
http://www.wolframscience.com/nksonline/page-432 )
Even otherwise, "silent" mutations of serine (the exception) are evolutionarily neutral:
https://en.wikipedia.org/wiki/Silent_mutation#Transfer_RNA
To summarize, the (de)methylation status (0/1) of the 8-histone octamer may be elegantly characterized within the context of chromatin architecture by considering the 3 cells in the ECA above to be 3 parallel processors (READER + WRITER + ERASER), which collectively and decisively determine the CpG-island-mapped methylation status of the 8 outputs (octamers),
elegantly on the basis of the 88 (of 256) fundamentally inequivalent Boolean/algebraic rules:
http://atlas.wolfram.com/01/01/views/173/TableView.html
http://atlas.wolfram.com/01/01/views/172/TableView.html
http://mathworld.wolfram.com/ElementaryCellularAutomaton.html
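The count of 88 fundamentally inequivalent rules can be reproduced by canonicalizing each of the 256 ECA rules under the mirror (left-right reflection) and complement (black-white exchange) symmetries (a sketch; function names are mine):

```python
def rule_table(rule):
    """Output bit for each of the 8 neighborhoods (l, c, r),
    using Wolfram's rule-number convention."""
    return {(l, c, r): (rule >> (l * 4 + c * 2 + r)) & 1
            for l in (0, 1) for c in (0, 1) for r in (0, 1)}

def to_number(table):
    return sum(bit << (l * 4 + c * 2 + r) for (l, c, r), bit in table.items())

def mirror(table):
    # Left-right reflection: swap the roles of l and r.
    return {(l, c, r): table[(r, c, l)] for (l, c, r) in table}

def complement(table):
    # Black-white exchange: complement inputs and output.
    return {(l, c, r): 1 - table[(1 - l, 1 - c, 1 - r)] for (l, c, r) in table}

def canonical(rule):
    t = rule_table(rule)
    variants = [t, mirror(t), complement(t), mirror(complement(t))]
    return min(to_number(v) for v in variants)

classes = {canonical(r) for r in range(256)}
print(len(classes))  # 88 fundamentally inequivalent rules
```

The same count follows from Burnside's lemma: (256 + 64 + 16 + 16) / 4 = 88.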
REFERENCES:
Functional coupling between writers, erasers and readers of histone and DNA methylation.
Torres IO, Fujimori DG.
https://www.ncbi.nlm.nih.gov/pubmed/26496625
Figurative= http://www.nature.com/nrd/journal/v13/n9/fig_tab/nrd4360_F1.html
Nepit, https://en.wikipedia.org/wiki/Nat_(unit)
Crumb, http://mathworld.wolfram.com/Crumb.html
Note: 3-cell ECA logic is proposed to be employed for chromatin architecture/the histone code, whereas the 2-cell "crumb codon" applies in the context of the genetic code.
Background: My single-authored "Big Data Genetic Code", presented at Polish Academy of Sciences: Nencki Institute of Experimental Biology (http://dx.doi.org/10.7490/f1000research.1111304.1) and subsequently followed up as the "HyperProteoGenome" at ELIXIR 2nd Annual Danish Bioinformatics Conference, actually validated an important 1981 Polish Air-force result (https://www.ncbi.nlm.nih.gov/pubmed/7280057).
Intuition: That x is a 99.9455% approximation to e (Napier's constant) motivates us to explore what spins off when we perform a Taylor-series expansion of the LHS of the BDGC "HyperProteoGenomic equation" [4^(4^x) = 20^(20^1)] at x = e. Let us see.
Confirmatory Results: That an 88% fraction of trait/disease-associated SNPs (https://dx.doi.org/10.1073/pnas.0903103106), based on the NHGRI meta-GWAS analysis, were NON-exonic (45% intronic + 43% intergenic) could be neatly matched by the percentage ratio of the RHS numerical value (20^20 = 1.048576e+26) to the order-4 Taylor approximation of the LHS about x = e (1.18591e+26, per Wolfram|Alpha): (1.048576e+26 / 1.18591e+26) * 100 ≈ 88.4195%, based upon the following web references:
http://www.wolframalpha.com/input/?i=Taylor+series+(4%5E4%5Ex)+at+x%3De
http://www.wolframalpha.com/input/?i=(1.048576e%2B26+%2F+1.18591e%2B26)+*+100%3D
Inference: This intricately confirmatory result clearly suggests an asymmetric tripartite segmentation of the HyperProteoGenomic equation in the context of SNP associations (45% intronic + 43% intergenic + 12% exonic), and further deepens our fundamental understanding of the notion of junk DNA, fortified by double-edged mathematical rigor: a differential-calculus/continuous-mathematics approach (Jerzy et al., 1981) and a 1-dimensional cellular-automaton/discrete-mathematics approach (Copyright 2017, Sharma P.).
From slide 9 [07/17], substituting the computed value of c,
LHS = 4^(4^c) = 4^(4^2.7168...) = RHS = 20^20 ≈ 1.048576 × 10^26
(which is strikingly of the order of the age of the Universe measured in nanoseconds!).
Thus, evidently, "BIG DATA" enters the picture of this novel formulation of the genetic code, since essentially we need to assign one-to-one rule correspondences between ~1.048576 × 10^26 neighbor-dependent DNA/genomic substitutions on the LHS and ~1.048576 × 10^26 neighbor-independent protein/proteomic substitutions on the RHS.
Inarguably, this involves super-computation and is data-critical, hence requiring "BIG DATA" approaches to address the issue and develop an original, fresh re-look at the genetic code.
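The magnitude claim can be checked directly: 20^20 as an exact integer, and the LHS with c = log_4(20 · log_4 20) substituted back (a minimal sketch):

```python
import math

rhs = 20 ** 20  # exact integer arithmetic
print(rhs)      # 104857600000000000000000000, i.e. ~1.048576e26

# Substituting c = log_4(20 * log_4(20)) back into the LHS recovers the RHS
# (up to floating-point precision):
c = math.log(20 * math.log(20, 4), 4)
print(c)        # ~2.7168
lhs = 4.0 ** (4.0 ** c)
assert abs(lhs / rhs - 1) < 1e-6
```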