Keywords
mutation rate, self-affine development, human evolution, phase transition, genome coverage, Landau-Zener transition
This article is included in the Bioinformatics gateway.
The question of unevenness in a genome arose, in particular, from the distribution of coverage of sequencing reads mapped to a genome [1]. There, deviations of read coverage from the 'classical' Lander-Waterman model, which was constructed following a Poisson distribution for short genome fragments, reflect some features of self-affinity for the most frequent genome fragments, in addition to the previously observed over-dispersion of the 'Poisson' peak. The effect was observed consistently in several analyzed genomes and should be treated as a phenomenon robust enough to merit further, in-depth investigation.
Features of self-affinity in DNA sequences were detected in the very early days of genomics, in the classical work of Peng et al. [2]; a definition of the fractal dimension for one-dimensional series was proposed there, and DNA sequences served as a model of 'fractal'-like series.
Self-affinity in a phenomenon implies the influence of perturbations with infinite dispersion, that is, the presence of a so-called 'fat tail' in their probability distribution; such features are difficult to detect and describe.
Here, an approach is proposed to detect and measure these 'perturbations', focusing on the phenomenon observed in coverage distributions. The relevance of the proposed approach is demonstrated on a model of human evolution reconstructed from some present-day human genomes; the confusion that has accumulated around this scientific problem was in fact a motive for the research.
Self-affinity features are a property of a 'transitional' period, and their description is borrowed from approaches to the so-called Ising model, which explains a phase transition in the mutual orientation of spins in a crystal. Heating a magnetic crystal leads to an abrupt disappearance of its magnetic moment, and the Ising model simulates an approximation of this phenomenon. In a very simplified form, the interactions in the crystal are represented there as a decrease of energy if two neighbouring spins in a linear chain are oriented in the same direction, and the critical temperature of the phase transition (Tc) is derived from the strength of these interactions (the 'coupling constant').
Cooling leads, in turn, to a sudden appearance of a magnetic moment, and lowering the temperature towards the critical temperature leads to the accumulation of 'islands', sufficiently long fragments in which spins are oriented in the same direction. Ordered fragments in a one-dimensional spin chain can be compared to identical fragments of genome sequences, and the distribution of these fragments allows the features of self-affinity to be described.
Within statistical physics, an expression for the probability of an island of length k was obtained by Dziarmaga [3] as an applied case of the so-called Landau-Zener transition (equation 1).
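The functional form referred to here can be sketched as follows. This is an assumed form only: it takes the standard Landau-Zener exponential with a quadratic dependence on k, which matches the "Gaussian-like distribution" with "power coefficient 2" described in the text, and then the modified form with a variable exponent D. The prefactor 2π and the omitted normalization are assumptions and are not taken from this article.

```latex
% Assumed Gaussian-like Landau-Zener form (normalization omitted):
p_k \;\propto\; \exp\!\left(-2\pi\,\tau\,k^{2}\right)

% Assumed modified form with intrinsic dimension D in place of the exponent 2:
p_k \;\propto\; \exp\!\left(-2\pi\,\tau\,k^{D}\right), \qquad D < 2
```

In this sketch τ plays the role of the rate of cooling and D the role of the intrinsic dimension of self-affinity.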
This is the point where a distribution with infinite dispersion can be introduced, by assuming that the exponent 2 in this Gaussian-like distribution is replaced by a variable exponent D < 2, the dimension of 'intrinsic' self-affinity of the underlying process. The model thus depends on two adjustable parameters: the intrinsic dimension D and the parameter τ, the rate of cooling, or the rate of approach to the transitional phase. This model allows us to explain the over-dispersion of the genome coverage distributions mentioned above (Figure 1) and to fit the parameters to a measure of unevenness of human genomes, in order to reconstruct their evolution as precisely as possible.
The distribution of 'islands' for human genomes can be obtained as the distribution of lengths of completely identical fragments in the genomes; lists of polymorphic positions (SNPs) from the 1000 Genomes Project [4] were used as a representation of the genomes. A similar distribution of fragment sizes is observed for these data (Figure 2A; Table 1). Clusters for the three races are clearly seen both for genetic distance and for the tails of 'island' sizes in genome-to-genome comparisons (Figure 2B, C). As an interpretation, higher unevenness at the same genetic distance means a higher 'equilibration rate', higher mutation rates, and a shallower slope of the fitting line.
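The extraction of 'islands' from a pair of genomes can be illustrated with a minimal sketch. It assumes that each genome is given as a sequence of allele codes over the same ordered list of SNP positions, and that island length is counted in SNP positions rather than base pairs; the function names and the toy genotypes are hypothetical and are not taken from the supplementary utilities.

```python
from collections import Counter
from itertools import groupby


def island_lengths(genome_a, genome_b):
    """Return lengths of maximal runs of consecutive SNP positions
    at which the two genomes carry identical alleles."""
    if len(genome_a) != len(genome_b):
        raise ValueError("genotype sequences must cover the same SNP positions")
    # True where the two genomes agree, False where they differ.
    matches = (a == b for a, b in zip(genome_a, genome_b))
    # groupby splits the match flags into consecutive runs; keep the agreeing ones.
    return [sum(1 for _ in run) for same, run in groupby(matches) if same]


def length_histogram(lengths):
    """Count how many islands of each length occur, sorted by length."""
    return sorted(Counter(lengths).items())


# Hypothetical toy genotypes (0 = reference allele, 1 = alternative allele).
genome_a = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
genome_b = [0, 0, 1, 1, 1, 1, 0, 1, 0, 1]

lengths = island_lengths(genome_a, genome_b)
print(lengths)                    # [3, 3, 2]
print(length_histogram(lengths))  # [(2, 1), (3, 2)]
```

The histogram of island lengths is the quantity whose tail would then be compared between pairs of genomes; how the tail is fitted is left open here.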
Figure 1. (A) Simulation of the over-dispersion effect in a genome, following the modified Ising model of islands in a chain of spins. Here, D = 1.9, τ = 0.0005, and the number of spins N = 200. The dashed line emulates an ordinary Poisson distribution with the same τ. (B) Dependency between the estimated fitting coefficient, the mean of the distribution, and the underlying closeness to the 'phase transition'.
Figure 2. (A) Distribution of fragment sizes in human genomes in pairwise comparisons. Measures of closeness between genomes: (B) genetic distance and (C) unevenness of fragment sizes. The races are ordered, from left to right and from bottom to top: Asia, Mexico, Europe, India, Africa.
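Table 1. 'Unevenness' m and genetic distance p in pairwise comparisons between groups.
Group | Group | 'unevenness' m | genetic distance p |
---|---|---|---|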
Europe | Europe | 0.0138 | 0.25 |
Europe | Asia | 0.0144 | 0.27 |
Asia | Asia | 0.0137 | 0.22 |
Europe | Africa | 0.0152 | 0.29 |
Asia | Africa | 0.0153 | 0.29 |
Africa | Africa | 0.0131 | 0.24 |
For two independent populations, the distance does not depend on heterogeneity within the populations; a simple model of exponential development can provide the dependency of the average genetic distance within a population.
The simplified model of the evolution of human ethnic groups is shown in Figure 3; for further consideration it was reduced to describe only the separation between the three human races. In this case, the model can be further reduced to a system of equations (2). The events assumed here to be events of separation between races are (a) the separation between modern Asians and modern Europeans, which happened almost immediately after the expansion to America, about ten thousand years ago; and (b) the expansion of modern humans into Eurasia, about fifty thousand years ago.
pa, pe, ...: heterogeneity of genomes within a communicating group; dae, ...: distance between genomes of separated groups; be, ba, ...: relative heterogeneity of a group at the events of separation between groups; ka, ke, ...: rates of 'exponential' development.
A mutation rate of 10^-9 per nucleotide per year corresponds, for a genome of about 3 × 10^9 nucleotides, to 3 mutations per genome per year, or 3000 per genome per thousand years; with 3000 mutations per 700000 SNPs, r in the equations above should be about 0.004.
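The conversion above can be checked with a short calculation. This is a sketch under stated assumptions: the genome size of 3 × 10^9 nucleotides is not given in the text but is implied by the figure of 3 mutations per genome per year, and the constant and function names are hypothetical.

```python
# Values quoted in the text.
MUTATION_RATE_PER_NT_PER_YEAR = 1e-9
SNP_COUNT = 700_000

# Assumption: approximate human genome length, implied by "3 per genome per year".
GENOME_SIZE_NT = 3e9


def mutations_per_genome(years):
    """Expected number of new mutations in one genome over the given time span."""
    return MUTATION_RATE_PER_NT_PER_YEAR * GENOME_SIZE_NT * years


def rate_per_snp_per_kyr():
    """Mutations per thousand years, expressed relative to the number of SNP positions."""
    return mutations_per_genome(1000) / SNP_COUNT


per_year = mutations_per_genome(1)
per_kyr = mutations_per_genome(1000)
r = rate_per_snp_per_kyr()

print(f"{per_year:.0f} mutations per genome per year")
print(f"{per_kyr:.0f} mutations per genome per thousand years")
print(f"r = {r:.4f} per SNP per thousand years")
```

The unrounded value is close to 0.0043, which the text rounds to 0.004.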
Given the requirement that p0 ≥ 0, r should be less than 0.0029. The rates of development are assumed to be unknown; what is known is only the dependency between the rate of development and the linear coefficient m. The values of ba, be, bA, pae, and kae, which are attributes of past history, are also assumed to be unknown.
Under the marginal but most confident assumption, p0 = 0, pae = 0.21, kA = 1.65 and kae = 1.81. Unevenness in a comparison between groups is a weighted average over the evolutionary paths since the time of separation, so that if kA ~ mA, ke ~ me, then kae ~ (mAe T + mA (T + t) + me t) / (2(T + t)); kae ≈ 0.0141. Following a log-linear approximation, ka and ke, for modern Asians and Europeans, should be about 1.74 and 1.76.
The genetic diversity of Asians at the time of separation is then estimated as 0.15, much lower than that of the pool of genotypes just before the separation (pae = 0.21). For Europeans, the pool of genotypes was wider, about 0.20.
What is known is that Eurasia is a continent with good communications, and that before the time of the last separation (or, better, 'crash') it was populated mostly by ancestors of the present-day Asian race. It is also known that ancestors of modern Europeans began to expand into Europe mostly after that 'crash'; the founders of the latest expansion originated from tribes which had developed slowly for a long time somewhere in the mountains of Central Asia. The wide expansion of the ancient pre-Asian race into Eurasia was characterized, instead, by a substantial increase in genome unevenness. Some of it is now lost, some is preserved in Native Americans, and, for modern Asians, the instabilities in the diverged genomes were neutralized by a sufficiently long period of stable, slow development after the crash.
The selection of individuals was almost the same as in Schiffels and Durbin [5]; the difference from that model is that non-Gaussian features in genomes are considered here explicitly. This has a substantial influence on the reconstructed history of the three human races. Dealing with self-affine phenomena is difficult and risky, but it can by no means be ignored in any serious challenge facing present-day science.
Zenodo: Measure of unevenness in human genomes: supplement software utilities and intermediate data files, http://doi.org/10.5281/zenodo.4495444 [6].
This project contains the utilities to convert, pre-process and compare genotypes.
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
The International Genome Sample Resource: 1000 Genomes phase 3 release, https://www.internationalgenome.org/data-portal/data-collection/phase-3
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: statistical genetics
Invited reviewers: 1. Version 1: 25 Feb 21.