Research Article

Measure of unevenness in human genomes, described as a self-affine phase transition in a 'spin-chain' model

[version 1; peer review: 1 not approved]
PUBLISHED 25 Feb 2021

This article is included in the Bioinformatics gateway.

Abstract

Background: A non-Gaussian distribution of polymorphic positions across a genome can substantially influence the results of any approach to molecular evolution that is based on a 'classical' probability model. The infinite dispersion of non-Gaussian perturbations makes them difficult to incorporate into a probability-based model of evolution.
Methods: Here a model is proposed in which a non-Gaussian distribution is introduced into an exact solution of the 'Ising model' that describes the behavior of a one-dimensional chain of spins approaching a phase transition. The distribution of fragments that are identical between two genomes is similar to the distribution of islands of spins with the same orientation, in a model where a non-integer dimension is introduced.
Results: Application of this model allows us to compare the relative contributions of non-Gaussian perturbations for pairs of human genomes from different ethnic groups. The evolution of the three human races is considered in its most compact presentation; rates of development at the separate stages of the evolution are assumed to be proportional to the relative unevenness between the corresponding groups of genomes. In the resolved model, meaningful details of the separation between the Asian and European races, in a period around ten thousand years ago, are clarified; a particular viewpoint on the separation of the African race is also presented.
Conclusion: The proposed approximation of non-Gaussian perturbations in human genomes makes it possible to support statements that are otherwise missed in scientific investigations of the early history of modern humans.

Keywords

mutation rate, self-affine development, human evolution, phase transition, genome coverage, Landau-Zener transition

Introduction

Questions about the unevenness of a genome arose, in particular, from the distribution of coverage of sequencing reads mapped to a genome [1]. There, the deviations of read coverage from the 'classical' Lander-Waterman model, which was constructed following a Poisson distribution for short genome fragments, reflect some features of self-affinity for the most frequent genome fragments, in addition to the previously observed over-dispersion of the 'Poisson' peak. The effect was observed consistently in several analyzed genomes and can be treated as a phenomenon robust enough to be investigated further and in depth.

Features of self-affinity in DNA sequences were detected in the very early days of genomics, in the classic work of Peng et al. [2]; a definition of the fractal dimension for one-dimensional series was proposed there, and DNA sequences served as a model of 'fractal'-like series.

Self-affinity features in a phenomenon imply the influence of perturbations with infinite dispersion, that is, the presence of a so-called 'fat tail' in their probability distribution; such features are difficult to detect and describe.

Here, an approach is proposed to detect these 'perturbations' and to apply a measure to them, focusing on the phenomenon mentioned above, which was observed in coverage distributions. The relevance of the proposed approach is demonstrated on a model of human evolution reconstructed from some present-day human genomes; the confusions accumulated in attempts to solve this scientific problem were in fact the motivation for this research.

Methods

Self-affinity features are a property of a 'transitional' period, and their description is borrowed from approaches to the so-called 'Ising model', which explains a phase transition in the mutual orientation of spins in a crystal. Heating a magnetic crystal leads to an abrupt disappearance of its magnetic moment, and the Ising model simulates an approximation of this phenomenon. In a very simplified form, the interactions in the crystal are represented there as a decrease of energy when two neighbouring spins in a linear chain are oriented in the same direction, and the critical temperature of the phase transition (Tc) is derived from the strength of these interactions (the 'coupling constant').

Cooling, in turn, leads to a sudden appearance of a magnetic moment, and as the temperature decreases toward the critical temperature, 'islands' accumulate: sufficiently long fragments in which the spins share the same orientation. Ordered fragments in a one-dimensional spin chain can be compared to identical fragments of genome sequences, and the distribution of these fragments makes it possible to describe the features of self-affinity there.
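The analogy between spin islands and identical genome fragments can be illustrated with a minimal sketch. The function names and the toy inputs below are hypothetical and are not taken from the supplementary software; the sketch only shows how island lengths in a spin chain and lengths of identical fragments between two genomes (given the positions where they differ) could be measured in the same way.

```python
from itertools import groupby


def island_lengths(chain):
    """Lengths of maximal runs of equal neighbouring values in a chain."""
    return [sum(1 for _ in run) for _, run in groupby(chain)]


def identical_fragment_lengths(diff_positions, length):
    """Lengths of stretches without differences between two sequences.

    diff_positions: 0-based positions where the two sequences differ.
    length: total sequence length. Empty stretches are skipped.
    """
    bounds = [-1] + sorted(diff_positions) + [length]
    return [b - a - 1 for a, b in zip(bounds, bounds[1:]) if b - a - 1 > 0]


# Toy spin chain: islands of 3, 2, 4 and 1 equally oriented spins.
spins = [1, 1, 1, -1, -1, 1, 1, 1, 1, -1]
print(island_lengths(spins))

# Toy genome pair of length 12 differing at positions 3, 4 and 9:
# identical stretches cover positions 0-2, 5-8 and 10-11.
print(identical_fragment_lengths([3, 4, 9], 12))
```

A histogram of either list gives the kind of length distribution that the model discussed in this section is meant to describe.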

Within the framework of statistical physics, an expression for the probability of an island of length k was obtained by Dziarmaga [3], as an applied case of the so-called 'Landau-Zener' transition:

$$p(k,\tau) \sim e^{-\tau k^{2}}$$

This is the point where a distribution with infinite dispersion can be introduced, by replacing the exponent 2 in this Gaussian-like distribution with a floating exponent D < 2, the dimension of 'intrinsic' self-affinity of the underlying process. The resulting model depends on two flexible parameters: the intrinsic dimension D and the parameter τ, a rate of cooling, or a rate of approach to the transitional phase. This model allows us to explain the over-dispersion of the genome coverage distributions mentioned above (Figure 1) and to fit the parameters to a measure of unevenness of human genomes, in an attempt to reconstruct their evolution as precisely as possible.

The distribution of 'islands' for human genomes can be obtained as the distribution of lengths of completely identical fragments in the genomes; lists of polymorphic positions (SNPs) from the '1000 Genomes' project [4] were used as a representation of the genomes. A similar distribution of fragment sizes is observed for these data (Figure 2A; Table 1). Clusters for the three races are clearly seen both for genetic distance and for the tails of 'island' sizes in genome-genome comparisons (Figure 2B). To interpret this, higher unevenness at the same genetic distance means a higher 'equilibration rate', higher mutation rates, and a smaller slope of the fitting line.
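The effect of lowering the exponent from 2 to D < 2 can be sketched numerically. The code below is an illustration under stated assumptions, not the author's implementation: the island length k is restricted to 1..N and the weights exp(-τ k^D) are normalised to sum to one, which is a choice made here for simplicity. The parameter values D = 1.9, τ = 0.0005 and N = 200 are those quoted for Figure 1; the function names are hypothetical.

```python
import math


def island_distribution(tau, dim, n_max):
    """Normalised p(k) proportional to exp(-tau * k**dim), k = 1..n_max."""
    weights = [math.exp(-tau * k ** dim) for k in range(1, n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def tail_mass(p, k_min):
    """Total probability of islands of length k_min or longer."""
    return sum(p[k_min - 1:])


TAU, N = 0.0005, 200
gaussian_like = island_distribution(TAU, 2.0, N)  # original exponent 2
modified = island_distribution(TAU, 1.9, N)       # floating exponent D = 1.9

# The modified distribution puts more weight on long islands.
print(tail_mass(gaussian_like, 100), tail_mass(modified, 100))
```

In this truncated form the comparison only shows that the tail becomes heavier as D decreases; it does not by itself reproduce the fits shown in the figures.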


Figure 1.

(A) Simulation of the over-dispersion effect in a genome, following the modified Ising model of islands in a chain of spins. Here, D = 1.9, τ = 0.0005, and the number of spins N = 200. The dashed line emulates an ordinary Poisson distribution with the same τ. (B) Dependency between the estimated fitting coefficient, the average of the distribution, and the underlying closeness to the 'phase transition'.


Figure 2.

(A) Distribution of fragment sizes in human genomes in pairwise comparisons. Measures of closeness between genomes: genetic distance (B) and unevenness of fragment sizes (C). Here the groups are ordered, from left to right and from bottom to top: Asia, Mexico, Europe, India, Africa.

Table 1. Approximate averaged similarities between human races and within a race, according to the detailed chart in Figure 2.

Group 1   Group 2   'Unevenness' m   Genetic distance p
Europe    Europe    0.0138           0.25
Europe    Asia      0.0144           0.27
Asia      Asia      0.0137           0.22
Europe    Africa    0.0152           0.29
Asia      Africa    0.0153           0.29
Africa    Africa    0.0131           0.24

Results

For two independent populations, the distance between them does not depend on the heterogeneity within the populations; a simple model of exponential development can provide the dependency of the average genetic distance within a population.

pxp0x+trx1N0nNg(nαn)k~em1+mdx=p0+trxNg+tryNg

The simplified model of the evolution of human ethnic groups is shown in Figure 3; for further consideration it was reduced so as to clarify only the separation between the three human races. In this case, the model can be further reduced to the system of equations (2). The events assumed here as events of separation between races are (a) the separation between modern Asians and modern Europeans, which happened shortly after the expansion to America, about ten thousand years ago; and (b) the expansion of modern humans into Eurasia, about fifty thousand years ago.


Figure 3. Simplified presentation of human evolution.

$p_a$, $p_e$, ...: heterogeneity of genomes within a communicating group; $d_{ae}$, ...: distance between genomes of separated groups; $b_e$, $b_a$, ...: relative heterogeneity of a group at the events of separation between groups; $k_a$, $k_e$, ...: rates of 'exponential' development.

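$$
\begin{aligned}
p_a &= b_a\,p_{ae} + t\,k_a\,r \\
p_e &= b_e\,p_{ae} + t\,k_e\,r \\
d_{ae} &= p_{ae} + 2\,t\,r \\
p_A &= b_A\,p_0 + (t+T)\,k_A\,r \\
p_{ae} &= b_{ae}\,p_0 + T\,k_{ae}\,r \\
d_{Ae} &= p_0 + 2\,(T+t)\,r
\end{aligned}
\tag{2}
$$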

A mutation rate of $10^{-9}$ per nucleotide per year corresponds, for a genome of about $3 \times 10^{9}$ nucleotides, to about 3 mutations per genome per year, or 3000 per genome per thousand years. With 3000 mutations per 700000 SNPs, r in the equations above should be about 0.004.
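The conversion of the mutation rate into the parameter r can be checked with a few lines of arithmetic. The genome length of 3 × 10^9 nucleotides is an assumption made here (it is the value implied by "3 per genome per year"); the other numbers are taken from the text.

```python
MUTATION_RATE = 1e-9   # per nucleotide per year (from the text)
GENOME_SIZE = 3e9      # nucleotides; assumed approximate human genome length
N_SNP = 700_000        # number of SNP positions (from the text)

per_genome_per_year = MUTATION_RATE * GENOME_SIZE
per_genome_per_kyr = per_genome_per_year * 1000

# r: expected fraction of SNP positions changed per thousand years
r = per_genome_per_kyr / N_SNP
print(per_genome_per_year, per_genome_per_kyr, round(r, 4))
```

The result is slightly above 0.004, in line with the rounded value used in the text.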

Given the requirement that $p_0 \geq 0$, r should not exceed 0.0029. The rates of development are assumed to be unknown; all that is known is the dependency between a rate of development and the linear coefficient m. The values of $b_a$, $b_e$, $b_A$, $p_{ae}$ and $k_{ae}$, which are attributes of past history, are also assumed to be unknown.
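The bound on r appears to follow from the last equation of the system above, which relates the Africa-Europe distance to the initial heterogeneity. As a worked step, assuming that time is measured in thousands of years, that T + t ≈ 50 (the expansion into Eurasia about fifty thousand years ago), and that the Europe-Africa distance is 0.29 as in Table 1:

$$
p_0 = d_{Ae} - 2\,(T+t)\,r \geq 0
\quad\Longrightarrow\quad
r \leq \frac{d_{Ae}}{2\,(T+t)} = \frac{0.29}{2 \cdot 50} = 0.0029
$$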

For the marginal but most confident assumption, $p_0 = 0$, we obtain $p_{ae} = 0.21$, $k_A = 1.65$ and $k_{ae} = 1.81$. Unevenness in a comparison between groups is a weighted average over the evolutionary paths since the time of separation, so that if $k_A \sim m_A$, $k_e \sim m_e$, then $k_{ae} \sim (m_{Ae} T + m_A (T + t) + m_e t)/2(T + t)$; $k_{ae} \approx 0.0141$. Following a log-linear approximation, $k_a$ and $k_e$, for modern Asians and Europeans, should be about 1.74 and 1.76.
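The numbers for the marginal case can be approximately reproduced from Table 1 with the sketch below. Several points are assumptions made here rather than statements of the source: time is measured in thousands of years with t = 10 and T = 40 (inferred from the dates of the two separation events), the value 0.0141 is treated as the unevenness coefficient paired with $k_{ae}$, and the 'log-linear approximation' is read as linear interpolation of log k against m between the two known points. The helper name `loglinear` is hypothetical.

```python
import math

# Table 1 values: genetic distance p and 'unevenness' m
P_AFRICA = 0.24
D_EUROPE_ASIA = 0.27
D_EUROPE_AFRICA = 0.29
M_AFRICA, M_ASIA, M_EUROPE = 0.0131, 0.0137, 0.0138
M_AE = 0.0141  # quoted in the text; assumed to be paired with k_ae

# Assumed time scale in thousands of years
t, T = 10.0, 40.0

# Marginal case p0 = 0, solved from the system of equations
p0 = 0.0
r = (D_EUROPE_AFRICA - p0) / (2 * (T + t))  # from d_Ae
p_ae = D_EUROPE_ASIA - 2 * t * r            # from d_ae
k_A = P_AFRICA / ((t + T) * r)              # from p_A, the b_A * p0 term vanishes
k_ae = p_ae / (T * r)                       # from p_ae, the b_ae * p0 term vanishes


def loglinear(m, m1, k1, m2, k2):
    """Interpolate k at m, assuming log(k) is linear in m."""
    slope = (math.log(k2) - math.log(k1)) / (m2 - m1)
    return math.exp(math.log(k1) + slope * (m - m1))


k_a = loglinear(M_ASIA, M_AFRICA, k_A, M_AE, k_ae)
k_e = loglinear(M_EUROPE, M_AFRICA, k_A, M_AE, k_ae)

for name, value in [("r", r), ("p_ae", p_ae), ("k_A", k_A),
                    ("k_ae", k_ae), ("k_a", k_a), ("k_e", k_e)]:
    print(f"{name} = {value:.4f}")
```

Under these assumptions the computed values agree with those quoted in the text to within about 0.02; the remaining differences come from rounding of intermediate values such as $p_{ae}$.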

The genetic diversity of Asians at the time of separation is then estimated as 0.15, much lower than the pool of genotypes just before the separation ($p_{ae} = 0.21$). For Europeans, the pool of genotypes was wider, about 0.20.

What is known is that Eurasia is a continent with good communications, and that before the time of the last separation (or, better, 'crash') it was populated mostly by ancestors of the present-day Asian race. It is also known that ancestors of modern Europeans began to expand into Europe mostly after that 'crash'; the founders of this latest expansion originated from tribes that had developed slowly for a long time somewhere in the mountains of central Asia. The wide expansion of the ancient pre-Asian race across Eurasia was characterized, instead, by a substantial increase of genome unevenness. Some of it is now lost, some is preserved in Native Americans, and, for modern Asians, the instabilities in the diverged genomes were neutralized by a sufficiently long period of stable, slow development after the crash.

Conclusions

The selection of individuals was almost the same as in Schiffels and Durbin [5]; the difference from that model is that non-Gaussian features in genomes are considered here explicitly. This has a substantial influence on the reconstructed history of the three human races. Dealing with self-affine phenomena is difficult and risky, but they can by no means be ignored in any valuable challenge to present-day science.

Data availability

Underlying data

Zenodo: Measure of unevenness in human genomes: supplement software utilities and intermediate data files, http://doi.org/10.5281/zenodo.4495444 [6].

This project contains the utilities to convert, pre-process and compare genotypes.

Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

The International Genome Sample Resource: 1000 Genomes phase 3 release, https://www.internationalgenome.org/data-portal/data-collection/phase-3

how to cite this article
Feranchuk S. Measure of unevenness in human genomes, described as a self-affine phase transition in a 'spin-chain’ model [version 1; peer review: 1 not approved]. F1000Research 2021, 10:149 (https://doi.org/10.12688/f1000research.51192.1)
Peer review discontinued

Peer review at F1000Research is author-driven. Currently no reviewers are being invited.
Reviewer Report 04 Apr 2022
Ying Ji, Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN, USA 
Not Approved
The author presented a model to approximate non-Gaussian perturbations in human genomes, and provided reconstructed history of Europe, Asia, and Africa based on the model results.

Human evolution history is very interesting and important problem to work, ... Continue reading
HOW TO CITE THIS REPORT
Ji Y. Reviewer Report For: Measure of unevenness in human genomes, described as a self-affine phase transition in a 'spin-chain’ model [version 1; peer review: 1 not approved]. F1000Research 2021, 10:149 (https://doi.org/10.5256/f1000research.54328.r127304)
  • Author Response 07 Apr 2022
    Sergey Feranchuk, Department of Physics, Smolensk State University, Smolensk, Russian Federation
    a foreword.

    I appreciate the efforts of a reviewer who is a specialist in a statistical genetics.
    In my reply, first, I would like to point out the meaning ... Continue reading