Keywords
Population genetics, probability distributions, extreme value distributions
This article is included in the Genomics and Genetics gateway.
In Equation 1, I replaced m < z with m ≤ z so that the new equation is $F(z) = \mathrm{Prob}(m \le z) = \exp\left(-\left(\frac{z-\beta}{s}\right)^{-\alpha}\right)$.
See the author's detailed response to the review by Qi Zheng
See the author's detailed response to the review by Pavol Bokes
Suppose a single cell expands exponentially to a population of size N, with a mutation rate of u per cell division. The number of mutant cells, m, in the final population depends on the number of mutations that occur and when those mutations occur. For example, a single mutation in the final round of cell division is limited to one cell. By contrast, a single mutation transmitted to one of the daughters in the first cellular division may occur in approximately one-half of the final population.
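The process described above can be sketched as a small simulation. This is a minimal sketch under assumptions the article does not state. Generations are synchronous, so the population doubles each round and N = 2^G after G rounds. Each division of a non-mutant cell produces at most one mutant daughter, with probability u. Mutants breed true. The function name `simulate_mutants` is hypothetical, and the sketch is not intended to reproduce rSalvador's `simu.cultures`.

```python
import random


def simulate_mutants(generations: int, u: float, rng: random.Random) -> int:
    """Hypothetical synchronous-doubling sketch of the Luria-Delbrück process.

    Starts from one non-mutant cell. Every cell divides once per generation,
    so after `generations` rounds there are N = 2**generations cells.
    Assumption: each division of a non-mutant cell yields one mutant daughter
    with probability u (at most one new mutant per division), and all
    descendants of a mutant cell remain mutant.
    """
    wild, mutant = 1, 0
    for _ in range(generations):
        # Count non-mutant divisions that produce a new mutant daughter.
        new = sum(1 for _ in range(wild) if rng.random() < u)
        # Existing mutants double; each new mutation adds one mutant daughter.
        mutant = 2 * mutant + new
        # Non-mutant cells double, minus the daughters that became mutant.
        wild = 2 * wild - new
    return mutant


if __name__ == "__main__":
    rng = random.Random(1)
    # N = 2**10 = 1024 cells with u = 1e-3, so Nu is about 1.
    counts = [simulate_mutants(10, 1e-3, rng) for _ in range(200)]
    print("median mutant count:", sorted(counts)[100])
```

The per-cell loop costs time proportional to N for each culture, so this sketch suits only small populations; drawing one binomial count per generation would scale better.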
The distribution of the number of mutants, m, is known as the Luria–Delbrück distribution1. That distribution is widely used to estimate the mutation rate. The distribution also arises when studying the amount of mutational mosaicism within multicellular individuals2–4.
Currently, for experiments with a small number of mutational events, one typically calculates the distribution with a probability generating function5,6. However, that approach becomes numerically inaccurate for larger numbers of mutational events, in which case the distribution is calculated by computer simulation.
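As an illustration of the generating-function route, the sketch below uses the Ma–Sandri–Sarkar recursion, which follows from the probability generating function of the Lea–Coulson form of the Luria–Delbrück distribution. It is not necessarily the method used in references 5 and 6. Here `m` is the expected number of mutations, which in this setting is assumed to be roughly Nu; the name `ld_pmf` is hypothetical. The recursion is

$p_0 = e^{-m}$, $p_n = \frac{m}{n}\sum_{i=0}^{n-1} \frac{p_i}{n-i+1}$.

```python
import math


def ld_pmf(m: float, n_max: int) -> list[float]:
    """Lea-Coulson Luria-Delbrück probabilities p_0..p_{n_max}.

    Uses the Ma-Sandri-Sarkar recursion derived from the probability
    generating function. `m` is the expected number of mutations.
    Cost grows as n_max**2.
    """
    p = [math.exp(-m)]  # p_0: probability of no mutations
    for n in range(1, n_max + 1):
        s = sum(p[i] / (n - i + 1) for i in range(n))
        p.append(m / n * s)
    return p


if __name__ == "__main__":
    probs = ld_pmf(1.0, 50)
    print("P(m = 0..3):", probs[:4])
    print("mass up to 50 mutants:", sum(probs))
```

One simple illustration of the numerical difficulty for many mutational events: for m above roughly 745, `math.exp(-m)` underflows to zero in double precision, and every later term of this recursion then also becomes zero.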
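This article shows that the Fréchet distribution provides a good approximation for the number of neutral mutants. In particular, the probability that the number of mutants, m, is less than or equal to z is approximately

$$F(z) = \mathrm{Prob}(m \le z) = \exp\left(-\left(\frac{z-\beta}{s}\right)^{-\alpha}\right) \tag{1}$$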
in which $\exp(z) = e^z$ is the exponential function. The probability of being in the upper tail, m > z, is 1 − F(z). The three parameters set the shape, α, the scale, s, and the minimum value, β, such that z, m > β.
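The three-parameter Fréchet CDF, F(z) = exp(−((z − β)/s)^(−α)) for z > β and zero otherwise, its upper tail 1 − F(z), and its inverse can be written in a few lines. This is a minimal sketch; the function names are hypothetical, and the caller supplies α, s and β.

```python
import math


def frechet_cdf(z: float, alpha: float, s: float, beta: float) -> float:
    """Three-parameter Fréchet CDF: exp(-((z - beta)/s)**(-alpha)) for z > beta."""
    if z <= beta:
        return 0.0  # no probability mass at or below the minimum value beta
    return math.exp(-(((z - beta) / s) ** (-alpha)))


def frechet_upper_tail(z: float, alpha: float, s: float, beta: float) -> float:
    """Probability of exceeding z, i.e. 1 - F(z)."""
    return 1.0 - frechet_cdf(z, alpha, s, beta)


def frechet_quantile(p: float, alpha: float, s: float, beta: float) -> float:
    """Inverse CDF: the z with F(z) = p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return beta + s * (-math.log(p)) ** (-1.0 / alpha)


if __name__ == "__main__":
    # Illustrative parameter values only, not the Luria-Delbrück fit.
    print(frechet_cdf(3.0, 1.5, 2.0, 1.0))
    print(frechet_quantile(0.5, 1.5, 2.0, 1.0))
```

For large z the upper tail behaves like ((z − β)/s)^(−α), which is the power-law tail that makes the Fréchet a natural candidate for the heavy-tailed mutant count.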
This form of the Fréchet distribution has three parameters. I found that the following parameterization closely matches the Luria–Delbrück process for neutral mutations
in which e is the base of the natural logarithm. This parameterization depends on the single parameter, Nu, the final population size times the mutation rate.
Figure 1 shows a close fit, with two kinds of mismatch. First, the number of mutants is discrete, whereas the Fréchet distribution is continuous. As Nu declines toward one, substantial probability mass concentrates on particular mutant numbers, causing discrepancy between the distributions. Nonetheless, the Fréchet remains a good approximation.
Figure 1. Each population begins with one cell and grows to N cells. Mutation occurs at rate u per cell division. Blue curves show the distribution from a computer simulation using the simu.cultures command of the R package rSalvador7. Orange curves show the Fréchet distribution in Equation 1. In rSalvador, I used sample sizes of $10^{6}$ or $10^{7}$, values of Nu as shown above each plot, and values of N ranging from $10^{6}$ to $10^{10}$. The Julia software code to produce this figure is available from Zenodo8. The input data for calculating the empirical Luria–Delbrück CDF are also available from Zenodo9.
Second, the lower tail of the Luria–Delbrück process spreads to lower values than does the Fréchet. This mismatch is clearest in the panels of Figure 1 with Nu ≥ 100.
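One way to put a number on these mismatches is to compare the empirical CDF of simulated mutant counts with a model CDF and record the largest gap, in the spirit of a Kolmogorov–Smirnov statistic. This is a sketch under that assumption; the article compares the curves visually, and the function names here are hypothetical. Because the counts are discrete, the gap is checked both at and just below each distinct sample value.

```python
import bisect
from typing import Callable, Sequence


def empirical_cdf(samples: Sequence[float]) -> Callable[[float], float]:
    """Return a function giving the fraction of samples <= z."""
    xs = sorted(samples)
    n = len(xs)
    return lambda z: bisect.bisect_right(xs, z) / n


def max_cdf_gap(samples: Sequence[float], model_cdf: Callable[[float], float]) -> float:
    """Largest absolute gap between the empirical CDF and model_cdf.

    At each distinct sample value x, compares the model with the empirical
    CDF just below x (fraction < x) and at x (fraction <= x).
    """
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for x in sorted(set(xs)):
        below = bisect.bisect_left(xs, x) / n  # fraction strictly below x
        at = bisect.bisect_right(xs, x) / n    # fraction at or below x
        f = model_cdf(x)
        gap = max(gap, abs(at - f), abs(below - f))
    return gap


if __name__ == "__main__":
    counts = [0, 0, 1, 1, 2, 3, 5, 8, 13, 40]  # made-up counts for illustration
    print(max_cdf_gap(counts, lambda z: 1.0 - 1.0 / (1.0 + z)))
```

Restricting the loop to sample values below a chosen quantile would focus the comparison on the lower tail.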
This mismatch may occur because the Luria–Delbrück process transitions from a highly stochastic process in earlier cellular generations to a nearly deterministic accumulation of mutations in later cellular generations, when the larger population size reduces the coefficient of variation in the number of new mutations. The Fréchet applies most closely to the earlier generations for the following reasons.
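The decline in stochasticity can be made concrete under simplifying assumptions that the article does not state: synchronous doubling, so that generation g (starting at g = 0 with one cell) has $2^g$ divisions; mutants rare enough that nearly all dividing cells are non-mutant; and a Poisson number $K_g$ of new mutations per generation with mean $2^g u$. Then

$$\mathrm{E}[K_g] = \mathrm{Var}[K_g] = 2^g u, \qquad \mathrm{CV}[K_g] = \frac{\sqrt{\mathrm{Var}[K_g]}}{\mathrm{E}[K_g]} = \left(2^g u\right)^{-1/2}.$$

When the final population is $N = 2^G$, the last generation has $2^{G-1} = N/2$ divisions, so $\mathrm{CV} \approx (Nu/2)^{-1/2}$, about 0.14 for Nu = 100, whereas the earliest generations, with $2^g u \ll 1$, have a coefficient of variation far above one.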
In an expanding population, the earliest mutation strongly influences the final number of mutants, because an early mutant passes to all of its descendants in an expanding mutant clone. If we start with the final cells and look back through the cellular generations toward the original progenitor, the mutation that occurred farthest back in time, nearest the progenitor, tends to dominate the final mutant number.
The extreme value of a temporal extent often follows a Gumbel distribution. Here, once the earliest mutation arises, its clone grows multiplicatively by cell division, so the final mutant count scales roughly exponentially with that extreme time. Transforming a Gumbel-distributed time into its multiplicative, exponential consequence is a common way in which a Fréchet probability pattern arises.
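The step from a Gumbel time to a Fréchet count can be checked directly. As a sketch, assume the extreme time T has a Gumbel distribution with location μ and scale σ, and that the resulting clone size is $X = e^{cT}$ for some growth constant c > 0 (for example, c = ln 2 if T counts doubling generations). The symbols μ, σ and c are illustrative and are not defined in the article. Then

$$
\begin{aligned}
\mathrm{Prob}(X \le x) &= \mathrm{Prob}\left(T \le \tfrac{\ln x}{c}\right)
= \exp\left(-\exp\left(-\frac{(\ln x)/c - \mu}{\sigma}\right)\right) \\
&= \exp\left(-e^{\mu/\sigma}\, x^{-1/(c\sigma)}\right)
= \exp\left(-\left(\frac{x}{s}\right)^{-\alpha}\right),
\qquad \alpha = \frac{1}{c\sigma},\quad s = e^{c\mu},
\end{aligned}
$$

which is the Fréchet form of Equation 1 with β = 0. This standard identity is offered as a heuristic for why a Fréchet shape appears, not as the derivation of the fitted parameterization.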
Prior mathematical work also supports the Fréchet approximation. Kessler and Levine10 showed that the Luria–Delbrück distribution converges to a Landau distribution for large Nu, in which the Landau distribution is a special case of the Lévy α-stable distribution. However, the Landau distribution does not have a closed-form expression for its probability or cumulative distribution functions.
Separately, Simon11 showed the close match between the Lévy α-stable distribution and the Fréchet distribution. That match of a Lévy distribution to the Fréchet distribution had not previously been associated with the Luria–Delbrück distribution. The Fréchet parameterization in this article provides a simple expression that can be used to develop further theory and applications of the Luria–Delbrück process.
The input data for calculating the empirical Luria-Delbrück CDF:
Zenodo: Empirical CDF for Luria–Delbrück distribution from rSalvador package. https://doi.org/10.5281/zenodo.7075655 (ref. 9).
The Julia software code used to produce Figure 1:
Source code available from: https://github.com/evolbio/FrechetLD
Archived source code at time of publication: https://doi.org/10.5281/zenodo.7255050 (ref. 8).
License: MIT
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Mathematical biology, stochastic modelling, gene expression, differential equations
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Alongside their report, reviewers assign a status to the article:
Invited Reviewers

| Version | Reviewer 1 | Reviewer 2 |
|---|---|---|
| Version 2 (revision), 03 Mar 23 | | |
| Version 1, 04 Nov 22 | read | read |