Keywords
scaling patterns, ecology, demography, linguistics, probability theory
This article is included in the Mathematical, Physical, and Computational Sciences collection.
A few simple patterns recur in nature. Adding up random processes often leads to the bell-shaped normal distribution. The timing of death and other failures typically follows the extreme value distributions.
Those simple patterns recur under widely varying conditions. Something fundamental must set the relations between pattern and underlying process. To understand the common patterns of nature, we must know what fundamentally constrains the forms that we see.
Without that general understanding, we will often reach for unnecessarily detailed and complex models of process to explain what is in fact some structural property that influences the invariant form of observed pattern.
We already understand that the central limit theorem explains the widely observed normal distribution1. Similar limit theorems explain why failure often follows the extreme value pattern2,3.
The puzzles set by other commonly observed patterns remain unsolved. Each of those puzzles poses a challenge. The solutions will likely broaden our general understanding of what causes pattern. Such insight will help greatly in the big data analyses that play an increasingly important role in modern science.
Zipf’s law is one of the great unsolved puzzles of invariant pattern. The frequency of word usage4, the sizes of cities5,6, and the sizes of corporations7 have the same shape. On a log-log plot of rank versus abundance, the slope is minus one. For cities, the largest city would have a rank of one, the second largest city a rank of two, and so on. Abundance is population size.
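A minimal sketch of how the rank-abundance slope can be measured. It assumes an ordinary least-squares fit of log abundance against log rank (the article does not prescribe a fitting method), and the city sizes are idealized, hypothetical values rather than data.

```python
import math

def rank_abundance_slope(abundances):
    """Least-squares slope of log(abundance) against log(rank).

    Ranks run 1, 2, ... from the most abundant item to the least abundant.
    """
    ordered = sorted(abundances, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(a) for a in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical idealized data: the city of rank r has population 1e6 / r.
cities = [1e6 / r for r in range(1, 501)]
print(round(rank_abundance_slope(cities), 6))  # -1.0
```

Real abundance data are noisy, so a fitted slope would only approximate minus one.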
The abundance of species is another great unsolved puzzle of invariant pattern. In an ecological community, the probability that a species has a population size of $n$ individuals is proportional to $p^n/n$, the log series pattern8. Communities differ only in their average population size, described by the parameter, $p$. Actual data vary, but most often fit closely to the log series9.
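A short sketch of the discrete log series, $q_n = k p^n/n$ for $n = 1, 2, \ldots$, using the standard identity $\sum_n p^n/n = -\log(1-p)$, so that $k = -1/\log(1-p)$ and the average population size is $kp/(1-p)$. The value $p = 0.99$ and the truncation at $n = 5000$ are arbitrary choices for illustration.

```python
import math

def log_series_pmf(n, p):
    """Probability that a species has n individuals, q_n = k p**n / n."""
    k = -1.0 / math.log(1.0 - p)  # normalizer: the sum of p**n / n over n >= 1 is -log(1 - p)
    return k * p**n / n

def log_series_mean(p):
    """Average population size, k p / (1 - p), which p alone determines."""
    k = -1.0 / math.log(1.0 - p)
    return k * p / (1.0 - p)

p = 0.99  # hypothetical parameter value
total = sum(log_series_pmf(n, p) for n in range(1, 5001))
mean = sum(n * log_series_pmf(n, p) for n in range(1, 5001))
print(total, mean, log_series_mean(p))
```

Because the mean is a one-to-one function of $p$, specifying the average population size fixes the whole pattern, which is the sense in which communities differ only in that average.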
In this article, I show that Zipf’s law and the log series arise as the opposing endpoints of a more general theory. That theory provides insight into the particular puzzles of Zipf’s law and species abundances. The analysis also suggests deeper insights that will help to unify understanding of commonly observed patterns.
The argument begins with the invariances that define alternative probability patterns10,11. To analyze the invariances of a probability distribution, note that we can write almost any probability distribution, $q_z$, as
$$q_z = k e^{-\lambda T_z}, \tag{1}$$
in which $T(z) \equiv T_z$ is a function of the variable $z$. The probability pattern, $q_z$, is invariant to a constant shift, $T_z \mapsto a + T_z$, because we can write the transformed probability pattern in Equation 1 as
$$q_z = k_a e^{-\lambda(a + T_z)} = k e^{-\lambda T_z},$$
with $k = k_a e^{-\lambda a}$. We express $k$ in this way because $k$ adjusts to satisfy the constraint that the total probability be one. In other words, conserved total probability implies that the probability pattern is shift invariant with respect to $T_z$.
Now consider the consequences if the average of some value over the distribution $q_z$ is conserved. That constraint causes the probability pattern to be invariant to a multiplicative stretching (or shrinking), $T_z \mapsto bT_z$, because
$$q_z = k e^{-\lambda_b b T_z} = k e^{-\lambda T_z},$$
with $\lambda = \lambda_b b$. We specify $\lambda$ in this way because $\lambda$ adjusts to satisfy the constraint of conserved average value. Thus, invariant average value implies that the probability pattern is stretch invariant with respect to $T_z$.
Conserved total probability and conserved average value cause the probability pattern to be invariant to an affine transformation of the $T_z$ scale, $T_z \mapsto a + bT_z$, in which “affine” means both shift and stretch.
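A numerical sketch of the shift and stretch invariances of $q_z = k e^{-\lambda T_z}$ on a discrete grid. The grid of $z$ values and the choice $T_z = \log(1 + z)$ are hypothetical. The constant $k$ is computed by normalization, as the total probability constraint requires; for the stretch, the sketch sets $\lambda_b = \lambda/b$ directly instead of solving for it from an average-value constraint.

```python
import math

def pattern(T, lam):
    """Normalized probabilities k * exp(-lam * T_z) over a discrete grid of z.

    k is not a free parameter: it is whatever makes the probabilities sum to one.
    """
    weights = [math.exp(-lam * t) for t in T]
    k = 1.0 / sum(weights)
    return [k * w for w in weights]

z = [0.1 * i for i in range(1, 51)]    # hypothetical grid of z values
T = [math.log(1.0 + zi) for zi in z]    # hypothetical choice of T_z
lam, a, b = 1.5, 3.0, 2.5

base = pattern(T, lam)
shifted = pattern([a + t for t in T], lam)        # T -> a + T; normalization absorbs e^{-lam a}
stretched = pattern([b * t for t in T], lam / b)  # T -> bT with lambda_b = lam / b

print(max(abs(x - y) for x, y in zip(base, shifted)))
print(max(abs(x - y) for x, y in zip(base, stretched)))
```

Both printed differences are at the level of floating-point rounding, so the affine change of $T_z$ leaves the probability pattern unchanged once $k$ and $\lambda$ adjust.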
The affine invariance of probability patterns with respect to Tz induces significant structure on the form of Tz and the associated form of probability patterns. Understanding that structure provides insight into probability patterns and the processes that generate them10,12,13.
In particular, Frank & Smith12 showed that the invariance of probability patterns to affine transformation, $T_z \mapsto a + bT_z$, implies that $T_z$ satisfies the differential equation
$$\frac{\mathrm{d}T_z}{\mathrm{d}w} = a + bT_z, \tag{2}$$
in which $w(z)$ is a function of the variable $z$. The solution of this differential equation expresses the scaling of probability patterns in the generic form
$$T_z = \frac{e^{\beta w} - 1}{\beta},$$
in which, because of the affine invariance of $T_z$, I have added and multiplied by constants to obtain a convenient form, with $T_z \to w$ as $\beta \to 0$. With this expression for $T_z$, we may write probability patterns generically as
$$q_z = k \exp\left(-\lambda \frac{e^{\beta w} - 1}{\beta}\right). \tag{3}$$
Turning now to the log series and Zipf’s law, the relation $n = e^r$ between observed pattern, $n$, and process, $r$, plays a central role. Here, $r$ represents the total of all proportional processes acting on abundance. A proportional process simply means that the number of individuals or entities affected by the process increases in proportion to the number currently present, $n$.
The sum of all of the proportional processes acting on abundance over some period of time is
$$r = \int_0^\tau m(t)\,\mathrm{d}t = \log\frac{n}{n_0}. \tag{4}$$
Here, $m(t)$ is a proportional process acting at time $t$ to change abundance. The value of $r = \log n$ is the total of the $m$ values over the total time, $\tau$. For simplicity, I assume $n_0 = 1$.
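A minimal sketch of a proportional process, assuming a continuous-time model $\mathrm{d}n/\mathrm{d}t = m(t)\,n$ integrated with simple Euler steps. The rate function $m(t) = 0.2 + 0.1\cos t$ and the horizon $\tau = 5$ are hypothetical choices. The check is that the logarithm of the final abundance matches the accumulated process, $\int_0^\tau m(t)\,\mathrm{d}t$, when $n_0 = 1$.

```python
import math

def grow(m, tau, steps, n0=1.0):
    """Euler integration of dn/dt = m(t) * n from t = 0 to t = tau.

    Each step changes n in proportion to its current value, which is what
    makes m(t) a proportional process.
    """
    dt = tau / steps
    n = n0
    for i in range(steps):
        n += m(i * dt) * n * dt
    return n

def m(t):
    # Hypothetical time-varying proportional rate.
    return 0.2 + 0.1 * math.cos(t)

tau = 5.0
n_tau = grow(m, tau, steps=100_000)
r_exact = 0.2 * tau + 0.1 * math.sin(tau)  # integral of m(t) from 0 to tau
print(math.log(n_tau), r_exact)  # nearly equal: r = log n when n0 = 1
```

The small remaining gap comes from the finite Euler step and shrinks as `steps` grows.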
Proportional processes are often discussed in terms of population growth5,14. However, many different processes act individually on the members of a population. If the number of individuals affected increases in proportion to population size, then the process is a proportional process.
Growth and other proportional processes often lead to an approximate power law, $q_n \approx k n^{-\rho}$. However, the exponent of a growth process does not necessarily match the values observed in the log series and Zipf’s law. We need both the power law aspect of proportional process and something further to get the specific forms of those widely observed abundance distributions. That something further arises from conserved quantities and their associated invariances.
The log series and Zipf’s law follow as special cases of the generic probability pattern in Equation 3. To analyze abundance, focus on the process scale by letting the variable of interest be $z \equiv r$, with the key scaling simply the process variable itself, $w(r) = r$. Then Equation 3 becomes
$$q_r\,\mathrm{d}r = k \exp\left(-\lambda \frac{e^{\beta r} - 1}{\beta}\right)\mathrm{d}r,$$
in which $q_r\,\mathrm{d}r$ is the probability of a process value, $r$, in the interval $r + \mathrm{d}r$. From the relation between abundance and process, $n = e^r$, we can change from the process scale to the abundance scale by the substitutions $r \mapsto \log n$ and $\mathrm{d}r \mapsto n^{-1}\mathrm{d}n$, yielding the identical probability pattern expressed on the abundance scale
$$q_n\,\mathrm{d}n = k n^{-1} \exp\left(-\lambda \frac{n^\beta - 1}{\beta}\right)\mathrm{d}n. \tag{5}$$
The value of $k$ always adjusts to satisfy the constraint of invariant total probability, and the value of $\lambda$ always adjusts to satisfy the constraint of invariant average value.
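A numerical check of the change of variables from the process scale to the abundance scale: the probability mass that $k e^{-\lambda(e^{\beta r}-1)/\beta}$ assigns to $[\log a, \log b]$ should equal the mass that $k n^{-1} e^{-\lambda(n^\beta-1)/\beta}$ assigns to $[a, b]$. The sketch leaves $k = 1$ on both sides and uses a plain midpoint rule; the values of $\lambda$, $\beta$, $a$, and $b$ are hypothetical.

```python
import math

def q_r(r, lam, beta):
    """Unnormalized density on the process scale, exp(-lam (e^{beta r} - 1) / beta)."""
    return math.exp(-lam * (math.exp(beta * r) - 1.0) / beta)

def q_n(n, lam, beta):
    """Unnormalized density on the abundance scale, n^-1 exp(-lam (n^beta - 1) / beta)."""
    return math.exp(-lam * (n**beta - 1.0) / beta) / n

def midpoint(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

lam, beta = 0.05, 0.5   # hypothetical parameter values
a, b = 2.0, 40.0        # an abundance interval [a, b]

on_process_scale = midpoint(lambda r: q_r(r, lam, beta), math.log(a), math.log(b))
on_abundance_scale = midpoint(lambda n: q_n(n, lam, beta), a, b)
print(on_process_scale, on_abundance_scale)  # agree: the same probability mass
```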
For $\beta = 1$, we obtain the log series distribution
$$q_n = k n^{-1} e^{-\lambda n}, \tag{6}$$
in which I have replaced $n - 1$ by $n$ in the exponential term; because of affine invariance, the two forms describe the same probability pattern. The log series is often written with $e^{-\lambda} = p$, and thus $q_n = k p^n/n$. One typically observes discrete values $n = 1, 2, \ldots$. The Supplemental material for this article15 shows the relation between discrete and continuous distributions16 and the domain of the variables. The continuous analysis here is sufficient to understand pattern.
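A discrete check that replacing $n - 1$ by $n$ in the exponent changes only the normalizing constant, and that the result matches the familiar form $k p^n/n$ with $p = e^{-\lambda}$ and $k = -1/\log(1-p)$. The value $\lambda = 0.02$ and the truncation at $n = 3000$ are hypothetical choices; the neglected tail is of order $e^{-60}$.

```python
import math

def normalize(weights):
    """Divide by the total so that the probabilities sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

lam = 0.02                      # hypothetical value of lambda
ns = range(1, 3001)             # discrete abundances n = 1, 2, ...
with_shift = normalize([math.exp(-lam * (n - 1)) / n for n in ns])
without_shift = normalize([math.exp(-lam * n) / n for n in ns])

p = math.exp(-lam)              # the log series parameter, p = e^{-lambda}
k = -1.0 / math.log(1.0 - p)    # closed-form normalizer of p^n / n
fisher = [k * p**n / n for n in ns]

print(max(abs(x - y) for x, y in zip(with_shift, without_shift)))
print(max(abs(x - y) for x, y in zip(without_shift, fisher)))
```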
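For $\beta \to 0$, we have $(n^\beta - 1)/\beta \to \log n$, which yields
$$q_n = k n^{-(1+\lambda)} \tag{7}$$
for $n \ge 1$. If we constrain average abundance, $\langle n \rangle$, with respect to this distribution, then
$$\lambda = \frac{\langle n \rangle}{\langle n \rangle - 1}.$$
For any average abundance that is finite and not small, $\lambda \to 1$, so that $q_n \propto n^{-2}$, which is Zipf’s law: a probability density proportional to $n^{-2}$ corresponds to a slope of minus one on the log-log plot of rank versus abundance.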
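A small numerical check of the two limits used here, under the same continuous treatment. First, the term $(n^\beta - 1)/\beta$ approaches $\log n$ as $\beta \to 0$. Second, for the density $q_n = k n^{-(1+\lambda)}$ on $n \ge 1$, normalization gives $k = \lambda$ and the mean is $\lambda/(\lambda - 1)$ when $\lambda > 1$, so $\lambda = \langle n \rangle/(\langle n \rangle - 1)$. The abundance 250 and the list of averages are arbitrary illustrative values.

```python
import math

def generic_term(n, beta):
    """(n**beta - 1) / beta, the abundance-scale term inside the exponential of Equation 5."""
    return (n**beta - 1.0) / beta

def zipf_lambda(mean_n):
    """lambda for q_n = k n**-(1 + lambda) on n >= 1 with average abundance mean_n > 1.

    Normalization gives k = lambda, and the mean is lambda / (lambda - 1),
    so lambda = mean_n / (mean_n - 1).
    """
    return mean_n / (mean_n - 1.0)

n = 250.0  # an arbitrary abundance
for beta in (1.0, 0.1, 0.01, 0.001):
    # The term approaches log(n) as beta shrinks toward zero.
    print(f"beta={beta}: {generic_term(n, beta):.4f} vs log n = {math.log(n):.4f}")

for mean_n in (2.0, 10.0, 1e3, 1e6):
    # lambda approaches 1, the Zipf exponent, as the average abundance grows.
    print(f"<n>={mean_n:g}: lambda = {zipf_lambda(mean_n):.6f}")
```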
Equation 5 provides a general expression for abundance distributions. The log series and Zipf’s law set the endpoints of $\beta = 1$ and $\beta \to 0$. We can understand the differences between abundance distributions in terms of the parameter $\beta$ by writing the distribution in the generic form of Equation 1, $q_n = k e^{-T_n}$, with the defining affine invariant scale
$$T_n = r + \lambda \frac{e^{\beta r} - 1}{\beta} = \log n + \lambda \frac{n^\beta - 1}{\beta}. \tag{8}$$
This scale expresses the invariances that define the pattern. At the Zipf’s law endpoint, $\beta \to 0$, the scale becomes $2 \log n = 2r$, when satisfying the constraint that the average abundance, $\langle n \rangle$, is sufficiently large.
In this case, with affine invariant scale $T_n = 2r$, neither addition nor multiplication of process value, $r \mapsto a + br$, alters the pattern. We could have started with this affine invariance, and derived the probability pattern from the invariance properties10,11.
For the log series endpoint, $\beta = 1$, the affine invariant scale is
$$T_n = r + \lambda\left(e^r - 1\right) = \log n + \lambda(n - 1).$$
The dominant aspect of the scale changes with $n$. For small abundances, the logarithmic scale $r = \log n$ dominates, and for large abundances, the linear scale $n = e^r$ dominates. Many common probability patterns change their scaling with magnitude13,17.
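A sketch of the change in dominance, assuming the $\beta = 1$ scale in the form $\log n + \lambda(n - 1)$ and a hypothetical value $\lambda = 0.01$. It compares the two terms at a few abundances and uses bisection to locate where the linear term overtakes the logarithmic one. The bisection bracket assumes $\lambda$ is small enough that the logarithmic term still leads at $n = 2$.

```python
import math

def scale_terms(n, lam):
    """The two parts of the beta = 1 scale: log n (logarithmic) and lam (n - 1) (linear)."""
    return math.log(n), lam * (n - 1.0)

def crossover(lam, lo=2.0, hi=1e9, iters=200):
    """Bisection for the abundance where lam (n - 1) overtakes log n (assumes small lam)."""
    def gap(n):
        log_part, linear_part = scale_terms(n, lam)
        return log_part - linear_part
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid  # logarithmic term still dominates at mid
        else:
            hi = mid  # linear term dominates at mid
    return 0.5 * (lo + hi)

lam = 0.01  # hypothetical value
for n in (2, 10, 100, 1_000, 10_000):
    log_part, linear_part = scale_terms(n, lam)
    print(n, round(log_part, 3), round(linear_part, 3))
print("crossover near n =", round(crossover(lam), 1))
```

Smaller values of $\lambda$, which correspond to larger average abundance, push the crossover to larger $n$.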
For log series patterns, the dominance of scale at small magnitude by $r$ corresponds to affine invariance with respect to $r$. At larger abundances, the dominance by the effectively linear scale, $n$, corresponds to invariance to a shift in process, $r \mapsto a + r$, which stretches abundance by a constant factor, $n \mapsto e^a n$. It does not correspond to invariance to a multiplication of process, $r \mapsto br$, because $e^{br} = n^b$ is a power transformation of abundance. Linear scales are not invariant to power transformations. Once again, we could have derived the pattern from the invariances.
In Equation 8, intermediate values of β combine aspects of Zipf’s law and the log series. The closer β is to one of the endpoints, the more the invariance characteristics of that endpoint dominate pattern.
This analysis shows that two great and seemingly unconnected puzzles have a simple solution in terms of a single continuum between alternative invariances. This approach reveals the simple invariant structure of many common probability patterns.
All data underlying the results are available as part of the article and no additional source data are required.
Zenodo: Supplemental Material for “The common patterns of abundance: the log series and Zipf’s law”. https://doi.org/10.5281/zenodo.2597895
The Donald Bren Foundation supports my research.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
I completed this work while on sabbatical in the Theoretical Biology group of the Institute for Integrative Biology at Eidgenössische Technische Hochschule (ETH) Zürich.
A previous version of this article is available on arXiv: https://arxiv.org/abs/1812.09662
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
No source data required
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Quantitative biology, decision-making, epidemiology, science for policy support and analysis
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Complexity
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Urban economics, growth models, analysis of statistical distributions as models of change.
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
No source data required
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Population Dynamics, Dynamical Systems, Ecology and Evolution, Statistical Mechanics, Complex Systems.
Alongside their report, reviewers assign a status to the article:
| Invited Reviewers | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Version 1, 25 Mar 19 | read | read | read | read |