Research Article

The common patterns of abundance: the log series and Zipf's law

[version 1; peer review: 4 approved]
PUBLISHED 25 Mar 2019

This article is included in the Mathematical, Physical, and Computational Sciences collection.

Abstract

In a language corpus, the probability that a word occurs $n$ times is often proportional to $1/n^2$. When words are assigned rank, $s$, according to their abundance, a plot of $\log s$ versus $\log n$ typically has a slope of minus one. That simple Zipf's law pattern also arises in the population sizes of cities, the sizes of corporations, and other patterns of abundance. By contrast, for the abundances of different biological species, the probability of a population of size $n$ is typically proportional to $1/n$, declining exponentially for larger $n$, the log series pattern.
This article shows that the differing patterns of Zipf's law and the log series arise as the opposing endpoints of a more general theory. The general theory follows from the generic form of all probability patterns as a consequence of conserved average values and the associated invariances of scale.
To understand the common patterns of abundance, the generic form of probability distributions plus the conserved average abundance is sufficient. The general theory includes cases that are between the Zipf and log series endpoints, providing a broad framework for analyzing widely observed abundance patterns.

Keywords

scaling patterns, ecology, demography, linguistics, probability theory

Introduction

A few simple patterns recur in nature. Adding up random processes often leads to the bell-shaped normal distribution. Death and other failures typically follow the extreme value distributions.

Those simple patterns recur under widely varying conditions. Something fundamental must set the relations between pattern and underlying process. To understand the common patterns of nature, we must know what fundamentally constrains the forms that we see.

Without that general understanding, we will often reach for unnecessarily detailed and complex models of process to explain what is in fact some structural property that influences the invariant form of observed pattern.

We already understand that the central limit theorem explains the widely observed normal distribution1. Similar limit theorems explain why failure often follows the extreme value pattern2,3.

The puzzles set by other commonly observed patterns remain unsolved. Each of those puzzles poses a challenge. The solutions will likely broaden our general understanding of what causes pattern. Such insight will help greatly in the big data analyses that play an increasingly important role in modern science.

Zipf’s law is one of the great unsolved puzzles of invariant pattern. The frequency of word usage4, the sizes of cities5,6, and the sizes of corporations7 have the same shape. On a log-log plot of rank versus abundance, the slope is minus one. For cities, the largest city would have a rank of one, the second largest city a rank of two, and so on. Abundance is population size.
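The link between a $1/n^2$ probability pattern and a rank-abundance slope of minus one can be sketched in a few lines. This is an illustrative continuous approximation rather than part of the argument developed below; the symbols $c$ (a proportionality constant) and $N$ (the total number of items) are introduced here only for the sketch.

```latex
% Assume the probability of abundance x is approximately c x^{-2}.
% The rank s of an item with abundance n is the expected number of
% items with abundance at least n.
s(n) \approx N \int_n^\infty c\,x^{-2}\,\mathrm{d}x = \frac{cN}{n},
\qquad\text{so}\qquad
\log s \approx \log(cN) - \log n .
```

Under these assumptions, $\log s$ declines linearly in $\log n$ with slope minus one.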

The abundance of species is another great unsolved puzzle of invariant pattern. In an ecological community, the probability that a species has a population size of $n$ individuals is proportional to $p^n/n$, the log series pattern8. Communities differ only in their average population size, described by the parameter, $p$. Actual data vary, but most often fit closely to the log series9.

In this article, I show that Zipf’s law and the log series arise as the opposing endpoints of a more general theory. That theory provides insight into the particular puzzles of Zipf’s law and species abundances. The analysis also suggests deeper insights that will help to unify understanding of commonly observed patterns.

Theory

The argument begins with the invariances that define alternative probability patterns10,11. To analyze the invariances of a probability distribution, note that we can write almost any probability distribution, $q_z$, as

$$q_z = k e^{-\lambda T_z}, \tag{1}$$
in which $T(z) \equiv T_z$ is a function of the variable $z$. The probability pattern, $q_z$, is invariant to a constant shift, $T_z \mapsto a + T_z$, because we can write the transformed probability pattern in Equation 1 as
$$q_z = k_a e^{-\lambda(a + T_z)} = k e^{-\lambda T_z},$$
with $k = k_a e^{-\lambda a}$. We express $k$ in this way because $k$ adjusts to satisfy the constraint that the total probability be one. In other words, conserved total probability implies that the probability pattern is shift invariant with respect to $T_z$.

Now consider the consequences if the average of some value over the distribution $q_z$ is conserved. That constraint causes the probability pattern to be invariant to a multiplicative stretching (or shrinking), $T_z \mapsto bT_z$, because

$$q_z = k e^{-\lambda_b b T_z} = k e^{-\lambda T_z},$$
with $\lambda = \lambda_b b$. We specify $\lambda$ in this way because $\lambda$ adjusts to satisfy the constraint of conserved average value. Thus, invariant average value implies that the probability pattern is stretch invariant with respect to $T_z$.

Conserved total probability and conserved average value cause the probability pattern to be invariant to an affine transformation of the $T_z$ scale, $T_z \mapsto a + bT_z$, in which “affine” means both shift and stretch.

The affine invariance of probability patterns with respect to $T_z$ induces significant structure on the form of $T_z$ and the associated form of probability patterns. Understanding that structure provides insight into probability patterns and the processes that generate them10,12,13.
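A small numerical check may help make the shift and stretch invariances concrete. The sketch below is illustrative only: the scale `T`, the grid `z`, and the values of `lam`, `a`, and `b` are arbitrary assumptions, and the stretch case simply sets $\lambda_b = \lambda/b$, which is the adjustment that leaves $\lambda_b b = \lambda$ unchanged.

```python
import numpy as np

def normalized_pattern(T, lam):
    """Return q = k * exp(-lam * T), with k chosen so that q sums to one."""
    weights = np.exp(-lam * T)
    return weights / weights.sum()

z = np.arange(1, 201, dtype=float)
T = np.log(z) + 0.05 * z      # arbitrary illustrative scale T_z (an assumption)
lam = 1.3                     # arbitrary illustrative value of lambda

q = normalized_pattern(T, lam)

# Shift: T_z -> a + T_z. The factor exp(-lam * a) is absorbed by k.
a = 2.7
q_shift = normalized_pattern(a + T, lam)

# Stretch: T_z -> b * T_z. Setting lam_b = lam / b keeps lam_b * b = lam.
b = 4.0
q_stretch = normalized_pattern(b * T, lam / b)

shift_gap = float(np.max(np.abs(q - q_shift)))
stretch_gap = float(np.max(np.abs(q - q_stretch)))
print(shift_gap, stretch_gap)
```

Both gaps should be at the level of floating-point rounding, which reflects the fact that $k$ and $\lambda$ absorb the shift and the stretch.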

In particular, Frank & Smith12 showed that the invariance of probability patterns to affine transformation, $T_z \mapsto a + bT_z$, implies that $T_z$ satisfies the differential equation

$$\frac{\mathrm{d}T_z}{\mathrm{d}w} = \alpha + \beta T_z,$$
in which $w(z)$ is a function of the variable $z$. The solution of this differential equation expresses the scaling of probability patterns in the generic form
$$T_z = \frac{1}{\beta}\left(e^{\beta w} - 1\right), \tag{2}$$
in which, because of the affine invariance of $T_z$, I have added and multiplied by constants to obtain a convenient form, with $T_z \to w$ as $\beta \to 0$. With this expression for $T_z$, we may write probability patterns generically as
$$q_z = k e^{-\lambda\left(e^{\beta w} - 1\right)/\beta}. \tag{3}$$
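The generic scale and its limiting behavior can be checked numerically. The following sketch assumes the convenient normalization used in the generic form above, which corresponds to $\alpha = 1$ in the differential equation; the function name `generic_scale` and the test values are hypothetical.

```python
import numpy as np

def generic_scale(w, beta):
    """T = (exp(beta * w) - 1) / beta, with the limit T = w at beta = 0."""
    w = np.asarray(w, dtype=float)
    if beta == 0.0:
        return w.copy()
    # expm1 keeps precision when beta * w is close to zero
    return np.expm1(beta * w) / beta

w = np.linspace(0.0, 3.0, 7)

# As beta -> 0, the scale approaches w itself.
gap_small_beta = float(np.max(np.abs(generic_scale(w, 1e-8) - w)))

# Central-difference check that dT/dw = 1 + beta * T (alpha = 1 assumed).
beta = 0.5
h = 1e-6
dT_dw = (generic_scale(w + h, beta) - generic_scale(w - h, beta)) / (2.0 * h)
ode_gap = float(np.max(np.abs(dT_dw - (1.0 + beta * generic_scale(w, beta)))))
print(gap_small_beta, ode_gap)
```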

Turning now to the log series and Zipf’s law, the relation $n = e^r$ between observed pattern, $n$, and process, $r$, plays a central role. Here, $r$ represents the total of all proportional processes acting on abundance. A proportional process simply means that the number of individuals or entities affected by the process increases in proportion to the number currently present, $n$.

The sum of all of the proportional processes acting on abundance over some period of time is

$$r = \int_0^\tau m(t)\,\mathrm{d}t.$$

Here, $m(t)$ is a proportional process acting at time $t$ to change abundance. The value of $r = \log n$ is the total of the $m$ values over the total time, $\tau$. For simplicity, I assume $n_0 = 1$.
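As a rough illustration of the relation between process and abundance, the sketch below integrates a made-up rate $m(t)$ to obtain $r$, then compares $e^r$ with a direct step-by-step simulation of proportional change, $\mathrm{d}n/\mathrm{d}t = m(t)\,n$, starting from $n_0 = 1$. The rate function, the time span, and the step count are assumptions chosen only for the example.

```python
import numpy as np

def m(t):
    """Hypothetical time-varying proportional rate (an assumption for illustration)."""
    return 0.3 + 0.2 * np.sin(t)

def total_process(rate, tau, steps=100_000):
    """Approximate r = integral from 0 to tau of rate(t) dt with the midpoint rule."""
    dt = tau / steps
    t_mid = (np.arange(steps) + 0.5) * dt
    return float(np.sum(rate(t_mid)) * dt)

tau = 10.0
r = total_process(m, tau)
n_from_r = float(np.exp(r))      # abundance implied by n = e^r with n_0 = 1

# Independent check: step the proportional change dn/dt = m(t) * n forward in time.
steps = 100_000
dt = tau / steps
n_euler = 1.0                    # n_0 = 1
for i in range(steps):
    n_euler += m(i * dt) * n_euler * dt

print(r, n_from_r, float(n_euler))
```

The two abundance values should agree closely, with the small remaining difference coming from the coarse forward stepping.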

Proportional processes are often discussed in terms of population growth5,14. However, many different processes act individually on the members of a population. If the number of individuals affected increases in proportion to population size, then the process is a proportional process.

Growth and other proportional processes often lead to an approximate power law, $q_n \approx k n^{-\rho}$. However, the exponent of a growth process does not necessarily match the values observed in the log series and Zipf’s law. We need both the power law aspect of proportional process and something further to get the specific forms of those widely observed abundance distributions. That something further arises from conserved quantities and their associated invariances.

The log series and Zipf’s law follow as special cases of the generic probability pattern in Equation 3. To analyze abundance, focus on the process scale by letting the variable of interest be $z \equiv r$, with the key scaling simply the process variable itself, $w(r) = r$. Then Equation 3 becomes

$$q_r\,\mathrm{d}r = k e^{-\lambda\left(e^{\beta r} - 1\right)/\beta}\,\mathrm{d}r, \tag{4}$$
in which $q_r\,\mathrm{d}r$ is the probability of a process value in the interval from $r$ to $r + \mathrm{d}r$. From the relation between abundance and process, $n = e^r$, we can change from the process scale to the abundance scale by the substitutions $r \mapsto \log n$ and $\mathrm{d}r \mapsto n^{-1}\,\mathrm{d}n$, yielding the identical probability pattern expressed on the abundance scale
$$q_n\,\mathrm{d}n = k n^{-1} e^{-\lambda\left(n^\beta - 1\right)/\beta}\,\mathrm{d}n. \tag{5}$$
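The change from the process scale to the abundance scale can be written out step by step. The lines below restate the substitution using only the symbols already defined.

```latex
% Substitute r = \log n into Equation 4, so that e^{\beta r} = n^{\beta}
% and \mathrm{d}r = n^{-1}\,\mathrm{d}n.
q_r\,\mathrm{d}r
  = k\,e^{-\lambda\left(e^{\beta r} - 1\right)/\beta}\,\mathrm{d}r
  = k\,e^{-\lambda\left(n^{\beta} - 1\right)/\beta}\,n^{-1}\,\mathrm{d}n
  = q_n\,\mathrm{d}n .
```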

The value of k always adjusts to satisfy the constraint of invariant total probability, and the value of λ always adjusts to satisfy the constraint of invariant average value.

For $\beta = 1$, we obtain the log series distribution

$$q_n = k n^{-1} e^{-\lambda n}, \tag{6}$$

replacing $n - 1$ by $n$ in the exponential term, which, because of affine invariance, describes the same probability pattern. The log series is often written with $e^{-\lambda} = p$, and thus $q_n = k p^n/n$. One typically observes discrete values $n = 1, 2, \ldots$. The Supplemental material for this article15 shows the relation between discrete and continuous distributions16 and the domain of the variables. The continuous analysis here is sufficient to understand pattern.
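For readers who want to experiment with the discrete form $q_n = k p^n/n$, here is a minimal sketch. It uses the standard normalization for the discrete log series on $n = 1, 2, \ldots$, namely $k = -1/\log(1 - p)$, and the corresponding mean, $-p/[(1 - p)\log(1 - p)]$; the value of `p` and the truncation point `n_max` are assumptions chosen for illustration.

```python
import numpy as np

def log_series_pmf(p, n_max):
    """Discrete log series q_n = k * p**n / n for n = 1..n_max, k = -1 / log(1 - p)."""
    n = np.arange(1, n_max + 1, dtype=float)
    k = -1.0 / np.log1p(-p)
    return n, k * p**n / n

p = 0.9                          # illustrative value; p = exp(-lambda)
lam = -np.log(p)                 # the matching lambda in q_n = k n^{-1} e^{-lambda n}
n, q = log_series_pmf(p, 2000)   # the truncated tail is negligible for this p

total = float(q.sum())
mean = float((n * q).sum())
mean_exact = float(-p / ((1.0 - p) * np.log1p(-p)))
print(total, mean, mean_exact)
```

Values of `p` closer to one give larger average population sizes, consistent with the role of $p$ as the parameter that distinguishes communities.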

For $\beta \to 0$, we have $(n^\beta - 1)/\beta \to \log n$, which yields

$$q_n = \lambda n^{-(1+\lambda)} \tag{7}$$
for $n \ge 1$. If we constrain average abundance, $\langle n \rangle$, with respect to this distribution, then
$$\lambda = \frac{1}{1 - 1/\langle n \rangle}.$$

For any average abundance that is finite and not small, $\lambda \to 1$, which is Zipf’s law.
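The approach of $\lambda$ to one can be seen by evaluating the expression for a few average abundances. The sketch below relies on the mean of $\lambda n^{-(1+\lambda)}$ over $n \ge 1$, which is $\lambda/(\lambda - 1)$ for $\lambda > 1$; the function names and the sample means are hypothetical.

```python
def zipf_exponent(mean_abundance):
    """Return lambda = 1 / (1 - 1/<n>) for the distribution in Equation 7 on n >= 1."""
    if mean_abundance <= 1.0:
        raise ValueError("mean abundance must exceed 1 when n >= 1")
    return 1.0 / (1.0 - 1.0 / mean_abundance)

def mean_from_exponent(lam):
    """Mean of lam * n**-(1 + lam) on n >= 1, which is finite only for lam > 1."""
    if lam <= 1.0:
        raise ValueError("the mean is finite only for lam > 1")
    return lam / (lam - 1.0)

# Illustrative average abundances (assumed values).
exponents = {mean: zipf_exponent(mean) for mean in (2.0, 10.0, 100.0, 10_000.0)}
for mean, lam in exponents.items():
    print(mean, lam)
```

As the average abundance grows, the computed $\lambda$ moves toward one, so the exponent of $n$ in Equation 7 moves toward minus two.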

Equation 5 provides a general expression for abundance distributions. The log series and Zipf’s law set the endpoints of $\beta = 1$ and $\beta \to 0$. We can understand the differences between abundance distributions in terms of the parameter $\beta$ by writing the distribution in the generic form of Equation 1, with the defining affine invariant scale

$$T_n = \frac{\log n}{\lambda} + \frac{n^\beta - 1}{\beta}. \tag{8}$$

This scale expresses the invariances that define the pattern. At the Zipf’s law endpoint, $\beta \to 0$, the scale becomes $2\log n = 2r$, when satisfying the constraint that the average abundance, $\langle n \rangle$, is sufficiently large, so that $\lambda \to 1$.

In this case, with affine invariant scale $T_n = 2r$, neither addition nor multiplication of process value, $r \mapsto a + br$, alters the pattern. We could have started with this affine invariance, and derived the probability pattern from the invariance properties10,11.

For the log series endpoint, $\beta = 1$, the affine invariant scale is

$$T_n = \frac{1}{\lambda}\log n + n.$$

The dominant aspect of the scale changes with $n$. For small abundances, the logarithmic scale $r = \log n$ dominates, and for large abundances, the linear scale $n = e^r$ dominates. Many common probability patterns change their scaling with magnitude13,17.

For log series patterns, the dominance of scale at small magnitude by $r$ corresponds to affine invariance with respect to $r$. At larger abundances, the dominance by the effectively linear scale, $n$, corresponds to invariance to a shift in process, $r \mapsto a + r$, but not to a multiplication of process, $r \mapsto br$, because $e^{br} = n^b$ is a power transformation of abundance. Linear scales are not invariant to power transformations. Once again, we could have derived the pattern from the invariances.

In Equation 8, intermediate values of β combine aspects of Zipf’s law and the log series. The closer β is to one of the endpoints, the more the invariance characteristics of that endpoint dominate pattern.
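One way to see how intermediate values of $\beta$ interpolate between the endpoints is to look at the local slope of $\log q_n$ against $\log n$. Differentiating the logarithm of Equation 5 with respect to $\log n$ gives $-1 - \lambda n^\beta$, which is the constant $-(1 + \lambda)$ at $\beta \to 0$ and steepens with $n$ at $\beta = 1$. The sketch below checks that expression by finite differences; the value of `lam` and the grid are assumptions for illustration.

```python
import numpy as np

def log_abundance_density(n, lam, beta):
    """log of n^{-1} * exp(-lam * (n**beta - 1) / beta), omitting the constant log k."""
    log_n = np.log(np.asarray(n, dtype=float))
    scale = log_n if beta == 0.0 else np.expm1(beta * log_n) / beta
    return -log_n - lam * scale

def local_slope(n, lam, beta):
    """Analytic slope d(log q_n) / d(log n) = -1 - lam * n**beta."""
    return -1.0 - lam * np.asarray(n, dtype=float) ** beta

lam = 0.01                               # illustrative value (an assumption)
log_n = np.linspace(0.0, 5.0, 501)
n = np.exp(log_n)

slope_gaps = {}
for beta in (0.0, 0.5, 1.0):
    numeric = np.gradient(log_abundance_density(n, lam, beta), log_n)
    analytic = local_slope(n, lam, beta)
    # compare interior points, where the finite difference is second-order accurate
    slope_gaps[beta] = float(np.max(np.abs(numeric[1:-1] - analytic[1:-1])))

print(slope_gaps)
```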

Conclusion

This analysis shows how two great and seemingly unconnected puzzles can be solved very simply in terms of a single continuum between alternative invariances. This approach reveals the simple invariant structure of many common probability patterns.

Data availability

Underlying data

All data underlying the results are available as part of the article and no additional source data are required.

Extended data

Zenodo: Supplemental Material for “The common patterns of abundance: the log series and Zipf’s law”. https://doi.org/10.5281/zenodo.2597895 (ref. 15).

how to cite this article
Frank SA. The common patterns of abundance: the log series and Zipf's law [version 1; peer review: 4 approved]. F1000Research 2019, 8:334 (https://doi.org/10.12688/f1000research.18681.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 29 Apr 2019
Neil McRoberts, Department of Plant Pathology, University of California, Davis, Davis, CA, USA 
Approved
This paper continues a series of investigations by Prof Frank into general explanations for the presence of common patterns in the world. Previous work has examined underlying reasons for common probability distributions across a range of types of observation and ... Continue reading
HOW TO CITE THIS REPORT
McRoberts N. Reviewer Report For: The common patterns of abundance: the log series and Zipf's law [version 1; peer review: 4 approved]. F1000Research 2019, 8:334 (https://doi.org/10.5256/f1000research.20456.r46234)
Reviewer Report 17 Apr 2019
Scott E. Page, Department of Political Science, Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI, USA 
Approved
I found this article to be fascinating and elucidating but also a bit frustrating to read. The central claim of the article is that one can construct a family of distributions such that Zipf's law and species abundance are the ... Continue reading
HOW TO CITE THIS REPORT
Page SE. Reviewer Report For: The common patterns of abundance: the log series and Zipf's law [version 1; peer review: 4 approved]. F1000Research 2019, 8:334 (https://doi.org/10.5256/f1000research.20456.r46235)
  • Author Response (F1000Research Advisory Board Member) 23 Apr 2019
    Steven Frank, Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, 92697-2525, USA
    23 Apr 2019
    Author Response F1000Research Advisory Board Member
    I appreciate the thoughtful comments from Scott Page. The review and my reply are part of the final published version. So, I will use this space to develop my comments, ... Continue reading
Reviewer Report 17 Apr 2019
Jose Lobo, School of Sustainability, Arizona State University, Tempe, AZ, USA 
Approved
  1. The manuscript addresses the relationship between two probability distributions that, although originated in specific research domains, have gone on to be widely used as representations of growth processes.
     
  2. The mathematical derivations are clear.
... Continue reading
HOW TO CITE THIS REPORT
Lobo J. Reviewer Report For: The common patterns of abundance: the log series and Zipf's law [version 1; peer review: 4 approved]. F1000Research 2019, 8:334 (https://doi.org/10.5256/f1000research.20456.r46236)
  • Author Response (F1000Research Advisory Board Member) 23 Apr 2019
    Steven Frank, Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, 92697-2525, USA
    23 Apr 2019
    Author Response F1000Research Advisory Board Member
    I appreciate the thoughtful comments from Jose Lobo. The review and my reply are part of the final published version. So I will use this space to develop my comments, ... Continue reading
Reviewer Report 15 Apr 2019
Luís M. A. Bettencourt, Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA;  Santa Fe Institute, Santa Fe, NM, USA 
Approved
This manuscript approaches the origins of two particularly important distributions describing abundances in biological and social populations from the perspective of mathematical invariances of their mathematical forms. 
 
The author shows in this way that Fisher’s log series ... Continue reading
HOW TO CITE THIS REPORT
Bettencourt LMA. Reviewer Report For: The common patterns of abundance: the log series and Zipf's law [version 1; peer review: 4 approved]. F1000Research 2019, 8:334 (https://doi.org/10.5256/f1000research.20456.r46232)
  • Author Response (F1000Research Advisory Board Member) 23 Apr 2019
    Steven Frank, Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, 92697-2525, USA
    23 Apr 2019
    Author Response F1000Research Advisory Board Member
    I appreciate the thoughtful comments from Luis Bettencourt. The review and my reply are part of the final published version. So I will use this space to develop my comments, ... Continue reading