Invariance in ecological pattern

Background: The abundance of different species in a community often follows the log series distribution. Other ecological patterns also have simple forms. Why does the complexity and variability of ecological systems reduce to such simplicity? Common answers include maximum entropy, neutrality, and convergent outcome from different underlying biological processes. Methods: This article proposes a more general answer based on the concept of invariance, the property by which a pattern remains the same after transformation. Invariance has a long tradition in physics. For example, general relativity emphasizes the need for the equations describing the laws of physics to have the same form in all frames of reference. Results: By bringing this unifying invariance approach into ecology, we show that the log series pattern dominates when the consequences of processes acting on abundance are invariant to the addition or multiplication of abundance by a constant. The lognormal pattern dominates when the processes acting on net species growth rate obey rotational invariance (symmetry) with respect to the summing up of the individual component processes. Conclusions: Recognizing how these invariances connect pattern to process leads to a synthesis of previous approaches. First, invariance provides a simpler and more fundamental maximum entropy derivation of the log series distribution. Second, invariance provides a simple derivation of the key result from neutral theory: the log series at the metacommunity scale and a clearer form of the skewed lognormal at the local community scale. The invariance expressions are easy to understand because they uniquely describe the basic underlying components that shape pattern.

"It was Einstein who radically changed the way people thought

Introduction
Ecologists have been interested in species abundance distributions (SADs) since the classic papers by Fisher 2 and Preston 3. Two major patterns have been identified depending on the size of the community. In a large community, abundances often follow the log series distribution 4. Specifically, the probability that a species has a population size of n individuals is proportional to $p^n/n$. Communities differ only in their average population size, described by the parameter, p. At smaller spatial scales, the species abundance pattern often follows a skewed lognormal (a random variable is lognormally distributed when its logarithm is normally distributed) 5,6.
It is intriguing that the species abundance distribution follows these simple patterns irrespective of the particular group (birds, insects, mammals) and region considered. Other ecological patterns also follow simple probability distributions 7-9. Those patterns have attracted a lot of attention. Why do the variability and complexity of biology reduce to such a small range of simple distributions? How can we understand the relations between complex processes and simple patterns?
Approaches such as Harte's 9 maximum entropy formalism and Hubbell's 5 neutral theory have attempted to explain the generality of the log series and skewed lognormal patterns in species abundance distributions. Maximum entropy describes probability distributions that are maximally random subject to satisfying certain constraints 10-12 . This approach has a long tradition in physics, both in statistical mechanics and information theory. An early maximum entropy approach in ecology derived the biomass pattern of populations 13-15 .
Neutral theory derives probability distributions by assuming that all individuals are equivalent 16 . Variation arises by random processes acting on the mechanistically identical individuals. Put another way, the mechanistic processes are "neutral" apart from random processes. Both maximum entropy and neutral theory have been shown to provide a good fit to the empirical patterns of species abundance distributions. In this article, we subsume these two different ways of understanding the log series and skewed lognormal patterns with a more general perspective based on the concept of invariance 17 .
Invariance can be defined as the property by which a system remains unchanged under some transformation. For example, a circle is the same (invariant) before and after rotation (Figure 1a). In ecology, pattern often depends on the ways in which form remains invariant to changes in measurement. Some patterns retain the same form after uniformly stretching or shrinking the scale of measurement (Figure 2b). Measures of length provide a common example of stretch invariance. One can measure lengths equivalently in millimeters or centimeters without loss of information. As we will see, that kind of invariance often determines the form of observed pattern.
To give another example, consider the common and widely familiar pattern of the normal distribution. By the central limit theorem, when independent random variables are added, their properly normalized sum tends toward a normal distribution, even when the component variables themselves are not normally distributed. The central limit theorem and the normal distribution are often considered as unique aspects of pattern that stand apart from other commonly observed patterns.
The invariance perspective that we promote shows how the normal distribution is in fact a specific example of a wider framework in which to understand the commonly observed patterns of nature. In particular, the normal distribution arises from the rotational invariance of the circle 18. For two variables, x and y, with a given squared length, $x^2 + y^2 = r^2$, all combinations of the variables with the same radius, r, lie along the circumference of a circle (Figure 1a). When each combination is equally likely, the rotationally invariant radius is sufficient to describe the probability pattern.
It is this rotational invariance that gives the particular mathematical form of the normal distribution, in which the average squared radius sets the variance of the distribution. By this perspective, the mathematical forms of all commonly observed distributional patterns express their unique invariances 18 .
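The convergence described by the central limit theorem can be illustrated with a small simulation. The following is a minimal sketch, assuming exponentially distributed components (an arbitrary skewed choice made here for illustration, not taken from the analysis above). It tracks how the skewness of the summed values shrinks toward the zero skewness of the normal distribution as more components are added.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average third standardized moment."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def summed_draws(n_terms, n_samples, rng):
    """Each sample is the sum of n_terms independent exponential draws."""
    return [sum(rng.expovariate(1.0) for _ in range(n_terms))
            for _ in range(n_samples)]

rng = random.Random(1)  # fixed seed so the run is repeatable
skew_by_terms = {m: skewness(summed_draws(m, 20000, rng))
                 for m in (1, 2, 4, 8, 16)}

for m, s in skew_by_terms.items():
    print(f"terms={m:2d}  skewness={s:.3f}")
```

The numbers of summed terms (1, 2, 4, 8, 16) mirror the sequence of curves described for Figure 1c. Skewness is only one convenient summary of the approach to the normal form.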
The perspective of invariance was the basis for most of the great conceptual advances of physics in the twentieth century 1. For example, Gell-Mann's pioneering theoretical work on the fundamental particles of nature derived from invariance (symmetry) properties that unified understanding of known particles and predicted new particles such as quarks, which were subsequently observed. By contrast, general aspects of invariance have not been used consistently as the fundamental basis for understanding patterns in ecology. One exception concerns scale invariance, which is often discussed in ecology 19-21. But scale invariance is typically limited to special kinds of patterns rather than forming a unified approach to diverse patterns.
The point of this paper is that invariance is the most general way in which to understand commonly observed patterns. Species abundance distributions provide an excellent illustration. For species abundances, we show that maximum entropy and neutral models can succeed in certain cases because they derive from invariance principles. However, maximum entropy and neutrality are often difficult to interpret because they hide their underlying basis in invariance.
Our unifying invariance analysis clarifies why seemingly different conceptual approaches have the same consequences for pattern. Similarly, seemingly different biological processes may often lead to the same observed pattern, because those different processes share a common basis in invariance. That deeper understanding suggests a more insightful way to think about alternative mechanistic models. It also suggests the kinds of empirical tests that may differentiate between alternative causal processes.
Figure 1. [...] (b) Rotating regular polygons changes pattern. However, as more rotated polygons are added, the form converges asymptotically to a rotationally invariant circle, in which adding another rotated polygon does not change the pattern. Many common patterns of nature are asymptotically invariant. In this case, aggregation causes loss of all information except invariant radial distance. (c) The normal distribution is asymptotically invariant. The left curve describes an arbitrary probability pattern. The second curve expresses the sum of two randomly chosen values from the first curve. The height is the relative probability of the summed values. The third, fourth, and fifth curves express the sum of 4, 8, and 16 randomly chosen values from the first curve. Each curve width is shrunk to match the first curve. In this case, aggregation smooths the curve, causing loss of all information except the average squared distance from the center (the variance), which is equivalent to the average squared radial distance of rotationally invariant circles.
Figure 2. (a) [...] Decreasing values of a shift the curve to the right, which is equivalent to shifting the x axis by resetting the zero point. For probability patterns, the total probability must be normalized to one, which means that all curves must have the same area under the curve for values of x between 0 and ∞. To normalize the curves, the right panel plots $k_a e^{-(x+a)}$ with $k_a = e^{a}$. Thus, all curves become $e^{-x}$ invariantly with respect to different shift values, a. (b) The left panel shows $e^{-bx}$ for $b = 2^{0}, 2^{-1}, \ldots, 2^{-4}$. Decreasing values of b stretch the x axis by a factor of 2 for each halving of b. To normalize the average value of each probability curve to be the same, the right panel shows $e^{-\lambda_b b x}$ for $\lambda_b = 1/b$. Thus, all curves become $e^{-x}$ invariantly with respect to different stretch values, b.
This manuscript is organized as follows. First, we highlight key theoretical results for species abundance distributions. Second, we review how invariance defines probability patterns in a general way 18,22,23. The log series distribution 24 and the gamma-lognormal distribution for species abundances follow directly from the universal invariance expression of probability patterns. Third, we show that maximum entropy and neutrality can easily be understood as special examples of invariance principles. Finally, we discuss the broad role of invariance in the analysis of ecological pattern.

Key results
This article develops two key theoretical results. We highlight those results before starting on the general overview of invariance and pattern.
First, we present a simple maximum entropy derivation of the log series pattern. We show that constraining the average abundance per species is sufficient when analyzing randomness and entropy on the proper demographic scale 25 .
The simplicity of our maximum entropy derivation contrasts with Harte's more complicated maximum entropy model 9,26 . Harte had to assume an additional unnecessary constraint on energy usage. He required that unnecessary constraint because he evaluated randomness on the scale of measured abundances rather than on the scale of demographic process. This will be made explicit below.
We use this result to demonstrate that maximum entropy is the outcome of deeper underlying principles of invariance and pattern. By working at the deeper level of invariance, one obtains a simpler and more powerful understanding of pattern.
The second result shows that Hubbell's 5 neutral model is the simple expression of three basic invariances. Hubbell's full range of log series and skewed lognormal (zero sum multinomial) results follows immediately from those three underlying invariances.
The three invariances correspond to a maximum entropy model that constrains the average abundance of species and the average and variance of the demographic processes influencing abundance. The three invariances lead to a simple gamma-lognormal distribution that matches the neutral theory pattern for species abundances 25 . The gamma-lognormal is a product of the standard gamma and lognormal distributions.

Invariance
This section reviews how invariance considerations lead to the log series distribution 24. We delay discussion of the gamma-lognormal until the later section on Hubbell's neutral model.

Canonical form of probability distributions
We can rewrite almost any probability distribution as
$$q_z = k e^{-\lambda T_z}, \tag{1}$$
in which $T(z) \equiv T_z$ is a function of the variable, z, and k and λ are constants. For example, Student's t-distribution, usually written as
$$q_z = k\left(1 + z^2/\nu\right)^{-(\nu+1)/2},$$
can be written in the form of Equation 1 with $\lambda = (\nu + 1)/2$ and $T_z = \log(1 + z^2/\nu)$.
The probability pattern, $q_z$, is invariant to a constant shift, $T_z \mapsto a + T_z$, because we can write the transformed probability pattern in Equation 1 as
$$q_z = k_a e^{-\lambda(a + T_z)} = k e^{-\lambda T_z},$$
with $k = k_a e^{-\lambda a}$ (Figure 2a). We express k in this way because k adjusts to satisfy the constraint that the total probability be one. In other words, conserved total probability implies that the probability pattern is shift invariant with respect to $T_z$ 18.
Now consider the consequences if the average of some value over the distribution $q_z$ is conserved. For example, the average of z is the mean, µ, and the average of $(z - \mu)^2$ is the variance. Such a constraint causes the probability pattern to be invariant to a multiplicative stretching (or shrinking), $T_z \mapsto bT_z$, because
$$q_z = k e^{-\lambda_b b T_z} = k e^{-\lambda T_z},$$
with $\lambda = \lambda_b b$ (Figure 2b). We specify λ in this way because λ adjusts to satisfy the constraint of conserved average value. Thus, invariant average value implies that the probability pattern is stretch invariant with respect to $T_z$.
Conserved total probability and conserved average value cause the probability pattern to be invariant to an affine transformation of the $T_z$ scale, $T_z \mapsto a + bT_z$, in which "affine" means both shift and stretch.
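The affine invariance can be checked numerically. The following is a minimal sketch, assuming the canonical form $q = k e^{-\lambda T}$ of Equation 1. The function names and the test values of T, a, and b are illustrative choices, not part of the original analysis. The shift is absorbed by readjusting k and the stretch by readjusting λ, so the pattern is unchanged.

```python
import math

def pattern(T, lam, k):
    """Canonical form q = k * exp(-lam * T)."""
    return k * math.exp(-lam * T)

def max_affine_error(lam=1.5, k=2.0):
    """Largest change in q after T -> a + b*T, once k and lam are readjusted."""
    worst = 0.0
    for T in (0.0, 0.7, 3.0):
        for a in (-1.0, 0.5, 2.0):
            for b in (0.25, 1.0, 4.0):
                lam_b = lam / b                # stretch absorbed by lambda
                k_a = k * math.exp(lam_b * a)  # shift absorbed by k
                q_new = pattern(a + b * T, lam_b, k_a)
                worst = max(worst, abs(q_new - pattern(T, lam, k)))
    return worst

print(max_affine_error())  # differs from zero only by floating-point rounding
```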
The affine invariance of probability patterns with respect to $T_z$ induces significant structure on the form of $T_z$ and the associated form of probability patterns. Understanding that structure provides insight into probability patterns and the processes that generate them 18,22,23.
In particular, Frank and Smith 22 showed that the invariance of probability patterns to affine transformation, $T_z \mapsto a + bT_z$, implies that $T_z$ satisfies the differential equation
$$\frac{dT_z}{dw} = \alpha + \beta T_z,$$
in which w(z) is a function of the variable z. The solution of this differential equation expresses the scaling of probability patterns in the generic form
$$T_z = \frac{1}{\beta}\left(e^{\beta w} - 1\right), \tag{2}$$
in which, because of the affine invariance of $T_z$, we have added and multiplied by constants to obtain a convenient form, with $T_z \to w$ as $\beta \to 0$.
By writing $T_z$ in this way, w expresses a purely shift-invariant aspect of the fundamental affine-invariant scale, because the shift transformation $w \mapsto a + w$ multiplies $T_z$ by a constant, and probability pattern is invariant to constant multiplication of $T_z$. Thus, Equation 2 dissects the anatomy of a probability pattern (Equation 1) into its component invariances.
With this expression for $T_z$, we may write probability patterns generically as
$$q_z = k e^{-\lambda\left(e^{\beta w} - 1\right)/\beta}. \tag{3}$$
This form has the advantage that w(z) expresses the shift-invariant structure of a probability pattern. Most of the commonly observed probability patterns have a simple form for w 23,27. That simplicity of the shift-invariant scale suggests that focus on w provides insight into common patterns.
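The role of w as the shift-invariant scale can be made concrete with a short numerical check. This is a sketch under the assumption that the affine-invariant scale has the form $T = (e^{\beta w} - 1)/\beta$. The parameter values are arbitrary. The check shows that T approaches w for small β, and that a shift of w changes T only by an affine transformation, to which the probability pattern is invariant.

```python
import math

def T(w, beta):
    """Affine-invariant scale T = (exp(beta*w) - 1)/beta, with T -> w as beta -> 0."""
    if beta == 0.0:
        return w
    return math.expm1(beta * w) / beta

w, a, beta = 0.8, 0.3, 1.7

# Small beta recovers the shift-invariant scale w itself.
near_limit = T(w, 1e-9)

# A shift w -> a + w acts on T as a shift plus a stretch (an affine change).
shifted = T(a + w, beta)
affine = T(a, beta) + math.exp(beta * a) * T(w, beta)

print(near_limit, shifted, affine)
```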

Proportional processes and species abundances
To understand the log series, we must consider the relation $n = e^r$ between the observed pattern of abundances, n, and the processes, r. Here, r represents the total of all proportional processes acting on abundance 24.
A proportional process simply means that the number of individuals or entities affected by the process increases in proportion to the number currently present, n. Demographic processes, such as birth and death, act proportionally.
The sum of all of the proportional processes on abundance over some period of time is
$$r = \int_0^{\tau} m(t)\,dt.$$
Here, m(t) is a proportional process acting at time t to change abundance. Birth and death typically occur as proportional processes. The value of $r = \log n$ is the total of the m values over the total time, τ. For simplicity, we assume $n_0 = 1$.
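The relation between accumulated proportional processes and abundance can be sketched in discrete time. The example below assumes small time steps of length dt and normally distributed process values m(t). Both choices are illustrative and not specified in the text. Each step multiplies abundance by a proportional factor, and the logarithm of the final abundance equals the accumulated total r.

```python
import math
import random

def grow(m_values, dt, n0=1.0):
    """Apply each proportional process in turn; abundance changes multiplicatively."""
    n = n0
    for m in m_values:
        n *= math.exp(m * dt)
    return n

rng = random.Random(0)
dt = 0.01
m_values = [rng.gauss(0.05, 1.0) for _ in range(1000)]  # hypothetical process values

r = sum(m * dt for m in m_values)  # discrete version of the integral of m(t)
n = grow(m_values, dt)

print(r, math.log(n))  # the two values agree: r = log n when n0 = 1
```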
The log series follows as a special case of the generic probability pattern in Equation 3. To analyze abundance, focus on the process scale by letting the variable of interest be $z \equiv r$, with the key shift-invariant scale as simply the process variable itself, $w(r) = r$. Then Equation 3 becomes
$$q_r\,dr = k e^{-\lambda\left(e^{\beta r} - 1\right)/\beta}\,dr, \tag{4}$$
in which $q_r\,dr$ is the probability of a process value in the interval from r to r + dr. We can generalize the relation between abundance and process, $n = e^r$, by writing $n^{\beta} = e^{\beta r}$, which uses an additional parameter β to allow comparison with the canonical form of probability distributions in the previous subsection. When we focus on standard models of species abundances, we use β = 1.
We can change from the process scale, r, to the abundance scale, n, by noting that $\beta \log n = \beta r$, and so, for any β, we have $r = \log n$. Thus, we can use the substitutions $r \mapsto \log n$ and $dr \mapsto n^{-1}dn$ in Equation 4, yielding the identical probability pattern expressed on the abundance scale
$$q_n\,dn = k n^{-1} e^{-\lambda\left(n^{\beta} - 1\right)/\beta}\,dn. \tag{5}$$
The value of k always adjusts to satisfy the constraint of invariant total probability, and the value of λ always adjusts to satisfy the constraint of invariant average value.
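The change of scale can be verified numerically: the probability mass in an interval of process values must equal the mass in the corresponding interval of abundances. The following sketch assumes the unnormalized forms $e^{-\lambda(e^{\beta r}-1)/\beta}$ on the process scale and $n^{-1}e^{-\lambda(n^{\beta}-1)/\beta}$ on the abundance scale, with k set to one. It uses a simple midpoint rule, and the parameter values and interval are arbitrary.

```python
import math

def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule estimate of the integral of f from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

lam, beta = 0.4, 1.0

def q_r(r):
    """Unnormalized pattern on the process scale."""
    return math.exp(-lam * math.expm1(beta * r) / beta)

def q_n(n):
    """Same pattern on the abundance scale; the factor 1/n comes from dr = dn/n."""
    return math.exp(-lam * (n**beta - 1.0) / beta) / n

r_lo, r_hi = 0.0, 2.0
mass_r = midpoint_integral(q_r, r_lo, r_hi)
mass_n = midpoint_integral(q_n, math.exp(r_lo), math.exp(r_hi))

print(mass_r, mass_n)  # equal up to numerical integration error
```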
For proportional processes and species abundances, β = 1, as noted above. For that value of β, we obtain the log series distribution 24
$$q_n\,dn = k n^{-1} e^{-\lambda n}\,dn, \tag{6}$$
replacing n − 1 by n in the exponential term, which, because of affine invariance, describes the same probability pattern. The log series is often written with $e^{-\lambda} = p$, and thus $q_n = k p^n/n$. One typically observes discrete values n = 1, 2, …. See https://doi.org/10.5281/zenodo.2597895 for the general relation between discrete and continuous distributions. The continuous analysis here is sufficient to understand pattern.
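For the discrete case, the log series can be written out directly. The sketch below uses the standard normalization of the discrete log series, $k = -1/\log(1 - p)$, together with the relation $p = e^{-\lambda}$ from the text. The value of λ is an arbitrary illustration. It confirms that the probabilities sum to one and computes the average abundance per species.

```python
import math

def log_series_pmf(n, p):
    """Discrete log series: P(n) = k * p**n / n with k = -1/log(1 - p)."""
    k = -1.0 / math.log1p(-p)
    return k * p**n / n

lam = 0.05
p = math.exp(-lam)  # relation between the two common parameterizations

# Terms beyond n = 5000 are negligible for this value of p.
probs = [log_series_pmf(n, p) for n in range(1, 5001)]
total = sum(probs)
mean_abundance = sum(n * q for n, q in enumerate(probs, start=1))

print(total, mean_abundance)
```

Smaller values of λ (p closer to one) give larger average abundance, which is the single way in which communities differ under the log series.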
We can also write the log series on the process scale, r, from Equation 4, as 24
$$q_r\,dr = k e^{-\lambda e^{r}}\,dr. \tag{7}$$
This form shows that the log series is the simplest expression of generic probability patterns in Equation 3. The log series arises from β = 1, associated with $n = e^r$, and from the base shift-invariant scale as $w \equiv r$ for proportional processes, r.

Invariances of the log series
This subsection summarizes a few technical points about invariance. These technical points provide background for our simpler and more general derivation in the following section of maximum entropy models for species abundances. Those previous models focused only on abundances, n, without considering the underlying process scale, r.
We begin with invariance on the process scale, r. On that scale, the log series in Equation 7 is the pure expression of additive shift invariance to r and lack of multiplicative stretch invariance to r. For example, note in Equation 7 that an additive change, r ↦ r + a, is compensated by a change in λ to maintain the overall invariance, whereas a multiplicative change, r ↦ br, cannot be compensated by a change in one of the constants. For example, if r is net reproductive rate, then an improvement in the environment that adds a constant to everyone's reproductive rate does not alter the log series pattern. By contrast, multiplying reproductive rates by a constant does alter pattern.
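The contrast between shift and stretch on the process scale can be seen in a few lines. This sketch assumes the process-scale form $q_r = k e^{-\lambda e^{r}}$ and asks what value λ′ would be needed to restore that form after transforming r. The helper name and parameter values are illustrative. A shift yields one constant λ′ for all r, whereas a stretch would require λ′ to vary with r, so no constant can compensate.

```python
import math

def implied_lambda(transform, lam, r):
    """lambda' needed so that lam*exp(transform(r)) equals lambda'*exp(r)."""
    return lam * math.exp(transform(r) - r)

lam, a, b = 0.4, 1.2, 2.0
r_values = [0.0, 0.5, 1.0, 2.0]

# Shift r -> r + a: the same lambda' = lam * exp(a) works at every r.
shift_lambdas = [implied_lambda(lambda r: r + a, lam, r) for r in r_values]

# Stretch r -> b*r: the required lambda' changes with r.
stretch_lambdas = [implied_lambda(lambda r: b * r, lam, r) for r in r_values]

print(shift_lambdas)
print(stretch_lambdas)
```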
To understand the parameter, β, from Equation 2, consider that
$$T = n^{\beta} = e^{\beta r},$$
in which β is the relative curvature of the measurement scale for abundance, n, with respect to the scale for process, r. The relative curvature is $\beta = T''/T'$, with the primes denoting differentiation with respect to r 27.
For the log series, the curvature of β = 1 describes the amount of bending of the abundance scale, $n = e^r$, with respect to multiplying the process scale, r, by a constant, that is, the departure from stretch invariance.
The simple invariances with respect to process, r, become distorted and more difficult to interpret when we focus only on the observed scale for abundance, n, associated with the log series in Equation 6. In that form of the distribution, $q_n = k n^{-1} e^{-\lambda n} = k e^{-\lambda T_n}$, the canonical scale is
$$T_n = n + \lambda^{-1}\log n.$$
In this expression, purely in terms of abundances, the logarithmic term dominates when n is small, and the linear term dominates when n is large. Thus, the scale changes from stretch but not shift invariant at small magnitudes to both shift and stretch invariant at large magnitudes 24. Without the simple insight provided by the process scale, r, we are left with a complicated and nonintuitive pattern that is separated from its simple cause. That difficulty has led to unnecessary complications in maximum entropy theories of pattern.
Pueyo et al. developed a simple alternative approach for deriving the log series distribution that combines invariance and maximum entropy 25 . In their derivation, the average value of n is a maximum entropy constraint, and the equivalent of our r variable is considered as an invariant Bayesian prior in the sense of Jaynes 12 . Previous publications describe the differences between our invariance approach and the invariant prior maximum entropy approach of Pueyo et al. 18,22,28,29 .

Maximum entropy
Maximum entropy describes probability distributions that are maximally random subject to satisfying certain constraints 10-12 .
In Equation 1, with the generic description for distributions as
$$q_z = k e^{-\lambda T_z},$$
maximum entropy interprets this form as the expression of maximum randomness with respect to the scale z, subject to the constraint that the average of $T_z$ is fixed 23.
This section begins with a maximum entropy derivation for the log series based on our separation between the scales of process, r, and observed abundance, n.
We then discuss Harte's 9,26 alternative maximum entropy derivation of the log series. Harte's derivation emphasizes mechanistic aspects of energy constraints rather than our emphasis on the different scales of process and abundance.

Constraint of average abundance on process scale
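The log series in Equation 7 is
$$q_r\,dr = k e^{-\lambda e^{r}}\,dr.$$
In the form of Equation 1, the defining scale is $T_r = e^{r} = n$, so this distribution maximizes entropy on the process scale subject to a constraint on the average abundance, $\langle e^{r} \rangle_r = \langle n \rangle_r$, in which $\langle\cdot\rangle_r$ denotes average value with respect to the process scale, r.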
In this case, process values, r, are maximally random, subject to the ecological constraint that limits abundance, n. Thus, maximizing entropy with respect to the process scale, r, subject to a constraint on the observed pattern scale, n, leads immediately to the log series.
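The calculation behind this statement can be sketched with the standard Lagrangian method for maximum entropy. The multiplier symbol κ for the normalization constraint is introduced here only for illustration. The sketch assumes entropy measured on the process scale, r, and a single constraint on average abundance.

```latex
\begin{aligned}
\mathcal{L} &= -\int q_r \log q_r \, dr
  - \kappa \left( \int q_r \, dr - 1 \right)
  - \lambda \left( \int e^{r} q_r \, dr - \langle n \rangle_r \right), \\
\frac{\partial \mathcal{L}}{\partial q_r} &= -\log q_r - 1 - \kappa - \lambda e^{r} = 0
  \quad \Longrightarrow \quad q_r = k e^{-\lambda e^{r}}, \qquad k = e^{-(1+\kappa)}, \\
q_n \, dn &= q_r \, dr \ \text{ with } \ r = \log n, \ dr = n^{-1}\, dn
  \quad \Longrightarrow \quad q_n = k n^{-1} e^{-\lambda n}.
\end{aligned}
```

The factor $n^{-1}$ that distinguishes the log series from a simple exponential arises entirely from the change of scale between r and n.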
Relating the process scale, r, to the scale of ecological constraint, n, often makes sense. Typically, environmental perturbations associate with changes in demographic variables, such as birth and death rates. Such demographic factors typically act proportionally on populations, consistent with our interpretation of r as the aggregate of proportionally acting processes. The perturbations, acting on demographic variables, associate the process scale with the scale of randomness.
In contrast with the process scale of perturbation and randomness for the demographic variables, the scale of constraint naturally arises with respect to a limit on the number of individuals, n. Thus, randomness happens on the r scale and constraint happens on the n scale.
It is, of course, possible to formulate alternative models in which randomness and constraint happen on scales that differ from our interpretation. Different formulations are not intrinsically correct or incorrect. Instead, they express different assumptions about the relations between process, randomness, and invariance. The next section considers an alternative formulation.

Harte's joint constraints of abundance and energy
Harte developed comprehensive maximum entropy models of ecological pattern. He tested those theories with the available data. His work synthesizes many aspects of ecological pattern 9 .
For species abundances, Harte 9,26 analyzed maximum randomness with respect to the scale of abundance values, n. Maximum entropy derivations commonly evaluate randomness on the same scale as the observations. In this case, with observations for the probabilities of abundances, $p_n$, entropy on the same scale is the sum or integral of $-p_n \log p_n$.
However, there is no a priori reason to suppose that the scale of observation is the same as the scale of randomness. The fact that observation, randomness, and process may occur on different scales often makes maximum entropy models difficult to develop and difficult to interpret. For example, we may observe the probabilities of abundances, $p_n$, but randomness may be maximized on the scale of process, as the sum or integral of $-p_r \log p_r$.
In the final part of this section, we argue that invariance provides a truer path to the natural scale of analysis and to the mechanistic processes that generate pattern than does maximum entropy. Before comparing invariance and maximum entropy, it is useful to sketch the details of Harte's maximum entropy model for species abundances.
The simplest maximum entropy model analyzes entropy with respect to abundance, n, subject to a constraint on the average abundance, ⟨n⟩. That analysis yields an exponential distribution
$$q_n\,dn = k e^{-\lambda n}\,dn.$$
The exponential pattern differs significantly from the observed log series pattern. Thus, maximizing entropy with respect to the scale of abundance, n, and constraining the average abundance is not sufficient.
From our invariance perspective, it is natural to think of the scale of randomness in terms of dr, the scale of proportional processes, rather than in terms of dn, the scale of abundance.
Maximizing randomness with respect to dr leads directly to the log series, as shown in the previous section.
Harte did not consider the distinction between the exponential and log series patterns with respect to the scale of randomness. Instead, to go from the default exponential pattern of maximum entropy to the log series, his maximum entropy analysis required additional assumptions. He proceeded in the following way.
Suppose that the total quantity of some variable, ε, is constrained to be constant over all individuals of all species. The average value per individual is ⟨ε⟩. It does not matter what the variable is. All that matters is that the constraint exists. Harte assumed that ε is energy, but that assumption is unnecessary with regard to the species abundance distribution.
The value ε is distributed over individuals independently of their species identity. Thus, the variable δ|n = nε is the total value in a species with n individuals, with average value ⟨δ|n⟩ = n⟨ε⟩.
The joint distribution of n and δ is
$$q_{n,\delta} = k e^{-\lambda n - \lambda' \delta} = k e^{-\lambda n - \lambda' n\epsilon}.$$
The species abundance distribution is obtained by
$$q_n = \int_0^{\infty} q_{n,\delta}\,d\epsilon = k e^{-\lambda n} \int_0^{\infty} e^{-\lambda' n\epsilon}\,d\epsilon.$$
Noting that $\int_0^{\infty} e^{-\lambda' n\epsilon}\,d\epsilon = 1/\lambda' n$, and absorbing the constant λ′ into k, we obtain the log series for the species abundance distribution
$$q_n = k n^{-1} e^{-\lambda n}.$$
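The integration step that produces the factor $n^{-1}$ can be checked numerically. This is a sketch that assumes a joint exponent of the form $-\lambda n - \lambda' n\epsilon$ and integrates over the auxiliary variable, written here as `eps`, with a midpoint rule. The parameter values and the truncation point of the integral are illustrative choices.

```python
import math

def integrate_over_epsilon(n, lam_prime, steps=100000):
    """Midpoint estimate of the integral of exp(-lam_prime*n*eps) over eps >= 0."""
    upper = 50.0 / (lam_prime * n)  # the integrand is below exp(-50) beyond this point
    h = upper / steps
    return h * sum(math.exp(-lam_prime * n * (i + 0.5) * h) for i in range(steps))

lam, lam_prime = 0.05, 0.3
for n in (1, 5, 40):
    marginal = math.exp(-lam * n) * integrate_over_epsilon(n, lam_prime)
    log_series_shape = math.exp(-lam * n) / (lam_prime * n)
    print(n, marginal, log_series_shape)  # the two columns agree
```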

Maximum entropy and invariance
Harte's maximum entropy derivation of the log series assumes joint constraints of abundance, n, and some auxiliary variable, ε, which he labeled as energy. He evaluated entropy on the scales of n and ε.
By contrast, our invariance derivation arises from a constraint on abundance plus evaluation of invariance or entropy on the scale r = log n. On that scale, the log series arises in a simple and clear way. There is no need for constraint of a second auxiliary variable.
Without an invariance argument, nothing compels us to analyze with respect to the r scale. Harte, without focus on invariance, followed the most natural approach of using n as the scale for maximization of randomness and for constraint. That approach required an auxiliary constraint on a second scale to arrive at the log series.
Harte's approach was a major step in unifying the analysis of empirical pattern. But, in retrospect, his approach was unnecessarily complicated.
One might say that Harte's approach provided a richer theory because it led to predictions about both abundance and energy. However, the data on abundance patterns match very closely to the log series, whereas the data for different proxies of energy vary considerably 9 .
Our invariance approach strips away the unnecessary auxiliary variable. The invariance theory therefore provides a much simpler way to derive and to understand abundance patterns.
Maximum entropy can be thought of purely as a basic invariance method of analysis. Maximum entropy distributions have the form in Equation 1 as
$$q_z = k e^{-\lambda T_z},$$
in which $T_z$ is the affine-invariant scale that defines the probability pattern. Thus, the method of maximum entropy is simply a method for deriving the affine-invariant expression, $T_z$. In practice, maximum entropy has three limitations.
First, maximum entropy is silent with respect to the proper choice for the scale on which entropy is maximized and the constraints that set the affine-invariant expression, $T_z$. By contrast, focus on invariance led us to the shift invariance of the process scale, r. That scale provided a much simpler analysis, in which r is the incremental scale with respect to invariance and the measurement scale with respect to entropy.
In other words, maximum entropy is a blind application of the most basic invariance principles, without any guidance about the proper scales for invariance, randomness, and constraint. By contrast, an explicit invariance approach takes advantage of the insight provided by the analysis of invariance.
Second, by focusing on invariance, we naturally obtain the full invariance (symmetry) group expression in Equation 3 as the generic form of probability patterns
$$q_z = k e^{-\lambda\left(e^{\beta w} - 1\right)/\beta}.$$
That generic expression leads us to a generalization of the log series in Equation 5 as 24
$$q_n\,dn = k n^{-1} e^{-\lambda\left(n^{\beta} - 1\right)/\beta}\,dn,$$
which is a two parameter distribution for abundances with respect to λ and β. The log series is a special case with β = 1.
Third, invariance leads to a deeper understanding of the relation between observed pattern and alternative mechanistic models of process. The following section provides an example.

Neutrality
Here, we analyze Hubbell's 5 neutral model of species abundances in the light of our invariance perspective. With that example in mind, we then discuss more generally how neutral models relate to invariance and maximum entropy.

Hubbell's neutral model
The strong recent interest in Hubbell's neutral model follows from the match of the theory to the contrasting patterns of species abundance distributions (SADs) that have been observed at different spatial scales. In the theory, many local island-like communities are connected by migration into a broader metacommunity. Sufficiently large metacommunities follow the log series pattern of species abundances. Each local community follows a distribution that Hubbell called the zero-sum multinomial 30, which is similar to a skewed lognormal. As noted by Rosindell et al. 6, it is this flexibility of the classic neutral model to reconcile the log series and lognormal distributions that allows it to fit empirical data well 31.

Invariance and the gamma-lognormal distribution
Broad consensus suggests that species abundances closely follow the log series pattern at large spatial scales. Extensive data support that conclusion 4.
Observed pattern at small spatial scales differs from the log series. Consensus favors a skewed lognormal pattern. The data typically show an excess of rare species, causing a skew relative to the symmetry of the lognormal when plotted on a logarithmic scale.
At small spatial scales, most recent analyses focus on data from a single long-term study of tree species in Panama 5,30. Thus, some ambiguity remains about the form and consistency of the actual pattern at small scales. The blue curve of Figure 3 shows Chisholm & Pacala's 30 fit of the neutral theory to the Panama tree data for species abundances at small spatial scales. The gold curve shows the close match to the neutral theory pattern by a simple probability distribution derived from the analysis of invariance.
To obtain the matching distribution derived by invariance, we begin with the canonical form for probability distributions in Equation 3. That canonical form expresses pattern in terms of the shift-invariant scale, w. Next, we need to find the specific form of the scale w that relates this canonical form for probability distributions to the neutral theory. Because the neutral theory derives abundance, n, as an outcome of demographic processes, r, the fundamental shift-invariant scale for neutral theory is expressed in terms of the demographic process variable, r. The resulting distribution on the process scale is
$$q_r\,dr = k e^{-\lambda e^{r} + a r - \alpha r^2}\,dr, \tag{11}$$
with parameters λ, a, and α. We can write this distribution equivalently on the n scale for abundance as
$$q_n\,dn = k n^{\tilde{a}-1} e^{-\lambda n} e^{-\alpha(\log n - \mu)^2}\,dn. \tag{12}$$
In the second distribution, $\mu = (a - \tilde{a})/2\alpha$. Thus, both distributions have the same three parametric degrees of freedom.
The right-hand exponential term of Equation 12 is a lognormal distribution with parameters µ and $\sigma^{2} = 1/(2\alpha)$. The remaining terms are a gamma distribution with parameters ã and λ. We call this product of the gamma and lognormal forms the gamma-lognormal distribution. Figure 3 showed that the gamma-lognormal distribution matches the neutral theory fit for the Panama tree data. Figure 4 shows that the shape of the gamma-lognormal matches the shape of the neutral theory predictions for various mechanistic parameters of the neutral theory.
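The relation between the process-scale and abundance-scale forms can be checked numerically. The following is a minimal sketch, assuming the process-scale density is proportional to exp(−λe^r + ar − αr²) and the abundance-scale density is proportional to n^(ã−1) e^(−λn) e^(−α(log n − µ)²); the function names and all parameter values are hypothetical, chosen only for illustration. With dr = dn/n and µ = (a − ã)/2α, the two forms should differ only by a constant factor that the normalizing constant k absorbs.

```python
import math

def q_r(r, lam, a, alpha):
    """Unnormalized density on the process scale r (assumed form of Equation 11)."""
    return math.exp(-lam * math.exp(r) + a * r - alpha * r * r)

def q_n(n, lam, a_tilde, alpha, mu):
    """Unnormalized gamma-lognormal on the abundance scale n (assumed form of Equation 12)."""
    gamma_part = n ** (a_tilde - 1.0) * math.exp(-lam * n)
    lognormal_part = math.exp(-alpha * (math.log(n) - mu) ** 2)
    return gamma_part * lognormal_part

# Hypothetical parameter values, chosen only for illustration.
lam, a, alpha = 0.01, 1.7, 0.2
mu = 3.0
a_tilde = a - 2.0 * alpha * mu  # rearranged from mu = (a - a_tilde) / (2 * alpha)

# Change of variables n = e^r, dr = dn / n: compare q_r(log n) / n with q_n(n).
ratios = []
for n in (1.0, 5.0, 25.0, 125.0, 600.0):
    ratios.append(q_r(math.log(n), lam, a, alpha) / n / q_n(n, lam, a_tilde, alpha, mu))

# The ratio should be the same constant at every abundance.
expected_constant = math.exp(alpha * mu * mu)
print(ratios, expected_constant)
```

Any pair (ã, µ) satisfying ã + 2αµ = a gives the same distribution after normalization, which is one way to see why the abundance-scale form carries only three parametric degrees of freedom.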
In summary, the neutral theory distribution appears to be nearly identical to a gamma-lognormal distribution when compared over realistic parameter values. Both distributions have the same three parametric degrees of freedom. Pueyo et al. 25 derived the gamma-lognormal by using an invariance argument to obtain the $n = e^{r}$ relation as a Bayesian prior for maximum entropy and then using additional constraints in a maximum entropy analysis. They also noted the good fit to Hubbell's neutral theory. As mentioned above, our invariance analysis and our interpretation of invariance differ from Pueyo et al.'s Jaynesian invariant prior approach for maximum entropy.

Maximum entropy and the gamma-lognormal
The constraints on pattern can be seen most clearly by rewriting Equation 11 as
$$q_r = k\,e^{-\lambda e^{r} + \tilde{a} r - \alpha \tilde{r}^{2}} \qquad (13)$$
in which $\tilde{r} = r - \mu$. By the standard theory of maximum entropy, $q_r$ maximizes entropy on the incremental scale dr subject to a constraint on the average value of the defining affine-invariant scale, $\langle T \rangle_r$. That constraint is the linear combination of three constraints: the average abundance on the process scale, $\langle n = e^{r} \rangle_r$, the average demographic process value, $\langle r \rangle$, and the variance in the demographic process values, $\sigma_r^{2}$.
By maximum entropy, all of the information in Hubbell's mechanistic process theory of neutrality and the matching gamma-lognormal pattern reduces to maximum randomness subject to these three constraints.
However, it is very unlikely that we would have derived the correct form by maximum entropy without knowing the answer in advance. This limitation emphasizes that maximum entropy provides deep insight into process and pattern, but often we need an external theory to guide our choice among various possible maximum entropy formulations.
Put another way, maximum entropy and process oriented theories, such as Hubbell's model, often work together synergistically to provide deeper insight than either approach alone.

Invariance, information and scale
Before turning to invariance and the gamma-lognormal pattern of neutral theory, it is useful to consider some basic properties of invariance and information 34,35 . In particular, this subsection develops our claim that the affine-invariant scale provides the deepest insights into the relations between pattern and process.
We start by noting that, in the general expression for probability distributions, $q_z = k e^{-\lambda T_z}$, the change in information, $-\log q_z$, per unit increment in $T_z$ is constant at all magnitudes of the measurement, z. Every measured increment on the $T_z$ scale provides the same amount of information about pattern. Constancy of information at all magnitudes is the ideal for a measurement scale. Thus, affine invariance provides the ideal scale on which to evaluate the pattern in measurements 23 . Figure 5 illustrates some key properties of the affine-invariant scale.
Information is sometimes thought of as a primary concept. However, it is important to understand that, in this context, information and affine invariance are the same thing. Neither is intrinsically primary.
We prefer to emphasize invariance, because it is an explicit description of the properties that pattern and process must obey 17,27,36 . Further analysis of invariant properties leads to deeper insight. For example, only through invariance can we obtain the group theory expression for the canonical form of probability patterns (Equation 3). By contrast, "information" is just a vague word that associates with underlying invariances. Further analysis of information requires unwinding the definitions to return to the basic invariances.

Invariance interpretation of the gamma-lognormal
We turn now to the neutral theory model for abundances at local spatial scales. We showed that all of the information about pattern and process in the neutral theory is captured by the gamma-lognormal pattern in Equation 13, whose exponent combines three terms: $\lambda e^{r}$, $\tilde{a} r$, and $\alpha \tilde{r}^{2}$, with $\tilde{r} = r - \mu$. The $\lambda e^{r}$ term dominates for sufficiently large $e^{r} = n$. The smaller the value of λ relative to ã and α, the greater $e^{r}$ must be for this pattern to dominate. When λ is relatively large compared with ã and α, this pattern dominates at all magnitudes and leads to the log series.
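The link between the λe^r term and the log series can be illustrated with a short numerical check. This is a sketch under stated assumptions: it keeps only the term exp(−λe^r) on the process scale, applies dr = dn/n, and treats abundance as the integers n = 1, 2, 3, ... so that the result can be compared with the log series form p^n/n with p = e^(−λ). The value of λ and the function names are hypothetical.

```python
import math

lam = 0.05               # hypothetical value, for illustration only
p = math.exp(-lam)       # log series parameter implied by lam

def upper_tail_form(n):
    """exp(-lam * e^r) dr written on the abundance scale, using n = e^r and dr = dn / n."""
    return math.exp(-lam * n) / n

def log_series(n):
    """Log series probability p^n / n, normalized over n = 1, 2, 3, ..."""
    return -(p ** n) / (n * math.log(1.0 - p))

# Normalizing constant of p^n / n summed over n >= 1.
norm = -math.log(1.0 - p)

# Difference between the normalized upper-tail form and the log series.
checks = [upper_tail_form(n) / norm - log_series(n) for n in range(1, 50)]

# The log series probabilities should sum to one (truncated where terms are negligible).
total = sum(log_series(n) for n in range(1, 2000))
print(max(abs(c) for c in checks), total)
```

The two expressions agree to floating-point precision, which is consistent with the statement that dominance by the λe^r term at all magnitudes leads to the log series.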
With respect to constraint, for large values of abundance, n, the constraint on average abundances dominates the way in which altered process influences pattern. With respect to invariance, a process that additively shifts or multiplicatively stretches the $e^{r} = n$ values does not alter the pattern in the upper tail. Similarly, pattern is invariant to a process that additively shifts process values, r, but processes that multiplicatively change r alter pattern. Thus, we can evaluate the role of particular processes by considering how they change n or r.
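These invariance statements can be written out directly for the upper tail. The lines below are a worked illustration, assuming the upper tail is proportional to $e^{-\lambda e^{r}} = e^{-\lambda n}$; the constants b and c are hypothetical stretch and shift values.

```latex
% Upper tail: q \propto e^{-\lambda n} with n = e^{r}.
\begin{aligned}
n \mapsto b n + c:&\quad e^{-\lambda (b n + c)} = e^{-\lambda c}\, e^{-(\lambda b) n}
  && \text{(same form; } k \to k e^{-\lambda c},\ \lambda \to \lambda b\text{)} \\
r \mapsto r + c:&\quad e^{-\lambda e^{r + c}} = e^{-(\lambda e^{c}) e^{r}}
  && \text{(same form; } \lambda \to \lambda e^{c}\text{)} \\
r \mapsto b r:&\quad e^{-\lambda e^{b r}} = e^{-\lambda n^{b}}
  && \text{(different form unless } b = 1\text{)}
\end{aligned}
```

Shift and stretch of n, and shift of r, change only the constants k and λ, whereas a stretch of r changes the functional form.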
The pattern at small and intermediate values of r depends on the relative sizes of the parameters. If the $\tilde{a} r$ term dominates, then the constraint, $\langle r \rangle$, on the average process value is most important. With respect to invariance when $\tilde{a} r$ dominates, a process that additively shifts or multiplicatively stretches the r values does not alter the pattern in the lower tail. That lower tail is a rising exponential shape, $e^{\tilde{a} r}$, as in Figure 4c.
When the $\alpha \tilde{r}^{2}$ term is negligible at all magnitudes, the combination of the dominance by $\tilde{a} r$ in the lower tail, and the dominance by $\lambda e^{r}$ in the upper tail, yields the gamma distribution pattern on the abundance scale, n.
Finally, for magnitudes of r at which the $\alpha \tilde{r}^{2} = \alpha (r - \mu)^{2}$ term dominates, the constraint, $\sigma^{2} = \langle (r - \mu)^{2} \rangle$, on the variance in process values is most important. In this case, pattern follows a normal distribution, $e^{-\alpha (r - \mu)^{2}}$, on the r scale, which is a lognormal distribution on the abundance scale, n.
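The correspondence between α and the variance of the normal pattern can be checked numerically. This minimal sketch assumes the pattern on the r scale is proportional to exp(−α(r − µ)²) and uses a simple Riemann sum; the values of α and µ and the grid settings are hypothetical.

```python
import math

alpha, mu = 0.8, 2.0  # hypothetical values, for illustration only

# Evenly spaced grid wide enough that the weights at both ends are negligible.
lo, hi, steps = mu - 12.0, mu + 12.0, 48000
h = (hi - lo) / steps
grid = [lo + i * h for i in range(steps + 1)]
weights = [math.exp(-alpha * (r - mu) ** 2) for r in grid]

# Normalize, then compute the mean and variance of r under the pattern.
mass = sum(weights) * h
mean_r = sum(r * w for r, w in zip(grid, weights)) * h / mass
var_r = sum((r - mean_r) ** 2 * w for r, w in zip(grid, weights)) * h / mass

# The variance should match 1 / (2 * alpha).
print(mean_r, var_r, 1.0 / (2.0 * alpha))
```

The computed variance agrees with 1/(2α), the variance of the lognormal component on the logarithmic scale.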
When combining numerous process values to obtain an overall net r value, approximate rotational invariance is sufficient for the pattern to be very close to a perfect normal curve (see Introduction). When measuring net squared deviations from the mean, which is the squared radial distance, the pattern is invariant to shift and stretch of the squared radial measures, $(r - \mu)^{2}$.
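The tendency of summed component processes toward a normal curve can be illustrated by simulation. The sketch below is a central-limit style illustration, not the rotational invariance argument itself; the uniform component distribution, the number of components, the number of species, and the seed are all hypothetical choices.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable
n_components, n_species = 50, 20000

# Net process value r for each species: a sum of many small component processes.
# Uniform components are a hypothetical choice; they are not normal themselves.
net_r = [sum(random.uniform(-1.0, 1.0) for _ in range(n_components))
         for _ in range(n_species)]

mean_r = sum(net_r) / n_species
var_r = sum((r - mean_r) ** 2 for r in net_r) / n_species
sd = math.sqrt(var_r)

# Skewness and excess kurtosis are both zero for a perfect normal curve.
skew = sum(((r - mean_r) / sd) ** 3 for r in net_r) / n_species
excess_kurt = sum(((r - mean_r) / sd) ** 4 for r in net_r) / n_species - 3.0

# Abundance n = e^r is then approximately lognormal.
abundance = [math.exp(r) for r in net_r]
print(round(skew, 3), round(excess_kurt, 3))
```

Skewness and excess kurtosis close to zero indicate that the net r values are nearly normal, so the corresponding abundances are nearly lognormal.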
In practice, the lognormal pattern of abundance dominates when a constraint on r dominates and net values of r obey rotational invariance (symmetry) with respect to the summing up of the individual processes acting on abundance.
Any theory of process that leads to those three basic invariances will follow the gamma-lognormal pattern. The great unsolved puzzle is how specific mechanistic processes combine such that the structure of pattern is fully expressed by these particular invariances of pattern or, equivalently, by constraints on the average values of certain quantities in the context of maximum entropy. Our work opens the way for a more direct attack on this great puzzle by clarifying the anatomy of a pattern, thereby clarifying the puzzle that must be solved.

The anatomy of pattern
[J]ust as the physiologist divides the animal world, according to anatomy, into families and classes, so the ornamentist is able to classify all pattern-work according to its structure [invariance]. Like the scientist, he is able even to show the affinity between groups to all appearance dissimilar; and, indeed, to point out how few are the varieties of skeleton upon which all this variety of effect is framed (ref. 37, pp. 3-4).

Invariances comprise the structural components in the anatomy of pattern. Commonly observed patterns almost always dissect completely into a few simple invariances. Our primary goal has been to introduce into ecological study the anatomy of pattern and the methods of dissection.
Identifying and naming the parts does not tell one how those parts came to be. In fact, common patterns are widespread exactly because so many different underlying mechanistic processes give rise to the same simple invariances.
Roughly speaking, one can think of a common pattern as an attractor. Each different underlying mechanistic process that develops into the generic form traces a distinctive path from some starting point to the generic endpoint of the attractor. All of the different mechanistic processes and starting points that end up at the same attractor form the basin of attraction for that pattern.
Our work characterized the anatomy of pattern, that is, the anatomy of the attractors. The next step requires understanding how various combinations of mechanistic processes lead to one attractor or another. Equivalently, one can think of a mechanistic process as something that transforms inputs into outputs 38 . Three questions follow. How do particular cascades of input-output transformations ultimately combine to produce overall transformations that associate with simple invariances? What separates some cascades from others with regard to association with different invariances? In other words, how can we assign different mechanistic cascades to one basin of attraction or another?
If we could answer those questions, then we could predict whether different mechanistic processes lead to the same pattern or to different patterns.
The fact that different processes can attract to the same pattern has been widely discussed in ecology 30,39-47 . However, that past work typically did not explain common patterns in terms of invariance. Without invariance, one does not have a basis for describing the anatomy of common patterns or the reasons why certain processes attract to a particular pattern and others do not.
Invariance may provide a way to compare different models of process that lead to the same pattern. Among the many complex component processes that may occur in a model, which truly matter? In other words, which component processes shape the defining invariances and which are irrelevant? For the focal component processes of each model that matter, which empirical tests would tell us which of the alternative mechanistic models is the more likely match to natural processes?

Conclusions
The apparent simplicity of invariance can mislead about its ultimate power. For example, probability patterns express a shift and stretch invariant scaling. That affine-invariant scaling provides a constant measure of information at all magnitudes.
Shift and stretch invariance seem almost trivially simple. Yet, by analyzing how repeated transformations of shift and stretch retain invariance, we obtain the most general form that expresses various affine-invariant scales (Equation 2). That affine symmetry group defines the simple, general structure of probability patterns and their uniform measurement scales.
Knowing the general invariant form of probability patterns reveals the relations between different approaches. Invariance provides powerful methods to analyze pattern and process.
To sum up, our invariance approach is not just another one among various alternatives. Rather, it is the only way to relate process to pattern, because the essence of pattern is invariance. Only by understanding what pattern actually is and how it generally arises can one begin to formulate testable hypotheses about mechanism.
Put another way, pattern is always the interaction between, on the one hand, the generic aspects of invariance and scale that arise in all cases and, on the other hand, the particular aspects of biology that operate in each case. Without a clear view of that duality between the generic and the particular, it is easy to mistakenly attribute generic aspects of observed pattern to particular causes. To properly understand the role of specific mechanistic aspects in shaping pattern, one must evaluate pattern simultaneously from the perspectives of the generic and the particular.

Data availability
All data underlying the results are available as part of the article and no additional source data are required.

Software availability
The Mathematica code for the analysis and creation of Figure 3 and Figure 4 is available at Zenodo: https://doi.org/10.5281/zenodo.3243364 33 .
License: Creative Commons Attribution 4.0 International license.

Author contributions
SAF initiated the project, did the mathematical analyses, created the figures, and wrote the first draft. JB developed connections to broader issues in ecology, expanded the text to clarify exposition and significance, extended the framing of concepts and the explanation and discussion of key points, and edited the entire manuscript.

This paper by Frank and Bascompte explores ideas about the origins of statistical pattern in biology developed in previous work by Frank and Frank & Smith, and applies them to the question of the distributions of species population sizes commonly observed in community ecology. As the authors note, the distribution of population sizes in a given ecological community will tend to follow quite closely one of two probability distributions, the logarithmic or the skewed log-normal, irrespective of the physical size of the organisms in the community of interest, and the number of species involved. Why this should be is an important question, and one that points toward some general rule or principle of ecology that would be fascinating and potentially valuable to understand. In providing their own explanation, the authors follow a formula proven by Frank in earlier publications: the problem is outlined, previous efforts to explain it are described, noting ad hoc or special features which limit their generality, before the authors' own explanation is described and its relative merits compared with the earlier attempts.
It is worth taking a moment to put this work in a brief historical context of the more general topic of research that has occupied Frank for more than a decade, because that broader (longer?) perspective says something useful about the process of research into theoretical biology and the development of theory. Frank's earliest publications on the topic of probability patterns in nature were built around the idea that constraints on information contained in myriad stochastic processes, together with a principle of maximum entropy, lead to a few probability patterns being common. This explanation raises the question of how the constraints occur. Frank's earlier work led to some principles to answer that question by assuming that the constraints arise in solving the maximum entropy expression for probability patterns with the method of Lagrangian multipliers. Of course, since what's needed is an explanation for empirical observation, none of this is supposed to be happening to purely abstract mathematical objects, but to actual physical processes, when we observe them (and indeed some of the most important constraints on information in our data arise from how we observe), so an explanation in terms of constraints in Lagrangians had better have a clear and fairly direct physical interpretation; fortunately it does.
A natural interpretation of the Lagrangian constraints arises in thinking about the scaling at which information is gathered from the system by the process of observation and how this interacts with natural scales at which information is preserved in the organization of the observed system. Earlier publications, rooted in the maximum entropy concept, used the idea of convolution to describe how information could be preserved (or lost) as large numbers of random, small scale, processes interact and are observed. This line of thought appears to have led Frank to the realization that the organizing framework for the constraints is, at its core, a set of expressions describing patterns of invariance. Three types of invariance, shift and stretch (which combine to give affine invariance) and rotation, are sufficient to account for the fact that a few common probability patterns in biological and physical systems, spanning many orders of magnitude, describe the majority of empirical datasets. Frank & Smith (2011) gives a comprehensive early account of using the concept of symmetries (invariances) to derive many probability patterns. A recent strand of papers (of which the current one is an example) employs the concept of invariance to show how several specific types of biological pattern arise.
In these papers, the concept of invariance is given primacy, and, explicitly in the current example, maximum entropy is viewed as a derived property that is not needed to explain the commonality of probability patterns. To reiterate an earlier point, quite aside from the technical merits, theoretical depth, and potential applications of the work itself, Frank's publications on this topic are an interesting publication trail for those studying the development of theory in biology. They present an example of how one scientist's thinking on a subject changes and develops over time. I'm laboring the description of the context for the work, because, as I will outline in what follows, I think the most important critique lies not in the technical details as they apply to species abundance distributions, but in the epistemic basis for the whole endeavor.

Invariance and maximum entropy
Frank & Bascompte (F&B) argue that the log series for species abundance arises naturally when one considers two assumptions: first, that there is affine invariance at the scale of proportional processes that act on species abundance (so, for example, birth and death), and second, that there is a constraint on average abundance. Algebraic analysis, assuming a standard exponential form for the probability distribution, then shows that these two assumptions are sufficient to induce the log series form for the probability distribution describing abundance. The authors contrast this analysis with the one owing to Harte, which is based on a maximum entropy interpretation. F&B note that working from first principles, assuming a constraint on average abundance, and starting from the assumption that entropy is maximized in the abundance distribution, one ends up with the exponential distribution as the maximum entropy form for species abundance. This is at odds with empirical observation.
If we assume that the observed distribution of species abundances is a maximum entropy distribution, then this analysis tells us that a simple constraint on the average abundance is insufficient to induce the observed probability pattern, which leads to three possible alternatives. First, some further constraint is required on the abundance distribution so that the derived form matches observation (this is essentially the approach taken by Harte). Second, the abundance distribution is itself dependent on one or more constraints in some other process (for which entropy is maximized) and the joint effect of these two sets of constraints results in the observed log series distribution (this is the approach taken by F&B). Third, we abandon the premise that species abundance is a maximum entropy distribution and look for explanations in some other room in the library of all possible theories. The third option is a drastic one, especially when there are good arguments in related fields of research that support the idea that Nature does indeed confront us with maximum entropy distributions when we make observations. For example, in discussing the correspondence between entropy maximization and description length minimization, Grunwald (2007, p644) writes: "we imagine a two-player game between Nature and Statistician. Nature first picks a distribution and then generates an outcome X according to it; Statistician picks a code and uses it to describe X. Nature is allowed to select any distribution she likes, as long as it satisfies E[ϕ(X)]=μ, and Statistician is allowed to use any code whatsoever. Nature's goal is to maximize Statistician's expected codelength, and Statistician's goal is to minimize it. … the best (maximum codelength) that she can achieve if she has to move first is equal to the best (minimum codelength) that Statistician can achieve if he has to move first. Surprisingly, under weak conditions both Nature's maximin and Statistician's minimax strategy turn out to be the Maxent distribution…"
There is, I believe, an important connection between MDL and Frank's program of explanation for biological patterns and it is captured in the quotation from Grunwald's (2007) book given above. In a loose sense, one might cast Frank's investigation of pattern as an inquiry into what Nature is doing in the game described by Grunwald. The conclusion is that she is playing a strategy of showing us maximum entropy distributions. As Grunwald (2007) points out, this is the optimal conclusion for us to reach (playing the role of Statistician) if we want our adopted descriptions to be optimal in the sense of minimizing our expected maximum error. The Kraft-McMillan inequality establishes the correspondence between codelength functions and probability distributions, so Grunwald's game between Nature and Statistician can be rephrased directly by substituting "probability distribution" for "codelength". But there is an additional, epistemic, connection between MDL and what Frank and his co-authors on this and other papers are doing. In establishing MDL, Jorma Rissanen was attempting to establish a principle for model selection and inference that was free from the need for prior assumptions about the process (or model) generating the observed data. Here is the opening paragraph of Rissanen's (1978) paper on model selection:

"This study is an attempt to derive a criterion for estimation of both the integer-valued structure parameters and the real-valued parameters of dynamic systems starting from a single natural and fundamental principle: the least number of digits it takes to write down an observed sample of a time series."

Frank's scheme for describing why particular models (i.e. probability distributions) describe the data is an alternative, but also hypothesis-free, attempt to describe what Nature is doing. It's important to clarify what is meant in saying Frank's approach is hypothesis-free. It is simply this. The method does not select a particular probability distribution (equivalently, a model or codelength function) for the data a priori, but instead establishes a few mechanistically motivated constraints on information, given the context of the data and the measuring process, and uses those to infer the form of the probability distribution one expects the data to follow. This idea of making well-motivated choices about the identity of the best description of observed data is also enshrined in MDL in the "luckiness principle" (see Grunwald (2007) Ch14), further emphasizing the connection between the two lines of investigation.
Why does this matter in relation to the paper by F&B? As I mentioned earlier, the answer is more one of process and principle than technical detail. The importance of the current paper is that it adds to the argument, advanced by Frank, that biological observations of all kinds can be systematized; general principles operate that allow us to form expectations about the distributional properties of our data in a non ad hoc manner. I applaud this effort and think that it's a contribution to modern biology that will come to be seen as a major advance in the philosophical grounding of the subject, which is why I find the current paper somewhat frustrating. My main concern is this. In seeking to establish an invariance principle as taking precedence, in some epistemological sense, over the principle of maximum entropy, I think the authors make a mistake, and one that threatens the clarity of the preceding work by Frank and others on probability patterns. In essence the problem is that invariance and maximum entropy are not alternatives: the former is one of two approaches commonly used to solve and understand maximum entropy problems, the other being the method of Lagrangian multipliers. I would characterize the conceptual shift in this paper (from Frank's previous work) not as a shift from maximum entropy to invariance, but as a shift in focus on solutions to the maxent problems from approaches grounded in Lagrangians to approaches derived from the concepts of invariances, particularly symmetry groups. Both approaches are firmly within the overall framework of maximum entropy. So, my main request to the authors would be that they consider re-casting the paper along the lines just outlined and less as a demonstration that a principle of invariance supersedes the maximum entropy principle in describing biological patterns, in particular species abundance distributions.
To anchor this argument more firmly to the paper (and draw in the MDL connection), here are a couple of points where I think the authors need to offer the reader a little more support for their proposal. F&B argue that their approach, based on invariance, offers a clearer rationale for deriving and explaining appropriate distributional forms than those based in either "maximum entropy" or mechanistic neutral theories. For example, in the section "Maximum entropy and the gamma-lognormal" F&B note: "By maximum entropy, all of the information in Hubbell's mechanistic process theory of neutrality and the matching gamma-lognormal pattern reduces to maximum randomness subject to these three constraints. However, it is very unlikely that we would have derived the correct form by maximum entropy without knowing the answer in advance. This limitation emphasizes that maximum entropy provides deep insight into process and pattern, but often we need an external theory to guide our choice among various possible maximum entropy formulations." I would argue that the subsequent derivation based on invariances is no less opaque, and someone attempting the derivation would, similarly, need to know where they were going in order to get there. It is of interest to point out that here, as in MDL, access to external theory will be of value in achieving results. This idea of externally motivated choice of approach is also apparent in the second example I would ask the authors to consider. In the discussion of the log series pattern, F&B contrast their approach with that offered by Harte. As we already noted, F&B point out that starting from the canonical form for probability distributions, a constraint on the expected value for the observations leads to the exponential distribution as the emergent form, a result that demonstrably fails to deal with observed species abundance data.
Harte solved this problem by introducing a second variable that is similarly constrained at the same scale as the expectation of abundance. F&B criticize Harte's approach, in essence, on the grounds that it is an ad hoc solution, arguing that their approach, in which constraints (invariances) are placed on underlying demographic processes, has a clearer rationale. While F&B's argument is persuasive, I think it needs to be strengthened and somewhat expanded. Here's why: as I pointed out above, failure of the simple constraint on average abundance to lead to the log series requires us to come up with an alternative hypothesis for why the log series is observed. In MDL terms, we need a better description of the data. From a logical perspective, there doesn't appear to be any difference between adding an assumption of a variable at the same scale as abundance being under constraint, and an assumption of proportional demographic processes at a lower scale being constrained. In fact, from a model parsimony (MDL) perspective, a model that relies on adding a whole additional scale of processes may be viewed as propagating unnecessary complication. Furthermore, does it seem like too much of a stretch to suggest that constraints on proportional processes at a lower scale might give rise to a quantity at the same scale as the expectation of abundance that is similarly constrained when measured at that scale? Is the issue simply one of how phenomenological one likes one's models to be? And if so, what guidance can be given to those who want to pursue the type of analysis proposed by F&B? As a first stab at an answer to that question, would something along the following lines offer budding invariance analysts a template to work from? Identify the Objects that are the subject of your interest (SADs in the current case). Use Occam's Razor and Methodological Individualism when deciding where to look for constraints (IOUORMI: "I owe you, or me"). The combination of Occam's Razor and Methodological Individualism should guide investigators to look for the simplest model built from processes operating at one level below the objects of interest in the organizational hierarchy of the systems of interest.

© 2020 Alonso D. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

David Alonso
Theoretical and Computational Ecology, Center for Advanced Studies of Blanes (CEAB-CSIC), Blanes, Spain

This contribution emphasizes how the concept of invariance connects patterns to processes, leading to an original synthesis of previous approaches. In particular, this piece of work illustrates the application of the concept of invariance to deduce the most commonly observed species abundance distributions (SADs) in nature. The authors succeed in giving a general overview of the relation between invariance, process, and pattern.
As an illustration, they present two key theoretical results. First, they derive the log series SAD from a simple maximum entropy argument. When species abundances fluctuate randomly on a log scale, that is, when process values 'r' are random (where r is defined as log n) and these fluctuations are only subject to an ecological constraint limiting total abundance, then the log series distribution, initially introduced by R. A. Fisher, naturally arises. Second, they show that Hubbell's neutral model is the simple expression of three basic invariances, which correspond to a maximum entropy model constraining average species abundances, and the average and variance of the demographic processes influencing abundance.
The first theoretical result is far from original, as the same argument is clearly presented by Pueyo et al. (2007) in "The maximum entropy formalism and the idiosyncratic theory of biodiversity", where, after a scale invariance argument is made, the log series again naturally arises from a maximum entropy derivation by only constraining total abundance. However, the second result and the whole emphasis on the generality and the power of the invariance approach in ecology is a truly novel contribution to the field.
The authors state, in their conclusions, that "the invariance approach is the only way to relate process to pattern". The authors emphasize that, in order to uncover the plausible mechanisms underlying an observed pattern, one needs first to pay special attention to the general invariant form of probability patterns. In the future, I would like to see the authors' invariance approach applied more generally to other patterns in ecology.

If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? No source data required
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: community ecology, population biology, infectious diseases, biodiversity research, climate change, environmental forcing, stochastic birth-death processes, non-linear interactions, self-organization, complex systems
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.