The invariances of power law size distributions

Size varies. Small things are typically more frequent than large things. The logarithm of frequency often declines linearly with the logarithm of size. That power law relation forms one of the common patterns of nature. Why does the complexity of nature reduce to such a simple pattern? Why do things as different as tree size and enzyme rate follow similarly simple patterns? Here I analyze such patterns by their invariant properties. For example, a common pattern should not change when adding a constant value to all observations. That shift is essentially the renumbering of the points on a ruler without changing the metric information provided by the ruler. A ruler is shift invariant only when its scale is properly calibrated to the pattern being measured. Stretch invariance corresponds to the conservation of the total amount of something, such as the total biomass and consequently the average size. Rotational invariance corresponds to a pattern that does not depend on the order in which underlying processes occur, for example, a scale that additively combines the component processes leading to observed values. I use tree size as an example to illustrate how the key invariances shape pattern. A simple interpretation of common pattern follows. That simple interpretation connects the normal distribution to a wide variety of other common patterns through the transformations of scale set by the fundamental invariances.


Introduction
The size of trees follows a simple pattern. Small trees are more frequent than large trees. The logarithm of frequency declines linearly with the logarithm of size 1 . Log-log linearity defines a power law pattern. Power laws are among the most common patterns in nature 2 .
Power laws arise by aggregation over a multiplicative process, such as growth. Many processes in nature apply a recursive repetition of a simple multiplicative transformation, with some randomness 2 .
Aggregation over a random multiplicative process often erases all information except the average logarithm of the multiplications 3,4 . That average determines the slope of the power law line. In the case of tree size, we must also account for the fact that trees cannot grow to the sky. The upper bound on growth causes the frequencies of the largest trees to drop below the power law line.
That simple view of aggregation and the regularity of power laws contrasts with an alternative view. By the alternative view, the great regularity of a power law pattern suggests that there must be a very specific and particular underlying generative process. If the pattern of tree size is so regular, then some specific process of trees must have created that regularity.
To support the simple view of aggregation and regularity, I show that a normal distribution contains the same information as a power law size distribution. The distributions differ only in the scaling used to measure the distance of random variations in size from the most common size 5 .
The familiar regularity of pattern exhibited by the normal distribution (or equivalently the lognormal distribution, see Appendix) arises solely from the aggregation of underlying stochastic processes. Aggregation and stochasticity alone are sufficient to explain the regularity 6 . There is no need to invoke a detailed generative process specific to trees. Given the observed power law of sizes, maybe all we can reasonably say is that growth is a stochastic multiplicative process and that trees do not grow to the sky.
The trees provide an example of deeper principles about pattern and process in biology. What exactly are those principles? How can we use those principles to gain insight into biological problems?
To start on those questions, the next section presents an example of tree size data. Those data follow a power law with an upper bound on size. I show that those data also match a normal distribution almost exactly when scaled with respect to a natural metric of growth.
The normal distribution and the power law pattern express the same underlying relation between pattern and process. That underlying relation arises from a few simple invariance principles. I introduce those invariance principles and explain how they shape the common patterns of nature 5 . Figure 1A shows the distribution of tree size in a tropical forest 1 . Most of the trees lie along the green power law. The largest trees, beyond the line, comprise only a small fraction of all trees, because of the logarithmic scaling of frequency.

Figure 1. (A) Tree size, $z = d^2$, in which the squared diameter, $d^2$, is proportional to the cross-sectional area of the stem, and d ranges over approximately 11 to 2800 mm. The green line shows great regularity of pattern as a power law over the range that covers almost all probability. The largest trees, beyond the green power law line, comprise only a small fraction of all trees, because of the logarithmic scaling of frequency. (B) The blue line is $\log q_z = \log k - \lambda T_z$, with $T_z = \log(1 + az) + \gamma z$, and parameters $\lambda = 1.06$, $a = 0.004$, and $\gamma = 7 \times 10^{-7}$, with $\log k$ shifting the curve height and total probability. (C) The fitted blue line in panel B is a classic normal distribution with variance $1/(2\lambda)$ when plotted as $q_z \propto e^{-\lambda T_z}$ versus $\pm\sqrt{T_z}$, with z treated as a positive parameter. In this plot, the metric is shifted so that the most common type associates with the value $T_z = 0$. Data approximated from Figure 4 in Farrior et al. 1 .

Amendments from Version 2
This version modifies a couple of sentences in the main text noting an equivalence of the normal and lognormal distributions and pointing to the extended discussion of the lognormal distribution in the Appendix.
The blue curve in Figure 1B closely fits the observed pattern. That curve expresses the natural metric for variation in tree size, z, as
$$T_z = \log(1 + az) + \gamma z. \tag{1}$$
This metric relates size to a logarithmic term for multiplicative growth plus a linear term for an upper bound on size. There is no additional information in the fitted curve beyond this natural metric.
The normal distribution in Figure 1C expresses exactly the same information about the distribution of tree sizes as the fitted curve in Figure 1B. The normal distribution follows from the expression of size variation in terms of the natural metric, T z . I derive these conclusions in the following sections.

Natural metrics
The pattern of tree size can be understood by considering Tz as a natural metric for size. A natural metric expresses a shift and stretch invariant scale for an observed probability pattern 5 . Shifting a natural metric by adding a constant does not change the observed pattern. Stretching the metric by multiplying it by a constant also does not change the pattern.
Ideally, a natural metric also expresses the relation between underlying process and observed pattern. However, we can be right about the proper natural description of observed pattern but wrong about its underlying cause. It is important to distinguish description from causal interpretation.
The next section describes the natural metric for tree size with respect to the fundamental invariances of shift and stretch. I discuss the panels of Figure 1 as simple expressions of the natural metric.
The following sections consider how to interpret natural metrics, the description of observed pattern, and the analysis of underlying process. The presentation here extends the underlying abstract theory to the interpretation and intuitive understanding of empirical pattern. Technical details can be found in the cited articles.

The metric of tree size: affine invariance
The data 1 in Figure 1 arose from measurements of trunk diameter, d. I sought a natural metric based on d that describes the data in a shift and stretch invariant manner 5 .
How does one find a shift and stretch invariant natural metric that matches an observed pattern? In practice, one relies on the extensive underlying theory and on prior experience with what often works 3,4,7,8 . I achieved an excellent fit to the observed tree size data in Figure 1B based on the metric, Tz, in equation 1. I summarize the steps by which I arrived at that metric.
The data form a probability distribution. Probability patterns have a generic form. Measurements, z, relate to the associated probability, $q_z$. The natural metric, $T_z$, transforms measurements such that the probability pattern has the exponential form
$$q_z = k\,e^{-\lambda T_z}, \tag{2}$$
in which λ adjusts the stretch of $T_z$, and k adjusts the total probability to be one.
Probability patterns in the exponential form are shift and stretch invariant with respect to the metric, Tz. In particular, the affine transformation of shift and stretch, Tz ↦ α + βTz, is exactly compensated by adjustments of k and λ, leaving the probability pattern invariant.
Intuitively, we can think of affine invariance as defining a ruler that is linear in the metric, Tz. In a linear ruler, it does not matter where we put the zero point. The information in measurement depends only on the distance from where we set zero to where the observation falls along the ruler. That independence of the starting point is shift invariance.
Similarly, if we uniformly stretch or shrink the ruler, we still get the same information about the relative values of different measurements. All we have to do is multiply all measurements by a single number to recover exactly the same distances along the original ruler. The metric Tz provides information that is stretch invariant.
To fit the data of Figure 1A, we have to find the matching affine invariant metric, Tz, for probability expressed in the exponential form of equation 2.
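The compensating adjustments of k and λ described above can be checked numerically. This is a minimal Python sketch, assuming the exponential form $q = k\,e^{-\lambda T}$ and a positive stretch β; the helper names `q` and `compensate` and all numeric values are hypothetical, chosen only for illustration. After the affine change $T \mapsto \alpha + \beta T$, setting $\lambda' = \lambda/\beta$ and $k' = k\,e^{\lambda\alpha/\beta}$ returns the same probability for every T.

```python
import math


def q(T, k, lam):
    """Probability pattern in exponential form, q = k * exp(-lam * T)."""
    return k * math.exp(-lam * T)


def compensate(k, lam, alpha, beta):
    """Return (k', lam') that undo the affine change T -> alpha + beta * T.

    Assumes beta > 0. Solving k' exp(-lam' (alpha + beta T)) = k exp(-lam T)
    for all T gives lam' = lam / beta and k' = k * exp(lam * alpha / beta).
    """
    lam_new = lam / beta
    k_new = k * math.exp(lam * alpha / beta)
    return k_new, lam_new


# Hypothetical values for illustration only
k, lam = 0.3, 1.06
alpha, beta = 5.0, 2.5
k2, lam2 = compensate(k, lam, alpha, beta)

for T in (0.0, 0.5, 2.0, 7.0):
    original = q(T, k, lam)
    shifted_stretched = q(alpha + beta * T, k2, lam2)
    print(f"T={T}: {original:.6g} vs {shifted_stretched:.6g}")
```

The two columns agree for every T, which is the numerical counterpart of the ruler argument: only the calibration constants change, not the pattern.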

The metric of tree size: scale
Most natural metrics are simple combinations of linear, logarithmic, and exponential scaling 4,8 . For example, in the metric Tz = log z+γ z, the logarithmic term dominates when z is small, and the linear term dominates when z is large. The metric scales in a log-linear way. Change in scale with magnitude often occurs in natural metrics.
Roughly speaking, the linear, logarithmic, and exponential scales correspond to addition, multiplication, and exponentiation. Those arithmetic operations are the three primary ways by which quantities combine. One can think of numbers combining additively, multiplicatively or exponentially at different magnitudes, depending on the way in which process changes with magnitude.
Small trees tend to grow multiplicatively, and large trees tend to scale linearly as they approach an upper size limit. Farrior et al. 1 used logarithmic scaling at small magnitudes and linear scaling at large magnitudes. However, they did not express a metric that smoothly changed the proportion of the two scalings with magnitude. Instead, they switched from log to linear scaling at some transition point.
The observed data fit roughly to a pure log-linear metric, T z = log z + γ z, with z = d as tree diameter. I obtained a better fit by modifying this metric in two ways to obtain the expression in equation 1.
First, I used the square of the diameter, z = d 2 , which is proportional to the cross sectional area of the trunk at the point of measurement. Various intuitive reasons favor area rather than diameter as a measure of size and growth. However, I ultimately chose area because it fit the data.
Second, I replaced log z by log(1 + az). On a pure log scale, log z explodes to negative infinity as z approaches zero. In application to positive data, such as size, it almost always makes sense to use log(1 + az). This expression goes to zero as z declines and is close to the linear term az when az is small. The parameter a scales the rate of change with respect to the point of origin.
Size distributions often follow the metric, Tz = log(1+az)+γ z. Of course, not all distributions follow that pattern. But one can use it as a default. When observations depart from this default, the particular differences can be instructive.
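As a rough illustration of how this default metric changes scale with magnitude, the sketch below uses the tree-size parameters from the Figure 1 caption (a = 0.004, γ = 7 × 10⁻⁷, with z = d²). The two crossover points are a convention assumed here for illustration, not values from the fitted analysis: below $az = 1$, $\log(1 + az)$ is nearly linear; above the point where the slope of the log term, $a/(1 + az)$, falls to γ, the linear term grows faster. The function names are hypothetical.

```python
import math

# Tree-size parameters from the Figure 1 caption (z = d**2, d in mm).
A = 0.004
GAMMA = 7e-7


def slopes(z, a=A, gamma=GAMMA):
    """Contributions to dT/dz from the log term and from the linear term."""
    return a / (1.0 + a * z), gamma


def crossovers(a=A, gamma=GAMMA):
    """Rough magnitudes where the dominant scaling of T_z changes.

    z_low:  a z = 1; below it log(1 + a z) is close to the linear term a z.
    z_high: slope of the log term, a / (1 + a z), equals gamma; above it
            the linear term gamma z increases T_z faster than the log term.
    """
    z_low = 1.0 / a
    z_high = 1.0 / gamma - 1.0 / a
    return z_low, z_high


z_low, z_high = crossovers()
print(f"linear -> log near d = {math.sqrt(z_low):.1f} mm")
print(f"log -> linear near d = {math.sqrt(z_high):.1f} mm")
```

With these parameters the crossovers fall near d ≈ 16 mm and d ≈ 1200 mm, both inside the observed range of roughly 11 to 2800 mm.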

Interpretation of natural metrics
The natural metric of a probability pattern transforms observed values on the scale z into probability values on the scale Tz. Through the natural metric, the particular pattern on the observed scale, z, becomes a universal probability pattern in the natural metric, Tz.
One can understand the intuitive basis of natural metrics by considering the properties of the universal probability scale. Probability patterns are often discussed with words such as information or entropy 9 . Those words have various technical and sometimes conflicting definitions. But all approaches share essential intuitive concepts.
Surprise expresses the intuition 10 . Rare events are more surprising than common events. Suppose a particular size, z, occurs in one percent of the population, and another size, z′, occurs in two percent of the population. We will be more surprised to see z than z′. How much more surprised?
Surprise is relative. We should be equally surprised by comparing probabilities of 0.01 versus 0.02 and 0.0001 versus 0.0002. Each contrast compares one event against another that is twice as common.
What is a natural metric of probability that captures these intuitive notions of surprise? For probability, $q_z$, the surprise is defined as
$$\mathcal{S}_z = -\log q_z.$$
We compare events z and z′ by taking the difference
$$\mathcal{S}_z - \mathcal{S}_{z'} = -\log q_z + \log q_{z'} = \log\frac{q_{z'}}{q_z}.$$
The relation between the universal metric of probability, $\mathcal{S}_z$, and the natural metric for a particular observed scale, $T_z$, follows from the exponential form for probability 5 ,
$$\mathcal{S}_z = \lambda T_z - \log k, \qquad d\mathcal{S}_z = \lambda T'_z\,dz.$$
Here $T'_z$ transforms increments along the observable scale, dz, into increments along the universal scale of probability pattern, $d\mathcal{S}_z$. All of the information that relates observation to probability pattern is summarized by the natural metric, $T_z$.
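A short Python check of the surprise arithmetic in this section. It assumes only the definition $\mathcal{S} = -\log q$ and, for the last lines, a hypothetical pattern $q = k\,e^{-\lambda T}$ with made-up values of k, λ, and T. Both contrasts from the text give the same difference, $\log 2$, and the surprise equals $\lambda T - \log k$.

```python
import math


def surprise(q):
    """Surprise of an event with probability q: S = -log q."""
    return -math.log(q)


def surprise_difference(q, q_prime):
    """S_z - S_z' = log(q_prime / q); positive when z is rarer than z'."""
    return surprise(q) - surprise(q_prime)


# Each contrast compares an event with one that is twice as common.
print(surprise_difference(0.01, 0.02))      # about log 2
print(surprise_difference(0.0001, 0.0002))  # same value

# With q = k exp(-lam T), surprise is linear in the natural metric.
k, lam, T_value = 0.2, 1.06, 3.0  # hypothetical values
q_value = k * math.exp(-lam * T_value)
print(surprise(q_value), lam * T_value - math.log(k))
```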

Generative process: generic vs particular
What underlying generative process leads to an observed pattern? We must separate two aspects. Generic aspects arise from general properties of aggregation, measurement and scale that apply to all problems. Particular aspects arise from the special attributes of each problem.
Confusing generic and particular aspects leads to the greatest misunderstandings of pattern and process 3,4 . For example, the observed pattern in Figure 1 perfectly expresses generic properties. Aggregation leads to the normal distribution by the central limit theorem ( Figure 1C). The natural metric of size, T z , relates the normal distribution to power law and exponential scaling in Figure 1A,B, when probability is plotted with respect to the logarithm of the observed values, z.
In the tree size data, simple generic properties account for all of the observed pattern. I do not mean that there is nothing particular about trees or that we cannot study how ecological processes influence tree size. I mean that we must not confuse the generic for the particular in our strategy of inference 3,6,11 .
This article focuses on generic aspects of pattern. The following sections discuss those generic aspects in more detail.

The normal distribution and generic pattern
One often observes great regularity in probability patterns. Tree size follows a power law with an upper bound. Other measurements, such as height, weight, and enzymatic rate, also express regularity, but with different patterns.
A single underlying quantity captures the generic regularity in seemingly different patterns. That underlying quantity is the average distance of observations from the most common type 6 .
The key is to get the correct measure of distance, which is the natural metric.
The normal distribution is a pure expression of the generic regularity in probability patterns. In the normal distribution, the variance is the average distance of fluctuations from the mean.
In the normal distribution, the natural metric is the squared deviation from the mean, $T_z = z^2$. Here, z is the observed deviation from the mean, and $T_z$ is the natural metric for distance. The normal distribution follows from the standard expression of probability patterns in equation 2, repeated here with v = k, as
$$q_z = v\,e^{-\lambda T_z}. \tag{4}$$
The average of the squared deviations, $T_z = z^2$, is the average distance of fluctuations from the most common type, which is the definition of the variance, $\sigma^2$. We can express the parameters in terms of the variance as
$$\lambda = \frac{1}{2\sigma^2}, \qquad v = \frac{1}{\sqrt{2\pi\sigma^2}}, \tag{5}$$
from which we derive the commonly written form for the normal distribution as
$$q_z = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-z^2/2\sigma^2}. \tag{6}$$
The normal distribution is universally known but rarely understood.
Interpreting the powerful generic aspect of probability patterns often reduces to correctly reading this equation.
The standard expression for the normal distribution in equation 6 seems obscure. By understanding that equation 4 expresses the same information in a much more general and broadly applicable way, we learn to read the simple generic aspect of common pattern.
The key arises from the relation between the natural metric, Tz, and the measurement scale, z, used to express the pattern.
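To make the parameterization of the normal distribution concrete, here is a minimal Python sketch, assuming a hypothetical variance σ² = 2.5. It builds $q_z = v\,e^{-\lambda z^2}$ from $\lambda = 1/(2\sigma^2)$ and $v = 1/\sqrt{2\pi\sigma^2}$, checks with a simple trapezoidal rule that the total probability is one and that the average of $T_z = z^2$ equals σ², and compares one value with the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def q_normal(z, sigma2):
    """q_z = v exp(-lam T_z) with T_z = z**2, lam = 1/(2 sigma^2), v = 1/sqrt(2 pi sigma^2)."""
    lam = 1.0 / (2.0 * sigma2)
    v = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return v * math.exp(-lam * z * z)


def trapezoid(f, lo, hi, n=20_000):
    """Simple trapezoidal rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return total * h


sigma2 = 2.5  # hypothetical variance
width = 12.0 * math.sqrt(sigma2)
total_prob = trapezoid(lambda z: q_normal(z, sigma2), -width, width)
mean_T = trapezoid(lambda z: z * z * q_normal(z, sigma2), -width, width)
reference = NormalDist(0.0, math.sqrt(sigma2)).pdf(1.3)

print(total_prob, mean_T, q_normal(1.3, sigma2), reference)
```

The exponential form with $T_z = z^2$ and the textbook density are the same function; only the way the parameters are written differs.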

Metrics of probability and measurement
This section discusses key aspects of the natural metric transformations, Tz, of the underlying measurements, z. The understanding of probability pattern arises from these key aspects of the natural metric.
Suppose that two observers measure the same pattern. One uses a ruler that follows the scale, z. Another has a logarithmic ruler that returns logarithmic values, log z, for the same underlying values. The two observers do not know that they are using different scales.
When the two observers plot their data, each will see a different probability pattern. The plot of q z versus z differs from the plot of q z versus log z.
Similarly, two observers may see different patterns of human size if they measure different things. Suppose one observer measures femur length and the other measures the cross-sectional area of the chest. The probability patterns of femur and chest size differ. But the different patterns reflect the same information about the underlying size variation in the population.
What is the best way to find the relation between different observed values and the common underlying information about variation? Often, the natural metric for each observed scale provides the universally comparable scale for probability pattern.
That universally comparable scale can be used to express variation as a normal distribution.
When an observed probability pattern matches the normal distribution, then the variance summarizes all of the information in the pattern 6 . We can write the variance, $\sigma^2$, which is the average of the squared distance for fluctuations from the mean, as
$$\sigma^2 = \langle z^2 \rangle_z,$$
in which the angle brackets denote the average value of $z^2$, and the subscript z means that the average is taken with respect to the underlying scale, z.
The great generality of the normal distribution arises from a broader concept of the average distance of fluctuations from a central location:
$$\langle z^2 \rangle_z \;\mapsto\; \langle T_z \rangle_{\sqrt{T_z}}. \tag{7}$$
The left side shows the standard definition of the variance as the average squared distance from a central location. The right side generalizes that notion of average squared distance by using the average of the natural metric, $T_z$, in which the average is taken with respect to the square root of the natural metric, $\sqrt{T_z}$. Here, $T_z$ is shifted so that the most common type associates with $T_z = 0$, and the metric expresses fluctuations from the most common type 5 .
On the left, we average $z^2$ with respect to z. On the right, we average $T_z$ with respect to $\sqrt{T_z}$. The general form on the right-hand side includes the left-hand side as the special case of $T_z = z^2$.
The key conclusion is that common probability patterns expressed in their natural metric, $q_z = v\,e^{-\lambda T_z}$, are normal distributions when plotting $q_z$ versus $\pm\sqrt{T_z}$.

Natural metrics and generic forms
The tree size data match almost perfectly to the generic normal distribution ( Figure 1C). I discuss that match in terms of universal properties of the normal distribution, given in the prior sections.
Tree size variation follows a simple log-linear natural metric, $T_z$. That metric and its associated probability pattern, $q_z = k\,e^{-\lambda T_z}$, closely fit the data. Figure 1B shows the fit when plotting $\log q_z$ versus $\log z$. Figure 1C shows that the same observed variation closely fits a normal distribution when plotting $q_z$ versus $\pm\sqrt{T_z}$. The generalized variance is the average squared fluctuation of tree size from the most common type, when squared fluctuations are expressed by the natural metric, and fluctuations are measured by the square root of the natural metric. By the generalized notion of the variance in equation 7, all of the information in the observed distribution of tree size is contained in the average distance of fluctuations, measured in the natural metric.
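The claim that the tree-size fit becomes a normal distribution in $\pm\sqrt{T_z}$ can be checked numerically. This sketch is an illustration under stated assumptions: it uses the parameter values reported in the Figure 1 caption (λ = 1.06, a = 0.004, γ = 7 × 10⁻⁷), maps a log-spaced grid of sizes z to $x = \sqrt{T_z}$, and computes the generalized variance, the average of $T_z$ with respect to $\sqrt{T_z}$ weighted by $e^{-\lambda T_z}$. For a normal distribution in x, the result should match $1/(2\lambda) \approx 0.4717$. The grid limits and function names are my own choices.

```python
import math

# Parameters reported in the Figure 1 caption for the tree-size fit.
LAM, A, GAMMA = 1.06, 0.004, 7e-7


def T(z):
    """Natural metric T_z = log(1 + a z) + gamma z; its minimum, zero, is at z = 0."""
    return math.log1p(A * z) + GAMMA * z


def generalized_variance(n=20_000, z_min=1e-8, z_max=1e8):
    """Average of T_z with respect to x = sqrt(T_z), weighted by exp(-lam T_z).

    z runs over a log-spaced grid; each z maps to x = sqrt(T_z). The +x and
    -x branches are symmetric, so their factor of 2 cancels in the ratio.
    """
    step = math.log(z_max / z_min) / (n - 1)
    xs = [0.0] + [math.sqrt(T(z_min * math.exp(i * step))) for i in range(n)]
    ws = [math.exp(-LAM * x * x) for x in xs]
    num = den = 0.0
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]  # nonuniform trapezoid in x
        den += 0.5 * (ws[i] + ws[i + 1]) * dx
        num += 0.5 * (xs[i] ** 2 * ws[i] + xs[i + 1] ** 2 * ws[i + 1]) * dx
    return num / den


print(generalized_variance(), 1.0 / (2.0 * LAM))
```

Because $q \propto e^{-\lambda x^2}$ exactly in the variable $x = \sqrt{T_z}$, the generalized variance recovers the normal variance $1/(2\lambda)$ stated in the Figure 1 caption.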
The transformation of data into a normal distribution is sometimes considered a trivial step in the statistical analysis of significance levels. Here, in contrast, the natural metric and the associated expression in normal form provide an essential step in the general understanding of pattern and process.
Later sections discuss why the normal distribution arises as the simple expression of pattern in relation to natural metrics. Before turning to those concepts, I present another example.

Dimensional inversion and metric pairs
Natural metrics sometimes come in pairs 4,7 . For example, rates and frequencies follow dual metrics. Rates have dimensional units S/t, in which S is a generic size or number unit, and t is a time unit. A growth rate for trees may be given in terms of the change in size per year. A chemical reaction rate may be given as the number of molecules produced per unit time.
The inverse of a rate has units t/S. That inverse expresses the time to grow larger or smaller by a particular size unit, or the time to produce a particular number of molecules.
This section illustrates the common dual metrics for rates and times. The dual metrics yield different probability patterns that contain exactly the same underlying information. Each metric takes on the same common normal distribution form when stochastic fluctuations are measured by the metric relative to its square root.
To illustrate the dual metrics, I use the measured rates of chemical reactions for individual enzyme molecules given by Iversen et al. 12 . The measurements produce a probability pattern for the distribution of reaction rates. The measurements are not sufficiently precise to determine exactly which natural metric fits the data.
I made an approximate fit to the data by using the natural metric in equation 1, which I previously used to fit tree size. My only purpose here is to illustrate typical aspects of rate and frequency patterns, rather than to over-analyze the limited data available in this particular study. Figure 2A shows the fitted distribution of reaction rates. The rates are in molecules per second, r, with units S/t. The colors in the curve express the change in the scaling relations of the natural metric as magnitude increases. The natural metric from equation 1, repeated here with r = z, is
$$T_r = \log(1 + ar) + \gamma r.$$
When r is small, linear scaling of $T_r$ dominates, as shown by the blue coloring. As r increases, logarithmic scaling dominates, as shown by the gold coloring. Figure 2C, covering a greater range of r values, shows that further increase in r leads to linear dominance of scale, as shown by the green color. The upper linearity expresses the bound on size or number. Trees do not grow to the sky. Reaction rates do not become infinitely fast. Figure 3 shows the tree size data colored by the linear-log-linear transitions.
The probability pattern for rates, S/t, has a natural dual pattern expressed by inverted units for time, t/S. We can invert units by the Laplace transform 4,7 . The inversion leads to an altered probability pattern based on the natural metric
$$T_\tau = \alpha \log(\tau - d) + \frac{\tau - d}{a},$$
with α = 1 − λ and d = γλ. The parameters match the paired metric, $T_r$. The common value of λ shared by the paired distributions arises from the full expression for probability patterns in equation 2. The probability pattern for time, arising from $T_\tau$, is a gamma distribution shifted by d.
The time-per-molecule pattern in Figure 2B matches the dual enzyme rate pattern of molecules per time in Figure 2A. The dual distributions express identical information.
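The duality can be checked numerically under an explicit assumption about the time metric. This sketch assumes $T_\tau = \alpha \log(\tau - d) + (\tau - d)/a$ with α = 1 − λ and d = γλ, so that $e^{-T_\tau}$ is a gamma density with shape λ and scale a, shifted by d; the parameter values are hypothetical and not fitted to the enzyme data. If the duality holds, the Laplace transform of that time pattern, divided by $e^{-\lambda T_r}$, is the same constant, $\Gamma(\lambda)\,a^\lambda$, for every rate r.

```python
import math

# Hypothetical parameters, chosen only to illustrate the duality.
LAM, A, GAMMA = 2.0, 0.5, 0.1
ALPHA = 1.0 - LAM   # alpha = 1 - lambda
D = GAMMA * LAM     # d = gamma * lambda


def T_rate(r):
    """Rate metric T_r = log(1 + a r) + gamma r."""
    return math.log1p(A * r) + GAMMA * r


def q_time(tau):
    """Unnormalized time pattern exp(-T_tau), assuming
    T_tau = alpha log(tau - d) + (tau - d) / a, a gamma density shifted by d."""
    u = tau - D
    if u <= 0.0:
        return 0.0
    return math.exp(-(ALPHA * math.log(u) + u / A))


def laplace(f, r, lo, hi, n=50_000):
    """Trapezoidal approximation of the integral of f(tau) * exp(-r tau) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) * math.exp(-r * lo) + f(hi) * math.exp(-r * hi))
    for i in range(1, n):
        tau = lo + i * h
        total += f(tau) * math.exp(-r * tau)
    return total * h


def ratio(r):
    """Laplace transform of the time pattern divided by exp(-lambda T_r)."""
    return laplace(q_time, r, D, D + 40.0) / math.exp(-LAM * T_rate(r))


constant = math.gamma(LAM) * A ** LAM  # analytic value of the ratio
for r in (0.0, 1.0, 5.0, 20.0):
    print(r, ratio(r), constant)
```

Analytically, the shift d contributes the factor $e^{-\gamma\lambda r}$ and the gamma kernel contributes $(1 + ar)^{-\lambda}$, which together give $e^{-\lambda T_r}$.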
Dimensional inversion associates the various linear-log-linear scales between the two forms 4,7 . The linear, blue component at small magnitude in the upper panel matches the long blue tail at large magnitude in the lower panel. Put another way, slow rates, r, correspond to long waiting times, τ.
In the upper panel, the gold logarithmic component for high rates matches the gold component for short waiting times in the lower panel. For very high rates, r, we have to look at Figure 2C. The upper green linear tail corresponds to the rapid decline in the probability of observing extremely high rates, associated with the natural upper bound on rates. The green upper bound on rates matches the green lower limit on times in Figure 2B. If extremely rapid rates of reaction, r, are very rare, then no reactions will produce molecules in very short time periods, τ. That limitation produces the green shift at small times in Figure 2B. The dual natural metrics of rate, $T_r$, and time, $T_\tau$, correspond to similar expressions of the normal distribution 5 in Figure 2D. In general, different probability patterns expressed in different metrics, T, become normal distributions when fluctuations from the most common value are measured by $\pm\sqrt{T}$.

Aggregation and asymptotic invariance
Why do tree sizes and enzyme rates match a simple natural metric? Why do a few simple natural metrics match most of the commonly observed patterns? Part of the answer arises from the way in which aggregation leads to simple invariant pattern.
The top rows of Figure 4 illustrate aggregation and invariance. Each row begins on the left with two regular polygons, randomly rotated about their center. Columns to the right add more randomly rotated components. As the random rotations aggregate, the shape converges asymptotically to an invariant circular form.
Random rotation causes loss of information about the angle of orientation. In the aggregate, the asymptotic form is rotationally invariant. In other words, the circular shape remains invariant no matter how it is rotated. A circle expresses pure rotational invariance.
The bottom two rows illustrate aggregation and the invariant pattern of the normal distribution. Each row begins on the left with a probability distribution. For each distribution, the horizontal axis represents observable values, and the vertical axis represents the relative probability of each observed value. I chose the shapes of the distributions to be highly irregular and to differ from each other.
The second column is the probability distribution for the sum of two randomly chosen values from the distribution in the left column. The third, fourth, and fifth columns are, respectively, the sum of four, eight, and 16 randomly chosen values. The greater the aggregation of randomly chosen values, the more perfectly the pattern matches a normal distribution. Adding randomly chosen values often causes an aggregate sum to converge asymptotically to the invariant normal form.
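The same convergence can be reproduced in a few lines of Python. This sketch uses a hypothetical, deliberately irregular five-point distribution, not the distributions drawn in Figure 4, and tracks the skewness and excess kurtosis of sums of 1, 2, 4, 8, and 16 random draws. Both statistics are zero for a normal distribution; for sums of independent draws they shrink roughly as $1/\sqrt{n}$ and $1/n$.

```python
import random
import statistics

# Hypothetical irregular distribution: values and their probabilities.
VALUES = [0.0, 1.0, 1.5, 6.0, 7.0]
WEIGHTS = [0.40, 0.05, 0.30, 0.05, 0.20]


def sums(n, samples, rng):
    """Draw `samples` aggregates, each the sum of n independent values."""
    return [sum(rng.choices(VALUES, weights=WEIGHTS, k=n)) for _ in range(samples)]


def skew_and_kurtosis(xs):
    """Sample skewness and excess kurtosis (both 0 for a normal distribution)."""
    m = statistics.fmean(xs)
    var = statistics.pvariance(xs, m)
    m3 = statistics.fmean([(x - m) ** 3 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m3 / var ** 1.5, m4 / var ** 2 - 3.0


def shape_by_aggregation(ns=(1, 2, 4, 8, 16), samples=20_000, seed=1):
    """Map each aggregation level n to (skewness, excess kurtosis) of the sums."""
    rng = random.Random(seed)
    return {n: skew_and_kurtosis(sums(n, samples, rng)) for n in ns}


for n, (skew, kurt) in shape_by_aggregation().items():
    print(f"n={n:2d}  skewness={skew:+.3f}  excess kurtosis={kurt:+.3f}")
```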

Natural metrics and a universal scale
The invariant normal form expresses a universal scale. That universal scale clarifies the concept of natural metrics. To understand the universal scale, we begin with the fact that the same pattern can be described in different ways.
Consider enzyme catalysis. Fluctuations can be measured as the rate of molecules produced per unit time. Alternatively, fluctuations can be measured as the interval of time per molecule produced. Figure 2A, B show the dual expression of the same underlying information.
The dual measurement scales each have their own natural metric. A natural metric transforms a particular measurement scale into a universal scale that expresses the common underlying information.

Figure 3. Tree size data from Figure 1B. This distribution has the same natural metric as in Figure 2C, but with different parameters. The curve is colored to show the change in the scaling of the natural metric with increasing magnitude as linear (blue), logarithmic (gold), and linear (green).
A metric is natural in the sense that it connects a particular scale of observation to a common universal scale.
The normal distribution purely expresses the universal scale. Suppose we begin with different scales of measurement, such as the rate of molecules produced per unit time and the interval of time per molecule produced. Each scale has its own distinctive pattern of random fluctuations, as in Figure 2A, B. When we transform each scale to its natural, universal metric, Tz, the pattern of random fluctuations follows the normal distribution ( Figure 2D).
A normal distribution expresses information only about the average distance of fluctuations from the most commonly observed value. If we measure distance for different underlying measurements in their natural metrics, then that distance is the universal form of variance given in equation 7,
$$\sigma^2 = \langle T_z \rangle_{\sqrt{T_z}}.$$
The generalized variance expresses the average deviation of the natural metric relative to the square root of the natural metric.
Why is the relation between a natural metric and its square root the universal measure of scale and also the expression of the normal distribution? The answer concerns how rotation and aggregation lose information and leave an invariant pattern (Figure 4).
The next section discusses rotational invariance and its relation to the universal scaling of the normal distribution. The following sections return to tree size and other commonly observed size distributions. The concepts of rotational invariance and the normal distribution clarify why the natural metric for tree size, given in equation 1 as Tz = log(1 + az) + γ z, is a common natural metric for size patterns.

Rotational invariance
To understand the universal scale of the normal distribution, we begin with circles and rotational invariance ( Figure 5). Simple geometric concepts provide the key to natural metrics, universal scales, and the structure of commonly observed patterns.
A circle expresses a rotationally invariant radial distance from a central location. In Euclidean geometry, squared distance is the sum of squared values along each dimension. Invariant radial distance in two dimensions, $x_1$ and $x_2$, may be written as $R^2 = x_1^2 + x_2^2$. The points $(x_1, x_2)$ at constant radial distance lie along the circle. The radial distance is rotationally invariant to the angle of orientation. The circular pattern is also invariant to interchange of the order of $x_1$ and $x_2$.
We can think of the rotationally invariant circle as a way to decompose a given value into components. If we start with any observed value and equate that value with a radial distance, $R^2$, then the observed value is equally consistent with all points $(x_1, x_2)$ that satisfy the circular constraint,
$$R^2 = x_1^2 + x_2^2.$$
We can break up a given value into n components, $R^2 = \sum_i x_i^2$, which is the invariant radial distance of a sphere in n dimensions. Changing the order of the components does not change the radial distance. Rotational invariance implies order invariance of the component dimensions.
Figure 4 illustrates how aggregation leads to invariant distance. The top two rows aggregate randomly rotated shapes. Initially, the rows differ, because they begin with different shapes in different orientations. However, after adding many shapes, the aggregate patterns converge to the same circular form, because the order no longer matters in a large sample. The pattern of distance from the center becomes the same in every direction.
The lower two rows of Figure 4 show a similar aggregate tendency to an invariant measure of distance. On the left, the initial patterns differ. As more samples are added, all information is lost except the average distance of fluctuations from the center.
The rotational invariance of circles relates to the invariance of average distance in the normal distribution 5 . In both cases, the squared distance is the standard Pythagorean definition of Euclidean geometric distance as the sum of squares. To see the connection between the rotational invariance of circles and the average distance of fluctuations in the normal distribution, we begin with an observed value and consider how it might have arisen by the aggregation of underlying components.
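A small Python illustration of the two invariances of radial distance discussed in this section, using a hypothetical point and random angles: rotating $(x_1, x_2) = (3, 4)$ about the origin never changes $R^2 = 25$, and shuffling the order of n components never changes $R^2 = \sum_i x_i^2$. The function names are my own.

```python
import math
import random


def rotate(point, theta):
    """Rotate a 2D point (x1, x2) by angle theta about the origin."""
    x1, x2 = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x1 - s * x2, s * x1 + c * x2)


def squared_radius(components):
    """R^2 = sum of squared components (Euclidean squared distance)."""
    return sum(x * x for x in components)


rng = random.Random(0)
p = (3.0, 4.0)  # R^2 = 25
for _ in range(3):
    rotated = rotate(p, rng.uniform(0.0, 2.0 * math.pi))
    print(rotated, squared_radius(rotated))

# Order invariance: permuting n components leaves R^2 unchanged.
xs = [rng.gauss(0.0, 1.0) for _ in range(6)]
shuffled = xs[:]
rng.shuffle(shuffled)
print(squared_radius(xs), squared_radius(shuffled))
```

Every rotated point is a different decomposition of the same value, $R^2 = 25$, which is the sense in which an observed value is equally consistent with all points on the circle.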

Aggregation and natural metrics
Suppose we transform an observed value, z, into a natural metric value, $T_z$. What different aggregations would lead to the same value of $T_z$? If we think of $T_z = R_z^2$ as a radial distance, we can evaluate the combinations of underlying values that lead invariantly to the same radial distance 5 .
Previously, we partitioned squared radial distance as $R^2 = \sum_i x_i^2$. We can equate the explicitly squared radial distance to the implicitly squared natural metric, $R_z^2 = T_z$. Similarly, we can equate the explicitly squared component dimensions to the implicitly squared dimensions, $x_i^2 = y_i$, or equivalently, $x_i = \sqrt{y_i}$. Then $R_z^2 = T_z$ can be written as $T_z = R_z^2 = \sum_i x_i^2 = \sum_i y_i$. For the natural metric, $T_z$, the square root scale, $\sqrt{T_z}$, is the natural scale of distance, aggregation, and rotational invariance.
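To make the substitution concrete, here is a small sketch assuming a natural metric value $T_z$ split into nonnegative additive parts $y_i$ (the values are arbitrary and hypothetical). Setting $x_i = \sqrt{y_i}$ turns the additive split of $T_z$ into a Pythagorean split of $R_z^2$, so $\sqrt{T_z}$ is the Euclidean length of the component vector, whatever the order of the parts.

```python
import math

def radial_distance_from_parts(parts):
    """Given additive parts y_i of a natural metric T_z = sum(y_i),
    return R_z computed as the Euclidean norm of x_i = sqrt(y_i)."""
    xs = [math.sqrt(y) for y in parts]        # x_i = sqrt(y_i)
    return math.sqrt(sum(x * x for x in xs))  # R_z = sqrt(sum of x_i^2)

parts = [0.7, 2.1, 0.2, 1.0]  # hypothetical additive components of T_z
t_z = sum(parts)              # T_z, about 4.0
r_z = radial_distance_from_parts(parts)
r_z_reversed = radial_distance_from_parts(list(reversed(parts)))
```

The sketch is deliberately simple: it only shows that additive aggregation on the $T_z$ scale and Euclidean distance on the $\sqrt{T_z}$ scale are the same bookkeeping.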

The normal distribution
The prior section emphasized that the natural metric $T_z = R_z^2$ has the square root $\sqrt{T_z} = R_z$ as its natural scale of distance. This section relates the normal distribution to this association between natural metrics and radial distance. See Frank 5 for additional details. Write the probability pattern as
$q_z = k e^{-\lambda T_z} = k e^{-\lambda R_z^2}$, (8)
in which k adjusts so that the total probability is one and $\lambda = 1/2\sigma^2$. If we shift $T_z$ so that it is expressed as a deviation from its minimum value, then for many natural metrics, $T_z$, the probability pattern in equation 8 is a normal distribution with respect to the incremental scale $d\sqrt{T_z} = dR_z$. The distribution is centered at the minimum of $T_z$ and has average distance of fluctuations from the central location as the generalized variance, $\sigma^2$.
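A quick numerical check that, on the scale $R_z = \sqrt{T_z}$, the curve $k e^{-\lambda R_z^2}$ is a normal form whose average squared fluctuation is the generalized variance $\sigma^2 = 1/(2\lambda)$. This sketch assumes the convention $\lambda = 1/2\sigma^2$ (the same one used in the Appendix), an arbitrary value of λ, and a plain trapezoid rule over a wide symmetric range of R; it is an illustration, not part of the original analysis.

```python
import math

def trapezoid(f, lo, hi, n=20000):
    """Composite trapezoid rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

lam = 0.8                            # lambda in q = k exp(-lambda * R^2), assumed
sigma2 = 1.0 / (2.0 * lam)           # predicted generalized variance
width = 12.0 * math.sqrt(sigma2)     # integrate over +/- 12 standard deviations

def shape(r):
    """Unnormalized pattern on the natural scale R."""
    return math.exp(-lam * r * r)

k = 1.0 / trapezoid(shape, -width, width)  # normalizing constant
mean_sq = k * trapezoid(lambda r: r * r * shape(r), -width, width)
```

The normalizing constant should come out as $\sqrt{\lambda/\pi}$ and the mean squared distance as $1/(2\lambda)$, the usual normal-distribution results.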
Different natural metrics can often be expressed in this normal form. Thus, the rotationally invariant normal form expresses a universal scale (Figure 2D).
Rotational invariance often implies invariance with respect to the order of observations in an aggregate. Order invariance connects the asymptotic rotational invariance of circles and natural metrics to the asymptotic form of the normal distribution in Figure 4. Thus, the normal distribution, expressed in natural metrics, provides a universal scale for understanding probability pattern.

Inductive: observed metric to universal scale
How does one find natural metrics? For tree size and chemical reaction rates, I began with the observed probability pattern. From those data, I found a natural metric that fit the observed pattern. In those cases, I chose the natural metric based on the fact that patterns of size and reaction rate tend to follow a particular, commonly observed natural metric. This inductive approach matches a natural metric to a particular problem. The natural metric can then be used to transform the observed pattern into the universal scale of the normal distribution. What do we learn by this inductive fit of a metric and subsequent transformation to the normal form?
We have a good sense of the normal distribution as the outcome of simple aggregation and its connection to rotational invariance (Figure 4). Thus, once we find the proper scaling through the natural metric, we can think of an observed probability pattern as an expression of the normal form on a different scale.
For example, we can think of tree size as following a normal distribution when we express size, z, in the natural metric $T_z = \log(1+az) + \gamma z$. The normal form follows by expressing $T_z$ relative to the most common size as the squared distance of a random fluctuation in relation to the distance, $\sqrt{T_z}$. By recognizing the universal normal form, we can see that different measurements of the same underlying pattern express the same information. In Figure 2, the different probability patterns for rate and time have a common normal expression. Of course, many patterns that arise from unrelated processes also have the normal form.
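The transformation in this paragraph can be sketched numerically. Assuming illustrative parameter values a, γ, and λ (not fitted to any data), the code maps each size z to the natural-scale coordinate $r = \sqrt{T_z}$ for $T_z = \log(1 + az) + \gamma z$, checks that $\log q_z$ equals $-\lambda r^2$ (a Gaussian curve in r, dropping $\log k$), and inverts the map by bisection. Because $T_z$ increases monotonically when a and γ are positive, the round trip z to r to z loses nothing, which is one way to see that the two descriptions carry the same information. All function names are hypothetical.

```python
import math

def tree_metric(z, a, gamma):
    """Natural metric for tree size: T_z = log(1 + a z) + gamma z."""
    return math.log1p(a * z) + gamma * z

def to_natural_scale(z, a, gamma):
    """Distance on the universal scale, r = sqrt(T_z); T_z = 0 at z = 0."""
    return math.sqrt(tree_metric(z, a, gamma))

def from_natural_scale(r, a, gamma, tol=1e-12):
    """Invert r = sqrt(T_z) by bisection, using that T_z increases in z."""
    target = r * r
    lo, hi = 0.0, 1.0
    while tree_metric(hi, a, gamma) < target:  # bracket the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if tree_metric(mid, a, gamma) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a, gamma, lam = 5.0, 0.02, 1.5  # illustrative values only
sizes = [0.1, 1.0, 10.0, 100.0, 500.0]
log_q = [-lam * tree_metric(z, a, gamma) for z in sizes]  # log q_z minus log k
rs = [to_natural_scale(z, a, gamma) for z in sizes]
recovered = [from_natural_scale(r, a, gamma) for r in rs]
```

Plotting `log_q` against `rs` would give a downward parabola (a normal curve on a log scale), while plotting it against `log(z)` gives the bent power-law shape typical of tree size data.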
The key is that the structure of commonly observed pattern arises from the generic processes of aggregation and rotational invariance, when evaluated with the proper natural metric, rather than from the special attributes of particular processes. That conclusion is simply the well known principle of statistical mechanics.
The principle of statistical mechanics is both well known and frequently ignored in the study of pattern. The reason is that the different scales on which observed patterns arise tend to obscure the underlying commonality. The point here is that one can understand natural metrics and universal scales in a rational way, and thus connect abstract principles to real problems in ways that have often been missed.

Deductive: universal scale to predicted metric
The inductively fit metric expresses the essence of an observed pattern. But the fit does not tell us about the generative process that led to that particular metric.
Ideally, one would deduce the appropriate natural metric for a problem by considering the generative process and the necessary invariances that must be satisfied. For example, tree size must depend on growth processes, and the consequent probability pattern likely satisfies shift, stretch, and rotational invariance. However, three difficulties arise.
First, the relations between process, measurement and pattern can be obscure. For tree size, what is the proper scale on which to measure the consequences of growth, competition, and other processes? We could use trunk diameter, d, or cross-sectional area, proportional to $d^2$, or a fractal exponent of diameter, $d^s$, or another size measure correlated with diameter.
The natural metric is often the scale that aggregates additively, leading to patterns that tend to be shift, stretch, and rotationally invariant. However, what we measure may be a complex transformation of that underlying scale. Inductive fit gets around the problem by describing the pattern and its associated invariant scale, rather than trying to deduce the processes that caused the observed pattern.
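One concrete way in which a measurement transforms the underlying scale: suppose, purely for illustration, that diameter d follows a power law $q_d \propto d^{-\lambda}$ on the incremental scale dd, and we instead record $y = d^s$ (s = 2 for cross-sectional area). The change of variables $dd = (1/s)\,y^{1/s-1}\,dy$ gives $q_y \propto y^{-(\lambda-1)/s-1}$, so the same pattern shows a different log-log slope depending on what we measure. The sketch below checks that slope numerically; λ, s, and the function names are assumptions.

```python
import math

def density_in_diameter(d, lam):
    """Hypothetical power law for diameter, q_d proportional to d**(-lam)."""
    return d ** (-lam)

def density_in_measure(y, lam, s):
    """Same pattern expressed for y = d**s, including the Jacobian dd/dy."""
    d = y ** (1.0 / s)
    jacobian = (1.0 / s) * y ** (1.0 / s - 1.0)  # dd/dy
    return density_in_diameter(d, lam) * jacobian

def loglog_slope(f, x1, x2):
    """Slope of log f versus log x between two points."""
    return (math.log(f(x2)) - math.log(f(x1))) / (math.log(x2) - math.log(x1))

lam, s = 2.5, 2.0
slope_d = loglog_slope(lambda d: density_in_diameter(d, lam), 1.0, 100.0)
slope_y = loglog_slope(lambda y: density_in_measure(y, lam, s), 1.0, 1e4)
predicted_y = -(lam - 1.0) / s - 1.0  # -1.75 for these values
```

A pure power law stays a power law under $y = d^s$, but its exponent changes, which is why the slope alone does not identify the scale on which growth aggregates.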
Second, multiple processes may shape pattern. Different processes may dominate at different scales. For example, exponential growth may dominate among smaller trees, whereas a bound on maximum size may dominate among larger trees. In general, different processes may dominate at different magnitudes. Predicting the metric that fits observations requires proper combination of the different underlying processes.
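For the tree size metric $T_z = \log(1 + az) + \gamma z$ given earlier, the two terms offer a simple picture of different processes dominating at different magnitudes: the logarithmic growth term is larger for small z, and the linear term, which pulls the largest trees below the power law line, takes over beyond a crossover size. A sketch that locates that crossover by bisection, assuming a > γ > 0 and arbitrary illustrative values; the function names are hypothetical.

```python
import math

def term_gap(z, a, gamma):
    """Growth (log) term minus bound (linear) term of T_z."""
    return math.log1p(a * z) - gamma * z

def crossover_size(a, gamma, tol=1e-12):
    """Size z* > 0 at which gamma*z overtakes log(1 + a z); assumes a > gamma > 0."""
    lo = 1e-9 / a           # tiny size: the log term is larger here
    hi = 1.0 / gamma
    while term_gap(hi, a, gamma) > 0.0:  # grow until the linear term is larger
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if term_gap(mid, a, gamma) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a, gamma = 5.0, 0.02
z_star = crossover_size(a, gamma)
```

Below `z_star` the pattern looks close to a power law set by the growth term; above it the exponential decline from the linear term dominates.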
Third, natural metrics express the patterns that arise by loss of information, subject to a few minimal constraints of invariance. Because aggregation dissipates information, many seemingly distinct processes will generate the same observable pattern. Common patterns are common exactly because they match so many distinctive underlying processes 3 . The natural metrics of common patterns reflect only the similarities of the simple invariances. Most of the special attributes of different generative processes tend to disappear in the aggregate.
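A small deterministic illustration of aggregation dissipating information, using the standard fact that cumulants add for sums of independent variables. For the sum of n independent, identically distributed increments, the standardized skewness shrinks as $1/\sqrt{n}$ and the excess kurtosis as $1/n$, whatever distribution the increments come from; only the mean and variance survive. The step distributions below (exponential, uniform, a fair ±1 coin) are arbitrary examples, and the names are hypothetical.

```python
import math

# (skewness, excess kurtosis) of a single increment for a few step distributions
STEP_SHAPES = {
    "exponential": (2.0, 6.0),
    "uniform": (0.0, -1.2),
    "coin_flip": (0.0, -2.0),
}

def shape_of_sum(skew, excess_kurtosis, n):
    """Standardized skewness and excess kurtosis of a sum of n iid increments.

    Cumulants add under independent sums, so kappa_j(sum) = n * kappa_j(step);
    standardizing by the variance n * kappa_2 gives skew/sqrt(n) and kurt/n.
    """
    return skew / math.sqrt(n), excess_kurtosis / n

for n in (1, 10, 100, 10000):
    shapes = {name: shape_of_sum(*sk, n) for name, sk in STEP_SHAPES.items()}
    print(n, shapes)
```

As n grows, the three very different step distributions become indistinguishable in shape, which is the sense in which many distinct processes generate the same observable pattern.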

Deductive: tree size example
Tree size depends on growth, on limits to maximum size, and on a variety of other factors. Here, I give a simple introduction to natural metrics that arise from growth. I do not include bounds on size or other processes. I do not include difficulties of measurement. In spite of those limitations, this simplified analysis of growth and natural metrics provides insight into commonly observed probability patterns.

I begin with the form $q_z = k e^{-\lambda T_z}$, which is a normal distribution when we measure increments on the square root scale, $d\sqrt{T_z}$. The normal distribution arises when we consider $T_z$ values to be an aggregate sum of component values.
For tree size, the problem concerns how the aggregation of random growth increments leads to the observed size. We can split total growth into t increments. Each incremental unit multiplies current size by $e^{g_i}$, in which $g_i$ is the growth rate in the ith increment. The average growth per increment is $\bar{g} = \frac{1}{t}\sum_{i=1}^{t} g_i$, so that the total multiplicative change in size is
$\prod_{i=1}^{t} e^{g_i} = e^{\bar{g}t} = e^{w}$, (9)
in which $w = \bar{g}t$ is the sum of the t incremental growth rates.
The variable w provides a natural base scale for growth, because it expresses the aggregate sum of growth components. The sum is invariant to the order of the components. Thus, the total of the incremental growth rates can be thought of as a rotationally invariant radial distance.
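The multiplicative growth model can be simulated in a few lines. This sketch draws t hypothetical random growth rates $g_i$ (the normal distribution for $g_i$, the seed, and all numbers are arbitrary choices), multiplies an initial size $z_0$ by $e^{g_i}$ in one order and then in a shuffled order, and confirms that final size depends only on the sum $w = \sum g_i$. It also checks the relation $w = \log(1 + az)$ with $a = 1/z_0$ that appears later in this section.

```python
import math
import random

def grow(z0, rates):
    """Apply incremental growth: each step multiplies size by exp(g_i)."""
    size = z0
    for g in rates:
        size *= math.exp(g)
    return size

rng = random.Random(1)                             # fixed seed for reproducibility
rates = [rng.gauss(0.05, 0.1) for _ in range(50)]  # hypothetical g_i values
shuffled = rates[:]
rng.shuffle(shuffled)

z0 = 2.0
final = grow(z0, rates)
final_shuffled = grow(z0, shuffled)
w = sum(rates)      # aggregate growth, w = g_bar * t
z = final - z0      # increase in size by growth
a = 1.0 / z0
```

Reordering the increments changes nothing about final size, which is the order invariance that lets w play the role of a rotationally invariant radial distance.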
Natural metrics arise from shift and stretch (affine) invariance to transformations of their base values 4,7,8 . Thus, a natural metric, $T(w) \equiv T_w$, for the base scale, w, arises from affine invariance to a generator transformation, $G(w)$, such that
$T[G(w)] = \alpha + bT(w)$
for some constants α and b. If we consider $G(w) = \delta + w$ to be a shift of the growth rates, so that the shape of probability patterns for size does not depend on adding a constant value to growth rates, then a natural metric for size with respect to growth is
$T_w = e^{\beta w}$,
in which β is a positive parameter. This metric remains affine invariant to a shift of the base scale, $w \mapsto \delta + w$, because
$T[G(w)] = e^{\beta(\delta+w)} = bT(w)$
for $b = e^{\beta\delta}$. The metric $T_w$ is perhaps the most generic and important form of all natural metrics. Its application to growth is a special case of its underlying generality. I discussed this metric extensively in earlier articles 4,8 . Here, I confine myself to the problem of growth in relation to size.
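A quick check of this affine invariance, with assumed values for β, δ, and φ: shifting every growth value by δ multiplies the exponential metric by the constant $b = e^{\beta\delta}$, so a pattern of the form $k e^{-\phi T_w}$ keeps its shape and only the parameter φ rescales to φb. The symbols follow the text; the numbers and function name are illustrative.

```python
import math

def exp_metric(w, beta):
    """Natural metric for growth: T_w = exp(beta * w)."""
    return math.exp(beta * w)

beta, delta, phi = 0.7, 1.3, 0.4
b = math.exp(beta * delta)  # stretch factor produced by the shift
ws = [-2.0, -0.5, 0.0, 0.8, 2.5]

# T[G(w)] with G(w) = delta + w equals b * T(w): a pure stretch (alpha = 0).
ratios = [exp_metric(delta + w, beta) / exp_metric(w, beta) for w in ws]

# The log pattern -phi * T[G(w)] matches -(phi * b) * T(w): same family, new phi.
shifted_log_pattern = [-phi * exp_metric(delta + w, beta) for w in ws]
rescaled_log_pattern = [-(phi * b) * exp_metric(w, beta) for w in ws]
```

Because the normalizing constant k absorbs any change in total mass, a shift in growth only rescales the fitted parameter, leaving the family of patterns unchanged.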
The natural metric $T_w$ associates with the probability pattern $q_w = k e^{-\phi T_w}$ when measured with respect to the incremental scale, $dT_w$. If we wish to express the probability pattern with respect to measurements of growth rate, on the incremental scale dw, note that $dT_w = \beta T_w\,dw = \beta e^{\beta w}\,dw$, yielding the probability pattern $q_w = k e^{\beta w - \phi e^{\beta w}}$ when measured with respect to the incremental base scale, dw, in which, as always, k adjusts so that the total probability is one.
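The change of scale from $dT_w$ to dw can be checked numerically. Assuming the pattern on $dT_w$ has the form $k e^{-\phi T_w}$ (consistent with equation 10) and that $T_w = e^{\beta w}$ ranges over $(0, \infty)$ as w ranges over the real line, the total mass of $e^{-\phi T_w}$ on $dT_w$, which is $1/\phi$, must equal the total mass of $\beta e^{\beta w - \phi e^{\beta w}}$ on dw. The factor β is absorbed into k in the text; the sketch keeps it explicit. Parameter values and the integration range are ad hoc assumptions.

```python
import math

def trapezoid(f, lo, hi, n=100000):
    """Composite trapezoid rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

beta, phi = 1.2, 2.0

def pattern_on_dw(w):
    """beta * exp(beta*w - phi*exp(beta*w)): the pattern on the scale dw."""
    return beta * math.exp(beta * w - phi * math.exp(beta * w))

mass_dw = trapezoid(pattern_on_dw, -40.0, 5.0)  # tails beyond this range are negligible
mass_dT = 1.0 / phi  # exact integral of exp(-phi*T) over T in (0, inf)
```

The two masses agree because the factor $\beta e^{\beta w}$ is exactly the Jacobian $dT_w/dw$.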
Suppose we wish to transform from growth, w, to size, z, in which $w(z)$ expresses growth as a function of size. If w increases with z, then we can write $dw = w'\,dz$, in which $w'$ is the derivative of w with respect to z. The generic probability pattern becomes
$q_z = k e^{-\lambda T_z} = k e^{\log w' + \beta w - \phi e^{\beta w}}$ (10)
with respect to the incremental measurement scale, dz.
In the tree size example, w is the aggregate growth rate. Let $z_0 + z$ be size, with $z_0$ as initial size, and z as the increase in size by growth. From equation 9, $z_0 + z = z_0 e^{w}$, thus implying that w as a function of z is
$w = \log(1 + az)$. (12)
In this particular derivation, $a = 1/z_0$. However, one should not interpret parameters literally. Different generative processes will lead to the same form, with alternative assumptions about process and parameters. Ultimately, the invariant properties of the metric capture the essence of common pattern. This particular derivation is meant only to show one way in which a metric arises.
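We can use equation 12 to write the probability pattern of equation 10 explicitly in terms of the increase in size by growth, z. Substituting $w = \log(1+az)$ and $w' = a/(1+az)$ gives
$q_z = k\,(1+az)^{-(1-\beta)}\,e^{-\phi(1+az)^{\beta}}$
with respect to the incremental scale, dz, yielding
$q_z = k e^{-\lambda T_z}, \quad T_z = \log(1+az) + \gamma(1+az)^{\beta}$,
with $\lambda = 1-\beta$ and $\gamma = \phi/(1-\beta)$, for β < 1, and dropping constants of proportionality. For certain parameter combinations, this probability pattern will be similar to the pattern for the size metric $T_z = \log(1+az) + \gamma z$.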
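The substitution can be verified numerically. Assuming $w' = a/(1+az)$, the sketch evaluates the logarithm of equation 10, $\log w' + \beta w - \phi e^{\beta w}$, at $w = \log(1 + az)$, and compares it with the closed form $-\lambda[\log(1 + az) + \gamma(1 + az)^\beta]$ using $\lambda = 1 - \beta$ and $\gamma = \phi/(1 - \beta)$. The two should differ only by the constant $\log a$, which is absorbed into k. Parameter values (with β < 1) and function names are arbitrary.

```python
import math

def log_pattern_eq10(z, a, beta, phi):
    """log of equation 10 with w = log(1 + a z) and w' = a / (1 + a z)."""
    w = math.log1p(a * z)
    w_prime = a / (1.0 + a * z)
    return math.log(w_prime) + beta * w - phi * math.exp(beta * w)

def log_pattern_closed(z, a, beta, phi):
    """-lambda * T_z with T_z = log(1 + a z) + gamma * (1 + a z)**beta."""
    lam = 1.0 - beta
    gamma = phi / lam
    t_z = math.log1p(a * z) + gamma * (1.0 + a * z) ** beta
    return -lam * t_z

a, beta, phi = 0.5, 0.3, 0.05
sizes = [0.0, 0.5, 3.0, 20.0, 150.0]
gaps = [log_pattern_eq10(z, a, beta, phi) - log_pattern_closed(z, a, beta, phi)
        for z in sizes]
```

A constant gap across all sizes confirms that the two expressions describe the same pattern up to normalization.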
I presented this derivation to encourage future study. The proper way to relate general growth processes to invariant probability patterns remains an open problem.

Conclusion
Probability patterns often follow a few simple scaling relations. Those scaling relations define natural metrics. A natural metric transforms measurements to a universal scale. On the universal scale, the average distance of random fluctuations from the most commonly observed value defines a generalized variance. When observed values arise by aggregation of random processes, that aggregation erases all information except the average fluctuation, the generalized variance.
Many different probability patterns become a normal distribution when expressed on the universal scale of natural metrics. The only information in each distribution is the generalized variance. Transforming the natural metric distance back to the underlying observed values yields the standard description for probability pattern on the scale of the observed measurements.
The great regularity of observed patterns, such as power laws, often arises from the same aspects of aggregation and invariance that lead to the normal distribution. A power law pattern and a normal distribution may simply be different transformations of the same underlying pattern.
The transformations arise from measurement and from the invariances that define scaling relations and natural metrics 4,5,7,8 . These key aspects of scale provide the framework in which to study the relations between pattern and process.
Author contributions
SAF did all the research and wrote the article.

Competing interests
No competing interests were disclosed.

Grant information
National Science Foundation grant DEB-1251035 supports my research.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Appendix: lognormal and power law distributions
The lognormal distribution often fits well to observed patterns. For example, many size distributions with power law attributes match reasonably well to the lognormal form. Conceptually, the lognormal distribution is easily understood by simple analogy with the normal distribution and the central limit theorem. For these reasons, the lognormal distribution is frequently used in empirical study.
In this Appendix, I briefly introduce the lognormal distribution and its relation to power law distributions. I then explain the conceptual limitations of the lognormal distribution, and why I did not mention the lognormal distribution in the main text. Finally, I show how pure power laws arise from metrics given in the text.

Approximate lognormal size distributions
In the last full section of this article, I described a simple model of growth and consequent size. I split total growth into t increments.
Each incremental unit multiplies current size by $e^{g_i}$, in which $g_i$ is the growth rate in the ith increment. The average growth per increment is $\bar{g} = \frac{1}{t}\sum_{i=1}^{t} g_i$, and total growth is $w = \bar{g}t = \sum_{i=1}^{t} g_i$, so that size increases by the factor $e^{w}$. If growth in each increment, $g_i$, is a random variable, then w is the sum of random growth variables. The sum of the growth variables will sometimes converge to an approximately normal distribution by the central limit theorem. The approximation to normality may be close or far off depending on particular aspects of the growth process.
The lognormal distribution is the distribution of the exponential of a normally distributed variable. Thus, if w is normally distributed, then size, $y = e^{w}$, has a lognormal distribution of the form $q_y = k e^{-\lambda(\log y - \mu)^2 - \log y}$ on the incremental scale dy, in which $\lambda = 1/2\sigma^2$, with μ and $\sigma^2$ as the mean and variance of the normal distribution in $w = \log y$.
Lognormal distributions sometimes match reasonably well to power law patterns. As the variance in the underlying normal distribution becomes large, λ becomes small, and the lognormal distribution becomes approximately $q_y \approx k e^{-\log y} = k y^{-1}$.
This limiting form approximates a power law, because the log-log plot of log q y versus log y is approximately a straight line with a slope of minus one.
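The approach to slope −1 can be seen directly. Writing $u = \log y$, the lognormal pattern gives $\log q_y = \log k - \lambda(u - \mu)^2 - u$, so the local log-log slope is $-2\lambda(u - \mu) - 1$. The sketch below estimates that slope by finite differences for a small and a large variance; the values of μ, σ², the size range, and the function names are assumptions for illustration.

```python
import math

def log_lognormal(y, mu, sigma2):
    """log q_y for the lognormal pattern, dropping the constant log k."""
    lam = 1.0 / (2.0 * sigma2)
    u = math.log(y)
    return -lam * (u - mu) ** 2 - u

def local_loglog_slope(y, mu, sigma2, eps=1e-5):
    """Central finite-difference slope of log q_y against log y."""
    up, down = y * math.exp(eps), y * math.exp(-eps)
    return (log_lognormal(up, mu, sigma2) - log_lognormal(down, mu, sigma2)) / (2 * eps)

mu = 0.0
sizes = [0.1, 1.0, 10.0, 100.0]
narrow = [local_loglog_slope(y, mu, sigma2=0.5) for y in sizes]
wide = [local_loglog_slope(y, mu, sigma2=500.0) for y in sizes]
```

With the small variance the slope swings widely across the range (visible curvature on a log-log plot); with the large variance it stays close to −1 over several orders of magnitude.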

Conceptual limitations of the lognormal
I did not mention the lognormal distribution in the text because of its conceptual limitations. The lognormal distribution is identical to the normal distribution. If a variable, w, is normally distributed, then the variable $y = e^{w}$ has a lognormal distribution. Conceptually, there is no difference between the normal and lognormal distributions. If a pattern is normal it is also lognormal, and vice versa.
Nothing is gained or lost by using one form or the other.
In this article, I showed how one can change a wide variety of different distributions into the normal distribution. My changes truly altered the patterns of the different distributions, showing their broad conceptual unity. For example, if one has a gamma distribution $q_z = k e^{-\lambda(\log z + \gamma z)} = k e^{-\lambda T_z}$, then plotting $q_z$ versus $\pm\sqrt{T_z}$ leads to a normal distribution. Here, I have related the gamma pattern to the normal form, relating two distinct probability patterns to each other. A similar approach works for many different metrics, $T_z$.
The value in relating distributions in this way arises from two aspects. First, the common form of distributions in terms of metrics $T_z$ follows from the two simple invariances of shift and stretch. Second, the relations between different distributions and the common normal form arise from the third invariance of rotation. Those three invariances together provide a unified framework for understanding commonly observed pattern. The lognormal does not add to that understanding, because it simply expresses the normal pattern in a slightly different way, therefore sitting outside of the conceptual framing that is the topic of this work.

Pure power laws
The metrics in equation 1 and equation 13 include pure power law forms as special cases. As γ → 0, the metrics become $T_z \to \log(1 + az)$, and the distribution becomes $q_z = k(1 + az)^{-\lambda}$, which is the classic Lomax or Pareto Type II distribution. That distribution is a power law for large values of z. As a becomes large, the distributions become a pure power law form, $q_z = k z^{-\lambda}$, in which k adjusts to maintain a total probability of one.
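Both limits can be checked by computing the local log-log slope of the Lomax form $q_z = k(1 + az)^{-\lambda}$, which is $-\lambda az/(1 + az)$. The slope tends to −λ once az is large, either for large z at fixed a or, at a fixed size, as a grows. The sketch uses assumed values of λ and a, and hypothetical function names.

```python
import math

def lomax_log_pattern(z, a, lam):
    """log q_z for the Lomax (Pareto Type II) form, dropping log k."""
    return -lam * math.log1p(a * z)

def local_loglog_slope(z, a, lam, eps=1e-5):
    """Central finite-difference slope of log q_z against log z."""
    up, down = z * math.exp(eps), z * math.exp(-eps)
    return (lomax_log_pattern(up, a, lam) - lomax_log_pattern(down, a, lam)) / (2 * eps)

lam = 1.8
# Fixed a: the slope approaches -lam only for large z.
slopes_in_z = [local_loglog_slope(z, a=1.0, lam=lam) for z in (0.01, 1.0, 100.0, 1e4)]
# Fixed z: the slope approaches -lam as a grows (the pure power law limit).
slopes_in_a = [local_loglog_slope(1.0, a=a, lam=lam) for a in (0.1, 10.0, 1e3, 1e6)]
```

For small az the curve is nearly flat on log-log axes; the straight power law line appears only in the regime where az dominates the constant 1.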
I now see how my previous critique was a bit of a misunderstanding on my part. The normal and log normal are really the same pattern (as stated in the appendix). Once you see that they are the same, one of the main ideas becomes much clearer.
I have one suggestion. I think people will benefit by being pointed to the appendix early, as the rest of the paper is more easily understood (at least by me) by first reading the description of normal, log normal, and power law because we get the functional forms. I would suggest one modification.

Change the sentence:
"The normal distribution calls to mind the great regularity in pattern that arises solely from the aggregation of an underlying stochastic process."

to the following
"The familiar regular pattern exhibited by the normal distribution (or equivalently the log normal distribution - see appendix) arises solely from the aggregation of an underlying stochastic process or processes."
Not a deal breaker for me if they don't add this, but it would be a loss if they did not point to the appendix with a teaser. I also think that by saying "normal and log normal" are the same pattern, the reader will suddenly get it!!!!
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
No competing interests were disclosed.
This is a provocative paper that should have high impact. It makes both a scholarly and educational contribution. I expect it to be widely cited and taught. When indexed, it will appear on the syllabus of a graduate class that I teach.
That said, I have one major quibble with the paper. First, some background. The standard way to teach distributions goes as follows. Normal distributions arise from adding or averaging variation (as nicely explained here). Log normal distributions arise from multiplying shocks. Power law distributions have multiple causes: self organized criticality, preferential attachment, random walk return times, etc.
In this paper, Frank argues that we can connect some power law and log normal distributions to normal distributions by using a different unit of analysis.
Let's take the standard story of why tree sizes have a log normal distribution. Trees grow by random rates each year. If rates of growth are proportional, then a tree of size S that has growth rates r(t) will be size $S\prod_{t=1}^{10}(1 + r(t))$ in 10 years. If I take the logarithm of that size, it will be additive in the shocks, and thus normally distributed.
Frank makes an alternative argument, that there is a natural metric for size, T = log d, and that this, when transformed, produces a normal distribution. If log d is normally distributed, then d will be log normally distributed. He then makes a similar argument for enzyme reactions.
(The tree model is more complicated as he includes a linear term and a log term but this captures the main idea).
My quibble with the paper has to do with the difference between a power law distribution and a log normal (or exponential) distribution. Power law: $y \sim x^{-a}$. Exponential: $y \sim e^{-ax}$. If you plot a log normal distribution on a log log plot, you get the sort of curvature that Frank shows in many of his graphs. What he is calling power laws would be characterized by many as log normal or exponential.
I realize that his more general point is not that tree size is either log normal or a power law. However, the paper would be much stronger and much clearer if he would make the following changes.
Clarify the difference between power law and log normal (including mention of the curve on the log log plot). Explain that many of the "long tails" in biology, such as tree size, can be explained using his method that separates the generic causes of the distribution from the particular. He might even separate out the generic and particular in the tree growth so we see why he cannot just use the log normal distribution. Ideally, he would also show how his approach can produce a true power law.
I would be remiss to not add that the paper is a model of clarity of exposition and argumentation.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
No competing interests were disclosed.
I agree with Page's main point that I should relate my approach to the widely used lognormal distribution and the common power law expressions. To address those issues, I have added an Appendix to the revised version. I show how the lognormal distribution arises as an approximate description of growth and size whenever one can reasonably consider the distribution of growth rates as approximately normally distributed. Because total growth can be thought of as the sum of sequential growth increments, total growth may often be approximately normally distributed as a consequence of the various central limit theorem type arguments. However, sums of random variables are often not normally distributed, so one must be cautious about the generality of lognormal distributions for size.
In my original article, I did not mention the lognormal distribution. I find it useful to distinguish practical applications from approaches aimed at broad conceptual understanding of the relations between different distributions. In my opinion, the lognormal distribution provides a useful practical tool, but has some limitations with regard to the conceptual aims of my article. I discuss these points in a newly added section of the Appendix.
The new Appendix also includes a section that shows how my approach leads to a true power law distribution.
Finally, Page discussed how one might separate the generic factors that shape probability distributions from the specific factors that influence particular observable patterns. I agree that separating the generic from the particular is a key aspect of predicting and explaining patterns. In my article, the section on "Deductive: tree size example" attempts to show how one might evaluate particular generative processes of tree growth and size within my broader framework of the invariances that shape the generic form of probability distributions. I say at the end of that section: "I presented this derivation to encourage future study. The proper way to relate general growth processes to invariant probability patterns remains an open problem." At present, I do not have anything sufficiently compelling to add, although I certainly agree that this is a key issue. A related article of mine, "Invariant death", emphasizes the duality of the generic and particular aspects of pattern, and adds some analysis on this topic (see …).
This paper continues Prof Frank's investigation of the connections between observation and probability patterns in the natural sciences, in this case dealing specifically with power law size distributions. The paper is written in a tutorial style, which is probably appropriate given that the material covered is not standard reading in the life sciences. For those who are completely unfamiliar with the general subject area, I strongly advise reading reference [3] from the list in the current paper before or in conjunction with it.
Students in the life sciences often have a difficult relationship with concepts of probability and statistics. It should not be that way, of course, since biological systems are inherently stochastic, but it seems that as biologists we often approach the existence of variance in our observations as a problem to be got round, rather than as the very stuff of biology which it is our job to explain. Worse yet, many of us encounter the idea of "transformation" to make data more normal as an entirely ad hoc, opaque, process that seems to rely on rules of thumb learned by rote, with no explanation as to why a particular transformation would be appropriate in some situations and not others. Frank's paper offers an altogether more satisfying perspective on the subject of transformation. In addition to a wealth of other insights, this paper lays out a well-grounded theoretical basis for understanding which transformations to seek if one wishes to preserve the information content of original, non-normal, observations but express it in terms of a corresponding normal distribution; the paper focuses in particular on the case of tree size data that conform to a power-law on their original scale of measurement. Of course, the paper is not intended as a tutorial on data transformation (the fact that students could learn to think of transformation in a new, richer, sense from reading this work is a by-product) but more an introduction to a different perspective on biological observation and its relationship with the probability distributions to which the observations conform.
Three key ideas carry the paper along. First, "A single underlying quantity captures the generic regularity in seemingly different patterns. That underlying quantity is the average distance of observations from the most common type." Second, a natural metric will exist, as some transformation of the original scale of measurement, such that when considered in terms of the square root of the natural metric, average distance of observations from the most common type will follow a normal distribution.
Third, with the transformation to the natural metric properly chosen, the information content of the original data distribution and the normal distribution for the data, when expressed in its natural metric, are the same (the invariance property).
The general form $p(y) = K\exp(-L(y, x))$, in which K adjusts so that p(y) is a proper density function and L(y, x) is any desired distance measure, is, in the words of Jorma Rissanen, a simple device used already by Gauss. In the present case it forms a density function for the distance between individual observations and the most common type. In other cases (such as those which Rissanen had in mind) it may measure the distance between model predictions and observed data. The simple device forms a link between the work presented by Frank and the extensive literature on coding, model selection and statistical inference. Exploring those links lies well beyond the scope of Frank's paper, but their shared basis in information theory and the notion of how much, and what, information can be obtained from Nature and then modeled is an area of research that biologists have largely ignored.
The paper includes a number of other points during the exposition of its central ideas. These, together with the complexity of the ideas themselves, and the fact that the notation used to lay out the numerous (but necessary) equations is subject to somewhat arbitrary-seeming substitutions mean that the paper needs close reading, in spite of the clarity of Frank's writing. Although signposts are provided along the way (often in the form of rhetorical questions) to let us know where we're going next, the paper would benefit from a more comprehensive section by section guide in the introduction, so that the whole journey can be seen in a single view.
Returning to my initial point concerning the way that the process of making data "more normal" is often learned by rote, the sections on deduction of appropriate natural metrics to express data should be particularly useful from a pedagogic perspective; they show that (at least in theory) an approach based on argument from principles is possible. Those who are unfamiliar with the ideas will still probably be left with the unwelcome impression that it will take considerable experience to become proficient at recognizing approaches that are likely to work, but one of the wider lessons of Frank's publications in this general area is that rather few approaches are likely to account for the majority of observations most of us will encounter.
McRoberts mentions the duality between the understanding of probability patterns and the complementary problems of inference. I originally came to this subject through that connection, particularly through my study of Jaynes' pioneering work (see references to my earlier work in my article). However, I was not aware of several of the explicit connections mentioned by McRoberts, which I appreciate learning about from his review.
McRoberts suggests that I provide "a more comprehensive section by section guide in the introduction." I often provide such a guide in my longer articles. In this case, I had that kind of guide in my early drafts. However, the underlying technical nature of the work made the overview into what seemed like more of an obstacle than an invitation to the article. So I dropped it, allowing me to move the article very quickly into the example of tree size that I use throughout to help connect the underlying abstractions to real-world problems. Perhaps it would be possible to write a helpful introductory guide, but I have not yet found the right expression.
The main difficulty with the current structure is that some readers may mistakenly focus on the tree size problem as the central message of the article. It is not. The main message is that we can understand almost all common probability patterns by a few simple underlying invariances. That understanding provides great insight into many aspects of commonly observed patterns, including patterns such as tree size.
One advantage of the F1000Research format is that I can submit a revised version at any time. For now, I will keep the current structure, while I continue to think about how to improve the presentation. I welcome comments from readers.
Competing Interests: I have no competing interests.