Sunday, March 23, 2008

Fat tails and outliers: A closer look

No, it's not about the Fat Tonys of the world, Taleb's proverbial cabdrivers who know at least as much about events as so-called experts, not because they're so smart, but because the experts know far less than they think. But Fat Tony might appreciate the world of "fat-tailed" probability distributions, since they capture, mathematically and in part, the phenomenon of the black swan: large deviations from the mean ("outliers") are less common than small ones, but still far more common than the normal or Gaussian bell-curve distribution would predict.

The Gaussian distribution is used so much because of an important mathematical result, the Central Limit Theorem (CLT). It states that, if we add up (or average) a large number of instances of a random process, the distribution of the result (the collective "distribution of distributions") approaches a Gaussian, provided certain conditions hold. These conditions are that:
  • The individual instances making up the distribution must be independent of one another.
  • The moments, or weighted averages, of the original probability distribution must be finite (in particular, the variance).
What happens in the "large numbers" limit, if these conditions hold, is that, of all the moments of the original distribution, only three matter after the dust settles - the total population size, the mean, and the variance (the zeroth, first, and second moments - see below). All the other moments either vanish or are controlled by the first three. These three are exactly the ones needed to define a Gaussian bell curve.
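As a quick sanity check, here is a minimal simulation sketch (in Python; the die and the sample sizes are arbitrary choices for illustration): averaging many independent rolls of a fair die, a decidedly non-Gaussian source, produces sample means that pile up in a bell curve around the die's mean of 3.5.

```python
import random
import statistics

random.seed(42)
n_draws = 200     # independent die rolls averaged per sample
n_samples = 2000  # number of sample means collected

# Each die roll is uniform on {1, ..., 6}: flat, nothing like a bell curve.
means = [statistics.mean(random.randint(1, 6) for _ in range(n_draws))
         for _ in range(n_samples)]

# The CLT says the sample means should be approximately Gaussian,
# centered on 3.5, with standard deviation sqrt(35/12) / sqrt(n_draws).
print(statistics.mean(means))   # close to 3.5
print(statistics.stdev(means))  # close to (35/12) ** 0.5 / 200 ** 0.5 ≈ 0.12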

A simple example. Let's consider a population of particular instances of some property or attribute, quantified by a random variable x, allowed to range from -∞ to +∞. Its probability density is f(x); within an infinitesimal range dx, the number of instances between x and x+dx is f(x) dx. The cumulative number of all instances with x < X is the integral of f(x) from -∞ to X. Define the nth moment (or weighted area under the curve) as M(n) = ∫ x^n f(x) dx, where n is a non-negative integer: n = 0, 1, 2, ...

The Gaussian with zero mean and variance of one is f(x) = exp(-x^2/2)/√(2π). (The normalization is chosen such that M(0) = 1.) It is strongly peaked at x = 0 (the mean) and falls off rapidly for deviations from the mean.
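These definitions are easy to check numerically. A small sketch (plain Python; the integration range and step count are ad hoc choices): approximate M(0), M(1), and M(2) of the standard Gaussian by a midpoint rule, and they come out near 1, 0, and 1, as they should.

```python
import math

def moment(f, n, lo=-30.0, hi=30.0, steps=200_000):
    """Approximate M(n) = ∫ x^n f(x) dx by the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += (x ** n) * f(x)
    return total * dx

def gauss(x):
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)

print(moment(gauss, 0))  # M(0): total population, ≈ 1
print(moment(gauss, 1))  # M(1): mean, ≈ 0
print(moment(gauss, 2))  # M(2): variance, ≈ 1
```

Truncating the integral at ±30 is harmless here because the Gaussian tail is already vanishingly small there.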

The "fat tail" case occurs when, whatever f(x) is doing for small x, it decreases for large x as |x|^(-a), a > 0, apart from overall multiplicative constants. f(x) still falls off for large x, but far more slowly than the Gaussian does. Then M(n) ~ ∫ |x|^(n-a) dx. Cut the integral off at the limits -X and +X, and let X → +∞. Then M(n) ~ X^(n-a+1). There are three possibilities:
  • n - a + 1 < 0. The moment M(n) is defined (the integral converges to a finite value).
  • n - a + 1 = 0. The moment M(n) is infinite, diverging logarithmically.
  • n - a + 1 > 0. The moment M(n) is infinite, diverging as a positive power.
For a "fat-tailed" distribution behaving this way, while some moments (for lower n) might be defined, the moments with n ≥ a - 1 are undefined. Therefore the CLT does not hold, and it is not correct to use Gaussian-based statistical methods for such populations.*
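The three cases can be watched happening numerically. A rough sketch (plain Python; the tail exponent a = 2.5 and the cutoffs X are arbitrary illustrative choices): integrate x^(n-a) over [1, X] for growing X and see which moments settle and which blow up.

```python
def tail_moment(n, a, X, steps=100_000):
    """Approximate ∫ from 1 to X of x^(n - a) dx by the midpoint rule."""
    dx = (X - 1.0) / steps
    return sum((1.0 + (k + 0.5) * dx) ** (n - a) for k in range(steps)) * dx

a = 2.5
for X in (10.0, 100.0, 1000.0):
    # n = 0: n - a + 1 = -1.5 < 0, converges (to 2/3 as X → ∞).
    # n = 2: n - a + 1 = +0.5 > 0, grows like X^0.5 (infinite variance).
    # n = 4: n - a + 1 = +2.5 > 0, grows like X^2.5.
    print(X, tail_moment(0, a, X), tail_moment(2, a, X), tail_moment(4, a, X))
```

With a = 2.5, the total population and the mean exist, but the variance and everything above it diverge, which is exactly the regime where Gaussian methods silently fail.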

Long before Fat Tony.... Random walks whose steps are drawn from such distributions are called, in the mathematical literature, Lévy flights, after the French mathematician Paul Lévy, who first worked out the theory of these "stable" distributions in the decades before the Second World War. Mandelbrot, the geometer of fractals, was a student of Lévy. Both Lévy and Mandelbrot went into hiding after the French defeat in 1940, avoiding the Nazi and Vichy dragnet of French Jews.

After the war, they were also intellectual refugees from a certain style of mathematics that swept over the French academic world and had a strong influence elsewhere. Collectively named the Bourbaki school, it drove applied and "heuristic" mathematics to the margins of the field and favored a lean, abstract approach of theorem-proof, with no pictures, diagrams, or applications. (In the same period, the artistic avant-garde moved strongly in the same direction: away from sense perception, toward "pure" abstraction.) The situation relaxed in the 1970s and 1980s, followed by a strong revival of interest in applied mathematics, both among mathematicians and among the scientists and engineers who use mathematics. While rigor and precision are essential to mathematics, the field can't survive, or even make sense, without contact with applied problems and the world of the senses, and the Bourbaki revolution petered out.

Using Lévy's results, Russian mathematicians Gnedenko and Kolmogorov proved a generalization of the Central Limit Theorem that allows for systematic statistical methods to be applied even in such Extremistan cases. But the resulting "distribution of distributions" is not Gaussian. If we want to study the statistics of events in a chaotic system, like the climate or financial markets, say, we must use these generalized methods pioneered by Lévy, not the 19th-century methods of binomials, Poisson, and Gauss. Like 20th-century artistic palettes and musical styles, it's a 20th-century statistics cookbook of expanded possibilities and greater generality. In the next posting, we'll meet a recent climate case where appropriate statistical methods were applied, with striking results, to a situation where wrong methods were long used.
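A taste of why the generalized methods are needed: the standard Cauchy distribution is a Lévy-stable law whose density falls off as |x|^(-2), so its mean and variance are undefined. A quick sketch (plain Python; the sample sizes are arbitrary): no matter how many Cauchy draws we average, the average is itself standard Cauchy and never settles toward a fixed value, in sharp contrast to what the classical CLT would give for a finite-variance source.

```python
import math
import random

random.seed(0)

def cauchy():
    # Standard Cauchy draw via the inverse CDF: tan(π(u - 1/2)).
    return math.tan(math.pi * (random.random() - 0.5))

# For a finite-variance source, the average of n draws tightens like 1/sqrt(n).
# For the Cauchy, the average of n draws is again standard Cauchy:
# more data does not pin the "mean" down at all.
for n in (100, 10_000, 1_000_000):
    avg = sum(cauchy() for _ in range(n)) / n
    print(n, avg)
```

Run it a few times with different seeds: the printed averages jump around wildly instead of converging, which is the stability property of Lévy's laws at work.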
* Usually, a > 1 in practice. If 0 < a < 1, then even the zeroth moment M(0), the total number in the population, is infinite. (The mean and variance are undefined as well.) Mathematicians can still cope with cases where some or all of the moments diverge, by working instead with the characteristic function of the probability distribution, which always exists.
