Measuring Biodiversity
Imagine a bag of objects which you can
take out one by one and place on the table: what you take out will be a
sample of the objects in the bag. Let’s say they differ in colour,
shape and size. So your first object might be a large round blue ball
and the next might be a small green cube, etc. The sample will contain
a set of objects that can differ in three different ways: each of these
is an axis of diversity. If the sample contains just two objects, then
there are at most three ways in which they can differ (a single
pairwise comparison, made once for each category of difference).
How much information is needed to describe the two-object sample? The
answer is three bits: one for each possible difference, since each is a
yes-or-no answer. The information content of the sample (in these terms
only) is therefore three bits.
This information content escalates rapidly as we take a larger sample.
Suppose there are at least three colours, at least three shapes and at
least three sizes. Then with a sample of three objects there can be as
many as nine differences among the objects (three pairs, each compared
on three axes), and as few as none (if all the objects extracted from
the bag are the same). The probability of finding a given number of
differences in the sample depends on the number of categories defined
for each dimension of diversity: this is termed the number of levels.
Notice now that we are describing a situation that is mathematically
equivalent to the one used to explain thermodynamic entropy in terms of
the number of ways of arranging the state of the system (termed the
multiplicity), which is expressed via a probability distribution.
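To make the idea of multiplicity concrete, the sketch below enumerates
every possible three-object sample and tallies how many of them give
each number of differences. It assumes exactly three levels per
dimension, and that every kind of object is equally likely and drawn
independently; those are assumptions added for the illustration, and
the names used are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

LEVELS = 3       # assumed number of levels on each dimension
DIMENSIONS = 3   # colour, shape, size
SAMPLE_SIZE = 3  # objects taken from the bag


def count_differences(sample):
    """Count attribute differences over every unique pair of objects."""
    return sum(
        a != b
        for first, second in combinations(sample, 2)
        for a, b in zip(first, second)
    )


# Every kind of object: one level chosen on each dimension (27 kinds).
object_types = list(product(range(LEVELS), repeat=DIMENSIONS))

# Multiplicity: the number of distinct ordered samples that produce
# each possible number of differences.
multiplicity = Counter(
    count_differences(sample)
    for sample in product(object_types, repeat=SAMPLE_SIZE)
)

total = sum(multiplicity.values())
for k in sorted(multiplicity):
    # Under the equal-likelihood assumption, the probability of k
    # differences is its multiplicity divided by the total.
    print(k, multiplicity[k], round(multiplicity[k] / total, 4))
```

Under these assumptions the tally runs from zero differences (all three
objects identical) up to nine (every pair differing on every axis), and
the multiplicities divided by their total give the probability
distribution referred to in the text.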
More straightforwardly, we can think about what the different
dimensions of biodiversity are and what levels should apply to them.
The simplest
approach is to describe what is physically present at some definite
scale of organisation. For example, among n organisms there are
(n^2 - n)/2 possible differences at the organism level (one for each
unique pairing). If we cluster the organisms into their taxonomic
classes and find three distinct classes, then we find 3 differences
among them (A-B; A-C; B-C). On the other hand, if we look at the more
fundamental genetic scale, with say 12 organisms, we may for example
find that all the organisms share 50% of their genes (so no difference
there) and, to keep it simple, that the remaining half of each genome
is unique to its organism. With each organism having a genome of 2000
genes (it’s just an illustration), that means there are 12 x 1000 =
12000 unique genes in the assembly of organisms, and so (12000^2 -
12000)/2 = 71994000 differences within the total genetic pool. Now, how
much information is there in k differences? The answer is well
explained by a short story, found in Zernike (1972) and a little
updated here.
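Before turning to that story, the pair counts quoted in this paragraph
can be checked with a short calculation. The function name
pairwise_differences is hypothetical; the figures are the illustrative
ones from the text.

```python
def pairwise_differences(n):
    """Number of unique pairings among n items: (n^2 - n) / 2."""
    return (n * n - n) // 2


organisms = 12
genes_per_organism = 2000

# Half of each genome is shared by all organisms; the other half is
# unique to the organism that carries it.
unique_genes_per_organism = genes_per_organism // 2
unique_genes = organisms * unique_genes_per_organism

print(pairwise_differences(3))             # 3 (A-B; A-C; B-C)
print(pairwise_differences(organisms))     # 66 pairings among 12 organisms
print(unique_genes)                        # 12000
print(pairwise_differences(unique_genes))  # 71994000
```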