## How information content is calculated from uncertainty

### Twenty Questions

The basic idea of information is easy to grasp.
Many people explain information through the parlour game "20 questions". The idea is to guess which object a person is thinking of by asking a series of up to 20 questions, each answered strictly yes or no. The point is that each answer, being strictly binary, gives at most 1 bit of information (exactly 1 bit when the question splits the remaining possibilities in half). When the questioner guesses correctly after n ≤ 20 questions, they have received up to n bits of information.
Of course, that is rather artificial and limited.
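The arithmetic behind the game can be sketched numerically (a minimal illustration; the function name `questions_needed` is ours, not from the text). Since each yes/no answer can at best halve the remaining candidates, n answers distinguish at most 2^n objects:

```python
import math

def questions_needed(n_objects):
    # Each yes/no answer at best halves the remaining candidates,
    # so n answers can distinguish at most 2**n objects.
    return math.ceil(math.log2(n_objects))

# 20 well-chosen questions cover up to 2**20 = 1,048,576 objects
print(questions_needed(2))          # 1
print(questions_needed(1_000_000))  # 20
```

So the game can, in principle, pick one object out of about a million.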

-------------

Here we explain, with an illustration, how information can be quantified in a way that is a little more flexible.
What follows is an explanation based on a short story found in Zernike (1972) [see our library page], a little updated.

Here goes...
Call the relevant probabilities P1 after the first text; P2 after the second; and P3 after the third. Using logs in base 2 (we'll explain why in a moment):

For the second two messages:

I2 = K (log2 P1 - log2 P2) = K (-5.98 - (-3.17))

I3 = K (log2 P2 - log2 P3) = K (-3.17 - 0) (note that P3 is 1, so its log is zero).

We used base 2 for the logs because this naturally gives units in binary bits of information.
Recall that a binary bit [1 or 0] is the smallest possible difference, and therefore the smallest possible unit of data and, by implication, of information. A single bit has just two possibilities, each with probability 0.5, and you don't know which it will be until you 'receive' the bit, so receiving it changes your uncertainty by K (log2 0.5 - log2 1) = K (-1 - 0) = K. Since this is the smallest unit of information, it is sensible to define it as equal to 1 bit, which fixes K at the (convenient) value of exactly -1 when we use logs in base 2.

We can now work out the information content of the second two text messages:

I2 ≈ 2.8 bits and I3 ≈ 3.2 bits, so the total information of the two messages was about 6 bits. Indeed, if your friend had told you "Wednesday at 11am" in the second message, that alone would have been worth about 6 bits: -(log2 1/63 - log2 1) is indeed about 6.
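The same numbers fall out of the formula directly (a sketch using the probabilities implied by the logs quoted above: P1 = 1/63 as stated, P2 = 1/9 inferred from log2 P2 ≈ -3.17, and P3 = 1):

```python
import math

def info_bits(p_before, p_after):
    # I = K * (log2 P_before - log2 P_after), with K = -1
    return -(math.log2(p_before) - math.log2(p_after))

# Probabilities implied by the logs quoted in the text:
# P1 = 1/63 (log2 ≈ -5.98), P2 = 1/9 (log2 ≈ -3.17), P3 = 1.
P1, P2, P3 = 1 / 63, 1 / 9, 1.0

I2 = info_bits(P1, P2)
I3 = info_bits(P2, P3)
print(round(I2, 2), round(I3, 2), round(I2 + I3, 2))  # 2.81 3.17 5.98
```

Note that the total, I2 + I3, equals -(log2 1/63 - log2 1): information from successive messages simply adds.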

At this point it is important to notice that we are dealing with the semantic information content of the messages, not their syntactic information content (which ignores their meaning and considers only the probability of a particular sequence of letters).

One last thing: the first text message said that your friend would come sometime next week, and we have not yet accounted for that information. This is because it is rather difficult to specify: on a strict interpretation of the evidence, and nothing but the evidence provided, you had no idea at all when, or even whether, she would arrive, so your prior uncertainty was technically infinite. This is where the numerical calculations break down: they only work with a finite set of possibilities. The calculation would be easy if you knew, say, that she would be coming sometime this year.