Information Theory in Molecular Biology


Perhaps the first and most advanced application of information theory in biology concerns the molecular basis of genetics and, by implication, of reproduction and evolution by natural selection. This is not surprising, since it is in the genetic code that we find the most obvious connection between biology and information. The very word ‘code’ implies an intuitive understanding that a DNA sequence constitutes stored information and its transcription or replication is an information transfer, or communication.

In addition, it is now appreciated that many of the cell’s processes constitute communication within and among cells, via cell signalling. Molecular messengers carry information about the state of the environment, both internal and external, and they communicate this to receptor molecules that in turn trigger cascades of molecular responses in the cell. This cell signalling enables coordination among bacteria (quorum sensing) and within a multi-cellular organism; indeed, without it multicellularity would be impossible. It also enables a cell to adapt to its environment, to manage its own homeostasis, and to coordinate complex processes such as reproduction, where several different structures have to act in the right way at the right moment.

Just as the computer on your desk has a power supply and a lot of information-processing hardware, the cell has two basic processing systems: one for supplying energy (the power supply) and the other for processing information. That information processing includes the routine management of homeostasis, the response to changes in the external environment and to communications from other cells, the translation of information from DNA into functional forms such as proteins, and full-scale reproduction.


Shannon’s definition of ‘information’ as a decrease in the uncertainty of a receiver has enabled quantitative analysis of biomolecular systems, using concepts such as ‘mutual information’* and ‘channel capacity’*. These aspects of information theory have allowed the development of a straightforward and practical method of measuring information in genetic control systems. This enables us to answer questions such as: How do genetic systems gain information by evolutionary processes?
Tom Schneider's paper here explains the method and uses it to observe information gain in the binding sites for an artificial protein in a computer simulation of evolution (there is a list of Tom's papers here). The simulation begins with zero information and, as in naturally occurring genetic systems, the information measured in the fully evolved binding sites is close to that needed to locate the sites in the genome. The transition is rapid, demonstrating that information gain can occur by punctuated equilibrium.
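
As a rough illustration of what 'measuring information' in binding sites can mean, the sketch below computes two quantities in bits: the information in a set of aligned binding-site sequences (the maximum uncertainty per position, 2 bits for DNA, minus the Shannon uncertainty actually observed there, summed over positions) and the information needed to locate a given number of sites in a genome of a given size. The function names, the example sequences and the genome size are assumptions made for this example only, and no small-sample correction is applied, so this should be read as a simplified sketch rather than as the exact procedure used in the paper.

```python
import math
from collections import Counter


def site_information(sites, alphabet="ACGT"):
    """Total information (bits) in aligned, equal-length binding sites.

    Per position: log2(alphabet size) minus the Shannon uncertainty of
    the observed symbol frequencies. Simplification: no correction for
    small sample size is applied.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    h_max = math.log2(len(alphabet))  # 2 bits for the four DNA bases
    total = 0.0
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        n = sum(counts.values())
        # Shannon uncertainty of this column, in bits
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += h_max - h
    return total


def location_information(genome_size, num_sites):
    """Bits needed to single out num_sites positions among genome_size."""
    return math.log2(genome_size / num_sites)
```

With four identical hypothetical sites such as "ACGT", site_information gives 8 bits (2 bits at each of 4 fully conserved positions), and location_information(1024, 4) also gives 8 bits. Agreement of this kind between the two numbers is the comparison the paragraph above describes for evolved binding sites.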

* Definitions available in the Glossary for Bio-molecular Information Theory
   by Tom Schneider and Karen Lewis


Molecules build the patterns necessary for life

One of the core principles of our particular understanding of life is that it is constructed from a nested hierarchy of informational structures (patterns), each level creating the next above by self-assembly (see here). The lowest level of relevance to biology is that of molecules. Understanding how a collection of different chemical 'species' can eventually lead to a living organism requires an appreciation of how physical forces, combined with quantum rules, create the molecules, how these assemble into supra-molecular structures, and how these in turn form functional complexes. The mechanisms responsible for self-assembly are considered on this page.

We strongly recommend this book and its wonderful illustrations as an aid to appreciating the molecular machinery of the living cell. In fact, the IFB project hopes to engage its author, David Goodsell, in future developments.


You can now read our first 'tutorial' paper here: How much information does DNA instantiate?




This Theme aims to:

The Theme is led by Tom Schneider