Overcoming the sequence paradox and the improbable origin of life

Dr Keith Farnsworth.

This is a short article introducing the sequence paradox and its importance in theories about the origin of life. It makes only superficial use of the relevant biochemistry and I recommend that those interested in more depth should read the very comprehensive 2014 review by Kepa Ruiz-Mirazo, Carlos Briones, and Andres de la Escosura: “Prebiotic Systems Chemistry: New Perspectives for the Origins of Life”. The principal concern here is with the information problem - if life is information processing, then how did sufficient information come to be gathered together to kick start the process? It is the ultimate ‘chicken and egg’ story.

It might be useful to read our 'tutorial' paper (PDF) in conjunction with this article: How much information does DNA instantiate?

What we are up against

The insulin molecule is a relatively simple protein of about fifty amino acid units, drawn from the twenty types of amino acid that make up proteins. If these twenty types were randomly arranged in a string of fifty amino acids, the chance of this random arrangement being insulin is one in 20^50 (about 10^65), because the chance of the right amino acid being at any one location is 1/20; for two to be right, this must be true and another must also be in the right place, so the chance is 1/20 x 1/20 = (1/20)^2, and so on. Even for this relatively small protein, the chance of it occurring at random is so small that far more molecules than there are atoms in the Earth (roughly 10^50) would have to take part in the ‘trial and error’ for there to be a reasonable chance of even just one molecule appearing. This improbability is termed the “sequence paradox” (De Duve, 2002) and is indeed a major challenge to the development of life from pre-biotic chemistry. In practice, it is far worse. One single molecule is nothing like enough: real chemistry is not about single molecules, but rather emerges from populations of 10^6 or more. Anything less is lost in the diffusive ‘thermal noise’ of all the non-reacting molecules. So why do such molecules as insulin exist?
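The arithmetic behind this estimate is easy to check. The short Python sketch below works in powers of ten (the exponent is simply the sequence length times log10 of the number of monomer types); the function name is my own and the 10^6 population size is the rough figure quoted above, used here only for illustration.

```python
import math

def log10_sequence_space(alphabet_size: int, length: int) -> float:
    """Return log10 of the number of distinct sequences of a given length."""
    return length * math.log10(alphabet_size)

# Fifty positions, twenty possible amino acids at each position.
exponent = log10_sequence_space(20, 50)
print(f"20^50 is about 10^{exponent:.1f}")

# Illustrative only: needing a population of about 10^6 identical molecules
# adds six more orders of magnitude to the number of trials required.
population_log10 = 6
print(f"trials for a viable population: about 10^{exponent + population_log10:.1f}")
```

The same function applies to any informational polymer, for example four nucleotide types instead of twenty amino acids.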

The superficial answer is that life has made them because it finds them useful. That is not much of an answer because life itself is made out of them, so they had to be around to make life before it was in a position to find them useful. Writing in “The Emergence of Life”, Pier Luigi Luisi (2006) puts it the other way round, asking “why these proteins and not others”. He examines the extraordinarily discriminating filter that selected just one in 10^100 (or more) of all the possible proteins to be those that would take part in life. Whilst some thermodynamic and chemical contingency may take several orders of magnitude off this by pre-filtering (according to which proteins are practically possible), that comes nowhere near explaining the tiny fraction that actually appear in living systems today. Since filtering reduces the range of possibility, and that is the same as accruing information (in the Shannon sense), we can think of the paradox as one of finding a large amount of information (embodied in life’s ‘choice’ of proteins) and wonder where all that information came from.
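To put a number on 'information in the Shannon sense': selecting one item from N equally likely possibilities corresponds to log2(N) bits. The sketch below converts the orders of magnitude quoted above into bits. The function name is my own, and the figure of ten orders of magnitude for pre-filtering is an assumed stand-in for 'several', chosen only to show the scale.

```python
import math

def selection_information_bits(log10_possibilities: float) -> float:
    """Bits gained by narrowing 10**log10_possibilities equally likely options to one."""
    return log10_possibilities * math.log2(10)

total_bits = selection_information_bits(100)      # one protein in 10^100
prefilter_bits = selection_information_bits(10)   # assumed: pre-filtering removes 10 orders

print(f"one in 10^100 is about {total_bits:.0f} bits per protein")
print(f"a pre-filter of 10 orders of magnitude supplies about {prefilter_bits:.0f} of them")
```

On these assumptions the pre-filter accounts for roughly a tenth of the information, which is the sense in which it comes nowhere near explaining the selection.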

A small clue, just as an aside for now, is to be found in the computers of the 1970s. I remember one which occupied an equipment rack two metres tall, and along its front was a row of toggle (on/off) switches. These were set to make a binary sequence that was a very small amount of information, but it was enough to provide an electronic instruction that combined with a permanently stored small piece of information could start the most basic elements of the operating system in software, which in turn unwrapped more complex and information rich algorithms and slowly the whole software of the operating system emerged. This was called boot-strapping (a reference to the apparent paradox of lifting oneself up by the bootstraps). It is a self-assembly of information structures and still referred to today as ‘booting up’. Could the beginning of life have followed that pattern?

Insulin is a highly developed molecule and part of a complicated and very sophisticated system found in modern mammals. Early life did not need such sophistication to work. We need to think, not about such molecules as insulin, but rather about proteins that do a very basic job just well enough to be viable, perhaps just roughly and not particularly efficiently: they would be the equivalent of the first working engine parts. To solve the sequence paradox, our first job is to define the minimum requirements for a ‘working’ life-form: what must it do to live? The purpose of insulin is to take part in a system well beyond this minimum level of sophistication. Using an analogy, it is a component of the car’s engine management system (the electro-mechanical complex that includes all those sensors that keep you returning to the garage), absent from early and basic engines. Given a minimum design of a working life-form, we can identify a system of molecular components that fulfil the requirements, at least to a minimum level of efficiency. These are the proteins that must be selected before life can make use of them. If (and that is a big ‘if’) it turns out that such proteins and the organisation (in space and time) needed to coordinate them into ‘living’ require an information carrying polymer (a nucleic acid) to produce them, then we must include this too. Finally, it seems significant that all life as we know it is contained in a porous bag - a membrane tegument which compartmentalises it. That makes three likely prerequisites for life: the minimal set of proteins, the nucleic acid ‘memory’ and the enveloping tegument. These three have each given rise to their own school of thought in theories of the origin of life, each with an illustrious following: the ‘protein first’, the ‘RNA first’ and the ‘compartmentalists’ or ‘lipid first’. More generally, the protein first hypothesis proposes that metabolism emerged before explicit information storing in genetics.
In all existing life, metabolism and genetics are mutually dependent, so it is hard to reconcile metabolism first and genetic first. Since they both seem to need each other, we are confronted with the ultimate ‘chicken and egg’ problem. It is one for which bootstrapping is surely the only solution. In contrast, the formation of teguments suitable for first life is quite easy to demonstrate using strictly abiotic processes and a good argument can be made for this being a prerequisite for both metabolism and genetics.

Minimal life

Mycoplasma genitalium, with a genome of 525 genes, is currently considered to be the simplest living cell, but as an obligate parasite (causing nasty urinary and genital tract infections) it is by definition not a representative of independent life. To be alive, most agree, there must be self-maintenance (which in turn requires metabolism), self-reproduction and evolvability (which may be little restriction in practice) (see synthesis page). Self-reproduction is needed for a practical reason as well as because it is a feature of all known life. The hypothetical first living thing, we shall see, must have been relatively complex and therefore unlikely, hence rare, so that it would soon be lost by natural degradation (this being more likely the more complex the structure). Self-replication (which chemists call auto-catalysis) produces a chain reaction in which the rate of making copies of the first living thing quickly and greatly exceeds its decay rate. Chain reactions produce exponential growth; without that, complex living things would have disappeared long ago.

Since the life we are thinking of is embodied in molecular interactions, the molecules have to be kept sufficiently close to interact, not once, but reliably and repeatedly if necessary, so they need to be contained in a bag - hence the need for an encircling membrane. Accordingly, first life needs a way to make membranes and ideally, it would come from the contents of the fluid filled bag. In 1991, Schmidli et al. created the chemistry that could make a phospholipid bilayer out of pre-biotic molecules having the necessary enzymatic activity. These are still quite complex molecules and it is not clear that they would be available in sufficient quantity in any pre-life environment, but it was a start. There is of course more: the contents of the bag must be maintained at a working concentration, so given that the whole bag and its contents must self-replicate (e.g. by binary fission, just as modern bacteria do), the contents must be self-replicating too. Otherwise the primitive daughter ‘cells’ would either be half the size of their mother, quickly becoming too small to contain the necessary molecules, or have half the concentration (as molecules are shared) until ‘death by dilution’ results. Several schemes for self-replication of particular components of hypothesised proto-cell contents have been suggested, but without the addition of an information polymer (such as RNA) and a ribosome to translate its sequence into protein, none of these come close to replication of a complete system (see review by Luisi 2006), though what Luisi calls “limping life” has been synthesised by inserting these components in a laboratory environment. The essential problem here is that the outcomes of pre-biotic (and so thermodynamically controlled) chemical reactions are solely the result of reaction rates and energy gradients. Those abiotic processes just don’t seem up to the job.

RNA first?

All the proteins of known existing life forms are made by RNA in the ribosome and their sequence of amino acids is determined by RNA. This inspires many to believe that life began with RNA finding a way to self-replicate (for this the RNA molecule must act as an enzyme as well: it would be a ribozyme). This is the RNA world hypothesis and strong support for it comes with the discovery that the crucial part of the ribosome that actually forms the peptide bonds (putting a protein together) is made from pure RNA (Nissen et al 2000) (a modern ribosome also includes bits of protein in its structure). In all existing life, RNA is needed to make the ribosomes and the proteic enzymes that construct DNA and is therefore essential in self-reproduction (Gilbert 1986). According to the RNA world hypothesis, all known life descended from an ancestral population of organisms that used RNA as their sole biopolymer, capable of both information storage and the enzymatic activities needed for replication. This was obviously prior to the emergence of DNA and protein, which were to become products of novel RNA functionality. Paul and Joyce (2002) experimentally created a self-replicating system based on a 61-nucleotide RNA molecule that catalyses its own replication and subsequently showed it could evolve into a system that grows exponentially by self-replication (Lincoln and Joyce 2009).  However, this immediately raised the question of where the first RNA came from and to answer this, the more radical RNA first hypothesis was offered. In it, RNA as a pure oligonucleotide capable of self-replication, arose spontaneously as one of a random population of polymers created by non-enzymatic reactions from a mixture of mononucleotides.

There are, unfortunately, some major problems with the RNA-first hypothesis. So much so that some experts in the search for the origins of life dismiss RNA-first as resting on a fantasy (or a miracle, according to Luisi, 2006). The critical point is that the spontaneous emergence of RNA molecules from “chemical soup” with functional sequences sufficient to sustain self-replication is extremely unlikely, to say the least (Shapiro 2000). This problem is two-fold: in part we have the sequence paradox and on top, we have the chemical realities of RNA construction and maintenance under purely thermodynamic control (including that free polymers of RNA tend to break down, e.g. via hydrolysis, so they are unstable (Levy and Miller 1998)). Recent advances, especially using clay minerals as catalysts (see below), help this situation, but for most commentators, not enough yet. On the other hand, in a faithful lab-simulation of interstellar conditions, Meierhenrich (2004) reported that not only alpha-amino acids, but also diamino acids formed spontaneously, and these are the monomeric building blocks of the peptide nucleic acid (PNA) backbone. This is simpler than the sugar-phosphate backbone of RNA or DNA. PNA is well known to be capable of carrying information in the same way as RNA (Egholm et al. 1993) and indeed has many practical applications in e.g. genetic engineering today. No surprise then, that these more accessible information polymers are speculated to have preceded RNA in the early development of life. It turns out that PNA is not the only viable alternative for this role: indeed a whole family of ‘pre-RNA’ information polymers has now been found, the mononucleotides of which, in some cases, were likely available in prebiotic soup (Eschenmoser, 2007) and have been found in meteorites (more information available in Ruiz-Mirazo et al., 2014).

Let us for the moment put aside the chemical problems, which remain a matter of speculation and consider the information constraints on the probability of a self-replicating macro-molecule appearing spontaneously. The information perspective has the obvious advantage of freeing us from the as yet unfathomed details of chemistry that must have embodied the first life. Showing that this perspective is not so abstract as to be irrelevant, Adami (2014) provides a clear explanation for the spontaneous emergence of self-replication among random information-polymers (be they RNA, PNA, TNA or peptide-based, or any other).

Adami (2014) made use of the fact that, at least in the case of amino acids forming proteins, the monomers of biologically relevant copolymers (i.e. made from more than one kind of monomer) do not occur with equal frequency (Dorn, Nealson and Adami 2011). Under pre-biotic conditions, peptides (strings of amino acids) form by adding single amino acids (monomers) to one or both ends according to the rules of thermodynamics and reaction kinetics, given a not-necessarily equal concentration of amino acid types in the pre-biotic soup. If each monomer type had an equal probability of attaching, then the resulting polymer would have monomer frequencies representative of the soup’s relative concentrations. In reality, some monomers are more likely to stick onto some others, giving a further bias. For example, we could start with equal amounts of monomers A and B, and find the reaction rates for attaching A on A (kAA), B on A (kAB), A on B (kBA) and B on B (kBB), then define the ratios rA = kAA / kAB and rB = kBB / kBA. If, for example, both rA and rB are much bigger than 1, then we get sequences of AAAAA… which might occasionally get a B, which would then be followed by BBBBBB… (this example taken from Luisi 2006, p.60). With rA and rB closer to 1, but slightly different, and initial concentrations of A and B not quite equal, you can get subtle biases in what otherwise look like random sequences ABBABAABBBABBA etc. In his 2014 paper, Adami hypothetically generalised this to apply to any ‘informational’ copolymer (RNA, DNA, PNA, etc.). He calculated the entropy of the average position along the polymer string to see how likely it was to find a self-replicating molecule from random combinations of monomers, given certain assumptions about their probability of connecting onto the ends of growing strings.
With a speculative assumption that the probability of any monomer appearing at any position in the molecule was close to that in the (biased) distribution of an abiotically formed one, he demonstrated a dramatic lessening of the sequence paradox. Effectively, Adami was taking account of a more realistic entropy in abiotic monomer sequences, substituting this for the standard uniformly random assumption. Adami’s 2014 prediction remains rather speculative and awaits testing with real chemistry.
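The two-monomer example can be turned into a toy simulation. The sketch below is not Adami's calculation: it is a minimal illustration, assuming equal concentrations of A and B, so that the chance of adding the same monomer again is rA / (1 + rA) after an A and rB / (1 + rB) after a B. It also computes the entropy per position of the resulting chain (1 bit means no bias). All function names are my own.

```python
import math
import random

def grow_copolymer(length, r_a, r_b, rng):
    """Grow a chain of A and B one monomer at a time.

    r_a = kAA / kAB and r_b = kBB / kBA; equal concentrations are assumed.
    """
    p_same = {"A": r_a / (1 + r_a), "B": r_b / (1 + r_b)}
    chain = [rng.choice("AB")]
    while len(chain) < length:
        last = chain[-1]
        if rng.random() < p_same[last]:
            chain.append(last)                           # same monomer again
        else:
            chain.append("B" if last == "A" else "A")    # switch monomer
    return "".join(chain)

def binary_entropy(p):
    """Shannon entropy (bits) of a two-way choice with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def entropy_rate_bits(r_a, r_b):
    """Average entropy per added monomer for the growth rule above."""
    p_aa = r_a / (1 + r_a)
    p_bb = r_b / (1 + r_b)
    # Long-run fraction of positions occupied by A (two-state Markov chain).
    frac_a = (1 - p_bb) / ((1 - p_aa) + (1 - p_bb))
    return frac_a * binary_entropy(p_aa) + (1 - frac_a) * binary_entropy(p_bb)

rng = random.Random(1)
print(grow_copolymer(40, 20.0, 20.0, rng))   # long runs of A and of B
print(grow_copolymer(40, 1.2, 0.9, rng))     # looks almost random
print(f"entropy per position: {entropy_rate_bits(20.0, 20.0):.2f} bits (strong bias)")
print(f"entropy per position: {entropy_rate_bits(1.2, 0.9):.2f} bits (weak bias)")
```

The lower the entropy per position, the fewer distinct sequences are typically produced, so a target that happens to share the bias is sampled far more often than the uniform assumption suggests.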

The practical problem is that, as Briones et al. (2009) said, “no RNA polymerase [RNA making catalytic] activity has been found among the short RNA sequences that can be generated by nonenzymatic random polymerization”. Without that, there is no prospect of self-replicating RNA polymers. They go on: “Indeed, a minimum size of about 165 nucleotides has been experimentally established for such a template-dependent RNA polymerase molecule (Johnston et al. 2001; Joyce 2004), a length three to four times that of the longest RNA oligomers obtained by random polymerization of activated ribonucleotides on clay mineral surfaces (Huang and Ferris 2003, 2006).” We might say ‘the information spirit is willing, but the chemical body is weak’.
Let us remind ourselves of just how unlikely a 165-nucleotide oligomer is to spontaneously arise in numbers that count as viable for chemistry. In the simplest (all things equal) case, we can expect one functional molecule in 4^164, which is about 10^99, and of course we need to multiply this by a further 10^10 or so to achieve viable chemistry. Following Adami’s more refined (2014) calculation, if the assumptions hold at least approximately, then up to a few tens of orders of magnitude can be taken off, but we are still left with a probability that amounts to saying - forget it.
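A rough sketch of these magnitudes follows. With four equally likely nucleotides each position carries 2 bits, and the number of sequences is 2 raised to (bits per position x length). The reduced value of 1.6 bits per position is a made-up figure used only to show how a biased monomer distribution shrinks the exponent; it is not taken from Adami (2014), and the function name is my own.

```python
import math

def log10_sequences(length: int, bits_per_site: float) -> float:
    """log10 of the effective number of sequences, 2 ** (bits_per_site * length)."""
    return length * bits_per_site * math.log10(2)

LENGTH = 164            # exponent used in the text above
POPULATION_LOG10 = 10   # the extra factor of about 10^10 for viable chemistry

uniform = log10_sequences(LENGTH, 2.0)   # four equally likely nucleotides
biased = log10_sequences(LENGTH, 1.6)    # hypothetical biased distribution

print(f"uniform: one in 10^{uniform:.1f}, "
      f"or 10^{uniform + POPULATION_LOG10:.1f} with the population factor")
print(f"biased:  one in 10^{biased:.1f}, "
      f"saving {uniform - biased:.1f} orders of magnitude")
```

Even with the assumed saving, the exponent stays far beyond anything that trial and error could plausibly cover.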

Mineral surfaces
Graham Cairns-Smith realised that the inevitable imperfections of a developing crystal counted as information (compare to the ‘aperiodic crystal’ suggested by Schrödinger in his prophecy of nucleic acids) and could be the first source of inheritable information for proto-life, which, based on this, developed to the point of a “genetic takeover” (Cairns-Smith, 1982). Mineral surfaces, especially clays, have proved to be effective catalysts of several important pre-biotic reactions, most notably in supporting polymerisation (indeed oligomerization) to form nucleic acid and protein macro-molecules. Enthusiasm for a ‘clay world’ hypothesis in which life began on (or in amongst) clay particles has waxed and waned, but it seems likely that clays provided some support. Of particular interest here is the discovery by James Ferris and co-workers (e.g. 2006) that montmorillonite clay helps to form RNA-like chains of up to twenty nucleotides (Huang and Ferris 2006), in the absence of a template, but in a way that seems biologically relevant. This is especially because differential binding affinities between the clay and monomers lead to biased sequences - a selection process that may lend a little support to the calculations of Christoph Adami. Not only that, but these montmorillonite clays are now known to catalyse the formation of vesicles (see below) from simple fatty acids, which would no doubt be abundant in a pre-biotic soup. Thus contact between the soup and clays might have provided two of the supposed fundamental ingredients for life to begin, and it is also worth commenting that such clays are thought to have been abundant in this early environment (Ferris, 2006).

It’s the shape that matters
Next, we should consider another easing of the sequence paradox, which is that the catalytic activity of an oligomer is not a direct result of the sequence of monomers, but rather the way that sequence creates its secondary structure: the physical shape of the molecule brought about by folding, which happens spontaneously under thermodynamic control. Briones et al. (2009) refer to this as a molecular phenotype and point out that many possible sequences may produce the same phenotype, so reducing the improbability of finding the ‘right one’ from random generation. They go on to calculate the secondary structure of large pools of random single-stranded RNA sequences with lengths 15, 25, 35, and 40 nucleotides - the shorter of them being the sort of thing we may expect to appear from the clay catalyst. As the length increases, unfolded strands become less common, whilst stem-loop and hairpin shapes become more common. Now things get really interesting because, quoting Briones et al. (2009), “Certain hairpin-like structures are endowed with ribozyme activity able to catalyze RNA cleavage/ligase reactions in a reversible way, as originally described in plant virus satellite RNAs (Buzayan et al. 1986; Hampel and Tritz 1989)”. In other words, these little hairpins are able to stitch bits of RNA together. There is a lot of detail to cover in moving from this observation to actually having functional units composed of little bits of folded single strand RNA (and these details are discussed by Briones et al. (2009)), but the basis for RNA bootstrapping has been established, at least in principle. As pieces are stitched together, the resulting shapes become more complex and the repertoire of enzymatic activities increases. The goal of researchers in this field is to find a sequence of abiotic thermodynamically driven events that lead to a ribozyme with sufficient likelihood to become plausible. The system described by Briones et al. (2009) gives us a technically feasible way, but there are many stages of rather low probability standing in the way of a journey from less than 20 nucleotide sequences formed in conjunction with montmorillonite clays to the first true population of ribozymes.

An information perspective
Despite (or perhaps because of) all these chemical obstacles, some physicists have tried to get underneath the origin of life problem, to seek an account that supervenes the chemical limitations. In “The algorithmic origins of life”, Sara Walker and Paul Davies (2013) wrote:

An implicit assumption of these traditional [biochemical] approaches has been that, while information may be manifested in particular chemical structures (digital or analog), it has no autonomy. As such, information – though widely acknowledged as a key hallmark of life – thus far, has played only a passive role in studies of life’s emergence.

Walker and Davies emphasise that information for life is distributed, so a singular focus on informational polymers misses some of the essential features of what separates the living from non-living. They also point out that information can only be functional given a specific context: that it is a relational property, rather than an inherent characteristic of a molecule or more complex system. Crucially for them: “biological information is distinctive because it possesses a type of causal efficacy - it is the information that determines the current state and hence the dynamics (and therefore also the future state(s)) of the biological system.” The point of biological information is that it is functional, meaning that it has causal power, especially in regulation and development. This point is underpinned by the quasi-philosophical theory of informational causation, developed in Auletta et al. (2008) and elaborated in Walker et al. (2012); Walker (2014) and Farnsworth et al. (2016). It is very difficult to discern, and so is only just dawning on us now, that perhaps the unique attribute of life is the phenomenon of downward causation (Campbell, 1974; Ellis, 2011). Walker and colleagues suggest that the origin of life coincides with (or is even defined by) a transition from purely upward causation (the assumption behind all reductive science in which all phenomena are rooted in and strictly determined by sub-atomic interactions) to systems in which downward causation also plays a part. They explain how this information perspective may unify the genetic and metabolic theories of origin: “where genetics may be thought of in terms of digital information processing and metabolism roughly as a form of analog information processing” (Walker 2014). From the information perspective, the genes first versus metabolism first camps of origin of life theorists can be interpreted as digital versus analogue information processing.
They are not the first to point out that early ribozymes are an inevitable compromise between the digital task of efficiently storing and replicating information and the analogue chemical task of being an enzyme. But they go on to identify the physical limit to information capacity that this compromise entails. These physical limits are circumvented in modern cells by specialisation: DNA specialises in the digital role, biochemical pathways perform analogue processing and RNA acts (in part) as a sort of digital-analogue converter.

This digital / analogue contrast is only a (relatively) superficial feature of the most important insight given us by Walker and her co-workers. They realised that the special role of digital information in a living cell was to carry an almost infinitely variable set of ‘program’ instructions that was independent of its substrate. The information polymers separate the physical from the informational, freeing abstract data from its physical constraints. This is because sequences of bases, encoding information, can be created with approximately equal ease irrespective of which base follows which. This critical freeing of information from its physical bounds allows for what Walker (2014) calls non-trivial replication. The difference between trivial and non-trivial replication is effectively the difference between the program followed by your washing machine (inflexible and bound by its hardware, however complicated) and the flexible capabilities of your desktop computer (it can do almost anything with information). More precisely, it is the difference between a computer that uses read-only memory (ROM) and one with random access memory, which can not only be accessed in any order, but deleted, rewritten and even written in response to current states, changing the program as it goes. It is this non-triviality that enabled life to arise from networks of chemistry. For Sara Walker and her colleagues, life began when this transition was accomplished.

Significantly, since we are considering the sequence paradox here, non-triviality is a feature of replicators that is almost independent of their possibility number (e.g. the sequence length of an information polymer).  I say almost, because practically a minimum complexity (entropy or sequence length) must be reached before non-trivial replication can be instantiated. If this limit is reasonably small, then the range of possible systems that may be constructed by the information instantiated in a plausibly long oligonucleotide may be sufficient to create life. For this reason, it is now important to quantify the minimum complexity capable of supporting non-trivial replication by transcending the physical constraints on information embodiment. Nobody has done that yet, though.

Up to this point, we thought of complexity as a mere consequence of sequence length: the bigger the oligonucleotide, the larger its possibility number and the more functional complexity it can produce. In that explanation, there is a locking together of the physical (sequence length) and the range of possible systems that can be created. That is an old idea, originating with von Neumann’s theory of self-reproducing automata (1966) and interpreted by Szathmáry and Maynard Smith (1997) into a classification of replicators according to whether they had effectively unlimited heredity or not. Any replicating system of sufficient possibility number (e.g. a sequence of more than about 50 elements) is effectively unlimited in possibility and that is what we thought we were looking for and why we ran into the sequence paradox. We were thinking of trivial replication and for that to approach the complexity needed for minimum life, it indeed must exceed the limit of plausibility. The intriguing thought now introduced by Walker and co-workers, is that the necessary complexity (and consequent variety of components) might be encoded by a polymer small enough to be plausible, if only it could be big enough to achieve non-triviality. That would be the ‘Goldilocks zone’ of information embodiment.

The question of how this might be achieved is advanced by realising that (by definition) a non-trivial replicator must be programmable with the instructions needed to construct itself. We can imagine this taken in stages: first a very simple program (embodied in a small biochemical system) that writes a slightly more complicated one in another biochemical structure. This next then writes a yet more complex one and so on until the apparatus of a minimal life form is created. Such a process would be bootstrapping, but how is it possible, given that it seems to create new information out of nothing at each stage? Well, we know by now that one way of getting new information (apparently out of nothing) is to allow component parts to find a particular configuration relative to one another, out of all possible configurations, that is functional. In this case functional has a very particular meaning: it is that a process becomes possible at a higher ontological level than that of the component parts. In this we see another feature that is probably quintessential to life: the emergence of higher ontological levels from the interaction of components. The program directing non-trivial replication is to be found embodied not only within molecules, but also in the interactions among them. The program, as an abstract (virtual) information structure, transcends the physical reality of the chemicals that, at the lower ontological level, give rise to it. The program is therefore an example of a transcendent complex (the conceptual information structure which I introduced to understand downward causation (Farnsworth et al. 2016)). An autocatalytic set of biochemical reactions, if it generated an abstract program of instructions for replication at a higher ontological level, would be a non-trivial replicator and could be made from a set of plausibly small components. The question is: can this abstract idea be translated into possible chemistry?
To date, one candidate unambiguously presents itself.

Lipid first
Returning to chemistry, we think it is possible that proto-life gradually grew from meeting its most basic structural requirements. Membranes are a feature of all known life and broadly considered essential for its existence.  Compounds that form membranes have been found in all the same places as other pre-biotic organics as well as in sea water. They were therefore very likely available for forming pre-biotic vesicles, lending support to the compartmentalist theories of life’s origin. Vesicles are membranes folded in three dimensions to form an enveloping sphere (the surface of a sphere) and, crucially, they form spontaneously following a thermodynamic gradient to increased entropy (which is a paradox for some who mistakenly think that more entropy always means less organisation). Molecules rich in oxygen and nitrogen atoms are usually most thermodynamically comfortable (low energy) when surrounded by water molecules (hydrophilic), whilst those rich in carbon tend to be repelled by water (hydrophobic) and form clumps to avoid it (e.g. oil in water). When the two are connected end-to-end in a molecule, the result is an amphiphilic molecule: one end trying to surround itself with water, the other trying to get away from it. This results in groups of such molecules self-organising into simple structures in water - sheets (like soap bubbles), rods, tubes and spheres (which are the vesicles). The membrane of a vesicle consists of two parallel sheets of amphiphilic molecules with their hydrophobic tails as a ‘sandwich filling’ on the inside and their hydrophilic heads on the outside: most notably a lipid bi-layer. This is the basic design of all biological membranes. Not only that, but under abiotic thermodynamic regulation, these also incorporate ionic (hydrophilic) molecules on their surfaces and other, even very large and complicated molecules with hydrophobic regions into their hydrophobic filling layer. 
The vesicles can also trap all kinds of molecules in their enveloped interior and the whole thing can end up looking uncannily like a prokaryotic cell. Going even further, Doron Lancet of the Weizmann Institute in Israel and his then students, Barak Shenhav and Daniel Segre, developed a model of primitive protocells (they refer to this as meso-biotic, since it is intermediate between pre-biotic and true life). This they called the Graded Autocatalysis Replication Domain (GARD) (2001). They agree that transmission of information from one generation to the next is essential for life, but look for inheritance in the absence of biopolymers, arguing that improbably large molecules such as functioning RNA are not necessary. Instead, they suggest, the more likely route was via monomers that form a system by association, in turn storing information distributed among them.

The GARD system was probably inspired by Eigen and Schuster’s (1971; 1979) self-replicating dynamics system, conceived during the 1970s with information oligomers in mind. This system represents populations of self-replicating molecules (auto-catalytic sets) following the mathematical rules that are well established for the dynamics of organism populations, but with the addition of a term representing mutation (without selection). The theory is presented and elaborated by Kuppers (1983), using the term ‘quasi-species’ for a stable state in which assemblies of populations of self-replicating molecules persist (all other combinations being transitory). Using the model (borrowed from ecology) of interacting populations represented by an interaction matrix (in this case with autocatalysis rates on the diagonal and mutation rates off the diagonal), quasi-species could be found from the eigensystem of the matrix (its eigenvalues and eigenvectors). Jain and Krishna (1998) showed that the quasi-species was the mixture of oligomers having the maximum reproduction rate. Note the term reproduction, rather than self-replication, since mutation implies that copies are not exact: it is not a single oligomer that is reproducing, but rather a particular autocatalytic set of oligomers that ‘accidentally’ transform into one another at fixed rates, these being a characteristic of the quasi-species. But if we start with an oligomer more than a few monomers long, the range of possible mutants is very large and may be effectively unbounded. If the mutation rate is high, then the autocatalytic set would disperse into this range and lose itself by something like dilution. Too many mutants would be unable to contribute to autocatalysis (which is presumably rare) and the whole set would fade away. A slow mutation rate preserves the autocatalytic oligomer, so its vital information is not lost, but still allows the assembly to explore the sequence space and gradually to optimise its fitness. There is a fine balance.
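
To make the matrix picture concrete, here is a minimal sketch in Python. It is not Eigen and Schuster's own formulation, only a toy built on stated assumptions: five hypothetical molecular types, one 'master' type that copies itself ten times faster than the rest, and a single miscopying probability mu spread evenly over the other types. The function names, the rates and the uniform-mutation rule are all invented for illustration. The quasi-species is estimated as the dominant eigenvector of the matrix (the mixture with the maximum reproduction rate), found by simple repeated multiplication.

```python
def mutation_matrix(rates, mu):
    """Build the interaction matrix W, where W[i][j] is the rate at which
    type j produces type i. Diagonal entries are faithful autocatalysis,
    off-diagonal entries are miscopying (mutation).
    Assumption: errors are spread evenly over the other types."""
    n = len(rates)
    W = [[0.0] * n for _ in range(n)]
    for j, rate in enumerate(rates):
        for i in range(n):
            if i == j:
                W[i][j] = rate * (1.0 - mu)
            else:
                W[i][j] = rate * mu / (n - 1)
    return W


def quasi_species(W, steps=2000):
    """Estimate the dominant eigenvector of W by power iteration.
    Returns (mixture, growth): the stable relative abundances of each
    type (summing to 1) and the overall reproduction rate of the mixture."""
    n = len(W)
    x = [1.0 / n] * n          # start from an even mixture
    growth = 0.0
    for _ in range(steps):
        y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        growth = sum(y)        # x sums to 1, so this estimates the eigenvalue
        x = [v / growth for v in y]
    return x, growth


# Hypothetical rates: type 0 is the 'master' autocatalyst.
rates = [10.0, 1.0, 1.0, 1.0, 1.0]

for mu in (0.01, 0.4, 0.8):
    mixture, growth = quasi_species(mutation_matrix(rates, mu))
    shares = ", ".join(f"{v:.3f}" for v in mixture)
    print(f"mu={mu}: growth={growth:.3f}, mixture=[{shares}]")
```

With a slow mutation rate the master type dominates the mixture, accompanied by a small cloud of mutants. As mu rises, the master's share falls, and at mu = 0.8 (where, with five types, every copy is equally likely to be any type) the mixture becomes perfectly even and the master's information is lost. This is the 'fine balance' described above, in miniature.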

More or less stable mixtures composed of populations of different molecules are interpreted as individuals: molecular assemblies which share the carrying of Shannon information via their form, in aggregate. If such an assembly has the function of maintaining its composition through the process of ‘budding’ (division of proto-cells), then the functional assembly is said to be a ‘composome’. More formally, the composome is a compositional state (describing the molecules and their concentrations), capable of homeostatic growth and thereby information-preserving division (during self-replication of its host proto-cell). What this describes is formation of a transient complex at the level of interaction among molecules of the assembly. The particular transient complex instantiates a program for self-replication: it is abstract, exists at a higher ontological level than its component molecules and is preserved during its functioning as a self-replicator. In short, it is a non-trivial replicator (and without a string of RNA to be seen). As far as I know, the GARD team have not conversed with Walker and colleagues yet. If they do, perhaps the interaction will create something amazing.


Adami, C. (2004). Information theory in molecular biology. Phys. Life Rev., 1, 3–22.

Auletta, G., Ellis, G. and Jaeger, L. (2008). Top-down causation by information control: From a philosophical problem to a scientific research programme. J. R. Soc. Interface, 5, 1159–1172.

Acharya Chackrabarti & Chakrabarti.  (2004).  In: Seckbach et al. (eds.). Life in the Universe. Kluwer Academic. Netherlands. pp. 195-199.

Buzayan, J.M., Gerlach, W.L., and Bruening, G. (1986). Nonenzymatic cleavage and ligation of RNAs complementary to a plant virus satellite RNA. Nature 323: 349–353.

Campbell, D. (1974). Downward causation in hierarchically organised biological systems. In Studies in the philosophy of biology: Reduction and related problems; Ayala, F.J., Dobzhansky, T. (eds). Macmillan: London, UK. pp. 179–186.

De Duve, C. (2002). Life Evolving: Molecules, Mind and Meaning. Oxford University Press.

Dorn, E.D., Nealson, K.H. and Adami, C. (2011). Monomer abundance distribution patterns as a universal biosignature: examples from terrestrial and digital life. J Mol Evol., 72: 283–95.

Egholm et al. (1993). PNA hybridizes to complementary oligonucleotides obeying Watson-Crick hydrogen-bonding rules. Nature 365:566-568.

Eigen, M. (1971). Self-organization of matter and the evolution of biological macromolecules. Naturwissenschaften, 58, 465-523.

Eigen, M. and Schuster, P. (1979). The Hypercycle. Springer Verlag, Berlin.

Ellis, G.F.R. (2011). Top-down causation and emergence: Some comments on mechanisms. Interface Focus, 2, 126–140.

Eschenmoser, A. (2007). The search for the chemistry of life’s origin. Tetrahedron, 63:12821 – 12843.

Farnsworth, K.D., Ellis, G.R.F., Jaeger, L. (2016). Living through downward causation: from molecules to ecosystems. In: S.I. Walker, G.F.R. Ellis and P.C.W. Davies (eds). From Matter to Life: Information and Causality. In press, Cambridge University Press.

Ferris J.P. (2006). Montmorillonite-catalysed formation of RNA oligomers: The possible role of catalysis in the origins of life. Philos. Trans. R. Soc. Lond. B Biol. Sci. 361:1777–1786

Gilbert, W. (1986). Origin of life: The RNA world. Nature, 319:618.

Hampel, A. and Tritz, R. (1989). RNA catalytic properties of the minimum (--)sTRSV sequence. Biochemistry 28: 4929–4933.

Huang, W. and Ferris, J.P. (2006). One-step, regioselective synthesis of up to 50-mers of RNA oligomers by montmorillonite catalysis. J. Am. Chem. Soc. 128: 8914–8919.

Jain, S. and Krishna, S. (1998). Autocatalytic sets and the growth of complexity in an evolutionary model. Phys. Rev. Lett., 81, 5684-5687.

Kuppers, B.O. (1983). Molecular theory of evolution. Springer-Verlag, Berlin-Heidelberg.

Lincoln, T.A. and Joyce, G.F. (2009). Self-sustained replication of an RNA enzyme. Science 323: 1229–1232.

Luisi, P. L. (2006). The Emergence of Life. Cambridge University Press.

Levy, M. and Miller, S.L. (1998). The stability of the RNA bases: Implications for the origin of life. Proc. Natl. Acad. Sci. USA, 95(14):7933–7938.

Nissen, P. et al. (2000). The structural basis of ribosome activity in peptide bond synthesis. Science. 289, 920-930.

Ruiz-Mirazo, K., C. Briones, and A. de la Escosura. (2014). Prebiotic Systems Chemistry: New Perspectives for the Origins of Life. Chem. Rev. 114:285-366.

Shapiro, R. (2000). A replicator was not involved in the origin of life. Life, 49:173–176.

Trigo-Rodríguez, Llorca & Oró. (2004). Chemical abundances of cometary meteoroids from meteor spectroscopy. In: Seckbach et al. (eds.). Life in the Universe. Kluwer Academic. Netherlands. pp. 201-204.

Walker, S.I. (2014). Top-Down Causation and the Rise of Information in the Emergence of Life. Information. 5, 424-439; doi:10.3390/info5030424

Walker, S.I., L. Cisneros & P. Davies. (2012). Evolutionary transitions and top-down causation. Proceedings of Artificial Life XIII (pp. 283–290). MIT Press. Cambridge.

Walker, S. I., and P. C. W. Davies. (2013). The algorithmic origins of life. J. R. Soc. Interface 10(79).

Hosted by Queen's University School of Biological Sciences