DNA – The Only Way?

I’ve been struggling to write a post about the new TV show ‘Pluribus’, but it’s huge and therefore hard to cover in a single piece. So for now, and possibly instead of that, I’ll just talk about a scientific point it raises, which isn’t really about the series. There’ll be spoilers for about the first ten minutes of the first episode and then I’ll be moving off the subject. Here goes.

At the start of the first episode of ‘Pluribus’, astronomers detect a signal from the Kepler-22 system, around 600 light years away in the constellation of Cygnus. It repeats every seventy-eight seconds and consists of a series of four types of signal which they quickly realise represent the four bases of RNA: cytosine, guanine, adenine and uracil. This makes sense, in a way, as RNA is used to carry messages from DNA for translation into proteins, and it’s doing the same job here. This made me wonder a couple of things. How did they know it was uracil and not thymine, and RNA rather than DNA? Also, does this mean that RNA and DNA are universal codes for genetic information, everywhere there’s life, or is the code individually customised for different recipients, in which case how did the senders know terrestrial life used that code? Seems like insider information is involved somewhere.

So, crystallising that thought, this is the situation. All known life here on Earth uses one of two complex types of molecule, deoxyribonucleic acid and ribonucleic acid, DNA and RNA. At this point I’m stuck because I have no idea how much is common knowledge. If I get this wrong, I’m going to lose a lot of people. So I’m going to assume that everyone knows the remarkable double helix structure of DNA, like a twisted ladder with rungs; how its coils are themselves in coils so that it’s packed together very closely most of the time; that most of it doesn’t carry genetic information but has other functions related to it; and that it has sides made of alternating sugar molecules and phosphate groups, with rungs made of four types of base which link up in specific pairs, adenine with thymine and cytosine with guanine. RNA is generated from DNA and has a different, simpler structure, again with sugar molecules alternating with phosphate groups and again four bases, except that instead of thymine it has a base called uracil. RNA is used to carry information to ribosomes, which are like playback heads except that instead of sound they produce proteins, one amino acid at a time. Although animals, plants and other cellular organisms use DNA to store their genes, many viruses use RNA instead. RNA is less stable than DNA. Whereas DNA can be extracted from animal or plant remains many millennia old, in a form increasingly corrupted with age, RNA doesn’t last anything like as long.
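For anyone who finds code easier to follow than prose, here is a minimal Python sketch of the pairing rules and the thymine-to-uracil swap described above. The example sequence and the function names are made up purely for illustration, and the sketch ignores strand direction and everything else real cells have to deal with.

```python
# Pairing rules as described above: adenine with thymine, cytosine with guanine.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner strand, base by base (strand direction is ignored)."""
    return "".join(DNA_PAIRS[base] for base in strand)


def to_rna(coding_strand: str) -> str:
    """Write a DNA sequence in RNA letters by swapping thymine (T) for uracil (U)."""
    return coding_strand.replace("T", "U")


example = "ATGCCGTAA"  # hypothetical sequence, not taken from any real gene

print(complement(example))  # TACGGCATT
print(to_rna(example))      # AUGCCGUAA
```

The only difference between the DNA and RNA spellings here is the single letter swap, which is why a signal made of four symbols can't by itself tell you whether the fourth one is meant to be uracil or thymine.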

This is important. Please tell me if I’m assuming too much and if I’m not writing clearly. I really struggle with brevity, clarity and trying to work out what people do and don’t know about things, and one way of addressing this might be to get some feedback. In a sense, this entire blog post is a test of my ability to communicate clearly and well at least as much as it is about DNA.

So, I have questions, some of which I know some of the answers to but most I don’t. DNA can be considered to have the following components: deoxyribose, phosphate, adenine, thymine, guanine and cytosine. RNA has ribose instead of deoxyribose and uracil instead of thymine. The question is, are any or all of these essential for any molecule carrying genetic information within an organic life form, or are there other possibilities? How rigidly constrained is this aspect of biochemistry? This could be framed as a question about alien life, but in fact it’s as relevant to biochemistry as we actually know it on this planet as it is to that possibility.

First of all, the bases. There are two types of these: purines and pyrimidines. Purines have two rings in their molecule and pyrimidines only one. I remember this by thinking that the long name describes the short molecules and vice versa. Purines include some other familiar compounds, including caffeine and the related stimulants often found with it. A particularly prominent purine is guanine, which forms the reflective layer at the back of the retinae of many vertebrates such as dogs and owls, increasing their visual sensitivity in low-light conditions, and also the white cross on the back of garden spiders. Purines tend to be broken down into uric acid, so a diet high in DNA can contribute to gout and kidney stones, and conditions involving a high turnover of DNA, such as leukaemia, can have these effects too. Pyrimidines strike me as more obscure. Vitamin B1, thiamine, contains a pyrimidine ring, and the resemblance of its name to thymine is a handy reminder of that, but as I understand it, although pyrimidines are widespread, most of them are not well known. However, pyrimidines similar to the ones found in nucleic acids are used as anti-cancer and anti-viral drugs.

Hence we have a system with four bases of particular kinds which can pair up with each other. Consecutive groups of three bases are known as codons, each either encoding a particular amino acid, the building blocks of proteins, or acting as “punctuation”, such as the full stops marking the end of a protein synthesis sequence. With four bases in each of three positions, that’s sixty-four possibilities. However, since other bases can exist, it’s hypothetically feasible that these data could be stored more densely and efficiently. In particular it seems odd that uracil occurs in RNA but not DNA, but the reason for this is that it’s less stable and therefore can’t reliably encode for a long period of time, so it’s not so much that it’s used in RNA as that it isn’t used in DNA, and maybe at some stage it was but it wasn’t selected for. This, then, is the first identifiable factor in the structure of DNA which determines its nature. I think there are probably at least four more usable bases. Eight bases rather than four would raise the data density by half, from two bits per base to three, and would allow 512 different codons. What it might not do, however, is enable evolution, as it might be that these bases are less amenable to mutation. For all I know, the first life forms in our lineage may have had competitors with different bases which couldn’t evolve as fast and therefore weren’t able to compete with other organisms and aren’t our ancestors, even though there was nothing wrong with the basis of their genomes.
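To put rough numbers on the codon arithmetic, here is a small Python sketch. It assumes codons stay three bases long whatever the size of the alphabet, which is my assumption rather than anything known about alternative genetic systems, and the function names are just for illustration.

```python
import math


def codon_count(bases: int, codon_length: int = 3) -> int:
    """Number of distinct codons: one choice of base per position."""
    return bases ** codon_length


def bits_per_base(bases: int) -> float:
    """Information carried by one base, in bits."""
    return math.log2(bases)


for n in (4, 8):
    print(n, "bases:", codon_count(n), "codons,", bits_per_base(n), "bits per base")
# 4 bases: 64 codons, 2.0 bits per base
# 8 bases: 512 codons, 3.0 bits per base
```

So going from four bases to eight multiplies the number of possible codons by eight, while the information carried by each individual base goes from two bits to three.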

The next issue is sugar. Two sugars are involved and give their initials to the first letters of DNA and RNA. They’re pentoses, with five carbon atoms, rather than hexoses like dextrose and fructose or disaccharides like sucrose. Again, the explanation for the difference is durability and stability. Ribose has a hydroxyl group on its second carbon which deoxyribose lacks, and its absence makes deoxyribose more stable and less likely to be altered by water. The presence of this hydroxyl group on the ribose molecule makes RNA easier to break down, ensuring that protein synthesis stops when it needs to. However, three- and four-carbon sugars could form the basis of the backbone instead of ribose or deoxyribose. Any more than five carbons stops the double helix working: the extra bulk gets in the way of the shape, making packing into the coils and supercoils unfeasible, and also makes the molecule more reactive and encourages branching. The double helix arrangement isn’t just pretty. It makes it possible to pack DNA into a small space, such as in chromosomes. It is possible for hexose nucleic acids to form but they don’t become double helices. Fructose, although a hexose, forms a five-membered ring much like ribose, but the positions on its molecule at which the nitrogens of the bases could bond are in the wrong place and the arrangement would be too crowded. Inulin, the daisy family’s alternative to starch, which tastes like Jerusalem artichokes because those are in that family too, and sucrose itself both contain fructose, but it’s not used in nucleic acids for this reason. It’s also thought that the chemical processes which led to life favoured pentoses over other types of sugar, so life built on what was available.

That leaves the phosphate groups. These keep the molecule regular in shape and enable the DNA to bind to histones, the proteins around which it winds and which make up much of the chromosomes. Obviously this doesn’t apply to RNA because it isn’t wound round anything. Actually, it doesn’t apply to prokaryotic organisms such as bacteria either because they don’t have histones, but they do have nucleoid-associated proteins which do similar jobs. Bacterial DNA is mostly in a circular chromosome, often with smaller extra loops called plasmids. Plastids (not plasmids) have less DNA than free-living prokaryotes because many of their genes have been transferred to the nucleus.

Surprisingly, phosphate groups are not essential to the structure of nucleic acids and are in fact weaker than other options. For instance, glycine, the simplest amino acid and the only non-chiral one, can bond the sugar molecules together. Amide bonds are an option. There are also some different arrangements with phosphorus itself. Molecules built with these stronger bonds, though, can’t cross membranes as easily. Now I’ve previously mentioned how phosphorus may be the dog in the manger which explains the Fermi Paradox. That clearly isn’t to do with DNA or RNA, as it’s entirely feasible for an adequate alternative to DNA to exist without phosphorus. It’s to do with glycolysis and the Krebs Cycle, where so far as I can tell it really cannot be replaced. Even so, this does open up the possibility of life existing in the Universe in places with rather less phosphorus than this solar system. Incidentally, a decade or so ago organisms were found in a lake which were thought to be able to substitute arsenic for phosphorus in their DNA, but it turned out they were just really good at finding phosphorus.

It does seem, then, that fairly dramatically different but still perfectly functional analogues to DNA and RNA could exist, and even that they might be more likely than those two to form in an environment with less phosphorus. Getting back to ‘Pluribus’, it’s exceedingly unlikely that it’s the kind of series for this to matter. It’s known that the genome received includes a gene for the receptor which detects the odour of Convallaria majalis, lily of the valley, and this is probably a throwaway reference to that storyline in ‘Breaking Bad’. This receptor is also found in sperm cells and attracts them towards the ovum, although it’s thought nowadays that the ovum chooses the sperm rather than the other way around. Either way, it leads to two organisms joining. I very much doubt whether any of this matters to the show. However, it is possible to push this further for the sheer scienciness of it all. Yeah, science!

OK, so here are two alternative scenarios regarding the origin of life on Earth. One is that life as we know it originated somewhere in the Universe before the birth of the solar system and spread through the Galaxy, including this solar system. The other is that life arose many times, in this solar system and elsewhere. In the first scenario, for which there’s actually quite a bit of evidence, it’s feasible for many worlds to have life with identical biochemistry, since all of it would have the same ancestry. In such a situation, the transmission of the RNA sequence from that system makes sense and isn’t customised for life here, at least as far as the genetic code is concerned. However, the fact that it includes the code for this receptor would seem to mean one of three things: a remarkable degree of convergent evolution, the presence of the gene in the last universal common ancestor we share with the life in that system, or detailed knowledge about life here. In the second scenario, there are various different ways of storing and transferring genetic information, in which case it’s a mild coincidence that the signal happens to use RNA bases. Given what I’ve suggested here, there seems to be no particular reason why the chemical basis of the genome should be the same. There are more complex possibilities, such as there being various different independent empires of life throughout the Galaxy, and this one happens to be the same as ours.

All of this is most unlikely to have much to do with the plot of the series. I don’t know how ‘The Walking Dead’ ended, but there was initially speculation about the origin of, and a possible cure for, the Wildfire virus, and later on it seemed to become clear that these questions were irrelevant to the story. If this later changed, to my mind it would detract from the quality of the series. Whether the same is true of the ‘Pluribus’ virus remains to be seen, but it doesn’t feel like treating it as a central mystery would add to the quality of the series, which is currently very high indeed, of course, because it’s Vince Gilligan. What’s occupying everyone’s minds right now, just after episode 5, ‘Got Milk’, is of course whether “Soylent Green is people”.