
Wednesday, March 21, 2007

How RNA Polymerase Works: The Chemical Reaction

 
During the elongation step in transcription, the transcription complex consisting of RNA polymerase plus various elongation factors moves along the double-stranded DNA copying the template strand to produce a single-stranded RNA. In the example shown below the RNA product is mRNA and the enzyme would be RNA polymerase II in eukaryotes.


The transcription bubble spans about 20 nucleotides of DNA. This corresponds to the opening of two turns of the double helix. During transcription, a short, transient DNA:RNA double helix forms inside the bubble; it is long enough to make about one turn of the hybrid helix. As the complex moves down the gene from left to right, ribonucleotides are added one at a time to the growing 3′ end of the RNA. That 3′ end is positioned in the active site of RNA polymerase.

The known structures of the bacterial and eukaryotic RNA polymerases have allowed workers to decipher the details of transcription in a way that wasn't possible before these structures were available. I'm going to describe the steps by which this amazing molecular machine adds nucleotides to make RNA.

Let's begin by looking at the chemical reaction. An incoming ribonucleoside triphosphate (blue) aligns with the DNA template strand to form a base pair. G pairs with C and A pairs with T/U. In the figure we use generic bases B and B′ to represent the real bases. For each addition, a number of different nucleotides need to be tested to see if they match the base on the template strand. Since there are four different ribonucleotides this means that, on average, 25% of the pairing attempts will be successful. The unsuccessful nucleotides have to be allowed to escape from the active site. Since the active site is buried deep within the enzyme, there must be a channel that allows ribonucleotides to diffuse easily to and from the active site.
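The "25% success" figure can be checked with a small simulation. This is only a sketch under stated assumptions: all four NTPs are assumed to be present at equal concentrations and each trial is treated as an independent random draw, which ignores the real kinetic differences between correct and incorrect nucleotides. The names `attempts_until_match` and `mean_attempts` are hypothetical helpers.

```python
import random

RIBONUCLEOTIDES = "ACGU"
# Template (DNA) base that each incoming ribonucleotide pairs with
PAIRS_WITH = {"A": "T", "U": "A", "G": "C", "C": "G"}

def attempts_until_match(template_base: str, rng: random.Random) -> int:
    """Draw random NTPs until one pairs with the template base; return the number of draws."""
    attempts = 0
    while True:
        attempts += 1
        ntp = rng.choice(RIBONUCLEOTIDES)  # assumption: equal NTP concentrations
        if PAIRS_WITH[ntp] == template_base:
            return attempts

def mean_attempts(template: str, seed: int = 1) -> float:
    """Average number of NTPs sampled per successful addition along a template."""
    rng = random.Random(seed)
    total = sum(attempts_until_match(base, rng) for base in template)
    return total / len(template)

print(mean_attempts("TACG" * 25_000))
```

With a 1-in-4 chance per trial the number of trials follows a geometric distribution, so the average settles close to 4: roughly three wrong nucleotides have to leave the active site for every one that is incorporated.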

Once a proper base pair has formed, the chemical reaction takes place. In technical terms this is referred to as a nucleotidyl-group-transfer reaction. It involves a nucleophilic attack by the electron-rich oxygen of the 3′ hydroxyl group on the α-phosphorus of the incoming ribonucleoside triphosphate. The result is the formation of a new phosphodiester linkage and the release of pyrophosphate. The mechanism of the reaction requires a metal ion (Mg2+) at the active site. Subsequent cleavage of pyrophosphate helps drive the reaction in the direction of RNA synthesis.
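The two steps described above can be written as a pair of coupled reactions, where (RNA)<sub>n</sub> is the growing chain, NTP the incoming ribonucleoside triphosphate, PP<sub>i</sub> pyrophosphate and P<sub>i</sub> inorganic phosphate. This is a summary of the text, not a full mechanism:

```latex
\begin{align}
(\mathrm{RNA})_{n} + \mathrm{NTP} &\xrightarrow{\ \mathrm{Mg}^{2+}\ } (\mathrm{RNA})_{n+1} + \mathrm{PP_i} \\
\mathrm{PP_i} + \mathrm{H_2O} &\longrightarrow 2\,\mathrm{P_i}
\end{align}
```

Removing PP<sub>i</sub> in the second reaction keeps the first one from running backwards, which is what "drives the reaction in the direction of RNA synthesis" means.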

The rate of the reaction in eukaryotes is on the order of 50 nucleotide additions per second. This means that the precursors (nucleoside triphosphates) have to diffuse into and out of the active site very rapidly.
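A little arithmetic shows how fast "very rapidly" is. This sketch takes the 50 nucleotides per second from the text and the average of about four NTP trials per addition from the 25% figure above; the 10,000-nucleotide gene length is a hypothetical round number chosen for illustration.

```python
def seconds_per_addition(rate_nt_per_s: float = 50.0) -> float:
    """Time available for each nucleotide addition."""
    return 1.0 / rate_nt_per_s

def transcription_time_s(gene_length_nt: int, rate_nt_per_s: float = 50.0) -> float:
    """Time to copy a gene of the given length at a constant elongation rate."""
    return gene_length_nt / rate_nt_per_s

def ntp_visits_per_second(rate_nt_per_s: float = 50.0,
                          attempts_per_addition: float = 4.0) -> float:
    """How many NTPs must enter (and mostly leave) the active site each second."""
    return rate_nt_per_s * attempts_per_addition

print(seconds_per_addition())          # time per addition, in seconds
print(transcription_time_s(10_000))    # hypothetical 10 kb gene
print(ntp_visits_per_second())
```

At 50 nt/s each addition has about 20 milliseconds, and roughly 200 nucleotides per second have to diffuse into the buried active site and be tested.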

The stage is now set for the addition of the next ribonucleotide. Before this can happen the transcription complex has to shift by one base pair in the direction of transcription so that the 3′ hydroxyl group of the most recently added nucleotide is positioned in the active site next to the Mg2+ ion. Looking at the top figure you can see that several other things have to happen simultaneously. The DNA helix has to unwind by one base pair in front of the transcription complex and rewind by one base pair at the back of the bubble. The short DNA:RNA helix also has to unwind by one base pair. All of this happens without the complex falling off the DNA because transcription is a highly processive reaction. (A processive reaction is one that doesn't release the polymer as it's being synthesized.)
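The add-then-translocate cycle can be sketched as a toy model. This is a bookkeeping illustration only, not a structural model: the `ElongationComplex` class is hypothetical, the bubble is assumed to be centered on the active site (an arbitrary choice), its 20-bp width comes from the text, and positions are relative coordinates along the template.

```python
from dataclasses import dataclass, field

# Template (DNA) base -> ribonucleotide added opposite it
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

@dataclass
class ElongationComplex:
    template: str          # template strand, listed in the order it is read (3'->5')
    bubble_size: int = 20  # ~20 bp open region, as described in the text
    pos: int = 0           # template position currently in the active site
    rna: list = field(default_factory=list)  # growing RNA, 5'->3'

    def bubble(self) -> tuple:
        """(upstream edge, downstream edge) of the open region, centered on the active site."""
        half = self.bubble_size // 2
        return (self.pos - half, self.pos + half)

    def step(self) -> None:
        """Add one nucleotide to the 3' end, then translocate by one base pair.

        Moving pos by one shifts both bubble edges: one bp unwinds ahead of the
        complex and one bp rewinds behind it. The RNA is never released
        between steps, which is what processivity means here.
        """
        self.rna.append(PAIR[self.template[self.pos]])
        self.pos += 1

    def run(self) -> str:
        """Keep stepping until the end of the template."""
        while self.pos < len(self.template):
            self.step()
        return "".join(self.rna)

ec = ElongationComplex("TACGGATC")
print(ec.bubble())   # edges before the first addition
ec.step()
print(ec.bubble())   # both edges have moved one bp downstream
print(ec.run())
```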

In another post I'll show how these molecules fit into the RNA polymerase structure.

No sex for 40 million years? No problem

 
My wife sent me a link to this story [No sex for 40 million years? No problem].

I didn't know she had been following the debate on the evolution of sex. I guess all that science talk around the house is finally having an effect. John Logsdon is already on top of the story [Bdelloid Rotifers-Ancient Asexuals?]. I wonder who told him about it?

The Heterosexual Agenda: Exposing the Myths

 
Read this eye-opening exposé of the heterosexual lifestyle [The Hetereosexual Agenda: Exposing the Myths]. Don't believe the nonsense you've been hearing about the heterosexual lifestyle. It is not harmless. It threatens the family and has the potential to bring down our society. Now is the time to take action against heterosexuals. Should they be allowed to marry?

Those of us who have fallen into the sinful lifestyle of heterosexuality need to think seriously about changing. It's not genetic. You do have a choice!
Heterosexuals have rebelled against the norms that have held civilization together for all of human history. This rebellion has become the defining characteristic of the heterosexual community. Its members have no common language, religion, music, or other typical unifying norm. What heterosexuals have in common is the one thing that makes them different from everyone else — their sexual preference.

Heterosexuality is becoming increasingly more difficult to ignore. It is being forced upon us through legislation, taught to our children in school and promoted in the powerful arts/entertainment complex. If it is true that heterosexuality has the destructive effects on the individual and society that many believe, then it behooves us to know our enemy and forestall any further advance of heterosexuality by understanding what it is, what the heterosexual community is up to, and how to answer their arguments in the open marketplace of ideas.

What Heterosexuals Do

Heterosexuals would have you believe that the heterosexual lifestyle is perfectly normal. They will tell you that their lifestyle choice should be the benchmark for society. But a closer look shows that their lifestyle isn’t as safe or as desirable as heterosexual militants say it is.
[Hat Tip: Monado]

Nobel Laureate: Roger Kornberg

 

The Nobel Prize in Chemistry 2006.

"for his studies of the molecular basis of eukaryotic transcription"


Roger Kornberg won the Nobel Prize in 2006 for describing for the first time the structure of eukaryotic RNA polymerase II at the atomic level. The presentation speech summarizes this achievement.
This year's Laureate in Chemistry, Roger Kornberg, has studied what the transcription apparatus looks like in eukaryotes, organisms with cells that have a defined nucleus, which include all fungi, plants and mammals, human beings as well. In choosing the model system for his studies he swam against the stream and selected baker's yeast, which is one of the simplest eukaryotes. This was a crucial choice, as yeast cells offer a number of advantages in this endeavour compared to the cells from mammals that had previously been used. For instance, it is possible to cultivate yeast on a large scale and to benefit from the simplicity with which yeast cells can be modified genetically. The transcription apparatus in yeast cells is very similar to the corresponding system in mammal cells, which suggests that it came into being at a very early stage of development.

By combining biochemical methods and a depiction technique called X-ray crystallography, Roger Kornberg succeeded in producing particularly detailed molecular models of the transcription apparatus in yeast cells. These models are so detailed that individual atoms can be discerned. Through the study of a host of different models of the transcription apparatus both on its own and while fully engaged in copying DNA to RNA, Kornberg has been able to draw new, important conclusions about the mechanisms of transcription and how it is regulated. As a result of his study we now understand, for instance, how the transcription apparatus chooses where to start copying on the DNA strand, how it selects the correct RNA building blocks and how it moves along the DNA strand while the copy is being made.
This was such an important result that it easily made the cover of Science magazine when the structure was published in June 2001. There were two back-to-back papers from the Kornberg lab in that issue. (The papers were available online in late April.) As the presentation speech says, they chose to work with the enzyme from yeast because it is easy to manipulate genes in yeast cells. The first paper presented the structure of yeast RNA polymerase II in the form it would take during initiation. The second paper described the elongation form of the enzyme [see Transcription].

RNA polymerase II [see Eukaryotic RNA Polymerases] is a complex molecule with 12 different subunits. Two of them are dispensable and they were not present in the crystals that Kornberg solved. The core of the enzyme is formed from the large subunits Rpb1 and Rpb2. These are the homologues of the β′ and β subunits, respectively, of the bacterial enzyme, and homologous subunits are found in RNA polymerase I and RNA polymerase III. The other subunits (e.g., Rpb5, Rpb9) are much smaller. They make numerous close contacts with the large core subunits to form a very compact structure.

The technical achievement represented by these structures cannot be overstated. While there are other examples of large complexes whose structures have been solved by X-ray crystallography, this was a particularly difficult case and it took about ten years to get the result that was published in the June 2001 issue of Science.

Roger is the son of Arthur Kornberg who won the Nobel Prize in 1959 for the discovery of DNA polymerase. This is the seventh parent-offspring set of Nobel Prizes—a remarkable statistic, if you think about it. Roger's brother is Tom Kornberg who studies Drosophila development at the University of California, San Francisco. They have another brother Ken—the smart one!—who's an architect.

The Stanford University site has a photo of Roger with his father and a short video clip of Roger Kornberg at the Press Conference.

Tuesday, March 20, 2007

Eukaryotic RNA Polymerases

There are several different kinds of RNA that can be made when RNA polymerase copies a gene. The most common kind is messenger RNA (mRNA), which then goes on to be translated into protein by the translation machinery.

Another abundant RNA is ribosomal RNA (rRNA) of which there are three different versions in prokaryotes (23S, 16S, 5S) and four in eukaryotes (28S, 18S, 5.8S, 5S). Ribosomal RNA makes up the bulk of a ribosome and it is the catalyst of the reaction joining amino acid residues.

The third well-defined class of RNA is transfer RNA (tRNA). These are the molecules that carry amino acid residues into the active site of translation. They are responsible for the correct translation of mRNA sequence according to the genetic code.

The fourth class is a catch-all category called small RNAs. It includes a variety of RNA molecules that are involved in RNA processing, regulation, etc. Some of these RNAs are also catalytic RNAs.

All types of RNA are made by a single RNA polymerase in bacteria. The genes for each of the various types have distinct promoters but the bacterial RNA polymerase can bind to all of them with the help of specific transcriptional activators. This is not what happens in eukaryotes.

In eukaryotes there are five different RNA polymerases. RNA polymerase I has become specialized for transcription of the genes for the large ribosomal RNAs (class I genes). Eukaryotic cells need massive amounts of ribosomal RNA and they have many copies of ribosomal RNA genes arrayed head to tail. The electron micrograph below shows RNA polymerase I molecules in the act of transcribing adjacent ribosomal RNA genes (TU = transcription unit, NTS = non-transcribed spacer). Apparently, it was advantageous to select for a specialized RNA polymerase concentrating on producing ribosomal RNA.

RNA polymerase II is responsible for transcribing protein-encoding genes to produce mRNA (class II genes). It has evolved some special features that allow it to be coupled to the processing of mRNA precursors. Unlike bacterial mRNA, eukaryotic mRNA is modified at the 5′ and 3′ ends and the mRNA precursor can be spliced.

The cartoon on the left illustrates another important difference between prokaryotic and eukaryotic RNA polymerases (in this case RNAP II). The eukaryotic enzymes are all related to each other and to the bacterial enzymes. They share the same large subunits. But in addition to the homologous subunits the eukaryotic RNA polymerases have many more secondary subunits so they are quite a bit larger than their bacterial counterparts.

The eukaryotic enzymes also interact with a greater variety of transcription factors. In the example shown, the RNAP II core enzyme is associating with several transcription factors (TF) that are required for transcription initiation.

RNA polymerase III makes transfer RNA (tRNA), small ribosomal RNA (5S RNA), and most of the small RNAs that make up the fourth class of RNA (class III genes).

The 4th and 5th types of eukaryotic RNA polymerases are the mitochondrial and chloroplast versions. As you might expect, these are similar to bacterial enzymes since they were transferred to eukaryotic cells during the endosymbiotic events that gave rise to mitochondria and chloroplasts.

In the beginning it was confusing to sort out the various RNA polymerase activities in eukaryotic cells. The problem became much simpler when it was discovered that the mushroom toxin α-amanitin (left) (Mushrooms for Dinner) specifically inhibited RNA polymerase II and not RNA polymerase I. RNA polymerase III is somewhat inhibited in mammals but not in fungi or insects. This differential inhibition allowed workers to sort out the various RNA polymerases and their specificities.
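The division of labour among the nuclear enzymes, and the α-amanitin test used to tell them apart, can be summarized as a small lookup table. This is just the text above restated as data; the dictionary layout and the `polymerase_for` helper are hypothetical, and the organelle polymerases are left out.

```python
# Nuclear RNA polymerases as described in the text (organelle enzymes omitted)
POLYMERASES = {
    "RNA polymerase I": {
        "gene_class": "I",
        "products": ["large rRNAs"],
        "alpha_amanitin": "not inhibited",
    },
    "RNA polymerase II": {
        "gene_class": "II",
        "products": ["mRNA"],
        "alpha_amanitin": "inhibited",
    },
    "RNA polymerase III": {
        "gene_class": "III",
        "products": ["tRNA", "5S rRNA", "most small RNAs"],
        "alpha_amanitin": "somewhat inhibited in mammals; not in fungi or insects",
    },
}

def polymerase_for(rna_type: str) -> list:
    """Return the polymerase(s) listed as making a given kind of RNA."""
    return [name for name, info in POLYMERASES.items() if rna_type in info["products"]]

print(polymerase_for("tRNA"))
print(POLYMERASES["RNA polymerase II"]["alpha_amanitin"])
```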

This Is Your Brain on Drugs

 
Denyse O'Leary has been telling us for months that she's preparing a new book in collaboration with Mario Beauregard, a researcher at the Université de Montréal in Montreal, Quebec, Canada.

The first publicity for this upcoming book has been spotted by an astute reader at the HarperCollins website. The title of the book is The Spiritual Brain: A Neuroscientist's Case for the Existence of the Soul. It's due to be published any day now and you can already order advance copies at Amazon.com. Here's the description.

The Spiritual Brain
Beauregard, Mario with Denyse O'Leary

THE SPIRITUAL BRAIN is a study of the scientific evidence or otherwise for the existence of a human soul. It seeks to answer the question: Did God create the brain, or did the brain create God? Mainstream neuroscience has long held that mind, consciousness, and the soul are simply by-products of electrochemical brain processes. Thoughts, feelings, and desires are all merely random by-products of the activity of the brain as an organ, and spiritual/mystical experiences are simply delusions created by the brain.

But with THE SPIRITUAL BRAIN, University of Montreal neuroscientist Mario Beauregard challenges this basic doctrine, and for the first time, a highly regarded neuroscientist seeks not to debunk traditional spiritual beliefs, but rather to support them. Using brain imaging technology on Carmelite nuns, known for going into deep prayer and trance, who agreed to have their brains monitored during these mystical experiences, Beauregard argues that spiritual experiences are actual connections to a presence outside ourselves, and that their power to transform our lives is a power which derives from an authentic encounter with an outside reality.

Rights sold: Portuguese (Brazil)/Record; English(Canada only)/HarperCanada
Publication: September 2007 (MP)
Estimated length: 288 pages
Mario Beauregard earned his Ph.D. at the Université de Montréal in 1991. His first postdoc was at the University of Texas Medical School (Houston) and his second was in neuroimaging at the Neurological Institute in Montreal (1994-1996). He is currently a researcher (chercheur agrégé) in the Departments of Psychology and Radiology at the Université de Montréal.

His website shows images of brains under different stimulation conditions. He lists one of his research interests as the neurobiology of the mystical experience. There are three projects in this category. One of them is to examine the brains of Carmelite nuns while they are having mystical experiences. Another is to look at patients who have survived clinical death experiences. The third study will look at the functional neuroanatomy of love.

There's nothing wrong with examining the activity of the brain while people are experiencing different states of mind. What's troubling about the book blurb is the implication that Mario Beauregard has scientific evidence that "spiritual experiences are actual connections to a presence outside ourselves, and that their power to transform our lives is a power which derives from an authentic encounter with an outside reality." I'm betting that he has no such evidence. Instead he's interpreting the behavior of the nuns in terms of what he'd like to believe and not what he actually observes.

It will be interesting to see how the Canadian scientific community responds to the book when it comes out in a few days. This is the same community that has been highly critical of social scientists for merely hinting that intelligent design could be taken seriously when a McGill professor was denied a grant [Research Council Endorses Intelligent Design]. I can't wait to see how they treat a scientist who publishes with a genuine IDiot like Denyse O'Leary.

God and Evolution (2nd notice)

 
This is the second notice of God and Evolution, a talk about the effect of intelligent design on our education system. The lectures are in my building. I'm going. Email me if you want to meet for dinner before it starts. Several people have signed up already. You can buy tickets at the door.

The lectures are sponsored by the Centre for Inquiry, Ontario [see Standing Room Only].


Brian Alters

Dan Brooks

Mushrooms for Dinner

Julia was fed up with her husband. He was cruel and abusive and obviously preferred his own son by a previous marriage to her own son by a previous marriage. They were fighting constantly and he was heard complaining about his wife to his friends and threatening to divorce her.

She couldn't let that happen. It would mean a huge change in lifestyle. Julia decided to poison her husband by serving him mushrooms for dinner. She chose the "delicacy" Amanita phalloides because it was known to act quickly. By dawn the following day, her husband was dead.

Julia's husband was Tiberius Claudius Caesar Augustus Germanicus, Emperor of Rome, and the date was October 13, 54. Julia, better known as Julia Agrippina or Agrippina the Younger, moved quickly to install her son, Nero Claudius Caesar Augustus Germanicus on the throne.

Some mushrooms of the genus Amanita contain a deadly poison called α-amanitin [Monday's Molecule #18: thanks to Matt for being the first to name the molecule]. α-Amanitin is a potent inhibitor of eukaryotic RNA polymerase II, so it blocks transcription and prevents the expression of essential genes.

The story may not be true. Nobody knows for certain that Claudius was poisoned but by all accounts it seems likely. Nobody knows for certain that Agrippina prepared the meal herself but it seems very likely she was behind the assassination.

The story has entered the list of tales told in biochemistry class because it illustrates the importance of α-amanitin. It's rarely repeated in textbooks because of the historical uncertainties, but there's a famous telling of the tale in an earlier edition of Modern Biology by Postlethwait and Hopson. On page 229 they have a box titled "Caesar Experiments with RNA Synthesis":
For the first ten hours after Caesar ate this delicacy, all seemed well. But as he digested the fungus, the α-amanitin entered his bloodstream and was absorbed by his liver and kidneys, where it began to block transcription. About 15 hours after his repast, with no new mRNA to make new proteins Caesar's liver cells stopped functioning, and nausea, diarrhea, and delirium began to hit him. Two days later, he died of liver failure. It is highly doubtful that Caesar learned to appreciate the valuable role of RNA polymerase in DNA transcription. But perhaps, in a general way, Agrippina did.

Monday, March 19, 2007

How's It Working So Far in Iraq?

 
ABC News reports on the latest poll results from Iraq [Voices From Iraq 2007: Ebbing Hope in a Landscape of Loss].
Violence is the cause, its reach vast. Eighty percent of Iraqis report attacks nearby — car bombs, snipers, kidnappings, armed forces fighting each other or abusing civilians. It's worst by far in the capital of Baghdad, but by no means confined there.

The personal toll is enormous. More than half of Iraqis, 53 percent, have a close friend or relative who's been hurt or killed in the current violence. One in six says someone in their own household has been harmed. Eighty-six percent worry about a loved one being hurt; two-thirds worry deeply. Huge numbers limit their daily activities to minimize risk. Seven in 10 report multiple signs of traumatic stress.
And how do they feel about the troops who are there to help them?
The survey's results are deeply distressing from an American perspective as well: The number of Iraqis who call it "acceptable" to attack U.S. and coalition forces, 17 percent in early 2004, has tripled to 51 percent now, led by near unanimity among Sunni Arabs. And 78 percent of Iraqis now oppose the presence of U.S. forces on their soil, though far fewer favor an immediate pullout.
That's not a good sign. But at least they're better off than they were under Saddam Hussein, right?
Given all this, for the first time since the 2003 war, fewer than half of Iraqis, 42 percent, say life is better now than it was under Saddam Hussein, whose security forces are said to have murdered more than a million Iraqis.

Forty-two percent think their country is in a civil war; 24 percent more think one is likely. Barely more than four in 10 expect a better life for their children.

Three in 10 say they'd leave Iraq if they could.
It's time for the foreign troops to leave. Get out as fast as possible.

[Hat Tip: Canadian Cynic]

Transcription

 
Transcription is the process where a gene (DNA) is copied into single-stranded RNA. The enzyme responsible for this process is called RNA polymerase. (DNA polymerase is the enzyme that copies DNA during DNA replication. They are very different enzymes even though they carry out similar reactions.)

Transcription can be divided into three steps: initiation, elongation, and termination. It's easiest to describe the process in bacteria because it's simpler than eukaryotic transcription. The basics are the same in all species.

The bacterial enzyme is called the RNA polymerase holoenzyme because it's actually a complex of RNA polymerase and an activator protein. The initiation step involves assembling a transcription initiation complex at the beginning of the gene. The site of initiation is called the promoter.

The first thing that happens is that RNA polymerase binds to any old sequence of DNA and then slides along the DNA looking for a promoter sequence. The non-specific binding of E. coli RNA polymerase holoenzyme is weak and it dissociates after about three seconds. However, during that time it can slide about 2000 base pairs looking for a promoter sequence. This one-dimensional search allows it to find the start of a gene and initiate transcription much more quickly than if it had to bind directly to a promoter.
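The benefit of sliding can be illustrated with a toy search: bind at a random position, scan a window of about 2000 bp around it, and rebind elsewhere if nothing is found. This is a deliberately crude sketch. `search_promoter` is a hypothetical function, binding positions are assumed to be uniformly random, and an exact match to `TATAAT` (the E. coli σ70 −10 consensus) stands in for a promoter, whereas real promoter recognition involves more than one element and tolerates mismatches.

```python
import random

def search_promoter(dna: str, motif: str, rng: random.Random,
                    max_slide: int = 2000, max_binding_events: int = 10_000):
    """Bind at random sites and slide along the DNA looking for the motif.

    Returns (motif_position, binding_events_used), or (None, max_binding_events)
    if the motif is never found.
    """
    half = max_slide // 2
    for event in range(1, max_binding_events + 1):
        site = rng.randrange(len(dna))      # nonspecific binding at a random position
        lo = max(0, site - half)            # sliding is one-dimensional diffusion,
        hi = min(len(dna), site + half)     # so scan a window on both sides of the site
        hit = dna.find(motif, lo, hi)       # motif must lie entirely inside [lo, hi)
        if hit != -1:
            return hit, event
    return None, max_binding_events

# 100 kb of G/C-only DNA, so the only possible hit is the one we insert
background = "GC" * 50_000
dna = background[:60_000] + "TATAAT" + background[60_006:]
print(search_promoter(dna, "TATAAT", random.Random(0)))
```

In this setup a single binding event that had to land exactly on the motif would succeed roughly once in 100,000 tries, while a 2000-bp scan succeeds roughly once in 50, which is the kind of speed-up the one-dimensional search buys.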

Promoters have specific DNA sequences that are recognized by the activator protein. Recall that the activator protein is part of the holoenzyme complex. In E. coli the bound activators are called σ (sigma) factors. Different σ factors recognize different promoters. In other species the activator proteins may bind to the promoter first and the RNA polymerase will encounter them when it slides along DNA. The net effect is the same whether the activator binds first to RNA polymerase or to the promoter: a transcription initiation complex assembles at the promoter.
The actual initiation event requires opening the double-stranded DNA to make a transcription bubble. Then the first few nucleotides of RNA are synthesized by copying one of the strands of DNA.

At this point the activator protein releases the RNA polymerase, which is now tightly bound to the transcription bubble. Various elongation factors join the complex and transcription proceeds along the gene, copying one of the strands into RNA. As the complex moves, the RNA separates from the template strand behind the RNA polymerase and the DNA re-forms a double helix. The transcription bubble moves along the gene. In the example shown below the major elongation factor (NusA) is binding to RNA polymerase as the σ factor is ejected.

Note that the shift from initiation complex to elongation complex is a crucial step in initiation. The activator protein is tightly bound to the promoter and the complex would not be able to leave the promoter if it didn't dissociate from the activator protein (σ factor, in the case of E. coli).

At the end of the gene, the elongation complex encounters a specific termination signal where specific termination factors catalyze the dissociation of RNA polymerase from DNA and the completed RNA is released.

Whether or not a gene is transcribed depends on the promoter sequence. If there's an activator protein in the cell that binds to that promoter then the gene will be transcribed. The rate of transcription will depend on how much of the activator protein is present because the more activator there is the more quickly it will find and bind to the promoter.
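One simple way to see why more activator means more transcription is an equilibrium-binding sketch. This is an assumption layered on the text, not the author's model: it treats activator binding to a single promoter as simple 1:1 binding with dissociation constant `kd` (same units as the concentration) and assumes transcription rate is proportional to the fraction of time the promoter is occupied. The text itself describes the effect in kinetic terms (finding the promoter faster); the equilibrium picture gives the same direction of change.

```python
def promoter_occupancy(activator_conc: float, kd: float) -> float:
    """Fraction of time the promoter is bound, for simple 1:1 equilibrium binding."""
    return activator_conc / (activator_conc + kd)

# More activator -> higher occupancy; weaker binding (larger kd) -> lower occupancy
for conc in (0.1, 1.0, 10.0):
    print(conc, promoter_occupancy(conc, kd=1.0))
```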

The rate of transcription will also depend on the strength of the promoter. If the promoter sequence is a perfect match to the ideal binding site of the activator then the gene will be transcribed often. On the other hand, if the promoter sequence is similar to the ideal binding site but not a perfect match then it will be transcribed less often because the activator won't bind as tightly. Selection will favor the appropriate promoter strength—not all promoters are ideal binding sites because not all genes need to be transcribed at maximum rate.
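A crude way to express "how close a promoter is to the ideal binding site" is to count mismatches against a consensus sequence. The sketch below assumes, for illustration only, that the E. coli σ70 −10 consensus `TATAAT` is the ideal site and that fewer mismatches means a stronger promoter. Real promoter strength also depends on the −35 element, the spacing between elements and flanking sequence, so this is a proxy, not a predictor. The promoter names and sequences in the example are hypothetical.

```python
CONSENSUS_MINUS10 = "TATAAT"  # E. coli sigma-70 -10 consensus

def mismatches(site: str, consensus: str = CONSENSUS_MINUS10) -> int:
    """Number of positions where a candidate site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be the same length")
    return sum(a != b for a, b in zip(site, consensus))

def rank_promoters(sites: dict) -> list:
    """Order promoter names from closest to farthest from the consensus."""
    return sorted(sites, key=lambda name: mismatches(sites[name]))

example = {"weak": "GACACT", "strong": "TATAAT", "medium": "TATGTT"}
print(rank_promoters(example))
```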

Gene and Transcription Orientation

 
The DNA double helix consists of two strands of DNA wound around each other to form the classic helical structure. One of the most important insights into solving the structure was when Watson and Crick realized that the two strands had to run in opposite directions. The ends of each strand are named after the carbon atom of the deoxyribose sugar that is exposed there. One end is called the 5′ (five prime) end because the 5′ carbon atom is exposed. The other end is called the 3′ (three prime) end because the 3′ carbon atom is exposed.

RNA (and DNA) can only be synthesized in the 5′ to 3′ direction. This means that at the beginning of the gene, when the transcription bubble forms, the template strand is copied into RNA starting from its 3′ end. (The template strand runs in the opposite orientation to the newly synthesized RNA.) [see Transcription]

The complementary strand of DNA is called the coding strand because it represents the sequence of the gene product. In other words, it has the same sequence as the RNA (with T in place of U). By convention the orientation of the gene is determined by the coding strand and not the template strand. Thus, the beginning of a gene is called the 5′ end and the end of a gene is the 3′ end.
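The orientation rules can be checked with a few lines of code. In this sketch (helper names are hypothetical) the coding strand is written 5′→3′, the template strand is written base-for-base underneath it, so it reads 3′→5′, and transcription pairs each template base with a ribonucleotide. The result is the coding-strand sequence with U instead of T.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def template_from_coding(coding_5to3: str) -> str:
    """Template strand aligned under the coding strand, so it is written 3'->5'."""
    return "".join(COMPLEMENT[b] for b in coding_5to3)

def transcribe(template_3to5: str) -> str:
    """Read the template 3'->5' and build the RNA 5'->3' by base pairing."""
    return "".join(RNA_PARTNER[b] for b in template_3to5)

coding = "ATGGCCTAA"                  # 5' -> 3'
template = template_from_coding(coding)
print(template)                       # 3' -> 5'
print(template[::-1])                 # the same template strand written 5' -> 3'
print(transcribe(template))           # RNA, 5' -> 3'
```

The usual convention of writing every strand 5′→3′ is why the template strand often looks "backwards" (reversed) when printed on its own.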

The electron micrograph below shows E. coli ribosomal RNA genes being transcribed. The thin line (upper right) is the double-stranded DNA. Transcription of the genes begins at the initiation site (lower left). This is the 5′ end of the genes.

RNA polymerase first bound to the initiation site and began transcribing in the 5′ to 3′ direction as shown. As the transcription complex moves along the gene the RNA product gets longer. In this case it is bound to protein so it looks compact. About halfway along the genes the RNA is processed by cutting and that's why it seems to get shorter near the middle of the gene.

The large ribosomal RNA is in the second half of the transcribed region. You can see that the RNA in the second half is larger than the small ribosomal RNA in the first half.

Note that there are many transcription complexes transcribing this region at the same time. In fact, they are about as closely packed as they can possibly be. These genes are being transcribed at the maximum possible rate. They have a very strong promoter.

Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome

In April 2005 Gil Ast published an article in Scientific American (Ast, 2005). The title of the article was “The Alternative Genome” and its main point was how alternative splicing in humans could increase the number of different proteins that we produce. He explains why he thinks the proteome is so much larger than the number of genes. (Ast claims that there are 90,000 proteins and only 25,000 genes.)

Ast begins his argument with the quotation below.
Spring of 2000 found molecular biologists placing dollar bets, trying to predict the number of genes that would be found in the human genome when the sequence of its DNA nucleotides was completed. Estimates at the time ranged as high as 153,000. ... given our complexity we ought to have a bigger genetic assortment than the 1000-cell roundworm, Caenorhabditis elegans, which has a 19,500-gene complement, or corn, with its 40,000 genes.

When a first draft of the human sequence was published the following summer, some observers were therefore shocked by the sequencing team’s calculation of 30,000 to 35,000 protein-coding genes. The low number seemed almost embarrassing.
Ast's remarks illustrate two points that I want to address. The first point is the surprise factor. Ast, and some other scientists, were surprised (and embarrassed) by the low gene count. They imply that most genome experts were also shocked when the genome sequence was published. That’s not quite correct, as I will show below.

The second point will have to be put off for another time but it’s important enough to mention here. Ast thinks that humans need to make many times more proteins than worms and corn because we are so much more complex. There are two problems with such a point of view—are we, in fact, 2-3 times more complex than corn? And, does it take thousands of new proteins to generate the structures that make us unique?

I think some people exaggerate our complexity and the place of humans relative to other species. This incorrect perspective can cause some scientists to put their faith in weakly supported hypotheses that claim to explain why humans really are complex and important in spite of the fact that we don’t have a lot of genes.

But let's put that discussion aside for a few days in order to discuss the historic estimates of the number of genes in the human genome. The statement by Gil Ast is typical of those who are embarrassed. They exaggerate the estimates of the total number of genes in order to make it look like everyone—not just them—thought there would be far more genes than the 25,000 that have been found. Just this month (March 2007) this myth was repeated by Taft et al. (2007).
Predictions of the estimated number of protein-coding genes in the human genome prior to genome sequencing ranged from as low as 50,000 to as high as 140,000, whereas the latest estimates from genome analysis indicate that humans have approximately 20,000 protein-coding genes.
The graphic above was taken from the Genesweep lottery. This is the betting that Ast refers to. It shows the range of gene number estimates by scientists who were involved in genome sequencing projects. Note that there are many estimates in the 40-50,000 range and a fair number below 40,000. The point is obvious: lots of experts anticipated fewer than 50,000 genes in the human genome (see The nature of the number. Nature Genetics 25:127 (2000)).

The earliest estimates of gene number were based on genetic load arguments (see King & Jukes, 1969). Since approximate mutation rates were known by 1960, it was possible to estimate the maximum number of genes that could be mutated without imposing an impossible genetic load. In other words, how many genes could we have before the number of lethal mutations per generation became intolerable? This number was less than 40,000 genes, an estimate that has never been refuted or discredited. Many experts were well aware of this upper bound right up until the genome sequence was published.
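The shape of the genetic load argument can be shown with a little arithmetic. This is a minimal sketch, not King and Jukes' actual calculation. It assumes that every gene mutates to a lethal (or strongly deleterious) allele at the same average rate per generation, and that a population can tolerate only a fixed number of new such mutations per zygote. The function name and both input values are hypothetical placeholders chosen for illustration, not figures from the 1960s literature.

```python
def max_gene_number(tolerable_mutations_per_zygote: float,
                    mutation_rate_per_gene: float,
                    ploidy: int = 2) -> float:
    """Upper bound on gene number under a simple genetic load argument.

    A zygote receives `ploidy` copies of every gene, so with N genes the
    expected number of new deleterious mutations per zygote is
    N * ploidy * mutation_rate_per_gene. Setting that equal to the
    tolerable number and solving for N gives the upper bound.
    """
    if mutation_rate_per_gene <= 0:
        raise ValueError("mutation rate must be positive")
    return tolerable_mutations_per_zygote / (ploidy * mutation_rate_per_gene)


# Hypothetical inputs, for illustration only: at most ~1 new deleterious
# mutation per zygote per generation, and a per-gene deleterious mutation
# rate of 1e-5 per generation.
bound = max_gene_number(tolerable_mutations_per_zygote=1.0,
                        mutation_rate_per_gene=1e-5)
print(f"Upper bound: about {bound:,.0f} genes")
```

With these placeholder inputs the bound comes out at about 50,000 genes. The important feature is the inverse relationship: a higher per-gene mutation rate, or a lower tolerance for new mutations, pushes the maximum number of genes down.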

By the 1970s there were good estimates of the total number of unique Drosophila melanogaster genes that could be mutated to lethality. The range was about 5,000-10,000 genes, and this correlated well with the genetic map and the organization of polytene chromosomes. It was known that the Drosophila genome was much larger than the combined size of the estimated number of genes, but studies from a number of labs confirmed that a great deal of the genomic DNA was repetitive junk DNA.

As we learned more and more about how genes control development, it became clear that huge differences in morphology and "complexity" could be due to very small changes in either the number of regulatory genes or when they were expressed. Most of the people who assimilated the advances in developmental biology began to appreciate that mammals do not need to have many more genes than fruit flies.

By 1980, the amount of unique-sequence DNA in mammalian genomes was known to be capable of encoding fewer than 20,000 genes if the average size of a gene was 10,000 bp (including introns). We now know that much of the intron sequence is not unique-sequence DNA, but that wasn't known back then. This estimate of gene number was consistent with detailed analysis of the amount of DNA that could be protected by mRNA or by Rot analysis (kinetics of hybridization of RNA to DNA). Mouse embryos (gastrula) appeared to express about 20,000 average-sized mRNAs. Some of these were present at very low abundance, leading to the idea that this value might represent most of the mouse genes in the genome (summarized in Lewin, 1980). Certainly it was known that mammals expressed about 10,000 housekeeping genes in most cells and tissues. The general consensus was that the total number of regulatory genes was unlikely to be more than twice this number (probably less), for a total of 30,000 genes at most.
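The two upper bounds in that paragraph are simple arithmetic, and writing them out makes the logic explicit. The symbols are introduced here only for illustration. The amount of unique-sequence DNA, D_unique, is implied by the text rather than stated, and all of the 1980 inputs were rough approximations.

```latex
% Bound from unique-sequence DNA (D_unique is implied, not stated, in the text)
N_{\max} = \frac{D_{\text{unique}}}{\bar{L}_{\text{gene}}} < 2\times 10^{4},
\qquad \bar{L}_{\text{gene}} = 10^{4}\,\text{bp}
\;\Longrightarrow\; D_{\text{unique}} < 2\times 10^{8}\,\text{bp}

% Bound from housekeeping plus regulatory genes
N_{\text{total}} = N_{\text{hk}} + N_{\text{reg}},
\qquad N_{\text{reg}} \le 2\,N_{\text{hk}},\quad N_{\text{hk}} \approx 10^{4}
\;\Longrightarrow\; N_{\text{total}} \le 3\,N_{\text{hk}} \approx 3\times 10^{4}
```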

It was about this time that Walter Gilbert made his famous back-of-the-envelope calculation of 100,000 genes in the human genome. This was the estimate that became widely quoted when the human genome project was first proposed. It's interesting to note that Gilbert's estimate was not based on any experimental evidence; indeed, it conflicted with most of the available evidence suggesting far fewer genes. The larger number seemed less threatening to scientists who were worried that we might not have more genes than a fruit fly.

By the late 1990s we had estimates of the total number of human genes from the sequences of chromosomes 21 and 22 and from the sequence of a large contiguous region of the MHC locus. The results suggested fewer than 45,000 genes in total, and even fewer if these sequenced regions turned out to be gene-rich, as was widely suspected. Thus, the number of genes was coming out well below 50,000, in line with the data from RNA hybridization studies and genetic load. It also fit with the concept that the number of genes in mammals was probably not more than twice the number in insects.
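Extrapolating from a few finished regions to the whole genome amounts to scaling up the observed gene density. Below is a minimal sketch of that reasoning. The function name is hypothetical, the gene and sequence counts are made-up inputs rather than the published chromosome 21, chromosome 22 or MHC figures, and the genome size is a rounded approximation of the human haploid genome. It also shows why a gene-rich sample makes the result an upper bound: if the sampled regions are denser in genes than the genome average, scaling their density up to the whole genome overestimates the total.

```python
def extrapolate_gene_count(genes_found: int,
                           sequenced_mb: float,
                           genome_mb: float) -> float:
    """Scale the gene density observed in sequenced regions to the whole genome.

    Assumes the rest of the genome has the same average gene density as the
    sequenced regions; if those regions are gene-rich, the result is an
    overestimate.
    """
    density_per_mb = genes_found / sequenced_mb
    return density_per_mb * genome_mb


# Hypothetical inputs for illustration only (not the published counts):
# 800 genes found in 60 Mb of finished sequence, genome size ~3,000 Mb.
estimate = extrapolate_gene_count(genes_found=800, sequenced_mb=60.0, genome_mb=3000.0)
print(f"Extrapolated total: about {estimate:,.0f} genes")
```

If the sampled regions actually carried twice the genome-average density, the true total would be half the extrapolated figure. That is why the sequenced chromosomes pointed to a ceiling, not a floor.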

In contrast to these results, the estimates from expressed sequence tags (ESTs) were often much higher. Expressed sequence tags are short, single-pass sequences of cDNA copies of RNA isolated directly from cells. The idea was that these represented bits of mRNA, so each one revealed the presence of a protein-coding gene. As more and more ESTs were deposited in the sequence databases, it became possible to estimate when the collection would be complete, and the totals were often more than 100,000 distinct mRNAs. For example, just before the human genome sequence was published, Liang et al. (2000) estimated that there were 120,000 genes based on the analysis of 2 million ESTs.
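Estimating the eventual size of an EST collection means extrapolating a saturation curve. As more ESTs are sampled, the number of distinct clusters grows more and more slowly, and the asymptote is taken as the total number of distinct transcripts. The sketch below fits a Michaelis-Menten-style curve, S(n) = s_max * n / (k + n), by linear regression on its double-reciprocal form. This is a generic accumulation-curve technique chosen for illustration, not necessarily the method Liang et al. (2000) used. The function name and the synthetic data are hypothetical. Note that the asymptote counts clusters, not genes. If ESTs from one gene fall into several non-overlapping clusters, or many clusters are artifacts, the extrapolated total will overestimate the number of genes.

```python
def fit_saturation(samples: list[float], clusters: list[float]) -> tuple[float, float]:
    """Fit S(n) = s_max * n / (k + n) using the double-reciprocal form.

    1/S = (k / s_max) * (1/n) + 1/s_max, so an ordinary least-squares line
    through the points (1/n, 1/S) has slope k/s_max and intercept 1/s_max.
    Returns (s_max, k).
    """
    xs = [1.0 / n for n in samples]
    ys = [1.0 / s for s in clusters]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    s_max = 1.0 / intercept
    k = slope * s_max
    return s_max, k


# Synthetic, noise-free data generated from hypothetical values
# s_max = 120,000 and k = 500,000, sampled at increasing numbers of ESTs.
true_s_max, true_k = 120_000.0, 500_000.0
ests = [100_000.0, 250_000.0, 500_000.0, 1_000_000.0, 2_000_000.0]
observed = [true_s_max * n / (true_k + n) for n in ests]

s_max, k = fit_saturation(ests, observed)
print(f"Extrapolated number of distinct clusters: about {s_max:,.0f}")
```

With noise-free data the fit recovers the generating values. With real, noisy counts, the reciprocal transform gives heavy weight to the smallest samples, so a direct nonlinear fit is usually preferred.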

Not everyone believed in the validity of the EST data. Some thought that most ESTs were artifacts. They turned out to be correct, although this is not widely appreciated. By using the sequences of chromosomes 21 and 22 as controls, Ewing and Green (2000) were able to estimate 35,000 genes based on the EST data.
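Using finished chromosomes as controls can be thought of as a calibration. You count how many EST clusters map to a region whose genes are already well annotated, derive a clusters-per-gene ratio, and use that ratio to correct the genome-wide cluster count. This is a simplified illustration of the general idea, not a reproduction of Ewing and Green's actual procedure. The function name and every number below are hypothetical.

```python
def calibrated_gene_estimate(total_clusters: int,
                             control_clusters: int,
                             control_genes: int) -> float:
    """Convert a genome-wide EST cluster count into a gene estimate.

    A well-annotated control region supplies the calibration: the number of
    EST clusters mapping to it divided by the number of annotated genes it
    contains gives clusters-per-gene, i.e. how much the splitting of one gene
    into several clusters (plus artifactual clusters) inflates the count.
    Assumes the control region is representative of the whole genome.
    """
    if control_clusters <= 0 or control_genes <= 0:
        raise ValueError("control counts must be positive")
    clusters_per_gene = control_clusters / control_genes
    return total_clusters / clusters_per_gene


# Hypothetical inputs for illustration only (not Ewing and Green's data):
# 120,000 clusters genome-wide; 1,000 clusters map to a control region
# containing 300 annotated genes.
estimate = calibrated_gene_estimate(total_clusters=120_000,
                                    control_clusters=1_000,
                                    control_genes=300)
print(f"Calibrated estimate: about {estimate:,.0f} genes")
```

The correction divides by the control region's clusters-per-gene ratio, so the estimate is only as good as the assumption that the control region is typical of the genome.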

Thus, by the time the draft sequence was published in 2001, there were many scientists who anticipated that the number of genes would be fewer than 40,000, and that's why there are so many bets in that range in the Genesweep lottery. When the number of genes was announced to be about 30,000, many of us were not the least bit surprised. The only ones who were surprised were those who ignored most of the data and clung to the idea that humans had to have far more genes than the so-called "lower" species.

It is simply not true that all the experts were surprised at the low number of genes. Some experts were, but many were not. The interesting thing is that those who wanted there to be more genes have not given up the fight. They continue to publish rationalizations and just-so stories in an attempt to explain away why they were wrong.

UPDATE: The latest estimates indicate that the human genome contains about 20,500 protein-encoding genes [Humans Have Only 20,500 Protein-Encoding Genes]. There are probably about 1,500 genes for the known stable RNAs, for a total of 22,000.

Ast,G. (2005) The alternative genome. Sci. Am. 292; 40-47.

Ewing,B. and Green,P. (2000) Analysis of expressed sequence tags indicates 35,000 human genes. Nat. Genet. 25; 232-234.

King,J.L. and Jukes,T.H. (1969) Non-Darwinian evolution. Science 164; 788-798.

Lewin,B. (1980) Gene Expression-2, 2nd ed. Chapter 24; Complexity of mRNA Populations.

Liang,F., Holt,I., Pertea,G., Karamycheva,S., Salzberg,S.L., and Quackenbush,J. (2000) Gene index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet. 25; 239-240.

What Is This?

 
Why it's Stenocereus eruca, of course.

Don't know what that is?

Find out if it's a plant, an animal, or something else entirely by going [here].

It (mostly) doesn't have sex but shows a fairly high level of genetic diversity.

Sex, genes & evolution

 
With a title like that, how could you not want to read John Logsdon's new blog? Yesterday was his first post, but I'm looking forward to lots more in the near future [Sex, Genes & Evolution].

John is a molecular evolutionary biologist in the Biology Department at the University of Iowa. He has published a number of papers with W. Ford Doolittle from the time he was at Dalhousie University in Nova Scotia. These include several papers with Arlin Stoltzfus on the evolution of introns. The Stoltzfus/Logsdon papers from this era were among the best papers to refute the intron-early hypothesis formerly championed by their mentor, Ford Doolittle. One of the things this demonstrates is that it's possible to disagree with your boss and survive!

Their chief target at the time was the Gilbert lab. John Logsdon was one of the participants in the famous online BioMedNet debate on The Origin and Evolution of Introns in November 1996, back in the days before blogs. This was mostly a debate between members of the Ford Doolittle lab and the Gilbert lab. Unfortunately, the transcript is no longer available. It was required reading in most molecular biology courses in the late 1990s. (I wish we had more debates like that.)

The Logsdon lab is interested in sex in protists, specifically the evolution of genes involved in recombination and meiosis (e.g., RAD51). John participates in a larger project that is trying to define the eukaryotic tree of life. As most of you know, the relationships among protists are controversial, and the collaborative project intends to try to resolve the controversies. It's not going to be easy to figure out the early history of eukaryotic evolution. This is a problem that has perplexed evolutionary biologists for several decades.

The Iowa biologists' goal is to sequence nine genes (actin, α- and β-tubulin, cob, EF-1α, Hsp70, Hsp90, RPB1, SSU rRNA) from at least 200 different protists [Assembling the Tree of Eukaryotic Microbial Diversity and Eu-Tree].

I'm excited about this project because they're looking at the best gene (HSP70). I hope he won't be disappointed to learn that my undergraduates have already solved the problem [The Evolution of the HSP70 Gene Family]. But all is not lost; those other genes might make a minor contribution to understanding evolution.

Welcome to science blogging, John.

Now, why not jump right in and describe your favorite hypothesis for why we have sex? I'm guessing you're a fan of repair, right?

Monday's Molecule #18

 
Name this molecule. You must be specific but we don't need the full correct scientific name. (If you know it then please post it.)

As usual, there's a connection between Monday's molecule and this Wednesday's Nobel Laureate. This one's very easy once you know the molecule. There'll be a few extra bonus points for guessing Wednesday's Nobel Laureate(s).

Comments will be blocked for 24 hours. Comments are now open.