It is generally assumed that new genes arise through duplication and/or recombination of existing genes. The probability that a new functional gene could arise out of random non-coding DNA has so far been considered negligible, as it seems unlikely that such an RNA or protein sequence could have an initial function that influences the fitness of an organism. Here, we have tested this question systematically by expressing clones with random sequences in Escherichia coli and subjecting them to competitive growth. Contrary to expectations, we find that random sequences with bioactivity are not rare. In our experiments, up to 25% of the evaluated clones enhance the growth rate of their cells and up to 52% inhibit growth. Testing of individual clones in competition assays confirms their activity and indicates that it could be exerted by either the transcribed RNA or the translated peptide. This suggests that transcribed and translated random parts of the genome could indeed have a high potential to become functional. The results also suggest that random sequences may become an effective new source of molecules for studying cellular functions, as well as for pharmacological activity screening.
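The abstract does not say how "enhancing" and "inhibiting" clones are called from the competition data. One common approach, assumed here rather than taken from the paper, is to sequence the pool before and after competitive growth and compare each clone's share of the reads. The sketch below computes log2 fold changes of those normalized frequencies and the fraction of clones passing a threshold; the function names, the pseudocount, the twofold (log2 = 1) threshold, and the read counts are all hypothetical, not the study's actual statistics.

```python
import math

def log2_fold_changes(counts_t0, counts_t1, pseudocount=0.5):
    """Log2 ratio of each clone's share of reads after vs. before competition.

    Frequencies are normalized to the total reads at each time point, so a
    clone's value reflects its performance relative to the rest of the pool.
    """
    clones = sorted(set(counts_t0) | set(counts_t1))
    # Pseudocounts avoid division by zero for clones missing at one time point.
    total0 = sum(counts_t0.get(c, 0) + pseudocount for c in clones)
    total1 = sum(counts_t1.get(c, 0) + pseudocount for c in clones)
    return {
        c: math.log2(((counts_t1.get(c, 0) + pseudocount) / total1)
                     / ((counts_t0.get(c, 0) + pseudocount) / total0))
        for c in clones
    }

def classify(lfc, threshold=1.0):
    """Fractions of clones enriched / depleted by at least `threshold` (log2 units)."""
    n = len(lfc)
    enhanced = sum(1 for v in lfc.values() if v >= threshold)
    inhibited = sum(1 for v in lfc.values() if v <= -threshold)
    return enhanced / n, inhibited / n

# Hypothetical read counts for four clones before and after growth.
before = {"A": 100, "B": 100, "C": 100, "D": 100}
after = {"A": 400, "B": 100, "C": 25, "D": 100}
lfc = log2_fold_changes(before, after, pseudocount=0)
print(classify(lfc))  # (0.25, 0.25)
```

In the toy counts, clones B and D keep the same raw read count yet come out slightly depleted, because A's expansion lowers everyone else's share. This is the normalization effect that Weisman and Eddy discuss in their commentary.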
As are people
Where are you? I see the entire paper and I don't have any Nature credentials.
Caroline M. Weisman and Sean R. Eddy: (...) It's the finding that 25% of random sequences are beneficial that is of particular interest. It appears to suggest a high probability in the third term of the protogene equation, but it's unexpected, because one would think that improving on an already highly adapted E. coli would be difficult. For example, beneficial mutations occur relatively rarely in quantitative long-term in vitro evolution experiments in E. coli. In this regard, we think there are two important caveats to this observation. One issue is that because the experiment measures relative (normalized) frequency changes in competitive growth conditions, the change in frequency of a sequence depends on its growth rate relative to every other random sequence in the pool. Even assuming that drift is negligible, as Neme et al. argue, sequence enrichment does not mean that a sequence is beneficial relative to wildtype E. coli, only that it was better than other random sequence competitors. It could be that all the random sequences are deleterious to E. coli, but some are less deleterious than others, and these would rise to higher relative frequencies. Neme et al. address this issue by choosing three individual beneficial sequences and showing that all three individually outcompete the 'empty vector'. This raises a second issue: the vector without insert is neither empty nor innocuous. The vector (pFLAG-CTC) carries strong transcription and translation signals. It drives IPTG-inducible expression across its multiple cloning site, producing a 350-nucleotide RNA and a 38-amino-acid open reading frame at high levels. Neme et al. do not observe these products because they assay expression by western blot with an anti-FLAG-tag antibody, but the FLAG tag in the vector without insert is out of frame by design. Because high-level expression of any exogenous plasmid-encoded sequence is detrimental to the E. coli host, under these conditions a beneficial random sequence could include anything that decreases RNA or protein expression levels relative to the vector without insert, for instance by base-pairing complementarity to the translation initiation site. Indeed, all three beneficial clones seem to show strongly reduced protein expression relative to the population average of the library.

Although we have reservations about the correctness of the conclusion of Neme et al. that 25% of their random sequences have beneficial effects on E. coli growth rate, a body of work from these and other authors does suggest that each of the three terms in the protogene equation is high enough to be measurable in laboratory experiments, and thus could easily be relevant on evolutionary timescales. François Jacob was correct that gene duplication and divergence is a dominant force in gene evolution, but his personal intuition about the odds of new genes arising de novo may have been simply wrong. Experiments studying the transcription, translation, and functionality of random sequences are proving to be fruitful territory, replacing Jacob's intuition with experimental data.
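Weisman and Eddy's first caveat can be made concrete with a toy competition model. The sketch assumes pure exponential growth with no drift, no carrying capacity, and no interactions between clones, and the growth rates are invented for illustration. Every clone grows more slowly than a hypothetical plasmid-free wildtype, yet some still rise in relative frequency, because the normalized frequency only compares clones within the pool.

```python
import math

def relative_frequencies(initial, rates, t):
    """Pool frequencies after time t under pure exponential growth.

    f_i(t) = f_i(0) * exp(r_i * t) / sum_j f_j(0) * exp(r_j * t)
    Only the clones in the pool enter the normalization; no outside reference does.
    """
    weights = {k: initial[k] * math.exp(rates[k] * t) for k in initial}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical growth rates (per hour). Every clone grows more slowly than
# the plasmid-free wildtype, i.e. every insert is deleterious in absolute terms.
WILDTYPE_RATE = 1.0
rates = {"clone_a": 0.95, "clone_b": 0.90, "clone_c": 0.80, "clone_d": 0.70}
initial = {k: 0.25 for k in rates}

final = relative_frequencies(initial, rates, t=5.0)
enriched = [k for k in rates if final[k] > initial[k]]
print(enriched)  # ['clone_a', 'clone_b']
print(all(r < WILDTYPE_RATE for r in rates.values()))  # True
```

The same arithmetic applies to the second caveat. Suppose expression from the vector without insert imposes a burden that lowers its rate to a hypothetical 0.85. Then `clone_a` (0.95) outcompetes it head-to-head even though both grow more slowly than wildtype, so beating the vector does not by itself show a benefit relative to wildtype E. coli.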