Comparative genomics and fundamental problems in evolutionary biology

Eugene V. Koonin

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

Over 50 complete genome sequences of cellular life forms (Bacteria, Archaea, and Eukaryotes) are currently available, and many more are in the pipeline. Considerable comparative analysis of these genomes has already been performed, and while even more challenging work lies ahead, it is fair to ask at this juncture what impact this research has had on biology in general. In my opinion, comparative analysis of complete genomes has already affected our ideas on biological evolution to such an extent that it is appropriate to claim a paradigm shift in evolutionary biology.

Computer analysis of complete genomes of unicellular organisms shows that protein sequences are in general highly conserved in evolution, with at least 70% of them containing ancient conserved regions. This allows us to delineate families of orthologs across a wide phylogenetic range and, in many cases, to predict protein functions with reasonable confidence. Examination of the ‘phylogenetic pattern’ of these orthologous families, that is, the set of genomes in which each family is represented, shows that only ~80 families, most of which include components of the translation machinery, are universally conserved in all sequenced genomes. Because the great majority of families are therefore missing from at least one lineage, horizontal gene transfer and lineage-specific gene loss are not inconsequential evolutionary quirks but rather prevailing forces of evolution, at least in the prokaryotic world. Horizontal transfer and lineage-specific loss of entire genes are complemented by numerous intragenic recombination events that manifest as domain rearrangements at the protein level. In the evolution of eukaryotes, these rearrangements take the form of domain accretion, whereby complex, multicellular organisms accrete additional domains in many orthologous protein sets, which enhances their repertoire of regulatory and signal-transducing interactions.
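The idea of a phylogenetic pattern can be made concrete with a minimal sketch. It assumes that each orthologous family is already available as a set of genome names; the family names, genome names, and the helper functions `phylogenetic_patterns` and `universal_families` are hypothetical and serve only as illustration, not as the procedure used in the analysis described here.

```python
from typing import Dict, List, Set


def phylogenetic_patterns(families: Dict[str, Set[str]],
                          genomes: List[str]) -> Dict[str, str]:
    """Encode each family as a presence/absence string over the genome list.

    Position i of the string is "1" if the family has a member in genomes[i],
    and "0" otherwise.
    """
    return {
        name: "".join("1" if genome in members else "0" for genome in genomes)
        for name, members in families.items()
    }


def universal_families(families: Dict[str, Set[str]],
                       genomes: List[str]) -> List[str]:
    """Return the names of families represented in every genome."""
    required = set(genomes)
    return sorted(name for name, members in families.items()
                  if required <= members)


# Toy data, invented for illustration only (not real genomes or families).
GENOMES = ["bacterium_A", "bacterium_B", "archaeon_C", "eukaryote_D"]
FAMILIES = {
    "translation_factor_like": {"bacterium_A", "bacterium_B",
                                "archaeon_C", "eukaryote_D"},
    "patchy_enzyme_like": {"bacterium_A", "archaeon_C"},
    "lineage_specific_like": {"eukaryote_D"},
}

if __name__ == "__main__":
    for name, pattern in phylogenetic_patterns(FAMILIES, GENOMES).items():
        print(f"{name}\t{pattern}")
    print("universal:", universal_families(FAMILIES, GENOMES))
```

In this toy example, a family present in one bacterium and one archaeon but absent elsewhere yields a patchy pattern, which is the kind of distribution that gene loss or horizontal transfer would have to explain.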

From a long-term perspective, one of the main results of the “genomic revolution” is that it provides us with the data we need to address fundamental problems of evolutionary biology at an empirical level. To illustrate this approach, I will describe a simple, genome-wide test of a central prediction of the neutral theory of molecular evolution regarding the constancy of the rate of amino acid substitution in different partitions of phylogenetic trees. The analysis was performed with sets of orthologous proteins that represent completely and partially sequenced genomes of bacteria, archaea and eukaryotes. Consistent with the prediction of the neutral theory, rates of amino acid sequence evolution are significantly correlated on different branches of phylogenetic trees for the great majority of orthologous protein clusters analyzed from all three domains of life. However, approximately 1% of the proteins from each genome deviate from this pattern and instead show variation that is consistent with an acceleration of the rate of amino acid substitution due to functional diversification between orthologs. The proteins that show anomalous variation are largely membrane-associated and secreted proteins that are implicated in defense and environmental interactions and for which positive selection appears to be a plausible mode of evolution.
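The logic of such a test can be sketched in a few lines. The sketch assumes that, for every orthologous cluster, an evolutionary distance has already been estimated on two different branches of the tree. If each protein evolves at its own roughly constant rate, the two distances should be correlated across clusters, and clusters with a strongly deviating ratio are candidates for rate acceleration. The correlation of log distances and the outlier rule based on the median absolute deviation are illustrative choices, not necessarily the statistics used in the analysis described here, and all names and numbers are invented.

```python
import math
from statistics import median
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def flag_outliers(dist_a: Dict[str, float], dist_b: Dict[str, float],
                  threshold: float = 3.0) -> List[str]:
    """Flag clusters whose log ratio of branch distances is anomalous.

    A cluster is flagged when its log ratio lies more than `threshold`
    robust standard deviations (1.4826 * MAD) away from the median.
    """
    names = sorted(dist_a)
    log_ratio = {n: math.log(dist_a[n] / dist_b[n]) for n in names}
    center = median(log_ratio.values())
    mad = median(abs(v - center) for v in log_ratio.values())
    scale = 1.4826 * mad
    if scale == 0:
        return []  # no spread to measure deviations against
    return [n for n in names if abs(log_ratio[n] - center) / scale > threshold]


# Toy distances (substitutions per site), invented for illustration only.
DIST_A = {"c1": 0.10, "c2": 0.21, "c3": 0.39, "c4": 0.82, "c5": 0.15, "c6": 0.30}
DIST_B = {"c1": 0.10, "c2": 0.20, "c3": 0.40, "c4": 0.80, "c5": 0.90, "c6": 0.33}

if __name__ == "__main__":
    names = sorted(DIST_A)
    r = pearson([math.log(DIST_A[n]) for n in names],
                [math.log(DIST_B[n]) for n in names])
    print(f"correlation of log distances: {r:.3f}")
    print("anomalous clusters:", flag_outliers(DIST_A, DIST_B))
```

Working with log ratios makes the rule symmetric with respect to which branch is accelerated, and the median-based scale keeps a few anomalous clusters from inflating the threshold that is meant to detect them.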