Origins, evolution, and phenotypic impact of new genes
Henrik Kaessmann
Genome Res. 2010 20: 1313-1326; originally published online July 22, 2010
Access the most recent version at doi:10.1101/gr.101386.109

References: This article cites 123 articles, 48 of which can be accessed free at: http://genome.cshlp.org/content/20/10/1313.full.html#ref-list-1
Article cited in: http://genome.cshlp.org/content/20/10/1313.full.html#related-urls
To subscribe to Genome Research go to: http://genome.cshlp.org/subscriptions
Copyright © 2010 by Cold Spring Harbor Laboratory Press
Downloaded from genome.cshlp.org on June 20, 2011 - Published by Cold Spring Harbor Laboratory Press
Review

Origins, evolution, and phenotypic impact of new genes

Henrik Kaessmann
Center for Integrative Genomics, University of Lausanne, CH-1015 Lausanne, Switzerland

Ever since the pre-molecular era, the birth of new genes with novel functions has been considered to be a major contributor to adaptive evolutionary innovation. Here, I review the origin and evolution of new genes and their functions in eukaryotes, an area of research that has made rapid progress in the past decade thanks to the genomics revolution. Indeed, recent work has provided initial whole-genome views of the different types of new genes for a large number of different organisms. The array of mechanisms underlying the origin of new genes is compelling, extending way beyond the traditionally well-studied source of gene duplication. Thus, it was shown that novel genes also regularly arose from messenger RNAs of ancestral genes, protein-coding genes metamorphosed into new RNA genes, genomic parasites were co-opted as new genes, and that both protein and RNA genes were composed from scratch (i.e., from previously nonfunctional sequences). These mechanisms then also contributed to the formation of numerous novel chimeric gene structures. Detailed functional investigations uncovered different evolutionary pathways that led to the emergence of novel functions from these newly minted sequences and, with respect to animals, attributed a potentially important role to one specific tissue, the testis, in the process of gene birth. Remarkably, these studies also demonstrated that novel genes of the various types significantly impacted the evolution of cellular, physiological, morphological, behavioral, and reproductive phenotypic traits. Consequently, it is now firmly established that new genes have indeed been major contributors to the origin of adaptive evolutionary novelties.

What is the nature of mutations underlying adaptive evolutionary innovations?
In addition to subtle genetic modifications of preexisting ancestral genes that can lead to differences in their (protein or RNA) sequences or activities, new genes with novel functions may have significantly contributed to the evolution of lineage- or species-specific phenotypic traits. Consequently, the process of the "birth" and evolution of novel genes has attracted much attention from biologists in the past. Indeed, quite remarkably, considerations pertaining to the origin and functional fate of new genes trace back to a time when the molecular nature of genes had not yet been established. Based on cytological observations of chromosomal duplications, Haldane (1933) and Muller (1935) already hypothesized in the 1930s that new gene functions may emerge from refashioned copies of old genes, highlighting for the first time the potential importance of gene duplication for the process of new gene origination.

The early notions that gene duplication provides a significant reservoir for the emergence of genes and hence phenotypic adaptation have now been globally confirmed (but also refined) based on numerous large- and small-scale molecular studies that were facilitated by the genomics revolution. New duplicate genes have been shown to be abundant in all eukaryotic genomes sequenced to date and to have evolved pivotal functional roles (Lynch 2007).

However, studies from the genomics era have also accelerated the discovery of fascinating novel mechanisms underlying the emergence of new genes. These include the origin of new protein-coding and RNA genes "from scratch" (that is, from previously nonfunctional genomic sequences), various types of gene fusions, and the formation of new genes from RNA intermediates. It is now well established that all of these mechanisms have significantly contributed to functional genome evolution and phenotypic change, which further underscores the importance of novel genes for organismal evolution.
In this review, I discuss in detail the different genomic sources of new genes in eukaryotes (with a particular emphasis on animals) and assess their relative contributions and functional implications in different species and evolutionary lineages. I also examine how new protein or RNA functions may evolve from newly minted gene structures and discuss the associated selective forces. I then discuss a hypothesis that suggests a key role of one tissue, the testis, in the establishment of new functional genes. Finally, I highlight recent new developments in the field and identify potential future research directions. Notably, I focus on recent developments in this review, while referring to previous reviews and other literature for details pertaining to long-established concepts and earlier findings.

Gene duplication: raw material for the emergence of new genes

Gene duplication is a very common phenomenon in all eukaryotic organisms (but also in prokaryotes; for review, see Romero and Palacios 1997) that may occur in several different ways (Lynch 2007). Traditionally, DNA-mediated duplication mechanisms have been considered and widely studied in this context, although peculiar intronless duplicate gene copies may also arise from RNA sources (see further below). DNA duplication mechanisms include small-scale events, such as the duplication of chromosomal segments containing whole genes or gene fragments (termed segmental duplication), which are essentially outcomes of misguided recombination processes during meiosis (Fig. 1A). However, they also include duplication of whole genomes through various polyploidization mechanisms (Lynch 2007; Conant and Wolfe 2008; Van de Peer et al. 2009). Thus, duplicate gene copies can arise in many different ways. But what is their functional fate and evolutionary relevance?

E-mail: Henrik.Kaessmann@unil.ch. Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.101386.109.
Genome Research 20:1313–1326 © 2010 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/10; www.genome.org
Gene duplication and new gene functions

At least since a famous monograph, authored by Susumu Ohno, was published over 40 yr ago (Ohno 1970), the word has spread that gene duplication may underlie the origin of many or even most novel genes and hence represents an important process for functional innovation during evolution. Essentially, and consistent with earlier ideas (Haldane 1933; Muller 1935), Ohno emphasized that the presence of a second copy of a gene would open up unique new opportunities in evolution by allowing one of the two duplicate gene copies to evolve new functional properties, whereas the other copy is preserved to take care of the ancestral (usually important) function (the concept of neofunctionalization). Ohno also noted that duplicate genes can be preserved by natural selection for gene dosage, thus allowing an increased production of the ancestral gene product (Ohno 1970). Finally, it should be emphasized that it has been widely agreed for a long time that the most probable fate of a duplicate gene copy is pseudogenization (Ohno 1972) and that hence the majority of duplicate gene copies are eventually lost from the genome.

While these fundamental hypotheses have been confirmed by a large body of data, they have since also been significantly extended and refined. In particular, in addition to the process of neofunctionalization (i.e., the emergence of new functions from one copy, which is Ohno's basic concept), it was proposed that the potentially multiple functions of an ancestral gene may be partitioned between the two daughter copies. This process was dubbed "subfunctionalization" and may be shaped by natural selection or involve purely neutral processes (Force et al. 1999; Conant and Wolfe 2008; Innan and Kondrashov 2010). Global genomic screens combined with detailed experimental scrutiny have uncovered numerous intriguing examples for each of these models in many organisms, solidly supporting their validity.
Detailed analyses of young duplicate genes have been particularly informative, because many of the details associated with the emergence of new genes from gene duplicates become obscured over longer periods of time (Long et al. 2003). A particularly illustrative case of neofunctionalization, arguably the most intriguing fate of a duplicate gene, occurred in the course of the recent duplication of a pancreatic ribonuclease gene in leaf-eating monkeys. Zhang et al. demonstrated that after duplication in an African leaf-eating monkey, the protein encoded by one of the copies of the ancestral RNASE1 gene rapidly adapted at specific sites to derive nutrients from bacteria in the foregut under the influence of strong positive selection (Zhang et al. 2002). Remarkably, both the duplication and subsequent adaptation of this gene were later shown to have occurred independently in a very similar manner in an Asian leaf-eating monkey (Zhang 2006). Thus, these RNASE1 duplications represent striking cases of convergent molecular evolution. They were likely facilitated by the frequent occurrence of segmental duplication, which allows similar duplication events that are highly beneficial to be repeatedly fixed during evolution. More generally, the convergent RNASE1 duplications are in line with several other recent reports that include other cases of new gene formation (see below) and therefore lend further support to the more general idea that adaptive genome evolution is, to some extent, predictable (Stern and Orgogozo 2009). Numerous other classical or recent examples from diverse organisms could be discussed here that illustrate the immense potential that DNA-based gene duplication has held for phenotypic evolution in different organisms (for reviews, see Li 1997; Long et al. 2003; Zhang 2003; Lynch 2007; Conant and Wolfe 2008).

Figure 1. Origin of new gene copies through gene duplication. (A) DNA-based duplication. A common type of segmental duplication, tandem duplication, is shown. It may occur via unequal crossing-over that is mediated by transposable elements (light green). There are different fates of the resulting duplicate genes. For example, one of the duplicates may acquire new functions by evolving new expression patterns and/or novel biochemical protein or RNA functions (see main text for details). (Gold and blue boxes) Exons, (black connecting lines) exon splicing, (red right-angled arrows) transcriptional start sites (TSSs), (gray tubes) nonexonic chromatin. (B) RNA-based duplication (termed retroposition or retroduplication). New retroposed gene copies may arise through the reverse transcription of messenger RNAs (mRNAs) from parental source genes. Functional retrogenes with new functional properties may evolve from these copies after acquisition or evolution of promoters in their 5′ flanking regions that may drive their transcription. (Pink right-angled arrow) TSS, (transparent pink box) additionally transcribed flanking sequence at the insertion site.
Duplication of noncoding RNAs

Suffice it to add in this review that studies pertaining to the origin of novel genes from duplicated DNA segments have begun to be extended beyond the traditionally studied protein-coding genes, thanks to the rapid recent advances in the genomics field. For example, it has become clear that microRNAs (miRNAs), small RNA molecules that have emerged as major post-transcriptional regulators (Carthew and Sontheimer 2009), have expanded and functionally diversified during evolution by gene duplication (Hertel et al. 2006). Interestingly, several individual studies indicate that the X chromosome may provide a particularly fruitful ground for the origination of new lineage-specific miRNAs (Zhang et al. 2007; Devor and Samollow 2008; Murchison et al. 2008; Guo et al. 2009), a pattern that may be explained by the specific sex-related forces that have shaped the X, given that new X-born miRNAs appear to be predominantly expressed in male-reproductive tissues. Segmental gene duplication also seems to play a major role in the expansion of another class of small RNAs, Piwi-interacting RNAs (piRNAs; Malone and Hannon 2009), which are expressed in the germline and are thought to be mainly involved in transposon control. A recent study revealed that piRNA clusters rapidly expanded through segmental duplication in primate and rodent genomes, a process driven by intense positive selection (Assis and Kondrashov 2009). Segmental duplication therefore provides an efficient vehicle for the expansion of piRNA repertoires and hence allows organisms to swiftly evolve protection barriers against the lineage-specific expansion of transposable elements. There is so far little evidence for duplication of sequences transcribed into long noncoding RNAs (lncRNAs), an abundant class of nontranslated RNAs (>200 nucleotides [nt] in length), whose functional impact is only beginning to be understood (Mercer et al. 2009; Ponting et al. 2009).
The paucity of known duplicated lncRNA genes is perhaps mainly due to their rapid sequence divergence, which may render the detection of such events difficult. Future work, which will benefit from the rapidly accumulating genomic and transcriptomic data, will clarify the role of gene duplication in the evolution of new lncRNA genes with altered or novel functions.

Global patterns

In spite of the numerous well-founded examples of functionally important newly minted genes that arose from duplicate gene copies, a more global picture of the functional relevance and adaptive value of the large number of duplicate gene copies scattered in genomes is only beginning to emerge. Only for some whole-genome duplication (WGD) events in model organisms (in particular yeast) have global assessments of the relevance of duplicate genes for the emergence of new gene functions been attempted (Conant and Wolfe 2008). However, WGD represents a special case of gene duplication, which involves specific selective pressures related to dosage balance of gene products that seem to significantly influence the fate of resulting gene duplicates. And even in the case of WGD, it remains largely unclear whether gene duplications often conferred novel functions or not (Conant and Wolfe 2008).

Thus, a more global understanding of the implications of gene duplication for the emergence of new gene functions and its importance relative to other mutational mechanisms that affect preexisting genes will have to await future efforts. However, a closer examination of the reported general distributions and characteristics of gene duplicates in different genomes is nevertheless instructive. For example, analyses of fully sequenced genomes have revealed high rates of origin but also loss of duplicate genes (Lynch and Conery 2003; Demuth and Hahn 2009).
New duplicates are estimated to be “born” at the rate of ~0.001–0.01 per gene per million years in eukaryotes (Lynch and Conery 2003; Lynch 2007), while the death rate of duplicates is at least an order of magnitude higher, consistent with the early notion (see above) that the fate of most duplicates is pseudogenization (Ohno 1972). Notably, not all functional categories of genes are equally prone to expand by duplication. In particular, a relatively small number of gene families (1.6%–3%) with functions in, for example, immunity, host defense, chemosensation, and reproduction, show rapid, selectively driven copy number changes in various eukaryotic lineages, thus significantly contributing to their adaptive evolution (Emes et al. 2003; Demuth and Hahn 2009).

However, in addition to these commonalities, detailed whole-genome investigations also suggest intriguing fundamental differences with respect to the generation and functional fate of duplicates in different evolutionary lineages. For example, careful analyses in primates revealed a burst of segmental gene duplication in hominoids (humans and apes), especially in humans and the African apes (Marques-Bonet and Eichler 2009). Notably, many of these duplicates are dispersed and mediate major genomic rearrangements associated with disease. The accelerated fixation rate of segmental duplicons in hominoids could, in principle, be explained by the selective benefit of newly formed genes embedded within these regions, which outweigh deleterious effects in many cases (Marques-Bonet et al. 2009b). New gene formation in hominoids indeed seems to have profited from the substantial raw material provided by massive segmental duplication (Marques-Bonet et al. 2009b; see below). However, the overall accelerated fixation rate of segmental duplicons in humans and apes is probably best explained by the reduction of the effective population size in the hominoid lineage.
This reduction increased genetic drift and, at the same time, rendered purifying selection less efficient, thus probably allowing disproportionately high numbers of slightly deleterious segmental duplications to be fixed in hominoids compared with other species with larger long-term effective population sizes (and hence more efficient selection). This hypothesis is consistent with other types of molecular evolutionary data (Keightley et al. 2005; Gherman et al. 2007).

In addition to lineage-specific selection intensities, differences pertaining to the mutational basis of gene duplication can lead to different characteristics of segmental duplications between species. A good example is the finding that, in contrast to humans, recently duplicated chromosomal regions in the mouse are depleted in genes and transcripts (She et al. 2008). Detailed analyses suggest that species-specific distributions of retrotransposons, which represent major promoters of segmental duplication events (Marques-Bonet et al. 2009a), account for much of this discrepancy.

RNA-based duplication and the emergence of “stripped-down” new genes

As outlined above, the traditionally studied DNA-mediated gene duplication mechanisms have significantly contributed to functional genome evolution and have provided many fundamental insights regarding new gene origination. However, new gene copies can also arise through an alternative, less well known duplication mechanism termed retroposition or retroduplication (Brosius 1991; Long et al. 2003; Kaessmann et al. 2009). In this mechanism, a mature messenger RNA (mRNA) that is transcribed
from a “parental” source gene is reverse transcribed into a complementary DNA copy, which is then inserted into the genome (Fig. 1B). The enzymes necessary for retroposition (in particular the reverse transcriptase) are encoded by different retrotransposable elements in different species. In mammals, LINE-1 retrotransposons provide the required enzymatic machinery (Mathias et al. 1991; Feng et al. 1996; Esnault et al. 2000). Given that the resulting intronless retroposed gene copies (retrocopies) only contain the parental exon information (i.e., they usually lack parental introns and core promoter sequences), retrocopies were long thought to be consigned to the scrapheap of genome evolution and were routinely labeled as “processed pseudogenes” (Mighell et al. 2000). However, after anecdotal findings of individual functional retrocopies (so-called retrogenes) in the 1980s and 1990s, a surprising number of retrogenes could be discovered with the advent of the genomics era. Notably, detailed analyses of this stripped-down type of new genes have revealed previously unknown mechanisms underlying the appearance of new genes and their functions and demonstrated that new retrogenes have contributed to the appearance of lineage-specific phenotypic innovations (Kaessmann et al. 2009).

Sources of regulatory elements

The observation of numerous functional retrogenes in various genomes (detailed below) immediately raises the question of how retrocopies can obtain regulatory sequences that allow them to become transcribed, a precondition for gene functionality. Studies that sought to address this question uncovered various sources of retrogene promoters and regulators and therefore also provided general insights into how new genes can acquire promoters and evolve new expression patterns (Kaessmann et al. 2009). First, it was shown that the expression of new retrogenes often benefits from preexisting regulatory machinery and expression capacities of genes in their vicinity.
Thus, retrogenes profited from the open chromatin state and accessory regulators (enhancers/silencers) of nearby genes, directly fused to host genes into which they inserted (also see below), or captured bidirectional promoters of genes in their proximity (Vinckenbosch et al. 2006; Fablet et al. 2009; Kaessmann et al. 2009). Second, retrogenes recruited CpG dinucleotide-enriched proto-promoter sequences in their genomic vicinity not previously associated with other genes for their transcription (Fablet et al. 2009). Third, retrotransposons upstream of retrocopy insertion sites were shown to have provided retrogenes with regulatory potential (Zaiss and Kloetzel 1999; Fablet et al. 2009). Fourth, unexpectedly, retrogenes also seem to frequently have directly inherited alternative promoters embedded in parental transcripts that gave rise to them (Okamura and Nakai 2008; Kaessmann et al. 2009). Finally, basic retrogene promoters may sometimes have evolved de novo through small substitutional changes under the influence of natural selection (Betran and Long 2003; Bai et al. 2007). Remarkably, the process of promoter acquisition sometimes involved the evolution of new 5′ untranslated exon–intron structures, which span the often substantial distances between the recruited promoters and retrogene insertion sites (Fablet et al. 2009).

New retrogene functions

Given that retrocopies usually need to acquire regulatory elements for their transcription, retrocopies that eventually do become transcribed, a surprisingly frequent event (Vinckenbosch et al. 2006), are much more prone to evolve novel functions (and less likely to be redundant) than gene copies arising from DNA-based duplication mechanisms. Indeed, a number of new retrogenes with intriguing functions have been identified. Detailed analyses of these retrogenes uncovered novel mechanisms underlying the emergence of new gene functions.
For example, analyses of young retrogenes in primates not only revealed that retrogenes have contributed to hominoid brain evolution, but also identified different molecular levels at which new genes may adapt to new functions. Namely, in addition to evolving new spatial expression patterns relative to the parental source genes, the proteins encoded by these retrogenes evolved new biochemical properties (Burki and Kaessmann 2004) and/or subcellular localization patterns (Burki and Kaessmann 2004; Rosso et al. 2008a,b). The latter process, dubbed subcellular adaptation or relocalization, could be established and generalized as a new trajectory for the evolution of new gene functions after these observations (Marques et al. 2008; Kaessmann et al. 2009).

Other interesting retrogenes have recently been unveiled that exemplify the sometimes unexpected and curious pathways of evolutionary change. An example is a mouse retrocopy of a ribosomal protein gene (Rps23), of which there are hundreds in mammalian genomes and that usually represent nonfunctional retropseudogenes, consistent with the idea that duplication of these genes is usually redundant and/or is subject to dosage balance constraints. Yet the Rps23 retrocopy evolved a completely new function, not by changes in the protein-coding sequence, but by being transcribed from the reverse strand and the incorporation of sequences flanking its insertion site as new (coding and noncoding) exons (Zhang et al. 2009). This gave rise to a new protein (completely unrelated to that encoded by its parental gene), which had profound functional implications in that it conferred increased resistance in mice against the formation of Alzheimer-causing amyloid plaques.

Another intriguing recent case of new retrogene formation illustrates the far-reaching and immediate phenotypic consequences a retroduplication event may have. Parker et al.
(2009) found that a retrocopy derived from a growth factor gene (fgf4) is solely responsible for the short-legged phenotype characteristic of several common dog breeds. Remarkably, the phenotypic impact of the fgf4 retrogene seems to be a rather direct consequence of the gene dosage change associated with its emergence (i.e., increased FGF4 expression during bone development), given that its coding sequence is identical to that of its parental gene. The analysis of fgf4 in dogs thus strikingly illustrates that gene duplication can immediately lead to phenotypic innovation (in this case a new morphological trait) merely through gene dosage alterations.

Retrogenes and meiotic sex chromosome inactivation

Numerous other illuminating cases of retrogenes known to have evolved diverse functions in species ranging from primates and flies to plants have recently been described (for review, see Kaessmann et al. 2009). However, global surveys of retroposition conducted in mammals and fruit flies have also identified a common theme uniting a significant subset of new retrogenes in these species: expression and functionality in testes. While these retrogenes seem to have evolved a variety of functional roles (a process that may have a mechanistic basis and was likely influenced by sexual selection, see below), the functions of a disproportionately high number among them are apparently associated with the transcriptional inactivation of the sex chromosomes in the male germline during and (to a lesser extent) after meiosis (Turner 2007).