lipid biosynthesis;rec/,single-strand-specific endonuclease RecJ,which is another recombination protein. 2.3.3 Eukaryotic genomes 2.3.3.1 Sturcture characteristic.The human genome is split into two components: the nuclear genome and the mitochondrial genome.This is the typical pattern for ekryoes,the bulk ofthe genomeingcndn the hromome cell nucleus and a much smaller dria and in the case of photosynthetic organisms,in the chloroplasts Numerous differences exist between prokaryotic and eukaryotic cellular and genomic organization.The most striking differences are the following: 1.The eukaryotic gene cell contains a membranous nucleus which includes the chromosomes as carriers of genes.However,there is a chromosome in prokaryotic organisms. 2.The chromosomal DNA of eukaryotes is packaged into a highly regular nucleoprotein complex called chromatin. 3.In eukaryotes(particularly those with a large genome)many DNA segments are repeated hundreds,thousands,or even millions of times.The function of these highly repeated -if there any tobe elucided.In contrast,mos prokaryotic genomes contain essentially no repetitive sequences,except for a few repeats of ribosomal RNA(rRNA)and transfer RNA(tRNA genes. 4.A typical eukaryotic gene is monocistronic(i.e.,one transcription unit usually encodes one translation unit)prokarvotes however often have several translation anized in a single omp e tra ription unit(polyc on units of one operon usua lly enc teins involved n the same regulatory pathway.In contrast,eukaryotic genes,even if they belong to the same regulatory pathway,are usually not grouped closed on the same part of the chromosome 5 a characteristic feature of m ost eukaryotic genes is that they are split( amino acid s q gene product is us lly not c olline rwith the information).The noncoding sequences(socalled introns)need to be"spliced out"in a complicated process that generates mature messenger RNA(mRNA). 2.3.3.2 C-value paradox.The genome of most prokaryotes consists of one chromosome,while most eukaryotes have a diploid number of chromosomes.A genome is the infor in an loid chr ne set.The total amount of DNA in the haploid genome of a species is its Cvalue.There is normous variation in the range of C-values.from <106 bp for a mycoplasma to >1011 bp for some plants and amphibians. We can see the steady increase in genome size with complexity in the listing in Figure24 of some of the most comm nly analyzed org anism s.It is necessary to increase the genome size in order to make in sects,birds or amphibians,and mammal However,after this point there is no good relationship between genome size and morphological complexity of the organism. 6
16 lipid biosynthesis; recJ, single-strand-specific endonuclease RecJ, which is another recombination protein. 2.3.3 Eukaryotic genomes 2.3.3.1 Sturcture characteristic. The human genome is split into two components: the nuclear genome and the mitochondrial genome. This is the typical pattern for most eukaryotes, the bulk of the genome being contained in the chromosomes in the cell nucleus and a much smaller part located in the mitochondria and, in the case of photosynthetic organisms, in the chloroplasts. Numerous differences exist between prokaryotic and eukaryotic cellular and genomic organization. The most striking differences are the following: 1. The eukaryotic gene cell contains a membranous nucleus which includes the chromosomes as carriers of genes. However, there is a chromosome in prokaryotic organisms. 2. The chromosomal DNA of eukaryotes is packaged into a highly regular nucleoprotein complex called chromatin. 3. In eukaryotes( particularly those with a large genome) many DNA segments are repeated hundreds, thousands, or even millions of times. The function of these highly repeated sequences –if there is any –has jet to be elucided. In contrast, most prokaryotic genomes contain essentially no repetitive sequences, except for a few repeats of ribosomal RNA (rRNA) and transfer RNA (tRNA ) genes. 4. A typical eukaryotic gene is monocistronic (i.e., one transcription unit usually encodes one translation unit). Prokaryotes, however, often have several translation units organized in a single operon comprising one transcription unit (polycistronic gene).The translation units of one operon usually encode different proteins involved in the same regulatory pathway. In contrast, eukaryotic genes, even if they belong to the same regulatory pathway, are usually not grouped closed on the same part of the chromosome. 5. A characteristic feature of most eukaryotic genes is that they are split (.i.e., the amino acid sequence of the gene product is ususlly not collinear with the genetic information). The noncoding sequences (socalled introns) need to be “spliced out” in a complicated process that generates mature messenger RNA (mRNA). 2.3.3.2 C-value paradox. The genome of most prokaryotes consists of one chromosome, while most eukaryotes have a diploid number of chromosomes. A genome is the information in one complete haploid chromosome set. The total amount of DNA in the haploid genome of a species is its C value. There is enormous variation in the range of C-values, from <106 bp for a mycoplasma to >1011 bp for some plants and amphibians. We can see the steady increase in genome size with complexity in the listing in Figure 2.4 of some of the most commonly analyzed organisms. It is necessary to increase the genome size in order to make insects, birds or amphibians, and mammals. However, after this point there is no good relationship between genome size and morphological complexity of the organism
Useful genome sizes Phylum Species Genome (bp salina 66 mold 7 8.0 aster phibian Mammal Figure 2.4 The genome sizes of some common experimental animals. We know that genes are much larger than the sequences needed to code for proteins,because exons(coding regions)may comprise only a small part of the total length of a gene).This explains why there is much more DNA than is needed to provide reading frames for all the proteins of the organism.Large parts of an e e may not be cor cerned with coding for protein And there may also be ant ler overall size of the genome anything about the number of genes. The C-value paradox refers to the lack of correlation between genome size and genetic complexity.There are some extremely curious variations in relative genome ize The toad xe mopus and man have genomes of essentially the same size.But we that man ecomple: terms o tic deve nt!And in som phyla there are extremely large variations in DNA content between organisms that do not vary much in complexity.(This is especially marked in insects,amphibians,and plants,but does not occur in birds,reptiles,and mammals,which all show little variation within the group,with an~2x range of genome sizes.)A cricket has a of a fruit fly In amp hibians smallest g nome are <109 bp er1011bp.There is unlikely to bealarn number of genes needed to specify these amphibians.We do not understand why natural selection allows this variation and whether it has evolutionary consequences 2.3.3.3 DNA types.The general nature of the eukaryotic genome can be assessed by the kinetics ofr fdenatured DNA.This techni que was used extensively before large scale DNA seq ncing became possi between complementary sequences of DNA occurs by base pairing.This reverses the process of denaturation by which they were separated.The kinetics of the reassociation reaction reflect the variety of sequences that are present;so the reaction can be used to quantitate genes and their RNA products.Reassociation kinetics identify two ge types sequences rep titive DNA consists of sequences that are unique:there is only one copy in a haploid genome; Repetitive DNA describes sequences that are present in more than one copy in each genome. Repetitive dna is often divided into two general types:moderately repetitive
17 Figure 2.4 The genome sizes of some common experimental animals. We know that genes are much larger than the sequences needed to code for proteins, because exons (coding regions) may comprise only a small part of the total length of a gene). This explains why there is much more DNA than is needed to provide reading frames for all the proteins of the organism. Large parts of an interrupted gene may not be concerned with coding for protein. And there may also be significant lengths of DNA between genes. So it is not possible to deduce from the overall size of the genome anything about the number of genes. The C-value paradox refers to the lack of correlation between genome size and genetic complexity. There are some extremely curious variations in relative genome size. The toad xenopus and man have genomes of essentially the same size. But we assume that man is more complex in terms of genetic development! And in some phyla there are extremely large variations in DNA content between organisms that do not vary much in complexity. (This is especially marked in insects, amphibians, and plants, but does not occur in birds, reptiles, and mammals, which all show little variation within the group, with an ~2× range of genome sizes.) A cricket has a genome 11× the size of a fruit fly. In amphibians, the smallest genomes are <109 bp, while the largest are ~1011 bp. There is unlikely to be a large difference in the number of genes needed to specify these amphibians. We do not understand why natural selection allows this variation and whether it has evolutionary consequences. 2.3.3.3 DNA types. The general nature of the eukaryotic genome can be assessed by the kinetics of reassociation of denatured DNA. This technique was used extensively before large scale DNA sequencing became possible. Reassociation between complementary sequences of DNA occurs by base pairing. This reverses the process of denaturation by which they were separated. The kinetics of the reassociation reaction reflect the variety of sequences that are present; so the reaction can be used to quantitate genes and their RNA products. Reassociation kinetics identify two general types of genomic sequences: Nonrepetitive DNA consists of sequences that are unique: there is only one copy in a haploid genome; Repetitive DNA describes sequences that are present in more than one copy in each genome. Repetitive DNA is often divided into two general types: Moderately repetitive
DNA consists of relatively short sequences that are repeated typically 10-1000x in the genome.The sequences are dispersed throughout the genome,and are responsible for the high degree of structure formation in pre-mRNA,when(inverted) repeats in the introns pair to form duplex regions Highly repetitive DNA consists of very short sequences (typically <100 bp)that are present many thousands of times in the genome,often organized as long tandem repeats.Neither class represents protein. Tandemly rep eated DNA is a mon feature of eukaryotic nes but is found because DNA fragments containing tandemly repeated sequences form 'satellite bands when genomic dna is fractionated by density gradient centrifugation for example,when broken into fragments 50-100 kb in length,human DNA forms a main band(buoyant density 1.701gcm)and three satellite bands(1.687,1.693 and 1.697 g cm).The main n band DNA fragments made up mostly of -copy sequences with GC compositions close to 40.3%,the average value for the human genome.The satellite bands contain fragments of repetitive DNA.and hence have Go contents and buoyant densities that are atypical of the genome as a whole. Satellite DNA is found at centromeres and elsewhere in eukaryotic chromosomes The satellite bands in density gradients of eukary otic DNA re mad eup of fragment composed of long series of tandem repeats,possibly hundreds of kb in length.A single genome can contain several different types of satellite DNA,each with a different repeat unit,these units being anything from<5 to>200 bp.The three satellite bands in human DNA include at least four different repeat types. minisatellite and microsatellites Alth ugh not a g in satellite bands on densit gradie 1S. tw other types of tandem nly repeate d DNA are also ed a satellite'DNA.These are minisatellites and microsatellites.Minisatellites form clusters up to 20 kb in length,with repeat units up to 25 bp;microsatellite clusters are shorter,usually <150 bp,and the repeat unit is usually 13 bp or less. Minisatellite DNA is a second type of repetitive DNA that we are already familiar with its associatio with struct ral features mos e Telomeric DNA,which in humans comprises hundreds of copies of the motif 5-TTAGGG-3'is an example of a minisatellite.We know a certain amount about how telomeric DNA is formed,and we know that it has an important function in DNA replication.In addition to telomeric minisatellites,some eukaryotic genomes contain arious other cluste atellite dna although not all,near the ends of chromosomes.The functions of these other minisatellite sequences have not been identified The function of microsatellites is equally mysterious.The typical microsatellite consists of a 1-,2-,3-or 4-bp unit repeated 10-20 times,as illustrated by the microsatellites in the human BT-cell receptor locus.Although each microsatellite is reatively short,thereare many of themn the genome In humns,for exampe microsatellites with a CA repeat,such as. 5-CACACACACACACAC-3 3'-GTGTGTGTGTGTGTG-5 1
18 DNA consists of relatively short sequences that are repeated typically 10-1000× in the genome. The sequences are dispersed throughout the genome, and are responsible for the high degree of secondary structure formation in pre-mRNA, when (inverted) repeats in the introns pair to form duplex regions. Highly repetitive DNA consists of very short sequences (typically <100 bp) that are present many thousands of times in the genome, often organized as long tandem repeats. Neither class represents protein. Tandemly repeated DNA is a common feature of eukaryotic genomes but is found much less frequently in prokaryotes. This type of repeat is also called satellite DNA because DNA fragments containing tandemly repeated sequences form ‘satellite' bands when genomic DNA is fractionated by density gradient centrifugation. For example, when broken into fragments 50–100 kb in length, human DNA forms a main band (buoyant density 1.701 g cm-3) and three satellite bands (1.687, 1.693 and 1.697 g cm-3). The main band contains DNA fragments made up mostly of single-copy sequences with GC compositions close to 40.3%, the average value for the human genome. The satellite bands contain fragments of repetitive DNA, and hence have GC contents and buoyant densities that are atypical of the genome as a whole. Satellite DNA is found at centromeres and elsewhere in eukaryotic chromosomes. The satellite bands in density gradients of eukaryotic DNA are made up of fragments composed of long series of tandem repeats, possibly hundreds of kb in length. A single genome can contain several different types of satellite DNA, each with a different repeat unit, these units being anything from < 5 to > 200 bp. The three satellite bands in human DNA include at least four different repeat types. Minisatellites and microsatellites. Although not appearing in satellite bands on density gradients, two other types of tandemly repeated DNA are also classed as ‘satellite' DNA. These are minisatellites and microsatellites. Minisatellites form clusters up to 20 kb in length, with repeat units up to 25 bp; microsatellite clusters are shorter, usually < 150 bp, and the repeat unit is usually 13 bp or less. Minisatellite DNA is a second type of repetitive DNA that we are already familiar with because of its association with structural features of chromosomes. Telomeric DNA, which in humans comprises hundreds of copies of the motif 5′-TTAGGG-3′, is an example of a minisatellite. We know a certain amount about how telomeric DNA is formed, and we know that it has an important function in DNA replication. In addition to telomeric minisatellites, some eukaryotic genomes contain various other clusters of minisatellite DNA, many, although not all, near the ends of chromosomes. The functions of these other minisatellite sequences have not been identified. The function of microsatellites is equally mysterious. The typical microsatellite consists of a 1-, 2-, 3- or 4-bp unit repeated 10–20 times, as illustrated by the microsatellites in the human β T-cell receptor locus. Although each microsatellite is relatively short, there are many of them in the genome . In humans, for example, microsatellites with a CA repeat, such as:
make up 0.25%of the genome,8 Mb in all.Single base-pair repeats such as: 3-110ii11T-3 make up another 0.15%.Although their function,if any,is unknown, microsatellites have proved very useful to geneticists.Many microsatellites are variable,meaning that the number of repeat units in the array is different in different members of a sp cies Clusters and repeats.Aset of genes descended by duplication and variaion from some ancestral gene is called a gene family.Its members may be clustered together or dispersed on different chromosomes(or a combination of both).Genome analysis shows that many genes belong to families;the 40,000 genes identified in the human genome fall into 15,000 families,so the average gene has a couple of relatives in the genome.Gene families vary enormously in degree of related members,from those consisting of multiple identical members to those where the relationship is quite distant.Genes are usually related only by their exons,with introns having diverged.Genes may also be related by only some of their exons,while others are unique.The members of a well-related structural gene family usually have related r at ugh they may be expre ssed at diffe ent timesorin ifferent globin proteins are expressed in embryonic and adul red blood cells,while different actins are utilized in muscle and nonmuscle cells. When genes have diverged significantly,or when only some exons are related,the proteins may have different functions.Some gene families consist of identical members.A gene cluster is a group of adjacent genes that are identical or related. site ntaining identity between genes,although clustered genes are not necessarily identical Gene clusters range from extremes where a duplication has generated two adjacent related genes to cases where hundreds of identical genes lie in a tandem array. Extensive tandem repetition of a gene may occur when the product is needed in unusually large unts.Exampl are the g gen es for rRNA or histone pr oteins This reates a special situation with regards to the maintenance of identity and the effects of selective pressure. 2.3.3.4 The interrupted gene.Eukaryotic genomes contain interrupted genes in which exons(represented in the final RNA product)alternate with introns(removed from the initial transcript).All types of eukaryotic genomes contain interrupted genes.The proporti interrupted genes is low in yeasts and increases in the low eukaryotes:few genes are uninterrupted in higher eukaryotes.Introns are found in al classes of eukaryotic genes.The structure of the interrupted gene is the same in all tissues,exons are joined together in RNA in the same order as their organization in DNA and the introns usually have no coding function introns are removed from RNA by splicing Some genes are expressed by alterative splicing g patte as an exon in others.Positions of introns are often conserved when the organization of homologous genes is compared between species.Intron sequences vary,and may even be unrelated.although exon sequences remain well related.The conservation of
19 make up 0.25% of the genome, 8 Mb in all. Single base-pair repeats such as: make up another 0.15%. Although their function, if any, is unknown, microsatellites have proved very useful to geneticists. Many microsatellites are variable, meaning that the number of repeat units in the array is different in different members of a species. Clusters and repeats. A set of genes descended by duplication and variation from some ancestral gene is called a gene family. Its members may be clustered together or dispersed on different chromosomes (or a combination of both). Genome analysis shows that many genes belong to families; the 40,000 genes identified in the human genome fall into ~15,000 families, so the average gene has a couple of relatives in the genome. Gene families vary enormously in the degree of relatedness between members, from those consisting of multiple identical members to those where the relationship is quite distant. Genes are usually related only by their exons, with introns having diverged. Genes may also be related by only some of their exons, while others are unique. The members of a well-related structural gene family usually have related or even identical functions, although they may be expressed at different times or in different cell types. So different globin proteins are expressed in embryonic and adult red blood cells, while different actins are utilized in muscle and nonmuscle cells. When genes have diverged significantly, or when only some exons are related, the proteins may have different functions. Some gene families consist of identical members. A gene cluster is a group of adjacent genes that are identical or related. Clustering is a prerequisite for maintaining identity between genes, although clustered genes are not necessarily identical. Gene clusters range from extremes where a duplication has generated two adjacent related genes to cases where hundreds of identical genes lie in a tandem array. Extensive tandem repetition of a gene may occur when the product is needed in unusually large amounts. Examples are the genes for rRNA or histone proteins. This creates a special situation with regards to the maintenance of identity and the effects of selective pressure. 2.3.3.4 The interrupted gene. Eukaryotic genomes contain interrupted genes in which exons (represented in the final RNA product) alternate with introns (removed from the initial transcript). All types of eukaryotic genomes contain interrupted genes. The proportion of interrupted genes is low in yeasts and increases in the lower eukaryotes; few genes are uninterrupted in higher eukaryotes. Introns are found in all classes of eukaryotic genes. The structure of the interrupted gene is the same in all tissues, exons are joined together in RNA in the same order as their organization in DNA, and the introns usually have no coding function. Introns are removed from RNA by splicing. Some genes are expressed by alternative splicing patterns, in which a particular sequence is removed as an intron in some situations, but retained as an exon in others. Positions of introns are often conserved when the organization of homologous genes is compared between species. Intron sequences vary, and may even be unrelated, although exon sequences remain well related. The conservation of
exons can be used to isolate related genes in different species.The size of a gene is determined primarily by the lengths of its introns.Introns become larger early in the higher eukaryotes,when gene sizes theref significantly The range of gene sizes in mammals is generally from -1kbbut it is possible to have even larger genes:the longest known case is dystrophin at 2000 kb.Some genes share only some of their exons with other genes,suggesting that they have been assembled by addition of exons representing individual modules of the protein.such modules have been inc orated int a variety of differe t proteins.The idea that genes nassembled by accretion of exons implies that introns were present in gene of primitive organisms.Some of the relationships between homologous genes can be explained by loss of introns from the primor. 2.3.3.5 Gene family.gene family is a set of genes with a known homology.They way into families. Phylogenetic techniques can be used as a more rigorous test.The positions of introns within the coding sequence can be used to infer common ancestry.Knowing the sequence ofthe protein encoded by a gene can allow researchers to apply methods that find similarities among protein sequences that provide more info ation than similarities or differences among DNA sequences.F rt ermore,knowledge of the protein's secondary structure gives further information about ancestry,since the organization of secondary structural elements presumably would be conserved even if the amino acid sequence changes considerably.These methods often rely upon predictions based upon the DNA sequence. If the ger nes of a gene family encode proteins,the term protein family is ofter used in an analogous manner to gene family 2.3.3.6 Non-coding sequences.In genetics,non-coding DNA describes DNA which does not contain instructions for making proteins(or other cell products such as noncoding RNAs).In eukaryotes,a large percentage of many organisms'total igma").Some nonco ing DNA is involved in regulating the ity of coding regions.However,much of this DNA has no known function and is sometimes referred to as "junk DNA". Recent evidence suggests that"junk DNA"may in fact be employed by proteins created from coding DNA.An experiment concerning the relationship between introns and coded vided evidence for a theory that"junk DNA"is just as of damaging a portion of noncoding DNA in a plant which resulted in a significant change in the leaf structure because structural proteins depended on information contained in introns.Some non-coding DNA can be a non phenotypical RNA virus historical relic. -coding RNA(ncRNA)is an RNA molecule that is not translated into a protein.A previously used synonym,particularly with bacteria,was small RNA (sRNA).However,some ncRNAs are very large Less-frequently used synonyms are non-messenger RNA(nmRNA),small non-messenger RNA(snmRNA),or functional RNA(fRNA).The DNA sequence from which a non-coding RNA is
20 exons can be used to isolate related genes in different species. The size of a gene is determined primarily by the lengths of its introns. Introns become larger early in the higher eukaryotes, when gene sizes therefore increase significantly. The range of gene sizes in mammals is generally from 1-100 kb, but it is possible to have even larger genes; the longest known case is dystrophin at 2000 kb. Some genes share only some of their exons with other genes, suggesting that they have been assembled by addition of exons representing individual modules of the protein. Such modules may have been incorporated into a variety of different proteins. The idea that genes have been assembled by accretion of exons implies that introns were present in genes of primitive organisms. Some of the relationships between homologous genes can be explained by loss of introns from the primor. 2.3.3.5 Gene family. gene family is a set of genes with a known homology. They are generally biochemically similar. Genes are categorized this way into families, depending on shared nucleotide or protein sequences. Phylogenetic techniques can be used as a more rigorous test. The positions of introns within the coding sequence can be used to infer common ancestry. Knowing the sequence of the protein encoded by a gene can allow researchers to apply methods that find similarities among protein sequences that provide more information than similarities or differences among DNA sequences. Furthermore, knowledge of the protein's secondary structure gives further information about ancestry, since the organization of secondary structural elements presumably would be conserved even if the amino acid sequence changes considerably. These methods often rely upon predictions based upon the DNA sequence. If the genes of a gene family encode proteins, the term protein family is often used in an analogous manner to gene family. 2.3.3.6 Non-coding sequences. In genetics, non-coding DNA describes DNA which does not contain instructions for making proteins (or other cell products such as noncoding RNAs). In eukaryotes, a large percentage of many organisms' total genome sizes is comprised of noncoding DNA (a puzzle known as the "C-value enigma"). Some noncoding DNA is involved in regulating the activity of coding regions. However, much of this DNA has no known function and is sometimes referred to as "junk DNA". Recent evidence suggests that "junk DNA" may in fact be employed by proteins created from coding DNA. An experiment concerning the relationship between introns and coded proteins provided evidence for a theory that "junk DNA" is just as important as coding DNA. This experiment consisted of damaging a portion of noncoding DNA in a plant which resulted in a significant change in the leaf structure because structural proteins depended on information contained in introns. Some non-coding DNA can be a non phenotypical RNA virus historical relic. A non-coding RNA (ncRNA) is any RNA molecule that is not translated into a protein. A previously used synonym, particularly with bacteria, was small RNA (sRNA). However, some ncRNAs are very large Less-frequently used synonyms are non-messenger RNA (nmRNA), small non-messenger RNA (snmRNA), or functional RNA (fRNA). The DNA sequence from which a non-coding RNA is