Reagent proteins are usually required in much lower amounts than target proteins. Some can even be purchased commercially in sufficient quantities to meet the need. Others, because of price or the quantity required, may necessitate recombinant expression. But since only small quantities are usually required (<10 mg), it is possible to choose an expression system with features that favor efficient and rapid expression, and the expression scale can be kept to a minimum. The bottom line is that reagent proteins should be the least resource-intensive to produce. One should avoid overproducing reagent proteins or scaling them up to quantities that will never be used.

Therapeutics

In contrast to reagent proteins, therapeutic protein agents are the most demanding in terms of resources. Like other drugs, therapeutic proteins have intrinsic biological activity. The ultimate objective for expression of a therapeutic protein is the production of clinical-grade protein approaching or exceeding gram-per-liter quantities. For most expression systems this is not readily achievable. Other than bacterial and yeast expression, the most robust system for producing these levels is the Chinese hamster ovary (CHO) system. Because bacteria and yeast lack proper post-translational modifications (e.g., glycosylation), CHO cell expression is often the only choice for achieving sufficient expression. Examples of therapeutic proteins produced in CHO cells include humanized monoclonal antibodies (Trill, Shatzman, and Ganguly, 1995), tPA (tissue plasminogen activator; Spellman et al., 1989), and cytokines (Sarmiento et al., 1994). In many cases, months are spent selecting and amplifying lines with appropriate growth properties and expression levels to meet production criteria.

What Do You Know about the Gene and the Gene Product?
Information about the gene product or, for that matter, its homologues or orthologues enables one to make an educated guess as to the best eukaryotic expression system to use. Is there anything published in the literature about the gene, or is it completely uncharacterized? Do we know in what tissue the gene is expressed, based on either Northern blot analysis or quantitative or semiquantitative RT-PCR measurements? Other factors to determine are whether the protein to be expressed is secreted, cytosolic, or membrane-bound. If it is a receptor, is it a homodimer, heterodimer, or multimer? Is it a single- or multispanning transmembrane receptor, or is it anchored to the surface (e.g., through a glycosylphosphatidylinositol (GPI) linkage)?

496 Trill et al.

Fortunately, we usually have the luxury of working with genes that are at least partially characterized by their biological properties. But what about genes of unknown origin or function? In this new age of genomics, many of the genes we obtain are "-like" genes, belonging to large families of related genes that share only a low percentage of sequence identity with a known gene. Despite these similarities, there is often no way to know whether the expression and purification methods used for one orthologue or homologue will be effective for another. Thus one is immediately faced with the challenging prospect of having to consider multiple expression strategies in order to express and purify the protein to sufficient levels in an active form, in addition to not knowing what activity to look for.

Can You Obtain the cDNA?

Before embarking on an expression project, you will need to locate a cDNA copy of the gene of interest. In theory, it is also possible to express genomic DNA containing introns, provided that the expression host recognizes the proper splice junctions. In practice, however, this is rarely the most efficient route to expression, because it is not usually known how the introns will affect expression levels or whether the desired splice variant will be expressed. Furthermore, most mammalian genes are interrupted by multiple introns that can span many kilobases. This can make subcloning genomic DNA considerably more difficult than subcloning the corresponding cDNA. The three most common ways to obtain a known gene of interest are purchasing a clone from a distributor of the Integrated Molecular Analysis of Genomes and their Expression (IMAGE) consortium (http://image.llnl.gov/), requesting it from a published source such as an academic lab, or RT-PCR cloning from RNA derived from a cell or tissue source.
IMAGE clones can be found by performing a BLAST search of an electronic database such as GenBank, which can be accessed through the National Library of Medicine PubMed browser (http://www.ncbi.nlm.nih.gov/PubMed/). From there you can quickly determine whether a sequence is present and whether it is full length, find publications related to the gene, and identify possible sources of the gene (tissue sources, personal contacts, etc.). Most expressed sequence tags (ESTs) matching the gene of interest are available as IMAGE clones. The trick is to find one that is full length.

Eukaryotic Expression 497

It is easy to determine whether an EST is likely to contain a full-length sequence if it is derived from a directional oligo(dT)-primed library and sequenced from the 5′ end: search the 5′ read for an ATG with an upstream in-frame stop codon. Once you identify a full-length EST, you should be able to obtain the corresponding IMAGE clone from Incyte Genomics (LifeSeq Public Incyte clones, http://www.incyte.com/reagents/index.shtml), Research Genetics (http://www.resgen.com), or the American Type Culture Collection (ATCC, http://www.atcc.org). If the gene is published, you can also try contacting the author who cloned it to obtain a cDNA clone. Most labs, in academia as well as in pharmaceutical/biotech companies, will honor a request for a published cDNA clone.

Alternatively, you may consider deriving the gene de novo by RT-PCR using the sequence obtained above. Depending on the size, abundance, and tissue distribution of the mRNA, a PCR approach could be straightforward or complex. One may isolate RNA from tissue, generate cDNA from the RNA using reverse transcriptase, design primers, perform PCR, and fish out the gene of interest. Alternatively, one may simply purchase a cDNA library from which to PCR-amplify the gene. Several vendors carry a wide array of high-quality cDNA libraries derived from human and animal tissues. For example, cDNA libraries for virtually every major human or murine tissue/organ can be obtained from Invitrogen (http://www.invitrogen.com./catalog_project/index.html) or Clontech (http://www.clontech.com/products/catalog/Libraries/index.html). These companies obtain their samples from sources under Federal Guidelines.*

Expression Vector Design and Subcloning

Perhaps the most critical step in the process of expressing a gene is vector design and subcloning. As much an art as a science, it nevertheless requires complete precision. In many cases you will need to amplify the gene by PCR from RNA.
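Designing the gene-specific primers for such an amplification can be sketched in a few lines. This is a minimal sketch, assuming the input is the sense-strand ORF written 5′→3′; the 20-nt annealing length and the Wallace rule (Tm ≈ 2(A+T) + 4(G+C) °C, a rough estimate best suited to short oligonucleotides) are illustrative assumptions rather than recommendations from this chapter, and all function names are hypothetical.

```python
# Minimal sketch of ORF-flanking PCR primer design (hypothetical helper names).
# Assumptions: input is the sense-strand ORF (A/C/G/T), annealing regions
# are 20 nt, and Tm is estimated with the rough Wallace rule.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    """Fraction of G + C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C): 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def orf_primers(orf: str, length: int = 20) -> tuple:
    """Forward primer: first `length` nt of the ORF (sense strand, 5'->3').
    Reverse primer: reverse complement of the last `length` nt (5'->3')."""
    orf = orf.upper()
    if len(orf) < 2 * length:
        raise ValueError("ORF is shorter than the two annealing regions")
    return orf[:length], reverse_complement(orf[-length:])


if __name__ == "__main__":
    # Hypothetical 60-nt ORF used only to exercise the functions.
    orf = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTATAA"
    for name, primer in zip(("forward", "reverse"), orf_primers(orf)):
        print(name, primer, f"GC={gc_fraction(primer):.2f}", f"Tm~{wallace_tm(primer)}C")
```

Both primers are returned 5′→3′, which is how they would be ordered; commercial primer tools use nearest-neighbor thermodynamics instead of the Wallace rule, so the Tm values here are only a first screen.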
If the gene is in a library, you may also need to trim the 5′ and 3′ untranslated regions (UTRs) and add restriction sites and/or a signal sequence if one is not already present. You may also want to add epitope tags for detection and purification (e.g., a His6 tag). When PCR is involved, the gene will eventually need to be entirely resequenced to rule out PCR-induced mutations, which occur at a low frequency. If mutations are found, they will need to be repaired, adding to the time required to generate the final expression construct. The best practice is to start with a high-fidelity polymerase that has a proofreading (3′→5′ exonuclease) function to minimize PCR errors.

*Editor's note: In addition to the planning recommended by the authors, it is wise to ask commercial suppliers of expression systems about the existence of patents relating to the components of an expression vector (e.g., promoters) or to the use of proteins produced by a patented expression vector/system.

Sequence Information

If you are lucky enough to obtain a cDNA clone from a known source, a new litany of questions will need to be answered. Is a sequence and restriction map available? Do you know what vector the gene has been cloned into? Has the gene been sequenced in its entirety? How much do you trust the source from which you received the gene? It is usually best to have the gene resequenced so that you know the junctions and restriction sites and can assure yourself that you are indeed working with the correct gene. What do you do if there are differences between your sequence and the published sequence? You will need to decide whether each difference is due to a mutation, a PCR artifact, a gene polymorphism, or an error in the published sequence. A search of an EST database, coupled with a comparison with the corresponding genes of other species, can help distinguish whether the error is in the database or is due to a polymorphism. Alternatively, sequencing multiple, independently derived clones can also help answer these questions.

Control Regions

We now have a gene with a confirmed sequence. But which control regions are present? Does the gene contain a Kozak sequence, 5′-GCC(A/G)CCAUGG-3′, which promotes efficient translational initiation of the open reading frame (ORF) in a vertebrate host (Kozak, 1987), or the equivalent sequence 5′-CAAAACAUG-3′ for expression in an insect host (Cavener, 1987)? If this sequence is missing, it is essential to add it to your expression vector.
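Checking a candidate start codon against the vertebrate consensus can be automated. The sketch below uses hypothetical helpers, not part of any published tool: it scores only the two positions generally regarded as most important, a purine at −3 and a G at +4 (counting the A of AUG as +1), and the strong/adequate/weak labels are a common convention assumed here. For an insect host the construct would instead be compared with the CAAAAC context given above.

```python
# Hypothetical helpers for checking and adding a vertebrate Kozak context.
# Positions are counted from the A of the start codon (+1); only -3 and +4
# are scored, an assumed simplification of GCC(A/G)CCAUGG.

VERTEBRATE_KOZAK = "GCCACC"  # GCC(A/G)CC with A chosen at -3 (assumed choice)


def kozak_context(mrna: str, atg_index: int) -> str:
    """Classify the ATG starting at atg_index as 'strong', 'adequate', or 'weak'."""
    seq = mrna.upper().replace("U", "T")  # accept RNA or DNA input
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    purine_at_minus3 = atg_index >= 3 and seq[atg_index - 3] in "AG"
    g_at_plus4 = atg_index + 3 < len(seq) and seq[atg_index + 3] == "G"
    score = int(purine_at_minus3) + int(g_at_plus4)
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


def add_kozak(orf: str) -> str:
    """Prepend the vertebrate consensus immediately 5' of the start codon."""
    orf = orf.upper()
    if not orf.startswith("ATG"):
        raise ValueError("ORF must begin with ATG")
    return VERTEBRATE_KOZAK + orf


if __name__ == "__main__":
    construct = add_kozak("ATGGCAGCAGCATAA")
    print(construct, kozak_context(construct, len(VERTEBRATE_KOZAK)))
```

Note that if the second codon of the ORF does not begin with G, prepending GCCACC yields only an "adequate" context; changing the +4 base alters the second amino acid, so that trade-off has to be judged case by case.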
It is also advisable to trim the gene to remove any unnecessary sequence upstream of the ATG. The 5′ noncoding region may contain sequences (e.g., upstream ATGs or secondary structures) that inhibit translation from the actual start codon. Noncoding sequence at the 3′ end may destabilize the message.
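Trimming a clone down to its coding region, and flagging upstream ATGs, is straightforward to sketch. The heuristic below is an assumption for illustration rather than a method from this chapter: it takes the longest ATG-to-stop reading frame whose ATG has an in-frame stop codon somewhere upstream, which is the same criterion as the full-length test for 5′ EST reads described earlier. Real cDNAs can violate it (for example, a short 5′ UTR with no in-frame stop), so treat the result as a candidate to confirm against the annotated sequence. All names are hypothetical.

```python
# Hypothetical helpers: find a likely ORF in a cDNA/EST and trim the UTRs.
# Heuristic (assumed): longest ATG..stop frame whose ATG has an in-frame stop
# codon upstream, i.e. the 5' end is not truncated within the ORF.

from typing import List, Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def _normalize(seq: str) -> str:
    """Upper-case and convert RNA (U) to DNA (T)."""
    return seq.upper().replace("U", "T")


def _first_stop_end(seq: str, start: int) -> Optional[int]:
    """Index just past the first in-frame stop at or after start, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i + 3
    return None


def _has_upstream_stop(seq: str, start: int) -> bool:
    """True if a stop codon lies upstream of start in the same frame."""
    return any(seq[i:i + 3] in STOPS for i in range(start - 3, -1, -3))


def find_orf(cdna: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the chosen ORF (end exclusive), or None."""
    seq = _normalize(cdna)
    best = None
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG" or not _has_upstream_stop(seq, i):
            continue
        end = _first_stop_end(seq, i)
        if end is not None and (best is None or end - i > best[1] - best[0]):
            best = (i, end)
    return best


def trim_to_orf(cdna: str) -> Tuple[str, List[int]]:
    """Return the ORF (ATG through stop) and positions of ATGs in the 5' UTR."""
    seq = _normalize(cdna)
    orf = find_orf(seq)
    if orf is None:
        raise ValueError("no ATG with an upstream in-frame stop; 5' end may be truncated")
    start, end = orf
    upstream_atgs = [i for i in range(start) if seq[i:i + 3] == "ATG"]
    return seq[start:end], upstream_atgs


if __name__ == "__main__":
    # Hypothetical clone: 5' UTR with an upstream ATG and in-frame TGA, ORF, 3' UTR.
    clone = "ATGCTGAGCCACC" + "ATGGCAGCAGCATAA" + "GGGCCC"
    print(trim_to_orf(clone))
```

In the example, the ATG at position 0 lies in the 5′ UTR and is reported as an upstream ATG, which is exactly the kind of sequence the text recommends trimming away.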
Epitope Tags and Cleavage Sites

Another sequence you might need to add to your gene is an epitope tag or a fusion partner, with or without a protease cleavage site. This will aid in the identification of your protein product (via Western blot, ELISA, or immunofluorescence) and assist in protein purification. Among the various epitope tags available are FLAG® (DYKDDDDK) (Hopp et al., 1988), influenza hemagglutinin or HA (YPYDVPDYA) (Niman et al., 1983), His6 (HHHHHH) (Lilius et al., 1991), and c-myc (EQKLISEEDL) (Evan et al., 1985). The more popular protease cleavage sites, used to remove the tag from the protein, are thrombin (VPR↓GS; Chang, 1985), factor Xa (IEGR↓; Nagai and Thogersen, 1984), PreScission protease (LEVLFQ↓GP; Cordingley et al., 1990), and enterokinase (DDDDK↓; Matsushima et al., 1994). One may also use larger fusion partners such as the Fc region of human IgG1 or glutathione S-transferase (GST). It is crucial to choose a protease that is not predicted to cleave within the protein itself, although this does not rule out spurious cleavage. The benefits and drawbacks of epitope tags are discussed in greater detail below in the section "Gene Expression Analysis."

Subcloning

Your gene is now ready to be cloned into an expression vector of your choice, provided that you have already decided what system to use. Traditionally, this involves using restriction enzymes to precisely excise the gene on a DNA fragment, which is then ligated into the expression vector at the same or compatible sites. If appropriate unique restriction sites are not present in the flanking regions, they can be added by PCR (by incorporating the site sequence at the 5′ end of the amplification primer) or by site-directed mutagenesis. Recent technological advances also make it possible to subclone without restriction enzymes. These newer cloning systems are based on recombinase-mediated gene transfer.
Invitrogen offers the ECHO™ and Gateway™ cloning technologies, while Clontech markets the Creator™ gene cloning and expression system. Recombinases essentially perform restriction and ligation in a single step, eliminating the time-consuming process of purifying restriction fragments for subcloning and ligating them. These systems are particularly advantageous when the same gene is transferred into multiple expression vectors for expression in different host systems.
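Adding restriction sites by PCR, as described under Subcloning, starts with choosing enzymes that do not cut inside the insert. The sketch below assumes a small, illustrative enzyme table (the recognition sequences shown are the standard ones for these enzymes) and an assumed 4-nt 5′ extension, here called a clamp, that gives the enzyme room to cut near the end of the PCR product. The chosen sites must also be unique in the vector's cloning region, which this sketch does not check; the function names are hypothetical.

```python
# Hypothetical helpers: pick restriction enzymes that do not cut the insert
# and add their sites to the 5' ends of gene-specific primers.
# The enzyme table is an illustrative subset; the "GCGC" clamp is assumed.

from typing import Dict, List, Tuple

SITES: Dict[str, str] = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "XbaI": "TCTAGA",
    "NotI": "GCGGCCGC",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def non_cutters(insert: str) -> List[str]:
    """Enzymes whose site is absent from both strands of the insert.
    (These sites are palindromic, so the bottom-strand check is a safeguard.)"""
    top = insert.upper()
    bottom = reverse_complement(top)
    return [name for name, site in SITES.items() if site not in top and site not in bottom]


def tailed_primers(fwd_anneal: str, rev_anneal: str, enzyme_5: str, enzyme_3: str,
                   clamp: str = "GCGC") -> Tuple[str, str]:
    """Prepend clamp + site to each primer; both inputs are written 5'->3'.
    rev_anneal should already be the reverse complement of the ORF's 3' end."""
    return (clamp + SITES[enzyme_5] + fwd_anneal.upper(),
            clamp + SITES[enzyme_3] + rev_anneal.upper())


if __name__ == "__main__":
    insert = "ATGGAATTCGCATAA"  # hypothetical insert with an internal EcoRI site
    print(non_cutters(insert))
    print(tailed_primers(insert[:6], reverse_complement(insert[-6:]), "BamHI", "XhoI"))
```

The 6-nt annealing regions only keep the example short; real gene-specific portions are typically longer. Using two different enzymes at the two ends also fixes the orientation of the insert in the vector.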