Overproduction of proteins in by renata.vivien


									Overproduction of proteins in E. coli.

General ref: Current Protocols in Molecular Biology, Ausubel et al Eds.


To understand the following issues with respect to production of foreign proteins in E. coli:

1. The need to provide an E. coli promoter and ribosomal binding site.
2. The need to keep expression turned off during growth and propagation of the clone.
3. Problems related to stability and purification.
4. Use of affinity purification systems.
5. Recombinant Phage display.

Reasons for over-expressing proteins.

1. To purify large amounts for study or for sale.
2. To purify from a more convenient heterologous organism.
3. To purify away from other components of the originating organism.
4. As a prelude to in vitro mutagenesis.


        The amount of care necessary to successfully express a foreign protein in E. coli depends on how
much yield you need. If you're just trying to get enough to detect activity, then most any fusion to a valid
E. coli promoter will probably do. For many research purposes, expressing the protein as about a percent
of the bacterial protein is probably more than enough. If the gene comes with its own promoter, this may
be achievable by simply putting the gene on a multicopy vector. If the gene is without a promoter (a
cDNA for example), one can get this level of expression from fusing to any number of strong E. coli
promoters. At this level of expression, one is mainly concerned with avoiding problems caused by some
noxious property of the gene product (e.g. instability, refusal to fold, toxicity to the host, mRNA
degradation signals in the untranslated regions).

        Other purposes, for example supporting structural studies, require high yields. Yields in excess of
40% of total E. coli protein can be obtained. To reach these yields, one should expect to optimize every
step in the expression pathway. Getting high level transcription is usually not too hard. One may have to
supply optimal translational start signals, to supply a transcriptional terminator, to remove some
nonoptimal codons, to remove or replace untranslated regions, and to be prepared to recover large
amounts of insoluble protein and refold it.

This image is from New England Biolabs advertising information for one of their affinity expression systems. The amount of
the recombinant protein produced on top of total cell protein can be seen in lane 3.

        Typically, after induction and an expression period one spins down about a ml worth of cells,
cooks them in SDS loading buffer, and then analyzes them by SDS-PAGE. This is total protein, including
insoluble inclusion bodies, membrane proteins, and soluble cytoplasmic protein. To distinguish whether
the protein is in the soluble or insoluble fraction, one would open the cells by sonication, separate soluble
and insoluble fractions, cook the insoluble fraction in SDS, and load each on the SDS polyacrylamide gel.
In order to carry on with the affinity purification as indicated above, the protein would have to be in the
soluble fraction. If it's in inclusion bodies, there are a series of washes to purify the inclusion bodies, then
one would have to denature and renature before carrying on. If the protein is in the membranes, it
may be solubilized by gentle detergent treatment, e.g. with Triton X-100.

       When high level expression is coupled to in vitro mutagenesis, one should expect additional
problems with the mutants. Mutant proteins are generally less stable, and therefore more susceptible to
degradation and insolubility. Multiple mutations cause progressively more trouble.

Affinity Systems.

        Most high level expression experiments in E. coli are done by making a fusion with some protein
that is easily purified by affinity chromatography. Usually the fusion partner comes as part of the
expression vector and will be the N terminal domain of the construct. This is so that the novel sequence
added is not near the translation expression signals and is not at risk of forming secondary structure with
them. Typically the vector contains a cloning site just downstream of a proteolytic cleavage site. Your
insert would most easily be added by use of PCR amplification with primers designed to add a 5'
extension serving both to provide the restriction site and to supply an appropriate translational fusion.

The above figure is from Promega's advertisements for expression vectors

Note that in this typical case, one has to make the 5' primer to add the restriction site of choice and keep
the fusion protein in phase:

For example:

might be used together with the HindIII cleavage site in the Xa-3 vector (where Met Leu Pro is the
beginning of the natural protein). However, the protein after factor Xa cleavage will have the N-terminal
sequence Glu Lys Leu Met Leu Pro ... There are a few vectors designed to get your protein back out
without extra residues on the N-terminus.

Since there is PCR involved, expect to resequence the clone to rule out inadvertent PCR-induced
mutations.
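
The phasing requirement described above can be checked mechanically before a primer is ordered. The
following is a minimal sketch, not part of any vendor's protocol. The function name fusion_in_frame is
hypothetical, and the vector and insert sequences are invented placeholders, not the real sequence of
any commercial vector. It assumes you supply the vector coding sequence from the initiator ATG up to
the cloning junction, and the insert exactly as it will be ligated in.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fusion_in_frame(vector_cds_to_junction: str, insert_from_junction: str, gene_start: str) -> bool:
    """Return True if gene_start is in phase with the vector ATG and no stop codon precedes it.

    vector_cds_to_junction: vector coding sequence from the initiator ATG to the junction.
    insert_from_junction:   insert as ligated, beginning at the junction.
    gene_start:             first few codons of your gene, used to locate where it begins.
    """
    fused = (vector_cds_to_junction + insert_from_junction).upper()
    # Only look for the gene inside the insert, not in the vector part.
    start = fused.find(gene_start.upper(), len(vector_cds_to_junction))
    if start == -1 or start % 3 != 0:
        return False  # gene not found, or out of phase with the vector ATG
    upstream_codons = [fused[i:i + 3] for i in range(0, start, 3)]
    return not any(codon in STOP_CODONS for codon in upstream_codons)


# Hypothetical example: a fusion partner ending in Ile Glu Gly Arg Glu,
# then a HindIII site (AAGCTT) followed by the gene's Met Leu Pro codons.
vector = "ATGGCTATCGAAGGTCGTGAA"
insert = "AAGCTTATGCTGCCG"
print(fusion_in_frame(vector, insert, "ATGCTGCCG"))
```

A primer with one extra base between the restriction site and the ATG would fail the same check, which
is the kind of error that is cheap to catch on paper and expensive to catch after cloning.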

General methods of boosting expression.

1. Increase copy number of the gene.
2. Fuse to more powerful transcription and/or translation signals.
  (e.g. lac, lambda PL, trp, tac, beta-lactamase.)

Problems and potential solutions:

1. Codon preferences
        Resynthesize gene or segments thereof with favored codons, particularly codon #2, or replace
            runs of adjacent unfavorable codons.
        Use host strain with extra tRNAs.
2. Degradation of protein:
        lon- host.
        Fusion to another protein may stabilize small proteins.
        Use protease inhibitors after opening cells.
3. Insolubility of the expressed protein:
       a. Find in inclusion bodies and solubilize by denaturation and renaturation.
       b. Solubilize under nondenaturing conditions.
       c. Increase solubility by use of a fusion partner.
       d. Look out for missing cofactors (like metal ions) in the growth medium.
       e. Co express with a chaperonin.
       f. Try growth at reduced temperature.
       g. Express at a reduced rate to give the protein a chance to fold.
       h. Be happy with the soluble portion. (But if it's a small portion, beware that it might represent
              mistranslated or unfolded material).
4. Expression of the protein is toxic to E. coli - Use a tightly controlled promoter to keep expression
       turned off until the clone has been grown up.
5. Instability of the plasmid. This problem is particularly bad when the plasmid is maintained with
       ampicillin (or other antibiotic resisted by a beta-lactamase).
       a. Keep expression suppressed during growth.
       b. Eliminate unnecessary passages.
       c. Consider a vector with a better antibiotic selection.
       d. Use a recA- host.
6. Problems related to fusion partners.
        Protease to cleave the fusion domain off may cleave inside your protein.
        Cleavage at the protease cleavage site may be inhibited by presence of your protein.
        Extra residues added to protein may change its properties.
        Your protein may interfere with binding of the partner to its affinity resin.
        Your segment of mRNA may form secondary structure with the translation signals.


Somatostatin - Itakura et al. (1977) Science 198, 1056.

        Somatostatin is a peptide hormone. From the known amino acid sequence, a somatostatin gene
was synthesized with E. coli codon preferences. It was expressed from the lactose promoter with and
without fusion to beta-galactosidase; the fusion was found to stabilize the peptide. The fusion was made
after a Met residue so that somatostatin was recovered from the fusion protein after cyanogen bromide
cleavage. The unfused construct produced no detectable somatostatin, and the fusion construct produced
a disappointingly low yield of insoluble protein.

        This was the first published attempt to mass produce a eucaryotic protein in E. coli. It mainly
served to anticipate some of the problems that must be overcome for successful mass expression. The
solubility problem remains something that requires a customized solution for each protein, although stable
globular proteins do better than short peptides. This experiment did establish the strategy of fusing
foreign peptides to a carrier protein to stabilize them. More specific means of cleaving the fusion junction
are now available.

       The low yield was related to a failure to adequately down-regulate the expression of the insert
while the clone was being grown and propagated. The lac regulation was overpowered by the copy
number of the vector (pBR322). Even though pBR322 exists in only about 20 molecules per cell, this is
enough to titrate out the available lac repressor. This causes partially constitutive expression of the insert,
which causes selection for deletions that take out the promoter or the insert.
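
The titration argument above comes down to a ratio of repressor molecules to operators. The sketch
below is illustrative only. The repressor counts are assumptions that do not come from this text:
roughly 10 repressor tetramers per wild-type cell, and roughly 10-fold more in a repressor-overproducing
host. Only the pBR322 copy number of about 20 is taken from the paragraph above. The function name is
hypothetical.

```python
def repressor_per_operator(repressor_tetramers: float, plasmid_copies: int,
                           chromosomal_operators: int = 1) -> float:
    """Ratio of lac repressor tetramers to lac operators in one cell."""
    operators = chromosomal_operators + plasmid_copies
    return repressor_tetramers / operators


# ASSUMPTIONS (not from the text): approximate repressor tetramers per cell.
WILD_TYPE_TETRAMERS = 10
OVERPRODUCER_TETRAMERS = 100
PBR322_COPIES = 20  # copy number quoted in the text

cases = [
    ("wild-type repressor level, no plasmid", WILD_TYPE_TETRAMERS, 0),
    ("wild-type repressor level, lac promoter on pBR322", WILD_TYPE_TETRAMERS, PBR322_COPIES),
    ("repressor overproducer, lac promoter on pBR322", OVERPRODUCER_TETRAMERS, PBR322_COPIES),
]
for label, tetramers, copies in cases:
    ratio = repressor_per_operator(tetramers, copies)
    print(f"{label}: {ratio:.2f} tetramers per operator")
```

Under these assumed numbers the ratio falls below one tetramer per operator as soon as the promoter is
on the multicopy plasmid, which is consistent with the partially constitutive expression described above.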

        It is a common error for people to get a poor yield and blame it on degradation, when what really
happened is that the gene or promoter was already genetically damaged in the construct by the time they
looked for expression. The classic method to investigate protein degradation is to pulse label with
[35S]-methionine and observe that the protein really is produced and then degraded. (Note: Look to be sure the
protein has an internal met codon first; the initiator met is often removed by posttranslational processing).
An alternative would be to do a western blot. For several of the affinity tag systems, one can obtain
commercial antibody to the tag, which could be used for this purpose. However, with modern expression
systems, the transgenic protein should be obvious on a simple Coomassie stained SDS gel. One should
run both a sample of the cell lysate and a sample obtained by cooking the insoluble cell debris in SDS. It
will often be true that the major portion of the expressed protein is in the insoluble fraction.

        Another symptom of genetic instability caused by expression leakage is that the yield drops off
precipitously as the clone is propagated. So the clone might produce a great yield in a small pilot
experiment, and then make almost nothing when scaled up to several liters. One should consider keeping
back a small sample of the culture to allow examination of the plasmid DNA itself after the fact. Genetic
instability will often show up as a heterogeneous set of deletions. However, you need to keep in mind that
point mutations in the promoter, or even mutations in the host background can also destroy the expression
of the insert.

Instability and ampicillin resistance.

        The instability problem when growing expression clones is worse when trying to maintain the
clone with ampicillin resistance than with other antibiotics. This is because ampicillinase (beta-lactamase)
leaks out of the cells while they are growing in liquid culture and destroys the ampicillin in the
culture fluid. After that, bacteria that lose the plasmid tend to overgrow the culture. A typical experience
goes as follows:

       1. The clones behave as expected on an ampicillin plate.
       2. Small scale cultures produce the protein as expected.
       3. An overnight preculture is prepared to start a large scale growth.
       4. When the large culture is inoculated the next day, the optical density increases only slightly, and
              then decreases. To the practiced eye, there is an accumulation of stringy debris indicative
              of lysis.
       5. The effect is non reproducible. Sometimes the large scale culture grows and sometimes it lyses.
              When it does grow, there can be a long lag phase, and the protein yield is typically less
              than anticipated from the small scale culture.

        The explanation is that the ampicillin is cleared from the preculture and then ampicillin sensitive
bacteria that have lost the plasmid overgrow to various degrees by morning. When the preculture is used
to inoculate media with fresh ampicillin, these plasmid-free bacteria begin to grow, but they cannot
synthesize cell wall in the presence of ampicillin, so they lyse.
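
How quickly plasmid-free cells can take over an unselected preculture is easy to underestimate. The
sketch below is a deliberately simple model, not a measurement. Both parameters are assumptions chosen
for illustration: the chance that a daughter cell loses the plasmid at each division, and how many
doublings plasmid-free cells complete while plasmid-bearing cells complete one. The function name is
hypothetical.

```python
def plasmid_free_fraction(generations: int, loss_per_division: float,
                          free_doublings_per_generation: float,
                          start_fraction: float = 0.0) -> list:
    """Fraction of plasmid-free cells after each generation once selection has lapsed.

    loss_per_division: probability that a daughter of a plasmid-bearing cell is plasmid-free.
    free_doublings_per_generation: doublings completed by plasmid-free cells while
        plasmid-bearing cells complete one (1.0 means no growth advantage).
    """
    bearing, free = 1.0 - start_fraction, start_fraction
    history = []
    for _ in range(generations):
        # Plasmid-bearing cells double; some daughters come out plasmid-free.
        new_bearing = 2.0 * bearing * (1.0 - loss_per_division)
        new_free = 2.0 * bearing * loss_per_division
        # Existing plasmid-free cells grow at their own (possibly faster) rate.
        new_free += (2.0 ** free_doublings_per_generation) * free
        total = new_bearing + new_free
        bearing, free = new_bearing / total, new_free / total
        history.append(free)
    return history


# Assumed values: 1% loss per division, plasmid-free cells grow 1.5x as fast.
fractions = plasmid_free_fraction(20, 0.01, 1.5)
for generation in (1, 5, 10, 15, 20):
    print(f"generation {generation:2d}: {fractions[generation - 1]:.1%} plasmid-free")
```

With any growth advantage at all, the plasmid-free fraction rises every generation, so the composition
of an overnight preculture depends strongly on how long it sat after the ampicillin was gone. That is
one way to account for the non-reproducible behavior in step 5 above.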

        This problem shows similar symptoms to a T1 phage infestation. T1 is a bacteriophage of E. coli
that survives dehydration, and spreads as an airborne contaminant. It causes aggressive lysis, producing
plaques on plates the size of a quarter. T1 infestation is rare, but when a culture gets accidentally infected,
lyses, and is then opened, it can spread enough airborne contamination throughout the lab or even an entire
building that no one can grow E. coli cultures for years afterwards. This forces everyone to derive T1
resistant versions of all of their strains. This is a tremendous setback when it happens, hence anyone
who sees a culture of E. coli lyse unexpectedly is advised to autoclave it without opening it.
Clearly, it is inadvisable to have a background of cultures lysing unexpectedly due to this ampicillin
selection problem because it reduces vigilance against the T1 infestation problem.

       When working with an expression plasmid based on ampicillin selection, special precautions are
required to maintain the selection. The growth is generally done more continuously to avoid precultures
going to saturation. However the growth may still be done in stages with inoculation into fresh

       Some biotech companies are promoting expression vectors based on different antibiotics to
counter this effect.

Human growth hormone - Goeddel et al. (1979) Nature 281, 544.

       This is probably the first published successful mass expression of a eucaryotic protein in E. coli.
Human growth hormone is a 191 residue peptide hormone. The first 24 codons were resynthesized with
an Eco RI site upstream of the AUG convenient for joining to the lac promoter and ribosome binding site.
The other end was made as a Hae III site. The synthetic segment was first cloned and sequenced in an
independent vector to verify the correct sequence.

      The cDNA was cloned as a Hae III fragment which omits the first 24 codons. The two parts of the
gene were then ligated together and joined to an Eco RI site downstream of two lac promoters.

       They used lac iQ (overproducer of lactose repressor) to get tighter control over expression and
downstream transcriptional fusion to the tet resistance gene of pBR322 to guard against deletion. Upon
induction, they got 20% of cellular protein as HGH.

         This strategy anticipates some of the common tricks still used today. The resynthesis of the N-
terminal region as part of a linker (or more recently as a PCR primer) is a standard method of achieving a
fusion. Oligo synthesis is sufficiently advanced today to easily reach lengths of up to 100 bases. Lac iQ is
still used to improve control over the lac promoter. Sometimes the lac i gene is placed on the cloning
vector so that its gene copy number is increased together with the number of lac promoters. However, the
lac promoter still leaks expression of the insert. Other promoter systems (lambda PL and T7) can give a
negligible basal expression level, and are preferred for inserts with toxic properties. These improved
promoter systems have supplanted the use of transcriptional fusions to an antibiotic gene as the preferred
way to stabilize troublesome inserts. Additionally, one tries to avoid serial propagation. With lac and tac
promoters, induction of expression is with IPTG. IPTG should never be added until the production
cultures are grown. Specifically, it should not be added to the plates on which the clones are selected
and/or stored.

         This experiment also typifies the multistep constructions that were common in the last decade. In
a multistep construction (where you're putting a lot of different restriction fragments together), there are
lots of things that can go wrong. As much as possible, you need to make one joint at a time, clone the
intermediate, verify it, and then cut it back out to use in the next step. Today, one would try to use
modern techniques and materials to reduce the number of steps. For example, one would use an
established expression vector that already had the promoter, lots of convenient restriction sites, a host
strain, and a history of successful expression experiments. This would avoid the steps involving creation
of the vector. It would probably be preferable to use the synthetic segment as a PCR primer, and therefore
lift out the intact HGH gene in one step. However, it is not a useful simplification to throw four or more
fragments into a ligation reaction and expect them to all join together in the proper order.

        The object of simplifying a construction is to increase the reliability, not to reduce your work load.
Steps that are for verification improve reliability and should be included as much as possible. It's the steps
that are mainly opportunities for something else to go wrong that you're trying to eliminate.

Beta-globin - L. Guarente et al. (1980) Cell 20, 543.

        When joining the beta-globin AUG to the ribosomal binding site of the lac promoter, they made a
set of deletions with exo III and S1 to get a variety of spacings. The clones were translationally fused
downstream to lac z so that the efficiency of the various arrangements on the 5' end could be assayed by
looking at beta galactosidase activity. When an efficient construct was found, the 3' end was replaced to
make an unfused beta-globin gene.

         The figure above shows the relative activity recovered based on the exact sequence that was
deleted. This experiment served to show that the spacing between the AUG and the ribosome binding site
is critical. In modern constructs, one uses an exact copy of an efficiently translated E. coli gene for this

Proinsulin - K. Talmadge, et al. (1980) PNAS 77, 3988.

       Preproinsulin has a eucaryotic signal sequence at its N-terminus that normally directs it to be
secreted. Beta-lactamase has a bacterial signal sequence at its N-terminus which directs its secretion into
the periplasmic space. Several fusions with part of the bacterial and part of the eucaryotic signal
sequences were made. They all directed secretion of the protein, and in each case the signal sequence was
properly cleaved off to create correct mature proinsulin. In fact, even the plain proinsulin signal without
any bacterial component worked.

Beta-lactamase signal       | Cleavage

MALWRMFLPLLALLVLWEPKPAQA    FVKQHLCGPHLVEALYLVCGE...
Preproinsulin signal        ^ Cleavage site

From Talmadge et al. (1980) PNAS 77, 3988-3992.

        This experiment established the feasibility of causing the foreign protein to be secreted into the
periplasmic space, along with removal of the signal sequence. The idea was that by secreting the foreign
protein, it would be easy to purify, and protected from stability and solubility problems. However, it turns
out that the periplasm has even more proteases in it than the cytoplasm, so one generally gets a lower yield
this way than by just leaving it in the cytoplasm.

Strength of the ribosomal binding site.

Ref: Mott et al. (1985) PNAS 82, 88-92.

       In order to over express the E. coli rho protein, it was fused to the lambda PL promoter either with
its own ribosomal binding site or with the ribosomal binding site of the lambda cII gene. The former
construct gave rho as 3%-5% of the cellular protein after induction, whereas the latter gave approximately
40%. So even with bacterial genes, it can help to improve the translation signals.

        The lambda PL promoter is still one of the best around due to its very low basal expression level,
its high activity, and its ease of induction (heat). However, you have to use a host with the cI857 ts
lambda repressor gene in it, and you have to be sure to grow the clones at 32 °C so as to avoid leakage of
expression. Modern expression vectors usually come already carrying the translational signals from a
heavily expressed gene like cII. Often there is an N terminal fusion domain, so the site for fusing your
coding sequence will not interfere with translational initiation. However, if you do try to place your
coding sequence so that it will be the N terminal domain, be sure not to disrupt anything between the rbs
and the initiator AUG from the expression vector. Your sequence can also inadvertently create mRNA
secondary structure that ties up the initiator codon or the ribosome binding sequence and cause poor
translational efficiency. Such proposed constructs should be checked out for secondary structure
problems using prediction programs. Mfold, found on the internet, is good for that purpose.
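
Before running a structure prediction, it is worth confirming that a proposed construct has not changed
the distance between the ribosome binding site and the initiator AUG. The sketch below is a rough check,
not a substitute for a folding program. It assumes that a Shine-Dalgarno core can be recognized as GGAG
or GAGG, which is a simplification since real sites vary. The function name and the example sequence are
hypothetical.

```python
# ASSUMPTION: treat GGAG or GAGG as a Shine-Dalgarno core; real sites vary.
SD_CORES = ("GGAG", "GAGG")


def sd_spacing(mrna_5prime: str, start_index: int):
    """Bases between the nearest upstream SD-like core and the initiator codon.

    start_index is the 0-based position of the A of the initiator ATG/AUG.
    Returns None if no SD-like core lies entirely upstream of the start codon.
    """
    seq = mrna_5prime.upper().replace("U", "T")
    if seq[start_index:start_index + 3] != "ATG":
        raise ValueError("start_index does not point at an ATG")
    # End positions (exclusive) of every core that finishes before the start codon.
    core_ends = [i + 4 for i in range(start_index - 3) if seq[i:i + 4] in SD_CORES]
    if not core_ends:
        return None
    return start_index - max(core_ends)


# Hypothetical leader: SD-like sequence, a 7-base spacer, then the start codon.
leader = "AAGGAGGTATATACATGAAA"
print(sd_spacing(leader, leader.index("ATG")))
```

Comparing this number for the unmodified vector and for the proposed construct shows at once whether
the cloning step has inserted or deleted bases between the rbs and the AUG.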

A completely synthetic approach.

Ref: Jay et al. (1984) PNAS 81, 2290-2294.

         The gene for human gamma interferon was completely synthesized including a strong
bacteriophage T5 promoter and a strong ribosome binding site. The gene was ligated together from a
series of 66 overlapping oligonucleotides as illustrated in the stylized diagram below. One can put many
oligos together in a single ligation, although it may be wise to assemble the gene as a series of smaller
restriction fragments that can be independently cloned, sequenced and then ligated together to form the
whole gene.

      The synthetic gene was ligated into a plasmid vector such that the tet resistance gene was fused
downstream, to maintain selection against loss of the interferon gene. Human interferon accumulated at
more than 15% of cellular protein.

        Today, individual oligos of 100 bases can be made with little risk of incorporating errors. This is
because the chemistry has been changed so that error products (failure to add a base at any step) are
capped and left in a condition so that they can all be removed from the correct product in a one step
purification at the end of the synthesis. So it is possible to construct entirely synthetic genes of substantial
length.
Use of high copy number vectors.

Ref: Winter et al. (1982) Nature 299, 756-758.

        M13 RF maintains a copy number of about 200 molecules per infected cell. Genes cloned into
M13 with their own promoters can have high level expression, even if their own promoters are not
particularly strong. M13 is a phage that packages a single stranded circle of DNA into the capsid. Within
the cell M13 grows as a double stranded plasmid called RF (replicative form). Methods of mutagenesis
prior to PCR-based methods were based on priming a single stranded template with a mutagenic primer.
Hence M13 vectors fit easily into that strategy.

        In order to avoid deletions, one should make the phage propagate by infection rather than by
division of infected cells. Also, one should avoid serial culturing.

Inclusion bodies.

       Heavily expressed proteins often aggregate and form inclusion bodies. Inclusion bodies pellet
with the bacterial debris after cell lysis. Since this fraction is often discarded, it is easy to mistakenly
believe that the expressed protein has been degraded.

        Inclusion bodies can actually protect a protein from degradation. Also, after isolation by
differential centrifugation and washing, the over-expressed protein may be almost pure within the
inclusion body.

        Proteins within inclusion bodies are insoluble, often in a denatured state, and may have
inappropriate disulfide bonding. One generally solubilizes the inclusion bodies in a denaturing agent,
such as guanidine hydrochloride, reduces, dilutes, and then tries to refold the protein out of a low
concentration of guanidine hydrochloride and in the presence of reduced and oxidized glutathione. It is
possible to impose a purification in the denatured state, either by chromatography in urea, or by affinity
purification using a His-tag. Further purification of the refolded form will be necessary along with
physical characterization to assure that it is the correct native form. For some kinds of experiments,
aggregate in the refolded material is particularly troublesome, so a gel filtration is a common follow-up
purification. The aggregated material can be recycled through the denaturation and refolding steps.
Conditions for refolding vary from protein to protein and for some may be hard to find. Failure to include
an essential metal ion cofactor would be one cause of trouble. It also follows that one has to have worked
out suitable buffers for the purification step and for storage so that the protein is not reaggregating after
the fact of getting it properly refolded. Membrane proteins will require detergents, probably at all steps.
For proteins with chronic solubility problems, a fused affinity domain with good solubility characteristics
may help to keep it in solution.

Ref: Tsuji et al, (1987) Bioch 26, 3129-3134.

A database of different refolding conditions that have been used is found at

        Proteins can be in inclusion bodies for reasons other than being unfolded. RNA binding proteins
are often found in inclusion bodies by virtue of being networked with cellular RNA. It may be possible to
release such proteins without denaturation by treating with RNAse.

        Sometimes proteins can be engineered to avoid certain folding problems. For example, if a cys is
involved in inappropriate disulfide bonding, and homologous proteins suggest an alternative acceptable
amino acid, making that replacement by in vitro mutagenesis might improve the ease of isolation of the
protein. An alternative hit-or-miss strategy for finding a version that behaves better is to isolate a variety
of homologues from related organisms.

Protease- mutant host bacteria.

       E. coli has numerous proteases that can attack and degrade recombinant proteins. In particular,
synthesis of protease La, which is the product of the lon locus, is induced by the presence of abnormal
proteins. lon is a heat shock gene, and is probably there to degrade damaged proteins after heat shock.
Recombinant proteins are degraded 2-4 times more slowly in lon- cells. Alternatively, the htpR locus,
which encodes an alternative sigma factor for directing the induction of heat shock genes, can be mutated.

Ref: Goff and Goldberg (1985) Cell 41, 587.

mRNA half life.

        Most E. coli messages have half lives of about 1-2 minutes. T4 gene 32 mRNA has a half life of
about 30 minutes, this being part of the phage's strategy to achieve high level expression. Sequences in
the 5' untranslated region of the message confer this excessive stability. Expression cassettes have been
constructed wherein the 5' end of T4 gene 32 is fused to the beginning of the recombinant gene.
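
The payoff from a stable message can be estimated with a one-line model. With constant synthesis and
first-order decay, the steady-state amount of a transcript is proportional to its half-life. The sketch
below uses the half-lives quoted above. The transcription rate is an arbitrary placeholder that cancels
out of the comparison, and the function name is hypothetical.

```python
import math


def steady_state_mrna(transcripts_per_min: float, half_life_min: float) -> float:
    """Steady-state transcript number for constant synthesis and first-order decay."""
    decay_rate = math.log(2) / half_life_min
    return transcripts_per_min / decay_rate


RATE = 1.0  # arbitrary placeholder; it cancels in the ratio below
typical = steady_state_mrna(RATE, 1.5)   # midpoint of the 1-2 minute range in the text
gene32 = steady_state_mrna(RATE, 30.0)   # T4 gene 32 half-life from the text
print(f"fold increase from stability alone: {gene32 / typical:.0f}")
```

At the same rate of transcription, a 30 minute half-life gives about 20 times as much message as a
1.5 minute half-life. This ignores dilution by cell growth and any change in translation efficiency.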

        As far as I can tell, this system has not appeared in a commercial vector yet. However, be warned
that the opposite effect can happen by accident. You may inadvertently introduce sequences in an
untranslated region that destabilize the mRNA. You should generally avoid including untranslated
regions from cDNAs in E. coli expression vectors.

Ref: Frey et al., (1988) Gene 62, 237-247.

Interaction of the transgenic protein with chaperonins

Ref: Overproduction of Anabaena 7120 ribulose-bisphosphate carboxylase/oxygenase in Escherichia
      coli. Larimer and Soper (1993) Gene 126: 85-92.

   In photosynthetic organisms Rubisco (D-ribulose-1,5-bisphosphate carboxylase/oxygenase) catalyzes the initial
step in the reductive pentose phosphate pathway. Refolding of Rubisco in vitro requires chaperonins.
High-level production of Rubisco activity from E. coli was aided by the simultaneous overproduction of
the E. coli (GroESL) chaperonins.

    Curiously, some proteins may be stabilized in strains carrying a defective chaperonin (Reidhaar-Olson
et al., Biochemistry 29: 7563-7571 (1990)).

Problems with non preferred codons.

        E. coli uses preferred codons among the synonymous sets for its own highly expressed proteins.
The implication is that the non preferred codons are translated inefficiently. Eucaryotic genes are full of
these non preferred codons, yet they usually can be highly expressed without trouble. However,
sometimes it does help to put a preferred codon at amino acid #2, or to fix stretches of adjacent non
preferred codons.

         An alternative solution is to use expression hosts that contain additional tRNA genes added for the
purpose of increasing the level of tRNA specific for non-preferred codons. Stratagene sells a series of
expression strains under the trade name CodonPlus which coexpress different sets of tRNA genes targeted
at rare codons.
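
A quick scan of a coding sequence shows whether either of the trouble spots mentioned above is present:
a non-preferred codon at position 2, or a run of adjacent non-preferred codons. The sketch below is a
minimal illustration. The set of rare codons is an assumption based on commonly cited E. coli rare
codons, not on this text, and should be checked against a codon usage table for your host. The function
name and the test sequence are hypothetical.

```python
# ASSUMPTION: a commonly cited set of rare E. coli codons; verify for your host.
RARE_CODONS = {"AGA", "AGG", "ATA", "CTA", "CCC", "CGA"}


def scan_rare_codons(cds: str) -> dict:
    """Report rare codons by 1-based codon number, plus runs of adjacent rare codons."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    rare_positions = [n for n, codon in enumerate(codons, start=1) if codon in RARE_CODONS]

    # Group consecutive codon numbers into runs of two or more.
    runs, run = [], []
    for pos in rare_positions:
        if run and pos == run[-1] + 1:
            run.append(pos)
        else:
            if len(run) > 1:
                runs.append((run[0], run[-1]))
            run = [pos]
    if len(run) > 1:
        runs.append((run[0], run[-1]))

    return {
        "rare_positions": rare_positions,
        "codon_2_is_rare": 2 in rare_positions,
        "adjacent_runs": runs,
    }


# Hypothetical test sequence: ATG AGA GCT CTA AGG AGA TAA
print(scan_rare_codons("ATGAGAGCTCTAAGGAGATAA"))
```

The output is only a list of places to look. Whether to resynthesize those codons or to switch to a
host with extra tRNA genes is the judgment call discussed above.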

Other genetic code problems.

       Some genomes don't use the same genetic code as E. coli. For example, most mitochondrial
genomes use a few altered codons. Once the altered code is known, the gene will have to be altered by in
vitro mutagenesis at the variant codons to match the E. coli code. A few human nuclear genes are edited
at the RNA level. RNA editing is found elsewhere, reaching an absurd level in Trypanosome
mitochondria. One would have to be sure to be working from the final sequence after editing.
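
For a mitochondrial gene, the codons that need in vitro mutagenesis can be listed directly from the
sequence. The sketch below assumes the vertebrate mitochondrial code, in which UGA reads as Trp, AUA as
Met, and AGA and AGG as stop. Other mitochondrial codes differ, so the table must be replaced for other
organisms. The function name, the suggested replacement codons, and the test sequence are illustrative
choices, not taken from this text.

```python
# Vertebrate mitochondrial code: codons read differently than in E. coli.
# Each entry is (meaning in mitochondria, suggested E. coli codon with that meaning).
MITO_DIFFERENCES = {
    "TGA": ("Trp", "TGG"),   # read as stop in E. coli
    "ATA": ("Met", "ATG"),   # read as Ile in E. coli
    "AGA": ("stop", "TAA"),  # read as Arg in E. coli
    "AGG": ("stop", "TAA"),  # read as Arg in E. coli
}


def codons_to_recode(mito_cds: str) -> list:
    """List (codon_number, codon, mitochondrial_meaning, suggested_replacement)."""
    seq = mito_cds.upper().replace("U", "T")
    hits = []
    for number, i in enumerate(range(0, len(seq) - len(seq) % 3, 3), start=1):
        codon = seq[i:i + 3]
        if codon in MITO_DIFFERENCES:
            meaning, replacement = MITO_DIFFERENCES[codon]
            hits.append((number, codon, meaning, replacement))
    return hits


# Hypothetical test sequence: ATG ATA TGA CTA AGA
for hit in codons_to_recode("ATGATATGACTAAGA"):
    print(hit)
```

The internal TGA codons are the critical ones, since E. coli would terminate translation there. The
sequence fed to such a check must be the final, edited sequence for genes that undergo RNA editing.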

T7 RNA Polymerase/Promoter systems

        This system (marketed under the name pET by Novagen, but also publicly available) is very
popular today. It expresses the foreign gene from a T7 promoter on the vector. T7 is a bacteriophage that
makes its own RNA polymerase that is specific for its own promoters. The T7 polymerase is provided
either by an inducible T7 polymerase gene in the host, or by infecting the culture with a phage carrying the
polymerase gene after the cells have been grown up.

        There is a multiple cloning site downstream of the T7 promoter. If the gene already has a suitable
ribosomal initiation site, it can simply be inserted in the correct orientation. Alternatively, one can add a
strong ribosomal binding site, engineer the codons to match E. coli's preferences, add the restriction sites,
and even add a transcriptional terminator, all by using the linker, or PCR fusion procedures described
above. Versions of the vector exist that have a strong ribosomal binding site and a cloning site right at the
AUG, so that one could fuse right at the AUG.

         If the T7 polymerase gene is in the host background, it will be under the control of the lambda PL
promoter, which is in turn under control of a cI857 ts lambda repressor gene also in the host
background. This promoter has one of the lowest basal expression levels of any around, but there is still a
little leakage of expression of the transgene. If the basal expression of the transgene proves toxic to the
host, then one grows up the clone in a host with no T7 polymerase gene, and then introduces it by
infection with a phage carrying the polymerase gene. This is the major advantage of the T7 systems. One
can alter the method of control of expression without having to make new constructs.

        Some people have reported leakage of expression in this system even without the T7 polymerase.
When going for zero basal expression, one has to worry about leakage of transcription from other promoters
in the vector that read through into the transgene.

Fusion systems:

        There are numerous commercial systems marketed in which you fuse your protein to some other
protein that provides a purification handle. Sometimes the fusion is designed to direct secretion into the
periplasmic space. Then some means is provided to subsequently cleave the fusion apart. One needs to
pay attention as to whether or not the protease will cleave internally to your protein. In most systems the
cleavage will leave extraneous amino acids attached to your protein. So you will have to evaluate if that
will be acceptable for your purposes. Hopefully the cleavage will be accomplished on the folded fusion
protein, directly releasing your folded polypeptide. Unfortunately, sometimes you have to denature the
fusion protein to get the protease to cleave the fusion site.

Maltose-Binding Protein Fusions

        This system, based on vectors pMAL-c2 or -p2, can be obtained from New England Biolabs. In
this system, you make a translational fusion downstream of malE, which encodes a secreted E. coli
protein that binds maltose. When expressed in pMAL-p2 the fusion protein is recovered from the
periplasmic space. Alternatively, the pMAL-c2 version is designed to leave the fusion protein in the
cytoplasm. In either case, the fusion protein can be purified by affinity chromatography on an amylose
column, and then cleaved with factor Xa protease, which is specific for the fusion site, leaving your
protein with a few extra amino acids at the N-terminus. The vectors have an XmnI site at the factor Xa
cleavage site. The XmnI specificity is GAANN^NNTTC (a blunt cut), making it possible to fuse with no
extra amino acids if you can arrange for your insert to start with the first codon at a blunt end.
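
        The XmnI pattern is easy to search for before planning a blunt-end fusion. The following is a
minimal sketch, not a vendor tool. The function name find_xmni_sites is made up for this example, and the
test sequence is an illustrative stretch encoding Ile-Glu-Gly-Arg followed by a few extra bases; it is
assumed for illustration and should be checked against the actual vector sequence.

```python
import re

# XmnI recognizes GAANN^NNTTC and cuts in the middle, leaving blunt ends.
# A lookahead is used so that overlapping sites would also be reported.
XMNI = re.compile(r"(?=GAA[ACGT]{4}TTC)")


def find_xmni_sites(dna):
    """Return 0-based cut positions (index of the first base after the cut)."""
    dna = dna.upper()
    return [m.start() + 5 for m in XMNI.finditer(dna)]


# Illustrative sequence (an assumption, not copied from a vector map):
# ATC GAG GGA AGG encodes Ile-Glu-Gly-Arg, followed by ATT TCA.
example = "ATCGAGGGAAGGATTTCA"
for cut in find_xmni_sites(example):
    # Show the two blunt-ended halves on either side of the cut.
    print(example[:cut], "^", example[cut:])
```

In this illustrative sequence the blunt cut falls directly after the Arg codon of the factor Xa site, which
is why an insert whose first codon starts at a blunt end can be fused with no extra amino acids.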

        As we saw before, secretion into the periplasmic space usually reduces the yield (in this case
about 4-fold). However, some proteins that form disulfide bonds fold better if secreted. On the other
hand, large proteins that are normally cytoplasmic have trouble getting through the membrane. The major
attraction of making the fusion is to allow affinity purification based on the maltose binding domain,
before cleaving it off with factor Xa. Factor Xa cleaves after the Ile Glu Gly Arg at the fusion site.

Other fusion systems:

        There are a variety of other commercially available fusion systems that are designed to assist
purification of your protein, then let you cleave your protein away from the bacterial domain.

    Novagen, Inc. (now part of Merck), has a variety of T7 pET type vectors designed to effect fusions
with various proteins that can be used as purification handles:

            Tag           N/C terminal      Basis for detection and/or purification

          T7-tag          N                 monoclonal antibody
          S-tag           N                 RNase S-protein
          His-tag         N or C            metal chelation
          HSV-tag         C                 monoclonal antibody
          pelB/ompT       N                 potential periplasmic localization

   The His-tag system is essentially just a string of six histidines in a row fused to the N- or C-terminus.
This is the only affinity method that can be used in the denatured state. Some renaturing schemes call for
refolding the protein while bound to the metal ion column. Qiagen markets an exopeptidase that can
remove an N-terminal His-tag without the requirement for creating a cleavage site at the junction with the
body of the protein. There are limitations to the amount of reductant that can be used with the Ni ion
affinity resin without reducing the Ni. There are commercially available antibody affinity systems for
purifying His-tagged proteins, and antibodies to the tag can be used on a Western blot to assay for protein

   Pharmacia Biotech (now GE Healthcare) uses plasmids (pGEX vectors) designed for inducible, high-
level intracellular expression of genes or gene fragments as fusions with glutathione S-transferase (GST).
The fusion proteins can be detected using a colorimetric assay or immunoassay and purified using
Glutathione Sepharose 4B affinity chromatography. The GST domain is a dimer, so if the quaternary
structure of the recombinant protein is relevant to the experimental design, then it will have to be

   Eastman Kodak has the Flag System (now marketed through Sigma Aldrich) that is based on the Flag
marker octapeptide that is fused to a protein by molecular cloning of its DNA coding sequence adjacent to
the protein coding sequence for expression in an appropriate vector. Detection is by specifically binding
mouse monoclonal antibodies to the octapeptide, while purification is by affinity chromatography. An
amino-terminal Flag peptide can be removed by the protease enterokinase. The Flag fusion proteins can
be expressed in E. coli, yeast, insect, or animal cells.

  Invitrogen uses a system based on fusion to thioredoxin with purification by binding to a phenylarsine
oxide resin.

        [Stratagene was incorporated into Agilent, and many of its products have disappeared from the
market. The most common 'solubility enhancement tag' still used is maltose binding protein with an
N-terminal His tag.] Stratagene packs a variety of functions into its VariFlex tag system. The tag for affinity
purification is based on binding to streptavidin. Also incorporated is an alpha complementing fragment of
beta galactosidase (Q-tag) which can be used to quantitate the fusion protein by a beta-galactosidase assay.
A final innovation is the inclusion of "solubility enhancing tags", which are highly negatively charged
folding domains. The idea is that these may increase solubility of the fusion protein by charge-charge

        In all cases where immunodetection or immunoaffinity purification is used, one has to use a tag
that has no endogenous counterpart.

        In the various combinations above, one can cleave with factor Xa, thrombin, or enterokinase. Any
of these might hit sites within your protein, in which case you switch to a vector that uses one of the
others. A variety of other proteases have made their way into commercial expression vector systems. GE
Healthcare Life Sciences (formerly Amersham Pharmacia) has a product called PreScission protease that
is based on the rhinovirus protease. It is marketed as a noncleavable GST fusion. That way, you can get
your GST fusion bound to glutathione-conjugated Sepharose and then just mix the protease in. Your
protein is released, and the GST fusion partner as well as the protease are retained on the resin.
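
        Checking for internal protease sites before choosing a vector can be sketched in a few lines. This
is a rough screen under stated assumptions, not a definitive predictor. The factor Xa motif (IEGR) is the
one given in these notes; the thrombin (LVPRGS) and enterokinase (DDDDK) motifs are the commonly
cited canonical sites and are assumptions here. Real proteases also cut at related sequences, so finding
no exact match does not guarantee that the protein is safe. The function names and the target sequence
are hypothetical.

```python
# Canonical recognition motifs. IEGR comes from the text; the thrombin and
# enterokinase motifs are commonly cited canonical sites (assumptions here).
PROTEASE_MOTIFS = {
    "factor Xa": "IEGR",
    "thrombin": "LVPRGS",
    "enterokinase": "DDDDK",
}


def internal_sites(protein):
    """Map each protease to the 0-based positions of exact motif matches."""
    protein = protein.upper()
    hits = {}
    for name, motif in PROTEASE_MOTIFS.items():
        positions = []
        start = protein.find(motif)
        while start != -1:
            positions.append(start)
            # Step by one so overlapping matches are not missed.
            start = protein.find(motif, start + 1)
        hits[name] = positions
    return hits


def usable_proteases(protein):
    """List proteases with no exact canonical motif inside the protein."""
    return [name for name, pos in internal_sites(protein).items() if not pos]


# Hypothetical protein sequence, invented for illustration only.
target = "MKTAYIEGRLLSDDDDKQW"
print(internal_sites(target))
print(usable_proteases(target))
```

For this made-up target, factor Xa and enterokinase both have an exact internal match, so a vector that
uses thrombin would be the first one to try.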

        New England Biolabs markets a system named IMPACT where the cleavage is effected by the
activity of a self-splicing protein element called an intein (Chong et al., 1998. Nucl. Acids Res. 26:5109).
Depending on the variety, the cleavage is instigated by pH, temperature, or a thiol reagent. It is possible
to leave a reactive thioester on the C terminus for use in subsequent coupling reactions.

Recombinant Phage Antibody System

   Pharmacia Biotech (GE Healthcare) also has the Recombinant Phage Antibody System (RPAS)
designed for the cloning and expression of recombinant antibody fragments in bacteria. In this system,
one makes two insertions into a fusion protein, one derived from an Ig heavy chain variable region, and
one from an Ig light chain variable region. The fusion protein juxtaposes the chains to allow formation of
an antigen binding site. The fusion protein is displayed on the surface of an fd (M13) phage, allowing one
to screen a library of plaques with a labeled antigen. Even better, one can purify phage that bind the
antigen by affinity and then reinfect the host.

  In essence the system produces single polypeptide versions of antibodies (scFv) quickly in bacterial
cultures. The "cleavage" of the soluble antigen binding domain from the phage protein domain is done by
an interesting genetic manipulation. There is an amber stop codon after the antigen binding domain and
before the phage protein domain. To get phage-bound antibody, one expresses from an amber suppressor
strain. To get soluble antibody, one expresses from a non-suppressor strain.
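
        The amber codon trick can be illustrated with a toy translation routine. This is a sketch under
simplifying assumptions: the suppressor is modeled as reading every TAG as glutamine (as a supE-type
strain would), whereas real suppression is only partial, so a suppressor strain makes a mixture of both
products. The construct below is a made-up sequence, not a real scFv or coat protein gene.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, amber_suppressor=False):
    """Translate from the first codon until a stop codon is reached.

    With amber_suppressor=True, TAG is read as glutamine (Q). Partial
    suppression, as seen in real strains, is ignored in this sketch.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        aa = CODON_TABLE[codon]
        if codon == "TAG" and amber_suppressor:
            aa = "Q"
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy construct (hypothetical): "antigen binding" part, amber codon,
# "phage protein" part, then a TAA stop that is never suppressed.
construct = "ATGGCTGAA" + "TAG" + "GGTTCTAAA" + "TAA"
print(translate(construct))                         # non-suppressor host
print(translate(construct, amber_suppressor=True))  # amber suppressor host
```

The non-suppressor host stops at the amber codon and yields only the first domain, while the suppressor
host reads through to give the full fusion.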

Phage Display

        A popular variation on the above theme is to fuse a library of peptides to the phage coat protein
and to purify the particular sequences that bind to some ligand. Typically the library is composed of
random sequences, and the clones that bind are sequenced and used to discern amino acid patterns
required for binding. Commercial libraries are available with random 7-mer peptides or 12-mer peptides.
One could hope to screen a large enough library to contain all possible 7-mers (20^7, or about 1.3 x 10^9
sequences), but longer peptides will necessarily have only a fraction of all possible sequences present
(20^12 is about 4 x 10^15). Screening is generally by panning, in which the library of phage, each
containing the DNA specifying its displayed sequence, is reacted with a surface coated with ligand. Phage
retained on the surface are eluted, amplified, and panned again several times. Typically one uses
relatively nonstringent binding conditions (high concentration of the ligand) at first because the
concentration of phage that will bind is so low. Stringency is then increased in later rounds of panning by
decreasing the concentration of ligand.
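
        The point about library coverage can be made concrete with a little arithmetic. The sketch below
assumes each clone is an independent, uniformly random draw from all 20^n peptides, which is an
idealization since real libraries are biased by codon usage and by selection during phage growth. The
library size of 10^9 clones is an assumed round number chosen for illustration, and the function names
are made up for this example.

```python
import math


def possible_peptides(length, alphabet=20):
    """Number of distinct peptides of the given length."""
    return alphabet ** length


def expected_coverage(library_size, length):
    """Expected fraction of all possible peptides present in the library.

    Assumes independent, uniformly random clones, so the chance that a given
    sequence is absent is approximately exp(-library_size / diversity).
    """
    diversity = possible_peptides(length)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-library_size / diversity)


# Assumed library size for illustration only.
library_size = 1e9
for length in (7, 12):
    print(length, possible_peptides(length), expected_coverage(library_size, length))
```

Under these assumptions a library of about 10^9 clones holds roughly half of all 7-mers but well under a
millionth of all 12-mers, so a 12-mer library can only ever sample sequence space.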

       Phage Display can also be used with protein domains subjected to saturation mutagenesis.
However, non-secreted proteins are often not expressed efficiently in this system because they fail to
successfully pass through the bacterial membrane in the assembly of the virus.

       Variations on this theme are:

       1. The coat protein gene could be on a phagemid. A phagemid is a plasmid that additionally has
              an origin of replication from an M13-like phage. When a helper phage is provided, a
              single-stranded version of the phagemid ends up packaged as if it were a viral genome.

       2. Rather than coating the surface with the ligand, one could coat it with an antibody to the ligand.
              Binding could then be as a sandwich.

       3. New England Biolabs now sells a phage display kit based on M13.

       4. Novagen has a T7 phage display system.

Figures in NEB's phage display manual
(http://www.neb.com/nebecomm/ManualFiles/manualE8101.pdf) show the method of construction and a
diagram of the panning step for using their system.


Clackson, T., et al. 1991. Making antibody fragments using phage display libraries. Nature 352:624-628.

Scott, J.K., et al. 1990. Searching for peptide ligands with an epitope library. Science 249:386-390.

Hogrefe, H.H., et al. 1993. Cloning in a bacteriophage lambda vector for the display of binding proteins
on filamentous phage. Gene 137:85-91.


1. You express a mutant protein in E. coli and find little in the cell lysate with a Western blot. Expression
of the wild type protein had never been a problem. What do you suspect first, and how would you test
your hypothesis?

2. Do you expect any special problems from expressing a human mitochondrial gene in E. coli? If so,
what would you do about it?

3. You have determined the protein sequence from a trypanosome mitochondrial gene. What would be
the most direct way to express this protein in E. coli?

4. You make an expression construct that makes a high yield in a small pilot growth, but gives
disappointing yields when large scale (100 liter) cultures are grown up. What do you think is the problem,
and how would you solve it?

last edited 4/5/2011; Steve Hardies

