Cloning genomic DNA

If you want to understand how genes are regulated and how they have evolved you need to be able
to examine their structure. cDNA can only tell you so much about a gene – an individual cDNA
clone is derived from a processed RNA transcript. Compared with the whole gene, it will have lost
any upstream regulatory regions and it will have lost the introns. So to study genes in a complete
way you need to be able to make clones containing genomic DNA and, from your library of clones
representing the whole genome, you must be able to identify those clones which are derived from
the gene of interest.

As with all molecular biology techniques, the techniques of library construction have evolved
during the past twenty years.

The key principles are:
   1. Ensure that the library is representative of the entire genome, i.e. any point in the genome
       should have an equal chance of appearing in the library compared with any other point.
       This is not always as easy to accomplish as you might think.
   2. Make sure that the library contains enough cloned DNA so that the chance of any point in
       the genome being contained within at least one of the clones is high.
Subsidiary to this, but still important:
   3. Individual clones should contain sufficiently large inserts so that the number of clones
       needed to satisfy point 2 is not so great that the library becomes too big to handle
       conveniently.

It goes without saying (I hope) that the vector chosen, as well as helping to satisfy the conditions
above, should be able to propagate clones stably in the chosen host so that each cloned fragment of
genomic DNA is a true copy of the piece of genome from which it originated and is not
rearranged (duplicated, deleted, reshuffled or otherwise mutated) in any way.

How many clones are needed to make a representative library?
Even if every piece of the genome is equally “clonable” (which is far from true and will be
discussed below), no library is guaranteed to contain every point in the genome. This is a matter
of statistics.

If this is not easy to envisage, imagine a very large number of necklaces each containing 100
beads numbered from 1 to 100. If we snap all the strings and pour the beads into a box, how many
individual beads will we have to pull blindfold from the box to be able to completely reassemble
at least one necklace? If the number of necklaces really is “very large” (as it would be if each
necklace represents one complete genome isolated from one cell) then we can never be certain that
we will pull one copy of each bead. However, the more beads we take the greater the chance that
we will in the end have one copy of each number.

The formula which relates the number of clones needed in the library to the size of the clone
insert and the size of the target genome, for any given chance that any point in the genome will
appear in the library at least once, is:

                  N = \frac{\ln(1 - p)}{\ln\left(1 - \frac{i}{g}\right)}

where
N = number of clones in the library, p = probability that any point in the genome will occur at
least once in the library, i = insert size, g = genome size
How does this work in practice?
Several different vectors have been used as methods have evolved. This little spreadsheet shows N
calculated for different vectors given various desired values of p and a genome size of 3 × 10^9 bp
which is the approximate size of the human or mouse genome.

                    genome size = 3,000,000,000 base pairs
                                 N for desired probability p =
vector    insert size (bp)        0.5         0.9        0.99       0.999
lambda              20000      103972      345387      690773     1036160
cosmid              45000       46209      153505      307009      460514
YAC               1000000        2079        6907       13813       20720
BAC                500000        4159       13814       27629       41443
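
The spreadsheet values can be reproduced with a few lines of Python. This is only a sketch of the formula above: the function and variable names are my own, and the insert sizes are simply the ones used in the table.

```python
import math

GENOME_BP = 3_000_000_000  # genome size used in the table (human or mouse)

# Insert sizes in base pairs, as used in the spreadsheet
INSERT_BP = {"lambda": 20_000, "cosmid": 45_000, "YAC": 1_000_000, "BAC": 500_000}


def clones_needed(p, insert_bp, genome_bp=GENOME_BP):
    """N = ln(1 - p) / ln(1 - i/g): clones needed so that any given point
    in the genome is present at least once with probability p."""
    # log1p(-x) computes ln(1 - x) accurately when x is very small
    return math.log(1 - p) / math.log1p(-insert_bp / genome_bp)


def library_table(probabilities=(0.5, 0.9, 0.99, 0.999)):
    """Return {vector: [N for each p]}, rounded to whole clones."""
    return {
        vector: [round(clones_needed(p, size)) for p in probabilities]
        for vector, size in INSERT_BP.items()
    }


if __name__ == "__main__":
    for vector, row in library_table().items():
        print(f"{vector:8s}", *(f"{n:>9d}" for n in row))
```

Changing GENOME_BP or the insert sizes shows quickly how strongly the required library size depends on the insert size.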

In practice, these are typical values…
Vector   Maximum       Approx. no. of    Advantages                 Disadvantages
         insert size   clones required
                       in library
lambda   20 kb         5 × 10^5          easy to construct          many clones required,
                                         libraries, relatively      hard to prepare DNA
                                         stable inserts             from clones
cosmid   45 kb         2 × 10^5          easy to construct          not always stable
                                         libraries and to
                                         prepare DNA from clones
YAC      1 Mb          10^4              few clones required        very prone to instability,
                                                                    difficult to construct
PAC      ~120 kb       10^5              fewer clones required      single copy origin of
                                         than for cosmids,          replication, therefore
                                         stable                     harder to prepare DNA
BAC      > 500 kb      2 × 10^4          few clones required,       single copy origin of
                                         very stable                replication, therefore
                                                                    harder to prepare DNA

Bacteriophage lambda was originally used as a genomic cloning vector because 20 kb of its
genome, containing the genes required for lysogeny, could be replaced by insert DNA from
another species. It is now only used for genomic library building in exceptional circumstances.

Cosmids were popular in the 1980s and early 1990s for genomic cloning. They are a hybrid
vector, mostly plasmid with the lambda cohesive end site (cos) incorporated. This is used to give
a very high efficiency of bacterial transformation because the recombinant DNA molecules can be
packaged into lambda protein coats which will then infect bacteria.

YACs, yeast artificial chromosomes, were designed to propagate in yeast. Because of problems
with instability of the cloned genomic DNA they are no longer much used. PACs are another hybrid
vector, part plasmid and part phage P1, and BACs are based around the origin of replication of
the F factor.

The book Analysis of Genes and Genomes by Richard Reece (Wiley, 2004) has an excellent chapter
outlining all these vectors, which I recommend that you read.
Why are some pieces of genomic DNA more difficult to clone than others?
Some DNA sequences which form normal parts of eukaryotic genomes seem to be effectively
unclonable in bacteria. Some sequences are simply very AT rich; others may contain long inverted
repeat sequences which can form stable cruciform structures and which may interfere with the
supercoiling of plasmids containing them. Such plasmids cannot be maintained, so the antibiotic
resistance is lost and the host dies. Some sequences are simply “poison”. Possibly they
accidentally code for a polypeptide with detrimental effects; sometimes they accidentally act as
promoters driving unwanted transcription into the vector. Modern vectors contain powerful
“terminator” sequences flanking the cloning site which prevent insert driven transcription escaping
from the cloning site. A DNA sequence does not have to be absolutely unclonable to be
effectively unclonable. If a sequence is present at just 10% of the average frequency in a library
and that library is less than 10 genome equivalents deep, then on average it will be represented
by less than one clone and so it is likely to be absent.
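
To put a rough number on "likely to be absent", here is a small sketch. It assumes (my assumption, not something stated above) that the number of clones covering a given sequence follows a Poisson distribution, which is a reasonable approximation when inserts are much smaller than the genome. The names are hypothetical.

```python
import math


def prob_absent(depth, relative_frequency=1.0):
    """Poisson estimate of the chance that a sequence is missing from a library.

    depth: library depth in genome equivalents,
        i.e. (number of clones x insert size) / genome size.
    relative_frequency: how often the sequence is cloned compared with an
        average sequence (0.1 means 10% of the average frequency).
    """
    expected_copies = depth * relative_frequency
    return math.exp(-expected_copies)


if __name__ == "__main__":
    for depth in (2, 5, 10, 20):
        print(depth, round(prob_absent(depth, 0.1), 3), round(prob_absent(depth), 6))
```

Under this assumption a sequence cloned at 10% of the average frequency still has about a 37% chance of being missing from a library 10 genome equivalents deep, whereas an average sequence is almost never missing at that depth.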

How do we create the fragments for cloning?
If we want to clone fragments of genomic DNA which are even as small as 20kb we cannot just
cut the genomic DNA to completion with a restriction enzyme.
For an average enzyme, the average distance between sites is given by the formula:

                        D = 4^n
               where D = distance in base pairs,
                        n = number of bases in the recognition site
                        and 4 because there are 4 different bases in DNA
This is shown in this table for the two commonest classes of enzyme.

             Number of bases in recognition site     Average distance between sites (bp)
                          4                                    256
                          6                                   4096

Neither of these distances is long enough to be useful in a
complete digestion.
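
As a quick check of these figures, the sketch below computes the average spacing and the expected number of sites in a genome. It assumes, as the formula does, a random sequence with all four bases equally frequent. The function names are mine.

```python
def mean_site_spacing(n_bases):
    """Average distance in bp between sites for an n-base recognition site."""
    return 4 ** n_bases


def expected_sites(genome_bp, n_bases):
    """Expected number of sites in a genome of the given size."""
    return genome_bp / mean_site_spacing(n_bases)


if __name__ == "__main__":
    target_bp = 20_000  # smallest insert size discussed above (lambda)
    for n in (4, 6):
        spacing = mean_site_spacing(n)
        print(f"{n}-base site: one site per {spacing} bp, "
              f"{target_bp / spacing:.1f} sites within a {target_bp} bp fragment")
```

Even a six base cutter is expected to cut several times within a 20 kb stretch, which is why complete digestion cannot give fragments of a useful size.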

Two methods have been used to generate suitable large
fragments of DNA.
    1. Carry out a partial restriction digest, where a small amount of the restriction enzyme is
       added to the DNA for a short time so that only some sites are cleaved. This can be
       difficult to control because high molecular weight DNA solutions are very viscous and it
       is difficult to mix the enzyme with the DNA adequately in the short time available
       (maybe only 2 min or so) without shearing the DNA by too vigorous stirring.
    2. A better method is to mix the restriction enzyme with its corresponding methylase. Then
       the cleavage and the protective methylation reactions are in a race (see Figure 2). Both
       enzymes will diffuse into the viscous DNA solution at the same rate so the proportion of
       cleaved sites compared to the protected sites remains constant throughout the reaction.
       The proportion of sites which are cleaved is determined by the relative amounts of the
       two enzymes, which can be adjusted in trial reactions beforehand.

Figure 1: a partial digestion time course. Genomic DNA has been digested with a small amount of
restriction enzyme for between 0 and 25 min. The size markers are lambda DNA digested with HindIII.
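
How small a proportion of sites needs to be cleaved? A rough estimate, assuming every site is cut independently with the same probability f (an idealisation which ignores the uneven mixing described in method 1): on average 1/f sites lie between successive cuts, so the mean fragment length is D / f. The function name below is hypothetical.

```python
def fraction_of_sites_to_cut(target_fragment_bp, n_bases):
    """Fraction of sites to cleave so that the mean fragment length equals
    target_fragment_bp, assuming sites are cut independently."""
    spacing = 4 ** n_bases  # average distance between sites, D = 4^n
    if target_fragment_bp < spacing:
        raise ValueError("target is shorter than a complete digest fragment")
    return spacing / target_fragment_bp


if __name__ == "__main__":
    for target in (20_000, 45_000, 500_000):
        f = fraction_of_sites_to_cut(target, 4)
        print(f"{target:>7} bp fragments: cut about 1 site in {1 / f:.0f}")
```

For a four base cutter and 20 kb fragments this comes to roughly one site in 78, which gives some feeling for how delicate a partial digest is.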

Restriction sites occur randomly in the genome. That is not what you want; you would prefer
them to be evenly spaced. True randomness gives some areas of the genome where the sites for one
enzyme are widely spaced and other areas where the sites are tightly clustered. As a consequence,
different digestion conditions are required to prepare DNA fragments of the same length from
different regions of the genome. A range of conditions is therefore sometimes employed and the
resulting digests are mixed before cloning is attempted. This effect has more marked consequences
for enzymes with a six base pair site than for those with a four base pair site. So, on the
whole, four base pair enzymes such as MboI have been preferred for library construction. (MboI
has the added advantage that its sticky end is compatible with the sticky end generated by the
six base pair enzyme BamHI, which can be used to prepare the vector.)

Figure 2: competition between a restriction enzyme and its methylase

No further treatment of the DNA is required after the digestion for lambda and cosmid cloning
because the lambda head packaging machinery takes care of the size selection. But for YACs or
BACs the partial restriction digest is usually size fractionated by pulsed field gel
electrophoresis; the region containing the required size of DNA fragments is cut from the gel and
the agarose is digested away using the enzyme “β-agarase”. Sometimes, to protect the high
molecular weight DNA from breakage by random shearing, the partial restriction digest and the
ligation to vector are both carried out before the agarose is removed.
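
To illustrate how uneven randomly placed restriction sites are, the following sketch estimates how often the gap between neighbouring sites exceeds a given length. It assumes each position in the genome independently starts a site with probability 1/4^n, which is an idealisation of a real genome. The function name is mine.

```python
def prob_gap_longer_than(length_bp, n_bases):
    """Chance that the distance from one site to the next exceeds length_bp,
    if every position independently starts a site with probability 1/4^n."""
    site_probability = 1 / 4 ** n_bases
    return (1 - site_probability) ** length_bp


if __name__ == "__main__":
    for n in (4, 6):
        mean = 4 ** n
        for multiple in (0.25, 1, 3):
            p = prob_gap_longer_than(int(multiple * mean), n)
            print(f"{n}-base site: gap > {multiple} x average: {p:.3f}")
```

On this model about 5% of gaps are more than three times the average length and over a fifth are less than a quarter of it, whichever enzyme is used. The absolute distances involved are sixteen times greater for a six base cutter, so they matter more at the scale of a cloned insert.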

What do we do once the library is made?
A library is formed when a population of recombinant DNA molecules is inserted (by
transformation or transfection) into the host E. coli. At this point the experiment will still
comprise just a suspension of bacteria in some nutrient broth. Any “clone” will still be represented
by just a single bacterium. The suspension will then be plated onto nutrient agar (which in the case
of plasmid libraries will contain a selective antibiotic). Colonies (or plaques, but from here on
I'll just refer to colonies) form overnight; each will consist of about a million descendants of
the original transformant. At this point the library is “unamplified”. If the library could be
screened at this point (for instance by making a replica plate and carrying out a “colony
hybridisation” using a gene specific probe) then all the signals obtained would result from
independent cloning events. (The number of signals would give you some indication of the “depth”
of the library.) Usually the library will be plated at a high density onto large square filters
(20 cm × 20 cm) placed on nutrient agar plus antibiotic.

It is not easy to store libraries in this unamplified state. They must first be replicated onto
fresh filters so that replicas retain the positions of all the colonies. Replica filters can then
be stored at -80 °C. Often all the colonies of one replica will be resuspended in a small volume
of buffer and 1 ml aliquots frozen at -80 °C. In this state the library is available for replating
if it is desired to screen it. However, there is no longer any guarantee that signals obtained
from the replated library will be independent: each colony will have contributed approximately
one million genetically identical cells to the suspension, each of which can found a new colony.
Also, if two different laboratories each screen the same amplified library with the same probe
and each obtain some positive colonies, it will be necessary to test whether they have each
obtained exactly the same clone or whether they have obtained overlapping clones which were both
originally present in the unamplified library.

To circumvent these problems “gridded” libraries have been introduced. The first stage of library
construction is carried out as normal except that the library is plated out at a low colony
density so that clones are well separated. (In the old fashioned libraries above, colonies were
plated at a very high density to minimise the number of plates which had to be replica plated and
to minimise the number of filters which had to be screened.) Then a colony picking robot takes 96
sterile needles and pricks 96 colonies. The 8 × 12 array of needles is used to inoculate small
cultures in the wells of a 96 well tray, the needles are sterilised and the robot is then ready to
pick another 96 colonies. In this way thousands of individual clones can be picked. (The Sanger
Institute has a good information page which is worth a visit.)

Figure 3: “gridding” a library

Each clone thus gains an address: its plate, row and column number, e.g. 255A6 would mean tray
255, row A, column 6. The advantages of gridding are several. The trays are easily handled by
robots and so replication is straightforward. Trays are easy to store in a deep freeze. Copies of
the same library can be made available to any interested laboratory and information about any
individual clone can be placed in a central database. In this way, if two laboratories discover
independent pieces of information they will realise that both facts are true of the same clone,
for instance that two different genes are in the same clone. This might not have been easy to
detect if the two laboratories had probed the same amplified library each with their own
unrelated probes.

Figure 3: Colony picking robot at the Sanger Institute. On the right, illuminated by a red lamp,
are the nutrient agar plates, in the middle are trays of 70% ethanol for sterilising the pins, on
the left are trays containing wells of nutrient broth into which individual colonies will be
placed. The pins are in a rack behind the lamp and they can be brought forward individually and
positioned very exactly over a single colony.
As time has gone by most labs have upgraded to 384 well trays (double the linear density of a 96
well tray, so four times as many wells) and even to 1536 well trays (quadruple the linear density,
so sixteen times as many wells). The same 96 pin “hedgehogs” can still be used to transfer between
trays; it just takes extra cycles by the robot.
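
A clone address such as 255A6 is easy to handle in software, which is part of what makes a central database of gridded clones practical. Here is a minimal sketch of a parser for 96 well addresses. It assumes rows are lettered A to H and columns numbered 1 to 12; all names are my own and do not describe any real database.

```python
import re
from typing import NamedTuple


class CloneAddress(NamedTuple):
    tray: int
    row: str      # "A" to "H" (assumed lettering of the 8 rows)
    column: int   # 1 to 12


# tray number, then a row letter, then a one or two digit column number
_ADDRESS = re.compile(r"(\d+)([A-H])(\d{1,2})")


def parse_address(text):
    """Split an address such as '255A6' into tray, row and column."""
    match = _ADDRESS.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"not a 96 well clone address: {text!r}")
    tray, row, column = int(match.group(1)), match.group(2), int(match.group(3))
    if not 1 <= column <= 12:
        raise ValueError(f"column out of range in {text!r}")
    return CloneAddress(tray, row, column)


if __name__ == "__main__":
    print(parse_address("255A6"))
```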

Library screening
Libraries may be screened by one of two methods: colony hybridisation or PCR screening.
   1. Colony hybridisation: this is essentially the same as is used to screen a cDNA library. The
      probe has frequently been a cDNA: a genomic clone for a known gene is identified by
      screening colonies with the appropriate cDNA clone. The probe will only hybridise to the
      exons. If the gene contains introns which are bigger than the insert size in the library
      then you will not be able to identify clones containing them directly. You may need to
      “walk” (see the chromosome walking box). Gridded libraries can be replica plated by robots
      onto nylon filters so that colonies grow in a very dense array. When the colonies have been
      lysed and their DNA has stuck to the filter at the site of the colony, they can be probed
      with a radioactive cDNA probe. Problems may arise if your probe contains a repetitive
      sequence or if the gene is a member of a gene family, in which case genomic clones for
      related genes may be found. This is especially a problem if there are “retroposons” of your
      gene in the genome. These will probably contain most of the cDNA sequence and thus give a
      much stronger signal than clones which are derived from the real gene but which contain
      only a few small exons.
      Figure 4 shows a typical array pattern. Each dot represents one colony. Thirty two 96 well
      trays of colonies have been arrayed here (or more likely, 16 × 96 wells twice, so that each
      colony is present in duplicate).

      Figure 4: A typical colony screening filter.
      Figure 5: some duplicate signals overlaid with an alignment grid

      Chromosome walking
      From an initial clone, “subclone” the end fragments so that each may be used to reprobe the
      library. The ends may hybridise to genomic clones which overlap and extend the first clone.
      Identify how the clones overlap, subclone the end fragment of the new clone and continue for
      as many cycles as necessary.
      [Diagram: the original clone, showing the original site of hybridisation to the cDNA probe
      and the end probe, overlapped by a new clone.]

   2. PCR screening: when you already know some DNA sequence in the target gene you can design a
      PCR assay to ask which genomic clone contains it. It would be time consuming, expensive and
      foolish to PCR every clone in the library, since this would be thousands of PCR reactions.
      So it is necessary to subdivide the library into pools which can be used to reveal the
      coordinates of the positive clones. A library of 20,000 clones will consist of about 200 ×
      96 well plates. The clones in these can be pooled into groups of 10 plates (see the diagram
      in Figure 6 of one such “super-pool”). Then, from each super-pool of 10 plates, 10
      individual plate pools, 12 column pools and 8 row pools can be made. In the figure this
      corresponds to 10 yellow pools, 12 red pools and 8 blue pools. Every clone in the library
      will thus be found within one plate pool, one row pool and one column pool. The library is
      screened by first just testing the super-pools (20 PCR reactions plus controls). When a
      positive signal is obtained in one such pool then its plate / row / column sub-pools are
      tested (10 + 8 + 12 = 30 PCR reactions). There should be one positive signal in each of the
      three sets, which together reveal the coordinates of the positive clone. Finally this can be
      confirmed with a third round of testing (1 PCR reaction), giving a grand total of about 60
      reactions (including controls). This method works well so long as the super-pools are
      unlikely to contain more than 2 or 3 positive clones each (preferably no more than one). It
      is also very amenable to robotization and so robots can carry out many such screens
      simultaneously.

      Figure 6: part of a library pooling scheme
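
The bookkeeping behind the pooling scheme can be sketched in a few lines. This is an illustration only: it assumes plates are numbered from 1 and grouped into super-pools in consecutive runs of 10, which the description above does not specify, and all the names are hypothetical.

```python
ROWS = "ABCDEFGH"            # 8 rows per 96 well plate
N_COLUMNS = 12               # 12 columns per 96 well plate
PLATES_PER_SUPERPOOL = 10


def pools_for_clone(plate, row, column):
    """Return (super-pool, plate pool, row pool, column pool) containing a clone."""
    superpool = (plate - 1) // PLATES_PER_SUPERPOOL + 1
    plate_pool = (plate - 1) % PLATES_PER_SUPERPOOL + 1
    return superpool, plate_pool, row, column


def locate_clone(superpool, plate_pool, row, column):
    """Turn one positive pool of each kind back into a clone address."""
    plate = (superpool - 1) * PLATES_PER_SUPERPOOL + plate_pool
    return plate, row, column


def reactions_needed(n_plates, n_controls=0):
    """PCR reactions to find a single positive clone by three rounds of testing."""
    n_superpools = -(-n_plates // PLATES_PER_SUPERPOOL)    # ceiling division
    round_1 = n_superpools                                 # all super-pools
    round_2 = PLATES_PER_SUPERPOOL + len(ROWS) + N_COLUMNS  # 10 + 8 + 12
    round_3 = 1                                            # confirm the clone
    return round_1 + round_2 + round_3 + n_controls


if __name__ == "__main__":
    print(reactions_needed(200))            # 20 + 30 + 1 reactions before controls
    print(pools_for_clone(255, "A", 6))
```

The decoding in locate_clone only works cleanly when a super-pool contains a single positive clone. With two positives there can be two positive rows and two positive columns, and the extra candidate addresses have to be resolved in the confirmation round, which is why super-pools with more than a few positives are best avoided.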
