

       Know the Limitations of your
          Data – X-ray, NMR, EM

        Pharm 201/Bioinformatics I
             Philip E. Bourne
              SSPPS, UCSD

 Prerequisite Reading: Structural Bioinformatics Chapters 4-6

                      Pharm 201 Lecture 3 2011                  1
When You Grab a PDB File What
   Are You Starting With?

                               Data Views
   • Depositor/Annotator
   • Type of experiment: X-ray, NMR, EM
   • Type of molecule: protein, nucleic acid, or protein-nucleic acid complex
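
A first look at what you are starting with can be scripted. The sketch below reads the experiment type and the quoted resolution from the header of a PDB-format file, assuming the standard fixed-column EXPDTA and REMARK 2 records. The function name and the example header lines are made up for illustration.

```python
def read_experiment_info(pdb_lines):
    """Return (method, resolution) from PDB-format header lines."""
    method_parts = []
    resolution = None
    for line in pdb_lines:
        if line.startswith("EXPDTA"):
            # The technique text starts in column 11 of an EXPDTA record.
            method_parts.append(line[10:79].strip())
        elif line.startswith("REMARK   2 RESOLUTION."):
            fields = line.split()
            try:
                resolution = float(fields[3])
            except (IndexError, ValueError):
                # NMR entries carry "NOT APPLICABLE." instead of a number.
                resolution = None
    method = " ".join(method_parts) or None
    return method, resolution


# Made-up header lines, not taken from a real entry.
example = [
    "EXPDTA    X-RAY DIFFRACTION",
    "REMARK   2 RESOLUTION.    1.40 ANGSTROMS.",
]
print(read_experiment_info(example))
```

An entry with no numeric resolution is a signal to switch to the checks that apply to that method rather than to treat the value as missing data.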

   [Figure: PDB deposition and annotation workflow, Steps 1-4: Depositor, Deposit (PDB ID),
    Annotate, Validate (Validation Report), Depositor Approval, PDB Core DB, Distribution Site]
                                 Pharm 201 Lecture 6, 2010
• Resolve nomenclature and format problems

• Add missing required data items

• Add higher level classifications

• Review validation report and summary letter to the

• Produce and check final mmCIF and PDB files

• Update status and load database

• Check data consistency across archive
     Annotation – More Specifics
• Make sure entry is complete (mandatory items from mmCIF
• Format exchange
   – Converts between PDB and mmCIF formats
   – Recognizes most variants of PDB format
• Check nomenclature
   – Residue
   – Polymer atoms
   – Hydrogen atoms
   – Ligand atoms

• Covalent geometry
   – Comparison with standard values (Engh and Huber [1]; Gelbin
     et al. [3]; Clowney et al. [2])
   – Identify outliers
• Stereochemistry – check chiral centers
• Close contacts in asymmetric unit and unit cell
• Occupancy
• Sequence in SEQRES and coordinates
• Distant waters
• Experimental (SFCHECK [4])
                                         [1] R.A. Engh & R. Huber. Acta Cryst. A47 (1991): 392-400
                                         [2] L. Clowney et al. J. Am. Chem. Soc. 118 (1996): 509-518
                                         [3] A. Gelbin et al. J. Am. Chem. Soc. 118 (1996): 519-529
                                         [4] A.A. Vaguine, J. Richelle, and S.J. Wodak. Acta Cryst. D55
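
The covalent geometry check can be pictured as a comparison of each observed bond length with a target value, flagging bonds that deviate by many standard deviations. The sketch below is a minimal illustration only: the target, sigma, cutoff, bond labels and coordinates are placeholders chosen for the example, not values from the cited tables or from the PDB validation software.

```python
import math


def flag_bond_outliers(bonds, target, sigma, z_cutoff=4.0):
    """Return (label, length, z) for bonds far from the target length.

    bonds: list of (label, xyz_a, xyz_b) with coordinates in Angstroms.
    target, sigma: assumed ideal length and its standard deviation.
    """
    flagged = []
    for label, a, b in bonds:
        length = math.dist(a, b)
        z = (length - target) / sigma
        if abs(z) > z_cutoff:
            flagged.append((label, round(length, 3), round(z, 1)))
    return flagged


# Hypothetical bonds; 1.53 and 0.02 are illustrative numbers only.
bonds = [
    ("ALA 1 CA-C", (0.0, 0.0, 0.0), (1.53, 0.0, 0.0)),
    ("GLY 2 CA-C", (0.0, 0.0, 0.0), (1.70, 0.0, 0.0)),
]
print(flag_bond_outliers(bonds, target=1.53, sigma=0.02))
```

A flagged bond is not necessarily wrong; it is a place to look at the electron density before trusting the coordinates.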


    The process by which
biological data in a database
are annotated and validated
  changes over time – this
    introduces a temporal

              Summary Thus Far
• The biocurators (annotators) are the unsung
  heroes of modern biology
  P.E. Bourne and J. McEntyre 2006 Biocurators: Contributors to the World of Science
  PLoS Comp. Biol., (Editorial) 2(10) e142
  – International Society for Biocuration
• As a resource developer - start right and the
  need for data remediation in years to come will
  be less likely
• As a resource user - be aware of the process
  used to provide the data and hence the
  limitations of the data you are using

The quality of the data you use in
a bioinformatics experiment is a
 function of the method used to
 collect these data – understand
            the method

                           As of Oct 5, 2011


            X-ray Crystallography
•   Oldest technique
•   Majority of the depositions
•   A number of Nobel prizes
•   International Union of Crystallography (IUCr) .. Acta ..
•   Method based on scattering from electrons – hydrogen
    atoms usually not seen (sometimes modeled in)
•   In fact modeling in is an issue
•   Atoms of similar atomic weight not distinguishable eg O, N,
•   Influence of crystal packing eg malate dehydrogenase
•   Environment in crystal highly aqueous
•   Produces similar structures to NMR eg thioredoxin (3TRX
    vs 1SRX)
     The X-ray Crystallography Pipeline
Basic Steps
          • Isolation,
 Target • Expression,       Data          Structure Structure   Functional
Selection • Purification, Collection      Solution Refinement   Annotation   Publish
          • Crystallization

   Limitations - Crystallization
• Crystallization:
  –   Non-soluble
  –   Twinning
  –   Micro heterogeneity
  –   Disorder

Limitations – Data Collection

Limitations - Refinement

     Limitations – Map Fitting
• In an intricate study the only way to be sure
  that the work is correct is to make your own
  judgment from the electron density – this is
  never done.
• It can be done at http://eds.bmc.uu.se/eds/
• It requires that the experimental data (the 100d
  structure factors) be available

Limitations – Non-crystallographic
         Symmetry (NCS)

      Limitations – Refinement

• Introduces restraints/constraints that may or may
  not be realistic
• Water has been used unnecessarily
• Resolution quoted wrongly
• Standards have helped
• See for example: H. Weissig, and P.E. Bourne
  1999 Bioinformatics 15(10) 807-831. An Analysis
  of the Protein Data Bank in Search of Temporal
  and Global Trends

                    Limitations – Interpretation of the
                      Biologically Active Molecule


 Limitations – Functional Annotation

• Functional annotation is ONLY in the publication
• Attempt to address this with GO assignments
• Attempt to address this with literature integration
• Structural genomics – function unknown
• One structure – one to many functions (power
  law) – functions may be unrecognized since the
  PDB is relatively static
• Many efforts at functional annotation
Why Is Understanding Limitations Important?

• Later we will study reductionism – a key
  process in the use of biological data
• As a result of reductionism you will need to
  choose a representative structure for the
  task at hand
• Understanding the limitations of the
  experiment will help us do this

 Summary of Important Features in using
   Structure Data Determined by X-ray
• Resolution is a key indicator – think about it
  relative to atomic resolution ie 1.54A for a C-C
  single bond
• Disorder (ie undetermined or alternative atomic
  coordinates) is a natural part of many structures
• R factor (all) describes the agreement of the model
  with the experimental data. It should be better than
  0.20 (Rfree 0.26)
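
To make the R factor concrete: it is the summed absolute difference between observed and calculated structure factor amplitudes, divided by the summed observed amplitudes. Rfree is the same quantity computed over a set of reflections held out of refinement. The sketch below assumes the calculated amplitudes are already on the same scale as the observed ones, and the amplitudes are made-up numbers.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over matched reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Made-up amplitudes for three reflections.
f_obs = [100.0, 50.0, 25.0]
f_calc = [90.0, 55.0, 30.0]
print(round(r_factor(f_obs, f_calc), 3))
```

A perfect model gives R = 0, so lower is better, and the thresholds quoted above are upper bounds.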

   Summary of Important Features in using
     Structure Data Determined by X-ray
            Crystallography Cont.

• B (aka temperature)
  factors offer indicators
  both to the accuracy of
  a structure and the
  most mobile regions
• At right is 5EBX
  drawn with QuickPDB
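
B factors can also be inspected without a viewer. The sketch below averages the B factor over the atoms of each residue, assuming the fixed-column PDB ATOM record layout (chain in column 22, residue number in columns 23-26, B factor in columns 61-66). The `atom_line` helper and the demo records are hypothetical and exist only to build correctly aligned test input.

```python
from collections import defaultdict


def mean_b_per_residue(pdb_lines):
    """Average the B factor over the atoms of each residue."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        key = (line[21], int(line[22:26]), line[17:20].strip())
        totals[key][0] += float(line[60:66])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def atom_line(serial, name, res, chain, resseq, x, y, z, occ, b):
    """Hypothetical helper: format one fixed-column ATOM record for the demo."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


# Made-up atoms: two in ALA 1 and one in GLY 2.
demo = [
    atom_line(1, " N", "ALA", "A", 1, 0.0, 0.0, 0.0, 1.0, 10.0),
    atom_line(2, " CA", "ALA", "A", 1, 1.5, 0.0, 0.0, 1.0, 20.0),
    atom_line(3, " N", "GLY", "A", 2, 3.0, 0.0, 0.0, 0.5, 40.0),
]
means = mean_b_per_residue(demo)
most_mobile = max(means, key=means.get)
print(means, most_mobile)
```

Residues with the highest averages are candidates for the most mobile or least well determined regions.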

              Features of NMR
• Limited in size (25-100 kDa) – provided labeled samples
  are obtainable
• Selected information on proteins to ~150kDa
• Solution study – small sample needed for soluble proteins
• Only a few solid state studies
• Reveals hydrogen positions
• Leads to an ensemble of dynamical structures – these are
  rarely used in bioinformatics studies
• Useful in high throughput screens to determine protein
  ligand interactions
• Used for phasing of X-ray structures ie the methods are
• Until recently not applicable to membrane proteins
              NMR - Methodology
• Molecules are tumbling and vibrating with thermal motion
• Usually labeled with 1H, 13C, 15N, 31P - in an external magnetic field these nuclei
  have two spin states – one aligned with and one opposed to the external
  magnetic field
• Detects and assigns chemical shifts of atomic nuclei with non-zero spin
• The shifts depend on their electronic environments ie identities and
  distances of nearby atoms
• The system can be tuned to look at specific features of the
  characteristic spin moments
• 1H-1H interactions provide NOE constraints
• Better resolution is obtained when the molecule is tumbling fast – size
  slows this – offset by higher magnetic field strengths
• Protein must be soluble at high concentration and stable without
  aggregation – high throughput can show this and folded vs unfolded
  very quickly

      NMR – Methodology cont.
• Result is a set of distance constraints between pairs of
  atoms either bonded or non-bonded
• If there are sufficient constraints then an ensemble of
  possibilities results
• Often this ensemble is averaged and constraints adjusted to
  conform to normal bond lengths and distances
• Usually left with 15-30 members of the ensemble
• Ideally less than 1Å RMSD between models (backbone
• Portions of the molecule with high motion have tell-tale
  signals eg apo calmodulin
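
The spread of an ensemble can be summarized as the mean pairwise RMSD between its models. The sketch below assumes each model is a list of matching atom coordinates (for example backbone atoms in the same order) and that the models are already superimposed in a common frame, so no fitting step is performed. The three tiny models are invented for illustration.

```python
import math
from itertools import combinations


def rmsd(model_a, model_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(model_a) != len(model_b):
        raise ValueError("models must contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(model_a, model_b)
    )
    return math.sqrt(squared / len(model_a))


def mean_pairwise_rmsd(models):
    """Average RMSD over all pairs of models in the ensemble."""
    pairs = list(combinations(models, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Invented two-atom models; the second is shifted by 1 Angstrom along z.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
print(mean_pairwise_rmsd(ensemble))
```

Computing the same quantity per residue, rather than over the whole chain, points to the high-motion portions of the molecule.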

BMRB - http://www.bmrb.wisc.edu/

                     NMR Terms
• COSY/NOESY spectra: Allow the space interactions between atoms
  to be measured and generate a 3D structure of the protein. (what we
  have discussed)
• TROSY Transverse Relaxation Optimized Spectroscopy: Invented
  about 1997. First described by Professor Kurt Wuthrich. Useful for
  analyzing larger protein systems. TROSY is a method for getting
  sharper peaks on large proteins. TROSY is best at higher fields. If the
  aim is to study a large complex or a chemical shift perturbation when
  a protein binds to a receptor using NMR, it’s better to use a 900 MHz
  machine than a more standard lower-field machine
• solid state NMR: Requires wider-bore (63 or even 89 mm diameter)
  magnets (than solution state NMR). The higher stored energy of these
  wide bore magnets means that they are significantly more difficult to
  build, and as a result high-field solid state NMR lags behind liquid
  state in terms of available field strength.
• multidimensional (three- and four-dimensional) NMR: Introduced
  about 12-15 years ago. This technology has the advantage of resolving
  the severe overlap in 2D spectra.

In both X-ray crystallography and
 NMR there is the danger that the
final structure reflects the model it
       was computed against

  Additional Validation Checks

• Stereochemical quality
  –   Ramachandran plot outliers
  –   Dihedrals, bond lengths and angles
  –   Fold Deviation Score (FDS)
  –   Validation Server
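
The dihedral checks above, including the phi and psi angles plotted in a Ramachandran plot, all rest on one calculation: the torsion angle defined by four atoms. A minimal sketch in plain Python follows; for phi the four atoms would be C(i-1), N, CA, C and for psi N, CA, C, N(i+1). The function name and test points are illustrative.

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four points arranged so that the torsion is a right angle.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Outlier detection then amounts to asking whether a residue's (phi, psi) pair falls in a sparsely populated region of the plot, which requires reference distributions not shown here.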

       Use the PDB Geometry Data

                Electron Microscopy

• Able to look at large molecular assemblies
• Resolution now 30A to below 4A
• Cryo-EM preserves aqueous environment (no
• Experimentally more tractable
• Can resolve images (direct measurement of
  phases) or diffraction patterns
• Can provide a 3D volumetric reconstruction
• Suitable for the study of membrane proteins eg
  bacteriorhodopsin (1990)
         1P85 Real space refined coordinates of the 50S subunit fitted into the low
          resolution cryo-EM map of the EF-G.GTP state of E. coli 70S ribosome

• Single particle reconstruction – multiple
  orientations of the same particle found in
  the specimen (viruses, ribosome…)
• Electron tomography – 3D reconstruction of
  a single particle (organelles, whole cells)

Example EM Result
                  •    Example for a hybrid study that combines
                       elements of electron crystallography and helical
                       reconstruction with homology modeling and
                       molecular docking approaches in order to
                       elucidate the structure of an actin-fimbrin
                       crosslink (Volkmann et al., 2001b). Fimbrin is a
                       member of a large superfamily of actin-binding
                       proteins and is responsible for crosslinking of
                       actin filaments into ordered, tightly packed
                       networks such as actin bundles in microvilli or
                       stereocilia of the inner ear. The diffraction
                       patterns of ordered paracrystalline actin-fimbrin
                       arrays (background) were used to deduce the
                       spatial relationship between the actin filaments
                       (white surface representation) and the various
                       domains of the crosslinker (the two actin-
                       binding domains of fimbrin are pink and blue,
                       the regulatory domain cyan). Combination of
                       this data with homology modeling and data
                       from docking the crystal structure of fimbrin’s
                       N-terminal actin-binding domain into helical
                       reconstructions (Hanein et al., 1998), allowed
                       us to build a complete atomic model of the
                       crosslinking molecule (foreground, color
                       scheme as in surface representation of the

                  •    From Structural Bioinformatics 2005 p124
Example EM Result
                  •    Example for a combination of high-resolution
                       structural information from X-ray crystallography
                       and medium-resolution information from electron
                       cryomicroscopy (here 2.1 nm). Actin and myosin
                       were docked into helical reconstructions of actin
                       decorated with smooth-muscle myosin (Volkmann et
                       al., 2000). Interaction of myosin with filamentous
                       actin has been implicated in a variety of biological
                       activities including muscle contraction, cytokinesis,
                       cell movement, membrane transport, and certain
                       signal transduction pathways. Attempts to
                       crystallize actomyosin failed due to the tendency of
                       actin to polymerize. Docking was performed using a
                       global search with a density correlation measure
                       (Volkmann and Hanein, 1999). The estimated
                       accuracy of the fit is 0.22 nm in the myosin portion
                       and 0.18 nm in the actin portion. One actin molecule
                       is shown on the left as a molecular surface
                       representation. The yellow area denotes the largest
                       hydrophobic patch on the exposed surface of the
                       filament, a region expected to participate in
                       actomyosin interactions. The fitted atomic model of
                       myosin is shown on the right. The transparent
                       envelope represents the density corresponding to
                       myosin in the 3D reconstruction. The solution set
                       concept (see text) was used to evaluate the results
                       and to assign probabilities for residues to take part in
                       the interaction. The tone of red on the myosin model
                       is proportional to this statistically evaluated
                       probability (the more red, the higher the
                            •   From Structural Bioinformatics 2005 p127

Small-angle X-ray Scattering SAXS

• Reveals shape and size of macromolecules
  in the range 5-25nm
• Handles partially ordered systems
• No need for crystalline sample; larger
  molecules than NMR, but at lower
• Leading to hybrid techniques

       Summary Regarding Data
• Pay attention to the method, its pluses and minuses
• Be aware of models
• Be aware of the general limitations of each method
• For NMR be aware of an ensemble of structures
• Be aware of hybrid models
• For all methods be aware of the parameters that govern the
• You will need to know these limitations for just about any
  bioinformatics study since it will be necessary to choose a
  non-redundant set (NR) – we will visit Astral and Pisces
  which are tools in defining an NR set
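
Choosing one representative per group of redundant structures can be sketched as a simple ranking. The example below prefers the best (lowest) resolution and breaks ties on Rfree; entries with no resolution, such as NMR ensembles, rank last. This is an illustrative rule only and is not how Astral or Pisces select their sets. The entry IDs, cluster labels and values are invented.

```python
import math


def quality_key(entry):
    """Sort key: lower resolution value first, then lower Rfree; missing values last."""
    resolution = entry["resolution"]
    r_free = entry["r_free"]
    return (
        resolution if resolution is not None else math.inf,
        r_free if r_free is not None else math.inf,
    )


def pick_representatives(entries):
    """Return {cluster: id} keeping the best-ranked entry of each cluster."""
    best = {}
    for entry in entries:
        cluster = entry["cluster"]
        if cluster not in best or quality_key(entry) < quality_key(best[cluster]):
            best[cluster] = entry
    return {cluster: entry["id"] for cluster, entry in best.items()}


# Invented entries; "cluster" stands for a group of redundant structures.
entries = [
    {"id": "ent1", "cluster": "c1", "resolution": 2.5, "r_free": 0.27},
    {"id": "ent2", "cluster": "c1", "resolution": 1.8, "r_free": 0.22},
    {"id": "ent3", "cluster": "c1", "resolution": None, "r_free": None},
    {"id": "ent4", "cluster": "c2", "resolution": 2.0, "r_free": 0.25},
    {"id": "ent5", "cluster": "c2", "resolution": 2.0, "r_free": 0.21},
]
print(pick_representatives(entries))
```

The right ranking depends on the task at hand; a study of dynamics, for instance, might deliberately prefer the NMR ensemble.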
