greg_wilde_report by stariya

VIEWS: 5 PAGES: 17

									   Feasibility of Prediction of EPR Spectra by

               Molecular Dynamics Simulation
                                             Greg Wilde




\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
Introduction

      The goal of this project was to model the dynamics of a spin label, namely IASL, with

Insight and Discover. The first few weeks of the summer were spent in getting acquainted with

the software. The project then involved determining the best way to represent IASL’s structure

for simulation with Discover. Several methods were assessed, and the quickest method was

chosen while still retaining accuracy. Once the label was built and recognized correctly by

Insight, several tests were considered to allow for dynamics runs to be fully automated, and a

program was written in Biosym tcl which used what we felt was the best test for ending

dynamics simulation once the spin label established an equilibrium.

      In what is to follow, I will outline exactly how Insight and Discover were used for

implementation of the desired parameters. The first section discusses some of the basics in using

the software for what we wanted to achieve. The second section discusses some of the options

considered for building a spin label and how this was ultimately done. The final section consists

of an outline of constituents of the automation program and of immediate future goals for this

project.

Some Interesting How-to’s with Insight

1. For rendering, coloring, and displaying a molecule in a desired manner, the Molecule menu

    contains most of what you need. The important thing here is how to use this menu, and this

    involves specifying atoms, residues, and molecules in an appropriate manner. The basic

    method of specifying is MOLECULE:RESIDUE:ATOM. Commas can be used in each

    argument as separators for specific residues, molecules, and atoms; stars and question marks

    are used as wild cards just as in Unix; and dashes can be used for specifying a range of

    residues. Residues must be specified by either their sequence number or type, but not both in



\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
    the same argument. For example, to specify all the carbons and oxygens and gamma sulfurs

    on lysines or from residue 132-167 on the heavy chain in molecule X, you would use

    X:LYS,H132-H167:O*,C*,SG. Any number of operations can be performed on that residue.

2. Superimposing two molecules. The key to this is making sure that you specify the same

    number and types of atoms in each molecule for superimposition. For quick comparison, this

    may mean simply superimposing ten residues in common between both molecules. For exact

    comparison, both molecules must be tabulated and all similarities must be exploited. For

    example, to superimpose Rayment’s subfragment 1 and Pitbull, use the Molecule:Tabulate

    command to tabulate the residues of each molecule and determine at what point the residues

    begin to align in the sequence of each molecule. Then specify these residues as in 1) to

    superimpose them.

3. Using subsets to define regions or properties of interest. Subsets are your friend. There are

    three types of subsets: defined (by a property), zone, and interface. To make a defined subset,

    a molecule, a property, and values for that property must be selected. This is useful for

    selecting residues that are within a certain value for hydrophobicity or temperature factor, for

    example. The zone subset is used to group the set of atoms, residues, or molecules within a

    certain radius of a central set of atoms, which can be defined as in 1) or as the

    ATOM_SELECTION subset or any other subset. This is useful for fixing a set of atoms in a

    dynamics simulation. A zone subset is defined within a certain radius of the spin label, which

    is allowed to be active. The SUBSET:COMBINE command is then used to make a new

    subset FIX which is the difference between myosin and the ACTIVE subset. The interface

    subset is similar to the zone subset in that a central set of atoms is defined and a radius is

    defined, except that the set of atoms from which the new group of atoms is taken must also




\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
    be specified. This is useful for displaying, for example, the solvent accessible surface of actin

    within 25 angstroms of myosin in pitbull.pdb. The set of atoms from which the new subset of

    atoms, SOLV_ACCESS_ACT, is to be taken is defined as actin, the central set of atoms is

    myosin, and the radius is 25. Using Interface instead of Zone ensures that the new set is

    composed of only actin atoms. The MOLECULE:SURFACE command can then be used to

    create a Conolly surface of the subset SOLV_ACCESS_ACT.

4. Coloring by property. The MOLECULE:COLOR command can be used. First property must

    be selected as the coloring specification once the set of atoms or residues to be colored is

    defined. Then the property—sequence number, hydrophobicity, chirality—is selected. For

    certain properties, there is a default spectrum of colors that are used. For others, use

    MOL_SPECTRUM. The colors, range, and boundary colors can be modified with the

    Spectrum menu.

Building the Label

      IASL’s structure was assembled using the Builder module in Insight. The methodology

was quite straightforward, except for modeling the nitroxide bond containing the unpaired

electron. Insight allowed for partial double bonds, which, when implemented, created a more

trigonal planar shape rather than tetrahedral shape at the nitrogen. Although partial double bonds

are normally used to model resonant structures, where there is equal sharing of electrons between

atoms, the unequal sharing of the unpaired electron between the nitrogen and oxygen in the

nitroxide bond could be accounted for by manipulation of partial charges on these two atoms.

      Semiempirical and ab initio methods were considered for attaining a quantum

mechanically accurate structure for IASL4, but eventually it was decided that these methods were

too complex of a treatment for two reasons:



\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
1) Since the dynamics simulation that Discover or Amber produces is based on a Newtonian

physics method and because this simulation for an EPR spectrum has to be carried out over at

least one million time steps, whether the structure is minimized quantum mechanically is

essentially irrelevant.

2) Discover was able to produce the correct minimized structure according to the crystal

structure for another spin label (2,2,6,6-tetramethyl-4-piperidinone-1-oxyl1) using a simulated

annealing procedure. These structures are found in the 2266tetramethyl files. This procedure is

probably accurate over a relatively small number of atoms, and for our purposes IASL was well

within its limitations. Simulated annealing produced fifty structures, all of which were stored in

the archive file. The energies for all of these structures were then tabulated using the Analysis

module, and the frame number with the lowest energy structure was chosen.

       Once the geometrically optimized structure was produced by simulated annealing, it was

necessary to use the MOPAC/AMPAC module in Insight to generate theoretically accurate

partial charges for the structure. The protocol for MOPAC was taken from MSI’s support page

concerning adding an ionized tyrosine to the residue library--www.msi.com/support/FAQ.html.

The AM1 hamiltonian2 was used with restricted Hartree-Fock theory because unrestricted

Hartree-Fock theory was unable to generate the correct Mulliken charge population analysis4.

The following is the output structure with partial charges from MOPAC (It can be viewed as a

.car file):

FINAL GEOMETRY OBTAINED                              CHARGE
GRAPH DOUBLET AM1 1SCF MMOK MULLIK
INSIGHT generated AMPAC/MOPAC input file
DATE: Tue Jul 21 09:59:25 1998
 N 0.0000000 0 0.000000 0      0.000000 0 0        0     0  0.0033
 O 1.3183060 0 0.000000 0      0.000000 0 1        0     0 -0.3588
 C 2.8548730 0 60.735138 0 0.000000 0 2             1    0 -0.2105
 H 1.1052180 0 146.300079 0 -140.086685 0         3     1 2    0.0786
 H 1.1051670 0 106.860168 0 -124.239586 0         3     4 1    0.1036
 H 1.1053200 0 107.305809 0 -113.950066 0         3     4 5    0.0848



\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
 C   1.4943160   0   117.499352 0    -26.067738 0    1    2 3   0.0750
 C   1.5424950   0   107.479088 0    -61.999462 0    7    3 4 -0.2267
 H   1.1054650   0   111.680130 0    -61.851372 0    8    7 3   0.1023
 H   1.1048900   0   112.069122 0     58.023907 0   8     7 3   0.0797
 H   1.1029020   0   112.525177 0    178.388672 0    8     7 3   0.0752
 C   1.5638810   0   106.705391 0     56.128834 0   7     3 4 -0.1832
 H   1.1089950   0   109.138687 0     43.291714 0   12     7 3   0.0966
 H   1.1061780   0   109.682854 0    -71.167839 0   12     7 3   0.0970
 C   2.4916550   0   148.685593 0   -108.916786 0     1    7 3 -0.2108
 H   1.1051770   0    93.284210 0   172.605194 0    15     1 7   0.1040
 H   1.1052190   0   106.858009 0   -152.640625 0    15    16 1 0.0802
 H   1.1053040   0   106.525398 0   -114.481674 0    15    16 17 0.0836
 C   1.4943930   0   123.228271 0   -146.940247 0     1    7 3   0.0755
 C   1.5652270   0   106.701904 0   -176.206955 0    19    15 16 -0.1764
 H   1.1068420   0   109.595207 0     71.673302 0   20    19 15   0.1074
 H   1.1091370   0   108.932915 0    -42.925575 0   20    19 15   0.0916
 C   1.5423810   0   107.445526 0    -58.024807 0   19    15 16 -0.2267
 H   1.1049040   0   112.077690 0    -58.171314 0   23    19 15   0.0853
 H   1.1054740   0   111.684242 0     61.702450 0   23    19 15   0.1008

      The first number after the atom it’s referring to represents distance between that atom and

the atom referenced by the line number in the eighth column of information; the second number

represents the angle between that atom and the atoms referenced by the eight and ninth column,

with the eighth column serving as vertex; and the third number represents the torsion angle

formed between that atom and the atoms referenced by the eighth, ninth, and tenth columns, in

that order.

      Using these partial charges, the following addendum was made to the cvff.rlb file (residue

library for consistent valence forcefield), except that the columns line up in the original:

      IASL                35     0.000 IASL, attached to cystine
      CH SG 1.0 *          0.000 0.000 0.000       0 0 c2 -0.2042 act 0
      HH1 CH 1.0           0.000 0.000 0.000       0 0 h 0.1022 act 0
      HH2 CH 1.0           0.000 0.000 0.000       0 0 h 0.1020 act 0
      C CH 1.0           0.000 0.000 0.000       0 0 c' 0.3317 act 1
      O C 2.0           0.000 0.000 0.000       0 0 o' -0.3317 act 0
      NZ C 1.5           0.000 0.000 0.000       0 0 n -0.3376 amd 0
      HZ NZ 1.0           0.000 0.000 0.000 0 0 hn 0.2221 amd 0
      CE NZ 1.0           0.000 0.000 0.000 0 0 c1 0.0328 amd 1
      HE CE 1.0           0.000 0.000 0.000 0 0 h 0.0828 amd 0
      CD CE 1.0           0.000 0.000 0.000 0 0 c2 -0.1895 amd 0
      HD1 CD 1.0           0.000 0.000 0.000       0 0 h 0.0943 amd 0
      HD2 CD 1.0           0.000 0.000 0.000       0 0 h 0.0942 amd 0
      CG CE 1.0           0.000 0.000 0.000 0 0 c2 -0.1827 amd 0
      HG1 CG 1.0           0.000 0.000 0.000       0 0 h 0.1011 amd 0
      HG2 CG 1.0           0.000 0.000 0.000       0 0 h 0.0816 amd 0



\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
      CB CD 1.0      0.000 0.000 0.000 0 0 c 0.0852 ring 0
      CB1 CB 1.0     0.000 0.000 0.000    0 0 c3 -0.2167 ring 0
      HB11 CB1 1.0     0.000 0.000 0.000    0 0 h 0.0783 ring 0
      HB12 CB1 1.0     0.000 0.000 0.000    0 0 h 0.1033 ring 0
      HB13 CB1 1.0     0.000 0.000 0.000    0 0 h 0.0846 ring 0
      CB2 CB 1.0     0.000 0.000 0.000    0 0 c3 -0.2330 ring 0
      HB21 CB2 1.0     0.000 0.000 0.000    0 0 h 0.1072 ring 0
      HB22 CB2 1.0     0.000 0.000 0.000    0 0 h 0.0899 ring 0
      HB23 CB2 1.0     0.000 0.000 0.000    0 0 h 0.0854 ring 0
      CA CG 1.0      0.000 0.000 0.000 0 0 c 0.0857 ring 0
      CA1 CA 1.0      0.000 0.000 0.000    0 0 c3 -0.2167 ring 0
      HA11 CA1 1.0     0.000 0.000 0.000    0 0 h 0.0783 ring 0
      HA12 CA1 1.0     0.000 0.000 0.000    0 0 h 0.1033 ring 0
      HA13 CA1 1.0     0.000 0.000 0.000    0 0 h 0.0846 ring 0
      CA2 CA 1.0      0.000 0.000 0.000    0 0 c3 -0.2330 ring 0
      HA21 CA2 1.0     0.000 0.000 0.000    0 0 h 0.0899 ring 0
      HA22 CA2 1.0     0.000 0.000 0.000    0 0 h 0.1072 ring 0
      HA23 CA2 1.0     0.000 0.000 0.000    0 0 h 0.0854 ring 0
      N CA 1.0 CB     0.000 0.000 0.000    0 0 no -0.1033 ring 1
      O N 1.5      0.000 0.000 0.000    0 0 o- -0.2651 ring 0

      The last four columns of this file are all that Insight needs for correctly representing IASL

in a dynamics simulation. These are the potential atom types, which are defined in the

cvff_templates.dat file, the partial charges, the charge groups, and the switching atoms for these

charge groups. The other columns refer to atom bonded to, bond order, bond distance, bond

angle, and torsion angle. This is all explained on the Insight file types page which can be

accessed by going to http://www.msi.com/doc username: msi password: msi-doc or going to the

local copy on BSCL’s home page.

      The potential atom types are listed in the first of these last four columns and are the most

important determinant of dynamics runs, for they dictate how that atom oscillates, how a valence

angle it’s involved in oscillates, and how a torsion angle it’s involved in oscillates. The potential

atom type also contributes to cross terms in the energy expression if these are used3. The last

three columns deal specifically, then, with the nonbond Coulombic interactions. The partial

charge is the charge used for that atom in the Coulombic expression, the charge group is a group

whose partial charges all sum to zero so that cutoffs for the distance within which the energy

expression will be evaluated for a dynamics run don’t change the charge on the molecule, and


\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
the switching atom is the atom which must be within the cutoff for that group to be included in

the calculation. As you will notice if you look closely, the partial charges defined in the rlb file

were manipulated from the original charges generated by MOPAC so that they summed to zero

for the various charge groups.

      Once this addendum was made to the residue library, the Cartesian coordinate file also had

to be manipulated so that the atoms were in the same order as those in the rlb file so that Insight

would recognize them. The Cartesian coordinate file for IASL is below:

      !BIOSYM archive 3
      PBC=OFF
      input file for discover
      !DATE Tue Jul 28 10:14:34 1998
      CH       7.781514168 -2.453332424 -8.581042290 ACT 1D c2      C -0.204
      HH1       7.058044910 -2.064925432 -7.842758656 ACT 1D h      H 0.102
      HH2       8.695223808 -2.686850786 -8.005611420 ACT 1D h      H 0.102
      HH3       7.350610256 -3.313054562 -9.094169617 ACT 1D h      H 0.000
      C      8.104196548 -1.366734028 -9.595120430 ACT 1D c'      C 0.332
      O      9.274229050 -1.096067309 -9.841795921 ACT 1D o'      O -0.332
      NZ       7.187695026 -0.661928833 -10.254128456 AMD 1C n      N -0.338
      HZ       7.649980545 0.008373993 -10.874497414 AMD 1C hn       H 0.222
      CE      5.731539249 -0.824736476 -10.124553680 AMD 1C c1       C 0.033
      HE       5.469759941 -1.859790206 -9.826653481 AMD 1C h       H 0.083
      CD       5.070664883 -0.517400205 -11.492909431 AMD 1C c2      C -0.190
      HD1       5.366045475 0.505394220 -11.804093361 AMD 1C h       H 0.094
      HD2       5.473124027 -1.186820269 -12.277119637 AMD 1C h      H 0.094
      CG       5.163462162 0.191637427 -9.103297234 AMD 1C c2       C -0.183
      HG1       5.638068199 0.056319732 -8.113319397 AMD 1C h       H 0.101
      HG2       5.460052013 1.210425735 -9.425759315 AMD 1C h       H 0.083
      CB       3.508353233 -0.611164689 -11.473765373 RINg 1B c    C 0.085
      CB1       3.062123537 -2.084994793 -11.561230659 RINg 1B c3    C -0.217
      HB11 3.451367617 -2.583674669 -12.467103958 RINg 1B h          H 0.078
      HB12 3.402240753 -2.681606293 -10.698297501 RINg 1B h          H 0.103
      HB13 1.960952044 -2.177205563 -11.592575073 RINg 1B h          H 0.085
      CB2       3.003194571 0.104307659 -12.753469467 RINg 1B c3    C -0.233
      HB21 3.447175264 -0.325460106 -13.669819832 RINg 1B h          H 0.107
      HB22 1.905986667 0.032370541 -12.864768982 RINg 1B h           H 0.090
      HB23 3.248203516 1.182071924 -12.743351936 RINg 1B h           H 0.085
      CA       3.607823133 0.133532256 -8.954020500 RINg 1A c      C 0.086
      CA1       3.204863787 -1.048842072 -8.049066544 RINg 1A c3    C -0.217
      HA11 2.111191511 -1.089652181 -7.893223763 RINg 1A h          H 0.078
      HA12 3.498114109 -2.024073601 -8.472522736 RINg 1A h          H 0.103
      HA13 3.669029236 -0.982845843 -7.048577785 RINg 1A h          H 0.085
      CA2       3.180124044 1.438269138 -8.233604431 RINg 1A c3     C -0.233
      HA21 3.698766708 1.565259814 -7.265929699 RINg 1A h           H 0.090
      HA22 3.399315596 2.333250761 -8.844088554 RINg 1A h           H 0.107
      HA23 2.095747232 1.460177898 -8.021370888 RINg 1A h           H 0.085
      N      2.899796963 0.023303550 -10.265320778 RINg 1     n   N -0.103



\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
      O      1.596008420    0.218258858 -10.271338463 RINg 1     o    O -0.265
      end

      Once this file was created, the structure was added to the fragment library as described by

the MSI’s FAQ page. This completed the creation of IASL for simulation. It is important to note

that when the potentials on myosin with IASL attached are fixed, it is normal for the program to

adjust some partial charges so that the charge of the entire molecule balances out. This

adjustment should not change IASL’s partial charges to a very great degree, but it adds another

argument to using only a simplified method for generating the minimized structure and

calculating partial charges, since Insight has to change this parameter to a certain extent anyway.

Automating with Biosym tcl

      The automation program is shown below; it is divided into sections, with an explanation

for each section.

#BIOSYM btcl 3
#
# Input File For Discover Generated By Insight 97.0
# Date:       Sat Jul 25 18:39:26 1998
# User Name: wilde
# Host Name: bi7.msi.umn.edu
# Host Type: iris
#
# System Name: MYSEDH0
#

#Stage Name: 1 begin
  begin
#

      Tcl procedures are the equivalent of subroutines and functions and always have to be put at

the beginning of the input file. They are always preceded by the word proc with the name of the

procedure, the input and output variables in braces, and the entire procedure in braces. For

variables that are returned by the procedure the upvar command must be used; this command

associates another variable in the procedure with the output variable, and thenceforth this new




\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
variable is used in operations. At the end of the procedure, the value of this variable is returned

as the output variable’s value.


# procedure for finding the maximum value
 proc AMAX1 {amax1 a b} {
   upvar 1 $amax1 c
   if {$a > $b} {
   $c = $a} else {$c = $b}
   }

       In the following procedure, the first two variables are output variables, so two upvar

statements are necessary. The variable j is used for counting the total number of times this

procedure returns a value for probks (the probability K-S statistic). Any echo statement in a Btcl

program is written to standard output if Discover is being actively run or to the runname.out file

if Discover is run in the background. In testing the program, oftentimes a value would be too

large to calculate for term; hence, the exiting feature. Notice that to exit the procedure at any

time with the current values of the output variables returned, a simple return statement is

invoked. I wasn’t aware of any way of setting parameters, so I defined variables for these.


# procedure for computing probability--returns array element of array prob for conditional
# loop--uses alam for calculation
 proc PROBKS {j probks alam} {
   upvar 1 $probks c
   upvar 1 $j b
   echo alam = $alam
   $eps1 = 0.001
   $eps2 = 1.0e-8
   set a2 [expr -2.0 * pow($alam,2)]
   if {$a2 < -170 || $a2 > 230} {
      echo Factor is too big or too small, breaking loop
      return}
   echo a2 = $a2
   $fac = 2
   $c = 0
   $termbf = 0
   for {$i = 1} {$i < 101} {incr i} {
      set term [expr $fac * exp($a2 * pow($i,2))]
      $c = $c + $term
      if {[expr log($c)] < [expr log(1e-100)] || [expr log($c)] > [expr log(1e100)]} {
         echo Prob is too big or too small, breaking loop
         $b = $b
         return}



\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
     if {[expr abs($term)] < [expr $eps1 * $termbf] || [expr abs($term)] < [expr $eps2
* $c]} {
        $b = $b + 1
        return}
     $fac = -$fac
     set termbf [expr abs($term)]
   }
   $c = 1
   $b = $b + 1
   }

        The second stage always sets parameters for the nonbond calculations, which refer to the

Coulombic and van der Waals terms in the energy expression. The main summation methods

I’ve been using are group_based and cell_multipole. Group_based with a cutoff of 9.5 angstroms

is the default; any other method must be specified. Refer to the Discover 3 manual for

instructions on using the forcefield command. A group_based method calculates nonbond

interactions by dividing all atoms into charge groups whose atoms all sum to zero and using a

switching atom in each of those groups which determines whether that group will be included in

the calculation based on the cutoff distance3. The cell_multipole method divides atoms up into

larger and larger cells as you move farther and farther from the point of calculation, and the

electric and van der Waals potentials of each of these cells are used for determining the nonbond

interactions at the calculation point3.


#Stage Name: 2 nonbonds
  forcefield nonbond \
    -separate_coulomb \
    coulomb \
     -distance_dependent_dielectric \
     dielectric_value = 1.0
  #
#

       The atom movability can apply to a subset defined in the .mdf file or to specific atoms and

residues specified as in section 1 of this discussion. Atom movability is used to reduce

computation time by fixing some atoms in a certain position and excluding others from the




\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
calculation. Fixed atoms are not allowed to move but still contribute to calculations through

nonbond interactions, whereas excluded atoms are completely excluded from the calculation.


#Stage Name: 3 fix
 setAtomMovability fixed 3__fix \
  "FIX"

#Stage Name: 4 dynamics

Create an object vectors with 3 elements in each item.
object vectors create -elements 3 double
 $length = 1200.0
 $lengtho = $length

Count the number of runs.

$i = 0

Count the number of values calculated for prob.

 $m = 0

Initialize array of prob values

 $prob($m) = 0.0
 rattle bonds -tolerance 1e-5
# while the K-S statistic is not satisfied

Vary the K-S statistic convergence criteria to allow for more flexibility in convergence.

 while {$prob($m) < 0.01} {
  dynamics \
   time = $length timestep = 3.0 \
   initial_temperature = 298.0 +boltzmann \
   ensemble = nvt temperature_control_method = velocity_scaling \
   integration_method = Velocity_verlet \
   temperature = 298.0 temperature_window = 10 \
   deviation = 5000 \
   execute frequency = 30.0 last_step = 0 command = {print history} \

Calculate the cross product vector at each time step of dynamics and append it to object vectors.
    execute frequency = 3.0 last_step = 0 command = {database handle dbh system.
      select nitro "*:1B:N"
      select carb1 "*:1B:CA1"
      select carb2 "*:1B:CB1"
      $dbh get coord_nitro Atom.Coord $nitro
      $dbh get coord_carb1 Atom.Coord $carb1
      $dbh get coord_carb2 Atom.Coord $carb2
      vector v1 subtract $coord_carb1 $coord_nitro



\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
       vector v2 subtract $coord_carb2 $coord_nitro
       vector v1_x_v2 cross $v1 $v2
       object vectors append $v1_x_v2
    }
# divide vectors into two data sets of x coordinates based on length
   object vectors group number
   object nmr range $number end
   set nmbr [object nmr]
   echo nmbr = $nmbr
   $en1 = $nmbr / 2
   $en2 = $nmbr - $en1

Get the x coordinates for each vector in object vectors

  object frsthlf range $vectors 0 [expr $en1 - 1]
  object frsthlfx get -elements 0 $frsthlf
  object scndhlf range $vectors $en1 end
  object scndhlfx get -elements 0 $scndhlf

Sort the values in ascending order.

   object frsthlfxsrt sort -value $frsthlfx
   object scndhlfxsrt sort -value $scndhlfx
# initialize for Kolmogorov-Smirnov
   $j1 = 0
   $j2 = 0
   $fo1 = 0
   $fo2 = 0
   $D = 0
# loop through elements of data sets, comparing the CDFs
   while {$j1 <= [expr $en1 - 1] && $j2 <= [expr $en2 - 1]} {
    object DAT1 range $frsthlfxsrt $j1
    object DAT2 range $scndhlfxsrt $j2
    set DATA1 [object DAT1]
    set DATA2 [object DAT2]
    if {$DATA1 <= $DATA2} {

Use the function double to return a real number.

       set fn1 [expr (double($j1 + 1)) / $en1]
       AMAX1 DT [expr abs($fn1 - $fo2)] [expr abs($fo1 - $fo2)]
       if {$DT > $D} {$D = $DT}
       set fo1 $fn1
       $j1 = $j1 + 1
       } else {set fn2 [expr (double($j2 + 1)) / $en2]
       AMAX1 DT [expr abs($fn2 - $fo1)] [expr abs($fo2 - $fo1)]
       if {$DT > $D} {$D = $DT}
       set fo2 $fn2
       $j2 = $j2 + 1
       }
   }




\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
Print out specified values.

   echo Run [expr $i + 1]
   echo j1 = $j1
   echo j2 = $j2
   echo en1 = $en1
   echo en2 = $en2
   echo D = $D
   $i = $i + 1
# compute the probability
   PROBKS m prob([expr $m + 1]) [expr sqrt(double($en1) * $en2 / ($en1 + $en2)) * $D]
# Increase the length of the run based on the number of runs done--max length=100 times original length
   echo prob = $prob($m)

Ceil() returns the value of the smallest integer not less than the input value.

  set x [expr ceil(double($i + 1) / 10)]

Every 10th run starting with run 11, the original length is multiplied by 10 then 20 then 30 until
100 is reached.

   if {$i == [expr 10 * ($x - 1)] && $i < 101} {set length [expr 10 * ($x - 1) * $lengtho]}
   }
# name the files according to the number of runs for the sorted lists and vectors and the
# number of elements in prob

Define a variable for each of the new files to be opened.

 set probs [open probs$m.out w]
 set ftbl [open vect$i.out w]
 set srtlists [open srtlst$i.out w]

Use for loops to put the desired values into those variables.

  for {$y = 1} {$y < [expr $m + 1]} {incr y} {puts $probs [format "%20.16f" $prob($m)]}
# create an object for each element of the sorted lists for comparison
  puts $srtlists "First Half   Second Half"
  for {$k = 0} {$k < $en2} {incr k} {
    object frsthlf_elmnt range $frsthlfxsrt $k
    object scndhlf_elmnt range $scndhlfxsrt $k
    puts $srtlists [format "%10.6f\t%10.6f" [object frsthlf_elmnt] [object scndhlf_elmnt]]
  }
# create an object for each element in vector (for printing purposes)
  object vectors_x get -elements 0 $vectors
  object vectors_y get -elements 1 $vectors
  object vectors_z get -elements 2 $vectors
  for {$j = 0} {$j < $nmbr} {incr j} {
    object vectors_x1 range $vectors_x $j
    object vectors_y1 range $vectors_y $j
    object vectors_z1 range $vectors_z $j
    puts $ftbl [format "%10.6f\t%10.6f\t%10.6f" [object vectors_x1] [object vectors_y1] [o
bject vectors_z1]]
  }
#


\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
 writeFile coordinate filename = .cor

      Initial tests of this program on IASL attached to an  helix indicated that the use of a

convergence criteria of .01 might be too stringent. Testing showed that when all residues in the

helix were fixed and only the label was allowed to move, the Kolmogorov-Smirnov statistic

came nowhere near to a value that would lead to convergence (on the order of 1030 off the

convergence criteria); this was after over a nanosecond of simulation time, a time scale on which

even an isotropic EPR spectrum in vacuo would have converged to an order parameter of 0.

Since, however, the test requires in effect at least double the simulation time of sampling all

orientational space once, it might not be unreasonable that these fixed simulations produced no

convergence for K-S when  for these tests was easily >270o. Tests in which both the spin label

and residues within 10 angstroms of it were allowed to move produced inconsistent convergence

results. Two out of four runs produced convergence after 13 runs—these are files MYTEST4.out

and MYTEST6.out.

      Clearly there must be future study of the statistical properties of the vectors generated by

Discover runs in order to determine how different parts of the orientational distribution are

related to variance in the Kolmogorov-Smirnov statistic and to production of EPR spectra. I

suggest accumulating molecular dynamics data on myosin with IASL incorporated and plotting

the distribution with respect to the mode and median. Hopefully, as more data on these molecular

dynamics runs is accumulated, more accurate parameters can be applied and more accurate

spectra can be generated.

      Once the statistical properties of the system are more fully understood, an important goal

of this project will be to determine the Euler transformations necessary to move from the spin

label’s coordinate axes frame of reference to the protein’s coordinate axes frame of reference so

that the orientation of myosin with the actin fiber axis can be determined from the angle of the


\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
spin label with the fiber axis. When the spin label’s most probable orientation is determined, the

simplest Euler transformation would be from the spin label’s coordinate axes to the protein’s

crystallographic axes. Have fun!




\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)
                                                Works Cited

      1. R.P. Shibaeva et al. “Crystal and Molecular Structures of 2,2,6,6-tetramethyl-4-

          piperidinone-1-oxyl.” Acta Crystallographica.1973.

      2. Michael Dewar et al. “AM1: A New General Purpose Quantum Mechanical Model.” J.

          Am. Chem. Soc. Vol.107: 3902-3909, 1985.

      3. Molecular Simulations, Inc. Forcefield-Based Simulations: Forcefields and Preparing

          the Energy Expression.

      4. Christopher J. Cramer. Notes on MO Theory and Quantum Mechanics.

          http://pollux.chem.umn.edu/~sullivan/8003/handouts/QM-MO_notes/index.html




\\BIOCHEM_DDT\DOC\PAPERS\Wilde\SUMMARY.DOC (Word 7.0 12/04/11 6:07 AM by gw)

								
To top