					How Science Thinks:
 The Science and Engineering of Science and Engineering

    What is engineering?
      Engineering is a search for artifacts that serve particular functions.

    What is science?
      Science is a search for artifacts called models (or theories)
          that can explain and predict phenomena.

    Wait! So is science just a type of engineering?
      Um, yeah, I guess so...

    Okay smart ass, so what's search!?

  What is Search?
     (It's not what Google does; that's more like “lookup”).

         Search is the exploration of a space of possibilities
                         for one or more that satisfy a particular goal.


  Gimme some examples!

    Searching for the right play in a game like football or Go.
    Searching for your cell phone.
    Searching for a place to eat tonight.
    Searching for a way to detect gravity waves.
    Searching for a theory of gravity waves.
    Searching for a cure for cancer.

Searching for a way to detect gravity waves.

Searching for a theory of gravity waves.

    [Diagram: a cycle between Theory and Observation]

      Theory -> Observation:  Theory drives and guides instrument and experiment development.
      Observation -> Theory:  Observations drive and guide theory modification.

          Searching for an instrument that will detect gravity waves.

          Searching for a theory of gravity waves.


     How big is the search space?

         E = mc^2
         E = mc^3
         E = mc^4
         E = mc^5
         E = mc^6
         E = mc^7
         E = mc^8
         E = ...

         For every right answer in science,
         there is an infinitude (-1) of wrong ones!
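
The search-space picture can be sketched as generate-and-test in Python. The candidate family E = mc^k for k = 2..8 comes from the slide; the `search` and `fits` helpers and the two synthetic "observations" are assumptions invented for illustration, not part of the talk.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def search(space: Iterable[T], satisfies_goal: Callable[[T], bool]) -> List[T]:
    """Generate-and-test: keep every candidate in the space that meets the goal."""
    return [candidate for candidate in space if satisfies_goal(candidate)]


C = 299_792_458.0  # speed of light, m/s

# Synthetic (mass in kg, energy in J) pairs, invented for this sketch.
OBSERVATIONS: List[Tuple[float, float]] = [(1.0, C**2), (2.0, 2.0 * C**2)]


def fits(exponent: int, tolerance: float = 1e-9) -> bool:
    """True if E = m * c**exponent matches every observation within tolerance."""
    return all(abs(m * C**exponent - e) <= tolerance * e for m, e in OBSERVATIONS)


# The space of candidate theories: E = mc^2, E = mc^3, ..., E = mc^8.
surviving = search(range(2, 9), fits)
print(surviving)
```

Only one exponent survives the test against these observations; every other candidate in the space is a "wrong answer" that the data rules out.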
Historical Theory↔Experiment SeeSawing of the Gyromagnetic Ratio “g-factor”




                                          Galison (1987) How Experiments End. Chicago U. Press


       For every right answer in science,
       there is an infinitude (-1) of wrong ones!

                        HOW COULD THIS EVER WORK!?



     1. Close is often good enough, or at least guides you to the right answer.

     2. Theory (model) guidance reduces the search space by many orders of magnitude!

     3. We've been really, really lucky ... so far, anyway!

     4. You are not alone! (>15 million abstracts in PubMed alone!)
The Science and Engineering of “Drug Discovery”



                Pine trees seem a good place
                To start. Notwithstanding this table
                Of pine, unfinished, unruled,
                The pulp upon which we reveal
                The unnerved thought.
                How casual we are at discarding
                Our feelings, a rubble we
                Leave behind for the living.
                Who among us can absorb
                The spiritual load we see as
                What others carry.

                                          Alexander Shulgin
                                          PIHKAL 1991
The Science and Engineering of “Drug Discovery”
    [Structure diagram: a phenethylamine (NH2) bearing two CH3O groups and one SCH3 group]

    3,4-dimethoxy-5-methylthiophenethylamine (60-100 mg orally)
The Science and Engineering of “Drug Discovery”


                        How big is the search space?



  Searching for a theory of gravity waves.       Searching for a cure for cancer.


          E = mc^2
          E = mc^3
          E = mc^4
          E = mc^5
          E = mc^6
          E = mc^7
          E = mc^8
          E = ...


                     For every right answer in science,
                     there is an infinitude (-1) of wrong ones!
The Science and Engineering of “Drug Discovery”


 “Albert [Schatz] hunted for new strains of actinomyces in soil, in
 manure heaps, in drains, even from the culture plates that were
 being thrown away by colleagues working on other unrelated
 projects, indeed anywhere in the world that his imagination would
 take him—this was Albert's entire life.” (p. 215)

 “It was salt mine, where, in order to pull a practical antibiotic
 producer out of Mother Nature, we literally have to work our
 asses off. The failure rate is about 99.99 per cent” (p. 218).

 “Using techniques that seem closer to gardening than the
 intellectual exercise of science, Rene [Dubos] trowelled soil into
 pots, searched in farmers' fields, manure heaps, lawns and
 hedges, altered growing conditions, added and subtracted
 chemicals.” (p. 65)
The Science and Engineering of “Drug Discovery”

    Afferent helps medicinal chemists do lead discovery.

          Drives the (robotic) synthesis of combinatorial reactions.
          Closes the synthetic/analytic loop on drug (lead) discovery.
          Gives scientists direct control over the search process.
Combinatorial Drug Discovery -- Closing the Loop
    [Diagram: a closed loop linking Combinatorial Chemistry, Robotics, Combinatorial Library,
     Purity Filter, and Assays; the results split into Bad Cases and Good Leads, and
     Analysis feeds back into Improved Chemistry.]
Chemists “Teach” Afferent Organic Chemistry
Afferent Runs Chemical Robots to Do the Reactions
Afferent Simulates Combinatorial Chemistry
Afferent can “see” both successes and failures in mass spec data.
Afferent can make “educated guesses” about what might have gone wrong

Searching for an instrument that will detect gravity waves.

Searching for a theory of gravity waves.

    [Diagram: a cycle between Theory and Observation]

      Theory -> Observation:  Theory drives and guides instrument and experiment development.
      Observation -> Theory:  Observations drive and guide theory modification.
Explanation is the main function of theories (models)




                       http://files.turbosquid.com/PreModel/Content_on_8_29_2002_07_36_31/gears03.jpgDA4233CA-5F01-47A6-9BDFCDF1F8087F89.jpgLarge.jpg
www.geocities.com/Baja/8205/gears.htm

    [Diagram, built up over several slides, relating two senses of “Models” through “Scientific Search”]

      Models: Attention and Language Skills that help us Organize Experience
      Models: Sets of Models Applied in a Particular Domain

  Model Application is a Cognitive Process through which we organize experience.
  Explanation is the most obvious (public) feature of this process.

  Modeling and Explanation
      Forming Explanations:
       Labeling and categorization
       “Conceptual Blending”
       Sequencing of attention, action and expectations
       “Discovery” of non-obvious features
       Focusing on relevant features

  Forming New Abstractions
       Generalization to New Models
       Records for Later Analogy
       Domain Characterization
       Repetition/Practice
       Label Abstraction (“Gene”, etc.)
The Science and Engineering of “Drug Discovery”




  “[It] was a deduction so brilliant that [Jorgen Lehmann's] fellow doctors and
  scientists would refuse to believe it. How could Lehmann have possibly picked
  out this single chemical derivative of aspirin as the one to test before a single
  experiment had been performed?” (p. 242)
Computational Biology; A “Turing Test” for Scientific Computing



     Simulation: What does this model predict?
     Explanation: How does it make these predictions?
     Model Identification: What models fit this data?
Explanation by Pathway Tracing
 (photosynthesis isa process with
   inputs (chloroplast-inside.water everywhere.light chloroplast-outside.nadph+
           chloroplast-outside.adp chloroplast-outside.pi)
   outputs (chloroplast-outside.atp chloroplast-outside.nadph everywhere.o2)
   implemented-by photosystem)
 (photosystem composition (psii antenna-array atpase pq-pool))
 (light-absorption isa process with
   inputs (everywhere.light)
   outputs (chlorophyll.energy)
   function absorption
   implemented-by chlorophyll)
 (light-energy-concentration isa process with
   outputs psii.energy
   driver chlorophyll.energy
   function concentration
   implemented-by antenna-array)
 (psii-water-breakdown isa process with
   inputs (chloroplast-inside.water)
   driver psii.energy
   outputs (psii.e- psii.e- chloroplast-inside.h+ chloroplast-inside.o2)
   function molecular-splitting
   implemented-by psii)
 (psii-pq-reduction isa process with
   inputs (psii.e- chloroplast-membrane.h+ chloroplast-membrane.plastoquinone)
   outputs (chloroplast-membrane.plastoquinol)
   function reduction
   implemented-by psii
   inhibited-by dcmu)
Explanation by Pathway Tracing

    (track-object 'chloroplast-inside.water)
    Tracking CHLOROPLAST-INSIDE.WATER
      -> PHOTOSYNTHESIS:
        Tracking CHLOROPLAST-OUTSIDE.ATP
        Tracking CHLOROPLAST-OUTSIDE.NADPH
        Tracking EVERYWHERE.O2
      -> PSII-WATER-BREAKDOWN:
        Tracking PSII.E-
          -> PSII-PQ-REDUCTION:
            Tracking CHLOROPLAST-MEMBRANE.PLASTOQUINOL
          -> E-FUNNLING-PSII-TO-PSI:
            Tracking PSI.E-
              -> PSI-NADPH-FORMATION:
        Tracking CHLOROPLAST-INSIDE.H+
          -> ATP-FORMATION:
        Tracking CHLOROPLAST-INSIDE.O2
          -> O2-DIFFUSSION:
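
The tracing behaviour above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the original system: the process model is a three-process subset of the listing, re-encoded as dictionaries, and the rule "a process consumes an object if it appears among its inputs or drivers" is a guess at what `track-object` does. The names `PROCESSES` and `track_object` are hypothetical.

```python
from typing import Dict, List, Optional, Set

# A subset of the process model from the slides, re-encoded as Python dicts.
PROCESSES: Dict[str, Dict[str, List[str]]] = {
    "photosynthesis": {
        "inputs": ["chloroplast-inside.water", "everywhere.light",
                   "chloroplast-outside.nadph+", "chloroplast-outside.adp",
                   "chloroplast-outside.pi"],
        "outputs": ["chloroplast-outside.atp", "chloroplast-outside.nadph",
                    "everywhere.o2"],
    },
    "psii-water-breakdown": {
        "inputs": ["chloroplast-inside.water"],
        "drivers": ["psii.energy"],
        "outputs": ["psii.e-", "psii.e-", "chloroplast-inside.h+",
                    "chloroplast-inside.o2"],
    },
    "psii-pq-reduction": {
        "inputs": ["psii.e-", "chloroplast-membrane.h+",
                   "chloroplast-membrane.plastoquinone"],
        "outputs": ["chloroplast-membrane.plastoquinol"],
    },
}


def track_object(obj: str, depth: int = 0,
                 seen: Optional[Set[str]] = None) -> List[str]:
    """Follow obj through every process that consumes it, then follow the products."""
    seen = set() if seen is None else seen
    lines = ["  " * depth + f"Tracking {obj.upper()}"]
    if obj in seen:  # already expanded; avoids looping on cyclic pathways
        return lines
    seen.add(obj)
    for name, process in PROCESSES.items():
        consumed = process.get("inputs", []) + process.get("drivers", [])
        if obj in consumed:
            lines.append("  " * (depth + 1) + f"-> {name.upper()}:")
            # dict.fromkeys removes duplicate products while keeping their order
            for product in dict.fromkeys(process["outputs"]):
                lines.extend(track_object(product, depth + 2, seen))
    return lines


trace = track_object("chloroplast-inside.water")
print("\n".join(trace))
```

Because only three processes are encoded, the trace stops earlier than the one on the slide; adding the remaining processes to `PROCESSES` would extend it.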
Simulation
Reactions from Glycolysis and the TCA Cycle:

   CYTOSOLIC:glucose + ATP
         ---[Hexokinase]-->
               glucose 6-phosphate + ADP

   CYTOSOLIC:1,3-bisphosphoglycerate + ADP
         ---[Phosphoglycerate kinase]-->
               3-phosphoglycerate + ATP

   MITOCHONDRIAL:isocitrate + NAD+
         ---[Isocitrate dehydrogenase]-->
               a-ketoglutarate + NADH + H+ + Co2

   MITOCHONDRIAL:succinyl CoA + GDP + phosphatate
         ---[Succinyl CoA synthase]-->
               succinate + GTP + CoA
Simulation: Find pathways that connect species


 Solution for Fructose environment (Target = Malate)
 frucose ---[Fructokinase]--> fructose 1-phosphate
 fructose 1-phosphate ---[Fructose 1-phosphate aldolase]--> glyceraldehyde + dihydrozyacetone phosphate
 dihydrozyacetone phosphate ---[Isomerase]--> glyceraldehyde 3-phosphate
 phosphatate + NAD+ + glyceraldehyde 3-phosphate ---[Triose phosphate dehydrogenase]--> 1,3-bisphosphoglycerate
 1,3-bisphosphoglycerate + ADP ---[Phosphoglycerate kinase]--> 3-phosphoglycerate + ATP
 3-phosphoglycerate ---[Phosphoglyceromutase]--> 2-phosphoglycerate
 2-phosphoglycerate ---[Enolase]--> phosphoenolpyruvate + H2O
 phosphoenolpyruvate + ATP ---[Pyruvate kinase]--> pyruvate + ADP
 malate + NAD+ ---[Malate dehydrogenase]--> oxaloacetate + NADH + H+
 pyruvate + NAD+ + CoA ---[NIL]--> NADH + H+ + Co2 + acetyl CoA
 acetyl CoA + oxaloacetate ---[Citrate synthase]--> citrate + CoA
 citrate ---[Aconitase]--> isocitrate
 isocitrate + NAD+ ---[Isocitrate dehydrogenase]--> a-ketoglutarate + NADH + H+ + Co2
 a-ketoglutarate + NAD+ + CoA ---[a-ketogluterate dehydrogenase complex]--> succinyl CoA + NADH + H+ + Co2
 succinyl CoA + GDP + phosphatate ---[Succinyl CoA synthase]--> succinate + GTP + CoA
 succinate + FAD ---[Succinate dehydrogenase]--> fumarate + FADH2
 fumarate + H2O ---[Fumerase]--> malate


 Solution for Glucose environment (Target = Malate)
 glucose + ATP ---[Hexokinase]--> glucose 6-phosphate + ADP
 glucose 6-phosphate ---[Phosphoglucomutase]--> frucose 6-phosphate
 frucose 6-phosphate + ATP ---[Phosphofructokinase]--> frucose 1,6 bisphosphate + ADP
 frucose 1,6 bisphosphate ---[Aldolase]--> dihydrozyacetone phosphate + glyceraldehyde 3-phosphate
 phosphatate + NAD+ + glyceraldehyde 3-phosphate ---[Triose phosphate dehydrogenase]--> 1,3-bisphosphoglycerate
 1,3-bisphosphoglycerate + ADP ---[Phosphoglycerate kinase]--> 3-phosphoglycerate + ATP
 …[same as above from this point onward…]
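
One simple way to find such pathways is forward chaining: start from the species in the environment and keep firing any reaction whose substrates are all available until the target appears. The Python sketch below assumes that strategy; the slides do not say which algorithm the original system uses. The reaction list is the first six glucose-environment steps from the slide (with metabolite spellings normalized), and `Reaction`, `rxn`, and `find_pathway` are hypothetical names.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Reaction(NamedTuple):
    substrates: FrozenSet[str]
    enzyme: str
    products: FrozenSet[str]


def rxn(substrates: str, enzyme: str, products: str) -> Reaction:
    """Parse 'a + b' style sides into a Reaction."""
    return Reaction(frozenset(substrates.split(" + ")), enzyme,
                    frozenset(products.split(" + ")))


GLYCOLYSIS: List[Reaction] = [
    rxn("glucose + ATP", "Hexokinase", "glucose 6-phosphate + ADP"),
    rxn("glucose 6-phosphate", "Phosphoglucomutase", "fructose 6-phosphate"),
    rxn("fructose 6-phosphate + ATP", "Phosphofructokinase",
        "fructose 1,6 bisphosphate + ADP"),
    rxn("fructose 1,6 bisphosphate", "Aldolase",
        "dihydroxyacetone phosphate + glyceraldehyde 3-phosphate"),
    rxn("phosphate + NAD+ + glyceraldehyde 3-phosphate",
        "Triose phosphate dehydrogenase", "1,3-bisphosphoglycerate"),
    rxn("1,3-bisphosphoglycerate + ADP", "Phosphoglycerate kinase",
        "3-phosphoglycerate + ATP"),
]


def find_pathway(reactions: Iterable[Reaction], environment: Iterable[str],
                 target: str) -> Optional[List[Reaction]]:
    """Fire enabled reactions until target is available; None if it never is.

    Returns the reactions in the order they fired, which is not necessarily
    a minimal pathway.
    """
    reactions = list(reactions)
    available = set(environment)
    fired: List[Reaction] = []
    progress = True
    while target not in available and progress:
        progress = False
        for reaction in reactions:
            if reaction not in fired and reaction.substrates <= available:
                fired.append(reaction)
                available |= reaction.products
                progress = True
    return fired if target in available else None


pathway = find_pathway(GLYCOLYSIS, {"glucose", "ATP", "NAD+", "phosphate"},
                       "3-phosphoglycerate")
for step in pathway or []:
    print(" + ".join(sorted(step.substrates)), f"---[{step.enzyme}]-->",
          " + ".join(sorted(step.products)))
```

With a fructose environment and this six-reaction list, `find_pathway` returns `None`, since no listed reaction can consume fructose.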
Simulation: Simulate natural or experimental “knockouts”...

  glucose + ATP ---[Hexokinase]--> glucose 6-phosphate + ADP
  glucose 6-phosphate ---[Phosphoglucomutase]--> frucose 6-phosphate
  frucose 6-phosphate + ATP ---[Phosphofructokinase]--> frucose 1,6 bisphosphate + ADP
  frucose 1,6 bisphosphate ---[Aldolase]--> dihydrozyacetone phosphate + glyceraldehyde 3-phosphate
  phosphatate + NAD+ + glyceraldehyde 3-phosphate ---[Triose phosphate dehydrogenase]--> 1,3-bisphosphoglycerate
  1,3-bisphosphoglycerate + ADP ---[Phosphoglycerate kinase]--> 3-phosphoglycerate + ATP
  3-phosphoglycerate ---[Phosphoglyceromutase]--> 2-phosphoglycerate
  2-phosphoglycerate ---[Enolase]--> phosphoenolpyruvate + H2O
  phosphoenolpyruvate + ATP ---[Pyruvate kinase]--> pyruvate + ADP
  malate + NAD+ ---[Malate dehydrogenase]--> oxaloacetate + NADH + H+
  pyruvate + NAD+ + CoA ---[NIL]--> NADH + H+ + Co2 + acetyl CoA
  acetyl CoA + oxaloacetate ---[Citrate synthase]--> citrate + CoA
  citrate ---[Aconitase]--> isocitrate
  isocitrate + NAD+ ---[Isocitrate dehydrogenase]--> a-ketoglutarate + NADH + H+ + Co2
  a-ketoglutarate + NAD+ + CoA ---[a-ketogluterate dehydrogenase complex]--> succinyl CoA + NADH + H+ + Co2
  succinyl CoA + GDP + phosphatate ---[Succinyl CoA synthase]--> succinate + GTP + CoA
  succinate + FAD ---[Succinate dehydrogenase]--> fumarate + FADH2
  fumarate + H2O ---[Fumerase]--> malate



  Knockout:
     1,3-bisphosphoglycerate + ADP
           ---[Phosphoglycerate kinase]-->
                             3-phosphoglycerate + ATP
Simulation: ...and propose “bridging” reactions

  Knockout:
    1,3-bisphosphoglycerate + ADP
          ---[Phosphoglycerate kinase]-->
                            3-phosphoglycerate + ATP

   25 plausible (single) “bridging” reactions are proposed:
         <CYTOSOLIC:glyceraldehyde 3-phosphate ---[]--> 3-phosphoglycerate>
         <CYTOSOLIC:dihydrozyacetone phosphate ---[]--> 3-phosphoglycerate>
         <CYTOSOLIC:frucose 1,6 bisphosphate ---[]--> phosphoenolpyruvate + 3-phosphoglycerate>
         <CYTOSOLIC:frucose 1,6 bisphosphate ---[]--> 2-phosphoglycerate + 3-phosphoglycerate>
         <CYTOSOLIC:frucose 1,6 bisphosphate ---[]--> 3-phosphoglycerate + 3-phosphoglycerate>
         <CYTOSOLIC:ATP + frucose 1,6 bisphosphate ---[]--> ADP + 1,3-bisphosphoglycerate + 3-phosphoglycerate>
         <CYTOSOLIC:frucose 1,6 bisphosphate ---[]--> glyceraldehyde 3-phosphate + 3-phosphoglycerate>
         <CYTOSOLIC:frucose 1,6 bisphosphate ---[]--> dihydrozyacetone phosphate + 3-phosphoglycerate>
         <CYTOSOLIC:ADP + frucose 1,6 bisphosphate ---[]--> ATP + Co2 + acetyl + 3-phosphoglycerate>

         <CYTOSOLIC:ADP + 1,3-bisphosphoglycerate
                      ---[]--> ATP + 3-phosphoglycerate>
         <CYTOSOLIC:ADP + frucose 1,6 bisphosphate ---[]--> ATP + pyruvate + 3-phosphoglycerate>
         <CYTOSOLIC:ADP + frucose 1,6 bisphosphate ---[]--> ATP + glycerate + 3-phosphoglycerate>
         <CYTOSOLIC:ADP + frucose 1,6 bisphosphate ---[]--> ATP + glyceraldehyde + 3-phosphoglycerate>
         <CYTOSOLIC:ADP + frucose 1,6 bisphosphate ---[]--> ATP + dihydroxyacetone + 3-phosphoglycerate>
         <CYTOSOLIC:ATP + glucose 6-phosphate ---[]--> ADP + phosphoenolpyruvate + 3-phosphoglycerate>
         <CYTOSOLIC:ATP + glucose 6-phosphate ---[]--> ADP + 2-phosphoglycerate + 3-phosphoglycerate>
         <CYTOSOLIC:ATP + glucose 6-phosphate ---[]--> ADP + 3-phosphoglycerate + 3-phosphoglycerate>
         <CYTOSOLIC:ATP + glucose 6-phosphate ---[]--> ADP + glyceraldehyde 3-phosphate + 3-phosphoglycerate>
         <CYTOSOLIC:ATP + glucose 6-phosphate ---[]--> ADP + dihydrozyacetone phosphate + 3-phosphoglycerate>
         <CYTOSOLIC:glucose 6-phosphate ---[]--> Co2 + acetyl + 3-phosphoglycerate>
         <CYTOSOLIC:glucose 6-phosphate ---[]--> pyruvate + 3-phosphoglycerate>
         <CYTOSOLIC:glucose 6-phosphate ---[]--> glycerate + 3-phosphoglycerate>
         <CYTOSOLIC:glucose 6-phosphate ---[]--> glyceraldehyde + 3-phosphoglycerate>
         <CYTOSOLIC:glucose 6-phosphate ---[]--> dihydroxyacetone + 3-phosphoglycerate>
         <CYTOSOLIC:glucose + ATP ---[]--> 1,3-bisphosphoglycerate + 3-phosphoglycerate>
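
The knockout-and-bridge idea can be sketched in Python as follows. Everything here is an illustrative assumption: the original system evidently applies chemical plausibility constraints (it proposes 25 bridges, some with several substrates and products), whereas this toy version only proposes one-to-one reactions from any still-reachable, non-cofactor species to each species lost by the knockout. `reachable`, `knock_out`, `propose_bridges`, and the `COFACTORS` set are hypothetical names.

```python
from typing import Iterable, List, Set, Tuple

# (substrates, enzyme, products)
Reaction = Tuple[Tuple[str, ...], str, Tuple[str, ...]]

REACTIONS: List[Reaction] = [
    (("glucose", "ATP"), "Hexokinase", ("glucose 6-phosphate", "ADP")),
    (("glucose 6-phosphate",), "Phosphoglucomutase", ("fructose 6-phosphate",)),
    (("fructose 6-phosphate", "ATP"), "Phosphofructokinase",
     ("fructose 1,6 bisphosphate", "ADP")),
    (("fructose 1,6 bisphosphate",), "Aldolase",
     ("dihydroxyacetone phosphate", "glyceraldehyde 3-phosphate")),
    (("phosphate", "NAD+", "glyceraldehyde 3-phosphate"),
     "Triose phosphate dehydrogenase", ("1,3-bisphosphoglycerate",)),
    (("1,3-bisphosphoglycerate", "ADP"), "Phosphoglycerate kinase",
     ("3-phosphoglycerate", "ATP")),
]
ENVIRONMENT = {"glucose", "ATP", "NAD+", "phosphate"}
COFACTORS = {"ATP", "ADP", "NAD+", "phosphate"}  # excluded as bridge sources


def reachable(reactions: Iterable[Reaction], environment: Iterable[str]) -> Set[str]:
    """All species that can be produced from the environment."""
    reactions = list(reactions)
    available = set(environment)
    changed = True
    while changed:
        changed = False
        for substrates, _enzyme, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def knock_out(reactions: Iterable[Reaction], enzyme: str) -> List[Reaction]:
    """Remove every reaction catalysed by the given enzyme."""
    return [r for r in reactions if r[1] != enzyme]


def propose_bridges(reactions: List[Reaction], environment: Set[str],
                    enzyme: str) -> List[str]:
    """Propose single reactions that would restore species lost by the knockout."""
    before = reachable(reactions, environment)
    after = reachable(knock_out(reactions, enzyme), environment)
    lost = sorted(before - after)
    sources = sorted(after - COFACTORS)
    return [f"{source} ---[]--> {target}" for target in lost for source in sources]


bridges = propose_bridges(REACTIONS, ENVIRONMENT, "Phosphoglycerate kinase")
print("\n".join(bridges))
```

The first two proposals on the slide (from glyceraldehyde 3-phosphate and from dihydroxyacetone phosphate to 3-phosphoglycerate) also appear in this toy output; the multi-species proposals do not, because the sketch never combines substrates or products.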
Computational Biology; A “Turing Test” for Scientific Computing



     Simulation: What does this model predict?
     Explanation: How does it make these predictions?
     Model Identification: What models fit this data?
Model formation and revision

    [Diagram: Experimental data, Background knowledge, and Interactive Guidance from Scientists
     feed into Discovery, which outputs Updated models.]

      Background knowledge: a regulatory network over Light, DFR, RR, NBLR, NBLA, PBS,
        psbA1, psbA2, cpcB, Photo, and Health, with links marked + or -.
      Updated models: the same network with some links revised or crossed out (×).
    How do cells control response to light?
I.e., What genes are related to the adaptation to high light?




     Prochlorococcus MED4




     Prochlorococcus MIT9313
                              The Data: Analyzing Acclimation Dynamics

    [Figure: Cell Density (y-axis) versus Time (x-axis). Starting from an Initial Equilibrium,
     a Stress (e.g., High Light) is applied, followed by Acclimation and then Adaptation.
     Other labels: Sampling mRNA/cDNA; Statistical Annotation. Image source: www.affymetrix.com/]
Most positively light-correlated responses:




                          Light
Model formation and revision




    [Diagram: Statistics (R) and Constraints feed into Model Space Search.]
    “Knowledge lean” (de novo) Discovery

    [Diagram: Knowledge and Data feed a Search over a Simplified Model Space, producing
     A Useful Model. Additional labels: Intense Data Use; Efficient Search Control.]
How many regulatory models are there for N genes (in the worst case)?

                       L^((N^2 - N)/2)

    L:             number of combinations of L link types
    (N^2 - N)/2:   number of ways to arrange links among N nodes
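
For a tiny network the formula can be checked by brute force. The Python sketch below enumerates every assignment of a link type to every unordered pair of genes; the three gene names and the four link types are placeholders invented for illustration (the slides do not say what the L = 4 link types are).

```python
from itertools import combinations, product
from typing import Dict, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]


def enumerate_models(genes: Sequence[str],
                     link_types: Sequence[str]) -> Iterator[Dict[Pair, str]]:
    """Yield every model: one link type chosen for each unordered gene pair."""
    pairs: List[Pair] = list(combinations(genes, 2))  # (N^2 - N)/2 pairs
    for assignment in product(link_types, repeat=len(pairs)):
        yield dict(zip(pairs, assignment))


GENES = ["geneA", "geneB", "geneC"]       # hypothetical, N = 3
LINK_TYPES = ["none", "+", "-", "+/-"]    # hypothetical, L = 4

models = list(enumerate_models(GENES, LINK_TYPES))
n, l = len(GENES), len(LINK_TYPES)
print(len(models), l ** ((n * n - n) // 2))
```

Even at N = 3 there are already 4^3 = 64 models; the exponent grows quadratically with N, which is why enumeration stops being an option almost immediately.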
How many regulatory models are there for N genes (in the worst case)?

      N = 300
      L = 4

      L^((N^2 - N)/2) = 4^((300^2 - 300)/2) = 4^44850 ~ Infinity

      Identification requires ~2^N observations!
How many models are there for the C. reinhardtii chip?

      N = 8000
      L = 4

      L^((N^2 - N)/2) = 4^((8000^2 - 8000)/2) = 4^31996000

      (Not to mention 2^8000 observations!)
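
These magnitudes are easier to appreciate as powers of ten. The short Python sketch below evaluates the worst-case formula L^((N^2 - N)/2) for the two cases on the slides; the function names are hypothetical, and the count is reported as a base-10 logarithm so the numbers stay printable.

```python
import math


def worst_case_exponent(n_genes: int) -> int:
    """Number of unordered gene pairs, (N^2 - N)/2."""
    return (n_genes * n_genes - n_genes) // 2


def log10_model_count(n_genes: int, n_link_types: int) -> float:
    """log10 of L^((N^2 - N)/2), the worst-case number of regulatory models."""
    return worst_case_exponent(n_genes) * math.log10(n_link_types)


for n in (300, 8000):
    print(f"N={n}: 4^{worst_case_exponent(n)} "
          f"is about 10^{log10_model_count(n, 4):.0f}")
```

For N = 300 and L = 4 the exponent is 44850, which is roughly 10^27002 models.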
                    biologists

Go out and bring
us more data!




Jump naked into a
vat of hot acid!
Shrager’s first law of (computational) biology:

       If you think that you need more data…..

                      You need more knowledge!
 “Knowledge Rich” Computational Discovery

    [Diagram: Knowledge and Data feed a Search over a Constrained Model Space, producing
     A Useful Model.]
Adding knowledge: Limiting search to subsystems.

    [Diagram: the regulatory network over Light, DFR, RR, NBLR, NBLA, PBS, psbA1, psbA2,
     cpcB, Photosynthetic activity, and Health, with links marked + or -.]
Adding Knowledge: Annotate the theory in terms of Models.
 What are Models?
  Conceptually coherent, possibly complex, units of partially
  abstract knowledge that can be incrementally “mixed into” an
  existing model (by “Model Application”), updating the model in
  accord with the principles represented in the Model.
                                              (aka. Schemas, Scripts)

 Some Models in Cell Biology:
       Transcriptional Regulation           Operon
       Attenuation                          Chemical Cycle
       Transposon Insertion                 Feedback Regulation
       Allosteric Modulation                Protein Assembly
       Signal Transduction
Graphical Model for Light Response Curve:
Fitting the Structural Model to the Data:

    [Plot: “20030303 Cyclodyn (Cy5) Light Correlated”.
     x-axis: Hours after midnight (0 to 20). y-axis: Log2(Measure/Ref) (-8 to 4).
     Legend (truncated in the source): SLL1577 (cpcB), SLL1578 (cpcA), SLR2067 (apcA),
     SLR1311 (psbA2), SLL1579, SLL1330, SLL1321 (atp1), SLL0851 (psbC), SLL1867 (psbA3),
     SLL1745 (rpl10), SLR1834 (psaA), SLL1322 (atpI), SLL0819 (psaF), SLR0927 (psbD2),
     SLR0533.]
Unparameterized (Unfitted) Model
Parameterized (Fitted) Model
Computational Biology; A “Turing Test” for Scientific Computing



     Simulation: What does this model predict?
     Explanation: How does it make these predictions?
     Model Identification: What models fit this data?
    How do cells control response to light?
I.e., What genes are related to the adaptation to high light?




     Prochlorococcus MED4




     Prochlorococcus MIT9313
Hihara, Kamei, Kanehisa, Kaplan, and Ikeuchi (2001) DNA microarray analysis
of cyanobacterial gene expression during acclimation to high light. Plant Cell,
13(4)




      Synechocystis PCC 6803
    How do cells control response to light?
I.e., What genes are related to the adaptation to high light?

                      Outline Protocol

 Look for:

     • Gene present in Prochlorococcus MED4
       MED4 is naturally adapted to grow in high light.
     • Ortholog absent in Prochlorococcus MIT9313
       MIT9313 is naturally adapted to grow in low light
     • Ortholog present in Synechocystis PCC 6803
       In order to make contact with annotation and microarray data

     • Synechocystis PCC 6803 ortholog responds to high light
       Gene turns on by factor > 2 in response to high light
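
The four criteria above amount to a simple filter over genes and their orthologs. A minimal sketch in Python, assuming a toy in-memory representation: the Gene class, the gene names, and the expression ratios below are all made up for illustration and are not taken from the Hihara data or from any real annotation database.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    organism: str
    # Maps an organism name to the name of this gene's ortholog there.
    orthologs: dict = field(default_factory=dict)


def high_light_candidates(genes, ratios, threshold=2.0):
    """Return (MED4 gene, Synechocystis ortholog) pairs meeting all criteria."""
    hits = []
    for gene in genes:
        if gene.organism != "MED4":
            continue  # criterion 1: gene present in the high-light strain
        if "MIT9313" in gene.orthologs:
            continue  # criterion 2: no ortholog in the low-light strain
        s_ortholog = gene.orthologs.get("S6803")
        if s_ortholog is None:
            continue  # criterion 3: ortholog present in Synechocystis
        if ratios.get(s_ortholog, 0.0) > threshold:
            hits.append((gene.name, s_ortholog))  # criterion 4: ratio > 2
    return hits


# Hypothetical toy data (names and ratios are invented).
genes = [
    Gene("med4_a", "MED4", {"S6803": "syn_a"}),
    Gene("med4_b", "MED4", {"S6803": "syn_b", "MIT9313": "mit_b"}),
    Gene("med4_c", "MED4", {"S6803": "syn_c"}),
    Gene("med4_d", "MED4", {}),
    Gene("mit_e", "MIT9313", {"S6803": "syn_e"}),
]
ratios = {"syn_a": 3.1, "syn_b": 4.0, "syn_c": 1.2, "syn_e": 5.0}

candidates = high_light_candidates(genes, ratios)
print(candidates)
```

Only med4_a survives here: med4_b has a low-light ortholog, med4_c responds too weakly, med4_d has no Synechocystis ortholog, and mit_e is not a MED4 gene.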
Natural Language Deductive Biocomputing




  List the genes that pertain to med4 and that have an ortholog in s6803 that has a
  hihara ratio greater than 2 and that do not have orthologs in mit9313.


  What genes confer differential adaptation to light in promed4 versus pro9313?
Language for Expressing Conjectures, and Platform for Analysis

 A. First Order Logic (FOL) representation
 B. Subject Domain Theory
 C. Biological Process (and entities) Ontology
 D. Visual query language.
                                                          Goal Query
                                                                Subject
                                                            Domain Theory:




                                                   Subject Domain Theory
Goal Query:

Result:

   ?gene: #$PMED4.PMM0817
   ?organism2: #$prochlorococcus_marinus_mit9313
   ?experiment: HIHARA
   ?organism3: #$synechocystis_pcc6803
   ?gene3: #$S6803.ssr2595

I.e., a low-light organism that has no ortholog to ?gene is Prochlorococcus
marinus MIT9313. Experiments were performed by Hihara on the organism
Synechocystis PCC 6803, and a high regulation ratio was discovered in those
experiments on gene S6803.ssr2595, which is an ortholog of PMM0817. The
annotation for PMM0817 reads: “possible high-light inducible protein”.

(Matches the results from: Bhaya, Dufresne, Vaulot, and Grossman: Analysis of the hli gene family
in marine and freshwater cyanobacteria. FEMS Microbiology Letters, 2002, 205(2). PMM0817 is called
hli17 in this paper.)
Goal Query:

Result:

  ?gene: #$PMED4.PMM0817
  ?organism2: #$prochlorococcus_marinus_mit9313
  ?experiment: HIHARA
  ?organism3: #$synechocystis_pcc6803
  ?gene3: #$S6803.ssr2595


 + “Explanation”
How Science Thinks:
 The Science and Engineering of Science and Engineering


       For every right answer in science,
       There is an infinitude(-1) of wrong ones!

                        HOW COULD THIS EVER WORK!?



     1. Close is often good enough, or at least guides you to the right answer.

     2. Theory (model) guidance reduces the search space by many orders of magnitude!

     3. We've been really, really lucky ... so far, anyway!

     4. You are not alone! (>15 million abstracts in PubMed alone!)
Computational Biology; A “Turing Test” for Scientific Computing



     Simulation: What does this model predict?
     Explanation: How does it make these predictions?
     Model Identification: What models fit this data?
     Collaboration: Interact with scientists...
       ...and help scientists interact with one another!
         Models
Attention and Language
   Skills that help us
 Organize Experience




  Scientific Search




       Models
   Sets of Models
    Applied in a
  Particular Domain


  Cognitive Sphere
    Social Sphere

         Models
Attention and Language
   Skills that help us
 Organize Experience




  Scientific Search




       Models
   Sets of Models
    Applied in a
  Particular Domain


  Cognitive Sphere
Historical Theory↔Experiment SeeSawing of the Gyromagnetic Ratio “g-factor”




                                          Galison (1987) How Experiments End. Chicago U. Press
                             Social Sphere

                                 Models
                        Attention and Language
                           Skills that help us
                         Organize Experience


Collaborators can divide up the search space,
suggest models to one another, support one
another's explanation process, divide the work
between 'experimentalists' and 'theorists', etc.




                                Models
                            Sets of Models
                             Applied in a
                           Particular Domain
    Social Sphere

         Models
Attention and Language
   Skills that help us
 Organize Experience



Collaborators can form different abstractions
from the same set of observations, thus more
efficiently creating models that are potentially
useful as search heuristics.



       Models
   Sets of Models
    Applied in a
  Particular Domain
    Social Sphere

         Models
Attention and Language
   Skills that help us
 Organize Experience




  Scientific Search
 In the Web World?


       Models
   Sets of Models
    Applied in a
  Particular Domain


  Cognitive Sphere
Scientific Collaborations as “Trading Zones”




                                         Galison, Image and Logic, p.819
                      Image sources:
                      http://www.zum.de/whkmla/histatlas/africa/colafr1913.gif
                      http://www.artsci.wustl.edu/~anthro/courses/306/africa_linguistic_map.gif
                      www.zum.de/whkmla/histatlas/africa/afr95lang.gif

“...engineers structured their work around components, rather
than ... around 'pure' and 'applied' science. Working out a
common language became the order of the day.”
                                  Galison, Image and Logic, p.819
Scientific Collaborations as “Trading Zones”




                                               In the Web World?
                  BioBike
                  KnowOS




www.biobike.org             www.knowos.org
Knowledge Operating System Integrates Knowledge Resources



[Diagram: the KnowOS Integrated Knowledge Server, shown with COG among its connected knowledge resources]
BioBike/KnowOS Integrates Scientists and Computation in a Trading Zone



[Diagram: the KnowOS Integrated Knowledge Server, shown with COG among its connected knowledge resources]
BioBike/KnowOS is a “Web 24.0” Platform:


                 Web 1.0: The “Page” Web
                 Web 2.0: The “Social” Web
                 Web 3.0: The “Semantic” Web
                 Web 4.0: The “Programmable” Web

                 1.0 x 2.0 x 3.0 x 4.0 = 24.0000000000001:
                 The “Social Semantic Programmable” Web!
From: DR. X <[Michigan]>
Date: Oct 21, 2004 7:09 AM
Subject: Help with BioLingua

I'm a new user of BioLingua, with very little experience in computer programming. I'm searching
for housekeeping genes in Anabaena 7120 that are longer than 3000 bp. I could load Anabaena
sequences by:

>> (setf an (load-organism "A7120"))

I found genes that are involved in metabolism by:

>> (setf metabolism (find-frames "metabolism"))

I got a list of related genes by:

>> (df #$go.metabolism)

now I want to find the length of each gene in the list "metabolism" and check if it is longer than
3000. This is where I don't know what function to use.

I tried to start with the loop:

(LOOP FOR LongSequences in (GENES-OF a7120)
 as length = (LENGTHS-OF LongSequences)
 when (length > 3000)
 Collect LongSequences)

or some variation of it. None worked although I'm sure I'm pretty close.

I also do not understand why I didn't get a list of genes when I used the "find-frames" command
(function?), what exactly the value of this command?
From: Dr. E <[Virginia]>
To: DR. X <[Michigan]>
Date: Oct 21, 2004 7:53 AM
Subject: Re: Help with BioLingua

It's remarkable that you got as far as you have!
Here's one way to get a list of genes that you can then sift through
by length:

(LOOP FOR frame IN (FIND-FRAMES "metabolism")
 AS genes = (GET-ELEMENT GO.related-genes FROM frame)
 WHEN (EXISTS genes)
 APPEND genes)

If you like what you get, you can save the result in a variable:

(ASSIGN metabolic-genes *)

(The asterisk inserts the result of the previous operation.) To find
out how many genes you got:
...
From: Mr. M <[California]>
To: Dr. E <[Virginia]>
Date: Oct 21, 2004 9:34 AM
Subject: Re: Help with BioLingua

What are 'housekeeping' genes?

> (LOOP FOR frame IN (FIND-FRAMES "metabolism")
> AS genes = (GET-ELEMENT GO.related-genes FROM frame)
> WHEN (EXISTS genes)
> APPEND genes)

But this does not restrict the genes to the Anabaena 7120 organism.
You could do

APPEND (remove-if-not 'is-anabaena7120-gene genes)
  and
(defun is-anabaena7120-gene (gene) (eq ana7120 (#^Organism gene)))

=========================================================================

From: Dr. E <[Virginia]>
To: Mr. M <[California]>
Date: Oct 21, 2004 10:07 AM
Subject: Re: Help with BioLingua

>What are 'housekeeping' genes?

Housekeeping genes are those genes that are useful for the general maintenance of the cell
under normal conditions. The term is usually used in the context "just housekeeping genes",
implying "not interesting". But for those looking at metabolism as a whole, they can be very
interesting.
From: Mr. M <[California]>
To: DR. X <[Michigan]>
Date: Oct 22, 2004 11:13 AM
Subject: Re: Help with BioLingua

Here's an abbreviated script showing how to do exactly what you want, starting from after you
found the GO.METABOLISM frame.

Hope this helps.

[...]
<2>> a7120
:: #$anabaena_pcc7120
<3>> (defun is-a7120-gene (g) (equal a7120 (#^Organism g)))
:: IS-A7120-GENE
<4>> (setq housekeeping-genes (#^Go.Related-Genes #$Go.Metabolism))
:: (#$A7120.alr7635 #$A7120.alr7622 #$A7120.all7592 #$A7120.alr7073
...)

<5>> (setq a7120-housekeeping-genes (remove-if-not 'is-a7120-gene
housekeeping-genes))
:: (#$A7120.alr7635 #$A7120.alr7622 #$A7120.all7592 #$A7120.alr7073
...)
<6>> (length housekeeping-genes)
:: 229
<8>> (setq result (loop for g in a7120-housekeeping-genes
when (> (length (extract-sequence g)) 3000)
collect g))
:: (#$A7120.alr3809 #$A7120.alr2680 #$A7120.alr2679 #$A7120.alr2678
#$A7120.all2649 #$A7120.all2648 #$A7120.all2647 #$A7120.all2646
#$A7120.all2645 #$A7120.all2644 #$A7120.all2643 #$A7120.all2642
#$A7120.all2635 #$A7120.all1695 #$A7120.all1649 #$A7120.all1648
#$A7120.all1643)
<9>> (length result)
:: 17
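
For readers who do not read Lisp, the same two-step filter (keep the genes of one organism, then keep those longer than 3000 bp) can be sketched in Python. This is only an analogy under stated assumptions: the GENES dictionary and the gene names and sequences in it are hypothetical stand-ins, not BioBike frames, and long_genes is not a BioBike function.

```python
# Hypothetical stand-in data: gene name -> (organism, DNA sequence).
# Names and sequences are invented; they are not BioBike records.
GENES = {
    "a7120_g1": ("anabaena_pcc7120", "ATG" * 1200),       # 3600 bp
    "a7120_g2": ("anabaena_pcc7120", "ATG" * 400),        # 1200 bp
    "s6803_g1": ("synechocystis_pcc6803", "ATG" * 1500),  # 4500 bp
}


def long_genes(candidates, organism, min_length=3000):
    """Keep candidates from `organism` whose sequence exceeds min_length."""
    result = []
    for name in candidates:
        gene_organism, sequence = GENES[name]
        # Mirrors is-a7120-gene followed by the length test in the loop.
        if gene_organism == organism and len(sequence) > min_length:
            result.append(name)
    return result


result = long_genes(list(GENES), "anabaena_pcc7120")
print(result, len(result))
```

As in the transcript, the organism test and the length test are independent, so they can be applied in either order.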
BioBike/KnowOS is a “Web 24.0” Platform:
“In developing BioBike, the biologists and computer scientists are developing a fundamental
biological instrument—a biocomputational tool that must be used, and indeed is being used—by
biologists to get real scientific work done—work that they could not get done any other way.”

[At the same time they] are co-evolving a pidgin which exists [in both] their conversation, and [...]
in the biocomputing platform [...].

The facility to dynamically extend the system's working vocabulary makes BioBike unique
among computationally-based collaboration tools which, although they often support
conversations among participants, do not usually themselves grow organically through these
conversations.

Not merely learning to talk to one another, the scientists, engineers, and BioBike are doing real
work of biocomputation and at the same time as they are evolving the way that this work gets
done, they are extending their own understandings, amoeba-like into one another's areas of
expertise.

Specialized programming platforms are becoming increasingly important as computers infuse
greater parts of our daily lives, and as we wish to have greater control over them. [...] the
programming languages that are the heart of computing platforms serve as, at the same time,
inter-languages in the trading zones that are these platforms, and that the functions and objects
of those languages serve as boundary objects in these trading zones. [...] the participants in the
collaboration co-evolve the BioBike inter-languages themselves...”
                                      J Shrager, in press, The Evolution of BioBike: Community Adaptation of a Biocomputing Platform;
                                      Studies in History and Philosophy of Science.
BioBike/KnowOS Integrates Scientists and Computation in a Trading Zone



    Simulation: What does this model predict?
    Explanation: How does it make these predictions?
    Model Identification: What models fit this data?
    Collaboration: Interact with scientists...
      ...and help scientists interact with one another!

         -- Inference sharing and peer group critical analysis

         -- Ability to track the chain of inference
Trading Zones and the Bayes Community Model

Client/server architecture permits collaboration among scientists through
“publication” of hypotheses and linking them in as evidence.

New Knowledge: Incoming knowledge is distributed to the scientists according
to the hypotheses they are working on, and heuristically knitted into the
ongoing model development process.

Linked matrices project a Bayesian influence network.
ACH: Analysis of Competing Hypotheses
Trading Zones and the Bayes Community Model

Scientists can “promote” hypotheses as if they were results, and other
scientists can import these. The system automatically tracks provenance
(code+params, or BioDeducta “explanations”) to build a network of support.

When the support for linked results changes, results that depend upon those
are likewise changed in level of belief, or are flagged for reconsideration.

[Screenshots: views for user Shrager and user Heuer]

-- Inference sharing and peer group critical analysis

-- Ability to track the chain of inference
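
The "flagged for reconsideration" behavior can be illustrated with a small dependency graph. This is a minimal sketch, not the system's actual implementation: the SupportNetwork class, its method names, and the result labels are hypothetical, and it only flags downstream results rather than recomputing levels of belief.

```python
from collections import defaultdict, deque


class SupportNetwork:
    """Toy provenance graph: which results rest on which other results."""

    def __init__(self):
        # supporter -> set of results that cite it as support
        self.dependents = defaultdict(set)
        self.flagged = set()

    def add_support(self, supporter, dependent):
        """Record that `dependent` relies on `supporter`."""
        self.dependents[supporter].add(dependent)

    def support_changed(self, result):
        """Flag every result downstream of `result` for reconsideration."""
        queue = deque([result])
        seen = {result}
        while queue:
            current = queue.popleft()
            for dep in self.dependents.get(current, ()):
                if dep not in seen:
                    seen.add(dep)
                    self.flagged.add(dep)
                    queue.append(dep)
        return sorted(self.flagged)


net = SupportNetwork()
net.add_support("h1", "h2")  # h2 was built on h1
net.add_support("h2", "h3")  # h3 was built on h2
net.add_support("h4", "h5")  # unrelated chain

flagged = net.support_changed("h1")
print(flagged)
```

A breadth-first walk is enough here because the point is only to track the chain of inference; h5 stays unflagged since nothing it depends on has changed.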
    Social Sphere

         Models
Attention and Language
   Skills that help us
 Organize Experience




  Scientific Search




       Models
   Sets of Models
    Applied in a
  Particular Domain


  Cognitive Sphere
Cultural/Historical Sphere
                                 Social Sphere

                                      Models
                             Attention and Language
                                Skills that help us
                              Organize Experience




                               Scientific Search
                              In the Web World


                                    Models
                                Sets of Models
                                 Applied in a
                               Particular Domain


                               Cognitive Sphere
How Science Thinks:
 The Science and Engineering of Science and Engineering


      BioBike/KnowOS:             Afferent:                Cyclodyn Experiments:
       JP Massar                   David Chapman            Kevin Arrigo
       Andrew Pohorille            David Gladstein          Stephen Bay
       Mike Travers                Randy Gobbel             Devaki Bhaya
       Jeff Elhai                  Jon Handler              Arthur Grossman
       Richard Waldinger           Mike Travers             Rochelle Labiosa
                                                            Tasha Reddy
                                                            CJ Tu


           CACHE:                            BioDiscovery:
            JP Massar                         Stephen Bay
            Peter Pirolli                     Lonnie Chrisman
            Dorrit Billman                    Pat Langley
            Gregorio Convertino               Andrew Pohorille
                                              Kazumi Saito
                                              Richard Waldinger




  Funding from NASA, NSF, Carnegie Inst. DPB, Franz Inc., Lispworks Inc. and others.