					Pathway Database

   Carl Schaefer
 February 21, 2003
     Why Spend Effort on Pathways?

• Target as process vs. target as molecule
   – In the end, what matters is a hyperactive process (e.g.
     mitosis), not just an over-expressed protein
• Phenotype classification
   – Higher-level feature than transcript abundance
      Why Spend Effort on a Pathway Database?
• A picture may be worth a thousand words ...
   – but a computable representation is even better
• Make assumptions explicit
• Combine sources of data
   – KEGG, BioCarta, ...
• Merge data from separate pathways
   – E.g. BioCarta’s “Cyclins and Cell Cycle Regulation” and “Cyclin
     E Destruction Pathway”
• Causal framework for quantitative simulation/analysis
   – ... when the data becomes available
                      Basics

•   Model a causal network
•   Be composable (novel pathways)
•   Cope with lack of knowledge
•   Promote understanding
            Model A Causal Network
• Graph (nodes & edges)
• Distinguish two kinds of nodes (molecules & processes)
• Allow labels on nodes and edges
   – molecule-type (compound, protein, complex, rna)
   – molecule-id (...)
   – process-type (reaction, binding, modification, translocation,
     transcription, cell process)
   – edge-type (input, output, agent, inhibitor)
   – activity-state (active, inactive)
   – location (extracellular, transmembrane, cytoplasm, nucleus)
   – reversible (yes, no)
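
The labelled graph described above could be held in memory roughly as follows. This is a minimal sketch, not the database's actual schema: the class names, the `check` helper, the ids `P1` and `P2`, and the choice to attach activity-state and location to edges rather than to molecule nodes are all assumptions made for illustration. Only the label vocabularies come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Label vocabularies copied from the slide.
MOLECULE_TYPES = {"compound", "protein", "complex", "rna"}
PROCESS_TYPES = {"reaction", "binding", "modification", "translocation",
                 "transcription", "cell process"}
EDGE_TYPES = {"input", "output", "agent", "inhibitor"}
ACTIVITY_STATES = {"active", "inactive"}
LOCATIONS = {"extracellular", "transmembrane", "cytoplasm", "nucleus"}


def check(value, allowed, label):
    """Reject a label value that is not in the slide's vocabulary."""
    if value is not None and value not in allowed:
        raise ValueError(f"unknown {label}: {value!r}")


@dataclass(frozen=True)
class Molecule:
    molecule_id: str            # hypothetical identifier format
    molecule_type: str

    def __post_init__(self):
        check(self.molecule_type, MOLECULE_TYPES, "molecule-type")


@dataclass(frozen=True)
class Edge:
    molecule: Molecule
    edge_type: str
    activity_state: Optional[str] = None   # None means "not known"
    location: Optional[str] = None

    def __post_init__(self):
        check(self.edge_type, EDGE_TYPES, "edge-type")
        check(self.activity_state, ACTIVITY_STATES, "activity-state")
        check(self.location, LOCATIONS, "location")


@dataclass
class Process:
    process_id: int
    process_type: str
    reversible: bool = False
    edges: List[Edge] = field(default_factory=list)

    def __post_init__(self):
        check(self.process_type, PROCESS_TYPES, "process-type")


# Illustrative data only: one protein activates another.
kinase = Molecule("P1", "protein")
substrate = Molecule("P2", "protein")
activation = Process(1, "modification", edges=[
    Edge(kinase, "agent", "active", "cytoplasm"),
    Edge(substrate, "input", "inactive", "cytoplasm"),
    Edge(substrate, "output", "active", "cytoplasm"),
])
```

Keeping processes and molecules as separate classes mirrors the slide's distinction between the two kinds of nodes, and leaving a label as `None` gives one simple way to record that a value is unknown.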
                    Composable

• “Atomic pathway”
   – a process node
   – immediately adjacent molecules
   – the connecting edges
• Join atomic pathways on identical molecules
   – ... and maybe on molecule subtype relation
 Pathway Construction:
Joining Atomic Pathways
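
Joining atomic pathways on identical molecules can be sketched as below. The representation is an assumption: each atomic pathway is a process id mapped to a list of `(molecule, edge_type)` pairs, and two molecules count as identical when their `(molecule_id, activity_state)` tuples are equal. The function name and the sample ids are hypothetical.

```python
from collections import defaultdict


def join_atomic_pathways(atoms):
    """Merge atomic pathways into one graph by sharing molecule nodes.

    atoms: dict mapping process id -> list of (molecule, edge_type),
    where molecule is any hashable key (here: (molecule_id, state)).
    Returns the directed edge list and the set of molecules that
    appear in more than one atomic pathway (the join points).
    """
    users = defaultdict(set)
    edges = []
    for pid, links in atoms.items():
        for molecule, edge_type in links:
            users[molecule].add(pid)
            if edge_type == "output":
                # Outputs point from the process to the molecule.
                edges.append((("process", pid), ("molecule", molecule), edge_type))
            else:
                # Inputs, agents and inhibitors point into the process.
                edges.append((("molecule", molecule), ("process", pid), edge_type))
    shared = {m for m, pids in users.items() if len(pids) > 1}
    return edges, shared


# Illustrative data: process 1 activates B, and active B drives process 2.
atoms = {
    1: [(("A", "active"), "agent"),
        (("B", "inactive"), "input"),
        (("B", "active"), "output")],
    2: [(("B", "active"), "agent"),
        (("C", "inactive"), "input"),
        (("C", "active"), "output")],
}
edges, shared = join_atomic_pathways(atoms)
```

Because molecule nodes are keyed by value, the join needs no explicit merge step: any two atomic pathways that mention the same key end up connected through it.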
               Lack of Knowledge

• Hierarchy of label values
   – e.g., edge-type → incoming-edge → agent (most general to most specific)
• Hierarchy of molecule ids
   – GO id
      • Gene product
         – Specific protein
   – Families of molecules
• “Handbook”
   – E.g.: “for Raf-1, ‘active-1’ means phosphorylation at
     S259”
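
A hierarchy of label values lets a vague annotation stand in when the specific one is unknown. The sketch below shows one way to test whether a general label covers a specific one. The chain edge-type, incoming-edge, agent is from the slide; placing `input` under `incoming-edge`, the `PARENT` table, and the function name `subsumes` are assumptions for illustration.

```python
# Child -> parent links in the label hierarchy.
PARENT = {
    "incoming-edge": "edge-type",
    "agent": "incoming-edge",
    "input": "incoming-edge",   # assumed placement, not stated on the slide
}


def subsumes(general, specific, parent=PARENT):
    """Return True if `general` equals `specific` or is one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = parent.get(node)   # None once the root has been passed
    return False


# An edge known only to be "incoming" is compatible with a query for agents'
# parent class, but not the other way round.
covers = subsumes("incoming-edge", "agent")
too_specific = subsumes("agent", "incoming-edge")
```

The same walk up a parent table would apply to the molecule-id hierarchy (GO id, gene product, specific protein), given a table of those relations.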
           Promote Understanding

• Hide unwanted detail
   – prune common molecules
   – encapsulate sub-pathways
• Query by connectedness (cause & effect)
• Find patterns
Omission of Don’t Care Detail:
Pruning Common Compounds
                   Query by Connectedness:
                   Predecessors/Successors
atom-id = 411
direction = forward
degree = 3
prune common compounds
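
The query parameters above suggest a bounded breadth-first search. The following is a sketch under stated assumptions, not the tool's implementation: `degree` is taken to mean the maximum number of process-to-process hops, a process's successors are the processes that take one of its outputs as an incoming molecule, and the set of common compounds is supplied by the caller. Function names and all sample data apart from the atom id 411 are hypothetical.

```python
from collections import deque


def neighbors(atoms, pid, direction, common):
    """Processes one hop from pid, ignoring links through common compounds."""
    near, far = ("out", "in") if direction == "forward" else ("in", "out")
    linking = atoms[pid][near] - common
    return {q for q, atom in atoms.items()
            if q != pid and atom[far] & linking}


def reachable(atoms, start, direction="forward", degree=3, common=frozenset()):
    """Map each process within `degree` hops of `start` to its hop count."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        pid = queue.popleft()
        if distance[pid] == degree:
            continue                      # do not expand past the limit
        for q in neighbors(atoms, pid, direction, common):
            if q not in distance:
                distance[q] = distance[pid] + 1
                queue.append(q)
    del distance[start]
    return distance


# Illustrative chain 411 -> 412 -> 413 -> 414 -> 415, plus process 500,
# which is linked to 411 only through the common compound ADP.
atoms = {
    411: {"in": {"A", "ATP"}, "out": {"B", "ADP"}},
    412: {"in": {"B"}, "out": {"C"}},
    413: {"in": {"C"}, "out": {"D"}},
    414: {"in": {"D"}, "out": {"E"}},
    415: {"in": {"E"}, "out": {"F"}},
    500: {"in": {"ADP"}, "out": {"X"}},
}
pruned = reachable(atoms, 411, "forward", 3, common={"ATP", "ADP"})
unpruned = reachable(atoms, 411, "forward", 3)
```

With pruning, process 500 drops out of the result because its only connection to 411 runs through ADP; without pruning it appears as a direct successor, which is the kind of uninformative link the pruning step is meant to hide.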
                      Patterns

• Templates for atomic pathways:
   process-type=modification::
   molecule-type=protein[1]:edge-type=agent::
   molecule-type=protein[2]:edge-type=input:activity-state=inactive::
   molecule-type=protein[2]:edge-type=output:activity-state=active
• Maybe multi-process templates (e.g., a cascade)
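
A matcher for the single-process template above might look like this. The reading of the syntax is an assumption: `::` is taken to separate the process clause from the edge clauses, `:` to separate `label=value` pairs, and a bracketed number such as `[2]` to be a variable that must bind to the same molecule wherever it recurs. The atom layout, the `molecule-id` key, and the function names are hypothetical.

```python
import re
from itertools import permutations

TEMPLATE = ("process-type=modification::"
            "molecule-type=protein[1]:edge-type=agent::"
            "molecule-type=protein[2]:edge-type=input:activity-state=inactive::"
            "molecule-type=protein[2]:edge-type=output:activity-state=active")


def parse_template(text):
    """Split a template into process labels and (labels, variable) clauses."""
    head, *clauses = text.split("::")
    process = dict(pair.split("=") for pair in head.split(":"))
    edges = []
    for clause in clauses:
        labels, var = {}, None
        for pair in clause.split(":"):
            key, value = pair.split("=")
            bracket = re.fullmatch(r"(.+)\[(\d+)\]", value)
            if bracket:                       # e.g. protein[2]
                value, var = bracket.group(1), int(bracket.group(2))
            labels[key] = value
        edges.append((labels, var))
    return process, edges


def assignment_ok(candidate, clauses):
    """Check labels and that each variable binds to a single molecule id."""
    bound = {}
    for edge, (labels, var) in zip(candidate, clauses):
        if any(edge.get(k) != v for k, v in labels.items()):
            return False
        if var is not None and bound.setdefault(var, edge["molecule-id"]) != edge["molecule-id"]:
            return False
    return True


def matches(atom, template):
    """True if some choice of the atom's edges satisfies every clause."""
    process, clauses = parse_template(template)
    if any(atom["process"].get(k) != v for k, v in process.items()):
        return False
    return any(assignment_ok(candidate, clauses)
               for candidate in permutations(atom["edges"], len(clauses)))


# Illustrative atomic pathway: protein K activates protein S.
atom = {
    "process": {"process-type": "modification"},
    "edges": [
        {"molecule-id": "K", "molecule-type": "protein", "edge-type": "agent"},
        {"molecule-id": "S", "molecule-type": "protein", "edge-type": "input",
         "activity-state": "inactive"},
        {"molecule-id": "S", "molecule-type": "protein", "edge-type": "output",
         "activity-state": "active"},
    ],
}
found = matches(atom, TEMPLATE)
```

The matcher tries every ordered selection of edges, which is adequate for atomic pathways with a handful of edges; it also accepts atoms that carry extra edges beyond those the template names.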
              What Do We Need?

• Computation model of pathway interactions
• Persistent data model
• Tools:
   – data input
   – query and analysis
   – visualization
• Data, data, data, ...
                  What Do We Have?
• Computation model: mostly worked out
• Persistent data model: mostly worked out
• Tools:
   – working on data input
   – have a query/analysis tool
       • joins, prunes, finds predecessors/successors
       • produces graph output
       • extracts first-order patterns
   – using GraphViz to produce SVG diagrams
• Data, data, data ...
   – Loaded KEGG into database
   – Next: ~30 BioCarta pathways related to apoptosis, cell-cycle
     regulation and histone deacetylase activity
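
Graph output for GraphViz can be as simple as emitting DOT text. The sketch below is an illustration, not the tool's actual output format: the function name, the node naming scheme, and the choice of boxes for processes and ellipses for molecules are assumptions.

```python
def to_dot(atoms):
    """Render atomic pathways as a GraphViz DOT digraph.

    atoms: dict mapping process id -> (process_type, links), where
    links is a list of (molecule_name, edge_type) pairs.
    """
    lines = ["digraph pathway {"]
    for pid, (process_type, links) in sorted(atoms.items()):
        proc = f"process_{pid}"
        lines.append(f'  "{proc}" [shape=box, label="{process_type}"];')
        for molecule, edge_type in links:
            # Re-declaring a node is harmless in DOT.
            lines.append(f'  "{molecule}" [shape=ellipse];')
            if edge_type == "output":
                lines.append(f'  "{proc}" -> "{molecule}" [label="{edge_type}"];')
            else:
                lines.append(f'  "{molecule}" -> "{proc}" [label="{edge_type}"];')
    lines.append("}")
    return "\n".join(lines)


# Illustrative data only.
dot_text = to_dot({
    1: ("modification", [("K", "agent"), ("S (inactive)", "input"),
                         ("S (active)", "output")]),
})
```

Saving `dot_text` to a file such as `pathway.dot` and running `dot -Tsvg pathway.dot -o pathway.svg` would then produce an SVG diagram.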

				