Computational biology and computational biologists

       Tandy Warnow, UT-Austin
            Department of Computer Sciences
      Institute for Cellular and Molecular Biology
     Program in Evolution, Ecology, and Behavior
  Center for Computational Biology and Bioinformatics
  Two computational biologists
• One computational biologist needs to know
  a lot of biology



• Another needs to know a lot of mathematics
    Another two computational
            biologists
• Craig Benham: mathematics of stressed
  DNA (understanding regulation)



• Gene Myers: whole genome sequencing and
  BLAST
        Two different types of
       computational biologists
• One works on mathematical or computational
  problems (derived from biology) that are well
  posed and hard to solve; these need
  significant computer science/math/statistics
• One works on biological problems that are not
  well posed, and where the computer
  science/math/statistics needed may be “easier”
• Both can be problems that are important to
  biologists, and which they cannot solve without
  computational biologists’ involvement
       My view of Pasteur’s Quadrant
Hard
math   What computational
       scientists want
                              What computational
                              scientists do

       What biologists want
Easy
math   Easily applicable                   Not applicable
               Phylogeny
[Figure: phylogeny of Orangutan, Gorilla, Chimpanzee, and Human, from the Tree of Life Website, University of Arizona]
      DNA Sequence Evolution
                                                   -3 mil yrs
                        AAGACTT

                                                   -2 mil yrs
    AAGGCCT                    TGGACTT


                                                   -1 mil yrs
AGGGCAT              TAGCCCT         AGCACTT




AGGGCAT   TAGCCCA   TAGACTT    AGCACAA   AGCGCTT          today
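
One simple way to quantify the change shown above is to count the sites at which each present-day sequence differs from the ancestral sequence. The sketch below does this in Python. The function name `hamming` is an illustrative choice, not something from the talk, and the count is only a lower bound on the number of substitutions, since a site may have changed more than once.

```python
def hamming(a: str, b: str) -> int:
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Sequences copied from the slide: the ancestor at -3 mil yrs and the
# five sequences observed today.
ANCESTOR = "AAGACTT"
TODAY = ["AGGGCAT", "TAGCCCA", "TAGACTT", "AGCACAA", "AGCGCTT"]

DIFFERENCES = {seq: hamming(ANCESTOR, seq) for seq in TODAY}

for seq, d in DIFFERENCES.items():
    print(seq, d)
```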
          Molecular Systematics
   U         V         W         X         Y

AGGGCAT   TAGCCCA   TAGACTT   TGCACAA   TGCGCTT

[Figure: unrooted tree with leaves U, V, W, X, Y]
   Computational challenges for
   Assembling the Tree of Life
• The Tree of Life spans 8 million species, but we
  cannot currently analyze more than a few hundred (and
  even this can take years)
• We need new methods for inferring large
  phylogenies - hard optimization problems!
• We need new software for visualizing large trees
• We need new database technology
• Not all phylogenies are trees, so we need methods
  for inferring phylogenetic networks
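
One reason these optimization problems are hard is the size of the search space. It is a standard result that the number of unrooted binary trees on n labeled leaves is (2n-5)!! = 1 · 3 · 5 · … · (2n-5). The sketch below simply evaluates that product; the function name is a hypothetical choice for illustration.

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary trees on n labeled leaves: (2n-5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


for n in (5, 10, 20, 50):
    print(n, num_unrooted_binary_trees(n))
```

Even at a few hundred leaves, exhaustive enumeration is out of the question, which is why heuristics are used in practice.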
Time is a bottleneck for MP and ML
• Systematists tend to prefer trees with the optimal
  maximum parsimony score or optimal maximum
  likelihood score; however, both problems are hard to solve
• Our experimental studies show that polynomial-time
  methods do not do as well as MP or ML heuristics when
  trees are big and have high rates of evolution




[Figure: MP score plotted over the space of phylogenetic trees, with a local optimum and the global optimum marked]
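
Scoring one fixed tree under maximum parsimony is the easy part; the hard part is searching over trees. As a minimal sketch, the code below uses Fitch's algorithm (a standard technique, not named in the talk) to compute the MP score of a single tree for the five sequences U, V, W, X, Y from the Molecular Systematics slide. The topology ((U,V),W,(X,Y)), the nested-tuple tree encoding, and the function names are assumptions made for illustration only.

```python
def _fitch_site(node, seqs, site):
    """Return (candidate state set, substitution count) for one site."""
    if isinstance(node, str):  # leaf: its observed state, no cost
        return {seqs[node][site]}, 0
    left, right = node
    left_states, left_cost = _fitch_site(left, seqs, site)
    right_states, right_cost = _fitch_site(right, seqs, site)
    common = left_states & right_states
    if common:  # children agree on some state: no substitution needed
        return common, left_cost + right_cost
    # Children disagree: take the union and charge one substitution.
    return left_states | right_states, left_cost + right_cost + 1


def fitch_score(tree, seqs):
    """MP score of a rooted binary tree given as nested 2-tuples of leaf names."""
    n_sites = len(next(iter(seqs.values())))
    return sum(_fitch_site(tree, seqs, s)[1] for s in range(n_sites))


SEQS = {
    "U": "AGGGCAT", "V": "TAGCCCA", "W": "TAGACTT",
    "X": "TGCACAA", "Y": "TGCGCTT",
}
# Assumed topology, rooted arbitrarily on the edge leading to (X, Y);
# the parsimony score does not depend on where the root is placed.
TREE = ((("U", "V"), "W"), ("X", "Y"))

print(fitch_score(TREE, SEQS))
```

An MP heuristic would call a scoring routine like this on many candidate trees, which is where the running time goes.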
                MP/ML heuristics
[Figure (fake study): performance of a hill-climbing heuristic; MP score of best trees plotted against time]
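
The behavior of a hill-climbing heuristic can be sketched on a toy problem. The example below is not a tree search: it walks a one-dimensional list of made-up scores (lower is better, as with MP), moving to the better neighbor until no neighbor improves. The landscape values and the function name are invented for illustration.

```python
def hill_climb(scores, start):
    """Greedy descent on a 1-D toy landscape (lower score is better).

    Returns the final position and the best score seen after each step.
    """
    pos = start
    history = [scores[pos]]
    while True:
        candidates = [i for i in (pos - 1, pos + 1) if 0 <= i < len(scores)]
        better = [i for i in candidates if scores[i] < scores[pos]]
        if not better:  # no improving neighbor: a local optimum
            return pos, history
        pos = min(better, key=lambda i: scores[i])
        history.append(scores[pos])


# Made-up landscape with a local optimum (score 7) and the global optimum (score 1).
LANDSCAPE = [9, 7, 8, 6, 4, 5, 3, 1, 2]

print(hill_climb(LANDSCAPE, 0))  # stops early at the local optimum
print(hill_climb(LANDSCAPE, 5))  # reaches the global optimum
```

The `history` list plays the role of the "score of best trees versus time" curve: it improves quickly and then flattens once the search is stuck.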
               DCM-boosting
        Speeding up MP/ML heuristics
[Figure (fake study): MP score of best trees plotted against time, comparing the performance of the hill-climbing heuristic with the desired performance]
              Characteristics
• The research can be published in
  mathematics/statistics/computer science journals
  and conferences, and evaluated along these lines
• These people can be faculty in
  Math/Statistics/Computer Science departments,
  and *maybe* in some biology departments
• Substantive improvements are hard, but if
  achieved will have enormous impact on many
  biologists
• Why? These are old problems, endorsed by
  biologists, of a computational nature.
             The “other” type
• Deals with problems like: protein fold prediction,
  inferring metabolic or regulatory networks,
  finding genes within genomes, or even computing
  a good multiple sequence alignment
• Needs to know a lot of biology to pose appropriate
  computational problems
• Resultant algorithms may not (in some cases)
  make for interesting or publishable mathematics
• Note: these are generally new problems, arising from
  new data
   What’s needed (for all types)
• Ability to collaborate with a variety of people, and
  learn what they want to achieve
• Ability to be flexible in terms of how one
  evaluates research results (e.g., real vs. simulated
  data, theory versus experiment)
• Ability to communicate research results to
  different types of researchers
• Ability to use a variety of techniques to solve
  biological problems
• Ability to model and pose appropriate
  computational approaches for biological problems
           Difficult questions
• What departments should have computational
  biologists (especially of the second type)?
• Should there be departments of computational
  biology?
• Should there be PhD programs in computational
  biology?
• How should a computational biologist of
  either type be evaluated?
      Some issues for academic
      computational biologists
• Journal versus conference papers, and number of
  each
• Experimental/empirical versus theoretical work
• Software versus papers
• Authorship order within publications
• Promotion and Tenure in two departments?
• Biggest issue: How to predict future success?

				