									Regulatory Motif Finding (II)

              Balaji S. Srinivasan
                           CS 374
                        Lecture 18
                        12/6/2005
Overview
   Biology of DNA binding
    motifs
   Why motifs?
   Overview of motif finding
    algorithms
   Open problems in this
    area
Biology of Motifs
   From last time…
Biology of Motifs

   Given a transcription factor (TF) of fixed sequence…
   binding is affected by
       secondary, tertiary structure of DNA
       methylation state
       DNA binding motifs
Biology of Motifs
   DNA Motifs (regulatory
    elements)
       Binding sites for proteins
       Short sequences (5-25 bp)
       Up to 1000 bp (or farther)
        from gene
       Inexactly repeating patterns
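
To make "inexactly repeating patterns" concrete, here is a minimal sketch that tallies per-position base counts over a few binding sites and reads off a consensus. The sites are made up for illustration and the function names are hypothetical, not taken from any published tool.

```python
from collections import Counter

def count_matrix(sites):
    """Per-position base counts for a list of equal-length binding sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]

def consensus(sites):
    """Most common base at each position (ties broken alphabetically)."""
    return "".join(
        min(col, key=lambda b: (-col[b], b)) for col in count_matrix(sites)
    )

# Hypothetical sites: no two are identical, yet a shared pattern is visible.
sites = ["TGACTCA", "TGAGTCA", "TGACTAA", "TTACTCA"]
print(consensus(sites))  # TGACTCA
```

The count matrix, rather than the consensus string, is what motif finders usually report, because it keeps the information about which positions tolerate substitutions.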
Biology of Motifs
   TF binding affected by
       secondary, tertiary structure of DNA
       methylation state
       DNA binding motifs
   Should be on your radar…
   motifs are the frontier of research: why?
       sequence data exists
       static, not dynamic
   Figure: dynamic chromosome (accessibility affects transcription…)
   Figure: dynamic epigenome (methylation state)
Biology of Motifs
   Prokaryotes
       fewer TFs
       long motifs
       affinity depends on match
       regulation immediately upstream
   Eukaryotes (HARD)
       more TFs per gene
       shorter motifs
       MUCH more noncoding sequence
       regulatory modules
       long range regulation
Biology of Motifs
   Transcription Factors
       often dimer, tetramer:
        palindromic binding site
       binding
           stochastic
           affinity =
            structural/sequence match
           high affinity not always
            desirable
       combinatorial regulation
        (esp. eukaryotes)
           order important!
           site spacing important!
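
A dimeric TF often binds a site that reads the same on both strands. As a small sketch of what "palindromic" means for DNA, the check below compares a site with its reverse complement. The helper names are hypothetical. GAATTC is the well-known EcoRI restriction site, used here only as a familiar palindrome.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    s = site.upper()
    return s == reverse_complement(s)

print(is_palindromic("GAATTC"))   # True
print(is_palindromic("GATTACA"))  # False
```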
Why motifs?
   Given: all TF/motif pairs
   Get: global genetic regulatory network




   Figure panels: microbial, eukaryotic
Recap #1
   To figure out
    transcriptional control…
       find transcription factor
        binding sites
   Eukaryotes: hard b/c
       much more noncoding
        sequence
       shorter motifs
       longer range interactions
Motif Finding Overview
   Methods
       1 genome
           sequence
            overrepresentation (NBT
            shootout, not good)
       Functional Genomics
           predict regulons (Segal,
            etc.)
       N genomes
           phylogenetic footprinting
            (Kellis, etc.)
       N genomes + Func
        Genomics
           Phylocon (Wang & Stormo)
           New ideas…
Motif Shootout
   Nature Biotech Jan.
    2005
       13 way shootout
       disappointing results
   Useful in that
       shows importance of
        using all info
       benchmarking is clearly
        trouble area
Motif Shootout
   Conceptually
       load FASTA hopper of
        intergenic sequence from
        1 genome into black box
       output: motif matrices
   But…
       how to pick sequences?
       comparison?
       functional clustering?
       benchmarking?
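
The "black box" above can be sketched in its simplest form as a k-mer overrepresentation count: compare how often each k-mer occurs in the chosen upstream sequences against a background set. This is an illustrative toy, not the method of any tool in the shootout. The sequences, the pseudocount, and the function names are all assumptions.

```python
from collections import Counter
from math import log2

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enrichment(target, background, k, pseudo=1.0):
    """log2 ratio of k-mer frequency in target vs. background sequences."""
    t, b = kmer_counts(target, k), kmer_counts(background, k)
    # The pseudocount is added to each of the 4**k possible k-mers.
    t_total = sum(t.values()) + pseudo * 4 ** k
    b_total = sum(b.values()) + pseudo * 4 ** k
    return {
        kmer: log2(((t[kmer] + pseudo) / t_total) / ((b[kmer] + pseudo) / b_total))
        for kmer in t
    }

# Hypothetical upstream regions sharing ACGTG, plus unrelated background.
target = ["ACGTGACGTTT", "TTACGTGAAAA", "CCCACGTGCC"]
background = ["ACACACACACA", "TTTTGGGGCCC", "GATTACAGATT"]
scores = enrichment(target, background, k=5)
print(max(scores, key=scores.get))  # ACGTG
```

The questions on the slide show up directly as the inputs here: which sequences go into `target`, and what counts as a fair `background`.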
Motif Shootout
   So
        not as useful as it seems…
        huge, artificial limitations
        “consider a spherical cow”
   What if the limitations are removed?
Motifs via Functional
Genomics
   Coexpression
       most popular (e.g. Segal
        2003)
   Functional clustering
       then hunt upstream
Motifs via Functional
Genomics
   ChIP-chip
       key idea: assay the DNA segments where a TF binds
       direct test of motif binding (e.g. Laub 2002)
   Disadvantages
       one TF at a time
       need an antibody!
Motifs via Functional
Genomics
   Coinheritance, etc.
       predict regulons, then
        look upstream
       heuristic network
        integration
         will return to this point

       decent signal in prokaryotes (Manson McGuire 2001)
Motifs via Phylogenetic
Footprinting
   Key idea
       functional sequence evolves more slowly
       conservation hierarchy (ultraconserved at top, no conservation at bottom)
         ultraconserved noncoding elements (Bejerano & Haussler 2004)
         proteins, ncRNAs
         DNA binding motifs
         unconstrained, neutrally drifting regions
Motifs via Phylogenetic
Footprinting
   Phylogenetic footprint
       “footprint” is conservation
       simple version
           multiple alignment of
            orthologous upstream
            regions
       Problem: nonfunctional sequence drifts rapidly
           multiple alignment is difficult if only a small % is conserved
           protein twilight zone: 30% identity
           upstream nucleotide regions: often much less…
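
The "simple version" can be sketched as follows, assuming a multiple alignment of orthologous upstream regions is already in hand: mark the columns where every species agrees, then report runs of such columns as candidate footprints. The alignment is invented, the `min_len` threshold is an arbitrary assumption, and real methods score conservation against a phylogeny rather than demanding perfect identity.

```python
def conserved_columns(alignment):
    """True for each column where every species has the same non-gap base."""
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*alignment)]

def footprints(alignment, min_len=5):
    """(start, end) column ranges, end exclusive, of conserved runs >= min_len."""
    runs, start = [], None
    # The trailing False closes a run that reaches the end of the alignment.
    for i, ok in enumerate(conserved_columns(alignment) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs

# Hypothetical aligned upstream regions from three species.
aln = [
    "ATGCTGACTCATTAG",
    "CTG-TGACTCAGCAG",
    "TTAATGACTCACTCG",
]
for start, end in footprints(aln):
    print(start, end, aln[0][start:end])  # 4 11 TGACTCA
```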
Motifs via Phylogenetic
Footprinting
   Phylogenetic Footprint
       Problem: multiple
        alignment of upstreams
        hits twilight zone
       One solution
         search for parsimonious
           substrings…
         without direct alignment
           (Blanchette 2003)
Motifs via Phylogenetic
Footprinting
   Multiple genome
    alignment can work
       need close enough
        species
       Kellis 2003 (four yeasts,
        genome alignments)
       Xie 2005 (“four”
        mammals, genome
        alignment)
       Discussed last time
   Key points
       Genome wide search
       Motif Conservation
        Score: null model based
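
The slide says only that the Motif Conservation Score is based on a null model. One simple way to express that idea, offered as an assumption rather than the published definition, is a binomial z-score: how far the number of conserved motif instances lies above what a null conservation rate would predict. The function name and the example numbers are hypothetical.

```python
from math import sqrt

def conservation_z(conserved, total, p_null):
    """Standard deviations by which the conserved count exceeds the null expectation."""
    if total <= 0 or not 0 < p_null < 1:
        raise ValueError("need total > 0 and 0 < p_null < 1")
    expected = total * p_null
    sd = sqrt(total * p_null * (1 - p_null))
    return (conserved - expected) / sd

# Hypothetical motif: 100 instances genome wide, 40 conserved across species,
# while comparable random patterns are conserved 10% of the time.
print(conservation_z(conserved=40, total=100, p_null=0.1))
```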
Recap
   Many programs for
    motif search
       most are useless!
       Lesson:
         must use comparative
          genomics (e.g.
          alignment)
         …or functional
          genomics (e.g.
          expression)
       what about both
        together??
Integrated Motif Finding
   Recall
       comparative genomics
         one upstream region in
          N species
       functional genomics
         N upstream regions in
          one species
   Phylocon (Wang & Stormo 2003)
       N upstreams in N species
Integrated Motif Finding
   Phylocon
       given N species
       align upstream regions
       key idea: align the
        alignments
   Boosts sensitivity
       LEU3 hard to find…
       but align the alignments and the true motif pops out!
Integrated Motif Finding
   Important features
       no prior motif length required
       profile approach matches distribution, not sample (robust to substitutions)
       several alignments for each upstream are OK
       does well vs. real data…
   ALLR (average log likelihood ratio)
       Q: are 2 profile columns samples from the same distribution?
       if so, that may be a matching motif position…
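
A minimal sketch of the ALLR comparison between two profile columns follows. It uses the form in which the statistic is commonly written: each column's counts are scored against the other column's log likelihood ratios, then divided by the total count. The pseudocount scheme, the uniform background, and the example columns are assumptions made for illustration.

```python
from math import log

BASES = "ACGT"

def freqs(counts, background, pseudo=1.0):
    """Base frequencies from counts, smoothed toward the background."""
    total = sum(counts[b] for b in BASES) + pseudo
    return {b: (counts[b] + pseudo * background[b]) / total for b in BASES}

def allr(counts_i, counts_j, background=None, pseudo=1.0):
    """Average log likelihood ratio between two profile columns."""
    bg = background or {b: 0.25 for b in BASES}
    f_i, f_j = freqs(counts_i, bg, pseudo), freqs(counts_j, bg, pseudo)
    num = sum(
        counts_j[b] * log(f_i[b] / bg[b]) + counts_i[b] * log(f_j[b] / bg[b])
        for b in BASES
    )
    den = sum(counts_i[b] + counts_j[b] for b in BASES)
    return num / den

# Hypothetical columns: a and b are both A-rich, c is C-rich.
col_a = {"A": 9, "C": 0, "G": 1, "T": 0}
col_b = {"A": 8, "C": 0, "G": 0, "T": 2}
col_c = {"A": 0, "C": 9, "G": 0, "T": 1}
print(allr(col_a, col_b) > 0)  # True: plausibly the same distribution
print(allr(col_a, col_c) < 0)  # True: clearly different distributions
```

A positive score suggests the two columns could be samples from one non-background distribution. A negative score argues against pairing them.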
Open Questions
   Phylocon is a strong step in the right direction…
       align the alignments
   But how do we…
       choose species?
       choose upstreams?
       validate motifs?
       find TF/motif pairs?
Conclusion
   Motifs important
       static, tractable, impt.
       want: genetic regulatory
        networks
   Motif finder selection
       Don’t: use 1 genome w/o
        comparison or func.
        genomics
       Do: use alignment & func
        genomics
   Phylocon (Wang & Stormo), MCS (Kellis)
       best to date b/c they use N genes and M species

								