                   Generalized Planted (l,d)-Motif Problem with Negative Set

                                Presented by Marcel Schulz

The generalized (l,d)-motif problem                            Journal Club 15.11.2005
                                      Outline

        The planted (l,d)-motif problem
          Formulation & limitations
        The generalized (l,d)-motif problem with negative set
          Formulation
        Solving both problems
          Voting algorithms
        Experimental results

                                      Motivation

    Transcription factor binding sites, microRNA target sites
    Algorithms for the discovery of short motifs in DNA are a
    prominent issue in bioinformatics research [1]




                                The planted (l,d)-motif problem

    Introduced in 2000 by Pavel Pevzner and Sing-Hoi Sze [2]

    Find the motif M of length l.

    Given:
    • T sequences of length n
    • one d-variant of M in every sequence

    [Figure: sequences 1 … T, each with one planted occurrence; example with d=1, l=3, x = mismatch]
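
The slides do not spell out what a d-variant is. The sketch below assumes the usual reading: a length-l string that differs from M in at most d positions (Hamming distance). The helper names and the toy sequences are made up for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def has_d_variant(seq: str, motif: str, d: int) -> bool:
    """True if some window of seq differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(seq[j:j + l], motif) <= d for j in range(len(seq) - l + 1))


def is_planted_motif(seqs: list[str], motif: str, d: int) -> bool:
    """Assumed problem condition: every sequence holds a d-variant of motif."""
    return all(has_d_variant(s, motif, d) for s in seqs)


# Toy instance with l=3, d=1 (hypothetical sequences, not from the slides).
seqs = ["TTACTGG", "GACGTTT", "CCCAGGA"]
print(is_planted_motif(seqs, "ACG", 1))
```

With d=0 the same call fails, because the first toy sequence only contains the variant ACT and not ACG itself.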


                                The planted (l,d)-motif problem

      The neighbourhood of a motif M

      Number of strings with at most d mismatches to M (DNA alphabet, 3 alternatives per position):

                       |N(M,d)| = \sum_{i=0}^{d} \binom{l}{i} 3^{i}

      Neighbourhood size for different values of d and l

                       d \ l      3       5       9      15
                         0        1       1       1       1
                         2       37     106     352     991
                         3       64     376    2620   13276
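
The table entries can be recomputed directly from the sum. This is a minimal sketch; the function name and the `alphabet` parameter are illustrative, not taken from the slides.

```python
from math import comb


def neighbourhood_size(l: int, d: int, alphabet: int = 4) -> int:
    """Count strings within Hamming distance d of a fixed length-l string."""
    return sum(comb(l, i) * (alphabet - 1) ** i for i in range(d + 1))


# Recompute the rows d = 0, 2, 3 for l = 3, 5, 9, 15.
for d in (0, 2, 3):
    print(d, [neighbourhood_size(l, d) for l in (3, 5, 9, 15)])
```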

                                The planted (l,d)-motif problem

    The expected number of length-9 strings in T that have at
    least one d-variant of M (20 sequences of length 600)

    [Figure: curve for l=9; the region above 1 is marked as the unsolvable region]

    (9,2) is a challenging problem.
    Problems (9,>=3) are unsolvable.




                                The planted (l,d)-motif problem

    The expected number of length-l strings in T that have at
    least one d-variant of M (20 sequences of length 600)

    [Figure: curves for l=9, l=15 and l=30; the region above 1 is marked as the unsolvable region]

    Problems (15,>=6) and (30,>=14) are unsolvable.



                              The generalized planted (l,d)-motif problem

    Introduced in 2005 by Henry C.M. Leung and Francis Y. L. Chin [3]

    True set T (sequences 1 … T):  every sequence contains a d-variant of the motif M
    False set F (sequences 1 … F): no d-variant of a length-l string occurring in F is the motif

    [Figure: true set T and false set F; example with d=1, l=3, x = mismatch]
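
Read together with the planted problem, the picture suggests an acceptance test for a candidate motif: it needs a d-variant in every sequence of T and none in any sequence of F. The sketch assumes Hamming distance for "d-variant"; the function names and toy sequences are hypothetical.

```python
def min_distance(seq: str, motif: str) -> int:
    """Smallest Hamming distance between motif and any window of seq."""
    l = len(motif)
    return min(
        sum(a != b for a, b in zip(seq[j:j + l], motif))
        for j in range(len(seq) - l + 1)
    )


def fits_generalized_problem(motif, true_set, false_set, d):
    """d-variant in every true sequence, no d-variant in any false sequence."""
    in_all_true = all(min_distance(s, motif) <= d for s in true_set)
    in_no_false = all(min_distance(s, motif) > d for s in false_set)
    return in_all_true and in_no_false


# Made-up toy sets with l=3, d=1.
true_set = ["TTACTGG", "GACGTTT"]
false_set = ["GGGGGGG", "TTTTTTT"]
print(fits_generalized_problem("ACG", true_set, false_set, d=1))
print(fits_generalized_problem("GGG", true_set, false_set, d=1))
```

The second call is rejected because GGG occurs in the false set.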

                              The generalized planted (l,d)-motif problem

    Expected number of length-9 strings that don't have any d-variant of M in F

    [Figure: curve for l=9, with the level 1 marked; curve labels "in T" and "not in F"]




                              The generalized planted (l,d)-motif problem

    Expected number of length-l strings that don't have a d-variant of M in F

    low d:  more information in T
    high d: more information in F

    [Figure: curves for l=9 and l=30, with the level 1 marked; curve labels "in T" and "not in F"]

                              The generalized planted (l,d)-motif problem

    Expected number of length-l strings that have at least one d-variant of M
    in T but no d-variant of M in F

    [Figure: curves for l=9 and l=30, with the level 1 marked; curve labels "in T", "not in F" and "in T but not in F"]




                              The generalized planted (l,d)-motif problem

    All generalized problems for l <= 20 are solvable.
    New challenging problems: the generalized (30,13) and (30,14) problems.




                            Solving Both Problems

    Depending on d we use a different strategy:

                       small d                      large d

                       search in the True set       search in the False set
                       filter with the False set    filter with the True set



                              Search with Voting Algorithms

    Idea 1: Motif M is a d-variant of all its d-variants

    Example (d=1, l=3):
        ACT is a 1-variant of M = ACG
        ACG is a 1-variant of ACT

    We know: Motif M gets 1 vote from every sequence!
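
Idea 1 can be checked by enumerating a neighbourhood explicitly. The sketch assumes the DNA alphabet ACGT and Hamming distance; `neighbourhood` is an illustrative helper name.

```python
from itertools import combinations, product


def neighbourhood(s: str, d: int, alphabet: str = "ACGT") -> set[str]:
    """All strings that differ from s in at most d positions."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(s)), k):
            # choose a different letter at each selected position
            choices = [[c for c in alphabet if c != s[p]] for p in positions]
            for letters in product(*choices):
                variant = list(s)
                for p, c in zip(positions, letters):
                    variant[p] = c
                result.add("".join(variant))
    return result


motif = "ACG"
variants = neighbourhood(motif, 1)
print(len(variants))                                        # 1 + 3 * 3 = 10
print(all(motif in neighbourhood(v, 1) for v in variants))  # symmetry check
```

Because the relation is symmetric, a window that is a d-variant of M will list M among its own d-variants, which is what lets every sequence cast a vote for M.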


                            Search with Voting from T

    C = {}                                  # set with candidate motifs
    for i = 1 to T
        do for j = 1 to n - l + 1
            do for each length-l string s in N(Ti[j…j+l-1], d)
                do if R[s] <> i             # s has no vote from sequence i yet
                    then V[s] = V[s] + 1
                         R[s] = i
    for j = 1 to n - l + 1                  # Tt: the last sequence (t = T)
        do for each length-l string s in N(Tt[j…j+l-1], d)
            do if V[s] = T
                then insert s into C

    Example: d=0, l=2, M = AT
        T1 = A T A C
        T2 = G A T A

    Voting (first loop); V[s] and R[s] are 0 initially:

        i   j   s     V[s]   R[s]
        1   1   AT     1      1
        1   2   TA     1      1
        1   3   AC     1      1
        2   1   GA     1      2
        2   2   AT     2      2
        2   3   TA     2      2

    Collecting candidates (second loop, T = 2):

        j   s     V[s]   C
        1   GA     1     {}
        2   AT     2     {AT}
        3   TA     2     {AT, TA}
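
The voting pseudocode can be sketched in Python as follows. This is one possible reading, not the presenter's implementation: it assumes N(s,d) is the Hamming neighbourhood over ACGT, uses a dictionary for V and R, and collects candidates from the last sequence. All function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations, product


def neighbourhood(s, d, alphabet="ACGT"):
    """All strings within Hamming distance d of s (assumed meaning of N(s,d))."""
    out = set()
    for k in range(d + 1):
        for pos in combinations(range(len(s)), k):
            choices = [[c for c in alphabet if c != s[p]] for p in pos]
            for letters in product(*choices):
                v = list(s)
                for p, c in zip(pos, letters):
                    v[p] = c
                out.add("".join(v))
    return out


def vote_from_true_set(true_set, l, d):
    """Return all length-l strings that receive a vote from every sequence."""
    votes = defaultdict(int)   # V[s]
    last_voter = {}            # R[s]: index of the last sequence that voted for s
    for i, seq in enumerate(true_set):
        for j in range(len(seq) - l + 1):
            for s in neighbourhood(seq[j:j + l], d):
                if last_voter.get(s) != i:   # at most one vote per sequence
                    votes[s] += 1
                    last_voter[s] = i
    # Any full-vote string must be near some window of the last sequence,
    # so scanning that one sequence is enough to collect the candidates.
    candidates = set()
    last = true_set[-1]
    for j in range(len(last) - l + 1):
        for s in neighbourhood(last[j:j + l], d):
            if votes[s] == len(true_set):
                candidates.add(s)
    return candidates


print(sorted(vote_from_true_set(["ATAC", "GATA"], l=2, d=0)))  # ['AT', 'TA']
```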

                             Filter from False set F

    C = {AT, TA}, C* = {}                   # C: candidate motifs, C*: candidates that pass the filter
    for a = 1 to |C|
        do true = 1
           for i = 1 to F
               do for j = 1 to n - l + 1
                   do if Ca is in Neighbourhood of s = Fi[j…j+l-1]
                       then true = 0
           if true == 1
               then insert Ca into C*

    Example: d=0, l=2, M = AT
        F1 = G G T A
        F2 = C C T A

    At a = 2, i = 1, j = 3 the candidate C2 = TA matches F1[3…4] = TA and is rejected.
    Result: C* = {AT}
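
A minimal Python sketch of the filter step, assuming "Ca is in Neighbourhood of s" means that the Hamming distance between Ca and s is at most d. The function names are illustrative.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def filter_with_false_set(candidates, false_set, d):
    """Keep candidates that have no d-variant anywhere in the false set."""
    kept = set()
    for cand in candidates:
        l = len(cand)
        hit = any(
            hamming(seq[j:j + l], cand) <= d
            for seq in false_set
            for j in range(len(seq) - l + 1)
        )
        if not hit:
            kept.add(cand)
    return kept


print(filter_with_false_set({"AT", "TA"}, ["GGTA", "CCTA"], d=0))  # {'AT'}
```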

                                Search and Filtering

    Running time of the three phases:

    voting from T (first loop, over all T sequences):
        n T \sum_{i=0}^{d} \binom{l}{i} 3^{i}        (n T windows, each with |Neighbourhood(s,d)| strings)

    collecting the candidates (second loop, over one sequence):
        n \sum_{i=0}^{d} \binom{l}{i} 3^{i}

    filtering C with F:
        |C| n F l

    We can solve the (9,<=2), (15,<=5) and the challenging (30,<=13) problems.
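
To get a feeling for how these terms grow, the expressions can be evaluated for concrete parameters. The sketch only counts basic steps and ignores constant factors; the function names are illustrative, and n=600, T=20 are borrowed from the planted-problem setting used earlier in the slides.

```python
from math import comb


def neighbourhood_size(l, d):
    """Sum over i = 0..d of C(l, i) * 3^i."""
    return sum(comb(l, i) * 3 ** i for i in range(d + 1))


def voting_cost(n, T, l, d):
    """Step-count estimate for voting from T plus candidate collection."""
    return (n * T + n) * neighbourhood_size(l, d)


def filtering_cost(num_candidates, n, F, l):
    """Step-count estimate for filtering |C| candidates against F."""
    return num_candidates * n * F * l


for l, d in [(9, 2), (15, 5), (30, 13)]:
    print((l, d), f"{voting_cost(600, 20, l, d):.2e}")
```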

                            Solving Both Problems

    Depending on d we use a different strategy:

                       small d                      large d

                       vote from the True set       vote from the False set
                       filter with the False set    filter with the True set



                            Search with Voting from F

    Find length-l strings that have no d-variant in F

    C = {}
    for i = 1 to F
        do for j = 1 to n - l + 1
            do for each length-l string s in N(Fi[j…j+l-1], d)
                do if R[s] <> i
                    then V[s] = V[s] + 1
                         R[s] = i

    Running time: n F \sum_{i=0}^{d} \binom{l}{i} 3^{i}

    Not suitable for large d!
    Reduce d and l to values which have acceptable running time.
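
A small sketch of the idea behind voting from F: every string that receives at least one vote from F is ruled out, so the candidates are the strings with no vote at all. As a simplification of the pseudocode, a plain set replaces the V and R tables, and all 4^l strings are enumerated, which is only feasible for small l. Function names are illustrative.

```python
from itertools import combinations, product


def neighbourhood(s, d, alphabet="ACGT"):
    """All strings within Hamming distance d of s (assumed meaning of N(s,d))."""
    out = set()
    for k in range(d + 1):
        for pos in combinations(range(len(s)), k):
            choices = [[c for c in alphabet if c != s[p]] for p in pos]
            for letters in product(*choices):
                v = list(s)
                for p, c in zip(pos, letters):
                    v[p] = c
                out.add("".join(v))
    return out


def unvoted_strings(false_set, l, d, alphabet="ACGT"):
    """Length-l strings that are not a d-variant of any window in the false set."""
    voted = set()
    for seq in false_set:
        for j in range(len(seq) - l + 1):
            voted |= neighbourhood(seq[j:j + l], d, alphabet)
    everything = {"".join(p) for p in product(alphabet, repeat=l)}
    return everything - voted


survivors = unvoted_strings(["GGTA", "CCTA"], l=2, d=0)
print(len(survivors))   # 16 strings minus the distinct windows GG, GT, TA, CC, CT
```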

                            Search with Voting from F

    Example: consider a generalized (4,3)-problem
        vote from F with a (3,2)-problem
        recombine candidate motifs and filter with T

    Motif M = ATCG
        vote from F with the (3,2)-problem finds: prefix ATC, suffix TCG
        recombine prefix and suffix to the motif ATCG
        filter out false candidate motifs with T
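
The recombination step of the example can be sketched as below. The slides do not say how candidates are joined in general; this version assumes the reduced problem is exactly one letter shorter, so a prefix and a suffix are joined when they overlap in all but one letter. The function name and the extra candidate GGA are made up.

```python
def recombine(short_candidates):
    """Join length-k candidates that overlap in k-1 letters into length-(k+1) strings."""
    joined = set()
    for prefix in short_candidates:
        for suffix in short_candidates:
            if prefix[1:] == suffix[:-1]:
                joined.add(prefix + suffix[-1])
    return joined


print(sorted(recombine({"ATC", "TCG", "GGA"})))
```

Joined strings are still only candidates; the filter with T then removes the false ones.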

                            Search with Voting from F

    Using reduced generalized problems we can solve:

                                 l      9     15     30
                                 d     >=3   >=6   >=20
                                 d'     1     4      6

    by
        first voting from F,
        recombining overlapping candidate motifs,
        filtering with T

                                Experimental results

    T: yeast promoter sequences, each containing a d-variant of the motif
    F: randomly picked yeast promoter sequences
    d = 1

    Found the binding sites for all sets within one second.




    [1] Dan Corlan, Medline trend
    [2] Pavel A. Pevzner, Sing-Hoi Sze, Combinatorial Approaches to Finding Subtle
    Signals in DNA Sequences, International Conference on Intelligent Systems for
    Molecular Biology 8 (2000) 269-278
    [3] Henry C.M. Leung, Francis Y. L. Chin, Generalized planted (l,d)-motif problem
    with negative set, WABI 2005



