Generalized Planted (l,d)-Motif Problem with Negative Set
Presented by Marcel Schulz
The generalized (l,d)-motif problem, Journal Club 15.11.2005

Outline
• The planted (l,d)-motif problem
    • Formulation & limitations
• The generalized (l,d)-motif problem with negative set
    • Formulation
• Solving both problems
    • Voting algorithms
• Experimental results

Motivation
• Transcription factor binding sites, microRNA target sites
• Algorithms for the discovery of short motifs in DNA are a prominent issue in bioinformatics research [1]

The planted (l,d)-motif problem
Introduced in 2000 by Pavel Pevzner and Sing-Hoi Sze [2].
Given:
• T sequences of length n
• one d-variant of M in every sequence (a d-variant of M is a length-l string that differs from M in at most d positions)
Find the motif M of length l.
[Figure: planted motif instances with d = 1, l = 3; x marks a mismatch.]

The neighbourhood of a motif M
The number of d-variants of a length-l motif M is
$|N(M,d)| = \sum_{i=0}^{d} \binom{l}{i} 3^i$

Neighbourhood size for different values of d and l:

d \ l      3      5       9       15
0          1      1       1        1
2         37    106     352      991
3         64    376    2620    13276

Limitations of the planted (l,d)-motif problem
[Plot: expected number of length-l strings in T that have at least one d-variant of M, for 20 sequences of length 600, with curves for l = 9, l = 15 and l = 30. The region above 1 is the unsolvable region.]
• (9,2) is a challenging problem
• Problems (9,>=3) are unsolvable
• Problems (15,>=6) and (30,>=14) are unsolvable

The generalized planted (l,d)-motif problem
Introduced in 2005 by Henry C.M. Leung & Francis Y. L. Chin [3].
• True set T: every sequence contains a d-variant of the motif
• False set F: for every length-l string in F, no d-variant of this string is the motif
[Figure: True set T and False set F with d = 1, l = 3; x marks a mismatch.]

[Plot: expected number of length-l strings that do not have any d-variant of M in F, for l = 9 and l = 30, compared with the curve for T.]
• low d: more information in T
• high d: more information in F

[Plot: expected number of length-l strings that have at least one d-variant of M in T but no d-variant of M in F, for l = 9 and l = 30. Curves: in T, not in F, in T but not in F.]
• All generalized problems for l <= 20 are solvable
• We have new challenging generalized (30,13)- and (30,14)-problems

Solving Both Problems
Depending on d we use a different strategy:
• small d: search in the True set, filter with the False set
• large d: search in the False set, filter with the True set

Search with Voting Algorithms
Idea 1: Motif M is a d-variant of all its d-variants.
Example (d = 1, l = 3): Motif M = ACG. ACT is a 1-variant of M, and ACG is a 1-variant of ACT.
We know: Motif M gets 1 vote from every sequence!
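The neighbourhood size formula and Idea 1 can be checked on small inputs. The following is a minimal Python sketch, not code from the talk: the function names `hamming`, `neighbourhood` and `neighbourhood_size` and the four-letter DNA alphabet are assumptions made for illustration.

```python
from itertools import combinations, product
from math import comb

ALPHABET = "ACGT"  # assumed DNA alphabet


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbourhood(s, d):
    """All strings that differ from s in at most d positions (its d-variants)."""
    result = {s}
    for k in range(1, d + 1):
        for positions in combinations(range(len(s)), k):
            # At each chosen position, try every letter except the original one.
            choices = [[c for c in ALPHABET if c != s[p]] for p in positions]
            for letters in product(*choices):
                variant = list(s)
                for p, c in zip(positions, letters):
                    variant[p] = c
                result.add("".join(variant))
    return result


def neighbourhood_size(l, d):
    """Closed form: sum over i = 0..d of C(l, i) * 3^i."""
    return sum(comb(l, i) * 3**i for i in range(d + 1))


if __name__ == "__main__":
    motif = "ACG"
    variants = neighbourhood(motif, 1)
    # Enumerated size next to the closed-form size.
    print(len(variants), neighbourhood_size(len(motif), 1))
    # Idea 1: the motif lies in the neighbourhood of each of its d-variants.
    print(all(motif in neighbourhood(v, 1) for v in variants))
```

Because Hamming distance is symmetric, every window that contains a d-variant of M will list M among its own d-variants, which is what lets each sequence cast a vote for M.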
Search with Voting from T

C = {}                                        # set with candidate motifs
for i = 1 to T do
    for j = 1 to n - l + 1 do
        for each length-l string s in N(Ti[j…j+l-1], d) do
            if R[s] <> i then
                V[s] = V[s] + 1
                R[s] = i
for j = 1 to n - l + 1 do
    for each length-l string s in N(Tt[j…j+l-1], d) do
        if V[s] = T then insert s into C

V[s] counts the sequences that voted for s. R[s] stores the index of the last sequence that voted for s, so each sequence gives at most one vote to each string. The second pass scans one sequence Tt (T2 in the example) and collects every string that received T votes.

Example: d = 0, l = 2, T1 = ATAC, T2 = GATA, M = AT

Voting pass (entries are V[s]/R[s]):

step          AT     TA     AC     GA
start         0/0    0/0    0/0
i=1, j=1      1/1    0/0    0/0
i=1, j=2      1/1    1/1    0/0
i=1, j=3      1/1    1/1    1/1
i=2, j=1      1/1    1/1    1/1    1/2
i=2, j=2      2/2    1/1    1/1    1/2
i=2, j=3      2/2    2/2    1/1    1/2

Collection pass over T2 (T = 2 votes required):

j=1    s = GA, V[s] = 1    C = {}
j=2    s = AT, V[s] = 2    C = {AT}
j=3    s = TA, V[s] = 2    C = {AT, TA}

Filter from False set F

C = {AT, TA}, C* = {}                         # C* = set with filtered candidate motifs
for a = 1 to |C| do
    true = 1
    for i = 1 to F do
        for j = 1 to n - l + 1 do
            if Ca is in the neighbourhood of s = Fi[j…j+l-1] then true = 0
    if true == 1 then insert Ca into C*

Example: d = 0, l = 2, F1 = GGTA, F2 = CCTA, M = AT
At a = 2, i = 1, j = 3 the candidate TA matches F1[3…4] = TA and is discarded. AT does not occur in F, so C* = {AT}.
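The voting and filtering steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the speaker's implementation: the names `vote`, `filter_with_false_set`, `windows`, `neighbourhood` and `hamming` are hypothetical, a DNA alphabet is assumed, and candidates are collected by scanning the vote table rather than by re-scanning one sequence as in the pseudocode.

```python
from itertools import combinations, product

ALPHABET = "ACGT"  # assumed DNA alphabet


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbourhood(s, d):
    """All strings that differ from s in at most d positions."""
    result = {s}
    for k in range(1, d + 1):
        for positions in combinations(range(len(s)), k):
            choices = [[c for c in ALPHABET if c != s[p]] for p in positions]
            for letters in product(*choices):
                variant = list(s)
                for p, c in zip(positions, letters):
                    variant[p] = c
                result.add("".join(variant))
    return result


def windows(seq, l):
    """All length-l substrings of seq, left to right."""
    return (seq[j:j + l] for j in range(len(seq) - l + 1))


def vote(true_set, l, d):
    """Strings that receive one vote from every sequence in true_set."""
    votes = {}       # V[s]: number of sequences that voted for s
    last_voter = {}  # R[s]: index of the last sequence that voted for s
    for i, seq in enumerate(true_set, start=1):
        for w in windows(seq, l):
            for s in neighbourhood(w, d):
                if last_voter.get(s) != i:  # one vote per sequence and string
                    votes[s] = votes.get(s, 0) + 1
                    last_voter[s] = i
    return {s for s, v in votes.items() if v == len(true_set)}


def filter_with_false_set(candidates, false_set, l, d):
    """Keep only candidates that have no d-variant anywhere in false_set."""
    return {
        c for c in candidates
        if not any(hamming(c, w) <= d for seq in false_set for w in windows(seq, l))
    }


if __name__ == "__main__":
    # Slide example with 2-letter strings and d = 0.
    candidates = vote(["ATAC", "GATA"], 2, 0)
    print(sorted(candidates))
    print(sorted(filter_with_false_set(candidates, ["GGTA", "CCTA"], 2, 0)))
```

Storing the last voter per string is what keeps a sequence with several matching windows from voting twice for the same candidate.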
Search and Filtering

# voting from T
C = {}
for i = 1 to T do
    for j = 1 to n - l + 1 do
        for each length-l string s in N(Ti[j…j+l-1], d) do
            if R[s] <> i then
                V[s] = V[s] + 1
                R[s] = i
for j = 1 to n - l + 1 do
    for each length-l string s in N(Tt[j…j+l-1], d) do
        if V[s] = T then insert s into C
# filtering with F
for a = 1 to |C| do
    for i = 1 to F do
        for j = 1 to n - l + 1 do
            if Ca is in the neighbourhood of s = Fi[j…j+l-1]

Running time of each part:
• voting over the neighbourhoods N(s,d): $n\,T \sum_{i=0}^{d} \binom{l}{i} 3^i$
• collecting candidates: $n \sum_{i=0}^{d} \binom{l}{i} 3^i$
• filtering: $|C|\,n\,F\,l$

We can solve the (9,<=2)-, (15,<=5)- and the challenging (30,<=13)-problems.

For large d the strategy is reversed: vote from the False set, filter with the True set.

Search with Voting from F
Find length-l strings that have no d-variant in F.

C = {}
for i = 1 to F do
    for j = 1 to n - l + 1 do
        for each length-l string s in N(Fi[j…j+l-1], d) do
            if R[s] <> i then
                V[s] = V[s] + 1
                R[s] = i

Running time: $n\,F \sum_{i=0}^{d} \binom{l}{i} 3^i$. This is not suitable for large d!
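To get a feeling for why voting from F breaks down for large d, the bound n·F·Σ C(l,i)·3^i can be evaluated for a few settings. The sketch below is only an estimate of table updates, not a measured running time; the function names are hypothetical, and the values n = 600 and 20 sequences are borrowed from the True-set setting quoted earlier as an assumption about F.

```python
from math import comb


def neighbourhood_size(l, d):
    """Number of strings within Hamming distance d of a length-l DNA string."""
    return sum(comb(l, i) * 3**i for i in range(d + 1))


def voting_cost(n, num_sequences, l, d):
    """Upper bound n * F * |N| on the vote-table updates made while voting."""
    return n * num_sequences * neighbourhood_size(l, d)


if __name__ == "__main__":
    # Assumed setting: 20 sequences of length 600.
    for l, d in [(9, 2), (9, 3), (15, 6), (30, 14)]:
        print(f"({l},{d}): about {voting_cost(600, 20, l, d):.3e} updates")
```

The count grows with both the binomial term and the factor 3^i, so each additional allowed mismatch multiplies the work considerably for long motifs.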
Reduce d and l to values which have an acceptable running time.

Search with Voting from F: example
Consider a generalized (4,3)-problem:
• vote from F with a (3,2)-problem
• recombine candidate motifs and filter with T

Motif M = ATCG
• voting from F with the (3,2)-problem finds the prefix ATC and the suffix TCG
• recombine them to the motif M = ATCG
• filter out false candidate motifs with T

Search with Voting from F: reduced problems
Using reduced generalized problems we can solve:

l      9      15     30
d      >=3    >=6    >=20
d'     1      4      6

by
• first voting from F
• recombining overlapping candidate motifs
• filtering with T

Experimental results
• T: yeast promoter sequences, each containing a d-variant of the motif
• F: randomly picked yeast promoter sequences
• With d = 1, the binding sites were found for all sets within one second

References
[1] Medline trend, Dan Corlan
[2] Pavel A. Pevzner, Sing-Hoi Sze, Combinatorial Approaches to Finding Subtle Signals in DNA Sequences, International Conference on Intelligent Systems for Molecular Biology 8 (2000) 269-278
[3] Henry C.M. Leung, Francis Y. L. Chin, Generalized planted (l,d)-motif problem with negative set, WABI 2005
