# gmt Transcription

```
Generalized Planted (l,d)-Motif Problem with Negative Set

Presented by   Marcel Schulz

The generalized (l,d)-motif problem                            Journal Club 15.11.2005
Outline

The planted (l,d)-motif problem
  • Formulation & limitations
The generalized (l,d)-motif problem with negative set
  • Formulation
Solving both problems
  • Voting algorithms

Experimental results

Motivation

Transcription factor binding sites, microRNA target sites
Algorithms for the discovery of short motifs in DNA are a
prominent issue in Bioinformatics research

[1]

The planted (l,d)-motif problem

Introduced in 2000 by Pavel Pevzner and Sing-Hoi Sze [2].

Find the motif M of length l, given:
• T sequences of length n
• one d-variant of M in every sequence

[Figure: sequences 1 … T, each with one planted d-variant of M; example with d=1, l=3, x = mismatch]

The planted (l,d)-motif problem

The neighbourhood of a motif M

N(M,d) = \sum_{i=0}^{d} \binom{l}{i} 3^i

Neighbourhood for different values of d and l

d \ l       3       5       9      15
0           1       1       1       1
2          37     106     352     991
3          64     376    2620   13276

The planted (l,d)-motif problem

[Plot: the expected number of length-l strings in T that have at least one d-variant of M (20 sequences of length 600), shown for l=9, l=15 and l=30; values above 1 mark the unsolvable region]

• (9,2) is a challenging problem
• Problems (9,>=3), (15,>=6) and (30,>=14) are unsolvable

The generalized planted (l,d)-motif problem

Introduced in 2005 by Henry C.M. Leung & Francis Y. L. Chin [3].

[Figure: true set T (sequences 1 … T) next to false set F (sequences 1 … F); example with d=1, l=3, x = mismatch]

Annotation on a string in F: no d-variant of this string is the motif.

The generalized planted (l,d)-motif problem

[Plot: expected number of length-l strings that don't have any d-variant of M in F, for l=9 and l=30, with the curves "in T" and "not in F" and the value 1 marked]

• low d: more information in T
• high d: more information in F

[Plot: expected number of length-l strings that have at least one d-variant of M in T but no d-variant of M in F, for l=9 and l=30, with the curves "in T", "not in F" and "in T but not in F" and the value 1 marked]

The generalized planted (l,d)-motif problem

• all generalized problems for l <= 20 are solvable
• we have new challenging generalized (30,13)- and (30,14)-problems

Solving Both Problems

Depending on d we use a different strategy:

small d                       large d
search in the True set        search in the False set
filter with the False set     filter with the True set

Search with Voting Algorithms

Idea 1: Motif M is a d-variant of all its d-variants.

Example (d=1, l=3): Motif M = ACG. ACT is a 1-variant of M, and ACG is a 1-variant of ACT.

We know: Motif M gets 1 vote from every sequence!

Search with Voting from T

C = {}                                  # set with candidate motifs
for i = 1 to T
    do for j = 1 to n - l + 1
        do for each length-l string s in N(T_i[j…j+l-1], d)
            do if R[s] <> i             # s has not yet received a vote from sequence i
                then V[s] = V[s] + 1
                     R[s] = i
for j = 1 to n - l + 1
    do for each length-l string s in N(T_t[j…j+l-1], d)
        do if V[s] = T
            then insert s into C

Example: d=0, l=2, T1 = A T A C, T2 = G A T A, M = AT

Voting loop, V[s] R[s] after each step:

step         AT      TA      AC      GA
start        0 0     0 0     0 0
i=1 j=1      1 1     0 0     0 0
i=1 j=2      1 1     1 1     0 0
i=1 j=3      1 1     1 1     1 1
i=2 j=1      1 1     1 1     1 1     1 2
i=2 j=2      2 2     1 1     1 1     1 2
i=2 j=3      2 2     2 2     1 1     1 2

Collecting loop (T = 2):

j=1 (GA)     C = {}
j=2 (AT)     C = {AT}
j=3 (TA)     C = {AT, TA}

Filter from False set F

C = {AT, TA}, C* = {}                   # C: candidate motifs, C*: candidates that pass the filter
for a = 1 to |C|
    do true = 1
       for i = 1 to F
           do for j = 1 to n - l + 1
               do if C_a is in the neighbourhood of s = F_i[j…j+l-1]
                   then true = 0
       if true == 1
           then insert C_a into C*

Example: d=0, l=2, F1 = G G T A, F2 = C C T A, M = AT

At a = 2, i = 1, j = 3 the candidate TA is found in F1 and is filtered out. AT does not occur in F, so C* = {AT}.

Search and Filtering

# voting from T
C = {}
for i = 1 to T
    do for j = 1 to n - l + 1
        do for each length-l string s in N(T_i[j…j+l-1], d)     # n T \sum_{i=0}^{d} \binom{l}{i} 3^i steps (the sum is the size of Neighbourhood(s,d))
            do if R[s] <> i
                then V[s] = V[s] + 1
                     R[s] = i
for j = 1 to n - l + 1
    do for each length-l string s in N(T_t[j…j+l-1], d)         # n \sum_{i=0}^{d} \binom{l}{i} 3^i steps
        do if V[s] = T
            then insert s into C
# filtering with F
for a = 1 to |C|
    do for i = 1 to F
        do for j = 1 to n - l + 1
            do if C_a is in the neighbourhood of s = F_i[j…j+l-1]   # |C| n F l steps

We can solve the (9,<=2)-, (15,<=5)- and the challenging (30,<=13)-problems.

Solving Both Problems

Depending on d we use a different strategy:

small d                       large d
vote from the True set        vote from the False set
filter with the False set     filter with the True set

Search with Voting from F

Find length-l strings that have no d-variant in F.

C = {}
for i = 1 to F
    do for j = 1 to n - l + 1
        do for each length-l string s in N(F_i[j…j+l-1], d)     # n F \sum_{i=0}^{d} \binom{l}{i} 3^i steps
            do if R[s] <> i
                then V[s] = V[s] + 1
                     R[s] = i

Not suitable for large d! Reduce d and l to values which have acceptable running time.

Search with Voting from F

Example: consider a generalized (4,3)-problem with motif M = ATCG
1. vote from F with a (3,2)-problem; find prefix ATC and suffix TCG
2. recombine the candidates to motif M = ATCG
3. filter out false candidate motifs with T

Search with Voting from F

Using reduced generalized problems we can solve:

l       9      15     30
d       >=3    >=6    >=20
d'      1      4      6

by
• first voting from F
• recombining overlapping candidate motifs
• filtering with T

Experimental results

T: yeast promoter sequences, each containing a d-variant of the motif
F: randomly picked yeast promoter sequences
d=1

Found the binding sites for all sets within one second.

[1] Dan Corlan, Medline trend
[2] Pavel A. Pevzner, Sing-Hoi Sze, Combinatorial Approaches to Finding Subtle Signals in DNA Sequences, International Conference on Intelligent Systems for Molecular Biology 8 (2000) 269-278
[3] Henry C.M. Leung, Francis Y. L. Chin, Generalized planted (l,d)-motif problem with negative set, WABI 2005

```
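The neighbourhood table from the slides can be checked with a few lines of Python. This is only a minimal sketch: the function name `neighbourhood_size` and the `alphabet_size` parameter are illustrative and not part of the talk. It assumes the DNA alphabet of four letters, which is where the factor 3 (the number of possible substitutions per position) comes from.

```python
from math import comb


def neighbourhood_size(l: int, d: int, alphabet_size: int = 4) -> int:
    """Count the length-l strings with at most d mismatches to a fixed motif.

    Choose i mismatch positions out of l, then one of the
    (alphabet_size - 1) other letters at each of them.
    """
    return sum(comb(l, i) * (alphabet_size - 1) ** i for i in range(d + 1))


# Rebuild the rows of the slide table (d = 0, 2, 3 and l = 3, 5, 9, 15).
for d in (0, 2, 3):
    print(d, [neighbourhood_size(l, d) for l in (3, 5, 9, 15)])
# 0 [1, 1, 1, 1]
# 2 [37, 106, 352, 991]
# 3 [64, 376, 2620, 13276]
```

The fast growth with d is what makes voting expensive for large d, since every window of every sequence votes for its whole neighbourhood.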
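The small-d strategy from the slides (vote from the true set, then filter with the false set) could be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: all function names are invented for this example, and the collecting step simply keeps every string whose vote count equals the number of true sequences, which should give the same candidates as scanning the neighbourhoods of a single sequence.

```python
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: plain DNA alphabet


def neighbourhood(s, d):
    """All strings that differ from s in at most d positions (s included)."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(s)), k):
            for letters in product(ALPHABET, repeat=k):
                chars = list(s)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                result.add("".join(chars))
    return result


def windows(seq, l):
    """All length-l substrings of seq, left to right."""
    return [seq[j:j + l] for j in range(len(seq) - l + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def vote_from_true_set(true_set, l, d):
    """Return every string that receives one vote from each true sequence."""
    votes = {}       # V[s] on the slides
    last_voter = {}  # R[s] on the slides
    for i, seq in enumerate(true_set, start=1):
        for window in windows(seq, l):
            for s in neighbourhood(window, d):
                if last_voter.get(s) != i:  # at most one vote per sequence
                    votes[s] = votes.get(s, 0) + 1
                    last_voter[s] = i
    return {s for s, v in votes.items() if v == len(true_set)}


def filter_with_false_set(candidates, false_set, l, d):
    """Drop candidates that have a d-variant anywhere in the false set."""
    return {
        c for c in candidates
        if all(hamming(c, w) > d for seq in false_set for w in windows(seq, l))
    }


# Toy example from the slides, with windows of length 2 and d = 0.
candidates = vote_from_true_set(["ATAC", "GATA"], l=2, d=0)
print(sorted(candidates))                                                  # ['AT', 'TA']
print(sorted(filter_with_false_set(candidates, ["GGTA", "CCTA"], 2, 0)))   # ['AT']
```

Enumerating the full neighbourhood of every window is what limits this approach to small d; for large d the slides instead vote from the false set on a reduced problem.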