CS262: Computational genomics





CS 262 – Problem Set 3

(due at the beginning of class on Thursday, Feb 26)



Collaboration is allowed, but you must submit separate writeups. Please also

write the names of all your collaborators on your submissions.



1. Sequence assembly

(a) Let F be a collection of fragments. The overlap multigraph of F, denoted

OM(F), is a directed, weighted multigraph. The set V of nodes of this

structure is just F itself. A directed edge from a ∈ F to a different

fragment b ∈ F with weight t >= 0 exists if the suffix of a with t characters

is a prefix of b.
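
To make the definition concrete, here is a minimal Python sketch that enumerates the edges of OM(F) by brute force. The function name `overlap_edges` and the (a, b, t) tuple representation are assumptions made for illustration; they are not part of the problem statement, and the sketch assumes the fragments are distinct strings.

```python
from itertools import permutations


def overlap_edges(fragments):
    """Return every edge (a, b, t) of the overlap multigraph.

    An edge a -> b with weight t exists when the last t characters of a
    equal the first t characters of b. The weight t = 0 always qualifies.
    """
    edges = []
    for a, b in permutations(fragments, 2):  # ordered pairs of different fragments
        for t in range(min(len(a), len(b)) + 1):
            # a[len(a) - t:] is the length-t suffix (empty when t == 0)
            if a[len(a) - t:] == b[:t]:
                edges.append((a, b, t))
    return edges
```

For example, `overlap_edges(["AAA", "AAB"])` contains three parallel edges from "AAA" to "AAB" (weights 0, 1, and 2), which is why the structure is a multigraph rather than a simple graph.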

i. Explain how a directed path in this graph gives rise to a multiple alignment of

the sequences on that path. Also explain how a consensus sequence can

be derived, providing a common superstring of the involved sequences.

ii. Let P be a path in OM(F) that goes through every vertex (that is, P is a

Hamiltonian path). Let S(P) be the common superstring derived from P. Let

w(P) be the weight of P. Prove that minimizing |S(P)| is equivalent to

maximizing w(P). (Note: S(P) is the sequence of the target DNA molecule to

be assembled.)

iii. A collection of fragments F is said to be substring-free if there are no

two distinct strings a and b in F such that a is a substring of b. Prove that if

S is a shortest common superstring of F, there is a Hamiltonian path P such

that S = S(P).

iv. If F is a collection of strings, prove that there is a unique substring-free

collection G equivalent to F (i.e., having the same common superstrings). How does

this result help you?

v. Prof. Kotovsky suggests a greedy approach to solve the sequence assembly

problem formulated as a shortest common superstring problem. He says,

“We know that looking for shortest common superstrings is the same as

looking for Hamiltonian paths of maximum weight in a directed multigraph.

To maximize the weight, we can simplify the multigraph and consider only

the heaviest edge between every pair of nodes, discarding other parallel

edges of smaller weight. To compute the heaviest path, repeatedly add the

heaviest available edge, which is one that does not upset the construction of

a Hamiltonian path given the previously chosen edges. Because the graph is

complete (zero-weight edges are also assumed to be present), this process

stops only when a path containing all vertices is formed.” Prove that this

greedy strategy does not always produce the best result.
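
For experimenting with the quoted proposal on small inputs, the following Python sketch implements one reading of it. The names `max_overlap`, `greedy_path`, and `superstring` are hypothetical, fragments are referred to by their index in the input list, and ties between equally heavy edges are broken by input order, which is an assumption the quote does not specify. The sketch assumes a non-empty, substring-free list of fragments.

```python
def max_overlap(a, b):
    """Largest t such that the length-t suffix of a is a prefix of b."""
    for t in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:t]):
            return t
    return 0


def greedy_path(fragments):
    """Greedy Hamiltonian path; returns (order of fragment indices, weight)."""
    n = len(fragments)
    # Keep only the heaviest edge per ordered pair, sorted heaviest first.
    # sorted() is stable, so ties keep input order (an assumption).
    edges = sorted(
        ((max_overlap(fragments[i], fragments[j]), i, j)
         for i in range(n) for j in range(n) if i != j),
        key=lambda e: -e[0],
    )
    succ, pred = {}, {}
    comp = list(range(n))  # comp[v] labels the partial path containing v
    weight = 0
    for t, i, j in edges:
        if len(succ) == n - 1:
            break
        # Reject edges that give a node two successors or two
        # predecessors, or that would close a cycle.
        if i in succ or j in pred or comp[i] == comp[j]:
            continue
        succ[i], pred[j] = j, i
        weight += t
        old, new = comp[j], comp[i]
        comp = [new if c == old else c for c in comp]
    start = next(v for v in range(n) if v not in pred)
    order = [start]
    while order[-1] in succ:
        order.append(succ[order[-1]])
    return order, weight


def superstring(fragments, order):
    """Merge fragments along the path, overlapping consecutive pairs maximally."""
    s = fragments[order[0]]
    for u, v in zip(order, order[1:]):
        s += fragments[v][max_overlap(fragments[u], fragments[v]):]
    return s
```

Comparing the length of `superstring(...)` for the greedy order against an exhaustive search over all orderings of a few short fragments is one way to explore how the strategy behaves.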



2. Chaining Local Alignments

(a) Consider the following problem:

Let S be a sequence of numbers, n_1, ..., n_k. Let each number have an

associated weight, w_1, ..., w_k. Find the heaviest increasing subsequence of S,

that is, a subsequence n_{i_1}, ..., n_{i_m} such that i_1 = 1.

i. What string will maximize the number of nodes (edges) in our suffix tree?

ii. What string will minimize the number of nodes (edges) in our suffix tree

if m >= n? Extra Credit: What string will minimize the number of nodes

(edges) in our suffix tree if m < n?

(c) A maximal pair in a string S is a pair of identical substrings α and β in S

such that the character to the immediate left (right) of α is different from

the character to the immediate left (right) of β.

Give a linear-time algorithm that takes in a string S and finds the longest

maximal pair in which the two copies do not overlap. That is, if the two

copies begin at positions p1 < p2 and are of length n’, then p1+n’ < p2. This is

exercise #40 of Chapter 7 of the Gusfield book (page 173).
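
The definition and the non-overlap condition can be checked on small strings with the brute-force Python sketch below. It is not the linear-time algorithm the exercise asks for; it is only a slow reference for validating answers. The function names are hypothetical, positions are 0-based, and a copy that touches either end of S is treated as having a neighbouring character different from any real character, which is an assumption about boundary handling.

```python
def is_maximal_pair(s, p1, p2, n):
    """True if s[p1:p1+n] and s[p2:p2+n] form a maximal pair."""
    if p1 == p2 or s[p1:p1 + n] != s[p2:p2 + n]:
        return False
    # Assumption: a missing neighbour at an end of s counts as "different".
    left1 = s[p1 - 1] if p1 > 0 else None
    left2 = s[p2 - 1] if p2 > 0 else None
    right1 = s[p1 + n] if p1 + n < len(s) else None
    right2 = s[p2 + n] if p2 + n < len(s) else None
    return left1 != left2 and right1 != right2


def longest_nonoverlapping_maximal_pair(s):
    """Brute force: return (p1, p2, n) with the largest n, or None."""
    m = len(s)
    for n in range((m - 1) // 2, 0, -1):  # try the longest lengths first
        for p1 in range(m - 2 * n):
            # Starting p2 at p1 + n + 1 enforces p1 + n < p2.
            for p2 in range(p1 + n + 1, m - n + 1):
                if is_maximal_pair(s, p1, p2, n):
                    return p1, p2, n
    return None
```

The triple loop with a substring comparison is far slower than linear time, so this is only practical for strings of a few hundred characters.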








