Using OWEN in a batch mode

Shared by: gegeshandong
Posted: 12/10/2011

1) Technically, this is simple. From the UNIX command line, type:



$ ./OWEN.sh file1 from1 to1 invcomp1 file2 from2 to2 invcomp2 output CMD_FILE



where:



file1 is the name of a file with the 1st sequence



from1 and to1 are the first and the last nucleotides of this sequence to be used (if 0 and 0
are specified, the whole sequence will be used)



invcomp1 (0/1) determines whether the sequence will be reverse-complemented (inverted and complemented)



file2, from2, to2 and invcomp2 are the analogous parameters for the 2nd sequence



output is the name of the file in which the constructed global alignment will be written



CMD_FILE is the name of the instruction file, which contains a succession of commands
for OWEN. Batch mode uses the same commands as interactive mode, but the commands are
issued automatically and (currently) regardless of the results of previous commands.
The format of this file is self-explanatory.
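
When many alignments are launched from a script, it can help to assemble the argument list in one place. The following is a minimal Python sketch, not part of OWEN itself. The function name build_owen_command and the file names human.fa, mouse.fa and alignment.out are hypothetical; only the order of the arguments is taken from the command line above.

```python
import shlex


def build_owen_command(file1, file2, output, cmd_file,
                       from1=0, to1=0, invcomp1=0,
                       from2=0, to2=0, invcomp2=0,
                       script="./OWEN.sh"):
    """Return the argument list for one batch-mode OWEN run.

    The argument order follows the command line given in the text.
    from/to default to 0 and 0, which the text says selects the whole sequence.
    """
    for flag in (invcomp1, invcomp2):
        if flag not in (0, 1):
            raise ValueError("invcomp1 and invcomp2 must be 0 or 1")
    args = [script,
            file1, from1, to1, invcomp1,
            file2, from2, to2, invcomp2,
            output, cmd_file]
    return [str(a) for a in args]


# Hypothetical file names, whole sequences, no reverse complementing.
cmd = build_owen_command("human.fa", "mouse.fa", "alignment.out", "OWEN.good")
print(shlex.join(cmd))
# prints: ./OWEN.sh human.fa 0 0 0 mouse.fa 0 0 0 alignment.out OWEN.good
```

The returned list can be passed to subprocess.run(cmd, check=True) to start the run. Printing the command first is a cheap way to check the argument order before committing to a long alignment.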



2) Two OWEN instruction files, OWEN.good and OWEN.fast, are stored on our ftp site.
These files were developed on the basis of our experience with human-mouse alignments.
OWEN.good provides more precise alignments but runs longer. OWEN.fast is meant to find
all strong local similarities and runs faster. For a large-scale alignment project, one
may want to develop another succession of OWEN instructions, based on experience with
interactive alignment of a fraction of the sequences.



3) Some considerations for the succession of commands.



Obviously, there is a trade-off between speed and accuracy. Speed depends primarily on
the seed length (the number of successive matches) required for finding a hit. If the
sequences to be aligned are longer than 10M nucleotides, the maximal seed length (32)
must be used initially. In contrast, if the sequences are shorter than 100K nucleotides,
an initial seed length of 16 or even 12 is enough.
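
The rule of thumb above can be written down as a small helper. This is a sketch only. The text gives no value for sequences between 100K and 10M, so the intermediate value of 24 is an assumption (it is the seed length used in Pass 2 of OWEN.good), and the function name initial_seed_length is hypothetical.

```python
def initial_seed_length(seq_len):
    """Suggest a seed length (w) for the first pass from the sequence length.

    Over 10M: the maximal seed length, 32 (from the text).
    Under 100K: 16 (the text says 16 or even 12 is enough).
    In between: 24, which is an ASSUMPTION, not a value given in the text.
    """
    if seq_len > 10_000_000:
        return 32
    if seq_len < 100_000:
        return 16
    return 24  # assumption: the text does not cover this range


for length in (50_000, 1_000_000, 20_000_000):
    print(length, initial_seed_length(length))
```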



OWEN should be run in many passes, and the seed length can go down from pass to pass. If
the resulting alignment is dense, at some point seeds can be abolished altogether, which
gives the most accurate alignment. However, this can only be done after hits have become
dense enough along the sequences (perhaps at least 1 hit per 10K nucleotides).
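
The density criterion can be checked with a few lines of Python. This is a sketch under assumptions: it supposes that the start positions of the hits on one sequence are available as a list of integers, which is not something the text describes, and the function names are hypothetical. The first function tests the average density; the second reports the longest stretch without a hit, since a good average can hide a large empty region.

```python
def hit_density_ok(hit_positions, seq_len, window=10_000):
    """True if there is, on average, at least one hit per `window` nucleotides."""
    return len(hit_positions) * window >= seq_len


def largest_gap(hit_positions, seq_len):
    """Length of the longest stretch with no hit start, sequence ends included."""
    points = [0] + sorted(hit_positions) + [seq_len]
    return max(b - a for a, b in zip(points, points[1:]))


# Hypothetical hit positions on a 30K sequence.
hits = [5_000, 15_000, 25_000]
print(hit_density_ok(hits, 30_000))  # prints: True
print(largest_gap(hits, 30_000))     # prints: 10000
```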



It makes sense to create a filter initially, to use it for each pass after the 1st one,
and to update it after each pass. The filter can be ignored during the final pass.

4) We plan a substantial upgrade of pairwise OWEN; in particular, low-complexity
sequences will be masked before any alignments are created. This will improve
performance substantially. However, even the current version can align sequences of
10M nucleotides in a few minutes.



Succession of instructions within OWEN.good (OWEN.fast lacks passes 7, 8, 9, 12 and 13):



PASS - 1

1) Align w=32 4/6=8 p=0.1e-8 Nomask

2) Create filter

3) Select overlapped

4) Delete ! selected

5) Reconcile

6) Greedy resolve

7) Expand 4/4=8

8) Merge



PASS - 2

1) Align w=24 4/6=8 p=.1e-6 MaskKnown MaskInternal

2) Update filter

3) Select overlapped

4) Delete ! selected

5) Expand 4/4=8

6) Merge

7) Reconcile

8) Greedy resolve



PASS - 3

1) Align w=16 4/6=8 p=.1e-5 MaskKnown MaskInternal

2) Update filter

3) Select overlapped

4) Delete ! selected

5) Expand 4/4=8

6) Merge

7) Reconcile

8) Greedy resolve



PASS - 4

The same as 3, without 3) and 4)



PASS - 5

1) Align w=12 4/6=8 p=.1e-4 MaskKnown MaskInternal

2) Update filter

3) Select overlapped

4) Delete ! selected

5) Expand 4/4=8

6) Merge

7) Reconcile

8) Greedy resolve



PASS - 6

The same as 5, without 3) and 4)



PASS - 7 (may be slow)

1) Align w=8 4/6=8 p=.001 MaskKnown MaskInternal

2) Update filter

3) Select overlapped

4) Delete ! selected

5) Expand 4/4=8

6) Merge

7) Reconcile

8) Greedy resolve



PASS - 8

The same as 7, without 3) and 4)



PASS - 9

The same as 8



PASS - 10

1) Align w=10 4/6=8 p=.001 NoMask

2) Update filter

5) Expand 4/4=8

6) Merge

7) Reconcile

8) Greedy resolve



PASS - 11

The same as 10



PASS - 12 (may be very slow)

1) Align NoHash 4/6=8 p=0.001 NoMask

5) Expand 4/4=8

6) Merge

7) Reconcile

8) Greedy resolve



PASS - 13

The same as 12
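
The listing above abbreviates several passes as "the same as N". The Python sketch below spells every pass out in full and derives the OWEN.fast succession by dropping passes 7, 8, 9, 12 and 13. It is only a way to view the data from this listing: it does not read or write an actual instruction file, because the file format is not shown here, and the names later_pass, OWEN_GOOD and OWEN_FAST are hypothetical.

```python
def later_pass(align, select=True, update_filter=True):
    """Build the command list for passes 2-13, following the listing above."""
    steps = [align]
    if update_filter:
        steps.append("Update filter")
    if select:
        steps += ["Select overlapped", "Delete ! selected"]
    steps += ["Expand 4/4=8", "Merge", "Reconcile", "Greedy resolve"]
    return steps


# Align commands copied from the listing.
A2 = "Align w=24 4/6=8 p=.1e-6 MaskKnown MaskInternal"
A3 = "Align w=16 4/6=8 p=.1e-5 MaskKnown MaskInternal"
A5 = "Align w=12 4/6=8 p=.1e-4 MaskKnown MaskInternal"
A7 = "Align w=8 4/6=8 p=.001 MaskKnown MaskInternal"
A10 = "Align w=10 4/6=8 p=.001 NoMask"
A12 = "Align NoHash 4/6=8 p=0.001 NoMask"

OWEN_GOOD = {
    # Pass 1 has its own command order, so it is written out in full.
    1: ["Align w=32 4/6=8 p=0.1e-8 Nomask", "Create filter",
        "Select overlapped", "Delete ! selected",
        "Reconcile", "Greedy resolve", "Expand 4/4=8", "Merge"],
    2: later_pass(A2),
    3: later_pass(A3),
    4: later_pass(A3, select=False),   # same as 3, without 3) and 4)
    5: later_pass(A5),
    6: later_pass(A5, select=False),   # same as 5, without 3) and 4)
    7: later_pass(A7),
    8: later_pass(A7, select=False),   # same as 7, without 3) and 4)
    9: later_pass(A7, select=False),   # same as 8
    10: later_pass(A10, select=False),
    11: later_pass(A10, select=False),  # same as 10
    12: later_pass(A12, select=False, update_filter=False),
    13: later_pass(A12, select=False, update_filter=False),  # same as 12
}

# OWEN.fast lacks passes 7, 8, 9, 12 and 13.
FAST_SKIPS = {7, 8, 9, 12, 13}
OWEN_FAST = {n: steps for n, steps in OWEN_GOOD.items() if n not in FAST_SKIPS}

for number, steps in OWEN_FAST.items():
    print(f"PASS - {number}")
    for step in steps:
        print("   ", step)
```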


