Using OWEN in batch mode
1) Technically, this is simple. From the UNIX command line, type
$ ./OWEN.sh file1 from1 to1 invcomp1 file2 from2 to2 invcomp2 output CMD_FILE
where:
file1 is the name of a file with the 1st sequence
from1 and to1 are the first and the last nucleotides in this sequence to be used (if 0 and 0
are specified, the whole sequence will be used)
invcomp1 (0/1) determines whether the sequence will be inverted and complemented
file2, from2, to2, and invcomp2 are analogous, for the 2nd sequence
output is the name of the file in which the constructed global alignment will be written
CMD_FILE is the name of the instruction file which contains a succession of commands
for OWEN. Batch mode uses the same commands as interactive mode, but these
commands are issued automatically, and (currently) regardless of the results of previous
commands. The format of this file is self-explanatory.
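The argument order above is easy to get wrong when many sequence pairs are processed. A minimal sketch of a helper that assembles the command line is given here. The function name build_owen_command and the file names in the example are hypothetical; only the order of the arguments is taken from the description above.

```python
def build_owen_command(seq1, seq2, output, cmd_file, script="./OWEN.sh"):
    """Return the OWEN batch command as a list of arguments.

    seq1 and seq2 are (file, first, last, invcomp) tuples; 0 and 0 for
    first and last mean that the whole sequence is used.
    """
    args = [script]
    for name, first, last, invcomp in (seq1, seq2):
        if invcomp not in (0, 1):
            raise ValueError("invcomp must be 0 or 1")
        if first < 0 or last < 0:
            raise ValueError("positions must not be negative")
        args += [name, str(first), str(last), str(invcomp)]
    # The output file and the instruction file come last, in this order.
    args += [output, cmd_file]
    return args


# Hypothetical file names, for illustration only.
cmd = build_owen_command(("human.fa", 0, 0, 0), ("mouse.fa", 0, 0, 1),
                         "human_mouse.aln", "OWEN.good")
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) to start one batch run per sequence pair.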
2) Two OWEN instruction files, OWEN.good and OWEN.fast, are stored on our FTP site.
These files were developed on the basis of our experience with human-mouse alignments.
OWEN.good provides more precise alignments but runs longer. OWEN.fast is
intended to find all strong local similarities and runs faster. For a large-scale
alignment project, one may want to develop another succession of OWEN instructions,
based on experience with interactive alignment of a fraction of the sequences.
3) Some considerations for the succession of commands.
There is a trade-off between speed and accuracy. Speed depends primarily on
the seed length (the number of successive matches) required for finding a hit. If the
sequences to be aligned are longer than 10M, the maximal seed length (32) must be used
initially. In contrast, if the sequences are shorter than 100K, an initial seed length of 16
or even 12 is enough.
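The rule of thumb above can be sketched as a small function. The function name is hypothetical, and two points are assumptions rather than statements from the text: the longer of the two sequences decides, and for lengths between 100K and 10M, where no guidance is given, the sketch stays with the maximal seed as the conservative choice.

```python
MAX_SEED = 32  # maximal seed length named in the text


def initial_seed_length(len1, len2):
    """Suggest a seed length (w) for the first pass."""
    longest = max(len1, len2)  # assumption: the longer sequence decides
    if longest < 100_000:
        return 16  # the text allows 16 or even 12 for short sequences
    # Over 10M the text requires 32. Between 100K and 10M the text gives
    # no value; keeping 32 there is an assumption of this sketch.
    return MAX_SEED


print(initial_seed_length(50_000, 80_000))
print(initial_seed_length(20_000_000, 15_000_000))
```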
OWEN should be run in many passes, and the seed length can decrease from pass to pass. If
the resulting alignment is dense, at some point seeds can be abolished altogether,
which gives the most accurate alignment. However, this can only be done after hits
have become dense enough along the sequences (perhaps at least 1 hit per 10K).
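The density threshold can be checked with simple arithmetic before a seedless pass is attempted. The sketch below is only an illustration: the function name is hypothetical, the hit count is assumed to be known from the current alignment, and only the average density is tested, not how evenly the hits are spread.

```python
def hits_dense_enough(hit_count, sequence_length, spacing=10_000):
    """Return True if there is, on average, at least 1 hit per `spacing`."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    # Integer comparison avoids rounding problems with division.
    return hit_count * spacing >= sequence_length


# A 10M sequence needs at least 1000 hits to reach 1 hit per 10K.
print(hits_dense_enough(1000, 10_000_000))
print(hits_dense_enough(999, 10_000_000))
```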
It makes sense to create a filter in the 1st pass, to use it in each subsequent pass, and
to update it after each pass. The filter can be ignored during the final pass.
4) We plan a substantial upgrade of pairwise OWEN; in particular, low-complexity
sequences will be masked before any alignments are created. This will improve
performance substantially. However, even the current version can align 10M sequences
in a few minutes.
Succession of instructions within OWEN.good (OWEN.fast lacks passes 7, 8, 9, 12, and
13):
PASS - 1
1) Align w=32 4/6=8 p=0.1e-8 NoMask
2) Create filter
3) Select overlapped
4) Delete ! selected
5) Reconcile
6) Greedy resolve
7) Expand 4/4=8
8) Merge
PASS - 2
1) Align w=24 4/6=8 p=.1e-6 MaskKnown MaskInternal
2) Update filter
3) Select overlapped
4) Delete ! selected
5) Expand 4/4=8
6) Merge
7) Reconcile
8) Greedy resolve
PASS - 3
1) Align w=16 4/6=8 p=.1e-5 MaskKnown MaskInternal
2) Update filter
3) Select overlapped
4) Delete ! selected
5) Expand 4/4=8
6) Merge
7) Reconcile
8) Greedy resolve
PASS - 4
The same as 3, without 3) and 4)
PASS - 5
1) Align w=12 4/6=8 p=.1e-4 MaskKnown MaskInternal
2) Update filter
3) Select overlapped
4) Delete ! selected
5) Expand 4/4=8
6) Merge
7) Reconcile
8) Greedy resolve
PASS - 6
The same as 5, without 3) and 4)
PASS - 7 (may be slow)
1) Align w=8 4/6=8 p=.001 MaskKnown MaskInternal
2) Update filter
3) Select overlapped
4) Delete ! selected
5) Expand 4/4=8
6) Merge
7) Reconcile
8) Greedy resolve
PASS - 8
The same as 7, without 3) and 4)
PASS - 9
The same as 8
PASS - 10
1) Align w=10 4/6=8 p=.001 NoMask
2) Update filter
5) Expand 4/4=8
6) Merge
7) Reconcile
8) Greedy resolve
PASS - 11
The same as 10
PASS - 12 (may be very slow)
1) Align NoHash 4/6=8 p=0.001 NoMask
5) Expand 4/4=8
6) Merge
7) Reconcile
8) Greedy resolve
PASS - 13
The same as 12
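
The listing above uses "The same as N" shorthand, which makes it hard to see the whole schedule at once. The following sketch writes out the Align settings of all 13 passes as data and derives the OWEN.fast schedule by dropping passes 7, 8, 9, 12, and 13. It is a summary for the reader, not the format of an OWEN instruction file; the class and field names are hypothetical.

```python
from typing import NamedTuple, Optional


class Pass(NamedTuple):
    number: int
    seed: Optional[int]  # value of w; None stands for NoHash
    p: float             # value of p given to Align
    masked: bool         # True: MaskKnown MaskInternal, False: NoMask
    prune: bool          # True: "Select overlapped" and "Delete ! selected" are run


# Transcribed from the OWEN.good listing, with "The same as N" expanded.
OWEN_GOOD = [
    Pass(1, 32, 0.1e-8, False, True),
    Pass(2, 24, 0.1e-6, True, True),
    Pass(3, 16, 0.1e-5, True, True),
    Pass(4, 16, 0.1e-5, True, False),
    Pass(5, 12, 0.1e-4, True, True),
    Pass(6, 12, 0.1e-4, True, False),
    Pass(7, 8, 0.001, True, True),
    Pass(8, 8, 0.001, True, False),
    Pass(9, 8, 0.001, True, False),
    Pass(10, 10, 0.001, False, False),
    Pass(11, 10, 0.001, False, False),
    Pass(12, None, 0.001, False, False),
    Pass(13, None, 0.001, False, False),
]

SKIPPED_IN_FAST = {7, 8, 9, 12, 13}
OWEN_FAST = [ps for ps in OWEN_GOOD if ps.number not in SKIPPED_IN_FAST]


def seeds(schedule):
    """Return the seed length used by each pass, in order."""
    return [ps.seed for ps in schedule]


print(seeds(OWEN_FAST))  # [32, 24, 16, 16, 12, 12, 10, 10]
```

Written this way, the difference between the two files is visible directly: OWEN.fast never goes below a seed length of 10 and never runs the seedless (NoHash) passes.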