# Two-Phase Multi-way Merge Sort Examples

```

a.   How many sublists will be required for sorting the relation if 50 MB (1 MB = 2^20 bytes) is available for
buffering?

1 block = 4096 bytes = 2^12 bytes, and each tuple uses 100 bytes, so we can store

⌊4096 ÷ 100⌋ = ⌊40.96⌋ = 40 tuples in one block

The buffer space is 50 MB = 50 × 2^20 bytes, which is enough to store

50 × 2^20 ÷ 2^12 = 50 × 2^8 = 12,800 blocks

The file requires 250,000 blocks, and each time we load a buffer we create a sublist; this means we
must create ⌈250,000 ÷ 12,800⌉ = ⌈19.53⌉ = 20 sublists

(Note that the last sublist contains only 250,000 - 19 × 12,800 = 6800 blocks)

b.   How long will it take to sort the relation if all blocks are stored randomly?

We shall break this problem down into the following parts
• The time to load the buffers prior to sorting them, and to write the sorted buffer back to disk

This is the time to read 250,000 blocks into memory and then write them back to disk. In an earlier
example we noted that the average access time for a block is 10.9 msecs, so to access 250,000
blocks requires

250,000 × 10.9 msecs = 2,725,000 msecs = 2725 secs ≈ 45.42 mins

Thus to read and then write 250,000 blocks will take 2 × 2725 secs = 5450 secs ≈ 90.83 mins, or about 91 mins

•    The total time to sort the loaded buffers

Let us assume we use an n log n algorithm (e.g. quicksort). Since the buffer can store 12,800 (=
50 × 2^8) blocks and since each block stores 40 tuples, sorting each buffer requires sorting

50 × 2^8 × 40 = 2000 × 2^8 tuples ≈ 2^11 × 2^8 = 2^19 tuples

With an n log n algorithm we then estimate the number of operations (actually comparisons) at

2^19 × log2(2^19) = 19 × 2^19 ≈ 2^5 × 2^19 = 2^24 operations

Sorting 20 (≈ 2^5) sublists then requires approximately 2^5 × 2^24 = 2^29 operations

If we assume 60 nsec memory and a 1 GHz processor, then we estimate 61 nsecs per operation, so
sorting all of the sublists requires on the order of (taking 1 nsec = 10^-9 sec ≈ 2^-30 sec)

2^29 × 61 × 2^-30 ≈ 2^29 × 2^6 × 2^-30 = 2^5 secs = 32 secs

so that sorting the sublists effectively only adds seconds to the overall sorting time, and hence can
be ignored.

•    The time for Phase 2.

The time for phase 2 is the same as that for reading 250,000 random blocks and then writing them
(again randomly), since comparisons among the sublist buffer values will be negligible. In the
first part of this problem we saw that this took 91 mins.

Total: We estimate the total sorting time as approximately 91 mins + 91 mins = 182 mins

c.   How long will it take to sort the relation if all blocks are stored in consecutive cylinders (in the same region)?

Let us assume the blocks will be stored in consecutive cylinders in region 2.

Since there are 256 sectors per track and each block requires 8 sectors, we can store 32 blocks per track. Since
the Megatron 747 has 8 disk surfaces, this means each cylinder can hold 32 × 8 = 256 blocks.

Thus, in order to store the file we shall need ⌈250,000 ÷ 256⌉ = ⌈976.5625⌉ = 977 cylinders

Phase 1: We shall once again ignore the time to sort the sublists, so phase 1 effectively consists of loading the
50 MB memory buffer 20 times to create 20 sorted sublists, and storing each of these sublists back to disk.

We note that the 50 MB buffer can store 12,800 ÷ 256 = 50 cylinders. We also note that we can ignore
rotational delay when we initially load the buffer to create the sublists, since the order in which the tuples are
read into the buffer does not matter.

We see the events of the first part of phase 1 as

load 19 full buffers with 50 cylinders each time + load 1 partial buffer with 26 full and one partial cylinder

Here:
loading a full buffer requires: 1 random seek, 49 adjacent seeks, 12800 block transfers
loading a partial buffer requires: 1 random seek, 26 adjacent seeks, 1 rotational delay, 6800 block transfers

Thus, the first part of phase 1 will require

20 random seeks, 957 seeks to an adjacent cylinder, 1 rotational delay, and 250000 block transfers

The time for this is

20(6.46 msecs) + 957(1.002 msecs) + 4.17 msecs + 250,000(0.26 msecs)
=    129.2 + 958.914 + 4.17 + 65,000 msecs
=    66,092.284 msecs = 66.092 secs

Storing the sublists will require essentially the same amount of time, except each time we write to a cylinder we must
allow for a rotational delay, since in this case the order in which we write the sublists back to disk will matter. This
means we will have 976 additional rotational delays, which will add 976(4.17 msecs) = 4069.92 msecs = 4.070 secs
to the above time.

Thus, the total time for phase 1 is 66.092 secs + 66.092 secs + 4.070 secs = 136.254 secs = 2.27 mins.

Phase 2: The time for phase 2 will be the same as that for the case with randomly stored blocks since we cannot
predict when we will write the output buffer or fill a sublist buffer. We saw that this took 91 mins.

Total: We estimate the total sorting time to be approximately 91 mins + 2.27 mins ≈ 93 mins

d.   What is the maximum number of these tuples that can be sorted using the two-phase multi-way sort with the
amount of buffer space available and the given block size? How much storage would they require?

We begin by noting that the maximum number of sublists is directly dependent on the number of buffers we
have available for merging. Since each sublist must have a buffer of at least one block, we will maximize the
number of sublists if each buffer is the size of a block, in this case 4K = 2^12 bytes. The number of 1-block
buffers that can be carved out of 50 MB of total buffer space is

50 × 2^20 ÷ 2^12 = 50 × 2^8 = 12,800 buffers

CSCI 430 -- Spring, 2001                                Assignment 2                                              Page - 2
Since 1 of these must serve as the output buffer for merging the sublists, however, the total number of buffers
available for sublists is 12,800 - 1 = 12,799, which is also the maximal number of sublists we can have.

In creating each of these sublists, however, we use all of the available buffer memory to hold tuples for sorting. Since
this memory can hold 12,800 blocks and since each block holds 40 tuples, this means that each of the sublists can
have a maximum of 12,800 × 40 = 512,000 tuples.

Thus, the maximum number of tuples that we can sort is 12,799 × 512,000 = 6,553,088,000 tuples.

At 100 bytes per tuple, this means we must have 655,308,800,000 bytes of disk space available just to hold the
tuples.

```
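
The arithmetic in parts (a) through (d) can be re-checked with a short script. What follows is a minimal sketch, not part of the original assignment: the function and constant names are hypothetical, the disk timings are simply the figures quoted in the worked solution, and the 4.17 msec rotational delay is applied to every rotational-delay term. It also assumes, as the solution does, that each full buffer load covers a whole number of cylinders.

```python
import math

# Figures taken from the worked solution; all names here are illustrative.
BLOCK_BYTES = 2**12            # 4096-byte blocks
TUPLE_BYTES = 100
BUFFER_BYTES = 50 * 2**20      # 50 MB of buffer space
FILE_BLOCKS = 250_000
BLOCKS_PER_CYLINDER = 256      # 32 blocks per track x 8 surfaces

AVG_RANDOM_ACCESS_MS = 10.9    # average access time for one random block
RANDOM_SEEK_MS = 6.46
ADJACENT_SEEK_MS = 1.002
ROTATIONAL_DELAY_MS = 4.17
TRANSFER_MS = 0.26


def capacity():
    """Part (a): tuples per block, blocks per buffer, sublists, last sublist size."""
    tuples_per_block = BLOCK_BYTES // TUPLE_BYTES
    buffer_blocks = BUFFER_BYTES // BLOCK_BYTES
    sublists = math.ceil(FILE_BLOCKS / buffer_blocks)
    last_sublist_blocks = FILE_BLOCKS - (sublists - 1) * buffer_blocks
    return tuples_per_block, buffer_blocks, sublists, last_sublist_blocks


def random_pass_minutes():
    """Part (b): read and then write every block once at the random access time."""
    return 2 * FILE_BLOCKS * AVG_RANDOM_ACCESS_MS / 1000 / 60


def sequential_phase1_seconds():
    """Part (c): phase 1 when the file occupies consecutive cylinders."""
    _, _, sublists, _ = capacity()
    cylinders = math.ceil(FILE_BLOCKS / BLOCKS_PER_CYLINDER)
    # One random seek per buffer load; every other cylinder is an adjacent seek.
    adjacent_seeks = cylinders - sublists
    read_ms = (sublists * RANDOM_SEEK_MS
               + adjacent_seeks * ADJACENT_SEEK_MS
               + ROTATIONAL_DELAY_MS
               + FILE_BLOCKS * TRANSFER_MS)
    # Writing costs the same as reading plus one rotational delay for each
    # cylinder beyond the one already counted.
    extra_write_ms = (cylinders - 1) * ROTATIONAL_DELAY_MS
    return (2 * read_ms + extra_write_ms) / 1000


def max_sortable_tuples():
    """Part (d): (buffers - 1) sublists, each filling the whole buffer."""
    tuples_per_block, buffer_blocks, _, _ = capacity()
    return (buffer_blocks - 1) * buffer_blocks * tuples_per_block


if __name__ == "__main__":
    print("capacity:", capacity())
    print("random pass (mins):", round(random_pass_minutes(), 2))
    print("sequential phase 1 (secs):", round(sequential_phase1_seconds(), 3))
    print("max tuples:", max_sortable_tuples())
    print("disk bytes needed:", max_sortable_tuples() * TUPLE_BYTES)
```

Keeping the timings as named constants makes it easy to substitute the figures for a different disk model and see how each part of the estimate changes.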