
Two-Phase Multi-way Merge Sort Examples

a. How many sublists will be required for sorting the relation if 50 MB (1 MB = $2^{20}$ bytes) is available for buffering?

1 block = 4096 bytes = $2^{12}$ bytes, and each tuple uses 100 bytes, so we can store 4096 / 100 = 40.96, rounded down to 40 tuples, in one block.

The buffer space is 50 MB = $50 \times 2^{20}$ bytes, which is enough to store $50 \times 2^{20} \div 2^{12} = 50 \times 2^{8} = 12{,}800$ blocks.

The file requires 250,000 blocks, and each time we load the buffer we create one sublist, so we must create 250,000 ÷ 12,800 = 19.53, rounded up to 20 sublists. (Note that the last sublist contains only 250,000 − 19 × 12,800 = 6,800 blocks.)

b. How long will it take to sort the relation if all blocks are stored randomly?

We shall break this problem down into the following parts.

• The time to load the buffers prior to sorting them, and to write the sorted buffers back to disk

This is the same as the time to read 250,000 blocks into memory and then write them back. In an earlier example we noted that the average access time for a block is 10.9 msecs, so accessing 250,000 blocks requires 250,000 × 10.9 msecs = 2,725,000 msecs = 2,725 secs ≈ 45.42 mins. Thus reading and then writing 250,000 blocks will take 2 × 45.42 ≈ 90.83, or about 91 mins.

• The total time to sort the loaded buffers

Let us assume we use an n log n algorithm (e.g., quicksort).
Since the buffer can store 12,800 ($= 50 \times 2^{8}$) blocks and each block stores 40 tuples, sorting each buffer means sorting $50 \times 2^{8} \times 40 = 2000 \times 2^{8} \approx 2^{11} \times 2^{8} = 2^{19}$ tuples. With an n log n algorithm we then estimate the number of operations (actually comparisons) at $2^{19} \times \log_2 2^{19} = 19 \times 2^{19} < 2^{5} \times 2^{19} = 2^{24}$ operations. Sorting 20 ($< 2^{5}$) sublists then requires approximately $2^{5} \times 2^{24} = 2^{29}$ operations. If we assume 60 nsec memory and a 1 GHz processor, we estimate 61 nsecs per operation, so (taking 1 nsec ≈ $2^{-30}$ secs) sorting all of the sublists requires on the order of $2^{29} \times 61 \times 2^{-30} \approx 2^{29} \times 2^{6} \times 2^{-30} = 2^{5} = 32$ secs. Because each factor was rounded up to a power of two, this is an overestimate. Sorting the sublists therefore adds only seconds to the overall sorting time, and can be ignored.

• The time for phase 2

The time for phase 2 is the same as that for reading 250,000 random blocks and then writing them (again randomly), since the time for comparisons among the sublist buffer values is negligible. In the first part of this problem we saw that this takes about 91 mins.

Total: We estimate the total sorting time as approximately 91 mins + 91 mins = 182 mins.

c. How long will it take to sort the relation if all blocks are stored in consecutive cylinders (in the same region)?

Let us assume the blocks are stored in consecutive cylinders in region 2. Since there are 256 sectors per track and each block requires 8 sectors, we can store 256 ÷ 8 = 32 blocks per track. Since the Megatron 747 has 8 disk surfaces, each cylinder can hold 32 × 8 = 256 blocks. Thus, to store the file we need 250,000 ÷ 256 = 976.5625, rounded up to 977 cylinders.

Phase 1: We shall once again ignore the time to sort the sublists, so phase 1 consists of loading the 50 MB memory buffer to create 20 sorted sublists and then storing each of these sublists. We note that the 50 MB buffer can hold 12,800 ÷ 256 = 50 cylinders.
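
The block arithmetic behind the part b CPU estimate and the part c disk layout can be checked with a short script. This is a minimal sketch in Python, assuming only the figures stated in the text (4096-byte blocks, 100-byte tuples, 50 MB of buffer, 250,000 blocks, 256 sectors per track, 8 sectors per block, 8 surfaces, 61 nsecs per operation); the variable names are illustrative. Besides the power-of-two estimate, it counts n log2 n comparisons over the actual sublist sizes (19 sublists of 512,000 tuples and one of 272,000), which shows that the 32-second figure is an upper bound.

```python
import math

# Figures taken from the worked example (parts a-c).
BLOCK_BYTES = 2 ** 12           # 4096-byte blocks
TUPLE_BYTES = 100
BUFFER_BYTES = 50 * 2 ** 20     # 50 MB of buffer space
FILE_BLOCKS = 250_000
SECTORS_PER_TRACK = 256         # region-2 figure used in the text
SECTORS_PER_BLOCK = 8
SURFACES = 8
NS_PER_OP = 61                  # 60 ns memory access + 1 ns cycle at 1 GHz

tuples_per_block = BLOCK_BYTES // TUPLE_BYTES       # 40
buffer_blocks = BUFFER_BYTES // BLOCK_BYTES         # 12,800
sublists = math.ceil(FILE_BLOCKS / buffer_blocks)   # 20

# Part b, power-of-two estimate: 2000 -> 2^11, 19 and 20 -> 2^5,
# 61 ns -> 2^6 ns, and 1 ns -> 2^-30 s.
tuples_rough = 2 ** 11 * 2 ** 8                     # 2^19 tuples per buffer
ops_rough = 2 ** 5 * (tuples_rough * 2 ** 5)        # 2^29 operations
rough_secs = ops_rough * 2 ** 6 * 2 ** -30          # 2^5 seconds

# Part b, direct count: n log2 n over the real sublist sizes.
full_size = buffer_blocks * tuples_per_block
last_size = (FILE_BLOCKS - (sublists - 1) * buffer_blocks) * tuples_per_block
sizes = [full_size] * (sublists - 1) + [last_size]
ops = sum(n * math.log2(n) for n in sizes)
direct_secs = ops * NS_PER_OP * 1e-9

# Part c: layout of the file on consecutive cylinders.
blocks_per_track = SECTORS_PER_TRACK // SECTORS_PER_BLOCK       # 32
blocks_per_cylinder = blocks_per_track * SURFACES               # 256
file_cylinders = math.ceil(FILE_BLOCKS / blocks_per_cylinder)   # 977
buffer_cylinders = buffer_blocks // blocks_per_cylinder         # 50

print(f"sort estimate: rough {rough_secs:.0f} s, direct {direct_secs:.1f} s")
print(blocks_per_track, blocks_per_cylinder, file_cylinders, buffer_cylinders)
```

Either estimate is tiny next to the roughly 91 minutes of random I/O, which is why the in-memory sort can be ignored.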
We also note that we can ignore rotational delay when we initially load the buffer to create the sublists, since the order in which the tuples are read into the buffer does not matter. We see the events of the first part of phase 1 as:

load 19 full buffers of 50 cylinders each
+ load 1 partial buffer of 26 full cylinders and one partial cylinder

Here:

loading a full buffer requires: 1 random seek, 49 adjacent-cylinder seeks, 12,800 block transfers
loading the partial buffer requires: 1 random seek, 26 adjacent-cylinder seeks, 1 rotational delay, 6,800 block transfers

Thus, the first part of phase 1 requires 20 random seeks, 19 × 49 + 26 = 957 seeks to an adjacent cylinder, 1 rotational delay, and 250,000 block transfers. The time for this is

20 (6.46 msecs) + 957 (1.002 msecs) + 4.17 msecs + 250,000 (0.26 msecs) = 129.2 + 958.914 + 4.17 + 65,000 msecs = 66,092.284 msecs ≈ 66.092 secs

Storing the sublists requires essentially the same amount of time, except that each time we write to a cylinder we must allow for a rotational delay, since in this case the order in which we write the sublists back to disk matters. We write 977 cylinders and one rotational delay is already included in the time above, so there are 976 additional rotational delays, which add 976 (4.17 msecs) = 4,069.92 msecs ≈ 4.070 secs. Thus, the total time for phase 1 is 66.092 secs + 66.092 secs + 4.070 secs = 136.254 secs ≈ 2.27 mins.

Phase 2: The time for phase 2 is the same as in the case with randomly stored blocks, since we cannot predict when we will write the output buffer or refill a sublist buffer. We saw that this takes about 91 mins.

Total: We estimate the total sorting time to be approximately 91 mins + 2.27 mins ≈ 93 mins.

d. What is the maximum number of these tuples that can be sorted using the two-phase multi-way sort with the amount of buffer space available and the given block size? How much storage would they require?

We begin by noting that the maximum number of tuples we can sort depends directly on the number of sublists we can merge in phase 2, which is limited by the number of buffers available for merging.
Since each sublist must have a buffer of at least one block, we maximize the number of sublists if each buffer is exactly one block, in this case 4K = $2^{12}$ bytes. The number of one-block buffers that can be carved out of 50 MB of total buffer space is $50 \times 2^{20} \div 2^{12} = 50 \times 2^{8} = 12{,}800$ buffers. Since one of these must serve as the output buffer for merging the sublists, however, the total number of buffers available for sublists is 12,800 − 1 = 12,799, which is also the maximum number of sublists we can have.

In creating each of these sublists, however, we use all of the available buffer memory to hold tuples for sorting. Since this memory can hold 12,800 blocks and each block holds 40 tuples, each sublist can have a maximum of 12,800 × 40 = 512,000 tuples. Thus, the maximum number of tuples that we can sort is 12,799 × 512,000 = 6,553,088,000 tuples. At 100 bytes per tuple, we must have 655,308,800,000 bytes (about 610 GB, taking 1 GB = $2^{30}$ bytes) of disk space available just to hold the tuples.

CSCI 430 -- Spring, 2001 Assignment 2
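
The part d arithmetic can be reproduced the same way. This is a minimal sketch in Python, again using only the figures from the text (the variable names are illustrative). It also reports the space the tuples occupy as whole 4096-byte blocks: each block holds only 40 tuples (4,000 bytes) and leaves 96 bytes unused, so the on-disk footprint is slightly larger than the raw tuple bytes.

```python
# Part d: the largest relation the two-phase multi-way merge sort can handle
# with 50 MB of buffer and 4096-byte blocks (figures from the text).
BLOCK_BYTES = 2 ** 12
TUPLE_BYTES = 100
BUFFER_BYTES = 50 * 2 ** 20

tuples_per_block = BLOCK_BYTES // TUPLE_BYTES        # 40
buffer_blocks = BUFFER_BYTES // BLOCK_BYTES          # 12,800 one-block buffers

# Phase 2 needs one input buffer per sublist plus one output buffer.
max_sublists = buffer_blocks - 1                     # 12,799

# Phase 1 fills the whole buffer to build each sorted sublist.
tuples_per_sublist = buffer_blocks * tuples_per_block    # 512,000

max_tuples = max_sublists * tuples_per_sublist
tuple_bytes = max_tuples * TUPLE_BYTES               # raw tuple data
disk_blocks = max_sublists * buffer_blocks           # whole blocks on disk
block_bytes = disk_blocks * BLOCK_BYTES              # includes 96 unused bytes per block

print(f"{max_tuples:,} tuples, {tuple_bytes:,} bytes of tuple data")
print(f"{disk_blocks:,} blocks = {block_bytes:,} bytes ({block_bytes / 2 ** 30:.1f} GB)")
```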
