a5
Document Sample


CMPT 354E1
Assignment 5
Total marks: 50
Due date: 1st of August (10 a.m. in the assignment box)
1) [15 marks] Describe what would happen to the B+ tree shown below given the operations described in (a) to (d).
Note the following:
The right hand portion of the tree is not shown.
The parts (a) to (d) are independent of each other, i.e. the tree is in the state shown above before the operations
in each of the parts.
The insertion algorithm does not redistribute values between nodes.
The deletion algorithm redistributes values between nodes where possible.
63
I1 I2 78 95 121
45
I3 10 I4 49 55 I5 69 72 76 I6 82 91 I7 … I8 …
L31 4 8 L41 45 47 L51 63 65 68 L61 78 79
L32 11 17 L42 49 53 L52 69 70 71 L62 82 83 85
L43 55 57 L53 72 73 75 L63 91 93 94
L54 76 77
a) [1 mark] Insert 10
b) [2 marks] Insert 95.
c) [4 marks] Insert 64
d) [2 marks] Delete 91
e) [2 marks] Delete 79
f) [4 marks] Delete 17 and then delete 45 (for full marks you must describe what the tree looks like after deleting
17 as well as what it looks like after both deletions)
2) [35 marks] Consider the following information about a database to record retail sales to customer from a number
of stores:
Name Schema Records
Customer {sin, fName, lName, income, assets, age, city, street} 1,000,000
Sales {sNumber, sin, stName, sDate, amount} 10,000,000
Store {stName, stCity, stStreet, sqFeet} 1,200
General Information
Block size is 2,048 bytes.
No record spans two blocks.
There are 110 pages available in main memory
Customer Information
The file is stored as a clustered B+ tree file (where the leaves of the tree contain data records) on age
concatenated with record ID (to produce a unique search key). Each leaf node is kept at 2/3 occupancy (round
to the nearest whole number of records). There are three non-leaf levels of the tree. Database statistics indicate
that customers’ ages range from 15 to 95. No detailed statistics are maintained about the distribution of these
values.
There is a hash index (of “height” 1) on sin. The index has 1,000,000 distinct values.
There is a B+ tree index on income concatenated with record ID of height 4 (including leaf pages). Database
statistics indicate that there are 250,000 different values of income, ranging between 5,000 and 500,000.
There is a B+ tree index on {city, street} concatenated with record ID, of height 4 (including leaf pages).
Database statistics indicate that there are 50,000 different {city, street} values. However, more detailed
statistics indicate that there are only 20 distinct values for city.
Attribute sizes are: fName, lName, city and street are each 20 bytes, sin is 12 bytes, and income, assets and age
are 4 bytes.
Each Customer record is 104 bytes.
Sales Information
The file is sorted by sNumber, numbered from 1 to 10,000,000.
There is a hash index (of “height” 1) on sin. The index has 1,000,000 distinct values.
Database statistics indicate that there are 1,200 distinct values for stName.
Attribute sizes: stName is 20 bytes, sin and sDate are 12 bytes, amount is 8 bytes and sNumber is 4 bytes.
Each Sales record is 56 bytes.
Store Information
The file is sorted by stName.
There is a B+ tree index on {stCity, stStreet} of height 2 (including the leaf pages). Database statistics indicate
that there are 1,200 different values for {stCity, stStreet}. However, more detailed statistics indicate that there
are only 20 distinct values for stCity.
Attribute sizes are: stName, stCity and stStreet are each 20 bytes, and sqFeet is 4 bytes.
Each Store record is 64 bytes.
a) For each of the following selections determine the possible access methods, the cost of each such method (in disk
reads) and the size in both records and blocks of the resulting selection.
i. age = 37(Customer)
ii. sNumber > 1,000,000 sNumber <= 1,100,000(Sales)
iii. city = “Vancouver” street = “29th.” income = 125000(Customer)
iv. income = 67000 sin = 1213 assets = 208000 age = 28(Customer)
v. (income = 67000 sin = 1213) (assets = 208000 age = 28)(Customer)
vi. (city = “Toronto” street = “Queens”) sin = 2158(Customer)
b) What is the size in bytes of the relation shown below? Please note the size (in bytes) of each record and how
many records can fit on one block.
fName, lName, income(age = 37(Customer))
c) For each of the following joins calculate the cost of the join (in disk reads and writes), estimate both the number
of records and number of blocks of the joined relation and note the size (in bytes) of each record and how many
records can fit on one block. Note assume that any selections have already been performed and the results
materialized (stored on disk), do not include the cost of such selections in your evaluation. The symbol nj
represents the natural join operator.
i. sNumber > 9,000,000(Sales) nj Store (using the merge join algorithm).
ii. stName = “Robson”(Sales) nj Customer (using the hash join algorithm with the Sales derived relation being the outer
relation).
iii. age = 45 age = 54(Customer) nj Sales (using the block nested loop join algorithm with the Customer derived
relation being the outer relation).
Get documents about "