CS 222 – Principles of Data Management by locknkey24

VIEWS: 6 PAGES: 4

									        CS 222 – Principles of Data Management
                                  Assignment 2
                               Professor Mehrotra
                              Due Date: Oct 27th 2008

Instructions
Please follow these instructions:

1. Do the assignment in groups of 3. Hand in only solution per group but make sure that
   the name of each member appears on the first page. Later addition of names to
   assignments will NOT be permitted.
2. Write the solution for each question starting on a sheet of paper and all the sheets
   together.
3. Make sure you write your solutions clearly and legibly.
4. There are 3 questions in the assignment each worth 10 points.


Question 1. (10 points)
Banking is an application domain that makes heavy use of database systems to manage
customers' accounts. For this application domain illustrate the importance of modeling
computations as transactions.
      • Construct an example of transaction execution that may produce unacceptable
           results if isolation is not guaranteed; explain your answer.
      • Construct an example of transaction execution that may produce unacceptable
           results if atomicity is not guaranteed; explain your answer.
      • Construct an example of transaction execution that may produce unacceptable
           results if durability is not guaranteed; explain your answer.

Question 2. (10 points)
Consider a database consisting of three files each storing records of the following types:

 •   Employe
            record, with attributes employeeName, dateOfBirth, and ssn.

 •   Project
               record, with attributes projectName and projectNumber.

 •   WorksOn
           record, with attributes ssn and projectNumber.
A record of the form (332625001, 4) in the file containing the WorksOn records indicates
that an employee whose social security number is 332625001 works on a project whose
number is 4.

Consider the following query that we may wish to pose over the above database.

List the names of all employees who were born before Jan 1st, 1920 and work on the
project called ``Mercury''.

Let us assume that

 •   there are 10,000 employees. Hence there are 10,000 employee records in the
     Employee file;
 •   there are 10 different projects. Hence there are 10 different records in the Project
     file;
 •   each employee works on exactly one project.Hence, there are 10,000 entries in the
     WorksOn file;
 •   the number of employees working on project ``Mercury'' is 1,000;
 •   there are only 10 employees who were born before 1920.

There are several ways to execute the query given above.One of them is described by the
following pseudo-code.

1. Create the temporary file EmpWorksOn whose records have the attributes:
  ssn, employeeName, dateOfBirth, projectNumber.

While (not end_of_file(Employee file)){
  read record E from Employee file
  While (not end_of_file(WorksOn file)){
    read record W from WorksOn file
    if (E.ssn = W.ssn) {
      create a record EW with
        EW.ssn           := E.ssn
        EW.employeeName := E.employeeName
        EW.dateOfBirth   := E.dateOfBirth
        EW.projectNumber := W.projectNumber
      append record EW to EmpWorksOn file
    }
  }
}

2. Create the temporary file EmpWorksOnProject by merging each record in the EmpWorksOn
file with the record in the Project file that has the same project number.

While (not end_of_file(EmpWorksOn file)) {
  read record EW from EmpWorksOn file
  While (not end_of_file(Project file)){
    read record P from Project file
        if (EW.projectNumber = P.projectNumber) {
          create a record EWP with
            EWP.ssn           := EW.ssn
            EWP.employeeName := EW.employeeName
            EWP.dateOfBirth   := EW.dateOfBirth
            EWP.projectNumber := EW.projectNumber
            EWP.projectName   := P.projectName
          append record EWP to EmpWorksOnProject file
        }
    }
}

3. Scan the EmpWorksOnProject file and print the names of all the employees whose
record satisfies the query.

While (not end_of_file(EmpWorksOnProject file)){
  read record EWP from EmpWorksOnProject file
  if (EWP.dateOfBirth < 01/01/1920) and (EWP.projectName                                      =
"Mercury")
    print EWP.employeeName
}

    •    Estimate the cost of the above algorithm by computing the total number of read
         record and append record operations it executes on files.
    •    As it turns out, the given algorithm is not particularly efficient. Find an algorithm
         that executes the same query more efficiently and write the pseudo-code for it.
    •    Show that your solution is better by computing the total number of read and
         append operations executed by your algorithm.

Question 3. (10 points)
Consider the unix file system which follows the indexed file allocation strategy.
Associated with each file is an inode that stores 13 pointers. The first 10 pointers point to
the first 10 data pages of the file. The 11th pointer in the inode points to a disk slot
containing pointers to the data pages. The 12th pointer is a pointer to a disk slot that
contains pointers to slots that contain pointers to the data pages. Similarly, the 13th
pointer in the inode is a pointer to a slot that contains pointers to slots that contain pointer
to slots that contain pointers to the data pages of the file.

    • Assume a 50GB file, a slot size of 8KB, and let a pointer be 8 bytes long.
Assume that 20 bytes per slot (including the slots containing pointers to file pages) is
used for other bookkeeping. If this file is accessed randomly how many disk slots must be
read on an average to translate the file page number to the actual disk address. (In your
analysis, assume that the inode itself always resides in buffer and so accessing it does not
result in any I/O. Furthermore, assume that 1GB = $2^{30}$ bytes, and 1KB = $2^{10}$
bytes. Also, assume that each file page is equally likely to be accessed).
    • Based on your analyses, can you make some observations whether the Unix file
        system is suited for really large files? Can you think of strategies which would
have a lower cost for translating file page number to disk address even for very
large files?




                    Figure 1: Unix File Allocation System

								
To top