SECONDARY STORAGE MANAGEMENT
CHAPTER 13

Submitted by: Khadke, Suvarna
CS 257 (Section II), ID 213
Student ID: 005226235

13.1.1 Memory Hierarchy
 - Several components for data storage are available, with different data capacities.
 - The cost per byte of storing data also varies from device to device.
 - Devices with the smallest capacity offer the fastest access speed and the highest cost per bit.

Memory Hierarchy Diagram
 [Figure: the memory hierarchy - Cache at the lowest level, then Main Memory, then Disk (accessed as virtual memory, through the file system, or by the DBMS), then Tertiary Storage; programs and the DBMS's buffers operate on data in main memory.]

13.1.1 Memory Hierarchy
 - Cache
   - Lowest level of the hierarchy.
   - Data items in the cache are copies of certain locations of main memory.
   - Sometimes values in the cache are changed, and the corresponding changes to main memory are delayed.
   - The machine looks in the cache for instructions as well as for the data those instructions use.
   - Holds a limited amount of data.
 - Cache is found on the same chip as the microprocessor itself, and an additional level-2 cache is often found on another chip.
 - Data and instructions are moved into the cache from main memory when they are needed by the processor.
 - Cache data can be accessed by the processor in a few nanoseconds.

13.1.1 Memory Hierarchy
 - In a single-processor computer, there is no need to update the data in main memory immediately.
 - With multiple processors, data is written to main memory immediately; this is called write-through.

Main Memory
 - In the center of the action is the computer's main memory. We may think of everything that happens in the computer - instruction execution and data manipulation - as working on information that is resident in main memory.
 - Main memory is random access: any byte can be obtained in the same amount of time.
 - Typical times to move data from main memory to the processor or cache are in the 10-100 nanosecond range.

Secondary Storage
 - Used to store data and programs when they are not being processed.
 - More permanent than main memory: data and programs are retained when the power is turned off.
 - Examples: magnetic disks (hard disks).
 - Database systems always involve secondary storage - the disks and other devices that store large amounts of data that persists over time.

Tertiary Storage
 - Holds data volumes measured in terabytes.
 - Used for databases much larger than what can be stored on disk.
 - As capacious as a collection of disk units can be, there are databases much larger than what can be stored on the disk(s) of a single machine, or even of a substantial collection of machines; tertiary storage devices have been developed to hold them.

13.1.2 Transfer of Data Between Levels
 - Data moves between adjacent levels of the hierarchy.
 - At the secondary and tertiary levels, accessing the desired data or finding the desired place to store data takes a great deal of time, so each level is organized to transfer large amounts of data to or from the level below whenever any data at all is needed.
 - The disk is organized into blocks.
 - Entire blocks are moved to and from a region of main memory called a buffer.

13.1.2 Transfer of Data Between Levels (cont'd)
 - A key technique for speeding up database operations is to arrange the data so that when one piece of data on a block is needed, it is likely that other data on the same block will be needed at the same time.
 - The same idea applies to the other levels of the hierarchy.

13.1.3 Volatile and Non-Volatile Storage
 - A volatile device forgets what data is stored on it when the power goes off.
 - A non-volatile device holds its data even when the device is turned off.
 - All secondary and tertiary storage devices are non-volatile; main memory is volatile.

13.1.4 Virtual Memory
 - When we write programs, the data we use - variables of the program, files read, and so on - occupies a virtual memory address space.
 - Typical software executes in virtual memory.
 - The address space is typically 32 bits, i.e., 2^32 bytes or 4 GB.
 - Transfer between memory and disk is in terms of blocks.

13.2.1 Mechanics of Disks
 - The use of secondary storage is one of the important characteristics of a DBMS.
 - The two principal moving pieces of a disk drive are the disk assembly and the head assembly.
 - The disk assembly consists of one or more circular platters that rotate around a central spindle.
 - The upper and lower surfaces of the platters are covered with a thin layer of magnetic material, on which bits are stored.

13.2.1 Mechanics of Disks (cont'd)
 - 0's and 1's are represented by different patterns in the magnetic material.
 - A common diameter for disk platters is 3.5 inches.
 - The disk is organized into tracks, which are concentric circles on a single platter.
 - The tracks that are at a fixed radius from the center, among all the surfaces, form one cylinder.
 - Tracks are organized into sectors, which are segments of the circle separated by gaps.

13.2.2 Disk Controller
 - One or more disks are controlled by a disk controller.
 - Disk controllers are capable of:
   - controlling the mechanical actuator that moves the head assembly,
   - selecting a sector from among all those in the cylinder at which the heads are positioned,
   - transferring bits between the desired sector and main memory,
   - possibly buffering an entire track.

13.2.3 Disk Access Characteristics
 - Accessing (reading or writing) a block requires three steps:
   - The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the 'seek time'.
   - The disk controller waits while the first sector of the block moves under the head. This is the 'rotational latency'.
   - All the sectors and the gaps between them pass under the head while the disk controller reads or writes data in these sectors. This is the 'transfer time'.

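A rough Python sketch of how these three components combine into a single block-access estimate is given below. The 6.46 ms average seek, 8.33 ms rotation time, and ~11 ms total for a 16K block are the Megatron 747 figures quoted later in these slides; the half-rotation average latency and the block-to-track fraction are simplifying assumptions of the sketch, not values from the slides.

    def block_access_time_ms(seek_ms, rotation_ms, block_fraction_of_track):
        """Estimate the time to access one block: seek + rotational latency + transfer."""
        rotational_latency = rotation_ms / 2              # wait for the first sector, on average
        transfer = rotation_ms * block_fraction_of_track  # sectors and gaps pass under the head
        return seek_ms + rotational_latency + transfer

    # Megatron-747-style numbers: 6.46 ms average seek, 8.33 ms per rotation,
    # and a 16K block assumed to occupy a small fraction of a track.
    print(round(block_access_time_ms(6.46, 8.33, 0.03), 2))   # ~10.9 ms, close to the quoted 11 ms
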
13.3 Accelerating Access to Secondary Storage
 - Several approaches for accessing data in secondary storage more efficiently:
   - Place blocks that are accessed together in the same cylinder.
   - Divide the data among multiple disks.
   - Mirror disks.
   - Use disk-scheduling algorithms.
   - Prefetch blocks into main memory.
 - Scheduling latency - the added delay in accessing data caused by a disk-scheduling algorithm.
 - Throughput - the number of disk accesses per second that the system can accommodate.

13.3.1 The I/O Model of Computation
 - The number of block accesses (disk I/O's) is a good approximation to the time needed by an algorithm.
   - This number should be minimized.
 - Ex 13.3: We want an index on relation R that identifies the block on which the desired tuple appears, but not where on the block it resides.
   - For the Megatron 747 example, it takes 11 ms to read a 16K block.
   - A standard microprocessor can execute millions of instructions in 11 ms, making any delay in searching the block for the desired tuple negligible.

13.3.2 Organizing Data by Cylinders
 - If we read all blocks on a single track or cylinder consecutively, then we can neglect all but the first seek time and the first rotational latency.
 - Ex 13.4: We request 1024 blocks of the Megatron 747.
   - If the data is randomly distributed, the average latency per block is 10.76 ms by Ex 13.2, making the total latency about 11 s.
   - If all blocks are stored consecutively on one cylinder:
     6.46 ms (1 average seek) + 8.33 ms (time per rotation) * 16 (# rotations) = 139 ms

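A small Python sketch of the Example 13.4 comparison follows, using only the numbers quoted above (10.76 ms average block access, 6.46 ms average seek, 8.33 ms per rotation, 16 rotations for the 1024 consecutive blocks).

    BLOCKS = 1024

    # Random placement: every block pays the full average access time.
    random_total_ms = BLOCKS * 10.76

    # All blocks on one cylinder: one average seek, then 16 consecutive rotations.
    cylinder_total_ms = 6.46 + 8.33 * 16

    print(f"random placement : {random_total_ms / 1000:.1f} s")   # ~11 s
    print(f"one cylinder     : {cylinder_total_ms:.0f} ms")       # ~140 ms
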
13.3.3 Using Multiple Disks
 - If we have n disks, read/write performance will increase by a factor of n.
 - Striping - distributing a relation across multiple disks following this pattern:
   - Data on disk 1: R1, R1+n, R1+2n, ...
   - Data on disk 2: R2, R2+n, R2+2n, ...
   - ...
   - Data on disk n: Rn, Rn+n, Rn+2n, ...
 - Ex 13.5: We request 1024 blocks with n = 4.
   6.46 ms (1 average seek) + 8.33 ms (time per rotation) * (16/4) (# rotations) = 39.8 ms

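A minimal Python sketch of striping is shown below: block i of the relation goes to disk i mod n, so n disks can be read in parallel. The 0-based disk numbering is a convenience of the sketch; the timing line simply repeats the Ex 13.5 arithmetic.

    def disk_for_block(block_number: int, n_disks: int) -> int:
        """Round-robin (striped) placement of relation blocks across n disks."""
        return block_number % n_disks

    n = 4
    print([disk_for_block(i, n) for i in range(8)])   # [0, 1, 2, 3, 0, 1, 2, 3]

    # With 4 disks working in parallel, each disk does only 1/4 of the 16 rotations.
    print(f"{6.46 + 8.33 * (16 / 4):.1f} ms")          # ~39.8 ms
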
13.3.4 Mirroring Disks
 - Mirroring disks - having two or more disks hold identical copies of the data.
 - Benefit 1: If n disks are mirrors of each other, the system can survive the crash of n-1 disks.
 - Benefit 2: If we have n disks, read performance increases by a factor of n.
 - Performance increases further if, for each read, the controller selects the disk whose head is closest to the desired data block.

13.3.5 Disk Scheduling and the Elevator Problem
 - The disk controller runs this algorithm to select which of several pending requests to process first.
 - Pseudocode:
     requests[]  // array of all unprocessed data requests
     upon receiving a new data request:
         requests[].add(new request)
     while (requests[] is not empty)
         move head to next location
         if (head location is at data in requests[])
             retrieve data
             remove data from requests[]
         if (head reaches end)
             reverse head direction

13.3.5 Disk Scheduling and the Elevator Problem (con't)

 Example events (cylinders 8000-64000), in the order they occur:
   requests arrive for cylinders 8000, 24000, and 56000;
   get data at 8000 (time 4.3);
   request for 16000 arrives; get data at 24000 (time 13.6);
   request for 64000 arrives; get data at 56000 (time 26.9);
   request for 40000 arrives; get data at 64000 (time 34.2);
   get data at 40000 (time 45.5); get data at 16000 (time 56.8).

    data     time
    8000      4.3
   24000     13.6
   56000     26.9
   64000     34.2
   40000     45.5
   16000     56.8

13.3.5 Disk Scheduling and the Elevator Problem (con't)

   Elevator Algorithm      FIFO Algorithm
    data     time           data     time
    8000      4.3           8000      4.3
   24000     13.6          24000     13.6
   56000     26.9          56000     26.9
   64000     34.2          16000     42.2
   40000     45.5          64000     59.5
   16000     56.8          40000     70.8

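A minimal Python sketch of the elevator (SCAN) selection rule follows: keep sweeping in the current direction, serving pending cylinder requests as they are passed, and reverse only when nothing lies ahead. It shows the service order for a static set of requests only; it does not model the arrival times or seek timings behind the table above, where the later requests (16000, 64000, 40000) arrive while the sweep is already in progress.

    def elevator_order(pending, head, direction=+1):
        """Return the order in which pending cylinder requests are served by SCAN."""
        pending = sorted(pending)
        order = []
        while pending:
            ahead = [c for c in pending if (c - head) * direction >= 0]
            if not ahead:                 # nothing left in this direction
                direction = -direction    # reverse, like an elevator
                continue
            nxt = min(ahead, key=lambda c: abs(c - head))
            order.append(nxt)
            pending.remove(nxt)
            head = nxt
        return order

    # The three requests pending at the start of the example above, head near 8000:
    print(elevator_order([8000, 24000, 56000], head=8000))   # [8000, 24000, 56000]
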
13.3.6 Prefetching and Large-Scale Buffering
 - If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed.

Ways in which disks can fail:
 - Intermittent failure
 - Media decay
 - Write failure
 - Disk crash

13.4 Disk Failures

Intermittent Failures
 - A read or write operation on a sector succeeds not on the first try, but only after repeated tries.
 - The most common form of failure.
 - Parity checks can be used to detect this kind of failure.
 - A straightforward way to check a write is to read the sector back and compare it with the sector we intended to write.

Media Decay
 - A serious form of failure.
 - One or more bits are permanently corrupted.
 - It becomes impossible to read the sector correctly, no matter how many times we retry.
 - The stable-storage technique for organizing a disk can be used so that media decay or failed writes do not result in permanent loss.

Write Failure
 - An attempt to write a sector is not possible.
 - An attempt to retrieve the previously written sector is unsuccessful.
 - A possible reason is a power outage while the sector was being written.
 - The stable-storage technique can be used to avoid this.

Disk Crash
 • The most serious form of disk failure.
 • The entire disk becomes unreadable, suddenly and permanently.
 • RAID techniques can be used for coping with disk crashes.

More on Intermittent Failures
 - An intermittent read failure: we try to read a sector, but the correct content of that sector is not delivered to the disk controller.
 - If the controller has a way to tell whether the sector read is good or bad (checksums), it can reissue the read request whenever bad data is read, until the sector is returned correctly or some preset limit, like 100 tries, is reached.

More on Intermittent Failures (contd.)
 - An intermittent write failure: the controller attempts to write a sector, but the contents written are not what was intended.
 - The only way to check this is to let the disk go around again and read the sector.
 - One way to perform the check is to read the sector back and compare it with the sector we intended to write.

More on Intermittent Failures (contd.)
 - Instead of performing the complete comparison at the disk controller, a simpler way is to read the sector back and check whether a good sector was read.
 - If it is a good sector, then the write was correct; otherwise the write was unsuccessful and must be repeated.

Checksums
 - A technique used to determine the good/bad status of a sector.
 - Each sector has some additional bits, called the checksum, that are set depending on the values of the data bits in that sector.
 - If the checksum is not proper on reading, then there is an error in reading.

Checksums (contd.)
 - There is a small chance that the block was not read correctly even if the checksum is proper.
 - The probability of correctness can be increased by using more checksum bits.

Checksum Calculation
 - The simplest checksum is based on the parity of all bits in the sector.
 - If there is an odd number of 1's among a collection of bits, the bits are said to have odd parity; a parity bit of 1 is added.
 - If there is an even number of 1's, the collection of bits is said to have even parity; a parity bit of 0 is added.

Checksum Calculation (contd.)
 - The number of 1's among a collection of bits and their parity bit is always even.
 - During a write operation, the disk controller calculates the parity bit and appends it to the sequence of bits written in the sector.
 - Every sector will then have even parity.

Examples
 - The sequence of bits 01101000 has an odd number of 1's, so the parity bit is 1, and the sequence with its parity bit is 011010001.
 - The sequence of bits 11101110 has an even number of 1's, so the parity bit is 0, and the sequence with its parity bit is 111011100.

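A small Python sketch of this parity-bit checksum follows; it reproduces the two examples above and shows how a one-bit error is detected.

    def add_parity_bit(bits: str) -> str:
        """Append a parity bit so that the total number of 1's is even."""
        parity = bits.count("1") % 2          # 1 if the data bits have odd parity
        return bits + str(parity)

    def parity_ok(bits_with_parity: str) -> bool:
        """A sector reads as 'good' if the data bits plus parity bit have even parity."""
        return bits_with_parity.count("1") % 2 == 0

    print(add_parity_bit("01101000"))   # 011010001  (matches the first example)
    print(add_parity_bit("11101110"))   # 111011100  (matches the second example)
    print(parity_ok("011010001"))       # True
    print(parity_ok("011010011"))       # False - a one-bit error is detected
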
Checksum Calculation (contd.)
 - Any one-bit error in reading or writing the bits results in a sequence of bits with odd parity.
 - The disk controller can count the number of 1's and determine that the sector has odd parity, and hence that an error has occurred.

Odds
 - There is a chance that more than one bit is corrupted and the error goes unnoticed.
 - Increasing the number of parity bits increases the chance of detecting errors.
 - In general, if there are n independent bits used as a checksum, the chance of missing an error is only one in 2^n.

Stable Storage
 - While checksums will almost certainly detect the existence of a media failure or a failure to read or write correctly, they do not help us correct the error: checksums can detect an error but cannot correct it.
 - Sometimes we overwrite the previous contents of a sector and yet cannot read the new contents correctly.
 - To deal with these problems, a stable-storage policy can be implemented on the disks.

Stable Storage (contd.)
 - Sectors are paired, and each pair represents one sector-contents X.
 - The left copy of the sector is denoted XL, and the right copy XR.

Assumptions
 - We assume that copies are written with a sufficient number of parity bits that the chance of a bad sector looking good when the parity checks are considered is negligibly small.
 - Also, if the read function returns a good value w for either XL or XR, then w is assumed to be the true value of X.

Stable-Storage Writing Policy
 1. Write the value of X into XL. Check that the value has status "good", i.e., that the parity-check bits are correct in the written copy. If not, repeat the write. If after a set number of write attempts we have not successfully written X into XL, assume that there is a media failure in this sector. A fix-up, such as substituting a spare sector for XL, must be adopted.
 2. Repeat (1) for XR.

Stable-Storage Reading Policy
 - The policy is to alternate trying to read XL and XR until a good value is returned.
 - If a good value is not returned after a pre-chosen number of tries, then X is assumed to be truly unreadable.

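A hypothetical Python sketch of the two policies follows. The helpers read_sector and write_sector and the MAX_TRIES limit are assumptions for illustration, not a real API; read_sector is assumed to return a (status, value) pair where status is "good" or "bad".

    MAX_TRIES = 100

    def stable_write(x, write_sector, read_sector, left, right):
        """Write X into the left copy, verify it, then repeat for the right copy."""
        for sector in (left, right):
            for _ in range(MAX_TRIES):
                write_sector(sector, x)
                status, _ = read_sector(sector)
                if status == "good":
                    break                       # this copy was written correctly
            else:
                # Persistent failure: assume a media failure and use a fix-up (spare sector).
                raise IOError(f"media failure writing {sector}; substitute a spare sector")

    def stable_read(read_sector, left, right):
        """Alternate reading XL and XR until a good value is returned."""
        for attempt in range(MAX_TRIES):
            sector = left if attempt % 2 == 0 else right
            status, value = read_sector(sector)
            if status == "good":
                return value                    # assumed to be the true value of X
        raise IOError("X is truly unreadable")
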
Error-Handling Capabilities
 Media failures:
 • If, after storing X in sectors XL and XR, one of them undergoes a media failure and becomes permanently unreadable, we can always read X from the other copy.
 • If both sectors have failed, then X cannot be read.
 • The probability of both copies failing is extremely small.

Error-Handling Capabilities (contd.)
 Write failure:
 • While writing X, if there is a system failure (such as a power outage), the copy of X in main memory is lost and the copy of X being written may be erroneous.
 • Half of the sector may be written with part of the new value of X, while the other half remains as it was.

Error-Handling Capabilities (contd.)
 The possible cases when the system becomes available again:
 1. The failure occurred while writing XL. Then XL is considered bad. Since XR was never changed, its status is good. We can copy XR into XL, giving us the old value of X.
 2. The failure occurred after XL was written. Then XL has good status, and XR, which still holds the old value of X, has bad status. We can copy the new value of X from XL into XR.

Recovery from Disk Crashes
 - To reduce data loss from disk crashes, schemes that involve redundancy, extending the idea of parity checks or duplicated sectors, can be applied.
 - The term used for these strategies is RAID, or Redundant Arrays of Independent Disks.
 - In general, if the mean time to failure of disks is n years, then in any given year 1/n of the surviving disks fail.

Recovery from Disk Crashes (contd.)
 - Each of the RAID schemes has data disks and redundant disks.
 - Data disks are one or more disks that hold the data.
 - Redundant disks are one or more disks that hold information that is completely determined by the contents of the data disks.
 - When either kind of disk crashes, the other disks can be used to restore the failed disk, avoiding permanent information loss.

13.4.6 DISK FAILURES

Xiaqing He
ID: 204
Dr. Lin

1) Mirroring
 - The simplest scheme for recovering from disk crashes.
 - How does mirroring work?
   -- by making two or more copies of the data on different disks
 - Benefits:
   -- data survives if one disk fails;
   -- data can be divided among several disks, allowing access to several blocks at once

1) Mirroring (con't)
 - For mirroring, when can data be lost?
   -- The only way data can be lost is if the second (mirror/redundant) disk crashes while the first (data) disk crash is being repaired.
 - Probability:
   Suppose:
   - one disk: mean time to failure = 10 years;
   - one of the two disks: mean time to the first failure = 5 years;
   - the process of replacing the failed disk takes 3 hours = 1/2,920 of a year.
   So:
   - the probability that the mirror disk fails during the repair = 1/10 * 1/2,920 = 1/29,200;
   - the probability of data loss with mirroring = 1/5 * 1/29,200 = 1/146,000.

2) Parity Blocks
 - Why a change?
   -- Disadvantage of mirroring: it uses so many redundant disks.
 - What's new?
   -- RAID level 4: uses only one redundant disk.
 - How does this one redundant disk work?
   -- modulo-2 sum:
   -- the jth bit of the redundant disk is the modulo-2 sum of the jth bits of all the data disks.
 - Example (see below).

2) Parity Blocks (con't) - Example
 Data disks:
 - Disk 1: 11110000
 - Disk 2: 10101010
 - Disk 3: 00111000

 Redundant disk:
 - Disk 4: 01100010

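A short Python sketch of this parity computation follows: each bit of the redundant disk is the modulo-2 sum (XOR) of the corresponding bits of the data disks, reproducing Disk 4 above.

    def parity_block(*blocks: str) -> str:
        """Bitwise modulo-2 sum of equal-length blocks given as bit strings."""
        return "".join(str(sum(int(b[i]) for b in blocks) % 2)
                       for i in range(len(blocks[0])))

    disk1, disk2, disk3 = "11110000", "10101010", "00111000"
    print(parity_block(disk1, disk2, disk3))   # 01100010 - matches Disk 4 above
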
2) RAID 4 (con't)
 Reading:
 -- similar to reading a block from any single disk.
 Writing:
 1) change the data disk;
 2) change the corresponding block of the redundant disk.
 Why?
 -- the redundant disk must continue to hold the parity checks for the corresponding blocks of all the data disks.

2) RAID 4 (con't) - Writing
 For a total of N data disks:
 1) Naive way:
   - read all N data disks and compute the modulo-2 sum of the corresponding blocks;
   - rewrite the redundant disk according to that modulo-2 sum.
 2) Better way:
   - take the modulo-2 sum of the old and new versions of the data block that was rewritten;
   - flip the bits of the redundant disk that are 1's in that modulo-2 sum.

2) RAID 4 (con't) - Writing Example
 Data disks:
 - Disk 1: 11110000
 - Disk 2: 10101010 -> 01100110
 - Disk 3: 00111000

 To do:
 - Modulo-2 sum of the old and new versions of disk 2: 11001100
 - So we need to flip positions 1, 2, 5, and 6 of the redundant disk.

 Redundant disk:
 - Disk 4: 01100010 -> 10101110

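A Python sketch of this "better way" follows: the redundant block is updated by XOR-ing it with (old data block XOR new data block), so only the one data disk and the redundant disk are touched. It reproduces the example above.

    def xor_blocks(a: str, b: str) -> str:
        """Bitwise modulo-2 sum of two equal-length bit strings."""
        return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

    old_disk2, new_disk2 = "10101010", "01100110"
    old_redundant = "01100010"

    change_mask = xor_blocks(old_disk2, new_disk2)        # which bits changed
    new_redundant = xor_blocks(old_redundant, change_mask)

    print(change_mask)     # 11001100 - positions 1, 2, 5, 6 must be flipped
    print(new_redundant)   # 10101110 - matches the updated Disk 4 above
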
2) RAID 4 (con't) - Failure Recovery
 - Redundant disk crash:
   -- swap in a new disk and recompute its contents from all the data disks.
 - One of the data disks crashes:
   -- swap in a new disk;
   -- recompute its data from the other disks, i.e., the remaining data disks and the redundant disk.
 - How do we recompute? (It is the same rule either way, which is why the scheme works.)
   -- take the modulo-2 sum of all the corresponding bits of all the other disks.

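A short Python sketch of this recovery rule follows, using the example blocks from the slides above: the lost block is the modulo-2 sum of the corresponding blocks of all surviving disks.

    from functools import reduce

    def xor_blocks(a: str, b: str) -> str:
        return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

    def recover(surviving_blocks):
        """Rebuild the lost block from the surviving disks' blocks."""
        return reduce(xor_blocks, surviving_blocks)

    # Suppose Disk 2 (10101010) crashed; Disks 1, 3 and the redundant Disk 4 survive.
    print(recover(["11110000", "00111000", "01100010"]))   # 10101010
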
3) An Improvement: RAID 5
 - Why is an improvement needed?
   -- Shortcoming of RAID level 4: it suffers from a bottleneck (every update of a data disk must also read and write the single redundant disk).
 - Principle of RAID level 5 (RAID 5):
   -- treat each disk as the redundant disk for some of the blocks.
 - Why is this feasible?
   The rule for failure recovery is the same for a redundant disk and a data disk:
   "take the modulo-2 sum of all the corresponding bits of all the other disks"
   So there is no need to treat one disk as the redundant disk and the others as data disks.

3) RAID 5 (con't)
 - How do we know which disk is the redundant disk for a given block?
   -- If there are n+1 disks, labeled 0 to n, then we can treat the i-th cylinder of disk j as redundant if j is the remainder when i is divided by n+1.
 - Example (below).

3) RAID 5 (con't) - Example
 - n = 3, so there are 4 disks:
   - the first disk, labeled 0, is redundant for its cylinders 4, 8, 12, ...;
   - the second disk, labeled 1, is redundant for its cylinders 1, 5, 9, ...;
   - the third disk, labeled 2, is redundant for its cylinders 2, 6, 10, ...;
   - the fourth disk, labeled 3, is redundant for its cylinders 3, 7, 11, ....
 - Suppose all 4 disks are equally likely to be the target of a write. For any one of the 4 disks, the probability that it is involved in a given write is:
   1/4 + 3/4 * 1/3 = 1/2
 - In general, with m disks: 1/m + (m-1)/m * 1/(m-1) = 2/m.

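A tiny Python sketch of the RAID 5 layout rule and the write-load fraction follows; the cylinder-to-disk mapping is exactly the modulo rule stated above.

    def redundant_disk(cylinder: int, num_disks: int) -> int:
        """Disk j holds the parity block for cylinder i exactly when j == i % num_disks."""
        return cylinder % num_disks

    print([redundant_disk(i, 4) for i in range(8)])   # [0, 1, 2, 3, 0, 1, 2, 3]

    # Expected share of writes that touch any one of m equally loaded disks:
    m = 4
    print(1 / m + (m - 1) / m * 1 / (m - 1))          # 0.5, i.e. 2/m
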
4) Coping with Multiple Disk Crashes
 - RAID 6: can deal with any number of disk crashes if enough redundant disks are used.
 - Example: a system of seven disks (four data disks, numbered 1-4, and three redundant disks, numbered 5-7).
 - How is the 3x7 matrix set up? (Why 3 rows? Because there are 3 redundant disks.)
   1) every column is a vector of three 0's and 1's, and no column is all 0's;
   2) the column of a redundant disk has a single 1;
   3) the column of a data disk has at least two 1's.

4) Coping with Multiple Disk Crashes (con't)
 - Reading:
   read from the data disks and ignore the redundant disks.
 - Writing:
   change the data disk, and
   change the corresponding bits of all the redundant disks.

4) Coping with Multiple Disk Crashes (con't)
 - In this system of 4 data disks and 3 redundant disks, how can up to 2 disk crashes be corrected?
 - Suppose disks a and b have failed:
   - find some row r (in the 3x7 matrix) in which the columns for a and b differ (say a's entry is 0 and b's entry is 1);
   - compute the correct b by taking the modulo-2 sum of the corresponding bits from all the disks other than b that have 1's in row r;
   - after recovering b, compute the correct a using all the other disks, which are now available.
 - Example (below).

4) Coping with Multiple Disk Crashes (con't) - Example
 The 3x7 matrix (disks 1-4 are data disks, disks 5-7 are redundant disks):

   disk number   1  2  3  4  5  6  7
   row 1         1  1  1  0  1  0  0
   row 2         1  1  0  1  0  1  0
   row 3         1  0  1  1  0  0  1

4) Coping with Multiple Disk Crashes (con't) - Example
 First block of each disk:
   disk    contents
   1)      11110000
   2)      10101010
   3)      00111000
   4)      01000001
   5)      01100010
   6)      00011011
   7)      10001001

4) Coping with Multiple Disk Crashes (con't) - Example
 Disks 2 and 5 have crashed:
   disk    contents
   1)      11110000
   2)      ????????
   3)      00111000
   4)      01000001
   5)      ????????
   6)      00011011
   7)      10001001

4) Coping with Multiple Disk Crashes (con't) - Example
 - In the 3x7 matrix, row 2 is one where disks 2 and 5 have different entries: disk 2's entry is 1 and disk 5's entry is 0.
 - So: compute the first block of disk 2 as the modulo-2 sum of the corresponding bits of disks 1, 4, and 6 (the other disks with 1's in row 2);
 - then compute the first block of disk 5 as the modulo-2 sum of the corresponding bits of disks 1, 2, and 3 (the other disks with 1's in row 1).
   1)      11110000
   2)      ???????? => 10101010
   3)      00111000
   4)      01000001
   5)      ???????? => 01100010
   6)      00011011
   7)      10001001

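A Python sketch of this two-crash recovery follows, using the matrix and block contents given above: disk 2 is rebuilt from row 2 (disks 1, 4, 6), and then disk 5 from row 1 (disks 1, 2, 3).

    from functools import reduce

    def xor_blocks(a: str, b: str) -> str:
        return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

    blocks = {1: "11110000", 3: "00111000", 4: "01000001",
              6: "00011011", 7: "10001001"}          # disks 2 and 5 are lost

    # Row 2 of the matrix has 1's for disks 1, 2, 4, 6; disk 5's entry is 0,
    # so disk 2 can be rebuilt without disk 5.
    blocks[2] = reduce(xor_blocks, (blocks[d] for d in (1, 4, 6)))

    # Row 1 has 1's for disks 1, 2, 3, 5; now that disk 2 is back, rebuild disk 5.
    blocks[5] = reduce(xor_blocks, (blocks[d] for d in (1, 2, 3)))

    print(blocks[2])   # 10101010
    print(blocks[5])   # 01100010
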
13.5 Arranging Data on Disk

Records
 - Data elements are represented as records, which are stored in consecutive bytes in the same disk block.
 - Basic layout technique for storing data: fixed-length records.
 - Allocation criterion: data should start at a word boundary.
 - A fixed-length record header holds:
   1. a pointer to the record schema,
   2. the length of the record,
   3. timestamps indicating when the record was last modified or last read.

Example
 CREATE TABLE employee(
   name CHAR(30) PRIMARY KEY,
   address VARCHAR(255),
   gender CHAR(1),
   birthdate DATE
 );
 The following information should be in the record header:
   1. the record schema,
   2. the length of the record,
   3. timestamps.
 Many record layouts include a header of some small number of bytes to provide this additional information.
 Data should start at a word boundary, and the record contains the header followed by the four fields: name, address, gender, and birthdate.

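A hypothetical Python sketch of laying out such a fixed-length record follows: each field starts at a word boundary. The 4-byte word size and the 12-byte header are assumptions chosen for illustration, not values from these slides.

    def round_up(n: int, word: int = 4) -> int:
        """Round n up to the next multiple of the word size."""
        return (n + word - 1) // word * word

    HEADER_BYTES = 12   # schema pointer, record length, timestamp (assumed sizes)
    fields = [("name", 30), ("address", 255), ("gender", 1), ("birthdate", 10)]

    offset = round_up(HEADER_BYTES)
    for field_name, size in fields:
        print(f"{field_name:10s} starts at offset {offset}")
        offset += round_up(size)        # pad each field out to a word boundary

    print("total record length:", offset)
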
Packing Fixed-Length Records into Blocks
 - Records are stored in the form of blocks on the disk, and they are moved into main memory when we need to access or update them.
 - A block header is written first, and it is followed by a series of records.

The block header contains the following information:
 - links to one or more other blocks that are part of a network of blocks;
 - information about the role played by this block in such a network;
 - information about which relation the tuples in this block belong to;
 - a "directory" giving the offset of each record in the block;
 - timestamp(s) indicating the time of the block's last modification and/or access.

Example
 Along with the header, we pack as many records as we can into one block, as shown in the figure; the remaining space is left unused.

13.6 REPRESENTING BLOCK AND RECORD ADDRESSES

Addresses of Blocks and Records
 - In main memory:
   - the address of a block is the virtual-memory address of its first byte;
   - the address of a record within the block is the virtual-memory address of the first byte of the record.
 - In secondary storage: a sequence of bytes describes the location of the block in the overall system, e.g., the device ID for the disk, the cylinder number, and so on.

Addresses in Client-Server Systems
 - The database consists of a server process that provides data from secondary storage to one or more client processes that are applications using the data.
 - The server and client processes may be on one machine, or the server and the various clients may be distributed over many machines.
 - The client application uses a "virtual" address space.
 - The operating system or DBMS decides which parts of the address space are currently located in main memory, and hardware maps the virtual address space to physical locations in main memory.
 - Addresses in the address space are represented in two ways:
   - Physical addresses: byte strings that determine the place within the secondary storage system where the record can be found.
   - Logical addresses: arbitrary byte strings of some fixed length.
 - Physical address bits are used to indicate:
   - the host to which the storage is attached,
   - an identifier for the disk,
   - the number of the cylinder,
   - the number of the track,
   - the offset of the beginning of the record.

ADDRESSES IN CLIENT-SERVER SYSTEMS (CONTD.)
 - A map table relates logical addresses to physical addresses.
 [Figure: a map table with columns Logical and Physical; each logical address maps to a physical address.]

Logical and Structured Addresses
 - What is the purpose of a logical address? All the information needed to form the physical address is found in the map table.
 - Many combinations of logical and physical addresses yield structured address schemes.
 - A very useful combination of physical and logical addresses is to keep in each block an offset table that holds the offsets of the records within the block, as suggested in the figure.
 - This gives more flexibility when we:
   - move a record around within the block, or
   - move a record to another block.
 - It also gives us options for deciding what to do when a record is deleted.
 [Figure: a block with a header, an offset table growing from the front and pointing to records 4, 3, 2, 1 packed at the back, with unused space in the middle.]

Pointer Swizzling
 - Relational systems need the ability to represent pointers in tuples.
 - Index structures are composed of blocks that usually have pointers within them.
 - Pointers are also common in object-relational database systems.
 - It is therefore important to understand the management of pointers.
 - Every data item (block, record, etc.) has two addresses:
   - its database address: the address on disk, and
   - its memory address, if the item is currently in virtual memory.

Pointer Swizzling (contd.)
 - Translation table: maps database addresses to memory addresses.
 [Figure: a translation table with columns Dbaddr and Mem-addr; each database address maps to a memory address.]
 - All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table.

Pointer Swizzling (contd.)
 - A pointer consists of the following two fields:
   - a bit indicating the type of address (database or memory), and
   - the database or memory address itself.
 - Example 13.17
 [Figure: Block 1 on disk and its copy in memory; a pointer within the in-memory Block 1 is swizzled to point directly at a memory address, while a pointer to Block 2, which is still on disk, remains unswizzled.]

Example 13.17 (contd.)
 - Block 1 has a record with pointers to a second record on the same block and to a record on another block.
 - If Block 1 is copied into memory:
   - the first pointer, which points within Block 1, can be swizzled so it points directly to the memory address of the target record;
   - since Block 2 is not in memory, we cannot swizzle the second pointer.

Pointer Swizzling (contd.)
 - Three types of swizzling:
   - Automatic swizzling: as soon as a block is brought into memory, swizzle all relevant pointers.
   - Swizzling on demand: only swizzle a pointer if and when it is actually followed.
   - No swizzling: pointers are never swizzled; items are always accessed using the database address.

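A hypothetical Python sketch of swizzling on demand follows. A pointer stores either a database (disk) address or a memory reference, plus a bit saying which; a translation table maps database addresses of in-memory items to their memory locations. The class and helper names are illustrative, not a real API.

    translation_table = {}        # database address -> in-memory object

    class Pointer:
        def __init__(self, db_addr):
            self.swizzled = False   # the "type of address" bit
            self.addr = db_addr     # database address until swizzled

        def follow(self, read_block_from_disk):
            if not self.swizzled and self.addr in translation_table:
                # Swizzle on demand: replace the database address with the
                # memory reference the first time the pointer is followed.
                self.addr = translation_table[self.addr]
                self.swizzled = True
            if self.swizzled:
                return self.addr                      # target is already in memory
            return read_block_from_disk(self.addr)    # unswizzled: go to disk
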
Programmer Control of Swizzling

 Unswizzling
 - When a block is moved from memory back to disk, all pointers must be turned back into database (disk) addresses.
 - The translation table is used again for this, so it is important to have an efficient data structure for the translation table.

 Drawbacks
 - The time possibly saved by swizzling all of a block's pointers at once must be weighed against the possibility that some swizzled pointers will never be followed.
 - In that case, any time spent swizzling and unswizzling those pointers is wasted.

Pinned Records and Blocks
 - A block in memory is said to be pinned if it cannot be written back to disk safely.
 - A bit telling whether or not a block is pinned can be located in the header of the block.
 - If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned.
   - To unpin a block, we must unswizzle any pointers to it.
   - The translation table must keep track of the places in memory holding swizzled pointers to each item.
   - Those pointers are unswizzled by using the translation table to replace the memory addresses with database (disk) addresses.

13.7

CHAPTER 13
ID: 221

Records with Variable-Length Fields
 A simple but effective scheme is to put all fixed-length fields ahead of the variable-length fields. We then place in the record header:
 1. The length of the record.
 2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order, then the first of them needs no pointer; we know it immediately follows the fixed-length fields.

Example
 [Fig 1: A MovieStar record with four fields - name, address, gender, birthdate - at offsets 0, 30, 286, and 287, with total record length 297.]

Records with Variable-Length Fields (contd.)
 An effective way to represent variable-length records is as follows:
 - Fixed-length fields are kept ahead of the variable-length fields.
 - The record header contains:
   • the length of the record,
   • pointers to the beginnings of all variable-length fields except the first one.

Records with Variable-Length Fields (contd.)
 [Figure 2: A MovieStar record with name and address implemented as variable-length character strings; the header holds the record length and a pointer to the address field, followed by the fixed-length fields gender and birthdate, then the name and address.]

Records with Repeating Fields
 - A record may contain a variable number of occurrences of a field F.
 - All occurrences of field F are grouped together, and the record header contains a pointer to the first occurrence of F.
 - L bytes are devoted to each instance of field F.
 - Locating an occurrence of field F within the record:
   • add to the offset of the first occurrence of F an integer multiple of L (0, L, 2L, 3L, and so on);
   • we stop when we reach the end of the group of F's occurrences.

Records with Repeating Fields (contd.)
 [Figure 3: A record with a repeating group of references to movies - the header holds the record length, a pointer to the address field, and a pointer to the group of movie pointers, followed by the name, the address, and the pointers to movies.]

Records with Repeating Fields (contd.)
 [Figure 4: Storing variable-length fields separately from the record - the record header holds the record length, a pointer to and length of the name, a pointer to and length of the address, and a pointer to the movie references together with the number of references; the name and address themselves are stored elsewhere.]

Records with Repeating Fields (contd.)

 Advantage
 - Keeping the record itself fixed-length allows records to be searched more efficiently, minimizes the overhead in block headers, and allows records to be moved within or among blocks with minimum effort.

 Disadvantage
 - Storing variable-length components on another block increases the number of disk I/O's needed to examine all components of a record.

Records with Repeating Fields (contd.)

 A compromise strategy is to allocate a fixed portion of the record for the repeating fields:
 - If the number of repeating fields is less than the allocated space, some space is left unused.
 - If the number of repeating fields is greater than the allocated space, the extra fields are stored in a different location, and a pointer to that location plus a count of the additional occurrences are stored in the record.

Variable-Format Records
 - Records that do not have a fixed schema.
 - Variable-format records are represented by a sequence of tagged fields.
 - Each tagged field contains:
   • the attribute or field name,
   • the type of the field,
   • the length of the field,
   • the value of the field.
 - Why use tagged fields?
   • information-integration applications,
   • records with a very flexible schema.

Variable-Format Records (contd.)
 [Fig 5: A record with tagged fields - each field carries a code for the field name ('N' for name, 'R' for restaurant owned), a code for the type ('S' for string), the length (14 and 16), and the values "Clint Eastwood" and "Hog's Breath Inn".]

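A tiny Python sketch of encoding the tagged fields shown in Fig 5 follows; the one-letter field-name and type codes follow the figure, while the tuple representation itself is only illustrative.

    def tagged_field(name_code: str, type_code: str, value: str) -> tuple:
        """A tagged field: field-name code, type code, length, value."""
        return (name_code, type_code, len(value), value)

    record = [
        tagged_field("N", "S", "Clint Eastwood"),     # name, string, length 14
        tagged_field("R", "S", "Hog's Breath Inn"),   # restaurant owned, string, length 16
    ]
    print(record)
    # [('N', 'S', 14, 'Clint Eastwood'), ('R', 'S', 16, "Hog's Breath Inn")]
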
Records That Do Not Fit in a Block
 - When the length of a record is greater than the block size, the record is divided and placed into two or more blocks.
 - The portion of the record in each block is referred to as a record fragment.
 - A record with two or more fragments is called a spanned record.
 - A record that does not cross a block boundary is called an unspanned record.

Spanned Records
 Spanned records require the following extra header information:
 - a bit indicating whether the piece is a fragment or not;
 - a bit indicating whether it is the first or last fragment of a record;
 - pointers to the next and/or previous fragment of the same record.

Records That Do Not Fit in a Block (contd.)
 [Figure 6: Storing spanned records across blocks - block 1 holds record 1 and fragment 2-a of record 2; block 2 holds fragment 2-b of record 2 and record 3; each block has a block header and each record or fragment a record header.]

                                 BLOBS
123




      •    Binary, Large OBjectS = BLOBS
      •   BLOBS can be images, movies, audio files and other very
          large values that can be stored in files.
      •   Storing BLOBS
           – Stored in several blocks.
           – Preferable to store them consecutively on a cylinder or
              multiple disks for efficient retrieval.
      •   Retrieving BLOBS
           – A client retrieving a 2 hour movie may not want it all at
              the same time.
           – Retrieving a specific part of the large data requires an
              index structure to make it efficient. (Example: An index
              by seconds on a movie BLOB.)
      Record Modifications

               Section 13.8




124
125


         A record is a single, implicitly structured
          data item in a database table. A record is
          also called a tuple.
      •   Record modification: records are modified
          when a data-manipulation operation is
          performed.
             STRUCTURE OF A RECORD
126


      •    RECORD STRUCTURE FOR A PERSON
          TABLE
      •   CREATE TABLE PERSON ( NAME CHAR(30), ADDRESS CHAR(256) ,
          GENDER CHAR(1), BIRTHDATE CHAR(10));
                   TYPES OF RECORDS
127


      •   FIXED-LENGTH RECORDS
              CREATE TABLE SJSUSTUDENT(STUDENT_ID INT(9) NOT NULL,
              PHONE_NO INT(10) NOT NULL);


      •   VARIABLE-LENGTH RECORDS
              CREATE TABLE SJSUSTUDENT(STUDENT_ID INT(9) NOT NULL,
              NAME VARCHAR(100), ADDRESS VARCHAR(100), PHONE_NO INT(10) NOT NULL);
                   Modification types

         Insertion

         Deletion

         Update




128
                                  Insertion

         Insertion of records without order
            Records can be placed in a block with empty space or in a new
          block.


              Insertion of records in fixed order
             Space available in the block
             No space available in the block (outside the block)


              Structured address
          A pointer to a record from outside the block, typically the block
          address plus the record's key or its position in the block's offset table.
 129
                Inserting New Records
130




      •   If records are not required to be in any particular
          order, just find a block with empty space and place the
          record in that block, e.g., heap files.
      •   If the records are to be kept in a particular
          order (e.g., sorted by primary key):
      •   Locate the appropriate block, and if space
          is available in the block, place the record
          in the block.
                    Insertion in fixed order

      Space available within the block
         Use of an offset table in the header of each block with pointers to the
          location of each record in the block.
         The records are slid within the block and the pointers in the offset table are
          adjusted.

              [Block layout: header | offset table | unused space | Record 4 | Record 3 | Record 2 | Record 1
               — the offset table grows from the front of the block, the records from the rear.]
 131
                Insertion in fixed order

      No space available within the block (outside the block)
         Find space on a “nearby” block.
          •   In case of no space available on a block, look at the following block in sorted
              order of blocks.
          •   If space is available in that block, move the highest records of
              block 1 to block 2, and slide the records around on both blocks.
         Create an overflow block
          •   Records can be stored in overflow block.
          •   Each block has place for a pointer to an overflow block in its header.
          •   The overflow block can point to a second overflow block as shown below.




                            Block              Overflow
                            B                  block for B
 132
                                     Deletion

         Recover space after deletion
             When using an offset table, the records can be slid around the block so
              there will be an unused region in the center that can be recovered.


             In case we cannot slide records, an available space list can be
              maintained in the block header.


             The list head goes in the block header and available regions hold the
              links in the list.




 133
134
                                   Deletion
         Use of tombstones
             A tombstone is a marker (e.g., a bit at the first byte of a deleted
              record) indicating that the record was deleted (0 – not deleted, 1 – deleted).

             The tombstone prevents pointers to the deleted record from accidentally
              pointing to a new record that reuses its space.

             The tombstone is permanent; it remains until the entire database is
              reconstructed.

             Where the tombstone is placed depends on the nature of the record
              pointers. If pointers go to fixed locations from which the location of
              the record is found, we put the tombstone in that fixed location. (See examples)

             A map table can be used to translate a logical record address to a
              physical address.
                                                                                   134
135
                                   Deletion
         Use of tombstone
             If we need to replace records by tombstones, place the bit that serves
              as the tombstone at the beginning of the record.

             This bit remains at the record's location, and the subsequent bytes can be
              reused for another record




                               Record 1                      Record 2


              Record 1 can be replaced, but the tombstone remains, record 2 has
              no tombstone and can be seen when we follow a pointer to it.
                                                                                  135
                                 Update
         Fixed Length update
           No effect on storage system as it occupies same space as before
          update.
         Variable length update
          • If length increases, like insertion “slide the records”

          • If length decreases, like deletion we update the space-
            available list, recover the space/eliminate the overflow
            blocks.
           Longer length

           Short length




 136
                                     Update

      Variable length update (longer length)
         Stored on the same block:
             Sliding records
             Creation of overflow block.


         Stored on another block
             Move records around that block
             Create a new block for storing variable length fields.




 137
                                   Update

      Variable length update (Shorter length)
         Same as deletion
             Recover space
             Consolidate space.




 138
                           Topics
139


         Records with Variable Length Fields
         Records with Repeating Fields
         Variable Format Records
         Records that do not fit in a block
         BLOBS
B-TREES & BITMAP INDEXES

CHAPTER 14 (SECTIONS 14.2 AND 14.7)
DATABASE SYSTEMS – The Complete Book
                                  141
                      Definition
142


      A bitmap index for a field F is a collection of
      bit-vectors of length n, one for each possible
      value that may appear in that field F.[1]
               What does that mean?
143




         Assume a relation R with two attributes, A and B.
           Attribute A is of type Integer and B is of type String.
           There are 6 records, numbered 1 through 6, as shown:

                 record   A    B
                   1      30   foo
                   2      30   bar
                   3      40   baz
                   4      50   foo
                   5      40   bar
                   6      30   baz
               Example Continued…
144




         A bitmap index for attribute B of this relation is:

               Value          Vector
                foo           100100
                bar           010010
                baz           001001
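The bit-vectors above can be reproduced with a small Python sketch (purely illustrative; the relation is held as a list of (A, B) pairs and the function name is my own):

    # Build a bitmap index on attribute B of the example relation R.
    R = [(30, "foo"), (30, "bar"), (40, "baz"),
         (50, "foo"), (40, "bar"), (30, "baz")]

    def bitmap_index(relation, attr_pos):
        """Return {value: bit-vector string}, one bit per record."""
        index = {}
        for i, record in enumerate(relation):
            vec = index.setdefault(record[attr_pos], ["0"] * len(relation))
            vec[i] = "1"
        return {v: "".join(bits) for v, bits in index.items()}

    print(bitmap_index(R, 1))
    # {'foo': '100100', 'bar': '010010', 'baz': '001001'}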
                   Where do we reach?
145


         A bitmap index is a special kind of database index
          that uses bitmaps.[2]
         Bitmap indexes are considered to work well for
          data such as gender, which has a small
          number of distinct values (e.g., male and
          female) but many occurrences of those
          values.[2]
                           A little more…
146

         A bitmap index for attribute A of relation R is:
          A  collection of bit-vectors
           The number of bit-vectors = the number of distinct
            values of A in R.
           The length of each bit-vector = the cardinality of R.
           The bit-vector for value v has 1 in position i, if the ith
            record has v in attribute A, and it has 0 there if not.[3]
         Records are allocated permanent numbers.[3]
         There is a mapping between record numbers and record
          addresses.[3]
            Motivation for Bitmap Indexes
147


         Very efficient when used for partial match queries.[3]

         They offer an advantage over buckets [2]
           We can find tuples that match several specified attributes
            without first retrieving all the records that match each
            attribute individually.

         They can also help answer range queries [3]
                         Another Example
148

      A collection of records with two attributes of different types:
      {(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}

                                     5 = 100010
                                     79 = 010100
                                     4 = 001000
                                     6 = 000001
                                     d = 101100
                                     t = 010010
                                     a = 000001
         Example Continued…
149


           {(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}
      Searching for items is easy, just AND together.
                    To search for (5,d)

                      5 = 100010
                      d = 101100
              100010 AND 101100 = 100000



        The location of the
      record has been traced!
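A hedged sketch of this AND-based lookup (illustrative only; the bit-vectors are taken from the example above and the helper name is mine):

    # Locate the record matching (5, d) by ANDing the two bit-vectors.
    vectors = {5: "100010", 79: "010100", 4: "001000", 6: "000001",
               "d": "101100", "t": "010010", "a": "000001"}

    def bv_and(x, y):
        """Bitwise AND of two equal-length bit-vector strings."""
        return "".join("1" if a == "1" and b == "1" else "0"
                       for a, b in zip(x, y))

    result = bv_and(vectors[5], vectors["d"])
    print(result)                                             # 100000
    print([i + 1 for i, b in enumerate(result) if b == "1"])  # [1] -> record 1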
                     Compressed Bitmaps
150


         Assume:
           The  number of records in R are n
           Attribute A has m distinct values in R
         The size of a bitmap index on attribute A is m*n.
         If m is large, then on average only about a 1/m fraction of the
          bits in each bit-vector will be 1.
           This sparsity is an opportunity to compress the bit-vectors.
         A common encoding approach is called run-length
          encoding.[1]
                         Run-length encoding
151

         Run-length encoding represents runs
           A run is a sequence of i 0's followed by a 1; it is represented by a suitable
             binary encoding of the integer i.

         A run of i 0's followed by a 1 is encoded by:
           First computing how many bits are needed to represent i in binary, say k
           Then representing the run by k-1 1's and a single 0, followed by the k bits that
             represent i in binary.
           The encoding for i = 1 is 01 (k = 1).
           The encoding for i = 0 is 00 (k = 1).


         We concatenate the codes for each run together, and the sequence of bits is the
          encoding of the entire bit-vector
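The encoding rule can be written down directly; the Python sketch below is only an illustration of that rule (the function names are mine, not the textbook's):

    def encode_run(i):
        """Encode a run of i 0's followed by a 1: (k-1) 1's, a 0, then i in k bits."""
        k = max(i.bit_length(), 1)       # bits needed to represent i (k >= 1)
        return "1" * (k - 1) + "0" + format(i, "0{}b".format(k))

    def encode_bitvector(bits):
        """Concatenate the run codes; trailing 0's after the last 1 are dropped."""
        codes, run = [], 0
        for b in bits:
            if b == "1":
                codes.append(encode_run(run))
                run = 0
            else:
                run += 1
        return "".join(codes)

    print(encode_run(13))                            # 11101101
    print(encode_run(0), encode_run(1))              # 00 01
    print(encode_bitvector("0000000000000110001"))   # 11101101001011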
           Understanding with an Example
152

         Let us decode the sequence 11101101001011
         Starting at the beginning (leftmost bit):
           First run: the first 0 is at position 4, so k = 4. The next 4 bits are 1101,
             so the first integer is i = 13; the code consumed is 11101101.
           Second run: the remaining code is 001011. It starts with a 0, so k = 1,
             and the next bit gives i = 0.
           Last run: the remaining code is 1011. One 1 precedes the 0, so k = 2,
             and the next two bits (11) give i = 3.
         The decoded runs are therefore 13, 0, 3, hence our bit-vector is:
                        0000000000000110001
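For the decoding direction, a matching sketch (again illustrative, with my own function names) reproduces the worked example above:

    def decode_runs(code):
        """Recover the run lengths from a concatenated run-length code."""
        runs, pos = [], 0
        while pos < len(code):
            k = 1
            while code[pos] == "1":       # k-1 leading 1's give the width k
                k += 1
                pos += 1
            pos += 1                      # skip the terminating 0
            runs.append(int(code[pos:pos + k], 2))
            pos += k
        return runs

    def runs_to_bitvector(runs):
        """Expand the runs back into the bit-vector (without trailing 0's)."""
        return "".join("0" * i + "1" for i in runs)

    print(decode_runs("11101101001011"))      # [13, 0, 3]
    print(runs_to_bitvector([13, 0, 3]))      # 0000000000000110001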
             Managing Bitmap Indexes
153


      1) How to find a specific bit-vector for a value
        efficiently?
      2) After selecting results that match, how to
        retrieve the results efficiently?
      3) When data is changed, do you alter bitmap
        index?
                1) Finding bit vectors
154


       Think of each attribute value as a search key whose associated
        "record" is its bit-vector.[1]
       Any secondary storage technique for indexes will then be efficient
        in retrieving the bit-vectors.[1]
       Create a secondary index with the attribute value as the
        search key [3]
         Btree
         Hash
                      2) Finding Records
155


         Create a secondary index with the record number as the search key
          [3]
         Or in other words,
           For finding record k, create a secondary index that uses
            the kth position (record number) as the search key.[1]
       3) Handling Modifications
156


               Two things to remember:



      Record numbers cannot be changed once assigned


      Changes to data file require changes to bitmap index
157


      Deletion
         Tombstone replaces deleted record
         Corresponding bit is set to 0
158


      Insertion
          Record is assigned next record number.
          A bit is appended to each bit-vector: 1 for the vector of the
           value that the new record has in the indexed attribute, 0 for
           all the other vectors.
          If the new record contains a new value of the
           attribute, add a new bit-vector.
159


      Modification
         Change the bit corresponding to the old value
          of the modified record to 0
         Change the bit corresponding to the new
          value of the modified record to 1
         If the new value is a new value of A, then
          insert a new bit-vector.
160



      Chapter 14
      Section 14.2
                                      Structure
161

         A B-tree is a balanced tree, meaning that all paths from the
          root to a leaf have the same length.
         There is a parameter n associated with each B-tree
          block. Each block has space for n search keys
          and n+1 pointers.
         The root may have as few as one search key in use, but all other
          blocks must be at least half full.
      A B-tree organizes its blocks into a tree. Typically, there are three layers in a B-tree: the root,
          an intermediate layer, and leaves, but any number of layers is possible.
                                         Structure
162

      •   There are three layers in a typical B-tree: the root, an intermediate layer, and the leaves.
      •   In a B-tree, each block has space for n search-key values and n+1 pointers.
      •   A typical interior node holds search keys and pointers to blocks at the next lower level.
      •   A typical leaf holds search keys and pointers that lead to the records with those keys.
                           Application
163


         The search key of the Btree is the primary key for
          the data file. That is, there is one key-pointer
          pair in a leaf for every record of the data
          file.
         Data file is sorted by its primary key.
         Alternatively, the data file is sorted by an attribute that is not a
          key, and this attribute is the search key for the B-tree.
                              Lookup
164


      If at an interior node, choose the correct pointer to use. This is
                  done by comparing keys to search value.
                         Lookup
165




      At a leaf node, choose the key that matches what
      you are looking for and the pointer for that leads
                         to the data.
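To make the lookup procedure concrete, here is a small in-memory sketch (an illustration under my own assumptions, not the textbook's code): interior nodes hold sorted keys and child pointers, leaves hold keys with record pointers, and keys equal to a separator go to the right.

    from bisect import bisect_right

    class Node:
        def __init__(self, keys, children=None, records=None):
            self.keys = keys            # sorted search keys in this block
            self.children = children    # child nodes (interior) or None
            self.records = records      # record pointers (leaf) or None

    def btree_lookup(node, key):
        """Follow one pointer per level; return the record pointer or None."""
        while node.children is not None:               # interior node
            node = node.children[bisect_right(node.keys, key)]
        if key in node.keys:                           # leaf node
            return node.records[node.keys.index(key)]
        return None

    # Tiny two-level example: the root separates keys < 40 from keys >= 40.
    leaf1 = Node(keys=[10, 20, 30], records=["r10", "r20", "r30"])
    leaf2 = Node(keys=[40, 50], records=["r40", "r50"])
    root = Node(keys=[40], children=[leaf1, leaf2])
    print(btree_lookup(root, 30), btree_lookup(root, 45))   # r30 None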
                            Insertion
166


          For insertion, choose the correct leaf node in which to put the
           pointer to the data.
          If the node is full, create a new node and split the keys
           between the two.
          Then recursively move up: if the parent cannot hold a
           pointer to the new node because it is full, split the parent as well.
          This could end with the creation of a new root node, if
           the current root was full.
                         Deletion
167


      Perform a lookup to find the node containing the key, and delete the key.

         If the node is then less than half full, either merge it with an
        adjacent sibling and recursively delete the separating key above,
        or, if the sibling is too full to merge, move a key from the sibling
        and adjust the corresponding key and pointer in the parent.
                       Efficiency
168


        Btrees allow lookup, insertion, and deletion of
              records using very few disk I/Os.

      Each level of a B-tree requires one disk read; we follow a pointer
      from each level to the next, and a final read fetches the record
      itself.
                          Efficiency
169


      Three levels are usually sufficient for B-trees: if each block holds
               255 pointers, 255^3 is about 16.6 million records.

      Disk I/O's can be reduced further by keeping a level of the B-tree in
        main memory. Keeping the root block (with its 255 pointers) in memory
        reduces the reads to 2, and it may even be possible to keep the next
            level's 255 blocks in memory, reducing the reads to 1.
      Query Execution
170




          Chapter 15
          Section 15.1
                       Agenda
171


         Query Processor and major parts of Query
          processor
         Physical-Query-Plan Operators
         Scanning Tables
         Basic approaches to locate the tuples of a
          relation R
         Sorting While Scanning Tables
         Computation Model for Physical Operator
         I/O Cost for Scan Operators
         Iterators
           What is a Query Processor
172



         The group of components of a DBMS that
          converts user queries and data-
          modification commands into a sequence of
          database operations
         It also executes those operations
         It must supply the details regarding how the query
          is to be executed
      Major parts of Query processor
173


                       Query Execution:
                         the algorithms that manipulate
                         the data of the database.

                         The focus is on the operations of
                         extended relational algebra.
      Outline of Query Compilation
174

                 Query compilation
                  Parsing : a parse tree for the
                   query is constructed
                  Query Rewrite : the parse
                   tree is converted to an initial
                   query plan and transformed
                   into a logical query plan that is
                   expected to require less
                   execution time
                  Physical Plan Generation :
                   the logical query plan is converted
                   into a physical query plan by
                   selecting algorithms and an order
                   of execution for these operators.
          Physical-Query-Plan Operators
175


         Physical operators are implementations of
          the operators of relational algebra.
         There are also physical operators for tasks that are not
          relational-algebra operations, such as "scan", which reads a
          table, that is, brings each tuple of some
          relation into main memory.
                    Scanning Tables
176


         One of the basic things we can do in a
          physical query plan is to read the entire
          contents of a relation R.
         A variation of this operator involves a simple
          predicate: read only those tuples of the
          relation R that satisfy the predicate.
                 Scanning Tables
177

      Basic approaches to locate the tuples of a
       relation R
       Table   Scan
         Relation R is stored in secondary memory
          with its tuples arranged in blocks
         It is possible to get the blocks one by one
       Index-Scan
         If there is an index on any attribute of
          Relation R, we can use this index to get all
          the tuples of Relation R
          Sorting While Scanning Tables
178


         Number of reasons to sort a relation
           Query could include an ORDER BY clause,
            requiring that a relation be sorted.
           Some algorithms that implement relational-algebra
            operations require one or both arguments
            to be sorted relations.
           Physical-query-plan operator sort-scan
            takes a relation R, attributes on which the
            sort is to be made, and produces R in that
            sorted order
      Computation Model for Physical
179
                Operator
         Physical-plan operators should be selected
          wisely; this is essential for a good query
          processor.
         The "cost" of each operator is estimated by the
          number of disk I/O's for the operation.
         The cost of writing the final answer back to disk depends on the
          size of the answer and is part of the total cost of the query, but it
          does not depend on the algorithm chosen, so it is not counted when
          comparing operators.
             Parameters for Measuring
180
                      Costs
         Parameters that affect the performance of a
          query
           Buffer space availability in the main
            memory at the time of execution of the
            query
           Size of input and the size of the output
            generated
           The size of a block on disk and the size of the
            buffers in main memory also affect the
            performance
      Parameters for Measuring Costs
181


         B: the number of blocks needed to hold
          all the tuples of relation R.
         Also denoted B(R)
         T: the number of tuples in relation R.
         Also denoted T(R)
          V: the number of distinct values that appear in
          a column of relation R
         V(R, a) is the number of distinct values of
          column a in relation R
          I/O Cost for Scan Operators
182


         If relation R is clustered, then the number
          of disk I/O's for the table-scan operator is
          approximately B disk I/O's
         If relation R is not clustered, then the
          number of required disk I/O's is generally
          much higher
         An index on a relation R occupies many
          fewer than B(R) blocks
               That means a scan of the entire index
          requires fewer I/O's than a scan of the entire
          relation R, which takes at least B disk I/O's
          Iterators for Implementation of
183
                Physical Operators
         Many physical operators can be
          implemented as an Iterator.
         Three methods forming the iterator for an
          operation are:
         1. Open( ) :
            This method starts the process of getting
             tuples
            It initializes any data structures needed to
             perform the operation
      Iterators for Implementation of
184
            Physical Operators
         2. GetNext( ):
           Returns the next tuple in the result
           If there are no more tuples to return,
            GetNext returns a special value
            NotFound
         3. Close( ) :
           Ends the iteration after all tuples

           It calls Close on any arguments of the
            operator
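As a sketch of this interface (the class and variable names are hypothetical, not the textbook's code), a table-scan iterator might look like this in Python; None stands in for the special NotFound value:

    class TableScan:
        """Iterate over the tuples of a relation using Open/GetNext/Close."""
        def __init__(self, blocks):
            self.blocks = blocks        # relation stored as a list of blocks

        def Open(self):
            # initialize the data structures needed to perform the operation
            self.block_no, self.offset = 0, 0

        def GetNext(self):
            # return the next tuple, or None (standing in for NotFound)
            while self.block_no < len(self.blocks):
                block = self.blocks[self.block_no]
                if self.offset < len(block):
                    t = block[self.offset]
                    self.offset += 1
                    return t
                self.block_no, self.offset = self.block_no + 1, 0
            return None

        def Close(self):
            pass                        # nothing to release in this sketch

    scan = TableScan([[(1, "a"), (2, "b")], [(3, "c")]])
    scan.Open()
    while (t := scan.GetNext()) is not None:
        print(t)
    scan.Close()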
             Query Execution
185




      One-Pass Algorithms for
       Database Operations (15.2)
          One-Pass Algorithm Methods
186




         Tuple-at-a-time, unary operations (selection &
          projection)
         Full-relation, unary operations (duplicate elimination
          & grouping)
         Full-relation, binary operations (set & bag versions of
          union, intersection, difference, product, and join)
          One-Pass Algorithms for Tuple-
187
              at-a-Time Operations
         Tuple-at-a-time operations are selection and projection
             read the blocks of R one at a time into an input buffer
             perform the operation on each tuple
             move the selected tuples or the projected tuples to the
              output buffer


         The disk I/O requirement for this process depends only
          on how the argument relation R is provided.
             If R is initially on disk, then the cost is whatever it takes to
              perform a table-scan or index-scan of R.
      A selection or projection being
188
        performed on a relation R
      One-Pass Algorithms for Unary,
189
         full Relation Operations
         Duplicate Elimination
             To eliminate duplicates, we can read each block of R
              one at a time, but for each tuple we need to make a
              decision as to whether:
               1.   It is the first time we have seen this tuple, in which case
                    we copy it to the output, or
               2.   We have seen the tuple before, in which case we must
                    not output this tuple.
             One memory buffer holds one block of R's tuples, and
              the remaining M - 1 buffers can be used to hold a single
              copy of every tuple.
      Managing memory for a one-
190
       pass duplicate-elimination
                    Duplicate Elimination
191


         When a new tuple from R is considered, we compare it
          with all tuples seen so far
             if it has not been seen before: we copy it both to the output and to
              the in-memory list of tuples we have seen.
             if there are n tuples in main memory: each new tuple takes
              processor time proportional to n, so the complete operation
              takes processor time proportional to n².
         We need a main-memory structure that allows each of
          the operations:
             Add a new tuple, and
             Tell whether a given tuple is already there
      Duplicate Elimination (…contd.)
192


         The different structures that can be used for such main
          memory structures are:
             Hash table
             Balanced binary search tree
           Each of these structures has some space
              overhead in addition to the space needed to store
              the tuples. For example, a main-memory hash
              table needs a bucket array and space for pointers
              to link the tuples in a bucket
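A minimal one-pass δ(R) sketch (illustrative; a Python set plays the role of the main-memory hash table, and the relation is supplied block by block):

    def one_pass_distinct(blocks):
        """One-pass duplicate elimination: keep one copy of every tuple seen."""
        seen = set()                    # stands in for the M-1 buffers of tuples
        for block in blocks:            # read R one block at a time
            for t in block:
                if t not in seen:       # first time we have seen this tuple
                    seen.add(t)
                    yield t             # copy it to the output

    R = [[(30, "foo"), (30, "bar")], [(30, "foo"), (40, "baz")]]
    print(list(one_pass_distinct(R)))
    # [(30, 'foo'), (30, 'bar'), (40, 'baz')]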
      One-Pass Algorithms for Unary,
193
         full Relation Operations
         Grouping
             The grouping operation gives us zero or more grouping
              attributes and presumably one or more aggregated
              attributes

             If we create in main memory one entry for each group then
              we can scan the tuples of R, one block at a time.

             The entry for a group consists of values for the grouping
              attributes and an accumulated value or values for each
              aggregation.
                             Grouping
194


         The accumulated value is:
             For MIN(a) or MAX(a) aggregate, record minimum
              /maximum value, respectively.
             For any COUNT aggregation, add 1 for each tuple of
              group.
             For SUM(a), add value of attribute a to the accumulated
              sum for its group, provided a is not NULL
             AVG(a) is a harder case. We must maintain two
              accumulations: the count of tuples in the group and the
              sum of the a-values of those tuples (the same accumulations
              used for COUNT and SUM). After checking all the
              tuples of R, take the quotient of the sum and the count to obtain the
              average.
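A sketch of one-pass grouping with the COUNT/SUM accumulations used for AVG (illustrative; a Python dict holds the in-memory group entries, and all groups are assumed to fit in memory):

    def one_pass_group_avg(blocks, group_pos, agg_pos):
        """Compute AVG(agg) per group in one pass: keep (count, sum) per group."""
        acc = {}                              # group value -> [count, sum]
        for block in blocks:
            for t in block:
                entry = acc.setdefault(t[group_pos], [0, 0])
                entry[0] += 1                 # COUNT accumulation
                entry[1] += t[agg_pos]        # SUM accumulation
        # AVG = SUM / COUNT, taken after all tuples of R have been read
        return {g: s / c for g, (c, s) in acc.items()}

    R = [[("a", 4), ("b", 10)], [("a", 6), ("b", 20), ("a", 2)]]
    print(one_pass_group_avg(R, 0, 1))        # {'a': 4.0, 'b': 15.0}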
      One-Pass Algorithms for Binary
195
               Operations
          Binary operations use two operands for their evaluation.
           They include:
              Union
              Intersection
              Difference
              Product
              Join
          The difference between a set and a bag:
           a set does not allow duplicate elements;
           a bag allows duplicate elements.
                           Set Union
196


         We read S into M - 1 buffers of main memory and
          build a search structure where the search key is the
          entire tuple.

         All these tuples are also copied to the output.

         Read each block of R into the Mth buffer, one at a
          time.

         For each tuple t of R, see if t is in S, and if not, we
          copy t to the output. If t is also in S, we skip t.
                     Set Intersection
197


         Read S into M - 1 buffers and build a search structure
          with full tuples as the search key.

         Read each block of R, and for each tuple t of R, see if
          t is also in S. If so, copy t to the output, and if not,
          ignore t.
                        Set Difference
198

         Read S into M - 1 buffers and build a search structure
          with full tuples as the search key.
         To compute R -s S, read each block of R and examine
          each tuple t on that block. If t is in S, then ignore t; if it
          is not in S then copy t to the output.
         To compute S -s R, read the blocks of R and examine
          each tuple t in turn. If t is in S, then delete t from the
          copy of S in main memory, while if t is not in S do
          nothing.
         After considering each tuple of R, copy to the output
          those tuples of S that remain.
                    Bag Intersection
199

         Read S into M - 1 buffers.
         Multiple copies of a tuple t are not stored individually.
          Rather store 1 copy of t & associate with it a count
          equal to no. of times t occurs.
         Next, read each block of R, and for each tuple t of R
          see whether t occurs in S. If not, ignore t; it cannot
          appear in the intersection. If t appears in S, and the count
          associated with t is positive, then output t and decrement the
          count by 1. If t appears in S, but the count has reached 0,
          then do not output t; we have already produced as
          many copies of t in the output as there were copies in S.
                      Bag Difference
200


         To compute S -B R, read tuples of S into main
          memory & count no. of occurrences of each distinct
          tuple.

         Then read R; check each tuple t to see whether t
          occurs in S, and if so, decrement its associated
          count. At the end, copy to output each tuple in main
          memory whose count is positive, & no. of times we
          copy it equals that count.

         To compute R -B S, read tuples of S into main
          memory & count no. of occurrences of distinct tuples.
              Bag Difference (…contd.)
201


         Think of a tuple t with a count of c as c reasons not to
          copy t to the output as we read tuples of R.

         Read a tuple t of R; check if t occurs in S. If not, then
          copy t to the output. If t does occur in S, then we look at
          current count c associated with t. If c = 0, then copy t to
          output. If c > 0, do not copy t to output, but decrement c
          by 1.
                            Product
202


         Read S into M - 1 buffers of main memory

         Then read each block of R, and for each tuple t of R
          concatenate t with each tuple of S in main memory.

         Output each concatenated tuple as it is formed.

         This algorithm may take a considerable amount of
          processor time per tuple of R, because each such
          tuple must be matched with M - 1 blocks full of tuples.
          However, output size is also large, & time/output tuple
          is small.
                           Natural Join
203


         Convention: R(X, Y) is being joined with S(Y, Z), where
          Y represents all the attributes that R and S have in
          common, X is all attributes of R that are not in the
          schema of S, & Z is all attributes of S that are not in the
          schema of R. Assume that S is the smaller relation.

         To compute the natural join, do the following:
           1. Read all tuples of S & form them into a main-
               memory search structure.
               Hash table or balanced tree are good e.g. of such
               structures. Use M - 1 blocks of memory for this
               purpose.
             Natural Join (…contd.)
204


      2.   Read each block of R into 1 remaining main-
           memory buffer.
           For each tuple t of R, find tuples of S that agree
           with t on all attributes of Y, using the search
           structure.
           For each matching tuple of S, form a tuple by
           joining it with t, & move resulting tuple to output.
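A condensed sketch of this one-pass join (illustrative only; Python dicts stand in for the main-memory search structure, the Y-value is extracted by small helper functions, and for simplicity the common Y-value appears twice in each output tuple):

    from collections import defaultdict

    def one_pass_join(R, S, y_of_r, y_of_s):
        """Build a search structure on the smaller relation S, then stream R."""
        index = defaultdict(list)               # Y-value -> tuples of S
        for s in S:
            index[y_of_s(s)].append(s)
        for r in R:                             # for each tuple t of R ...
            for s in index.get(y_of_r(r), []):  # ... find the matching S tuples
                yield r + s                     # join them and send to output

    # R(X, Y) with Y in position 1; S(Y, Z) with Y in position 0.
    R = [("x1", 1), ("x2", 2)]
    S = [(1, "z1"), (1, "z2"), (3, "z3")]
    print(list(one_pass_join(R, S, lambda r: r[1], lambda s: s[0])))
    # [('x1', 1, 1, 'z1'), ('x1', 1, 1, 'z2')]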
                                  205




                     15.3
              Nested-Loop Joins
By:
Saloni Tamotia (215)
            Introduction to Nested-Loop
206
                       Joins
          Used for relations of any size.
          It is not necessary that a relation fit in main memory.
           Uses a "one-and-a-half" pass method in which,
           for each variation:
             One argument is read just once.
             The other argument is read repeatedly.
         Two kinds:
            Tuple-Based Nested-Loop Join
            Block-Based Nested-Loop Join
      ADVANTAGES OF NESTED-LOOP
207
                JOIN


       Fits in the iterator framework.
       Allows us to avoid storing

        intermediate relation on disk.
       Tuple-Based Nested-Loop Join
208




       Simplest variation of the
      nested-loop join

       Loop ranges over individual
      tuples
          Tuple-Based Nested-Loop Join
209



          Algorithm to compute the join R(X,Y) ►◄ S(Y,Z)

          FOR each tuple s in S DO
              FOR each tuple r in R DO
                  IF r and s join to make a tuple t THEN
                      output t

         R and S are two relations with r and s as
          tuples.
          Carelessness in the buffering of blocks can cause this
          algorithm to use T(R)T(S) disk I/O's
                     IMPROVEMENT &
210
                      MODIFICATION

                    To decrease the cost
         Method 1: Use algorithm for Index-Based
          joins
           We find the tuples of R that match a given tuple of
            S
           We need not read the entire relation R


         Method 2: Use algorithm for Block-Based
          joins
          Block-Based Nested-Loop Join
                   Algorithm
211




         Access to the arguments is organized by
          block.
            While reading tuples of the inner relation, we
             use fewer disk I/O's.

         Use enough space in main memory to
          store a chunk of tuples of the outer-loop relation.
            This allows each tuple of the inner
             relation to be joined with as many tuples as possible.
      Block-Based Nested-Loop Join
212
               Algorithm
      ALGORITHM:
      FOR each chunk of M-1 blocks of S DO
          read these blocks into main-memory buffers;
          FOR each block b of R DO
              read b into main memory;
              FOR each tuple t of b DO
                  find the tuples of S in main memory that join with t;
                  output the join of t with each of these tuples;
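A runnable sketch of the same loop structure (illustrative; relations are lists of blocks, M is the number of available buffers, and the join condition is equality on the given attribute positions):

    def block_nested_loop_join(S_blocks, R_blocks, M, key_s, key_r):
        """Outer loop over chunks of M-1 blocks of S; inner loop over blocks of R."""
        out, chunk = [], M - 1
        for i in range(0, len(S_blocks), chunk):
            # read this chunk of S into "main memory" and index it on the join key
            in_memory = {}
            for block in S_blocks[i:i + chunk]:
                for s in block:
                    in_memory.setdefault(s[key_s], []).append(s)
            for block in R_blocks:              # read each block b of R
                for t in block:                 # for each tuple t of b
                    for s in in_memory.get(t[key_r], []):
                        out.append(t + s)       # output the join of t with s
        return out

    S = [[(1, "s1")], [(2, "s2")]]
    R = [[("r1", 1), ("r2", 2)], [("r3", 1)]]
    print(block_nested_loop_join(S, R, M=2, key_s=0, key_r=1))
    # [('r1', 1, 1, 's1'), ('r3', 1, 1, 's1'), ('r2', 2, 2, 's2')]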
      Block-Based Nested-Loop Join
213
               Algorithm

         Assumptions:
           B(S) ≤ B(R)
           B(S) > M


           This means that neither relation fits
            entirely in main memory.
           Analysis of Nested-Loop Join
214



          Number of disk I/O’s:

      ⌈B(S)/(M-1)⌉ * (M-1 + B(R))

      or, equivalently,

      B(S) + B(S)B(R)/(M-1)

      which is approximately B(S)B(R)/M.

          Assuming M, B(S), and B(R) are all large, but
           M is the smallest of these, the cost is
           ~B(S)B(R)/M,
           i.e. the cost is proportional to the product
           of the sizes of the two relations, divided by
           the amount of available memory.
                  215




TWO-PASS ALGORITHMS
BASED ON SORTING
 SECTION 15.4
          Two-Pass Algorithms Based on
216
                     Sorting
         Two-pass Algorithms: where data from the
          operand relations is read into main memory,
          processed in some way, written out to disk again,
          and then reread from disk to complete the
          operation
                         Basic idea
217


         Step 1: Read M blocks of R into main memory.
         Step 2: Sort these M blocks in main memory, using
          an efficient main-memory sorting algorithm, so
          we expect that the time to sort will not exceed the
          disk I/O time for step (1).
         Step 3: Write the sorted list into M blocks of disk. (A sketch of
          both phases follows.)
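A toy sketch of the two phases (illustrative; a "block" is a small list, M blocks are sorted at a time, and heapq.merge stands in for the second-pass merge of the sorted sublists; the data is the 17-tuple example relation used a few slides below):

    import heapq

    def make_sorted_sublists(blocks, M):
        """Phase 1: read M blocks at a time, sort them, 'write' each sublist."""
        sublists = []
        for i in range(0, len(blocks), M):
            chunk = [t for block in blocks[i:i + M] for t in block]
            sublists.append(sorted(chunk))
        return sublists

    def merge_sublists(sublists):
        """Phase 2: repeatedly take the least element among the sublist fronts."""
        return list(heapq.merge(*sublists))

    R = [[2, 5], [2, 1], [2, 2], [4, 5], [4, 3], [4, 2], [1, 5], [2, 1], [3]]
    subs = make_sorted_sublists(R, M=3)
    print(subs)    # [[1, 2, 2, 2, 2, 5], [2, 3, 4, 4, 4, 5], [1, 1, 2, 3, 5]]
    print(merge_sublists(subs))    # the fully sorted relation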
            Duplicate Elimination Using
218
                    Sorting δ(R)
         First sort the tuples of R into sorted sublists
         Then use the available main memory to hold one
          block from each sorted sublist
         Then repeatedly take the smallest remaining tuple, copy it once to
          the output, and ignore all tuples identical to it.
         The total cost of this algorithm is 3B(R) disk I/O's
         This algorithm requires only √B(R) blocks of main
          memory, rather than B(R) blocks (one-pass
          algorithm).
                          Example
219


         Suppose that tuples are integers, and only two
          tuples fit on a block. Also, M = 3 and the relation R
          consists of 17 tuples:
           2,5,2,1,2,2,4,5,4,3,4,2,1,5,2,1,3
         After first-pass

                          Sublists            Elements
                            R1                1,2,2,2,2,5
                            R2                2,3,4,4,4,5
                            R3                 1,1,2,3,5
                                  Example
220

         Second pass
                        Sublist     In memory   Waiting on disk
                          R1           1,2         2,2,2,5
                          R2           2,3         4,4,4,5
                          R3           1,1          2,3,5

      After processing tuple 1
                        Sublist     In memory   Waiting on disk
                          R1           2           2,2,2,5
                          R2           2,3         4,4,4,5
                          R3           2,3            5
         Output: 1

      Continue the same process with the next tuple.
      Grouping and Aggregation Using
221
                Sorting γ(R)
         Two-pass algorithm for grouping and aggregation is quite
          similar to the previous algorithm.
         Step 1: Read the tuples of R into memory, M blocks at a
          time. Sort each group of M blocks, using the grouping attributes L
          as the sort key. Write each sorted sublist to disk.
         Step 2: Use one main-memory buffer for each sublist, and
          initially load the first block of each sublist into its buffer.
         Step 3: Repeatedly find the least value of the sort key
          (grouping attributes) present among the first available
          tuples in the buffers, and accumulate the aggregates for that group.
         This algorithm takes 3B(R) disk I/O's, and will work as long
          as B(R) ≤ M².
          A Sort-Based Union Algorithm
222

         For bag-union one-pass algorithm is used.
         For set-union
          ◦   Step 1:Repeatedly bring M blocks of R into main memory, sort their
              tuples, and write the resulting sorted sublist back to disk.
          ◦   Step 2:Do the same for S, to create sorted sublists for relation S.
          ◦   Step 3:Use one main-memory buffer for each sublist of R and S.
              Initialize each with the first block from the corresponding sublist.
          ◦   Step 4: Repeatedly find the first remaining tuple t among all the
              buffers, copy t to the output, and remove from the buffers all copies
              of t (if R and S are sets there should be at most two copies).
         This algorithm takes 3(B(R)+B(S)) disk I/O's, and will work as long
          as B(R)+B(S) ≤ M².
          Sort-Based Intersection and
223
                  Difference
         For both set and bag version, the algorithm is same as
          that of set-union except that the way we handle the
          copies of a tuple t at the fronts of the sorted sublists.
         For set intersection, output t if it appears in both R
          and S.
         For bag intersection, output t the minimum of the
          number of times it appears in R and in S.
         For set difference, R-S, output t if and only if it
          appears in R but not in S.
         For bag difference, R-S, output t the number of times it
          appears in R minus the number of times it appears in
          S.
      A Simple Sort-Based Join Algorithm
224


         When taking a join, the tuples of the two relations that
          share a common value of the join attribute(s) must be in
          main memory simultaneously, and their number can exceed
          what fits in memory
         To avoid this situation, try to reduce the main-memory
          use for other aspects of the algorithm, so that a large number
          of buffers is available to hold the tuples with a given
          join-attribute value
      A Simple Sort-Based Join Algorithm
225

         Given relations R(X, Y) and S(Y, Z) to join, and given M blocks of main
          memory for buffers.
         Step 1:Sort R and S, using a two-phase, multiway merge sort, with Y as
          the sort key.
         Step 2:Merge the sorted R and S. The following steps are done
          repeatedly:
          ◦   Find the least value y of the join attributes Y that is currently at the front of the
              blocks for R and S.
          ◦   If y does not appear at the front of the other relation, then remove the
              tuple(s) with sort key y.
          ◦   Otherwise, identify all the tuples from both relations having sort key y.
          ◦   Output all the tuples that can be formed by joining tuples from R and S with a
              common Y-value y.
          ◦   If either relation has no more unconsidered tuples in main memory, reload the
              buffer for that relation.
      A Simple Sort-Based Join Algorithm
226


         The simple sort-join uses 5(B(R) + B(S)) disk I/O's.
         It requires B(R) ≤ M² and B(S) ≤ M² to work.
      A More Efficient Sort-Based Join
227

         If we do not have to worry about very large numbers of
          tuples with a common value for the join attribute(s), then
          we can save two disk I/O's per block by combining the
          second phase of the sorts with the join itself
         To compute R(X, Y) ►◄ S(Y, Z) using M main-memory
          buffers
             Create sorted sublists of size M, using Y as the sort key, for both
              R and S.
             Bring the first block of each sublist into a buffer
             Repeatedly find the least Y-value y among the first available
              tuples of all the sublists. Identify all the tuples of both relations
              that have Y-value y. Output the join of all tuples from R with all
              tuples from S that share this common Y-value
      A More Efficient Sort-Based Join
228


         The number of disk I/O’s is 3(B(R) + B(S))
         It requires B(R) + B(S) ≤ M² to work
      Summary of Sort-Based Algorithms
229




             Operators             Approximate M required      Disk I/O
              γ, δ                 √B(R)                       3B(R)
              ∪, ∩, −              √(B(R) + B(S))              3(B(R) + B(S))
              ►◄ (simple)          √(max(B(R), B(S)))          5(B(R) + B(S))
              ►◄ (more efficient)  √(B(R) + B(S))              3(B(R) + B(S))
                                230




          QUERY
        EXECUTION
     15.5 TWO-PASS ALGORITHMS
By        BASED ON HASHING
Swathi Vegesna
231


      If the data is too big to store in main memory
        buffers, hashing is done
        Hash  all the tuples of the argument(s) using an
         appropriate hash key.
        For all the common operations, there is a way to
         select the hash key so all the tuples that need to
         be considered together when we perform the
         operation have the same hash value.
        This reduces the size of the operand(s) by a
         factor equal to the number of buckets.
                   Partitioning Relations by
232                         Hashing
      Algorithm:

      initialize M-1 buckets using M-1 empty buffers;
      FOR each block b of relation R DO BEGIN
           read block b into the Mth buffer;
           FOR each tuple t in b DO BEGIN
                IF the buffer for bucket h(t) has no room for t THEN BEGIN
                     copy the buffer to disk;
                     initialize a new empty block in that buffer;
                END;
                copy t to the buffer for bucket h(t);
           END;
      END;
      FOR each bucket DO
           IF the buffer for this bucket is not empty THEN
                write the buffer to disk;
                Duplicate Elimination
233


       For the operation δ(R) hash R to M-1 Buckets.
      (Note that two copies of the same tuple t will hash to the
        same bucket)
       duplicate elimination can be done on each bucket Ri

        independently, using one-pass algorithm
       The result is the union of δ(Ri), where Ri is the portion of

        R that hashes to the ith bucket
                         Requirements
234


         Number of disk I/O's: 3*B(R)
      The two-pass, hash-based algorithm works only if
          B(R) ≤ M(M-1), i.e., approximately B(R) ≤ M²
         In order for this to work, we need:
           the hash function h to distribute the tuples evenly
            among the buckets
           each bucket Ri to fit in main memory (to allow the
            one-pass algorithm)
             Grouping and Aggregation
235


         Hash all the tuples of relation R to M-1 buckets, using a
          hash function that depends only on the grouping
          attributes
          (Note: all tuples in the same group end up in the same
          bucket)
         Use the one-pass algorithm to process each bucket
          independently
         Uses 3*B(R) disk I/O's, requires B(R) ≤ M²
              Union, Intersection, and
236
                     Difference
         For a binary operation we use the same hash
          function to hash the tuples of both arguments.
         For R ∪ S, R ∩ S, and R − S, hash both R and S, each
          to M-1 buckets, giving 2(M-1) buckets in all, and apply
          the one-pass algorithm to each pair of buckets (Ri, Si).
         Requires 3(B(R)+B(S)) disk I/O's.
         The two-pass hash-based algorithms require
          min(B(R), B(S)) ≤ M²
                  Hash-Join Algorithm
237


         Use same hash function for both relations; hash function
          should depend only on the join attributes

         Hash R to M-1 buckets R1, R2, …, RM-1
         Hash S to M-1 buckets S1, S2, …, SM-1
         Do one-pass join of Ri and Si, for all i
         3*(B(R) + B(S)) disk I/O's; min(B(R), B(S)) ≤ M²
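A compact sketch of the whole hash join (illustrative; the partitioning is done in memory here, whereas the real two-pass algorithm writes each bucket to disk and then joins the bucket pairs with the one-pass algorithm):

    from collections import defaultdict

    def partition(relation, key, num_buckets):
        """Hash each tuple on the join attribute into one of M-1 buckets."""
        buckets = defaultdict(list)
        for t in relation:
            buckets[hash(t[key]) % num_buckets].append(t)
        return buckets

    def hash_join(R, S, key_r, key_s, M=3):
        Rb, Sb = partition(R, key_r, M - 1), partition(S, key_s, M - 1)
        for i in Rb:                               # one-pass join of Ri and Si
            index = defaultdict(list)
            for s in Sb.get(i, []):
                index[s[key_s]].append(s)
            for r in Rb[i]:
                for s in index[r[key_r]]:
                    yield r + s

    R = [("r1", 1), ("r2", 2), ("r3", 1)]
    S = [(1, "s1"), (2, "s2"), (3, "s3")]
    print(list(hash_join(R, S, key_r=1, key_s=0)))
    # e.g. [('r1', 1, 1, 's1'), ('r3', 1, 1, 's1'), ('r2', 2, 2, 's2')]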
            Sort based Vs Hash based
238


         For binary operations, hash-based only limits
          size to min of arguments, not sum

         Sort-based can produce output in sorted order,
          which can be helpful

         Hash-based depends on buckets being of
          equal size

         Sort-based algorithms can experience reduced
          rotational latency or seek time
      Index-Based Algorithms
239




              Chapter 15
              Section 15.6
              Clustering and Nonclustering
240
                         Indexes
         Clustered relation: the tuples are packed into roughly
          as few blocks as can possibly hold those
          tuples

         Clustering index: an index on an attribute such that
          all the tuples with a fixed value for the search
          key of this index appear on roughly as few
          blocks as can hold them
          Clustering and Nonclustering
241
                     Indexes
         A relation that isn’t clustered cannot have a
          clustering index



         A clustered relation can have nonclustering
          indexes
                Index-Based Selection
242


         A selection σC(R) can always be computed by reading all the tuples
          of R. But suppose C is a condition of the form a = v,
          where a is an attribute and v is a value, and there is an index on a.
         Then one can search the index with the value
          v and get pointers to exactly those tuples of R
          that have a-value v.
         These tuples constitute the result of
          σa=v(R), so we retrieve only them.
         For a clustering index on R.a:
           the number of disk I/O's will be about B(R)/V(R,a)
                Index-Based Selection
243


         The actual number may be higher for several
          reasons:
               1. The index is not kept entirely in main
          memory, so some disk I/O's are needed to support
          the index lookup.
               2. Even though all the tuples with a = v might
          fit in b blocks, they could be spread over b+1
          blocks because they don't start at the beginning
          of a block.
               3. R may not be packed as tightly as possible into
          blocks.
                          Example
244


         B(R)=1000, T(R)=20,000; number of I/O's
          required:
         1. clustered, no index                         1000
         2. not clustered, no index                   20,000
         3. If V(R,a)=100, index is clustering             10
         4. If V(R,a)=10, index is nonclustering        2,000
             Joining by Using an Index
245


         Natural join R(X, Y) ►◄ S(Y, Z), with an index on Y for S
         Recall that X, Y, Z are sets of attributes.
             Number of I/O's to get R
                    Clustered: B(R)
                    Not clustered: T(R)
             Number of I/O's to get the matching tuples of S, over all tuples t of R
                    Clustering index: T(R)B(S)/V(S,Y)
                    Nonclustering index: T(R)T(S)/V(S,Y)
                           Example
246


         R(X,Y): 1000 blocks; S(Y,Z): 500 blocks
          Assume 10 tuples in each block,
          so T(R)=10,000 and T(S)=5,000
          V(S,Y)=100
          If R is clustered, and there is a clustering index
          on Y for S:
          the number of I/O's for R is 1,000;
          the number of I/O's for S
          is 10,000*500/100 = 50,000
            Joins Using a Sorted Index
247


         If the index is a B-tree, or any other structure from
          which we can easily extract the tuples of a
          relation in sorted order, we get a number of
          additional opportunities to use the index.
         Natural join R(X, Y) ►◄ S(Y, Z) with an index on Y for
          either R or S
         Extreme case: zig-zag join
         Example: relations R(X,Y) and S(Y,Z) with an index on
          Y for both relations
          search keys (Y-values) for R: 1,3,4,4,5,6
          search keys (Y-values) for S: 2,2,4,6,7,8
                    248




  CHAPTER 15.7
BUFFER MANAGEMENT
        What does a buffer manager do?
249



      Assume there are M main-memory buffers needed for
         the operators on relations to store needed data.
      In practice:
      1) buffers are rarely allocated in advance to the operator,
      2) the value of M may vary depending on system
         conditions.
         The central task of making main-memory buffers
         available to processes, such as queries, that act
         on the database is given to the buffer manager.
      Therefore, the buffer manager is used to allow processes
         to get the memory they need, while minimizing the
         delay and the number of unsatisfiable requests.
250
       The role of the buffer manager


      [Figure 1 omitted: the buffer manager sits between the requests from
       query-processing components and the disk, issuing reads and writes
       between its main-memory buffers and the disk blocks.]

      Figure 1: The role of the buffer manager : responds to requests for
      main-memory access to disk blocks
          15.7.1 Buffer Management Architecture

251




      Two broad architectures for a buffer manager:
      1) The buffer manager controls main memory directly.
          • Relational DBMS

      2) The buffer manager allocates buffers in virtual memory,
         allowing the operating system to decide which buffers are
         actually in main memory at any time and which are
         in the "swap space" on disk that the operating system
         manages.
          • "main-memory" DBMS
          • "object-oriented" DBMS
                           Buffer Pool
252




      Key setting for the buffer manager to be efficient:

      The buffer manager should limit the number of buffers in
        use so that they fit in the available main memory, i.e.,
        it must not exceed the available space. When all buffers are in
        use and another block is needed, it has to select a buffer
        to empty, returning its contents to disk.

      The number of buffers is a parameter set when the DBMS is
        initialized.

      No matter which architecture of buffering is used, we simply
        assume that there is a fixed-size buffer pool, a set of
        buffers available to queries and other database actions.
253
                 Buffer Pool

      [Figure omitted: page requests from higher levels are served from the
       frames of the buffer pool in main memory; on a miss, a disk page is read
       into a free frame, and the choice of frame to evict is dictated by the
       replacement policy.]

         Data must be in RAM for the DBMS to operate on it!
         The buffer manager hides the fact that not all data is in RAM.
          15.7.2 Buffer Management
                  Strategies


      Buffer-replacement strategies:

      When a buffer is needed for a newly requested
       block and the buffer pool is full, which block should be
       thrown out of the buffer pool?




254
            Buffer-replacement strategy
255
                       -- LRU
      Least-Recently Used (LRU):

      Throw out the block that has not been read or written for
         the longest time. This method requires that the buffer
         manager maintain a table indicating the last time the
         block in each buffer was accessed.
         It also requires that each database access make an
         entry in this table, so there is significant effort in
         maintaining this information.
      • Requires more maintenance but it is effective.
           • Update the time table on every access.
           • The least-recently used blocks are the ones least likely to
              be accessed soon. (A sketch of an LRU buffer pool follows.)
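A minimal LRU buffer-pool sketch (illustrative; collections.OrderedDict plays the role of the access-time table, and writing dirty buffers back to disk is omitted):

    from collections import OrderedDict

    class LRUBufferPool:
        """Keep at most `capacity` blocks; evict the least-recently-used one."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.buffers = OrderedDict()          # block id -> block contents

        def access(self, block_id, read_block):
            if block_id in self.buffers:          # hit: mark as most recently used
                self.buffers.move_to_end(block_id)
                return self.buffers[block_id]
            if len(self.buffers) >= self.capacity:
                evicted, _ = self.buffers.popitem(last=False)   # evict the LRU block
                print("evicting block", evicted)
            self.buffers[block_id] = read_block(block_id)       # load from "disk"
            return self.buffers[block_id]

    pool = LRUBufferPool(capacity=2)
    for b in ["b1", "b2", "b1", "b3"]:            # b2, not b1, is evicted
        pool.access(b, read_block=lambda bid: "contents of " + bid)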
       Buffer-replacement strategy
256               -- FIFO
      First-In-First-Out (FIFO):

        The buffer that has been occupied the longest by the same
        block is emptied and used for the new block.

      • Requires less maintenance but it can make more mistakes.
         • In this approach, the buffer manager needs to know only
           the time at which the block is read from disk, and there
           is no need to modify the table when the block is
           accessed. Keep only the loading time
         • The oldest block is not necessarily less likely to be
           accessed than other blocks. Example: the root block of a B-tree index.
      Buffer-replacement strategy
257
               – “Clock”
      The “Clock” Algorithm (“Second Chance”)
      An efficient approximation to LRU.
      Think of the buffers as arranged in a circle, as shown in
         Figure 3 (which uses 8 buffers).

       Flags 0 and 1:
       buffers with a 0 flag are OK to send their contents back
       to disk, i.e., OK to be replaced;
       buffers with a 1 flag are not OK to be replaced.
      Buffer-replacement strategy
258            – “Clock”
          Figure 3: The clock algorithm. The buffers form a circle, each
          marked with a flag of 0 or 1. Starting from the hand's current
          position, the search rotates clockwise for a buffer with a 0
          flag, which is then replaced. Each 1 flag passed along the way
          is set to 0; if that buffer's contents are not accessed again
          by the time the hand returns to it, its flag is still 0 and it
          will be replaced. That is the "second chance".
       Buffer-replacement strategy --
                   Clock
259




      A buffer's flag is set to 1 when:
       a block is read into the buffer, or
       the contents of the buffer are accessed.

      A buffer's flag is set to 0 when:
       the buffer manager needs a buffer for a new block: it
       looks for the first 0 it can find, rotating clockwise, and if it
       passes 1's, it sets them to 0.
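
      A sketch of the clock algorithm with the same hypothetical policy
      interface (Python): loading or accessing a buffer sets its flag to 1;
      the hand rotates clockwise, clearing 1's, and replaces the first 0 it
      finds.

          # "Second chance" (clock) policy sketch.
          class ClockPolicy:
              def __init__(self):
                  self.ring = []                    # block_ids arranged in a circle
                  self.flag = {}                    # block_id -> 0 or 1
                  self.hand = 0

              def note_load(self, block_id):
                  self.ring.append(block_id)
                  self.flag[block_id] = 1           # flag set to 1 when the block is read in

              def note_access(self, block_id):
                  self.flag[block_id] = 1           # flag set to 1 when the contents are accessed

              def choose_victim(self, frames):
                  while True:
                      block_id = self.ring[self.hand % len(self.ring)]
                      if self.flag[block_id] == 0:  # OK to replace
                          self.ring.remove(block_id)
                          del self.flag[block_id]
                          return block_id
                      self.flag[block_id] = 0       # give the buffer its second chance
                      self.hand += 1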
      System Control helps Buffer-
         replacement strategy
260




      System Control
      The query processor or other components of a DBMS can
      give advice to the buffer manager in order to avoid some
      of the mistakes that would occur with a strict policy such
      as LRU, FIFO, or Clock.
      For example:
      A "pinned" block is one that cannot be moved to disk
      without first modifying certain other blocks that point to it.
      In FIFO, pinning can be used to force the root of a B-tree to
      remain in memory at all times.
      15.7.3 The Relationship Between
261
      Physical Operator Selection and
             Buffer Management
         Problem:
         A physical operator expects a certain number of
         buffers M for its execution.
         However, the buffer manager may not be able
         to guarantee that these M buffers are available.
      15.7.3 The Relationship Between
      Physical Operator Selection and
262

             Buffer Management
        Questions:

        Can the algorithm adapt to changes in M, the
        number of main-memory buffers available?

        When fewer than M buffers are available, some
        blocks have to be left on disk instead of in memory.

        How does the buffer-replacement strategy impact
        performance (i.e., the number of additional I/O's)?
                                   Example
263



      FOR each chunk of M-1 blocks of S DO BEGIN
          read these blocks into main-memory buffers;
          organize their tuples into a search structure whose
              search key is the common attributes of R and S;
          FOR each block b of R DO BEGIN
              read b into main memory;
              FOR each tuple t of b DO BEGIN
                  find the tuples of S in main memory that join with t;
                  output the join of t with each of these tuples;
              END;
          END;
      END;


      Figure 15.8: The nested-loop join algorithm
      15.7.3 The Relationship Between Physical
      Operator Selection and Buffer Management
264




        The chunk size M-1 depends on the average number
        of buffers available at each iteration.

        The outer loop uses M-1 buffers, and 1 buffer is reserved for a
        block of R, the relation of the inner loop.

        If we pin the M-1 blocks we use for S on one iteration of the
        outer loop, we shall not lose their buffers during the round.

        Also, more buffers may become available and then we could
        keep more than one block of R in memory.

        Will these extra buffers improve the running time?
265
                                Example

      CASE1: NO

      Buffer-replacement strategy: LRU
      Buffers for R: k
      We read each block of R in order into buffers.
      By end of the iteration of the outer loop, the last k blocks of R
      are in buffers.
      However, next iteration will start from the beginning of R
      again.
      Therefore, the k buffers for R will need to be replaced.
266
                               Example

      CASE 2: YES

      Buffer-replacement strategy: LRU
      Buffers for R: k
      We read the blocks of R in an order that alternates:
      first-to-last and then last-to-first.
      In this way, we save k disk I/Os on each iteration of the outer
      loop except the first iteration.
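
      A quick way to see the difference is to count block reads of R under
      LRU (a toy Python simulation with hypothetical parameters, not code
      from the textbook):

          # Count reads of R's blocks with k buffers reserved for R, under LRU.
          def reads_of_R(num_blocks_R, k, iterations, alternate):
              buffers, reads = [], 0                # buffers: most recently used at the end
              order = list(range(num_blocks_R))
              for i in range(iterations):
                  scan = order if (not alternate or i % 2 == 0) else list(reversed(order))
                  for b in scan:
                      if b in buffers:
                          buffers.remove(b)         # hit: refresh its LRU position
                      else:
                          reads += 1                # miss: one disk I/O
                          if len(buffers) == k:
                              buffers.pop(0)        # evict the least recently used block
                      buffers.append(b)
              return reads

          # With 1,000 blocks of R, k = 100, and 10 iterations of the outer loop:
          # reads_of_R(1000, 100, 10, alternate=False) -> 10000 (every block reread)
          # reads_of_R(1000, 100, 10, alternate=True)  ->  9100 (k reads saved per later iteration)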
           Other Algorithms and M buffers
267




      Other algorithms are also impacted by M and the
      buffer-replacement strategy.

      Sort-based algorithms
              If M shrinks, we can change the size of a
      sublist.
              Unexpected result: there may be too many sublists to
      allocate each sublist a buffer.

      Hash-based algorithms
               If M shrinks, we can reduce the number of
      buckets, as long as the buckets still fit in M
      buffers.
                                268




15.8 ALGORITHMS USING
MORE THAN TWO PASSES


              Presented by: Kai Zhu
              Professor: Dr. T.Y. Lin
                      Class ID: 220
269


      Reason that we use more than two passes:

      Two passes are usually enough; however, for
      the largest relations, we use as many passes as
      necessary.
270



      MULTI-PASS SORT-BASED
      ALGORITHMS

      Suppose we have M main-memory buffers
      available to sort a relation R, which we
      assume is stored clustered.

      Then we do the following:
271




      BASIS:
      If R fits in M blocks (i.e., B(R) <= M):
      1. Read R into main memory.
      2. Sort it using any main-memory
          sorting algorithm.
      3. Write the sorted relation to disk.
272
      INDUCTION:
      If R does not fit into main memory:
      1. Partition the blocks holding R into M
      groups, which we shall call R1, R2, ..., RM.
      2. Recursively sort Ri for each
          i = 1, 2, ..., M.
      3. Merge the M sorted sublists.
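
      A compact sketch of this recursion (Python; plain lists stand in for
      disk-resident sublists and "capacity" plays the role of M blocks of
      memory, so all block and buffer bookkeeping is omitted):

          import heapq

          def multipass_sort(records, capacity):
              if len(records) <= capacity:              # BASIS: fits in memory
                  return sorted(records)
              chunk = -(-len(records) // capacity)      # ceiling division: at most `capacity` pieces
              pieces = [records[i:i + chunk] for i in range(0, len(records), chunk)]
              sorted_pieces = [multipass_sort(p, capacity) for p in pieces]   # INDUCTION
              return list(heapq.merge(*sorted_pieces))  # merge the sorted sublists

          # multipass_sort([5, 3, 9, 1, 7, 2, 8], capacity=3) -> [1, 2, 3, 5, 7, 8, 9]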
273


      If we are not merely sorting R, but performing
      a unary operation such as δ or γ on R,
      we can modify the above so that at the final
      merge we perform the operation on the tuples
      at the front of the sorted sublists.
      That is:
274


         For a δ, output one copy of each distinct tuple,
          and skip over copies of the tuple.
         For a γ, sort on the grouping attributes only,
          and combine the tuples with a given value of
          these grouping attributes.
275


      Conclusion
      The two-pass algorithms based on sorting or
      hashing have natural recursive analogs that
      take three or more passes and work for
      larger amounts of data.
          Performance of Multipass, Sort-
276
                Based Algorithms
         BASIS: If k = 1, i.e., one pass is allowed, then we must
          have B(R) <= M. Put another way, s(M,1) = M.
         INDUCTION: Suppose k > 1. Then we partition R into M
          pieces, each of which must be sortable in k-1 passes.
         If B(R) = s(M,k), then s(M,k)/M, which is the size
          of each of the M pieces of R, cannot exceed s(M, k-1).
          That is, s(M,k) = M * s(M,k-1), and so s(M,k) = M^k.
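
      A short numeric illustration of this bound (Python; a worked example,
      not textbook code):

          def sortable_blocks(M, k):
              # s(M, k): the largest relation, in blocks, sortable in k passes with M buffers
              return M ** k

          def passes_needed(B, M):
              # smallest k such that M**k >= B(R)
              k, capacity = 1, M
              while capacity < B:
                  capacity *= M
                  k += 1
              return k

          # With M = 101 buffers: 1 pass sorts up to 101 blocks, 2 passes up to 10,201,
          # 3 passes up to 1,030,301; passes_needed(1_000_000, 101) == 3.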
277




             The Query Compiler

      16.1 Parsing and Preprocessing
          Query compilation is divided
                into three steps
      The query compiler performs the following operations:
      1. Parsing: parses the SQL query into a parse tree.
      2. Logical query plan: transforms the parse tree into
        an expression tree of relational algebra.
      3. Physical query plan: transforms the logical query plan
        into a physical query plan, which specifies:
          . the operations performed
          . the order of operations
          . the algorithm used for each operation
          . the way in which stored data is obtained and passed from one
            operation to another.
                     Query

                     Parser


                 Preprocessor


               Logical Query plan
                   generator



                 Query rewrite

                Preferred logical
                   query plan


279
      From a query to a logical query
                  plan
      Syntax Analysis and Parse Tree
       The parser takes the SQL query and converts it to a
      parse tree. The nodes of a parse tree are:

        1. Atoms: lexical elements such as
      keywords, constants, parentheses, operators,
      and other schema elements.

        2. Syntactic categories: names for subparts that play a
       similar role in a query, such as <Query> and <Condition>.

280
      Grammar for Simple Subset of SQL
  <Query> ::= <SFW>
  <Query> ::= (<Query>)
281

  <SFW> ::= SELECT <SelList> FROM <FromList> WHERE <Condition>

  <SelList> ::= <Attribute>,<SelList>
  <SelList> ::= <Attribute>

  <FromList> ::= <Relation>, <FromList>
  <FromList> ::= <Relation>

  <Condition> ::= <Condition> AND <Condition>
  <Condition> ::= <Tuple> IN <Query>
  <Condition> ::= <Attribute> = <Attribute>
  <Condition> ::= <Attribute> LIKE <Pattern>

  <Tuple> ::= <Attribute>


            Atoms (constants), <syntactic categories> (variables),
            ::= (can be expressed/defined as)
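
      A minimal sketch (Python, hypothetical names) of how parse-tree nodes
      for this grammar could be represented: atoms are leaves, and
      syntactic categories are interior nodes with children.

          from dataclasses import dataclass, field

          @dataclass
          class Node:
              label: str                  # an atom (e.g. 'SELECT', 'title') or a category (e.g. '<Query>')
              children: list = field(default_factory=list)

          # Skeleton of a parse tree for: SELECT title FROM StarsIn WHERE ...
          tree = Node("<Query>", [
              Node("<SFW>", [
                  Node("SELECT"), Node("<SelList>", [Node("<Attribute>", [Node("title")])]),
                  Node("FROM"),   Node("<FromList>", [Node("<Relation>", [Node("StarsIn")])]),
                  Node("WHERE"),  Node("<Condition>"),    # subtree for the condition omitted
              ])
          ])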
               Query and Parse Tree
282

      StarsIn(title,year,starName)

      MovieStar(name,address,gender,birthdate)

      Query:
      Give titles of movies that have at least one star born in
        1960

      SELECT title FROM StarsIn WHERE starName IN
      (
         SELECT name FROM MovieStar WHERE
         birthdate LIKE '%1960%'
      );
283
           Another query equivalent
284




      SELECT title
      FROM StarsIn, MovieStar
      WHERE starName = name AND
             birthdate LIKE '%1960%' ;
                                 Parse Tree

       Parse tree for the second (join) query: the root <Query> expands
       to <SFW>. The <SelList> yields the attribute title; the
       <FromList> yields the relations StarsIn and MovieStar; and the
       <Condition> is the AND of the two conditions starName = name
       and birthdate LIKE '%1960'.
285
                       The Preprocessor
286   Functions of Preprocessor
      . If a relation used in the query is a virtual view, then each use of
        this relation in the from-list must be replaced by a parse tree that
        describes the view.
      . The preprocessor is also responsible for semantic checking:
         1. Checks relation uses: every relation mentioned in the FROM
          clause must be a relation or a view in the current schema.
         2. Checks and resolves attribute uses: every attribute mentioned
            in the SELECT or WHERE clause must be an attribute of some
        relation in the current scope.
         3. Checks types: all attributes must be of a type appropriate to
            their uses.
  Preprocessing Queries Involving Views
288


       When an operand in a query is a virtual view, the
       preprocessor needs to replace the operand by a piece of
       parse tree that represents how the view is constructed from
       base table.
       Base table: Movies(title, year, length, genre, studioName,
                               producerC#)
       View definition: CREATE VIEW ParamountMovies AS
                        SELECT title, year FROM Movies
                        WHERE studioName = 'Paramount';
      Example query based on the view:
      SELECT title FROM ParamountMovies WHERE year = 1979;
                       289




 16.2 ALGEBRAIC LAWS
FOR IMPROVING QUERY
         PLANS
           Optimizing the Logical Query
290
                       Plan
        The translation rules converting a parse tree to a logical
         query tree do not always produce the best logical query
         tree.

        The logical query tree can often be
         improved by applying relational algebra laws to convert
         the original tree into a more efficient logical query tree.

      Optimizing a logical query tree using relational algebra
        laws is called heuristic optimization.
                 Relational Algebra Laws
291



      These laws often involve the properties of:

           commutativity - operator can be applied to operands
            independent of order.

             E.g. A +   B = B + A - The “+” operator is commutative.

           associativity - operator is independent of operand
            grouping.

             E.g. A +(B + C) = (A + B) + C - The “+” operator is
              associative.
          Associative and Commutative
292
                    Operators

         The relational algebra operators of cross-product
          (×), join (⋈), union, and intersection are all
          associative and commutative.

              Commutative         Associative

              R × S = S × R       (R × S) × T = S × (R × T)

              R ⋈ S = S ⋈ R       (R ⋈ S) ⋈ T = S ⋈ (R ⋈ T)

              R ∪ S = S ∪ R       (R ∪ S) ∪ T = S ∪ (R ∪ T)

              R ∩ S = S ∩ R       (R ∩ S) ∩ T = S ∩ (R ∩ T)
              Laws Involving Selection
293



         Complex selections involving AND or OR can be broken into
          two or more selections: (splitting laws)

          σC1 AND C2 (R) = σC1( σC2 (R))
          σC1 OR C2 (R) = ( σC1 (R) ) ∪S ( σC2 (R) )
        Example
            R = {a,a,b,b,b,c}
            p1 satisfied by a,b; p2 satisfied by b,c
            σ p1∨p2 (R) = {a,a,b,b,b,c}
            σp1(R) = {a,a,b,b,b}
            σp2(R) = {b,b,b,c}
            σp1 (R) ∪ σp2 (R) = {a,a,b,b,b,c}
               Laws Involving Selection
294
                      (Contd..)
        Selection is pushed through both arguments
         for union:
           σC(R ∪ S) = σC(R) ∪ σC(S)

        Selection is pushed to the first argument and
         optionally the second for difference:
           σC(R - S) = σC(R) - S
           σC(R - S) = σC(R) - σC(S)
             Laws Involving Selection
295
                    (Contd..)
        All other operators require selection to be pushed
         to only one of the arguments.
        For joins, we may not be able to push the selection to both
         arguments if an argument does not have the attributes the
         selection requires.
            σC(R × S) = σC(R) × S
            σC(R ∩ S) = σC(R) ∩ S
            σC(R ⋈ S) = σC(R) ⋈ S
            σC(R ⋈D S) = σC(R) ⋈D S
           Laws Involving Selection
296
                  (Contd..)
       Example
       Consider relations R(a,b) and S(b,c) and the
        expression
       σ (a=1 OR a=3) AND b<c (R ⋈S)

       σ a=1 OR a=3(σ b<c (R ⋈S))
       σ a=1 OR a=3(R ⋈ σ b<c (S))

       σ a=1 OR a=3(R) ⋈ σ b<c (S)
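
      A toy check of this rewriting (Python, made-up data; not from the
      textbook) confirming that pushing the selections into the join
      leaves the result unchanged:

          # R(a,b) and S(b,c) as small lists of tuples.
          R = [(1, 2), (3, 5), (4, 4)]
          S = [(2, 9), (5, 1), (4, 7)]

          def njoin(r, s):                         # natural join on the shared attribute b
              return [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]

          full   = [t for t in njoin(R, S) if (t[0] in (1, 3)) and t[1] < t[2]]
          pushed = njoin([(a, b) for (a, b) in R if a in (1, 3)],
                         [(b, c) for (b, c) in S if b < c])
          assert sorted(full) == sorted(pushed)    # σ(a=1 OR a=3) pushed to R, σ(b<c) pushed to S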
            Laws Involving Projection
297




         Like selections, it is also possible to push
          projections down the logical query tree.
          However,
         projections just reduce the number of
          attributes instead of reducing the number of
          tuples , hence the performance gained is
          less than selection.
            Laws Involving Projection
298




        Laws for pushing projections with joins and products
         (M and N contain the attributes of L coming from R and S,
         respectively, plus any attributes needed for the join condition):

           πL(R × S) = πL(πM(R) × πN(S))
           πL(R ⋈ S) = πL(πM(R) ⋈ πN(S))
           πL(R ⋈D S) = πL(πM(R) ⋈D πN(S))
              Laws Involving Projection
299
        Laws for pushing projections with set operations.

        Projection can be performed entirely before a bag union:

           πL(R ∪B S) = πL(R) ∪B πL(S)

        Projection can be pushed below selection as long
         as we keep all attributes needed for the selection (M
         = L ∪ attr(C)):

           πL ( σC (R)) = πL( σC (πM(R)))
                  Laws Involving Join
300


         We already learned about

          1. Joins are commutative and associative.


          2. Selection can be distributed into joins.


          3. Projection can be distributed into joins.
            Laws Involving Duplicate
301
                  Elimination
        The duplicate-elimination operator (δ) can be
         pushed through many, but not all, operators.
        For example, if R has two copies of a tuple t and S has one
         copy of t, then
        δ (R ∪ S) = one copy of t, while
        δ (R) ∪ δ (S) = two copies of t,
        so δ cannot be pushed through (bag) union.
             Laws Involving Duplicate
302
                   Elimination
        Laws for pushing the duplicate-elimination
         operator (δ):

           δ(R × S) = δ(R) × δ(S)
           δ(R ⋈ S) = δ(R) ⋈ δ(S)
           δ(R ⋈D S) = δ(R) ⋈D δ(S)

           δ( σC(R) ) = σC(δ(R))
             Laws Involving Duplicate
303
                   Elimination
         The duplicate elimination operator (δ) can also
          be pushed through bag intersection, but not
          across union, difference, or projection in
          general.

            δ(R ∩ S) = δ(R) ∩ δ(S)
              Laws Involving Grouping
304




         The grouping operator (γ) laws depend on the aggregate
          operators used.


         There is one general rule, however, that grouping subsumes
          duplicate elimination:


             δ(γL(R)) = γL(R)

        The reason the laws depend on the aggregation used is that some
         aggregate functions are unaffected by duplicates (MIN and MAX),
         while other functions are affected (SUM, COUNT, and AVG).
                                                       305




                Section 16.3

 Presented By:                Under the supervision of
                              Dr. T.Y. Lin
                      Topics to be covered
306



         From Parse to Logical Query
          Plans
           Conversion  to Relational Algebra
           Removing Subqueries From Conditions

           Improving the Logical Query Plan

           Grouping Associative/ Commutative Operators
                           Parsing
307

      • The goal is to convert a text string
        containing a query into a parse tree data
        structure:
        – leaves form the text string (broken into lexical
          elements)
        – internal nodes are syntactic categories
      • Uses standard algorithmic techniques from
        compilers
        – given a grammar for the language (e.g., SQL),
          process the string and build the tree
              Review
308



                 Query
                   Parser

                                   Section 16.1
                 Preprocessor


              Logical query plan
                  generator
                                   Section 16.3

               Query Rewriter

      Preferred logical query plan
           Two steps to turn Parse tree into
309
            Preferred Logical Query Plan
        The nodes and structures of the parse tree are replaced,
         in appropriate groups, by an operator or operators of
         relational algebra.

        The resulting relational-algebra expression is then
         turned into an expression that can be converted to the
         most efficient physical query plan.
                       Reference Relations
310


          StarsIn (movieTitle, movieYear, starName)
          MovieStar (name, address, gender, birthdate)
      SELECT title
      FROM StarsIn
      WHERE starName IN (
                SELECT name
                FROM MovieStar
                WHERE birthdate LIKE ‘%1960’
      );
      (Find the movies with stars born in 1960)


      Assume we have a simplified grammar for SQL.
          Conversion to Relational Algebra
311


         If we have a <Query> with a <Condition> that has
          no subqueries, then we may replace the entire
          construct – the select-list, from-list, and condition –
          by a relational-algebra expression.
      •   The complete algorithm depends on specific
          grammar, which determines forms of the parse
          trees
312



       The relational-algebra expression
         consists of the following, from
                 bottom to top:
       The product of all the relations mentioned in the
        <FromList>, which is the argument of:
       A selection σC, where C is the <Condition> expression in
        the construct being replaced, which in turn is the
        argument of:
       A projection πL , where L is the list of attributes in the
        <SelList>.
                      A query : Example
313


         SELECT movieTitle
          FROM Starsin, MovieStar
          WHERE starName = name AND
                 birthdate LIKE ‘%1960’;
314
      Translation to an algebraic expression tree
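
      The tree itself is not reproduced here; as a sketch, following the
      translation rules above, the resulting algebraic expression is:

          π movieTitle ( σ starName = name AND birthdate LIKE '%1960' ( StarsIn × MovieStar ) )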
315
                 The Preprocessor

      • It replaces each reference to a view with a
        parse (sub)-tree that describes the view (i.e., a
        query)
      • It does semantic checking:
        – are relations and views mentioned in the schema?
        – are attributes mentioned in the current scope?
        – are attribute types correct?



316
           Removing Subqueries From Conditions
317


         For parse trees with a <Condition> that has a
          subquery
         Intermediate operator – two argument
          selection
         It is intermediate in between the syntactic
          categories of the parse tree and the relational-
          algebra operators that apply to relations.
                Using a two-argument σ
318

       In the tree (not reproduced exactly here), a π movieTitle sits at
       the top. Below it is a two-argument selection σ whose left
       argument is the relation StarsIn and whose right argument is the
       <Condition> "<Tuple> (the attribute starName) IN
       π name ( σ birthdate LIKE '%1960' (MovieStar) )".
        Converting Two-Argument
                Selection
      • To continue the conversion, we need rules for
        replacing two-argument selection with a
        relational algebra expression
      • Different rules depending on the nature of the
        sub query
      • Here is shown an example for IN operator and
        uncorrelated query (sub query computes a
        relation independent of the tuple being tested)


319
               Two argument selection with condition
                         involving IN
320


              We have two arguments: the first is some relation R, and the
               second is a <Condition> of the form t IN S, where
                    't' is a tuple composed of some attributes of R, and
                    'S' is an uncorrelated subquery.

             Steps to be followed:
         1.       Replace the <Condition> by the tree that is the expression for
                  S (δ is used to remove duplicates).
         2.       Replace the two-argument selection by a one-argument
                  selection σC, where C equates the components of t to the
                  corresponding attributes of the relation for S.
         3.       Give σC an argument that is the product of R and S.
          Two argument selection with condition
                    involving IN
321




       The effect: the tree on the left, a two-argument selection with
       arguments R and the condition t IN S, becomes the tree on the
       right, a one-argument selection σC applied to the product (×) of
       R and δ(S).
322
              What if Subquery is Correlated?


         Example is when subquery refers to the
          current tuple of the outer scope that is being
          tested
         More complicated to deal with, since
          subquery cannot be translated in isolation
         Need to incorporate external attributes in the
          translation
         Some details are in textbook

323
              Improving the Logical Query Plan
324


         Algebraic laws to improve logical query plans:
           Selections   can be pushed down the expression
            tree as far as they can go.
           Similarly, projections can be pushed down the
            tree, or new projections can be added.
           Duplicate eliminations can sometimes be
            removed, or moved to a more convenient position
            in the tree.
           Certain selections can be combined with a
            product below to turn the pair of operations into
            an equijoin.
             Grouping Associative/ Commutative
                        Operators
325


        An operator that is associative and commutative
         may be thought of as having any number
         of operands.
        We need to reorder these operands so that the
         multiway join is executed as a sequence of binary
         joins.
        It is more time-consuming to execute them in the
         order suggested by the parse tree.
        For each portion of the subtree that consists of nodes
         with the same associative and commutative
         operator (natural join, union, or intersection), we
         group the nodes with these operators into a single
         node with many children.
      The effect of query rewriting
326




                   π movieTitle
                        |
               ⋈ starName = name
               /                \
        StarsIn           σ birthdate LIKE ‘%1960’
                                  |
                              MovieStar
          Final step in producing logical query plan
327




       Final step: a subtree of cascaded binary unions over the
       relations R, S, T, V, and W is regrouped into a single union
       node with all of these relations as its children.
                    An Example to summarize
328


         “find movies where the average age of the stars was at
          most 40 when the movie was made”
               SELECT DISTINCT m1.movieTitle,
                      m1.movieYear
                    FROM StarsIn m1
                WHERE m1.movieYear - 40 <= (
              SELECT AVG(birthdate)
              FROM StarsIn m2, MovieStar s
              WHERE m2.starName = s.name AND
                   m1.movieTitle = m2.movieTitle AND
                   m1.movieYear = m2.movieYear
         );
      Selections combined with a product to turn
       the pair of operations into an equijoin…
330
      Condition pushed up the expression tree…
331
332
      Selections combined…
333
334
335




       16.4 ESTIMATING THE COST
            OF OPERATIONS
          Estimating the Cost of Operations
336


        After getting the logical query plan, we turn it into a
         physical plan.
        We consider all the possible physical plans and estimate
         their costs; this evaluation is known as cost-based
         enumeration.
        The plan with the least estimated cost is the one
         selected to be passed to the query-execution
         engine.
              Selection for each physical plan
337


         We select for each physical plan:
           An order and grouping for associative-and-
            commutative operations like joins, unions, and
            intersections.
           An algorithm for each operator in the logical plan, for
            instance, deciding whether a nested-loop join or hash-
            join should be used.
           Additional operators – scanning, sorting etc. – that are
            needed for the physical plan but that were not present
            explicitly in the logical plan.
           The way in which the arguments are passed from one
            operator to the next.
           Estimating Sizes of Intermediate
338
                      Relations
           Rules for estimating the number of tuples
           in an intermediate relation should:
      1.   Give accurate estimates.
      2.   Be easy to compute.
      3.   Be logically consistent; that is, the size
           estimate for an intermediate relation should
           not depend on how that relation is computed.
           The objective of estimation is to select the best
           physical plan, i.e., the one with the least cost.
          Estimating the Size of a Projection
339


        Projection is different from the other
         operators in that the size of the result can be computed
         exactly.
        The projection here is treated as a bag projection; a
         classical, duplicate-eliminating projection is a bag
         projection followed by δ.
        There may be a reduction in size (due to
         eliminated components) or an increase in size (due
         to new components created as combinations of
         attributes).
           Estimating the Size of a Selection
340


        While performing a selection, we may reduce the
         number of tuples, but the size of each tuple remains the
         same.
        For a selection
                           S = σ A=c (R),

                           where A is an attribute of R and c is a constant,
                 the recommended size estimate is
                           T(S) = T(R) / V(R,A)

              The rule above surely holds if all values of
               attribute A occur equally often in the
               database.
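
      A tiny sketch of this estimate (Python, with hypothetical statistics:
      T is the tuple count and V maps each attribute to its number of
      distinct values):

          def estimate_equality_selection(T, V, attr):
              # T(σ_{A=c}(R)) is estimated as T(R) / V(R, A)
              return T / V[attr]

          # e.g. T(R) = 10,000 tuples and V(R, 'a') = 50 distinct values:
          # estimate_equality_selection(10_000, {'a': 50}, 'a') -> 200.0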
               Estimating Sizes of Other
341
                     Operations
         Union
         Intersection
         Difference
         Duplicate Elimination
         Grouping and Aggregation
               Estimating the Size of a Join
342


        Two simplifying assumptions:
         1. Containment of Value Sets
         If R and S are two relations with attribute Y and V(R,Y) <= V(S,Y),
         then every Y-value of R will be a Y-value of S.
         2. Preservation of Value Sets
         If we join a relation R with another relation S, and attribute A is in R
         but not in S, then all distinct values of A are preserved and not lost:
         V(R ⋈ S, A) = V(R,A).
         Under these assumptions, we estimate

         T(R ⋈ S) = T(R)T(S) / max(V(R,Y), V(S,Y))
          Natural Joins With Multiple Join
                     Attributes
343


               Of the T(R)T(S) pairs of tuples from R and S, the
         expected number of pairs that match in both y1 and y2 is:

         T(R)T(S) / ( max(V(R,y1), V(S,y1)) * max(V(R,y2), V(S,y2)) )

           In general, the following rule can be used to estimate the
         size of a natural join when there are any number of attributes
         shared between the two relations:

           The estimate of the size of R ⋈ S is computed by
         multiplying T(R) by T(S) and dividing by the largest of V(R,y)
         and V(S,y) for each attribute y that is common to R and S.
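
      A small sketch of this rule (Python, with the same hypothetical
      per-relation statistics: a tuple count T and a dict V of
      distinct-value counts):

          def estimate_join_size(T_R, V_R, T_S, V_S):
              # T(R ⋈ S) ≈ T(R) * T(S) divided by max(V(R,y), V(S,y)) for each shared attribute y
              est = T_R * T_S
              for y in set(V_R) & set(V_S):
                  est /= max(V_R[y], V_S[y])
              return est

          # e.g. T(R)=1000 with V(R)={'y': 20}, T(S)=2000 with V(S)={'y': 50}:
          # estimate_join_size(1000, {'y': 20}, 2000, {'y': 50}) -> 40000.0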
              Joins of Many Relations(1)
344


        Rule for estimating the size of any join:
           Start with the product of the number of
         tuples in each relation.
           Then, for each attribute A appearing at
         least twice, divide by all but the least of the
         V(R,A)'s.

          We can also estimate the number of values that
         will remain for attribute A after the join:
             by the preservation-of-value-sets
         assumption, it is the least of these V(R,A)'s.
              Joins of Many Relations(2)
345


           Based on the two assumptions (containment
         and preservation of value sets):
        In a natural join of n relations, the estimation
         rules, applied to each join individually,
         yield the same estimate for the size of the
         result, no matter how the joins are grouped and
         ordered. Moreover, this estimate is the same
         one we get if we apply the rule to the join of
         all n relations as a whole.
                Estimating Sizes for Other
346
                       Operations
        Union: estimate the average of the sum and the larger,
         i.e., the larger plus half the smaller (for a bag union the
         size is exactly the sum).

      Intersection:
       approach 1: take the average of the extremes (0 and
        the smaller of T(R) and T(S)), which is half the smaller.

        approach 2: intersection is an extreme case of
         the natural join, so use the formula

        T(R ∩ S) = T(R)T(S) / max(V(R,Y), V(S,Y))
               Estimating Sizes for Other
347
                      Operations
         Difference: T(R)-(1/2)*T(S)

         Duplicate Elimination: take the smaller of
          (1/2)*T(R) and the product of all the V(R, a i )’s.

         Grouping and Aggregation: upper-bound the
          number of groups by a product of V(R,A)’s,
          here attribute A ranges over only the grouping
          attributes of L.
                    348




CHOOSING AN ORDER
    FOR JOINS
      INTRODUCTION




349
          Significance of Left and Right Join
                     Arguments
         The argument relations in joins determine the
          cost of the join
         The left argument of the join is
           Called the build relation
           Assumed to be smaller

           Stored in main-memory




350
          Significance of Left and Right Join
                     Arguments
         The right argument of the join is
           Called  the probe relation
           Read a block at a time

           Its tuples are matched with those of build relation

         The join algorithms which distinguish between
          the arguments are:
           One-pass   join
           Nested-loop join

           Index join

351
                          Join Trees

         Order of arguments is important for joining two
          relations
         Left argument stored in main-memory, should
          be smaller
         With two relations only two choices of join tree
         With more than two relations, there are n!
          ways to order the arguments and therefore n!
          join trees, where n is the no. of relations


353
                           Join Trees

        Total # of tree shapes T(n) for n relations is given
         by the recurrence:

              T(n) = sum over i = 1, ..., n-1 of T(i) * T(n-i)

        T(1) = 1
        T(2) = 1
        T(3) = 2
        T(4) = 5 ... etc.
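
      A quick sketch of this recurrence (Python), matching the values above:

          from functools import lru_cache

          @lru_cache(None)
          def tree_shapes(n):
              # T(1) = 1;  T(n) = sum of T(i) * T(n - i) for i = 1 .. n-1
              if n == 1:
                  return 1
              return sum(tree_shapes(i) * tree_shapes(n - i) for i in range(1, n))

          # [tree_shapes(n) for n in (1, 2, 3, 4, 5)] -> [1, 1, 2, 5, 14]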
354
                   Left-Deep Join Trees

         Consider 4 relations. Different ways to join
          them are as follows




355
         In fig (a) all the right children are leaves. This
          is a left-deep tree
         In fig (c) all the left children are leaves. This is
          a right-deep tree
         Fig (b) is a bushy tree
         Considering left-deep trees is advantageous
          for deciding join orders



356
                                   Join order
          Join order selection
357
      

           Consider ordering the join A1 ⋈ A2 ⋈ A3 ⋈ ... ⋈ An
           using left-deep join trees.

           Dynamic programming:
             The best plan is computed for each subset of the relations:
                    Best plan(A1, .., An) = min cost plan of(
                                  Best plan(A2, .., An) ⋈ A1
                                  Best plan(A1, A3, .., An) ⋈ A2
                                    ....
                                  Best plan(A1, .., An-1) ⋈ An )
          Dynamic Programming to Select a
              Join Order and Grouping
        Three choices for picking an order for the join of many
         relations are:
           Consider all of the relations
           Consider a subset
           Use a heuristic to pick one

        Dynamic programming is used either to consider all or a
         subset:
           Construct a table of costs based on relation sizes
           Remember only the minimum entry, which is all that is
             required to proceed (see the sketch below)
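
      A compact sketch of this dynamic program (Python, with a deliberately
      simple hypothetical cost model: the cost of a left-deep plan is the
      sum of the estimated sizes of its intermediate results, supplied by a
      caller-provided size_of function):

          from itertools import combinations

          def best_join_order(relations, size_of):
              # relations: list of names; size_of(frozenset of names) -> estimated result size.
              # Returns (cost, left-deep join order) with minimum total intermediate-result size.
              best = {frozenset([r]): (0, [r]) for r in relations}
              for k in range(2, len(relations) + 1):
                  for subset in map(frozenset, combinations(relations, k)):
                      for r in subset:                          # r is joined last (right argument)
                          sub_cost, sub_order = best[subset - {r}]
                          cost = sub_cost + size_of(subset)     # pay for producing this intermediate result
                          if subset not in best or cost < best[subset][0]:
                              best[subset] = (cost, sub_order + [r])
              return best[frozenset(relations)]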

358
          A Greedy Algorithm for Selecting a
                     Join Order
         It is expensive to use an exhaustive method
          like dynamic programming
         Better approach is to use a join-order heuristic
          for the query optimization
        A greedy algorithm is an example of such a heuristic:
         make one decision at a time about the
         order of joins, and never backtrack on the
         decisions once made


363
                                   364




COMPLETING THE PHYSICAL-QUERY-
 PLAN AND CHAPTER 16 SUMMARY
           (16.7-16.8)




       CS257 Spring 2009

       Professor Tsau Lin

       Student: Suntorn Sae-Eung

       Donavon Norwood
      VI. Notation for Physical Query Plans
365    •   A logical operator σc(R) is often combined, or partially
           combined, with the access method for relation R, when R is a
           stored relation
       •   Other selections, where the argument is not a stored relation
           or an appropriate index is not available, will be replaced by
           the corresponding physical operator we have called Filter.


              Several types of operators:
            1.    Operators for leaves
            2.    (Physical) operators for Selection
            3.    (Physical) Sorts Operators
            4.    Other Relational-Algebra Operations
            In practice, each DBMS uses its own internal notation for
             physical query plan.
           Notation for Physical Query Plans
                         (cont.)
366


      1.       Operator for leaves
                A leaf operand is replaced in LQP tree
                   TableScan(R) : read all blocks
                   SortScan(R, L) : read in order according to L
                    IndexScan(R, C): scan the index on attribute A using a
                     condition C of the form Aθc.
                    IndexScan(R, A): scan the index on attribute R.A with no
                     condition. This behaves like TableScan, but may be more
                     efficient if R is not clustered.
           Notation for Physical Query Plans
                         (cont.)
367


      2.       (Physical) operators for Selection
                Logical operator σC(R) is often combined with
                 access methods.
                    If there is no index on R or on an attribute mentioned in
                     condition C, then σC(R) is replaced by Filter(C), and we
                        Use TableScan(R) or SortScan(R, L) to access R
                    If condition C is of the form Aθc AND D for some condition D, and
                     there is an index on R.A, then we may
                        Use the operator IndexScan(R, Aθc) to access R,
                         and
                        Use Filter(D) in place of the selection σC(R)
           Notation for Physical Query Plans
                         (cont.)
368


      3.       (Physical) Sort Operators
                Sorting can occur at any point in the physical plan;
                 sorting at a leaf uses the notation SortScan(R, L).
                It is common to use an explicit operator Sort(L)
                 to sort a relation that is not stored.
                It can be applied at the top of the physical-query-plan tree
                 if the result needs to be sorted because of an ORDER BY
                 clause (the τ operator).
           Notation for Physical Query Plans
                         (cont.)
369


      4.       Other Relational-Algebra Operations
                These are indicated by descriptive text giving:
                  The operation performed, e.g., join or grouping.
                  The necessary parameters, e.g., the condition of a theta-join
                   or the list of elements in a grouping.
                  A general strategy for the algorithm: sort-based,
                   hash-based, or index-based.
                  A decision about the number of passes to be used:
                   one-pass, two-pass, or multipass.
                  An anticipated number of buffers the operation
                   will require.
          Notation for Physical Query Plans
                        (cont.)
370


         Example of a physical-query-plan
              A physical-query-plan in example 16.36 for the case
               k > 5000
                  TableScan
                  Two-pass hash join
                  Materialize (double line)
                  Store operator
          Notation for Physical Query Plans
371
                        (cont.)
         Another example
             A physical-query-plan in example 16.36 for the case
              k < 49
                 TableScan
                 (2) Two-pass hash join
                 Pipelining
                 Different buffers needs
                 Store operator
          Notation for Physical Query Plans
372
                        (cont.)

         A physical-query-plan in example 16.35
              Use Index on condition y = 2 first
              Filter with the rest condition later on.
      VII. Ordering of Physical Operations
373




              The PQP is represented as a tree structure, which
               implies an order of operations.
              Still, the order of evaluation of interior nodes
               may not always be clear:
                Iterators are used in a pipelined manner.
                The execution of various nodes overlaps in time,
                 so a strict "ordering" makes little sense.
          Ordering of Physical Operations
                       (cont.)
374
      •   The following 3 rules summarize the ordering of events implicit
          in a physical-query-plan tree:
            1.     Break the tree into sub-trees at each edge that represents
                   materialization.
                      Execute one subtree at a time.
            2.     Order the execution of the subtrees
                      Bottom-up
                      Left-to-right
            3.     All nodes of each sub-tree are executed simultaneously.
      •   Following this strategy, the query optimizer can now generate
          executable code, perhaps a sequence of function calls, for the
          query.
375



         Summary of Chapter 16
                 COMPILATION OF QUERIES

         Compilation means turning a query into a
          physical query plan, which can be
          implemented by query engine.
         Steps of query compilation :
             Parsing
             Semantic checking
             Selection of the preferred logical query plan
             Generating the best physical plan


376
                         THE PARSER

         The first step of SQL query processing.
         Generates a parse tree
        Nodes in the parse tree correspond to
         SQL constructs
         Similar to the compiler of a programming
          language




377
                      VIEW EXPANSION

         A very critical part of query compilation.
         Expands the view references in the query tree
          to the actual view.
         Provides opportunities for the query
          optimization.




378
                     SEMANTIC CHECKING

         Checks the semantics of a SQL query.
         Examines a parse tree.
         Checks :
             Attributes
             Relation names
             Types
         Resolves attribute references.



379
              CONVERSION TO A LOGICAL
                    QUERY PLAN

        Converts a semantically checked parse tree to an
         algebraic expression.
        The conversion is straightforward, but subqueries
         need special handling.
        The two-argument selection approach can be
         used.




380
             ALGEBRAIC TRANSFORMATION
         To transform a logical query plan to an actual plan using
          algebraic transformations.
         The different laws used for this transformation :
           Commutative and associative laws

           Laws involving selection

           Pushing selection

           Laws involving projection

           Laws about joins and products

           Laws involving duplicate eliminations

           Laws involving grouping and aggregation

381
                     ESTIMATING SIZES OF
                          RELATIONS

        When selecting the best logical plan, the true
         running time is taken into consideration.
        Two factors that affect the size estimates the most:
            Size of relations (no. of tuples)
            No. of distinct values for each attribute of each
             relation
        Histograms are used by some systems.


382
                  COST BASED OPTIMIZING

         Best physical query plan represents the least
          costly plan.
        Factors that decide the cost of a query plan:
            Order and grouping of operations like joins, unions,
             and intersections.
            The join algorithm used, e.g., nested-loop or hash join.
            Scanning and sorting operations.
            Storing intermediate results.


383
                      PLAN ENUMERATION
                          STRATEGIES

        Common approaches for searching the space of physical
         plans for the best one:
            Dynamic programming: tabularizing the best
             plan for each subexpression
            Selinger-style optimization: dynamic programming that also
             keeps the sort order of results as part of the table
            Greedy approaches: making a series of locally
             optimal decisions
            Branch-and-bound: start from a good plan found by
             heuristics and consider only plans that improve on it
384
                 LEFT-DEEP JOIN TREES
         Left – Deep Join Trees are the binary trees
          with a single spine down the left edge and with
          leaves as right children.
         This strategy reduces the number of plans to
          be considered for the best physical plan.
         Restrict the search to Left – Deep Join Trees
          when picking a grouping and order for the join
          of several relations.


385
      PHYSICAL PLANS FOR SELECTION

         Breaking a selection into an index-scan of
          relation, followed by a filter operation.
         The filter then examines the tuples retrieved by
          the index-scan.
        Only those tuples that satisfy the remaining portion of the
         selection condition are allowed to pass.




386
                   PIPELINING VERSUS
                      MATERIALIZING
        The flow of data between operators can be
         controlled to implement "pipelining": an operator consumes the
         result of another operator as it is produced, and the data
         passes through main memory without being written to disk.
        Alternatively, intermediate results can be written to disk to
         free main-memory space for other operators; this
         technique is called "materialization".
        Both pipelining and materialization should be
         considered by the physical-query-plan generator.

387
                               388




 QUERY COMPILER: 16.7
COMPLETING THE PHYSICAL
     QUERY-PLAN




   CS257 Spring 2009

   Professor Tsau Lin

   Student: Suntorn Sae-Eung

   ID: 212
      Before complete Physical-Query-
                    Plan
389


         A query previously has been
           Parsed  and Preprocessed (16.1)
           Converted to Logical Query Plans (16.3)

           Estimated the Costs of Operations (16.4)

           Determined costs by Cost-Based Plan Selection
            (16.5)
           Weighed costs of join operations by choosing an
            Order for Joins
  16.7 Completing the Physical-Query-
                  Plan
390


           Three topics related to turning the logical plan into a
            complete physical plan:
           1.   Choosing physical implementations such as
                selection and join methods
           2.   Decisions regarding intermediate results
                (materialized or pipelined)
           3.   Notation for physical-query-plan operators
          I. Choosing a Selection Method (A)
391


        Algorithms for each selection operator:
         1. Can we use an existing index on an attribute?
          If yes, use an index-scan; otherwise, use a table-scan.

         2. After retrieving all tuples satisfying the indexed condition in (1),
           filter them with the remaining selection conditions.
              Choosing a Selection Method(A)
                          (cont.)
392

      •   To pick algorithms for each selection operator:
      •   Assuming there are no multidimensional indexes on several of the attributes, each
          physical plan uses an index on at most one attribute and filters on the rest.

             Recall: cost of a query = # of disk I/O's
             How costs for various plans are estimated for a σC(R) operation:
      1. Cost of the table-scan algorithm
                  a)    B(R)              if R is clustered
                  b)    T(R)              if R is not clustered
      2. Cost of a plan picking an equality term (e.g. a = 10) w/ index-scan
                  a)    B(R) / V(R, a)    clustering index
                  b)    T(R) / V(R, a)    nonclustering index
      3. Cost of a plan picking an inequality term (e.g. b < 20) w/ index-scan
                  a)    B(R) / 3                      clustering index
                  b)    T(R) / 3                      nonclustering index
                           Example
393



      Selection: σ x=1 AND y=2 AND z<5 (R)
      - where the parameters of R(x, y, z) are:
        T(R) = 5000,        B(R) = 200,
        V(R,x) = 100, and   V(R,y) = 500

      -   Relation R is clustered
      -   x and y have nonclustering indexes; only the index on z
          is clustering.
                      Example (cont.)
394


      Selection options:
      1.   Table-scan, then filter on x, y, z. Cost is B(R) = 200 since
           R is clustered.
      2.   Use the index on x = 1, then filter on y, z. Cost is 50 since
           T(R) / V(R, x) = 5000/100 = 50 tuples; the index is
           not clustering.
      3.   Use the index on y = 2, then filter on x, z. Cost is 10
           since T(R) / V(R, y) = 5000/500 = 10 tuples,
           using a nonclustering index.
      4.   Index-scan on the clustering index with z < 5, then filter on x, y.
           Cost is about B(R)/3 = 67.
                           Example (cont.)
395

           Costs:
            option 1 = 200
            option 2 = 50
            option 3 = 10   (lowest)
            option 4 = 67
      The lowest cost is option 3.
           Therefore, the preferred physical plan
         1.    retrieves all tuples with y = 2
         2.    then filters for the rest two conditions (x, z).
      •   We decide among the physical plans with which to implement a
          given selection by estimating the cost of reading data for each
          possible option.
      •   We count only the cost of accessing the data blocks, not
          the index blocks.
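
      A small sketch reproducing these estimates (Python, hypothetical
      parameter names):

          def selection_access_costs(B, T, V):
              # Disk I/O estimates for σ(x=1 AND y=2 AND z<5)(R); index blocks are not counted.
              return {
                  "table-scan (R clustered)":   B,
                  "index-scan on x=1 (noncl.)": T // V["x"],   # T(R)/V(R,x)
                  "index-scan on y=2 (noncl.)": T // V["y"],   # T(R)/V(R,y)
                  "index-scan on z<5 (clust.)": B // 3,        # about B(R)/3 for an inequality
              }

          # selection_access_costs(B=200, T=5000, V={"x": 100, "y": 500})
          # -> {'table-scan (R clustered)': 200, 'index-scan on x=1 (noncl.)': 50,
          #     'index-scan on y=2 (noncl.)': 10, 'index-scan on z<5 (clust.)': 66}
          # (the slides round B(R)/3 up to about 67)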
            II. Choosing a Join Method
396


        Determine the costs associated with each join
         algorithm:
         1. One approach is to call for a one-pass join, hoping that the
           buffer manager can devote enough buffers to the join;
           otherwise a nested-loop join can be used.
         2. An alternative is to choose a sort-join, which is preferred
           when the join attributes are pre-sorted, or when two or more
           joins on the same attribute occur, such as
                      (R(a, b) ⋈ S(a, c)) ⋈ T(a, d)
          - where sorting R and S on a will produce a result of R ⋈ S
          that is sorted on a and can be used directly in the next join.
      Choosing a Join Method (cont.)
397


      3. An index-join is chosen for a join with a high chance of using
        an index created on the join attribute, such as
        R(a, b) ⋈ S(b, c).
      4. A hash join is the best choice for unsorted,
        non-indexed relations that need a multipass
        join.
          III. Pipelining Versus Materialization
398


          Materialization (the naive way) executes a query plan by ordering
           the operations appropriately and storing the result of each
           operation on disk until it is needed by another operation.

          Pipelining (the more efficient way)
      •    A more subtle way to execute a query plan is to interleave the
           execution of several operations. The tuples produced by one
           operation are passed directly to the operation that uses them,
           without ever storing the intermediate tuples on disk. This
           approach is called pipelining.
      •    Since pipelining saves disk I/O's, there is an obvious
           advantage to pipelining, but there is a corresponding
           disadvantage: the operations must share main memory.
           IV. Pipelining Unary Operations
399


         Unary = a-tuple-at-a-time or full relation
         selection and projection are the best
          candidates for pipelining.

                      In buf    Unary      Out buf
                               operation

          R
                   In buf       Unary      Out buf
                               operation


               M-1 buffers
      Pipelining Unary Operations (cont.)
400


         Pipelining Unary Operations are implemented
          by iterators
            V. Pipelining Binary Operations
401


         We use one buffer to pass the results to its consumer, one block at
          a time.
         The number of other buffers need to compute the results and to
          consume the results varies, depending on the size of the result and
          the sizes of other relations involved in the query.


        Binary operations: ∪, ∩, -, ⋈, ×
         The results of binary operations can also be pipelined.
         Use one buffer to pass result to its consumer, one block at a time.
         The extended example shows tradeoffs and opportunities
                           Example
402


        Consider the physical query plan for the
         expression
                 (R(w, x) ⋈ S(x, y)) ⋈ U(y, z)
        Assumptions:
          R occupies 5,000 blocks; S and U each occupy 10,000
           blocks.
          The intermediate result R ⋈ S occupies k blocks for
           some k.
          Both joins will be implemented as hash-joins, either
           one-pass or two-pass depending on k.
          There are 101 buffers available.
                     Example (cont.)
403


        First consider the join R ⋈ S: neither relation
      fits in the buffers, so we need a two-pass
      hash-join that partitions R into 100 buckets
      (the maximum possible), each bucket having 50 blocks.
       The 2nd pass of the hash-join uses 51 buffers,
         leaving the remaining 50 buffers for joining the result of
         R ⋈ S with U.
                       Example (cont.)
404


         Case 1: suppose k ≤ 49; the result of R ⋈ S
          occupies at most 49 blocks.
         Steps:
         1.   Pipeline R ⋈ S into 49 buffers.
         2.   Organize them for lookup as a hash table.
         3.   Use the one remaining buffer to read each block of U in
              turn.
         4.   Execute the second join as a one-pass join.
                     Example (cont.)
405


         The total number of
          I/O’s is 55,000
           45,000 for the two-pass
            hash-join of R and S
           10,000 to read U for the
            one-pass hash join of
            (R ⋈ S) ⋈ U.
                       Example (cont.)
406


          Case 2: suppose k > 49 but k ≤ 5,000. We can
           still pipeline, but we need another strategy: the
           intermediate result joins with U in a 50-bucket,
           two-pass hash-join. The steps are:
      1.   Before starting on R ⋈ S, hash U into 50 buckets of
           200 blocks each.
      2.   Perform the two-pass hash-join of R and S using 51
           buffers as in Case 1, placing the result in the 50 remaining
           buffers to form 50 buckets for the join of R ⋈ S with U.
      3.   Finally, join R ⋈ S with U bucket by bucket.
                        Example (cont.)
407


         The number of disk I/O’s is:
           20,000 to read U and write its tuples into buckets
           45,000 for the two-pass hash-join R ⋈ S
           k to write out the buckets of R ⋈ S
           k+10,000 to read the buckets of R ⋈ S and U in
            the final join
         The total cost is 75,000+2k.
                             Example (cont.)
408


         Compare the increase in I/O’s between Case 1 and
          Case 2
          k ≤ 49 (Case 1)
             Disk I/O’s is 55,000
          49 < k ≤ 5,000 (Case 2)
             k=50 , I/O’s is 75,000+(2*50) = 75,100
             k=51 , I/O’s is 75,000+(2*51) = 75,102
             k=52 , I/O’s is 75,000+(2*52) = 75,104

      Notice: the I/O cost jumps discontinuously as k increases from 49 to 50.
                      Example (cont.)
409


          Case 3: k > 5,000. We cannot perform the two-
           pass join in the 50 available buffers if the result of
           R ⋈ S is pipelined. The steps are:
      1.   Compute R ⋈ S using a two-pass join and store
           the result on disk.
      2.   Join the result of (1) with U, using a two-pass join.
                        Example (cont.)
410


         The number of disk I/O’s is:
           45,000 for the two-pass hash-join of R and S
           k to store R ⋈ S on disk
           30,000 + 3k for the two-pass join of R ⋈ S with U
         The total cost is 75,000+4k.
411
                      Example (cont.)
         In summary, the cost of the physical plan as a function
          of the size k of R ⋈ S is: 55,000 I/O’s when k ≤ 49;
          75,000 + 2k when 49 < k ≤ 5,000; and 75,000 + 4k when
          k > 5,000.
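
      A minimal sketch (not from the slides) that reproduces the cost formulas
      above as a single function of k = B(R ⋈ S); it assumes the numbers used
      in this example (101 buffers, B(R)=5,000, B(S)=B(U)=10,000 blocks).

      def plan_cost(k):
          """Disk I/O cost of the pipelined plan (R ⋈ S) ⋈ U for a given k."""
          if k <= 49:
              # Case 1: two-pass hash-join of R and S, then one-pass join with U.
              return 45_000 + 10_000
          elif k <= 5_000:
              # Case 2: pre-bucketize U (20,000), two-pass join of R and S (45,000),
              # write and re-read the k result blocks (2k), read U's buckets (10,000).
              return 75_000 + 2 * k
          else:
              # Case 3: materialize R ⋈ S on disk and use a two-pass join with U.
              return 75_000 + 4 * k

      for k in (49, 50, 5_000, 10_000):
          print(k, plan_cost(k))   # 55,000 / 75,100 / 85,000 / 115,000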
                                    412




          CHAPTER 18
      CONCURRENCY CONTROL
            18.1 – 18.2
Chiu Luk
CS257 Database Systems Principles
Spring 2009
               CONCURRENCY CONTROL
413

         Concurrency control in database management systems
          (DBMS) ensures that database transactions are
          performed concurrently without the concurrency violating
          the data integrity of a database.
         Executed transactions should follow the ACID rules. The
          DBMS must guarantee that only serializable (unless
          Serializability is intentionally relaxed), recoverable
          schedules are generated.
         It also guarantees that no effect of committed transactions
          is lost, and no effect of aborted (rolled back) transactions
          remains in the related database.
             TRANSACTION ACID RULES
414



      Atomicity - To the outside world, the transaction appears to be
      indivisible, atomic.

      Consistency - Every transaction must leave the database in a
      consistent state.

      Isolation - Transactions cannot interfere with each other. Providing
      isolation is the main goal of concurrency control.

      Durability - Successful transactions must persist through crashes.
           SERIAL AND SERIALIZABLE
415               SCHEDULES
       In the field of databases, a schedule is a list of actions, (i.e. reading, writing,
      aborting, committing), from a set of transactions.
       In this example, Schedule D is the set of 3 transactions T1, T2, T3.
      The schedule describes the actions of the transactions as seen by the DBMS.
       T1 Reads and writes to object X,
      and then T2 Reads and writes to object Y, and finally T3 Reads and writes to object
      Z.
      Since the actions of the 3 transactions are not interleaved, it is said to be a serial
      schedule.
          Serial and Serializable Schedules
416


         A schedule that is equivalent to a serial schedule has the serializability property.
         In schedule E, the order in which the actions of the transactions are executed is not the
          same as in D, but in the end, E gives the same result as D.
        Serial Schedule T1 precedes T2
417                                            A     B
      T1                    T2                 25    25
      Read(A); A := A+100;
      Write(A);                                125
      Read(B); B := B+100;
      Write(B);                                      125
                            Read(A); A := A*2;
                            Write(A);          250
                            Read(B); B := B*2;
                            Write(B);                250
                                               250   250
           Serial Schedule T2 precedes T1
418                                            A     B
      T1                    T2                 25    25
                            Read(A); A := A*2;
                            Write(A);          50
                            Read(B); B := B*2;
                            Write(B);                50
      Read(A); A := A+100;
      Write(A);                                150
      Read(B); B := B+100;
      Write(B);                                      150
                                               150   150
                 serializable, but not serial,
                          schedule
419                                                               A     B
      T1                                   T2                     25    25
      Read(A); A := A+100;
      Write(A);                                                   125
                                           Read(A); A := A*2;
                                           Write(A);              250
      Read(B); B := B+100;
      Write(B);                                                         125
                                           Read(B); B := B*2;
                                           Write(B);                    250
                                                                  250   250
      r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B);
             nonserializable schedule
420                                            A     B
      T1                    T2                 25    25
      Read(A); A := A+100;
      Write(A);                                125
                            Read(A); A := A*2;
                            Write(A);          250
                            Read(B); B := B*2;
                            Write(B);                50
      Read(B); B := B+100;
      Write(B);                                      150
                                               250   150
          schedule that is serializable only
          because of the detailed behavior
421
                of the transactions                                            A     B
      T1                                                  T2’                  25    25
      Read(A); A := A+100;
      Write(A);                                                                125
                                                          Read(A); A := A*1;
                                                          Write(A);            125
                                                          Read(B); B := B*1;
                                                          Write(B);                  25
      Read(B); B := B+100;
      Write(B);                                                                      125
                                                                               125   125

         Regardless of the (consistent) initial state, the final state will be consistent.
               Non-Conflicting Actions
422




      Two actions are non-conflicting if, whenever they
      occur consecutively in a schedule, swapping them
      does not affect the final state produced by the
      schedule. Otherwise, they are conflicting.
      Conflicting Actions: General Rules
423


         Two actions of the same transaction conflict, e.g.
          r1(A) and w1(B): the order of actions of a
          single transaction is fixed and may not be
          reordered.
         Two actions over the same database element
          conflict if one of them is a write
           r1(A), w2(A)
           w1(A), w2(A)
               Conflict actions
424


             Two or more actions are said to be in conflict if:
                 The actions belong to different transactions.
                 At least one of the actions is a write operation.
                 The actions access the same object (read or write).
             The following set of actions is conflicting:
                 T1:R(X), T2:W(X), T3:W(X)
             While the following sets of actions are not:
                 T1:R(X), T2:R(X), T3:R(X)
                 T1:R(X), T2:W(Y), T3:R(X)
             Conflict Serializable
425


       We may take any schedule and make as many
      nonconflicting swaps as we wish, with the goal of
      turning the schedule into a serial schedule.

       If we can do so, then the original schedule is
      serializable, because its effect on the database
      state remains the same as we perform each of the
      nonconflicting swaps.
                            Conflict Serializable
426
         A schedule is said to be conflict-serializable when the schedule is conflict-equivalent to
          one or more serial schedules.
         Another definition for conflict-serializability is that a schedule is conflict-serializable if
          and only if there exists an acyclic precedence graph/serializability graph for the
          schedule.
         The example schedules below are conflict-equivalent to the serial schedule <T1,T2>,
          but not to <T2,T1>.
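
      A minimal sketch (not from the slides) of the acyclic-precedence-graph test
      for conflict-serializability mentioned above; the schedule encoding as
      (transaction, 'r'/'w', element) triples is a hypothetical convention.

      from collections import defaultdict

      def conflict_serializable(schedule):
          """schedule: list of (txn, 'r' or 'w', element) in execution order."""
          edges = defaultdict(set)
          txns = set()
          for i, (ti, ai, xi) in enumerate(schedule):
              txns.add(ti)
              for tj, aj, xj in schedule[i + 1:]:
                  # Conflicting pair: different transactions, same element,
                  # at least one write.  Edge ti -> tj (ti must precede tj).
                  if ti != tj and xi == xj and 'w' in (ai, aj):
                      edges[ti].add(tj)

          # Conflict-serializable iff the precedence graph is acyclic.
          visiting, done = set(), set()
          def has_cycle(node):
              visiting.add(node)
              for nxt in edges[node]:
                  if nxt in visiting or (nxt not in done and has_cycle(nxt)):
                      return True
              visiting.discard(node)
              done.add(node)
              return False
          return not any(has_cycle(t) for t in txns if t not in done)

      # The schedule of slide 419 passes; the schedule of slide 420 fails.
      s419 = [('T1','r','A'),('T1','w','A'),('T2','r','A'),('T2','w','A'),
              ('T1','r','B'),('T1','w','B'),('T2','r','B'),('T2','w','B')]
      s420 = [('T1','r','A'),('T1','w','A'),('T2','r','A'),('T2','w','A'),
              ('T2','r','B'),('T2','w','B'),('T1','r','B'),('T1','w','B')]
      print(conflict_serializable(s419), conflict_serializable(s420))  # True False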
             Conflict equivalent / conflict-
                      serializable
427


         Let Ai and Aj be consecutive non-conflicting actions
          that belong to different transactions. We can swap Ai
          and Aj without changing the result.

         Two schedules are conflict equivalent if they can be
          turned one into the other by a sequence of non-
          conflicting swaps of adjacent actions.

         We shall call a schedule conflict-serializable if it is
          conflict-equivalent to a serial schedule.
            Conflict equivalent / conflict-
                     serializable
428


         Note that conflict-serializability is a sufficient
          condition for serializability, i.e. a conflict-
          serializable schedule is a serializable
          schedule.
         Conflict-serializability is not required for a
          schedule to be serializable.
       conflict-serializable
429




      T1          T2
      R(A)
      W(A)
                  R(A)
      R(B)
                  W(A)
      W(B)
                  R(B)
                  W(B)
430
       conflict-serializable

      T1          T2
      R(A)
      W(A)
      R(B)
                 R(A)
                 W(A)
      W(B)
                 R(B)
                 W(B)
431
       conflict-serializable

      T1          T2
      R(A)
      W(A)
      R(B)
                 R(A)
      W(B)
                 W(A)
                 R(B)
                 W(B)
 conflict-serializable

T1          T2
R(A)
W(A)                      Serial
R(B)                     Schedule
W(B)
            R(A)
            W(A)
            R(B)
            W(B)
                         Locks
       Scheduler that uses a lock table to help perform
        its job.
       It works as follows :
         It is the responsibility of the scheduler to take requests
          from transactions and either allow them to operate on the
          database or block them.
         The scheduler checks the lock table.
         It generates a serializable schedule of actions.




                                          433
        Consistency of transactions

     Actions and locks must relate to each other
       A transaction can read or write an element only if it
        holds a lock on that element and has not yet
        released that lock.
       Unlocking every locked element is compulsory.



     Legality of schedules
       No two transactions may hold locks on the
       same element at the same time; one must first
       release its lock before the other may lock it.
434
                Locking scheduler
    Grants lock requests only if doing so results in a legal
     schedule.
    If a request is not granted, the requesting
     transaction is delayed; it waits until the
     scheduler grants its request at a later time.
   Lock table stores the information about current
    locks on the elements.




                                    435
    The locking scheduler (contd.)

    A legal schedule of consistent transactions is,
     unfortunately, not necessarily serializable.




                                     436
        Locking schedule (contd.)

     The locking scheduler delays requests that
      would result in an illegal schedule.




437
                  Two-phase locking

      Guarantees that a legal schedule of consistent
       transactions is conflict-serializable.
      All lock requests precede all unlock requests.
      The growing phase:
        Obtain all the locks; no unlocks allowed.
      The shrinking phase:
        Release all the locks; no new locks allowed.



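
     A minimal sketch (not from the slides) of a transaction object that enforces
     the two-phase locking discipline; the lock-manager class and its
     acquire/release interface are hypothetical simplifications.

     class SimpleLockManager:
         """Toy lock manager: one exclusive lock per element, no blocking logic."""
         def __init__(self):
             self.owner = {}
         def acquire(self, txn, element):
             if self.owner.get(element) not in (None, txn):
                 raise RuntimeError("element already locked by another transaction")
             self.owner[element] = txn
         def release(self, txn, element):
             if self.owner.get(element) is txn:
                 del self.owner[element]

     class TwoPhaseTransaction:
         def __init__(self, lock_manager):
             self.lm = lock_manager
             self.held = set()
             self.shrinking = False   # becomes True after the first unlock
         def lock(self, element):
             # Growing phase only: no lock may be requested after any unlock.
             if self.shrinking:
                 raise RuntimeError("2PL violation: lock requested after an unlock")
             self.lm.acquire(self, element)
             self.held.add(element)
         def unlock(self, element):
             # The first unlock starts the shrinking phase.
             self.shrinking = True
             self.lm.release(self, element)
             self.held.discard(element)
         def commit(self):
             # Strict/rigorous 2PL would hold the locks until this point.
             for element in list(self.held):
                 self.unlock(element)

     lm = SimpleLockManager()
     t = TwoPhaseTransaction(lm)
     t.lock('A'); t.lock('B')   # growing phase
     t.unlock('A')              # shrinking phase begins
     # t.lock('C') would now raise: 2PL forbids locking after unlocking
     t.commit()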
438
        Working of Two-Phase locking

      Can guarantee that a legal schedule of
       consistent transactions is conflict-serializable.
     Two protocols for 2PL:
       Strict  two phase locking : Transaction holds all
        its exclusive locks till commit / abort.
       Rigorous two phase locking : Transaction holds
        all locks till commit / abort.
      For any transaction Ti that does not obey 2PL, it is
       possible to find a 2PL transaction Tj and a schedule S
       of Ti and Tj that is not conflict-serializable.
439
                        Failure of 2PL.

      2PL does not provide protection against deadlocks.




440
  Locking Systems with Several Lock
               Modes
     Locking Scheme
        Shared/Read Lock ( For Reading)

        Exclusive/Write Lock( For Writing)

     Compatibility Matrices
      Upgrading Locks: where transactions are allowed to
       take a shared lock and “upgrade” it to an exclusive
       (write) lock
      Update Locks
      Increment Locks: which treat specially write actions
       that increment a database element

441
            Shared & Exclusive Locks
     Need a stronger lock for writing than for reading.
     Consistency of Transactions
       Cannot write without an Exclusive Lock
       Cannot read without holding some lock

     This basically works on 2 principles
       A read action can proceed only if the transaction
        holds a shared or an exclusive lock
       A write action can proceed only if the transaction
        holds an exclusive lock

     All locks need to be unlocked before commit

442
      Shared and exclusive locks (cont.)
     Two-phase locking of transactions
         Must precede unlocking
     Legality of Schedules
         An element may be locked exclusively by one
          transaction or by several in shared mode, but not both.




443
               Compatibility Matrices

     C. M. is a convenient way to describe lock-
      management policies.
     Has a row and column for each lock mode.
       Rows correspond to a lock held on an element by
        another transaction
       Columns correspond to mode of lock requested.

       Example :

                                   LOCK REQUESTED
                                   S        X
               LOCK       S       YES       NO
               HOLD       X       NO        NO
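
     A minimal sketch (not from the slides) of the shared/exclusive compatibility
     matrix as a lookup table; the function name and list-of-held-modes argument
     are hypothetical conveniences.

     COMPATIBLE = {          # (lock held, lock requested) -> can grant?
         ('S', 'S'): True,
         ('S', 'X'): False,
         ('X', 'S'): False,
         ('X', 'X'): False,
     }

     def can_grant(requested, held_modes):
         """Grant `requested` only if it is compatible with every lock held
         by other transactions on the element."""
         return all(COMPATIBLE[(held, requested)] for held in held_modes)

     print(can_grant('S', ['S', 'S']))  # True : several readers may share
     print(can_grant('X', ['S']))       # False: a writer must wait for readers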
444
                     Upgrading Locks

     Suppose a transaction wants to read as well
      as write :
       It acquires a shared lock on the element
       Performs the calculations on the element

       And when it is ready to write, it is granted an
        exclusive lock.
     Transactions with unpredictable read/write needs
      can use UPGRADING LOCKS.


445
            Upgrading locks (cont.)

     Indiscriminate use of upgrading can produce a
      deadlock.
     Example : Both transactions want to
      upgrade their lock on the same element




446
                     Update locks

     Solves the deadlock that occurs with the upgrading-lock
      method.
     An update lock gives a transaction only the
      privilege to read X, not to write X.
     An update lock can later be converted to an exclusive
      lock.
     An update lock can be granted on X even while the
      element already has shared locks on it.
447
                   Update locks (cont.)

     An update lock is like a shared lock when you are requesting
      it and is like an exclusive lock when you have it.
     But once there is an update lock on X, we prevent
      additional locks of any kind (shared, update, or exclusive)
      from being taken on X.
     Notation: uli (X)


     Compatibility matrix :
                               S          X           U
                  S            YES       NO          YES
                  X            NO        NO          NO
                  U            NO        NO          NO
448
                 Increment Locks

     Used for incrementing & decrementing stored
      values.
     E.g. transferring money from one bank account to
      another, or ticket-selling transactions in which the
      number of seats is decremented after each
      transaction.




449
               Increment lock (cont.)

     An increment lock does not enable the holder to read or write the
      element; it permits only increment actions.
     Any number of transactions can hold increment locks on an
      element at the same time.
     Shared and exclusive locks cannot be granted if an
      increment lock is held on the element.

                           S          X           I
                S         YES         NO         NO
                X         NO          NO         NO
                I         NO          NO        YES

450
                                          451




18.4 LOCKING SYSTEMS WITH
   SEVERAL LOCK MODES


                CS257 Spring/2009
                Professor: Tsau Lin
              Student: Suntorn Sae-Eung
                       ID: 212
           18.4 LOCKING SYSTEMS WITH
              SEVERAL LOCK MODES

   In 18.3, a transaction must lock a database element (X)
    whether it reads or writes it.
      There is no reason why several transactions could not read X at the
       same time, as long as none writes X.
   Introduce locking schemes
      Shared/Read Lock ( For Reading)

      Exclusive/Write Lock( For Writing)

   Upgrading Locks: where transactions are allowed to take a
    shared lock and “upgrade” it to an exclusive (write)
    lock
   Update Locks
   Increment Locks: which treat specially write actions that
452 increments a database element
       18.4.1 SHARED & EXCLUSIVE LOCKS


     Transaction Consistency
       Cannot write without Exclusive Lock
       Cannot read without holding some lock

         The lock for writing is considered “stronger” than the lock for reading
     This basically works on 2 principles
      1. A read action can proceed only if the transaction holds a shared or an
        exclusive lock
      2. A write action can proceed only if the transaction holds an exclusive lock
     All locks need to be unlocked before commit
453
       18.4.1 SHARED & EXCLUSIVE LOCKS
                    (CONT.)

     Two-phase locking (2PL) of transactions

       Ti :    Lock → R/W → Unlock


     Notation:
      sli (X)– Ti requests shared lock on DB element X
      xli (X)– Ti requests exclusive lock on DB element X
      ui (X)– Ti relinquishes whatever lock on X


454
          18.4.1 SHARED & EXCLUSIVE LOCKS
                       (CONT.)
455



         Legality of Schedules
             An element may be locked exclusively by one writing transaction or
              in shared mode by several reading transactions, but not both
        18.4.2 COMPATIBILITY MATRICES

     A convenient way to describe lock-
      management policies
       Rows correspond to a lock held on an element by
        another transaction
       Columns correspond to mode of lock requested.

       Example :
                                Lock requested
                               S             X
           Lock in    S       YES           NO
            hold      X       NO            NO



456
               18.4.3 UPGRADING LOCKS

    A transaction (T) taking a shared lock is friendly toward other
     transactions.
   When T wants to read and write a new value X,
    1. T takes a shared lock on X.
    2. performs operations on X (may spend long time)
    3. When T is ready to write a new value, “Upgrade” shared lock to
       exclusive lock on X.
    Suppose a transaction wants to read as well as write :
      It acquires a shared lock on the element

      Performs the calculations on the element

      And when it is ready to write, it is granted an exclusive lock.

    Transactions with unpredictable read/write needs can use
457 UPGRADING LOCKS.
              18.4.3 UPGRADING LOCKS (CONT.)
458


         Observe the example




          (Example: T1 must wait; once the lock on ‘B’ is released,
           T1 retries and succeeds.)

         T1 cannot take an exclusive lock on B until all locks on B are
          released.
      18.4.3 Upgrading Locks (cont.)

     Upgrading can easily cause a “Deadlock”.
       Both transactions want to upgrade their lock on the
        same element




               Both transactions will wait forever !!

459
                   18.4.4 Update locks
     The third lock mode, which resolves the deadlock problem. Its
      rules are:
        Only an “Update lock” can be upgraded to a write (exclusive)
         lock later.
        An “Update lock” may be granted on X even when there are
         already shared locks on X.
        Once there is an “Update lock” on X, it prevents additional
         locks of any kind on X; it can later be changed to a write (exclusive) lock.
        An update lock gives a transaction only the privilege to
         read X, not to write X.

     Notation: uli (X)

460
             18.4.4 Update locks (cont.)
461



         Example
        18.4.4 Update locks (cont.)

• Compatibility matrix (asymmetric)


                           Lock requested
                      S          X          U
                S    YES        NO          YES
      Lock in   X     NO        NO          NO
       hold
                U     NO        NO          NO




462
                  18.4.5 Increment Locks

     A useful lock for transactions which
      increase/decrease value.
      e.g. money transfer between two bank accounts.
     If 2 transactions (T1, T2) add constants to the
      same database element (X),
         It doesn’t matter which goes first, as long as
          neither reads X in between.
     See the following exhibits.
     Shared and exclusive locks cannot be
      granted if an increment lock is held on the
463   element
           18.4.5 Increment Locks (cont.)
464




      CASE 1


               T1: INC (A,2)   A=7    T2: INC (A,10)



         A=5                                     A=17


           T2: INC (A,10)             T1: INC (A,2)
                               A=15

      CASE 2
                 18.4.5 Increment Locks (cont.)
465


         What if

                 T1: INC (A,2)    A=7      T2: INC (A,10)

           A=5          A=5                                 A=15




                 T2: INC (A,10)   A=15
                                         T1: INC (A,2)             A != 17
                        A=5
           A=5                                              A=7
                18.4.5 Increment Locks (cont.)
466


         INC (A, c) –
           Increment  action of writing on database element
            A, which is an atomic execution consisting of
            1. READ(A,t);
            2. t = t+c;
            3. WRITE(A,t);
         Notation:
           ili (X)–action of Ti requesting an increment lock on X
           inci (X)– action of Ti increments X by some constant;
            don’t care about the value of the constant.
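
        A minimal sketch (not from the slides) of why INC must be atomic;
        the element name A, the constants, and the threading setup are
        hypothetical stand-ins for concurrent transactions.

        import threading

        A = 5
        lock = threading.Lock()

        def inc_atomic(c):
            # INC(A, c) as an atomic READ / add / WRITE, as described above.
            global A
            with lock:
                t = A        # READ(A, t)
                t = t + c    # t := t + c
                A = t        # WRITE(A, t)

        def inc_lost_update(c):
            # Non-atomic version: two concurrent increments may both read the
            # same old value, and one update is lost (the A != 17 case above).
            global A
            t = A
            t = t + c
            A = t

        t1 = threading.Thread(target=inc_atomic, args=(2,))
        t2 = threading.Thread(target=inc_atomic, args=(10,))
        t1.start(); t2.start(); t1.join(); t2.join()
        print(A)   # always 17 with the atomic version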
             18.4.5 Increment Locks (cont.)
467


         Example
          18.4.5 Increment Locks (cont.)

 • Compatibility matrix

                           Lock requested
                      S          X           I
                S    YES        NO          NO
      Lock in   X     NO        NO          NO
       hold
                I     NO        NO          YES




468
      Concurrency Control
            Section 18.5




469
                      Overview

     Assume knowledge of:
       Lock
       Two phase lock

       Lock modes: shared, exclusive, update

     A simple scheduler architecture based on
      following principle :
       Insert lock actions into the stream of reads,
        writes, and other actions
       Release locks when the transaction manager tells
        it that the transaction will commit or abort
470
      Scheduler That Inserts Lock
      Actions into the transactions
471         request stream
     Scheduler That Inserts Lock
              Actions
    If a transaction is delayed, waiting for a lock,
       the scheduler performs the following actions
     Part I: takes the stream of requests generated

       by the transaction & inserts the appropriate lock
       actions ahead of the db operations (read, write, or update)
     Part II: takes the actions (lock requests or db operations) from

       Part I and executes them.
     It determines the transaction (T) to which the action belongs
       and the status of T (delayed or not). If T is not
       delayed then
    1. A database access action is transmitted to the

472    database and executed
        Scheduler That Inserts Lock
                 Actions
2.      If a lock action is received by Part II, it checks the Lock
        Table to see whether the lock can be granted or not
      i> Granted: the Lock Table is modified to include the granted
        lock
      ii> Not granted: the Lock Table is updated with the requested lock,
        and Part II delays transaction T
3.      When a transaction T commits or aborts, Part I is notified by the
        transaction manager and releases all of its locks.
             If any transactions are waiting for those locks, Part I
        notifies Part II.
4.      Part II, when notified that a lock on some DB
473
        element is available, determines the next transaction or
        transactions that can now be granted the lock, and allows them to proceed.
              The Lock Table

  A relation that associates database elements
   with locking information about that element
  Implemented with a hash table using

   database elements as the hash key
   Size is proportional to the number of locked
    elements only, not to the size of the entire
    database

   [Diagram: DB element A → lock information for A]
474
      Lock Table Entries Structure
                        Information found in a
                        Lock Table entry:
                        1> Group mode
                        - S: only shared locks are
                        held
                        - X: one exclusive lock and
                        no other locks
                        - U: one update lock and
                        possibly one or more shared locks
                        2> Waiting: whether some transaction is
                        waiting for a lock on A
                        3> A list: transactions that currently
475                     hold locks on A or are
                        waiting for a lock on A
          Handling Lock Requests

     Suppose transaction T requests a lock on A
     If there is no lock table entry for A, then there
      are no locks on A, so create the entry and
      grant the lock request
     If the lock table entry for A exists, use the
      group mode to guide the decision about the
      lock request



476
        Handling Lock Requests

  If group mode is U (update) or X (exclusive)
 No other lock can be granted
     Deny the lock request by T

     Place an entry on the list saying T requests a lock

     And Wait? = ‘yes’

  If group mode is S (shared)

 Another shared or update lock can be granted
     Grant request for an S or U lock

     Create entry for T on the list with Wait? = ‘no’

     Change group mode to U if the new lock is an update
      lock
477
        Handling Unlock Requests

     Now suppose transaction T unlocks A
      Delete T’s entry on the list for A
       If T’s lock mode is not the same as the group mode,
        there is no need to change the group mode
       Otherwise examine the entire list to determine the new
        group mode
           T held S: new group mode is S (or no lock remains)
           T held U: new group mode is S (or no lock remains)
           T held X: no lock remains



478
            Handling Unlock Requests

 If the value of Waiting is “yes", we need to grant one or more locks, using one of the
      following approaches
       First-Come-First-Served:
        Grant the lock to the longest waiting request.
        No starvation (waiting forever for lock)
       Priority to Shared Locks:
        Grant all S locks waiting, then one U lock.
        Grant X lock if no others waiting
       Priority to Upgrading:
        If there is a U lock waiting to upgrade to an X lock, grant that first.




479
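
   A minimal sketch (not from the slides) of how a lock-table entry's group
   mode guides the decision to grant an S, U, or X request; the Entry record,
   the dictionary used as the lock table, and the simple waiting list are
   hypothetical simplifications of the structures described above.

   from collections import namedtuple

   Entry = namedtuple('Entry', ['group_mode', 'holders', 'waiting'])

   lock_table = {}   # element -> Entry

   def request_lock(element, txn, mode):
       entry = lock_table.get(element)
       if entry is None:
           # No locks on the element: create the entry and grant immediately.
           lock_table[element] = Entry(mode, [(txn, mode)], [])
           return True
       if entry.group_mode in ('U', 'X'):
           # No other lock can be granted; T must wait.
           entry.waiting.append((txn, mode))
           return False
       # Group mode S: another S or U lock can be granted.
       if mode in ('S', 'U'):
           entry.holders.append((txn, mode))
           new_group = 'U' if mode == 'U' else entry.group_mode
           lock_table[element] = Entry(new_group, entry.holders, entry.waiting)
           return True
       entry.waiting.append((txn, mode))   # an X request must wait
       return False

   print(request_lock('A', 'T1', 'S'))   # True  (entry created)
   print(request_lock('A', 'T2', 'U'))   # True  (group mode becomes U)
   print(request_lock('A', 'T3', 'S'))   # False (U group mode blocks it)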
 Managing Hierarchies of Database Elements (18.6)

         Concurrency Control




480
              Managing Hierarchies of
481
                Database Elements

         We shall focus on two problems that
          arise with locks when there is a tree
          structure to the data:
         When the tree structure is a hierarchy of
          lockable elements
             Determine how locks are granted for both large
              elements (relations) and smaller elements (blocks
              containing tuples or individual tuples)
         When the data itself is organized as a tree
          (B-tree indexes)
             This will be discussed in the next section
          Locks with Multiple Granularity
482


         A database element can be a relation, block or
          a tuple
         Different systems use different database
          elements to determine the size of the lock
         Thus some applications are best served by locks on
          small database elements such as tuples or blocks,
          and others by locks on whole relations
      Example of Multiple Granularity
483
                  Locks
         Consider a database for a bank
           Choosing relations as database elements means
            we would have one lock for an entire relation
           If we were dealing with a relation having account
            balances, this kind of lock would be very inflexible
            and thus provide very little concurrency
           Why? because balance transactions require
            exclusive locks and this would mean only one
            transaction occurs for one account at any time
           But as each account is independent of others we
            could perform transactions on different accounts
            simultaneously
                          …(contd.)
484



             Thus it makes sense to have block element for the
              lock so that two accounts on different blocks can
              be updated simultaneously
         Another example is that of a document
           With similar arguments as above, we see that it is
            better to have large element (a complete
            document) as the lock in this case
           These documents may be edited from time to
            time, but most transactions will retrieve whole
            documents .
           The sensible choice of database element is a
            complete document.
               Warning (Intention) Locks
485


         The solution to the problem of managing
          locks at different granularities involves a
          new kind of lock called a “warning”.
         The rules for managing locks on hierarchy
          of database elements constitute the
          warning protocol
         These are required to manage locks at
          different granularities
             In the bank example, if a shared lock is obtained
              for the relation while there are exclusive locks on
              individual tuples, unserializable behavior occurs
      Database Elements Organized in
486
                Hierarchy
              Rules of Warning Protocol
487


         The warning locks are denoted by prefixing I (for
          “intention”) to the ordinary lock mode.
         These involve both ordinary (S and X) and
          warning (IS and IX) locks
         The rules are (see the sketch after this slide):
           Begin at the root of the hierarchy
           Request the S/X lock if we are at the desired element
           If the desired element is further down the hierarchy,
            place a warning lock (IS if S and IX if X)
           When the warning lock is granted, we proceed to the
            child node and repeat the above steps until the desired
            node is reached
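
      A minimal sketch (not from the slides) of the warning-protocol descent:
      intention locks are placed on every ancestor before the ordinary lock is
      taken on the target element; the path list and the acquire callback are
      hypothetical stand-ins for a real hierarchy and lock manager.

      def warning_protocol_lock(path, mode, acquire):
          """path: list of nodes from the root down to the desired element.
          mode: 'S' or 'X'.  acquire(node, mode) is assumed to block until
          the requested lock is compatible with locks already on that node."""
          intention = 'IS' if mode == 'S' else 'IX'
          # Place warning (intention) locks on the root and every inner node.
          for node in path[:-1]:
              acquire(node, intention)
          # Request the ordinary S or X lock only at the desired element.
          acquire(path[-1], mode)

      # Usage with a trivial recorder standing in for a lock manager:
      taken = []
      warning_protocol_lock(['relation R', 'block B1', 'tuple t3'], 'X',
                            lambda node, m: taken.append((node, m)))
      print(taken)   # [('relation R', 'IX'), ('block B1', 'IX'), ('tuple t3', 'X')]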
       Compatibility Matrix for Shared,
488
        Exclusive and Intention Locks

                    IS       IX      S       X
            IS      Yes      Yes     Yes     No
            IX      Yes      Yes     No      No
            S       Yes      No      Yes     No
            X       No       No      No      No

      • The above matrix applies only to locks held by
      other transactions
      Group Modes of Intention Locks
489


         A transaction can request S and IX locks on the
          same element at the same time
           (to read the entire element and then modify sub-elements)
         This can be considered as another lock mode,
          SIX, having the restrictions of both locks, i.e.
          “No” for all requests except IS
         SIX serves as the group mode
                              Example
490


         Consider a transaction T1 as follows
           Select * from table where attribute1 = ‘abc’
           Here, IS lock is first acquired on the entire relation;
            then moving to individual tuples (attribute = ‘abc’), S
            lock in acquired on each of them
         Consider another transaction T2
           Update table set attribute2 = ‘def’ where attribute1 =
            ‘ghi’
           Here, it requires an IX lock on relation and since T1’s
            IS lock is compatible, IX is granted
491


       On  reaching the desired tuple (ghi), as there is no
        lock, it gets X too
       If T2 was updating the same tuple as T1, it would
        have to wait until T1 released its S lock
              Phantoms and Handling
492
                Insertions Correctly
         This arises when transactions create new sub
          elements of lockable elements
         Since we can lock only existing elements the
          new elements fail to be locked
         The problem created in this situation is
          explained in the following example
                             Example
493


         Consider a transaction T3
           Select   sum(length) from table where attribute1 =
          ‘abc’
         This calculates the total length of all tuples having
          attribute1
         Thus, T3 acquires IS for relation and S for
          targeted tuples
       Now, if another transaction T4 inserts a new tuple
        having attribute1 = ‘abc’, the result of T3 becomes
        incorrect
                    Example (…contd.)
494


         This is not a concurrency problem , since the
          serial order (T3, T4) is maintained
         But if both T3 and T4 were to write an element
          X, it could lead to unserializable behavior
           r3(t1);r3(t2);w4(t3);w4(X);w3(L);w3(X)
           r3 and w3 are read and write operations by T3 and w4
            are the write operations by T4 and L is the total length
            calculated by T3 (t1 + t2)
           At the end, we have result of T3 as sum of lengths of
            t1 and t2 and X has value written by T3
           This is not right; if value of X is considered to be that
            written by T3 then for the schedule to be serializable,
            the sum of lengths of t1, t2 and t3 should be
            considered
                      Example (…contd.)
495


             Else if the sum is total length of t1 and t2 then for the
              schedule to be serializable, X should have value
              written by T4
         This problem arises since the relation has a phantom tuple
          (the new inserted tuple), which should have been locked but
          wasn’t since it didn’t exist at the time locks were taken
         We must regard the insertion or deletion of a tuple as a
          write operation on the relation as a whole.
         Thus the inserting or deleting transaction must obtain an X lock
          on the relation.
         The occurrence of phantoms can be avoided if all insertion
          and deletion transactions are treated as write operations on
          the whole relation
                496




  CONCURRENCY
    CONTROL


   SECTION 18.7
THE TREE PROTOCOL
                       BASICS
497
       B-Trees
         - Tree data structure that keeps data sorted
         - allow searches, insertion, and deletion
         - commonly used in database and file systems
       Lock

         - Enforce limits on access to resources
         - way of enforcing concurrency control
       Lock Granularity

         - Level and type of information that lock
           protects.
                       TREE PROTOCOL
498
         Tree structures are formed
          by the link pattern of the
          elements themselves. Database
          elements are disjoint pieces of data,
          but the only way to get to a node is
          through its parent.
         Kind of graph-based protocol
         Alternate to Two-Phased Locking (2PL)
         database elements are disjoint pieces of
          data
         Nodes of the tree DO NOT form a
          hierarchy based on containment
           Way to get to the node is    through its
            parent
499
            ADVANTAGES OF TREE
                PROTOCOL


         Unlocking takes less time as
          compared to 2PL

         Freedom from deadlocks
500
               18.7.1 MOTIVATION FOR
               TREE-BASED LOCKING
          Consider B-Tree Index, treating individual nodes
          as lockable database elements.
          Concurrent use of B-Tree is not possible with
          standard set of locks and 2PL.
          Therefore, a protocol is needed which can assure
          serializability by allowing access to the elements
          all the way at the bottom of the tree even if the
          2PL is violated.
501
         18.7.1 MOTIVATION FOR
      TREE-BASED LOCKING (CONT.)
      Reason for : “Concurrent use of B-Tree is not
       possible with standard set of locks and 2PL.”

       every transaction must begin with locking
        the root node
       2PL transactions can not unlock the root

        until all the required locks are acquired.
         18.7.2 ACCESSING TREE
502        STRUCTURED DATA


      Assumptions:

        Only one kind of lock
        Consistent transactions
        Legal schedules
        No 2PL requirement on transaction
      18.7.2 RULES FOR ACCESSING
503
        TREE STRUCTURED DATA

  RULES (a sketch of these rules follows this slide):
     The first lock may be at any node.
     Subsequent locks may be acquired only if the parent node is currently locked.
     Nodes may be unlocked at any time.
     A node may not be relocked, even if the node’s parent is still locked.
     There are few other restrictions on locks in the tree protocol:
     We assume that there is only one kind of lock.
     Transactions are assumed consistent and schedules legal; locks are granted
      only when they do not conflict with locks already at a node, but there is no
      two-phase locking requirement on transactions.
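
      A minimal sketch (not from the slides) of the tree-protocol rules above;
      the parent map and the per-transaction bookkeeping are hypothetical
      stand-ins for a real B-tree and lock manager.

      class TreeProtocolTxn:
          def __init__(self, parent_of):
              self.parent_of = parent_of     # node -> parent (root maps to None)
              self.held = set()
              self.released = set()
              self.first_lock_taken = False

          def lock(self, node):
              if node in self.released:
                  raise RuntimeError("tree protocol: a released node may not be relocked")
              if self.first_lock_taken and self.parent_of.get(node) not in self.held:
                  raise RuntimeError("tree protocol: parent must be locked first")
              self.held.add(node)            # the first lock may be at any node
              self.first_lock_taken = True

          def unlock(self, node):
              self.held.discard(node)        # nodes may be unlocked at any time
              self.released.add(node)

      parents = {'A': None, 'B': 'A', 'C': 'A', 'D': 'B'}
      t = TreeProtocolTxn(parents)
      t.lock('B')          # first lock: any node is allowed
      t.lock('D')          # allowed: parent B is held
      t.unlock('B')        # allowed: unlock at any time
      # t.lock('C') would raise: its parent A is not held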
            18.7.3 WHY TREE PROTOCOL
504
                                       WORKS?
       A transaction's first lock may be at any node of the tree.
         Subsequent locks may only be acquired if the transaction currently has a
          lock on the parent node.
         Nodes may be unlocked at any time
         A transaction may not relock a node on which it has released a lock, even
          if it still holds a lock on the node’s parent


         Tree protocol implies a serial order on transactions in the schedule.
         Order of precedence:
                                                Ti < s Tj

         If Ti locks the root before Tj, then Ti locks every node in common      with Tj
          before Tj.
      ORDER OF PRECEDENCE
505
               506




 CONCURRENCY
   CONTROL


SECTION 18.8
TIMESTAMPS
              What is Timestamping?
507




         The scheduler assigns each transaction T a unique
          number, its timestamp TS(T).
         Timestamps must be issued in ascending
          order, at the time when a transaction first
          notifies the scheduler that it is beginning.
                    Timestamp TS(T)
508


         Two methods of generating Timestamps:
           Use the value of the system clock as the timestamp.
           Use a logical counter that is incremented after a
            new timestamp has been assigned.
         The scheduler maintains a table of currently active
          transactions and their timestamps, irrespective
          of the method used.
              Timestamps for database
509
              element X and commit bit


         RT(X):- The read time of X, which is the highest
          timestamp of a transaction that has read X.
         WT(X):- The write time of X, which is the highest
          timestamp of a transaction that has written X.
         C(X):- The commit bit for X, which is true if and
          only if the most recent transaction to write X has
          already committed.
      Physically Unrealizable Behavior
510


                          Read too late:
         A transaction U that started after transaction T,
          but wrote a value for X before T reads X.

                               U writes X
                                            T reads X




                 T start   U start
      Physically Unrealizable Behavior
511


                           Write too late
         A transaction U that started after T, but read X
          before T got a chance to write X.

                                             U reads X

                                                         T writes X




                       T start     U start


                    Figure: Transaction T tries to write too late
                               Dirty Read
512


         It is possible that after T reads the value of X
          written by U, transaction U will abort.

                                 U writes X

                                               T reads X




                             U start     T start       U aborts



                    T could perform a dirty read if it reads X when shown
        Rules for Timestamps-Based
                 scheduling
513




      1. Scheduler receives a request rT(X)
       a) If TS(T) ≥ WT(X), the read is physically
          realizable.
          1. If C(X) is true, grant the request, if TS(T) > RT(X),
             set RT(X) := TS(T); otherwise do not change RT(X).
          2. If C(X) is false, delay T until C(X) becomes true or
             transaction that wrote X aborts.
       b) If TS(T) < WT(X), the read is physically
          unrealizable. Rollback T.
          Rules for Timestamps-Based
514
               scheduling (Cont.)
      2. Scheduler receives a request wT(X) (a write of X by T).
          a) if TS(T) ≥ RT(X) and TS(T) ≥ WT(X), write is physically
          realizable and must be performed.
             1. Write the new value for X,
             2. Set WT(X) := TS(T), and
             3. Set C(X) := false.
          b) if TS(T) ≥ RT(X) but TS(T) < WT(X), then the write is
          physically realizable, but there is already a later value in X.
             a. If C(X) is true, then the previous writer of X is
        committed, and we simply ignore the write by T.
             b. If C(X) is false, we must delay T.
          c) if TS(T) < RT(X), then the write is physically unrealizable,
          and T must be rolled back.
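
      A minimal sketch (not from the slides) of the read and write rules above;
      RT, WT, and C are the per-element values just defined, and the returned
      strings ('grant', 'delay', 'rollback', 'ignore') are a hypothetical way of
      reporting the scheduler's decision.

      RT, WT, C = {}, {}, {}          # read time, write time, commit bit per element

      def read_request(ts, x):
          if ts >= WT.get(x, 0):          # physically realizable
              if C.get(x, True):
                  RT[x] = max(RT.get(x, 0), ts)
                  return 'grant'
              return 'delay'              # wait until the writer commits or aborts
          return 'rollback'               # read too late

      def write_request(ts, x):
          if ts >= RT.get(x, 0) and ts >= WT.get(x, 0):
              WT[x] = ts
              C[x] = False                # value not yet committed
              return 'grant'
          if ts >= RT.get(x, 0):          # a later write already exists
              return 'ignore' if C.get(x, True) else 'delay'
          return 'rollback'               # write too late

      WT['X'], C['X'] = 10, True
      print(read_request(5, 'X'))    # 'rollback' : a transaction with TS 5 reads too late
      print(write_request(20, 'X'))  # 'grant'    : a newer writer may proceed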
        Rules for Timestamps-Based
             scheduling (Cont.)
515
           1.   Granting Request
           2.   Aborting T (if T would violate physical reality) and restarting T
                with a new timestamp (Rollback)
           3.   Delaying T and later deciding whether to abort T or to grant
                the request
      3. Scheduler receives a request to commit T. It must find
        all the database elements X written by T and set C(X) :=
        true. If any transactions are waiting for X to be
        committed, these transactions are allowed to proceed.
      4. Scheduler receives a request to abort T or decides to
        rollback T, then any transaction that was waiting on an
        element X that T wrote must repeat its attempt to read or
        write.
              Multiversion Timestamps
516


         Keeps old versions of data items to increase
          concurrency.
         Each successful write results in the creation of a
          new version of the data item written.
         Use timestamps to label versions.
         When a read(X) operation is issued, select
          an appropriate version of X based on the
          timestamp of the transaction, and return the
          value of the selected version.
              Timestamps and Locking
517


         Generally, timestamping performs better than
          locking in situations where:
           Most   transactions are read-only.
           It is rare that concurrent transaction will try to
            read and write the same element.
         In high-conflict situation, locking performs better
          than timestamps
         In high-conflict situations, rollback will be
          frequent, introducing more delays than a
          locking system
               Validation based scheduling
518




         Scheduler keeps a record of what the active
          transactions are doing.
         Executes in 3 phases
          1.   Read- reads the elements in RS( ), computes results in its local address space
          2.   Validate- compares read and write sets
          3.   Write- writes from WS( )
              Validation based Scheduler
519


             Checks are made to ensure serializability is not violated
             Scheduling of transactions is done by assigning transaction
              numbers to each transactions
             There must exist a serial schedule in which transaction Ti
              comes before transaction Tj whenever t(i) < t(j)
             If validation fails then the transaction is rolled back
              otherwise it proceeds to the third phase
         Contains an assumed serial order of transactions.
         Maintains three sets:
           START( ): set of T’s started but not completed validation.

           VAL( ): set of T’s validated but not finished the writing phase.

           FIN( ): set of T’s that have finished.
                      Expected exceptions
520



      1. Suppose there is a transaction U, such that:
       U is in VAL or FIN; that is, U has validated,

       FIN(U)>START(T); that is, U did not finish before T started

       RS(T) ∩ WS(U) ≠ φ; let it contain database element X.




      2. Suppose there is transaction U, such that:
      • U is in VAL; U has successfully validated.
      •FIN(U)>VAL(T); U did not finish before T entered its validation phase.
      •WS(T) ∩ WS(U) ≠φ; let x be in both write sets.
                     Validation rules
521


         T4 starts before T1 and T3 finish. So T4
          has to be checked against the sets of T1 and
          T3.
         Check that RS(T) ∩ WS(U)= φ for any
          previously validated U that did not finish before
          T has started i.e. FIN(U)>START(T).
         Check that WS(T) ∩ WS(U)= φ for any
          previously validated U that did not finish before
          T is validated i.e. FIN(U)>VAL(T)
      Example
522
                               Solution
523

         Validation of U:
          Nothing to check
         Validation of T:
           WS(U) ∩ RS(T)= {D} ∩{A,B}=φ
           WS(U) ∩ WS(T)= {D}∩ {A,C}=φ
         Validation of V:
           RS(V) ∩ WS(T)= {B}∩{A,C}=φ
           WS(V) ∩ WS(T)={D,E}∩ {A,C}=φ
           RS(V) ∩ WS(U)={B} ∩{D}=φ
         Validation of W:
           RS(W) ∩ WS(T)= {A,D}∩{A,C}={A}
           RS(W) ∩ WS(V)= {A,D}∩{D,E}={D}
           WS(W) ∩ WS(V)= {A,C}∩{D,E}=φ   (W is not validated)
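
      A minimal sketch (not from the slides) of the two validation checks; the
      dictionary fields ('RS', 'WS', 'start', 'val', 'fin') and the numeric
      timestamps are hypothetical, loosely modeled on the example above.

      def validate(T, others):
          """T: dict with 'RS', 'WS', 'start', 'val'.
          others: previously validated transactions, each with 'RS', 'WS', 'fin'."""
          for U in others:
              # Rule 1: U finished after T started  =>  RS(T) ∩ WS(U) must be empty.
              if U['fin'] > T['start'] and T['RS'] & U['WS']:
                  return False
              # Rule 2: U finished after T validated =>  WS(T) ∩ WS(U) must be empty.
              if U['fin'] > T['val'] and T['WS'] & U['WS']:
                  return False
          return True

      U = {'RS': {'B'}, 'WS': {'D'}, 'fin': 4}
      T = {'RS': {'A', 'B'}, 'WS': {'A', 'C'}, 'start': 1, 'val': 3}
      W = {'RS': {'A', 'D'}, 'WS': {'A', 'C'}, 'start': 2, 'val': 6}
      print(validate(T, [U]))                                          # True
      print(validate(W, [{'RS': {'A', 'B'}, 'WS': {'A', 'C'}, 'fin': 5}]))
      # False: the read set of W intersects the earlier write set in {A}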
                      Comparison
524   Concurrency   Storage Utilization             Delays
      control
      Mechanisms
      Locks         Space in the lock table is      Delays transactions but
                    proportional to the number of   avoids rollbacks
                    database elements locked.
      Timestamps    Space is needed for read and    Do not delay the
                    write times with every          transactions but cause
                    database element, whether or    them to rollback unless
                    not it is currently accessed.   interference is low
      Validation    Space is used for timestamps    Do not delay the
                    and read or write sets for      transactions but cause
                    each currently active           them to rollback unless
                    transaction, plus a few more    interference is low
                    transactions that finished
                    after some currently active
                    transaction began.
                          525




CHAPTER 21
21.1 INTRODUCTION TO
INFORMATION INTEGRATION




     CS257 Fan Yang
      Need for Information Integration
526


         Databases are created independently, even
          if they later need to work together.
         All the data in the world could be put in a single
          database (the ideal database system)
         In the real world this is impossible for a single
          database:
           databases are created independently
           it is hard to design a database to support all future
           uses
                    University Database
527


             Registrar Database for keeping data about
              courses and student grades for generating
              transcripts.
             Bursar Database for keeping data about the
              tuition payments by students.
             Human Resource Department Database for
              recording employees including those students
              with teaching assistantship jobs.
      •   Other department….
                                   Inconvenient
528


         But these independent databases were inconvenient: a change in one database would
          not be reflected in the other databases and had to be propagated manually. For e.g. we
          want to make sure that the Registrar does not record grades of a student who did not
          pay the fees at the Bursar's office.
         Building a whole new database for the system again is a very expensive and time
          consuming process.
         In addition to paying for very expensive software, the University will have to run both
          the old and the new databases together for a long time to see whether the new system
          works properly or not.
         Record grades only for students who pay tuition
         Want to swim in the SJSU aquatic center for free during summer vacation?
          (none of the cases above can be achieved with the separate databases alone)


         Solution: one database
                    How to integrate
529


         Start over
          build one database: contains all the legacy
          databases; rewrite all the applications on top
          of all legacy databases, without disturbing the
          original databases. result: painful
         Build a layer of abstraction (middleware)
          on top of all the legacy databases
          this layer is often defined by a collection of
          classes BUT…
            Heterogeneity Problem
530


      What is the Heterogeneity Problem?
       Aardvark Automobile Co. has
       1000 dealers, each with its own database.
       To find a model at another dealer,
       can we use this command:
      SELECT * FROM CARS
                 WHERE MODEL=“A6”;
               Type of Heterogeneity
531


         Communication Heterogeneity
         Query-Language Heterogeneity
         Schema Heterogeneity
         Data type difference
         Value Heterogeneity
         Semantic Heterogeneity
                                     532




CHAPTER 21.2
MODES OF INFORMATION
INTEGRATION

      ID: 219
      Name: Qun Yu
      Class: CS257 219 Spring 2009
      Instructor: Dr. T.Y.Lin
                    Content Index
533


      21.2 Modes of Information Integration
      Information Integration allows database or other
        distributed information to work together.
        21.2.1 Federated Database Systems
        21.2.2 Data Warehouses
        21.2.3 Mediators
                         Federations
534




       The simplest architecture for integrating several
      DBs
       One to one connections between all pairs of
      DBs
       n DBs talk to each other, n(n-1) wrappers are
      needed
       Good when communications between DBs are
      limited
                         Wrapper
535


         Wrapper : software that translates incoming queries
          and outgoing answers. As a result, it allows
          information sources to conform to some shared
          schema.
                           Federations Diagram
536




                DB1                                  DB2

                                2 Wrappers

                                2 Wrappers

              2 Wrappers                           2 Wrappers
                                 2 Wrappers


                                 2 Wrappers
                DB3                                  DB4

      A federated collection of 4 DBs needs 12 components to translate queries
      from one to another.
                             Example
537




      Car dealers want to share their inventory. Each dealer queries the
      other’s DB to find the needed car.
      Dealer-1’s DB relation: NeededCars(model,color,autoTrans)
      Dealer-2’s DB relation: Autos(serial, model, color)
                           Options(serial,option)



                                 wrapper
          Dealer-1’s DB          wrapper               Dealer-2’s DB
                                   Example…
538



      For(each tuple(:m,:c,:a) in NeededCars){
           if(:a=TRUE){/* automatic transmission wanted */
           SELECT serial
           FROM Autos, Options
           WHERE Autos.serial = Options.serial AND Options.option = ‘autoTrans’
                  AND Autos.model = :m AND Autos.color =:c;
      }
      Else{/* automatic transmission not wanted */
           SELECT serial
           FROM Autos
           WHERE Autos.model = :m AND
                  Autos.color = :c AND
                   NOT EXISTS( SELECT * FROM Options WHERE serial = Autos.serial
                                                       AND option=‘autoTrans’);
           }
      }

          Dealer 1 queries Dealer 2 for needed cars
                        Data Warehouse
539


         Sources are translated from their local schema to a global schema
          and copied to a central DB.
         User transparent: user uses Data Warehouse just like an ordinary
          DB
         User is not allowed to update Data Warehouse
         Copies of data from several sources are
          stored in a single database.
       Warehouse Diagram
540



              [Figure: Data-warehouse architecture. The user sends a query to
              the Warehouse and receives the result. Below the Warehouse, a
              Combiner merges the data produced by an Extractor for Source 1
              and an Extractor for Source 2.]
                              Example
541




      Construct a data warehouse from sources DB of 2 car dealers:

      Dealer-1’s schema: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
      Dealer-2’s schema: Autos(serial, model, color)
                         Options(serial, option)

      Warehouse’s schema:
          AutosWhse(serialNo, model, color, autoTrans, dealer)

      Extractor --- Query to extract data from Dealer-1’s data:

      INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
      SELECT serialNo, model, color, autoTrans, 'dealer1'
      FROM Cars;
                              Example
542




      Extractor --- Query to extract data from Dealer-2’s data:

      INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
      SELECT serial, model, color, 'yes', 'dealer2'
      FROM Autos, Options
      WHERE Autos.serial = Options.serial AND
            option = 'autoTrans';

      INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
      SELECT serial, model, color, 'no', 'dealer2'
      FROM Autos
      WHERE NOT EXISTS ( SELECT * FROM Options
                         WHERE serial = Autos.serial
                           AND option = 'autoTrans');
             Construct Data Warehouse
543




           There are mainly three ways to construct
           the data in the warehouse:
      1)   Periodically reconstructed from the current data in the
           sources, once a night or at even longer intervals.
           Advantages:
           simple algorithms.
           Disadvantages:
           1) need to shut down the warehouse;
           2) data can become out of date.
              Construct Data Warehouse
544




      2)   Updated periodically based on the changes (e.g., each
           night) in the sources.
           Advantages:
           involve smaller amounts of data. (important when warehouse is
           large and needs to be modified in a short period)
           Disadvantages:
           1) the process to calculate changes to the warehouse is complex.
           2) data can become out of date.
            Construct Data Warehouse
545




      3) Changed immediately, in response to each change or a
         small set of changes at one or more of the sources.
         Advantages:
         data won’t become out of date.
         Disadvantages:
         requires too much communication, therefore, it is
         generally too expensive.
           (practical for warehouses whose underlying sources change
           slowly.)
                          Mediators
546


         A mediator is a virtual warehouse: it supports a virtual view,
          or a collection of views, that integrates several
          sources.
         A mediator doesn’t store any data.
         The mediator’s tasks:
             1) receive the user’s query,
             2) send queries to the wrappers,
             3) combine the results from the wrappers,
             4) send the final result to the user.
547
              A Mediator diagram

              [Figure: Mediator architecture. The user query goes to the
              Mediator and the result comes back. The Mediator sends queries
              to two Wrappers and receives their results; each Wrapper in turn
              queries its source (Source 1 or Source 2) and receives the
              result.]
                                Example
548




      With the same data sources as in the data-warehouse example, the mediator
      integrates the two dealers’ sources into a view with the schema:

      AutosMed(serialNo, model, color, autoTrans, dealer)

      Suppose the user poses the query:

      SELECT serialNo, model
      FROM AutosMed
      WHERE color = 'red';
                                 Example
549




      In this simple case, the mediator forwards the same query to each
      of the two wrappers.

      Wrapper1: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
      SELECT serialNo, model
      FROM Cars
      WHERE color = 'red';

      Wrapper2: Autos(serial, model, color); Options(serial, option)
      SELECT serial, model
      FROM Autos
      WHERE color = 'red';

      The mediator needs to interpret serial as serialNo, and then
      returns the union of these sets of data to the user.
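      As a minimal illustration of this flow (not code from the original slides),
      the following Python sketch forwards the query to two hypothetical wrapper
      functions, treats Dealer-2’s serial as the mediator’s serialNo, and returns
      the union of the two result sets. The wrapper functions, their return
      format, and the in-memory data are assumptions made only for this example.

      # Hypothetical wrapper functions: each one would, in a real system, translate
      # the mediator query into its source's terms and return that source's answer.
      def wrapper1_red_cars():
          # Dealer-1 source: Cars(serialNo, model, color, autoTrans, cdPlayer, ...)
          cars = [("1001", "Gobi", "red"), ("1002", "A6", "blue")]
          return {(serialNo, model) for (serialNo, model, color) in cars if color == "red"}

      def wrapper2_red_cars():
          # Dealer-2 source: Autos(serial, model, color); note the attribute is 'serial'.
          autos = [("2001", "Gobi", "red"), ("2002", "Gobi", "green")]
          return {(serial, model) for (serial, model, color) in autos if color == "red"}

      def mediator_red_cars():
          answers = set(wrapper1_red_cars())     # already keyed by serialNo
          # Interpret Dealer-2's 'serial' as the mediator's 'serialNo'; take the union.
          answers |= wrapper2_red_cars()
          return answers

      print(mediator_red_cars())   # {('1001', 'Gobi'), ('2001', 'Gobi')}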
                                  Example
550




  There may be different strategies for the mediator to forward the user query.
  For example, suppose the user asks whether there is a car of a specific model
  and color (e.g., “Gobi”, “blue”).

  The mediator can decide whether the 2nd query is needed based on the result of
  the 1st query: if Dealer-1 has such a car, the mediator doesn’t
  have to query Dealer-2.
                                      551




21.3 WRAPPERS IN MEDIATOR-
BASED SYSTEMS


                      Presented by: Kai Zhu
                    Professor: Dr. T.Y. Lin
552


      Wrappers in Mediator-based Systems
         Wrappers in mediator-based systems are more complicated
          than those in most data-warehouse systems.
         Able to accept a variety of queries from the
          mediator and translate them to the terms of the
          source.
         Communicate the result to the mediator.
553


          How to design a wrapper?
          Classify the possible queries that the mediator can ask into
          templates, which are queries with parameters that represent
          constants.
         The wrapper(extractor) consists of:
          One or more predefined queries (based on source)
             SQL
             Web page



          Suitable communication mechanism for sending and
          receiving information to/from
            source/mediator.
554



          Templates for Query Patterns:

         Use notation T=>S to express the idea
          that the template T is turned by the
          wrapper into the source query S.
555
         Example 1
          Dealer 1
             Cars (serialNo, model, color, autoTrans,
             navi,…)

          For use by a mediator with schema
              AutoMed (serialNo, model, color, autoTrans,
          dealer)
         We denote the code representing that color by
          the parameter $c, then the template will be:
556

      SELECT *
      FROM AutosMed
      WHERE color = '$c';
                 =>
      SELECT serialNo, model, color, autoTrans, 'dealer1'
      FROM Cars
      WHERE color = '$c';

      (Template T => Source query S)
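      To make the T => S idea concrete, here is a small Python sketch (an
      illustration, not part of the original material) that stores this
      template pair and produces the source query by substituting a value for
      the parameter $c. The names TEMPLATE and instantiate are assumptions
      made for this example.

      # One (mediator template, source-query template) pair; '$c' marks the parameter.
      TEMPLATE = ("SELECT * FROM AutosMed WHERE color = '$c';",
                  "SELECT serialNo, model, color, autoTrans, 'dealer1' "
                  "FROM Cars WHERE color = '$c';")

      def instantiate(source_query: str, color: str) -> str:
          # Replace the parameter with the constant supplied by the mediator's query.
          return source_query.replace("$c", color)

      print(instantiate(TEMPLATE[1], "blue"))
      # SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars WHERE color = 'blue';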
557


         There will be a total of 2^n templates if we have the
          option of specifying each of n attributes.
558


          Wrapper Generators
         The wrapper generator creates a table that holds the various query
          patterns contained in the templates, together with the source
          query associated with each pattern.
         A driver is used in each wrapper; the task of
          the driver is to:
           Accept a query from the mediator.
           Search the table for a template that matches the query.
           Send the corresponding source query to the source.
           Return the response to the mediator.
559




         Accept a query from the mediator.
         Search the table for a template that matches the
          query.
         The source query is sent to the source, again
          using a “plug-in” communication mechanism.
         The response is processed by the wrapper.
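         The following Python sketch (an illustrative assumption, not the
          slides’ own code) shows such a driver loop: it keeps a table mapping
          query patterns to source-query templates, finds a template that
          matches the mediator’s query, and sends the instantiated source
          query through a stand-in communication function. TEMPLATE_TABLE,
          send_to_source, and driver are names invented for this sketch.

      import re

      # Table built by a wrapper generator: (mediator query pattern, source query template).
      # The pattern uses a named group to capture the constant supplied by the mediator.
      TEMPLATE_TABLE = [
          (re.compile(r"SELECT \* FROM AutosMed WHERE color = '(?P<c>[^']*)';"),
           "SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars WHERE color = '{c}';"),
      ]

      def send_to_source(sql: str):
          # Stand-in for the real "plug-in" communication mechanism.
          print("sending to source:", sql)
          return []          # a real driver would return the source's answer

      def driver(mediator_query: str):
          for pattern, source_template in TEMPLATE_TABLE:
              m = pattern.fullmatch(mediator_query)
              if m:
                  return send_to_source(source_template.format(c=m.group("c")))
          raise ValueError("no template matches this query")

      driver("SELECT * FROM AutosMed WHERE color = 'blue';")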
560


          Filter
         The wrapper can apply a filter to support more
          queries than its templates cover.
         Example 2
         Suppose the wrapper were designed with a more complicated
561
          template, whose queries specify both model and
          color:

      SELECT *
      FROM AutosMed
      WHERE model = '$m' AND color = '$c';
                 =>
      SELECT serialNo, model, color, autoTrans, 'dealer1'
      FROM Cars
      WHERE model = '$m' AND color = '$c';
562

         Now suppose the only template we have is the one
          for color, but the wrapper is asked by the
          mediator to find “blue Gobi model cars.”
      Solution:
            1. Use the template with $c = 'blue' to find all blue
563
       cars and store them in a temporary relation:
            TemAutos(serialNo, model, color, autoTrans,
       dealer)
            2. The wrapper then returns to the mediator
       the desired set of automobiles by executing the
       local query:
       SELECT *
       FROM TemAutos
       WHERE model = 'Gobi';
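       A minimal Python sketch of this filtering step (illustrative only; the
       fetch function, the relation contents, and the tuple layout are
       assumptions): fetch all blue cars through the color-only template,
       store them, then apply the model condition locally.

       def fetch_blue_cars():
           # Stand-in for running the color-only template with $c = 'blue' at the source.
           return [
               ("1001", "Gobi", "blue", True,  "dealer1"),
               ("1002", "A6",   "blue", False, "dealer1"),
           ]

       # Step 1: store the template's answer in a temporary relation TemAutos.
       TemAutos = fetch_blue_cars()

       # Step 2: filter locally, as in SELECT * FROM TemAutos WHERE model = 'Gobi';
       result = [row for row in TemAutos if row[1] == "Gobi"]
       print(result)   # [('1001', 'Gobi', 'blue', True, 'dealer1')]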
                             564




Information integration


      SECTIONS 21.4 – 21.5
      21.4 Capability Based Optimization
565


         Introduction
           A typical DBMS estimates the cost of each query
            plan and picks what it believes to be the best
           A mediator, however, usually has little or no knowledge of
            how long its sources will take to answer a query
           Optimization of mediator queries therefore cannot rely on a
            cost measure alone to select a query plan
           Instead, optimization by the mediator follows capability-based
            optimization
            21.4.1 The Problem of Limited
                 Source Capabilities
566


         Many sources have only Web Based interfaces
         Web sources usually allow querying through a query
          form, which does not accept arbitrary SQL queries.
         Rather, we are invited to enter values for certain
          attributes and can receive a response that gives
          values for other attributes
         E.g., Amazon.com’s interface allows us to query about
          books in many different ways. We can specify an
          author and get all their books, or we can specify a
          book title and get book information
         But we cannot ask questions that are too general
            E.g. Select * from books;
            21.4.1 The Problem of Limited
             Source Capabilities (con’t)
567


         Reasons why a source may limit the ways in
          which queries can be asked:
           The earliest data sources did not use a DBMS at all,
            certainly not a relational DBMS that supports
            arbitrary SQL queries
           Indexes on a large database may make certain
            queries feasible, while others are too expensive
            to execute
           Security reasons
             E.g., a medical database may answer queries about
             averages, but won’t disclose details of a particular
             patient's information
          21.4.2 A Notation for Describing
                    Source Capabilities
568


         For relational data, the legal forms of queries are
          described by adornments
         Adornments – sequences of codes that represent
          the requirements for the attributes of the relation,
          in their standard order. The codes we shall use for
          adornments reflect the most common
          capabilities of sources. They are:
           f (free) – the attribute can be specified or not, as we
            choose
           b (bound) – we must specify a value for the attribute, but
            any value is allowed
           u (unspecified) – we are not permitted to specify a value
            for the attribute
          21.4.2 A notation for Describing
               Source Capabilities….(cont’d)
569


           c[S] (choice from set S) – a value must be
            specified, and it must be one of the
            values in the finite set S
           o[S] (optional from set S) – we either do not
            specify a value, or we specify a value from the finite
            set S
           A prime (e.g., f') specifies that the attribute is not part
            of the output of the query
         A capabilities specification is a set of adornments
         A query must match one of the adornments in its
          capabilities specification
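         The following Python sketch (an illustration under an assumed
          encoding of the codes, not notation from the book or slides)
          represents one adornment as a list of codes and checks whether a
          concrete query, given as a mapping from attribute positions to
          supplied constants, satisfies that adornment. Primes (output
          restrictions) are not modeled here.

      # Codes: 'f', 'b', 'u', ('c', {...}) for c[S], ('o', {...}) for o[S].
      def satisfies(adornment, supplied):
          """supplied maps attribute position -> constant the query specifies."""
          for i, code in enumerate(adornment):
              value = supplied.get(i)                     # None if not specified
              if code == 'b' and value is None:
                  return False                            # must be specified
              if code == 'u' and value is not None:
                  return False                            # must not be specified
              if isinstance(code, tuple):
                  kind, allowed = code
                  if kind == 'c' and value not in allowed:
                      return False                        # must be one of S
                  if kind == 'o' and value is not None and value not in allowed:
                      return False                        # optional, but if given must be in S
              # 'f' places no restriction
          return True

      # Cars(serialNo, model, color, autoTrans, navi) with adornment b u u u u:
      adorn = ['b', 'u', 'u', 'u', 'u']
      print(satisfies(adorn, {0: "1001"}))          # True: serialNo specified
      print(satisfies(adorn, {1: "Gobi"}))          # False: serialNo missing, model given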
          21.4.2 A notation for Describing
               Source Capabilities….(cont’d)
570




         E.g., Dealer 1 is a source of data in the form:
            Cars(serialNo, model, color, autoTrans, navi)
            The adornment for this query form is b’uuuu,
          i.e., the first attribute, serialNo, must be specified
            and is not part of the output. The other
            attributes must not be specified and are part
            of the output.
      21.4.3 Capability-Based Query-Plan
                   Selection
571


         Given a query at the mediator, a capability
          based query optimizer first considers what
          queries it can ask at the sources to help
          answer the query
         The process is repeated until:
           Enough queries have been asked at the sources to
            resolve all the conditions of the mediator query,
            and therefore the query is answered. Such a plan is
            called feasible.
           We can construct no more valid forms of source
            queries, yet we still cannot answer the mediator
            query. In this case the query is impossible to answer.
          21.4.3 Capability-Based Query-Plan
                     Selection (cont’d)
572

         The simplest form of mediator query where we need to apply the above strategy is a join of relations
         E.g., we have the sources for Dealer 2:
             Autos(serial, model, color)
             Options(serial, option)
                  Suppose that ubf is the sole adornment for Autos, and Options has two adornments, bu and
                   uc[autoTrans, navi]
                  Query: find the serial numbers and colors of Gobi models with a navigation system
                  Here are the three plans:
                  1. Specify that the model is Gobi, query Autos, and get the serial
                   numbers and colors of all Gobis. Then, using the bu adornment for
                   Options, for each such serial number find the options for that car and
                   filter to make sure it has a navigation system.
                  2. Specifying the navigation-system option, query Options using the
                   uc[autoTrans, navi] adornment and get all the serial numbers for cars with a
                   navigation system.
                  3. Query Options to get the serial numbers for cars with a navigation
                   system. Then use these serial numbers to query Autos and see which of
                   these cars are Gobis.
          21.4.4 Adding Cost-Based
                       Optimization
573


         Mediator’s Query optimizer is not done when
          the capabilities of the sources are examined
         Having found feasible plans, it must choose
          among them
         Making an intelligent, cost-based choice among
          plans requires that the mediator know
          a great deal about the costs of the queries
          involved
         Sources are independent of the mediator, so it
          is difficult to estimate the cost
              21.5 Optimizing Mediator
574
                      Queries
         Chain algorithm – a greedy algorithm that finds
          a way to answer the query by sending a
          sequence of requests to its sources.
           Will always find a solution, assuming at least one
            solution exists.
           The solution may not be optimal.

           The class of queries that can be handled
            involves joins of relations that come from the
            sources, followed by an optional selection
            and an optional projection onto the output attributes
           Such queries can be expressed as Datalog rules
          21.5.1 Simplified Adornment
575
                    Notation
         The chain algorithm works with Datalog
          rules and keeps track of whether prior source
          requests have provided bindings for any of
          the variables in the body of the rule
         A query at the mediator is limited to b (bound) and f (free)
          adornments.
         We use the following convention for describing adornments:
           name^adornments(attributes)

           where:

             name is the name of the relation
             the number of adornment characters = the number of attributes
          21.5.2 Obtaining Answers for
                                        Subgoals
576


         Rules for subgoals and sources:
           Suppose we have the following subgoal:
              R^x1x2…xn(a1, a2, …, an), where each xi is either b or f,
              and the source adornment for R is y1y2…yn. Each
              yi can be any of b, f, u, c[S] or o[S] for some set
              S. Then it is possible to obtain a relation for
              the subgoal provided, for each i = 1, 2, …, n:
                If yi is b or c[S], then xi = b.
                If xi = f, then yi is not output restricted (primed).
             When these hold, the adornment on the subgoal matches the adornment
              at the source; in the remaining cases,
              yi is f, u, or o[S] while xi is either b or f.
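         Here is a small Python sketch of these two conditions (the code
          representation of adornments is my own assumption): it decides
          whether a subgoal adornment of b/f characters can be answered by a
          source adornment whose codes may be b, f, u, c[S], or o[S], possibly
          primed (output restricted).

      def can_resolve(subgoal_adorn, source_adorn):
          """subgoal_adorn: string of 'b'/'f'.
             source_adorn: list of (code, primed) pairs, where code is
             'b', 'f', 'u', ('c', S) or ('o', S), and primed means output restricted."""
          for x, (y, primed) in zip(subgoal_adorn, source_adorn):
              bound_required = y == 'b' or (isinstance(y, tuple) and y[0] == 'c')
              if bound_required and x != 'b':
                  return False            # the source needs a binding we do not have
              if x == 'f' and primed:
                  return False            # we need this attribute in the output
          return True

      # Subgoal R^bf against a source adornment bf (no primes) -> resolvable.
      print(can_resolve("bf", [('b', False), ('f', False)]))   # True
      # Subgoal S^ff against a source that requires a binding on the first attribute.
      print(can_resolve("ff", [('b', False), ('f', False)]))   # False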
             21.5.3 The Chain Algorithm
577


         The Chain Algorithm is a greedy approach to
          selecting an order in which we obtain
          relations for each of the subgoals of a
          Datalog rule. It does not always provide an
          efficient solution, but it always provides a
          solution if one exists
         It maintains 2 types of information:
           An adornment for each subgoal.

           A relation X that is the join of the relations for all the subgoals
            that have been resolved.
         Initially, the adornment for a subgoal is b iff the mediator query
          provides a constant binding for the corresponding argument of that
          subgoal.
               21.5.3 The Chain Algorithm
578
                         (con’t)
          First, initialize the adornments of the subgoals and X.
          Then, repeatedly select a subgoal that can be
           resolved. Let Rα(a1, a2, …, an) be that subgoal:
      1.    Wherever α has a b, the corresponding
            argument of R is either a constant, or a variable in
            the schema of X.
              Project X onto those of its variables that appear in R.
              21.5.3 The Chain Algorithm
579
                        (con’t)
      2.     For each tuple t in the projection of X, issue a query to the
             source as follows (β is a source adornment):
              If a component of β is b, then the corresponding
               component of α is b, and we can use the
               corresponding component of t for the source query.
              If a component of β is c[S], and the corresponding
               component of t is in S, then the corresponding
               component of α is b, and we can use the
               corresponding component of t for the source query.
              If a component of β is f, and the corresponding
               component of α is b, we may provide that constant value
               for the source query.
               21.5.3 The Chain Algorithm
580
                         (con’t)
              If a component of β is u, then provide no binding
               for this component in the source query.
              If a component of β is o[S], and the
               corresponding component of α is f, then treat it
               as if it were f.
              If a component of β is o[S], and the
               corresponding component of α is b, then treat it
               as if it were c[S].
      3.   Every variable among a1, a2, …, an is now
           bound. For each remaining unresolved
           subgoal, change its adornment so any
           position holding one of these variables is b.
            21.5.3 The Chain Algorithm
581
                      (con’t)
      4.   Replace X with X ⋈ πS(R), where S is all of the
           variables among a1, a2, …, an.
      5.   Project out of X all components that
           correspond to variables that do not appear in
           the head or in any unresolved subgoal.
          If every subgoal is resolved, then X is the
           answer.
          If some subgoal remains unresolved, then the
           algorithm fails.
      21.5.3 The Chain Algorithm Example
582


         Mediator query:
           Q: Answer(c) ← R^bf(1,a) AND S^ff(a,b) AND T^ff(b,c)
      Example data and source adornments:

      Relation:        R               S                T
                   w     x         x     y          y     z
      Data:        1     2         2     4          4     6
                   1     3         3     5          5     7
                   1     4                          5     8

      Adornment:       bf        c’[2,3,5]f          bu
      21.5.3 The Chain Algorithm Example
                        (con’t)
583


         Initially, the adornments on the subgoals are
          the same as in Q, and X contains just an empty tuple.
           S and T cannot be resolved because they each
            have ff adornments, but their sources require either a
            b or a c.
         R(1,a) can be resolved because its
          adornment is matched by the source’s
          adornment.
         Send the query R(w,x) with w = 1 to the source to get the
          tuples of R shown on the previous slide.
      21.5.3 The Chain Algorithm Example
                        (con’t)
584


         Project the subgoal’s relation onto its second
          component, since only the second component
          of R(1,a) is a variable.
                               a
                               2
                               3
                               4



         This is joined with X, resulting in X equaling
          this relation.
         Change adornment on S from ff to bf.
      21.5.3 The Chain Algorithm Example
                        (con’t)
585


         Now we resolve S^bf(a,b):
           Project X onto a, resulting in X.
           Now, search S for tuples whose attribute a
            matches a value of attribute a in X.
                                a       b
                                2       4
                                3       5

         Join this relation with X, and remove a
          because it doesn’t appear in the head nor any
          unresolved subgoal: b
                                    4
                                    5
      21.5.3 The Chain Algorithm Example
                        (con’t)
586


         Now we resolve T^bf(b,c):
                              b    c
                              4    6
                              5    7
                              5    8


         Join this relation with X and project onto the c
          attribute to get the relation for the head.
         Solution is {(6), (7), (8)}.
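         As a sanity check on this worked example, here is a short Python
          sketch (illustrative only: the in-memory relations and the
          hard-coded resolution order mirror the example, so this is not a
          general implementation of the Chain Algorithm). It resolves R, then
          S, then T, joining and projecting as described, and prints
          {(6,), (7,), (8,)}.

      # Source data from the example.
      R = {(1, 2), (1, 3), (1, 4)}          # R(w, x), adornment bf
      S = {(2, 4), (3, 5)}                  # S(x, y), adornment c'[2,3,5]f
      T = {(4, 6), (5, 7), (5, 8)}          # T(y, z), adornment bu

      # Resolve R(1, a): w is bound to the constant 1.
      X = {(x,) for (w, x) in R if w == 1}                      # X over a: {2, 3, 4}

      # Resolve S^bf(a, b): for each binding of a, fetch matching tuples, keep b.
      X = {(y,) for (a,) in X for (x, y) in S if x == a}        # {(4,), (5,)}

      # Resolve T^bf(b, c): for each binding of b, fetch matching tuples, keep c.
      answer = {(z,) for (b,) in X for (y, z) in T if y == b}

      print(answer)    # {(6,), (7,), (8,)}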
      21.5.4 Incorporating Union Views at
                     the Mediator
587


         This implementation of the Chain Algorithm
          does not consider that several sources can
          contribute tuples to a relation.
         If specific sources have tuples to contribute
          that other sources may not have, it adds
          complexity.
         To resolve this, we can consult all sources, or
          make best efforts to return all the answers.
      21.5.4 Incorporating Union Views at
                  the Mediator (con’t)
588


         Consulting All Sources
           We   can only resolve a subgoal when each source
            for its relation has an adornment matched by the
            current adornment of the subgoal.
           Less practical because it makes queries harder to
            answer and impossible if any source is down.
         Best Efforts
           We  need only 1 source with a matching
            adornment to resolve a subgoal.
           Need to modify chain algorithm to revisit each
            subgoal when that subgoal has new bound
            requirements.
                 590




21.6 LOCAL-AS-VIEW
     MEDIATORS
             Local-as-View Mediators.
591


         In a LAV mediator, we define global predicates,
          but they are not views of the source data.
         Instead, for each source we define expressions involving the
          global predicates that describe the tuples
          that the source is able to produce.
         Queries are answered at the mediator by
          discovering all possible ways to construct the
          query using the views provided by the source.
           Motivation for LAV Mediators
592


         Relationship between what the mediator
          should provide and what the sources
          provide is more subtle.
         For example, consider the predicate Par(c, p),
          meaning that p is a parent of c; it
          represents the set of all child-parent facts that
          could ever exist.
         The sources will provide information about
          whatever child-parent facts they know.
                  Motivation(contd..)
593


         There can be sources which may provide
          child-grandparent facts but not child- parent
          facts at all.
         This source can never be used to answer the
          child-parent query under GAV mediators.
         LAV mediators allow us to say that a certain
          source provides grandparent facts.
         They help discover how and when to use that
          source in a given query.
      Terminology for LAV Mediation.
594


         The queries at the mediator and the queries
          that describe the sources are both single Datalog
          rules.
         Single Datalog rule is often called a
          conjunctive query.
         The global predicates of the LAV mediator are
          used as the sub-goals of mediator queries.
         Conjunctive queries define views.
                           Contd..
595


         Their heads each have a unique view
          predicate that is the name of a view.
         Each view definition has a body consisting of
          global predicates and is associated with a
          particular source.
         Assumption - each view can be
          constructed with an all-free adornment.
                        Example..
596


         Consider global predicate Par(c, p) meaning
          that p is a parent of c.
         One source produces parent facts. Its view is
          defined by the conjunctive query:
                        V1(c, p) ← Par(c, p)
         Another source produces some grandparent
          facts. Its conjunctive query is:
                V2(c, g) ← Par(c, p) AND Par(p, g)
                  Example contd..
597


      The query at the mediator will ask for great-
       grandparent facts that can be obtained from
       the sources. The mediator query is:
      Q(w, z) ← Par(w, x) AND Par(x, y) AND Par(y, z)

      One solution uses the parent
       predicate (V1) directly three times:
      Q(w, z) ← V1(w, x) AND V1(x, y) AND V1(y, z)
                    Example contd..
598


         Another solution is to use V1 (parent facts)
          and V2 (grandparent facts):

               Q(w, z) ← V1(w, x) AND V2(x, z)
                            or
               Q(w, z) ← V2(w, y) AND V1(y, z)
                Expanding Solutions.
599


         Consider a query Q, a solution S that has a
          body whose subgoals are views and each
          view V is defined by a conjunctive query with
          that view as the head.
         The body of V’s conjunctive query can be
          substituted for a subgoal in S that uses the
          predicate V to have a body consisting of only
          global predicates.
               Expansion Algorithm
600


        A solution S has a subgoal V(a1, a2, …, an),
         where the ai’s can be any variables or constants.
        The view V is of the form

                  V(b1, b2, …, bn) ← B

      where B represents the entire body.
        V(a1, a2, …, an) can be replaced in solution S
         by a version of body B that has all the
         subgoals of B, with variables possibly altered.
           Expansion Algorithm contd..
601


          The rules for altering the variables of B are:
      1.   First identify the local variables of B, i.e., variables
           that appear in the body but not in the head.
      2.   If any local variable of B also appears in S,
           replace it by a distinct new variable that appears
           nowhere in the rule for V or in S.
      3.   In the body B, replace each bi by ai, for
           i = 1, 2, …, n.
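      A small Python sketch of this substitution (illustrative only; the term
      representation, the renaming scheme, and the helper names expand,
      _fresh, and used_vars are my assumptions) expands a view subgoal into
      the view’s body with local variables renamed and head variables replaced
      by the subgoal’s arguments. All terms are assumed to be variable names.

      import itertools

      _fresh = itertools.count()

      def expand(subgoal_args, view_head_vars, view_body, used_vars):
          # view_body: list of (predicate, [terms]); all terms here are variable names.
          head_map = dict(zip(view_head_vars, subgoal_args))                    # rule 3
          body_vars = {t for _, terms in view_body for t in terms}
          local = body_vars - set(view_head_vars)                               # rule 1
          rename = {v: f"{v}_{next(_fresh)}" for v in local if v in used_vars}  # rule 2
          subst = {**rename, **head_map}
          return [(p, [subst.get(t, t) for t in terms]) for p, terms in view_body]

      # V2(c, g) <- Par(c, p) AND Par(p, g), used for the subgoal V2(x, z) in the solution:
      body = [("Par", ["c", "p"]), ("Par", ["p", "g"])]
      print(expand(["x", "z"], ["c", "g"], body, used_vars={"w", "x", "z"}))
      # [('Par', ['x', 'p']), ('Par', ['p', 'z'])]  -- p is local and needs no renaming here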
                         Example.
602


         Consider the view definitions:
                       V1(c, p) ← Par(c, p)
                V2(c, g) ← Par(c, p) AND Par(p, g)
         One of the proposed solutions S is
                 Q(w, z) ← V1(w, x) AND V2(x, z)
         The first subgoal, with predicate V1, can be
          expanded as Par(w, x), as
          there are no local variables.
                   Example Contd.
603


       The V2 subgoal has a local variable p, which
        doesn’t appear in S nor has it been used as a
        local variable in another substitution. So p can
        be left as it is.
       Only x and z are to be substituted for variables

        c and g.
       The expanded solution S is now

       Q(w, z) ← Par(w, x) AND Par(x, p) AND Par(p, z)
           Containment of Conjunctive
604
                   Queries
          A containment mapping from Q to E is a
           function τ from the variables of Q to the
           variables and constants of E, such that:
      1.    If x is the ith argument of the head of Q,
            then τ(x) is the ith argument of the head of
            E.
      2.    Add to τ the rule that τ(c) = c for any constant
            c. If P(x1, x2, …, xn) is a subgoal of Q, then
            P(τ(x1), τ(x2), …, τ(xn)) is a subgoal of E.
                       Example.
605


       Consider two conjunctive queries:
      Q1: H(x, y) ← A(x, z) AND B(z, y)
      Q2: H(a, b) ← A(a, c) AND B(d, b) AND A(a, d)
       When we apply the substitution

      τ(x) = a, τ(y) = b, τ(z) = d, the head of Q1
        becomes H(a, b), which is the head of Q2.
      So, there is a containment mapping from Q1 to
        Q2.
                    Example contd..
606


         The first subgoal of Q1 becomes A(a, d) which
          is the third subgoal of Q2.
         The second subgoal of Q1 becomes the
          second subgoal of Q2.
         There is also a containment mapping from Q2
          to Q1 so the two conjunctive queries are
          equivalent.
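         The following Python sketch (illustrative, using my own
          representation of conjunctive queries and assuming all terms are
          variables, so constants are not handled) brute-forces a containment
          mapping from Q1 to Q2 by trying every assignment of Q1’s variables
          to Q2’s terms and checking the two conditions above. On the example
          it finds the mapping x→a, y→b, z→d.

      from itertools import product

      def containment_mapping(q1, q2):
          """Each query is (head_args, [(pred, args), ...]); all terms are variables."""
          head1, body1 = q1
          head2, body2 = q2
          vars1 = set(head1) | {v for _, args in body1 for v in args}
          terms2 = set(head2) | {v for _, args in body2 for v in args}
          subgoals2 = {(p, tuple(args)) for p, args in body2}
          for values in product(sorted(terms2), repeat=len(vars1)):
              tau = dict(zip(sorted(vars1), values))
              if tuple(tau[v] for v in head1) != tuple(head2):
                  continue                     # heads must correspond
              if all((p, tuple(tau[v] for v in args)) in subgoals2 for p, args in body1):
                  return tau                   # every mapped subgoal of Q1 is a subgoal of Q2
          return None

      Q1 = (("x", "y"), [("A", ("x", "z")), ("B", ("z", "y"))])
      Q2 = (("a", "b"), [("A", ("a", "c")), ("B", ("d", "b")), ("A", ("a", "d"))])
      print(containment_mapping(Q1, Q2))   # {'x': 'a', 'y': 'b', 'z': 'd'}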
          Why the Containment-Mapping
607
                   Test Works
         Suppose there is a containment mapping т
          from Q1 to Q2.
         When Q2 is applied to the database, we look
          for substitutions σ for all the variables of Q2.
         The substitution for the head becomes a tuple
          t that is returned by Q2.
         If we compose τ and then σ, we have a
          mapping from the variables of Q1 to tuples of
          the database that produces the same tuple t
          for the head of Q1.
      Finding Solutions to a Mediator
608
                  Query
         There can be an infinite number of solutions built
          from the views, using any number of subgoals and
          variables.
         The LMSS Theorem limits the search. It states
          that:
          •    If a query Q has n subgoals, then any answer
              produced by any solution is also produced by a
              solution that has at most n subgoals.
         If the conjunctive query that defines a view V has
          in its body a predicate P that doesn’t appear in the
          body of the mediator query, then we need not
          consider any solution that uses V.
                       Example.
609


       Recall the query
      Q1: Q(w, z) ← Par(w, x) AND Par(x, y) AND
                                        Par(y, z)
       This query has three subgoals, so we don’t

        have to look at solutions with more than three
        subgoals.
          Why the LMSS Theorem Holds
610


         Suppose we have a query Q with n subgoals
          and there is a solution S with more than n
          subgoals.
         The expansion E of S must be contained in
          Query Q, which means that there is a
          containment mapping from Q to E.
         We remove from S all subgoals whose
          expansion was not the target of one of Q’s
          subgoals under the containment mapping.
                            Contd..
611


         We would have a new conjunctive query S’
          with at most n subgoals.
         If E’ is the expansion of S’, then E’ is a subset
          of Q.
         S is a subset of S’ as there is an identity
          mapping.
         Thus S need not be among the solutions to
          query Q.
INFORMATION INTEGRATION
 ENTITY RESOLUTION – 21.7
                     Contents
   21.7 Entity Resolution
       21.7.1 Deciding Whether Records
        Represent a Common Entity
       21.7.2 Merging Similar Records
       21.7.3 Useful Properties of Similarity and
        Merge Functions
       21.7.4 The R-Swoosh Algorithm for ICAR
        Records
       21.7.5 Other Approaches to Entity
        Resolution
                    Introduction

   Determining whether two records or tuples do
    or do not represent the same person,
    organization, place or other entity is called
    ENTITY RESOLUTION.
       Deciding whether Records
      represent a Common Entity
   Two records represent the same individual if
    the two records have similar values for each
    of the fields associated with those records.
   It is possible that two records represent the
    same entity, yet the corresponding fields are
    not identical, for the following reasons:
      1. Misspellings
      2. Variant Names
      3. Misunderstanding of Names
   Continue: Deciding whether
Records represent a Common Entity

 4. Evolution of Values
 5. Abbreviations

Thus, when deciding whether two records represent
 the same entity, we need to look carefully at the
 kinds of discrepancies that occur and use a test that
 measures the similarity of records.
      Deciding Whether Records
    Represents a Common Entity -
            Edit Distance

   The first approach to measuring the similarity of
    records is edit distance.

   Values that are strings can be compared by
    counting the number of insertions and
    deletions of characters it takes to turn one
    string into another.

   The records represent the same entity if
    their similarity measure is below a given threshold.
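   As a minimal illustration of an insertion/deletion edit distance (my own
    sketch; the exact threshold is an assumption for the example), the
    following Python function computes the distance via the longest common
    subsequence, and two values are then considered similar if the distance
    is below the assumed threshold.

   def indel_distance(a: str, b: str) -> int:
       # Edit distance with insertions and deletions only:
       # len(a) + len(b) - 2 * (length of the longest common subsequence).
       dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
       for i in range(1, len(a) + 1):
           for j in range(1, len(b) + 1):
               if a[i - 1] == b[j - 1]:
                   dp[i][j] = dp[i - 1][j - 1] + 1
               else:
                   dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
       lcs = dp[len(a)][len(b)]
       return len(a) + len(b) - 2 * lcs

   THRESHOLD = 3        # an assumed cutoff, just for this example
   a, b = "Susan Williams", "Susan Willians"
   print(indel_distance(a, b), indel_distance(a, b) < THRESHOLD)   # 2 True
   print(indel_distance("Susan", "Karen"))                         # 6 (not similar)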
     Deciding Whether Records Represents a
         Common Entity - Normalization

   We can normalize records by replacing certain
    substrings by others. For instance, we can use a
    table of abbreviations and replace each
    abbreviation by what it normally stands for.

   Once records are normalized, we can use edit
    distance to measure the difference between the
    normalized values in the fields.
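   A tiny Python sketch of such normalization (the abbreviation table and
    function name here are assumptions made only for illustration):

   ABBREVIATIONS = {"St.": "Street", "Ave.": "Avenue", "Rd.": "Road"}   # assumed table

   def normalize(value: str) -> str:
       # Replace each known abbreviation by its expanded form.
       for abbrev, full in ABBREVIATIONS.items():
           value = value.replace(abbrev, full)
       return value

   print(normalize("123 Oak St."))      # 123 Oak Street
   print(normalize("456 Maple Ave."))   # 456 Maple Avenue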
     Merging Similar Records

   Merging means replacing two similar
    records with a single record that contains the
    information of both.
   There are many merge rules:
    1. Set the field in which the records disagree to
      the empty string.
    2. (i) Merge by taking the union of the values in
      each field
      (ii) Declare two records similar if at least two of
      the three fields have a nonempty intersection.
 Continue: Merging Similar
         Records

   Name           Address                         Phone
1. Susan          123 Oak St.                     818-555-1234
2. Susan          456 Maple St.                   818-555-1234
3. Susan          456 Maple St.                   213-555-5678

After merging:

   Name           Address                         Phone
(1-2-3) Susan     {123 Oak St., 456 Maple St.}    {818-555-1234, 213-555-5678}
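The following Python sketch (my illustration of the union-of-values merge
 rule above; the "at least two of the three fields have a nonempty
 intersection" similarity rule is hard-coded for this three-field example)
 reproduces the merged Susan record.

FIELDS = ("name", "address", "phone")

def similar(r, s):
    # Similar if at least two of the three fields have a nonempty intersection.
    return sum(bool(r[f] & s[f]) for f in FIELDS) >= 2

def merge(r, s):
    # Merge by taking the union of the values in each field.
    return {f: r[f] | s[f] for f in FIELDS}

r1 = {"name": {"Susan"}, "address": {"123 Oak St."},   "phone": {"818-555-1234"}}
r2 = {"name": {"Susan"}, "address": {"456 Maple St."}, "phone": {"818-555-1234"}}
r3 = {"name": {"Susan"}, "address": {"456 Maple St."}, "phone": {"213-555-5678"}}

r12 = merge(r1, r2) if similar(r1, r2) else None
r123 = merge(r12, r3) if similar(r12, r3) else None
print(r123)
# {'name': {'Susan'}, 'address': {'123 Oak St.', '456 Maple St.'},
#  'phone': {'818-555-1234', '213-555-5678'}}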
    Properties of Similarity and
        Merge Functions
The following properties say that the merge
    operation is a semilattice:
1.  Idempotence: the merge of a
    record with itself should surely be that
    record.
2.  Commutativity: the order of merging should
    not matter, and the end result should be the
    same.
3.  Associativity: the order in which we
    group records for a merger should not
    matter.
Continue: Useful Properties of
Similarity and Merge Functions
There are some other properties that we expect
   the similarity relationship to have:
  Idempotence for similarity : A record is
   always similar to itself
  Commutativity of similarity : In deciding
   whether two records are similar it does not
   matter in which order we list them
   Representability: If ‘r’ is similar to record ‘s’,
    and ‘s’ is merged with some other record ‘t’,
    then ‘r’ remains similar to the merger of ‘s’ and
    ‘t’ and can be merged with that record.
R-swoosh Algorithm for ICAR
         Records
   Input: a set of records I, a similarity function, and a merge
    function.
   Output: a set of merged records O.
   Method:
        O := emptyset;
        WHILE I is not empty DO BEGIN
             Let r be any record in I;
             Find, if possible, some record s in O that is similar to r;
             IF no such record s exists THEN
                  move r from I to O
             ELSE BEGIN
                  delete r from I;
                  delete s from O;
                  add the merger of r and s to I;
             END;
        END;
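   Here is a compact Python sketch of this loop (illustrative; the similarity
    and merge functions are placeholders you would supply, for example the
    ones from the merging sketch above, and records are plain dicts of sets).

   def r_swoosh(I, similar, merge):
       # I: list of records; similar(r, s) -> bool; merge(r, s) -> record.
       O = []
       while I:
           r = I.pop()                                   # let r be any record in I
           s = next((x for x in O if similar(r, x)), None)
           if s is None:
               O.append(r)                               # move r from I to O
           else:
               O.remove(s)                               # delete s from O
               I.append(merge(r, s))                     # add the merger of r and s back to I
       return O

   # Example (using the similar/merge functions defined in the earlier sketch):
   # merged = r_swoosh([r1, r2, r3], similar, merge)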
 Other Approaches to Entity
        Resolution


The other approaches to entity resolution are
 :

  Non-ICAR Datasets
  Clustering

  Partitioning
   Other Approaches to Entity
 Resolution - Non ICAR Datasets


Non ICAR Datasets : We can define a
 dominance relation r<=s that means record s
 contains all the information contained in
 record r.

 If so, then we can eliminate record r from
 further consideration.
       Other Approaches to Entity
         Resolution - Clustering

Clustering: Sometimes we group the records into
  clusters such that members of a cluster are in
  some sense similar to each other and members
  of different clusters are not similar.
      Other Approaches to Entity
       Resolution - Partitioning

Partitioning: We can group the records, perhaps
 several times, into groups that are likely to
 contain similar records and look only within each
 group for pairs of similar records.


								