                                   REPORT
                                    CS257
                           NAMRATA K BUDDHADEV
                        CLASS ID: 104 CHANGED TO 224
                            SJSU ID: 006502210

PROF: DR. T. Y. LIN

* Modifications are shown in gray italics



          13.1.1 Memory Hierarchy
• Data storage capacities vary across devices

• Cost per byte to store data also varies
• Devices with the smallest capacity offer the fastest
  speed at the highest cost per byte




     Memory Hierarchy Diagram
[Figure: the memory hierarchy, bottom to top:
 Cache -> Main Memory -> Disk -> Tertiary Storage.
 Programs and the DBMS's main-memory buffers live in main
 memory (as virtual memory); the file system lives on disk;
 the DBMS archives data on tertiary storage.]
       13.1.1 Memory Hierarchy
• Cache
  – Lowest level of the hierarchy
  – Data items are copies of certain locations of main memory
  – Sometimes, values in the cache are changed and the
    corresponding changes to main memory are delayed
  – The machine looks in the cache for instructions as well as
    for the data those instructions use
  – Holds a limited amount of data
  – On a single-processor computer there is no need to update
    main memory immediately
  – On a multiprocessor, data is written to main memory
    immediately; this is called write-through
                      Main Memory
• Refers to physical memory that is internal to the computer. The word
  "main" distinguishes it from external mass storage devices such as disk
  drives.

• Everything that happens in the computer (instruction
  execution, data manipulation) works on information
  that is resident in main memory
• Main memories are random access: one can
  obtain any byte in the same amount of time
               Secondary storage
• Used to store data and programs when they
  are not being processed
• More permanent than main memory: data
  and programs are retained when the power is
  turned off
• Capacities are large; even a personal computer carries
  many gigabytes of secondary storage

• E.g. magnetic disks, hard disks
                    Tertiary Storage
• Consists of anywhere from one to several storage drives
• A comprehensive computer storage system that is usually very slow,
  so it is typically used to archive data that is not accessed frequently

• Holds data volumes measured in terabytes
• Used for databases much larger than what
  can be stored on disk
13.1.2 Transfer of Data Between Levels

• Data moves between adjacent levels of the hierarchy
•   At the secondary and tertiary levels, accessing the desired data or
    finding the desired place to store data takes considerable time
• A disk is organized into blocks
• Entire blocks are moved to and from a region of main memory
  called a buffer
•   A key technique for speeding up database operations is to arrange the
    data so that when one item on a block is needed, it is likely that other
    data on the same block will be needed at the same time
• The same idea applies to other hierarchy levels
   13.1.3 Volatile and Non-Volatile
                    Storage
• A volatile device forgets the data stored
  on it when the power goes off
• A non-volatile device holds data even
  when it is turned off
• Secondary and tertiary devices are non-volatile
• Main memory is volatile
                13.1.4 Virtual Memory
•   A technique that gives an application program the impression that it
    has contiguous working memory (an address space), while in fact it may
    be physically fragmented and may even overflow onto disk storage
•   Makes programming of large applications easier and uses real physical
    memory (e.g. RAM) more efficiently
• Typical software executes in virtual memory
• The address space is typically 32 bits, i.e. 2^32 bytes or 4 GB
• Transfer between memory and disk is in units of
  blocks
     13.2.1 Mechanism of Disk
• Mechanics of disks
  – Use of secondary storage is one of the important
    characteristics of a DBMS
  – A disk consists of 2 moving pieces:
      • 1. the disk assembly
      • 2. the head assembly
  – The disk assembly consists of 1 or more platters
  – Platters rotate around a central spindle
  – Bits are stored on the upper and lower surfaces of the platters
    13.2.1 Mechanism of Disk
• Each disk surface is organized into tracks
• The tracks that are at a fixed radius from the center
  form one cylinder
• Tracks are organized into sectors
• Sectors are segments of the circle, separated by
  gaps
           13.2.2 Disk Controller
• One or more disks are controlled by a disk
  controller
• Disk controllers are capable of
  – Controlling the mechanical actuator that moves the head assembly
  – Selecting a sector from among all those in the
    cylinder at which the heads are positioned
  – Transferring bits between the desired sector and
    main memory
  – Possibly buffering an entire track
 13.2.3 Disk Access Characteristics
• Accessing (reading/writing) a block requires 3
  steps
  – The disk controller positions the head assembly at the
    cylinder containing the track on which the block is
    located. This is the 'seek time'
  – The disk controller waits while the first sector of
    the block moves under the head. This is the
    'rotational latency'
  – All the sectors and the gaps between them pass under the head while
    the disk controller reads or writes data in these sectors. This is the
    'transfer time'
13.3 Accelerating Access to Secondary
                  Storage
Several approaches for more-efficiently accessing
 data in secondary storage:
   Place blocks that are accessed together in the same cylinder.
   Divide the data among multiple disks.
   Mirror disks.
   Use disk-scheduling algorithms.
   Prefetch blocks into main memory.
Scheduling Latency – added delay in accessing
 data caused by a disk scheduling algorithm.
Throughput – the number of disk accesses per
 second that the system can accommodate.
13.3.1 The I/O Model of Computation

The number of block accesses (disk I/Os) is a
 good approximation of the time taken by an algorithm.
   Disk I/Os are proportional to time taken, so they should be minimized.

Ex 13.3: You want an index on R to
 identify the block on which the desired tuple
 appears, but not where on the block it
 resides.
  For the Megatron 747 (M747) example, it takes 11 ms
   to read a 16K block.
   The delay in searching the block for the desired tuple is negligible.
   13.3.2 Organizing Data by Cylinders

 First seek time and first rotational latency can never be neglected.

Ex 13.4: We request 1024 blocks of the M747.
    If the data is randomly distributed, the average latency is
     10.76 ms by Ex 13.2, making the total latency 11 s.
    If all blocks are stored consecutively on 1 cylinder:
         6.46 ms + 16 * 8.33 ms = 139.7 ms

    (1 average seek) + (# rotations) * (time per rotation)
        13.3.3 Using Multiple Disks
 The factor by which performance improves is
  proportional to the number of disks used
Striping – distributing a relation across multiple
 disks following this pattern:
   Data on disk R1: R1, R1+n, R1+2n,…
   Data on disk R2: R2, R2+n, R2+2n,…
                 …
   Data on disk Rn: Rn, Rn+n, Rn+2n, …
 Ex 13.5: We request 1024 blocks with n = 4.
   6.46 ms + (16/4) * 8.33 ms = 39.8 ms

   (1 average seek) + (# rotations) * (time per rotation)
              13.3.4 Mirroring Disks
Mirroring Disks – having 2 or more disks hold
   identical copies of the data.

Benefit 1: If n disks are mirrors of each other,
 the system can survive the crash of up to n-1 disks.
Benefit 2: With n disks, read
 performance increases by a factor of n.
 Increased read performance means increased efficiency.
  13.3.5 Disk Scheduling and the
               Elevator Problem
The disk controller runs this algorithm to select
 which of several pending requests to process first.
Pseudo code:
  requests[] // array of all unprocessed data requests
  upon receiving a new data request:
      requests[].add(new request)
  while (requests[] is not empty)
      move head to next location
      if (head is at data in requests[])
           retrieve the data
           remove the request from requests[]
      if (head reaches end)
           reverse head direction
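As an illustrative sketch (not from the original slides), the same policy in
Python. It assumes all requests are already pending when the sweep starts;
in the example on the next slides, some requests arrive while the head is
moving, which changes the service order.

  def elevator_order(head, pending):
      """Serve pending cylinder requests with the elevator policy:
      sweep upward from the head position, serving each request as
      it is passed, then reverse direction and sweep downward."""
      up = sorted(c for c in pending if c >= head)
      down = sorted((c for c in pending if c < head), reverse=True)
      return up + down

  # All six cylinders from the example, pending at once, head low:
  print(elevator_order(0, [8000, 24000, 56000, 64000, 40000, 16000]))
  # -> [8000, 16000, 24000, 40000, 56000, 64000]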
         13.3.5 Disk Scheduling and the
                  Elevator Problem (cont.)
[Figure: a timeline of head position over cylinders 8000-64000.
 Events, in order: requests arrive for cylinders 8000, 24000 and
 56000; the head gets the data at 8000; a request for 16000
 arrives; the head gets 24000; a request for 64000 arrives; the
 head gets 56000; a request for 40000 arrives; the head gets
 64000, then reverses direction and gets 40000 and finally 16000.]

                                       data          time
                                       8000..        4.3
                                      24000..        13.6
                                      56000..        26.9
                                      64000..        34.2
                                      40000..        45.5
                                      16000..        56.8
13.3.5 Disk Scheduling and the
         Elevator Problem (cont.)

       Elevator          FIFO
      Algorithm        Algorithm
     data      time   data      time
      8000..   4.3     8000..   4.3
     24000..   13.6   24000..   13.6
     56000..   26.9   56000..   26.9
     64000..   34.2   16000..   42.2
     40000..   45.5   64000..   59.5
     16000..   56.8   40000..   70.8
   13.3.6 Prefetching and Large-Scale
                Buffering
If, at the application level, we can predict the
 order in which blocks will be requested, we can load
 them into main memory before they are
 needed.
 This reduces access cost and saves time.
                  13.4 Disk Failures
• Intermittent Error: a read or write is unsuccessful.
  On a read, the correct contents of the sector are not
  delivered to the disk controller, but a retry may succeed.
  To check that a write was correct, a read is performed
  afterward; the read operation tells us whether the sector
  is good or bad.

• Checksums: each sector holds some additional bits, called the
  checksum, which are set depending on the values of the data
  bits stored in that sector. The probability of accepting a bad
  sector as good is small if we use checksums. With odd parity
  (an odd number of 1's in the data) we add a parity bit of 1;
  with even parity (an even number of 1's) we add a parity bit
  of 0. Either way, the total number of 1's becomes even.
• Example:
  1. Sequence: 01101000 -> odd number of 1's
       parity bit: 1 -> 011010001
  2. Sequence: 111011100 -> even number of 1's
       parity bit: 0 -> 1110111000
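A minimal runnable sketch of this parity checksum (function names are
illustrative):

  def add_parity(bits: str) -> str:
      """Append one parity bit so the total number of 1's is even."""
      return bits + ("1" if bits.count("1") % 2 == 1 else "0")

  def parity_ok(sector: str) -> bool:
      """An odd 1-count means the sector suffered a detectable error."""
      return sector.count("1") % 2 == 0

  assert add_parity("01101000") == "011010001"
  assert add_parity("111011100") == "1110111000"
  assert parity_ok("011010001")

A single parity bit detects any odd number of flipped bits, but an even
number of flips goes unnoticed.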

• Stable-Storage Writing Policy:
  Used to recover from the disk failure known as media decay, in
  which an overwritten sector no longer reads back correctly.
  Sectors are paired; each pair, representing one sector-content X,
  has a left and a right copy, Xl and Xr. We write X to Xl and check
  its checksum, retrying (and substituting a spare sector for Xl if
  necessary) until a good value is written; then we do the same
  for Xr.
• The term used for these strategies is RAID, or Redundant
  Arrays of Independent Disks.
• Mirroring:
  The mirroring scheme is referred to as RAID level 1 protection
  against data loss. In this scheme we mirror each disk: one of
  the disks is called the data disk and the other the redundant
  disk. In this case the only way data can be lost is if the
  second disk crashes while the first crash is being repaired.
• Parity Blocks:
  The RAID level 4 scheme uses only one redundant disk no matter
  how many data disks there are. On the redundant disk, the ith
  block holds the parity checks for the ith blocks of all the
  data disks. That is, the jth bits of all the ith blocks, of both
  the data disks and the redundant disk, must have an even number of
  1's, and the redundant disk's bit is set to make this condition true.
• Failures: if one of Xl and Xr fails, X can be read from the other; only
  if both fail is X unreadable, and the probability of that is very small
• Write Failure: during a power outage,
•      1. While writing Xl, the old Xr remains good, and X can be read
  from Xr
•      2. After writing Xl, we can read X from Xl, since Xr may or may not
  have the correct copy of X

  Recovery from Disk Crashes:
• To reduce data loss from disk crashes, schemes that involve
  redundancy, extending the idea of parity checks or duplicated sectors,
  can be applied.
• Parity Block – Writing
  When we write a new block on a data disk, we need to change the
  corresponding block of the redundant disk as well.

• One approach is to read the corresponding blocks of all the other
  data disks, compute the modulo-2 sum with the new block, and write
  the result to the redundant disk.
  This approach requires n-1 block reads, one data-block write, and
  one redundant-block write.
               Total = n+1 disk I/Os
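A sketch of the modulo-2 sum as a bytewise XOR (the two-byte blocks are
made-up values):

  def parity_block(data_blocks):
      """The i-th byte of the parity block is the XOR of the i-th
      bytes of the corresponding blocks on every data disk."""
      parity = bytearray(len(data_blocks[0]))
      for block in data_blocks:
          for i, byte in enumerate(block):
              parity[i] ^= byte
      return bytes(parity)

  # Any one lost block is the XOR of the parity block and the rest:
  blocks = [b"\x11\x00", b"\x22\xff", b"\x44\x0f"]
  p = parity_block(blocks)
  assert parity_block([p, blocks[1], blocks[2]]) == blocks[0]

This XOR identity is also what makes recovery from a single disk crash
possible in RAID 4.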
• RAID 5

   RAID 4 is effective in preserving data unless there are two
   simultaneous disk crashes.




Whatever scheme we use for updating the disks, we need to
 read and write the redundant disk's block. If there are n data
 disks, then the number of writes to the redundant disk
 will be n times the average number of writes to any one data
 disk.

  However, we do not have to treat one disk as the redundant
  disk and the others as data disks. Rather, we could treat each
  disk as the redundant disk for some of the blocks. This
  improvement is called RAID level 5.
                  13.5 Arranging Data on Disk

• A data element is represented as a record, which is stored in
  consecutive bytes within one disk block.
  Basic layout technique for storing data:

  Fixed-Length Records
  Allocation criterion: data should start at a word boundary.
  A fixed-length record header holds:
  1. A pointer to the record schema.
  2. The length of the record.
  3. Timestamps indicating when the record was last modified or last read.
Example

CREATE TABLE employee(
name CHAR(30) PRIMARY KEY,
address VARCHAR(255),
gender CHAR(1),
birthdate DATE
);
A record for this table starts at a word boundary and contains a header
and four fields: name, address, gender and birthdate.
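A sketch of how such a fixed-length record could be packed (the 12-byte
header fields and the fixed 255-byte rendering of address are illustrative
assumptions, not the book's exact layout):

  import struct

  # Header: schema pointer, record length, timestamp; then the fields.
  # birthdate is stored as 10 ASCII bytes 'YYYY-MM-DD' for simplicity.
  RECORD = struct.Struct("=IiI30s255s1s10s")

  def pack_employee(schema_ptr, ts, name, address, gender, birthdate):
      return RECORD.pack(schema_ptr, RECORD.size, ts,
                         name.encode().ljust(30),
                         address.encode().ljust(255),
                         gender.encode(), birthdate.encode())

  rec = pack_employee(7, 0, "Carrie Fisher", "123 Maple St.", "F",
                      "1956-10-21")
  print(len(rec))  # the same fixed length for every employee record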
• Packing Fixed-Length Records into Blocks
 Records are stored in the form of blocks on the disk, and they
   move into main memory when we need to update or access
   them.
 A block header is written first, and it is followed by a series of
   records.
 The block header contains the following information:
 Links to one or more blocks that are part of a network of
   blocks.
 Information about the role played by this block in such a
   network.
 Information about which relation the tuples in this block belong
   to.
A "directory" giving the offset of each record in the block.
Time stamp(s) to indicate time of the block's last modification
  and/or access



Along with the header we can pack as many record as we can
Along with the header we can pack as many record as we can
in one block as shown in the figure and remaining space will
be unused.




                                                                  34
     13.6 Representing Block and Record Addresses
• Address of a block and record
   – In main memory
       • The address of the block is the virtual-memory address of
         its first byte
       • The address of a record within the block is the virtual-
         memory address of the first byte of the record
   – In secondary memory: a sequence of bytes describes the
     location of the block in the overall system
• The sequence of bytes describing the location of the block includes the
  device ID for the disk, the cylinder number, etc.
• Addresses in Client-Server Systems
• The addresses in an address space are represented in two ways
   – Physical addresses: byte strings that determine the place
     within the secondary storage system where the record can
     be found.
   – Logical addresses: arbitrary strings of bytes of some fixed
     length
• Physical address bits are used to indicate:
   – The host to which the storage is attached
   – An identifier for the disk
   – The number of the cylinder
   – The number of the track
   – The offset of the beginning of the record
  • The map table relates logical addresses to physical addresses.

[Figure: the map table; each entry pairs a logical address with the
 corresponding physical address.]
• Logical and Structured Addresses
  What is the purpose of a logical address?
  It gives more flexibility when we
   – Move the record around within the block
   – Move the record to another block
  It also gives us an option of deciding what to do when a record
  is deleted
• Pointer Swizzling
  Having pointers is common in object-relational
  database systems
  It is important to learn about the management of pointers
  Every data item (block, record, etc.) has two addresses:
   – its database address: the address on disk
   – its memory address, if the item is currently in virtual memory
  • Translation Table: maps database addresses to
    memory addresses

[Figure: the translation table; each entry pairs the database (disk)
 address of an item, Dbaddr, with its current memory address,
 Mem-addr.]

  • All addressable items in the database have entries
    in the map table, while only those items currently
    in memory are mentioned in the translation table
• A pointer consists of the following two fields
  – A bit indicating the type of address
  – The database or memory address itself
  – Example 13.17

[Figure: Block 1 on disk is copied into memory. A pointer within
 Block 1 to a record in the same block is swizzled to a memory
 address; a pointer to Block 2, which is still on disk, remains
 unswizzled.]
• Example 13.7
  Block 1 has a record with pointers to a second record on the
  same block and to a record on another block
  If Block 1 is copied to memory
   – The first pointer, which points within Block 1, can be
      swizzled so it points directly to the memory address of the
      target record
   – Since Block 2 is not in memory, we cannot swizzle the
      second pointer
• Three types of swizzling
   – Automatic Swizzling
        • As soon as a block is brought into memory, swizzle all
          relevant pointers.
   – Swizzling on Demand
       • Only swizzle a pointer if and when it is actually
         followed.
   – No Swizzling
       • Pointers are not swizzled; they are accessed using the
         database address.
• Unswizzling
   – When a block is moved from memory back to disk, all
     pointers must go back to database (disk) addresses
   – Use the translation table again
   – It is important to have an efficient data structure for the
     translation table (a sketch follows below)
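A minimal sketch of a translation table supporting swizzling on demand
and unswizzling (the dict-based structure and (kind, address) pointer
encoding are illustrative assumptions):

  class TranslationTable:
      """Maps database (disk) addresses of memory-resident items to
      memory addresses; the inverse map serves unswizzling."""
      def __init__(self):
          self.db_to_mem = {}
          self.mem_to_db = {}

      def load(self, db_addr, mem_addr):
          self.db_to_mem[db_addr] = mem_addr
          self.mem_to_db[mem_addr] = db_addr

      def follow(self, ptr):
          """Swizzling on demand: resolve a pointer when followed.
          A pointer is a (kind, address) pair, kind 'db' or 'mem'."""
          kind, addr = ptr
          if kind == "mem":
              return ptr                         # already swizzled
          if addr in self.db_to_mem:             # in memory: swizzle it
              return ("mem", self.db_to_mem[addr])
          raise LookupError("block not in memory; read it from disk")

      def unswizzle(self, ptr):
          """On write-back, memory addresses revert to db addresses."""
          kind, addr = ptr
          return ("db", self.mem_to_db[addr]) if kind == "mem" else ptr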
• Pinned Records and Blocks
• A block in memory is said to be pinned if it cannot be written
  back to disk safely.
• If block B1 has a swizzled pointer to an item in block B2, then B2
  is pinned
   – To unpin a block, we must unswizzle any pointers to it
   – Keep in the translation table the places in memory holding
      swizzled pointers to each item
   – Unswizzle those pointers (use the translation table to replace
      the memory addresses with database (disk) addresses)
            13.7 Records With Variable-Length Fields

A simple but effective scheme is to put all fixed-length
fields ahead of the variable-length fields. We then place
in the record header:
1. The length of the record.
2. Pointers to (i.e., offsets of) the beginnings of all the variable-
   length fields. However, if the variable-length fields always
   appear in the same order, then the first of them needs no
   pointer; we know it immediately follows the fixed-length
   fields.
• Records With Repeating Fields
• A similar situation occurs if a record contains a variable
number of occurrences of a field F, but the field itself is of
fixed length. It is sufficient to group all occurrences of field F
together and put in the record header a pointer to the first.
• We can locate all the occurrences of field F as follows (a short
sketch follows below).
Let the number of bytes devoted to one instance of field F be
L. We then add to the offset for field F all integer
multiples of L, starting at 0, then L, 2L, 3L, and so on.
• Eventually, we reach the offset of the field following F,
whereupon we stop.
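That arithmetic in a short sketch (the field sizes are made up):

  def occurrence_offsets(first_offset, L, end_offset):
      """Offsets of each occurrence of repeating field F: start at the
      header's pointer and step by L until the next field begins."""
      return list(range(first_offset, end_offset, L))

  # F starts at byte 40, one instance takes 8 bytes, next field at 64:
  print(occurrence_offsets(40, 8, 64))  # [40, 48, 56]: three occurrences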
An alternative representation is to keep the record of fixed
  length and put the variable-length portion, be it fields of
  variable length or fields that repeat an indefinite number of
  times, on a separate block. In the record itself we keep:
• 1. Pointers to the place where each repeating field begins,
  and
• 2. Either how many repetitions there are, or where the
  repetitions end.
• Variable-Format Records
• The simplest representation of variable-format records is a
    sequence of tagged fields, each of which consists of:
1. Information about the role of this field, such as:
    (a) The attribute or field name,
    (b) The type of the field, if it is not apparent from the field
    name and some readily available schema information, and
    (c) The length of the field, if it is not apparent from the type.
2. The value of the field.
There are at least two reasons why tagged fields make
    sense.
1. Information-integration applications - Sometimes a relation
   has been constructed from several earlier sources, and these
   sources have different kinds of information. For instance, our
   movie-star information may have come from several sources,
   one of which records birthdates, some give addresses, others
   do not, and so on. If there are not too many fields, we are
   probably best off leaving NULL those values we do not know.

2. Records with a very flexible schema - If many fields of a
  record can repeat and/or not appear at all, then even if we
  know the schema, tagged fields may be useful. For instance,
  medical records may contain information about many tests,
  but there are thousands of possible tests, and each patient
  has results for relatively few of them.
• Records That Do Not Fit in a Block
• These large values usually have a variable length, but even if the
    length is fixed for all values of the type, we need special
    techniques to represent them. Here we consider a technique
    called "spanned records" that can be used to manage records
    that are larger than blocks.
• Spanned records are also useful in situations where records are
    smaller than blocks, but packing whole records into blocks wastes
    significant amounts of space.
    For both these reasons, it is sometimes desirable to allow records
    to be split across two or more blocks. The portion of a record that
    appears in one block is called a record fragment.
If records can be spanned, then every record and record fragment
    requires some extra header information:
1. Each record or fragment header must contain a bit telling
   whether or not it is a fragment.
2. If it is a fragment, then it needs bits telling whether it is the
   first or last fragment for its record.
3. If there is a next and/or previous fragment for the same
   record, then the fragment needs pointers to these other
   fragments.
[Figure: storing spanned records across blocks; a record's fragments
 in different blocks are chained by next/previous pointers.]
• BLOBS
• Binary, Large OBjectS = BLOBS
• BLOBS can be images, movies, audio files and other very large
  values that can be stored in files.
• Storing BLOBS
   – Stored across several blocks.
   – Preferable to store them consecutively on a cylinder or across
     multiple disks for efficient retrieval.
• Retrieving BLOBS
   – A client retrieving a 2-hour movie may not want it all at the
     same time.
   – Retrieving a specific part of the large data requires an
     index structure to make it efficient. (Example: an index by
     seconds on a movie BLOB.)
• Column Stores:
• An alternative to storing tuples as records is to store each
  column as a record. Since an entire column of a relation may
  occupy far more than a single block, these records may span
  many blocks, much as long files do. If we keep the values in
  each column in the same order, then we can reconstruct the
  relation from the column records.
             13.8 Record Modifications
• Insertion:
 Insertion of records without order
   Records can be placed in a block with empty space or in a
   new block.
  Insertion of records in a fixed order
   – Space available in the block
   – No space available in the block (outside the block)
   Structured address
   Pointer to a record from outside the block.
• Insertion in a fixed order
 Space available within the block
  Use an offset table in the header of each block, with pointers to
  the location of each record in the block.
• The records are slid within the block and the pointers in the
  offset table are adjusted.
No space available within the block (outside the block)
 Find space on a "nearby" block.

   • In case no space is available on a block, look at the
       following block in the sorted order of blocks.
   • If space is available in that block, move the highest
       records of block 1 to block 2 and slide the records
       around on both blocks.
 Create an overflow block

   • Records can be stored in the overflow block.
   • Each block has a place in its header for a pointer to an
       overflow block. The overflow block can point to a second
       overflow block, and so on.
• Deletion:
 Recover space after deletion
   When using an offset table, the records can be slid around
      the block so there will be an unused region in the center
      that can be recovered.
   In case we cannot slide records, an available-space list can be
      maintained in the block header.
   The list head goes in the block header, and the available regions
      hold the links of the list.
  Use of tombstones
   A tombstone is placed where a record is deleted, so that
   pointers to the deleted record do not end up pointing to new records.
       The tombstone is permanent until the entire database is
       reconstructed.
   If pointers go to fixed locations from which the location of the
       record is found, then we put the tombstone in that fixed
       location.
   Where a tombstone is placed depends on the nature of the record
       pointers.
   A map table is used to translate a logical record address to a
       physical address.
• UPDATING RECORDS
• For fixed-length records, there is no effect on
  the storage system
• For variable-length records:
  • If the length increases, we slide the records, as for insertion
  • If the length decreases, as for deletion, we update the space-
    available list, recover the space, and/or eliminate
    overflow blocks.
                      Chapter 18
• 18.1 Serial and Serializable Schedules
• The process of assuring that transactions preserve
  consistency when executing simultaneously is called
  concurrency control.
• This consistency is taken care of by the scheduler.
• Concurrency control in database management systems
  (DBMS) ensures that database transactions are performed
  concurrently without violating the data
  integrity of the database.
• Executed transactions should follow the ACID rules. The
  DBMS must guarantee that only serializable (unless
  serializability is intentionally relaxed) and recoverable
  schedules are generated.
• It also guarantees that no effect of committed transactions is lost,
  and no effect of aborted (rolled back) transactions remains in the
  related database.
• ACID rules
• Atomicity - Either the effects of all or none of its operations remain
  when a transaction is completed; in other words, to the outside
  world the transaction appears to be indivisible, atomic.

• Consistency - Every transaction must leave the database in a
  consistent state.

• Isolation - Transactions cannot interfere with each other. Providing
  isolation is the main goal of concurrency control.

• Durability - Successful transactions must persist through crashes.
   In the field of databases, a schedule is a list of actions (i.e.
    reading, writing, aborting, committing) from a set of
    transactions.
    In this example, schedule D is a schedule of the 3 transactions T1,
    T2, T3. The schedule describes the actions of the transactions as
    seen by the DBMS. T1 reads and writes object X, then
    T2 reads and writes object Y, and finally T3 reads and
    writes object Z. This is an example of a serial schedule,
    because the actions of the 3 transactions are not interleaved.
• Serial and Serializable Schedules:
• A schedule that is equivalent to a serial schedule has the
  serializability property.
• In schedule E, the order in which the actions of the
  transactions are executed is not the same as in D, but in the
  end, E gives the same result as D.
• Serial Schedule T1 precedes T2
            T1       T2                 A     B

                                        50    50
           READ (A,t)
           t := t+100
           WRITE (A,t)                  150
                          READ (A,s)
                          s := s*2
                          WRITE (A,s)   300
           READ (B,t)
            t := t+100
           WRITE (B,t)                        150
                          READ (B,s)
                          s := s*2
                          WRITE (B,s)         300




• Non-Serializable Schedule
           T1       T2                 A     B

                                       50    50
          READ (A,t)
          t := t+100
          WRITE (A,t)                  150
                         READ (A,s)
                         s := s*2
                         WRITE (A,s)   300
                         READ (B,s)
                         s := s*2
                         WRITE (B,s)         100
          READ (B,t)
           t := t+100
          WRITE (B,t)                        200




• A Serializable Schedule with details
            T1       T2                 A     B

                                        50    50
           READ (A,t)
           t := t+100
           WRITE (A,t)                  150
                          READ (A,s)
                          s := s*1
                          WRITE (A,s)   150
                          READ (B,s)
                          s := s*1
                          WRITE (B,s)         50
           READ (B,t)
            t := t+100
           WRITE (B,t)                        150




          18.2 Conflict Serializability
• Non-Conflicting Actions
• Two actions are non-conflicting if, whenever they
  occur consecutively in a schedule, swapping them
  does not affect the final state produced by the
  schedule. Otherwise, they are conflicting.
• Conflicting Actions: General Rules
• Two actions of the same transaction conflict:
   – r1(A) w1(B)
• Two actions over the same database element conflict if one
  of them is a write:
   – r1(A) w2(A)
   – w1(A) w2(A)
• Conflict Serializable:
We may take any schedule and make as many nonconflicting
    swaps as we wish,
with the goal of turning the schedule into a serial schedule.
 If we can do so, then the original schedule is serializable,
    because its effect on the database state remains the same as
    we perform each of the nonconflicting swaps.
• A schedule is said to be conflict-serializable when the
    schedule is conflict-equivalent to one or more serial
    schedules.
• Another definition of conflict-serializability is that a schedule
    is conflict-serializable if and only if there exists an acyclic
    precedence (serializability) graph for the schedule.
• The earlier serializable schedule, for example, is conflict-
    equivalent to the serial schedule <T1,T2>, but not to <T2,T1>.
• Conflict equivalence / conflict-serializability
• Let Ai and Aj be consecutive non-conflicting actions that
   belong to different transactions. We can swap Ai and Aj
   without changing the result.
   Two schedules are conflict-equivalent if they can be turned
   one into the other by a sequence of non-conflicting swaps of
   adjacent actions.
   We call a schedule conflict-serializable if it is conflict-
   equivalent to a serial schedule.
Test for conflict-serializability
• Construct the precedence graph for S and observe whether there are
   any cycles.
    – If yes, then S is not conflict-serializable
    – Else, it is a conflict-serializable schedule.
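A sketch of this test (a hypothetical adjacency-set representation of the
precedence graph, with cycle detection by depth-first search):

  def has_cycle(graph):
      """graph maps each transaction to the set of transactions that
      must follow it (an edge Ti -> Tj means Ti precedes Tj)."""
      WHITE, GRAY, BLACK = 0, 1, 2
      color = {n: WHITE for n in graph}

      def dfs(n):
          color[n] = GRAY
          for m in graph.get(n, ()):
              if color.get(m, WHITE) == GRAY:     # back edge: a cycle
                  return True
              if color.get(m, WHITE) == WHITE and dfs(m):
                  return True
          color[n] = BLACK
          return False

      return any(color[n] == WHITE and dfs(n) for n in graph)

  # Precedence graph of the schedule S1 below: T2 -> T3 on A, and
  # both T1 -> T2 and T2 -> T1 on B, so S1 is not serializable.
  print(has_cycle({1: {2}, 2: {1, 3}, 3: set()}))  # True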
• Example of a cyclic precedence graph:
    – Consider the schedule
      S1: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B);
• Observing the actions on A, we can
  find that T2 <s1 T3.

• But when we observe B, we get both T1 <s1 T2 and T2 <s1 T1. Thus
  the graph has a cycle between T1 and T2. Based on this fact we
  can conclude that S1 is not conflict-serializable.
• Why the precedence-graph test works
• A cycle in the graph puts too many constraints on the order of
  transactions in a hypothetical conflict-equivalent serial schedule.
• If there is a cycle involving n transactions T1 -> T2 -> .. -> Tn -> T1
    – Then in the hypothetical serial order, the actions of T1 must
       precede those of T2, which would precede those of T3, and so on
       up to Tn.
    – But the actions of Tn are also required to precede those of T1.
    – So, if there is a cycle in the graph, we can conclude that the
       schedule is not conflict-serializable.
 18.3 Enforcing Serializability by Locks
• Locks
•   It works as follows:
  – A transaction requests a lock
  – The scheduler checks the lock table
  – The scheduler generates a serializable schedule of actions
• Consistency of transactions
• Actions and locks must relate to each other
   – A transaction can read or write an element only if it holds
     a lock on it and has not yet released the lock.
   – Unlocking every locked element is compulsory.

• Legality of schedules
   – No two transactions can acquire a lock on the same element
     without the prior one releasing it.
   • Locking scheduler -
       Grants lock requests only if the resulting schedule is legal.
       A lock table stores the information about the current locks
       on the elements.
• A legal schedule of consistent transactions may
  unfortunately not be serializable.
• The locking scheduler delays requests that
  would result in an illegal schedule.




• Two-phase locking
• Guarantees that a legal schedule of consistent transactions is
  conflict-serializable.
• All lock requests precede all unlock requests.
• The growing phase:
   – Obtain all the locks; no unlocks allowed.
• The shrinking phase:
   – Release all the locks; no new locks allowed.
   • Failure of 2PL
   2PL fails to provide protection against deadlocks.
     18.4 Locking Systems with Several Lock Modes
• The previous locking schemes were too simple to
  be practical.
• Locking Scheme
     – Shared/Read Lock (for reading)
     – Exclusive/Write Lock (for writing)
•   Compatibility Matrices
•   Upgrading Locks
•   Update Locks
•   Increment Locks
• Shared & Exclusive Locks:
• Consistency of Transactions
    – Cannot write without Exclusive Lock
    – Cannot read without holding some lock
• This basically works on these principles –
1. Consistency of Transactions
    – A read action can only proceed a shared or an exclusive
      lock
    – A write lock can only proceed a exclusive lock
    – All locks need to be unlocked before commit
2. Two-phase locking of transactions
    – Locking Must precede unlocking


                                                                75
3. Legality of schedules
     – An element may be locked exclusively by one transaction
       or by several in shared mode, but not both
• Compatibility Matrices:
• A compatibility matrix has a row and column for each lock mode.
   – Rows correspond to a lock already held on an element by another
     transaction
   – Columns correspond to the mode of the lock requested.
   – Example: the column for S says that we can grant a shared
     lock on an element if the only locks held on that element
     currently are shared locks.

                                    LOCK REQUESTED
                                S             X
           LOCK        S       YES            NO
           HELD        X        NO            NO

• Upgrading Locks
• Suppose a transaction wants to read as well as write:
   – It acquires a shared lock on the element
   – Performs its calculations on the element
   – And when it is ready to write, it is granted an exclusive lock.
• Transactions with unpredictable read/write needs can use
  upgrading locks.
• Indiscriminate use of upgrading can produce a deadlock
  (a limitation).
• Example: both transactions want to upgrade on the same
  element
               Update Locks
• Solve the deadlock occurring in the upgrade-lock
  method.
• A transaction holding an update lock can read but
  cannot write.
• An update lock can later be converted to an
  exclusive lock.
• An update lock can only be granted if the
  element has only shared locks on it.
            Increment Locks
• Used for incrementing & decrementing stored
  values.
• E.g. transferring money between bank accounts, or
  ticket-selling transactions in which the number of
  seats is decremented after each transaction.
• Increment lock
• An increment lock does not enable either read or write actions on
  the element.
• Any number of transactions can hold an increment lock on an
  element at any time.
• Shared and exclusive locks cannot be granted while an increment
  lock is held on the element.
              18.5 Locking Scheduler
  The order in which the individual steps of different
  transactions occur is regulated by the scheduler.
  The general process of assuring that transactions
  preserve consistency when executing simultaneously is
  called concurrency control.
• Architecture of a Locking Scheduler
    The transactions themselves do not request locks, or
   cannot be relied upon to do so. It is the job of the
   scheduler to insert lock actions into the stream of
   reads, writes and other actions that access data.
  Transactions do not release locks. Rather, the scheduler releases
   the locks when the transaction manager tells it that the
   transaction will commit or abort.
Role of a Scheduler
[Figure: the role of the scheduler, which sits between the
 transactions' requests and the data-access actions]

Lock Table
[Figure: the scheduler consults a lock table keyed by database
 element]
• Lock Table
  The lock table is a relation that associates database elements
  with locking information about those elements.
  The table is implemented as a hash table, using database
  elements as the hash key.
• Size of the Lock Table
• The size of the table is proportional to the number of locked
  elements only, not to the entire size of the database, since
  any element that is not locked does not appear in the table.
• Group Mode
  The group mode is a summary of the most stringent
  conditions that a transaction requesting a new lock on an
  element faces. Rather than comparing the lock request with
  every lock held by another transaction on the same element,
  we can simplify the grant/deny decision by comparing the
  request with only the group mode.
Structure of Lock Table Entries
[Figure: a lock-table entry, holding the element's group mode, a
 waiting flag, and a list of the transactions holding or requesting
 locks]
• Handling Lock Requests
• Suppose transaction T requests a lock on A.
  If there is no lock-table entry for A, then surely there are no
  locks on A, so the entry is created and the request is granted.
• If the lock-table entry for A exists, then we use it to guide the
  decision about the lock request.
• Handling Unlocks
• If the value of waiting is 'yes', then we need to grant one or
  more locks from the list of requested locks. The different
  approaches for this are:
• First-come-first-served
• Priority to shared locks
• Priority to upgrading
   18.6 Managing Hierarchies of Database Elements

This section focuses on two problems that come up when there is a
tree structure to our data.
1. Tree structure: a hierarchy of lockable elements, and
     how to allow locks on both large elements, like
     relations, and the elements within them, such as the blocks
     and tuples of a relation, or individual tuples.
2. Data that is itself organized in a tree. A major
     example is a B-tree index.
Locks With Multiple Granularity
"Database elements": the term covers the various units
that can be used for locking,
e.g. tuples, pages or blocks, relations, etc.
                  Example: Bank database
[Figure: a bank database as a hierarchy of lockable elements, from
 whole relations down to blocks and individual tuples]

   Small-granularity locks: greater concurrency can be achieved.

   Large-granularity locks: sometimes save us from unserializable
    behavior.
Warning Locks
The solution to the problem of managing locks at different
granularities involves a new kind of lock called a "warning."
• It is helpful for hierarchical or nested structures.
• It involves both "ordinary" locks and "warning" locks.
• Ordinary locks: Shared (S) and Exclusive (X) locks.
• Warning locks: Intention to Shared (IS) and Intention to
  Exclusive (IX) locks.
• Warning Protocol

• 1. To place an ordinary S or X lock on any element, we must
     begin at the root of the hierarchy.
• 2. If we are at the element that we want to lock, we need look
  no further; we request the lock there.
• 3. If the element we want to lock is further down in the hierarchy,
  we place a warning lock on this node (IS if we want S, IX if we
  want X), then move on to the appropriate child and repeat steps
  2 and 3 until we reach the desired node, where we request the
  shared or exclusive lock.
              Compatibility Matrix
               IS         IX          S           X
        IS     YES        YES         YES         NO
        IX     YES        YES         NO          NO
        S      YES        NO          YES         NO
        X      NO         NO          NO          NO

      IS column: conflicts only with a held X lock.

      IX column: conflicts with held S and X locks.

      S column: conflicts with held X and IX locks.

      X column: conflicts with every held lock.
                 Warning Protocols

Consider the relation:
    Movie(title, year, length, studioName)

Transaction T1:
    SELECT *
    FROM Movie
    WHERE title = 'King Kong';

Transaction T2:
    UPDATE Movie
    SET year = 1939
    WHERE title = 'Gone With the Wind';
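Applying the warning protocol to this example: T1 would take an IS lock
on the Movie relation and then an S lock on the tuple(s) for 'King Kong',
while T2 would take an IX lock on Movie and an X lock on the tuple for
'Gone With the Wind'. Since IS and IX are compatible on Movie, the two
transactions can proceed concurrently.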
           18.7 The Tree Protocol
• ADVANTAGES OF THE TREE PROTOCOL
• Unlocking takes less time as compared to 2PL
• Freedom from deadlocks
• 18.7.1 MOTIVATION FOR TREE-BASED LOCKING
• Consider a B-tree index, treating individual nodes as lockable
  database elements.
• Concurrent use of the B-tree is not possible with the standard set
  of locks and 2PL.
• Therefore, a protocol is needed which can assure
  serializability while allowing access to the elements all the way at
  the bottom of the tree, even though 2PL is violated
18.7.2 ACCESSING TREE-STRUCTURED DATA
Assumptions:
   • Only one kind of lock
   • Consistent transactions
   • Legal schedules
   • No 2PL requirement on transactions
Rules:
• The first lock can be at any node.
• Subsequent locks may be acquired only if the transaction
  currently holds a lock on the parent node.
• Nodes may be unlocked at any time.
• No relocking of a node, even if the node's parent is still
  locked
18.7.3 WHY THE TREE PROTOCOL WORKS
• The tree protocol implies a serial order on the transactions in
  the schedule.
• Order of precedence:
                               Ti <s Tj
• If Ti locks the root before Tj, then Ti locks every node in
  common with Tj before Tj does.
ORDER OF PRECEDENCE
[Figure: the serial order of precedence implied by the tree protocol]
           18.8 Concurrency Control by Timestamps
• What is timestamping?
• The scheduler assigns each transaction T a unique number, its
  timestamp TS(T).
• Timestamps must be issued in ascending order, at the time
  when a transaction first notifies the scheduler that it is
  beginning.
• Two methods of generating timestamps:
   – Use the value of the system clock as the timestamp.
   – Use a logical counter that is incremented after a new
     timestamp has been assigned.
• The scheduler maintains a table of currently active transactions
  and their timestamps, irrespective of the method used
• Timestamps for a database element X, and the commit bit
• RT(X): the read time of X, the highest timestamp of a
  transaction that has read X.
• WT(X): the write time of X, the highest timestamp of a
  transaction that has written X.
• C(X): the commit bit for X, which is true if and only if the
  most recent transaction to write X has already committed.
• Physically Unrealizable Behavior
                          Read too late:
• A transaction U that started after transaction T wrote a
  value for X before T reads X.

                                         U writes X
                                                      T reads X


                   T start     U start
                Figure: Transaction T tries to read too late
Physically Unrealizable Behavior
Write too late
• A transaction U that started after T, but read X before T got a
  chance to write X.

                                   U reads X
                                                                T writes X


                    T start             U start
                 Figure: Transaction T tries to write too late

• Dirty Read
• It is possible that after T reads the value of X written by U,
  transaction U will abort.

                                  U writes X
                                                    T reads X


                              U start         T start       U aborts

                  T could perform a dirty read if it reads X when shown
Rules for Timestamp-Based Scheduling
1. The scheduler receives a request rT(X)
  a) If TS(T) ≥ WT(X), the read is physically realizable.
     1. If C(X) is true, grant the request; if TS(T) > RT(X), set
          RT(X) := TS(T); otherwise do not change RT(X).
     2. If C(X) is false, delay T until C(X) becomes true or the
          transaction that wrote X aborts.
  b) If TS(T) < WT(X), the read is physically unrealizable.
       Roll back T.
2. The scheduler receives a request wT(X).
    a) If TS(T) ≥ RT(X) and TS(T) ≥ WT(X), the write is physically
    realizable and must be performed.
       1. Write the new value for X,
       2. Set WT(X) := TS(T), and
       3. Set C(X) := false.
 b) If TS(T) ≥ RT(X) but TS(T) < WT(X), then the write is physically
      realizable, but there is already a later value in X.
         a. If C(X) is true, then the previous writer of X is
  committed, so simply ignore the write by T.
         b. If C(X) is false, we must delay T.
      c) If TS(T) < RT(X), then the write is physically unrealizable,
      and T must be rolled back.
3. The scheduler receives a request to commit T. It must find all the
   database elements X written by T and set C(X) := true. If any
   transactions are waiting for X to be committed, these
   transactions are allowed to proceed.
4. The scheduler receives a request to abort T or decides to roll back
   T; then any transaction that was waiting on an element X that
   T wrote must repeat its attempt to read or write.
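A compact sketch of rule 1, the read request (the Element record and
exception names are illustrative assumptions):

  from dataclasses import dataclass

  @dataclass
  class Element:
      rt: float = 0      # RT(X)
      wt: float = 0      # WT(X)
      c: bool = True     # C(X)

  class Rollback(Exception): pass
  class Delay(Exception): pass

  def handle_read(ts_T, x: Element):
      """Rule 1: transaction T with timestamp ts_T requests rT(X)."""
      if ts_T >= x.wt:                    # physically realizable
          if x.c:                         # last writer has committed
              x.rt = max(x.rt, ts_T)
              return "granted"
          raise Delay("wait until the writer of X commits or aborts")
      raise Rollback("read too late: a later transaction wrote X")

  X = Element(rt=0, wt=5, c=True)
  print(handle_read(7, X), X.rt)          # granted 7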
• Multiversion Timestamps
• Multiversion schemes keep old versions of each data item to increase
  concurrency.
• Each successful write results in the creation of a new version of the
  data item written.
• Timestamps are used to label the versions.
• When a read(X) operation is issued, select an appropriate version of
  X based on the timestamp of the transaction, and return the value
  of the selected version.
• Timestamps and Locking
• Generally, timestamping performs better than locking in situations
  where:
   – Most transactions are read-only.
   – It is rare that concurrent transactions will try to read and write
      the same element.
• In high-conflict situations, locking performs better than timestamps
18.9 Validation-Based Scheduling
• The scheduler keeps a record of what the active transactions are
   doing.
• Transactions execute in 3 phases:
    1. Read: read the elements in RS( ), compute in the local
       address space
    2. Validate: compare the read and write sets with those of
       other transactions
    3. Write: write the elements in WS( )
• The scheduler maintains an assumed serial order of the transactions.
• It maintains three sets:
    – START( ): the set of T's started but not yet validated.
    – VAL( ): the set of T's validated but not yet finished the
      writing phase.
    – FIN( ): the set of T's that have finished.
                  Expected exceptions

1. Suppose there is a transaction U such that:
 U is in VAL or FIN; that is, U has validated,
 FIN(U) > START(T); that is, U did not finish before T started,
 RS(T) ∩ WS(U) ≠ φ; let the intersection contain database element X.
 Then U may write X after T reads it, so T may have missed U's write.

2. Suppose there is a transaction U such that:
• U is in VAL; U has successfully validated.
• FIN(U) > VAL(T); U did not finish before T entered its validation phase.
• WS(T) ∩ WS(U) ≠ φ; let X be in both write sets.
  Then the writes of X could occur out of the assumed serial order.
Validation rules
• Check that RS(T) ∩ WS(U) = φ for any previously validated U
   that did not finish before T started, i.e. FIN(U) > START(T).
• Check that WS(T) ∩ WS(U) = φ for any previously validated U
   that did not finish before T validated, i.e. FIN(U) > VAL(T).
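A sketch of both checks (plain dicts with read/write sets and phase
timestamps are an illustrative assumption; an unfinished transaction has
fin = None):

  def validate(T, validated):
      """Return True if T passes validation against every previously
      validated transaction U."""
      for U in validated:
          if (U["fin"] is None or U["fin"] > T["start"]) and T["RS"] & U["WS"]:
              return False   # U may write something T already read
          if (U["fin"] is None or U["fin"] > T["val"]) and T["WS"] & U["WS"]:
              return False   # writes could land out of serial order
      return True

  U = {"WS": {"D"}, "fin": None}
  T = {"RS": {"A", "B"}, "WS": {"A", "C"}, "start": 2, "val": 3,
       "fin": None}
  print(validate(T, [U]))   # True, matching the worked example below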
                          Solution
 Validation of U:
   Nothing to check
 Validation of T:
   WS(U) ∩ RS(T) = {D} ∩ {A,B} = φ
   WS(U) ∩ WS(T) = {D} ∩ {A,C} = φ
 Validation of V:
   RS(V) ∩ WS(T) = {B} ∩ {A,C} = φ
   WS(V) ∩ WS(T) = {D,E} ∩ {A,C} = φ
   RS(V) ∩ WS(U) = {B} ∩ {D} = φ
 Validation of W:
   RS(W) ∩ WS(T) = {A,D} ∩ {A,C} = {A}
   RS(W) ∩ WS(V) = {A,D} ∩ {D,E} = {D}
   WS(W) ∩ WS(V) = {A,C} ∩ {D,E} = φ     (W is not validated)
                           Comparison

Concurrency-control   Storage utilization               Delays
mechanism
Locks                 Space in the lock table is        Delays transactions but
                      proportional to the number of     avoids rollbacks.
                      database elements locked.
Timestamps            Space is needed for read and      Do not delay the
                      write times for every database    transactions, but cause them
                      element, whether or not it is     to roll back unless
                      currently accessed.               interference is low.
Validation            Space is used for timestamps      Does not delay the
                      and read/write sets for each      transactions, but causes them
                      currently active transaction,     to roll back unless
                      plus a few more transactions      interference is low.
                      that finished after some
                      currently active transaction
                      began.
                                                                                     112
                  Chapter 21
                    21.1
• Need for Information Integration
• Ideally, all the data in the world could be put in a
  single database.
• In reality, databases are created independently, and it
  is hard to design a database to support future use.
• The use of databases evolves, so we cannot design a
  database to support every possible future use.
                                                    113
           University Database
• Registrar: to record student and grade
• Bursar: to record tuition payments by students
• Human Resources Department: to record
  employees
• Applications were built using these databases,
  like generation of payroll checks, calculation of
  taxes, and social security payments to the
  government.

                                                 114
               Inconvenience
• A change in one database would not be reflected in
  the other databases; it had to be propagated
  manually.
• E.g., record grades only for students who have paid tuition.
• Want to swim in the SJSU aquatic center for free
  during summer vacation?
  (None of the cases above can be handled by the
  separate databases alone.)
• Solution: one database
                                                  115
               How to integrate
• Start over:
  build one database that contains all the legacy
  databases, and rewrite all the applications.
  Result: painful.
• Build a layer of abstraction (middleware)
  on top of all the legacy databases;
  this layer is often defined by a collection of
  classes.
BUT…

                                                   116
• When we try to connect information sources
  that were developed independently, we
  invariably find that sources differ in many
  ways. Such sources are called Heterogeneous,
  and the problem of integrating them is
  referred to as the Heterogeneity Problem.




                                             117
        Heterogeneity Problem
• What is the Heterogeneity Problem?
  The Aardvark Automobile Co. has 1000 dealers,
  each with its own database. To find a car model
  at another dealer, can we use this command:
SELECT * FROM CARS
            WHERE MODEL=“A6”;

                                      118
          Type of Heterogeneity
•   Communication Heterogeneity
•   Query-Language Heterogeneity
•   Schema Heterogeneity
•   Data type difference
•   Value Heterogeneity
•   Semantic Heterogeneity



                                   119
   Communication Heterogeneity
• Today, it is common to allow access to your
  information using HTTP protocols. However,
  some dealers may not make their databases
  available on net, but instead accept remote
  accesses via anonymous FTP.
• Suppose there are 1000 dealers of Aardvark
  Automobile Co., of which 900 use HTTP while
  the remaining 100 use FTP; there might then be
  problems of communication between the dealers’
  databases.

                                               120
  Query Language Heterogeneity
• The manner in which we query or modify a
  dealer’s database may vary.
• For example, the dealers may have different kinds of
  databases: some might use a relational database and
  some might not; some might query with SQL, while
  others use Excel spreadsheets or some other system.

                                              121
         Schema Heterogeneity
• Even assuming that the dealers all use a relational
  DBMS supporting SQL as the query language, there
  might still be some heterogeneity at the highest
  level: the schemas can differ.
• For e.g. one dealer might store cars in a single
  relation while the other dealer might use a
  schema in which options are separated out
  into a second relation.

                                                 122
           Data Type Differences
• Serial numbers might be represented by
  character strings of varying length at one
  source and of fixed length at another. The fixed
  lengths could differ, and some sources might
  use integers rather than character strings.




                                                  123
          Value Heterogeneity
• The same concept might be represented by
  different constants at different sources. The
  color Black might be represented by an integer
  code at one source, the string BLACK at
  another, and the code BL at a third.




                                              124
       Semantic Heterogeneity
• Terms might be given different interpretations
  at different sources. One dealer might include
  trucks in the Cars relation, while another puts
  only automobile data in the Cars relation. One
  dealer might distinguish station wagons from
  minivans, while another doesn’t.




                                                125
                   21.2

• Federations
• The simplest architecture for integrating several DBs.
• One-to-one connections between all pairs of DBs.
• For n DBs to talk to each other, n(n-1) wrappers are needed.
• Good when communication between only a few pairs of DBs
  is needed.                                                 126
                   Wrapper
•   Wrapper: software that translates incoming
    queries and outgoing answers. As a result, it
    allows information sources to conform to
    some shared schema.




                                                    127
                      Federations Diagram

   [Diagram: DB1, DB2, DB3, and DB4, pairwise connected;
    each of the 6 connections uses 2 wrappers, one for
    each direction of translation.]
A federated collection of 4 DBs needs 12 components to translate queries from one
to another.                                                                         128
                         Example

Car dealers want to share their inventory. Each dealer queries the
other’s DB to find the needed car.
Dealer-1’s DB relation:  NeededCars(model, color, autoTrans)
Dealer-2’s DB relations: Autos(serial, model, color)
                         Options(serial, option)



                              wrapper
    Dealer-1’s DB             wrapper                Dealer-2’s DB




                                                                     129
                              Example…
FOR (each tuple (:m, :c, :a) in NeededCars) {
    IF (:a = TRUE) { /* automatic transmission wanted */
        SELECT serial
        FROM Autos, Options
        WHERE Autos.serial = Options.serial
          AND Options.option = ‘autoTrans’
          AND Autos.model = :m AND Autos.color = :c;
    }
    ELSE { /* automatic transmission not wanted */
        SELECT serial
        FROM Autos
        WHERE Autos.model = :m
          AND Autos.color = :c
          AND NOT EXISTS (SELECT * FROM Options
                          WHERE serial = Autos.serial
                            AND option = ‘autoTrans’);
    }
}

    Dealer 1 queries Dealer 2 for needed cars                              130
             Data Warehouse
   Sources are translated from their local
    schema to a global schema and copied to a
    central DB.
   User transparent: user uses Data Warehouse
    just like an ordinary DB
   User is not allowed to update Data
    Warehouse


                                                 131
Warehouse Diagram
            User    result
            query




             Warehouse


               Combiner


Extractor                    Extractor


Source 1                     Source 2
                                         132
                         Example

Construct a data warehouse from sources DB of 2 car dealers:

Dealer-1’s schema: Cars(serialNo, model,color,autoTrans,cdPlayer,…)
Dealer-2’s schema: Auto(serial,model,color)
                    Options(serial,option)

Warehouse’s schema:
    AutosWhse(serialNo, model, color, autoTrans, dealer)

Extractor --- Query to extract data from Dealer-1’s data:

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serialNo, model, color, autoTrans, ’dealer1’
FROM Cars;
                                                                      133
                         Example

Extractor --- Query to extract data from Dealer-2’s data:

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, ’yes’, ’dealer2’
FROM Autos, Options
WHERE Autos.serial = Options.serial AND
      option = ‘autoTrans’;

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, ’no’, ’dealer2’
FROM Autos
WHERE NOT EXISTS (SELECT * FROM Options
                  WHERE serial = Autos.serial
                    AND option = ‘autoTrans’);
                                                                  134
      Construct Data Warehouse

     There are mainly 3 ways of constructing
     the data in the warehouse:
1)   Periodically reconstructed from the current data in the
     sources, once a night or at even longer intervals.
     Advantages:
     simple algorithms.
     Disadvantages:
     1) need to shut down the warehouse;
     2) data can become out of date.

                                                               135
      Construct Data Warehouse

2)   Updated periodically based on the changes(i.e. each
     night) of the sources.
     Advantages:
     involve smaller amounts of data. (important when warehouse is
     large and needs to be modified in a short period)
     Disadvantages:
     1) the process to calculate changes to the warehouse is complex.
     2) data can become out of date.




                                                                        136
     Construct Data Warehouse

3) Changed immediately, in response to each change or a
   small set of changes at one or more of the sources.
   Advantages:
   data won’t become out of date.
   Disadvantages:
   requires too much communication; it is therefore
   generally too expensive.
   (Practical only for warehouses whose underlying sources
   change slowly.)




                                                                 137
                   Mediators
  Virtual warehouse, which supports a virtual view or
   a collection of views, that integrates several
   sources.
 Mediator doesn’t store any data.
 Mediators’ tasks:
  1)receive user’s query,
  2)send queries to wrappers,
  3)combine results from wrappers,
  4)send the final result to user.

                                                    138
           A Mediator diagram

         User query                 Result


                         Mediator
        Query                                Result
                    Result   Query
    Wrapper                           Wrapper
Query           Result          Query             Result

    Source 1                           Source 2
                                                           139
                          Example

Same data sources as the example of data warehouse, the mediator
Integrates the same two dealers’ source into a view with schema:

AutosMed(serialNo, model, color, autoTrans, dealer)

When the user has a query:

SELECT serialNo, model
FROM AutosMed
WHERE color = ‘red’;




                                                                   140
                           Example

In this simple case, the mediator forwards the same query to each
Of the two wrappers.

Wrapper1: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
SELECT serialNo, model
FROM Cars
WHERE color = ‘red’;

Wrapper2: Autos(serial,model,color); Options(serial,option)
SELECT serial, model
FROM Autos
WHERE color=‘red’;
The mediator needs to interpret serial as serialNo, and then return the
union of these sets of data to the user.                                141
            21.3

           Mediator

     Wrapper        Wrapper

     Source 1       Source 2

(Queries flow from the mediator down through the wrappers to the
sources; results flow back up the same path.)
                                          142
    Wrappers in Mediator-based Systems



 More complicated than those in most data warehouse systems.

 Able to accept a variety of queries from the mediator and translate
  them into the terms of the source.
 Communicate the result to the mediator.




                                                                        143
                                      Wrapper
The wrapper(extractor) consists of:

 One or more predefined queries (based on source)
         SQL
         Web page

 Suitable communication mechanism for sending and receiving information to/from
  source/mediator.




                                                                              144
              How to design a wrapper?



Classify the possible queries that the mediator can ask into
templates, which are queries with parameters that represent
constants.




                                                               145
                    Template for Query Patterns
Design a wrapper – Build templates for all possible queries that the mediator can ask.

Mediator schema: AutosMed (serialNo,model,color,autoTrans,dealer)
Source schema: Cars (serialNo,model,color,autoTrans,navi,…)

Mediator -> wrapper for cars of a given color ($c):

                             SELECT *
                             FROM AutosMed
                             WHERE color = ‘$c’;
=>
                             SELECT serialNo,model,color,autoTrans,’dealer1’
                             FROM Cars
                             WHERE color = ‘$c’;

Wrapper Template describing queries for cars of a given color

Templates needed:
 2^n for n attributes that can be specified
 i.e., one template for each possible query form the mediator may ask
                                                                                         146
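For illustration, filling in such a template is plain string
substitution. A Python sketch (the helper name instantiate is
hypothetical, not part of the text):

# Sketch: fill a wrapper template by substituting its parameters.
TEMPLATE = ("SELECT serialNo, model, color, autoTrans, 'dealer1' "
            "FROM Cars WHERE color = '$c';")

def instantiate(template, **params):
    for name, value in params.items():
        template = template.replace('$' + name, value)
    return template

print(instantiate(TEMPLATE, c='blue'))
# SELECT ... FROM Cars WHERE color = 'blue';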
                     Wrapper Generators
The software that creates the wrapper is Wrapper Generator.

             Templates




              Wrapper                        Table
             Generator                                        Wrapper

                                               Driver




                                                                   Queries
                                               Source               Results
                                                                              147
                     Wrapper Generators

Wrapper Generator:

 Creates a table that holds the various query patterns contained in templates.
 Source queries associated with each of them.

The Driver:

   Accept a query from the mediator.
   Search the table for a template that matches the query.
   Send the query to the source.
   Return the response to the Mediator.




                                                                                  148
                                     Filters
Consider the Car dealer’s database. The Wrapper template to get the cars of a given
model and color is:

SELECT *
FROM AutosMed
WHERE model = ‘$m’ and color = ‘$c’;
=>
SELECT serialNo,model,color,autoTrans,’dealer1’
FROM Cars
WHERE model = ‘$m’ and color = ‘$c’;

Another approach is to have a Wrapper Filter:

 The Wrapper has a template that returns a superset of what the query wants.
 Filter the returned tuples at the Wrapper and pass only the desired tuples.

Position of the Filter Component:

 At the Wrapper
 At the Mediator
                                                                                      149
                                       Filters
To find the blue cars of model Ford:

 Use the template to extract the blue cars.
 Return the tuples to the Mediator.
 Filter to get the Ford model cars at the Mediator.

Store the result in a temporary relation:
TempAutos (serialNo, model, color, autoTrans, dealer)

Filter by executing a local query:

SELECT *
FROM TempAutos
WHERE model = ‘FORD’;




                                                       150
       Other Operations at the Wrapper
It is possible to perform joins at the Wrapper and transmit the result to the Mediator.

Suppose the Mediator is asked to find dealers and models such that the dealer has two
red cars, of the same model, one with and one without automatic transmission:

SELECT A1.model, A1.dealer
FROM AutosMed A1, AutosMed A2
WHERE A1.model = A2.model AND A1.color = ‘red’ AND A2.color = ‘red’
   AND A1.autoTrans = ‘no’ and A2.autoTrans = ‘yes’;

Wrapper can first obtain all the red cars:

SELECT *
FROM AutosMed
WHERE color = ‘red’;

RedAutos (serialNo,model,color,autoTrans,dealer)



                                                                                       151
      Other Operations at the Wrapper

The Wrapper then performs a join and the necessary selection.

SELECT DISTINCT A1.model, A1.dealer
FROM RedAutos A1, RedAutos A2
WHERE A1.model = A2.model AND
       A1.autoTrans = ‘no’ AND
       A2.autoTrans = ‘yes’;




                                                                152
21.4 Capability Based Optimization
• Introduction
  – Typical DBMS estimates the cost of each query
    plan and picks what it believes to be the best
  – Mediator – typically has little knowledge of how long its
    sources will take to answer
  – Optimization of mediator queries cannot rely on cost
    measures alone to select a query plan
  – Optimization by mediator follows capability based optimization




                                                                       153
 21.4.1 The Problem of Limited Source
                 Capabilities
• Many sources have only Web Based interfaces
• Web sources usually allow querying through a
  query form
• E.g. Amazon.com interface allows us to query about books in many
  different ways.

• But we cannot ask questions that are too
  general
    – E.g. Select * from books;




                                                                     154
21.4.1 The Problem of Limited Source
             Capabilities (con’t)
• Reasons why a source may limit the ways in
  which queries can be asked
  – The earliest databases did not use relational DBMSs
    that support SQL queries
  – Indexes on a large database may make certain queries
    feasible, while others are too expensive to execute
  – Security reasons
     • E.g. Medical database may answer queries about averages,
       but won’t disclose details of a particular patient's
       information

                                                              155
   21.4.2 A Notation for Describing
              Source Capabilities
 For relational data, the legal forms of queries are described by
  adornments
Adornments – Sequences of codes that
 represent the requirements for the attributes
 of the relation, in their standard order
   f(free) – attribute can be specified or not
   b(bound) – must specify a value for an attribute
    but any value is allowed
   u(unspecified) – not permitted to specify a value
     for an attribute

                                                                     156
    21.4.2 A notation for Describing
           Source Capabilities….(cont’d)
    c[S](choice from set S) means that a value must be
      specified, and the value must be from the finite set S.
    o[S](optional from set S) means that we either do not
      specify a value, or we specify a value from the finite
      set S.
    A prime (e.g., f’) indicates that the attribute is not a part
      of the output of the query.
   A capabilities specification is a set of adornments
   A query must match one of the adornments in its capabilities specification


                                                                             157
  21.4.2 A notation for Describing
         Source Capabilities….(cont’d)

E.g. Dealer 1 is a source of data in the form:
   Cars (serialNo, model, color, autoTrans, navi)
   The adornment for this query form is b’uuuu




                                                    158
  21.4.3 Capability-Based Query-Plan
              Selection
• Given a query at the mediator, a capability based
  query optimizer first considers what queries it
  can ask at the sources to help answer the query
• The process is repeated until:
  – Enough queries are asked at the sources to resolve all
    the conditions of the mediator query and therefore
    query is answered. Such a plan is called feasible.
   – We can construct no more valid forms of source queries, yet still cannot
     answer the mediator query; in that case, the query is impossible.



                                                                         159
  21.4.3 Capability-Based Query-Plan
             Selection (cont’d)
• The simplest form of mediator query where we
  need to apply the above strategy is join relations
• E.g we have sources for dealer 2
   – Autos(serial, model, color)
   – Options(serial, option)
       • Suppose that ubf is the sole adornment for Autos, and
         Options has two adornments: bu and uc[autoTrans, navi]
      • Query is – find the serial numbers and colors of Gobi models
        with a navigation system



                                                                   160
   21.4.4 Adding Cost-Based
              Optimization
• Mediator’s Query optimizer is not done when
  the capabilities of the sources are examined
• Having found feasible plans, it must choose
  among them
• Making an intelligent, cost based query
  optimization requires that the mediator knows
  a great deal about the costs of queries
  involved
• Sources are independent of the mediator, so it is difficult to estimate
  the cost

                                                                       161
21.5 Optimizing Mediator Queries
• Chain algorithm – a greedy algorithm
  – answers the query by sending a sequence of
    requests to its sources.
  – Will always find a solution assuming at least one
    solution exists.
  – The solution may not be optimal.




                                                        162
21.5.1 Simplified Adornment Notation
• A query at the mediator is limited to b (bound) and f (free)
  adornments.

• We use the following convention for
  describing adornments:
   – nameadornments(attributes)
   – where:
       • name is the name of the relation
       • the number of adornments = the number of attributes




                                                               163
  21.5.2 Obtaining Answers for
                  Subgoals
• Rules for subgoals and sources:
   – Suppose we have the following subgoal:
     Rx1x2…xn(a1, a2, …, an),
     and the source adornment for R is y1y2…yn.
   – The adornment on the subgoal matches the
     adornment at the source if:
      • whenever yi is b or c[S], xi is b, and
      • whenever xi is f, yi is not output-restricted.

                                                         164
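These rules can be checked mechanically. A Python sketch (the
per-attribute list encoding of adornments is an assumption made for
illustration):

# Does subgoal adornment x (entries 'b'/'f') match source adornment y?
# Entries of y: 'b', 'f', 'u', 'c' (for c[S]), 'o' (for o[S]);
# a trailing "'" marks an output-restricted (primed) attribute.
def matches(x, y):
    for xi, yi in zip(x, y):
        restricted = yi.endswith("'")
        code = yi.rstrip("'")
        if code in ('b', 'c') and xi != 'b':
            return False      # source requires a bound value here
        if xi == 'f' and restricted:
            return False      # a free attribute must be in the output
    return True

print(matches(['b', 'f'], ["c'", 'u']))   # True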
        21.5.3 The Chain Algorithm
• Maintains 2 types of information:
    – An adornment for each subgoal.
    – A relation X that is the join of the relations for all
      the subgoals that have been resolved.
• The adornment for a subgoal is b if the mediator query provides a constant
  binding for the corresponding argument of that subgoal.
• Initially, X is a relation over no attributes, containing just the empty tuple.




                                                                          165
 21.5.3 The Chain Algorithm (con’t)
First, initialize adornments of subgoals and X.
Then, repeatedly select a subgoal that can be
 resolved. Let Rα(a1, a2, …, an) be the subgoal:
1.    Wherever α has a b, the corresponding argument of the subgoal is
      either a constant or a variable that is already bound.
      Project X onto those of its variables that appear in the subgoal.




                                                                                166
21.5.3 The Chain Algorithm (con’t)
2. For each tuple t in the projection of X, issue a
   query to the source as follows (β is the source
   adornment):
 –   If β has b, then the corresponding component of α has b, and we can
     use the corresponding component of t in the source query.
 – If β has c[S], and the corresponding component
   of t is in S, then the corresponding component of
   α has b, and we can use the corresponding
   component of t in the source query.
 –   If β has f, and the corresponding component of α is b, provide that
     constant value in the source query.
                                                                        167
21.5.3 The Chain Algorithm (con’t)
  – If a component of β is u, then provide no binding
    for this component in the source query.
  – If a component of β is o[S], and the
    corresponding component of α is f, then treat it
    as if it were f.
  –      If a component of β is o[S], and the corresponding component of α is
         b, then treat it as if it were c[S].

3. Every variable among a1, a2, …, an is now
   bound. For each remaining unresolved subgoal, change its
      adornment so any position holding one of these variables is b.

                                                                           168
21.5.3 The Chain Algorithm (con’t)
4. Replace X with X ⋈ πS(R), where S is the set of all
   variables among a1, a2, …, an.
5. Project out of X all components that correspond
   to variables that do not appear in the head or in
   any unresolved subgoal.
•   If every subgoal is resolved, then X is the answer. Else the algorithm fails




                                                                             169
 21.5.3 The Chain Algorithm Example
• Mediator query:
  – Q: Answer(c) ← Rbf(1,a) AND Sff(a,b) AND Tff(b,c)

• Example data and source adornments:

  R(w, x): bf      S(x, y): c’[{2,3,5}]f      T(y, z): bu
     w  x             x  y                       y  z
     1  2             2  4                       4  6
     1  3             3  5                       5  7
     1  4                                        5  8
                                                                     170
  21.5.3 The Chain Algorithm Example
                    (con’t)
• Initially, the adornments on the subgoals are
  the same as Q, and X contains an empty tuple.
    – S and T cannot be resolved, as they each have ff adornments, while
      their sources require the first argument to be bound (c’[…] and b).
• R(1,a) can be resolved because its adornments are matched by the
  source’s adornments.

• Send the query R(w, x) with w = 1 to get the relation
  for R shown on the previous page.



                                                                           171
 21.5.3 The Chain Algorithm Example
                   (con’t)
• Project the subgoal’s relation onto its second component, since only the
  second component of R(1,a) is a variable.



                                   a
                                   2
                                   3
                                   4

• This is joined with X, resulting in X equaling
  this relation.
• Change adornment on S from ff to bf.
                                                                         172
 21.5.3 The Chain Algorithm Example
                   (con’t)
• Now we resolve Sbf(a,b):
    – Project X onto a, resulting in X.
    – Now, search S for tuples with attribute a equivalent to attribute a in X.
                                  a       b
                                  2       4
                                  3       5
• Join this relation with X, and remove a as it doesn’t appear in the head
  nor any unresolved subgoal:


                                      b
                                      4
                                      5                                      173
21.5.3 The Chain Algorithm Example
                  (con’t)
• Now we resolve Tbf(b,c):
                      b   c
                      4   6
                      5   7
                      5   8



• Join this relation with X and project onto the c
  attribute to get the relation for the head.
• Solution is {(6), (7), (8)}.
                                                174
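The whole chain of resolutions can be replayed in a few lines of
Python (a sketch; the relations are modeled as tuple lists, and query
stands in for sending a query to a source):

R = [(1, 2), (1, 3), (1, 4)]     # R(w, x), source adornment bf
S = [(2, 4), (3, 5)]             # S(x, y), source adornment c'[{2,3,5}]f
T = [(4, 6), (5, 7), (5, 8)]     # T(y, z), source adornment bu

def query(rel, bound):           # bound maps position -> required value
    return [t for t in rel if all(t[i] == v for i, v in bound.items())]

a_vals = {t[1] for t in query(R, {0: 1})}                  # R(1,a): {2,3,4}
b_vals = {t[1] for a in a_vals for t in query(S, {0: a})}  # S(a,b): {4,5}
answer = {t[1] for b in b_vals for t in query(T, {0: b})}  # T(b,c)
print(sorted(answer))            # [6, 7, 8]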
 21.5.4 Incorporating Union Views at
               the Mediator
• This implementation of the Chain Algorithm
  does not consider that several sources can
  contribute tuples to a relation.
• If specific sources have tuples to contribute that other sources may not
  have, it adds complexity.

• To resolve this, we can consult all sources, or
  make best efforts to return all the answers.



                                                                         175
 21.5.4 Incorporating Union Views at
            the Mediator (con’t)
• Consulting All Sources
  – We can only resolve a subgoal when each source for its relation has an
    adornment matched by the current adornment of the subgoal.
  – Less practical because it makes queries harder to
    answer and impossible if any source is down.
• Best Efforts
  – We need only 1 source with a matching
    adornment to resolve a subgoal.
  – Need to modify chain algorithm to revisit each subgoal when that
    subgoal has new bound requirements.

                                                                       176
   21.6 Local-as-View Mediators.
• In a LAV mediator, global predicates are defined,
  but they are not views of the source data.

• Expressions are defined for each source with
  global predicates that describe tuples that
  source produces

• The mediator answers a query by composing the
  views provided by the sources.

                                                 177
    Motivation for LAV Mediators
• Relationship between the data provided by
  the mediator and the sources is more subtle
• For example, consider the predicate Par(c, p),
  meaning that p is a parent of c; it represents
  the set of all child-parent facts that could
  ever exist.
• The sources will provide information about
  whatever child-parent facts they know.

                                                 178
          Motivation(contd..)
• There can be sources that provide
  child-grandparent facts but no child-parent
  facts at all.
• This source can never be used to answer the
  child-parent query under GAV mediators.
• LAV mediators allow us to say that a certain
  source provides grandparent facts.
• This is used to discover how and when to use the
  source in a given query.
                                              179
  Terminology for LAV Mediation.
• The queries at the mediator and the queries that
  describe the sources are single Datalog rules.
• A single Datalog rule is called a conjunctive query
• The global predicates of LAV mediator are used as
  subgoals of mediator queries.
• Conjunctive queries define views. Their heads each
  have a unique view predicate that is name of a view.
• Each view definition consists of global predicates and is
  associated with a particular source.
• Each view is constructed with an all-free adornment.

                                                         180
                 Example..
• Consider global predicate Par(c, p) meaning
  that p is a parent of c.
• One source produces parent facts. Its view is
  defined by the conjunctive query-
                V1(c, p) ← Par(c, p)
• Another source produces some grand parents
  facts. Then its conjunctive query will be –
        V2(c, g) ← Par(c, p) AND Par(p, g)
                                             181
             Example contd..
• The query at mediator will ask for great-grand
  parent facts to be obtained from sources:
Q(w, z)  Par(w, x) AND Par(x, y) AND Par(y, z)
• One solution can be using the parent
  predicate(V1) directly three times.
Q(w, z)  V1(w, x) AND V1 (x, y) AND V1(y, z)
• Another solution can be to use V1(parent facts)
  and V2(grandparent facts).
Q(w, z)  V1(w, x) AND V2(x, z)
  Or Q(w, z)  V2(w, y) AND V1(y, z)

                                                182
         Expanding Solutions.
• Consider a query Q, a solution S that has a
  body whose subgoals are views and each view
  V is defined by a conjunctive query with that
  view as the head.
• The body of V’s conjunctive query can be
  substituted for a subgoal in S that uses the
  predicate V; doing so for every view subgoal
  yields a body consisting of only global predicates.

                                             183
          Expansion Algorithm
• A solution S has a subgoal V(a1, a2,…an) where
  ai’s can be any variables or constants.
• The view V can be of the form
            V(b1, b2, …, bn) ← B
Where B represents the entire body.
• V(a1, a2, … an) can be replaced in solution S by
  a version of body B that has all the subgoals
  of B with variables possibly altered.
                                                184
    Expansion Algorithm contd..
 The rules for altering the variables of B are:
1. First, identify the local variables of B: the
   variables that appear in the body but not in
   the head.
2. If any local variables of B also appear in S,
   replace each one by a distinct new variable that
   appears nowhere in the rule for V or in S.
3. In the body B, replace each bi by ai, for
   i = 1, 2, …, n.
                                                  185
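A small Python sketch of this substitution (the atom encoding and the
helper name expand are assumptions made for illustration):

# Atoms are tuples (predicate, arg1, ..., argn).
def expand(head_vars, body, args, used_vars):
    local = {v for atom in body for v in atom[1:]} - set(head_vars)
    rename, n = {}, 0
    for v in sorted(local):
        if v in used_vars:                        # rule 2: rename apart
            n += 1
            rename[v] = v + str(n)
    rename.update(dict(zip(head_vars, args)))     # rule 3: bi -> ai
    return [(pred,) + tuple(rename.get(v, v) for v in rest)
            for pred, *rest in body]

# Expanding V2(x, z), where V2(c, g) <- Par(c, p) AND Par(p, g):
print(expand(('c', 'g'), [('Par', 'c', 'p'), ('Par', 'p', 'g')],
             ('x', 'z'), {'w', 'x', 'z'}))
# [('Par', 'x', 'p'), ('Par', 'p', 'z')]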
                 Example.
• Consider the view definitions,
             V1(c, p) ← Par(c, p)
       V2(c, g) ← Par(c, p) AND Par(p, g)
• One of the proposed solutions S is
       Q(w, z) ← V1(w, x) AND V2(x, z)
• The first subgoal with predicate V1 in the
  solution can be expanded as Par(w, x) as there
  are no local variables.
                                              186
             Example Contd.
• The V2 subgoal has a local variable p, which
  does not appear in S nor has it been used as a
  local variable in another substitution. So p can
  be left as it is.
• Only x and z are to be substituted for
  variables c and g.
• The Solution S now will be
 Q(w, z) ← Par(w, x) AND Par(x, p) AND Par(p, z)

                                                187
Containment of Conjunctive Queries

 A containment mapping from Q to E is a
  function τ from the variables of Q to the
  variables and constants of E, such that:
1. If x is the ith argument of the head of Q,
   then τ(x) is the ith argument of the head of
   E.
2. If P(x1, x2, …, xn) is a subgoal of Q, then
   P(τ(x1), τ(x2), …, τ(xn)) is a subgoal of E.
   (We extend τ with the rule τ(c) = c for any constant c.)

                                               188
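The definition can be verified mechanically. A Python sketch (queries
encoded as a head atom plus a list of subgoal atoms; an illustration,
not part of the text):

def is_containment_mapping(tau, q1, q2):
    t = lambda v: tau.get(v, v)   # constants map to themselves
    (h1, b1), (h2, b2) = q1, q2
    if tuple(map(t, h1[1:])) != tuple(h2[1:]):
        return False              # heads must agree argument by argument
    return all((g[0],) + tuple(map(t, g[1:])) in set(b2) for g in b1)

Q1 = (('H', 'x', 'y'), [('A', 'x', 'z'), ('B', 'z', 'y')])
Q2 = (('H', 'a', 'b'), [('A', 'a', 'c'), ('B', 'd', 'b'), ('A', 'a', 'd')])
print(is_containment_mapping({'x': 'a', 'y': 'b', 'z': 'd'}, Q1, Q2))  # True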
                 Example.
• Consider two Conjunctive queries:
Q1: H(x, y) ← A(x, z) AND B(z, y)
Q2: H(a, b) ← A(a, c) AND B(d, b) AND A(a, d)
• When we apply the substitution
  τ(x) = a, τ(y) = b, τ(z) = d, the head of Q1
  becomes H(a, b), which is the head of Q2.
  So there is a containment mapping from Q1
  to Q2.
                                                189
             Example contd..
• The first subgoal of Q1 becomes A(a, d) which
  is the third subgoal of Q2.
• The second subgoal of Q1 becomes the
  second subgoal of Q2.
• There is also a containment mapping from Q2
  to Q1 so the two conjunctive queries are
  equivalent.


                                             190
 Why the Containment-Mapping Test
              Works
• Suppose there is a containment mapping τ from
  Q1 to Q2.
• When Q2 is applied to the database, we look for
  substitutions σ for all the variables of Q2.
• The substitution for the head becomes a tuple t
  that is returned by Q2.
• If we compose τ and then σ, we have a mapping
  from the variables of Q1 to tuples of the
  database that produces the same tuple t for the
  head of Q1.
                                                191
Finding Solutions to a Mediator Query

There can be an infinite number of solutions built from
 the views, using any number of subgoals and variables.
LMSS Theorem can limit the search which states that
  • If a query Q has n subgoals, then any answer produced by
    any solution is also produced by a solution that has at most
    n subgoals.
If the conjunctive query that defines a view V has in its
 body a predicate P that doesn’t appear in the body of
 the mediator query, then we need not consider any
 solution that uses V.


                                                             192
                 Example.
• Recall the query
Q1: Q(w, z) ← Par(w, x) AND Par(x, y) AND
                               Par(y, z)
• This query has three subgoals, so we don’t
  have to look at solutions with more than
  three subgoals.



                                               193
   Why the LMSS Theorem Holds
• Suppose we have a query Q with n subgoals
  and there is a solution S with more than n
  subgoals.
• The expansion E of S must be contained in
  Query Q, which means that there is a
  containment mapping from Q to E.
• We remove from S all subgoals whose
  expansion was not the target of one of Q’s
  subgoals under the containment mapping.

                                           194
                     Contd..
• We would have a new conjunctive query S’
  with at most n subgoals.
• If E’ is the expansion of S’, then E’ is contained
  in Q.
• S is contained in S’, as there is an identity
  mapping.
• Thus S need not be among the solutions to
  query Q.

                                                  195
                  21.7


• ENTITY RESOLUTION: Entity resolution is a
  problem that arises in many information
  integration scenarios.
• It refers to determining whether two
  records or tuples do or do not represent
  the same person, organization, place or
  other entity.

                                              196
Deciding whether Records represent a Common Entity


• Two records represent the same individual if the two
  records have similar values for each of the fields
  associated with those records.

• It is not sufficient that the values of corresponding fields
  be identical, for the following reasons:
      1. Misspellings
      2. Variant Names
      3. Misunderstanding of Names

                                                                 197
Continue: Deciding whether Records represent a
                Common Entity

    4. Evolution of Values
    5. Abbreviations

  Thus, when deciding whether two records represent
  the same entity, we need to look carefully at the
  kinds of discrepancies that occur and use a test that
  measures the similarity of the records.



                                                  198
 Deciding Whether Records Represents a
     Common Entity - Edit Distance



• The first approach to measuring the similarity of records is
  edit distance.

• Values that are strings can be compared by counting the
  number of insertions and deletions of characters it takes
  to turn one string into another.

• The records represent the same entity if their edit
  distance is below a given threshold.


                                                            199
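A minimal Python sketch of this insertion/deletion edit distance (an
illustration, not from the text):

def edit_distance(s, t):
    prev = list(range(len(t) + 1))   # distance from '' to prefixes of t
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            if cs == ct:
                cur.append(prev[j - 1])
            else:
                cur.append(min(prev[j], cur[j - 1]) + 1)
        prev = cur
    return prev[-1]

print(edit_distance('Susan', 'Suzan'))   # 2: delete 's', insert 'z'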
  Deciding Whether Records Represents a
     Common Entity - Normalization


• We can normalize records by replacing certain substrings
  with others. For instance, we can use a table of
  abbreviations and replace abbreviations by what they
  normally stand for.

• Once normalized, we can use the edit distance to measure
  the difference between the normalized values in the fields.



                                                            200
       Merging Similar Records

• Merging refers to removal of redundant data in two
  records.
• There are many merge rules:
   1. Set the field in which the records disagree to
     the empty string.
   2. (i) Merge by taking the union of the values in
     each field
     (ii) Declare two records similar if at least two of
     the three fields have a nonempty intersection.

                                                       201
Continue: Merging Similar Records

   Name     Address          Phone
1. Susan    123 Oak St.      818-555-1234
2. Susan    456 Maple St.    818-555-1234
3. Susan    456 Maple St.    213-555-5678

After merging:

        Name    Address                        Phone
(1-2-3) Susan   {123 Oak St., 456 Maple St.}   {818-555-1234,
                                                213-555-5678}


                                                                 202
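For illustration, merge rule 2(i) applied to the records above, with
each field modeled as a set of strings (a sketch, not part of the text):

def merge(r, s):
    return {field: r[field] | s[field] for field in r}

merged = merge({'name': {'Susan'}, 'addr': {'123 Oak St.'},
                'phone': {'818-555-1234'}},
               {'name': {'Susan'}, 'addr': {'456 Maple St.'},
                'phone': {'818-555-1234'}})
print(merged['addr'])   # {'123 Oak St.', '456 Maple St.'}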
 Useful Properties of Similarity and Merge
                 Functions


The following properties say that the merge operation is a
     semilattice:
1. Idempotence: merging a record with itself yields the
     same record.
2. Commutativity: the order in which records are merged
     does not matter.
3. Associativity: the order in which we group records for
     a merger does not matter.


                                                          203
  Continue: Useful Properties of Similarity and
               Merge Functions

There are some other properties that we expect similarity
    relationship to have:
• Idempotence for similarity: A record is always similar to
    itself
• Commutativity of similarity: In deciding whether two
    records are similar it does not matter in which order we list
    them
• Representability: If r is similar to some other record s, but s
    is instead merged with some other record t, then r remains
    similar to the merger of s and t and can be merged with
    that record.
                                                             204
          R-swoosh Algorithm for ICAR Records

• Input: a set of records I, a similarity function, and a merge function.
• Output: a set of merged records O.
• Method:
      O := emptyset;
      WHILE I is not empty DO BEGIN
          let r be any record in I;
          find, if possible, some record s in O that is similar to r;
          IF no such record s exists THEN
              move r from I to O
          ELSE BEGIN
              delete r from I;
              delete s from O;
              add the merger of r and s to I;
          END;
      END;
                                                                           205
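The same algorithm as runnable Python (a sketch; similar and merge
are whatever similarity and merge functions have been chosen, e.g.,
the set-union merge shown earlier):

def r_swoosh(I, similar, merge):
    I, O = list(I), []
    while I:
        r = I.pop()
        s = next((x for x in O if similar(r, x)), None)
        if s is None:
            O.append(r)            # r matches nothing merged so far
        else:
            O.remove(s)
            I.append(merge(r, s))  # the merged record is re-examined
    return O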
  Other Approaches to Entity Resolution


The other approaches to entity resolution
 are :

  – Non- ICAR Datasets
  – Clustering
  – Partitioning

                                            206
Other Approaches to Entity Resolution - Non
             ICAR Datasets



Non ICAR Datasets : We can define a dominance relation
  r<=s that means record s contains all the information
  contained in record r.

  If so, then we can eliminate record r from further
  consideration.




                                                          207
  Other Approaches to Entity Resolution –
         Clustering & Partitioning


Clustering: group the records into clusters so that members
   of the same cluster are similar to each other.

Partitioning: We can group the records, perhaps several times,
   into groups that are likely to contain similar records and look
   only within each group for pairs of similar records.




                                                                     208
                 Chapter 16
                    16.1
• Query Compiler
• The query compiler is the set of components that
  carry out query compilation.
• It shows how a SQL query is parsed, translated
  into relational algebra, and optimized.



                                              209
             Query Compiler
• The query compiler performs the following
  operations:
 Parses the query, representing it as a parse tree.
 Converts the parse tree into an expression tree of
  relational algebra (a logical query plan).
 Turns the logical query plan into a physical query
  plan.

                                                 210
Query Compiler - Parsing
             Query

             Parser



          Preprocessor



          Logical Plan
           Generator


        Query rewriter

       logical query plan

                            211
   Syntax Analysis And Parse Tree
• The job of the parser is:
• to take text written in the SQL language and
  convert it into a parse tree whose nodes
  correspond to either
• ATOMS - keywords, constants, operators,
  names, and parentheses, or
• Syntax categories - names for families of
  query subparts.

                                                 212
               Grammar
• <Query> ::= <SFW>
• <Query> ::= (<Query>)




                          213
                   Rules
• <SFW> ::= SELECT <SelList> FROM <FromList>
  WHERE <Condition>
• Select-List :
     <SelList> ::= <Attribute>,<SelList>
     <SelList>::= <Attribute>
• From-List :
     <FromList>::= <Relation>,<FromList>
     <FromList>::= <Relation>

                                           214
                   Rules
• Conditions:
   <Condition>::= <Condition> AND <Condition>
   <Condition>::= <Tuple> IN <Query>
   <Condition>::= <Attribute> = <Attribute>
   <Condition>::= <Attribute> LIKE
  <Pattern>
• Tuple:
  <Tuple>::= <Attribute>
                                            215
                    Tables
• StarsIn(movieTitle, movieyear, starName)
• MovieStar(name, address, gender, birthdate)

• We want to find titles of movies that have at
  least one star born in 1960.




                                                  216
        Query should be like….
• SELECT movieTitle
  FROM StarsIn
  WHERE starName IN (
  SELECT name
  FROM Moviestar
  WHERE birthdate LIKE '%1960'
  );

                                 217
                                 Parse Tree
   <Query>

    <SFW>

    SELECT <SelList> FROM <FromList> WHERE <Condition>

    <Attribute>          <RelName>      <Tuple> IN <Query>

    movieTitle        StarsIn               <Attribute>     ( <Query> )

                                starName            <SFW>

    SELECT <SelList> FROM <FromList> WHERE <Condition>

    <Attribute>         <RelName>            <Attribute> LIKE <Pattern>

    Name          MovieStar     birthdate                      ‘%1960’


                                                                          218
               Preprocessor
• It does semantic checking.
• Functions of the preprocessor:
1. Checks relation uses.
2. Checks and resolves attribute uses.
3. Checks types.




                                       219
                       16.2

• Pushing Selections
• That is, replacing the left side of one of the rules
  by its right side.
• In pushing selections, we first push a selection as
  far up the tree as it will go, and then push the
  selection down all possible branches.



                                                   220
• Let’s take an example:
• StarsIn(title, year, starName)
• Movie(title, year, length, inColor, studioName,
  producerC#)

• Define the view MoviesOf1996 by:
  CREATE VIEW MoviesOf1996 AS
  SELECT *
  FROM Movie
  WHERE year = 1996;
                                                    221
• "which stars worked for which studios in
  1996?“ can be given by a SQL Query:

  SELECT starName, studioName
  FROM MoviesOfl996 NATURAL JOIN StarsIn;




                                             222
                        ΠstarName,studioName

                                ⋈

           σyear=1996                         StarsIn

               Movie

Logical query plan constructed from the definition of the query and view




                                                                     223
Improving the query plan by moving selections up and down the tree

                     ΠstarName,studioName

                                ⋈

        σyear=1996                       σyear=1996

            Movie                            StarsIn




                                                                     224
       Laws Involving Projection
• "pushing" projections really involves introducing a
   new projection somewhere below an existing
   projection.
• projection keeps the number of tuples the same and
   only reduces the length of tuples.
• To describe the transformations of extended
   projection Consider a term E + x on the list for a
   projection, where E is an attribute or an expression
   involving attributes and constants and x is an output
   attribute.


                                                       225
                       Example

• Let R(a, b, c) and S(c, d, e) be two relations. Consider the
  expression Πa+e→x, b→y(R ⋈ S). The input attributes of the
  projection are a, b, and e, and c is the only join attribute.
  We may apply the law for pushing projections below
  joins to get the equivalent expression:

          Πa+e→x, b→y(Πa,b,c(R) ⋈ Πc,e(S))

• Eliminating this projection and getting a third
  equivalent expression: Πa+e→x, b→y(R ⋈ Πc,e(S))
                                                           226
• In addition, we can perform a projection
  entirely before a bag union. That is:

           ΠL(R ∪B S) = ΠL(R) ∪B ΠL(S)




                                             227
      Laws About Joins and Products

• Laws that follow directly from the definition of the join:

                R ⋈C S = σC(R × S)

• R ⋈ S = ΠL(σC(R × S)), where C is the condition that
  equates each pair of attributes from R and S with the same
  name, and L is a list that includes one attribute from each
  equated pair and all the other attributes of R and S.
• We identify a product followed by a selection as a join of
  some kind.
                                                               228
 Laws Involving Duplicate Elimination

• The operator δ which eliminates duplicates from
  a bag can be pushed through many but not all
  operators.
• In general, moving a δ down the tree reduces the
  size of intermediate relations and may therefore be
  beneficial.
• Moreover, sometimes we can move δ to a
  position where it can be eliminated
  altogether,because it is applied to a relation that is
  known not to possess duplicates.
                                                     229
• δ (R)=R if R has no duplicates. Important cases
  of such a relation R include:
  a) A stored relation with a declared primary
  key, and
  b) A relation that is the result of a γ operation,
  since grouping creates a relation with no
  duplicates.


                                                   230
• Several laws that "push" δ through other
  operators are:
• δ (R*S) =δ(R) * δ(S)
• δ (R     S)=δ(R)    δ(S)
• δ (R c S)=δ(R) c δ(S)
• δ (O c (R))=O c (δ(R))

• We can also move the δ to either or both of the
  arguments of an intersection:
• δ (R ∩B S) = δ(R) ∩B S = R ∩B δ (S) = δ(R) ∩B δ (S)
                                                        231
Laws Involving Grouping and Aggregation

• When we consider the operator γ, we find that
  the applicability of many transformations
  depends on the details of the aggregate
  operators used. Thus we cannot state laws in
  the generality that we used for the other
  operators. One exception is that a γ absorbs a δ
  . Precisely:
• δ(γL(R))=γL(R)
                                               232
• Let us call an operator γL duplicate-impervious
  if the only aggregations in L are MIN and/or
  MAX. Then:

• γL(R) = γL(δ(R)), provided γL is duplicate-
  impervious.



                                                  233
                   Example

• Suppose we have the relations
  MovieStar(name, addr, gender, birthdate)
  StarsIn(movieTitle, movieYear, starName)
  and we want to know, for each year, the birthdate
  of the youngest star to appear in a movie that
  year. We can express this query as:
  SELECT movieYear, MAX(birthdate)
  FROM MovieStar, StarsIn
  WHERE name = starName
  GROUP BY movieYear;
                                                234
 γ movieYear, MAX(birthdate)

      σ name = starName

               ×

      MovieStar    StarsIn

Initial logical query plan for the query

                                           235
• Some transformations that we can apply to the plan are:
  1. Combine the selection and product into an equijoin.
  2. Generate a δ below the γ, since the γ is duplicate-
     impervious.
  3. Generate a Π between the γ and the introduced δ to
     project onto movieYear and birthdate, the only
     attributes relevant to the γ.




                                                      236
γ movieYear, MAX(birthdate)

   Π movieYear, birthdate

              δ

      ⋈ name = starName

      MovieStar    StarsIn

Another query plan for the query
                                   237
  γ movieYear, MAX(birthdate)

     Π movieYear, birthdate

        ⋈ name = starName

      δ                     δ

Π birthdate,name    Π movieYear,starName

  MovieStar             StarsIn

    Third query plan for the example        238
                        16.3
• Parsing
• Goal is to convert a text string containing a
  query into a parse tree data structure:
  – leaves form the text string (broken into lexical
    elements)
  – internal nodes are syntactic categories
• Uses standard algorithmic techniques from
  compilers
  – given a grammar for the language (e.g., SQL),
    process the string and build the tree              239
Example: SQL query
SELECT title
FROM StarsIn
WHERE starName IN (
         SELECT name
         FROM MovieStar
         WHERE birthdate LIKE ‘%1960’
);

(Find the movies with stars born in 1960)



Assume we have a simplified grammar for SQL.

                                               240
Example: Parse Tree
                               <Query>

                                <SFW>

 SELECT      <SelList> FROM <FromList>        WHERE           <Condition>

    <Attribute>                <RelName>                   <Tuple> IN <Query>

          title                 StarsIn                 <Attribute>   ( <Query> )

                                                         starName        <SFW>

  SELECT           <SelList>     FROM      <FromList>        WHERE    <Condition>

           <Attribute>            <RelName>               <Attribute> LIKE <Pattern>

                  name           MovieStar               birthDate          ‘%1960’

                                                                                 241
             The Preprocessor

• It replaces each reference to a view with a
  parse (sub)-tree that describes the view (i.e., a
  query)
• It does semantic checking:
  – are relations and views mentioned in the schema?
  – are attributes mentioned in the current scope?
  – are attribute types correct?



                                                  242
  Convert Parse Tree to Relational
              Algebra

• The complete algorithm depends on specific
  grammar, which determines forms of the parse
  trees
• Here is a flavor of the approach




                                            243
                       Conversion

• Suppose there are no subqueries.

• SELECT att-list FROM rel-list WHERE cond

        is converted into

 PROJatt-list(SELECTcond(PRODUCT(rel-list))), or


        att-list(cond( X (rel-list)))
                                                   244
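As a tiny illustration (a Python sketch; the function name and the
textual algebra notation are assumptions, not part of the text), the
rule can be applied mechanically:

def to_algebra(att_list, rel_list, cond=None):
    expr = ' x '.join(rel_list)                     # product of FROM list
    if cond:
        expr = 'SIGMA[' + cond + '](' + expr + ')'  # selection
    return 'PI[' + ', '.join(att_list) + '](' + expr + ')'

print(to_algebra(['movieTitle'], ['StarsIn', 'MovieStar'],
                 "starName = name AND birthdate LIKE '%1960'"))
# PI[movieTitle](SIGMA[...](StarsIn x MovieStar))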
SELECT movieTitle
FROM StarsIn, MovieStar
WHERE starName = name AND birthdate LIKE '%1960';

<Query>
  <SFW>
    SELECT <SelList> FROM <FromList> WHERE <Condition>
      <SelList> -> <Attribute> -> movieTitle
      <FromList> -> <RelName> StarsIn , <FromList> -> <RelName> -> MovieStar
      <Condition> -> <Condition> AND <Condition>
        <Condition> -> <Attribute> = <Attribute> -> starName = name
        <Condition> -> <Attribute> LIKE <Pattern> -> birthdate LIKE '%1960'
                                                                                 245
Equivalent Algebraic Expression
              Tree
Π movieTitle
   σ starName = name AND birthdate LIKE '%1960'
      StarsIn × MovieStar

                                                  246
        Handling Subqueries

• Recall the (equivalent) query:
           SELECT title
           FROM StarsIn
           WHERE starName IN (
                    SELECT name
                    FROM MovieStar
                    WHERE birthdate LIKE ‘%1960’
           );
• Use an intermediate format called two-
  argument selection

                                                   247
Example: Two-Argument Selection
Π title
   σ (two-argument selection)
      StarsIn
      <condition>: <tuple> IN ( Π name ( σ birthdate LIKE '%1960' ( MovieStar ) ) )
                   where <tuple> -> <attribute> -> starName

                                                             248
     Converting Two-Argument
             Selection
• To continue the conversion, we need rules for
  replacing two-argument selection with a
  relational algebra expression
• Different rules apply depending on the nature of the
  subquery
• Shown below is an example for the IN operator and an
  uncorrelated subquery (the subquery computes a
  relation independent of the tuple being tested)


                                                    249
                      Rules for IN
The two-argument selection

   σ ( R, <Condition>: t IN S )

is replaced by

   σ C ( R × δ(S) )

where C is the condition that equates
attributes in t with corresponding
attributes in S
                                                                 250
Example: Logical Query Plan
Π title
   σ starName = name
      StarsIn × δ ( Π name ( σ birthdate LIKE '%1960' ( MovieStar ) ) )




                                                251
 What if Subquery is Correlated?

• An example is when the subquery refers to the current
  tuple of the outer scope that is being tested
• More complicated to deal with, since the subquery
  cannot be translated in isolation
• Need to incorporate external attributes in the
  translation
• Some details are in the textbook


                                                   252
Improving the Logical Query Plan

• There are numerous algebraic laws
  concerning relational algebra operations
• By applying them to a logical query plan
  judiciously, we can get an equivalent query
  plan that can be executed more efficiently
• Next we'll survey some of these laws


                                                253
Example: Improved Logical Query Plan
Π title
   ⋈ starName = name
      StarsIn
      δ ( Π name ( σ birthdate LIKE '%1960' ( MovieStar ) ) )




                                                254
    Associative and Commutative
             Operations
•   product
•   natural join
•   set and bag union
•   set and bag intersection

associative: (A op B) op C = A op (B op C)
commutative: A op B = B op A


                                              255
      Laws Involving Selection

• Selections usually reduce the size of the
  relation
• Usually good to do selections early, i.e.,
  "push them down the tree"
• Also can be helpful to break up a complex
  selection into parts


                                               256
            Selection Splitting

•  C1 AND C2 (R) =  C1 (  C2 (R))

•  C1 OR C2 (R) = ( C1 (R)) Uset ( C2 (R))
  if R is a set

•  C1 (  C2 (R)) =  C2 (  C1 (R))

                                                257
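The splitting laws are easy to check on toy data. Below is a small Python demonstration on an in-memory relation (a list of dicts); select() is a hypothetical stand-in for σ:

   # select() stands in for the sigma operator on a list-of-dicts relation.
   def select(pred, rel):
       return [t for t in rel if pred(t)]

   R = [{'a': 1, 'b': 10}, {'a': 2, 'b': 20}, {'a': 2, 'b': 30}]
   c1 = lambda t: t['a'] == 2
   c2 = lambda t: t['b'] >= 20

   # sigma_{C1 AND C2}(R) == sigma_{C1}(sigma_{C2}(R)) == sigma_{C2}(sigma_{C1}(R))
   both = select(lambda t: c1(t) and c2(t), R)
   assert both == select(c1, select(c2, R)) == select(c2, select(c1, R))

   # sigma_{C1 OR C2}(R) == sigma_{C1}(R) UNION_set sigma_{C2}(R), if R is a set
   either = select(lambda t: c1(t) or c2(t), R)
   union = {tuple(t.items()) for t in select(c1, R)} | \
           {tuple(t.items()) for t in select(c2, R)}
   assert {tuple(t.items()) for t in either} == union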
  Selection and Binary Operators
• Must push selection to both arguments:
   – σ C (R ∪ S) = σ C (R) ∪ σ C (S)
• Must push to first arg, optional for 2nd:
   – σ C (R - S) = σ C (R) - S
   – σ C (R - S) = σ C (R) - σ C (S)
• Push to at least one arg with all attributes
  mentioned in C:
   – product, natural join, theta join, intersection
   – e.g., σ C (R × S) = σ C (R) × S, if R has all the attributes in C


                                                                   258
  Pushing Selection Up the Tree
• Suppose we have relations
  – StarsIn(title,year,starName)
  – Movie(title,year,len,inColor,studioName)
• and a view
  – CREATE VIEW MoviesOf1996 AS
            SELECT *
            FROM Movie
            WHERE year = 1996;
• and the query
  – SELECT starName, studioName
    FROM MoviesOf1996 NATURAL JOIN StarsIn;
                                               259
The Straightforward Tree

Π starName, studioName
   ⋈
      σ year=1996 ( Movie )
      StarsIn

                         Remember the rule
                         σ C (R ⋈ S) = σ C (R) ⋈ S ?

                                                260
        The Improved Logical Query Plan

Starting plan:
   Π starName, studioName ( σ year=1996 (Movie) ⋈ StarsIn )

Push the selection up the tree:
   Π starName, studioName ( σ year=1996 ( Movie ⋈ StarsIn ) )

Then push it down both branches (year is an attribute of both relations):
   Π starName, studioName ( σ year=1996 (Movie) ⋈ σ year=1996 (StarsIn) )
                                                                                                  261
 Grouping Assoc/Comm Operators
• Groups together adjacent joins, adjacent unions, and
  adjacent intersections as siblings in the tree
• Sets up the logical QP for future optimization when
  physical QP is constructed: determine best order for
  doing a sequence of joins (or unions or intersections)




  A cascade of binary unions, e.g. (((A ∪ (B ∪ C)) ∪ D) ∪ E) ∪ F,
  is regrouped as a single n-ary union with operands
  A, B, C, D, E, F.
                                                           262
                      16.4
• Estimating the Cost of Operations
• After settling on the logical query plan, we turn
  it into a physical plan.
• Consider all the possible physical plans and
  estimate their costs – this evaluation is known
  as cost-based enumeration.
• The one with the least estimated cost is the one
  selected to be passed to the query-execution
  engine.
                                                 263
              Physical Plan
       For each physical plan select
• An order and grouping for associative-and-
  commutative operations like joins and unions.
• An algorithm for each operator in the logical plan,
  e.g., whether a nested-loop join or a hash join is to
  be used
• Additional operators that are needed for the
  physical plan but were not present explicitly
  in the logical plan, e.g., scanning and sorting
• The way in which arguments are passed from one
  operator to the next.

                                                   264
  Estimating Sizes of Intermediate Relations


Rules for estimating the number of tuples in an
   intermediate relation should:
1. Give accurate estimates
2. Be easy to compute
3. Be logically consistent
• Objective of estimation is to select best
   physical plan with least cost.
                                                  265
  Estimating the Size of a Projection

We treat the projection here as a bag-projection; a
classical, duplicate-eliminating projection is this
bag-projection followed by a δ.
The projection is different from the other operators
in that the size of the result is computable exactly. Since a
projection produces one result tuple for every
argument tuple, the only change in the output size is
the change in the lengths of the tuples.



                                                     266
Estimating the Size of a Selection(1)

• While performing a selection, we may reduce the
  number of tuples, but the sizes of the tuples remain the
  same.
• Let S = σ A=c (R), where A is an attribute of
  R and c is a constant. Then we recommend as
  an estimate:
            T(S) = T(R)/V(R,A)
• The rule above surely holds if all values of
  attribute A occur equally often in the database.
                                                     267
Estimating the Size of a Selection(2)

• If S = σ A<c (R), then our estimate for
  T(S) is: T(S) = T(R)/3
• For S = σ A≠c (R), we may use
  T(S) = T(R)(V(R,A) - 1)/V(R,A) as
  an estimate.
• When the selection condition C is the AND of
  several equalities and inequalities, we can treat
  the selection σ C (R) as a cascade of simple
  selections, each of which checks for one of the
  conditions.

                                                      268
Estimating the Size of a Selection(3)

• A less simple, but possibly more accurate estimate of the
  size of S = σ C1 OR C2 (R) is to assume that C1 and C2 are
  independent. Then, if R has n tuples, m1 of which satisfy C1
  and m2 of which satisfy C2, we would estimate the number of
  tuples in S as

            n ( 1 - (1 - m1/n)(1 - m2/n) )

  In explanation, (1 - m1/n) is the fraction of tuples that do
  not satisfy C1, and (1 - m2/n) is the fraction that do not
  satisfy C2. The product of these numbers is the fraction of
  R's tuples that are not in S, and 1 minus this product is the
  fraction that are in S.

                                                                  269
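A hedged Python sketch of the selection-size rules from the last three slides; the function names are invented, and the constants simply follow the rules above:

   def est_equal(T_R, V_R_A):
       """T(S) for S = sigma_{A=c}(R): T(R)/V(R,A)."""
       return T_R / V_R_A

   def est_range(T_R):
       """T(S) for S = sigma_{A<c}(R): the rule-of-thumb T(R)/3."""
       return T_R / 3

   def est_or(n, m1, m2):
       """T(S) for S = sigma_{C1 OR C2}(R), assuming C1, C2 independent:
       n * (1 - (1 - m1/n) * (1 - m2/n))."""
       return n * (1 - (1 - m1 / n) * (1 - m2 / n))

   # e.g., T(R)=10000, V(R,A)=50: equality selects ~200 tuples; an AND of
   # conditions is treated as a cascade, multiplying the factors together.
   print(est_equal(10000, 50))            # 200.0
   print(est_range(10000))                # ~3333.3
   print(est_or(10000, 200, 10000 / 3))   # ~3466.7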
       Estimating the Size of a Join
• Two simplifying assumptions:
  1. Containment of Value Sets
  If R and S are two relations with attribute Y and V(R,Y) <= V(S,Y), then every
  Y-value of R will be a Y-value of S.

  2. Preservation of Value Sets
  If we join a relation R with another relation S, then for an attribute A in R and not in S,
  all distinct values of A are preserved and none are lost: V(S ⋈ R, A) = V(R,A)

  Under these assumptions, we estimate
  T(R ⋈ S) = T(R)T(S)/max(V(R,Y), V(S,Y))

                                                                             270
Natural Joins With Multiple Join Attributes

  Of the T(R)T(S) pairs of tuples from R and S, the expected
  number of pairs that match in both y1 and y2 is:

  T(R)T(S) / ( max(V(R,y1), V(S,y1)) × max(V(R,y2), V(S,y2)) )

  In general, the following rule can be used to estimate the size of a
   natural join when there are any number of attributes shared between
   the two relations.

   The estimate of the size of R ⋈ S is computed by
   multiplying T(R) by T(S) and dividing by the largest of V(R,y)
   and V(S,y) for each attribute y that is common to R and S.




                                                                         271
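A small Python sketch of this rule; the function and its argument names are invented for illustration, with V_R and V_S standing for the V(R,y) statistics:

   def est_join_size(T_R, T_S, V_R, V_S, common_atts):
       """Estimate T(R JOIN S): multiply T(R) by T(S) and divide by
       max(V(R,y), V(S,y)) for each attribute y shared by R and S.
       V_R and V_S map attribute name -> number of distinct values."""
       size = T_R * T_S
       for y in common_atts:
           size /= max(V_R[y], V_S[y])
       return size

   # Single shared attribute y: T(R)T(S)/max(V(R,y), V(S,y))
   print(est_join_size(1000, 2000, {'y': 20}, {'y': 50}, ['y']))   # 40000.0
   # Two shared attributes y1, y2: divide by both maxima
   print(est_join_size(1000, 2000, {'y1': 20, 'y2': 10},
                       {'y1': 50, 'y2': 5}, ['y1', 'y2']))         # 4000.0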
      Joins of Many Relations(1)
• Rule for estimating the size of any join:
  Start with the product of the number of tuples in
  each relation. Then, for each attribute A appearing
  at least twice, divide by all but the least of the V(R,A)'s.
  We can also estimate the number of values that will
  remain for attribute A after the join: by the
  preservation-of-value-sets assumption, it is the least
  of these V(R,A)'s.


                                                         272
          Joins of Many Relations(2)

  Based on the two assumptions, containment and
  preservation of value sets:
• No matter how we group and order the terms in a
  natural join of n relations, the estimation rules,
  applied to each join individually, yield the same
  estimate for the size of the result. Moreover, this
  estimate is the same one we get if we apply the rule
  for the join of all n relations as a whole.



                                                      273
Estimating Sizes for Other Operations

 • Union: the average of the sum and the
   larger of the two sizes.
 • Intersection:
 • approach 1: take the average of the
   extremes, which is half of the smaller.
 • approach 2: intersection is an extreme
   case of the natural join, so use the formula
 • T(R ∩ S) = T(R)T(S)/max(V(R,Y), V(S,Y))
                                               274
Estimating Sizes for Other Operations

• Difference: T(R) - (1/2)*T(S)

• Duplicate Elimination: take the smaller of
  (1/2)*T(R) and the product of all the V(R,a)'s
  over the attributes a of R.

• Grouping and Aggregation: upper-bound the
  number of groups by a product of V(R,A)'s, where
  attribute A ranges over only the grouping
  attributes of L. An estimate is the smaller of
  (1/2)*T(R) and this product.

                                                      275
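For reference, these rules of thumb are easy to express in code. A hedged Python sketch, with invented function names:

   def est_union(T_R, T_S):
       # average of the sum and the larger of the two sizes
       return ((T_R + T_S) + max(T_R, T_S)) / 2

   def est_intersection(T_R, T_S):
       # approach 1: average of the extremes (0 and the smaller)
       return min(T_R, T_S) / 2

   def est_difference(T_R, T_S):
       # T(R - S) estimated as T(R) - T(S)/2
       return T_R - T_S / 2

   def est_distinct(T_R, V_R_values):
       # smaller of T(R)/2 and the product of all the V(R,a)'s
       prod = 1
       for v in V_R_values:
           prod *= v
       return min(T_R / 2, prod)

   def est_group(T_R, V_grouping):
       # smaller of T(R)/2 and the product of V(R,A) over grouping attributes
       prod = 1
       for v in V_grouping:
           prod *= v
       return min(T_R / 2, prod)

   print(est_union(1000, 3000))           # 3500.0
   print(est_intersection(1000, 3000))    # 500.0
   print(est_distinct(10000, [50, 200]))  # 5000.0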
                            16.5
• Whether selecting a logical query plan or constructing a
  physical query plan from a logical plan, the query optimizer
  needs to estimate the cost of evaluating certain expressions.
• We shall assume that the "cost" of evaluating an expression is
  approximated well by the number of disk I/O's performed.




                                                              276
The number of disk I/O’s, in turn, is influenced by:

1. The particular logical operators chosen to implement the
   query, a matter decided when we choose the logical query
   plan.
2. The sizes of intermediate results (whose estimation we
   discussed in Section 16.4)
3. The physical operators used to implement logical operators,
   e.g., the choice of a one-pass or two-pass join, or the choice
   to sort or not sort a given relation.
4. The ordering of similar operations, especially joins
5. The method of passing arguments from one physical operator
   to the next.


                                                                277
      Obtaining Estimates for Size Parameter
• The formulas of Section 16.4 were predicated on knowing
  certain important parameters, especially T(R), the number of
  tuples in a relation R, and V(R, a), the number of different
  values in the column of relation R for attribute a.
• A modern DBMS generally allows the user or administrator
  explicitly to request the gathering of statistics, such as T(R)
  and V(R, a). These statistics are then used in subsequent
  query optimizations to estimate the cost of operations.
• By scanning an entire relation R, it is straightforward to count
  the number of tuples T(R) and also to discover the number of
  different values V(R, a) for each attribute a.
• The number of blocks in which R can fit, B(R), can be
  estimated either by counting the actual number of blocks
  used (if R is clustered), or by dividing T(R) by the number of
  tuples per block.



                                                                 278
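A minimal Python sketch of gathering T(R), V(R,a), and B(R) by one full scan, assuming the relation is a list of dicts and a fixed number of tuples per block:

   def gather_stats(tuples, block_size):
       """One full scan of R yields T(R), V(R,a) per attribute, and an
       estimate of B(R) as T(R) divided by tuples per block (assumed)."""
       T = len(tuples)
       V = {}
       for t in tuples:
           for a, val in t.items():
               V.setdefault(a, set()).add(val)
       V = {a: len(vals) for a, vals in V.items()}
       B = -(-T // block_size)          # ceiling division
       return T, V, B

   R = [{'a': i % 50, 'b': i % 200} for i in range(10000)]
   T, V, B = gather_stats(R, block_size=100)
   print(T, V['a'], V['b'], B)          # 10000 50 200 100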
                Computation of Statistics

• Periodic re-computation of statistics is the norm in most
  DBMS's, for several reasons.
   – First, statistics tend not to change radically in a short time.
   – Second, even somewhat inaccurate statistics are useful as long as they
     are applied consistently to all the plans.
   – Third, the alternative of keeping statistics up-to-date can make the
     statistics themselves into a "hot spot" in the database; because
     statistics are read frequently, we prefer not to update them
     frequently as well.




                                                                          279
• The recomputation of statistics might be triggered
  automatically after some period of time, or after some
  number of updates.
• However, a database administrator, noticing that poor-
  performing query plans are being selected by the query
  optimizer on a regular basis, might request the recomputation
  of statistics in an attempt to rectify the problem.
• Computing statistics for an entire relation R can be very
  expensive, particularly if we compute V(R, a) for each
  attribute a in the relation.
• One common approach is to compute approximate statistics
  by sampling only a fraction of the data. For example, let us
  suppose we want to sample a small fraction of the tuples to
  obtain an estimate for V(R, a).


                                                             280
   Heuristics for Reducing the Cost of Logical Query
                         Plans

• One important use of cost estimates for queries or sub-
  queries is in the application of heuristic transformations of the
  query.
• We have already observed how certain heuristics,
  applied independently of cost estimates, can be expected
  almost certainly to improve the cost of a logical query plan.
• However, there are other points in the query optimization
  process where estimating the cost both before and after a
  transformation will allow us to apply a transformation where
  it appears to reduce cost and avoid the transformation
  otherwise.
• In particular, when the preferred logical query plan is being
  generated, we may consider a number of optional
  transformations and the costs before and after.

                                                                 281
• Because we are estimating the cost of a logical query plan,
  we have not yet made decisions about the physical operators
  that will be used to implement the operators of relational
  algebra, so our cost estimate cannot be based on disk I/Os.
• Rather, we estimate the sizes of all intermediate results using
  the techniques of Section 16.4, and their sum is our heuristic
  estimate for the cost of the entire logical plan.
• For example:




                                                                282
• Consider the initial logical query plan, as shown below (figure omitted):




                                                               283
• Let the statistics for the relations R and S be as follows (table omitted):




• To generate a final logical query plan from it, we shall insist that the selection be
  pushed down as far as possible. However, we are not sure whether it makes
  sense to push the δ below the join or not. Thus, we generate the two
  query plans shown on the next slide; they differ in whether we have chosen to
  eliminate duplicates before or after the join.


                                                                              284
(a)   (b)




            285
• We know how to estimate the size of the result of the
  selection: we divide T(R) by V(R, a) = 50.
• We also know how to estimate the size of the join: we
  multiply the sizes of the arguments and divide by max(V(R, b),
  V(S, b)), which is 200.




                                                              286
   Approaches to Enumerating Physical Plans

• Let us consider the use of cost estimates in the conversion of
  a logical query plan to a physical query plan.
• The baseline approach, called exhaustive, is to consider all
  combinations of choices (for each of issues like order of joins,
  physical implementation of operators, and so on).
• Each possible physical plan is assigned an estimated cost, and
  the one with the smallest cost is selected.




                                                                 287
• There are two broad approaches to exploring the space of
  possible physical plans:
   – Top-down: Here, we work down the tree of the logical query plan from
     the root.
   – Bottom-up: For each sub-expression of the logical-query-plan tree, we
     compute the costs of all possible ways to compute that sub-
     expression. The possibilities and costs for a sub-expression E are
     computed by considering the options for the sub-expressions for E,
     and combining them in all possible ways with implementations for the
     root operator of E.




                                                                        288
       Branch-and-Bound Plan Enumeration

• This approach, often used in practice, begins by using
  heuristics to find a good physical plan for the entire logical
  query plan. Let the cost of this plan be C. Then as we consider
  other plans for sub-queries, we can eliminate any plan for a
  sub-query that has a cost greater than C, since that plan for
  the sub-query could not possibly participate in a plan for the
  complete query that is better than what we already know.
• Likewise, if we construct a plan for the complete query that
  has cost less than C, we replace C by the cost of this better
  plan in subsequent exploration of the space of physical query
  plans.



                                                               289
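A toy Python sketch of the idea, using an invented cost model (the sum of intermediate-result sizes over a join order) rather than real disk-I/O estimates; the pruning step stops early once the running cost reaches the bound C:

   import itertools

   def plan_cost(order, bound=float('inf')):
       """Toy cost model: sum of intermediate-result sizes for a join order.
       Stops early (returns None) once the running cost reaches `bound`."""
       cost, size = 0, 1
       for s in order:
           size *= s
           cost += size
           if cost >= bound:        # branch-and-bound pruning: this plan
               return None          # cannot beat the best known plan
       return cost

   def branch_and_bound(rel_sizes):
       # Heuristic starting plan: join smallest relations first; its cost
       # becomes the initial bound C.
       best = tuple(sorted(rel_sizes))
       C = plan_cost(best)
       for order in itertools.permutations(rel_sizes):
           c = plan_cost(order, bound=C)
           if c is not None and c < C:      # cheaper complete plan found:
               best, C = order, c           # replace C with its cost
       return best, C

   print(branch_and_bound([100, 5, 20]))    # ((5, 20, 100), 10105)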
                       Hill Climbing

• In this approach we really search for a "valley" in the
  space of physical plans and their costs; we start with a
  heuristically selected physical plan.
• We can then make small changes to the plan, e.g., replacing
  one method for an operator by another, or reordering joins by
  using the associative and/or commutative laws, to find
  "nearby" plans that have lower cost.
• When we find a plan such that no small modification yields a
  plan of lower cost, we make that plan our chosen physical
  query plan.




                                                              290
                 Dynamic Programming

• In this variation of the general bottom-up strategy, we keep
  for each sub-expression only the plan of least cost.
• As we work up the tree, we consider possible
  implementations of each node, assuming the best plan for
  each sub-expression is also used.




                                                                 291
                Selinger-Style Optimization

• This approach improves upon the dynamic-programming
  approach by keeping for each sub-expression not only the
  plan of least cost, but certain other plans that have higher
  cost, yet produce a result that is sorted in an order that may
  be useful higher up in the expression tree. Examples of such
  interesting orders are when the result of the sub-expression is
  sorted on one of:
   – The attribute(s) specified in a sort (τ) operator at the root
   – The grouping attribute(s) of a later group-by (γ) operator.
   – The join attribute(s) of a later join.




                                                                     292
Significance of Left and Right Join
Arguments
• The argument relations in joins determine the
  cost of the join
• The left argument of the join is
  – Called the build relation
  – Assumed to be smaller
  – Stored in main-memory




                                              293
16.6
• Significance of Left and Right Join Arguments
• The right argument of the join is
  – Called the probe relation
  – Read a block at a time
  – Its tuples are matched with those of build relation
• The join algorithms which distinguish between
  the arguments are:
  – One-pass join
  – Nested-loop join
  – Index join


                                                          294
Join Trees
• Order of arguments is important for joining
  two relations
• Left argument, since stored in main-memory,
  should be smaller
• With two relations only two choices of join
  tree
• With more than two relations, there are n!
  ways to order the arguments and therefore n!
  join trees, where n is the no. of relations
                                             295
Join Trees
• Total # of tree shapes T(n) for n relations is given
  by the recurrence:

  T(1) = 1
  T(n) = Σ i=1..n-1  T(i) × T(n-i)

•   T(1) = 1
•   T(2) = 1
•   T(3) = 2
•   T(4) = 5 … etc

                                                 297
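The recurrence is straightforward to evaluate; a short Python check (the values are the Catalan numbers):

   from functools import lru_cache

   @lru_cache(maxsize=None)
   def tree_shapes(n):
       """Number of binary join-tree shapes for n relations:
       T(1) = 1;  T(n) = sum over 1 <= i < n of T(i) * T(n - i)."""
       if n == 1:
           return 1
       return sum(tree_shapes(i) * tree_shapes(n - i) for i in range(1, n))

   print([tree_shapes(n) for n in range(1, 7)])   # [1, 1, 2, 5, 14, 42]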
Left-Deep Join Trees
• Consider 4 relations. Different ways to join
  them are as follows

  (figures (a), (b), (c): a left-deep tree, a bushy tree, and a
   right-deep tree; omitted)
                                                 298
• In fig (a) all the right children are leaves. This
  is a left-deep tree
• In fig (c) all the left children are leaves. This is
  a right-deep tree
• Fig (b) is a bushy tree
• Considering left-deep trees is advantageous
  for deciding join orders


                                                         299
Join order
• Join order selection
  – A1 ⋈ A2 ⋈ A3 ⋈ .. ⋈ An
  – Left-deep join trees: each join takes the previous
    result as its left argument and one Ai as its right argument
  – Dynamic programming
     • Best plan computed for each subset of relations
        – Best plan (A1, .., An) = min cost plan of (
                       Best plan(A2, .., An) ⋈ A1
                       Best plan(A1, A3, .., An) ⋈ A2
                         ….
                       Best plan(A1, .., An-1) ⋈ An )



                                                           300
Dynamic Programming to Select a Join
Order and Grouping
• Three choices to pick an order for the join of many relations
  are:
   – Consider all of the relations
   – Consider a subset
   – Use a heuristic to pick one
• Dynamic programming is used either to consider all or a
  subset
   – Construct a table of costs based on relation sizes
   – Remember only the minimum entry, which is all that is
     required to proceed


                                                              301
Dynamic Programming to Select a Join
Order and Grouping

  (worked example: the cost tables for the singleton, pair, triple,
   and full subsets of relations were shown as figures; omitted)

                                        303-306
•   A disadvantage of dynamic programming is that it
    does not involve the actual costs of the joins in
    the calculations

•   It can be improved by:
      • Using disk I/O's for evaluating cost
      • When computing the cost of R1 ⋈ R2, since
         we sum the costs of R1 and R2, we must also
         compute estimates for their sizes



                                                        307
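A hedged Python sketch of the bottom-up table construction, using a deliberately crude size model (the size of a joined subset is taken as the product of its members' sizes) in place of the Section 16.4 formulas; only the minimum-cost entry per subset is kept, and all names are invented:

   from itertools import combinations

   def dp_join_order(sizes):
       """Dynamic programming over subsets of relations. `sizes` maps
       relation name -> T(R); a plan's cost is the sum of the sizes of
       the intermediate results it creates."""
       rels = list(sizes)
       best = {frozenset([r]): (0, [r]) for r in rels}
       for k in range(2, len(rels) + 1):
           for subset in map(frozenset, combinations(rels, k)):
               size = 1
               for r in subset:                # crude size of this subset's join
                   size *= sizes[r]
               cands = []
               for r in subset:                # r is the last relation joined in
                   c, order = best[subset - {r}]
                   cands.append((c + size, order + [r]))
               best[subset] = min(cands)       # remember only the minimum entry
       return best[frozenset(rels)]

   print(dp_join_order({'R': 1000, 'S': 100, 'T': 10}))
   # (1001000, ['S', 'T', 'R']): join the small relations first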
A Greedy Algorithm for Selecting a Join
Order
• It is expensive to use an exhaustive method
  like dynamic programming
• A better approach is to use a join-order heuristic
  for query optimization
• A greedy algorithm is an example of that
  – Make one decision at a time about the order of joins
    and never backtrack on a decision once made



                                                      308
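A minimal Python sketch of such a greedy heuristic, under the same crude product-size model as above (names invented):

   def greedy_join_order(sizes):
       """Greedy sketch: repeatedly join in the relation that keeps the
       estimated intermediate result smallest, never backtracking."""
       remaining = dict(sizes)
       # start with the pair whose join is smallest
       order = sorted(remaining, key=remaining.get)[:2]
       current = remaining.pop(order[0]) * remaining.pop(order[1])
       while remaining:
           r = min(remaining, key=lambda x: current * remaining[x])
           current *= remaining.pop(r)
           order.append(r)
       return order

   print(greedy_join_order({'R': 1000, 'S': 100, 'T': 10}))  # ['T', 'S', 'R']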
Chapter 15
15.1
What is query processing

 A given SQL query is translated by the query
  processor into a low-level execution plan
• An execution plan is a program in a functional
  language:
   – The physical relational algebra, specific to each
    DBMS.
• The physical relational algebra extends the
  relational algebra with:
   – Primitives to search through the internal data
    structures of the DBMS



                                                          309
     What is a Query Processor

 A group of components of a DBMS that converts
  user queries and data-modification
  commands into a sequence of database
  operations
 It also executes those operations
 Must supply detail regarding how the query is
  to be executed


                                              310
Major parts of Query processor
                  Query Execution:
                    The algorithms
                    that manipulate
                    the data of the
                    database.

                    Focus on the
                    operations of
                    extended
                    relational algebra.
                                     311
                Query Processing Steps
   SQL Query
      ↓
   PARSER (parsing and semantic checking, as in any compiler)
      ↓
   Parse tree (~ tree structure representing a relational calculus expression)
      ↓
   OPTIMIZER (very advanced)
      ↓
   Execution plan (annotated relational algebra expression)
      ↓
   EXECUTOR (execution plan interpreter)
      ↓
   DBMS kernel / Data structures




                                                                            312
Outline of Query Compilation
           Query compilation
            Parsing : A parse tree for the
             query is constructed
             Query Rewrite : The parse tree is
              converted to an initial query plan
              and transformed into a logical
              query plan
             Physical Plan Generation : The
              logical query plan is converted into a
              physical query plan by selecting
              algorithms and an order of
              execution for these operators.
                                           313
Basic Steps in Query Processing




                                  314
  Physical-Query-Plan Operators
 Physical operators are implementations of the
  operators of relational algebra.
 There are also physical operators for tasks that are
  not part of relational algebra, like "scan", which scans
  tables, that is, brings each tuple of some relation into
  main memory




                                               315
     Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation




                                       316
Basic Steps in Query Processing (Cont.)

   – Parsing and translation
       • translate the query into its internal form. This is
         then translated into relational algebra.
       • Parser checks syntax, verifies relations
   – Evaluation
       • The query-execution engine takes a query-
         evaluation plan, executes that plan, and returns the
         answers to the query




                                                                317
            Scanning Tables
 One of the basic things we can do in a physical
  query plan is to read the entire contents of a
  relation R.
 A variation of this operator involves a simple
  predicate: read only those tuples of the
  relation R that satisfy the predicate.




                                                   318
               Scanning Tables
Basic approaches to locate the tuples of a
 relation R
 Table Scan
    Relation R is stored in secondary memory with
     its tuples arranged in blocks
    It is possible to get the blocks one by one

 Index-Scan
     If there is an index on some attribute of relation
      R, we can use this index to get all the tuples of
      relation R
                                                    319
    Sorting While Scanning Tables
 There are a number of reasons to sort a relation
    A query could include an ORDER BY clause,
     requiring that a relation be sorted.
    Algorithms to implement relational algebra
     operations require one or both arguments to
     be sorted relations.
    The physical-query-plan operator sort-scan takes a
     relation R and attributes on which the sort is to be
     made, and produces R in that sorted order
                                                 320
 Computation Model for Physical
           Operator
 Physical-plan operators should be selected
  wisely; this is essential for a good query
  processor.
 The "cost" of each operator is estimated by the
  number of disk I/O's for the operation.
 The total cost of an operation depends on the
  size of the answer, and includes the final write-
  back cost in the total cost of the query.

                                                321
    Parameters for Measuring Costs

 Parameters that affect the performance of a
  query:
     Buffer space available in main memory at
      the time of execution of the query
     Size of the input and the size of the output
      generated
     The size of a block on disk and the
      size of main memory also affect the
      performance
                                                 322
  Parameters for Measuring Costs
 B: the number of blocks needed to hold all the tuples of relation R;
  also denoted B(R)
 T: the number of tuples in relation R;
  also denoted T(R)
 V: the number of distinct values that appear in a column of relation R;
  V(R, a) is the number of distinct values of the column for attribute a in relation R




                                                                             323
    I/O Cost for Scan Operators
 If relation R is clustered, then the number of
  disk I/O's for the table-scan operator is ~B disk
  I/O's
 If relation R is not clustered, then the number
  of required disk I/O's is generally much higher
 An index on a relation R occupies many fewer
  than B(R) blocks
      That means a scan of the entire relation R,
  which takes at least B disk I/O's, will require
  more I/O's than a scan of the entire index
                                                   324
 Iterators for Implementation of
        Physical Operators
 Many physical operators can be implemented
  as an Iterator.
 Three methods forming the iterator for an
  operation are:
 1. Open( ) :
    This method starts the process of getting
     tuples
    It initializes any data structures needed to
     perform the operation
                                                    325
 Iterators for Implementation of
        Physical Operators
 2. GetNext( ):
   Returns the next    tuple in the result
    If there are no more tuples to return,
     GetNext returns a special value NotFound
 3. Close( ) :
    Ends the iteration after all tuples
    It calls Close on any arguments of the
     operator

                                                326
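A minimal Python rendering of the iterator interface, scanning an in-memory list of tuples as a stand-in for reading R's blocks (class and names invented for illustration):

   class TableScan:
       """An iterator in the Open/GetNext/Close style."""
       NOT_FOUND = object()            # the special value NotFound

       def __init__(self, tuples):
           self.tuples = tuples

       def Open(self):                 # initialize data structures
           self.pos = 0

       def GetNext(self):              # return the next tuple, or NotFound
           if self.pos >= len(self.tuples):
               return TableScan.NOT_FOUND
           t = self.tuples[self.pos]
           self.pos += 1
           return t

       def Close(self):                # end the iteration
           self.pos = None

   scan = TableScan([('a', 1), ('b', 2)])
   scan.Open()
   while (t := scan.GetNext()) is not TableScan.NOT_FOUND:
       print(t)
   scan.Close()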
                 15.2
   By general technique
     • sorting-based
     • hash-based
     • index-based
   By the number of times data is read from
    disk
     • one-pass
     • two-pass
     • multi-pass (more than 2)
   By what the operators work on
     • tuple-at-a-time, unary
     • full-relation, unary                    327
        One-Pass Algorithm Methods

 Tuple-at-a-time, unary operations (selection & projection)


 Full-relation, unary operations


 Full-relation, binary operations (set & bag versions of
  union)




                                                            328
    One-Pass, Tuple-at-a-Time

 These are for SELECT and PROJECT
 Algorithm:
       read the blocks of R sequentially into an input buffer
       perform the operation
       move the selected/projected tuples to an output buffer
 Requires only M ≥ 1
 I/O cost is that of a scan (either B or T, depending
  on whether R is clustered or not)
 Exception! Selecting tuples that satisfy some
  condition on an indexed attribute can be done
  faster!                                            329
         One-Pass, Unary, Full-Relation

 duplicate elimination (DELTA)
 Algorithm:
    • keep a main memory search data structure D (use search
      tree or hash table) to store one copy of each tuple
    • read in each block of R one at a time (use scan)
    • for each tuple check if it appears in D
    • if not then add it to D and to the output buffer
 Requires 1 buffer to hold current block of R;
  remaining M-1 buffers must be able to hold D
 I/O cost is just that of the scan

                                                        331
One-Pass Algorithms for Tuple-at-a-Time
              Operations
 Tuple-at-a-time operations are selection and projection
      read the blocks of R one at a time into an input buffer
      perform the operation on each tuple
      move the selected tuples or the projected tuples to the output
       buffer


 The disk I/O requirement for this process depends only on
  how the argument relation R is provided.
      If R is initially on disk, then the cost is whatever it takes to
       perform a table-scan or index-scan of R.


                                                                      332
A selection or projection being performed
              on a relation R




                                            333
   One-Pass Algorithms for Unary, Full-
         Relation Operations

 Duplicate Elimination
      To eliminate duplicates, we can read each block of R one at a
       time, but for each tuple we need to make a decision as to
       whether:
        1.   It is the first time we have seen this tuple, in which case we
             copy it to the output, or
        2.   We have seen the tuple before, in which case we must not
             output this tuple.
      One memory buffer holds one block of R's tuples, and the
       remaining M - 1 buffers can be used to hold a single copy of
       every tuple.


                                                                          334
Managing memory for a one-pass
    duplicate-elimination




                                 335
               Duplicate Elimination

 When a new tuple from R is considered, we compare it
  with all tuples seen so far
      if it is not equal to any of them: we copy it to the output and
       add it to the in-memory list of tuples we have seen.
      if there are n tuples in main memory: each new tuple takes
       processor time proportional to n, so the complete operation
       takes processor time proportional to n².
 We need a main-memory structure that allows each of the
  operations:
      Add a new tuple, and
      Tell whether a given tuple is already there

                                                                    336
        Duplicate Elimination (…contd.)

 The different structures that can be used for such main
  memory structures are:
      Hash table
      Balanced binary search tree




                                                            337
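A minimal Python sketch of the one-pass δ, with a set standing in for the hash table or balanced tree (names invented):

   def delta(blocks):
       """One-pass duplicate elimination: D is the main-memory structure
       holding one copy of each tuple seen so far."""
       D = set()
       for block in blocks:             # read each block of R one at a time
           for t in block:
               if t not in D:           # first time we have seen this tuple
                   D.add(t)
                   yield t              # copy it to the output

   R = [[(1, 'a'), (2, 'b')], [(1, 'a'), (3, 'c')]]
   print(list(delta(R)))                # [(1, 'a'), (2, 'b'), (3, 'c')]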
  One-Pass Algorithms for Unary, Full-
        Relation Operations
 Grouping
      The grouping operation gives us zero or more grouping
       attributes and presumably one or more aggregated attributes

      If we create in main memory one entry for each group then
       we can scan the tuples of R, one block at a time.

      The entry for a group consists of values for the grouping
       attributes and an accumulated value or values for each
       aggregation.



                                                                   338
                         Grouping

 The accumulated value is:
      For MIN(a) or MAX(a) aggregate, record minimum
       /maximum value, respectively.
      For any COUNT aggregation, add 1 for each tuple of group.
      For SUM(a), add value of attribute a to the accumulated sum
       for its group.
      AVG(a) is a hard case. We must maintain 2 accumulations:
       count of no. of tuples in the group & sum of a-values of these
       tuples. Each is computed as we would for a COUNT & SUM
       aggregation, respectively. After all tuples of R are seen, take
       quotient of sum & count to obtain average.


                                                                  339
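A hedged Python sketch of one-pass grouping with MIN, COUNT, SUM, and AVG accumulations (the function and attribute names are invented); note that AVG is taken as the quotient SUM/COUNT only after all tuples of R have been seen:

   def gamma_min_count_sum_avg(tuples, group_att, agg_att):
       """One main-memory entry per group holds the accumulations."""
       groups = {}
       for t in tuples:
           g, a = t[group_att], t[agg_att]
           mn, cnt, sm = groups.get(g, (float('inf'), 0, 0))
           groups[g] = (min(mn, a), cnt + 1, sm + a)
       return {g: {'min': mn, 'count': c, 'sum': s, 'avg': s / c}
               for g, (mn, c, s) in groups.items()}

   R = [{'studio': 'X', 'len': 90}, {'studio': 'X', 'len': 110},
        {'studio': 'Y', 'len': 100}]
   print(gamma_min_count_sum_avg(R, 'studio', 'len'))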
       One-Pass Algorithms for Binary
                Operations
 Binary operations include:
      Union
      Intersection
      Difference
      Product
      Join




                                        340
                       Set Union

 We read S into M - 1 buffers of main memory and build a
  search structure where the search key is the entire tuple.


 All these tuples are also copied to the
  output.
 Read each block of R into the Mth
  buffer, one at a time.

 For each tuple t of R, see if t is in S, and if not, we copy t
  to the output. If t is also in S, we skip t.
                                                                341
                  Set Intersection

 Read S into M - 1 buffers and build a search structure with
  full tuples as the search key.

 Read each block of R, and for each tuple t of R, see if t is
  also in S. If so, copy t to the output, and if not, ignore t.




                                                                  342
                     Set Difference

 Read S into M - 1 buffers and build a search structure with
  full tuples as the search key.
 To compute R -s S, read each block of R and examine each
  tuple t on that block. If t is in S, then ignore t; if it is not in
  S then copy t to the output.
 To compute S -s R, read the blocks of R and examine
   each tuple t in turn. If t is in S, then delete t from the
   copy of S in main memory, while if t is not in S do
   nothing.
 After considering each tuple of R, copy to the output those
   tuples of S that remain.
                                                                  343
                  Bag Intersection

 Read S into M - 1 buffers.


 Multiple copies of a tuple t are not stored individually.
  Rather store 1 copy of t & associate with it a count equal to
  no. of times t occurs.

 Next, read each block of R, and for each tuple t of R see
  whether t occurs in S. If not, ignore t; it cannot appear in
  the intersection. If t appears in S, and the count associated
  with t is positive, then output t and decrement the count by 1.
  If t appears in S, but the count has reached 0, then do not
  output t; we have already produced as many copies of t in the
  output as there were copies in S.
                                                               344
                   Bag Difference

 To compute S -B R, read tuples of S into main memory &
  count no. of occurrences of each distinct tuple.

 Then read R; check each tuple t to see whether t occurs in
  S, and if so, decrement its associated count. At the end,
  copy to output each tuple in main memory whose count is
  positive, & no. of times we copy it equals that count.

 To compute R -B S, read tuples of S into main memory &
  count no. of occurrences of distinct tuples.


                                                           345
            Bag Difference (…contd.)

 Think of a tuple t with a count of c as c reasons not to copy
  t to the output as we read tuples of R.

 Read a tuple t of R; check if t occurs in S. If not, then copy
  t to the output. If t does occur in S, then we look at current
  count c associated with t. If c = 0, then copy t to output. If
  c > 0, do not copy t to output, but decrement c by 1.




                                                              346
                       Product

 Read S into M - 1 buffers of main memory


 Then read each block of R, and for each tuple t of R
  concatenate t with each tuple of S in main memory.

 Output each concatenated tuple as it is formed.


 This algorithm may take a considerable amount of
  processor time per tuple of R, because each such tuple
  must be matched with M - 1 blocks full of tuples.
  However, the output size is also large, and the time per
  output tuple is small.
                                                             347
                    Natural Join

 Convention: R(X, Y) is being joined with S(Y, Z), where Y
  represents all the attributes that R and S have in common,
  X is all attributes of R that are not in the schema of S, & Z
  is all attributes of S that are not in the schema of R.
  Assume that S is the smaller relation.

 To compute the natural join, do the following:
     1.   Read all tuples of S and form them into a main-memory
          search structure.
          A hash table or balanced tree are good examples of such
          structures. Use M - 1 blocks of memory for this
          purpose.
                                                             348
                  Natural Join

 2.   Read each block of R into the 1 remaining main-memory
      buffer.
      For each tuple t of R, find the tuples of S that agree with t
      on all attributes of Y, using the search structure.
      For each matching tuple of S, form a tuple by joining
      it with t, and move the resulting tuple to the output.




                                                           349
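A minimal Python sketch of this one-pass join, with a dict of lists standing in for the main-memory search structure on Y (names invented):

   def one_pass_join(S, R, Y):
       """One-pass natural join of R(X,Y) with S(Y,Z), S the smaller relation.
       Step 1: read all of S into a structure keyed on the Y attributes.
       Step 2: scan R one tuple at a time, probing the structure."""
       index = {}
       for s in S:
           key = tuple(s[a] for a in Y)
           index.setdefault(key, []).append(s)
       for r in R:                          # scan R, probing for matches
           for s in index.get(tuple(r[a] for a in Y), []):
               yield {**r, **s}             # join matching tuples

   S = [{'y': 1, 'z': 'p'}, {'y': 2, 'z': 'q'}]
   R = [{'x': 'a', 'y': 1}, {'x': 'b', 'y': 3}]
   print(list(one_pass_join(S, R, ['y'])))  # [{'x': 'a', 'y': 1, 'z': 'p'}]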
                         15.3

Introduction to Nested-Loop Joins
 Used for relations of any size.
 It is not necessary that a relation fit in main memory
 Uses a "one-and-a-half" pass method in which, for each
  variation:
    One argument is read just once.
    The other argument is read repeatedly.
 Two kinds:
   Tuple-Based Nested Loop Join
   Block-Based Nested Loop Join

                                                        350
ADVANTAGES OF NESTED-LOOP JOIN


Fits in the iterator framework.
Allows us to avoid storing an
 intermediate relation on disk.



                                   351
 Tuple-Based Nested-Loop Join

 Simplest variation of the
nested-loop join

 Loop ranges over individual
tuples

                                352
   Tuple-Based Nested-Loop Join
 Algorithm to compute the join R(X,Y) ⋈ S(Y,Z)
 FOR each tuple s in S DO
     FOR each tuple r in R DO
            IF r and s join to make tuple t THEN
            output t
 R and S are two relations with r and s as tuples.
 Carelessness in buffering of blocks causes the use
 of T(R)T(S) disk I/O's

                                                       353
 IMPROVEMENT & MODIFICATION

                To decrease the cost
Method 1: Use the algorithm for index-based joins
  – We find the tuples of R that match a given tuple of S
  – We need not read the entire relation R

Method 2: Use algorithm for Block-Based joins
  – Tuples of R & S are divided into blocks
  – Uses enough memory to store blocks in order to
    reduce the number of disk I/O’s.
                                                       354
    An Iterator for Tuple-Based Nested-Loop
                                  Join
   Open() {
       R.Open();
       S.Open();
       s := S.GetNext();
   }
   GetNext() {
       REPEAT {
           r := R.GetNext();
           IF (r = NotFound) { /* R is exhausted for
               the current s */
               R.Close();
               s := S.GetNext();
               IF (s = NotFound) RETURN NotFound;
               /* both R and S are exhausted */
               R.Open();
               r := R.GetNext();
           }
       } UNTIL (r and s join);
       RETURN the join of r and s;
   }
   Close() {
       R.Close(); S.Close();
   }
                                                355
Block-Based Nested-Loop Join Algorithm
Access to arguments is organized by block.
 While reading tuples of the inner relation, we
  use fewer disk I/O's.
 Use enough space in main memory to store
  tuples of the relation of the outer loop.
   This allows each tuple of the inner relation
    to be joined with as many tuples as possible.
                                              356
   FOR each chunk of M-1 blocks of S DO BEGIN
   read these blocks into main-memory buffers;
   organize their tuples into a search structure whose
   search key is the common attributes of R and S;
   FOR each block b of R DO BEGIN
   read b into main memory;
   FOR each tuple t of b DO BEGIN
   find the tuples of S in main memory that
   join with t ;
   output the join of t with each of these tuples;
   END ;
   END ;
   END ;

                                                          357
Block-Based Nested-Loop Join Algorithm


   ALGORITHM:
   FOR each chunk of M-1 blocks of S DO
    FOR each block b of R DO
     FOR each tuple t of b DO
      find the tuples of S in memory that join with t
        output the join of t with each of these tuples


                                                     358
Block-Based Nested-Loop Join Algorithm


  • Assumptions:
    – B(S) ≤ B(R)
    – B(S) > M

    This means that neither relation fits
     entirely in main memory.

                                               359
Analysis of Nested-Loop Join


   Number of disk I/O's:
  ⌈B(S)/(M-1)⌉ * (M-1 + B(R))

  or

  B(S) + B(S)B(R)/(M-1)
  or approximately B(S)*B(R)/M
                                 360
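A hedged Python sketch of the block-based algorithm, with relations represented as lists of blocks (lists of dicts); the names are invented for illustration:

   def block_nested_loop_join(S_blocks, R_blocks, join_key, M):
       """Take S (the smaller relation) in chunks of M-1 blocks, build an
       in-memory structure on the common attribute, then scan all of R
       once per chunk."""
       chunk = M - 1
       for i in range(0, len(S_blocks), chunk):
           table = {}                       # organize the chunk's tuples
           for block in S_blocks[i:i + chunk]:
               for s in block:
                   table.setdefault(s[join_key], []).append(s)
           for block in R_blocks:           # read each block b of R
               for r in block:              # for each tuple t of b
                   for s in table.get(r[join_key], []):
                       yield {**s, **r}     # output the join of t with matches

   S = [[{'y': 1, 'z': 9}], [{'y': 2, 'z': 8}]]       # B(S) = 2 blocks
   R = [[{'x': 7, 'y': 1}, {'x': 6, 'y': 2}]]         # B(R) = 1 block
   print(list(block_nested_loop_join(S, R, 'y', M=2)))
   # Disk I/O count: ceil(B(S)/(M-1)) * (M-1 + B(R)) block reads in total.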
                           15.7

What does a buffer manager do?
The central task of making main-memory buffers available to
  processes is done with the help of the buffer manager.

In practice:
1) buffers are rarely allocated in advance
2) the value of M may vary depending on system
   conditions

Therefore, the buffer manager is used to allow processes
  to get the memory they need, while minimizing the
  delay and the unsatisfiable requests.               361
 The role of the buffer manager


  (figure: requests from processes arrive at the buffer manager,
   which performs reads/writes between its main-memory buffers
   and disk)
Figure 1: The role of the buffer manager : responds to requests for
main-memory access to disk blocks                                     362
    15.7.1 Buffer Management Architecture

Two broad architectures for a buffer manager:
1) The buffer manager controls main memory directly,
   as in most relational DBMSs

2) The buffer manager allocates buffers in virtual memory,
    allowing the OS to decide how to use the buffers,
    as in "main-memory" DBMSs
     and "object-oriented" DBMSs



                                                             363
    Buffer Pool



Key setting for the buffer manager to be efficient:
The buffer manager should limit the number of buffers in
   use so that they fit in the available main memory, i.e.,
   don't exceed the available space.

The number of buffers is a parameter set when the DBMS is
  initialized.

No matter which architecture of buffering is used, we simply
  assume that there is a fixed-size buffer pool, a set of
  buffers available to queries and other database actions.

                                                               364
           Buffer Pool

  (figure: page requests from higher levels are served from the
   BUFFER POOL in main memory; a disk page from the DB on disk
   is read into a free frame, with the choice of frame dictated
   by the replacement policy)
•   Data must be in RAM for DBMS to operate on it!
•   Buffer Manager hides the fact that not all data is in RAM.                      365
   15.7.2 Buffer Management Strategies


Buffer-replacement strategies:

The critical choice the buffer manager has to make is,
  when a buffer is needed for a newly requested
  block and the buffer pool is full, which
  block to evict from the buffer pool.




                                                    366
    Buffer-replacement strategy -- LRU


Least-Recently Used (LRU):

Throw out the block that has not been read or written
   for the longest time.

• Requires more maintenance, but it is effective.
   • Update the time table on every access.
   • Least-recently used blocks are usually less likely to
     be accessed soon than other blocks.



                                                             367
    Buffer-replacement strategy -- FIFO

First-In-First-Out (FIFO):

   The buffer that has been occupied the longest by the
   same block is emptied and used for the new block.

• Requires less maintenance, but it can make more
  mistakes.
   • Keep only the loading time
   • That a block is the oldest doesn't mean it is less likely
     to be accessed.
       Example: the root block of a B-tree index


                                                              368
    Buffer-replacement strategy – “Clock”

The "Clock" Algorithm ("Second Chance")

Think of the buffers (8 in this example) as arranged in a circle,
   shown in Figure 3

 Flags 0 and 1:
 buffers with a 0 flag are ok to send their contents back to
 disk, i.e., ok to be replaced
 buffers with a 1 flag are not ok to be replaced




                                                                369
Buffer-replacement strategy – “Clock”

  (figure: eight buffers arranged in a circle, each with a 0 or 1
   flag; the hand starts at some buffer and rotates clockwise
   searching for a 0 flag; as it passes a buffer whose flag is 1,
   that flag is set to 0; if, by the next time the hand reaches
   that buffer, its contents have not been accessed, i.e. flag = 0,
   the buffer is replaced. That's the "Second Chance".)
    Figure 3: the clock algorithm                                   370
     Buffer-replacement strategy -- Clock

A buffer's flag is set to 1 when:
 a block is read into the buffer
 the contents of the buffer are accessed


A buffer's flag is set to 0 when:
 the buffer manager needs a buffer for a new block; it
 looks for the first 0 it can find, rotating clockwise. If it passes
 1's, it sets them to 0.



                                                                      371
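A minimal Python sketch of the clock policy (class and method names are invented); buf.access(b) either finds b already buffered and sets its flag to 1, or rotates the hand to the first 0 flag and replaces that buffer:

   class ClockBuffer:
       """Sketch of clock ('second chance') replacement: buffers sit in
       a circle; the hand clears 1 flags as it passes and replaces the
       first 0-flagged buffer it finds."""
       def __init__(self, n):
           self.blocks = [None] * n     # block held by each buffer
           self.flags = [0] * n         # the 0/1 second-chance flags
           self.hand = 0

       def access(self, block):
           if block in self.blocks:             # already buffered:
               self.flags[self.blocks.index(block)] = 1
               return
           while self.flags[self.hand] == 1:    # passing a 1: set it to 0
               self.flags[self.hand] = 0
               self.hand = (self.hand + 1) % len(self.blocks)
           self.blocks[self.hand] = block       # replace the 0-flagged buffer
           self.flags[self.hand] = 1            # newly read block gets flag 1
           self.hand = (self.hand + 1) % len(self.blocks)

   buf = ClockBuffer(3)
   for b in ['A', 'B', 'C', 'A', 'D']:
       buf.access(b)
   print(buf.blocks)   # ['D', 'B', 'C']: A lost its second chance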
  System Control helps Buffer-replacement strategy

System Control
The query processor or other components of a DBMS can give
advice to the buffer manager in order to avoid some of the
mistakes that would occur with a strict policy such as LRU, FIFO
or Clock.
For example:
A "pinned" block means it can't be moved to disk without first
modifying certain other blocks that point to it.
In FIFO, use "pinning" to force the root of a B-tree to remain in
memory at all times.


                                                                  372
 15.7.3 The Relationship Between Physical
 Operator Selection and Buffer Management


Problem:
A physical operator expects a certain number of
buffers M for execution.
However, the buffer manager may not be able to
guarantee that these M buffers are available.




                                                  373
  15.7.3 The Relationship Between Physical
  Operator Selection and Buffer Management

Questions:

Can the algorithm adapt to changes in M, the
number of main-memory buffers available?

When the available buffers are fewer than M, some
blocks have to be put on disk instead of in memory.

How does the buffer-replacement strategy impact
performance (i.e. the number of additional I/O's)?

                                                      374
  Example
 FOR each chunk of M-1 blocks of S DO BEGIN
     read these blocks into main-memory buffers;
     organize their tuples into a search structure whose
         search key is the common attributes of R and S;
     FOR each block b of R DO BEGIN
         read b into main memory;
         FOR each tuple t of b DO BEGIN
             find the tuples of S in main memory that join with t;
             output the join of t with each of these tuples;
         END;
     END;
 END;


Figure 15.8: The nested-loop join algorithm                             375
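For concreteness, here is one way the algorithm of Figure 15.8 could
be rendered as runnable Python, under simplifying assumptions: each
relation is a list of blocks, each block is a list of tuples, and
tuples of R and S join when their first components are equal. All
names here are illustrative, not from the text.

    # Block-based nested-loop join: M-1 buffers hold a chunk of S,
    # 1 buffer is reserved for the current block of R.
    def nested_loop_join(S, R, M):
        results = []
        for i in range(0, len(S), M - 1):
            chunk = S[i:i + M - 1]        # read M-1 blocks of S
            # organize their tuples into a search structure keyed on
            # the join attribute (here, the first component)
            index = {}
            for block in chunk:
                for s in block:
                    index.setdefault(s[0], []).append(s)
            for b in R:                    # read each block b of R
                for t in b:                # for each tuple t of b
                    for s in index.get(t[0], []):
                        results.append(t + s[1:])   # output the join
        return results

    # Example: two-block relations joined on the first attribute.
    R = [[(1, "a"), (2, "b")], [(3, "c")]]
    S = [[(1, "x")], [(2, "y"), (4, "z")]]
    print(nested_loop_join(S, R, M=3))     # [(1, 'a', 'x'), (2, 'b', 'y')]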
Example

The number of iterations of the outer loop depends on the average
number of buffers available at each iteration.

The outer loop uses M-1 buffers, and 1 buffer is reserved for a
block of R, the relation of the inner loop.

If we pin the M-1 blocks we use for S on one iteration of the
outer loop, we shall not lose their buffers during the round.

Also, more buffers may become available, and then we could keep
more than one block of R in memory.

Will these extra buffers improve the running time?
                                                                 376
Example

CASE 1: No.

Buffer-replacement strategy: LRU
Buffers for R: k
We read each block of R, in order, into the buffers.
By the end of an iteration of the outer loop, the last k blocks of R
are in the buffers.
However, the next iteration starts from the beginning of R again,
so none of the buffered blocks is needed next; the k buffers
holding R will all be replaced.




                                                                     377
Example

CASE 2: Yes.

Buffer-replacement strategy: LRU
Buffers for R: k
We read the blocks of R in an order that alternates:
first-to-last, then last-to-first (a "rocking" order).
In this way, we save k disk I/O's on each iteration of the outer
loop except the first iteration. (A small illustration follows.)




                                                                   378
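A small illustration of the alternating ("rocking") read order. The
point is that the last k blocks read in one iteration are exactly the
first k blocks needed in the next, so under LRU they are still in the
buffers. (rocking_order is an illustrative name.)

    # Yield the sequence of R's block numbers read on each iteration
    # of the outer loop, alternating the scan direction.
    def rocking_order(num_blocks, iterations):
        forward = True
        for _ in range(iterations):
            if forward:
                yield list(range(num_blocks))
            else:
                yield list(range(num_blocks - 1, -1, -1))
            forward = not forward

    for it, order in enumerate(rocking_order(num_blocks=5, iterations=4)):
        print("iteration", it, order)
    # iteration 0 [0, 1, 2, 3, 4]
    # iteration 1 [4, 3, 2, 1, 0]   <- blocks 4, 3, ... still buffered
    # iteration 2 [0, 1, 2, 3, 4]   <- blocks 0, 1, ... still buffered
    # iteration 3 [4, 3, 2, 1, 0]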
Other Algorithms and M buffers

Other algorithms are also impacted by M and the
buffer-replacement strategy.

Sort-based algorithms
           If we use a sort-based algorithm for some operator, then it is
possible to adapt to changes in M. If M shrinks, we can change the
size of a sublist, since the sort-based algorithms we discussed do not
depend on the sublists being the same size. The major limitation is
that as M shrinks, we could be forced to create so many sublists that
we cannot then allocate a buffer for each sublist in the merging
process. (A sketch follows.)



                                                                             379
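A sketch of the adaptation just described, with in-memory lists
standing in for disk-resident relations; available_buffers is a
stand-in for asking the buffer manager how much memory it can grant
at this moment.

    # Each sublist's size tracks the buffers available when it is
    # formed; the merge phase does not require equal-sized sublists.
    from itertools import cycle

    def form_sublists(R, block_size, available_buffers):
        sublists, i = [], 0
        while i < len(R):
            m = available_buffers()          # may shrink between rounds
            chunk = R[i:i + m * block_size]
            sublists.append(sorted(chunk))   # sort one sublist, write it out
            i += len(chunk)
        return sublists

    avail = cycle([4, 3, 2]).__next__        # pretend M fluctuates: 4, 3, 2, ...
    print(form_sublists(list(range(10, 0, -1)), block_size=1,
                        available_buffers=avail))
    # [[7, 8, 9, 10], [4, 5, 6], [2, 3], [1]]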
• Hash Table

    If the algorithm is hash-based, we can reduce the number of
buckets if M shrinks, as long as the buckets do not then become so
large that they do not fit in the allotted main memory. However,
unlike sort-based algorithms, we cannot respond to changes in M while
the algorithm runs. Rather, once the number of buckets is chosen, it
remains fixed throughout the first pass, and if buffers become
unavailable, the blocks belonging to some of the buckets will have to
be swapped out.




                                                                             380
                    15.8 Algorithms Using More Than Two Passes
    Outline:
•   Multi-pass Sort-based Algorithms
•   Performance of Multipass, Sort-Based
    Algorithms
•   Multipass Hash-Based Algorithms
•   Conclusion

                                             381
Reason that we sometimes use more than two passes:

Two passes are usually enough; however, for the very largest
relations, we use as many passes as necessary.




                                                 382
Multi-pass Sort-based Algorithms

Suppose we have M main-memory buffers
available to sort a relation R, which we
assume is stored clustered.

Then we do the following:

                                           383
BASIS:
If R fits in M blocks (i.e., B(R)<=M)
1. Read R into main memory.
2. Sort it using any main-memory sorting
algorithm.
3. Write the sorted relation to disk.


                                           384
INDUCTION:
If R does not fit into main memory:
1. Partition the blocks holding R into M groups, which we shall
   call R1, R2, ..., RM.
2. Recursively sort Ri for each i = 1, 2, ..., M.
3. Merge the M sorted sublists.




                                             385
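The basis and induction above fit in a short recursive function.
Below is a minimal in-memory sketch (multipass_sort is an
illustrative name): lists of records stand in for blocks on disk, and
heapq.merge plays the role of the M-way merge; actual block I/O is
elided.

    import heapq

    def multipass_sort(R, M, block_size):
        # BASIS: R fits in M blocks -- sort it in main memory.
        if len(R) <= M * block_size:
            return sorted(R)
        # INDUCTION: partition R into M groups, recursively sort each,
        # then merge the M sorted sublists.
        group_size = -(-len(R) // M)         # ceiling division
        sublists = [multipass_sort(R[i:i + group_size], M, block_size)
                    for i in range(0, len(R), group_size)]
        return list(heapq.merge(*sublists))

    print(multipass_sort([5, 3, 9, 1, 7, 2, 8, 4, 6, 0], M=2, block_size=2))
    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]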
If we are not merely sorting R, but performing a unary operation
such as δ (duplicate elimination) or γ (grouping and aggregation)
on R, we can modify the above so that at the final merge we
perform the operation on the tuples at the front of the sorted
sublists. That is:



                                             386
• For a δ, output one copy of each distinct tuple,
  and skip over copies of the tuple.
• For a γ, sort on the grouping attributes only,
  and combine the tuples with a given value of
  these grouping attributes.




                                                387
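For the δ case, the final merge can perform duplicate elimination on
the fly, as in this small sketch (delta_merge is an illustrative
name): each distinct tuple at the front of the sublists is emitted
once, and its copies are skipped.

    import heapq

    def delta_merge(sorted_sublists):
        previous = object()              # sentinel unequal to any tuple
        for t in heapq.merge(*sorted_sublists):
            if t != previous:
                yield t                  # one copy of each distinct tuple
                previous = t             # later copies are skipped

    print(list(delta_merge([[1, 1, 3], [1, 2, 3], [2, 4]])))
    # [1, 2, 3, 4]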
Conclusion
The two-pass algorithms based on sorting or
hashing have natural recursive analogs that
take three or more passes and will work for
larger amounts of data.




                                              388
    Performance of Multipass, Sort-Based
                   Algorithms

Let s(M, k) be the maximum number of blocks of a relation that we
can sort using M buffers and k passes.

• BASIS: If k = 1, i.e., one pass is allowed, then we must have
  B(R) <= M. Put another way, s(M, 1) = M.
• INDUCTION: Suppose k > 1. Then we partition R into M pieces,
  each of which must be sortable in k - 1 passes. If B(R) = s(M, k),
  then s(M, k)/M, which is the size of each of the M pieces of R,
  cannot exceed s(M, k - 1). That is: s(M, k) = M * s(M, k - 1).

Together with the basis, this gives s(M, k) = M^k.




                                                                                 389
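To get a feel for the closed form s(M, k) = M^k (M = 101 here is just
an illustrative buffer count):

    s(101, 1) = 101 blocks                  (one pass)
    s(101, 2) = 101^2 = 10,201 blocks       (two passes)
    s(101, 3) = 101^3 = 1,030,301 blocks    (three passes)

So each extra pass multiplies the size of the relation we can sort
by M.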
             Multipass Hash-Based Algorithms

• BASIS: For a unary operation, if the relation fits in M buffers,
  read it into memory and perform the operation.
• For a binary operation, if either relation fits in M - 1 buffers,
  perform the operation by reading this relation into main memory and
  then reading the second relation, one block at a time, into the Mth
  buffer.

• INDUCTION: If no relation fits in main memory, then hash each
  relation into M - 1 buckets, as discussed in Section 15.5.1.
  Recursively perform the operation on each bucket or corresponding
  pair of buckets, and accumulate the output from each bucket or
  pair. (A sketch for the unary case follows.)




                                                                                        390
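An in-memory sketch of the inductive step for a unary operation,
using δ (duplicate elimination) as the example; multipass_delta is an
illustrative name, and real block I/O is elided.

    # Hash into M-1 buckets and recurse; the `level` salt varies the
    # hash function between passes so buckets keep shrinking. (A value
    # occurring more than M * block_size times would need special
    # handling, since all its copies always land in the same bucket.)
    def multipass_delta(R, M, block_size, level=0):
        if len(R) <= M * block_size:     # BASIS: R fits in M buffers
            return list(set(R))          # perform the operation in memory
        buckets = [[] for _ in range(M - 1)]
        for t in R:
            buckets[hash((t, level)) % (M - 1)].append(t)
        out = []
        for b in buckets:                # accumulate output per bucket
            out.extend(multipass_delta(b, M, block_size, level + 1))
        return out

    print(sorted(multipass_delta([1, 2, 2, 3, 3, 4, 5, 5, 6, 7],
                                 M=4, block_size=1)))
    # [1, 2, 3, 4, 5, 6, 7]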
