Docstoc

indexing

Document Sample
indexing Powered By Docstoc
					  Advanced Database Technology
 Anna Östlin Pagh and Rasmus Pagh
    IT University of Copenhagen
             Spring 2004


         February 19, 2004


INDEXING I
 Lecture based on [GUW, 13.0-13.2]



        Slides based on
     Notes 04: Indexing
 for Stanford CS 245, fall 2002
    by Hector Garcia-Molina
                                     1
Today
  Why indexing?
  Conventional indexes (dense/sparse)

  Multi-level indexes

  Secondary indexes

  Next time: B-tree and hash indexes




                                         2
               Why indexing?

 Common queries involve conditions on the
  values of attributes, e.g.
  SELECT * FROM R WHERE a=11
  SELECT * FROM R WHERE 0<=b and b<42
 indexing an attribute (or set of attributes)
  speeds up finding tuples with specific values.
 (And gives other speed-ups as well.)
 Conceptually similar to index in a book.

                                             3
           Problem session

 Consider an index data structure "similar
  to" the index in a book.
 How many steps does it take to find the
  occurrences of a specific term?
 What about the number of I/Os?
 Does your encyclopaedia have an index?


                                          4
Sequential File
10
20
30
40
50
60
70
80
90
100



                  5
Dense Index   Sequential File

    10        10
    20        20
    30
    40        30
              40
    50
    60        50
    70        60
    80
              70
     90
              80
    100
              90
    110       100
    120



                                6
Sparse Index   Sequential File

    10         10
    30         20
    50
    70         30
               40
     90
    110        50
    130        60
    150
               70
    170
               80
    190
               90
    210        100
    230



                                 7
Sparse 2nd level         Sequential File

   10              10    10
   90              30    20
   170             50
   250             70    30
                         40
                    90
   330             110   50
   410                   60
                   130
   490             150
   570                   70
                   170
                         80
                   190
                         90
                   210   100
                   230


                                           8
Sparse vs. Dense Trade-off

 Sparse: Less index space per record
                      can keep more of index
  in memory
 Dense: Can tell if any record
  exists                  without accessing
  file

(Later:
    sparse better for insertions
    dense needed for secondary indexes)
                                           9
Summary of terms
 Index on sequential file
 Search key (can be  primary key)
 Primary index (on sequencing field)
   -secondary index works on other fields
 Dense index (all search key values in)
 Sparse index (one search key/ block)
 Multi-level index (index on index)

                                        10
Next:

 Duplicate keys

 Deletion/Insertion

 Secondary indexes



                       11
Duplicate keys

                 10
                 10
                 10
                 20
                 20
                 30
                 30
                 30
                 40
                 45
                      12
Duplicate keys
Dense index, one way to implement?
                        10
        10              10
        10
        10              10
        20              20

        20              20
                        30
        30
        30
                        30
        30              30
                        40
                        45
                                     13
 Duplicate keys
 Dense index, better way?
                            10
         10                 10
         20
         30                 10
         40                 20
                            20
                            30
                            30
                            30
                            40
What assumption is made     45
here?                            14
Duplicate keys
Sparse index, one way?
                           10
 careful if looking


                      10   10
   for 20 or 30!

                      10
                      20   10
                      30   20
                           20
                           30
                           30
                           30
                           40
                           45

                                15
Duplicate keys
Sparse index, another way?
  – place first new key from block
                                     10
  should    10                       10
            20
  this be   30                       10
  40?       30                       20
                                     20
                                     30
                                     30
                                     30
                                     40
                                     45
                                          16
Deletion from sparse index
   delete record 40
                         10
        10               20
        30
        50               30
        70               40

        90               50
       110               60
       130               70
       150               80



                              17
Deletion from sparse index
   delete record 30
                         10
        10               20
     40 30
        50               30 40
        70               40

        90               50
       110               60
       130               70
       150               80



                                 18
Deletion from sparse index
 delete records 30 & 40
                           10
         10                20
      50 30
      70 50                30
         70                40

         90                50
        110                60
        130                70
        150                80



                                19
Deletion from dense index
    delete record 30
                            10
         10                 20
         20
      40 30                 30 40
         40                 40

         50                 50
         60                 60
         70                 70
         80                 80



                                    20
Insertion (sparse index case)
    insert record 34
                               10
        10                     20
        30
        40                     30
        60                     34
                               40
                               50
         our lucky day!       60
          we have free space
          where we need it!

                                    21
Insertion - using overflow blocks
        insert record 25
                 10              25
  10             20
  30
  40             30           overflow blocks
  60                          (reorganize later...)
                 40
                 50
                 60

                Problem:
   Overflow blocks take longer to access
                                               22
Insertion - immediate reorganization
      insert record 15
                                    10
            10                      20 15
         20 30
            40                      30 20
            60                      30
                                    40
                                    50
                                    60
 In general: Use same technique
 as for inserting in linked list.

                                            23
Secondary indexes
                               Sequence
    Sparse index              field

             30
                         30
             20
                         50
             80          20
            100          70
             90          80
             ...         40
                         100
                          10
  does not make sense!   90
                         60

                                          24
Secondary indexes
                          Sequence
     Dense index         field
               10    30
               20    50
               30
   10          40
   50                20
                     70
   90
   ...         50
               60    80
  sparse       70    40
               ...
   high              100
   level              10
                     90
                     60

                                     25
Duplicate values & secondary indexes
 one option...
                 10          20
                 10          10
                 10
                 20          20
                             40
                 20
                 30          10
                 40          40
                 40
                             10
                 40          40
                 40
                 ...         30
                             40
 Problem:
 Uses more space than necessary
                                       26
Duplicate values & secondary indexes

                           20
      10                   10
      20
      30                   20
      40                   40

      50                   10
      60                   40
      ...
                           10
                           40
                           30
                           40

               buckets
                                       27
              Summary
 Indexes allow finding a particular
  attribute value in a few I/Os.
 Unresolved problems regarding
  insertions and deletions.
 Next time: Also obtaining efficient
  updates (using B-trees). Hash indexes -
  sometimes more efficient.

                                        28