Database Systems and Storage Virtualization by kellena88

VIEWS: 7 PAGES: 20

									                                                                                 1




          Database Systems and Storage Virtualization

                                             Ken Salem

                                  School of Computer Science




                      CASCON Workshop on Self-Optimizing Systems

                                        October 19, 2005




CASCON Workshop on Self-Optimizing Systems                         October 19, 2005
Introduction                                                                        2


                               Database Systems In Context




  • DBMS should self-configure and
    self-optimize, but . . .
                                                             Web Server
  • . . . DBMS don’t operate in isola-
    tion, and must also interoperate
                                                             App Server
    with other systems.
  • “end-to-end” self-management
                                                              DBMS
    and performance are the ultimate
    goals
                                                      Resource Manager




CASCON Workshop on Self-Optimizing Systems                            October 19, 2005
Introduction                                                                           3


                         Database Systems and Storage Systems




                                                           DBMS

                 Web Server

                 App Server                                    SAN or LAN

                   DBMS                                    K    K
                                                 storage
               Resource Manager                  server             cache
                                                      K              K




CASCON Workshop on Self-Optimizing Systems                               October 19, 2005
Introduction                                                                                  4


                                     Storage Virtualization

                   DBMS
                                                                DBMS




                   K     K
         storage
         server              cache
               K              K




                   reality                                    virtual reality




  Physical storage infrastructure is complex, dynamic, shared, and hidden from
  the DBMS and other storage clients.

CASCON Workshop on Self-Optimizing Systems                                      October 19, 2005
Introduction                                                                            5


                                         Provisioning

                   DBMS
                                                          DBMS




                   K     K
         storage
         server              cache
               K              K




                   reality                              virtual reality




  Storage is not a physical device that the DBMS controls. It is a service with
  which the DBMS must interact.

CASCON Workshop on Self-Optimizing Systems                                October 19, 2005
Introduction                                                                       6


                                 Some Research Questions




Database Design Time:
    Can we extend the physical design task to include the storage system?
Query Compilation Time:
   How much will it cost the DBMS to perform a storage operation?
Query Execution Time:
   How can we coordinate storage system buffer management and DBMS buffer
   management?




CASCON Workshop on Self-Optimizing Systems                           October 19, 2005
Physical Design                                                                          7


                                      Design Problems




                   DBMS

                                                 database        indexes? views?
                                               physical design   logical placement?


                  K      K
        storage                                storage system    striping?
        server               cache
                                                   design        logical to physical?
              K               K




CASCON Workshop on Self-Optimizing Systems                                 October 19, 2005
Physical Design                                                                          8


                                       Design Advisors



          SQL workload


                                    design
            database
                                                 database        indexes? views?
          design advisor                       physical design   logical placement?

       storage workload

                                    design     storage system    striping?
            storage
          design advisor                           design        logical to physical?




CASCON Workshop on Self-Optimizing Systems                                 October 19, 2005
Physical Design                                                                               9


                             End-to-End Physical Database Design

                   SQL workload



                                       design
                     database
                                                  database        indexes? views?
                   design advisor               physical design   logical placement?
                     ?????
              storage workload

                                       design   storage system    striping?
                     storage
                   design advisor                   design        logical to physical?


                  end−to−end advisor




  Database design and storage system design are not independent.


CASCON Workshop on Self-Optimizing Systems                                      October 19, 2005
Costing Storage Operations                                                              10


                             Costing Database Storage Operations

SELECT * FROM Lineitem L, Orders O
WHERE L.orderkey = O.orderkey AND O.totalprice < :param

Plan 1: index nested loop join, with Order as outer and unclustered index on
    Lineitem.orderkey as inner (best at low and high selectivities)
Plan 2: hash join with Order as build and Lineitem as probe (best at medium
    selectivities)




                        Global relative query cost vs. query selectivity

CASCON Workshop on Self-Optimizing Systems                                 October 19, 2005
Cache Management                                                        11


                               Two-tier Cache Management

                           DBMS
                                             cache




                                1. read(p)

                           storage
                           server            cache


                                  2. read(p)

                                                     p


CASCON Workshop on Self-Optimizing Systems                 October 19, 2005
Cache Management                                                                 12


                         Two-tier Cache Management (Part 2)

                           DBMS
                                             cache




                                1. read(p)

                           storage
                           server            cache
                                                     p

                                  2. read(p)             3. fetch

                                                     p


CASCON Workshop on Self-Optimizing Systems                          October 19, 2005
Cache Management                                                                       13


                         Two-tier Cache Management (Part 3)

                           DBMS
                                             cache
                                                         p



                                1. read(p)                     4. fetch

                           storage
                           server            cache
                                                     p

                                  2. read(p)                 3. fetch

                                                     p


CASCON Workshop on Self-Optimizing Systems                                October 19, 2005
Cache Management                                                                  14


                     Problems with Two-tier Cache Management


                                             DBMS
                                                       cache
                                                                   p


Problems
poor locality at second tier:  reg-             1. read(p)               4. fetch
    ularly used pages are cached in
    the first tier                            storage
                                             server    cache
cache inclusion: same pages appear                             p
    in both caches
                                                 2. read(p)            3. fetch

                                                               p


CASCON Workshop on Self-Optimizing Systems                         October 19, 2005
Cache Management                                                                 15


                                       Some Solutions




  • DBMS manages storage system cache
      – breaks virtualization
      – DBMS is not the only storage client

  • storage system manages DBMS cache
      – storage system unaware of DBMS semantics

  • our approach
      – manage caches independently
      – use write hints to improve management of the storage system cache




CASCON Workshop on Self-Optimizing Systems                          October 19, 2005
Cache Management                                                                     16


                                             Write Hints


Replacement writes
      • Replacement writes are issued to clean a block for eventual eviction.
      • Actual eviction may or may not be imminent.
      • A well-designed first tier will clean blocks in advance of eviction, but not
        too far in advance.

Recovery writes
      • Recovery writes are issued to limit system recovery time or to limit
        potential data loss.




  Write hints indicate why a block is being written by the first tier (DBMS).



CASCON Workshop on Self-Optimizing Systems                              October 19, 2005
Cache Management                                                                     17


                              Experiments with Write Hints




  • instrumented DB2 to tag writes with hints

  • collected DB2 I/O request traces under OLTP (TPC-C) workloads
  • built simulator for storage system cache, and used traces to drive it
      – LRU and MQ algorithms
      – LRU+Hints and MQ+Hints
      – TQ (new policy based on write hints)




CASCON Workshop on Self-Optimizing Systems                              October 19, 2005
Cache Management                                                                                                18


                                   Impact of Write Hints on Performance


                            100%
                            90%
                            80%
                            70%
                                                                          60.45%            LRU
           Read Hit Ratio




                            60%                                                             LRU+Hints
                            50%                                                             MQ
                                                                                            MQ+Hints
                            40%                                   36.59%
                                                                                            TQ
                            30%                         23.00%                              OPT
                                                  18.01%
                            20%
                            10%
                                    1.00% 3.43%
                             0%


From X. Li, A. Aboulnaga, K. Salem, A. Sachedina and S. Gao. Second-Tier Cache Management Using Write Hints.

                    Proceedings of the USENIX Symposium on File and Storage Technologies (FAST). 2005.


CASCON Workshop on Self-Optimizing Systems                                                         October 19, 2005
                                                                                    19


                                             Wrap Up




  • Storage is a service.

  • Storage characteristics affect database configuration and physical design,
    database query optimizer, and database run-time engine.

  • Challenge: get good performance without breaking virtualization.




CASCON Workshop on Self-Optimizing Systems                             October 19, 2005
                                                                   20


                                             People



Waterloo:
      • Xuhui Li (CAS student)
      • Ye (Brian) Qin
      • M. Hossein Sheikh Attar
      • Oguzhan Ozmen
      • Ashraf Aboulnaga
IBM:
      • Aamer Sachedina
      • Matt Huras
      • Calisto Zuzarte (CAS)



CASCON Workshop on Self-Optimizing Systems            October 19, 2005

								
To top