CS 232A Database System Principles Introduction by g4509244

VIEWS: 6 PAGES: 52

									CS 232A: Database System
        Principles

      Introduction




                           1
              Introduction

• Applications‟ View of a Relational Database
  Management System (RDBMS)
• The Big Picture of UCSD‟s DB program
• Relational Model Quick Overview
• SQL Quick Overview
• Transaction Management Quick Overview
• What is Hard about building a RDBMS?


                                                2
Applications‟ View of a Relational Database
      Management (RDBMS) System
• Applications: ……….
• Persistent data structure                     Application
   – Large volume of data
   – “Independent” from processes using
     the data
                                               RDBMS Client
• High-level programming interface
  for access & modification        JDBC/ODBC              Relations,
   – Automatically optimized            SQL commands      cursors,
                                                          other
• Transaction management (ACID)
   – Atomicity: all or none happens,           RDBMS Server
     despite failures & errors
   – Concurrency
   – Isolation: appearance of “one at a
     time”                                     Relational Database
   – Durability: recovery from failures and                      3

     other errors
 CSE232A and the rest of UCSD‟s
    database course program
• CSE132A: Basics of relational database systems
  – Application view orientation
  – Basics on algebra, query processing
• CSE132B: Application-oriented project course
  – How to design and build complex applications that use a
    relational database system.
  – Active database aspects
  – Materialized views, decision support queries
  – Implementation issues


                                                        4
CSE232A and the rest of UCSD‟s
   database course program
• CSE233: Database Theory
  – Theory of query languages
  – Deductive and Object-Oriented databases
• CSE232B: Advanced Database Systems
  o The structure and operation of non-conventional
    database systems, such as
     o   data warehouses & OLAP systems
     o   mediators & distributed query processing
     o   object-oriented and XML databases
     o   Deductive databases and recursive query processing
                                                              5
       Data Structure: Relational
                Model
• Relational databases:             Movie
                                    Movie
  Schema + Data                     Title     Director   Actor
• Schema (also called               Wild      Lynch      Winger
                                    Sky       Berto      Winger
  scheme):                          Reds      Beatty     Beatty
   – collection of tables (also     Tango     Berto      Brando
     called relations)
                                    Tango     Berto      Winger
   – each table has a set of        Tango     Berto      Snyder
     attributes
   – no repeating relation names,
     no repeating attributes in
     one table                      Schedule
                                    Theater   Title
• Data (also called                 Odeon     Wild
  instance):                        Forum     Reds
   – set of tuples                  Forum     Sky
   – tuples have one value for
     each attribute of the table
     they belong
    Programming Interface:
          JDBC/ODBC

• How client opens connection with
  server
• How access & modification commands
  are issued
•…



                                       7
Access (Query) & Modification
       Language: SQL
• SQL
  – used by the database user
  – declarative: we only describe what we want to
    retrieve
  – based on tuple relational calculus
• The result of a query is always a table
  (regardless of the query language used)
• Internal Equivalent of SQL: Relational Algebra
  – used internally by the database system
  – procedural (operational): we describe how we
    retrieve
            Basic Relational Algebra
                   Operators
                                            Find tuples where director=“Berto”
• Selection (s )                                   sDirector=“Berto”Movie
   – sc R selects tuples of the                 Title     Director       Actor
     argument relation R that                   Sky       Berto          Winger
     satisfy the condition c.                   Tango     Berto          Brando
   – The condition c consists of                Tango     Berto          Winger
     atomic predicates of the                   Tango     Berto          Snyder
     form
       • attr = value                       Find tuples where director=actor
                (attr is attribute of R)           s Director=ActorMovie
       • attr1 = attr2                          Title     Director       Actor
       • other operators possible               Reds      Beatty         Beatty
         (e.g., >, <, !=, LIKE)            s Director=“Berto” OR Director=ActorMovie
   – Bigger conditions                        Title     Director       Actor
     constructed by conjunctions              Sky       Berto          Winger
     (AND) and disjunctions (OR)              Reds      Beatty         Beatty
     of atomic predicates                     Tango     Berto          Brando
                                              Tango     Berto          Winger
                                              Tango     Berto          Snyder
             Basic Relational Algebra
                    Operators
• Projection (p)
   – pattr1, …, attrN Rreturns a            pTitle,Director Movie
      table that has only the              Title      Director
      attributes attr1, …, attrN of R      Wild       Lynch
   – no duplicate tuples in the            Sky        Berto
      result (notice the example           Reds       Beatty
      has only one (Tango,Berto)           Tango      Berto
      tuple                           Project the title and director of Movie
• Cartesian Product (x)
   – the schema of the result has
                                          R                    S
                                         A    B                A    C
      all attributes of both R and S     0    1                a    b
   – for every pair of tuples r from     2    4                c    d
      R and s from S there is a
      result tuple that consists of r           RxS
      and s                                     R.A B S.A C
   – if both R and S have an                    0      1 a       b
      attribute A then rename to                0      1 c       d
      R.A and S.A                               2      4 a       b
                                               2     4   c     d
        Basic Relational Algebra
               Operations
                                   Find all people, ie, actors and
• Rename ( r )                     directors of the table Movie
   r        R renames
       A B                          pPeople rActor People Movie
       attribute A of relation R      U p People rDirector People
                                                                Movie
     into B
       S
   – r R renames relation R
                         into S
• Union (U)
   – applies to two tables R       Find all directors who are not actors
     and S with same schema
   – R U S is the set of tuples     pDirectorMovie
     that are in R or S or both     - pDirector rActor   Director Movie
• Difference (-)
   – applies to two tables R
     and S with same schema
   – R - S is the set of tuples
     in R but not in S
      SQL Queries: The Basic From
                                   Find titles of currently playing movies
• Basic form                               SELECT Title
        SELECT a1, …, aN
        FROM R1, …, RM                     FROM Schedule
        WHERE condition            Find the titles of all movies by “Berto”
• Equivalent relational                   SELECT Title
   algebra expression                     FROM Schedule
pa1, …, aN scondition(R1x … xRM)          WHERE Director=“Berto”
• WHERE clause is optional         Find the titles and the directors of all
• When more than one               currently playing movies
   relations of the FROM
   have an attribute named         SELECT Movie.Title, Director
   A we refer to a specific A      FROM Movie, Schedule
   attribute as
  <RelationName>.A                 WHERE Movie.Title=Schedule.Title
      SQL Queries: Aliases

• Use the same relation more than once
  in the FROM clause
• Tuple variables
• Example: find actors who are also
  directors
      SELECT t.Actor
      FROM Movie t s
      WHERE t.Actor=s.Director
             SQL Queries: Nesting
• The WHERE clause can
  contain predicates of      Typical use:“find objects that always
  the form                   satisfy property X”, e.g., find actors
   – attr/value IN <SQL      playing in every movie by “Berto”:
     query>
   – attr/value NOT IN       SELECT Actor FROM Movie
                      <SQL   WHERE Actor NOT IN
     query>
                                  (SELECT t.Actor
• The predicate is                 FROM Movie t s,
  satisfied if the attr or         WHERE s.Director=“Berto”
  value appears in the
  result of the nested              AND t.Actor NOT IN
  <SQL query>                           (SELECT Actor
• Queries involving                       FROM Movie
  nesting but no negation                WHERE Title=s.Title))
  can always be un-          The shaded query finds actors NOT playing in
  nested, unlike queries     some movie by “Berto”
  with nesting and           The top lines complement the shaded part
            Homework Problem
Compare with shaded sub-query
of previous page. The sample     Actor Director Movie
data may help you
                                   a      B      1
 SELECT Actor
 FROM Movie                        a      B      2
 WHERE Actor NOT IN                b      B      1
      (SELECT Actor
       FROM Movie                  c      X      3
       WHERE Director=“Berto”)     d      B      1
                                   d      B      2
                                   d      X      3

                                                15
        Nested Queries: Existential
       and Universal Quantification
                                  Find directors of currently playing movies
                                  SELECT Director
• A op ANY <nested query>         FROM Movie
  is satisfied if there is a      WHERE Title = ANY
  value X in the result of the     SELECT Title
  <nested query> and the           FROM Schedule
  condition A op X is satisfied
   – ANY aka SOME
• A op ALL <nested query>
  is satisfied if for every
  value X in the result of the    Find the employees with the highest salary
  <nested query> the              SELECT Name
                                  FROM Employee
  condition A op X is satisfied   WHERE Salary >= ALL
                                   SELECT Salary
                                   FROM Employee
          Nested Queries: Set
             Comparison
                     Find actors playing in every movie
                     by “Berto”
• <nested query 1>   SELECT s.Actor
     CONTAINS        FROM Movie s
 <nested query 2>    WHERE
                              (SELECT Title
                               FROM Movie t
                               WHERE t.Actor = s.Actor)
                                       CONTAINS
                              (SELECT Title
                               FROM Movie
                               WHERE Director = “Berto”)
          SQL:Union, Intersection,
                Difference
                           Find all actors or directors
                           (SELECT Actor
                            FROM Movie)
• Union                    UNION
   – <SQL query 1> UNION   (SELECT Director
      <SQL query 2>         FROM Movie)
• Intersection
   – <SQL query 1>
     INTERSECT     <SQL
     query 2>
• Difference
                           Find all actors who are not directors
   – <SQL query 1> MINUS   (SELECT Actor
      <SQL query 2>         FROM Movie)
                           MINUS
                           (SELECT Director
                            FROM Movie)
    SQL Queries: Aggregation and
             Grouping
• There is no relational
                            Employee
  algebra equivalent for    Name      Dept      Salary
  aggregation and           Joe       Toys      45
  grouping                  Nick      PCs       50
                            Jim       Toys      35
• Aggregate functions:      Jack      PCs       40
  AVG, COUNT, MIN,
                         Find the average salary of all employees
  MAX, SUM, and recently
                                SELECT AvgSal=Avg(Salary)
  user defined functions
                                FROM Employee
  as well                              AvgSal
• Group-by                             42.5

Find the average salary for each department
SELECT Dept, AvgSal=Avg(Salary)             Dept   AvgSal
FROM Employee                               Toys   40
                                            PCs    45
GROUP-BY Dept
SQL Grouping: Conditions that
      Apply on Groups
                  Find the average salary of for
                  each department that has more
• HAVING clause   than 1 employee
                  SELECT Dept, AvgSal=(Avg(Salary))
                  FROM Employee
                  GROUP-BY Dept
                  HAVING COUNT(Name)>1
        SQL: More Bells and Whistles ...
                             Retrieve all movie attributes of currently
                             playing movies
                             SELECT Movie.*
• Select all attributes      FROM Movie, Schedule
  using *                    WHERE Movie.Title=Schedule.Title
• Pattern matching           Retrieve all movies where the title
  conditions                 starts with “Ta”
   – <attr> LIKE <pattern>   SELECT *
                             FROM Movie
                             WHERE Title LIKE “Ta”
    …and a Few “Dirty” Points
                                                         Title
                                  SELECT Title           Tango
• Duplicate elimination           FROM Movie             Tango
  must be explicitly                                     Tango
  requested
                                 SELECT DISTINCT Title   Title
   – SELECT DISTINCT …                                   Tango
      FROM … WHERE …             FROM Movie
• Null values
   – all comparisons involving    Title    Director      Actor
     NULL are false by            Wild     Lynch         Winger
     definition                   Sky      Berto         Winger
   – all aggregation              Reds     NULL          Beatty
     operations, except count,    Tango    Berto         Brando
     ignore NULL values           Tango    Berto         Winger
                                  Tango    Berto         NULL
    SQL as a Data Manipulation Language:
                  Insertions
• inserting tuples
                               INSERT INTO Movie
   – INSERT INTO R
      VALUES (v1,…,vk);        VALUES (“Brave”, “Gibson”, “Gibson”);

• some values may be left      INSERT INTO Movie(Title,Director)
  NULL                         VALUES (“Brave”, “Gibson”);
• use results of queries for   INSERT INTO EuroMovie
  insertion                           SELECT * FROM Movie
   – INSERT INTO R                    WHERE Director = “Berto”
      SELECT …
      FROM …
      WHERE
SQL as a Data Manipulation Language:
       Updates and Deletions
• Deletion basic form:       Delete the movies that are not currently playing
  delete every tuple that    DELETE FROM Movie
  satisfies <cond>           WHERE Title NOT IN SELECT Title
   – DELETE FROM R WHERE                             FROM Schedule
       <cond>
• Update basic form:         Change all “Berto” entries to “Bertoluci”
  update every tuple that    UPDATE Movie
  satisfies <cond> in the    SET Director=“Bertoluci”
  way specified by the SET   WHERE Director=“Berto”
  clause
   –    UPDATE R              Increase all salaries in the Toys dept by 10%
        SET A1=<exp1>, …,     UPDATE Employee
              Ak=<expk>       SET Salary = 1.1 * Salary
        WHERE <cond>          WHERE Dept = “Toys”
                              The “rich get richer” exercise:
                              Increase by 10% the salary of the employee
                              with the highest salary
    Transaction Management

• Transaction: Collection of actions that
  maintain the consistency of the database if
  ran to completion & isolated
• Goal: Guarantee integrity and consistency of
  data despite
  – Concurrency
  – Failures
• Concurrency Control
• Recovery

                                                 25
  Example Concurrency & Failure
           Problems
• Consider the “John &
  Mary” checking & savings
                                  C2S(X=100)
  account
                                  Read(C);
  – C: checking account balance   C:=C-100
  – S: savings‟ account balance   Write(C)
• Check-to-Savings transfer       Read(S)
                                  S:=S+100
  transaction moves $X from
                                  Write(S)
  C to S
  – If it runs in the system
    alone and to completion the
    total sum of C and S stays
    the same                                   26
Example Failure Problem &
 Recovery Module‟s Goal
              • Database is in inconsistent
 C2S(X=100)     state after machine
 Read(C);       restarts
 C:=C-100     • It is not the developer‟s
 Write(C)       problem to account for
                crashes
 CPU HALTS    • Recovery module
 Read(S)        guarantees that all or
 S:=S+100
                none of transaction
 Write(S)
                happens and its effects
                become “durable”
                                      27
 Example Concurrency Problem &
Concurrency Control Module‟s Goals
   Serial Schedule
   Read(C);               • If multiple
   C:=C+100
   Write(C)
                            transactions run in
   Read(S)                  sequence the
   S:=S-100                 resulting database is
   Write(S)
               Read(C)      consistent
               C:=C+50    • Serial schedules
               Write(C)
               Read(S)      – De facto correct
               S:=S-50
               Write(S)


                                                 28
 Example Concurrency Problem &
Concurrency Control Module‟s Goals
   Good Schedule w/ Concurrency
   Read(C);                       • Databases allow
   C:=C+100
   Write(C)
                                    transactions to run
             Read(C)                in parallel
             C:=C+50
             Write(C)
   Read(S)
   S:=S-100
   Write(S)
             Read(S)
             S:=S-50
             Write(S)


                                                          29
 Example Concurrency Problem &
Concurrency Control Module‟s Goals
   Bad Schedule w/ Concurrency   • “Bad” interleaved
   Read(C);                        schedules may leave
   C:=C+100                        database in inconsistent
   Write(C)
              Read(C)
                                   state
              C:=C+50            • Developer should not
             Write(C)              have to account for
              Read(S)
              S:=S-50
                                   parallelism
             Write(S)            • Concurrency control
   Read(S)                         module guarantees
   S:=S-100
   Write(S)                        serializability
                                   – only schedules equivalent
                                     to serial ones happen30
              Introduction

• Applications‟ View of a Relational Database
  Management System (RDBMS)
• The Big Picture of UCSD‟s DB program
• Relational Model Quick Overview
• SQL Quick Overview
• Transaction Management Quick Overview
• What is Hard about building a RDBMS?


                                                31
Isn‟t Implementing a Database
        System Simple?
 Introducing the




           Database Management System

•   The latest from Megatron Labs
•   Incorporates latest relational technology
•   UNIX compatible
•   Lightweight & cheap!
                                          32
 Megatron 3000 Implementation
            Details

 • Relations stored in files (ASCII)
    – e.g., relation Movie is in /usr/db/Movie
 • Directory file (ASCII) in /usr/db/directory

Movie# Title# STR# Director# STR# Actor# STR# …
Schedule# Theater# STR # Title# STR# …
            .
            .
            .

                                                 33
       Megatron 3000
      Sample Sessions

% MEGATRON3000
    Welcome to MEGATRON 3000!
&
  .
  .
  .
& quit
%




                                34
         Megatron 3000
        Sample Sessions
& select *
  from Movie #

Title    Director   Actor
Wild     Lynch      Winger
Sky      Berto      Winger
Reds     Beatty     Beatty
Tango    Berto      Brando
Tango    Berto      Winger
Tango    Berto      Snyder

&
                             35
        Megatron 3000
       Sample Sessions

& select Theater, Movie.Title
  from Movie, Schedule
  where Movie.Title=Schedule.Title
     AND Actor = “Winger” #

    Theater Title
    Odeon   Wild
    Forum   Sky

&
                                     36
      Megatron 3000
     Sample Sessions

& select *
  from Movie | LPR #
&
      Result sent to LPR (printer).




                                      37
      Megatron 3000
     Sample Sessions

& select *
  from Movie
  where Actor = “Winger” | T #
&

        New relation T created.




                                  38
              Megatron 3000

• To execute “select * from Movie where
  condition”:
     (1) Read dictionary to get Movie attributes
     (2) Read Movie file, for each line:
           (a) Check condition
           (b) If OK, display




                                                   39
              Megatron 3000

• To execute “select * from Movie
                  where condition | T”:
     (1) Process select as before
     (2) Write results to new file T
     (3) Append new line to dictionary




                                          40
                   Megatron 3000

• To execute
       select Theater, Movie.Title
       from Movie, Schedule
       where Movie.Title=Schedule.Title
                   AND Actor = “Winger”
      (1) Read dictionary to get Movie, Schedule attributes
      (2) Read Movie file, for each line:
              (a) Read Schedule file, for each line:
                      (i) Create join tuple
                      (ii) Check condition
                      (iii) Display if OK


                                                              41
What‟s wrong with the Megatron
         3000 DBMS?

• Tuple layout on disk
e.g.,   - Change string from „Cat‟ to „Cats‟ and we
        have to rewrite file
        - ASCII storage is expensive
        - Deletions are expensive




                                                      42
What‟s wrong with the Megatron
         3000 DBMS?

• Search expensive; no indexes
e.g.,   - Cannot find tuple with given key quickly
        - Always have to read full relation




                                                     43
What‟s wrong with the Megatron
         3000 DBMS?

• Brute force query processing
e.g.,
   select Theater, Movie.Title
   from Movie, Schedule
   where Movie.Title=Schedule.Title
       AND Actor = “Winger”
• Much better if
   – Use index to select tuples with “Winger” first
   – Use index to find theaters where qualifies titles play
• Or
   – Sort both relations on title and merge
• Exploit caches and buffers

                                                              44
What‟s wrong with the Megatron
         3000 DBMS?

• Concurrency control & recovery
   – Hey, there is a reason for the lower price
     point 
• No reliability
e.g.,   - Can lose data
        - Can leave operations half done




                                                  45
What‟s wrong with the Megatron
         3000 DBMS?

• Security
• Interoperation with other systems
• Consistency enforcement




                                      46
               Course Topics

• Hardware aspects
• Physical Organization Structure
     Records in blocks, dictionary, buffer management,…
• Indexing
     B-Trees, hashing,…
• Query Processing
     rewriting, physical operators, cost-based optimization,
             semantic optimization…
• Crash Recovery
     Failures, stable storage,…
                                                               47
               Course Topics

• Concurrency Control
     Correctness, locks, deadlocks…

• Security & Integrity
     Authorization, encryption,…

• On-Line Analytical Processing
• Distributed Databases
     Interoperation, distributed recovery,…



                                              48
     Database System Architecture
    Query Processing                   Transaction Management
        SQL query                      Calls from Transactions (read,write)

         Parser                               Transaction
            relational algebra                 Manager
         Query            View
                        definitions           Concurrency             Lock
       Rewriter
         and                                   Controller             Table
                        Statistics &
       Optimizer        Catalogs &
                        System Data
query execution                                 Recovery
plan                                            Manager
       Execution         Buffer
        Engine          Manager

          Data + Indexes                           Log
 The Journey of a Query (Example)
SELECT t.Actor                               pt.Actor
FROM Movie t,s             Parsing
WHERE t.Title=s.Title                        st.Title=s.Title AND s.Actor=“Winger”
 AND s.Actor=“Winger”
                                             x
                                   Movie t    Movie s

                                                   Rewriting
               pt.Actor
                                                         pt.Actor
 JOIN          ss.Actor=“Winger”
                                                         ss.Actor=“Winger”
                 t.Title=s.Title       Rewriting
                                                         st.Title=s.Title
     Movie t    Movie s
                                                         x
                                               Movie t    Movie s
        Next Page
  The Journey of a Query (cont‟d)
          pt.Actor                                          pt.Actor
          ss.Actor=“Winger”                                   t.Title=s.Title
                               Algebraic
            t.Title=s.Title
                              Optimization                    ss.Actor=“Winger”
Movie t    Movie s                               Movie t   Movie s
                                            Cost-Based           index on Actor and
   what are the rules used for              Optimization        Title, unsorted tables,
   the transformation of the query ?                               tables>>memory
   how do we evaluate the cost of              pt.Actor
   a possible execution plan ?                    LEFT INDEX
                                                  t.Title=s.Title
                                                                   Query Execution
                                                                        Plan
                                                   sINDEX
                                                    s.Actor=“Winger”
                                       Movie t    Movie s
   The Journey of a Query (cont‟d)
  ActorIndex                            Wild      Lynch            Winger
                                        Sky       Berto            Winger
                                        Reds      Beatty           Beatty
                                        Tango     Berto            Brando
                Winger
                                        Tango     Berto            Winger
  TitleIndex                            Tango     Berto            Snyder


                                            How is the table arranged on the disk ?
EXECUTION ENGINE
                                            Are tuples with the same Actor value
                                            clustered (consecutive) ?
find “Winger” tuples using Actorindex
                                            What is the exact structure of the
  for each “Winger” tuple
                                            index (tree, hash table,…) ?
    find tuples t with the same title
    using TitleIndex
    project the attribute Actor of t

								
To top