Docstoc

the relational modelppt PowerPoint Presentation.ppt

Document Sample
the relational modelppt  PowerPoint Presentation.ppt Powered By Docstoc
					The Relational Model
                            Relations
                                              S.S.N       street
A relation is a more           name                                city
concrete construction, of
something we have seen                          students
before, the ER diagram.

A relation is (just!) a
table!                        name    S.S.N      street     city

We will use table and         Lisa 1272 Blaine Riverside
relation interchangeably,     Bart 5592 Apple Irvine
except where there is a
                              Lisa 7552 11th   Riverside
possibility of confusion.
                              Sue     5555 Main             Oceanside
                                      The students relation
 A relation consists of a relational schema and a relational instance.


 A relation schema is essentially a list of column names with their
 data types. In this case…

 students(name : string, S.S.N : string, street : string, city : string)

• An relation instance
is made up of zero of            name    S.S.N     street     city
more tuples (rows,               Lisa 1272 Blaine Riverside
records)
                                 Bart 5592 Apple Irvine
                                 Lisa 7552 11th   Riverside
                                 Sue     5555 Main            Oceanside
   A schema specifies a relation’s name.


students(name : string, S.S.N : string, street : string, city : string)




A schema also specifies the name of each field, and its domain.




 Fields are often referred to as columns, attributes, dimensions
A minor, but important point about relations, they are unordered.



name   S.S.N   street   city          name   S.S.N   city        street
Lisa   1272    Blaine   Riverside     Lisa   1272    Riverside   Blaine
Bart   5592    Apple    Irvine        Bart   5592    Irvine      Apple
Lisa   7552    11th     Riverside     Lisa   7552    Riverside   11th
Sue    5555    Main     Oceanside     Sue    5555    Oceanside Main


This is not a problem, since we refer to fields by name.

However sometimes, we refer to the fields by their column number,
in which case the ordering becomes important. I will point this out
when we get there.

Also, the tuples are unordered too!
Note that every tuple in our instance is unique. This is not a
coincidence. The definition of relation demands it.

Later we will see how we can represent weak entities in relations.




                               name    S.S.N    street    city
                               Lisa 1272 Blaine Riverside
                               Bart 5592 Apple Irvine
                               Lisa 7552 11th   Riverside
                               Sue     5592 Main          Oceanside
                                The number of fields is called the
                                degree (or arity, or dimensionality
                                of the relation).

                                Below we have a table of degree 4.


The number of tuples
cardinality of the relation
                                 name   S.S.N    street   city
Of course, we don’t count the
                                 Lisa 1272 Blaine Riverside
row that has the labels!
                                 Bart 5592 Apple Irvine
To the right we have a table     Lisa 7552 11th   Riverside
of cardinality 3.
   students(name : string, S.S.N : string, street : string, city : string)


Note that relations have primary keys, just like ER
diagrams. Remember that the primary key might not be
one field, it may be a combination of two or more fields.


                               name    S.S.N     street    city
                               Lisa 1272 Blaine Riverside
                               Bart 5592 Apple Irvine
                               Lisa 7552 11th   Riverside
                               Sue     5555 Main           Oceanside
Translating ER diagrams into Relations
 We need to figure out how to translate ER diagrams into relations.

 There are only three cases to worry about.
    • Strong entity sets
    • Weak entity sets
    • Relationship sets
                                days
      Number            Name                  PID               Name


               Course          Teaches              Professor
• Strong entity sets
                                 days
  Number            Name                           PID               Name


           Course               Teaches                  Professor




                                    professor(PID : string, name : string)

                                                    PID        name
 This is trivial, the primary key
 of the ER diagram becomes                          1234       Keogh
 the primary key of the                             3421       Lee
 relation. All other fields are
                                                    2342       Smyth
 copied in (in any order)
                                                    4531       Lee
• Weak entity sets
                                 days
  Number            Name                             PID               Name


           Course               Teaches                    Professor




course(PID : string, number : string, name : string)

PID        number     name          The primary key of the relation
                                    consists of the union of the primary
1234       CS12       C++           key of the strong entity set and the
3421       CS11       Java          discriminator of the weak entity set.
                                    The “imported” key from the strong
2342       CS12       C++           entity set is called the foreign key.

4531       CS15       LISP          All other fields are copied in (in any
                                    order)
  • Relationship entity sets
                                            days
     Number               Name                               PID               Name


                Course                     Teaches                 Professor




                                teaches(PID : string, days : string )
 For one-to-one relationship sets,
the relation’s primary key can be
that of either entity set.                PID        days
• For many-to-many relationship           1234       mwf
sets, the union of the primary keys
becomes the relation’s primary key        3421       wed
•For the other cases, the the
relation’s primary key is taken from
                                          2342       tue
the strong entity set.
                                          4531       sat
                  So, this ER Model…

                          days
Number            Name                 PID               Name


         Course          Teaches             Professor




  … maps to this database schema
   professor(PID : string, name : string)
   course(PID : string, number : string, name : string)
   teaches(PID : string, days : string)
We have seen how to create a database
schema, how do we create an actual
database on our computers?


professor(PID : string, name : string)
course(PID : string, number : string, name : string)
teaches(PID : string, days : string)
…how do we create an actual database
on our computers?

We use SQL, a language that allows us
to build, modify and query databases.
 professor(PID : string, name : string)
 SQL (Structured Query Language)
• SQL is a language that allows us to build, modify and
query databases.
• SQL is an ANSI standard language. American National Standards Institute
• SQL is the “engine” behind Oracle, Sybase, Microsoft
SQL Server, Informix,Access, Ingres, etc.
• Most of these systems have build GUIs on top of the
command line interface, so you don’t normally write
statements directly in SQL (although you can).
                 Important Note
• In our textbook, the authors introduce SQL at the same
time as they introduce the relational model (Chapter 3).

• My plan is a little different. I plan to discuss operations
on databases (using relational algebra) in a more abstract
way, and revisit SQL later in the course.

• I encourage you to glance at the SQL material as you
read about the relational model in Chapter 3, but don’t
worry about the details of SQL just yet.
               Relational Algebra
• Procedural language                        SQL is closely based
• Five basic operators                       on relational algebra.
       • selection                  select
       • projection                 project
       • union                      (why no intersection?)
       • set difference             difference
       • Cross product              Cartesian product

• The are some other operators which are composed of the above
operators. These show up so often that we give them special names.
• The operators take one or two relations as inputs and give a new
relation as a result.
            Select Operation – Example
 • Relation r        A   B   C    D   Intuition: The select operation
                           1    7   allows us to retrieve some rows
                           5    7
                                      of a relation (by “some” I mean
                                      anywhere from none of them to
                           12   3
                                      all of them)
                           23 10

                                      Here I have retrieved all the
                                      rows of the relation r where
• A=B ^ D > 5 (r)                    either the value in field A
                     A   B   C    D
                                      equals the value in field B, or
                            1   7   the value in field D is greater
lowercase                  23 10    than 5.
Greek
sigma
                 Select Operation
• Notation:  p(r)                      lowercase Greek sigma 

• p is called the selection predicate
• Defined as:
            p(r) = {t | t  r and p(t)}
  Where p is a formula in propositional calculus consisting
  of terms connected by :  (and),  (or),  (not)
  Each term is one of:
           <attribute> op        <attribute> or <constant>
   where op is one of: =, , >, . <. 
• Example of selection:
    name=“Keogh(professor)
         Project Operation – Example I
                 A       B        C

                        10       7
 • Relation r:          20       1
                                      Intuition: The project operation
                                      allows us to retrieve some
                        30       1
                                      columns of a relation (by
                        40       2   “some” I mean anywhere from
                                      none of them to all of them)
• A,C (r)
                     A        C
                                      Here I have retrieved columns
                             7       A and C.
                             1
Greek capital                1
letter pi                    2
      Project Operation – Example II
                    A   B    C

                       10   1
                                     Intuition: The project
• Relation r:          20   1       operation removes
                       30   1       duplicate rows, since
                       40   2       relations are sets.

• A,C (r)      A   C        A   C
                                     Here there are two rows
                   1           1   with A =  and C = 1. So
                   1   =       1   one was discarded.
                   1           2
                   2
                 Project Operation
• Notation:

             A1, A2, …, Ak (r)     Greek capital letter pi

  where A1, A2 are attribute names and r is a
  relation name.
• The result is defined as the relation of k
  columns obtained by erasing the columns that
  are not listed
• Duplicate rows removed from result, since
  relations are sets.
         Union Operation – Example
                  A       B           A       B

                         1                  2
Relations r, s:
                         2                  3
                         1
                                          s
                      r                           Intuition: The union
                                                  operation concatenates
                                                  two relations, and removes
                              A   B               duplicate rows (since
                                 1
                                                  relations are sets).
 r  s:                          2
                                                  Here there are two rows
                                 1
                                                  with A =  and B = 2. So
                                 3               one was discarded.
                    Union Operation
• Notation: r  s
• Defined as:
            r  s = {t | t  r or t  s}
For r  s to be valid.
  1. r, s must have the same arity (same number of attributes)
  2. The attribute domains must be compatible (e.g., 2nd column
      of r deals with the same type of values as does the 2nd
      column of s).
Although the field types must be the same, the names can be
  different. For example I can union professor and lecturer where:

  professor(PID : string, name : string)
   lecturer(LID : string, first_name : string)
  Set Difference Operation – Example
                  A       B           A       B

                         1                  2
Relations r, s:
                         2                  3
                         1
                                          s
                      r                           Intuition: The set
                                                  difference operation
r – s:                        A   B
                                                  returns all the rows that
                                                  are in r but not in s.
                                 1
                                 1
     Set Difference Operation
• Notation r – s
• Defined as:
      r – s = {t | t  r and t  s}
• Set differences must be taken between
  compatible relations.
  – r and s must have the same arity
  – attribute domains of r and s must be compatible
• Note that in general r – s  s – r
    Cross-Product Operation -Example
Relations r, s:   A       B            C   D    E

                         1               10   a
                                          10   a
                         2               20   b
                                          10   b
                      r
                                           s        Intuition: The cross
 r x s:                                             product operation
                  A       B   C   D    E
                                                    returns all possible
                         1      10   a            combinations of rows in
                         1      19   a
                                                    r with rows in s.
                         1      20   b
                         1      10   b
                         2      10   a            In other words the result
                         2      10   a
                         2      20   b
                                                    is every possible pairing
                         2      10   b            of the rows of r and s.
      Cross-Product Operation-Example
Relations r, s:   A       B            C   D    E

                         1               10   a
                                          10   a
                         2               20   b
                                          10   b
                      r
                                           s        Intuition: The cross
 r x s:                                             product operation
                  A       B   C   D    E
                                                    returns all possible
                         1      10   a            combinations of rows in
                         1      19   a
                                                    r with rows in s.
                         1      20   b
                         1      10   b
                         2      10   a            In other words the result
                         2      10   a
                         2      20   b
                                                    is every possible pairing
                         2      10   b            of the rows of r and s.
      Cross-Product Operation

• Notation r x s
• Defined as:
     r x s = {t q | t  r and q  s}
• Assume that attributes of r(R) and s(S) are
  disjoint. (That is, R  S = ).
• If attributes names of r(R) and s(S) are not
  disjoint, then renaming must be used.
              Composition of Operations
• We can build expressions using                       r x s:
  multiple operations
• Example: A= C(r x s)                            A   B   C   D    E

                                                      1      10   a
                        A        B   C   D    E       1      10   a
                                                      1      20   b
                                1      10   a       1      10   b
                                        10   a       2      10   a
                                2      20   b       2      10   a
                                        10   b       2      20   b
                             r                        2      10   b
                                         s
                                                   A   B   C   D    E

“take the cross product of r                          1    10     a
and s, then return only the          A=C(r x s)      2    20     a
rows where A equals B”                                2    20     b
                Rename Operation
• Allows us to name, and therefore
  to refer to, the results of          A       B
                                                       A       B
  relational-algebra expressions.
                                              1
                                                              2
Example:                                      2
                                                              3
   myRelation (r – s)                        1
                                                           s
                                           r


 Take the set difference of r and s,               A       B
 and call the result myRelation
                                                          1
 Renaming in relational algebra is
                                                          1
 essentiality the same as assignment
 in a programming language                     myRelation
               Rename Operation
If a relational-algebra expression E                  A        B
                                                                       A       B
   has arity n, then                                          1
                                                                              2
     x (A1, A2, …, An) (E)                                   2
                                                                              3
                                                              1
                                                           r               s
returns the result of expression E under
   the name X, and with the attributes
   renamed to A1, A2, …., An.                                      E       K

                                                                          1
                              Take the set difference of
                              r and s, and call the                       1
Example                       result myRelation,
                                                               myRelation
 myRelation(E,K) (r – s)     while renaming the first
                              field E and the second
                              field K.
                 Banking Examples
branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-only)

account (account-number, branch-name, balance)

loan (loan-number, branch-name, amount)

depositor (customer-name, account-number)

borrower (customer-name, loan-number)

Note that I have not indicated primary keys here for simplicity.
             Quick note on notation
      good_customers                        bad_customers
customer-name loan-number                customer-name loan-number
Patty         1234                       Seymour       3432
Apu           3421                       Marge         3467
Selma         2342                       Selma         7625
Ned           4531                       Abraham       3597



 If we have two or more relations which feature
 the same attribute names, we could confuse them.
 To prevent this we can use dot notation.
 For example
                good_customers.loan-number
                     Example Queries
• Find all loans of over $1200            “select from the relation loan,
                                          only the rows which have a
                amount > 1200 (loan)     amount greater than 1200”



                        loan-number     branch-name amount
                        1234            Riverside   1,923.03
              loan      3421            Irvine        123.00
                        2342            Dublin         56.25
                        4531            Prague        120.03


amount > 1200 (loan)   1234          Riverside             1,923.03
                         Example Queries
• Find the loan number for each loan of an amount greater than $1200
                 loan-number (amount > 1200 (loan))

 “select from the relation loan,
 only the rows which have a
 amount greater than 1200, then           loan-number branch-name   amount
 project out just the                     1234        Riverside     1,923.03
 loan_number”
                                          3421        Irvine          123.00
                                   loan
                                          2342        Dublin           56.25
                                          4531        Prague          120.03


 amount > 1200 (loan)                    1234        Riverside     1,923.03



loan-number (amount > 1200 (loan))      1234
                         Example Queries
• Find all loans greater than $1200 or less than $75
                 amount > 1000 or amount < 75(loan)

 “select from the relation loan, only
 the rows which have a amount
 greater than 1000 or an amount less       loan-number branch-name   amount
 than 75                                   1234        Riverside     1,923.03
                                           3421        Irvine          123.00
                                    loan   2342        Dublin           56.25
                                           4531        Prague          120.03

                                           1234        Riverside     1,923.03
 amount > 1000 or amount < 75(loan)
                                           2342        Dublin           56.25
                           Example Queries
  • Find the names of all customers who have a loan, an account, or
    both, from the bank
        customer-name (borrower)  customer-name (depositor)

        borrower                                      depositor
customer-name loan-number                    customer-name account-number
Patty           1234                         Moe             3467
Apu             3421                         Apu             2312
Selma           2342                         Patty           9999
Ned             4531                         Krusty          3423
                                Moe
customer-name (borrower)       Apu            customer-name (depositor)
                   Patty        Patty           Moe
                   Apu          Krusty          Apu
                   Selma        Selma           Patty
                   Ned          Ned             Krusty
                                                                              Note this example is
                              Example Queries                                 split over two slides!

Find the names of all customers who have a loan at the Riverside branch.
customer-name (branch-name=“Riverside ” (borrower.loan-number = loan.loan-number(borrower x loan)))




                                   borrower                                    loan
                        customer-name loan-number            loan-number branch-name amount
 We retrieve            Patty              1234              1234             Riverside        1,923.03
 borrower and
 loan…                  Apu                3421              3421             Irvine             123.00

                       customer-name       borrower.loan   loan.loan-       branch-name       amount
                                           -number         number
                       Patty               1234            1234             Riverside          1,923.03
 …we calculate
                       Patty               1234            3421             Irvine               123.00
 their cross
 product…              Apu                 3421            1234             Riverside          1,923.03
                       Apu                 3421            3421             Irvine               123.00
  customer-name (branch-name=“Riverside ” (borrower.loan-number = loan.loan-number(borrower x loan)))

…we calculate
                       customer-name      borrower.loan    loan.loan-       branch-name       amount
their cross                               -number          number
product…
                       Patty              1234             1234             Riverside          1,923.03
                       Patty              1234             3421             Irvine               123.00
…we select the         Apu                3421             1234             Riverside          1,923.03
rows where
borrower.loan-         Apu                3421             3421             Irvine               123.00
number is equal to
loan.loan-number…      customer-name      borrower.loan    loan.loan-      branch-name        amount
                                          -number          number
                       Patty              1234             1234            Riverside          1,923.03
…we select the
                       Apu                3421             3421            Irvine                123.00
rows where
branch-name is
equal to               customer-name      borrower.loan    loan.loan-      branch-name        amount
                                          -number          number
“Riverside”
                       Patty              1234             1234            Riverside          1,923.03

…we project out
the customer-name.    Patty
                                                                      Note this example is
                         Example Queries                              split over three slides!


Find the largest account balance
    ...we will need to rename account relation as d...
balance(account) - account.balance(account.balance < d.balance (account x d (account)))




                                         account                       d
                                      account- balance        account- balance
We do a rename to                     number                  number
get a “copy” of                       Apu         100.30      Apu           100.30
account which we                      Patty        12.34      Patty          12.34
call d…                               Lenny        45.34      Lenny          45.34



… next we will do
a cross product…
balance(account) - account.balance(account.balance < d.balance (account x d (account)))

                                         account.account-   account.   d.account- d.balance
                                         number             balance    number
                                         Apu                 100.30 Apu              100.30
   … do a cross                          Apu                 100.30 Patty             12.34
   product…                              Apu                 100.30 Lenny             45.34
                                         Patty                 12.34 Apu             100.30
                                         Patty                 12.34 Patty            12.34
                                         Patty                 12.34 Lenny            45.34
                                         Lenny                 45.34 Apu             100.30

   …select out all rows                  Lenny                 45.34 Patty            12.34
                                         Lenny                 45.34 Lenny            45.34
   where account.balance
   is less than
                                         account.account-   account.   d.account- d.balance
   d.balance…                            number             balance    number
                                         Patty                 12.34 Apu             100.30
                                         Patty                 12.34 Lenny            45.34
                                         Lenny                 45.34 Apu             100.30
   .. next we project…
balance(account) - account.balance(account.balance < d.balance (account x d (account)))

                                              account.account-   account.   d.account- d.balance
                                              number             balance    number
                                              Patty                 12.34 Apu             100.30
   .. next we project out                     Patty                 12.34 Lenny            45.34
   account.balance…                           Lenny                 45.34 Apu             100.30

                                                                 account.
   …then we do a set                                             balance
   difference between it                                            12.34
   an the original                       account                    12.34
   account.balance from               account- balance
                                                                    45.34
   the account relation…              number
                                      Apu             100.30
                                      Patty            12.34
   … the set difference               Lenny            45.34
   leaves us with one
   number, the largest
   value!                                     100.30
                  Formal Definition
• A basic expression in the relational algebra consists of
  either one of the following:
   – A relation in the database
   – A constant relation
• Let E1 and E2 be relational-algebra expressions; the
  following are all relational-algebra expressions:
   – E1  E2
   – E1 - E2
   – E1 x E2
   – p (E1), P is a predicate on attributes in E1
   – s(E1), S is a list consisting of some of the attributes in E1
   –  x (E1), x is the new name for the result of E1
            Additional Operations
We define additional operations that do not add
 any power to the relational algebra, but that
 simplify common queries.
  – Natural join
                                  All joins are really
  –   Conditional Join            special cases of
  –   Equi join                   conditional join

  –   Division
  –   Set intersection
Natural-Join Operation: Motivation
Very often we have a query and the                 borrower                                               loan
answer is not contained in a single             cust-name l-number                         l-number              branch
relation. For example, I might wish to          Patty               1234                   1234                  Dublin
know where Apu banks.
                                                Apu                 3421                   3421                  Irvine
The classic relational algebra way to do
such queries is a cross product, followed
                                                      cust-name      borrower.l-number          loan.l-number         branch
by a selection which tests for equality on
some pair of fields.                                  Patty          1234                       1234                  Dublin
                                                      Patty          1234                       3421                  Irvine
borrower.l-number = loan.l-number(borrower x         Apu            3421                       1234                  Dublin
loan)))                                               Apu            3421                       3421                  Irvine
While this works…
    • it is unintuitive                                 cust-name     borrower.l-number           loan.l-number        branch

    • it requires a lot of memory                       Patty         1234                        1234                 Dublin
    • the notation is cumbersome                        Apu           3421                        3421                 Irvine
                                                           Note that is this example the two relations are the same size (2 by
                                                           2), this does not have to be the case.


 So we have a more intuitive way of achieving the same effect,
 the natural join, denoted by the  symbol
 Natural-Join Operation: Intuition
Natural join combines a cross product and a selection into one
operation. It performs a selection forcing equality on those attributes
that appear in both relation schemes. Duplicates are removed as in
all relation operations.
So if the relations have one attribute in common, as in the last slide
(“l-number”), for example, we have…
   borrower   loan   = borrower.l-number = loan.l-number(borrower x loan)))


There are two special cases:
   • If the two relations have no attributes in common, then their
   natural join is simply their cross product.
   • If the two relations have more than one attribute in common,
   then the natural join selects only the rows where all pairs of
   matching attributes match. (lets see an example on the next slide).
                 l-name               f-name            age                         l-name      f-name       ID

       A         Bouvier              Selma             40
                                                                              B     Bouvier     Selma        1232
                 Bouvier              Patty             40                          Smith       Selma        4423
                 Smith                Maggie 2

                                                                 l-name    f-name    age     l-name      f-name         ID
Both the l-name and the
                                                                 Bouvier   Selma     40      Bouvier     Selma          1232
f-name match, so select.
                                                                 Bouvier   Patty     40      Smith       Selma          4423

Only the f-names match,                                          Smith     Maggie 2          Bouvier     Selma          1232
so don’t select.                                                 Bouvier   Selma     40      Smith       Selma          4423
                                                                 Bouvier   Patty     40      Bouvier     Selma          1232
Only the l-names match,                                          Smith     Maggie 2          Smith       Selma          4423
so don’t select.

   We remove duplicate                                           l-name    f-name    age      l-name     f-name         ID
   attributes…                                                   Bouvier   Selma     40       Bouvier    Selma          1232


        The natural join of A and B                                                   l-name         f-name       age   ID
Note that this is just a way to visualize the natural join, we
                                                                   A        B=        Bouvier        Selma        40    1232
don’t really have to do the cross product as in this example
              Natural-Join Operation
• Notation: r s
• Let r and s be relations on schemas R and S respectively.The result is
  a relation on schema R  S which is obtained by considering each
  pair of tuples tr from r and ts from s.
• If tr and ts have the same value on each of the attributes in R  S, a
  tuple t is added to the result, where
   – t has the same value as tr on r
   – t has the same value as ts on s
• Example:
      R = (A, B, C, D)
      S = (E, B, D)
• Result schema = (A, B, C, D, E)
• r s is defined as:

  r.A, r.B, r.C, r.D, s.E (r.B = s.B r.D = s.D (r x s))
    Natural Join Operation – Example
• Relations r, s:
         A    B       C   D                                 B    D     E

             1          a                                 1    a    
             2          a                                 3    a    
             4          b                                 1    a    
             1          a                                 2    b    
             2          b                                 3    b    
                  r                                               s

r   s                                             How did we get here?
                              A   B   C   D   E
                                 1      a      Lets do a trace over the
                                 1      a      next few slides…
                                 1      a   
                                 1      a      Warning! Example spread
                                 2      b      over many slides, you may
                                                  wish to edit before printing.
          A   B       C   D                  B   D   E

             1          a                  1   a   
             2          a                  3   a   
             4          b                  1   a   
             1          a                  2   b   
             2          b                  3   b   
                  r                              s




First we note which attributes the two relations have in common…
            A   B       C   D                       B   D   E

               1          a                       1   a   
               2          a                       3   a   
               4          b                       1   a   
               1          a                       2   b   
               2          b                       3   b   
                    r                                   s

                                A   B   C   D   E
                                   1      a   
                                   1      a   




There are two rows in s that match our first row in r, (in the relevant
attributes) so both are joined to our first row…
             A   B       C   D                       B   D   E

                1          a                       1   a   
                2          a                       3   a   
                4          b                       1   a   
                1          a                       2   b   
                2          b                       3   b   
                     r                                   s


                                 A   B   C   D   E
                                    1      a   
                                    1      a   




…there are no rows in s that match our second row in r, so do
nothing…
             A   B       C   D                       B   D   E

                1          a                       1   a   
                2          a                       3   a   
                4          b                       1   a   
                1          a                       2   b   
                2          b                       3   b   
                     r                                   s


                                 A   B   C   D   E
                                    1      a   
                                    1      a   




…there are no rows in s that match our third row in r, so do
nothing…
            A   B       C   D                       B   D   E

               1          a                       1   a   
               2          a                       3   a   
               4          b                       1   a   
               1          a                       2   b   
               2          b                       3   b   
                    r                                   s


                                A   B   C   D   E
                                   1      a   
                                   1      a   
                                   1      a   
                                   1      a   


There are two rows in s that match our fourth row in r, so both are
joined to our fourth row…
             A   B       C   D                       B   D   E

                1          a                       1   a   
                2          a                       3   a   
                4          b                       1   a   
                1          a                       2   b   
                2          b                       3   b   
                     r                                   s


                                 A   B   C   D   E
                                    1      a   
                                    1      a   
                                    1      a   
                                    1      a   
                                    2      b   

There is one row that matches our fifth row in r,.. so it is joined to
our fifth row and we are done!
        Conditional-Join Operation:
The conditional join is actually the most general type of join. I
introduced the natural join first only because it is more intuitive
and.. natural!
Just like natural join, conditional join combines a cross product and
a selection into one operation. However instead of only selecting
rows that have equality on those attributes that appear in both
relation schemes, we allow selection based on any predicate.

   r     c   s = c(r x s)            Where c is any predicate
                                      the attributes of r and/or s


Duplicate rows are removed as always, but duplicate columns are
not removed!
              Conditional-Join Example:
    We want to find all women that are older than their husbands…

     l-name      f-name marr-Lic        age           l-name     f-name      marr-Lic age
     Simpson     Marge     777          35            Simpson Homer          777         36
r    Lovejoy     Helen     234          38       s    Lovejoy    Timothy 234             36
     Flanders    Maude 555              24            Simpson Bart           null        9
     Krabappel Edna        978          40



        r        r.age > s.age AND r.Marr-Lic = r.Marr-Lic                   s
    r.l-name    r.f-name   r.Marr-Lic    r.age   s.l-name   s.f-name   s.marr-Lic   s.age
    Lovejoy     Helen      234           38      Lovejoy    Timothy    234          36


     Note we have removed ambiguity of attribute names by using “dot” notation
     Also note the redundant information in the marr-lic attributes
 Set-Intersection Operation - Example

                 A       B
Relation r, s:                       A       B
                        1                  2
                        2                  3
                        1

                     r                   s
                                                 Intuition: The
                                                 intersection operation
                                                 returns all the rows that
                                                 are in both r and s.
            rs
                         A       B

                                2
       Set-Intersection Operation

•   Notation: r  s
•   Defined as:
•   r  s ={ t | t  r and t  s }
•   Assume:
     – r, s have the same arity
    – attributes of r and s are compatible
• Note: r  s = r - (r - s)
  r /s
               Division Operation
• Suited to queries that include the phrase “for all”.
• Let r and s be relations on schemas R and S
  respectively where
  – R = (A1, …, Am, B1, …, Bn)
  – S = (B1, …, Bn)
  The result of r / s is a relation on schema
  R – S = (A1, …, Am)


     r / s = { t | t   R-S(r)   u  s ( tu  r ) }
          Division Operation – Example
   Relations r, s:      A    B                   B

                            1                   1
                            2                   2
                            3
                            1                   s
                            1
                            1
                            3
                            4                                              A
                            6
                            1                                              
                            2
                                                                   r / s:
                                                                            
                              r

 occurs in the presence of both 1 and 2, so it is returned.
 occurs in the presence of both 1 and 2, so it is returned.
 does not occur in the presence of both 1 and 2, so is ignored.
...
             Another Division Example
   Relations r, s:
                          A   B    C   D    E              D        E

                             a       a    1              a        1
                             a       a    1              b        1
                             a       b    1                   s
                             a       a    1
                             a       b    3
                             a       a    1
                             a       b    1                           A   B   C
                             a       b    1           r /s:              a   
                                  r
                                                                           a   



<, a , > occurs in the presence of both <a,1> and <b,1>, so it is returned.
< , a , > occurs in the presence of both <a,1> and <b,1>, so it is returned.
<, a , > does not occur in the presence of both <a,1> and <b,1>, so it is ignored.
               Assignment Operation
• The assignment operation () provides a convenient way to
  express complex queries, write query as a sequential program
  consisting of a series of assignments followed by an
  expression whose value is displayed as a result of the query.
• Assignment must always be made to a temporary relation
  variable.
• Example: Write r  s as
                 temp1  R-S (r)
                 temp2  R-S ((temp1 x s) – R-S,S (r))
                 result = temp1 – temp2
  – The result to the right of the  is assigned to the relation variable on the
     left of the .
  – May use variable in subsequent expressions.
Extended Relational-Algebra-Operations


       • Generalized Projection
       • Outer Join
       • Aggregate Functions
           Generalized Projection
• Extends the projection operation by allowing arithmetic
  functions to be used in the projection list.

                 F1, F2, …, Fn(E)
• E is any relational-algebra expression
• Each of F1, F2, …, Fn are are arithmetic expressions
  involving constants and attributes in the schema of E.
• Given relation credit-info(customer-name, limit, credit-
  balance), find how much more each person can spend:
  customer-name, limit – credit-balance (credit-info)
           Generalized Projection
Given relation credit-info(customer-name, limit, credit-
  balance), find how much more each person can spend:
  customer-name, limit – credit-balance (credit-info)
                             customer-name     limit       credit-balance
                             Simpson, Marge      500 400
                             Lovejoy, Helen     2000 1500
             credit-info
                             Flanders, Maude           0 0
                             Krabappel, Edna       50 11


                                                  100
                                                  500
                                                       0
                                                   39
Aggregate Functions and Operations
• Aggregation function takes a collection of values and returns
  a single value as a result.
                 avg: average value
                 min: minimum value
                 max: maximum value
                 sum: sum of values
                 count: number of values
• Aggregate operation in relational algebra
                    G1, G2, …, Gn   g F1( A1), F2( A2),…, Fn( An) (E)
   –   E is any relational-algebra expression
   –   G1, G2 …, Gn is a list of attributes on which to group (can be empty)
   –   Each Fi is an aggregate function (i.e avg, min, max etc)
   –   Each Ai is an attribute name
Aggregate Operation – Example
                           A    B   C

• Relation r:                     7
                                  7
                                  3
                                  10




                           sum-C
  g sum(c) (r)
                               27




i.e we want to find the sum of all the numbers in attribute C
        Aggregate Operation – Example
          Relation account grouped by last-name:
                                   last-name     account-number    balance
          account          Simpson                   A-102          400
                           Simpson                   A-201          900
                           Flanders                  A-217          750
i.e calculate the
                           Flanders                  A-215          750
total balances,
                           Nahasapeemapetilon        A-222         11700
grouped by last-
name.

last-name    g      sum(balance)   (account)
                                                Yes yes, I make
         last-name             balance
                                                good money, but
                                                  I was shot 14
  Simpson                           1300
                                                 times last year
  Flanders                          1500
  Nahasapeemapetilon               11700
                          Outer Join
• An extension of the join operation that avoids loss of
  information.
• Computes the join and then adds tuples from one
  relation that does not match tuples in the other
  relation to the result of the join.
• Uses null values:
   – null signifies that the value is unknown or does not exist
   – All comparisons involving null are (roughly speaking) false
     by definition.
      • Will study precise meaning of comparisons with nulls later
          Outer Join – Example

                      loan-number   branch-name   amount
• Relation loan       L-170         Springfield   3000
                      L-230         Shelbyville   4000
                      L-260         Dublin        1700


 Relation borrower      customer-name loan-number
                         Simpson         L-170
                         Wiggum          L-230
                         Flanders        L-155
            Outer Join – Example
                    loan-number    branch-name   amount   customer-name
                   L-170           Springfield    3000    Simpson
• Inner Join       L-230           Shelbyville    4000    Wiggum


 loan   Borrower



• Left Outer Join
 loan   borrower

                   loan-number    branch-name    amount   customer-name
                   L-170          Springfield    3000     Simpson
                   L-230          Shelbyville    4000     Wiggum
                   L-260          Dublin         1700     null
            Outer Join – Example
                     loan-number    branch-name   amount   customer-name
                    L-170          Springfield     3000     Simpson
Right Outer Join    L-230          Shelbyville     4000     Wiggum
                    L-155          null             null    Flanders

loan     borrower




Full Outer Join     loan-number    branch-name    amount   customer-name
                    L-170          Springfield    3000     Simpson
                    L-230          Shelbyville    4000     Wiggum
  loan   borrower
                    L-260          Dublin         1700     null
                    L-155          null            null    Flanders
                           Null Values
• It is possible for tuples to have a null value, denoted by null, for
  some of their attributes
• null signifies an unknown value or that a value does not exist.
• The result of any arithmetic expression involving null is null.
• Aggregate functions simply ignore null values
   – Is an arbitrary decision. Could have returned null as result instead.
   – We follow the semantics of SQL in its handling of null values
• For duplicate elimination and grouping, null is treated like any
  other value, and two nulls are assumed to be the same
   – Alternative: assume each null is different from each other
   – Both are arbitrary decisions, so we simply follow SQL
                            Null Values
• Comparisons with null values return the special truth value unknown
   – If false was used instead of unknown, then not (A < 5)
               would not be equivalent to       A >= 5
• Three-valued logic using the truth value unknown:
   – OR: (unknown or true)     = true,
         (unknown or false)    = unknown
         (unknown or unknown) = unknown
   – AND: (true and unknown)        = unknown,
           (false and unknown)     = false,
           (unknown and unknown) = unknown
   – NOT: (not unknown) = unknown
   – In SQL “P is unknown” evaluates to true if predicate P evaluates to unknown
• Result of select predicate is treated as false if it evaluates to
  unknown

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:26
posted:6/30/2010
language:English
pages:77
lily cole lily cole
About