Docstoc

PPT - Database Systems – Set Theory

Document Sample
PPT - Database Systems – Set Theory Powered By Docstoc
					                        Database Systems – Set Theory
RELATIONS

   A relational database consists of tables, each of which is assigned a unique name.
   A row in a table represents a relationship among a set of values.
   A table is a collection of such relationships.
   Column Headers are commonly referred to as attributes

Websites-Schema=(website, organization, first-year, category)‫‏‬

websites relation:

      website                         organization          first-year category
      www.zojjed.com                  Walking Promotions    2006        Fiction
      www.racewalk.com                Walking Promotions    1995        Health
      www.greattreks.com              Walking Promotions    2006        Travel
      www.twofeetgallery.com          Walking Promotions    2004        Photograph
      www.walkinghealthy.com          Walking Promotions    2002        Health
      www.cs.drexel.edu/~jsalvage     Drexel University     2005        Education
                                Database Systems – Set Theory
DOMAINS

 A Domain is the set of permitted values for a column/attribute.
 The domain can be any positive number as in the case with first year

 The domain can be a series of letters up to a maximum number of letters as in the
case with organization.
 The domain can be valid web addresses, whose rules might be slightly more
complicated.

If
D1   denotes   the   set   of   all   websites
D2   denotes   the   set   of   all   organizations
D3   denotes   the   set   of   all   first years
D4   denotes   the   set   of   all   categories

Any row of websites must contain a 4-tuple(v1,v2,v3, v4) where
v1 is a website in the domain D1
v2 is a organization in the domain D2
v3 is year in the domain D3
V4 is a category in the domain D4

Therefore, account is a subset of D1xD2xD3xD4.
                       Database Systems – Set Theory
DOMAINS

In general a table must be a subset of D1xD2x…xDn-1xDn

Tables vs. Relations

There exists a close relationship between this language and the terminology used in
databases.

Instead of numbers DB’s use names.

Relation -> table
tuple -> row

Websites table has 6 tuples.
                         Database Systems – Set Theory
TUPLE NOTATION

If t is a variable denoting the first tuple relationship, then t[website] denotes the
website attribute of the tuple t.

t[website] = “www.zojjed.com”
t[organization]=”Walking Promotions”
t[first-year] = 2006
t[category] = “Fiction”

Alternatively, using numbered attributes:
t[1] = “www.zojjed.com”
t[2]= ”Walking Promotions”
t[3] = 2006
t[4] = “Fiction”

t  r, indicate the tuple t is in the relation r
                       Database Systems – Set Theory
DOMAINS

It is possible for several attributes to have the same domain, i.e. several attributes can
contain the same domain of possible values.

Later we will introduce a customer relation. It has a customer name; if Ialso had an
employee table with the field employee name, technically they both have the same
domain.

It depends upon how you look at it. If the domain is the set of all possible names, this
is true.

What about the domains website and first-year. They are incompatible.

What about website and category? While they both store character strings, the website
domain has a more limited domain.

In a set, an attribute may contain the value Null.

For now we will assume they do not.
                       Database Systems – Set Theory
DATABASE SCHEMAS
 Logical design of the database

 defines the type definition of a variable



DATABASE INSTANCE
 Snapshot of the database at a given time

 an instance of a variable



A database schema in relations is defined by using a capitalized name for the
relationship-schema and a lowercase name of each attribute. An instance of a relation
is represented by a lowercase name.

Websites-schema(website, organization, first-year, category)‫‏‬

A relation on the Website-schema is as follows:
websites(Website-schema)‫‏‬

Side notes, very important:

A relation has no order
A relation cannot contain duplicate tuples
                     Database Systems – Set Theory
Customers-Schema=(website, first-name, last-name)‫‏‬

customers Relation
 website                       first-name last-name
 www.zojjed.com                Derek       Jeter
 www.zojjed.com                Chase       Utley
 www.drexel.edu/~jsalvage      Jeremy      Johnson
 www.racewalk.com              Ryan        Howard
 www.zojjed.com                Ryan        Howard

 Notice the website attribute appears in both the customers relation and
 Websites relation.

 This is not a coincidence, fields often are repeated.

 This allows distinct relations to be related.

 If we wanted to gather website details for customer websites we need
 information from both relations
                     Database Systems – Set Theory
Combined information from website and customers relations


  website                         category      first-name   last-name
  www.zojjed.com                  Fiction       Derek        Jeter
  www.zojjed.com                  Fiction       Chase        Utley
  www.cs.drexel.edu/~jsalvage     Education     Jeremy       Johnson
  www.racewalk.com                Health        Ryan         Howard
  www.zojjed.com                  Fiction       Ryan         Howard



  In real databases, unique id fields would be used to identify the
  customer and the website so the website name would not be repeated
  in both tables.
                      Database Systems – Set Theory
Instead of having two schemas, it’s possible to have one schema as follows:

WebsiteCustomers(website, organization, first-year, category, first-name, last-name)‫‏‬

What is wrong with this?
                          Database Systems – Set Theory
Redundant data as well as null fields.


website                       organization         first-year   category      first-name   last-name
www.zojjed.com                Walking Promotions   2006         Fiction       Derek        Jeter
www.racewalk.com              Walking Promotions   1995         Health        Ryan         Howard
www.greattreks.com            Walking Promotions   2006         Travel        Null         Null

www.twofeetgallery.com        Walking Promotions   2004         Photographs   Null         Null
www.walkinghealthy.com        Walking Promotions   2002         Health        Null         Null
www.zojjed.com                Walking Promotions   2006         Fiction       Ryan         Howard
www.zojjed.com                Walking Promotions   2006         Fiction       Chase        Utley
www.cs.drexel.edu/~jsalvage   Drexel University    2005         Education     Jeremy       Johnson
                      Database Systems – Set Theory
hit-counts-Schema= (website, date, hit-count)‫‏‬

hit-counts relation

  website                        date            hit-count
  www.zojjed.com                 5/20/2007       5
  www.racewalk.com               5/20/2007       2019
  www.greattreks.com             5/20/2007       1050
  www.twofeetgallery.com         5/20/2007       32
  www.walkinghealthy.com         5/20/2007       159
  www.zojjed.com                 5/21/2007       6
  www.zojjed.com                 5/22/2007       5
  www.cs.drexel.edu/~jsalvage    5/20/2007       376
  www.racewalk.com               5/21/2007       2099

  Is there anything wrong with the above relation?
                      Database Systems – Set Theory
hit-counts-Schema= (website, date, hit-count)‫‏‬

hit-counts relation

  website                        date            hit-count
  www.zojjed.com                 5/20/2007       5
  www.racewalk.com               5/20/2007       2019
  www.greattreks.com             5/20/2007       1050
  www.twofeetgallery.com         5/20/2007       32
  www.walkinghealthy.com         5/20/2007       159
  www.zojjed.com                 5/21/2007       6
  www.zojjed.com                 5/22/2007       5
  www.cs.drexel.edu/~jsalvage    5/20/2007       376
  www.racewalk.com               5/21/2007       2099

  Is there anything wrong with the above relation?


  No, there is no reason why we cannot list a website more than once.
                       Database Systems – Set Theory
If we did not care about the date and only cared about the hit count, could we define
the hit-counts Schema as follows:

hit-counts-Schema= (website, hit-count)‫‏‬

hit-counts relation:

        website                         hit-count
        www.zojjed.com                  5
        www.racewalk.com                2019
        www.greattreks.com              1050
        www.twofeetgallery.com          32
        www.walkinghealthy.com          159
        www.zojjed.com                  6
        www.zojjed.com                  5
        www.cs.drexel.edu/~jsalvage     376
        www.racewalk.com                2099
                       Database Systems – Set Theory
If we did not care about the date and only cared about the hit count, could we define
the hit-counts Schema as follows:

hit-counts-Schema= (website, hit-count)‫‏‬

hit-counts relation:

        website                         hit-count
        www.zojjed.com                  5
        www.racewalk.com                2019
        www.greattreks.com              1050
        www.twofeetgallery.com          32
        www.walkinghealthy.com          159
        www.zojjed.com                  6
        www.zojjed.com                  5
        www.cs.drexel.edu/~jsalvage     376
        www.racewalk.com                2099

In real databases there would be no problem, but we said that you cannot repeat
tuples in a relation. So the answer is no, we cannot use this schema because of
duplicates.
                       Database Systems – Set Theory
QUERY LANGUAGES

A query language is a language in which the user requests information from the
database.

Can be procedural or non-procedural.

We will study Relational Algebra

It Is a procedural language consisting of sets of operations that take one or two
relations as input and output a relation. Operations include:
 select

 project

 union

 set difference

 Cartesian product

 Rename

 Intersection

 Aggregate functions



We will also study various forms of joining relations.
                       Database Systems – Set Theory
Unary- operates on one relation
Binary – operates on a pair of relations

The Select Operation
Unary operation

Selects tuples that satisfy a given predicate

 - represents a select operation - sigma

<select condition>(R)‫‏‬

  <selection condition> = <attribute name> <comparison op> <constant value> or
  <selection condition> = <attribute name> <comparison op> <attribute name>

comparison operators are: =, <>, <, <=, >, >=

equal, not equal, less than, less than or equal to, greater than, greater than or equal
to
                         Database Systems – Set Theory
To select those tuples of the hit-counts relation where the website is “www.zojjed.com”
we write.
website = “www.zojjed.com”(hit-counts)

This returns the relation:                hit-counts relation

 website          date        hit-count     website                       date        hit-count
 www.zojjed.com   5/20/2007   5             www.zojjed.com                5/20/2007   5
 www.zojjed.com   5/21/2007   6             www.racewalk.com              5/20/2007   2019
 www.zojjed.com   5/22/2007   5             www.greattreks.com            5/20/2007   1050

                                            www.twofeetgallery.com        5/20/2007   32
                                            www.walkinghealthy.com        5/20/2007   159
                                            www.zojjed.com                5/21/2007   6
                                            www.zojjed.com                5/22/2007   5
                                            www.cs.drexel.edu/~jsalvage   5/20/2007   376
                                            www.racewalk.com              5/21/2007   2099
                         Database Systems – Set Theory
To select those tuples of the hit-counts relation where the hit-count is greater than
1000 we write.
hit-count > 1000 (hit-counts)

This returns the relation:                     hit-counts relation
  website              date        hit-count   website                       date        hit-count
  www.racewalk.com     5/20/2007   2019        www.zojjed.com                5/20/2007   5
  www.greattreks.com   5/20/2007   1050        www.racewalk.com              5/20/2007   2019
  ww.racewalk.com      5/21/2007   2099        www.greattreks.com            5/20/2007   1050

                                               www.twofeetgallery.com        5/20/2007   32
                                               www.walkinghealthy.com        5/20/2007   159
                                               www.zojjed.com                5/21/2007   6
                                               www.zojjed.com                5/22/2007   5
                                               www.cs.drexel.edu/~jsalvage   5/20/2007   376
                                               www.racewalk.com              5/21/2007   2099
                            Database Systems – Set Theory
Can combine predicates with and, or, and not

To select those tuples of the hit-counts relation where the hit-count is greater than 5
and the website is www.zojjed.com, we write.
hit-count > 5   and website = “www.zojjed.com”
                                                   (hit-counts)

This returns the relation:
                                                  hit-count > 5
  website            date         hit-count         website                       date        hit-count

  www.zojjed.com     5/21/2007    6                 www.racewalk.com              5/20/2007   2019
                                                    www.greattreks.com            5/20/2007   1050
website = “www.zojjed.com”                          www.twofeetgallery.com        5/20/2007   32
  website             date            hit-count     www.walkinghealthy.com        5/20/2007   159
  www.zojjed.com      5/20/2007       5             www.zojjed.com                5/21/2007   6
  www.zojjed.com      5/21/2007       6             www.cs.drexel.edu/~jsalvage   5/20/2007   376
  www.zojjed.com      5/22/2007       5             www.racewalk.com              5/21/2007   2099
                      Database Systems – Set Theory
The Project Operation
unary

returns arguments in relation without all attributes

duplicates are removed

 - represent project operation - pi
 <attribute list> (R)

website, category(Websites)‫‏‬


  website                              category
  www.zojjed.com                       Fiction
  www.racewalk.com                     Health
  www.greattreks.com                   Travel
  www.twofeetgallery.com               Photographs
  www.walkinghealthy.com               Health
  www.cs.drexel.edu/~jsalvage          Education
                       Database Systems – Set Theory
Composition of Relational Operations

Often we need to combine operations. Often we wish to select a set of tuples and limit
the relation returned to a few attributes.

What if we want to find out only the websites that have had greater than 1000 hits in a
given day?

First we must find out what tuples have hit counts greater than 1000.
We can accomplish this with the following relational query:

               (hit-counts)‫‏‬
 hit-count>1000

By using the Project operation we can remove the extra attributes such as hit-count
and date and only return the values in the website column.

website(               (hit-counts))‫‏‬
          hit-count>1000

What is the relation that is returned?
                       Database Systems – Set Theory
What if we want to find out only the websites that have had greater than 1000 hits in a
single day?

website(               (hit-counts))‫‏‬
          hit-count>1000

What is the relation that is returned?
                                         hit-counts relation
  website                                 website                       date        hit-count

  www.racewalk.com                        www.zojjed.com                5/20/2007   5

  www.greattreks.com                      www.racewalk.com              5/20/2007   2019
                                          www.greattreks.com            5/20/2007   1050

                                          www.twofeetgallery.com        5/20/2007   32
                                          www.walkinghealthy.com        5/20/2007   159
                                          www.zojjed.com                5/21/2007   6
                                          www.zojjed.com                5/22/2007   5
                                          www.cs.drexel.edu/~jsalvage   5/20/2007   376
                                          www.racewalk.com              5/21/2007   2099
                       Database Systems – Set Theory
Union Operator
binary
 - union operator

It is often useful to combine the results of queries.

Again, remember that set theory removes duplicates.

Relation 1  Relation 2 = Result Set
                            Database Systems – Set Theory
What is a query that returns all websites that have customers OR a hit count greater
than 1000?

We need information from both the customers relation as well as the hit count relation.

First we need the names of all websites that have customers

website(customers)‫‏‬                customers Relation
 website                             website                    first-name   last-name

 www.drexel.edu/~jsalvage            www.zojjed.com             Derek        Jeter

 www.racewalk.com                    www.zojjed.com             Chase        Utley

 www.zojjed.com                      www.drexel.edu/~jsalvage   Jeremy       Johnson

                                     www.racewalk.com           Ryan         Howard
                                     www.zojjed.com             Ryan         Howard
                        Database Systems – Set Theory
Then we need the names of the websites that have a hit count greater than 1000:
website(hit-count>1000 (hit-counts))
                                         hit-counts relation
                                         website                       date        hit-count
 website
                                         www.zojjed.com                5/20/2007   5
 www.racewalk.com
                                         www.racewalk.com              5/20/2007   2019
 www.greattreks.com
                                         www.greattreks.com            5/20/2007   1050

                                         www.twofeetgallery.com        5/20/2007   32
                                         www.walkinghealthy.com        5/20/2007   159
                                         www.zojjed.com                5/21/2007   6
                                         www.zojjed.com                5/22/2007   5
                                         www.cs.drexel.edu/~jsalvage   5/20/2007   376
                                         www.racewalk.com              5/21/2007   2099
                      Database Systems – Set Theory
Combine the results using a union operation
website(customers)  website(                 (hit-counts))
                                  hit-count>1000
 website
 www.drexel.edu/~jsalvage
 www.racewalk.com
 www.greattreks.com
 www.zojjed.com


website(hit-count>1000 (hit-counts))‫‏‬            website(customers)‫‏‬
 website                                            website
 www.racewalk.com                                   www.drexel.edu/~jsalvage
 www.greattreks.com                                 www.racewalk.com
                                                    www.zojjed.com

Remember, order is not important!

• Unions MUST be of similar types
• They MUST have the same number of attributes
• The domains of the attributes MUST be the same
                      Database Systems – Set Theory
Intersection Operator
binary
∩ - intersection operator

Returns all tuples contained within both relations

Relation 1 ∩ Relation 2 = Result Set
                      Database Systems – Set Theory
What is a query that returns all websites that have customers AND a hit count greater
than 1000?

First we need the names of all websites that have customers
website(customers)‫‏‬

            website
            www.drexel.edu/~jsalvage
            www.racewalk.com
            www.zojjed.com

Then we need the names of the websites that have a hit count greater than 1000:
website(               (hit-counts))
          hit-count>1000

            website
            www.racewalk.com
            www.greattreks.com
                       Database Systems – Set Theory
What is a query that returns all websites that have customers AND a hit
count greater than 1000?

website(customers) ∩ website(hit-count>1000 (hit-counts))


            website
            www.racewalk.com




website(hit-count>1000 (hit-counts))‫‏‬        website(customers)‫‏‬
  website                                       website
  www.racewalk.com                              www.drexel.edu/~jsalvage
  www.greattreks.com                            www.racewalk.com
                                                www.zojjed.com
                       Database Systems – Set Theory
The Set Difference Operation (MINUS)‫‏‬
binary
-, denotes set difference
Relation 1 - Relation 2 = Result Set

Finds tuples in one set but not in another
r – s, produces a set containing those tuples in r but not in s, operand order is
significant.
                      Database Systems – Set Theory
Produce a list of websites who have a hit count > 1000 and do not have a
customer.

We need the names of all websites that have customers
website(customers)‫‏‬
            website
            www.drexel.edu/~jsalvage
            www.racewalk.com
            www.zojjed.com


Then we need the names of the websites that have a hit count greater than 1000:
website(               (hit-counts))
          hit-count>1000

            website
            www.racewalk.com
            www.greattreks.com


 website(hit-count>1000 (hit-counts)) - website(customers)‫‏‬
                       Database Systems – Set Theory
Produce a list of websites who have a hit count > 1000 and do not have a customer.

website(hit-count>1000 (hit-counts)) - website(customers)‫‏‬

 website
 www.greattreks.com




website(hit-count>1000 (hit-counts))‫‏‬                 website(customers)‫‏‬
  website                                                website
  www.racewalk.com                                       www.drexel.edu/~jsalvage
  www.greattreks.com                                     www.racewalk.com
                                                         www.zojjed.com



Notice the attributes in R1 that are not in R2 are included in the result set, but the
attributes in R2 that are not in R1 are not included in the result set, i.e. operand order
is significant.
                       Database Systems – Set Theory
Given the relation websites below:


  website                              organization         first-year   category
  www.zojjed.com                       Walking Promotions   2006         Fiction
  www.racewalk.com                     Walking Promotions   1995         Health
  www.greattreks.com                   Walking Promotions   2006         Travel
  www.twofeetgallery.com               Walking Promotions   2004         Photographs
  www.walkinghealthy.com               Walking Promotions   2002         Health
  www.cs.drexel.edu/~jsalvage          Drexel University    2005         Education


What is the result of the following?

website(empty set) - website(websites)‫‏‬
                       Database Systems – Set Theory
Given the relation websites below:


  website                              organization         first-year   category
  www.zojjed.com                       Walking Promotions   2006         Fiction
  www.racewalk.com                     Walking Promotions   1995         Health
  www.greattreks.com                   Walking Promotions   2006         Travel
  www.twofeetgallery.com               Walking Promotions   2004         Photographs
  www.walkinghealthy.com               Walking Promotions   2002         Health
  www.cs.drexel.edu/~jsalvage          Drexel University    2005         Education


What is the result of the following?

website(empty set) - website(websites)‫‏‬


The result is the empty set, because no records are contained in “R1” and only the
records in R1 that are not in R2 are returned from the MINUS operator.
                       Database Systems – Set Theory

Given the relation websites below:

  website                              organization         first-year   category
  www.zojjed.com                       Walking Promotions   2006         Fiction
  www.racewalk.com                     Walking Promotions   1995         Health
  www.greattreks.com                   Walking Promotions   2006         Travel
  www.twofeetgallery.com               Walking Promotions   2004         Photographs
  www.walkinghealthy.com               Walking Promotions   2002         Health
  www.cs.drexel.edu/~jsalvage          Drexel University    2005         Education


What is the result of the following?

website(websites) - website(empty set)‫‏‬
                      Database Systems – Set Theory

The result of website(websites) - website(empty set) is the complete relation
websites, since no records are contained in the empty set all records from the
websites relation are included in the result set.


 website
 www.zojjed.com
 www.racewalk.com
 www.greattreks.com
 www.twofeetgallery.com
 www.walkinghealthy.com
 www.cs.drexel.edu/~jsalvage
                       Database Systems – Set Theory

The Cartesian-Product Operation
binary

x – combines information in two relations

Relation 1 x Relation 2 = Result Set

because attributes can be repeated in different relations, a notation relation.attribute
will be used.

Therefore, the resulting schema of r = websites x customers is:

(websites.website, websites.organization, websites.first-year, websites.category,
customers.website, customers.first-name, customers.last-name)‫‏‬

Note this does not support cases where you wish to use the same relation twice; we
will address this with the rename operation shortly.

What tuples exist in r if r = websites x customers?

The combination of all tuples in websites with every tuple in customers.

Given r1 with n1 tuples and r2 with n2 tuples then r1xr2 has n1*n2 tuples
                       Database Systems – Set Theory

Let’s look at a simplified example first.
                                             Then R1 x R2 contains the
If relation R1 contains the following:       following:

                                               R1.Value 1    R2.Value 2
            Value1
                                               1             A
            1
                                               1             B
            2
                                               1             C
            3
                                               2             A
and if relation R2 contains the following:     2             B

            Value2                             2             C

            A                                  3             A
            B                                  3             B
            C                                  3             C
                              Database Systems – Set Theory

Similarly, websites x customers appears as follows:

websites.website     organization         first-   category   customers.website          first-name   last-name
                                          year
www.zojjed.com       Walking Promotions   2006     Fiction    www.zojjed.com             Derek        Jeter

www.zojjed.com       Walking Promotions   2006     Fiction    www.zojjed.com             Chase        Utley

www.zojjed.com       Walking Promotions   2006     Fiction    www.drexel.edu/~jsalvage   Jeremy       Johnson

www.zojjed.com       Walking Promotions   2006     Fiction    www.racewalk.com           Ryan         Howard
www.zojjed.com       Walking Promotions   2006     Fiction    www.zojjed.com             Ryan         Howard
www.racewalk.com     Walking Promotions   1995     Health     www.zojjed.com             Derek        Jeter

www.racewalk.com     Walking Promotions   1995     Health     www.zojjed.com             Chase        Utley

www.racewalk.com     Walking Promotions   1995     Health     www.drexel.edu/~jsalvage   Jeremy       Johnson

www.racewalk.com     Walking Promotions   1995     Health     www.racewalk.com           Ryan         Howard

www.racewalk.com     Walking Promotions   1995     Health     www.zojjed.com             Ryan         Howard

www.greattreks.com   Walking Promotions   2006     Travel     www.zojjed.com             Derek        Jeter

www.greattreks.com   Walking Promotions   2006     Travel     www.zojjed.com             Chase        Utley

www.greattreks.com   Walking Promotions   2006     Travel     www.drexel.edu/~jsalvage   Jeremy       Johnson

www.greattreks.com   Walking Promotions   2006     Travel     www.racewalk.com           Ryan         Howard

www.greattreks.com   Walking Promotions   2006     Travel     www.zojjed.com             Ryan         Howard

.                    .                    .        .          .                          .            .
.                    .                    .        .          .                          .            .
.                    .                    .        .          .                          .            .
                           Database Systems – Set Theory

What if we want to find all the customers who bought from a website created before
the year 2000?

We could try the following:

first-year <2000 (websites x customers)‫‏‬

Note that we are not using a projection to reduce the number of attributes to show
what is really happening. In the end, you would use a projection to show only the
fields requested by the query.

websites.website   organization         first-year   category   customers.website          first-name   last-name
www.racewalk.com   Walking Promotions   1995         Health     www.zojjed.com             Derek        Jeter

www.racewalk.com   Walking Promotions   1995         Health     www.zojjed.com             Chase        Utley

www.racewalk.com   Walking Promotions   1995         Health     www.drexel.edu/~jsalvage   Jeremy       Johnson

www.racewalk.com   Walking Promotions   1995         Health     www.racewalk.com           Ryan         Howard

www.racewalk.com   Walking Promotions   1995         Health     www.zojjed.com             Ryan         Howard




Oops, too many tuples!
                            Database Systems – Set Theory

Because the Cartesian-product pairs all possible tuples, all tuples from websites are
combined with all tuples from customers. While only those with the first-year < 2000
are selected, it still returns 5 tuples.

Of those sets, we only want the ones where the websites relation’s website attribute
equals the customers relation’s website attribute.
websites.website   organization         first-year   category   customers.website          first-name   last-name
www.racewalk.com   Walking Promotions   1995         Health     www.zojjed.com             Derek        Jeter

www.racewalk.com   Walking Promotions   1995         Health     www.zojjed.com             Chase        Utley

www.racewalk.com   Walking Promotions   1995         Health     www.drexel.edu/~jsalvage   Jeremy       Johnson

www.racewalk.com   Walking Promotions   1995         Health     www.racewalk.com           Ryan         Howard

www.racewalk.com   Walking Promotions   1995         Health     www.zojjed.com             Ryan         Howard



The only tuple we truly want is the highlighted tuple.
we can write this as follows:

websites.website = customers.website(first-year <2000 (websites x customers))‫‏‬
                           Database Systems – Set Theory

Since,
websites.website = customers.website(first-year <2000 (websites x customers))‫‏‬

Returns the following tuple with too many attributes, we must also use a
projection to remove the excessive attributes.

websites.website   organization         first-year   category   customers.website   first-name   last-name
www.racewalk.com   Walking Promotions   1995         Health     www.racewalk.com    Ryan         Howard



Applying the projection of first-name, last name to the previous query gives us the
following query:

first-name, last-name(websites.website = customers.website(first-year <2000 (websites
x customers)))‫‏‬
                     Database Systems – Set Theory

The Assignment Operator
unary
allows an expression to be assigned to a variable
NewRelation  OldRelation

For example:

1200loans  amount > 1200(loan)‫‏‬
or
result  loan-number(1200loans)‫‏‬

Or

The Rename Operation
Unary

x(E) renames the expression E to x.

Relational-algebra expressions do not have a name that we can refer to them by using
the rename operator,

 is roh.
                      Database Systems – Set Theory

Example, without using an aggregation function (not yet shown), find the largest hit
   count of any website for a single day. If the same max hit count exists more than
   once, you are only allowed to return a single tuple containing the answer.

We accomplish this in two steps:

First, compute a temporary relation consisting of hit counts less than the largest hit
    count.
Second, take the set difference (minus) between the relation hit-count(hit-counts)
    and the temporary relation

To accomplish the first step, compute all the websites hit counts compared to all the
   websites hit counts, in other words compute the Cartesian product of the relation
   hit-counts with itself.

hit-counts x hit-counts

However, we must rename one of the hit-counts relations so that we can identify the
  balance distinctly
                          Database Systems – Set Theory

Given the projection of only the hit-count field from the relation hit-counts via

hit-count(hit-counts)‫‏‬
                              If we rename the result of this projection
We have:                      d (hit-count(hit-counts))‫‏‬


  hit-count
                              And thus create a cross product of the two relations as
  5
                              hit-count(hit-counts) x d (hit-count(hit-counts))‫‏‬
  2019
  1050
  32
  159
  6
  376
  2099
                              Database Systems – Set Theory

hit-count(hit-counts) x d (hit-count(hit-counts))‫‏‬

 hit-count   d(hit-count)‫‏‬       hit-count   d(hit-count)‫‏‬   hit-count   d(hit-count)‫‏‬

                                  5           1050             5           159
 5           5
                                  2019        1050             2019        159
 2019        5
                                  1050        1050             1050        159
 1050        5
                                  32          1050             32          159
 32          5
                                  159         1050             159         159
 159         5
                                  6           1050             6           159
 6           5
                                  376         1050             376         159
 376         5
                                  2099        1050             2099        159
 2099        5
                                  5           32               5           6
 5           2019
                                  2019        32               2019        6
 2019        2019
                                  1050        32               1050        6
 1050        2019
                                  32          32               32          6
 32          2019
                                  159         32               159         6
 159         2019
                                  6           32               6           6
 6           2019
                                  376         32               376         6
 376         2019
                                  2099        32               2099        6
 2099        2019
                              Database Systems – Set Theory

hit-count(hit-counts) x d (hit-count(hit-counts))‫‏‬

 hit-count   d(hit-count)‫‏‬

 5           376
 2019        376
 1050        376
 32          376
 159         376
 6           376
 376         376
 2099        376
 5           2099

 2019        2099

 1050        2099

 32          2099

 159         2099

 6           2099

 376         2099

 2099        2099
                               Database Systems – Set Theory

Now we select only those tuples that have the first attribute containing a
  value less than the second attribute, we do so with the following query:
hitcounts.hit-count < d.hit-count (hit-count(hit-counts) x d (hit-count(hit-counts)))‫‏‬
  hit-count   d(hit-count)‫‏‬         hit-count   d(hit-count)‫‏‬   hit-count   d(hit-count)‫‏‬
                                     5           1050             5           159
  5           5
                                     2019        1050             2019        159
  2019        5
                                     1050        1050             1050        159
  1050        5
                                     32          1050             32          159
  32          5
                                     159         1050             159         159
  159         5
                                     6           1050             6           159
  6           5
                                     376         1050             376         159
  376         5
                                     2099        1050             2099        159
  2099        5
                                     5           32               5           6
  5           2019
                                     2019        32               2019        6
  2019        2019
                                     1050        32               1050        6
  1050        2019
                                     32          32               32          6
  32          2019
                                     159         32               159         6
  159         2019
                                     6           32               6           6
  6           2019
                                     376         32
  376         2019                                                376         6
                                     2099        32
  2099        2019                                                2099        6
                              Database Systems – Set Theory

hit-count(hit-counts) x d (hit-count(hit-counts))‫‏‬

 hit-count   d(hit-count)‫‏‬

 5           376
 2019        376
 1050        376
 32          376
 159         376
 6           376
 376         376
 2099        376
 5           2099

 2019        2099

 1050        2099

 32          2099

 159         2099

 6           2099

 376         2099

 2099        2099
                       Database Systems – Set Theory

This certainly gives us a lot of tuples, but if we then project just the hit-count from the
   first column and remove the duplicates, we are left with the following:


   hit-count
   5
   2019
   1050
   32
   159
   6
   376


This is the set containing most hit counts, but it does not include the largest hit count.
                         Database Systems – Set Theory

To get just the largest hit count we now simply subtract our result set from the
   projection of the original hit count relation as follows:


hit-count(hit-counts) - hit-count(hitcounts.hit-count < d.hit-count (hit-count(hit-counts) x
   d (hit-count(hit-counts))))‫‏‬


    hit-count
    2099
                      Database Systems – Set Theory

We need a better way to represent certain queries because the notation for joining two
  relations and only selecting records where the attributes match is too cumbersome.
  Therefore we have:

The Natural Join Operation
Binary

Result Set = R1 |x| R2

The natural join operation finds the Cartesian product of two relations, but only returns
   tuples where the attributes whose names are the same in both relations contain the
   same values.
                       Database Systems – Set Theory

Let’s look at a simplified example first.

If relation R1 contains the following:      and if relation R2 contains the following:


            Value1     Value2                Value2     Value3
            1          X                     X          A
            2          Y                     Z          B
            3          Z                     A          C
                    Database Systems – Set Theory
Then R1 x R2 contains the following:

  R1.Value 1    R1.Value 2   R2.Value2   R2.Value3

  1             X            X           A
  1             X            Z           B
  1             X            A           C
  2             Y            X           A
  2             Y            Z           B
  2             Y            A           C
  3             Z            X           A
  3             Z            Z           B
  3             Z            A           C

Then R1 |x| R2 contains the following:

   R1.Value 1   R1.Value 2   R2.Value2   R2.Value3

   1            X            X           A
   3            Z            Z           B
                        Database Systems – Set Theory

Example:

Find the names of all customers who have made a purchase from a health or travel
   website. Return the name of the customer, the website, and the category of the
   website.

The old way:

Form a Cartesian product of the websites and customers relations.

Select the tuples of the same website as well as a category equal to “health” or
   “travel.”

Project the first-name, last-name, website, and category

   first-name, last-name, website, category
(
  websites.website = customers.website and
                                           (websites x customers))‫‏‬
category = “Health” or category = “Travel”
                      Database Systems – Set Theory

Another example:

Find all the names of websites and the dates they have a hit count for web sites that
   are in the health category.
                               Database Systems – Set Theory

Another example:

Find all the web site names and the dates that have hit counts for web sites in the
   health category.

   website, date   (category = “Health” (websites |x| hit-counts))‫‏‬
                          Database Systems – Set Theory

Generalized Projections

Allows basic arithmetic operations within fields of a tuple
Observe the Sales relation:

 product                              first-name      last-name        tax    total-cost
 Zojjed!                              Derek           Jeter            1.00   17.95
 Zojjed!                              Chase           Utley            1.00   17.95
 VB .NET Coach                        Jeremy          Johnson          0      54.95
 Race Walk Like A Champion            Ryan            Howard           1.25   25.95
 Zojjed!                              Ryan            Howard           0      16.95


What was the price of the product sold minus the tax paid?

 product, first-name, last-name, (total-cost – tax) as net-pay (Sales)‫‏‬
                       Database Systems – Set Theory

Aggregate Functions

takes a collection of values and returns a single value as a result

i.e
sum {1, 1, 3, 4, 4, 11} returns the value 24.
avg {1, 1, 3, 4, 4, 11} returns the value 4.
count {1, 1, 3, 4, 4, 11} returns the value 6.
min {1, 1, 3, 4, 4, 11} returns the value 1.
max {1, 1, 3, 4, 4, 11} returns the value 11.
count-distinct {1, 1, 3, 4, 4, 11} returns the value 4.

Ignore the fact that we said sets can’t contain duplicate values
                       Database Systems – Set Theory

Operations to Modify the Contents of Relations

Deletion
rr–E
This uses set difference (minus) and variable assignment.

Delete all of the sale of “Zojjed!” from the Sales relation
sales  sales - product = “Zojjed!” (sales)‫‏‬

Delete all sales with no tax collected
sales  sales - tax = 0 (sales)‫‏‬


Insertion
rrE
This uses union and variable assignment.

Add a record to the sales relation
sales  sales  {(“I Walk to Eat”, “Chase”, “Utley”, 0, 15.95)}

Add a record to the websites relation
websites  websites  {(“www.mediatitan.net”, “Walking Promotions”, 2006,
   “Fiction”)}
                       Database Systems – Set Theory

Updating
Zero all tax attributes in the sales relation

sales   product, first-name, last-name, 0, total-cost(sales)‫‏‬
                       Database Systems – Set Theory

Joins
There are other forms of joins. Let’s look at the following two simple relations:


      employee-name        city                 employee-name        team
      Jeter                New York City        Glavine              Mets
      Howard               Philadelphia         Howard               Phillies
      Utley                Philadelphia         Bonds                Giants
      Schilling            Boston               Schilling            Choke Sox
     cities relation                          teams relation


     employee-name       city              employee-name            team
     Howard              Philadelphia      Howard                   Phillies
     Schilling           Boston            Schilling                Choke Sox

Natural Join -> cities |x| teams
The natural join omits records that do not match, so we do not have
records for Jeter, Utley, Glavine, or Bonds.
                      Database Systems – Set Theory

  employee-name        city                   employee-name    team
  Jeter                New York City          Glavine          Mets
  Howard               Philadelphia           Howard           Phillies
  Utley                Philadelphia           Bonds            Giants
  Schilling            Boston                 Schilling        Choke Sox

    cities relation                           teams relation

  employee-name       city             employee-name           team
  Jeter               New York City    Null                    Null
  Howard              Philadelphia     Howard                  Phillies
  Utley               Philadelphia     Null                    Null
  Schilling           Boston           Schilling               Choke Sox


Left Outer Join -> cities LOJ teams
Includes all records from the left and only those records on the right
that match
                      Database Systems – Set Theory

  employee-name city                     employee-name     team
  Jeter               New York City      Glavine           Mets
  Howard              Philadelphia       Howard            Phillies
  Utley               Philadelphia       Bonds             Giants
  Schilling           Boston             Schilling         Choke Sox

    cities relation                      teams relation

  employee-name       city            employee-name        team
  Null                Null            Glavine              Mets
  Howard              Philadelphia    Howard               Phillies
  Null                Null            Bonds                Giants
  Schilling           Boston          Schilling            Choke Sox


Right Outer Join -> cities ROJ teams
Includes all records from the right and only those records on the left
that match
                      Database Systems – Set Theory

  employee-name city                     employee-name    team
  Jeter               New York City      Glavine          Mets
  Howard              Philadelphia       Howard           Phillies
  Utley               Philadelphia       Bonds            Giants
  Schilling           Boston             Schilling        Choke Sox

    cities relation                      teams relation

  employee-name       city            employee-name        team
  Null                Null            Glavine              Mets
  Howard              Philadelphia    Howard               Phillies
  Null                Null            Bonds                Giants
  Schilling           Boston          Schilling            Choke Sox
  Jeter               New York City   Null                 Null
  Utley               Philadelphia    Null                 Null

Full Outer Join -> cities FOJ teams
Includes all records from the right and left, tuples that do not match
have nulls in their place.
                Database Systems – Set Theory
NULLS

And:
true and unknown = unknown
false and unknown = false
unknown and unknown = unknown

Or
true or unknown = true
false or unknown = unknown
unknown or unknown = unknown

Not
not unknown = unknown
                     Database Systems – Set Theory
REFERENTIAL INTEGRITY

Superkey of R
A unique identifier for a tuple.

A set of attributes SK of R such that no two tuples in any valid relation
instance r(R) will have the same value for SK. That is, for any distinct
tuples t1 and t2 in r(R), t1[SK]  t2[SK].

Key of R
A "minimal" superkey; that is, a superkey K such that removal of any
attribute from K results in a set of attributes that is not a superkey, i.e. every
attribute of K is needed to maintain its superkey status.

    Example: The CAR relation schema

    CAR(State, Reg#, SerialNo, Make, Model, Year) has two keys

    Key1 = {State, Reg#}
    Key2 = {SerialNo}

    Both are superkeys. {SerialNo, Make} is a superkey but not a key.

If a relation has several candidate keys, one is chosen arbitrarily to be the
primary key. The primary key attributes are underlined.
                     Database Systems – Set Theory
REFERNTIAL INTEGRITY

Relational Database Schema

A set S of relation schemas that belong to the same database. S is the name
of the database.

S = {R1, R2, ..., Rn}

Entity Integrity:

The primary key attributes PK of each relation schema R in S cannot have
null values in any tuple of r(R). This is because primary key values are used
to identify the individual tuples.

t[PK]  null for any tuple t in r(R)‫‏‬

Note, other attributes of R may be similarly constrained to disallow null
values, even though they are not members of the primary key.
                        Database Systems – Set Theory
Referential Integrity

A constraint involving two relations (the previous constraints involve a single
relation).

Used to specify a relationship among tuples in two relations: the referencing
relation and the referenced relation.

Tuples in the referencing relation R1 have attributes FK (called foreign key
attributes) that reference the primary key attributes PK of the referenced
relation R2.

A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].

A referential integrity constraint can be displayed in a relational database
schema as a directed arc from R1.FK to R2.

				
DOCUMENT INFO