Docstoc

ra

Document Sample
ra Powered By Docstoc
					IELM 511: Information System design

     Introduction

     Part 1. ISD for well structured data – relational and other DBMS
          Info storage (modeling, normalization)
          Info retrieval (Relational algebra, Calculus, SQL)
          DB integrated API’s

     ISD for systems with non-uniformly structured data
          Basics of web-based IS (www, web2.0, …)
          Markup’s, HTML, XML
          Design tools for Info Sys: UML

     Part III: (one out of)
           API’s for mobile apps
           Security, Cryptography
           IS product lifecycles
           Algorithm analysis, P, NP, NPC
Agenda


 Relational Algebra

 Relational Calculus

 Structured Query Language (SQL)

 DB API’s
Recall our Bank DB design
BRANCH( b-name, city, assets)

CUSTOMER( cssn, c-name, street, city, banker, banker-type)

LOAN( l-no, amount, br-name)

PAYMENT( l-no, pay-no, date, amount)

EMPLOYEE( e-ssn, e-name, tel, start-date, mgr-ssn)
                                                                 1

ACCOUNT( ac-no, balance)

SACCOUNT( ac-no, int-rate)
                                                                     n
                                                     n                   1       n
CACCOUNT( ac-no, od-amt)                        n        m
                                                                     m



BORROWS( cust-ssn, loan-num)                                                 n
                                                1            1
DEPOSIT( c-ssn, ac-num, access-date)                         n


DEPENDENT( emp-ssn, dep-name)
Background: Algebra


 What is an algebra ?

    Study of systems of mathematical objects and operations defined on the objects



  Examples of algebras:

     Integers, with operations: +, -, ×, /, % …

     Real numbers, with operations: +, -, ×, /, …

     Vectors, with operations: +, -, , ×, …

     Booleans, with operations: , , , …
Relational Algebra

  Relational Algebra:
   objects: instances of relational schemas (namely, tables)
   operations: s, P, ×, set-theoretic operations: , , -, ‚

  Key concepts:
  Operator arguments:
   Arguments of operators are instances of schemas (table)

  Operation closure:
   The outcome of the operator is an instance of schema

  Expressions:
    A sequence of operations can be written as an expression

  Operator precedence:
    The sequence of application of operations in an expression is fixed.
                           Compare these concepts to those in other algebras
  Relational Algebra: select, s

      Notation: in remainder, we will refer to an instance of a schema as a table


      s : unary operator, input: one table; output: table


LOAN
loan_number   amount   branch_name
L17           1000     Downtown                  s[amount > 1200](LOAN)
L23           2000     Redwood
L15           1500     Pennyridge
L93           500      Mianus
                                                  loan_number   amount    branch_name
L11           900      Round Hill
                                                  L23           2000      Redwood
L16           1300     Pennyridge
                                                  L15           1500      Pennyridge
                                                  L16           1300      Pennyridge
  Relational Algebra: select, s
       q conditions of s operator:
          - Denote the criterion for selection of a given tuple
          - Must be evaluated one tuple at a time
          - Must evaluate to ‘true’ or ‘false’
          - Output = set of tuples for which q-conditions are ‘true’


LOAN
loan_number   amount   branch_name     s[(amount > 1200)  (branch_name = ‘Pennyridge’)]
L17           1000     Downtown        (LOAN)
L23           2000     Redwood
L15           1500     Pennyridge                  loan_number     amount      branch_name
L93           500      Mianus                      L15             1500        Pennyridge
L11           900      Round Hill                  L16             1300        Pennyridge
L16           1300     Pennyridge
  Relational Algebra: project, P

      P : unary operator, input: one table; output: table

      P[list of attributes] (TABLE)

                                          P[loan_number, amount] (LOAN)

LOAN
loan_number   amount     branch_name                 loan_number   amount
L17           1000       Downtown                    L17           1000
L23           2000       Redwood                     L23           2000
L15           1500       Pennyridge                  L15           1500
L93           500        Mianus                      L93           500
L11           900        Round Hill                  L11           900
L16           1300       Pennyridge                  L16           1300
  Relational Algebra: project, P

      Project returns a set of tuples; the number of rows may be smaller that input


      Example: Find the names of all branches that have given loans

                                             P[branch_name] (LOAN)

LOAN
loan_number   amount   branch_name                 branch_name
L17           1000     Downtown                    Downtown
L23           2000     Redwood                     Redwood
L15           1500     Pennyridge                  Pennyridge
L93           500      Mianus                      Mianus
L11           900      Round Hill                  Round Hill
L16           1300     Pennyridge
  Relational Algebra: combining operations

      Example: Find the names of all branches that have given loans larger than 1200




                                              P[branch_name] ( s[(amount > 1200) ] (LOAN))
LOAN
loan_number    amount   branch_name
L17            1000     Downtown
                                                            branch_name
L23            2000     Redwood
                                                            Redwood
L15            1500     Pennyridge
                                                            Pennyridge
L93            500      Mianus
L11            900      Round Hill
L16            1300     Pennyridge
  Relational Algebra: combining operations
      Note: expressions impose a sequence in which operations are perfromed

      Example: Find the names of all branches that have given loans larger than 1200

                        X = ( s[(amount > 1200) ] (LOAN))          Y = P[branch_name] (X)



LOAN                                   X                                      Y
loan_number   amount    branch_name     loan_number   amount   branch_name    branch_name
L17           1000      Downtown        L23           2000     Redwood        Redwood
L23           2000      Redwood         L15           1500     Pennyridge     Pennyridge
L15           1500      Pennyridge      L16           1300     Pennyridge
L93           500       Mianus
L11           900       Round Hill
L16           1300      Pennyridge
Relational Algebra: join, ×

 Join is useful when the information required is in two (or more) tables.

 Tables are sets of tuples, and
 the join of two tables produces a cartesian product of the two sets

  Background (set theory): cartesian product, A × B = { (x, y) | x  A, y  B}

  Example:
  A = { 1, 2, 3 }, B = { a, s}

  A × B = { (1, a), (1, s), (2, a), (2, s), (3, a), (3, s) }
  Relational Algebra: join, ×
BORROWS                 LOAN
customer      loan_no    loan_number   amount         branch_name
111-12-0000   L17        L17           1000           Downtown
222-12-0000   L23        L23           2000           Redwood
333-12-0000   L15        L15           1500           Pennyridge
444-00-0000   L93        L93           500            Mianus
666-12-0000   L17        L11           900            Round Hill
111-12-0000   L11        L16           1300           Pennyridge
999-12-0000   L17
777-12-0000   L16
                                Cartesian product, BORROWS × LOAN                       5 columns

                                customer        loan_no        loan_number   amount   branch_name
                                111-12-0000     L17            L17           1000     Downtown
                                111-12-0000     L17            L23           2000     Redwood
                                111-12-0000     L17            L15           1500     Pennyridge
                    48 rows     …
                                …
                                777-12-0000     L16            L16           1300     Pennyridge
Relational Algebra: join, ×
 Usually, a cartesian product produces several tuples with un-related
 information.
 q-join specifies a q-condition (same as a selection criterion) to restrict
 the output of a join to meaningful tuples only.

  Example: Find the loan no, amount and branch name for all customers.


         BORROWS ×[loan_no = loan_number] LOAN                   5 columns
                  customer      loan_no   loan_number   amount     branch_name
                  111-12-0000   L17       L17           1000       Downtown
                  222-12-0000   L23       L23           2000       Redwood
8 rows [Why ?]    333-12-0000   L15       L15           1500       Pennyridge
                  444-00-0000   L93       L93           500        Mianus
                  666-12-0000   L17       L17           1000       Downtown
                  111-12-0000   L11       L11           900        Round Hill
                  999-12-0000   L17       L17           1000       Downtown
                  777-12-0000   L16       L16           1300       Pennyridge
Relational Algebra: dot-notation in join, ×
 Two tables being joined may have the same attribute name
 (possibly denoting two different things). To distinguish the columns
 in the q-join, the names of attributes use dot-notation

 The following are all equivalent:

       C = BORROWS ×[loan_no = loan_number] LOAN



       C = BORROWS ×[BORROWS.loan_no = LOAN.loan_number] LOAN



       A = BORROWS
       B = LOAN
       C = A ×[A.loan_no = B.loan_number] B
Relational Algebra: set theoretic operations, 

 Since a table is a set of tuples, it is possible to make a union of two
 tables.
 BUT: we require closure (union of two tables should be a table).
           Union is defined for two tables with identical schemas.

 Example: Find the names of customers who have either a deposit, or a loan
 with the bank
                                                                           name
 A = P[customer] (BORROWS)  P[c_ssn] (DEPOSIT)
                                                                           Jones
                                                                           Smith
 RESULT = P[name] (A ×[A.customer= CUSTOMER.ssn] CUSTOMER )
                                                                           Hayes
                                                                           Curry
                                                                           Turner
                                                                           Williams
                                                                           Adams
                                                                           Johnson
                                                                           Brooks
                                                                           Lindsay
Relational Algebra: set theoretic operations, 

 Other set theoretic operations can be applied with same rules.

 Example: Find the names of customers who have both, a deposit and a loan
 with the bank

 A = P[customer] (BORROWS)  P[c_ssn] (DEPOSIT)

 RESULT = P[name] (A ×[A.customer= CUSTOMER.ssn] CUSTOMER )
                                                                  RESULT
  c_ssn             customer          customer                    name
  888-12-0000      111-12-0000   =   111-12-0000                 Jones
  222-12-0000       222-12-0000       222-12-0000                 Smith
  333-12-0000       333-12-0000       333-12-0000                 Hayes
  555-00-0000       444-00-0000
  111-12-0000       666-12-0000
  000-12-0000       999-12-0000
                    777-12-0000
Relational Algebra: set theoretic operations, -

 Other set theoretic operations (same rules).

 Example: Find the names of customers who have a loan but no deposits.


 A = P[customer] (BORROWS) - P[c_ssn] (DEPOSIT)

 RESULT = P[name] (A ×[A.customer= CUSTOMER.ssn] CUSTOMER )
                                                                 RESULT
  c_ssn             customer          customer                    name
  888-12-0000   -   111-12-0000   =   888-12-0000                 Johnson
  222-12-0000       222-12-0000       555-12-0000                 Turner
  333-12-0000       333-12-0000       000-12-0000                 Lindsay
  555-00-0000       444-00-0000
  111-12-0000       666-12-0000
  000-12-0000       999-12-0000
                    777-12-0000
Relational Algebra: set theoretic operations, ÷
    Set division extends the meaning of integer division, in the sense that it
    ‘cancels away’ common multiples. It is useful in answering ‘for all’ queries.

    Example: Do all the loan officers have the same manager ?
    A solution: Find the ssn of the person who manages all the loan officers.

    A = P[banker] (s[b_type=LO] (CUSTOMER) )
    B = P[mgr_ssn, e_ssn] (EMPLOYEE)
    RESULT = B ÷ A

B   mgr_ssn          e_ssn                 A                                         RESULT
     321-32-4321     111-22-3333            banker
     111-22-3333     333-11-4444                                                     mgr_ssn
                                    ÷         333-11-4444
     111-22-3333     123-45-6789                                                      111-22-3333
                                             123-45-6789
     321-32-4321     555-66-8888
     888-99-9999     987-65-4321
     777-77-7777     888-99-9999
     777-77-7777     321-32-4321   Note: for this example, we have to specify that
              null   777-77-7777   the common divisor in B is e_ssn.
    Relational Algebra: set theoretic operations, ÷
      Generic definition of ÷

      Attribute restrictions:
      A ÷ B is defined only for A( R, C) and B( C), where R, C are sets of attributes.

      Output:
      The output contains each ti[R] such that
       tuples tj[C]  B,  a tuple, t  A in which t[C] = tj[C] and t[R] = ti[R].


                 common attribute set, C
                                                            OUTPUT
attribute set, R
                                                                r1…   rm
     r1…    rm     c1 …   ck               c1 …   ck

                                  ÷                    t1       …
                                                       …
                                                       tn

     …
Relational Algebra: concluding remarks

  RA provides a formal language to get information from the database

  RA can potentially answer any query, as long as the query pertains to
  exactly one row of some table derivable using expressions.

  Limitations of RA: aggregation and summary information
  Examples:
           find the average amount of assets in the branches
           find the total assets of the bank,
           …

  RA is procedural, namely, an expression of RA specifies a step by
  step procedure for computing the result.
Relational Calculus (RC)

  Background: what is a calculus ?

  RC is based on a formal system in logic, first order predicate calculus (fopc)

  A formal system has:
    a set of symbols;
    rules about how the symbols can be arranged in well formed formulae (wff)
    a (logical) mechanism to derive if a wff is true/false.
    additionally, fopc allows wff with ‘variables’ and quantifiers (, ).

   A query in RC takes the form:        {t | P(t) }

   Meaning: the set of all tuples, t, for which some Proposition, P(t) is true.
   P is also called a predicate.
Relational Calculus (RC) examples

  1. Report the loans that exceed $1200:

          { t | t  LOAN  t[amount] > 1200}


  2. Find the names of customers who took a loan from the Pennyridge branch.


          { t[name] |        s  BORROWS  s[customer] = t[ssn]
                              u  LOAN  u[loan_number] = s[loan_no]
                                 u[branch_name] = ‘Pennyridge’}
Relational Calculus (RC) remarks

 RC is non-procedural – any way that the predicate P can be evaluated
 is valid.

 RC is the formal basis for Structured Query Language (SQL)

 SQL is the de facto standard language for all RDBMSs

 In terms of functionality (i.e. the power to get some information from any
 DB) RA and RC are equivalent). Namely, any query that can be written in
 RC has an equivalent RA formula, and vice versa.

 Advantage of RC (over RA): conceptually, it is better to allow the user to
 define the logic of the query, but leave the procedure for computing it
 to the program [why ?].
    Bank tables..
BRANCH                                  EMPLOYEE
branch_name   city         assets       e_ssn         e_name   tel       start_date   mgr_ssn
Downtown      Brooklyn        9000000   111-22-3333   Jones      12345    Nov-2005    321-32-4321
Redwood       Palo Alto       2100000   333-11-4444   Smith      54321    Mar-1998    111-22-3333
Pennyridge    Horseneck       1700000   123-45-6789   Lee        54321    Mar-1998    111-22-3333
Mianus        Horseneck        400000   555-66-8888   Turner     55555    Aug-2002    321-32-4321
Round Hill    Horseneck       8000000   987-65-4321   Jones      87621    Mar-1995    888-99-9999
Pownal        Bennington       300000   888-99-9999   Chan       87654    Feb-1980    777-77-7777
North Town    Rye             3700000   321-32-4321   Adams      77777    Feb-1990    777-77-7777
Brighton      Brooklyn        7100000   777-77-7777   Black      99111    Jan-1980              null
CUSTOMER                                                                   DEPOSIT
ssn           name       street       city         banker         b_type   c_ssn         ac_num    accessDate
111-12-0000   Jones      Main         Harrison      321-32-4321     CRM    888-12-0000   A101      Jan 1, 09
222-12-0000   Smith      North        Rye           321-32-4321     CRM    222-12-0000   A215      Feb 1, 09
333-12-0000   Hayes      Main         Harrison      321-32-4321     CRM    333-12-0000   A102      Feb 28, 09
444-12-0000   Curry      North        Rye           333-11-4444      LO    555-00-0000   A305      Mar 10, 09
555-12-0000   Turner     Putnam       Stamford      888-99-9999      DO    888-12-0000   A201      Mar 1, 98
666-12-0000   Williams   Nassau       Princeton     333-11-4444      LO    111-12-0000   A217      Mar 1, 09
777-12-0000   Adams      Spring       Pittsfield    123-45-6789      LO    000-12-0000   A101      Feb 25, 09
888-12-0000   Johnson    Alma         Palo Alto     888-99-9999      DO
999-12-0000   Brooks     Senator      Brooklyn      123-45-6789      LO             BORROWS
000-12-0000   Lindsay    Park         Pittsfield    888-99-9999      DO              customer      loan_no
                                                                                     111-12-0000   L17
LOAN
loan_number   amount     branch_name                                                 222-12-0000   L23

L17           1000       Downtown                                                    333-12-0000   L15

L23           2000       Redwood                                                     444-00-0000   L93

L15           1500       Pennyridge                                                  666-12-0000   L17

L93           500        Mianus                                                      111-12-0000   L11

L11           900        Round Hill                                                  999-12-0000   L17

L16           1300       Pennyridge                                                  777-12-0000   L16
References and Further Reading




  Silberschatz, Korth, Sudarshan, Database Systems Concepts, McGraw Hill




                                                  Next: SQL and DB API’s

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:12
posted:9/28/2011
language:English
pages:27