VIEWS: 12 PAGES: 27 POSTED ON: 9/28/2011
IELM 511: Information System design Introduction Part 1. ISD for well structured data – relational and other DBMS Info storage (modeling, normalization) Info retrieval (Relational algebra, Calculus, SQL) DB integrated API’s ISD for systems with non-uniformly structured data Basics of web-based IS (www, web2.0, …) Markup’s, HTML, XML Design tools for Info Sys: UML Part III: (one out of) API’s for mobile apps Security, Cryptography IS product lifecycles Algorithm analysis, P, NP, NPC Agenda Relational Algebra Relational Calculus Structured Query Language (SQL) DB API’s Recall our Bank DB design BRANCH( b-name, city, assets) CUSTOMER( cssn, c-name, street, city, banker, banker-type) LOAN( l-no, amount, br-name) PAYMENT( l-no, pay-no, date, amount) EMPLOYEE( e-ssn, e-name, tel, start-date, mgr-ssn) 1 ACCOUNT( ac-no, balance) SACCOUNT( ac-no, int-rate) n n 1 n CACCOUNT( ac-no, od-amt) n m m BORROWS( cust-ssn, loan-num) n 1 1 DEPOSIT( c-ssn, ac-num, access-date) n DEPENDENT( emp-ssn, dep-name) Background: Algebra What is an algebra ? Study of systems of mathematical objects and operations defined on the objects Examples of algebras: Integers, with operations: +, -, ×, /, % … Real numbers, with operations: +, -, ×, /, … Vectors, with operations: +, -, , ×, … Booleans, with operations: , , , … Relational Algebra Relational Algebra: objects: instances of relational schemas (namely, tables) operations: s, P, ×, set-theoretic operations: , , -, ‚ Key concepts: Operator arguments: Arguments of operators are instances of schemas (table) Operation closure: The outcome of the operator is an instance of schema Expressions: A sequence of operations can be written as an expression Operator precedence: The sequence of application of operations in an expression is fixed. Compare these concepts to those in other algebras Relational Algebra: select, s Notation: in remainder, we will refer to an instance of a schema as a table s : unary operator, input: one table; output: table LOAN loan_number amount branch_name L17 1000 Downtown s[amount > 1200](LOAN) L23 2000 Redwood L15 1500 Pennyridge L93 500 Mianus loan_number amount branch_name L11 900 Round Hill L23 2000 Redwood L16 1300 Pennyridge L15 1500 Pennyridge L16 1300 Pennyridge Relational Algebra: select, s q conditions of s operator: - Denote the criterion for selection of a given tuple - Must be evaluated one tuple at a time - Must evaluate to ‘true’ or ‘false’ - Output = set of tuples for which q-conditions are ‘true’ LOAN loan_number amount branch_name s[(amount > 1200) (branch_name = ‘Pennyridge’)] L17 1000 Downtown (LOAN) L23 2000 Redwood L15 1500 Pennyridge loan_number amount branch_name L93 500 Mianus L15 1500 Pennyridge L11 900 Round Hill L16 1300 Pennyridge L16 1300 Pennyridge Relational Algebra: project, P P : unary operator, input: one table; output: table P[list of attributes] (TABLE) P[loan_number, amount] (LOAN) LOAN loan_number amount branch_name loan_number amount L17 1000 Downtown L17 1000 L23 2000 Redwood L23 2000 L15 1500 Pennyridge L15 1500 L93 500 Mianus L93 500 L11 900 Round Hill L11 900 L16 1300 Pennyridge L16 1300 Relational Algebra: project, P Project returns a set of tuples; the number of rows may be smaller that input Example: Find the names of all branches that have given loans P[branch_name] (LOAN) LOAN loan_number amount branch_name branch_name L17 1000 Downtown Downtown L23 2000 Redwood Redwood L15 1500 Pennyridge Pennyridge L93 500 Mianus Mianus L11 900 Round Hill Round Hill L16 1300 Pennyridge Relational Algebra: combining operations Example: Find the names of all branches that have given loans larger than 1200 P[branch_name] ( s[(amount > 1200) ] (LOAN)) LOAN loan_number amount branch_name L17 1000 Downtown branch_name L23 2000 Redwood Redwood L15 1500 Pennyridge Pennyridge L93 500 Mianus L11 900 Round Hill L16 1300 Pennyridge Relational Algebra: combining operations Note: expressions impose a sequence in which operations are perfromed Example: Find the names of all branches that have given loans larger than 1200 X = ( s[(amount > 1200) ] (LOAN)) Y = P[branch_name] (X) LOAN X Y loan_number amount branch_name loan_number amount branch_name branch_name L17 1000 Downtown L23 2000 Redwood Redwood L23 2000 Redwood L15 1500 Pennyridge Pennyridge L15 1500 Pennyridge L16 1300 Pennyridge L93 500 Mianus L11 900 Round Hill L16 1300 Pennyridge Relational Algebra: join, × Join is useful when the information required is in two (or more) tables. Tables are sets of tuples, and the join of two tables produces a cartesian product of the two sets Background (set theory): cartesian product, A × B = { (x, y) | x A, y B} Example: A = { 1, 2, 3 }, B = { a, s} A × B = { (1, a), (1, s), (2, a), (2, s), (3, a), (3, s) } Relational Algebra: join, × BORROWS LOAN customer loan_no loan_number amount branch_name 111-12-0000 L17 L17 1000 Downtown 222-12-0000 L23 L23 2000 Redwood 333-12-0000 L15 L15 1500 Pennyridge 444-00-0000 L93 L93 500 Mianus 666-12-0000 L17 L11 900 Round Hill 111-12-0000 L11 L16 1300 Pennyridge 999-12-0000 L17 777-12-0000 L16 Cartesian product, BORROWS × LOAN 5 columns customer loan_no loan_number amount branch_name 111-12-0000 L17 L17 1000 Downtown 111-12-0000 L17 L23 2000 Redwood 111-12-0000 L17 L15 1500 Pennyridge 48 rows … … 777-12-0000 L16 L16 1300 Pennyridge Relational Algebra: join, × Usually, a cartesian product produces several tuples with un-related information. q-join specifies a q-condition (same as a selection criterion) to restrict the output of a join to meaningful tuples only. Example: Find the loan no, amount and branch name for all customers. BORROWS ×[loan_no = loan_number] LOAN 5 columns customer loan_no loan_number amount branch_name 111-12-0000 L17 L17 1000 Downtown 222-12-0000 L23 L23 2000 Redwood 8 rows [Why ?] 333-12-0000 L15 L15 1500 Pennyridge 444-00-0000 L93 L93 500 Mianus 666-12-0000 L17 L17 1000 Downtown 111-12-0000 L11 L11 900 Round Hill 999-12-0000 L17 L17 1000 Downtown 777-12-0000 L16 L16 1300 Pennyridge Relational Algebra: dot-notation in join, × Two tables being joined may have the same attribute name (possibly denoting two different things). To distinguish the columns in the q-join, the names of attributes use dot-notation The following are all equivalent: C = BORROWS ×[loan_no = loan_number] LOAN C = BORROWS ×[BORROWS.loan_no = LOAN.loan_number] LOAN A = BORROWS B = LOAN C = A ×[A.loan_no = B.loan_number] B Relational Algebra: set theoretic operations, Since a table is a set of tuples, it is possible to make a union of two tables. BUT: we require closure (union of two tables should be a table). Union is defined for two tables with identical schemas. Example: Find the names of customers who have either a deposit, or a loan with the bank name A = P[customer] (BORROWS) P[c_ssn] (DEPOSIT) Jones Smith RESULT = P[name] (A ×[A.customer= CUSTOMER.ssn] CUSTOMER ) Hayes Curry Turner Williams Adams Johnson Brooks Lindsay Relational Algebra: set theoretic operations, Other set theoretic operations can be applied with same rules. Example: Find the names of customers who have both, a deposit and a loan with the bank A = P[customer] (BORROWS) P[c_ssn] (DEPOSIT) RESULT = P[name] (A ×[A.customer= CUSTOMER.ssn] CUSTOMER ) RESULT c_ssn customer customer name 888-12-0000 111-12-0000 = 111-12-0000 Jones 222-12-0000 222-12-0000 222-12-0000 Smith 333-12-0000 333-12-0000 333-12-0000 Hayes 555-00-0000 444-00-0000 111-12-0000 666-12-0000 000-12-0000 999-12-0000 777-12-0000 Relational Algebra: set theoretic operations, - Other set theoretic operations (same rules). Example: Find the names of customers who have a loan but no deposits. A = P[customer] (BORROWS) - P[c_ssn] (DEPOSIT) RESULT = P[name] (A ×[A.customer= CUSTOMER.ssn] CUSTOMER ) RESULT c_ssn customer customer name 888-12-0000 - 111-12-0000 = 888-12-0000 Johnson 222-12-0000 222-12-0000 555-12-0000 Turner 333-12-0000 333-12-0000 000-12-0000 Lindsay 555-00-0000 444-00-0000 111-12-0000 666-12-0000 000-12-0000 999-12-0000 777-12-0000 Relational Algebra: set theoretic operations, ÷ Set division extends the meaning of integer division, in the sense that it ‘cancels away’ common multiples. It is useful in answering ‘for all’ queries. Example: Do all the loan officers have the same manager ? A solution: Find the ssn of the person who manages all the loan officers. A = P[banker] (s[b_type=LO] (CUSTOMER) ) B = P[mgr_ssn, e_ssn] (EMPLOYEE) RESULT = B ÷ A B mgr_ssn e_ssn A RESULT 321-32-4321 111-22-3333 banker 111-22-3333 333-11-4444 mgr_ssn ÷ 333-11-4444 111-22-3333 123-45-6789 111-22-3333 123-45-6789 321-32-4321 555-66-8888 888-99-9999 987-65-4321 777-77-7777 888-99-9999 777-77-7777 321-32-4321 Note: for this example, we have to specify that null 777-77-7777 the common divisor in B is e_ssn. Relational Algebra: set theoretic operations, ÷ Generic definition of ÷ Attribute restrictions: A ÷ B is defined only for A( R, C) and B( C), where R, C are sets of attributes. Output: The output contains each ti[R] such that tuples tj[C] B, a tuple, t A in which t[C] = tj[C] and t[R] = ti[R]. common attribute set, C OUTPUT attribute set, R r1… rm r1… rm c1 … ck c1 … ck ÷ t1 … … tn … Relational Algebra: concluding remarks RA provides a formal language to get information from the database RA can potentially answer any query, as long as the query pertains to exactly one row of some table derivable using expressions. Limitations of RA: aggregation and summary information Examples: find the average amount of assets in the branches find the total assets of the bank, … RA is procedural, namely, an expression of RA specifies a step by step procedure for computing the result. Relational Calculus (RC) Background: what is a calculus ? RC is based on a formal system in logic, first order predicate calculus (fopc) A formal system has: a set of symbols; rules about how the symbols can be arranged in well formed formulae (wff) a (logical) mechanism to derive if a wff is true/false. additionally, fopc allows wff with ‘variables’ and quantifiers (, ). A query in RC takes the form: {t | P(t) } Meaning: the set of all tuples, t, for which some Proposition, P(t) is true. P is also called a predicate. Relational Calculus (RC) examples 1. Report the loans that exceed $1200: { t | t LOAN t[amount] > 1200} 2. Find the names of customers who took a loan from the Pennyridge branch. { t[name] | s BORROWS s[customer] = t[ssn] u LOAN u[loan_number] = s[loan_no] u[branch_name] = ‘Pennyridge’} Relational Calculus (RC) remarks RC is non-procedural – any way that the predicate P can be evaluated is valid. RC is the formal basis for Structured Query Language (SQL) SQL is the de facto standard language for all RDBMSs In terms of functionality (i.e. the power to get some information from any DB) RA and RC are equivalent). Namely, any query that can be written in RC has an equivalent RA formula, and vice versa. Advantage of RC (over RA): conceptually, it is better to allow the user to define the logic of the query, but leave the procedure for computing it to the program [why ?]. Bank tables.. BRANCH EMPLOYEE branch_name city assets e_ssn e_name tel start_date mgr_ssn Downtown Brooklyn 9000000 111-22-3333 Jones 12345 Nov-2005 321-32-4321 Redwood Palo Alto 2100000 333-11-4444 Smith 54321 Mar-1998 111-22-3333 Pennyridge Horseneck 1700000 123-45-6789 Lee 54321 Mar-1998 111-22-3333 Mianus Horseneck 400000 555-66-8888 Turner 55555 Aug-2002 321-32-4321 Round Hill Horseneck 8000000 987-65-4321 Jones 87621 Mar-1995 888-99-9999 Pownal Bennington 300000 888-99-9999 Chan 87654 Feb-1980 777-77-7777 North Town Rye 3700000 321-32-4321 Adams 77777 Feb-1990 777-77-7777 Brighton Brooklyn 7100000 777-77-7777 Black 99111 Jan-1980 null CUSTOMER DEPOSIT ssn name street city banker b_type c_ssn ac_num accessDate 111-12-0000 Jones Main Harrison 321-32-4321 CRM 888-12-0000 A101 Jan 1, 09 222-12-0000 Smith North Rye 321-32-4321 CRM 222-12-0000 A215 Feb 1, 09 333-12-0000 Hayes Main Harrison 321-32-4321 CRM 333-12-0000 A102 Feb 28, 09 444-12-0000 Curry North Rye 333-11-4444 LO 555-00-0000 A305 Mar 10, 09 555-12-0000 Turner Putnam Stamford 888-99-9999 DO 888-12-0000 A201 Mar 1, 98 666-12-0000 Williams Nassau Princeton 333-11-4444 LO 111-12-0000 A217 Mar 1, 09 777-12-0000 Adams Spring Pittsfield 123-45-6789 LO 000-12-0000 A101 Feb 25, 09 888-12-0000 Johnson Alma Palo Alto 888-99-9999 DO 999-12-0000 Brooks Senator Brooklyn 123-45-6789 LO BORROWS 000-12-0000 Lindsay Park Pittsfield 888-99-9999 DO customer loan_no 111-12-0000 L17 LOAN loan_number amount branch_name 222-12-0000 L23 L17 1000 Downtown 333-12-0000 L15 L23 2000 Redwood 444-00-0000 L93 L15 1500 Pennyridge 666-12-0000 L17 L93 500 Mianus 111-12-0000 L11 L11 900 Round Hill 999-12-0000 L17 L16 1300 Pennyridge 777-12-0000 L16 References and Further Reading Silberschatz, Korth, Sudarshan, Database Systems Concepts, McGraw Hill Next: SQL and DB API’s