Biocomputing Databases and Perl

Document Sample
Biocomputing Databases and Perl Powered By Docstoc
					                                                                           What is a Database?

                                                  •     A collection of organised data
                                                  •     Data has structure
                            Biocomputing          •     Can be paper-based
                          Databases and Perl      •     More common to talk about
                                                        electronic databases i.e.
                            25th February 2008          computer-based




MSc Bioinformatics 2008                          MSc Bioinformatics 2008




                          Flat File Databases                              Database Concepts

 • A flat-file database is a simple database
                                                       • Flat file example
   that stores all data in a single table         Record
                                                  A single record will
                                                  hold all the data
 • A flat-file database can be stored in a        relating to one unit of
                                                  processing. e.g a track
   text file, such as a tab-delimited file; a     Field
   spreadsheet; or in a database file             A field stores one
   that contains one or more unrelated            item of data for the
                                                  record
   tables.                                        File
                                                  Data file contains a
                                                  number of records


MSc Bioinformatics 2008                                               FatMax
                                                 MSc Bioinformatics 2008       2007. Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License




                                                                                                                                                                          1
                          Why use a database?                                   Database Concepts 2

                                                           Data types (Access specific)
                                                           • There are 5 main data types
                                                                                Text            Characters

                                                                          Numeric                Numbers

                                                                                Date         Various formats

                                                                          Boolean               True/False

                                                                          Counter            Auto-inserted no.


MSc Bioinformatics 2008                               MSc Bioinformatics 2008




                           Flat File Databases                            Problems with flat files

 • Useful for simple lists:                            Redundancy
         – Address book/contact list                   • Large amounts of
                                                         duplicated data
         – CD collection
                                                       • Makes data entry
                                                         slower
 • But many problems                                   • Uses disk space
         – You’ll usually be asked to identify FOUR    • Leads to other
           problems with flat files in the exam so       problems…
           pay attention to the next few slides!


MSc Bioinformatics 2008                               MSc Bioinformatics 2008




                                                                                                                 2
               Problems with flat files 2                    Flat files – more problems
 Reduced data integrity                          Difficult to update
 • More likely to be inaccurate!                         – If entries change all instances have to
                                                           be updated
                                                 Security
                                                         – All users have access to the same set of
                                                           data
 • Due to:                                       Program-Data Dependence
         – Data input errors                             – The user interface (amongst other
         – Inconsistent data entry                         things) is tied to a specific file. To create
                                                           new views of data you would need to
                                                           create new data files
MSc Bioinformatics 2008                         MSc Bioinformatics 2008




 Overcoming problems - RDBMS                                              RDBMS

                                                 • In the example on the previous slide there is
   • The problems associated with flat files
                                                   still a problem in the ORDERS table. How
     can be solved by moving some of the
     data out of the main table and accessing      would you resolve it?
     it when required
   • A relational                                • Relational databases are covered in more
     database (RDBMS)                              detail in another presentation. For now it is
     is a more complex
                                                   only important that you can identify the
     database that
     stores data in                                benefits of RDBMS over flat file systems…
     multiple tables that
     are interrelated.

MSc Bioinformatics 2008                         MSc Bioinformatics 2008




                                                                                                           3
                                   RDBMS                                               RDBMS - benefits

 • In the example on the previous slide there is              •     Reduced redundancy
   still a problem in the ORDERS table. How
                                                              •     Improved data consistency
   would you resolve it?
                                                              •     Improved data integrity
 • Relational databases are covered in more                   •     Better security
   detail in another presentation. For now it is              •     Program-data independence
   only important that you can identify the
   benefits of RDBMS over flat file systems…




MSc Bioinformatics 2008                                      MSc Bioinformatics 2008




             Data in Relational Databases                                       Database Design Levels
 •                is Stored in Tables                             •Conceptual Design / Model
           (also known as Relations)
                                                                          Model of the database design in terms that
   Employees
                                                                          users will understand
              empno       ename     sal   comm
                                                                  •Logical Design / Model
                 7499     ALLEN    1600    300
                 7654     MARTIN   1250    1400                           Description of the design in terms that can be
                 7698     BLAKE    2850            rows,                  directly used to build a database
                 7839      KING    5000            tuples,                (This is called a Relational Model,
                 7844     TURNER   1500     0      records                 if we are building an RDB)
                 7986     STERN    1500                           •Physical Design / Model
                                                                          Additional design descriptions that specify or
                                                                          affect the data representation in physical
                                                                          storage
                     columns, attributes, fields

MSc Bioinformatics 2008                                      MSc Bioinformatics 2008




                                                                                                                           4
                  Database Design Process                                     Why Do Conceptual Design?
             Requirements
                                               Why not just design
                                               the relational model
                                                                           • Faster to create &
                      Conceptual Design          directly from the           draw a conceptual model
                                                  requirements?
                                                                           • Clearer semantic intent
        Conceptual Model                          Why design an
                                                   intermediate            • Easier to understand and reason about
                                                conceptual design?
                      Relational Mapping                                   • Many ways to design a relational model for
                                                                             a given conceptual model
         Relational Model
                                           Actual Design of Tables         • Focus on essential design decisions
                                           in the Database
                                                                           • Allows secondary design decisions to be
                      Physical Mapping                                       deferred

   Physical Model using DDL & DCL


MSc Bioinformatics 2008                                               MSc Bioinformatics 2008




                          Conceptual Modeling                                      Data in Relational Databases
                                                                       •                is Stored in Tables
   • Conceptual Modeling is a way of designing                                   (also known as Relations)
     systems involving collections of tables by                          Employees
     focusing on                                                                    empno       ename     sal   comm
   • Entities                                                                          7499     ALLEN    1600    300

            – an abstraction of tuples                                                 7654     MARTIN   1250    1400
                                                                                       7698     BLAKE    2850            rows,
   • Entity Classes                                                                    7839      KING    5000            tuples,
            – an abstraction of tables                                                 7844     TURNER   1500     0      records
   • Relationships                                                                     7986     STERN    1500
            – between entities in different entity classes
   • Using diagrams called ER diagrams
     (or Entity-Relationship Diagrams)
                                                                                           columns, attributes, fields

MSc Bioinformatics 2008                                               MSc Bioinformatics 2008




                                                                                                                                   5
                                    Tables as Themes                                                       Columns as Attributes
              Employees                                                                                          Employees
          •       empno            ename         sal    comm                          •                              empno   ename    sal    comm

                     7499          ALLEN         1600    300                                                          7499   ALLEN    1600    300

                     7654          MARTIN        1250   1400                                                          7654   MARTIN   1250    1400
                                                                                            Primary Key
                     7698          BLAKE         2850                                      is underlined              7698   BLAKE    2850

                     7839           KING         5000
                                                                 A row represents                                     7839    KING    5000
                                                                 a single Employee                                    7844   TURNER   1500     0
                     7844      TURNER            1500     0                                   Uniquely
                     7986          STERN         1500                                       identifies an             7986   STERN    1500

                                                                                              employee
                     Every table has a theme
                     -- e.g. Employees                                                                         Every column represents an attribute
                     Every row represents an instance of that                                                  related to the theme -- e.g. the name
                     theme -- e.g. a single Employee                                                           or salary of an Employee

MSc Bioinformatics 2008                                                              MSc Bioinformatics 2008




               Rows as Objects/Entities                                                                             Entity Classes
     Employees
          empno           ename            sal   comm
                                                               an Employee Object
             7499         ALLEN        1600       300             empno: 7654             • In conceptual modeling,
             7654         MARTIN       1250      1400             ename: MARTIN
                                                                  sal: 1250
                                                                                            we focus on the entity class
             7698         BLAKE        2850
             7839           KING       5000
                                                                  comm: 1400                which represents the class of entities
             7844         TURNER       1500       0
                                                                                            with the same theme.
             7986         STERN        1500


                                                                                          • In general (but as we will see, not
                 It can be useful to think of each row as
                  It can be useful to think of each row as                                  always), an entity class is
                 an object or entity and the table as a
                  an object or entity and the table as a
                 collection of these entities.
                  collection of these entities.                                             implemented by a table in a relational
                 The columns of the table correspond to
                  The columns of the table correspond to                                    database
                 the instance variables for each object
                  the instance variables for each object

MSc Bioinformatics 2008                                                              MSc Bioinformatics 2008




                                                                                                                                                       6
                     Identify Entity Classes                                                  Modeling Entity Classes
                                                                                  • Visual Conceptual Model (Crow Magnum)
 •          Identify the classes of entities called for to                                                                        Sometimes we don't
            support requirements                                                          Sometimes we include
 •          Think of an entity as                                                           all the attributes                       Employee
         –         a readily identifiable thing
                                                                                                     Employee
 •          A potential entity class should have                                                                                Sometimes we just
                                                                                                                              include the primary key
         1. Multiple instances (i.e. the requirements                                                        empno
            should imply the need for more than one                                                          ename
                                                                                                             sal                      Employee
            entity of that class)                                                                            comm
         2. A clear set of attributes based on the                                                                                            empno
            requirements
         3. A primary key to uniquely identify tuples                              • Textual Conceptual Model (Brief ConText)
                                                                                                     • Employee( empno, ename, sal, comm )


MSc Bioinformatics 2008                                                    MSc Bioinformatics 2008




     ER (Entity-Relationship) Diagrams                                             Entity Classes vs. Attributes
                             (Crow Magnum style)
                                                                                   DON'T
     Depicts a relationship                                                        DO                                     makes
      between Employees                             relationship                                           Employee                       Salary
                                                                                   THIS
          and Depts                               characterization
                                                                                                                 empno                           salary
                                                                                                                 ename
                                         works for
                          Employee                         Dept
                               empno                              deptno                                DO
                               ename                              dname                                 THIS             Employee
                               sal                                                                      INSTEAD
                               comm        Crows Foot
                                                                                                                              empno
                                                                                                                              ename
        • The Crow's foot at Employee means …                                                                                 salary
        • A Dept can have MANY Employees                                              Don't create an entity class for something that really
        • No Crow's foot at Dept, so …                                                 does not need to have a "life of its own", but ought
        • An Employee works for no more than ONE Dept                                   to simply be an attribute of another entity class


MSc Bioinformatics 2008                                                    MSc Bioinformatics 2008




                                                                                                                                                          7
                                                                                                       ER & Instance Diagrams
           Attribute vs. Entity Class Principles
        Employee
                          empno
                          ename
                          deptno
                                            Employee
                                                   empno
                                                   ename


           Reasons for upgrading attributes to entity classes:
           1. Substance: It emphasizes that a department is
                                                                     Dept
                                                                             deptno
                                                                                       ER Diagram




                                                                                       Instance Diagram
                                                                                                                         Employee
                                                                                                                         Entity Class
                                                                                                                                               works for
                                                                                                                                               Relationship
                                                                                                                                                                     Dept
                                                                                                                                                                 Entity Class

                                                                                                                   Corresponds to links between instances of the related classes


                                                                                                                              Shows example instances and the links between them
                                                                                                                                                                                    *
              something that can and should stand in its own right
           2. Extensibility: One might want to add attributes specific                       7499               ALLEN        1600       300
              to a department (e.g. its name, location, etc.)
                                                                                             7654               MARTIN       1250       1400                    10          SALES
           3. Multiplicity: It emphasizes that a department can have
              multiple employees associated with it                                          7844               TURNER       1500        0
           4. Association: It emphasizes that an employees cannot
              have an arbitrary deptno value, but that the employee                           7698              BLAKE        2850
              is associated with a department which has a specific                                                                                              30      ACCOUNTING
              deptno                                                                          7986              STERN        1500
                                                                                                                                                    Links
                                                                                                                Entity Instances                                     Entity Instances
             Note that none of these reasons make sense for Salary
MSc Bioinformatics 2008                                                               MSc Bioinformatics 2008




    Instance Diagrams & Navigation                                                             1:M (1 to Many) Relationships
           7499              ALLEN   1600    300

                                                                                                Child Entity Class                                          Parent Entity Class
           7654             MARTIN   1250   1400           10     SALES

            7844            TURNER   1500     0                                                      Employee                                                    Dept
                                                                                                                                    works for
            7698             BLAKE   2850                                                                          empno                                                deptno
                                                           30   ACCOUNTING
            7986             STERN   1500



         Links are the basis of navigation between instances.                                      an Employee works for (at most) 1 Dept
         We will see later that there are SQL queries which
         effectively find the tuples in one entity class which are                                 a Dept has any number (i.e. M) of Employees
         related to the tuples in another entity class. So we can
         write SQL to find information about
             • The dept associated with an employee
             • All the employees who work for a department


MSc Bioinformatics 2008                                                               MSc Bioinformatics 2008




                                                                                                                                                                                        8
                          Naming Relationships
                                                                                            M:N (Many-to-Many) Relationships
                          It is useful, though not required,
                          to name relationships in ConText
Entity Classes                                                                               Visual Conceptual Model (Crow Magnum)
        Employee( empno, ename, addr )
        Dept( deptno, dname )                                                               •                                  assigned to
                                                                                                                Employee                       Project
                                                 Relationship
                                                    name                 Relationship
                                                                        characterization
Relationships                                                                                                  Each employee may be assigned to
        DeptAssignment: (*) Employee works for Dept
        DeptAssignment                                                                                         a number of projects
                                                                                                               Each project may have
                          Choose a relationship name so that it                                                a number of employees
                      clearly and uniquely identifies the relationship
MSc Bioinformatics 2008                                                                    MSc Bioinformatics 2008




                    M:N Related Instances
                                                                                                1:1 (One-to-One) Relationships
                                             assigned to
           Employee                                                   Project
                          empno                                                 pno
                                                                                                  Visual Conceptual Model (Crow Magnum)

        empno ename                address
                                                                                                 •                               assigned to
           7499           ALLEN      ...                                                                             Desk                       Employee
           7654           MARTIN     ...                        pno         pname …
                                                                 2618           …
           7698           BLAKE      ...

                                                                 2621           …                                    Each employee may be assigned to
           7839            KING      ...                                                                             at most one desk
                                                                 2622           …
           7844           TURNER     ...
                                                                                                                     Each desk may be assigned to
           7986           STERN      ...                                                                             at most one employee



MSc Bioinformatics 2008                                                                    MSc Bioinformatics 2008




                                                                                                                                                           9
                                                                                                                      Choosing Relationship
                      1:1 Related Instances                                                                             Characterizations
                                             assigned to
           Employee                                                  Desk
                                                                                                                         in a                              has
                          empno                                                 deskno           Employee                         Dept         Employee          Dept


           empno ename             address
              7499         ALLEN     ...

                                                             deskno …
              7654        MARTIN     ...
                                                              311           …
              7698         BLAKE     ...                                                                                                 works for
                                                                                                                     Employee                             Dept
                                                              312           …
              7839         KING      ...

              7844        TURNER     ...                      313           …                   •ER diagrams are meant for communicating designs as clearly
              7986         STERN     ...                                                                                  as possible.
                                                                                                   •It is worth taking the time to choose the best possible
                                                                                                                 relationship characterization.

MSc Bioinformatics 2008                                                                    MSc Bioinformatics 2008




 Indicating Relationship Direction                                                               Entity Attributes & Relationships



                                                                                                                                                                     *
                            works
                             for                                contains                                                                  works for
   Employee                                  Dept     Dept                      Employee                               Employee                           Dept

                                                                                                                            empno                           deptno
                            works                                                                   Does NOT                ename                           dname
                             for                               contains                           include deptno            addr
   Employee                                  Dept     Dept                      Employee

                                                                                                • Employee does not contain an attribute that identifies the
                                                                    works                         Dept it is associated with (i.e. deptno).
                            contains                                 for                        • Employees certainly are associated with Depts – but that is
   Employee                                  Dept     Dept                      Employee          represented by the relationship between Employee and Dept.
                                                                                                • A deptno attribute in Employee would be an
                                                                                                  entity attribute (it identifies a Dept entity).
                                                                                                • This would not only be redundant (with the
  • Must indicate the relationship direction if it is not the natural                             works for relationship), but wrong to include
    reading direction (L-to-R, top-to-bottom)                                                     in a Conceptual Model

MSc Bioinformatics 2008                                                                    MSc Bioinformatics 2008




                                                                                                                                                                         10
                            Design Exercise                                                        Entity Classes Needed
     • A university is divided into departments.
     • Each department is made up of faculty
       members.                                                                 • University: MAYBE, but it sounds like it's a singleton
                                                                                  class
     • Each department may have a number of
                                                                                • Department: YES
       degree-granting programs.                                                • Faculty Member: YES
     • Every student may declare a major in a                                   • Student: YES
       single program.                                                          • Person: MAYBE (ignore for now)
                                                                                • (Degree-Granting) Program: YES
     • Every student may be assigned a faculty
                                                                                • Major: MAYBE (but it is simpler to just treat major as
       member as an advisor.                                                      a relationship) between a student and a program
                                                                                • Advisor: MAYBE (but it is simpler to treat advise as a
                                                                                  relationship between Faculty Member and Student.

                          What are the entity classes needed?

MSc Bioinformatics 2008                                                  MSc Bioinformatics 2008




                          Identify Relationships                                                      Design Solution
     • If an instance of one entity class
       needs to                                                                                                   offers
             – know something about                                                       Department                            Program

             – keep track of
                                                                                                    made up of
     • an instance of another entity class                                                                                 majors in

       (or the same entity class)
                                                                                              Faculty            advises
     • The entity classes are related to one                                                  Member
                                                                                                                                 Student

       another

                  Draw the Easy Crow Magnum diagram based on
                 the identified entity classes. All 1:M Relationships!
MSc Bioinformatics 2008                                                  MSc Bioinformatics 2008




                                                                                                                                           11
                          Relationship Exercise                            Relationship Exercise Answer
                                  Bikeshop
          has in stock                                 sells
                                                                                                            Bikeshop
                                                                                   has in stock                                sells
                           Bike                 Bike Model
                                  instance of
                                                                                                     Bike                 Bike Model
                     part of                           used on                                              instance of

                                                                                                part of                         used on
                           Part                 Part Model
                                  instance of
                                                                                                     Part                 Part Model
                                                                                                            instance of
          The model above describes the parts on bikes sold by a
          chain of bike-shops (i.e. there are multiple bike-shops).
                It contains both 1:M and M:N relationships,
                but the Crow's Feet have all been removed.
                     Put them back where they belong!
MSc Bioinformatics 2008                                               MSc Bioinformatics 2008




                    Normalization Problem                                           Answer: What's Wrong …
                                                                           • Entity Class Principles: substance, extensibility,
 • In designing a conceptual model for                                       multiplicity, association argue that there should be a
                                                                             Dept class
   employees with the attributes empno,
                                                                           • Redundancy: deptno & dname
   ename, deptno, dname, addr
                                                                           • Extra Work: If changed name of a department, would
                                                                             have to do it in multiple places
 • What's wrong with just using a single
                                                                           • Anomalies: Could change deptno without changing
   entity class:                                                             dname or vice versa.
 • Employee( empno, ename, deptno,                                         • Too many NULLs: If an employee could be
   dname, addr )?                                                            unassigned, both deptno and dname would be NULL.




MSc Bioinformatics 2008                                               MSc Bioinformatics 2008




                                                                                                                                          12
   Redundancy: deptno                                                          dname                Redundancy and Anomaly
   •
   Employee
                          empno         deptno        dname
                                                                                        • Redundancy can cause anomalies (inconsistencies) if
                           7654     …    30            SALES           …                  modifications are not done carefully
                           7698          30            SALES
                           7839          10      ACCOUNTING                             • Update Anomaly:
                           7844          30            SALES                                –Updating a value in a single cell can make the
                           7986          50           SUPPORT                               database inconsistent
                                                                                        • Insertion Anomaly:
            • Entities with the values for deptno have the same value
                                                                                             –Adding an entity can make the database inconsistent
              for dname (deptno      dname)
                                                                                        • Deletion Anomaly:
            • Including dname in the table is redundant, and the                             –Deleting some information can make the database
              table repeats it unnecessarily.                                                inconsistent or cause unintended loss of information
   Redundancy causes duplicate work
            Suppose the company wants to change deptno 30 to be
             the Sales & Marketing department. That change must
             be made to MULTIPLE employees
MSc Bioinformatics 2008                                                                MSc Bioinformatics 2008




                          Anomaly Examples                                             Simple Conceptual Normal Form
      Employee
                            empno             deptno       dname                                An entity class is in
                             7654         …      30            SALES       …                      Simple Conceptual Normal Form
 •                           7698                30            SALES
                                                                                                if, within each entity class,
                             7839                10      ACCOUNTING
                                                                                                    the determinant of every
                             7844                30            SALES
                                                                                                    SIMPLE, NON-TRIVIAL functional dependency
                             7986                50        SUPPORT
                                                                                                    is a candidate key
            Modification Anomaly:
                    Modify 7654's dname to 'SUPPORT'                                    •Employee( empno, ename, deptno, dname, addr )
                    (without changing its deptno)
                                                                                            – empno is the only candidate key (i.e the only attribute
            Insert Anomaly:                                                                   which uniquely identifies tuples)
                    Insert a new employee with a deptno of 20,                              – but deptno       dname
                    and a dname of 'SUPPORT'
                                                                                            – so this is not in Simple Conceptual Normal Form, since
            Delete Anomaly:                                                                   deptno is the determinant of deptno    dname
                                                                                              but deptno is not a candidate key
                    Delete employee 7986 (it’s the only employee in
                    SUPPORT, and no other table keeps track that
                    dept 50 is SUPPORT)
MSc Bioinformatics 2008                                                                MSc Bioinformatics 2008




                                                                                                                                                        13
     Simple Conceptual Normalization                                                      Result of
       Employee( empno, ename, deptno, dname, addr )                           Simple Conceptual Normalization
            deptno   dname

        • Given an entity class with a                                     • Each
          (simple, non-trivial) functional dependency                        simple conceptual normalization step
          whose determinant is NOT a candidate key
                – Split out a new entity class
                                                                                   – adds one entity class
                – Make the determinant the primary key (or at least a              – adds one 1:M relationship link
                  candidate key) of the new class
                – Move all attributes that depend on it                          Employee
                                                                                                                                        has
                      Employee( empno, ename, addr )                                       empno                     Employee                           Dept
                                                                                           ename
                      Dept( deptno, dname )                                                deptno
                                                                                                                              empno                           deptno
                                                                                                                              ename                           dname
                                                                                           dname                              addr
                                                                                           addr
                    Note: Most books only discuss Normalization at the
                      Relational Design level. However, Conceptual                                          deptno
                    Normalization, though not complete, is an excellent                                      dname
                    way to improve a conceptual design. We'll examine
                    Normalization in much more detail later in the term

MSc Bioinformatics 2008                                                   MSc Bioinformatics 2008




                                                                                                    ER Decomposition (a)
Conceptual Normalization Exercise
                                                                                        Item
                                                                                                                                        has
                                                                                           itemsku                        Item                        Style
   • Assume you have designed an Item entity class with the                                stylecode
                                                                                                        Step
                                                                                                          1
     following attributes                                                                                                     itemsku                    stylecode
                                                                                           stylenam                           size                       stylenam
                                                                                           styledate
   •                           itemsku     e.g. 'FX311B-24M'                                         stylecode                color                      styledate
          stylecode            e.g. '302'                                                  catid
                                                                                                      stylenam,                             Step         catid
          stylenam             e.g. ‘Trouser'                                              catnam     styledate,                             2           catnam
          styledate            e.g. 1992 (when introduced)                                 size          catid
          catid                e.g. 'MU'                                                   color        catnam                                 catid
          catnam               e.g. 'Mens Wear'                                                                                                catnam
          size                 e.g. 36
          color                e.g. ‘cream'
                                                                                                           has                        has
                                                                                        Item                          Style                        Category
   • Find an FD with a non-candidate-key determinant, and use                              itemsku                     stylecode                        catid
     Conceptual Normalization to split out a new entity class.                             size                        stylenam                         catnam
   • Continue doing this with all of the resulting entity classes until                    color                       styledate
     each of them are in Conceptual Normal Form.
                                                                                                    Each simple conceptual normalization step
                                                                                                    • adds one entity class
                                                                                                    • adds one relationship link

MSc Bioinformatics 2008                                                   MSc Bioinformatics 2008




                                                                                                                                                                       14
                          ER Decomposition (b)                                                               Database Design Process
              Item
                                                                has
                 itemsku                       Item                          Category                                        Requirements
                 stylecode                          itemsku                       catid
                 stylenam            Step
                                                    stylecode                                                                   Conceptual Design
                                      1                                           catnam
                 styledate                          stylenam                                                                    & Conceptual Normalization
                                                                      Step
                 catid                              styledate          2
                                catid
                 catnam
                                catnam              size                 stylecode                                         Conceptual Model
                 size                               color                 stylenam,
                 color                                                     styledate
                                                                                                                                Relational Mapping
                                                                                                                                & Relational Normalization
                               has                         has
              Item                          Style                      Category
                 itemsku                     stylecode                         catid
                                                                                                                            Relational Model
                 size                        stylenam                          catnam
                 color                       styledate                                                                          Physical Mapping

                     Each simple conceptual normalization step
                     • adds one entity class                                                                         Physical Model using DDL & DCL
                     • adds one relationship link

MSc Bioinformatics 2008                                                                    MSc Bioinformatics 2008




                                            mySQL

 • Command-line interpreter
                          – UNIX/Mac
 /usr/local/bin/mysql -uhugh -pshhh

                          – Windows
 Start > All Programs > mySQL > mySQL Server 5.0 > mySQL command line client                                  Introduction to SQL




MSc Bioinformatics 2008                                                                    MSc Bioinformatics 2008




                                                                                                                                                             15
                   SQL Database Operations:                                                         SQL
                      Queries & Actions
                                                                • A language for dealing with tables
                                                                         – DML: Data Manipulation Language
                                                                                 • Querying: Extracting data from tables
                                    SQL DB Operations
                                                                                 • Modification: Inserting, updating, deleting rows in
                                                                                   a tables
                     Queries   SQL Queries                               – DDL: Data Definition Language
                                                                                 • Defining new tables, views (VDL), etc.
                                                                                 • Altering & dropping old ones
                               SQL Insert/Update/Delete                  – DCL: Data Control Language
                     Actions   SQL DDL (e.g. create table)
                                                                                 • Security: Access Control
                               SQL DCL (e.g. grant access)
                                                                                 • Transaction Mgt
                                                                                 • Performance (e.g. Indexing (SDL))



MSc Bioinformatics 2008                                      MSc Bioinformatics 2008




                                                                         Table Creation & Insertion
                                                                       mysql>CREATE TABLE Emps (             The default
                                                                              empno int primary key,          starting
                                                                              ename varchar(30),               salary
                                                                              deptno      number(3),
                                                                              sal         number(7,2) DEFAULT 700,
                                                                              comm        number(7,2)
                                                                            )
                                                                                                            Defaults to
                          Creating Tables                              mysql>INSERT INTO Emps                  NULL
                                                                         VALUES( 7499, 'ALLEN', 30 , 1600, 300 )

                                                                       mysql>INSERT INTO Emps( empno, deptno, ename )
                                                                         VALUES( 8614, 30, 'LUPIN' )

                                                                                                                     same result
                                                                       mysql>INSERT INTO Emps
                                                                         VALUES( 8614, 'LUPIN', 30, 700, NULL )


MSc Bioinformatics 2008                                      MSc Bioinformatics 2008




                                                                                                                                         16
                           Check Constraints                                                Table = fingerprint
                                                                                            +----------------------------------+----------------------------------+-------+
                                                                                            | Field                               | Type                                 | Length|
                                                                                            +----------------------------------+----------------------------------+-------+
                                                                                            | fprint_accn                         | varchar()                            | 15    |
                                                                                            | identifier                          | varchar()                            | 15    |
                                                                                            | motifs                              | int2                                 | 2     |
         mysql>CREATE TABLE Emps (                                                          | date                                | date                                 | 4     |
                                                                                            | up_date                  | date                                 | 4     |
                empno int primary key,                                         attributes   | family_title                        | text                                 | var   |
                                                                                            | family_doc                          | varchar()                            | 18    |
                ename varchar2(30),                                                         | category                            | text                                 | var   |
                                                                                            | second_accn                         | varchar()                            | 15    |
                deptno      number(3),                                                      | pseudo                              | bool                                 | 1     |
                                                                                            +----------------------------------+----------------------------------+-------+
                sal         number(7,2) DEFAULT 700
                                   check( sal >= 100 ),
                comm        number(7,2)                                                     Table = motif
                                                                                            +----------------------------------+----------------------------------+-------+
              )                                                                             |Field                                | Type                                 |Length   |
                                                                                            +----------------------------------+----------------------------------+-------+
                                                                                            |fprint_accn                          | varchar()                            | 18      |
                                                                                            | motif                               | int2                                 | 2       |
                                                                                            | repeat                              | varchar()                            | 4       |
                                                                                            | seqn_accn                           | varchar()                            | 15      |
                                                                                            | seqn_fragment                       | varchar()                            | 35      |
                                                                                            | start_position                      | int4                                 | 4       |
                                                                                            | inter_motif_dist                    | int4                                 | 4       |
                                                                                            | initial                             | bool                                 | 1       |
                                                                                            | final                               | bool                                 | 1       |

                                                                                            +----------------------------------+----------------------------------+-------+


MSc Bioinformatics 2008                                                              MSc Bioinformatics 2008




                                                                                                                              NULLs
                            NULLs in Tables
                                                                                      • NULLs are different from other data values
          Emps                                                                                – NULL is not the same as 0
 •                                                                                            – NULL is not the same as an empty string
                empno      ename     deptno    sal    comm

                   7499     ALLEN      30     1600     300                            • NULL is typically used in the following
                   7654    MARTIN      30     1250     1400
                                                                                        situations
                   7698     BLAKE      30     2850              NULL
                   7839     KING       10     5000              value                         – A value is Not Applicable
                   7844    TURNER      30     1500      0                                     – A value is known, but is currently
                   7986     STERN      50     1300                                                 Missing or Not Provided
                     int   varchar     int    decimal decimal
                                                                   Datatypes                  – A value is Unknown
                                                                                      • If a columns allows nulls,
              Each cell in a column has a single value of
                the datatype specified for that column
                                                                                        document what NULL represents!


MSc Bioinformatics 2008                                                              MSc Bioinformatics 2008




                                                                                                                                                                                       17
                                 Primary Key                                      Choosing a Primary Key
     • Primary key of a table
             – a column in the table
                     •         or columns (sometimes more than    • User-provided Id or Name
                          one column is needed)
                                                                            –Provided by user when tuple in created
             – chosen by the database designer
             – that uniquely identifies tuples                    • System-provided Id
     • Requirements                                                         –Provided by an id generator defined in the
             • All values in the column must be unique                      database system.
             • None of the values can be NULL                               –Useful as a surrogate key even if a name or a
                                                                            user-provided key is available.
     • Example
             – empno is the primary key of Emps

MSc Bioinformatics 2008                                          MSc Bioinformatics 2008




                     A little on more keys…                                                Naming Foreign Keys

 • A foreign key is a column or columns                           • Use name of foreign key, with table name included, if
                                                                    necessary:
   whose values are the same as the                                       – e.g. [in Emps] deptno (or deptid)
   primary key of another table. You can
                                                                  • Where that name is ambiguous, replace or combine with
   think of a foreign key as a copy of                              a name that reflects the role in the local table
   primary key from another relational                                    – e.g. [in Emps] use mgr or (better) mgrno
                                                                            to refer to the empno of the employee's
   table. The relationship is made between                                  manager
   two relational tables by matching the
   values of the foreign key in one table
   with the values of the primary key in
   another.
MSc Bioinformatics 2008                                          MSc Bioinformatics 2008




                                                                                                                             18
      Query-Based Create & Insert                                          Altering Column & Tables
                                              Example DDL               mysql>ALTER TABLE Emps ADD (
      mysql>CREATE TABLE RichEmps AS           command                    numkids int not null default 0 );
        SELECT empno, ename                                                Adds column
        FROM Emps
                                                                        mysql>ALTER TABLE Emps MODIFY (
        WHERE sal > 3000
                                                                          sal NUMBER(13,2) DEFAULT 250 );
                                                  Emps
                                                                           Modifies column datatype and/or default
      mysql>INSERT INTO RichEmps
        SELECT empno, ename                                             mysql>ALTER TABLE Emps
        FROM Emps                          empno, ename                   DROP (numkids, sal);
        WHERE (sal > 2000)
                                                                           Drops columns
        AND (sal <= 3000)
        AND (comm > 200)                                                mysql>ALTER TABLE Emps
                                                RichEmps
                                                                          RENAME TO Employees;
                                                                           Renames entire table (can't rename columns)
MSc Bioinformatics 2008                                     MSc Bioinformatics 2008




                           Modifiers                                Removing databases and tables

 • Are applied to attributes                                 • DROP statement (make sure you want
         – NOT NULL (row can’t exist without this              to do this – it doesn’t check!)
           attribute having a value)
         identifier varchar(15) NOT NULL,                                    mysql>DROP TABLE emps;
                                                                             mysql>DROP DATABASE employinfo;

         – DEFAULT (sets the data to the value
           that follows when no data is supplied)
                 pseudo" bool DEFAULT f,



MSc Bioinformatics 2008                                     MSc Bioinformatics 2008




                                                                                                                         19
               SQL Updates and Deletes
      mysql>UPDATE Emps
        SET sal = 2000, comm = 100
        WHERE sal = 0;

      mysql>UPDATE Emps
        SET sal = sal + 500
                                                                                   The SELECT Statement
        WHERE job = 'DEPTMGR‘;

      mysql>DELETE FROM Emps
        WHERE sal = 0;



MSc Bioinformatics 2008                                                 MSc Bioinformatics 2008




        Primary
          Key                        Emps Table                                                          SQL Queries
            empno         ename    job         hiredate     sal comm                                                                 ENAME
            -----         ------   ---------   ---------   ---- ----     mysql>SELECT ename                                          ------
             7369         SMITH    CLERK       17-DEC-80    800            FROM Emps                                                 SMITH
             7499         ALLEN    SALESMAN    20-FEB-81   1600 300        WHERE empno = 7499
             7521         WARD     SALESMAN    22-FEB-81   1250 500                               Note symmetry of lookups
             7566         JONES    DEPTMGR     02-APR-81   2975
             7654         MARTIN   SALESMAN    28-SEP-81   1250 1400     mysql>SELECT empno                                          EMPNO
             7698         BLAKE    DEPTMGR     01-MAY-81   2850            FROM Emps                                                 -----
             7782         CLARK    DEPTMGR     09-JUN-81   2450            WHERE ename = 'ALLEN'                                      6543
             7788         SCOTT    ANALYST     19-APR-87   3000
             7839         KING     PRESIDENT   17-NOV-81   5000
             7844         TURNER   SALESMAN    08-SEP-81   1500     0    mysql>SELECT empno, ename                           EMPNO   ENAME
             7876         ADAMS    CLERK       23-MAY-87   1100            FROM Emps                                         -----   ------
             7900         JAMES    CLERK       03-DEC-81    950            WHERE sal > 2975                                   7902   FORD
             7902         FORD     ANALYST     03-DEC-81   3000            ORDER BY ename                                     7839   KING
             7934         MILLER   CLERK       23-JAN-82   1300                                                               7788   SCOTT
                                                                                                     Query                   Query Result
Emps( empno, ename, job, hiredate, sal, comm )
MSc Bioinformatics 2008                                                 MSc Bioinformatics 2008




                                                                                                                                              20
                          Basic Query Parts

          SELECT empno, ename
                 FROM Emps
                                        3. Projection    *      Naming Query Result Columns
                                                                     SELECT
                                                                          empno AS "Employee Number",
                                                                          ename AS Name
                                                                       FROM Emps
                                                                       WHERE sal > 2000
                 WHERE sal > 2000       1. Restriction
                                                                       ORDER BY ename
                 ORDER BY ename
                                                                                       Employee Number    NAME
                                          2. Ordering                                  ---------------    --------
                                                                                                  7698    BLAKE
                                                                                                  7782    CLARK
                                                                                                  7902    FORD
                                                                                                  7566    JONES
                                                                                                  7839    KING
                                                                                                  7788    SCOTT

MSc Bioinformatics 2008                                      MSc Bioinformatics 2008




                                 OR                                                    Boolean Expressions
                                                                             < less than             <=    less than or equal
     mysql>SELECT empno, ename                                               > greater than          >= greater than or equal
     FROM Emps                                                               = equal                 <>    !=   not equal
     WHERE job IN
       ('CLERK', 'SALESMAN', 'ANALYST')
                                                                         SELECT empno, ename
                                                                           FROM Emps
     mysql>SELECT empno, ename                                             WHERE (sal > 1200)
     FROM Emps                                                             AND (sal <= 1500)
     WHERE job = 'CLERK'
                                                                           AND (comm > 200)
            OR job = 'SALESMAN'
            OR job = 'ANALYST'
                                                                                         What will this generate?

MSc Bioinformatics 2008                                      MSc Bioinformatics 2008




                                                                                                                                21
       Boolean Expressions Answer                                                                                 Nulls in Conditions
                                                                                                                                               SELECT * FROM Emps
                                                                                                                                                WHERE …
               mysql>SELECT empno, ename, sal
                 FROM Emps                                                                                Emps                                  comm IS NOT NULL
                 WHERE (sal > 1200)                                                                            empno    ename   dept   sal    comm
                 AND (sal <= 1500)                                                                             7499    ALLEN     30    1600    300
                                                                                                                                                             comm != 0
                 AND (comm > 200)                                                                              7654    MARTIN    30    1250    1400
                                                                                                               7698    BLAKE     30    2850
                                                                                                               7839     KING     10    5000
                                                                                                               7844    TURNER    30    1500     0            comm = 0
                                     EMPNO    ENAME   SAL                                                      7986    STERN     50    1300
                                     -----    ------ ----
                                      7521    WARD   1250
                                      7654    MARTIN 1250                                     How do we get all employees                           comm IS NULL
                                                                                              except those who definitely
                                                                                              get a commission?


MSc Bioinformatics 2008                                                              MSc Bioinformatics 2008




         Answer: Nulls in Conditions
                     Emps                                       comm IS NOT NULL
                          empno    ename   dept   sal     comm
                          7499    ALLEN      30   1600     300
                                                                         comm != 0
                          7654    MARTIN     30   1250     1400
                          7698    BLAKE      30   2850
                          7839     KING      10   5000
                          7844
                          7986
                                  TURNER
                                  STERN
                                             30
                                             50
                                                  1500
                                                  1300
                                                            0            comm = 0         Performance & Indexing
         How do we get all employees                            comm IS NULL
         except those who definitely
         get a commission?                               SELECT * FROM Emps
                                                          WHERE (comm = 0)
                                                              OR (comm IS NULL)

                                   SELECT * FROM Emps WHERE nvl( comm, 0 ) = 0
MSc Bioinformatics 2008                                                              MSc Bioinformatics 2008




                                                                                                                                                                         22
                          Query Performance                                                 Automatic Indexing
                                                                     mysql>SELECT empno, ename
     mysql>SELECT empno, ename                                             FROM Emps
           FROM Emps                                                       WHERE empno = 8197;
           WHERE sal = 2000;
                                                                     • The database automatically maintains a data structure
                                                                       (called an index) which it uses to look up the location of a
     • Suppose there are 1M employees                                  tuple given its primary key (or any attribute whose values
     • This requires scanning through all 1M of them to find           are known to be unique)
       the ones whose sal = 2000                                     • No need to scan through all 1M employees to find the one
                                                                       whose empno = 8197
     • If the database is stored on a hard disk, with 10
       employees in each disk block, this requires loading           • If the database is stored on a hard disk (including the
                                                                       index), ~4 disk blocks need to be examined to find 8197 in
       100K disk blocks into memory in order to look
                                                                       the index, which identifies the one disk block where
       through them                                                    employee 8197 is stored
                                                                     • Result: Load ~5 blocks instead of 100K blocks!



MSc Bioinformatics 2008                                           MSc Bioinformatics 2008




                          Creating Indexes                                     Index-Based Performance
                                                                       mysql>SELECT empno, ename
                                                                             FROM Emps
   • An index can be explicitly created for any attribute (or                WHERE sal = 2000
     combination of attributes)
                                                                       • Suppose there are 1M employees, but only 1K have a
   mysql>CREATE INDEX Emp_Index_By_Sal                                   salary of 2000
         ON Emps( sal );                                               • If the database is stored on a hard disk (including the sal
                                                                         index), ~20 disk blocks need to be examined to get the
   • Creates an index for the Emps table                                 locations of all 1K employees whose sal = 2000
     based on the sal attribute                                        • If all 1K employees are in different blocks,
                                                                         need to load ~1K blocks (actually 20 + 1K),
                                                                         (which is certainly better than 100K blocks)
   • This makes it possible to rapidly find the location of all        • Even better if some of those employees are stored in the
     the employees who have a specific salary!                           same block.
                                                                       • Even better if you can arrange to have the database store
                                                                         all the employees with the same salary together (this is
                                                                         called clustering)



MSc Bioinformatics 2008                                           MSc Bioinformatics 2008




                                                                                                                                       23
                                           DISTINCT                                         Composite Duplicate Elimination
                                       SELECT deptno FROM Emps                  DEPTNO                       mysql>SELECT DISTINCT deptno, job FROM Emps;
                                                                                ------
                                               Selection of a                       30                               List the distinct jobs within each department
                                              non-unique field                      30
     Emps
      empno               ename     deptno    sal     comm
                                                                                    30               Emps
                                                                       30           10                  empno deptno           job
         7499             ALLEN       30      1600     300
                                                                                    30                                                                        deptno     job
         7654             MARTIN      30      1250     1400                                              7839         10       CLERK
                                                                                    50                   7499         30      ANALYST
                                                                                                                                                               10        CLERK
         7698             BLAKE       30      2850
                                                                 10                                                                                            30       ANALYST
         7839              KING       10      5000                                                       7654         30      ANALYST
                                                                                                                                                               30        CLERK
                                                                                                         7698         30       CLERK
         7844             TURNER      30      1500      0
                                                                                DEPTNO                   7844         30     SALESMAN
                                                                                                                                                               30      SALESMAN
         7986             STERN       50      1300
                                                                      50        ------                   7986         50       CLERK
                                                                                                                                                               50        CLERK
                                                                                    10                                                                         50      SALESMAN
                                                                                                         7214         50       CLERK
                                                                                    30
                                                                                                         7586         50     SALESMAN
                            SELECT DISTINCT deptno FROM Emps                        50
                                                      eliminates duplicates                 Some SQL implementations will                           SELECT DISTINCT deptno,
                                                                                                order the results in the same                         job
                                                                                              way they are grouped, but use                           FROM Emps
MSc Bioinformatics 2008                                                                         ORDER
                                                                                         MSc Bioinformatics 2008 BY to guarantee this                 ORDER BY deptno, job;




                           Aggregate Functions                                                   Aggregate Function Answers
        Emps                                                                                     Emps
         empno             ename     deptno    sal     comm                                       empno            ename    deptno     sal   comm
            7499            ALLEN      30      1600     300                                          7499           ALLEN     30     1600     300
            7654           MARTIN      30      1250     1400                                         7654          MARTIN     30     1250     1400
            7698           BLAKE       30      2850                                                  7698          BLAKE      30     2850
            7839            KING       10      5000                                                  7839           KING      10     5000
            7844           TURNER      30      1500         0                                        7844          TURNER     30     1500      0
            7986           STERN       50      1300                                                  7986          STERN      50     1300

         SELECT            count(*) FROM Emps    6
                                                                                                 SELECT            count(*) FROM Emps    6
         SELECT            count(deptno) FROM Emps    ?
                                                                                                 SELECT            count(deptno) FROM Emps    6
         SELECT            count(DISTINCT deptno) FROM Emps                 ?
                                                                                                 SELECT            count(DISTINCT deptno) FROM Emps              3
         SELECT            count(comm)FROM Emps    ?
                                                                                                 SELECT            count(comm)FROM Emps    3                           Ignore
         SELECT            count(DISTINCT comm)FROM Emps    ?
                                                                                                 SELECT            count(DISTINCT comm)FROM Emps    3                  NULLS
         SELECT            max(comm)FROM Emps    ?                     What are
                                                                          the                    SELECT            max(comm)FROM Emps    1400
         SELECT            sum(comm)FROM Emps    ?
                                                                       answers?                  SELECT            sum(comm)FROM Emps    1700
         SELECT            avg(comm)FROM Emps    ?
                                                                                                 SELECT            avg(comm)FROM Emps    566.666666
         SELECT            round(avg(comm),2)FROM Emps            ?
                                                                                                 SELECT            round(avg(comm),2)FROM Emps  566.67
MSc Bioinformatics 2008                                                                  MSc Bioinformatics 2008




                                                                                                                                                                                  24
                          Distinct Aggregation                                    Aggregate Function Exercise
              How many different departments do
                    employees work in?                             •        Using
                                                                            Emps( empno, ename, deptno, sal, comm )

                                                                   •        Write SQL to determine
• SELECT count(DISTINCT deptno) AS knt
  FROM Emps                                                        •        How many departments have
                                                                            employees whose salary is more than
                                                                            1500?
           Distinct Aggregation can be used with any aggregation
               function, though it is primarily used with count



MSc Bioinformatics 2008                                            MSc Bioinformatics 2008




                 Answers: Aggregate Function                                                         Combining Tables
                          Exercise                                         Emps
                                                                            empno            ename    deptno   sal    comm
                                                                               7499          ALLEN      30     1600    300
  • How many departments have employees                                        7654          MARTIN     30     1250    1400
    whose salary is more than 1500?                                            7698          BLAKE      30     2850
                                                                               7839           KING      10     5000
                                                                               7844          TURNER     30     1500     0
               mysql>SELECT count(DISTINCT                                     7986          STERN      50     1500
                    deptno)FROM Emps
                    WHERE sal > 1500;                                      OtherEmpData                                Suppose we want a query to
                                                                            empno lockerno            deskid
                                                                                                                        combine this information:
                                                                               7499          50221     AAXL
                                                                               7654          73182     BQRF             For each empno in common,
                                                                               7698          61110     TMLB              we want the query result to
                                                                               7839          10579     JYZT
                                                                                                                      include empno, ename, deptno,
                                                                                                                       sal, comm, lockerno & deskid
                                                                               7844          20324     GHHT
                                                                               7986          61181     ARFO             We want to JOIN Emps
                                                                                                                         and OtherEmpData
MSc Bioinformatics 2008                                            MSc Bioinformatics 2008




                                                                                                                                                       25
                              Joined Tables                                                                      Overlapping Matching Columns
                                                                                                               Emps
                                                                                                                 empno                ename     deptno       sal    comm
                                                                                                                                                                                    OtherEmpData
                                                                                                                                                                                    empno lockerno    deskid
                                                                                                                    7499              ALLEN           30     1600    300
                                                                                                                                                                                     7499    50221     AAXL
                                   from Emps                                                                        7654          MARTIN              30     1250    1400
                                                                                                                                                                                     7653    73182     BQRF
                                                                                                                    7698              BLAKE           30     2850
                                                                                                                                                                                     7837    10579     JYZT
            empno         ename    deptno          sal         comm       lockerno deskid                           7839              KING            10     5000
                                                                                                                                                                                     7986    61181     ARFO
               7499       ALLEN       30       1600             300        50221     AAXL                           7844          TURNER              30     1500      0
                                                                                                                                                                                     7698    61110     TMLB
               7654       MARTIN      30       1250             1400       73182     BQRF                           7986              STERN           50     1500

               7698       BLAKE       30       2850                        61110     TMLB                                                                                            Joins the columns
               7839        KING       10       5000                        10579        JYZT                              Emps                  OtherEmpData                         whose names are
               7844       TURNER      30       1500              0         20324     GHHT                                                                                             identical: empno
               7986       STERN       50       1500                        61181     ARFO                                     empno            ename       deptno    sal    comm    lockerno deskid
                                                                                                                                 7499          ALLEN         30     1600      300    50221   AAXL
                                                                                                                                 7698          BLAKE         30     2850             61110   TMLB
                                                                 from OtherEmpData                                               7986          STERN         50     1500             61181   ARFO


                             Emps             OtherEmpData                                                  mysql>SELECT * FROM (Emps NATURAL JOIN
                                                                                                            OtherEmpData)
MSc Bioinformatics 2008                                                                                     MSc Bioinformatics 2008




     Natural Join of 1:M Relationships                                                                          Projected Join of 1:M Relationship
      Emps
             empno ename             addr deptno                              Depts                              empno ename                    addr deptno          dname
                7499
                7654
                           ALLEN
                          MARTIN
                                     ...
                                     …
                                              30
                                              30
                                                                               deptno          dname         •      7499              ALLEN     ...        30         SALES
                                                                                   30            SALES              7654              MARTIN    …          30         SALES
                7698       BLAKE     …        30                                   10          ACCOUNTING           7698              BLAKE     …          30         SALES
                7839       KING      …        10                                                                    7839               KING     …          10       ACCOUNTING
                                                                                   50           SUPPORT
                7844      TURNER     …        30                                                                    7844          TURNER        …          30         SALES
                7986       STERN     …        50                                                                    7986              STERN     …          50        SUPPORT




                           Emps               Depts                                                                                                                                    ENAME    DNAME
                             empno ename                 addr deptno            dname                        To just get each employee's                                               ------   ------
                                                                                                               name & department name, do                                              ALLEN    SALES
                              7499         ALLEN         ...         30            SALES                                                                                               MARTIN   SALES
                              7654         MARTIN        …           30            SALES                     mysql>SELECT ename, dname                                                 BLAKE    SALES
                              7698         BLAKE         …           30            SALES                       FROM (Emps NATURAL JOIN Depts);                                         KING     ACCOUNTING
                              7839          KING         …           10       ACCOUNTING
                                                                                                                                                                                       TURNER   SALES
                                                                                                                                                                                       STERN    SUPPORT
                              7844         TURNER        …           30            SALES
                              7986         STERN         …           50         SUPPORT

MSc Bioinformatics 2008                                                                                     MSc Bioinformatics 2008




                                                                                                                                                                                                               26
     Joins Get Extended Information                                                      Choosing Attribute Names
      Emps
             empno ename            addr deptno       Depts
                7499       ALLEN    ...   30          deptno    dname        • Why use
                7654       MARTIN   …     30            30        SALES
                7698       BLAKE    …     30                                         – Emps( empno, ename, addr, deptno )
                                                        10     ACCOUNTING
                7839        KING    …     10
                                                        50       SUPPORT             – Depts( deptno, dname )
                7844       TURNER   …     30
                7986       STERN    …     50

                                                                             • Why not use
         Emps provides us with the # of KING's dept.                                 – Emps( empno, name, addr, deptno )
         The NATURAL JOIN connect's KING's tuple to the tuple
         of KING's dept, from which we can get the department                        – Depts( deptno, name )
         name:
         mysql>SELECT dname FROM (Emps NATURAL JOIN Depts)
            WHERE ename = 'KING‘;                                                                      What happens if we try to do
                                                                                                      NATURAL JOIN of Emps & Depts?
                                                                               mysql>SELECT ename, dname FROM Emps, Depts WHERE
                                                                               Emps.deptno = Depts.deptid;
MSc Bioinformatics 2008                                                     MSc Bioinformatics 2008




                                                                                Multiple Join Exercise Answer
                          Multiple Join Exercise
                                          Styles       Items
      Categories                                                                                                  Styles             Items
                                          stylecode          itemsku              Categories
               catid                      stylenam           size                                                 stylecode             itemsku
               catnam                     styledate          color                         catid                  stylenam              size
                                          catid              stylecode                     catnam                 styledate             color
                                                                                                                  catid                 stylecode


                                                                                                      List the itemskus of all items in the
                      List the itemskus of all items in the                                               category ‘Womens clothing'
                          category ‘Womens clothing'
                                                                                           SELECT itemsku
                                                                                            FROM (Categories NATURAL JOIN
                                                                                                      (Styles NATURAL JOIN Items))
                                                                                            WHERE catnam = ‘Womens clothing '

                                                                                             It doesn’t matter which order the joins are
                                                                                                                done

MSc Bioinformatics 2008                                                     MSc Bioinformatics 2008




                                                                                                                                                    27
                                Cross Join                                                              Cross Product Joins
                               Suppose we want
                                 to generate all
                                combinations of                                          SELECT * FROM Cats            SELECT * FROM Cats, Dogs
                                   cats & dogs
        Cats                                            Dogs                             catid                         catid     dogid
              catid                                        dogid                         -----                         -----     -----
                                                                                         missy                         missy     rover
              missy                                        rover                         buffy                         missy     spot
                                                                                         puff                          missy     ubu
              buffy                                         spot                                                       missy     duke
                                                                                         SELECT * FROM Dogs            buffy     rover
              puff                                          ubu                                                        buffy     spot
                                                           duke                          dogid
                                                                                         -----
                                                                                                                       buffy
                                                                                                                       buffy
                                                                                                                                 ubu
                                                                                                                                 duke
                                                                                         rover                         puff      rover
       mysql>SELECT * FROM Cats, Dogs;                                                   spot                          puff      spot
                                                                                         ubu                           puff      ubu
       mysql>SELECT * FROM (Cats CROSS JOIN Dogs);                                       duke                          puff      duke


MSc Bioinformatics 2008                                                       MSc Bioinformatics 2008




                              Restricted Joins                                                          Restricting Matches
                                                                                      Cats                     length(catid) =
                                                                                                                                         Dogs
  SELECT * FROM Cats, Dogs                Restricted joins restrict a cross
                                            product with a join condition                    catid              length(dogid)             dogid
  catid
  -----
                      dogid
                      -----
                                           relating the columns from the
                                                  specified tables                          missy                                         rover
  missy               rover
                                                                                            buffy                                          spot
  missy               spot               SELECT * FROM Cats, Dogs
  missy               ubu
                                         WHERE                                              puff                                           ubu
  missy               duke
  buffy               rover                length(catid) =                                                                                duke
  buffy               spot                 length(dogid)
  buffy               ubu                                                       Join conditions can also be thought of
  buffy               duke               catid     dogid            Join          restricting which tuples are matched
  puff                rover              -----     -----          condition
  puff                spot               missy     rover                        In SQL, this can be made explicit by writing
  puff                ubu                buffy     rover
  puff                duke               puff      spot                               SELECT * FROM
                                         puff      duke                                ( Cats JOIN Dogs ON length(catid) = length(dogid) )
                                                                                                          ON (required with plain JOIN)
                                                                                                           describes how they are joined
MSc Bioinformatics 2008                                                       MSc Bioinformatics 2008




                                                                                                                                                  28
                          Qualified Names                                                          Natural Joins
                                                                                       Emps
      • SELECT * FROM                                                                  empno ename            address deptno      Depts
          ( Cats JOIN Dogs                                       Because
                                                                                          7499        ALLEN     ...   30            deptno       dname

            ON length(catid) =                                  deptno is
                                                                null, this
                                                                                          7654
                                                                                          7777
                                                                                                    MARTIN
                                                                                                      RUBY
                                                                                                                …
                                                                                                                …
                                                                                                                      30              30           SALES
                                                                tuple will                                                            10         ACCOUNTING
                length(dogid) )                                not be part
                                                                of any of
                                                                                          7839        KING      …     10
                                                                                                                                      50          SUPPORT
                                                               these joins                7844      TURNER      …     30

      • SELECT * FROM                                                                     7986        STERN     …     50


          ( Cats JOIN Dogs                                   SELECT ename, dname FROM
            ON length(Cats.catid) =                            (Emps JOIN Depts ON Emps.deptno = Depts.deptno)
                length(Dogs.dogid) )                         SELECT ename, dname FROM Emps, Depts
                                                               WHERE Emps.deptno = Depts.deptno
               In SQL, you can explicitly indicate which
                    table an attribute comes from.           SELECT ename, dname FROM (Emps CROSS JOIN Depts)
               Essential if joined tables have attributes      WHERE Emps.deptno = Depts.deptno
                         with the same names                 SELECT ename, dname
                                                               FROM (Emps NATURAL JOIN Depts)

MSc Bioinformatics 2008                                     MSc Bioinformatics 2008




                                                                              Choose Columns to Join
                           Table Aliases
                                                                                                                      Emps
  SELECT ename, dname                                                                                                   empno ename        addr deptno
    FROM Emps, Depts                                          Projs                                                        7499   ALLEN    ...       30
                                                                    pno               pname              pmgr              7654   MARTIN   …         30
    WHERE Emps.deptno = Depts.deptno;
                                                                      2618               Payroll         7499              7698    RUBY    …
  SELECT ename, dname                                                 2621            Bridge Design      7844              7839    KING    …         10

    FROM Emps e, Depts d                                              2622            Update Reqs        7499              7844   TURNER   …         30
                                                                                                                           7986   STERN    …         50
    WHERE e.deptno = d.deptno;

  SELECT ename, dname FROM                                       Suppose we want to know the name of the manager of the
       (Emps e CROSS JOIN Depts d)                               Bridge Design project.
    WHERE e.deptno = d.deptno;                                   Proj's provides us with the empno of the manager of the
                                                                 project.
                                                                 To get more information about that employee, we need to
  SELECT ename, dname FROM                                       look in the Emps table, in particular, in the tuple whose
       (Emps e JOIN Depts d                                      empno matches pmgr is the Proj's table
            ON e.deptno = d.deptno);                             SELECT ename FROM (Projs JOIN Emps ON pmgr = empno)
                                                                  WHERE pname = 'Bridge Design'
MSc Bioinformatics 2008                                     MSc Bioinformatics 2008




                                                                                                                                                              29
                           Join Exercise                                                  Join Exercise Answer
                                                                         • Show the names of the employees in the SALES
                                                                           department
                                                                                           Natural Joins are simplest.
 • Using SQL                                                                              Use them whenever possible
   NATURAL JOIN                                                          • SELECT ename
                                                                           FROM (Emps NATURAL JOIN Depts)
   CROSS JOIN                                                              WHERE dname = 'SALES'
   JOIN … ON                                                                                                                  Join
                                                                         • SELECT ename, dname FROM
    (in that order)                                                          (Emps e CROSS JOIN Depts d)
                                                                                                                            condition
                                                                           WHERE e.deptno = d.deptno
                                                                              AND dname = 'SALES'

 • Show the names of the employees in                                   • SELECT ename, dname FROM
   the SALES department                                                   (Emps e JOIN Depts d ON e.deptno = d.deptno)
                                                                          WHERE dname = 'SALES'
                                                                       dname is unambiguous,             Ordinary
                                                                           so it doesn’t need a
                                                                                  qualifier                   Restriction
MSc Bioinformatics 2008                                         MSc Bioinformatics 2008




                              Equijoins                                                        SemiJoins
                                                                        In a semijoin (of 2 tables)
          SELECT pname, ename                                              • The result columns do NOT come from
            FROM (Projs JOIN Emps ON empno = pmgr)                           BOTH of the tables being joined
                                                                           • One of the tables is just used to restrict
                                                                             tuples from the other table
         The join condition of an equijoin compares columns
              from each of the joined tables for equality                                         Emps
                                                                    •List the names                empno
                                                                                                               Depts
                                                                    of the                         ename

  SELECT ename, dname                                               employees                      job
                                                                                                   mgr
                                                                                                                 deptno
                                                                    whose                          hiredate
                                                                                                                 dname
    FROM (Emps e JOIN Depts d ON e.deptno = d.deptno)               department is                  sal
                                                                                                                 loc

                                                                    in BOSTON                      comm
                                                                                                   deptno
                                                                                                                                BOSTON

                          A natural join is an equijoin which
                             additionally requires that the
                             columns have the same name           SELECT ename FROM (Emps NATURAL JOIN Depts)
                                                                    WHERE loc = 'BOSTON'

MSc Bioinformatics 2008                                         MSc Bioinformatics 2008




                                                                                                                                         30
         Self Comparison with Dates                                                                              Self Comparison cntd.
                          Emps
                          empno
                          ename
                                                            What's the                                                             Emps
                          job                               meaning of                                                             empno
                          mgr
                          hiredate
                                                              this?                                                                ename
                          termdate                                                                                                 job
                          sal                                                                                                      mgr
                                                                                                                                   hiredate
                          comm                               Emps                                                                  termdate
                          deptno
                                                              empno                                                                sal
                                                              ename                                                                comm
           List the names of all                              job                                                                  deptno
           employees who were                                 mgr
         terminated the same day                              hiredate                                                  List the names of all employees
                                                              termdate                                                     who manage themselves!
             they were hired!                                 sal
                                                              comm
     SELECT ename                                             deptno
                                                                                                    SELECT ename FROM Emps WHERE empno = mgr
     FROM Emps
     WHERE
      hiredate = termdate


MSc Bioinformatics 2008                                                                MSc Bioinformatics 2008




                                      Self Join
     • List the name of each employee along with the name                                   Multi-Table Self Join Exercise
       of their manager
                                                                                                                 Emps
                                                                                                                  empno
             Now we                                                                                               ename
                                                                                                                  deptno
              simultaneously                            Join Diagram                                              mgr                             ZipState
              need to consider                                                                                    city
              two different                    Emps e            Emps m                                                                             zip
                                                                                                                  zip                               state
              employees                          empno              empno
                                                 ename              ename      mname
                                                 job                job
             e: the employee                     mgr                mgr
             m: their manager                    hiredate           hiredate
                                                 sal                sal

             and match them
                                                 comm
                                                 deptno
                                                                    comm
                                                                    deptno
                                                                                       • What are the names of employees who have
                                                                                         managers who live in MA?
    SELECT e.ename, m.ename as mname
      FROM (Emps e JOIN Emps m ON e.mgr = m.empno)

                             Pick aliases to be mnemonic & useful:
                                  e for employee, m for manager
MSc Bioinformatics 2008                                                                MSc Bioinformatics 2008




                                                                                                                                                             31
                 Answer: Multi-Table Self Join
                          Exercise
       • What are the names of employees which have managers
                           who live in MA?
                Emps e
                                      Emps m                                           SELECT e.ename
                    empno                                                                FROM Emps e, (Emps m NATURAL JOIN ZipState z)
                    ename              empno
                    deptno             ename      ZipState z                             WHERE e.mgr = m.empno AND z.state = 'MA'
                    mgr                deptno
                                       mgr          zip
                    city                            state                                   Emps e
                    zip                city                                                                       Emps m
                                       zip                           MA                         empno
                                                                                                ename                  empno
                                                                                                deptno                 ename         ZipState z
SELECT e.ename                                                                                  mgr                    deptno
                                                                                                                       mgr            zip
  FROM Emps e, Emps m, ZipState z                                                               city                                  state
                                                                                                zip                    city
  WHERE e.mgr = m.empno                                                                                                zip                           MA
       AND m.zip = z.zip
       AND z.state = 'MA'

                          Use a Natural Join of m and z
MSc Bioinformatics 2008                                                            MSc Bioinformatics 2008




               General Theta Join:                                                                           SQL for Theta Join
            Join not based on Equality                                                  • Show the names of pairs of employees whose salaries are
                                                                                          within $200 of one another
  • Show the names of pairs of employees whose salaries are
    within $200 of one another
                                                                                   Relational Schema                                  Join Diagram
                                                                                                  Emps                   Emps e1                     Emps e2
                                                                                                      empno               empno                       empno
Relational Schema                                 Join Diagram                                        ename      nm1      ename                       ename      nm2
                                                                                                      job                 job                         job
               Emps                   Emps e1                    Emps e2
                                                                                                      mgr                 mgr                         mgr
                   empno               empno                      empno
                                                                             nm2
                                                                                                      hiredate            hiredate      diff ≤ 200    hiredate
                   ename        nm1    ename                      ename                               sal                 sal                         sal
                   job                 job                        job                                 comm                comm                        comm
                   mgr                 mgr                        mgr                                 deptno              deptno                      deptno
                   hiredate            hiredate    diff ≤ 200     hiredate
                   sal                 sal                        sal
                   comm                comm                       comm
                   deptno              deptno                     deptno
                                                                                              SELECT e1.ename AS nm1, e2.ename AS nm2
                                                                                                FROM (Emps e1 JOIN Emps e2
                                                                                                      ON abs(e1.sal - e2.sal) <= 200)


MSc Bioinformatics 2008                                                            MSc Bioinformatics 2008




                                                                                                                                                                       32
                          Simple Join Example                                                    Joins with Duplication
     List the name & department name of every employee                                     List the department name of every employee
                          SELECT ename, dname                                                        SELECT dname
                            FROM Emps NATURAL JOIN Depts                                               FROM Emps NATURAL JOIN Depts
   Emps                                              Depts                    Emps                                                     Depts
          empno ename               address deptno   deptno   dname                  empno ename               address deptno          deptno   dname
              7499         ALLEN      ...   30         30       SALES                    7499         ALLEN      ...   30                  30     SALES
              7654         MARTIN     …     30         10     ACCOUNTING                 7654         MARTIN     …     30                  10   ACCOUNTING
              7698         BLAKE      …     30         40     OPERATIONS                 7698         BLAKE      …     30                  40   OPERATIONS
              7839          KING      …     10                                           7839          KING      …     10
                                                       50      SUPPORT                                                                     50    SUPPORT
              7844         TURNER     …     30                                           7844         TURNER     …     30
              7986         STERN      …     50       ENAME    DNAME                      7986         STERN      …     50                   DNAME
                                                     ------   ------                                                                        ------
                                                     ALLEN    SALES                                                                         SALES
                                                     MARTIN   SALES                                                                         SALES
            Suppose we just select                   BLAKE    SALES                                                                         SALES
            the department name?                     KING     ACCOUNTING                                                                    ACCOUNTING
                                                     TURNER   SALES                                                                         SALES
                                                     STERN    SUPPORT                                                                       SUPPORT

MSc Bioinformatics 2008                                                    MSc Bioinformatics 2008




                            Joins with Distinct                                     Join with Counting Problem
        List the name of departments that have employees
                                                                                                            •How many different
                          SELECT DISTINCT dname
                            FROM Emps NATURAL JOIN Depts
                                                                                                          locations have analysts?
   Emps                                              Depts                                                       Relational Schema
          empno ename               address deptno   deptno   dname
              7499         ALLEN      ...   30         30       SALES
                                                                                                                                Emps
              7654         MARTIN     …     30                                                                                  empno
                                                       10     ACCOUNTING
                                                                                                                                ename
              7698         BLAKE      …     30         40     OPERATIONS                                                        job
              7839          KING      …     10                                                                                  mgr
                                                       50      SUPPORT                                    Depts
                                                                                                                                hiredate
              7844         TURNER     …     30
                                                                                                              deptno            sal
              7986         STERN      …     50                                                                dname             comm
                                                                                                              loc               deptno
                                                       DNAME
                                                       ------
                                                       SALES
                                                       ACCOUNTING
                                                       SUPPORT


MSc Bioinformatics 2008                                                    MSc Bioinformatics 2008




                                                                                                                                                             33
                           Join with Counting                                                                          Join with Grouping
                          • How many locations have analysts?                                      • List the average salary of each department by name



     Relational Schema                                      Join Diagram                                Relational Schema                                  Join Diagram
                           Emps                                      Emps                                                  Emps                                         Emps
                           empno                                        empno
                                                                                                                               empno                                    empno
                           ename                                        ename
                                                                                                                               ename                                    ename
                           job                                          job
                                                                                                                               job                                      job
     Depts                 mgr                             Depts        mgr
                                                                                                                               mgr                    Depts             mgr
                           hiredate                                     hiredate   ANALYST                Depts                                                         hiredate
                                                            deptno                                                             hiredate
     deptno                sal                                          sal                                                                            deptno
                                                            dname                                         deptno               sal                                      sal        avg by
     dname                 comm                                         comm                                                                           dname            comm       dname
                                              distinct #    loc                                           dname                comm
     loc                   deptno                                       deptno                                                                         loc              deptno
                                                                                                          loc                  deptno




                    SELECT count(DISTINCT loc)                                                            SELECT dname, avg(sal) AS avgsal
                      FROM (Emps NATURAL JOIN Depts)                                                        FROM (Emps NATURAL JOIN Depts)
                      WHERE job = 'ANALYST'                                                                 GROUP BY dname


MSc Bioinformatics 2008                                                                      MSc Bioinformatics 2008




               SemiJoins are Restrictive                                                                                Joins Do Matching
     CoolEmps                                                                                     CoolEmps
            empno                                                                                        empno
             7782
                             Given Emps, a table of employees, and                                        7782
                               CoolEmps, a list of the cool employees,                                                            If CoolEmps can contain employees who
             7698              what are the jobs of the cool employees                                    7401                       are no longer with the company,
                                                                                                          7698                    the join shows information only about
                                         Restricts Emps information to just                                                          employees who are in both tables
                                          those employees in CoolEmps                                     8315

       Emps
            empno         ename        job        deptno
             7782         CLARK       CLERK         10
                                                                EMPNO    JOB                        Emps
             7369         SMITH       CLERK         20          -----    ---------                       empno         ename        job      deptno
             7499         ALLEN       SALESMAN      30           7782    CLERK                            7782         CLARK      CLERK       10
                                                                                                                                                              EMPNO   JOB
             7698         BLAKE       MANAGER       30
                                                                 7698    MANAGER                          7369         SMITH      CLERK       20              -----   ---------
                                                                                                          7499         ALLEN      SALESMAN    30               7782   CLERK
                                                                                                                                                               7698   MANAGER
                          SELECT empno, job FROM                                                          7698         BLAKE      MANAGER     30
                            Emps NATURAL JOIN CoolEmps
MSc Bioinformatics 2008                                                                      MSc Bioinformatics 2008




                                                                                                                                                                                            34
    Can We Make Joins Inclusive?                                                                       Left Joins are Inclusive
     CoolEmps                      Suppose we want our result to include a
                                                                                         CoolEmps                              Include all tuples in the left operand
            empno                                                                               empno
                                     tuple for every employee in CoolEmps,
             7782                    whether they still are employed or not.                     7782

             7401                  If they are still employed, we still want to                  7401
                                      show their job.                                                                 What are the jobs of the cool employees
             7698                                                                                7698
                                                                                                                       (showing jobs of NULL for those who
             8315                  If they aren't still employed, they don't have                8315                  are no longer with the company)?
                                      a job, so just show NULL for their job
                                   How do we generate this?
                                                                                           Emps                                                 EMPNO    JOB
                                                                                                empno         ename     job        deptno
       Emps                                                                                      7782         CLARK    CLERK         10
                                                                                                                                                -----    ---------
            empno         ename     job       deptno     EMPNO   JOB                                                                             7782    CLERK
             7782         CLARK    CLERK       10        -----   ---------                       7369         SMITH    CLERK         20
                                                                                                                                                 7401
             7369         SMITH    CLERK       20
                                                          7782   CLERK                           7499         ALLEN    SALESMAN      30          7698    MANAGER
                                                          7401                                                                                   8315
             7499         ALLEN    SALESMAN    30                                                7698         BLAKE    MANAGER       30
                                                          7698   MANAGER
             7698         BLAKE    MANAGER     30         8315
                                                                                                              SELECT empno, job FROM
                                                                                                                CoolEmps NATURAL LEFT JOIN Emps
MSc Bioinformatics 2008                                                             MSc Bioinformatics 2008




                    Rewrite as a Right Join                                                                           Types of Joins
                                                                                       Inner Join - when joining A & B, a row in A has a
     CoolEmps                                                                                 corresponding row in the result only if it matches some
            empno                     Include all tuples in the right operand
             7782
                                                                                              row in B (& vice versa)
             7401
                                  What are the jobs of the cool employees              Outer Joins:
             7698
                                   (showing jobs of NULL for those who
             8315                  are no longer with the company)?                    • LEFT JOIN - join A & B, but every row in A has at
                                                                                              least one corresponding row in the result
       Emps                                                                            • RIGHT JOIN - join A & B, but every row in B has at
                                                         EMPNO   JOB
            empno         ename     job       deptno
                                                         -----   ---------                    least one corresponding row in the result
             7782         CLARK    CLERK       10

             7369         SMITH    CLERK       20
                                                          7782   CLERK                 • FULL JOIN - join A & B, but every row in A and every
                                                          7401                                row in B has at least one corresponding row in the result
             7499         ALLEN    SALESMAN    30         7698   MANAGER
             7698         BLAKE    MANAGER     30
                                                          8315

                          SELECT empno, job FROM
                            Emps NATURAL RIGHT JOIN CoolEmps
MSc Bioinformatics 2008                                                             MSc Bioinformatics 2008




                                                                                                                                                                        35
        Including all Children in 1:M                                                                      1:M Child Outer Joins
                                                                                               Depts                             Emps
                                                                                                  deptno     dname        …
                                                                                                                                 empno     ename     … deptno
  • Show the names and department names of all                                                       10    ACCOUNTING     …       7782     CLARK     …   10
    employees.                                                                                       20    RESEARCH       …       7499     ALLEN     …   30
  • Include employees who are unassigned.                                                            30    SALES          …       8214     JOJO      …          *
           Depts                                Emps
                                                                                                     40    OPERATIONS     …       7698     BLAKE     …   30
              deptno        dname      …
                                                  empno   ename   … deptno
                 10       ACCOUNTING   …          7782    CLARK   …   10
                 20       RESEARCH     …
                                                                                             •Show the names and                   ENAME     DNAME
                                                  7499    ALLEN   …   30                     department names of
                 30       SALES        …                                                         all employees.                    -----     ----------
                                                  8214    JOJO    …          *              Include employees who                  CLARK     ACCOUNTING
                 40       OPERATIONS   …                                                        are unassigned.                    ALLEN     SALES
                                                  7698    BLAKE   …   30
                                                                                                                                   JOJO
                                                                                                                                   BLAKE     SALES
                              included in inner join
                           * not included in inner join,                                   SELECT ename, dname
                                                                                             FROM Depts NATURAL RIGHT JOIN Emps
                             but wanted in the result set

MSc Bioinformatics 2008                                                          MSc Bioinformatics 2008




         Including all Parents in 1:M                                                                         1:M Parent Outer Joins
                                                                                               Depts                             Emps
                                                                                                  deptno     dname        …      empno     ename     … deptno
                                                                                                     10    ACCOUNTING     …
  • Show names and department names of the employees                                                                              7782     CLARK     …   10
                                                                                                     20    RESEARCH       …
    in each department.                                                                       *                                   7499     ALLEN     …   30
                                                                                                     30    SALES          …
  • Include departments with no employees.                                                                                        8214     JOJO      …

                                                                                              *      40    OPERATIONS     …       7698     BLAKE     …   30
           Depts                                Emps
              deptno        dname      …          empno   ename   … deptno
                                                                                                    DNAME               ENAME
                 10       ACCOUNTING   …          7782    CLARK   …   10                            ------------        ------
                 20       RESEARCH     …
                                                                                                                                            Show names and
          *                                       7499    ALLEN   …   30                            ACCOUNTING          CLARK
                                                                                                                                         department names of
                                                                                                    RESEARCH
                 30       SALES        …          8214    JOJO    …                                                                          the employees
                                                                                                    SALES               ALLEN             in each department.
          *      40       OPERATIONS   …          7698    BLAKE   …   30                            SALES               BLAKE            Include departments
                                                                                                    OPERATIONS                            with no employees.
                              included in inner join
                           * not included in inner join,                                            SELECT dname, ename
                             but wanted in the result set                                             FROM Depts NATURAL LEFT JOIN Emps

MSc Bioinformatics 2008                                                          MSc Bioinformatics 2008




                                                                                                                                                                    36
                    Including All Parents &                                                                  1:M Full Join
                           Children                                                      Depts
                                                                                            deptno     dname      …
                                                                                                                             Emps
                                                                                                                              empno   ename   … deptno
                                                                                               10    ACCOUNTING   …            7782   CLARK   …   10
  • Show the names and department names of all
                                                                                        *      20    RESEARCH     …            7499   ALLEN   …   30
    employees in all departments,                                                              30    SALES        …            8214   JOJO    …          *
  • including employees who are unassigned,                                             *      40    OPERATIONS   …            7698   BLAKE   …   30
  • and
  • including departments with no employees                                                                  DNAME        ENAME
           Depts                           Emps
                                                                                                             ------------ ------
                                                                                                                          JOJO
              deptno        dname      …
                                            empno   ename   … deptno                                         ACCOUNTING   CLARK
                 10       ACCOUNTING   …    7782    CLARK   …   10                                           RESEARCH
                                                                                                             SALES        ALLEN
          *      20       RESEARCH     …    7499    ALLEN   …   30
                                                                                                             SALES        BLAKE
                 30       SALES        …    8214    JOJO    …          *                                     OPERATIONS
          *      40       OPERATIONS   …    7698    BLAKE   …   30
                                                                                               SELECT dname, ename
                                                                                                 FROM Depts NATURAL FULL JOIN Emps

MSc Bioinformatics 2008                                                    MSc Bioinformatics 2008




   Outer Join Conditions vs. Restrictions?                                     Outer Join Conditions vs. Restrictions
           For each budgeted project, list its name, and the
             name of its manager (show NULL if there is no                       For each budgeted project, list its name, and the
             manager)                                                              name of its manager (show NULL if there is no
                                                                                   manager)

                                                                                           SELECT pname, ename
            SELECT pname, ename                                                              FROM (Projs LEFT JOIN Emps ON pmgr = empno)
              FROM (Projs LEFT JOIN Emps ON pmgr = empno)
                                                                                             WHERE budget IS NOT NULL
              WHERE budget IS NOT NULL

                                                                                   For each project, list its name. Show the name of
            What is the meaning of this?                                             the manager if there is one and if the project is
                                                                                     budgeted.

                      SELECT pname, ename
                        FROM (Projs LEFT JOIN Emps                                          SELECT pname, ename
                                ON (pmgr = empno) AND                                         FROM (Projs LEFT JOIN Emps
                                    (budget IS NOT NULL))                                             ON (pmgr = empno) AND
                                                                                                          (budget IS NOT NULL))

MSc Bioinformatics 2008                                                    MSc Bioinformatics 2008




                                                                                                                                                             37
                                                                 Querying databases using Perl
                                                        #! /usr/bin/perl -w
                                                        # intro6.pl - connect to MySQL, retrieve data, write plain text output
                                                        use strict;
                                                        use DBI;

                                                        my $user = ‘george’;


                          Perl and Databases
                                                        my $pass = ‘george’;


                                                        my ($dbh, $sth, $count); $dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb",
                                                            "webdev", "webdevpass", {PrintError => 0, RaiseError => 1});
                                                        $sth = $dbh->prepare ("SELECT name, wins, losses FROM teams");
                                                        $sth->execute ();
                                                        $count = 0;
                                                        while (my @val = $sth->fetchrow_array ()) {
                                                        printf "name = %s, wins = %d, losses = %d\n", $val[0], $val[1], $val[2]; ++$count;
                                                        push(@allval,@val);
                                                        }
                                                        print "count rows total\n";
                                                        $sth->finish ();
                                                        $dbh->disconnect ();
                                                        exit (0);




MSc Bioinformatics 2008                                MSc Bioinformatics 2008




                                                                                 Perl example…
                                                        use DBI;
                                                        my $dbh = DBI->connect('DBI:MySql:payroll') or die "Couldn't connect to database:
                                                           " . DBI->errstr;
 %./intro6.pl                                           my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?') or die
 name = Fargo-Moorhead Twins, wins = 36, losses = 16       "Couldn't prepare statement: " . $dbh->errstr;
                                                        print "Enter name> ";
 name = Winnipeg Maroons, wins = 24, losses = 26        while ($lastname = <>) {
                                                        # Read input from the user my @data;
 name = Minot Why Nots, wins = 19, losses = 23          chomp $lastname;
 name = Warren Wanderers, wins = 16, losses = 30        $sth->execute($lastname)
                                                        # Execute the query or die "Couldn't execute statement: " . $sth->errstr;
 4 rows total                                           # Read the matching records and print them out
                                                        while (@data = $sth->fetchrow_array()) {
                                                                my $firstname = $data[1];
                                                                my $id = $data[2];
                                                                print "\t$id: $firstname $lastname\n";
                                                        }
                                                        if ($sth->rows == 0) {
                                                                print "No names matched `$lastname'.\n\n";
                                                         }
                                                        $sth->finish;
                                                        print "\n"; print "Enter name> "; }
                                                        $dbh->disconnect;



MSc Bioinformatics 2008                                MSc Bioinformatics 2008




                                                                                                                                                   38
                                                                                           More about data retrieval with
          Query databases over the Web
                                                                                                     DBI.pm
 #! /usr/bin/perl -w # intro7.pl - connect to MySQL, retrieve data, write HTML
    output
                                                                                  • fetchrow_arrayref
 use strict;
 use DBI;
 use CGI qw(:standard);
 my ($dbh, $sth, $count);
 $dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb", "webdev",        $sth = $dbh->prepare ("SELECT name, wins, losses
    "webdevpass", {PrintError => 0, RaiseError => 1});
 $sth = $dbh->prepare ("SELECT name, wins, losses FROM teams");                     FROM teams");
 $sth->execute ();
 print header(), start_html ("team data");                                        $sth->execute ();
 $count = 0;
 while (my @val = $sth->fetchrow_array ()) {
                                                                                  while (my $ref = $sth->fetchrow_arrayref ()) {
    print p (sprintf ("name = %s, wins = %d, losses = %d\n", $val[0], $val[1],
    $val[2]));                                                                      printf "name = %s, wins = %d, losses = %d\n",
    ++$count;                                                                       $ref->[0], $ref->[1], $ref->[2];
 }
 print p ("$count rows total"), end_html ();                                      }
 $sth->finish ();
 $dbh->disconnect ();                                                             $sth->finish ();
 exit (0);




MSc Bioinformatics 2008                                                          MSc Bioinformatics 2008




      #!/usr/bin/perl
                                                                                                Higher retrieval methods
      use DBI;

      my    $db = “sequence”;
                                                                                  • selectrow_array()
      my
      my
            $server = ‘localhost’;
            $user = ‘root’;                                                               – to retrieve a single row or a single
      my    $passwd = ‘passwd’;
                                                                                            column value
      my $dbConnection = DBI->connect(“dbi::mysql:$db:$server”,$user,$passwd)
      my $query = “show tables”;                                                  • selectcol_arrayref()
      my $sql = $dbConnection->prepare($query);
      $sql ->execute();                                                                   – returns the first column of a result set,
                                                                                            as a reference to an array of values.
      while(my $row = $sql->fetchrow_arrayref){
         print join(“\t”,@$row),”\n”;
      }
      $dbConnection->disconnect;
      exit;
                                                                                  • selectall_arrayref()
                                                                                          – retrieves the entire result set as a
                                                                                            matrix and returns a reference to it

MSc Bioinformatics 2008                                                          MSc Bioinformatics 2008




                                                                                                                                        39
   Inserting data in the database                               Inserting data in the database
 sub new_employee {                                           sub new_employee {
                                                              # Arguments: database handle; first and last names of new employee; #
 # Arguments: database handle; first and last names of           department ID number for new employee's work assignment
   new employee; # department ID number for new               my ($dbh, $first, $last, $department) = @_;
   employee's work assignment                                 my ($insert_handle, $update_handle);
 my ($dbh, $first, $last, $department) = @_;                  my $insert_handle = $dbh->prepare_cached('INSERT INTO employees VALUES
 my ($insert_handle, $update_handle);                            (?,?,?)');
                                                              my $update_handle = $dbh->prepare_cached('UPDATE departments SET
 my $insert_handle = $dbh->prepare_cached('INSERT INTO           num_members = num_members + 1 WHERE id = ?');
   employees VALUES (?,?,?)');                                die "Couldn't prepare queries; aborting" unless defined $insert_handle
 my $update_handle = $dbh->prepare_cached('UPDATE                && defined $update_handle;
   departments SET num_members = num_members + 1 WHERE        my $success = 1;
   id = ?');                                                  $success &&= $insert_handle->execute($first, $last, $department);
                                                                 $success &&= $update_handle->execute($department);
 die "Couldn't prepare queries; aborting" unless defined      my $result = ($success ? $dbh->commit : $dbh->rollback); unless
   $insert_handle && defined $update_handle;                     ($result) { die "Couldn't finish transaction: " . $dbh->errstr }
                                                                 return $success;
 $insert_handle->execute($first, $last, $department) or       }
   return 0; $update_handle->execute($department) or
   return 0; return 1; # Success
 }


MSc Bioinformatics 2008                                      MSc Bioinformatics 2008




                                                                                         Fingerprint

 • A short cut for DELETING, UPDATING
   and INSERTING data use the do
   statement
 $dbh->do('DELETE FROM people WHERE age > 65');            Sequence




                                                                                         Motif




MSc Bioinformatics 2008                                      MSc Bioinformatics 2008




                                                                                                                                       40
              Entity-relationship model                     Relational logical model




MSc Bioinformatics 2008                   MSc Bioinformatics 2008




MSc Bioinformatics 2008                   MSc Bioinformatics 2008          Diagram taken from Jane Mabey




                                                                                                           41
                          Diagram taken from Jane Mabey
MSc Bioinformatics 2008




                                                          42