Docstoc

Databases

Document Sample
Databases Powered By Docstoc
					    Relational Databases




1
    Find Databases here…




2
    And here…




3
                    The “Deep Web”

    •   Dynamic pages, generated from databases
    •   Not easily discovered using crawling
    •   Perhaps 400-500 times larger than surface Web
    •   Fastest growing source of new information




4
                                             Deep Web
• 60 Deep Sites Exceed Surface Web by 40 Times
                                                                                                                   Web
                                                Type                             URL
                   Name                                                                                            Size
                                                                                                                  (GBs)

    National Climatic Data Center (NOAA)   Public       http://www.ncdc.noaa.gov/ol/satellite/satelliteresource   366,000
                                                        s.html


    NASA EOSDIS                            Public       http://harp.gsfc.nasa.gov/~imswww/pub/imswelcome/         219,600
                                                        plain.html


    National Oceanographic (combined       Public/Fee   http://www.nodc.noaa.gov/,                                32,940
    with Geophysical) Data Center (NOAA)                http://www.ngdc.noaa.gov/


    Alexa                                  Public       http://www.alexa.com/                                     15,860
                                           (partial)


    Right-to-Know Network (RTK Net)        Public       http://www.rtk.net/                                       14,640



    MP3.com                                Public       http://www.mp3.com/

5
    Content of the Deep Web




6
                    Database Basics
    • What is a database?
       – Collection of data, organized to support access
       – Models some aspects of reality
    • Components of a relational database:
       – Field = an “atomic” unit of data
       – Record = a collection of related fields
       – Table = a collection of related records
          • Each record is one row in the table
          • Each field is one column in the table
       – Primary Key = the field that uniquely identifies a record
       – Database = a collection of tables
7
              Why “Relational”?
    • Databases model some aspects of reality
    • A relational database views the world in
      terms of entities and relations between them




8
           The Registrar Example
    • What do we need to know (i.e., model)?
      – Something about the students (e.g., first name,
        last name, email, department)
      – Something about the courses (e.g., course ID,
        description, enrolled students, grades)
      – Which students are in which courses



9
                                          A First Try
                  Put everything in a big table…

 Student ID   Last Name   First Name   Dept ID   Dept         Course ID   Course name            Grade   email
     1        Arrows      John         EE        EE           lbsc690     Information Technology    90   jarrows@wam
     1        Arrows      John         EE        Elec Engin   ee750       Communication             95   ja_2002@yahoo
     2        Peters      Kathy        HIST      HIST         lbsc690     Informatino Technology    95   kpeters2@wam
     2        Peters      Kathy        HIST      history      hist405     American History          80   kpeters2@wma
     3        Smith       Chris        HIST      history      hist405     American History          90   smith2002@glue
     4        Smith       John         CLIS      Info Sci     lbsc690     Information Technology    98   js03@wam




                     Discussion: Why is this a bad idea?




10
             Good Database Design
     • Save space
       – Save each fact only once
     • More rapid updates
       – Every fact only needs to be updated once
     • More rapid search
       – Finding something once is good enough
     • Avoid inconsistency
       – Changing data once changes it everywhere



11
                          Another Try...
     Student Table
     Student ID       Last Name       First Name    Department ID   email
                  1   Arrows          John          EE              jarrows@wam
                  2   Peters          Kathy         HIST            kpeters2@wam
                  3   Smith           Chris         HIST            smith2002@glue
                  4   Smith           John          CLIS            js03@wam

     Department Table                                  Course Table
     Department ID         Department                   Course ID   Course Name
     EE                    Electrical Engineering       lbsc690     Information Technology
     HIST                  History                      ee750       Communication
     CLIS                  Information Studies          hist405     American History
     Enrollment Table
     Student ID           Course ID      Grade
                      1   lbsc690                     90
                      1   ee750                       95
                      2   lbsc690                     95
                      2   hist405                     80
                      3   hist405                     90
                      4   lbsc690                     98
12
       Approaches to Normalization
     • For simple problems:
       – Start with “binary relationships”: pairs of fields
         that are related
       – Group together wherever possible
       – Add keys where necessary
     • For more complicated problems:
       – Entity relationship modeling (LBSC 670)


13
                     Some Lingo
     • “Primary Key” uniquely identifies a record
       – e.g., student ID in the student table
     • “Foreign Key” is primary key in the other
       table
       – It need not be unique in this table




14
              The Data Model
     Student Table
     Student ID       Last Name       First Name    Department ID   email
                  1   Arrows          John          EE              jarrows@wam
                  2   Peters          Kathy         HIST            kpeters2@wam
                  3   Smith           Chris         HIST            smith2002@glue
                  4   Smith           John          CLIS            js03@wam

     Department Table                                  Course Table
     Department ID         Department                   Course ID   Course Name
     EE                    Electrical Engineering       lbsc690     Information Technology
     HIST                  History                      ee750       Communication
     CLIS                  Information Studies          hist405     American History
     Enrollment Table
     Student ID           Course ID      Grade
                      1   lbsc690                     90
                      1   ee750                       95
                      2   lbsc690                     95
                      2   hist405                     80
                      3   hist405                     90
                      4   lbsc690                     98
15
                                    Project
     Student ID   Last Name   First Name   Dept ID   Department               email
         1        Arrows      John         EE        Electrical Engineering   jarrows@wam
         2        Peters      Kathy        HIST      History                  kpeters2@wam
         3        Smith       Chris        HIST      History                  smith2002@glue
         4        Smith       John         CLIS      Information Stuides      js03@wam




                                   SELECT Student ID, Department

                     Student ID      Department
                          1          Electrical Engineering
                          2          History
                          3          History
                          4          Information Stuides

16
                                    Restrict
      Student ID   Last Name   First Name   Dept ID   Department               email
          1        Arrows      John         EE        Electrical Engineering   jarrows@wam
          2        Peters      Kathy        HIST      History                  kpeters2@wam
          3        Smith       Chris        HIST      History                  smith2002@glue
          4        Smith       John         CLIS      Information Stuides      js03@wam




                         WHERE Department ID = “HIST”


     Student ID    Last Name   First Name Department ID Department             email
          2        Peters      Kathy      HIST          History                kpeters2@wam
          3        Smith       Chris      HIST          History                smith2002@glue



17
                                      Join
     Student Table
     Student ID       Last Name     First Name     Department ID     email
                  1   Arrows        John           EE                jarrows@wam
                  2   Peters        Kathy          HIST              kpeters2@wam
                  3   Smith         Chris          HIST              smith2002@glue
                  4   Smith         John           CLIS              js03@wam

                                                      Department Table
                                                      Department ID      Department
                                                      EE                 Electrical Engineering
                                                      HIST               History
                                                      CLIS               Information Studies



     “Joined” Table
     Student ID       Last Name   First Name     Dept ID   Department               email
         1            Arrows      John           EE        Electrical Engineering   jarrows@wam
         2            Peters      Kathy          HIST      History                  kpeters2@wam
         3            Smith       Chris          HIST      History                  smith2002@glue
         4            Smith       John           CLIS      Information Stuides      js03@wam

18
               Relational Operations
     • Choosing columns: SELECT
        – Based on their label
          SELECT Student ID, Dept WHERE Dept = “History”

     • Choosing rows: WHERE
        – Based on their contents
           department ID = “HIST”

     • Joining tables: JOIN
     • These can be specified together


19
                   Some SQL
     • SQL = Structured Query Language
     • Used in many types of database systems




20
                                Select query
     • SELECT LastName, FirstName from
       StudentTable


     StudentTable

     Student ID   Last Name   First Name   Dept ID   Department               email
         1        Arrows      John         EE        Electrical Engineering   jarrows@wam
         2        Peters      Kathy        HIST      History                  kpeters2@wam
         3        Smith       Chris        HIST      History                  smith2002@glue
         4        Smith       John         CLIS      Information Stuides      js03@wam


21
                  Select with Restriction
     • SELECT LastName, FirstName from
       StudentTable where DeptID = „HIST‟
     • Will return
        – Peters, Kathy
        – Smith, Chris


     Student ID   Last Name   First Name   Dept ID   Department               email
         1        Arrows      John         EE        Electrical Engineering   jarrows@wam
         2        Peters      Kathy        HIST      History                  kpeters2@wam
         3        Smith       Chris        HIST      History                  smith2002@glue
         4        Smith       John         CLIS      Information Stuides      js03@wam


22
              Select with Restriction
     • SELECT StudentID from EnrollmentTable
       where Grade > 81



      Enrollment Table
      Student ID       Course ID   Grade
                   1   lbsc690             90
                   1   ee750               95
                   2   lbsc690             95
                   2   hist405             80
                   3   hist405             90
23                 4   lbsc690             98
                              Select with Join
 SELECT LastName, FirstName from StudentTable
 JOIN EnrollmentTable on
 StudentTable.StudentID =EnrollmentTable.StudentID
 where EnrollmentTable.Grade > 95

 Results:                                        Enrollment Table
                                                Student ID       Course ID   Grade
 Smith, John                                                 1   lbsc690             90
                                                             1   ee750               95
                                                             2   lbsc690             95
                                                             2   hist405             80
                                                             3   hist405             90
Student Table                                                4   lbsc690             98
Student ID       Last Name   First Name   Department ID   email
             1   Arrows      John         EE              jarrows@wam
             2   Peters      Kathy        HIST            kpeters2@wam
             3   Smith       Chris        HIST            smith2002@glue
             4   Smith       John         CLIS            js03@wam
24
                Discussion Point
     • How is a relational database different from
       a spreadsheet?




25

				
DOCUMENT INFO