Docstoc

IRU lecture week 4

Document Sample
IRU lecture week 4 Powered By Docstoc
					    Information Retrieval and Use
      Data Analysis & Data Modeling,
       Relational Data Analysis and
          Logical Data Modeling
       Geoff Leese September 2009


1
    Relational Data Analysis
       Captures the detailed knowledge of the
        meaning of the data.
       Ensures that the data is logically easy to
        maintain and extend.
         Data inter-dependencies have been
          identified
         Ambiguities have been resolved.
         Eliminate unnecessary duplication of
          data.
         Forms the data into optimum groups.
         Validates the Logical Data Model (LDM).
2
    Logical Data Modelling

       Basic Rules for converting 3NF to a LDM
           Create an entity type for each data relation
           Mark qualifying foreign keys
           Check compound key relations
           Make foreign/primary key relations




3
    Guidelines for logical
    modelling

     Entity type names are singular
      nouns, descriptive, concise and
      organisation specific.
     Attribute names are unique
      descriptive nouns of standard
      format.
     Relationship names are descriptive,
      precise verb phrases.
4
    Simple Master-Detail
    relationships

       Where a single foreign key of a relation
        corresponds to the primary key of another
        relation
       See next slide for example.




5
    Simple Master-Detail
    relationships




     Shows SINGLE primary key at MASTER entity
     (Organisation) connected to SINGLE foreign key at
     DETAIL entity (Contact people)

6
    Multiple level
    Master-Detail Relationships

     Example:   five entities




7
    Identifying Recursive (Unary)
    Relationships

     Is a relation where a foreign key
      references the same relation.
     Example: Employee
                   Employee-number
                   Employee-name
                   Employee-manager-number


        Employee

8
    Relationships: Student/Module

       At this point we need to identify the data
        items that describe or identify each entity
       Entity attributes are also known as data
        items
       What are the data items associated with
        the following LDS diagram?
                           Is taken by
         Student                         Module
                   Takes



9
 The Student

     Entity Type           Attribute Name      Attribute
     Student               Student Name        Jones
                           Street Address      Leek Road
                           Town                Stoke-on-Trent
                           Post Code           ST4 2DE
                           Telephone           294303


                                 Is taken by
         Student                                Module
                   Takes
10
     The Module

     Entity Type                Attribute Type
           Attribute
     Module               Module Number           CM5111-1
                          Module Name             SSAT
                          Module Leader           A Lecturer
                          Level                   1
                          Cats Points             10
                            Is taken by student
        Student                                    Module
                  Takes


11
     The Data Items


                          Is taken by student
        Student                                  Module
                  Takes
      Student Name                              Module Number
              Street                            Module Name
      Address Town                              Module Leader
              Post                              Level
      Code ST4 2DE                              Cats Points
      Telephone




12
     Identifying occurrences of
     entities
        Each occurrence of an entity must be
         uniquely identified in some way
        Imagine the British Gas data base that
         used only surnames to identify account
         holders
        There would be 100,000 account holders
         called Jones in this country
        Even if we used the given names there
         would still be considerable duplication
        It would be impossible to find the right
         account by name alone
13
           Adding a Primary Key

                         Is taken by student
       Student                                  Module
                 Takes
     Student Number                            Module Number
     Student Name                              Module Name
             Street                            Module Leader
     Address Town                              Level
             Post                              Cats Points
     Code ST4 2DE
     Telephone
                             Primary key added

14
     Relationships: Getting it right

      Is this right?
                         Is taken by student
       Student                                 Module
                 Takes


     The real situation is surely

                         Is taken by student
       Student                                 Module
                 Takes


15
     Putting it right: Intersection
     entity

     We need a link entity - less ambiguity
       Student                          Module

     Student Number
                        Stud/Mod
     Student Name                      Module Number
             Street                    Module Name
     Address Town     Student Number
                                       Module Leader
             Post     Module number
                                       Level
     Code ST4 2DE                      Cats Points
     Telephone

16
 Normalisation - steps

        Start with a set of un-normalised
         tables
          Entity/attribute   list
      Step 1 - remove ambiguity and
       repeating data
      Step 2 - remove shared data




17
 Normalisation - step 1
    Break down ALL attributes into smallest
     meaningful parts
        EG student name becomes student surname,
         student firstname, student title
    Remove REPEATED information to form a
     new table
        EG a course may be composed of MANY
         modules (but assume that each module is only
         on one course!) - so form a MODULE table



18
 Normalisation - step 2

        Remove SHARED data to form new
         tables
          EGmodules may share tutors - so form
          a TUTORS table.




19
 Normalisation

        FIRST NORMAL FORM - a relation
         (table) is in 1NF if it contains atomic
         values and all repeating groups have
         been removed




20
 Normalisation

        SECOND NORMAL FORM - a
         relation(table) is in 2NF if it is in 1NF
         and every non-key attribute is fully
         dependent on the primary key




21
 Normalisation

        THIRD NORMAL FORM - a
         relation(table) is in 3NF if it is in 2NF
         and every non-key attribute is not
         dependent on any other non-key
         attribute




22
 Relational Data Analysis Form
        Validates the LDM against the relations.
        Consists of:
            Unnormalised Form
              – attributes
            First Normal Form (1NF)
            Second Normal Form (2NF)
            Third Normal Form (3NF)
              – Relations
              – Attributes

23
 RDA Form

     Name                            Date
     UNF          1NF   2NF   3NF   Result
     attributes                      relation attribute




24
 Data Dictionary

        lists, for every field in every table
          Tablename
          Fieldname
          Field Type
          Field size (if variable)
          Decimal places (if applicable)
          Description (if required)
          Other significant field properties


25
  Data Dictionary example
Tablename   Fieldname           Fieldtype    Length   Dec      Description
                                                      Places
Students    Student ID          Counter      N/A      N/A
Students    Student firstname   Text         20       N/A      Full firstname(capitalised)
Students    Student other       Text         5        N/A      Other initials, Capitals,
            initials                                           Space separated
Students    Student Surname     Text         25       N/A      Surname, Capitalised
Students    Fee paid            Number       N/A      2        Fee paid
                                (currency)
Students    Date of Birth       Date/Time    N/A      N/A      Input mask Short date,
                                                               format Medium Date
Students    Full Time?          Yes/No       N/A      N/A
Etc




26
    The domain
 Is the “set” of items, and the definition
  thereof to which an attribute belongs
 Define domain once, saves time when
  defining attributes belonging to it.
 For example - Date of Birth, Course
  Start Date and Enrolment Date all
  belong to the DATE domain - data type
  is date/time, format dd/mm/yyyy, non-
  unique, non-null.
27
 Further reading

      Rolland chapters 3 and 4
      Hoffer chapters 10 and 12
      Kendall & Kendall chapter 17




28

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:0
posted:2/16/2012
language:
pages:28