Principles of Databases by ooh34326

VIEWS: 4 PAGES: 44

More Info
									Principles of Database Design
          James J. Cimino
         NIH Clinical Center
        Outline
• Definition
• Motivation
• History and evolution
• Design principles
• Design methods
• Exercises
• Take-Home Messages
          Database: Definition


• A collection of data that:
  – is organized
  – usually computer-based
  – represents repetitive information implicitly
  – supports retrieval
                 Information

• Content              • Structure
  – Name                 – Field
  – Date                 – Record
  – Diagnosis            – Table
  – Medication           – Database
  – Problem
  – Procedure
  – Visit
Paper Database as Expert System
                 Motivation


• Power and flexibility depend on data model
• Database is the realization of data model
• Evaluation of commercial products
• Communicating with vendors and IT staff
• Building your own databases
In the beginning, there were punch cards…

       1234567 NAME SANDIEGO, CARMEN0001
      1234567 ADDR 123 MAIN STREET 0002
      1234567 DOB   02/01/1948           0003




        1-7         8-22         23-30    31-36

      1234567SANDIEGO,CARMEN123MAIN020148
                 Sequential Files
   •    Stored on magnetic tape
   •    Longer (or shorter) than 80 characters
   •    8-bit characters (256 characters)
   •    Variable-length records
Len=21          Data          Len=16   Data

0211234567SandiegoCarmen0161234568CiminoJim

   • Random access possible
   ID     Loc      ID   Loc
                       12345678901234567890123
1234567000112345680015 SandiegoCarmenCiminoJim
   • Slowwwwww…..
            Random Access Files

•   Disk storage with moving heads
•   Larger capacity (MB!)
•   Addressable records and fields using pointers
•   Indexes created as lists of pointers
•   Separation of physical and logical models
•   Can be difficult to recover if index corrupted
        Random Access Files



        }
        }
        }
        }
        }
        }
0000:   Name1000Addr2000DOB 6000
1000:   Sandiego, Carmen
2000:   123 Main Street
3000:
4000:
5000:
6000:   Feb 1, 1948
   Indexed Files

    0000: 0000
    0004: 4000
    0008: 2000

0000: Cimino, James
1000:
2000: Norton, Cathy
3000:
4000: Lindberg, Don
      Hierarchical Databases

               1234567


Sandiego, Carmen           123 Main Street

                    Labs

      Chem7                    Chem7

  Na136     K 4.3          Na142     K 3.9
         Hierarchical Databases

•   Easy to use
•   Efficient storage
•   “Tree walking” is fast
•   Queries across trees are slow
•   Flexible
•   Too flexible: chaos is allowed
•   Too easy to modify
•   Difficult to document complex structures
       Hierarchical Databases


^EMR(1234567)=“Sandiego, Carmen”
^EMR(1234567, “Address”)=“123 Main Street”
^EMR(1234567, “Chem7”, “2/2/02”, “Na”)=136
^EMR(1234567, “Chem7”, “2/2/02”, “K”)=4.3
^EMR(1234567, “Chem7”, “2/3/02”, “Na”)=142
^EMR(1234567, “Chem7”, “2/3/02”, “K”)=3.9
          Hierarchical Chaos

                   1234567

               Admissions

     Admission 1

Admit Date:   Primary DX:     Other DX
  2/2/02          CHF
                             AODM     A Fib

                             Flag: S Flag: P
             Network Databases


  1234567               Gyn Clinic   2 Main St.

 Sandiego                Secretary   305-2500

Gyn Clinic               Ms Smith    8AM-5PM

                         305-1000     Service
Pap
      Gyn Visit   Dr. Jones          Beeper 34
  Extensible Markup Language (XML)
              Databases
• SGML is a metalanguage
• SGML is used to write Document Type
  Definitions (DTDs) that define languages
• HTML is a language with an SGML DTD
   – Tags are for formatting/presentation syntax
• XML is a proper subset of SGML
• XML defines tags that convey semantics
• We could write “Health Markup Language”
  (“HML”) in XML (if we could agree on the
  semantics and tags)
• Tags may or may not be stored with data
<document>
  <document.id>CXR001</document.id>
  <doc. date>19991101</doc. date>
  <document.type>
      <identifier>P5-00010</identifier>
      <text>Chest X-Ray</text>
  </document.type>
  <document.body>
     <findings>No infiltrate, cardiac shadow
                  not enlarged...</findings>
     <impression>Normal X-ray</impression>
  <document.body>
</document>
<patient>
   <patient.id>
      <id.value>1234789</id.value>
   </patient.id>
   <patient.name>
      <family.name>Sandiego</family.name>
      <given.name>Carmen</given.name>
      <suffix>M.D.</suffix>
   </patient.name>
   <patient.dob>19230113</patient.dob>
   <patient.sex value="male"/>
   <inpatient/>
</patient>
Extensible Markup Language (XML)
            Databases
• Strengths
  – Flexibility to represent wide range of data
  – Data carries its field assignment
  – Sparse data handled compactly
  – Tags can have platform-specific display
• Weaknesses
  – Immature database tools
  – Verbose
  – I/O intensive
  – A trade-off of decreased efficiency for
    increased flexibility; ? scalability
Relational Databases - Features



 •   Tables with columns and rows
 •   Logical vs. physical representation
 •   Multiple indexes
 •   Inter-table relationships
 •   Virtual sequential files (with
     simultaneous update)
                 Relational Databases
   Patient
                                         Lab_test
    Pt-UI     Lname        Fname
   12345       Smith      Elmer           Pt-UI Tname           Date
   12346       Jones      Barbara        12345 Na               5/30/96
   12347       Clark      Arthur         42353 CBC              5/30/96
   12348       Jones      Casey          47756 ESR              5/30/96
   12349       Sample     Steve          12348 HBsAg            5/30/96
                                         34523 Amylase          5/30/96


From table “Patient”, get “Pt_UI” where “Lname”=“Jones” and “Fname”=“Casey”

 and then Get “Tname” and “Date” from table “Lab_test” for the same “Pt_UI”
               Normalization

•   Efficient database organization
•   Eliminate redundant data
•   Ensure data dependencies make sense
•   E.F. Codd, 1970: five normal forms
•   First Normal Form
    – Eliminate duplicative columns
    – Create separate tables for each group of
      related data
    – Identify each row with a unique column
      or set of columns (the primary key).
      Normalization (continued)

• Second normal form
  – Remove subsets of data that apply to
    multiple rows of a table and place them in
    a separate table
  – Create relationships between these new
    tables and their predecessors through the
    use of foreign keys
• Third normal form:
  – Remove columns that are not dependent
    upon the primary key
Relational Databases - Advantages
•   Comprehensible
•   Multiple “views” possible
•   Easy to modify
•   New elements don’t “break” programs
•   Database management systems (DBMS)
    – Referential integrity
    – “Reorg” for efficiency
    – Access control
    – Locking for multiple simultaneous use
Relational Databases - Disadvantages




         • Storage overhead
         • I/O-intense
         • Cost
   Systems Design Fundamentals


• The data model is the most critical aspect
• Data model should reflect real world objects
  and relationships to ensure durability
• A correct data model outlasts applications,
  including many not anticipated at system
  start-up
 System Design: Basic Concepts

• The world contains “things”
• Develop abstractions called “objects”
• Group objects by criteria which represent
  the abstract object as an empty table


 Patient ID   Name      Physician   Phone No.
         Types of Objects (Tables)

•   Tangible things (book, person)
•   Roles (doctor, patient, supervisor)
•   Events (ordering of a lab test)
•   Interactions - bind two or more other objects
    via a transaction (“purchase” relates buyer to
    seller)
                         Objects
• All of the real-world things in the set (the
  “instances”) have the same characteristics
• All instances conform to the same rules
                        LICENSE
Doctor             License       Exp. Date    Specialty
Casey              123 ABC       Jan 2004     Surgery

Kildare            691XKY        Mar 2005     Medicine
Holiday            12-A-962                    Sculpture

         No holes in the table    No strange values
    Basic Concepts (continued)

• Empty tables can be filled in to
  represent the real world things from
  which the object was abstracted
Patient ID     Name       Physician   Phone No.

 3131313     John Smith   Casey       867-5309

 1234567     Ben Casey    Killdare    PA6-9000

 2121212     Mary Jones   Holiday     555-1234
         Basic Concepts (continued)

    • Relationships between objects are
      “attributes” of those objects

Patient ID   Name       Physician   Phone No.




Relationship: “Has-Doc”
 Patient Has-Doc Physician   Physician   Address   Phone
                 Table Notation
   Empty Table form:
                   Patient_Admissions
        Pt_ID Date_Adm Time_Adm Unit        Room

Graphical Form:
     Patient_Admissions   Textual Form:
    * Pt_ID
    * Date_Adm            Patient_Admissions (Pt_ID,
    -Time_Adm             Date_Adm, Time_Adm, Unit, Room)
    -Unit
    -Room
          Formalisms for Tables
• Rule 1: One instance of an object has:
  – exactly one value for each attribute
  – only one element per row-column intersection
  – no repeating groups
  – no true “holes” in table
• Rule 2: Attributes contain no internal structure
          Formalisms for Tables
Not ok:
  Name    Sex-Age   Weight    Glucose1 Glucose 2
  Mary      F-32    133,135      201
   Joe     M-43       190        116      93
   Joe     M-43       190         88

Ok:          Name      Sex      Age
             Mary       F        32
              Joe       M        43

      Name    Weight              Name   Glucose
      Mary     133                Mary     201
      Mary     135                 Joe     116
       Joe     190                 Joe      93
                                   Joe      88
           Formalisms for Tables
• Rule 1: One instance of an object has:
  – exactly one value for each attribute
  – only one element per row-column intersection
  – no repeating groups
  – no true “holes” in table
• Rule 2: Attributes contain no internal structure
• Rule 3: Every attribute should represent a
  characteristic of the entire object, not a
  characteristic of a limited part of the object
          Formalisms for Tables

            Patient Table
Not OK:     *Person Identifier
            -Person Name         Attribute of
            -Date of Birth       encounter, not
            -Date of Admission   patient


      Patient Table         Admission Table
OK:   *Person Identifier    *Admission ID
      -Person Name          -Person Identifier
      -Date of Birth        -Date of Admission
           Formalisms for Tables
• Rule 1: One instance of an object has:
  – exactly one value for each attribute
  – only one element per row-column intersection
  – no repeating groups
  – no true “holes” in table
• Rule 2: Attributes contain no internal structure
• Rule 3: Every attribute should represent a
  characteristic of the entire object, not a
  characteristic of a limited part of the object
           Relationships

• Relationship: an abstraction of an
  association between real world things
   – Patient OCCUPIES Bed
   – Library CONTAINS Books
   – Specimen IS ASSAYED by Lab Method
• Inverse relationships:
   – Bed is OCCUPIED BY Patient
   – Book IS LENT BY Library
              Relationship Types

One-to-One:     Patient       Bed




One-to-Many    Patient         Disease




Many-to-Many    Patient        Doctors
Modeling Many-to-Many Relationships


DRUG                               DRUG MANUFACTURER
*generic name                      * manufacturer name
- other attributes                 - other attributes



                     LICENSE
                     * manufacturer name
                     * generic name
                     - date licensed
Exercise: Devise a Relational Model
       for MEDLINE citations
PMID- 2405204
TI- Medical informatics. An emerging academic discipline and institutional priority.
AB- Information management constitutes a major activity of the health care…\
AD- Department of Radiology, Brigham and Women's Hospital, Boston, MA 02115
AU- Greenes RA
AU- Shortliffe EH
LA- eng
PT- Journal Article
PT- Review
JT- JAMA : the journal of the American Medical Association
SO- JAMA. 1990 Feb 23;263(8):1114-20.
MH- Career Choice
MH- Hospital Information Systems
MH- Information Systems
MH- Medical Informatics/education/organization & administration/*trends
MH- Medical Informatics Applications
MH- National Library of Medicine (U.S.)
MH- Research
MH- Training Support
MH- United States
                               Case Presentation
The patient is a 50 year old, Native American female who present to the emergency room (ER) with the chief
complaint of lip numbness, nausea and chest pain.
The patient was generally well until about one half hour prior to arrival in the ER, while eating dinner at as seafood
restaurant in Rock Harbor, MA. She was finishing a dinner of New England clam chowder, lobster, steamed
clams, and corn on the cob when she noted onset of symptoms. Others in her party ate fish and chips, although
two other people ate the clam chowder; none at the steamers.
She gives a history of hypertension and states that she was getting a "capsule, half green, half blue-green" from
her private doctor. She also reports that she was treated in the past for tuberculosis while she was pregnant, but
doesn't remember what she was treated with or for how long. She reports that she was at another hospital on the
other side of town, where she had a liver biopsy. She reports that he thinks the diagnosis was
"hemachromatosis". The patient reports an allergy to Bufferin.
Physical examination revealed a well-developed, well-nourished diaphoretic female in moderate respiratory
distress. Vital signs showed a pulse of 110, a respiratory rate of 8, an oral temperature of 100.3, and a blood
pressure of 150/100. Examination revealed rales over both lower lung fields. Abdominal exam revealed a tender,
palpable liver edge. Neurologic exam reveals dysarthria, diffuse muscle weakness, and hyperreflexia.
Chem7 (serum): Glucose 100 (70-105) Chem7 (plasma): Glucose 150 (75-110)
CBC: Hgb 15 (12.0-15.8), Hct 45 (42.4-48.0), WBC 11,000 (3,540-9,060), Platelets 145K (165-415K)
A fingerstick blood sugar was 80
Urinalysis showed protein of 1+ and glucose of 0
A blood culture was positive for methicillin-resistant Staphylococcus aureus (MRSA)
ECG - Sinus Rhythm, 74BPM, Axis -30 degrees, ST segment 2mm elevated and
T-waves down in leads I, L, V5 and V6
Chest X-ray Left upper lobe infiltrate, left ventricular hypertrophy
The patient's nurse reported that the patient seemed more worried about who would care for her elderly father if
anything happened to her.
A medical student reviewing the case wonders whether paralytic shellfish poisoning could cause a myocardial
infarction; she decides to do a literature search.
The patient was treated with activated charcoal and stomach lavage , followed by enteric-coated aspirin. Due to
worsening respiratory insufficiency, she was intubated and placed on mechanical ventilation.
         Take-Home Messages


• Data model is the most critical aspect of
  system design and function
• Data models should reflect real world objects
  and their relationships to ensure durability
• A correct data model outlasts applications,
  including many not anticipated at system
  start-up

								
To top