         Introduction to Information
              Systems Analysis
   Data, Process, and Network Modeling


                    INFO 503
                   Glenn Booker

INFO 503              Lecture #4         1
             Data Modeling
• Data modeling (or database or information
  modeling) is a way of organizing and
  describing the data in a system
• It is a logical model to describe the specific
  data fields (elements) we wish to capture,
  and how they are related to each other


             Where to start?
• Data modeling starts with thinking about
  the things involved in your system
• These things are formally called “entities” –
  nouns, if you will
• Start by identifying all of the places,
  people, events, and ideas which are
  affected by your system

     Permanent vs. Transient Data
• A key for relational data modeling is that
  we are primarily concerned with data we
  need to keep permanently
• Data which is only needed briefly isn't
  modeled in an ERD
     – Major difference between relational and
       object-oriented analysis

           Characterize Entities
• Then examine each entity and determine the
  attributes which you are interested in – what
  do you need to know in order to describe
  one such entity meaningfully?
• Consider if some attributes can be readily
  grouped together, thereby forming
  compound attributes (e.g. name)

            Characterize Entities
• Entities are generally one of two types:
     – A set of data you want to keep permanently
       (customer orders, product information, etc.), or
     – A lookup list or table (types of status codes,
       shipping rates, tax rates, etc.)
• Data which is transient is generally kept in
  local variables, and doesn't appear in an
  ERD (e.g. change of address info)
             Keep it or not?
• In trying to decide if data needs to be kept,
  consider whether someone might want
  to analyze that data in the future
• For example, to look for sales patterns,
  trace relocation history, or keep a record of
  data changes (who modified what data
  and when?)
• When in doubt, keep it for now

              Characterize Attributes
• For each attribute, define its data type:
     –     Text (“Fred”) [and the character set (Latin)]
     –     Number (real (3.56) or integer (124))
     –     Date and/or time
     –     Yes/No (a.k.a. T/F, binary, or Boolean)
     –     A fixed set of possible values (e.g. grades)
     –     Multimedia: photos, drawings, movies, sounds

    Relevant Data Type Standards
• Character sets
     – ISO/IEC 8859
     – Unicode
• Representation of dates and times
     – ISO 8601
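
As a small illustration of these standards, the sketch below uses Python's standard library to write a date and a timestamp in ISO 8601 form and to store the same text in two character sets. The variable names and sample values are made up for the example.

```python
from datetime import date, datetime, timezone

# Dates and times written in ISO 8601 form
enrolled_on = date(2024, 3, 5)
modified_at = datetime(2024, 3, 5, 14, 30, 0, tzinfo=timezone.utc)
date_text = enrolled_on.isoformat()        # year-month-day
timestamp_text = modified_at.isoformat()   # date, "T", time, and UTC offset

# Parsing the text gives back an equal value
round_trip_ok = datetime.fromisoformat(timestamp_text) == modified_at

# The same text stored in two character sets
name = "José"
latin1_bytes = name.encode("iso-8859-1")   # ISO/IEC 8859-1: one byte per character
utf8_bytes = name.encode("utf-8")          # Unicode (UTF-8): the accented letter takes two bytes
```

The point for data modeling is that the character set and date format are part of an attribute's definition, since the same value can be stored in different ways.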




           Characterize Attributes
• Identify the domain of each attribute – the
  range of allowable values
• Determine if there is a default value for
  each attribute
• Is each attribute mandatory (required) for
  each entity? (Avoid many mandatory fields)
• Is an attribute uniquely suited to be a key?
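
One way to record these characteristics is sketched below. The Attribute class, its field names, and the two sample attributes are hypothetical and only meant to show data type, domain, default value, and mandatory status side by side.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Attribute:
    """Describes one attribute of an entity (illustrative names only)."""
    name: str
    data_type: type
    mandatory: bool = False
    default: Any = None
    # The domain is a predicate that returns True for allowable values.
    domain: Optional[Callable[[Any], bool]] = None

    def validate(self, value: Any) -> Any:
        """Return the value to store, or raise ValueError if it is not allowed."""
        if value is None:
            if self.default is not None:
                return self.default
            if self.mandatory:
                raise ValueError(f"{self.name} is mandatory")
            return None
        if not isinstance(value, self.data_type):
            raise ValueError(f"{self.name} must be of type {self.data_type.__name__}")
        if self.domain is not None and not self.domain(value):
            raise ValueError(f"{self.name} is outside its domain: {value!r}")
        return value


# A fixed set of possible values, and a number with a range and a default
grade = Attribute("grade", str, mandatory=True,
                  domain=lambda g: g in {"A", "B", "C", "D", "F"})
credit_hours = Attribute("credit_hours", int, default=3,
                         domain=lambda c: 0 < c <= 6)
```

Calling grade.validate("Z") raises an error because "Z" is outside the domain, while credit_hours.validate(None) falls back to the default.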

                Key Attributes
• An attribute or group of attributes may be
  a unique identifier, or key, for each entity
     – Examples are Social Security Number,
       driver's license number, ISBN, Student ID
• If a group of attributes is used, it is
  a concatenated (a.k.a. composite or
  compound) key

            Many Keys Possible
• There might be more than one key for
  an entity
• Each possible key is called a candidate key
• One candidate key is selected as the primary key
• All others are alternate keys
     – Example: the electric company may use a
       customer ID or account # as primary key, and
       your phone number as an alternate key
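
A rough way to look for candidate keys in sample data is sketched below. The function name and the customer records are invented for the example. Uniqueness in a sample only suggests a candidate key; the business rules decide whether it is really unique.

```python
from itertools import combinations


def candidate_keys(records, max_size=2):
    """Return minimal attribute combinations whose values are unique across records."""
    attributes = sorted(records[0].keys())
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(attributes, size):
            # Skip combinations that contain an already-found smaller key
            if any(set(key) <= set(combo) for key in found):
                continue
            values = [tuple(r[a] for a in combo) for r in records]
            if len(set(values)) == len(values):
                found.append(combo)
    return found


customers = [
    {"customer_id": 1, "phone": "555-0101", "city": "Philadelphia"},
    {"customer_id": 2, "phone": "555-0102", "city": "Philadelphia"},
    {"customer_id": 3, "phone": "555-0103", "city": "Camden"},
]
keys = candidate_keys(customers)
```

Here both customer_id and phone come back as candidate keys; one would be chosen as the primary key and the other kept as an alternate key.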

   Primary Key may be Meaningless

• A primary key may correspond to some
  important piece of information
     – SSN, student ID, ISBN, etc.
• Or it may be completely meaningless
     – A sequential number, called Order_ID
• As long as the primary key is unique
  for every record, either is acceptable

              Relationships
• Entities affect each other by means
  of relationships
• Relationships are described by a verb
  phrase, e.g. “is a member of”, “is part of”,
  “is a prerequisite for”, etc.
• A different verb phrase may be used for
  each direction between two entities, “is
  enrolled in” versus “is being studied by”

                                                       p. 299
                                                       (180)

     Cardinality and Relationships
  Here we are using the Martin notation; many others exist
• Relationships are described by how many
  records of each entity may be related: 0
  (shown by a '0'), 1 (shown with a single or
  double line), or many (shown by a trident)
• Cardinality of zero means the relationship
  is optional in that direction
• One-to-one is a unique relationship

     Cardinality and Relationships
• Cardinality conveys the minimum and maximum
  number of relationships, and must be defined in
  both directions for all relationships:
   – Only one
   – Zero or one
   – One or many (more)
   – Zero, (one), or many
   – Many (only >1)
     Cardinality and Relationships
• To determine cardinality, ask “for one
  record in A, how many possible records
  could exist in B?”
             A                    B

• Consider extreme cases; a Customer may
  have no Orders briefly, before their first
  order is completed
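
The question "for one record in A, how many records could exist in B?" can be checked against sample data, as in the sketch below. The function and the sample IDs are hypothetical, and observed counts only hint at the true cardinality, which comes from the business rules.

```python
from collections import Counter


def observed_cardinality(parent_ids, child_parent_ids):
    """Return the (minimum, maximum) number of child records per parent record."""
    counts = Counter(child_parent_ids)
    per_parent = [counts[p] for p in parent_ids]   # Counter gives 0 for parents with no children
    return min(per_parent), max(per_parent)


customer_ids = [1, 2, 3]
order_customer_ids = [1, 1, 3]      # customer 2 has not ordered yet
minimum, maximum = observed_cardinality(customer_ids, order_customer_ids)
```

A minimum of zero here reflects the extreme case of a Customer with no Orders, so the relationship is optional in that direction.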
           Degree of Relationships
• The degree of a relationship is the number
  of entities involved
• Most relationships are binary (two entities)
• Recursive (unary) relationships involve one
  entity, e.g. list of employees and managers
• N-ary relationships (e.g. 3-ary, or ternary)
  involve more than two entities

                 Foreign Keys
• A foreign key (FK) is an attribute which
  is the primary key (PK) of one entity and
  also appears in another entity, in order to
  establish the relationship between the two
     – Primary key must be unique for each record,
       but a foreign key value may appear many times
     – Only one PK-FK connection is required for the
       relationship to exist
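
A minimal sketch of a PK-FK connection follows, using Python's built-in sqlite3 module with an in-memory database. The table and column names are made up for the example, and it assumes the SQLite build supports foreign key enforcement (the usual case).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when this is enabled
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE customer_order (
           order_id    INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
       )"""
)
conn.execute("INSERT INTO customer VALUES (1, 'Fred')")
# The same FK value may appear many times
conn.executemany("INSERT INTO customer_order VALUES (?, ?)", [(100, 1), (101, 1)])

# An FK value with no matching PK is rejected
try:
    conn.execute("INSERT INTO customer_order VALUES (102, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

order_count = conn.execute(
    "SELECT COUNT(*) FROM customer_order WHERE customer_id = 1"
).fetchone()[0]
```

The customer_id column is the PK in customer and an FK in customer_order, and that single connection is what relates the two entities.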
             Other Relationships
     – Entity with FK generally has a PK of its own
• A PK may also be a FK
     – Especially for 1:1 relationships or
       when generalization is used
• An associative entity builds a concatenated
  primary key from more than one entity
     – Uses a diamond shape inside the normal box
       to show its special nature          p. 301 (182)

             Other Relationships
• A many-to-many (non-specific) relationship
  implies a lot of one-to-many relationships
     – Often use an associative entity to bridge
       between them
• An identifying relationship is when a parent
  entity's PK is used as part of the PK for a
  child entity
     – Child entity is then considered “weak”
       because it depends on the parent
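
The sketch below shows one possible associative entity bridging a many-to-many relationship, again using sqlite3. The student, course, and enrollment tables, and the course titles, are placeholders for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT);
    -- Associative entity: its concatenated PK is built from the two parent PKs
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  TEXT    REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO course VALUES ('C1', 'Placeholder course one'),
                              ('C2', 'Placeholder course two');
    INSERT INTO enrollment VALUES (1, 'C1'), (1, 'C2'), (2, 'C1');
""")

# Each side of the many-to-many becomes a one-to-many into enrollment
per_student = dict(conn.execute(
    "SELECT student_id, COUNT(*) FROM enrollment GROUP BY student_id"))
per_course = dict(conn.execute(
    "SELECT course_id, COUNT(*) FROM enrollment GROUP BY course_id"))

# The concatenated PK prevents enrolling the same student in the same course twice
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 'C1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Because each parent PK is part of the enrollment PK, both relationships into enrollment are identifying relationships in the sense described above.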
                    Supertype
• A supertype is the result of generalizing
  similar characteristics of several entities
     – E.g. Students and Faculty are both People
     – Also used as basis for object modeling
     – Also known as an “is a”, “was a”, or “could
       be a” relationship
     – Uses one-to-one relationships


                      Subtype
• The subtype inherits some characteristics
  from the supertype, and adds other specific
  characteristics (attributes) to each entity
• The same entity can be both supertype and
  subtype from different perspectives
     – Kind of like you could be a child and a parent
       at the same time
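
Since supertypes are also a basis for object modeling, the Students and Faculty example can be sketched with class inheritance. The attribute names below are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Supertype: characteristics shared by every person."""
    person_id: int
    name: str


@dataclass
class Student(Person):
    """Subtype: inherits person_id and name, and adds its own attribute."""
    major: str


@dataclass
class Faculty(Person):
    """Subtype: inherits person_id and name, and adds its own attribute."""
    department: str


ann = Student(person_id=1, name="Ann", major="Information Systems")
bob = Faculty(person_id=2, name="Bob", department="Information Science")
```

Each Student "is a" Person, so ann carries the inherited attributes as well as major. In a relational model the same idea is usually carried by a one-to-one relationship where the subtype's PK is also an FK to the supertype.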

           Data Modeling Process
• Data models evolve throughout the life
  of the system
• An organization may plan on a large scale
  using strategic data modeling to create an
  enterprise data model
• This is refined for each system with an
  application data model

           Data Modeling Process
• To start the model, look for nouns which
  are frequently used during fact finding;
  consider each a possible entity
• Note that each entity should have many
  instances; if it occurs only rarely, it may
  not be an entity
• Give entities a singular name, not plural
     – Customer, not Customers

           Data Modeling Process
• Independent entities exist without any other
  entities, and are often found first
• Don't be afraid to reconsider the structure
  of each entity, or remove useless ones
     – This is an iterative process!
• Then name each relationship and define
  its cardinality

           Data Modeling Process
• Identify keys for each entity; keep them as
  simple as possible (PK, FK)
• Look for supertypes and subtypes
• Describe all data elements for each entity
     – Identify what type of data they will contain
     – Identify default values and whether they
       are mandatory

           Data Modeling Process
• The bottom line for keys is:
     – Each entity must have at least one PK
     – Alternate keys are completely optional
     – Each entity may have from zero to many FKs
     – Each FK is a PK in another, related entity
     – Only one PK-FK relationship is needed to
       relate two entities
     – Some keys are not inherently meaningful data
             Data Normalization
• Analysis of a data model for
  implementation is done using
  data normalization
     – Normalization organizes data attributes to
       form simple, non-redundant, flexible,
       adaptive entities
• There are five levels of data normalization,
  of which three are generally used
           First Normal Form (1NF)
• An entity is in first normal form if there are
  no attributes which can have more than one
  value for each instance (record) of the entity
• Attributes which could have more than one
  value for a given entity belong to a different
  kind of entity
• In other words, every attribute appears only
  once for each record
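
As a rough sketch of reaching first normal form, the function below moves a multi-valued attribute into a separate kind of entity. The function name and the student phone data are invented for the example.

```python
def to_first_normal_form(records, key, repeating):
    """Split a multi-valued attribute out into its own list of records."""
    parent_rows, child_rows = [], []
    for record in records:
        # The parent keeps every attribute except the multi-valued one
        parent_rows.append({k: v for k, v in record.items() if k != repeating})
        # Each value becomes one child record, linked back by the parent's key
        for value in record[repeating]:
            child_rows.append({key: record[key], repeating: value})
    return parent_rows, child_rows


students = [
    {"student_id": 1, "name": "Ann", "phone": ["555-0101", "555-0199"]},
    {"student_id": 2, "name": "Bob", "phone": ["555-0102"]},
]
student_rows, phone_rows = to_first_normal_form(students, "student_id", "phone")
```

After the split, every attribute holds a single value per record, and a student may have any number of phone records.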
      Second Normal Form (2NF)
              Look at concatenated keys only!
• Must be first normal form, and:
• Each non-primary-key attribute is uniquely
  determined by the entire primary key
• Non-primary-key attributes may not be
  dependent on only part of the primary key
     – If any are, move them to another table which
       uses only that part of the primary key
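
The move described above can be sketched as follows, assuming an order line table whose concatenated key is (order_id, product_id). The function name and sample data are hypothetical; product_name depends only on product_id, so it is moved to its own table.

```python
def split_partial_dependency(rows, part_key, dependent):
    """Move attributes that depend on only part of the key into their own table."""
    remaining = [{k: v for k, v in row.items() if k not in dependent} for row in rows]
    moved = {}
    for row in rows:
        part = tuple(row[k] for k in part_key)
        values = {k: row[k] for k in dependent}
        # If one partial key maps to different values, the dependency does not hold
        if moved.setdefault(part, values) != values:
            raise ValueError(f"{dependent} is not determined by {part_key}")
    new_table = [{**dict(zip(part_key, part)), **values} for part, values in moved.items()]
    return remaining, new_table


order_lines = [
    {"order_id": 1, "product_id": "P1", "product_name": "Widget", "quantity": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Gadget", "quantity": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Widget", "quantity": 5},
]
lines, products = split_partial_dependency(order_lines, ["product_id"], ["product_name"])
```

The quantity attribute stays with the order line because it depends on the entire key, while each product name is now stored once.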

           Third Normal Form (3NF)
• Must be second normal form, and:
• The value of each non-primary-key attribute
  is not dependent upon any other
  non-primary-key attribute
     – Everything depends only on the primary key
• The two ways to look for this are derived
  attributes and transitive dependencies...

           Third Normal Form (3NF)
• Derived attributes (data) are fields
  calculated or logically derived from
  other fields
     – Exception: OK to keep attribute if multiple
       entities are involved in deriving an attribute
• Transitive dependencies may exist for
  tables with non-concatenated keys; this is
  when a non-key attribute depends on
  another non-key attribute
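
A simple check for a possible transitive dependency is sketched below. The function and the employee data are made up for the example. Sample data can only disprove a dependency; confirming one requires knowing the business rules.

```python
def determines(rows, determinant, dependent):
    """True if, in these rows, each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[determinant], row[dependent]) != row[dependent]:
            return False
    return True


employees = [
    {"employee_id": 1, "dept_id": "D1", "dept_name": "Sales"},
    {"employee_id": 2, "dept_id": "D1", "dept_name": "Sales"},
    {"employee_id": 3, "dept_id": "D2", "dept_name": "Support"},
]
# dept_name appears to depend on dept_id, a non-key attribute
name_follows_dept = determines(employees, "dept_id", "dept_name")
# By contrast, dept_name does not determine employee_id
id_follows_name = determines(employees, "dept_name", "employee_id")
```

If the dependency is real, dept_name belongs in a separate department entity keyed by dept_id, leaving only dept_id in the employee entity as a foreign key.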
           Third Normal Form (3NF)
• Or in brief, for third normal form…

   An entity is in third normal form if every
   non-primary key attribute is dependent on
   the primary key, the whole primary key, and
   nothing but the primary key

(as in, “Do you swear to tell the truth…”)
           Further Normalization
• Additional improvement in data structure
  is possible through “Simplification by
  Inspection” - look for other redundancies
  or simplifications possible
• Many CASE tools can also inspect for first
  level normalization, but generally no further
• Just for the record, here are the 4th and 5th
  normal forms…

  Fourth and Fifth Normal Forms
                                   INFO 605 text, pp. 351-354
• Fourth normal form (4NF) involves
  removing multivalued dependencies
     – If a pair of records has two matching attributes,
       decompose the data structure to remove that
• Fifth normal form (5NF) involves removing
  join dependencies (nearly impossible to do)
     – This is when business rules define a connection
       among many entities (e.g. if you replace a tire,
       you must also replace the valve stem)
           Process Modeling
• Process modeling describes the way data
  flows throughout an organization or system
• A context diagram is a special process
  model which shows interfaces
• Data flow diagrams (DFDs) (a.k.a. bubble
  chart or transformation graph) are the most
  common process model

            Data Flow Diagrams
                                              p. 346 (213)
• Notation has three shapes
     – Processes are in rounded-corner rectangles
     – External systems and users are in squares
     – Open-ended boxes are data storage
       files (may be more general than a single entity)
• Arrows show how data flows from one
  shape to another
            This is the Gane and Sarson notation
 DFD is not a Program Flowchart
• Data Flow Diagram
     – Abstract
     – Can have parallel (simultaneous) activities
     – Shows all possible paths of data
     – Has no time scale, no decisions or logic
• Program Flowchart
     – Precise
     – Shows one activity at a time
     – Must show loops and branches (decisions)
     – Often must recognize time dependencies

           Data Flow Diagrams
• Popular for supporting BPR
• Processes respond to business events
  and conditions
• Processes transform data into information
• A system embodies a set of processes



    Rules for Data Flow Diagrams
• A user or external system can only connect
  to one or more process boxes
• Each process will connect to at least one
  user or external system, and one data store
     – Each process may send data to a data store,
       and/or get data from a data store
     – Processes rarely connect to other processes
     – Each process needs data flowing in and out of it
             DFD Cleanup
• Every data store needs data flowing both in
  and out (no black holes, which have inputs
  but no output, and no miracles, which have
  outputs without input)
• Fix processes which have logically
  incomplete inputs and outputs
• Leave in processes which calculate
  something, make decisions, manipulate
  data, or organize data
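
The black hole and miracle checks can be automated if the diagram is written down as a list of flows. The sketch below assumes each flow is a (source, target) pair; the function and the sample shape names are hypothetical.

```python
def check_flows(nodes, flows):
    """Report nodes that lack data flowing in, out, or both."""
    problems = {}
    for node in nodes:
        has_input = any(target == node for _, target in flows)
        has_output = any(source == node for source, _ in flows)
        if has_input and not has_output:
            problems[node] = "black hole"     # inputs but no output
        elif has_output and not has_input:
            problems[node] = "miracle"        # outputs without input
        elif not has_input and not has_output:
            problems[node] = "unconnected"
    return problems


flows = [
    ("Customer", "Take Order"),
    ("Take Order", "Orders"),
    ("Orders", "Ship Order"),
    ("Ship Order", "Customer"),
    ("Take Order", "Audit Log"),
]
data_stores = ["Orders", "Audit Log"]
store_problems = check_flows(data_stores, flows)
```

In this sample the Audit Log store receives data but nothing reads it, so it is flagged as a black hole. The same function can be run over the process names to find processes without data flowing in and out.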
           Process Decomposition
• A process transforms or responds to
  incoming data or events
     – Focus on what is done, and by whom
     – Ignore how it is accomplished
• Process decomposition breaks a system
  down into smaller subsystems and
  processes, until each is readily understood

           Decomposition Diagram
                                                  p. 350
                                                  (243)
• A decomposition diagram uses an
  organization chart structure to show
  how a system is broken down logically
  into smaller pieces or functions
     – Car: start, go faster, slow down, turn, stop
     – University: admissions, registration, take
       courses, grading, graduation


                Other Processes
• Functions are related ongoing activities
• Events (transactions) are units of work
  performed at a certain time
     – Events tend to activate various functions
• Elementary (primitive) processes are the
  lowest level of detail in a process model;
  should have a strong action verb

                    Process Logic
• Then identify the logic involved in
  processes using Structured English
• Use simple declarative sentences to describe
     –     Sequences of actions
     –     Conditional actions (if…then)
     –     Decision tables
     –     Iterations

              Data Packets
• Think of data between shapes as packets
  of information, regardless of their actual
  contents or form (e.g. drive up window air
  tube at bank)
• It may help to start at a very high level,
  then decompose each step into more
  detailed processes; a composite data flow

           Other Considerations
• Different types of data may be distinguished
  at a junction
• A control flow represents an event which
  triggers a process (end of month, etc.)
• More detailed process modeling can be
  performed ad nauseam


           Network Modeling
• Network modeling describes a system in
  terms of its business locations
• These locations may cover suppliers,
  customers, and various aspects within
  the system
• A location connectivity diagram may be
  used to show the network model

           Network Hardware
• For more information on the physical parts
  of a network, try the Cisco tutorials, such as
  for educational or small business networks




           Model Synchronization
• It is important to make sure that the
  data, network, interface, and process
  models agree
• Map Data to Process, and Data to Location
  using a CRUD matrix
• Optionally, map Process to Location


                     CRUD matrix
• A CRUD matrix maps two system models
  to ensure complete coverage and
  coordination of requirements
• CRUD refers to the possible activities
     –     Create new data
     –     Read existing data
     –     Update or change existing data
     –     Delete existing data

             CRUD matrix
• The CRUD matrix shows each element
  from two different models, and identifies
  which properties (permissions) exist
  for communication
• A blank indicates those two elements are
  not related for those models
• Other properties can be defined as needed
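
A Data-to-Process CRUD matrix can be held as a simple mapping, as in the sketch below. The process and entity names are invented, and the rule that every entity should have all four operations covered somewhere is an assumption made for this example rather than a fixed requirement.

```python
# Keys are (process, entity); a missing pair is a blank cell in the matrix
crud = {
    ("Take Order", "Customer"): "R",
    ("Take Order", "Order"): "C",
    ("Ship Order", "Order"): "RU",
    ("Maintain Customers", "Customer"): "CRUD",
}


def missing_operations(matrix, entities):
    """Return, per entity, the CRUD letters that no process performs."""
    gaps = {}
    for entity in entities:
        covered = set()
        for (_process, ent), ops in matrix.items():
            if ent == entity:
                covered.update(ops)
        missing = [op for op in "CRUD" if op not in covered]
        if missing:
            gaps[entity] = "".join(missing)
    return gaps


gaps = missing_operations(crud, ["Customer", "Order"])
```

Here no process deletes an Order, which may be intentional or may point to a missing requirement; either way the matrix makes the gap visible.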

     Process-Location Association
• Similarly, each process can be mapped to
  the locations from which it is performed




           Requirements Traceability
• Similar matrices can be done to map
  between the system requirements and
  the major functions
• This proves where each requirement is
  implemented in the system
• Tedious to generate, but invaluable!
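
A traceability matrix can be checked in both directions, as in the sketch below. The requirement labels and function names are made up for the example.

```python
# Each requirement maps to the major functions that implement it
trace = {
    "R1 Accept orders": ["Take Order"],
    "R2 Ship orders": ["Ship Order"],
    "R3 Report monthly sales": [],
}
functions = ["Take Order", "Ship Order", "Maintain Customers"]

# Requirements that no function implements
untraced = [req for req, funcs in trace.items() if not funcs]

# Functions that no requirement calls for
unjustified = [f for f in functions
               if not any(f in funcs for funcs in trace.values())]
```

An untraced requirement points to missing functionality, while a function with no requirement behind it may be unnecessary or may reveal a requirement that was never written down.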



						