Data Management and Functions by lja61768


More Info
									  Data Management

          MIS 503
Management Information Systems
       MBA Program
• Database: A DB is an organized collection
  of logically related data
• Data: stored representations of
  meaningful objects and events
  – Structured: numbers, text, dates
  – Unstructured: images, video, documents
• Information: data processed to increase
  knowledge in the person using the data
• Metadata: data that describes the
  properties and context of user data
   Data Entities, Attributes,
         and Keys
• An entity is a generalized class of
  people, places, or things for which
  data is collected, stored, and
• A attribute is a characteristic of an
• Data item - a value of an attribute
  can be found in the fields of the
  record describing an entity.
                    Key Fields
• Keys are special fields that serve two main
   – Primary keys are unique identifiers of the relation in
     question. Examples include employee numbers, social
     security numbers, etc. This is how we can guarantee
     that all rows are unique
   – Foreign keys are identifiers that enable a dependent
     relation (on the many side of a relationship) to refer to
     its parent relation (on the one side of the relationship)
   – A secondary key is a field in a record that does not
     uniquely identify the record but which is used to look
     up fields (e.g., Last Name) in, for example, and index
   – Candidate Key – an attribute that could be a
     key…satisfies the requirements for being a key
• Keys can be simple (a single field) or
  composite (more than one field)
           Data Model
• A data model is a diagram of entities
  and their relationships.
• Data modeling involves
  understanding a specific business
  problem and analyzing the data and
  information needed to deliver a
• The data model will be optimized to
  balance storage efficiency and query
         E-R Diagrams
• Entity-relationship (ER) diagrams
  use graphical diagrams to
  demonstrate the organization of and
  relationships between entities.
• Relationships include:
  – one-to-many (1:N)
  – one-to-one (1:1)
  – many-to-many (N:M)
                      ER Modeling
                         Single line: one     Crow 's foot: many
                            cardinality          cardinality
         Inside symbol:
      minimum cardinality

Course                                                     Offering
CourseNo                           Has                     Of f erNo
CrsDesc                                                    Of f Location
CrsUnits                                                   Of f Time

                                            Circle: zero
            Outside symbol:                 cardinality
           maximum cardinality
ER Modeling
  Relational Database Model
• Relational Model - the relational model describes
  data using a standard tabular format
    – All data elements are placed in two-dimensional tables,
      called relations
    – A relation contains rows (tuples, records) and columns
      (attributes, fields) with each intersecting cell containing an
      item of data
        • Each attribute has a domain, which is the structure or
          constraints on the type of data an attribute can hold

          Customer Table                        Order Table
Field Name       Description       Order Number           Primary Key
Customer Name Self Explanatory Order Item                 Self Explanatory
Customer Address Self Explanatory Number of Items Ordered Self Explanatory
Customer ID      Primary Key-----> Customer ID            Secondary Key
Order Number Secondary Key
  Data Management Issues
• Data redundancy – Un-needed duplication
  of data
• Data integrity – Can we rely on the data?
  – Is it accurate?
  – Is it secure?
• Program-data dependence - programs and
  data that are developed and organized so
  that the data is linked to the application
  program and the data is incompatible with
  other programs or data management tools
The Database Approach to
    Data Management
• The database approach
  – Central repository of shared data
  – Data is managed by a controlling agent
  – Stored in a standardized, convenient
  To implement a database one must have a
  Database Management System (DBMS)
    Database Management
      Systems (DBMS)
• A database management system is a
  group of programs used as an interface
  between a database and application
  programs or a database and the user.
  – DBMSs are classified by the type of
    database model they support.
  – Modern information systems are usually
    built on data housed in one or more
Tools for Managing Data

•   A DBMS helps manage data by providing
    seven functions:
    1.   Data storage, retrieval, update
    2.   Backup
    3.   Recovery
    4.   Integrity control
    5.   Security control
    6.   Concurrency control
    7.   Transaction control

                                            Page 139
      Advantages of DBMSs
•   Reduced data redundancy.
•   Improved data integrity.
•   Faster program development.
•   Easier modification and updating.
•   Data and program independence.
•   Standardization of data access.
•   A framework for program development.
•   Better overall protection of the data.
•   Shared data and information resources.
         Data Normalization
• A common problem with data organization is
  that data are often not well organized
  – Anomalies are problems and irregularities in data.
  – Data anomalies often result in giving users
    incorrect information, causing them to be
    misinformed about actual business conditions.
• If data are not well organized, the table may
  need to be normalized…
   Well-Structured Relations
• A relation that contains minimal data redundancy
  and allows users to insert, delete, and update rows
  without causing data inconsistencies
• Goal is to avoid anomalies
   – Insertion Anomaly – adding new rows forces user to
     create duplicate data
   – Deletion Anomaly – deleting rows may cause a loss of
     data that would be needed for other future rows
   – Modification Anomaly – changing data in a row forces
     changes to other rows because of duplication

General rule of thumb: a table should not pertain to
             more than one entity type
What’s wrong with this
         An Example of
• Click here to view a step-by-step
  example of normalization
Database Architecture

Database –
shared collection of logically related data, organized
to meet needs of an organization

Database Architecture –
way in which the data are structured and stored in
the database

                                                         Page 137
Figure 5.3 The Data Pyramid

                              Page 137
The Three-level DB Schema
      Database Schemas
• Schema - a general description of the
  entire database that shows all of the
  record types and their relationships.
  – A user view (external schema) is the portion
    of the database a user can access.
  – The conceptual view is the logical design of
    the database (how should the database be
    organized regardless of physical constraints)
  – The internal view (physical view) is the
    physical storage structure for the database
• A subschema shows only some of the
  records and their relationships in the
• Structured Query Language - a
  query language is a specialized type
  of data manipulation language.
• Query languages make retrieving
  information and manipulating a
  database easy and fast.
• SQL - structured query language.
 Structured Query Language
• Basic structure of a SQL expression
  – The select clause lists the attributes desired in
    answer to a query
  – The from clause is a list of relations or tables
    that the query language processor should
    consult in filling the request
  – The where clause describes the attributes
    desired in the answer
Emerging Database Trends
    Distributed Databases
• Distributed Processing - involves
  placing processing units at different
  locations and typing them together
  with data communications
  equipment and systems.
  – A distributed database is a database in
    which the actual data may be spread
    across several small databases
    connected via telecommunication
   Storage Area Networks
• High-speed, special purpose
  network or subnet that interconnects
  different kinds of storage devices
  with associated data servers to
  benefit a larger network of users.
• Part of an overall network for
  computing resources.
• Usually physically located near
  larger computing resources such as
  a mainframe or server.
   Network-Attached Storage
• Hard disk storage with its own
  network address rather than being
  attached to a server or workstation.
• Includes:
  – Multi-disk Redundant Arrays of
    Integrated Disks (RAID) systems
  – Software to configure and map file
  – Designed to handle a variety of
    network protocols
     The Data Warehouse
• Businesses collect a tremendous
  amount of transactions data from
  routine operations
• These data can be analyzed to
  understand the business better
  – Requires multidimensional analysis
    called Online Analytical Processing
  – Helps create a learning organization that
    is better able to understand its markets,
    customers and itself
• Data Warehouse:
  – A subject-oriented, integrated, time-variant, non-
    updatable collection of data used in support of
    management decision-making processes
  – Subject-oriented: e.g. customers, patients,
    students, products
  – Integrated: Consistent naming conventions,
    formats, encoding structures; from multiple data
  – Time-variant: Can study trends and changes
  – Nonupdatable: Read-only, periodically refreshed
• Data Mart:
  – A data warehouse that is limited in scope
                Components of a star schema
                           Fact tables contain
                           factual or quantitative

                                                     Dimension tables are
1:N relationship
                                                     denormalized to
between dimension
tables and fact tables

                         Dimension tables contain
                         descriptions about the
                         subjects of the business
                   Data Mining
• Discovers interesting structure in large
  amounts of data
• This structure consists of
   – Patterns
   – Statistical or predictive models of the data
   – Relationships between the data
• Applied extensively to customer data
   – Allows firms to determine for instance which
     products sell together
 Object-Oriented Databases
• Traditionally relational databases
  supported a limited number of data types
  – Alphabet, numeric, dates, and time
• Modern organizations use a variety of data
  – Graphics objects, audio clips, videos,
    subscripted arrays, and complex data for data
• RDBMS vendors have extended their
  packages to handle such data objects
     Threats to Data Security
• Accidental losses attributable to:
   – Human error
   – Software failure
   – Hardware failure
• Theft and fraud.
• Improper data access:
   – Loss of privacy (personal data)
   – Loss of confidentiality (corporate data)
• Loss of data integrity
• Loss of availability (through, e.g. sabotage)
    Data Management Security
•   Views or subschemas
•   Integrity controls
•   Authorization rules
•   User-defined procedures
•   Encryption
•   Authentication schemes
•   Backup, journalizing, and checkpointing
Principles in Managing Data

•    The need to manage data is permanent
•    Data can exist at several levels
•    Application software should be separate
     from the database
•    Application software can be classified by
     how they treat data
    1. Data capture
    2. Data transfer
    3. Data analysis and presentation

                                                 Page 140
Principles in Managing Data

•   Application software should be
    considered disposable
•   Data should be captured once
•   There should be strict data

                                     Page 143
The Data Management Process

                     Figure 5.6 Asset Management Functions   Page 144
Data Management Policies

• Organizations should have policies
  – Data ownership
  – Data administration

Data Ownership

Corporate information policy –
foundation for managing the ownership of data

Figure 5.8 Example Data Access Policy   Page 149
Data Administration
Key functions of the data administration group:
•   Promote and control data sharing
•   Analyze the impact of changes to application systems
    when data definitions change
•   Maintain the data dictionary
•   Reduce redundant data and processing
•   Reduce system maintenance costs and improve system
    development productivity
•   Improve quality and security of data
•   Insure data integrity

                                                           Page 150
Data Administration
Key functions of the database administrator (DBA):
•   Tuning database management systems.
•   Selection and evaluation of and training on database
•   Physical database design.
•   Design of methods to recover from damage to
•   Physical placement of databases on specific computers
    and storage devices.
•   The interface of databases with telecommunications
    and other technologies.

                                                            Page 150-151
  7 Habits of Highly Effective
       Data Modelers*
• Immerse
  – Immerse yourself in the task environment to find
    out what the client wants
• Challenge
  – Challenge existing assumptions; dig out the
    exceptions and test the boundaries of the model
• Generalize
  – Reduce the number of entities whenever possible;
    simpler is easier to understand
• Test
  – Read it to yourself and to others to see if it makes
    sense and is relevant to the problem
                          *adapted from R. Watson (1999)
   7 Habits of Highly Effective
        Data Modelers
• Limit
   – Set reasonable limits to the time and scope of the data
     modeling activities. Identify the core entities and attributes
     that will solve the problem and stick to those
• Integrate
   – Identify how your project’s model fits with the organization’s
     information architecture. Can it be integrated with the
     corporate data model? Look at the big picture.
• Complete
   – Don’t leave the data model ill-defined. Define entities,
     attributes, and relationships carefully.

To top