					   Data and Information Quality

               Information Systems Design

                        CS 6144




(c) Jan Hannemann                       November 17, 1998
        Data and Information

Database 1      Database 2          ...   Database N


                  Extraction



             (useful) Information


              Decision-Making
          Data production system

Data          Data producer    Data production
              Data custodian   Data storage, maintenance, security
Information   Data consumer    Data utilization
  Information System Design
 How to ensure data / information quality?
 How to extract information from data?
 How to present information?
 Summary
           Criteria for DQ / IQ
Category           Dimensions
Intrinsic          Accuracy, Objectivity, Believability, Reputation
Accessibility      Accessibility, Access Security
Contextual         Relevancy, Added-Value, Timeliness, Completeness, Amount of data
Representational   Interpretability, Ease of understanding, Concise representation, Consistent representation
 Intrinsic problem pattern
Multiple sources                  Judgement involved
 of same data                      in data production

  Questionable                        Questionable
  Believability                        Objectivity

                      Poor
                    Application

                   Little Added-
                        value

                  Data not used
      Accessibility problem pattern
 Lack of          Privacy /                 Computerizing and
resources       Confidentiality               data analyzing

   Poor            Access          Interpretability      Amount of Data
Accessibility      Security
                                     Inconsistent          Timeliness
                                    Representation

                   Barriers to data accessibility
   Contextual problem pattern
Data production        Change in data           Distributed
   problems           consumers' needs          Computing


                                                 Inconsistent
            Incomplete Data
                                                Representation


                Poor                            Little Added-
              Relevancy                              value



                  Data utilization difficulty
                 Scientific data
Quality criteria (PARCC)       Possible quality hazards


     Precision                Misread Instruments
     Accuracy                Mislabeled samplings
Representativeness              Switched values
  Completeness                 Inaccuracy of tests
  Comparability            Wrong boundary conditions
  Environmental Risk Assessments
Objective:   Assess the risk associated with chemicals
Approach:    Mathematical formulae that take into account
             physical properties, production volume and toxicity
Key value:   PEC/PNEC ratio (predicted environmental concentration /
             predicted no-effect concentration)

Problems:    *   Incomplete data
             *   Ambiguity
             *   Unreliability (data sources)
             *   Varying boundary conditions
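
A minimal sketch of how the key value might be evaluated. The slides only name the PEC/PNEC ratio and do not give the formulae, so the function names, the threshold of 1 (a common convention, assumed here) and the handling of missing inputs are illustrative assumptions.

```python
from typing import Optional


def risk_quotient(pec: float, pnec: float) -> float:
    """Return PEC / PNEC. Both values must use the same concentration unit."""
    if pnec <= 0:
        raise ValueError("PNEC must be positive")
    if pec < 0:
        raise ValueError("PEC must be non-negative")
    return pec / pnec


def classify(pec: Optional[float], pnec: Optional[float],
             threshold: float = 1.0) -> str:
    """Classify a chemical by its PEC/PNEC ratio.

    Missing inputs (None) are reported explicitly instead of being
    replaced by a default, because incomplete data is one of the
    problems listed on the slide.
    """
    if pec is None or pnec is None:
        return "insufficient data"
    # Assumed convention: a ratio above the threshold signals potential risk.
    if risk_quotient(pec, pnec) > threshold:
        return "potential risk"
    return "no immediate concern"


if __name__ == "__main__":
    # Hypothetical values, in the same (unspecified) concentration unit.
    print(classify(0.5, 2.0))
    print(classify(3.0, 2.0))
    print(classify(None, 2.0))
```

Reporting "insufficient data" as its own outcome keeps a gap in the data from silently turning into a statement about risk.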
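Data aggregation: Problems
[Figure: compass rose (N, E, S, W) with numbered direction sectors; sectors 34, 36, 1, 2 and 4 around north are marked "Critical Area".]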
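
One way to read the "Critical Area" around north is as a wrap-around problem: an arithmetic mean of sector numbers on both sides of north lands near south. The sketch below assumes that the labels 1 to 36 are compass sectors of 10 degrees each, with sector 36 pointing north; the slide does not state this. The function names are hypothetical, and the circular mean is shown as one common remedy, not necessarily the approach used in the lecture.

```python
import math

SECTOR_WIDTH_DEG = 10.0  # assumption: 36 sectors of 10 degrees each


def sector_to_degrees(sector: int) -> float:
    """Map sector 1..36 to a direction in degrees (sector 36 -> 0 = north)."""
    if not 1 <= sector <= 36:
        raise ValueError("sector must be in 1..36")
    return (sector * SECTOR_WIDTH_DEG) % 360.0


def naive_mean_sector(sectors: list[int]) -> float:
    """Arithmetic mean of the sector numbers; ignores the wrap at north."""
    return sum(sectors) / len(sectors)


def circular_mean_degrees(sectors: list[int]) -> float:
    """Mean direction in degrees, computed from summed unit vectors."""
    angles = [math.radians(sector_to_degrees(s)) for s in sectors]
    sin_sum = sum(math.sin(a) for a in angles)
    cos_sum = sum(math.cos(a) for a in angles)
    if math.hypot(sin_sum, cos_sum) < 1e-9:
        # Directions cancel out (for example north and south).
        raise ValueError("mean direction is undefined")
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


if __name__ == "__main__":
    observed = [34, 36, 2, 4]  # sectors on both sides of north
    print("naive mean sector:", naive_mean_sector(observed))
    print("circular mean [deg]:", circular_mean_degrees(observed))
```

For the sectors 34, 36, 2 and 4 the naive mean is sector 19, roughly south, while the circular mean is about 10 degrees, close to north where the observations actually lie.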
   How to present information
Category           Dimensions
Intrinsic          Accuracy, Objectivity, Believability, Reputation
Accessibility      Accessibility, Access Security
Contextual         Relevancy, Added-Value, Timeliness, Completeness, Amount of data
Representational   Interpretability, Ease of understanding, Concise representation, Consistent representation
  Example1: Everything ok?
[Figure: chart titled "Presentation duration influencing the percentage of interested listeners". x-axis: Presentation duration [minutes], 0 to 90 in steps of 5. y-axis: Interested listeners [%], 0 to 120 in steps of 20.]
  Example2: Unstructured information
Smith 1963 male industry worker 123121212 feb no no
yes 17 232 Atlanta 30332 345642 GT station Wood 192
8 mail man 888008080 mar yes yes yes 12 345 Atlanta
30332 322212 GT station Smith 1963 male industry wo
rker 123121212 feb no no yes 17 232 Atlanta 30332 3
45642 GT station Wood 1928 mail man 888008080 mar y
es yes yes 12 345 Atlanta 30332 322212 GT station S
mith 1963 male industry worker 123121212 feb no no
yes 17 232 Atlanta 30332 345642 GT station Wood 192
8 mail man 888008080 mar yes yes yes 12 345 Atlanta
30332 322212 GT station Smith 1963 male industry wo
rker 123121212 feb no no yes 17 232 Atlanta 30332 3
45642 GT station Wood 1928 mail man 888008080 mar y
es yes yes 12 345 Atlanta 30332 322212 GT station S
Example3: Information overkill

           Your browser is…

!            Your OS is …
            Your name is…
                  ...
                                       Click here!


                                Immediate
      Joke of the day                        Extra!
                                  action
                                           Free Pizza!
                                needed !!!
    Reactor temperature: 237F
                 Summary
 Quality criteria are use(r) dependent
 Software Engineering should focus on the user
 Criteria can be emphasized differently
 Consistency is important when aggregating /
   extracting
 Data quality is the limiting factor of
   information quality
Data aggregation: Problems
[Figure: two compass roses (N, E, S, W) labeled "WRG" and "WR", each divided into direction sectors 1 to 36, with scattered data values and a marker labeled "Mean" (German: Mittelwert).]

				