Data Warehousing by wanghonghx


									Chapter-4: What Can Data Mining Do?
          www.AhsanAbdullah.com




             (c) 2008 Dr. Ahsan Abdullah
  What Can Data Mining Do?

 Classification
 Estimation
 Prediction
 Market-Basket Analysis
 Clustering
 Description
            1. CLASSIFICATION
 Classification consists of examining the properties
  of a newly presented observation and assigning it
  to a predefined class.

   Assigning customers to predefined customer segments
    (good vs. bad)

   Assigning keywords to articles

   Classifying credit applicants as low, medium, or high risk

   Classifying instructor rating as excellent, very good,
    good, fair, or poor
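
The credit-applicant example can be sketched as a tiny rule-based classifier. This is only an illustration: the attributes (`income`, `missed_payments`) and the cut-off values are hypothetical, and in practice the rules would normally be learned from labeled historical data rather than written by hand.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    income: float         # yearly income (hypothetical attribute)
    missed_payments: int  # missed payments in the last year (hypothetical attribute)


def classify_risk(applicant: Applicant) -> str:
    """Assign an applicant to one of three predefined classes."""
    # The cut-offs below are made up for illustration only.
    if applicant.missed_payments >= 3:
        return "high"
    if applicant.missed_payments >= 1 or applicant.income < 20_000:
        return "medium"
    return "low"


new_applicant = Applicant(income=50_000, missed_payments=0)
risk_class = classify_risk(new_applicant)
```

The point is that the set of classes (low, medium, high) is fixed before any new observation arrives; the classifier only decides which of them applies.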
              2. ESTIMATION
  Unlike classification, whose outcome is discrete
  (e.g. YES or NO), estimation deals with
  continuous-valued outcomes.

Example:
  Building a model and assigning a value from 0 to 1
  to each member of the set.

  Then classifying the members into categories based
  on a threshold value.

  As the threshold changes, the class assignment changes.
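
A minimal sketch of the example above, assuming the model has already produced a score between 0 and 1 for each member. The member names, scores, and threshold values are hypothetical.

```python
def classify_by_threshold(scores, threshold):
    """Turn continuous scores into YES/NO classes using a cut-off."""
    return {name: ("YES" if score >= threshold else "NO")
            for name, score in scores.items()}


# Hypothetical estimated scores for three members of the set.
scores = {"member_a": 0.35, "member_b": 0.62, "member_c": 0.81}

classes_at_050 = classify_by_threshold(scores, 0.50)
classes_at_070 = classify_by_threshold(scores, 0.70)
```

With these made-up scores, `member_b` is classified YES at a threshold of 0.50 and NO at 0.70, while the estimated score itself stays the same.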
              3. PREDICTION
  Same as classification or estimation, except that
  records are classified according to some predicted
  future behavior or estimated future value.

  A model is built by applying classification or
  estimation to training examples from historical
  data, where the value to be predicted is already known.

  The model should first explain the known values;
  it is then used to predict the future.

Example:
  Predicting how much customers will spend during
  the next 6 months.
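
One possible way to sketch this example is a least-squares line fitted to historical data and then applied to a new customer. The slides do not name a specific technique, so linear regression is an assumption here, and the spending figures are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: spend in one 6-month period (x)
# and the known spend in the following 6-month period (y).
past_spend = [100, 200, 300, 400]
following_spend = [110, 230, 310, 430]

slope, intercept = fit_line(past_spend, following_spend)


def predict_next_spend(current_spend):
    """Use the fitted model to predict spend for the next 6 months."""
    return slope * current_spend + intercept


predicted = predict_next_spend(500)
```

The training step uses only records whose outcome is already known; the prediction step applies the same model to a customer whose next 6 months have not happened yet.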
4. MARKET BASKET ANALYSIS
Determining which things go together, e.g. items in
a shopping cart at a supermarket.

Used to identify cross-selling opportunities.

The results can guide the design of attractive packages
or groupings of products and services, or decisions to
raise the price of some items.






  [Figure: diagram with items labeled A, B, and Y]

  98% of people who purchased items A and B
  also purchased item C.


    Discovering Association Rules
 Given a set of records, each containing a set of items
    Produce dependency rules that predict the occurrence of an
     item based on the occurrences of other items


 Applications:
    Marketing, sales promotion and shelf management
    Inventory management
    TID   Items
    1     Bread, Cola, Milk
    2     Juice, Bread
    3     Juice, Cola, Diaper, Milk
    4     Juice, Bread, Diaper, Milk
    5     Cola, Diaper, Milk

    Rules:
    {Milk} --> {Cola}
    {Diaper, Milk} --> {Juice}
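
The strength of the two rules can be checked against the five transactions with a short sketch. Support and confidence are the usual measures for association rules; the slide does not name them, so their use here is an assumption, and the function names are illustrative.

```python
# The five transactions from the table, one set of items per TID.
transactions = [
    {"Bread", "Cola", "Milk"},
    {"Juice", "Bread"},
    {"Juice", "Cola", "Diaper", "Milk"},
    {"Juice", "Bread", "Diaper", "Milk"},
    {"Cola", "Diaper", "Milk"},
]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(lhs, rhs):
    """Fraction of all transactions containing both sides of the rule."""
    return count(lhs | rhs) / len(transactions)


def confidence(lhs, rhs):
    """Fraction of transactions containing lhs that also contain rhs."""
    return count(lhs | rhs) / count(lhs)


milk_cola = (support({"Milk"}, {"Cola"}), confidence({"Milk"}, {"Cola"}))
diaper_milk_juice = (support({"Diaper", "Milk"}, {"Juice"}),
                     confidence({"Diaper", "Milk"}, {"Juice"}))
```

Milk and Cola appear together in 3 of the 5 transactions, and 3 of the 4 transactions containing Milk also contain Cola. Diaper, Milk, and Juice appear together in 2 of the 5 transactions, and 2 of the 3 transactions containing both Diaper and Milk also contain Juice.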
            5. CLUSTERING
Task of segmenting a heterogeneous population
into a number of more homogeneous sub-groups or
clusters.

Unlike classification, it does NOT depend on
predefined classes.

It is up to you to determine what meaning, if any, to
attach to the resulting clusters.

It can be the first step in a market segmentation
effort.
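
As a rough illustration, the sketch below groups one-dimensional values with a basic k-means loop. The slides do not name a clustering algorithm, so k-means is an assumption, as are the data values and the simple deterministic choice of starting centers.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters by repeatedly assigning each value
    to its nearest center and moving each center to its cluster mean."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    step = max(k - 1, 1)
    centers = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters


# Hypothetical measurements, with no class labels attached.
values = [1, 2, 3, 10, 11, 12, 30, 31, 32]

two_groups = kmeans_1d(values, 2)
three_groups = kmeans_1d(values, 3)
```

No labels are supplied anywhere: the groups come only from how close the values are to each other, and the number of groups `k` has to be chosen by the analyst. The same data yields different groupings for `k=2` and `k=3`, and neither is wrong in itself.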
Ambiguity in Clustering




  How many clusters?

  [Figure: the same points grouped as two clusters, four clusters, and six clusters]
             6. DESCRIPTION
Describe what is going on in a complicated database so as to
increase our understanding.

A good description of a behavior will suggest an explanation
as well.




      Comparing The Methods (1)
 Predictive accuracy: this refers to the ability of the
  model to correctly predict the class label of new or
  previously unseen data

 Speed: this refers to the computation costs
  involved in generating and using the method.

 Robustness: this is the ability of the method to
  make correct predictions/groupings given noisy
  data or data with missing values
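
Predictive accuracy is commonly measured as the fraction of held-out records whose class label the model gets right. A minimal sketch, with made-up labels and a hypothetical function name:

```python
def accuracy(predicted, actual):
    """Fraction of records whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for four records the model did not see during training.
actual_labels = ["low", "high", "medium", "low"]
predicted_labels = ["low", "high", "low", "low"]

held_out_accuracy = accuracy(predicted_labels, actual_labels)
```

Here three of the four previously unseen records are labeled correctly. Measuring on records used to build the model would tend to overstate how well it handles new data.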


      Comparing The Methods (2)
 Scalability: this refers to the ability of the method to
  handle the problem efficiently given large amounts of data.

 Interpretability: this refers to the level of understanding
  and insight that is provided by the method.

 Simplicity:
    decision tree size
    rule compactness

 Domain-dependent quality indicators


Some Applications of Data Mining




 Telecommunications: Some typical applications:
    Fraud Detection
    Marketing/Customer Profiling
    Network Fault Isolation

 Insurance: Some applications:
    Detecting Exceptional Claims
    Identifying claims requiring special treatment
    Identifying verbalized scenarios

 Banking: Some applications:
    Credit risk assessment
    Credit threat risks

 Customer Acquisition: Techniques used:
    Market-basket Analysis
    Clustering
 CRM: Some applications:
     Generating on-line catalogues

 E-Commerce: Applications include:
     Identify performance limitations
     Perform required diligence on the data itself
     Examine temporal patterns of transactions
     Detect, analyze, and mitigate fraud
     Examine connection times required for data exchange between
      different network nodes
     Investigate alternative routing strategies, database replication
      costs, and throughput to alleviate data traffic

 Bioinformatics: Applications include:
       Sequence clustering/classification
       Protein interaction networks
       Whole, multiple genome comparison
       Visualization and Image Analysis
       Text mining and ontologies