# Data Warehousing by wanghonghx

```
Chapter-4: What Can Data Mining Do?
www.AhsanAbdullah.com

(c) 2008 Dr. Ahsan Abdullah
What Can Data Mining Do?

• Classification
• Estimation
• Prediction
• Market Basket Analysis
• Clustering
• Description
1. CLASSIFICATION
• Classification consists of examining the properties
  of a newly presented observation and assigning it
  to a predefined class.

• Assigning customers to predefined customer segments

• Assigning keywords to articles

• Classifying credit applicants as low, medium, or high risk

• Classifying instructor rating as excellent, very good,
  good, fair, or poor
2. ESTIMATION
Unlike classification, whose outcome is discrete (e.g.
YES or NO), estimation deals with continuous-valued outcomes.

Example:
Building a model and assigning a value from 0 to 1
to each member of the set.

Then classifying the members into categories based
on a threshold value.

As the threshold changes, the class assignment changes.
3. PREDICTION
Same as classification or estimation, except that
records are classified according to some predicted
future behavior or estimated value.

A model is built by applying classification or
estimation to training examples from historical data,
for which the value to be predicted is already known.

Once the model explains the known values, it is used
to predict future ones.

Example:
Predicting how much customers will spend during
the next 6 months.
4. MARKET BASKET ANALYSIS
Determining which things go together, e.g. items in
a shopping cart at a supermarket.

Used to identify cross-selling opportunities.

Also used to design attractive packages or groupings
of products and services, or to decide on increasing
the price of some items, etc.

4. MARKET BASKET ANALYSIS

[Figure: diagram of items A and B]

98% of people who purchased items A and B
also purchased item C

Discovering Association Rules
• Given a set of records, each containing a set of items
• Produce dependency rules that predict occurrences of an
  item based on others

• Applications:
  • Marketing, sales promotion and shelf management
  • Inventory management

TID   Items
1     Bread, Cola, Milk
3     Juice, Cola, Diaper, Milk
4     Juice, Bread, Diaper, Milk
5     Cola, Diaper, Milk

Rules:
{Milk} -> {Cola}
{Diaper, Milk} -> {Juice}
5. CLUSTERING
Task of segmenting a heterogeneous population
into a number of more homogeneous sub-groups or
clusters.

Unlike classification, it does NOT depend on
predefined classes.

It is up to you to determine what meaning, if any, to
attach to the resulting clusters.

It can be the first step in a market segmentation
effort.
Ambiguity in Clustering

How many clusters?
[Figure panels: two clusters, four clusters, six clusters]
6. DESCRIPTION
Describe what is going on in a complicated database so as to
increase our understanding.

A good description of a behavior will suggest an explanation
as well.

Comparing The Methods (1)
• Predictive accuracy: this refers to the ability of the
  model to correctly predict the class label of new or
  previously unseen data.

• Speed: this refers to the computation costs
  involved in generating and using the method.

• Robustness: this is the ability of the method to
  make correct predictions/groupings given noisy
  data or data with missing values.

Comparing The Methods (2)
• Scalability: this refers to the ability of the method to
  handle the problem efficiently given a large amount of data.

• Interpretability: this refers to the level of understanding
  and insight that is provided by the method.

• Simplicity:
  • decision tree size
  • rule compactness

• Domain-dependent quality indicators

Some Applications of Data Mining
• Telecommunications: Some typical applications:
  • Fraud Detection
  • Marketing/Customer Profiling
  • Network Fault Isolation

• Insurance: Some applications:
  • Detecting Exceptional Claims
  • Identifying claims requiring special treatment
  • Identifying verbalized scenarios

• Banking: Some applications:
  • Credit risk assessment
  • Credit threat risks

• Customer Acquisition: Techniques used:
  • Clustering
• CRM: Some applications:
  • Generating on-line catalogues

• E-Commerce: Applications include:
  • Identify performance limitations
  • Perform required diligence on the data itself
  • Examine temporal patterns of transactions
  • Detect, analyze, and mitigate fraud
  • Examine connection times required for data exchange between
    different network nodes
  • Investigate alternative routing strategies, database replication
    costs, and throughput to alleviate data traffic

• Bioinformatics: Applications include:
  • Sequence clustering/classification
  • Protein interaction networks
  • Whole, multiple genome comparison
  • Visualization and Image Analysis
  • Text mining and ontologies

```
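
The two rules on the "Discovering Association Rules" slide can be checked against its transaction table with a few lines of Python. This is only a minimal sketch: the slides do not name any measures, so the usual `support` and `confidence` definitions are assumed here, and the function names are illustrative. Only the four rows visible in the table (TID 1, 3, 4, 5) are used, so the numbers apply to those rows alone.

```python
from fractions import Fraction

# The four transactions visible in the slide's table (no TID 2 row appears there).
transactions = {
    1: {"Bread", "Cola", "Milk"},
    3: {"Juice", "Cola", "Diaper", "Milk"},
    4: {"Juice", "Bread", "Diaper", "Milk"},
    5: {"Cola", "Diaper", "Milk"},
}

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Among transactions containing `antecedent`, the share that also contain `consequent`."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

# The two rules listed on the slide.
rules = [({"Milk"}, {"Cola"}), ({"Diaper", "Milk"}, {"Juice"})]
for lhs, rhs in rules:
    print(sorted(lhs), "->", sorted(rhs),
          "support:", support(lhs | rhs, transactions),
          "confidence:", confidence(lhs, rhs, transactions))
```

The statement "98% of people who purchased items A and B also purchased item C" on the market basket slide is a confidence figure in this sense: the share of baskets containing A and B that also contain C.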
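
The estimation slide's example (assign each member a value from 0 to 1, then classify by threshold) can be sketched the same way. The scores below are made-up values standing in for the output of some estimation model, and `classify_by_threshold` is a hypothetical helper, not something from the slides.

```python
def classify_by_threshold(scores, threshold):
    """Label each member YES if its estimated score reaches the threshold, else NO."""
    return {member: ("YES" if score >= threshold else "NO")
            for member, score in scores.items()}

# Hypothetical estimated values in [0, 1], one per member of the set.
scores = {"member_a": 0.15, "member_b": 0.48, "member_c": 0.62, "member_d": 0.91}

# The same scores produce different classes as the threshold moves.
for threshold in (0.4, 0.5, 0.7):
    print(threshold, classify_by_threshold(scores, threshold))
```

Lowering the threshold from 0.5 to 0.4 moves `member_b` from NO to YES, which is the slide's point that the class changes as the threshold changes.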