INFS 755 (Fall 2008)

Assignment 3 (Due on 11/10/2008)
(Association Rule Mining, Clustering, Classification)
[Huzefa Rangwala]
Max Points: 80
Extra Credit Points: 30

This is an individual assignment. Please ensure that the assignment is submitted in class, in hard copy, before the start of class. No late submissions are allowed. There are three parts to the assignment; some of the parts require Weka. The point of the assignment is to have fun!

Part 1 (Clustering)

1. Use the similarity matrix in Table 8.1 to perform single and complete link hierarchical
clustering. Show your results by drawing a dendrogram. The dendrogram should clearly
show the order in which the points are merged. (20 points)

P1     P2     P3     P4     P5
P1   1.00   0.10   0.41   0.55   0.35
P2   0.10   1.00   0.64   0.47   0.98
P3   0.41   0.64   1.00   0.44   0.85
P4   0.55   0.47   0.44   1.00   0.76
P5   0.35   0.98   0.85   0.76   1.00
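For cross-checking the hand-drawn dendrograms, here is a minimal SciPy sketch (an aid, not part of the required submission). It assumes the similarities are converted to distances via d = 1 − s:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Similarity matrix from Table 8.1 (rows/columns P1..P5).
sim = np.array([
    [1.00, 0.10, 0.41, 0.55, 0.35],
    [0.10, 1.00, 0.64, 0.47, 0.98],
    [0.41, 0.64, 1.00, 0.44, 0.85],
    [0.55, 0.47, 0.44, 1.00, 0.76],
    [0.35, 0.98, 0.85, 0.76, 1.00],
])

# Convert similarity to distance and condense for scipy's linkage().
dist = 1.0 - sim
condensed = squareform(dist, checks=False)

single = linkage(condensed, method="single")      # single link (MIN)
complete = linkage(condensed, method="complete")  # complete link (MAX)

# Each row of a linkage matrix is (cluster_i, cluster_j, distance, size),
# listed in merge order -- exactly what the dendrogram must show.
print(single)
print(complete)
```

`scipy.cluster.hierarchy.dendrogram(single)` draws the corresponding dendrogram; the first merge should be P2 and P5 (similarity 0.98, i.e. distance 0.02) under both schemes.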

2. Download the “iris.arff” dataset from the class web-site. Load the Iris dataset into the
Weka Explorer. Ensure that you are not using the target labels in the clustering solution.
(20 points)

a. Perform k-means on this dataset to form k clusters, for values of k from 2 to 10.
b. Plot the SSE as k varies from 2 to 10, and briefly describe the curve.
c. For k = 3, describe which classes (Setosa, Virginica, and Versicolor) each cluster contains.
d. Compute the entropy and a confusion matrix using the class labels for k = 3. Which classes are confused?
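The assignment asks for Weka, but the same steps can be cross-checked with a short scikit-learn sketch. Note the assumptions: Iris is loaded from scikit-learn rather than the class web-site's iris.arff, and the labels are used only for evaluation, never for clustering:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)  # labels used only for evaluation

# (a)-(b): SSE (within-cluster sum of squared errors) for k = 2..10.
sse = []
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse.append(km.inertia_)
print(list(zip(range(2, 11), sse)))

# (c)-(d): confusion matrix (rows: true class, columns: cluster) and
# entropy for k = 3.
km3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cm = confusion_matrix(y, km3.labels_)

cluster_entropy = []
for j in range(3):
    p = cm[:, j] / cm[:, j].sum()  # class distribution inside cluster j
    cluster_entropy.append(-sum(pi * np.log2(pi) for pi in p if pi > 0))
# Overall entropy: average over clusters, weighted by cluster size.
entropy = sum(cm[:, j].sum() / cm.sum() * cluster_entropy[j] for j in range(3))
print(cm)
print("entropy:", entropy)
```

The SSE curve should drop steeply and then flatten (the usual "elbow"); the confusion matrix makes the cluster-to-class correspondence for part (c) explicit.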
Part 2 (Association Rule Mining)

1. Given the lattice structure shown in the figure above and the transactions given in Table 6.3, label each node with the following letter(s): (20 points)

• M if the node is a maximal frequent itemset,
• C if it is a closed frequent itemset,
• N if it is frequent but neither maximal nor closed, and
• I if it is infrequent.
Assume that the support threshold is equal to 30%.
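Table 6.3 and the lattice figure are not reproduced in this handout, so the following sketch applies the four definitions to a small hypothetical transaction set (the transactions below are made up purely for illustration):

```python
from itertools import combinations

# Hypothetical transactions (Table 6.3 is not reproduced here).
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"a", "c"}]
items = sorted(set().union(*transactions))
minsup = 0.30 * len(transactions)  # 30% support threshold

def support(itemset):
    return sum(itemset <= t for t in transactions)

# Enumerate every non-empty itemset in the lattice with its support count.
lattice = {frozenset(c): support(set(c))
           for r in range(1, len(items) + 1)
           for c in combinations(items, r)}

def label(itemset):
    s = frozenset(itemset)
    if lattice[s] < minsup:
        return "I"                                   # infrequent
    supersets = [t for t in lattice if s < t]
    if all(lattice[t] < minsup for t in supersets):
        return "M"                                   # maximal: no frequent superset
    if all(lattice[t] < lattice[s] for t in supersets):
        return "C"                                   # closed: no superset of equal support
    return "N"                                       # frequent, neither maximal nor closed

for s in sorted(lattice, key=lambda x: (len(x), sorted(x))):
    print(sorted(s), lattice[s], label(s))
```

The same brute-force labeling, applied to the actual Table 6.3 transactions, gives the letters to write on each lattice node.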

2. Download “basket.data”, a transaction database from the demos of Clementine (another data mining toolkit). The goal of this data mining study is to find groups of product items often bought together by the customers of a supermarket, whose baskets are represented in the dataset.

Understand the parameters of the Apriori algorithm in Weka. Run the algorithm with minimum support 0.5 and minimum confidence 0.9. Explain the output generated. Is it in line with the class discussion on Apriori rule mining? Explain the top two rules generated. Run the algorithm again with support = 0.9 and confidence = 0.9. What changes did you observe between the two runs? (20 points)
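The runs themselves must be done in Weka, but the effect of raising minimum support can be sketched with a brute-force rule generator over a hypothetical basket (the item names below are made up, not taken from basket.data):

```python
from itertools import combinations

# Hypothetical baskets (not the actual basket.data contents).
baskets = [{"bread", "milk"}, {"bread", "milk", "beer"},
           {"bread", "beer"}, {"milk", "beer"}, {"bread", "milk"}]
n = len(baskets)

def sup(itemset):
    """Fraction of baskets containing the itemset."""
    return sum(itemset <= b for b in baskets) / n

def rules(minsup, minconf):
    items = sorted(set().union(*baskets))
    out = []
    # Enumerate frequent itemsets of size >= 2, then split each into X -> Y.
    for r in range(2, len(items) + 1):
        for c in combinations(items, r):
            s = frozenset(c)
            if sup(s) < minsup:
                continue
            for k in range(1, r):
                for x in combinations(c, k):
                    X = frozenset(x)
                    conf = sup(s) / sup(X)  # confidence of X -> (s - X)
                    if conf >= minconf:
                        out.append((X, s - X, conf))
    return out

print(rules(0.5, 0.7))  # lower support: more frequent itemsets, more rules
print(rules(0.9, 0.7))  # higher support: far fewer (here, no) rules survive
```

Raising minimum support shrinks the set of frequent itemsets, so fewer (or no) rules are generated; the same effect should appear between the two Weka runs.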
Part 3 (Classification)

1. Given the Bayesian network shown above, compute the following probabilities. (15 points)

(a) P(B = good, F = empty, G = empty, S = yes).
(b) P(B = bad, F = empty, G = not empty, S = no).
(c) Given that the battery is bad, compute the probability that the car will start.
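The network figure and its CPTs are omitted from this handout, so the sketch below uses hypothetical CPT values and an assumed structure (Gauge and Start each depending on Battery and Fuel) purely to illustrate the chain-rule computation; substitute the actual figure's structure and numbers:

```python
# Hypothetical CPTs (the actual figure's values are not reproduced here).
# Assumed structure for illustration: G depends on (B, F); S depends on (B, F).
P_B = {"good": 0.9, "bad": 0.1}
P_F = {"not empty": 0.8, "empty": 0.2}
# P(G = empty | B, F)
P_G_empty = {("good", "not empty"): 0.1, ("good", "empty"): 0.8,
             ("bad", "not empty"): 0.2, ("bad", "empty"): 0.9}
# P(S = yes | B, F)
P_S_yes = {("good", "not empty"): 0.95, ("good", "empty"): 0.05,
           ("bad", "not empty"): 0.10, ("bad", "empty"): 0.01}

def joint(b, f, g, s):
    """Chain rule: P(B,F,G,S) = P(B) P(F) P(G|B,F) P(S|B,F)."""
    pg = P_G_empty[(b, f)] if g == "empty" else 1 - P_G_empty[(b, f)]
    ps = P_S_yes[(b, f)] if s == "yes" else 1 - P_S_yes[(b, f)]
    return P_B[b] * P_F[f] * pg * ps

# Shape of (a): a full joint assignment is a single product of CPT entries.
print(joint("good", "empty", "empty", "yes"))

# Shape of (c): P(S = yes | B = bad), summing out F and G, dividing by P(B = bad).
num = sum(joint("bad", f, g, "yes")
          for f in P_F for g in ("empty", "not empty"))
print(num / P_B["bad"])
```

With the real CPTs plugged in, the same two patterns (a single chain-rule product; marginalize then condition) answer all three parts.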

2. Given the data sets shown in Figures (a) to (e), explain how the decision tree, naïve Bayes, and k-nearest neighbor classifiers would perform on these data sets. Be sure to