Frequent Patterns
Miner -Tool for Mining
ABSTRACT
This project report focuses on the generation of frequent patterns, which is the
most important task in association rule mining. It explains the fundamentals of
association rule mining by analyzing implementations of the well-known algorithms
Apriori and Dynamic Itemset Counting, together with a new algorithm, the
‘Matrix Based Association Rule Mining Algorithm’. Association rule mining is a
fundamentally important task in the process of knowledge discovery in large databases.
Several algorithms have been developed for single-level, single-dimensional, Boolean
association rule mining. Some of them require a small amount of memory but heavy disk
access, while others need little I/O activity but a large amount of memory. The
experimental system is developed in Java under the Windows XP operating system. The
run-time behavior of these algorithms is analyzed and compared using the Mushroom dataset.
2.2 About Technical Environment:
Software Environment:
Object-oriented software development is a significant departure from the
traditional structured approach. Object-oriented systems make it possible to reuse code,
and they are better designed, more resilient to change, and more reliable, since they are
built from completely tested and debugged classes. Because of these features, the
software for this project is built following the object-oriented software development
approach.
To develop the software for this project, the Incremental process model is
chosen. Incremental process model combines elements of linear sequential model with
the iterative philosophy of prototyping.
In this project basic requirements will be addressed first to complete a core
product in the first increment and will be shown to the user. Based upon the user
evaluations on the core product, a plan for the next increment will be made. All the
requirements will be addressed gradually when the development reaches the final
increment. At each increment, before the delivery of the product, the analysis, design,
code and testing must be done.
The software development process is subdivided into small, interacting phases or
sub processes. Each phase must contain the following aspects:
A description of how it works
Specification of the input required for the process
Specification of the output to be produced.
Hardware Environment:
512MB RAM
Pentium 4
80GB HDD
REQUIREMENT ANALYSIS
AND
REQUIREMENT ELICITATION
Requirement Analysis & Requirement Elicitation:
Requirement Analysis:
To understand the requirements, it is required to identify the users/actors
that have the highest probability of using the system. A usecase is a typical interaction
between the user and a system that captures the user’s goals and needs.
Requirements Elicitation:
The system requirements are defined with the consultation of various
personnel in an organization or any company and the data sets are collected. Every
organization stores the transactions or the properties of the items they offer etc. This
system identifies the frequent patterns from their datasets and organizes the data into
association rules which will help in better maintenance of their company.
The requirements that are gathered are listed below:
1. The software should provide various algorithms to generate the frequent patterns.
2. The user must be able to select the data set of his choice.
3. Any user who wants to know about the frequent data patterns should be able to
provide the minimum support and generate the required patterns.
4. Any user must be able to provide the minimum confidence for the association
rules.
5. The execution time must be displayed for each algorithm, which helps the user to
identify the best algorithm among them.
Problem Analysis:
FP-Miner is mainly aimed at identifying the frequent patterns from the Mushroom dataset
using different algorithms along with a newly proposed algorithm. The execution times
of these algorithms are compared and the best of them is identified.
Functional Requirements:
Functional requirements describe what the system should do, i.e. the services provided
for the users and for other systems.
Inputs:
Large Data Set : Mushroom Dataset
Type of Input:
Since the datasets are generated by various companies for their own
special purposes, the data set must satisfy certain constraints
before it can be mined by the various methods.
To be mined by these algorithms, the data must be in numeric format
and in ascending order of the properties.
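The format constraint above can be checked before any algorithm is run. The following is only a minimal sketch, assuming one transaction per line with attribute codes separated by whitespace; the class name, the method name and the file layout are hypothetical and are not taken from the project's source.

```java
// Illustrative sketch only; the names and the whitespace-separated layout
// are assumptions, not the project's actual input checker.
final class TransactionFormatSketch {

    // Returns true when the line holds only integers in strictly ascending order.
    static boolean isValidTransaction(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        int previous = 0;
        boolean first = true;
        for (String token : trimmed.split("\\s+")) {
            int value;
            try {
                value = Integer.parseInt(token);
            } catch (NumberFormatException e) {
                return false;              // non-numeric attribute
            }
            if (!first && value <= previous) {
                return false;              // not in ascending order
            }
            previous = value;
            first = false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidTransaction("1 3 9 13 23")); // numeric and ascending
        System.out.println(isValidTransaction("3 1 9"));       // out of order
        System.out.println(isValidTransaction("1 x 9"));       // not numeric
    }
}
```

A file would be accepted only if every line passes this check; otherwise the error message described in the test cases would be shown.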
Outputs:
Frequent patterns
Computations:
Verifying that the data set is in the correct format.
Identifying the frequent patterns.
Generating the strong Association rules.
Comparing the Execution times of each algorithm.
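For the rule generation step, a common way to decide whether a rule A => B is strong is to compare its confidence, that is, the support of A and B together divided by the support of A, against the minimum confidence given by the user. The sketch below assumes the supports are available as raw transaction counts; the class and method names are hypothetical.

```java
// Illustrative sketch only; names are hypothetical.
final class RuleConfidenceSketch {

    // Confidence of A => B as a percentage:
    // (transactions containing A and B) / (transactions containing A) * 100.
    static double confidencePercent(int supportAandB, int supportA) {
        if (supportA <= 0) {
            throw new IllegalArgumentException("supportA must be positive");
        }
        return 100.0 * supportAandB / supportA;
    }

    // A rule is treated as strong when its confidence reaches the threshold.
    static boolean isStrong(int supportAandB, int supportA, double minConfidencePercent) {
        return confidencePercent(supportAandB, supportA) >= minConfidencePercent;
    }

    public static void main(String[] args) {
        // A occurs in 4 transactions, A and B together in 3.
        System.out.println(confidencePercent(3, 4));
        System.out.println(isStrong(3, 4, 80.0));
    }
}
```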
Non- Functional Requirements:
Non functional requirements are constraints that must be adhered to during
development.
Maintainability and enhancement:
The system should be extensible with other data mining techniques.
Reusability:
The system should be 60% reusable so as to be applied for other
datasets.
Platform:
Windows XP Professional Operating System.
Technology to be used:
Programming Language
Java is chosen as the programming language for the implementation of the
system.
Software Requirement Specification
The reasons for choosing this language are
i. The design is object oriented and hence requires an object-oriented language.
ii. The system needs file management features, such as BufferedReader and FileReader,
that provide access to the data in files independently of their storage format.
iii. The system is designed to be platform independent so that it is portable.
The classical algorithms should be carefully analyzed and implemented
to keep the execution time to a minimum.
The dataset must be formatted before being applied to any algorithm.
USECASE MODEL OF THE SYSTEM
4. Use Case Model of the System
4.1 Use Case Scenarios
Usecase: Apriori
The system generates the support for each itemsets. If this support is
greater than the minimum support then those itemsets are displayed for the user as
frequent itemsets.
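The first pass of this use case can be sketched as follows. This is a minimal illustration, assuming the transactions are already loaded as arrays of distinct item codes and that the minimum support is given as a percentage; the class and method names are hypothetical, and the use of "greater than or equal to" at the threshold is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; not the AprioriTgui class described in the report.
final class SupportCountSketch {

    // Returns each single item whose count reaches the minimum support,
    // mapped to the number of transactions that contain it.
    static Map<Integer, Integer> frequentItems(List<int[]> transactions,
                                               double minSupportPercent) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int[] transaction : transactions) {
            for (int item : transaction) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        // Convert the percentage into a number of rows.
        double minRows = transactions.size() * minSupportPercent / 100.0;
        counts.values().removeIf(count -> count < minRows);
        return counts;
    }

    public static void main(String[] args) {
        List<int[]> db = new ArrayList<>();
        db.add(new int[] {1, 2, 3});
        db.add(new int[] {1, 3});
        db.add(new int[] {2, 3});
        db.add(new int[] {3, 4});
        System.out.println(frequentItems(db, 50.0));
    }
}
```

Later passes would build candidate k-itemsets from the surviving items and count them in the same way.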
Usecase: Dynamic Itemset Counting
The system marks all the 1-itemsets as dashed circles. After reading one
interval of M transactions from the database, it checks each itemset in a dashed
circle. If the itemset exceeds the minimum support, it is changed from a dashed
circle to a dashed box. The system then checks each superset of such an itemset. If all
the subsets of the superset are in a solid box or dashed box, the superset is added as a
dashed circle. Finally, it checks each itemset in a dashed circle or dashed box. If the
itemset has been counted over all the transactions, it is changed into a solid circle
if it is in a circle, or into a solid box if it is in a box.
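The four markings used in this use case behave like a small state machine. The following sketch shows only the state changes, not the counting or the hash tree; the enum and method names are hypothetical, and treating "reaches the threshold" as "greater than or equal to" is an assumption.

```java
// Illustrative sketch only; not the dicprocess class described in the report.
final class DicStateSketch {

    enum State { DASHED_CIRCLE, DASHED_BOX, SOLID_CIRCLE, SOLID_BOX }

    // After each interval of M transactions: an itemset that is still being
    // counted moves from dashed circle to dashed box once its count reaches
    // the minimum support count.
    static State afterInterval(State state, int count, int minCount) {
        if (state == State.DASHED_CIRCLE && count >= minCount) {
            return State.DASHED_BOX;
        }
        return state;
    }

    // Once the itemset has been counted over all transactions, a dashed
    // shape becomes solid and keeps its circle or box form.
    static State afterFullPass(State state) {
        switch (state) {
            case DASHED_CIRCLE: return State.SOLID_CIRCLE;
            case DASHED_BOX:    return State.SOLID_BOX;
            default:            return state;
        }
    }

    public static void main(String[] args) {
        State s = State.DASHED_CIRCLE;
        s = afterInterval(s, 3, 2);   // count 3 reaches the threshold 2
        s = afterFullPass(s);         // counted over every transaction
        System.out.println(s);
    }
}
```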
Usecase: FP-Growth
The system compresses a large database into a compact frequent-pattern
tree (FP-tree) structure. The FP-tree structure stores all necessary information about
frequent itemsets in a database. The frequent patterns are displayed to the user.
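The compression comes from transactions that share a prefix also sharing tree nodes. The sketch below shows only the insertion step, assuming each transaction has already been reduced to its frequent items and sorted into one fixed order; the header table and the mining step are left out, and all names are hypothetical rather than those of the FPTree class described later.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the project's FPTree implementation.
final class FpTreeSketch {

    static final class Node {
        int count;                                        // transactions passing through this node
        final Map<Integer, Node> children = new HashMap<>();
    }

    final Node root = new Node();                         // the root carries no item

    // Inserts one transaction; shared prefixes reuse existing nodes and
    // only increase their counts.
    void insert(int[] orderedItems) {
        Node current = root;
        for (int item : orderedItems) {
            current = current.children.computeIfAbsent(item, key -> new Node());
            current.count++;
        }
    }

    public static void main(String[] args) {
        FpTreeSketch tree = new FpTreeSketch();
        tree.insert(new int[] {3, 1, 2});
        tree.insert(new int[] {3, 1});
        tree.insert(new int[] {3, 2});
        Node three = tree.root.children.get(3);
        System.out.println(three.count + " " + three.children.get(1).count);
    }
}
```

Here three transactions are stored in four nodes, because all of them start with item 3 and two of them continue with item 1.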
Usecase: Matrix Based Association Rule Mining
The system generates the frequency of each item present in the database.
This information is used for finding the frequent 1-itemsets at the support given by the user.
A frequent item table is then constructed for the frequent items.
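The report does not give the layout of the frequent item table. Purely as an illustration, the sketch below assumes a boolean matrix with one row per transaction and one column per frequent item, and shows how the support of a 2-itemset could then be read off by comparing two columns. The class name, method names and matrix layout are all assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the matrix layout is ASSUMED and is not
// taken from the Mba class described in the report.
final class ItemMatrixSketch {

    // matrix[t][i] is true when transaction t contains frequentItems[i].
    static boolean[][] build(List<int[]> transactions, int[] frequentItems) {
        boolean[][] matrix = new boolean[transactions.size()][frequentItems.length];
        for (int t = 0; t < transactions.size(); t++) {
            for (int item : transactions.get(t)) {
                for (int i = 0; i < frequentItems.length; i++) {
                    if (frequentItems[i] == item) {
                        matrix[t][i] = true;
                    }
                }
            }
        }
        return matrix;
    }

    // Number of transactions that contain both the item in column a
    // and the item in column b.
    static int pairSupport(boolean[][] matrix, int a, int b) {
        int count = 0;
        for (boolean[] row : matrix) {
            if (row[a] && row[b]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<int[]> db = Arrays.asList(
                new int[] {1, 2, 3}, new int[] {1, 3}, new int[] {2, 3});
        boolean[][] matrix = build(db, new int[] {1, 2, 3});
        System.out.println(pairSupport(matrix, 0, 2));   // items 1 and 3 together
    }
}
```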
Fig: Usecase Diagram for the proposed system
(Diagram elements: User, Data Set File, Apriori, Dynamic Itemset Counting, FP-Growth, Matrix Based Association)
Identifying Classes from the above Usecases:
Classes are an important mechanism for classifying objects. The chief role of a
class is to identify attributes, methods and applicability of its instances.
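Class diagram elements: GUI and four algorithm classes.
MBA: attributes FileName, support; method Mba()
Apriori: attributes FileName, support; method apriori()
dicprocess: attributes FileName, support; method Dicprocess()
FP-Growth: attributes FileName, support; method FPGrowth()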
Each algorithm is described as a class. Each class has the responsibility of identifying the
frequent patterns and displaying them to the user in its own prescribed way. The
attributes defined in each class are the file name and the minimum support each itemset
should have.
DESIGN
5. Design of the System: In the context of software, design is a problem-solving
process whose objective is to find and describe a way to implement the functional
requirements while respecting the constraints imposed by the non functional
requirements and adhering to general principles of good quality.
The goal of the design process is to produce a model or representation of a system
which can be used later to build that system.
The design process for software system has often two levels. At the first level the
focus is on deciding which modules are needed for the system, the specifications of these
modules, and how these modules should be interconnected. This is what is called the
system design or top-level design. In the second level, the internal design of the modules,
or how the specifications of the module can be satisfied, is decided. This design level
is often called detailed design or logic design.
Top-level Design
Architectural Design
Detailed Design
Dynamic and behavioral Modeling
Sequence Diagrams
Collaboration Diagrams
Activity Diagrams
Class Design
User Interface Design
Structural Model
Class diagram:
In the Unified Modeling Language, a class diagram describes the structure of a
system by showing the system’s classes and the relationships between them.
A class diagram contains the following elements:
A class, which represents entities with common characteristics or features:
attributes, operations and associations.
An association, which represents a relationship between two or more classes where
the relationships have common characteristics or features like attributes and
operations.
Class Name: AprioriTgui.
Attributes:
FileName -- Input Data File name.
startTtreeRef[] -- The reference to start of t-tree.
dataArray[][] -- 2-D array to hold input data from file.
Methods:
getFileName() -- to get the input file.
aprioriT() -- to generate the frequent patterns.
createTtreeTopLevel() -- Create Top level of T-tree (First pass)
generateLevel2() -- Generate level 2
createTtreeLevelN() -- Further passes of the dataset
outputFrequentSets() -- prints the frequent patterns
createTtreeTopLevel2() -- Adds supports to level 1(top) of the T-tree.
addSupportToTtreeLevelN -- Add Support Values To T-Tree Level N
pruneLevelN -- Prune unsupported candidate sets
generateNextLevel ( ) -- Generate Next Level
FindItemSetInTtree () -- Find Item Set In T-Tree
OutputItemSet -- Outputs a given item set.
Main() -- which initiates the implementation
AprioriTgui.
+FileName:File
-startTtreeRef[]:TtreeNode[],
-dataArray[][]: short[][]
+numRows,numCols:int,+ support:double
-getFileName():void,-aprioriT() :void
#createTtreeTopLevel():void,#generateLevel2():void
#createTtreeLevelN():void,+outputFrequentSets():void
+createTtreeTopLevel2():void,
#addSupportToTtreeLevelN:void
#pruneLevelN ():void,#generateNextLevel( ):void
-findItemSetInTtree():bool,#outputItemSet():void
+main():void
Class Name: TtreeNode
Attributes:
Support -- The support associated with the itemset
represented by the node.
childRef[] -- A reference variable to the child of the
node
TtreeNode
#support:int
#childRef[]:TtreeNode[]
Class Name: dicprocess
Attributes:
Outfile -- to store the input file
DC,DS,SC, SS -- four states of tree node
N -- total item , M -- total transaction
Stepm -- step increment
tid; -- current line # of transaction
k -- current processing k-itemset
setnum -- item # in current transaction
minsup -- support, root -- hashtreenode object
DSset, DCset, SCset, Ssset—to store the frequent itemsets.
Methods:
Getconfig() -- open file config.txt
Getitemat() -- get an item from an itemset
Itemsetsize() -- get item number of an itemset
Dashfound() -- check the hash tree to see whether a dashed
node exists
Printhashtree -- print the whole hash tree
Transatrahashtree() -- recursive transaction traversal in hash tree
Checkcountedall() -- traverse hash tree to check if an itemset
is counted
checkcounter() -- (DC==>DS) traverse hash tree to check if an
itemset node is in state DC
checkhashtree -- traverse hash tree
dicprocess
+Outfile:File
+DC,DS,SC, SS:int,+minsup:double
+N,M,Stepm,tid,k,setnum: int
+root:hashtreenode,+DSset, DCset, SCset, Ssset:String
+Getconfig():void,+Getitemat():int,+Itemsetsize():int
+Dashfound():bool,+Printhashtree():void,+Transatrahashtree():void
+Checkcountedall():void,+checkcounter(),:void
+Checkhashtree:void
ClassName: hashtreenode
Attributes:
State -- should be one of (DC,DS,SC,SS)
Itemset -- itemset that this node stores
Counter -- counts the number of occurrences in transactions
Starting -- transaction id when this node starts to be
counted
Startingk -- k's value when this node starts to be counted
hashtreenode
+state int
+itemset:String
+counter :int
+starting :int
+startingk: int
+ht :Hashtable
Class Name: FPGrowthApp
Attributes:
File f1 -- to store the data set
Double minsup -- the minimum support
Methods:
Start() -- Read data to be mined from file and reorder
and prune input data according to frequency
of single attributes and to build initial FP-tree.
SendData() -- to obtain the filename and support from the
GUI class
FPGrowthApp
-f1 : File
-minsup:Double
-Start():void
-SendData():void
Class Name: FPTree
Attributes:
protected FPtreeNode rootNode -- to store the root node
protected FPgrowthHeaderTable[] headerTable -- to store the header table
Methods:
public void createFPtree() -- Create header table
private void addToFPtree() -- add to fp-tree
private void addRefToFPgrowthHeaderTable()-- add ref to header table
public void startMining() -- to mine the FP tree
private void generateAncestorCodes()-- Generates ancestor itemSets
private void pruneAncestorCodes() -- Removes elements in
ancestor itemSets
private FPgrowthHeaderTable[] createLocalHeaderTable()--Creates a local
header table comprising those item that are Supported in the count array.
private FPtreeNode generateLocalFPtree()--Generates a local FP tree
public void outputFPtree() -- outputting FP-tree to screen
private void outputFPtreeNode()--outputting a given branch of an FP-tree
FPTree
# rootNode: FPtreeNode
# headerTable: FPgrowthHeaderTable[]
+ createFPtree() : void
- addToFPtree(): void
- addRefToFPgrowthHeaderTable(): void
+ startMining(): void
- generateAncestorCodes(): void
- pruneAncestorCodes(): void
- createLocalHeaderTable(): FPgrowthHeaderTable[]
- generateLocalFPtree(): FPtreeNode
+ outputFPtree(): void
- outputFPtreeNode(): void
Class Name: AssocRuleMining
Attributes:
protected int[][] conversionArray -- 2-D array used to convert input
protected short[] reconversionArray -- 1-D array used to reconvert input
protected String fileName -- for data file name
protected double support -- for % support
protected double confidence -- for % confidence
protected int numOneItemSets -- The number of one itemsets
Methods:
public void inputDataSet() -- process of getting input data
public void idInputDataOrdering()-- Reorders input data according to frequency
of single attributes.
protected int getNumSupOneItemSets() -- Gets number of supported
single item sets.
protected short[] removeElementN() -- Removes the nth
element/attribute from the given item set.
private int combinations() -- method to calculate all
possible combinations of a given item set.
public void outputDataArray() -- Outputs stored input data set
protected void outputItemSet() -- Outputs a given item set.
AssocRuleMining
# conversionArray : int[][]
# reconversionArray: short[]
# fileName: String,# support: double ,# confidence: double ,
# numOneItemSets: int
+ inputDataSet() : void
+ idInputDataOrdering() : void
# getNumSupOneItemSets() : int
# removeElementN() : short[]
- combinations() : int
+outputDataArray(): void,#OutputItemSet():void
Class Name: TotalSupportTree
Attributes:
protected TtreeNode[] startTtreeRef -- The reference to start of t-tree.
protected int numFrequentsets -- The number of frequent sets
protected long numUpdates -- The number of updates required to
generate the T-tree
Methods:
public void addToTtree() -- adding an itemset
protected int getSupportForItemSetInTtree() -- process for finding the
support value for the given item set in the T-tree
private int getSupForIsetInTtree2() --Returns the support value for the
given itemset
public void generateARs() -- Initiates process of generating
Association Rules
TotalSupportTree
# startTtreeRef :TtreeNode[]
# numFrequentsets :int
# numUpdates : long
+ addToTtree() : void
# getSupportForItemSetInTtree(): int
- getSupForIsetInTtree2(): int
+ generateARs():void
Class Name: Mba
Attributes:
File f1 -- to get the dataset
Int lff -- count the number of frequent itemsets
Double minsup -- support for the frequent itemsets
Methods:
void start() -- to generate the single frequent itemsets and
to construct the frequent matrix.
void generatelevelN() -- to generate the next level from the
obtained frequent itemsets after removing
the false frequent itemsets.
Void superset() -- to generate the superset of the one item
sets
Mba
-f1: File
-lff: Int
-minsup: Double
+start() :void
+generatelevelN(): void
+superset() :void
5.2 Behavioral Model:
Sequence Diagram:
Sequence diagrams represent the interactions between classes to achieve
a result such as a usecase. A sequence diagram lists objects horizontally and time
vertically, and models the messages exchanged over time.
Sequence diagram objects: AprioriTgui, dicProcess, FPgrowthApp, MBA
1. Select open file and enter the file name; the validity of the file name is checked.
2. Enter the support; the validity of the support is checked.
3. Enter the algorithm.
4. If it is Apriori, execute and display frequent itemsets.
5. If it is DIC, execute and display frequent itemsets.
6. If it is FPGrowth, execute and display frequent itemsets.
7. If it is MBA, execute and display frequent itemsets.
5.3 Activity Diagram:
Activity diagrams describe the workflow behavior of a system. Activity diagrams
can show activities that are conditional or parallel. Activity Diagrams are useful for
analyzing a use case by describing what actions need to take place and when they should
occur; describing a complicated sequential algorithm; and modeling applications with
parallel processes.
For the proposed system:
Apriori class:
Select File name and enter support
Get the number of rows and number of columns
Generate first itemsets
Obtain the support count of each item set
For Dynamic Itemset Counting Class:
Select File name and enter support and enter step size
Get the number of rows and number of columns
Generate first itemsets until step size is reached
For FP-Growth class:
Select File name and enter support
Generate first itemsets
Obtain the support count
Create the root of the tree labeled null
For Matrix Based Association Rule mining class:
Select File name and enter support
Get the number of rows and number of columns
Generate first itemsets
CODING AND IMPLEMENTATION
6. Coding and Implementation:
Sample Code:
private void aprioriT()
{
textArea.append("Apriori-T (Minimum support threshold = " + support + "%)\n------------
-----------------------------\n" +
"Generating K=1 large itemsets\n");
minSupportRows = numRows*support/100.0;
createTtreeTopLevel();
generateLevel2();
createTtreeLevelN();
textArea.append("\n");
outputFrequentSets();
}
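protected void createTtreeTopLevel() {
startTtreeRef = new TtreeNode[numCols+1];
for (int index=1;index ...
// ... (the remainder of createTtreeTopLevel() and the start of the next
// method are missing from the source; the listing resumes inside a loop
// that walks tableRef down to index 1) ...
... index>=1;index--) {
if (tableRef[index].nodeLink != null) {
startMining(tableRef[index].nodeLink,tableRef[index].itemName,
itemSetSofar);
}
}
}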
protected void startMining(FPgrowthItemPrefixSubtreeNode nodeLink,
short itemName, short[] itemSetSofar) {
int support = genSupHeadTabItem(nodeLink);
short[] newCodeSofar = realloc2(itemSetSofar,itemName);
addToTtree(newCodeSofar,support);
startTempSets=null;
generateAncestorCodes(nodeLink);
if (startTempSets != null) {
FPgrowthColumnCounts[] countArray = countFPgrowthSingles();
FPgrowthHeaderTable[] localHeaderTable =
createLocalHeaderTable(countArray);
if (localHeaderTable != null) {
pruneAncestorCodes(countArray);
FPtreeNode localRoot = generateLocalFPtree(localHeaderTable);
startMining(localHeaderTable,newCodeSofar);
}}}
private int genSupHeadTabItem(FPgrowthItemPrefixSubtreeNode nodeLink) {
int counter = 0;
while(nodeLink != null) {
counter = counter+nodeLink.itemCount;
numUpdates++;
nodeLink = nodeLink.nodeLink;
}return(counter);
}
private void generateAncestorCodes(FPgrowthItemPrefixSubtreeNode ref) {
short[] ancestorCode = null;
int support;
while(ref != null) {
support = ref.itemCount;
ancestorCode = getAncestorCode(ref.parentRef);
if (ancestorCode != null) startTempSets =
new FPgrowthSupportedSets(ancestorCode,support,
startTempSets);
ref = ref.nodeLink;
}}
private short[] getAncestorCode(FPgrowthItemPrefixSubtreeNode ref) {
short[] itemSet = null;
if (ref == null) return(null);
while (ref != null) {
itemSet = realloc2(itemSet,ref.itemName);
ref = ref.parentRef;
}
return(itemSet);
}
private void pruneAncestorCodes(FPgrowthColumnCounts[] countArray) {
FPgrowthSupportedSets ref = startTempSets;
while(ref != null) {for(int index=0;index ...
// ... (the remainder of pruneAncestorCodes() and the start of the method
// that builds the local header table are missing from the source) ...
... >= minSupport) counter++;
}
if (counter == 1) return(null);
FPgrowthHeaderTable[] localHeaderTable =
new FPgrowthHeaderTable[counter];
int place=1;
for (int index=1;index<countArray.length;index++) {
if (countArray[index].support >= minSupport) {
localHeaderTable[place] = new
FPgrowthHeaderTable((short) countArray[index].columnNum);
place++;
}}return(localHeaderTable);}
private void orderLocalHeaderTable(FPgrowthHeaderTable[] localHeaderTable,
FPgrowthColumnCounts[] countArray) {
boolean isOrdered;
FPgrowthHeaderTable temp;
int index, place1, place2;
do {
index = 1;
isOrdered=true;
while (index ... /* text missing from the source */ ... > countArray[place2].support) {
isOrdered=false;
temp = localHeaderTable[index];
localHeaderTable[index] = localHeaderTable[index+1];
localHeaderTable[index+1] = temp;
}index++;
}
} while (isOrdered==false);}
private FPtreeNode generateLocalFPtree(FPgrowthHeaderTable[] tableRef) {
FPgrowthSupportedSets ref = startTempSets;
FPtreeNode localRoot = new FPtreeNode();
while(ref != null) {
if (ref.itemSet != null) addToFPtree(localRoot,0,ref.itemSet,
ref.support,tableRef);
ref = ref.nodeLink;
}return(localRoot);}
private FPtreeNode[] reallocFPtreeChildRefs(FPtreeNode[] oldArray,
FPtreeNode newNode) {
if (oldArray == null) {
FPtreeNode[] newArray = {newNode};
tempIndex = 0;
return(newArray);
}
int oldArrayLength = oldArray.length;
FPtreeNode[] newArray = new FPtreeNode[oldArrayLength+1];
for (int index1=0;index1 < oldArrayLength;index1++) {
if (newNode.node.itemName < oldArray[index1].node.itemName) {
newArray[index1] = newNode;
for (int index2=index1;index2<oldArrayLength;index2++)
newArray[index2+1] = oldArray[index2];
tempIndex = index1;
return(newArray);
}newArray[index1] = oldArray[index1];
}
// Default
newArray[oldArrayLength] = newNode;
tempIndex = oldArrayLength;
return(newArray);
}
public void outputItemPrefixSubtree() {
int flag;
System.out.println("PREFIX SUBTREE FROM HEADER TABLE");
for(int index=1;index<headerTable.length;index++) {
System.out.println("Header = " +
reconvertItem(headerTable[index].itemName));
flag = outputItemPrefixTree(headerTable[index].nodeLink);
if (flag!=1) System.out.println();
}System.out.println();}
private void outputItemPrefixSubtree(FPgrowthHeaderTable[] tableRef) {
int flag;
System.out.println("PREFIX SUBTREE FROM LOCAL HEADER TABLE");
for(int index=1;index<tableRef.length;index++) {
System.out.println("Header = " +
reconvertItem(tableRef[index].itemName));
flag = outputItemPrefixTree(tableRef[index].nodeLink);
if (flag!=1) System.out.println();
}System.out.println();
}
private int outputItemPrefixTree(FPgrowthItemPrefixSubtreeNode ref) {
int counter = 1;
while (ref != null) {
System.out.print("(" + counter + ") " +
(reconvertItem(ref.itemName)) + ":" + ref.itemCount + " ");
counter++;
ref = ref.nodeLink;
}return(counter);}
public void outputFPtree() {
System.out.println("FP TREE");
outputFPtreeNode1();
System.out.println();
}
private void outputFPtreeNode(FPtreeNode ref) {
System.out.println("LOCAL FP TREE");
outputFPtreeNode2(ref.childRefs,"");
System.out.println();
}
private void outputFPtreeNode1() {
outputFPtreeNode2(rootNode.childRefs,"");
}
private void outputFPtreeNode2(FPtreeNode ref[],String nodeID) {
if (ref == null) return;
for (int index=0;index<ref.length;index++) {
System.out.print("(" + nodeID + (index+1) + ") ");
outputItemPrefixSubtreeNode(ref[index].node);
outputFPtreeNode2(ref[index].childRefs,nodeID+(index+1)+".");
}
}
public void outputItemPrefixSubtreeNode(FPgrowthItemPrefixSubtreeNode ref) {
System.out.print((reconvertItem(ref.itemName)) + ":" + ref.itemCount);
if (ref.nodeLink != null) {
System.out.println(" (ref to " +(reconvertItem(ref.nodeLink.itemName)) + ":" +
ref.nodeLink.itemCount + ")");}
else System.out.println(" (ref to null)"); }
TESTING
7. Testing:
7.1 Testing Activities:
Software testing is a critical aspect of software quality assurance and represents the
ultimate review of specification, design and coding.
Software testing is a process used to help identify the correctness, completeness,
security, and quality of developed computer software. During testing the program is
executed with a set of test cases, and the output of the program for the test cases is
evaluated to determine whether the program is performing as expected.
The two main issues in software quality are
Validation or User satisfaction
Verification or Quality Assurance.
7.2 Black Box & White Box Testing:
Black box testing:
It is also known as functional testing. In this technique the internal workings of the
item being tested are not known by the tester. The tester never examines the programming
code and does not need any knowledge of the program other than its
specifications.
White box testing:
White box testing is also known as glass box, structural, clear box and open
box testing. In this technique explicit knowledge of the internal working of the item being
tested is used to design the tests. The test is accurate only if the tester knows what the
program is supposed to do. The tester can then check whether the program diverges from its
intended goal. White box testing does not account for errors caused by omission, and all
visible code must also be readable.
Black-Box Testing:
Black-box testing methods are used to perform the validation testing.
Validation testing is carried out to demonstrate the conformity of the software with
the requirements specified by the users. The developed software product is shown to the
users before delivery for their acceptance. The users accepted the product after
testing it with their own inputs and evaluating it.
Test Case1: Valid file name
Input : Selecting the file name
Expected Output : If the file name selected is valid , the text
area is appended with the data in the file
Observed Output : same as expected.
Test Case2: Invalid file name
Input : Selecting the invalid file name
Expected Output : If the file name selected is invalid, the error
message must be displayed.
Observed Output : same as expected.
Test Case3: Data in the file not in order
Input : Selecting the file name
Expected Output : If the data in the file name selected is not in
order then the error message must be displayed.
Observed Output : same as expected.
Test Case 4: Valid Support
Input : enter support value
Expected Output : The support value must be appended to the
text area and the run button must be highlighted
Observed Output : same as expected.
Test Case 5: Invalid Support
Input : enter invalid support value
Expected Output : The error message must be displayed
Observed Output : same as expected.
White Box testing:
In this technique the internal working of the item is tested. Unit testing focuses on the
smallest compilable program unit, the sub program. Class testing for object-oriented
software is the equivalent of unit testing for conventional software. Class testing for
object-oriented software is driven by the operations encapsulated by the class and the state
behavior of the class.
Class Testing for AprioriTgui class:
S.No. | Test | Input | Expected output | Actual output
1 | To test the functionality of getFileName() | Invalid File Name | Print error message | Print error message
2 | To test the functionality of aprioriT() | Valid filename and support | Display frequent patterns | Display frequent patterns
3 | To test the functionality of createTtreeTopLevel() | TtreeNode | Generate one itemsets | Generate one itemsets
4 | To test the functionality of generateLevel2() | TtreeNode | Generate second level of the tree | Generate second level of the tree
5 | To test the functionality of pruneLevelN() | TtreeNode | Prune the unsupported candidates | Prune the unsupported candidates
6 | To test the functionality of findItemSetInTtree() | Itemset | True if present and false if not | True if present and false if not
7 | To test the functionality of generateNextLevel() | TtreeNode | Generate the next level of the tree | Generate the next level of the tree
Class Testing for Dic class:
S.No. | Test | Input | Expected output | Actual output
1 | To test the functionality of getconfig() | Invalid File Name | Print error message |
2 | To test the functionality of dicProcess() | Valid filename and support | Display frequent patterns |
3 | To test the functionality of Getitemat() | Index and Itemset | Get an itemset from the given index | Get an itemset from the given index
4 | To test the functionality of Itemsetsize() | Itemset | Returns the index of the itemset | Returns the index of the itemset
5 | To test the functionality of Dashfound() | hashtreenode | Return 1 if dashed node exists else 0 | Return 1 if dashed node exists else 0
6 | To test the functionality of Transatrahashtree() | hashtreenode | Traverse through the hash tree | Traverse through the hash tree
7 | To test the functionality of Printhashtree() | hashtreenode | Display the hash tree | Display the hash tree
8 | To test the functionality of checkcounter() | hashtreenode | Traverse hash tree | Traverse hash tree
9 | To test the functionality of Gensubset() | itemset | generate all subset given an itemset | generate all subset given an itemset
Class Testing for FP-growthAPP class:
S.No. | Test | Input | Expected output | Actual output
1 | To test the functionality of Start() | Valid file name and support | Print error message | Print error message
2 | To test the functionality of SendData() | --- | Get the file name and support | Get the file name and support
Class Testing for FPTree class:
S.No. | Test | Input | Expected output | Actual output
1 | To test the functionality of createFPtree() | -- | Header table must be created | Header table must be created
2 | To test the functionality of addToFPtree() | FPtreeNode, itemset, support | The itemset must be added to the Fptree | The itemset must be added to the Fptree
3 | To test the functionality of startMining() | HeaderTable, itemset | Generate the frequent patterns | Generate the frequent patterns
4 | To test the functionality of generateAncestorCodes() | Prefixsubtreenode | Generate itemsets | Generate itemsets
5 | To test the functionality of outputFPtree() | FPTreeNode | Display the frequent patterns | Display the frequent patterns
Class Testing for TotalSupportTree class:
S.No. | Test | Input | Expected output | Actual output
1 | To test the functionality of addToTtree() | Itemset and support | The tree must be appended | The tree must be appended
2 | To test the functionality of getSupportForItemSetInTtree() | Itemset | support | support
Class Testing for Mba class:
S.No. | Test | Input | Expected output | Actual output
1 | To test the functionality of start() | Valid filename and support | Number of rows and columns | Number of rows and columns
2 | To test the functionality of generatelevelN() | One Itemset and the pruned dataset | Display the N-itemsets | Display the N-itemsets
3 | To test the functionality of superset() | N-level itemsets | Supersets of the itemset | Supersets of the itemset
7.3 Test Code Report:
The FPMiner tool is implemented using Java language and all the experiments are
performed on 1.7GHz PC machine with 256MB memory. The Operating System is
WindowsXP.
Experiment 1:
Execution times of the different algorithms for different support values can be tabulated as
follows:
Support | Execution time of AprioriT | Execution time of DIC | Execution time of FP-Growth | Execution time of MBA
50 | 187ms | 226754ms | 94ms | 991ms
60 | 110ms | 184297ms | 74ms | 988ms
70 | 78ms | 161265ms | 46ms | 984ms
80 | 47ms | 106953ms | 32ms | 954ms
90 | 32ms | 74984ms | 31ms | 938ms
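The report does not show how these times were taken. One simple way to obtain comparable millisecond figures in Java is to read System.nanoTime() before and after a run, as in the sketch below; the class and method names are hypothetical, and the loop in main only stands in for a call to one of the mining algorithms.

```java
// Illustrative sketch only; not the project's actual timing code.
final class TimingSketch {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            long sum = 0;                       // placeholder workload
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        });
        System.out.println("Execution time: " + elapsed + "ms");
    }
}
```

A single run is sensitive to JVM warm-up and other activity on the machine, so repeating each measurement and comparing the repeated values gives a fairer comparison between algorithms.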
Experiment 2:
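The number of frequent itemsets generated using different algorithms:
Algorithm | Support | Frequent itemsets generated
AprioriT | 50 | 153
AprioriT | 60 | 51
AprioriT | 70 | 31
AprioriT | 80 | 23
AprioriT | 90 | 9
MBA | 50 | 153
MBA | 60 | 51
MBA | 70 | 31
MBA | 80 | 23
MBA | 90 | 9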
SCREENS AND REPORTS
8. Screens & Reports:
Case1: Valid file name
Input : Selecting the file name
Expected Output : If the file name selected is valid , the text
area is appended with the data in the file
Observed Output : same as expected.
Case2: Invalid file name
Input : Selecting the invalid file name
Expected Output : If the file name selected is invalid, the error
message must be displayed.
Observed Output : same as expected.
Case3: Data in the file not in order
Input : Selecting the file name
Expected Output : If the data in the file name selected is not in
order then the error message must be displayed.
Observed Output : same as expected.
Case 4: Valid Support
Input : enter support value
Expected Output : The support value must be appended to the
text area and the run button must be highlighted
Observed Output : same as expected.
Case 5: Invalid Support
Input : enter invalid support value
Expected Output : The error message must be displayed
Observed Output : same as expected.
Case 6:
To test whether the system is opening the required dataset.
Expected output: The dataset is appended to the text area.
Case 7:
To test whether the system is accepting the minimum support.
Expected output: The support is appended to the text area.
Case 8:
The system must provide an option to select the required algorithm.
Expected Output: the algorithm selection dialog box must be displayed.
Case 9:
The system must run the selected algorithm.
Expected Output: the algorithm must display the frequent patterns.
Case 10:
If the selected algorithm is DIC then the step length must be entered
Expected Output: the dialog box must be displayed
Case 11:
If the selected algorithm is DIC the algorithm must be executed
Expected Output: the frequent patterns must be displayed
Case 12:
If the selected algorithm is FP-Growth the algorithm must be executed
Expected Output: the frequent patterns must be displayed
Case 13:
If the selected algorithm is MBA the algorithm must be executed
Expected Output: the frequent patterns must be displayed
CONCLUSION
Conclusion
Frequent pattern mining is used for finding frequent itemsets among the items
in a given data set. An objective of frequent pattern mining is to develop a systematic
method that uses the given data set to find the frequent items.
This project work is focused on explaining the fundamentals of association rule
mining and analyzing the implementations of well-known
association rule algorithms by comparing the execution time for generating frequent
itemsets with different minimum support values.
The study focuses on the algorithms Apriori, FP-Growth and Dynamic Itemset Counting.
At the same time, we described a new approach to association rule mining, the
matrix based association rule mining algorithm.
The results show that Apriori does not run as efficiently as FP-Growth.
Apriori runs slowly because each transaction contains dense
data. DIC (Dynamic Itemset Counting) is much slower than every other algorithm on the
real dataset.