
Frequent Patterns Miner -Tool for Mining project

Shared by: Anil kumar
Posted: 2/3/2012
Pages: 69
ABSTRACT

This project report focuses on the generation of frequent patterns, which is the most
important task in association rule mining. It analyzes the implementations of the
well-known association rule mining algorithms Apriori and Dynamic Itemset Counting,
together with a new algorithm, the ‘Matrix Based Association Rule Mining Algorithm’.
Association rule mining is a fundamentally important task in the process of knowledge
discovery in large databases. Several algorithms have been developed for single-level,
single-dimensional, Boolean association rule mining. Some of them require a small amount
of memory but heavy disk access; others need little I/O activity but a large amount of
memory. The experimental system is developed in Java under the Windows XP operating
system. The run-time behavior of these algorithms is analyzed and compared using the
Mushroom dataset.





2.2 About Technical Environment:

Software Environment:

Object-oriented software development is a significant departure from the traditional
structured approach. Object-oriented systems make it easier to reuse code, and they tend
to be better designed, more resilient to change, and more reliable, since they are built
from tested and debugged classes. Because of these features, the software for this
project is built following the object-oriented software development approach.

The Incremental process model is chosen to develop the software for this project. The
Incremental process model combines elements of the linear sequential model with the
iterative philosophy of prototyping.

In this project, the basic requirements are addressed first to complete a core product
in the first increment, which is shown to the user. Based on the user's evaluation of the
core product, a plan for the next increment is made. The remaining requirements are
addressed gradually until the development reaches the final increment. In each increment,
analysis, design, coding and testing must be done before the product is delivered.





The software development process is subdivided into small, interacting phases or
sub-processes. Each phase must contain the following aspects:
- A description of how it works
- A specification of the input required for the process
- A specification of the output to be produced





Hardware Environment:
- 512MB RAM
- Pentium 4
- 80GB HDD

REQUIREMENT ANALYSIS AND REQUIREMENT ELICITATION

Requirement Analysis & Requirement Elicitation:

Requirement Analysis:

To understand the requirements, it is necessary to identify the users/actors who are most
likely to use the system. A use case is a typical interaction between the user and the
system that captures the user’s goals and needs.





Requirements Elicitation:





The system requirements are defined in consultation with various personnel in an
organization or company, and the data sets are collected. Every organization stores its
transactions, the properties of the items it offers, and so on. This system identifies
the frequent patterns in such datasets and organizes the data into association rules,
which help in better maintenance of the company.





The requirements that are gathered are listed below:





1. The software should provide various algorithms to generate the frequent patterns.
2. The user must be able to select the data set of his or her choice.
3. Any user who wants to find the frequent data patterns should be able to provide the
minimum support and generate the required patterns.
4. Any user must be able to provide the minimum confidence for the association rules.
5. The execution time must be displayed for each algorithm, which helps the user to
identify the best algorithm among them.
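The requirement to display the execution time of each algorithm could be met with a small timing wrapper. The following is only a minimal sketch, assuming Java 8 or later; the class `RunTimer`, its nested `Timed` type and the stand-in workload are hypothetical names for illustration and are not part of the project code.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of the project): times one run of a task. */
final class RunTimer {

    /** The value a task produced together with its elapsed wall-clock time. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the task once and measures it with the monotonic System.nanoTime() clock. */
    static <T> Timed<T> time(Supplier<T> task) {
        long start = System.nanoTime();
        T value = task.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in workload; a real call would run one of the mining algorithms.
        Timed<Integer> run = time(() -> {
            int sum = 0;
            for (int i = 1; i <= 1000; i++) sum += i;
            return sum;
        });
        System.out.println("result = " + run.value + ", time = " + run.elapsedMillis + "ms");
    }
}
```

Wrapping every algorithm in the same call keeps the measurements comparable, since each run is timed from the same starting point (just before the algorithm is invoked).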









Problem Analysis:

FP-Miner is mainly aimed at identifying the frequent patterns from the Mushroom dataset
using different algorithms along with a newly proposed algorithm. The execution times
of these algorithms are compared and the best of them is identified.









Functional Requirements:

Functional requirements describe what the system should do, i.e. the services provided

for the users and for other systems.





Inputs:
- Large data set: Mushroom dataset





Type of Input:
- The datasets are generated by various companies for their own special purposes, so a
data set must satisfy certain constraints before it can be mined by the various methods.
- To be mined by these algorithms, the data must be in numeric format and in ascending
order of the properties.
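The input constraint above (numeric data in ascending order) can be checked line by line before any algorithm runs. The sketch below is an illustration only: it assumes that each record is one line of whitespace-separated integers and that "ascending" means strictly increasing within a record. The class name `TransactionFormat` is hypothetical and does not come from the project.

```java
/** Hypothetical validator (not part of the project) for one record of the data file. */
final class TransactionFormat {

    /**
     * Returns true if the line holds whitespace-separated integers
     * in strictly ascending order; an empty line is treated as invalid.
     */
    static boolean isValidLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) return false;
        long previous = Long.MIN_VALUE;          // smaller than any int item
        for (String token : trimmed.split("\\s+")) {
            int item;
            try {
                item = Integer.parseInt(token);  // rejects non-numeric tokens
            } catch (NumberFormatException e) {
                return false;
            }
            if (item <= previous) return false;  // out of order or duplicate
            previous = item;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidLine("1 3 9 13"));
        System.out.println(isValidLine("3 1 9"));
    }
}
```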





Outputs:
- Frequent patterns

Computations:
- Verifying that the data set is in the correct format.
- Identifying the frequent patterns.
- Generating the strong association rules.
- Comparing the execution times of the algorithms.
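A rule is usually called strong when its confidence reaches the user-given minimum confidence, where confidence(X => Y) = support(X and Y together) / support(X). The sketch below shows that arithmetic under this standard definition; the class `RuleStrength` and its method names are hypothetical and are not taken from the project code.

```java
/** Hypothetical helper (not part of the project) for rule confidence. */
final class RuleStrength {

    /** Confidence of X => Y as a percentage: 100 * support(X with Y) / support(X). */
    static double confidencePercent(int supportXY, int supportX) {
        if (supportX <= 0) {
            throw new IllegalArgumentException("antecedent support must be positive");
        }
        return 100.0 * supportXY / supportX;
    }

    /** A rule is treated as strong when its confidence reaches the minimum confidence. */
    static boolean isStrong(int supportXY, int supportX, double minConfidencePercent) {
        return confidencePercent(supportXY, supportX) >= minConfidencePercent;
    }

    public static void main(String[] args) {
        // X occurs in 40 transactions, X and Y together in 30 of them.
        System.out.println(confidencePercent(30, 40) + "%");
        System.out.println(isStrong(30, 40, 80.0));
    }
}
```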





Non-Functional Requirements:
Non-functional requirements are constraints that must be adhered to during
development.
Maintainability and enhancement:
- The system should be extensible with other data mining techniques.
Reusability:
- The system should be 60% reusable so that it can be applied to other datasets.
Platform:
- Windows XP Professional Operating System.





Technology to be used:
- Programming Language
Java is chosen as the programming language for the implementation of the system.
The reasons for choosing this language are:
i. The design is object oriented and hence requires an object-oriented language.
ii. The system needs file management features, such as BufferedReader and FileReader,
which provide access to the data in files independently of their storage format.
iii. The system is designed to work independently of the platform, which makes it
portable.
- The classical algorithms should be carefully analyzed and implemented to achieve the
minimum execution time.
- The dataset must be formatted before being applied to any algorithm.









USECASE MODEL OF THE SYSTEM



4. Use Case Model of the System





4.1 Use Case Scenarios

Usecase: Apriori
The system computes the support of each itemset. If this support is greater than the
minimum support, the itemset is displayed to the user as a frequent itemset.
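The support test in this use case can be sketched as follows. This is an illustration only, not the project's T-tree implementation: it counts support by scanning every transaction, it converts the percentage support into a row count, and it accepts an itemset when its count is greater than or equal to that threshold (an assumption; use a strict comparison if "greater than" is meant literally). The class `SupportCounter` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not part of the project): brute-force support counting. */
final class SupportCounter {

    /** Convenience factory for small itemsets. */
    static Set<Integer> setOf(Integer... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    /** Number of transactions that contain every item of the itemset. */
    static int supportCount(List<Set<Integer>> transactions, Set<Integer> itemset) {
        int count = 0;
        for (Set<Integer> transaction : transactions) {
            if (transaction.containsAll(itemset)) count++;
        }
        return count;
    }

    /** Keeps the candidates whose support count reaches the percentage threshold. */
    static List<Set<Integer>> frequent(List<Set<Integer>> transactions,
                                       List<Set<Integer>> candidates,
                                       double minSupportPercent) {
        double minRows = transactions.size() * minSupportPercent / 100.0;
        List<Set<Integer>> result = new ArrayList<>();
        for (Set<Integer> candidate : candidates) {
            if (supportCount(transactions, candidate) >= minRows) result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<Integer>> transactions = Arrays.asList(
                setOf(1, 2, 3), setOf(1, 2), setOf(2, 3), setOf(1, 3));
        List<Set<Integer>> candidates = Arrays.asList(setOf(1, 2), setOf(1, 2, 3));
        System.out.println(frequent(transactions, candidates, 50.0));
    }
}
```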

Usecase: Dynamic Itemset Counting
The system marks all the 1-itemsets as dashed circles. After reading one interval of M
transactions from the database, it checks each itemset in a dashed circle. If the
itemset's count exceeds the minimum support, it is changed from a dashed circle to a
dashed box. The system then checks each superset of such an itemset. If all the subsets
of the superset are in a solid box or a dashed box, the superset is added as a dashed
circle. Finally, it checks each itemset in a dashed circle or a dashed box. If the
itemset has been counted over all the transactions, it is changed into a solid circle if
it is in a circle, or into a solid box if it is in a box.
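The four markings in this use case behave like a small state machine. The sketch below models only the transitions described above; it is not the project's hash-tree code. The names `DicStates` and `State` are hypothetical, and the threshold comparison uses greater-than-or-equal, which is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model (not part of the project) of the DIC itemset markings. */
final class DicStates {

    /** Dashed = still being counted, solid = counted over all transactions. */
    enum State { DASHED_CIRCLE, DASHED_BOX, SOLID_CIRCLE, SOLID_BOX }

    /** After an interval, a dashed circle that reaches the threshold becomes a dashed box. */
    static State afterInterval(State state, int count, int minSupportCount) {
        if (state == State.DASHED_CIRCLE && count >= minSupportCount) {
            return State.DASHED_BOX;
        }
        return state;
    }

    /** A superset starts being counted only when every subset is in a box. */
    static boolean canStartCounting(List<State> subsetStates) {
        for (State s : subsetStates) {
            if (s != State.DASHED_BOX && s != State.SOLID_BOX) return false;
        }
        return true;
    }

    /** Once counted over all transactions, dashed becomes solid and the shape is kept. */
    static State afterFullPass(State state) {
        switch (state) {
            case DASHED_CIRCLE: return State.SOLID_CIRCLE;
            case DASHED_BOX:    return State.SOLID_BOX;
            default:            return state;
        }
    }

    public static void main(String[] args) {
        State s = afterInterval(State.DASHED_CIRCLE, 5, 4);
        System.out.println(s + " -> " + afterFullPass(s));
        System.out.println(canStartCounting(Arrays.asList(State.SOLID_BOX, State.DASHED_CIRCLE)));
    }
}
```

The frequent itemsets are the ones that end in a solid box; itemsets that end in a solid circle were counted in full but stayed below the threshold.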

Usecase: FP-Growth
The system compresses a large database into a compact frequent-pattern tree (FP-tree)
structure. The FP-tree stores all the necessary information about the frequent itemsets
in the database. The frequent patterns are displayed to the user.
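The compression comes from sharing prefixes: transactions that begin with the same items follow the same path in the tree and only increase the counts on it. The sketch below shows that insertion step alone, assuming each transaction has already been pruned and ordered by descending item frequency. It omits the header table and node links that the full algorithm needs, and `MiniFpTree` is a hypothetical name rather than the project's FPTree class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified FP-tree (not part of the project): insertion only. */
final class MiniFpTree {

    static final class Node {
        final int item;
        int count;
        final Map<Integer, Node> children = new TreeMap<>();

        Node(int item) { this.item = item; }
    }

    /** Root carries no item; -1 is just a placeholder label. */
    final Node root = new Node(-1);

    /** Inserts one transaction whose items are already ordered by descending frequency. */
    void insert(List<Integer> orderedItems) {
        Node current = root;
        for (int item : orderedItems) {
            Node child = current.children.get(item);
            if (child == null) {                 // no shared prefix yet: open a new branch
                child = new Node(item);
                current.children.put(item, child);
            }
            child.count++;                       // shared prefix: only the count grows
            current = child;
        }
    }

    /** Number of item nodes in the tree (the root is not counted). */
    int nodeCount() { return countBelow(root); }

    private static int countBelow(Node node) {
        int total = 0;
        for (Node child : node.children.values()) total += 1 + countBelow(child);
        return total;
    }

    public static void main(String[] args) {
        MiniFpTree tree = new MiniFpTree();
        tree.insert(Arrays.asList(2, 1, 3));
        tree.insert(Arrays.asList(2, 1));
        tree.insert(Arrays.asList(2, 3));
        System.out.println("nodes = " + tree.nodeCount());
    }
}
```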





Usecase: Matrix Based Association Rule Mining
The system computes the frequency of each item present in the database. This information
is used to find the frequent 1-itemsets at the user-given support. A frequent item table
is then constructed for the frequent items.
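The report does not spell out the layout of the frequent item table, so the following is only one plausible reading, offered as a sketch: count the item frequencies, keep the items that reach the minimum count, and store one boolean row per transaction with one column per frequent item. The `pairSupport` method shows how such a table could then be used and is an extension for illustration; all names here (`FrequentItemMatrix` and its methods) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch (not the project's Mba class) of a frequent item table. */
final class FrequentItemMatrix {

    /** Counts how many times each item occurs; assumes an item occurs at most once per transaction. */
    static Map<Integer, Integer> itemFrequencies(List<int[]> transactions) {
        Map<Integer, Integer> frequency = new TreeMap<>();
        for (int[] transaction : transactions) {
            for (int item : transaction) frequency.merge(item, 1, Integer::sum);
        }
        return frequency;
    }

    /** Items whose frequency reaches minCount, in ascending item order. */
    static List<Integer> frequentItems(Map<Integer, Integer> frequency, int minCount) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : frequency.entrySet()) {
            if (e.getValue() >= minCount) result.add(e.getKey());
        }
        return result;
    }

    /** matrix[row][column] is true when transaction 'row' contains frequent item 'column'. */
    static boolean[][] buildMatrix(List<int[]> transactions, List<Integer> frequentItems) {
        boolean[][] matrix = new boolean[transactions.size()][frequentItems.size()];
        for (int row = 0; row < transactions.size(); row++) {
            for (int item : transactions.get(row)) {
                int column = frequentItems.indexOf(item);
                if (column >= 0) matrix[row][column] = true;
            }
        }
        return matrix;
    }

    /** Support of a 2-itemset: rows in which both columns are set. */
    static int pairSupport(boolean[][] matrix, int columnA, int columnB) {
        int count = 0;
        for (boolean[] row : matrix) {
            if (row[columnA] && row[columnB]) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<int[]> transactions = Arrays.asList(
                new int[] {1, 2, 3}, new int[] {1, 2}, new int[] {2, 3}, new int[] {4});
        List<Integer> items = frequentItems(itemFrequencies(transactions), 2);
        boolean[][] matrix = buildMatrix(transactions, items);
        System.out.println(items + " pair(0,1) = " + pairSupport(matrix, 0, 1));
    }
}
```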









[Figure elements: User; Data Set File; use cases Apriori, Dynamic Itemset Counting,
FP-Growth, Matrix Based Association]

Fig: Usecase Diagram for the proposed system

Identifying Classes from the above Usecases:





Classes are an important mechanism for classifying objects. The chief role of a

class is to identify attributes, methods and applicability of its instances.





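[Class diagram elements]
GUI
MBA: attributes FileName, support; operation Mba()
Apriori: attributes FileName, support; operation apriori()
dicprocess: attributes FileName, support; operation Dicprocess()
FP-Growth: attributes FileName, support; operation FPGrowth()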



Each algorithm is described as a class. Each class has the responsibility of identifying
the frequent patterns and displaying them to the user in its own prescribed way. The
attributes defined in each class are the file name and the minimum support that each
itemset should have.

DESIGN



5. Design of the System: In the context of software, design is a problem-solving process
whose objective is to find and describe a way to implement the functional requirements
while respecting the constraints imposed by the non-functional requirements and adhering
to general principles of good quality.

The goal of the design process is to produce a model or representation of a system that
can later be used to build that system.

The design process for a software system often has two levels. At the first level, the
focus is on deciding which modules are needed for the system, the specifications of these
modules, and how the modules should be interconnected. This is called the system design
or top-level design. At the second level, the internal design of the modules, or how the
specifications of each module can be satisfied, is decided. This design level is often
called detailed design or logic design.

- Top-level Design
- Architectural Design
- Detailed Design
- Dynamic and behavioral Modeling
- Sequence Diagrams
- Collaboration Diagrams
- Activity Diagrams
- Class Design
- User Interface Design









Structural Model

Class diagram:

In the Unified Modeling Language, a class diagram describes the structure of a

system by showing the system’s classes and the relationships between them.





A class diagram contains the following elements:
- A class, which represents entities with common characteristics or features: attributes,
operations and associations.
- An association, which represents a relationship between two or more classes, where the
relationships have common characteristics or features such as attributes and operations.





Class Name: AprioriTgui.





Attributes:

FileName -- input data file name.
startTtreeRef[] -- the reference to the start of the T-tree.
dataArray[][] -- 2-D array to hold the input data from the file.
Methods:
getFileName() -- gets the input file.
aprioriT() -- generates the frequent patterns.
createTtreeTopLevel() -- creates the top level of the T-tree (first pass).
generateLevel2() -- generates level 2.
createTtreeLevelN() -- makes further passes over the dataset.
outputFrequentSets() -- prints the frequent patterns.
createTtreeTopLevel2() -- adds supports to level 1 (top) of the T-tree.
addSupportToTtreeLevelN() -- adds support values to T-tree level N.
pruneLevelN() -- prunes unsupported candidate sets.
generateNextLevel() -- generates the next level.
findItemSetInTtree() -- finds an item set in the T-tree.
outputItemSet() -- outputs a given item set.
main() -- initiates the implementation.









AprioriTgui
+FileName: File
-startTtreeRef[]: TtreeNode[]
-dataArray[][]: short[][]
+numRows, numCols: int
+support: double
-getFileName(): void
-aprioriT(): void
#createTtreeTopLevel(): void
#generateLevel2(): void
#createTtreeLevelN(): void
+outputFrequentSets(): void
+createTtreeTopLevel2(): void
#addSupportToTtreeLevelN(): void
#pruneLevelN(): void
#generateNextLevel(): void
-findItemSetInTtree(): boolean
#outputItemSet(): void
+main(): void









Class Name: TtreeNode

Attributes:

support -- the support associated with the itemset represented by the node.
childRef[] -- references to the child nodes of the node.



TtreeNode

#support:int

#childRef[]:TtreeNode[]







Class Name: dicprocess

Attributes:

Outfile -- stores the input file
DC, DS, SC, SS -- the four states of a tree node
N -- total number of items; M -- total number of transactions
Stepm -- step increment
tid -- current line number of the transaction
k -- the k of the k-itemsets currently being processed
setnum -- number of items in the current transaction
minsup -- minimum support; root -- hashtreenode object
DSset, DCset, SCset, Ssset -- store the frequent itemsets

Methods:
Getconfig() -- opens the file config.txt
Getitemat() -- gets an item from an itemset
Itemsetsize() -- gets the number of items in an itemset
Dashfound() -- checks the hash tree to see whether a dashed node exists
Printhashtree() -- prints the whole hash tree
Transatrahashtree() -- recursive transaction traversal of the hash tree
Checkcountedall() -- traverses the hash tree to check whether an itemset has been counted
checkcounter() -- (DC ==> DS) traverses the hash tree to check whether an itemset node
is in state DC
checkhashtree() -- traverses the hash tree









dicprocess
+Outfile: File
+DC, DS, SC, SS: int
+minsup: double
+N, M, Stepm, tid, k, setnum: int
+root: hashtreenode
+DSset, DCset, SCset, Ssset: String
+Getconfig(): void
+Getitemat(): int
+Itemsetsize(): int
+Dashfound(): bool
+Printhashtree(): void
+Transatrahashtree(): void
+Checkcountedall(): void
+checkcounter(): void
+Checkhashtree(): void



ClassName: hashtreenode

Attributes:

State -- one of (DC, DS, SC, SS)
Itemset -- the itemset that this node stores
Counter -- counts the number of occurrences in the transactions
Starting -- transaction id at which this node starts to be counted
Startingk -- value of k when this node starts to be counted









hashtreenode



+state: int

+itemset:String

+counter :int

+starting :int

+startingk: int

+ht :Hashtable









Class Name: FPGrowthApp





Attributes:

File f1 -- stores the data set
Double minsup -- the minimum support
Methods:
Start() -- reads the data to be mined from the file, reorders and prunes the input data
according to the frequency of single attributes, and builds the initial FP-tree.
SendData() -- obtains the file name and support from the GUI class

FPGrowthApp

-f1 : File

-minsup:Double

-Start():void

-SendData():void

Class Name: FPTree





Attributes:

protected FPtreeNode rootNode -- to store the root node

protected FPgrowthHeaderTable[] headerTable -- to store the header table

Methods:

public void createFPtree() -- Create header table

private void addToFPtree() -- add to fp-tree

private void addRefToFPgrowthHeaderTable()-- add ref to header table

public void startMining() -- to mine the FP tree

private void generateAncestorCodes()-- Generates ancestor itemSets

private void pruneAncestorCodes() -- Removes elements in

ancestor itemSets

private FPgrowthHeaderTable[] createLocalHeaderTable()--Creates a local

header table comprising those item that are Supported in the count array.

private FPtreeNode generateLocalFPtree()--Generates a local FP tree

public void outputFPtree() -- outputting FP-tree to screen

private void outputFPtreeNode()--outputting a given branch of an FP-tree





FPTree

# rootNode: FPtreeNode

# headerTable: FPgrowthHeaderTable[]

+ createFPtree() : void

- addToFPtree(): void

- addRefToFPgrowthHeaderTable(): void

+ startMining(): void

- generateAncestorCodes(): void

- pruneAncestorCodes(): void

- createLocalHeaderTable(): FPgrowthHeaderTable[]

- generateLocalFPtree(): FPtreeNode

+ outputFPtree(): void

- outputFPtreeNode(): void









Class Name: AssocRuleMining

Attributes:

protected int[][] conversionArray -- 2-D array used to convert input
protected short[] reconversionArray -- 1-D array used to reconvert input

protected String fileName -- for data file name

protected double support -- for % support

protected double confidence -- for % confidence

protected int numOneItemSets -- The number of one itemsets

Methods:





public void inputDataSet() -- process of getting input data

public void idInputDataOrdering()-- Reorders input data according to frequency

of single attributes.

protected int getNumSupOneItemSets() -- Gets number of supported

single item sets.

protected short[] removeElementN() -- Removes the nth

element/attribute from the given item set.

private int combinations() -- method to calculate all

possible combinations of a given item set.

public void outputDataArray() -- Outputs stored input data set

protected void outputItemSet() -- Outputs a given item set.







AssocRuleMining

# conversionArray : int[][]

# reconversionArray: short[]

# fileName: String,# support: double ,# confidence: double ,

# numOneItemSets: int



+ inputDataSet() : void

+ idInputDataOrdering(): void

# getNumSupOneItemSets() : int

# removeElementN() : short[]

- combinations() : int

+outputDataArray(): void,#OutputItemSet():void









Class Name: TotalSupportTree





Attributes:

protected TtreeNode[] startTtreeRef -- The reference to start of t-tree.

protected int numFrequentsets -- The number of frequent sets

protected long numUpdates -- The number of updates required to

generate the T-tree

Methods:

public void addToTtree() -- adding an itemset

protected int getSupportForItemSetInTtree() -- process for finding the

support value for the given item set in the T-tree

private int getSupForIsetInTtree2() --Returns the support value for the

given itemset

public void generateARs() -- Initiates process of generating

Association Rules







TotalSupportTree

# startTtreeRef :TtreeNode[]

# numFrequentsets :int

# numUpdates : long



+ addToTtree() : void

# getSupportForItemSetInTtree(): int

- getSupForIsetInTtree2(): int

+ generateARs():void









Class Name: Mba

Attributes:





File f1 -- the dataset file
int lff -- counts the number of frequent itemsets
Double minsup -- minimum support for the frequent itemsets

Methods:
void start() -- generates the frequent 1-itemsets and constructs the frequent matrix.
void generatelevelN() -- generates the next level from the obtained frequent itemsets
after removing the false frequent itemsets.
void superset() -- generates the supersets of the 1-itemsets









Mba

-f1: File

-lff: Int

-minsup: Double





+start() :void

+generatelevelN(): void

+superset() :void









5.2 Behavioral Model:

Sequence Diagram:





A sequence diagram represents the interactions between classes that are needed to achieve
a result such as a use case. It lists objects horizontally and time vertically, and
models the messages exchanged over time.

[Sequence diagram. Lifelines: AprioriTgui, dicProcess, FPgrowthApp, MBA]
1. Select open file and enter the file name; the validity of the file name is checked.
2. If the file name is valid, enter the support; the validity of the support is checked.
3. If the support is valid, enter the algorithm.
4. If it is Apriori, execute it and display the frequent itemsets.
5. If it is DIC, execute it and display the frequent itemsets.
6. If it is FPGrowth, execute it and display the frequent itemsets.
7. If it is MBA, execute it and display the frequent itemsets.







5.3 Activity Diagram:



Activity diagrams describe the workflow behavior of a system. They can show activities
that are conditional or parallel. Activity diagrams are useful for analyzing a use case
by describing what actions need to take place and when they should occur, for describing
a complicated sequential algorithm, and for modeling applications with parallel
processes.



For the proposed system:
Select file name and enter support

Apriori class:
1. Select file name and enter support
2. Get the number of rows and number of columns
3. Generate first itemsets
4. Obtain the support count of each item set

For Dynamic Itemset Counting class:
1. Select file name, enter support and enter step size
2. Get the number of rows and number of columns
3. Generate first itemsets until step size is reached

For FP-Growth class:
1. Select file name and enter support
2. Generate first itemsets and obtain the support count
3. Create the root of the tree, labeled null

For Matrix Based Association Rule mining class:
1. Select file name and enter support
2. Get the number of rows and number of columns
3. Generate first itemsets

CODING AND IMPLEMENTATION









6. Coding and Implementation:





Sample Code:

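private void aprioriT()
{
textArea.append("Apriori-T (Minimum support threshold = " + support + "%)\n" +
"-----------------------------------------\n" +
"Generating K=1 large itemsets\n");
// Convert the percentage support into a minimum number of rows.
minSupportRows = numRows*support/100.0;
createTtreeTopLevel();
generateLevel2();
createTtreeLevelN();
textArea.append("\n");
outputFrequentSets();
}
protected void createTtreeTopLevel() {
startTtreeRef = new TtreeNode[numCols+1];
// [...] The loop that starts here (for (int index=1;index...), the rest of this method
// and the opening of the FP-growth mining method below are missing from the source.

// Mine each header-table entry in turn, from the last entry down to index 1.
for (int index=/* start value missing from the source */;index>=1;index--) {
if (tableRef[index].nodeLink != null) {
startMining(tableRef[index].nodeLink,tableRef[index].itemName,
itemSetSofar);
}
}
}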

protected void startMining(FPgrowthItemPrefixSubtreeNode nodeLink,

short itemName, short[] itemSetSofar) {

int support = genSupHeadTabItem(nodeLink);

short[] newCodeSofar = realloc2(itemSetSofar,itemName);

addToTtree(newCodeSofar,support);

startTempSets=null;

generateAncestorCodes(nodeLink);

if (startTempSets != null) {

FPgrowthColumnCounts[] countArray = countFPgrowthSingles();

FPgrowthHeaderTable[] localHeaderTable =

createLocalHeaderTable(countArray);

if (localHeaderTable != null) {

pruneAncestorCodes(countArray);

FPtreeNode localRoot = generateLocalFPtree(localHeaderTable);

startMining(localHeaderTable,newCodeSofar);

}}}





private int genSupHeadTabItem(FPgrowthItemPrefixSubtreeNode nodeLink) {

int counter = 0;

while(nodeLink != null) {

counter = counter+nodeLink.itemCount;

numUpdates++;

nodeLink = nodeLink.nodeLink;

}return(counter);

}









private void generateAncestorCodes(FPgrowthItemPrefixSubtreeNode ref) {

short[] ancestorCode = null;

int support;

while(ref != null) {

support = ref.itemCount;

ancestorCode = getAncestorCode(ref.parentRef);

if (ancestorCode != null) startTempSets =

new FPgrowthSupportedSets(ancestorCode,support,

startTempSets);

ref = ref.nodeLink;

}}

private short[] getAncestorCode(FPgrowthItemPrefixSubtreeNode ref) {

short[] itemSet = null;

if (ref == null) return(null);

while (ref != null) {

itemSet = realloc2(itemSet,ref.itemName);

ref = ref.parentRef;

}

return(itemSet);

}

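private void pruneAncestorCodes(FPgrowthColumnCounts[] countArray) {
FPgrowthSupportedSets ref = startTempSets;
while(ref != null) {
// [...] The body of this loop (for(int index=0;index...) is missing from the source.
}
}

// Reconstructed from the call site and the class listing: the method header and the
// first counting loop are inferred, and the start value of counter is an assumption.
private FPgrowthHeaderTable[] createLocalHeaderTable(FPgrowthColumnCounts[] countArray) {
int counter = 1;
for (int index=1;index<countArray.length;index++) {
if (countArray[index].support >= minSupport) counter++;
}
if (counter == 1) return(null);
FPgrowthHeaderTable[] localHeaderTable =
new FPgrowthHeaderTable[counter];
int place=1;
for (int index=1;index<countArray.length;index++) {
if (countArray[index].support >= minSupport) {
localHeaderTable[place] = new
FPgrowthHeaderTable((short) countArray[index].columnNum);
place++;
}}return(localHeaderTable);}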





private void orderLocalHeaderTable(FPgrowthHeaderTable[] localHeaderTable,

FPgrowthColumnCounts[] countArray) {

boolean isOrdered;

FPgrowthHeaderTable temp;

int index, place1, place2;

do {

index = 1;

isOrdered=true;

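while (index < (localHeaderTable.length-1)) {
// Reconstructed: compare the supports of two neighbouring header-table entries.
place1 = localHeaderTable[index].itemName;
place2 = localHeaderTable[index+1].itemName;
if (countArray[place1].support > countArray[place2].support) {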

isOrdered=false;

temp = localHeaderTable[index];

localHeaderTable[index] = localHeaderTable[index+1];

localHeaderTable[index+1] = temp;

}index++;

}

} while (isOrdered==false);}

private FPtreeNode generateLocalFPtree(FPgrowthHeaderTable[] tableRef) {

FPgrowthSupportedSets ref = startTempSets;

FPtreeNode localRoot = new FPtreeNode();

while(ref != null) {

if (ref.itemSet != null) addToFPtree(localRoot,0,ref.itemSet,

ref.support,tableRef);

ref = ref.nodeLink;

}return(localRoot);}

private FPtreeNode[] reallocFPtreeChildRefs(FPtreeNode[] oldArray,

FPtreeNode newNode) {

if (oldArray == null) {

FPtreeNode[] newArray = {newNode};

tempIndex = 0;

return(newArray);

}

int oldArrayLength = oldArray.length;

FPtreeNode[] newArray = new FPtreeNode[oldArrayLength+1];

for (int index1=0;index1 < oldArrayLength;index1++) {

if (newNode.node.itemName < oldArray[index1].node.itemName) {

newArray[index1] = newNode;

for (int index2=index1;index2<oldArrayLength;index2++)
newArray[index2+1] = oldArray[index2];

tempIndex = index1;

return(newArray);

}newArray[index1] = oldArray[index1];

}





// Default





newArray[oldArrayLength] = newNode;

tempIndex = oldArrayLength;

return(newArray);

}





public void outputItemPrefixSubtree() {

int flag;

System.out.println("PREFIX SUBTREE FROM HEADER TABLE");

for(int index=1;index<headerTable.length;index++) {
System.out.println("Header = " +

reconvertItem(headerTable[index].itemName));

flag = outputItemPrefixTree(headerTable[index].nodeLink);

if (flag!=1) System.out.println();

}System.out.println();}

private void outputItemPrefixSubtree(FPgrowthHeaderTable[] tableRef) {

int flag;

System.out.println("PREFIX SUBTREE FROM LOCAL HEADER TABLE");

for(int index=1;index<tableRef.length;index++) {
System.out.println("Header = " +

reconvertItem(tableRef[index].itemName));

flag = outputItemPrefixTree(tableRef[index].nodeLink);

if (flag!=1) System.out.println();

}System.out.println();

}

private int outputItemPrefixTree(FPgrowthItemPrefixSubtreeNode ref) {

int counter = 1;

while (ref != null) {

System.out.print("(" + counter + ") " +

(reconvertItem(ref.itemName)) + ":" + ref.itemCount + " ");

counter++;

ref = ref.nodeLink;

}return(counter);}

public void outputFPtree() {

System.out.println("FP TREE");

outputFPtreeNode1();

System.out.println();

}

private void outputFPtreeNode(FPtreeNode ref) {

System.out.println("LOCAL FP TREE");

outputFPtreeNode2(ref.childRefs,"");

System.out.println();

}

private void outputFPtreeNode1() {

outputFPtreeNode2(rootNode.childRefs,"");

}

private void outputFPtreeNode2(FPtreeNode ref[],String nodeID) {

if (ref == null) return;

for (int index=0;index<ref.length;index++) {
System.out.print("(" + nodeID + (index+1) + ") ");

outputItemPrefixSubtreeNode(ref[index].node);

outputFPtreeNode2(ref[index].childRefs,nodeID+(index+1)+".");

}

}

public void outputItemPrefixSubtreeNode(FPgrowthItemPrefixSubtreeNode ref) {

System.out.print((reconvertItem(ref.itemName)) + ":" + ref.itemCount);

if (ref.nodeLink != null) {

System.out.println(" (ref to " +(reconvertItem(ref.nodeLink.itemName)) + ":" +

ref.nodeLink.itemCount + ")");}

else System.out.println(" (ref to null)"); }









TESTING

7. Testing:

7.1 Testing Activities:





Software testing is a critical aspect of software quality assurance and represents the
ultimate review of specification and coding.

Software testing is a process used to help identify the correctness, completeness,
security, and quality of developed computer software. During testing, the program is
executed with a set of test cases, and the output of the program for the test cases is
evaluated to determine whether the program is performing as expected.









The two main issues in software quality are:
- Validation or User satisfaction
- Verification or Quality Assurance.





7.2 Black Box & White Box Testing:

- Black box testing:
It is also known as functional testing. In this technique, the internal workings of the
item being tested are not known by the tester. The tester never examines the programming
code and needs no knowledge of the program other than its specifications.

- White box testing:
White box testing is also known as glass box, structural, clear box and open box
testing. This technique relies on explicit knowledge of the internal workings of the item
being tested. The test is accurate only if the tester knows what the program is supposed
to do; the tester can then check whether the program diverges from its intended goal.
White box testing does not account for errors caused by omission, and all visible code
must also be readable.





Black-Box Testing:

Black-box testing methods are used to perform the validation testing. Validation testing
is carried out to demonstrate the conformity of the software to the requirements
specified by the users. The developed software product is shown to the users before
delivery for their acceptance. The users accepted the product after testing it with
their own inputs and evaluating it.

Test Case1: Valid file name

Input : Selecting the file name

Expected Output : If the file name selected is valid , the text

area is appended with the data in the file

Observed Output : same as expected.





Test Case2: Invalid file name





Input : Selecting the invalid file name

Expected Output : If the file name selected is invalid, the error

message must be displayed.

Observed Output : same as expected.

Test Case3: Data in the file not in order

Input : Selecting the file name

Expected Output : If the data in the file name selected is not in

order then the error message must be displayed.

Observed Output : same as expected.

Test Case 4: Valid Support

Input : enter support value

Expected Output : The support value must be appended to the

text area and the run button must be highlighted

Observed Output : same as expected.









Test Case 5: Invalid Support





Input : enter invalid support value

Expected Output : The error message must be displayed

Observed Output : same as expected.
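Test cases 4 and 5 depend on what counts as a valid support value, which the report does not define. The sketch below assumes that a valid support is a number greater than 0 and at most 100 (a percentage); both the range and the class name `SupportInput` are assumptions made for illustration.

```java
import java.util.OptionalDouble;

/** Hypothetical input check (not part of the project) for the support field. */
final class SupportInput {

    /**
     * Parses a support percentage typed by the user.
     * Returns an empty result when the text is not a number in the range (0, 100].
     */
    static OptionalDouble parseSupport(String text) {
        if (text == null) return OptionalDouble.empty();
        try {
            double value = Double.parseDouble(text.trim());
            // NaN fails both comparisons and is therefore rejected as well.
            if (value > 0.0 && value <= 100.0) return OptionalDouble.of(value);
            return OptionalDouble.empty();
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseSupport("50").isPresent());
        System.out.println(parseSupport("abc").isPresent());
    }
}
```

A GUI handler could show the error message of test case 5 whenever the result is empty, and enable the run button of test case 4 otherwise.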





White Box testing:

In this technique the internal working of the item is tested. Unit testing focuses on
the smallest compilable program unit, the subprogram. Class testing for object-oriented
software is the equivalent of unit testing for conventional software. It is driven by
the operations encapsulated by the class and the state behavior of the class.

Class Testing for AprioriTgui class:





S.No. | Test (functionality of) | Input | Expected output | Actual output
1 | getFileName() | Invalid file name | Print error message | Print error message
2 | aprioriT() | Valid filename and support | Display frequent patterns | Display frequent patterns
3 | createTtreeTopLevel() | TtreeNode | Generate one itemsets | Generate one itemsets
4 | generateLevel2() | TtreeNode | Generate second level of the tree | Generate second level of the tree
5 | pruneLevelN() | TtreeNode | Prune the unsupported candidates | Prune the unsupported candidates
6 | findItemSetInTtree() | Itemset | True if present and false if not | True if present and false if not
7 | generateNextLevel() | TtreeNode | Generate the next level of the tree | Generate the next level of the tree









Class Testing for Dic class:

Sno | Test (functionality of) | Input | Expected output | Actual output
1 | getconfig() | Invalid file name | Print error message | (not given)
2 | dicProcess() | Valid filename and support | Display frequent patterns | (not given)
3 | Getitemat() | Index and itemset | Get an itemset from the given index | Get an itemset from the given index
4 | Itemsetsize() | Itemset | Returns the index of the itemset | Returns the index of the itemset
5 | Dashfound() | hashtreenode | Return 1 if dashed node exists else 0 | Return 1 if dashed node exists else 0
6 | Transatrahashtree() | hashtreenode | Traverse through the hash tree | Traverse through the hash tree
7 | Printhashtree() | hashtreenode | Display the hash tree | Display the hash tree
8 | checkcounter() | hashtreenode | Traverse hash tree | Traverse hash tree
9 | Gensubset() | itemset | Generate all subsets given an itemset | Generate all subsets given an itemset

Class Testing for FP-growthAPP class:

Sno | Test (functionality of) | Input | Expected output | Actual output
1 | Start() | Valid file name and support | Print error message | Print error message
2 | SendData() | --- | Get the file name and support | Get the file name and support









Class Testing for FPTree class:









S.No | Test | Input | Expected Output | Actual Output
1 | To test the functionality of createFPtree() | -- | Header table must be created | Header table must be created
2 | To test the functionality of addToFPtree() | FPtreeNode, itemset, support | The itemset must be added to the Fptree | The itemset must be added to the Fptree
3 | To test the functionality of startMining() | HeaderTable, itemset | Generate the frequent patterns | Generate the frequent patterns
4 | To test the functionality of generateAncestorCodes() | Prefixsubtreenode | Generate itemsets | Generate itemsets
5 | To test the functionality of outputFPtree() | FPTreeNode | Display the frequent patterns | Display the frequent patterns
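The addToFPtree() test expects an itemset to be added to the FP-tree together with its support. The following is a minimal sketch of that insertion step in a prefix tree. The names FpTreeSketch, Node and add are hypothetical, and the header table with its node links, which the real FPTree class maintains according to the createFPtree() test, is omitted.

```java
import java.util.HashMap;
import java.util.Map;

final class FpTreeSketch {
    // One node of the prefix tree: an item, its accumulated count, and its children.
    static final class Node {
        final int item;
        int count;
        final Map<Integer, Node> children = new HashMap<>();

        Node(int item) {
            this.item = item;
        }
    }

    // The root carries no item; -1 is used as a placeholder value.
    final Node root = new Node(-1);

    // Inserts the itemset along a single path from the root, reusing shared
    // prefixes and adding the given support to every node on the path.
    // The itemset is assumed to be already ordered (for example by item frequency).
    void add(int[] itemset, int support) {
        Node current = root;
        for (int item : itemset) {
            Node child = current.children.get(item);
            if (child == null) {
                child = new Node(item);
                current.children.put(item, child);
            }
            child.count += support;
            current = child;
        }
    }

    public static void main(String[] args) {
        FpTreeSketch tree = new FpTreeSketch();
        tree.add(new int[] {1, 2}, 1);
        tree.add(new int[] {1, 3}, 1);
        // Both itemsets share the prefix {1}, so that node is counted twice.
        System.out.println("count of item 1: " + tree.root.children.get(1).count);
    }
}
```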

Class Testing for TotalSupportTree class:

S.No | Test | Input | Expected Output | Actual Output
1 | To test the functionality of addToTtree() | Itemset and support | The tree must be appended | The tree must be appended
2 | To test the functionality of getSupportForItemSetInTtree() | Itemset | support | support









Class Testing for Mba class:

S.No | Test | Input | Expected Output | Actual Output
1 | To test the functionality of start() | Valid filename and support | Number of rows and columns | Number of rows and columns
2 | To test the functionality of generatelevelN() | One itemset and the pruned dataset | Display the N-itemsets | Display the N-itemsets
3 | To test the functionality of superset() | N-level itemsets | Supersets of the itemset | Supersets of the itemset









7.3 Test Code Report:

The FPMiner tool is implemented in Java, and all experiments were performed on a

1.7 GHz PC with 256 MB of memory running Windows XP.

Experiment 1:

The execution times of the algorithms at different support values are tabulated below:





Support | Execution time of AprioriT | Execution time of DIC | Execution time of FP-Growth | Execution time of MBA
50 | 187 ms | 226754 ms | 94 ms | 991 ms
60 | 110 ms | 184297 ms | 74 ms | 988 ms
70 | 78 ms | 161265 ms | 46 ms | 984 ms
80 | 47 ms | 106953 ms | 32 ms | 954 ms
90 | 32 ms | 74984 ms | 31 ms | 938 ms
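The report does not state how the execution times were captured. The following is a minimal sketch of one common way to time a run in Java, assuming wall-clock measurement with System.nanoTime(). The names TimingSketch and timeMillis are hypothetical, and the workload is a placeholder rather than one of the mining algorithms.

```java
final class TimingSketch {
    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int[] supports = {50, 60, 70, 80, 90};
        for (int support : supports) {
            // Placeholder workload; a real run would invoke one of the miners here.
            Runnable placeholderMiner = () -> {
                long sum = 0;
                for (int i = 0; i < 1_000_000; i++) {
                    sum += i % support;
                }
            };
            System.out.println("support " + support + ": "
                    + timeMillis(placeholderMiner) + " ms");
        }
    }
}
```

A single run per support value is sensitive to JVM warm-up, so repeating each measurement and averaging would give steadier numbers.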









Experiment 2:

The number of frequent itemsets generated using different algorithms:

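Algorithm | Support | Frequent itemsets generated
AprioriT | 50 | 153
AprioriT | 60 | 51
AprioriT | 70 | 31
AprioriT | 80 | 23
AprioriT | 90 | 9
MBA | 50 | 153
MBA | 60 | 51
MBA | 70 | 31
MBA | 80 | 23
MBA | 90 | 9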
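Each count above is the number of itemsets that meet the minimum support. The following is a minimal level-wise sketch of how such a set can be computed by brute force. It is not the AprioriT or MBA implementation: the names AprioriSketch and frequentItemsets are hypothetical, support is given as an absolute transaction count (an assumption), and the usual subset-based pruning of candidates is left out for brevity.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class AprioriSketch {
    // Returns all itemsets contained in at least minCount transactions.
    static Set<Set<Integer>> frequentItemsets(List<Set<Integer>> transactions, int minCount) {
        Set<Set<Integer>> frequent = new HashSet<>();

        // Level 1 candidates: every distinct item as a one-element itemset.
        Set<Set<Integer>> candidates = new HashSet<>();
        for (Set<Integer> transaction : transactions) {
            for (Integer item : transaction) {
                candidates.add(new TreeSet<>(Collections.singleton(item)));
            }
        }

        while (!candidates.isEmpty()) {
            // Count the support of each candidate and keep the frequent ones.
            Set<Set<Integer>> level = new HashSet<>();
            for (Set<Integer> candidate : candidates) {
                int count = 0;
                for (Set<Integer> transaction : transactions) {
                    if (transaction.containsAll(candidate)) {
                        count++;
                    }
                }
                if (count >= minCount) {
                    level.add(candidate);
                }
            }
            frequent.addAll(level);

            // Join step: the union of two frequent k-itemsets that differ in
            // exactly one item becomes a (k+1)-candidate for the next level.
            Set<Set<Integer>> next = new HashSet<>();
            for (Set<Integer> a : level) {
                for (Set<Integer> b : level) {
                    Set<Integer> union = new TreeSet<>(a);
                    union.addAll(b);
                    if (union.size() == a.size() + 1) {
                        next.add(union);
                    }
                }
            }
            candidates = next;
        }
        return frequent;
    }
}
```

Because every candidate is checked against the transactions before it is accepted, omitting the pruning step affects only speed, not the result.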

SCREENS AND REPORTS









8. Screens & Reports:

Case 1: Valid file name

Input : Selecting the file name

Expected Output : If the selected file name is valid, the data in the file is appended to the text area.

Observed Output : Same as expected.

Case 2: Invalid file name

Input : Selecting an invalid file name

Expected Output : If the selected file name is invalid, an error message must be displayed.

Observed Output : Same as expected.

Case 3: Data in the file not in order

Input : Selecting the file name

Expected Output : If the data in the selected file is not in order, an error message must be displayed.

Observed Output : Same as expected.

Case 4: Valid support

Input : Entering a support value

Expected Output : The support value must be appended to the text area and the Run button must be highlighted.

Observed Output : Same as expected.

Case 5: Invalid support

Input : Entering an invalid support value

Expected Output : An error message must be displayed.

Observed Output : Same as expected.
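Cases 4 and 5 depend on deciding whether an entered support value is valid. The report does not define the valid range, so the following sketch assumes the support is a percentage greater than 0 and at most 100. The names SupportInputSketch and isValidSupport are hypothetical.

```java
final class SupportInputSketch {
    // Returns true if the text parses as a number in the assumed range (0, 100].
    // Null, blank, non-numeric, NaN and out-of-range inputs are all rejected.
    static boolean isValidSupport(String text) {
        if (text == null) {
            return false;
        }
        try {
            double value = Double.parseDouble(text.trim());
            return value > 0 && value <= 100;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidSupport("50"));
        System.out.println(isValidSupport("abc"));
    }
}
```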

Case 6:

To test whether the system opens the required dataset.

Expected Output: The dataset is appended to the text area.

Case 7:

To test whether the system accepts the minimum support.

Expected Output: The support is appended to the text area.

Case 8:

The system must provide an option to select the required algorithm.

Expected Output: The algorithm selection dialog box must be displayed.

Case 9:

The system must run the selected algorithm.

Expected Output: The algorithm must display the frequent patterns.









Case 10:

If the selected algorithm is DIC, the step length must be entered.

Expected Output: The dialog box must be displayed.

Case 11:

If the selected algorithm is DIC, the algorithm must be executed.

Expected Output: The frequent patterns must be displayed.

Case 12:

If the selected algorithm is FP-Growth, the algorithm must be executed.

Expected Output: The frequent patterns must be displayed.

Case 13:

If the selected algorithm is DIC, the algorithm must be executed.

Expected Output: The frequent patterns must be displayed.

CONCLUSION




Frequent pattern mining finds the itemsets that occur frequently in a given

data set. Its objective is to provide a systematic method for finding these

frequent itemsets from the data.





This project focuses on explaining the fundamentals of association rule mining

and on analyzing implementations of well-known association rule mining algorithms

by comparing their execution times for generating frequent itemsets at different

minimum support values.





The study covers the Apriori, FP-Growth and Dynamic Itemset Counting (DIC) algorithms.

It also describes a new approach to association rule mining, the Matrix Based

Association Rule Mining Algorithm (MBA).





The results show that Apriori is less efficient than FP-Tree. Apriori runs

slowly because each transaction in the dataset is dense. DIC (Dynamic Itemset

Counting) is much slower than every other algorithm on the real dataset.

