Efficient Association Rules Mining Using MPI by dandanhuanghuang


Çankaya University
Computer Engineering Department



Parallel Apriori Algorithm Using MPI
Congressional Voting Records

Ahmet Artu YILDIRIM

January 2010

Overview
• The Apriori algorithm is used to discover association
  rules

• Computation time becomes the major issue when the
  dataset is large

• The aim is to reduce the running time of the mining
  process by spreading the computation across several
  computers working in parallel




Apriori Algorithm (Example)
• Min support = 50%

• Min support count = 0.5 × 4 = 2

• Min confidence = 0.50

• Confidence({5} → {2,3}) = support({2,3,5}) / support({5}) = 2/3 ≈ 0.67
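The arithmetic in this example can be checked with a short C sketch. The four-transaction dataset below is an assumption: the table from this slide did not survive, so the sketch uses a small textbook-style dataset chosen to agree with the figures quoted here (4 transactions, {5} occurring 3 times, {2,3,5} occurring twice). The names ITEM, example_db, support_count and confidence are illustrative and are not taken from the author's program.

```c
#include <assert.h>
#include <stddef.h>

/* Items 1..5 are stored as bits 1..5 of an unsigned mask. */
#define ITEM(i) (1u << (i))

/* ASSUMED example dataset (not taken from the slides). */
static const unsigned example_db[4] = {
    ITEM(1) | ITEM(3) | ITEM(4),
    ITEM(2) | ITEM(3) | ITEM(5),
    ITEM(1) | ITEM(2) | ITEM(3) | ITEM(5),
    ITEM(2) | ITEM(5)
};

/* Number of transactions that contain every item of `itemset`. */
static int support_count(const unsigned *db, size_t n, unsigned itemset)
{
    int count = 0;
    for (size_t t = 0; t < n; ++t)
        if ((db[t] & itemset) == itemset)
            ++count;
    return count;
}

/* confidence(A -> B) = support(A union B) / support(A) */
static double confidence(const unsigned *db, size_t n, unsigned a, unsigned b)
{
    int support_a = support_count(db, n, a);
    assert(support_a > 0);  /* the rule is undefined if A never occurs */
    return (double)support_count(db, n, a | b) / support_a;
}
```

Calling confidence(example_db, 4, ITEM(5), ITEM(2) | ITEM(3)) divides the support count of {2,3,5} by that of {5}, which is the 2/3 ratio shown in the example.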



Technology and Methodology
• Platform: GNU/Linux 2.6.20.7 i386

• Programming language: ISO C99

• Cross-platform APIs: MPICH (MPI implementation) and
  the GLib utility library

• Compiler suite: GNU toolchain

• Two ways to divide the work:

  1. Dataset division

  2. Large frequent itemset division

• This work uses dataset division
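Dataset division can be sketched as a block partition of the transaction list. The slides do not say how the program splits the transactions, so the scheme below (contiguous blocks, with the remainder spread over the lowest ranks) is an assumption, and tx_range and partition_transactions are hypothetical names. In an MPI program, rank and nprocs would typically come from MPI_Comm_rank and MPI_Comm_size.

```c
#include <assert.h>

/* Half-open range [first, last) of transaction indices owned by one process. */
typedef struct {
    int first;
    int last;
} tx_range;

/* Split `total` transactions into `nprocs` contiguous blocks whose sizes
 * differ by at most one, and return the block that belongs to `rank`. */
static tx_range partition_transactions(int total, int nprocs, int rank)
{
    assert(total >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);

    int base = total / nprocs;   /* minimum block size */
    int extra = total % nprocs;  /* the first `extra` ranks get one more */

    tx_range r;
    r.first = rank * base + (rank < extra ? rank : extra);
    r.last = r.first + base + (rank < extra ? 1 : 0);
    return r;
}
```

Each process then counts candidate itemsets only over its own range, so the per-process scanning work shrinks roughly in proportion to the number of processes.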
Data Division (Merging Local Support)
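A minimal sketch of the merging step, assuming every process holds one local support count per candidate itemset, in the same candidate order. The serial functions below only illustrate the arithmetic; in an MPI program the same element-wise sum across processes is what a reduction such as MPI_Allreduce with MPI_SUM computes. The slides do not state which MPI call the program uses, and merge_local_counts and mark_frequent are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Add one process's local support counts into the running global counts. */
static void merge_local_counts(int *global, const int *local, size_t ncand)
{
    assert(global != NULL && local != NULL);
    for (size_t c = 0; c < ncand; ++c)
        global[c] += local[c];
}

/* Flag the candidates whose merged count reaches min_count.
 * is_frequent[c] is set to 1 or 0; the return value is the number kept. */
static size_t mark_frequent(const int *global, size_t ncand,
                            int min_count, int *is_frequent)
{
    size_t kept = 0;
    assert(global != NULL && is_frequent != NULL);
    for (size_t c = 0; c < ncand; ++c) {
        is_frequent[c] = (global[c] >= min_count);
        kept += (size_t)is_frequent[c];
    }
    return kept;
}
```

The minimum support test has to be applied to the merged counts rather than to the local ones, because an itemset can fall below the threshold in one partition and still be frequent over the whole dataset.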




Parallel Apriori Algorithm Flowchart




Dataset
• 1984 United States Congressional Voting Records

• Attribute information: party (democrat, republican)
  and a yes/no vote on each of the following issues:
  handicapped infants, water project cost sharing,
  adoption of the budget resolution, physician fee
  freeze, El Salvador aid, religious groups in schools,
  aid to Nicaraguan contras, MX missile, immigration,
  synfuels corporation cutback, education spending,
  superfund right to sue, crime, duty-free exports,
  Export Administration Act South Africa
Preprocessing of Dataset
• A data transformation is applied before processing

• Each attribute value is mapped to an integer item,
  such as democrat = 1, republican = 2, handicapped
  infants yes = 3, handicapped infants no = 4, water
  project cost sharing yes = 5 …
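The numbering listed here follows a regular pattern, which can be written as a small C helper. This is a sketch under the assumption that the pattern continues in the same way for the remaining issues (the slide only shows the first few values); vote_item_id and the enum names are hypothetical.

```c
#include <assert.h>

enum {
    ITEM_DEMOCRAT = 1,
    ITEM_REPUBLICAN = 2,
    FIRST_VOTE_ITEM = 3  /* handicapped infants yes */
};

/* Map a vote to its integer item.
 * issue_index: 0 = handicapped infants, 1 = water project cost sharing, ...
 * voted_yes:   nonzero for "yes", zero for "no".
 * ASSUMPTION: each issue takes two consecutive ids, "yes" first. */
static int vote_item_id(int issue_index, int voted_yes)
{
    assert(issue_index >= 0);
    return FIRST_VOTE_ITEM + 2 * issue_index + (voted_yes ? 0 : 1);
}
```

With this encoding every congressman's record becomes one transaction: the party item followed by one item per recorded vote.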




Config File and Run Command
Config File:

attributecount=34

transactioncount=435

minsupportpercent=50

minconfidencepercent=80

Command (x is the number of processes):

mpirun -np x -machinefile machines ./aprioriparallel
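As an illustration of how the key=value settings in the config file could be read and turned into a minimum support count, here is a minimal C sketch. It is not the author's parser: apriori_config, parse_config_line and min_support_count are hypothetical names, and rounding the threshold up to the next whole transaction is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    int attributecount;
    int transactioncount;
    int minsupportpercent;
    int minconfidencepercent;
} apriori_config;

/* Parse one "key=value" line. Returns 1 if the key was recognised and
 * stored, 0 for blank lines, malformed lines or unknown keys. */
static int parse_config_line(const char *line, apriori_config *cfg)
{
    char key[64];
    int value;

    assert(line != NULL && cfg != NULL);
    if (sscanf(line, " %63[^= ] = %d", key, &value) != 2)
        return 0;

    if (strcmp(key, "attributecount") == 0)
        cfg->attributecount = value;
    else if (strcmp(key, "transactioncount") == 0)
        cfg->transactioncount = value;
    else if (strcmp(key, "minsupportpercent") == 0)
        cfg->minsupportpercent = value;
    else if (strcmp(key, "minconfidencepercent") == 0)
        cfg->minconfidencepercent = value;
    else
        return 0;
    return 1;
}

/* Smallest transaction count that meets the support percentage
 * (ASSUMPTION: the threshold is inclusive and rounded up). */
static int min_support_count(const apriori_config *cfg)
{
    return (cfg->transactioncount * cfg->minsupportpercent + 99) / 100;
}
```

Under that rounding assumption, the values in the config file mean that an itemset must occur in at least 218 of the 435 transactions to count as frequent.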
Program Output




Rules
Rules found at the 80% confidence threshold:

• Democrats support

  • Adoption of the budget resolution

  • Aid to Nicaraguan contras

• Democrats do NOT support

  • Physician fee freeze


Rules (cont.)
Rules found at the 80% confidence threshold:

• Those who do not support the physician fee freeze
  support adoption of the budget resolution

• Those who support adoption of the budget resolution
  do not support the physician fee freeze




Parallel Computation Speed Up
• Run on the Çankaya University wee cluster

• Processor specs: 600 MHz CPU, 250 MB RAM

• Speedup = ts / tp, where ts is the serial running
  time and tp is the parallel running time
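The speedup ratio can be computed with a couple of C helpers. Parallel efficiency (speedup divided by the number of processes) is a standard companion metric that the slide does not mention and is added here only for illustration. The function names are hypothetical; in an MPI program the two times could be measured with MPI_Wtime.

```c
#include <assert.h>

/* Speedup = ts / tp: serial running time over parallel running time. */
static double speedup(double ts, double tp)
{
    assert(ts >= 0.0 && tp > 0.0);
    return ts / tp;
}

/* Efficiency = speedup / number of processes (1.0 would be ideal). */
static double efficiency(double ts, double tp, int nprocs)
{
    assert(nprocs > 0);
    return speedup(ts, tp) / nprocs;
}
```

For example, with purely hypothetical timings of 120 s serial and 40 s on 4 processes, speedup(120.0, 40.0) gives 3.0 and efficiency(120.0, 40.0, 4) gives 0.75.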




Conclusion
• The parallel version of the Apriori algorithm reduces
  running time on large datasets

• It scales by adding nodes (computers) or memory,
  without any modification of the code

• It offers a good price-performance ratio because it
  runs on less powerful, inexpensive computers




Questions?




Thank You
