					                                                          (IJCSIS) International Journal of Computer Science and Information Security,
                                                          Vol. 9, No. 2, February 2011




        AN INTERACTIVE VISUALIZATION
      METHODOLOGY FOR ASSOCIATION RULES

MOHAMMAD KAMRAN
Research Scholar, Integral University, Kursi Road, Lucknow, India
E-mail: mkamran_lko@hotmail.com

Dr. S. QAMAR ABBAS
Professor, Ambalika Institute of Management & Technology, Lucknow, India

Dr. MOHAMMAD RIZWAN BAIG
Professor, Department of Information Technology, Integral University, Lucknow, India


Abstract- The task of the knowledge discovery and data mining process is to extract knowledge from data such that the resulting knowledge is useful in a given application. Only the user can determine whether the resulting knowledge satisfies this requirement, and what one user finds useful is not necessarily useful to another. Visual data mining tackles data mining tasks from this perspective, enabling human involvement and incorporating human perceptual abilities. The objective of this paper is to present students’ performance through a visual mining method applied to data from an educational institute. This method, together with the novel visualization technique described here, allows the analyst to explore the data and view significant differences among the performance values of students. The results are immediately presented in graphical form, and the user can change settings in order to explore the data iteratively and find useful knowledge.

                     I. INTRODUCTION

    For data mining [1] to be effective, it is important to include the human in the data exploration process and to combine the flexibility, creativity, and general knowledge of the human with the enormous storage capacity and computational power of today’s computers. Visual data exploration aims at integrating the human in the data exploration process, applying human perceptual abilities to the large data sets available in today’s computer systems. The basic idea of visual data exploration is to present the data in some visual form, allowing the human to gain insight into the data, draw conclusions, and directly interact with the data. Visual data mining techniques have proven to be of high value in exploratory data analysis, and they also have a high potential for exploring large databases. These huge databases contain a wealth of data and constitute a potential goldmine of valuable information. As new courses and new colleges emerge, the structure of the educational database changes. Finding the valuable information hidden in those databases and identifying and constructing appropriate models is a difficult task. Data mining techniques play an important role at each step of the information discovery process, and visual data exploration usually allows faster data exploration and often provides better results, especially in cases where automatic algorithms fail. In addition, visual data exploration techniques provide a much higher degree of confidence in the findings of the exploration. This leads to a high demand for visual exploration techniques and makes them indispensable in conjunction with automatic exploration techniques.

    The main contribution of this study is to address the capabilities and strengths of data mining technology in identifying the placement of students, and to guide teachers to concentrate on the appropriate associated attributes and to counsel students or arrange suitable placement for them. In this work, we propose a dynamic framework for association rule mining that integrates interactive visualization techniques in order to allow users to drive the association rule finding process, giving them control and visual cues to ease understanding of both the process and its results.

                     II. ASSOCIATION RULE MINING (ARM)

    Association Rule Mining (ARM) [2] can be divided into two sub-problems: the generation of the frequent itemsets lattice and the generation of association rules. The complexity of the first sub-problem is exponential. Let $|I| = m$ be the number of items; the search space to enumerate all possible frequent itemsets is equal to $2^m$, and so is exponential in $m$ [2]. Let $I = \{a_1, a_2, \ldots, a_m\}$ be a set of items, and let $T = \{t_1, t_2, \ldots, t_n\}$ be a set of transactions establishing the database, where every transaction $t_i$ is composed of a subset $X \subseteq I$ of items. A set of items $X \subseteq I$ is called an itemset. A transaction $t_i$ contains an itemset $X$ in $I$ if $X \subseteq t_i$. Several published ARM papers are based on two main indices, support and confidence [2]. The support of an itemset is the percentage of transactions in the database that contain this itemset as a subset. The confidence is the conditional probability that a transaction contains an itemset given that it contains another itemset. An itemset is frequent if $\mathrm{support}(X) \geq minsup$, where minsup is the user-specified minimum support. An association rule $r$ is strong if $\mathrm{confidence}(r) \geq minconf$, where minconf is the user-specified minimum confidence. The left part of an association rule is called the antecedent and the right part is called the conclusion. Our motivations are described hereafter.
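
The support and confidence definitions of Section II can be illustrated with a small sketch. This is not the authors' implementation: the item names `a1`..`a3`, the toy transaction list and the brute-force enumeration are assumptions chosen only to make the definitions concrete. The enumeration deliberately visits every non-empty candidate itemset, which is what makes the search space exponential in the number of items.

```python
from itertools import combinations

# Toy transaction database T over the items I = {a1, a2, a3}.
# Item names and contents are illustrative assumptions, not data from the paper.
transactions = [
    {"a1", "a2", "a3"},
    {"a1", "a2"},
    {"a3"},
    {"a1", "a3"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, conclusion, transactions):
    """Conditional probability of `conclusion` given `antecedent`."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(conclusion)
    sup_antecedent = support(antecedent, transactions)
    if sup_antecedent == 0:
        return 0.0
    return support(both, transactions) / sup_antecedent


def frequent_itemsets(transactions, minsup):
    """Brute-force search over all non-empty subsets of I (exponential in |I|)."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            sup = support(candidate, transactions)
            if sup >= minsup:
                frequent[frozenset(candidate)] = sup
    return frequent


if __name__ == "__main__":
    print(support({"a1", "a2"}, transactions))
    print(confidence({"a1"}, {"a2"}, transactions))
    for itemset, sup in frequent_itemsets(transactions, minsup=0.5).items():
        print(sorted(itemset), sup)
```

In this toy database the rule a1 -> a2 has support 0.5 and confidence 2/3, so it would count as strong for any minconf up to 2/3. Practical miners such as Apriori prune the candidate lattice instead of enumerating it exhaustively.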



                                                                     http://sites.google.com/site/ijcsis/
                                                                     ISSN 1947-5500


                     III. MOTIVATION

    The number of generated rules is a major problem in association rule mining. This number is too large and leads to another problem called knowledge mining. The human cycles spent in analyzing knowledge are the real bottleneck in data mining. This issue can limit the final user’s expertise because of the heavy cognitive load involved. To address it, visual data mining has become an important research area. Indeed, extracting relevant information is very difficult when it is hidden in a large amount of data. Visual data mining attempts to improve the KDD process by offering adapted visualization tools that tackle various known problems. Those tools can use several kinds of visualization techniques that simplify the acquisition of knowledge by the human mind, which can handle more data visually and extract relevant information quickly.

    Indeed, in most real-life databases, thousands and even millions of high-confidence rules are generated, among which many are redundant. In this paper, we are interested in the most used category of visualization in data mining, i.e., the use of visualization techniques to present the information extracted by the mining process. Visualization tools become more appealing when handling large data sets with complex relationships, since information presented in the form of images is more direct and more easily understood by humans. Visualization tools allow users to work in an interactive environment in which rules are easy to understand. In a table-based view of association rules, all strong rules are represented in a tabular format (rule table), in which each entry corresponds to a rule. The rules can be displayed in different orders, such as by premise, conclusion, support or confidence. This helps users to have a clearer view of the rules and to locate a particular rule more easily.

                     IV. VISUAL DATA MINING

    The rise of KDD revealed new problems such as knowledge mining. These large amounts of knowledge must be explored with specific advanced tools. Indeed, expert analysis requires considerable cognitive work and, a fortiori, a costly waste of time for industry. Extracting nuggets is a difficult task when relevant information is hidden in a large amount of data. In order to tackle this issue, visual data mining was conceived to propose visual tools adapted to several well-known KDD tasks. These tools contribute to the effectiveness of the processes implemented by giving understandable representations while facilitating interaction with experts. Visual data mining is present throughout the KDD process: upstream to apprehend the data and to carry out the first selections, during the mining, and downstream to evaluate the obtained results and to display them. Visual tools have become major components because of the increasing role of the expert within the KDD process. Visual data mining integrates concepts from various domains such as visual perception, cognitive psychology, visualization metaphors, information visualization, etc.

    We focus on visualization during the post-processing stage, and we are interested in ARM. Independently of both context and task, ARM has a main drawback, which is the high number of generated rules. Several works on filtering rules have been proposed, and a state of the art was presented in [3]. Although filtering reduces the set of generated rules significantly, their number nevertheless remains large. The expert must be able to interact easily with a data mining environment in order to understand the displayed results more easily. This point is essential for the global performance of the system. Visual tools for association rules have been proposed to reduce this cognitive load, but they remain limited [3].

                     V. VISUAL ASSOCIATION RULE MINING

    Various works already exist to help expert analysis in text mode [4]. Several works on visual rule exploration have been published [2], [5], [6], [7]. The main principles of our interactive ARM are described hereafter. All these tools use several methods, which are textual, 2D or 3D. Choosing one of them proves to be difficult. Moreover, their interpretation can vary according to the expert. Each of these techniques presents advantages and drawbacks, which must be taken into account in the initial choice of the representation. The effectiveness of these approaches depends on the input data files. These representations are understandable for small quantities of data but become complex when these quantities increase. Indeed, particular information may not be sufficiently perceptible in the mass. The common limitation of all the representations is that if they are global, they quickly become unreadable (size of the objects in 2D, occlusions in 3D), and if they are detailed, they do not provide the expert with an overall picture of the data.

                     VI. RELATED WORK

    Traditionally, many simple methods are designed to render small amounts of data or statistical features of big data sets, such as histograms, pie charts, trees, etc. To visualize more complex data, modern scientific visualization utilizes more advanced techniques. Visualization techniques such as EXVIS [8], Chernoff Faces [9], icons [10] and m-Arm Glyph [11] are often called glyph-based methods. Glyphs are graphical entities whose visual features, such as shape, orientation, color and size, are used to encode attributes of an underlying dataset, and glyphs are often used for interactive exploration of data sets [12]. Glyph-based techniques range from representation via individual icons to the formation of texture and color patterns through the overlay of many thousands of glyphs [13]. Chernoff used facial characteristics to represent information in a multivariate dataset [14]. Each dimension of the data set is encoded by one facial feature, such as the nose, eyes, eyebrows, mouth, or jowls. Glyphmaker, proposed by Foley and Ribarsky, visualizes multivariate datasets in an interactive fashion [14]. Levkowitz described a prototype system for combining colored squares to produce patterns to represent an underlying multivariate dataset [15]. In [10] an icon encodes six dimensions by six lines of different colors within a square icon. In [13] Levkowitz describes the combination of textures






and colors in a visualization system. The m-Arm Glyph by Pickett and Grinstein [11] consists of a main axis and m arms; the length and thickness of each arm and the angle between each arm and the main axis are used to encode different dimensions of a data set. A glyph-based system for large high-dimensional datasets is described in [6]. These techniques cannot visualize large amounts of high-dimensional data because of:

        • Lack of human-computer interaction.
        • Lack of integration with other data mining and knowledge discovery (KDD) tools.

                     VII. PROPOSED WORK

    Nowadays, higher educational organizations face a highly competitive environment and aim to gain competitive advantages over their competitors. These organizations should improve their methodology of teaching, placement and counseling of students. They consider students and teachers as their main assets, and they want to improve their key process indicators through effective and efficient use of these assets.

    Students’ academic performance is critical for educational institutions because strategic programs can be planned to improve or maintain students’ performance during their period of studies in the institution. The academic performance in this study is measured by the attributes indicated in Table I. This study presents the work of data mining in predicting the final placement of students. It applies the association rule mining technique to choose the best prediction and analysis. The list of students who are predicted by data mining as likely to drop from the selection criterion is then turned over to teachers and management for direct or indirect intervention.

    For example, let us consider the transaction database of a few students from the students’ repository of an institute, which shows the students’ general and academic grades in the different courses they enrolled in during their years of attendance at the institution. A student’s performance score is basically determined by the sum of the continuous assessment and the examination scores. In most institutions the continuous assessment, which includes various assignments, class tests and group presentations, is summed up to weigh 30% of the total score, while the main semester examination weighs 70%. To differentiate students’ performances we have selected different attributes such as attendance, mark, activity, etc., as shown in Table I.

    Educational institutions with association rule mining can predict students’ performance more accurately, which in turn can result in quality education.

A. Student Level Analysis
    Successfully training students requires analyzing the data at the student level. Using the association discovery data mining technique, educational institutions can more accurately select the kind of training to offer to different kinds of students. With the help of this technique, educational institutions can:

     i. Segment the student database to create student profiles.

    ii. Conduct analysis on a single student segment for a single factor. For example, the institution can perform in-depth analysis of the relationship between attendance and academic achievement.

   iii. Analyze the student segments for multiple factors using group processing and multiple target variables. For example, “What are the characteristics shared by students who drop out from colleges?”

    iv. Perform sequential (over time) basket analysis on student segments. For example, “What percentage of students with high attendance also achieved well academically?”

B. Developing new strategies
    Teachers can increase the placement percentage by identifying the most lucrative student segments and organizing the training sessions accordingly. The results may be affected if teachers do not offer the right kind of training to the right student segment at the right time. With data mining operations such as segmentation or association analysis, institutions can now utilize all of their available information for the betterment of students.

                     TABLE I ATTRIBUTE LIST

    ATTRNAME                                  ATTR    Possible Values
    Enrolment No.                             ENR     Yes, No
    Attendance                                ATT     Poor, Good, Average
    10+2 Grade                                INT     A, B, C
    Area of expertise                         EXP     M, C, E
    Gender                                    G       M, F
    Fund                                      F       P, S, F
    Student Department                        STD     ME, CS, IT
    Activities performed by the student       ACT     A, B, C
    Percentage of practical session           PSA     A, B, C
    Exercise given by teacher                 ET      A, B, C
    Average mark of the experience report     ER      A, B, C
    Final mark                                MARK    A, B, C
    Evaluation                                EVL     A, B, C
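
As a rough illustration of how records described by the Table I attribute codes could be turned into transactions and mined for rules that conclude on the final mark, consider the sketch below. The six student records, the thresholds and the helper names `to_transaction` and `single_premise_rules` are hypothetical and are not taken from the paper; only single-attribute premises are considered, to keep the example short. The result is ordered by confidence and then support, as a rule table might be.

```python
# Hypothetical student records keyed by Table I attribute codes (ATT, ACT, MARK).
# The values are invented for illustration and are not data from the study.
records = [
    {"ATT": "Good", "ACT": "A", "MARK": "A"},
    {"ATT": "Good", "ACT": "B", "MARK": "A"},
    {"ATT": "Good", "ACT": "A", "MARK": "B"},
    {"ATT": "Poor", "ACT": "C", "MARK": "C"},
    {"ATT": "Average", "ACT": "B", "MARK": "B"},
    {"ATT": "Poor", "ACT": "B", "MARK": "C"},
]


def to_transaction(record):
    """Encode one record as a set of 'ATTR=value' items."""
    return frozenset(f"{attr}={value}" for attr, value in record.items())


def single_premise_rules(records, target, minsup, minconf):
    """Return strong rules 'premise -> target=value' with one item as premise.

    Each rule is a tuple (premise, conclusion, support, confidence).
    """
    transactions = [to_transaction(r) for r in records]
    n = len(transactions)
    prefix = target + "="
    all_items = {item for t in transactions for item in t}
    premises = sorted(i for i in all_items if not i.startswith(prefix))
    conclusions = sorted(i for i in all_items if i.startswith(prefix))

    rules = []
    for premise in premises:
        n_premise = sum(1 for t in transactions if premise in t)
        for conclusion in conclusions:
            n_both = sum(1 for t in transactions if premise in t and conclusion in t)
            sup = n_both / n
            conf = n_both / n_premise
            if sup >= minsup and conf >= minconf:
                rules.append((premise, conclusion, sup, conf))

    # Rule-table ordering: highest confidence first, then highest support.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


if __name__ == "__main__":
    for premise, conclusion, sup, conf in single_premise_rules(records, "MARK", 0.3, 0.6):
        print(f"{premise} -> {conclusion}  support={sup:.2f}  confidence={conf:.2f}")
```

With these invented records and thresholds, only the attendance-based rules survive. A real study would use the full attribute list and multi-item premises.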






                     VIII. SYSTEM ARCHITECTURE

[Figure 1. System Architecture. Client machine: VB application whose Main Window gives access to the LogIn, Rule Generator and Visualization modules (Rule Table and 3-D Visualization). Database server: DBMS with stored procedures. Requests and results pass between the VB application and the DBMS through ODBC.]

    The system architecture is shown in Figure 1. The database resides on the server machine, and the stored procedures (Oracle) reside on the server side. Our VB application runs on the client machine. It consists of several modules: LogIn, Rule Generator, and Visualization. The LogIn module is used to connect to the database server. The Rule Generator is used to mine the association rules given the information provided by the user. The Visualization module consists of two sub-modules, Rule Table and 3-D Visualization. These modules can be accessed from the Main window.

Knowledge Extraction Stage
    Rendering millions of icons is computationally expensive, and the interpretation and analysis to be performed by the user is even harder. A visualization system has to provide not only a “loyal” picture of the original dataset, but also an “improved” picture to the viewer for easier interpretation and knowledge extraction. Integration of analysis functionality is important and necessary to help the viewer extract knowledge from the display. The basic requirement of a visualization system is:

   “Different data values should be visualized differently,
   and the more different the data values are, the more
   different they should look”.

    But what a viewer wants to find with a visualization system is not the data values themselves; it is the information or knowledge represented by the data values. So the above requirement can be better stated as:

   “Different information should be visualized differently,
   and the more different the information is, the more
   different it should look”.

    To help a viewer with knowledge extraction, a visualization system has to deal with the problem of non-uniform knowledge/information distribution. It is common in some data sets or fields that a small difference in a value can mean a big difference in meaning, which means the knowledge and information is not distributed uniformly within the data values. A user would like a visualization system to be able to show these knowledge differences clearly. To be specific, two differences of the same amount in data values need not be rendered by identical differences in visual elements on the screen. Instead, the difference representing more information should be displayed more prominently to get the viewer’s attention.

Interactive Visualization Model

[Figure 2. Visualization Model. Flowchart boxes, in order: Load the dataset; Find cluster for each individual dimension; Perform association and transformation according to rule; Render Data. Feedback branches are labelled Change Association (Association Step) and Change Transformation (Transformation Step).]

In Figure 2 we give an interactive visualization model which has the following properties:

1) Interaction: It is clear that integration of domain knowledge into a visualization system is very important due to the problem of non-uniform knowledge distribution. In a visualization system, integration of domain knowledge can be achieved by choosing proper association and transformation functions during the visualization process. However, there is no universal technique for all fields, data sets or users, so a visualization system should be interactive and provide a mechanism for viewers to adjust or change the association and transformation functions during the visualization process. Each data set or field has to be studied individually and visualized interactively before its important information can be revealed, which can only be done by viewers or domain experts. Through interaction, a viewer can guide a visualization system step by step to display what he or she is interested in more and more clearly.

2) Correctness: We propose the following criteria for “correct” visualization:

       a) If possible a visualization system should show different dimensions of a data set differently






          through different visual objects or visual elements of one visual object.

       b) The more different the values are, the more differently they should be rendered. Since we may not know the distribution of a dataset, assigning data values directly to visual elements/properties may not make full use of the available visual elements/properties, so a clustering step is preferred.

       c) The more different the information represented by data values is, the more differently it should be rendered. A distinct visual difference between different information helps viewers more, and it can be achieved by interaction between a visualization system and viewers. In this interaction process, viewers can fine-tune the transformation between data values and visual elements, and domain knowledge is obtained and reflected through a more customized display.

3) “Maximizing” rule: To optimize the rendering quality, the maximal range of visual objects/elements should be used as default settings.

                     IX. IMPLEMENTATION METHODOLOGY

    At the beginning of any mining task, the system acquires the support for each attribute category defined at the discretization step during the preprocessing phase of a generalized composite record in the corresponding cluster. Figure 3 depicts the user interface screens that acquire these supports. In order to show how our technique enhances the rules generated, we conducted the following experiment. First, run the system and give a variable support for each attribute category based on the user’s interest. Then:

    1) Count the number of rules generated and the number of premises used in these rules.
    2) Rerun the system and give equal support for all attribute categories.
    3) Count the number of rules and the premises used in these rules.
    4) Examine the quality of the rules generated in each case by comparing the number of rules and premises used.

    The main window consists of the menu, toolbar, and a text area. The user can connect to different databases, here Oracle, through the Connect sub-menu and disconnect through the Disconnect sub-menu. Under the Generate Rule menu, the user can choose to generate rules. The operations of rule generation and rule visualization are mainly done through the menu.

    We use a VB Standard EXE project as the software development tool to implement our project. VB provides an Integrated Development Environment (IDE), which makes interface design and program debugging very efficient. The menu can be implemented using the Menu Editor. All the objects in the main window can be designed visually.

                     Figure 4. Menu Editor

    After the user chooses the “Connect” menu item, a Login window is brought up. The Login module of the Association Rule Software is described in Fig. 5. After providing all the needed information, the user can choose to “Connect” to the DBMS.

 Private Sub cmdOK_Click()
 a.connect txtUserName.Text, txtPassword.Text
 If Loginsucceeded Then
 Form1.mnuconn.Enabled = Not Loginsucceeded
 Form1.mnudisc.Enabled = Loginsucceeded
                                                                           Form1.Toolbar1.Buttons(1).Enabled=Not Loginsucceeded
                                                                           Unload Me
                                                                           Form1.Show
                                                                           End If
                                                                           End Sub

                                                                                               Figure 5. Login Module
                      Figure 3. Interface Screen
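    The comparison in steps 1) to 4) of the experiment described at the start of
this section can be sketched as follows. The sketch is written in Python for brevity
rather than in VB. It assumes that the "number of used premises" means the number of
distinct attribute-category conditions on the left-hand sides of the rules. The Rule
type, the summarize function and both rule sets are hypothetical illustrations and
are not experimental results.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple


class Rule(NamedTuple):
    """One association rule: a set of premises and a single conclusion."""
    premises: FrozenSet[str]
    conclusion: str


def summarize(rules: Iterable[Rule]) -> Tuple[int, int]:
    """Return (number of rules, number of distinct premises used in them)."""
    rules = list(rules)
    used_premises = set()
    for rule in rules:
        used_premises |= rule.premises
    return len(rules), len(used_premises)


# Made-up rule sets standing in for the two runs (assumption, not paper data).
variable_support_run = [
    Rule(frozenset({"ATT=A", "MARK=A"}), "EVL=A"),
    Rule(frozenset({"ATT=C", "MARK=C"}), "EVL=C"),
]
equal_support_run = [
    Rule(frozenset({"ATT=A", "MARK=A", "G=M"}), "EVL=A"),
    Rule(frozenset({"ATT=A", "MARK=A", "G=F"}), "EVL=A"),
    Rule(frozenset({"ATT=C", "MARK=C"}), "EVL=C"),
]

for label, run in (("variable support", variable_support_run),
                   ("equal support", equal_support_run)):
    n_rules, n_premises = summarize(run)
    print(f"{label}: {n_rules} rules, {n_premises} distinct premises")
```

    Under this reading, a run that yields fewer rules built from fewer premises is
the more focused one, which is the comparison made in step 4).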






Rule Generator
    For each input data set, the user has to specify some parameters for the
association rule generation. This information can be arranged in the concerned
table. Because the data is not always in a single table, and it is sometimes
necessary to obtain it from two or more different tables, the user should be able
to select multiple tables as the data source. The user may also want to specify the
minimum support and confidence values to get the association rules of interest. The
stop level lets the user decide after how many passes rule generation should be
canceled. The information is taken from the transaction table, and the user can
click the “Generate Rules” menu button to begin merging the data from the different
tables. The association rule generation algorithm is then called to generate the
rules.

     1- Select the mining task and, consequently, the appropriate cluster.
     2- Get the confidence threshold for generating a rule (a rule is generated
        only if the number of occurrences of records described by this rule,
        divided by the total number of records in the cluster, is greater than the
        given confidence threshold).
     3- Construct a matrix (the calculated relative weight matrix) with the number
        of rows equal to the number of attributes (m) and the number of columns (n)
        equal to the maximum number of categories of a certain attribute.
     4- Using the appropriate cluster, fill in the calculated relative weight
        matrix with the relative weight of each attribute category in this cluster.
     5- Compare the calculated relative weight with the user-given support and mark
        irrelevant attribute categories.
     6- For each generalized composite record do
     7- For each generalized composite record attribute do { if the attribute
        category is irrelevant then mark it as irrelevant; copy relevant attribute
        categories into a new table }
     8- Group similar rows in the new table and calculate a confidence value for
        each group of records.
     9- Generate rules.

                           Proposed algorithm

    The algorithm is based on well-known existing techniques for obtaining
association rules, such as the Apriori algorithm. It is modified to enable the user
to control and impose his/her area of focus during the knowledge discovery steps,
in order to overcome the loss-of-information problem and to generate the rules
he/she is interested in. The proposed algorithm addresses this problem by allowing
the user to define the relative weight, or support, of each attribute interval
category, so that the mining algorithm generates rules using an attribute interval
category only if this support is satisfied.
    The generated rules can be visualized in either table format or 2-D/3-D format
by selecting the appropriate Visualization menu item. When we execute the program,
the title screen comes up, as shown in fig. 3.
    Next, we click the Visualization menu to get different graphs related to the
association rules. We can then select the graph of our choice by clicking any of
the option buttons available in the visualization effect window, as shown in
figure 6.
    Some of the generated rules are given in Table 2 in a form that is
understandable by humans. In Table 2, the first column gives the rule number, the
second column presents the generated rule, the third column gives the number of
students who satisfy the rule, and the last column gives the number of attributes
contained in the rule. The table lists the rules in descending order of the number
of students who satisfy them. This ordering helps in determining the most
significant rule. Among the generated rules, the longest consists of 10 attributes,
while the shortest contains only 3 attributes.

                         TABLE 2 GENERATED RULES

  Rule #   Rules                                                  # Obj   # Attrib
  ------   ----------------------------------------------------   -----   --------
  7        IF ENR = Y, ATT = A, INT = A, G = M, STD = IT,         13      10
           ACT = A, PSA = A, ET = A, ER = B, MARK = A
           THEN EVL = A
  3        IF ENR = Y, ATT = B, INT = A, G = F, STD = IT,         9       10
           ACT = A, PSA = A, ET = A, ER = B, MARK = A
           THEN EVL = A
  11       IF ENR = Y, ATT = B, INT = A, G = M, STD = CS,         9       10
           ACT = A, PSA = C, ET = C, ER = B, MARK = B
           THEN EVL = B
  17       IF ENR = Y, ATT = C, INT = A, G = M, F = SC,           8       9
           STD = ME, ACT = A, PSA = B, MARK = B
           THEN EVL = A
  9        IF ENR = Y, ATT = A, INT = B, G = F, ACT = A,          5       8
           PSA = A, ET = A, ER = B, MARK = B
           THEN EVL = B
  14       IF ENR = Y, ATT = C, G = M, ACT = B, PSA = A,          4       7
           ER = B, MARK = A
           THEN EVL = A
  3        IF ENR = Y, ATT = C, MARK = C                          3       3
           THEN EVL = C

    We implemented the mapping of the intermediate rule table into a format that
the user can understand easily. A visualization module that includes a rule table
and 2-D and 3-D graphics was developed to help the user find the information of
interest more easily through sorting and filtering functions. Besides performance,
our software can access the data stored in multiple data tables through ODBC, for
example in an Oracle database. Our visualization module uses the ‘rule-item’
relationship so that it can display more rules at one time. In addition, the rule
sorting and filtering ability of our visualization module gives the user more
flexibility and efficiency in managing and understanding the association rules. In
our implementation, we store the generated rules in the database. Once the rules are





stored in the database, they can be easily handled because of the SQL capabilities.

                    Figure 6. Visualization effect

                     X.     CONCLUSION

    The proposed framework will ease the task of association rule mining by giving
users greater control over the mining task and by improving their ability to
interpret the rules, evaluate their relevance and obtain insight into the knowledge
mined from large datasets. We rely on interactive visualizations as an efficient
approach to bridging the gap between task automation and user control in mining
tasks.

    This study has bridged a gap in educational data analysis and shows the
potential of the association rule mining algorithm for enhancing the effectiveness
of academic planners and level advisers in higher institutions of learning. The
analysis reveals some hidden patterns in students’ performance, which could serve
as bedrock for academic planners in making academic decisions and as an aid in
curriculum re-structuring and modification, with a view to improving students’
performance. To adopt this approach, a larger number of students should be
considered, from the first year to the final year in the institution. This should
reveal more interesting patterns. Given these observations, if academic planners
make use of the hidden patterns extracted from students’ performance with the
association rule mining approach, it will help in curriculum re-structuring and in
monitoring the students’ ability. It will also enable academic advisers to guide
students properly on the courses they should enroll in, which eventually tends to
increase the student placement rate.

                          XI. FUTURE

    We conclude by remarking that the visualization of association mining results
in particular, and of data mining results in general, is a promising area of future
work. Educational, research, government and business institutions can benefit
significantly from the symbiosis of the data mining and information visualization
disciplines.

                       AUTHORS PROFILE
Mohammad Kamran is a Software Developer. His primary interests lie in the areas of
data mining and association rules. He is currently a research scholar in Computer
Science at Integral University. This paper summarizes the current state of his
thesis work in the field of "study of association rules in large databases".





				