
The International Journal of Computer Science and Information Security (IJCSIS) is a well-established publication venue for novel research in computer science and information security. The year 2010 has been very eventful and encouraging for all IJCSIS authors and researchers and for the IJCSIS technical committee, as we see more and more interest in IJCSIS research publications. IJCSIS is now supported by thousands of academics, researchers, authors, reviewers, students and research organizations. Reaching this milestone would not have been possible without the support, feedback, and continuous engagement of our authors and reviewers.

Field coverage includes: security infrastructures; network security (Internet security, content protection, cryptography, steganography and formal methods in information security); multimedia systems, software, information systems, intelligent systems, web services, data mining, wireless communication, networking and technologies, innovation technology and management. (See monthly Call for Papers.)

We are grateful to our reviewers for providing valuable comments. The IJCSIS December 2010 issue (Vol. 8, No. 9) has a paper acceptance rate of nearly 35%.
We wish everyone a successful year of scientific research in 2011.

Available at http://sites.google.com/site/ijcsis/
IJCSIS Vol. 8, No. 9, December 2010 Edition
ISSN 1947-5500 © IJCSIS, USA.

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 9, December 2010

The Innovative Application of Multiple Correlation Plane

Julaluk Watthananon
Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
watthananon@hotmail.com

Sageemas Na Wichian
College of Industrial Technology, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
sgm@kmutnb.ac.th

Anirach Mingkhwan
Faculty of Industrial and Technology Management, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
anirach@ieee.org



Abstract—Presenting data with column graphs and line graphs is a well-known technique for explaining data, making comparisons and showing direction in a form that users can easily understand. However, the technique is limited when the data describe complex, multiple relations: if the data contain diverse relationships and many variables, the efficiency of the presentation decreases. In this paper, a mathematical method for multiple relations based on the radar graph is proposed. The position of information on the correlation plane reflects the distribution of content and the depth of specific content. The proposed method analyzes multivariate data by plotting them on the correlation plane, and it is compared with a baseline system. The results show that its performance is higher than that of other methods in terms of accuracy, time and features.

Keywords-Correlation plane; correlation boundary; correlation plot; Star plot; Radar graph

I. INTRODUCTION

In statistics, bar graphs and line graphs are common types of graphs employed to explain data analyses, to compare directions and to represent a set of qualitative data with correlation between two variables [1]. Nonetheless, comparative analyses of more than two qualitative variables and of multiple correlations have been increasingly implemented in many fields of work, such as weather conditions and the context consistency of documents. It is important to have a proper form of data presentation that can effectively convey the message to readers. One commonly used form of data presentation is the radar chart, which can represent data with correlations among more than two variables effectively owing to its continuity and its ability to clearly compare many aspects of data plot correlations [2]. However, there are a number of limitations in presenting a larger amount of data with multiple correlations. Representatives of those relations need to be sought so as to determine appropriate data positions.

Generally, there are three methods of selecting representatives of data values with correlations among multiple variables. The three methods are as follows:

1) Selecting the highest value: the quantitative data of each variable are classified, and the variable with the largest quantity is then selected. For instance, in order to classify books into categories [3], librarians normally judge by the essence of each book. The disadvantage of this method is that content relating to other topics is reduced in importance and discarded.

2) Selecting from the mean: in this method the representative value is the mean, or neutral value, calculated as the sum of the data divided by the number of data items. This method is usually employed in research to select representatives of variables. However, it is not suitable for selecting data with multiple correlations because accurate data cannot be identified clearly.

3) Calculating combined results of directions: this is a highly successful technique commonly used for data with multiple variables [4], [5], [6]. A mathematical process is employed to obtain the relation between rectangular and polar coordinates on a radar chart, and the proper coordinate positions result from calculating the directions and distances of those relations. The authors name these plots data correlation plots. They lie on a correlation plane of connected lines, which confine the area, create n axes and divide the plane within polar coordinates. The plane in this research is referred to as the correlation plane. The intersection of the n axes is called the origin, and the n axes divide the plane into n parts. Each part is called a correlation boundary, details of which are elaborated in Section 3.

Hence, the authors have developed a concept of applying the method of calculating combined results of directions to present results in the correlation form defined above. Furthermore, the efficiency of presentation of the implemented methods, and the directions and depth levels of the correlation for data with multiple variables, were analyzed.

The rest of this paper is organized as follows. In Section 2 we provide a review of related works on the star graph, polar coordinates, the distance between plots, Dewey decimal classification and Dewey decimal classification – Multiple relations. Sections 3, 4 and 5 present the definitions of correlation, namely the correlation plot, correlation plane and




correlation boundary, the concept of application, and the experiments with a discussion of the results, respectively. Section 6 is the conclusion.

II. A REVIEW OF RELATED WORKS

Many researchers have studied and designed presentation methods, based on the information retrieval format, that allow users to access information and understand it easily through visualization. For example, research by the Texas State Auditor [1] presented how to use graphs to represent the relationships between two or more variables and the issues of interest. Yeh [2] presented a star chart showing the target numeric variable among categories. The results showed that the GRADAR procedure provides a new look at clinical data and helps in checking the data, tables and clinical reports. Wang et al. [4], [5] proposed a new graphical representation for multi-dimensional, multivariate data. The experimental results showed that it is effective for accurate classification. Klippel et al. [8] studied the best visual representation for a given data set, namely how to assign variables to rays and how to add color to the rays of a star plot graph. The results showed that the star plot graphs were meaningful, and that the represented data and color-enhanced star plots had positive effects on processing speed. Peng et al. [9] presented a new method for shape representation by converting the CSS descriptor circular vector map and defining two histograms in the polar coordinate system. The advantages of their proposal are simplicity, execution speed and good efficiency in clustering shape images. Sukmar et al. [12] presented the construction of a new polygonal interpolant based on the concept of natural neighbors. They adapted this construction on polygonal elements to quadtree meshes to obtain $C^0(\Omega)$ admissible approximations along edges with “hanging nodes.” Mohseni et al. [13] presented a method for treating the coordinate singularity whereby singular coordinates were redefined. The results showed that the new pole treatment gives spectral convergence and is more accurate for all. Demsar et al. [14] presented a new visualization method, “FreeViz”. The results showed that FreeViz is very fast and can present high-quality visualizations with clear class separation.

From the research above, the most effective techniques for presenting data compute the relationships and present a new method for intelligent visualization [4], [5], [9], [12], [14], [15], [16], [17] of data sets. In this paper, we also apply the star graph and polar coordinates to improve the classification of correlations and to present the position of data, since a normal plane cannot explain the correlations of a calculated position whose starting point originates from variables with multiple correlations. Below are the theories of related works, with techniques coming from these diverse fields.

A. Star graph
The star graph (also called a radar graph or spider graph) is a technique for graphical data analysis that represents all variables in a multivariate data set. It consists of a different line for each variable, radiating from the center; the length of each line is the “data length” of that variable. A radar graph is characterized by polygons within a circular frame that show many data in the same graph, so the principles of its creation consist of:

1) Determination of the axes: the axes and the number of axes determine how the data are displayed, with one axis defined for each data item. The first axis is vertical (x-axis), and the remaining axes then circulate to the east. In addition, users can define the color [5], [8], weight and title.

2) Plotting the values on the axes: each value is plotted on its axis, starting from the origin (point O) toward the circumference, by assigning the position (x, y) on each axis.

B. Polar coordinates
Polar coordinates [4], [5], [6], [9] are a popular method for calculating the appropriate location of multiple variances in order to represent data with multiple relations. The research of Wang et al. [4] shows that this method can classify data efficiently. In previous work [6], we analyzed and computed the correlation of document contents by the DDC-MR method [3]. It showed that a position can refer to the relationship of multiple variables effectively, so in this paper we use the sum-of-vectors method to represent the multiple variances, as shown in Figure 1.

Figure 1. Example data with multiple variances, where n is the number of variances and $r_i$ is the relationship between rectangular and polar coordinates $(r, \theta)$.

Figure 1 shows example data with multiple variances. Let $r_{ij}$ denote the distance of a point P from the origin, where the symbol O is the data length. The shaded area marks the computed appropriate position of the multiple variables. Let $\theta$ be the angle between the radial line from P to O and the given line “$\theta = 0$”, a kind of positive axis for our polar coordinate system, and let R be the distance from the point P to the origin. Polar coordinates are defined in terms of ordinary Cartesian coordinates by computing and connecting the n points $P_{ij}$, for $i = 1, \ldots, n$. It is calculated using the following equation:



\[
P_{ij} = \begin{cases} x_{ij} = r\cos\theta_i \\ y_{ij} = r\sin\theta_i \end{cases} \tag{1}
\]

where $r \ge 0$ and $0 \le \theta < 2\pi$, so that every point $P(x_{ij}, y_{ij})$ in the ordinary xy-plane (the correlation plane) can be rewritten in $(r, \theta)$ form, which follows from the fact that P lies on the circumference.

From these multiple relations, which we call the correlation of data, the coordinates of our point P satisfy the relation $x_{ij}^2 + y_{ij}^2 = r_{ij}^2(\cos^2\theta_i + \sin^2\theta_i) \Rightarrow x_{ij}^2 + y_{ij}^2 = r_{ij}^2$ (so that, as we indicated, since $\cos^2\theta_i + \sin^2\theta_i = 1$, the point $P(x_{ij}, y_{ij})$ is on a circle of radius r centered at O). So, we can find $\theta$ by solving the equation:

\[
\tan\theta_i = \frac{y_{ij}}{x_{ij}} \;\Rightarrow\; \theta_i = \arctan\left(\frac{y_{ij}}{x_{ij}}\right), \tag{2}
\]

where $\theta$ lies in the interval $0 \le \theta < 2\pi$. Denoting the arctangent function by arctan, we see that:

\[
\theta_i =
\begin{cases}
\arctan\left(\dfrac{y_{ij}}{x_{ij}}\right) & \text{if } \theta_i \in \left(-\dfrac{\pi}{2}, \dfrac{\pi}{2}\right), \\[2ex]
\arctan\left(\dfrac{y_{ij}}{x_{ij}}\right) + \pi & \text{if } \theta_i \in \left(\dfrac{\pi}{2}, \dfrac{3\pi}{2}\right),
\end{cases} \tag{3}
\]

with the interpretation that $\theta = \pm\pi/2$ corresponds to points on the real y-axis and $\theta = 0$ corresponds to points on the real x-axis. We call the resulting point the correlation plot.

C. The distance between plots
We can use the Pythagorean theorem (http://en.wikipedia.org/wiki/Pythagorean_theorem) to compute the distance between points in the plane in order to find the distance d. In data mining, the reference point is called the centroid [7], and the distance is calculated using the following equation:

\[
d = \frac{1}{n}\sum_{i=1}^{n} \left| C - V_i \right| \tag{4}
\]

where C is the centroid, or the correlation plot $(x_{\mathrm{coordinate}}, y_{\mathrm{coordinate}})$, $V_i$ is the point $(x_i, y_i)$ on the circumference, and $|C - V_i|$ is the distance between the plot and point i on the circumference. We see that:

\[
\left| C - V_i \right| = \sqrt{(x_{\mathrm{coordinate}} - x_i)^2 + (y_{\mathrm{coordinate}} - y_i)^2} \tag{5}
\]

D. Dewey decimal classification
Dewey decimal classification was developed by Melvil Dewey in 1876. It is widely used in libraries and covers books of every kind, without restriction to any field. The system is used in more than 200,000 libraries in 135 countries. The Dewey decimal system divides knowledge into 10 classes; each class is divided into 10 sub-classes, and each sub-class is in turn divided into 10 divisions. Because it uses numbers as symbols, which are easy to remember, it is popular and has been translated into more than 30 languages around the world.

E. Dewey decimal classification – Multiple relations
Dewey decimal classification – Multiple Relations, which we call DDC-MR, is a technique for analyzing and classifying multiple relations that was developed by combining a search engine with Dewey decimal classification. It focuses on the analysis of proportion in the content [3]. Using the library standard classification scheme, one keyword can be classified as deep as 4 levels, each of which assigns a number as notation [6], [7]. The scheme follows DDC, which divides human knowledge into 10 classes in the first level, 100 subclasses in the second level and 1000 divisions in the third level; the last level, or leaf node, contains more than 10000 sections.

III. DEFINITION TO CORRELATION
Our study of implementing methods is a study of correlation deformation, in which correlations connected by related radar graphs are subsequently replaced by polar coordinates. One main concern of the study is to consider the shapes, quantities of content correlations, distances, correlation positions and directions of the determined coordinates. Thus, in this research, the authors provide definitions for the purpose of comparing correlations before and after deformation and of identifying advantages and implementing methods. For instance, a document pertaining to many sciences, when examined to find out whether it is a suitable representative of documents, has to be adapted so that the plot position is found and the plot of intrinsic correlation on the plane and boundary is consistent with that correlation. As such, a normal plane cannot explain the correlations of that calculated position because the starting point originates from variables with multiple correlations. Below are definitions of keywords.

A. Correlation plot
A correlation plot indicates a position of coordinates derived from a calculation of the combined values of every correlation, so that one position on the same area is identified. The point resulting from that calculation is called a correlation plot in this research; it is used to show or represent the position of each data set on the correlation plane with reference to any correlation with n relevant contents. Correlations can be demonstrated in pairs $(r, \theta)$, where each pair refers to only one plot and represents only one data set of distance and directional correlations of variables in polar coordinates. For example, one document containing a number of related contents is represented by n axes (with results shown in the form of a radar graph), and is then calculated by mutual tension. Consequently, one plot in the form $(r, \theta)$ is acquired, as seen in Figure 1.
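The calculation in Eqs. (1) to (5) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the n axes are equally spaced, with axis i at angle 2πi/n measured counterclockwise from the positive x-axis, and that the correlation plot is the plain, unnormalized vector sum of the axis values; the paper states neither detail explicitly. The function names `axis_vertices`, `correlation_plot` and `mean_distance` and the sample values are hypothetical.

```python
import math


def axis_vertices(values):
    """Return the point (x, y) of each value on its own radar-graph axis, as in Eq. (1).

    Assumption (not stated in the paper): axis i lies at angle 2*pi*i/n,
    measured counterclockwise from the positive x-axis.
    """
    n = len(values)
    return [
        (r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
        for i, r in enumerate(values)
    ]


def correlation_plot(values):
    """Sum the axis vectors and return the resulting plot as (x, y, r, theta_deg).

    Assumption: the vector sum is used as is, without normalization.
    """
    vertices = axis_vertices(values)
    x = sum(vx for vx, _ in vertices)
    y = sum(vy for _, vy in vertices)
    r = math.hypot(x, y)  # from x^2 + y^2 = r^2
    # atan2 selects the quadrant, which covers both cases of Eq. (3);
    # the modulo maps the angle into [0, 360).
    theta_deg = math.degrees(math.atan2(y, x)) % 360.0
    if theta_deg >= 360.0:  # guard against rounding up to exactly 360.0
        theta_deg = 0.0
    return x, y, r, theta_deg


def mean_distance(center, vertices):
    """Eqs. (4) and (5): mean Euclidean distance from the plot C to the points V_i."""
    cx, cy = center
    return sum(math.hypot(cx - vx, cy - vy) for vx, vy in vertices) / len(vertices)


if __name__ == "__main__":
    values = [4.0, 3.0, 0.0, 0.0]  # hypothetical data lengths on n = 4 axes
    x, y, r, theta_deg = correlation_plot(values)
    d = mean_distance((x, y), axis_vertices(values))
    print(f"plot=({x:.2f}, {y:.2f}) r={r:.2f} theta={theta_deg:.2f} d={d:.2f}")
```

Using `math.atan2` instead of a bare arctangent avoids the case split of Eq. (3) and the division by zero when a point lies on the y-axis.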



B. Correlation plane
A correlation plane indicates the area where the coordinates derived from the calculated correlation points of data are located. The points require locations and addresses, so normal planes cannot be applied in this research. The number of occurring correlations results from variables with multiple correlations. Therefore, the calculated values of pairs are not solely data derived from the (x, y) axes, but also data resulting from tension among the n axes that divide the plane within polar coordinates. In this research, the plane is called a correlation plane, which is essential to distances and directional correlations, especially loadings and depth directions. The intersection of the n axes is called the origin, and the intersecting n axes divide the plane into n parts. Each part is called a correlation boundary.

C. Correlation boundary
A correlation boundary indicates the angle values of lines appearing on a correlation plane; it is determined by bounding the measured angles between the x axis of the correlation plane and the lines appearing on the plane. Boundaries are divided according to the categories of the application. In this research, a correlation boundary is used to determine the correlation area and the content correlation level of each category. The area close to the center (O = Origin) represents a low density of content of that category, while the area far from the center represents a high density of content of that category, or specificity highly consistent with that particular category. This is applicable to the categorization of correlations with DDC-MR [3], [6]. For example, in order to divide the correlation boundary into 10 main scientific categories, each science has a width of 36° and the first correlation boundary starts from 0°. A counterclockwise rotation is then applied to divide the sections and determine the correlation boundaries of the subsequent categories, starting at 36°, 72°, 108°, 144°, 180°, 216°, 252°, 288° and 324°, respectively, as shown in Figure 2.

Figure 2 shows examples of the determination of correlation plots on correlation planes, where the correlation boundary in class $X_{1,\ldots,n}$ means the range of the correlation boundary in each science. In this example, n refers to the 10 sciences of Dewey decimal classification (DDC), with the following correlation boundaries: the first science is the general class (0° – 35°); the second is the philosophy class (36° – 71°); the third is the religion class (72° – 107°); the fourth is the social sciences class (108° – 143°); the fifth is the language class (144° – 179°); the sixth is the pure science and mathematics class (180° – 215°); the seventh is the technology and applied science class (216° – 251°); the eighth is the arts and recreation class (252° – 287°); the ninth is the literature class (288° – 323°); and the last is the history and geography class (324° – 360°).

The positions of occurring points, or correlation plots, can be employed to refer to variables with n correlations. Each correlation differs in quantity and direction, leading to different distances between the coordinates on the correlation plane and the origin. Therefore, in accordance with the DDC classification of books, a technique widely practiced among libraries, if each plot is replaced by a set of books, the calculated correlation plot is replaced by the related contents of the books, and the correlation plane is replaced by the areas of correlation of the scientific content structure, respectively. Dense plots are lines appearing in the direction with correlations within the correlation boundary. A plot that is very far from the center means that the book contains very specific and in-depth content of that science. Since the force loading and directions of variables are highly related, a plot that is very close to the center also means that the book is specific to that science but does not have content related to many sciences, as seen in Figure 2 (#1 and #2); if the loading and direction in each science are highly related in terms of proportion, that book will have content related to many sciences. Additionally, redundant plots bring about a different pattern of correlations of books with related contents. We can then tell which books have the same content and what kind of content they have. It is possible to state that correlation plots, correlation planes and correlation boundaries have continuously related meanings and are the major basic elements of the application of multiple correlations.
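The division of the plane into correlation boundaries can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: each boundary is treated as the half-open interval [start, start + width), so that fractional angles such as 35.5° fall into a class, and 360° is treated as equivalent to 0°. The class names follow the ten DDC classes listed above; the function names `correlation_boundary` and `ddc_class` are hypothetical.

```python
# The ten DDC main classes, in the counterclockwise order used in the text.
DDC_CLASSES = [
    "General",
    "Philosophy",
    "Religion",
    "Social sciences",
    "Language",
    "Pure science and mathematics",
    "Technology and applied science",
    "Arts and recreation",
    "Literature",
    "History and geography",
]


def correlation_boundary(theta_deg, n_classes=10):
    """Return (index, start_deg, end_deg) of the boundary that contains theta_deg.

    Assumptions (not stated in the paper): boundaries are half-open
    intervals [start, end), and angles wrap around at 360 degrees.
    """
    width = 360.0 / n_classes  # 36 degrees when there are 10 classes
    index = int((theta_deg % 360.0) // width)
    index = min(index, n_classes - 1)  # guard against floating-point rounding
    return index, index * width, (index + 1) * width


def ddc_class(theta_deg):
    """Name of the DDC class whose boundary contains the angle theta_deg."""
    index, _, _ = correlation_boundary(theta_deg, len(DDC_CLASSES))
    return DDC_CLASSES[index]


if __name__ == "__main__":
    for angle in (10.0, 100.0, 200.0, 330.0):
        print(angle, ddc_class(angle), correlation_boundary(angle))
```

Because the width is computed from `n_classes`, the same function covers applications with a different number of sections, in which case the degree size of each boundary changes accordingly.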


Figure 2. Example data for correlation plots, showing the distribution of related content and the depth of specific content in the correlation plane.

IV. CONCEPT TO APPLICATION

A. Conceptual
The concept of calculating variables that have correlation is an analysis technique developed from a mathematical method: a proper representative of the data is identified by calculating total tension values and presenting them in the



form of polar coordinates of correlation. The objectives of this technique are to demonstrate the similarity of data in the same area and to explain explicitly the levels of relations. The process is shown in Figure 3.

[Figure 3 diagram, three stages: Correlation plots (radar graph → sum vectors); Correlation planes ((r, θ) → polar coordinates); Correlation plots (divide the degree → cluster).]

Figure 3. The concept of Correlation plots, Correlation plane and Correlation boundary.

•    Changing all relations of variables to correlation plots is a process of summing vectors: all classified correlations that can be seen clearly on the radar graph of one document are summed so that one plot with the pair value (r, θ) is acquired, and this plot represents all relations of that document.

•    Locating the position of a document with the correlation plane, as seen in the above process, yields a pair value (r, θ) that represents the document. The pair value is then plotted using the principle of polar coordinates, determining the plane and the (x, y) axes instead of applying its value only. Therefore, if we want to present several documents simultaneously, we need a number of axes to indicate the position of each document, and we must determine a correlation plane so that all documents can sit at their (r, θ) values on the determined plane in that particular area. In this way, data sets are overlapped rather than presented one by one. As such, no matter how the (r, θ) value is calculated, that document will always be on that axis.

•    Identifying the boundary section of an area is a process of grouping the correlation planes used to indicate the position of each document and overlapping a number of polar coordinates so that several unseen axes are produced. Therefore, to categorize the data sets or documents clearly, the correlation boundary of those axes needs to be determined in accordance with the number of sections of the sample group.

B. Analysis
    If we use this method to analyze and categorize data, as in the examples of Section III, and use the DDC-MR boundary to categorize the data on the correlation plane, a document analyzed by the DDC-MR process can be located at its position. The correlation boundary then consists of 10 groups, one for each DDC main section. However, if we apply this concept and plot the boundary by competency, the boundary of the data changes with the number of competency sections. If there are 10 competency sections, as with DDC-MR, the same boundary size can be applied; if the number of sections decreases or increases, the degree size of the applied boundary changes accordingly.

V.    EXPERIMENTS AND RESULTS
    In this section, the performance of the correlation plot, correlation plane and correlation boundary is shown in three ways. The first is the complexity of correlation, which is used to explain multiple variances with multiple relations; if these correlations perform well, they should represent the different data in that correlation plane. The second is the accuracy of classifying and analyzing data with different multiple relations; we test correctness on articles, a documents library and competency data. The last is the set of features offered by classification with these correlations.

A. Data sets
    In our experiments, we used collections of data with multiple relations from the three examples given below.

•    Academic articles: These data come from national conferences in the computer and information technology disciplines, published from 2005 to 2010. The data set uses 3 sections of each article: Title, Abstract and Keyword. The data have multiple relations under DDC-MR at level 3 of DDC, which classifies into 1,000 classes. The total number of articles is 700.

•    Documents library: These data come from a multidisciplinary document library of 100 documents. The data set uses 3 sections of each document: Title, Table of contents and Index. Each document contains multiple relational links to other content related to the document.

•    Competency data: These data come from evaluating 10 of the 18 principles of Spencer's competency evaluation [10], [11], used to select the basic competencies of personnel. They are: Achievement orientation, Analytical thinking, Conceptual thinking, Customer service orientation, Developing others, Impact and influence, Information seeking, Teamwork and cooperation, Team leadership and Self-confidence.

B. Experiments
    We used the correlation plot, correlation plane and correlation boundary derived from the multiple relations of multiple variances to compute our experiments. The correlation plot is a coordinate computed from all relationships, the correlation plane is the coordinate area arising from the correlation plots, and the correlation boundary is a range between degrees of the set. In this experiment, we applied the radar graph under the correlation and set the academic articles to 1,000 classes, the documents library to 100 classes and the competency data to 10 classes. For the text classification of academic articles and the documents library we used DDC-MR, and the competency data came from analytical evaluation.
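The sum-vector step of Section IV.A, in which the radar-graph values of one document are collapsed into a single (r, θ) pair, can be illustrated with a minimal Python sketch. It rests on assumptions that this section does not confirm: the n radar axes are taken to be evenly spaced around the circle, and the sum is taken to be an ordinary 2-D vector sum. The function name `correlation_plot` and the sample weights are hypothetical, and the equations actually used by the authors may differ.

```python
import math


def correlation_plot(weights):
    """Collapse one document's radar-graph values into a (r, theta) pair.

    ASSUMPTION (not taken from the paper): axis k of n lies at angle
    360*k/n degrees, and the sum-vector step is a plain 2-D vector sum.
    """
    n = len(weights)
    # Project each radar value onto x and y, then add the components.
    x = sum(w * math.cos(2 * math.pi * k / n) for k, w in enumerate(weights))
    y = sum(w * math.sin(2 * math.pi * k / n) for k, w in enumerate(weights))
    r = math.hypot(x, y)                             # distance from the center
    theta = math.degrees(math.atan2(y, x)) % 360.0   # direction in degrees
    return r, theta


# Hypothetical radar values on 4 axes.
focused = correlation_plot([3.0, 3.0, 0.0, 0.0])   # weight on two adjacent axes
balanced = correlation_plot([1.0, 1.0, 1.0, 1.0])  # weight spread evenly
print(focused, balanced)
```

Under these assumptions, a document whose weight is concentrated on neighbouring axes lands far from the center, while an evenly spread document lands near it, which agrees with the reading of distance from the center given in the features discussion.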



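The boundary step of Section IV.B states that 10 sections give one boundary size and that the degree size changes when the number of sections changes. A minimal sketch of that idea follows. It assumes equal-width angular sectors that start at 0 degrees, which the text implies ("divide the degree") but does not spell out; the function names are hypothetical.

```python
def boundary_width(sections):
    """Degree size of one boundary when the circle is split evenly.

    ASSUMPTION: all sections receive the same angular width.
    """
    return 360.0 / sections


def boundary_index(theta, sections=10):
    """Return the 0-based section that an angle theta (degrees) falls into.

    ASSUMPTION: section 0 starts at 0 degrees and sections run
    counter-clockwise; the paper does not state the starting angle.
    """
    width = boundary_width(sections)
    index = int((theta % 360.0) // width)
    return min(index, sections - 1)  # guard against floating-point edge cases


# 10 sections (as with the DDC main classes) give 36-degree boundaries;
# 18 sections would give 20-degree boundaries.
print(boundary_width(10), boundary_index(100.0))
print(boundary_width(18), boundary_index(100.0, sections=18))
```

Only the section count needs to change when the same plane is reused for a different data set, which is the behaviour the analysis describes.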
C. Evaluation Metrics
    The standard performance metric used to evaluate classification in the experiments is accuracy. This metric evaluates the prediction process from the counts of test records correctly and incorrectly predicted. These counts are tabulated in a confusion matrix, as in a binary classification problem. Each entry f_ij in the confusion matrix denotes the number of records from class i predicted to be of class j. Accuracy is defined as follows:

$$\text{Accuracy} = \frac{\text{the number of correctly classified test documents}}{\text{total number of test documents}} \tag{6}$$

    where Accuracy represents the percentage of correct predictions in total predictions. We chose Accuracy as the evaluation metric for reporting the prediction experiments because it is simple to express and most classification methods seek models that attain the highest accuracy when applied to the test set. In our previous experiments we have seen that Accuracy describes the experimental schemes in terms of prediction performance.

D. Experimental results
    The innovation of our paper is a way to analyze multivariate data by plotting it in the correlation plane. In this research, the experimental results on document relevance focus on three issues: classification accuracy, processing time and features of interest.

•    Accuracy: The experimental results on classification accuracy compare the Correlation plot, K-Mean clustering, Hierarchical clustering and Factor analysis. We found that the correlation plot method achieved higher accuracy overall than the other methods, as shown in Table 1.

TABLE I.      COMPARISON OF THE ACCURACY OF CLASSIFICATION RESULTS IN EACH CLASS.

                              Accuracy (%)
 Cluster   Correlation plot   K-Mean   Hierarchy   Factor Analysis
 C1        97.66              95.31    97.66       92.97
 C2        93.75              97.66    97.66       93.75
 C3        93.75              90.63    86.72       87.50
 C4        92.97              92.97    92.19       86.72
 C5        92.19              78.91    64.06       59.38
 C6        86.72              79.69    67.19       72.88
 C7        88.50              88.28    85.94       70.31
 C8        93.75              85.16    93.75       92.19
 C9        97.66              95.31    91.41       92.97
 C10       98.44              97.66    96.86       97.66

    Table 1 compares the accuracy of the 10 clusters and shows that the Correlation plot method had the best accuracy. Compared with the other methods, almost every cluster has very high accuracy: C1 = 97.66%, C2 = 93.75%, C3 = 93.75%, C4 = 92.97%, C5 = 92.19%, C6 = 86.72%, C7 = 88.50%, C8 = 93.75%, C9 = 97.66% and C10 = 98.44%, considering all clusters. The accuracy of the correlation plot is close to that of K-Mean clustering, which means both methods can be used to classify the data in this research. Comparing all clusters, however, we find that K-Mean clustering had a problem in 2 clusters, C5 and C6, where its accuracy was less than 80%, while the accuracy of the correlation plot method was more than 80% in all clusters. Furthermore, the accuracies of Hierarchical clustering and Factor analysis were similar to each other, and for some clusters they were less than 80%. This means the effectiveness of those clusters is low, as shown in Figure 4.

Figure 4. Comparison of the accuracy of four classifications.

•    Time: The experimental results on processing time show that, when applied with polar coordinates, the correlation plot method was the most efficient compared with the other methods, as shown in Table 2.

TABLE II.        TIME COMPARISON RESULTS

                                Time (Second)
 Documents   Correlation plot   K-Mean   Hierarchy   Factor Analysis
 100         0.0078             0.0262   0.1094      0.0469
 200         0.0352             0.0625   0.168       0.0859
 300         0.0859             0.1055   0.1445      0.1953
 400         0.0664             0.1016   0.1953      0.2500
 500         0.0977             0.1133   0.1836      0.4258
 600         0.1029             0.1159   0.2013      0.4369
 700         0.1033             0.1309   0.2145      0.5012

    Table 2 compares the processing time of the 4 methods. We tested 7 data sets of different sizes: the first had 100 documents, the second 200, the third 300, the fourth 400, the fifth 500, the sixth 600 and the last 700 documents. The results show that the correlation plot used the least processing time. Thus, if we analyze



the 7 data sets, whose processing times are similar, increasing the amount of data does not affect the processing time, because this method does not need to recalculate everything whenever new information is added. Thus, the original data remain in the same position and the same cluster. Factor analysis, by contrast, is affected in processing time: if we change the amount of data it spends more time, as shown in Figure 5.

Figure 5. Time comparison of four classifications.

    Table 3 compares the accuracy and processing time of the Correlation plot with K-Mean clustering, Hierarchical clustering and Factor analysis. We tested clustering performance first with the 700 articles and then repeated the test with the 100 documents of the documents library. The results show that the correlation plot was the most accurate, with 93.54% for academic articles and 90.33% for the document library. It also had the minimum processing time: 0.1033 seconds for academic articles and 0.0452 seconds for the document library. In this research, the correlation plot was similar to K-Mean clustering in accuracy and processing time, which means that both methods can be used for classification of these data sets. Factor analysis had the lowest accuracy: 84.63% for academic articles and 78.54% for the document library. The document-library figure is lower than the statistical acceptance level and lower than the researchers' criterion that accuracy must be greater than 80%. In addition, Factor analysis used the most processing time: 0.5012 seconds for academic articles and 0.1959 seconds for the document library. These are the longest times of the four methods; the time criterion is that processing must take less than 15 seconds.

TABLE III. EVALUATION RESULTS ON CLASSIFICATION

                           Accuracy (%)                 Time (second)
 Model              Article (700)   Doc. (100)   Article (700)   Doc. (100)
 Correlation plot   93.54           90.33        0.1033          0.0452
 K-Mean             90.16           88.63        0.1309          0.0959
 Hierarchy          87.34           85.21        0.2145          0.1594
 Factor Analysis    84.63           78.54        0.5012          0.1959

TABLE IV. FEATURES COMPARISON RESULTS ON CLASSIFICATION.

 Models compared: Correlation plot, K-Mean, Hierarchy, Factor analysis
 (the per-model marks are not legible in this copy)

 Features
 1.  Easy to understand
 2.  Segmentation is clear
 3.  To break the color of the group
 4.  Display hierarchical data
 5.  Display depth of information
 6.  Display specific information on each group
 7.  Display the direction and distance
 8.  Display data in multiple groups simultaneously
 9.  The ability to compare data
 10. Do not adjust the scale display
 11. Do constants in the group
 12. The amount of data does not affect the process
 13. Time in process < 15 second*
 14. Accuracy > 85% *
 * From evaluation results in Table 3

•    Features: The experimental results and the analysis of classification performance show that applying polar coordinates to the data increases the ability to classify and provides more features of interest. For Table 4, we used the same data sets to compare the elements and dominant features of the correlation plot method that make classification more useful. There are 14 features, as follows: 1) Easy to understand when used for the first time. 2) Segmentation is clear because of the line segments. 3) To break the color of the group, so that data are separated more clearly. 4) Display hierarchical data in each cluster, because the radius of the circle line is used in the computation. 5) Display the depth of information relevant to each cluster, which can indicate the distribution of related content. 6) Display specific information on each group: a plot near the center means that the content is similar to that of the other clusters, while a plot far from the center means that the content is more specific to its cluster. 7) Display the direction and distance in each cluster; knowing these, we can predict the road map and fill in the knowledge of each cluster. 8) Display data in multiple groups simultaneously, that is, the ability to display more than one cluster at the same time, such as 4 clusters or 10 clusters, depending on the desired number of clusters. 9) The ability to compare data: as with pie charts, this is useful for comparing the contributions of different clusters to the total value of a variable. We can also compare two identical nominal data sets under different conditions, especially at different times. 10) Do not adjust the scale display when showing all information, whereas some visualizations [7] must be enlarged to show information clearly. 11) Do constants in the group, which means that if we add new data, the original data plot stays in the same cluster and is not recomputed into a new cluster. This method was



different from other methods because every parameter affects the classification. 12) The amount of data does not affect the process: the results in Figure 5 show that the amount of data is not a problem for this method. 13) Time in process < 15 second* and 14) Accuracy > 85%*, from the evaluation results in Table 3, show the performance in the experiments.

VI.   CONCLUSION
    In an attempt to improve the performance of the correlation plot, correlation plane and correlation boundary, we propose an innovative method of multiple correlation application: a method of applying polar coordinates so that data are categorized effectively regardless of their type and quantity in the sample group. With the correlation plot, correlation plane and correlation boundary, we can present data that are representative of a relation; they are applied to identify relations of data and to categorize data in many aspects. Our test found that if we use the DDC-MR boundary to categorize a data set, it can be categorized into the 10 groups of the DDC main section. However, if this concept is applied to other data sets, the degree of the boundary changes with the number of sections of those data sets. The authors believe that correlation application can give users a practical explanation of analysis and data presentation, as follows.

    1) Advantages to users: correlation application is clear, comprehensive and reliable. It is a data presentation method in the form of bar graphs with explicit lines dividing the boundaries of each group. Different colors can be applied to different bars for aesthetics and distinction. With correlation application, problems caused by users' differing levels of basic knowledge can be reduced because it is a familiar presentation method widely used in explaining data analysis.

    2) Advantages of knowledge management in organizations with an emphasis on enhancement of knowledge storage capacity: most organizations focus on storing, collecting, exchanging, transferring and publishing data and do not consider how those data are related. Hence, in order to optimize organizations' benefits, the relations of those data should be identified. This correlation method can help analyze and present different levels of data with related contents so that knowledge can be implemented in a useful way. If we can effectively explain data, those data will support our work and foster ongoing changes.

    3) Advantages of integration: different kinds of information that are related or have the same content are applied and transformed into new knowledge that is easy for users to understand and implement. The authors believe that correlation application can help synthesize, analyze and explain connections between information contents. For example, relations between different courses can be identified by correlation application: courses that are close together, overlapped or missing can be replaced by one another because of their high similarity, or courses can be combined into one new course relevant to the former basic courses.

    4) Advantages of data mining: previous research placed emphasis on sorting data or categorizing relevant data into the same group and paid no attention to the relations of data contents. Correlation application, apart from being able to sort data precisely and rapidly (see the test results in Table 3), can explain the relations of information that appear at the content level and classify the levels of relations of information within a group in a clear manner.

REFERENCES
[1]    “Data Analysis: Displaying Data – Graphs – 1,” Texas State Auditor’s Office, Methodology Manual, rev. 5/95, http://www.preciousheart.net/chaplaincy/Auditor_Manual/11graphd.pdf
[2]    S. T. Yen, “Using Radar chart to display Clinical data.”
[3]    W. Lertmahakiat and A. Mingkhwan, “Information Retrieval by Multiple DDC Relational Classification,” in Proceeding on NCCIT’08, the 4th National Conference on Computing and Information Technology, Thailand, 22 – 23 May 2008.
[4]    J. Wang, J. Li and W. Hong, “Feature Extraction and Classification for Graphical Representations of Data,” Springer-Verlag Berlin Heidelberg, vol. 5226, pp. 506 – 513, 2008.
[5]    J. Wang, W. Hong and X. Li, “The New Graphical Features of Star Plot for K Nearest Neighbor Classifier,” Springer-Verlag Berlin Heidelberg, vol. 4682/2007, pp. 926 – 933, July 2007.
[6]    J. Watthananon and A. Mingkhwan, “Multiple Relation Knowledge Mapping Using Rectangular and Polar Coordinates,” in Proceeding on PGNET’09, the 10th Annual Postgraduate Symposium on The Convergence of Telecommunications, Networking and Broadcasting, Liverpool John Moores University, UK, 21 – 22 June 2009.
[7]    J. Watthananon and A. Mingkhwan, “A Magnify Classification technique for group of Knowledge Using DDC-MR,” in Proceeding on NCCIT’09, the 5th National Conference on Computing and Information Technology, Thailand, 22 – 23 May 2009.
[8]    A. Klippel, F. Hardisty, R. Li and C. Weaver, “Color Enhanced Star Plot Glyphs: Can Salient Shape Characteristics be Overcome?,” Journal of Cartographica, 44:3, pp. 217 – 231, 2009.
[9]    J. Peng, W. Yang and Y. Li, “Shape Classification Based on Histogram Representation in Curvature Scale Space,” IEEE Transaction on Computational Intelligence and Security, vol. 2, pp. 1722 – 1725, 2006.
[10]   L. M. Spencer and S. M. Spencer, “Competence at Work – Models for Superior Performance,” Wiley, New York, NY, 1993.
[11]   P. Chintanaporn and A. Mingkhwan, “A Study of Information Retrieval Algorithm for Multiple Relation Competency Team Recruitment,” in Proceeding on PGNET’09, the 10th Annual Postgraduate Symposium on The Convergence of Telecommunications, Networking and Broadcasting, Liverpool John Moores University, UK, 21 – 22 June 2009.
[12]   N. Sukumar and A. Tabarraei, “Polygonal Interpolants: Construction and Adaptive Computations on Quadtree Meshes,” in Proceeding on ECCOMAS 2004, July 2004, pp. 1 – 9.
[13]   K. Mohseni and T. Colonius, “Numerical Treatment of Polar Coordinate Singularities,” Journal of Computational Physics, vol. 157, pp. 787 – 795, 2000.
[14]   J. Demsar, G. Leban and B. Zupan, “FreeViz-An intelligent multivariate visualization approach to explorative analysis of biomedical data,” Journal of Biomedical Informatics, vol. 40, pp. 661 – 671, 2007.
[15]   M. Meyer, H. Lee, A. Barr and M. Desbrun, “Generalized Barycentric Coordinates on Irregular Polygons,” Journal of Graphics Tools, vol. 7, pp. 13 – 22, November 2002.
[16]   A. Klippel, F. Hardisty and C. Weaver, “Star Plots: How Shape Characteristics Influence Classification Tasks,” Cartography and Geographic Information Science, 36:2, pp. 149 – 163, 2009.
[17]   S. N. Patnaik and D. A. Hopkins, “Completed Beltrami-Michell formulation in Polar coordinates,” International Journal of Physical Sciences, vol. 2 (5), pp. 128 – 139, May 2007.




     AUTHORS PROFILE

Julaluk Watthananon received a B.Sc. degree in
Information System from Rajamangala University of
Technology Thanyaburi and an M.S. degree in Information
Technology from King Mongkut’s University of
Technology North Bangkok. Currently, she is a Ph.D.
candidate in the Faculty of Information Technology at
King Mongkut’s University of Technology North
Bangkok. Her current research interests are Knowledge
Management, Data Mining and Information Retrieval.

Sageemas Na Wichian received a Ph.D. in Educational
Research Methodology from Chulalongkorn University.
She has been an educator for over ten years, teaching in
universities in the areas of diversity, Psychology and
Research for Information Technology. She is currently
working as a lecturer at King Mongkut’s University of
Technology North Bangkok, Thailand. Her research
focuses on advanced research methodology and Industrial
and Organizational psychology.

Anirach Mingkhwan received a B.Sc. degree in
Computer Science from King Mongkut’s University of
Technology North Bangkok, an M.Sc. degree in Computer
and Information Technology from King Mongkut’s
Institute of Technology Ladkrabang and a Ph.D. degree in
Computer Network from Liverpool John Moores
University. His current research interests are Computer
Network and Information Security, Wireless and Mobile
Ad Hoc Networks, Knowledge Management and
Ontology. Currently, he is an Assistant Professor in the
Faculty of Industrial and Technology Management at
King Mongkut’s University of Technology North
Bangkok, Thailand.



