The International Journal of Computer Science and Information Security (IJCSIS) is a well-established publication venue for novel research in computer science and information security. The year 2010 has been very eventful and encouraging for all IJCSIS authors/researchers and the IJCSIS technical committee, as we see more and more interest in IJCSIS research publications. IJCSIS is now supported by thousands of academics, researchers, authors/reviewers/students and research organizations. Reaching this milestone would not have been possible without the support, feedback, and continuous engagement of our authors and reviewers.
Field coverage includes: security infrastructures, network security: Internet security, content protection, cryptography, steganography and formal methods in information security; multimedia systems, software, information systems, intelligent systems, web services, data mining, wireless communication, networking and technologies, innovation technology and management. (See monthly Call for Papers.)
We are grateful to our reviewers for providing valuable comments. The IJCSIS December 2010 issue (Vol. 8, No. 9) has a paper acceptance rate of nearly 35%.
We wish everyone a successful scientific research year in 2011.
Available at http://sites.google.com/site/ijcsis/
IJCSIS Vol. 8, No. 9, December 2010 Edition
ISSN 1947-5500 © IJCSIS, USA.
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010 The Innovative Application of Multiple Correlation plane Julaluk Watthananon Sageemas Na Wichian Anirach Mingkhwan Faculty of Information Technology, College of Industrial Technology, Faculty of Industrial and Technology King Mongkut’s University of King Mongkut’s University of Management, King Mongkut’s University Technology North Bangkok, Technology North Bangkok, of Technology North Bangkok, Bangkok, Thailand Bangkok, Thailand Bangkok, Thailand watthananon@hotmail.com sgm@kmutnb.ac.th anirach@ieee.org Abstract—Presentation data with column graph and line graph is 1) Selecting the highest value: classifying quantitative data a well-known technique used in data explanation to compare and of each variable, and then selecting the most quantities show direction that users can easily understand. However, the variables, for instance, in order to classify books categories techniques has limitations on the data describing complex with [3], librarians will normally do on the essence of the books. multiple relations, that is, if the data contains diverse Disadvantage of this method is other contents relating to other relationships and many variables, the efficiency of the topics are decreased in the importance and deleted. presentation will decrease. In this paper, the mathematical method for multi relations based on Radar graph is proposed. 2) Selecting from the mean: By this method a value data The position of information approaches on the correlation plane representative from the mean or neutral value calculating from referred to the distribution of content and the deep specific an outcome of added data divided by data amount. This content. However, the proposed method analyzes the multi method is usually employed in research to selecting variables variants data by plotting in the correlation plane, and compared representatives. 
However, it is not suitable for selecting data with the base line system. The result shows that the performance with multiple correlations because accurate data cannot be is higher than other methods in term of accuracy, time and identified clearly. features. 3) Calculating combined results of directions: this is a Keywords-Correlation plane; correlation boundary; correlation highly successful technique commonly used with data with plot; Star plot; Radar graph multiple variables [4], [5], [6]. A mathematic process is employed to acquire relation between rectangular and polar coordinates on a radar chart and proper coordinates’ positions I. INTRODUCTION resulted from calculations of directions and distances of those relations. The authors name these plots data correlation plots. In statistics, bar graph and line graph are common types of They are on correlation plane of connected lines and will graphs employed to explain data analyses, to compare confine the area, create an n axis and divide the plane within directions and to represent a set of qualitative data with polar coordinates. The plane in this research is referred to as correlation between two variables [1]. Nonetheless, the correlation plane. The intersection of n axis is called the comparative analyses of more than two qualitative variables origin. Intersection of n axes will divide the plane into n parts. and multiple correlations have been increasingly implemented Each part is called a correlation boundary, details of which in many fields of work, namely weather conditions, context are elaborated in Section 3. consistency of documents, etc. It is important to have a proper form of data presentation that can effectively send messages Hence, the authors have developed a concept of applying across to readers. 
One of the commonly used forms of data the method of calculating combined results of directions to presentation is a radar chart that can represent data with present results in the correlation form as above mentioned correlation of over two variables in an effective manner due to definition. Furthermore, efficiency of presentation of its continuity and its ability to clearly compare many aspects implementing methods, directions and depth levels of the of data plot correlations [2]. However, there are a number of correlation to data with multiple variables was analyzed. limitations in presenting a larger amount of data with multiple The rest of this paper is organized as follows: In the correlations. Representatives of those relations need to be section 2 we provide a review of related works about star sought so as to determine appropriate data positions. graph, polar coordinates, distance between plots, Dewey Generally, there are three methods of selecting decimal classification and Dewey decimal classification – representatives of data values with correlation of multiple Multiple relations. Section 3, 4 and 5 present the definition of variables. The three methods are as follows: correlation such as: correlation plot, correlation plane and 61 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010 correlation boundary, the concept of application and the variable of lines radiating from the center. It means the “data experiments with the discussion on the results respectively. length” of each variable. The characteristics of radar graph are Section 6 is the conclusion. polygons under the frame of circle that shows many data in the same graph, so the principles of creation consist of: 1) Determination of the axis: Determination of the axis II. 
A REVIEW OF RELATED WORKS and number of axis displays data where we define one axis for Many researchers have studied and designed methods of one data, the first axes is in vertical (x-axis) and then presentation from the information retrieval format that allows circulates to the east. In addition, users can define the color users to access and easily understand with the visualization, [5], [8], weight and name of title. such as in research Texas State Auditor’s [1] presented how to 2) Plot the value on the axis: Plot the value on the axis use graphs representations of the relationships between two or that starting from the origin (point O) to the circumference by more variables and the issues of interest. Yeh [2] presented assigning the position (x, y) on each axis. star chart showing the target numeric variable among categories. The results showed the GRADAR procedure providing a new looks to the clinical data and helped checking B. Polar coordinates the data, tables and clinical reports. Wang et al. [4], [5] The polar coordinate [4], [5], [6], [9] is a popular method proposed a new graphical representation for multi-dimensional used to calculate the appropriate location of multi variances, in data in multivariate. The experimental results showed the order to represent the data referred to multiple relations. The effectiveness of accurate classification. Klippel et al. [8] research of Wang et al. [4] shows that, this method can be proposed that the best visual representations for a data set classified of data efficiently. In previous works [6], we had presented are: how to assign variables to rays and to add color analyzed and computed the correlation of document contents to rays of a star plot graph. The results shown that the star plot by DDC-MR method [3]. 
It showed that position could refer to graphs were meaningful; the represented data and star plot the relationship of multiple variables effectively, so this paper enhanced color had positive effects on the processing speed. we used the sum of vector method to represent the multi Peng et al. [9] presented a new method for shape variances as shown in Figure 1. representation by converting the CSS descriptor circular vector map and defining two histograms in polar coordinate system. The advantages of their proposed are simplicity, execution speed and efficiency of well in clustering the shape images. Sukmar et al. [12] presented the construction of a new polygonal interpolant that was based on the concept of natural neighbors. They used technique to adapt the above construction on polygonal elements to quad tree meshes to obtain Co() admissible approximations along edges with “hanging nodes.” Mohseni et al. [13] presented a method for treating the coordinate singularity whereby singular coordinates were redefined. Thus, the results showed the new O pole treatment giving spectral convergence and more accurate for all. Demsar et al. [14] presented a new method for visualization “FreeViz”. The results showed that the FreeViz P was very fast and can presented high quality with clear class separation. From the researches above, the most effective technique to present data was a compute of the relationships and presented a new method for intelligent visualization [4], [5], [9], [12], [14], [15], [16], [17] of data sets. In this paper, we also applied the star graph and polar coordinates to improve the Figure 1. Example data with multi variances, where n is the number of classification correlation and presented the position of data. variance, ri is relationships between rectangular and polar Coordinates (r, θ). 
Since a normal plane cannot explain correlations of that calculated position as a result of the starting point originated In Figure 1 we show example data with multi variances, let from variables with multiple correlations. Below are theories rij denote the distance of a point P from the origin and the of related works with techniques coming from these diverse symbol O is the data length. The shade means area in the fields. computed appropriate position of multiple variable, let = angle between the radial line for P to O and the given line “ A. Star graph = 0”, a kind of positive axis for our polar coordinate system The star graph (can call radar graph or spider graph) is a and R is the distance from the point P to the origin. Polar technique used to represent graphical data analysis with all coordinates are defined in terms of ordinary Cartesian variables in multivariate data sets. It consists of a different coordinates by computing and connecting the n points Pij, for i = 1,…,n. It is calculated by using equation as follows: 62 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010 countries. The Dewey decimal system divides the knowledge x ij r cos i into 10 classes, in each class it is divided into 10 sub-classes Pij (1) and in each sub-class it is divided into 10 divisions y ij r sin i accordingly. By using numbers as symbols with the purpose of easy to remember, it is popular to use with more than 30 where r ≥ 0 0 ≤ < 2, that every point P(xij, yij) in the languages translation around the world. ordinary xy–plane (correlation plane) can be rewrite to (r, )- answer which is, is a result of the fact of P lines on the E. Dewey decimal classification – Multiple relaitons circumference. Dewey decimal classification – Multiple Relations or we From these multiple relations, we called correlation of data call DDC-MR. 
It is a technical analysis classification multiple on the coordinates of our point P satisfy the relation xij2 + yij2 relations which was developed between Search engine and = rij2 (cos2i + sin2i) xij2 + yij2 = rij2 (so that, as we Dewey decimals classification. It focuses on the analysis of indicated, the point P(xij, yij) and (cos2i + sin2i) = 1) is on a proportion in the content [3]. By using the library standard circle of radius r centered at O). So, we can find by solving classification schemes, one keyword will be able to classify as the equation as: deep as 4 levels which assigns number for notation [6], [7]. This scheme refers to DDC that does divide human knowledge yij y into 10 classes in the first level, 100 subclasses in the second tan i i arctan ij , (2) level, 1000 divisions in the third level and the last level or leaf xij x ij node contains more than 10000 sections. where in the interval 0 ≤ < 2, let arctangent denoting the function by arctan we see that: III. DEFINITION TO CORRELATION Our study of implementing methods is to study of correlation deformation connected by related radar graphs, and y subsequently replaced by polar coordinates. One main concern y arctan x if , 2 2 of the study of implementing methods is to consider the i arctan ij (3) x 3 shapes, quantities of content correlations, distances, arctan if y ij , correlation positions and directions of determined coordinates. x 2 2 Thus, in this research, the authors provide definitions for the purpose of comparing correlations before and after with the interpretation that = ±/2 corresponds to points deformation and identifying advantages and implementing on the real y–axis and = 0 corresponds to points on the real methods. For instance, a document pertaining to many x–axis, that we called correlation plot. 
sciences, when examined to find out whether it is a suitable representative of documents, has to be adapted so that the plot position is found and the plot of intrinsic correlation on the C. The distance between plots plane and boundary is consistent with that correlation. As We can use the theoretical Pythagorus1 method to compute such, a normal plane cannot explain correlations of that the distance between points in the plane in order to find the calculated position because the starting point originates from distance d. In Data mining we call Centroid [7] to calculate variables with multiple correlations. Below are definitions of using equation as follows: keywords. 1 n di c vi n i 1 (4) A. Correlation plot A correlation plot indicates a position of coordinates derived from a calculation of combined values of every where C is the centroid or the correlation plot (xCoordinate, correlation so that one position on the same area is identified. yCoordinate), Vi is coordinates in the circumference (xi, yi), and The point resulting from that calculation is titled in this |C - Vi| is the distance between plots with the coordinates of i research as a correlation plot, which is used to show or in the circumference, we see that: represent a position of each data set on the correlation plane referring to any correlation with n relevant contents. Correlations can be demonstrated in pairs (r, ), where the c vi xcoordinate xi 2 ycoordinate yi 2 (5) first pair refers to only one plot and represents only one data set of distance and directional correlations of variables on polar coordinates. For example, one document containing a D. Dewey decimal classification number of related contents is represented by n axis (with Dewey decimal classification was developed by Melvil results shown in the form of a radar graph), and then Dewey in 1876. It is widely used in the library. Besides, there calculated by mutual tension. 
Consequently, one plot in the are many kinds of the books which unlimited of any field. form of (r,) was acquired as seen in Figure 1. That is the system used in more than 200,000 libraries in 135 1 http://en.wikipedia.org/wiki/Pythagorean_theorem 63 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010 B. Correlation plane Figure 2 shows examples of determination of correlation A correlation plane indicates the area where coordinates plots on correlation planes, where correlation boundary in derived from calculated correlation points of data are located. class X1,…, n means the range of correlation boundary in each The points require locations and addresses so normal planes science. From the above example, n refers to 10 sciences, with cannot be applied in this research. The number of occurring the first science referring to a general class that has the correlations results from variables with multiple correlations. correlation boundary of 0 – 35. The second science refers to Therefore, the calculated values of pairs were not solely data a philosophy class with the correlation boundary of 36 – 71. derived from (x, y) axes, but also data resulted from tension The third science refers to a religion class with the correlation among n axes that divided the plane within polar coordinates. boundary of 72 – 107while the fourth science refers to a In this research, the plane is called a correlation plane which social sciences class with the correlation boundary of 108 – is essential to distances and directional correlations especially 143. The fifth science refers to a language class with the loadings and depth directions. The intersection of n axis is correlation boundary of 144 – 179and the sixth science called the origin and intersecting n axes divide the plane into n refers to a pure science and mathematics class with the parts. 
Each part is called a correlation boundary. correlation boundary of 180 – 215. The seventh science refers to a technology and applied science class with the C. Correlation boundary correlation boundary of 216 – 251 while the eight science refers to the arts and recreation class with the correlation A correlation boundary indicates angle values from lines boundary of 252 – 287. The next science is a literature class appearing on a correlation plane by determining the boundary of measurements of angles between x axis of the correlation with the correlation boundary of 288 – 323. And the last plane and lines appearing on the plane. Boundaries are divided science is a history and geography class with the correlation according to categories of applications. In this research, a boundary of 324 – 360, respectively by Dewey decimal correlation boundary is used to determine the correlation area classification (DDC). and the content correlation level of each category. The area Positions of occurring points, or correlation plots, can be which is close to the center (O = Origin) represents low employed to refer to variables with n correlations. Each density of the content of that category while the area which is correlation differs in quantity and direction leading to different far from the center represents high density of the content of distances between coordinates on the correlation plane and the that category or specificity highly consistent to that particular origin. Therefore, in accordance with the DDC classification category. This is applicable for categorization of correlations of books, a widely practiced technique among libraries, if each with DDC-MR [3], [6]. 
For example, in order to divide the plot is replaced by a set of books, the calculated correlation correlation boundary into 10 main scientific categories, each plot will be replaced by related contents of books, and the science has the width of 36 and the first correlation boundary correlation plane will be replaced by areas of correlation of starts from 0. Then, a counterclockwise rotation was done in scientific content structure respectively. Dense plots are lines order to divide sessions and determine the correlation appearing in the direction with correlations within the boundary of the subsequent categories starting at 36, 72, correlation boundary. The plot which is very far from the 108, 144, 180, 216, 252, 288 and 324, respectively, as center means that a book containing very specific and in-depth shown in Figure 2. contents of that science. Since force loading and directions of variables are highly related and the plot which is very close to the center also means that the book is specific to that science, but does not have contents related to many sciences, as seen in Figure 2 (#1 and #2), if the loading and direction in each science are highly related in terms of proportion, that book will have contents related to many sciences. Additionally, redundant plots will bring about a different pattern of correlations of books with related contents. We, then, realize which books have the same content and what kind of content they have. It is possible to state that correlation plots, correlation planes and correlation boundaries have continuously related meanings and are major basic elements of application of multiple correlations. IV. CONCEPT TO APPLICATION A. Conceptual The concept of calculating variables that have correlation Figure 2. Example data for correlation plot refer to the distribution of content is an analyzing technique developed from a mathematic related and the deep specific of content in the correlation plane. 
method: a proper representative of data is identified by calculating total tension values and presenting them in the 64 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010 form of polar coordinates of correlation. The objectives of this of the data will be changed along with the number of sections technique are to demonstrate similarity of data in the same of competency. If the size of competency sections is 10 of area and explicitly explain levels of relations that, the process DDC-MR, then the same size of boundary can be applied, but as follows in Figure 3. if the sections decrease or increase in size, the degree size of the applied boundary will change accordingly. Correlation plots Radar graph Sum vectors V. EXPERIMENTS AND RESULTS This section performance of correlation plot, correlation plane and correlation boundary are shown three ways. The first way is complexity of correlation. It is used to explain the Correlation planes multi variances with multiple relations; if these correlations are high performance they should represent the different data (r,) Polar coordinates in that correlation plane. The second way is accuracy of classifying and analyzing with the different multiple relations; we test correctness by articles, documents library and competency data. And the last way is features to use these Correlation plots correlations classification. Divide the degree Cluster A. Data sets In our experiments, we used a collection of multiple Figure 3. The concept of Correlation plots, Correlation plane and Correlation relations of data from three examples given below. boundary. 
Academic articles: This data from a national conference Changing all relations of variables to correlation plots is disciplines in the computer and information technology which a process of summing vectors, where all classified correlations were published during 2005 to 2010, and we provided the that can be clearly seen on a radar graph of one document are dataset used 3 sections: Title, Abstract and Keyword. This data has multiple relations by DDC-MR in level 3 of DDC to calculated so that one plot with the pair value of (r, ) is classify 1,000 classes. The total number of articles is 700. acquired and represents all relations of that document. Documents library: This data from the document library Locating the position of a document with correlation in the multidisciplinary amounting to 100 documents and we plane, as seen in the above process, yields a pair value of (r, ) provided the data set used 3 sections: Title, Table of content that represents the document. The pair value is then plotted, and Index. Each document contains multiple relations links to using the principle of polar coordinate determining the plane other content which are related to the document. and (x, y) axes instead of applying its value only. Therefore, if we want to present several documents simultaneously, we Competency data: This data from evaluate 10 out of 18 have to have a number of axes to indicate the position of each principles of competencies evaluation Spencer [10], [11], to document and determine a correlation plane so that all select personnel basic competencies. There are: Achievement documents can be at their (r, ) values on the determined plane orientation, Analytical thinking, Conceptual thinking, in that particular area. These way data sets are overlapped and Customer service orientation, Developing others, Impact and not presented one by one. 
As such, no matter how the (r,) influence, Information seeking, Teamwork and cooperation, value is calculated, that document will always be on that axis. Team leadership and Self-confidence. Identifying the boundary section of an area is a process of grouping correlation planes used to indicate the position of B. Experiments each document and overlapping a number of polar coordinates We used the correlation plot, correlation plane and so that several unseen axes are produced. Therefore, to correlation boundary provided by the multiple relations of categorize that data sets or document in a clear manner, the multi variances to computed our experiments. The correlation correlation boundary of those axes needs to be determined in plot is a coordinates from the computed of all relationships, accordance with the number of sections of the sample group the correlation plane is an area coordinates arising from correlation plot and the correlation boundary is a range B. Analysis between the degrees of set. In this experiment, we applied the radar graph provided under the correlation and set the number If we use this method to analyze and categorize data, as of academic articles to 1,000 classes, set the documents library seen in Section 3 with examples, and use the boundary on to 100 classes and set the number of competency data to 10 DDC-MR to categorize the data on the correlation plane, the classes. For text classification process of academic articles and document analyzed by the DDC-MR process will be able to documents library, we used DDC-MR and competency data locate that position. The correlation boundary will be 10 from analytical to perform the experiments. groups in the DDC main section. However, if we apply this concept and plot the boundary by competency, the boundary 65 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010 C. 
Evaluation Metrics accuracy such as: C1 = 97.66%, C2 = 93.75%, C3 = 93.75%, The standard performance metrics for evaluation the C4 = 92.97%, C5 = 92.19%, C6 = 86.72%, C7 = 88.50%, C8 classification used in the experiments is accuracy. These = 93.75%, C9 = 97.66 and C10 = 98.44%, which we metrics assume the prediction process evaluation based on the considering all clusters. The accuracy of correlation plot is counts of test records correctly and incorrectly predicted. That close to K-Mean clustering. It means both of method can be shows the confusion matrix for a binary classification used to classify data in this research. If we compared in all problem. Each entry fij in the confusion matrix denotes the clusters, we find out that K-Mean clustering had problem in 2 number of records from class i predicted to be of class j, which clusters; that were C5 and C6, The accuracy was less than is defined as follows: 80% while the accuracy correlation plot method were more than 80% in all clusters. Furthermore, the accuracy of Hierarchical clustering and Factor analyses were similar and the number of correctly classified test documents (6) while accuracy of some clusters was less than 80%. This Accuracy total number of test documents means the effectiveness of cluster is low, as shown in Figure 4. where Accuracy represents the percentage of correct predictions in total predictions. We used Accuracy as our choice of evaluation metric to report prediction experiments because it can be expressed in terms and most classification seek models that attain the highest accuracy when applied to the test set. In our previous experiments we have seen that Accuracy provide the experimental schemes in terms of prediction performance. D. Experimental results The innovation of our paper is presented how to analyze multi variants data by plot in the correlation plane. 
In this research, the experimental results on document relevance focus on three issues: classification accuracy, processing time, and notable features.

Accuracy: We compared the classification accuracy of the correlation plot with K-Mean clustering, hierarchical clustering and factor analysis. Overall, the correlation plot method was more accurate than the other methods, as shown in Table 1.

TABLE I. COMPARISON OF THE ACCURACY OF CLASSIFICATION RESULTS IN EACH CLASS.

                          Accuracy (%)
Class   Correlation plot   K-Mean   Hierarchy   Factor Analysis
C1      97.66              95.31    97.66       92.97
C2      93.75              97.66    97.66       93.75
C3      93.75              90.63    86.72       87.50
C4      92.97              92.97    92.19       86.72
C5      92.19              78.91    64.06       59.38
C6      86.72              79.69    67.19       72.88
C7      88.50              88.28    85.94       70.31
C8      93.75              85.16    93.75       92.19
C9      97.66              95.31    91.41       92.97
C10     98.44              97.66    96.86       97.66

Table 1 compares the accuracy for the 10 clusters and shows that the correlation plot method had the best accuracy overall. Compared with the other methods, it reaches a very high accuracy for almost every cluster.

Figure 4. Comparison of the accuracy of four classifications.

Time: The experimental results on processing time show that, when the polar coordinate is applied, the correlation plot method is the most efficient of the methods compared, as shown in Table 2.

TABLE II. TIME COMPARISON RESULTS.

                          Time (second)
Documents   Correlation plot   K-Mean   Hierarchy   Factor Analysis
100         0.0078             0.0262   0.1094      0.0469
200         0.0352             0.0625   0.168       0.0859
300         0.0859             0.1055   0.1445      0.1953
400         0.0664             0.1016   0.1953      0.2500
500         0.0977             0.1133   0.1836      0.4258
600         0.1029             0.1159   0.2013      0.4369
700         0.1033             0.1309   0.2145      0.5012

Table 2 compares the processing time of the four methods. We tested seven data sets of different sizes: the first had 100 documents, the second 200, the third 300, the fourth 400, the fifth 500, the sixth 600 and the last 700. The correlation plot used the least processing time on every data set. Its times across the seven data sets also stay close to one another, so increasing the amount of data has little effect on processing time: the method does not need to recalculate whenever new information is added, and the original data remain in the same position and the same cluster. Factor analysis, by contrast, is affected by the amount of data; as the data set grows it takes more time, as shown in Figure 5.

Figure 5. Time comparison of four classifications.

Table 3 compares the accuracy and processing time of the correlation plot with K-Mean clustering, hierarchical clustering and factor analysis. We tested clustering performance first with 700 academic articles and then repeated the test with a document library of 100 documents. The correlation plot was the most accurate: 93.54% for the academic articles and 90.33% for the document library. It also had the shortest processing time: 0.1033 seconds for the academic articles and 0.0452 seconds for the document library. Its results were closest to those of K-Mean clustering in both accuracy and processing time, which means that both methods can be used to classify these data sets. Factor analysis had the lowest accuracy: 84.63% for the academic articles and 78.54% for the document library. Measured against the researchers' criterion that accuracy must be greater than 80%, the document library result falls below the accepted level. Factor analysis also used the most processing time, 0.5012 seconds for the academic articles and 0.1959 seconds for the document library, measured against a time criterion of less than 15 seconds.

TABLE III. EVALUATION RESULTS ON CLASSIFICATION

                   Accuracy (%)                  Time (second)
Model              Article (700)   Doc. (100)    Article (700)   Doc. (100)
Correlation plot   93.54           90.33         0.1033          0.0452
K-Mean             90.16           88.63         0.1309          0.0959
Hierarchy          87.34           85.21         0.2145          0.1594
Factor Analysis    84.63           78.54         0.5012          0.1959

Features: The experimental results and the analysis of classification performance show that applying the polar coordinate to the data improves the ability to classify and provides additional useful features. Table 4 uses the same data set to compare the methods on the dominant features of the correlation plot method.

TABLE IV. FEATURES COMPARISON RESULTS ON CLASSIFICATION.
Models compared: Correlation plot, Hierarchy, K-Mean, Factor analysis.
Features compared: the 14 features listed below.

The 14 features are as follows:
1) Easy to understand, even when used for the first time.
2) Segmentation is clear, because line segments separate the groups.
3) Groups are distinguished by color, which separates the data further.
4) Displays hierarchical data in each cluster, because the radius of the circle line is used in the computation.
5) Displays the depth of information relevant to each cluster, which can indicate the distribution of related content.
6) Displays specific information on each group: a point plotted near the center has content similar to the other clusters, whereas a point plotted far from the center has content that is more specific to its cluster.
7) Displays the direction and distance in each cluster; knowing these, we can predict the road map and fill in the knowledge in each cluster.
8) Displays data in multiple groups simultaneously, that is, more than one cluster at the same time, such as 4 clusters or 10 clusters, depending on the desired number of clusters.
9) The ability to compare data: as with pie charts, it is useful for comparing the contributions of different clusters to the total value of a variable. We can also compare two identical nominal data sets under different conditions, especially at different times.
10) No need to adjust the display scale when showing all the information, whereas some visualizations [7] must be enlarged to show the information clearly.
11) Constant group membership: when new data are added, the original data remain plotted in the same cluster and are not recomputed into a new cluster. This differs from the other methods, in which every parameter affects the classification.
12) The amount of data does not affect the process. The results in Figure 5 indicate that the amount of data is not a problem for this method.
13) Processing time < 15 seconds*.
14) Accuracy > 85%*.
* From the evaluation results in Table 3, which show the performance in the experiment.
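The article accuracies reported in Table III appear to be the per-model means of the ten class accuracies in Table I; the text does not state this, so treat it as an assumption. The minimal Python sketch below recomputes those means and compares them with the 85% accuracy criterion of feature 14. The names `TABLE_I`, `ACCURACY_CRITERION` and `mean_accuracy` are introduced here for illustration only and are not part of the authors' method.

```python
from statistics import mean

# Per-class accuracy (%) for C1..C10, copied from Table I.
TABLE_I = {
    "Correlation plot": [97.66, 93.75, 93.75, 92.97, 92.19, 86.72, 88.50, 93.75, 97.66, 98.44],
    "K-Mean":           [95.31, 97.66, 90.63, 92.97, 78.91, 79.69, 88.28, 85.16, 95.31, 97.66],
    "Hierarchy":        [97.66, 97.66, 86.72, 92.19, 64.06, 67.19, 85.94, 93.75, 91.41, 96.86],
    "Factor Analysis":  [92.97, 93.75, 87.50, 86.72, 59.38, 72.88, 70.31, 92.19, 92.97, 97.66],
}

# Accuracy criterion taken from feature 14 (accuracy > 85%).
ACCURACY_CRITERION = 85.0


def mean_accuracy(table):
    """Return the mean accuracy per model, rounded to two decimals."""
    return {model: round(mean(values), 2) for model, values in table.items()}


summary = mean_accuracy(TABLE_I)
for model, avg in summary.items():
    verdict = "meets" if avg > ACCURACY_CRITERION else "misses"
    print(f"{model}: {avg:.2f}% ({verdict} the {ACCURACY_CRITERION:.0f}% criterion)")
```

Under this assumption the computed means are 93.54, 90.16, 87.34 and 84.63, which agree with the Article (700) accuracy column of Table III; only factor analysis falls below 85%.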
VI. CONCLUSION

In an attempt to improve the performance of the correlation plot, correlation plane and correlation boundary, we propose an innovative method of multiple correlation application, based on polar coordinates, so that data are categorized effectively regardless of their type and quantity in the sample group. With the correlation plot, correlation plane and correlation boundary, we can present data that are representative of a relation: they are applied to identify relations among data and to categorize data in many aspects. In our test, using the boundary on DDC-MR to categorize a data set produced 10 groups, corresponding to the DDC main sections. If this concept is applied to other data sets, however, the degree of the boundary will change along with the number of sections of those data sets. The authors believe that correlation application can give practical explanations of analysis and data presentation to users, as follows.

1) Advantages to users: correlation application is clear, comprehensive and reliable. It is a data presentation method in the form of bar graphs with explicit lines dividing the boundaries of each group. Different colors can be applied to different bars for aesthetic purposes and to distinguish them. With correlation application, problems caused by users' differing levels of basic knowledge can be reduced, because it is a familiar presentation method widely used in explaining data analysis.

2) Advantages of knowledge management in organizations, with an emphasis on enhancing knowledge storage capacity: most organizations focus on storing, collecting, exchanging, transferring and publishing data and do not consider how those data are related. Hence, in order to optimize organizations' benefits, the relations among those data should be identified. This correlation method can help analyze and present different levels of data with related contents, so that knowledge can be implemented in a useful way. If we can effectively explain data, those data will support our work and foster ongoing changes.

3) Advantages of integration: different kinds of information that are related or have the same content are applied and transformed into new knowledge that is easy for users to understand and implement. The authors believe that correlation application can help synthesize, analyze and explain connections among information contents. For example, relations among different courses can be identified by correlation application: courses that are close together, overlapping or missing can be replaced by one another because of their high similarity, or courses can be combined into one new course relevant to the former basic courses.

4) Advantages of data mining: previous research placed emphasis on sorting data or categorizing relevant data into the same group and paid no attention to relations among data contents. Correlation application, apart from being able to sort data precisely and rapidly (see the test results in Table 3), can explain the relations of information that appear at the content level and clearly classify the levels of relations of information within a group.

REFERENCES

[1] "Data Analysis: Displaying Data – Graphs – 1", Texas State Auditor's Office, Methodology Manual, rev. 5/95, http://www.preciousheart.net/chaplaincy/Auditor_Manual/11graphd.pdf
[2] S. T. Yen, "Using Radar chart to display Clinical data".
[3] W. Lertmahakiat and A. Mingkhwan, "Information Retrieval by Multiple DDC Relational Classification," in Proceeding on NCCIT'08, the 4th National Conference on Computing and Information Technology, Thailand, 22 – 23 May 2008.
[4] J. Wang, J. Li and W. Hong, "Feature Extraction and Classification for Graphical Representations of Data," Springer-Verlag Berlin Heidelberg, vol. 5226, pp. 506 – 513, 2008.
[5] J. Wang, W. Hong and X. Li, "The New Graphical Features of Star Plot for K Nearest Neighbor Classifier," Springer-Verlag Berlin Heidelberg, vol. 4682/2007, pp. 926 – 933, July 2007.
[6] J. Watthananon and A. Mingkhwan, "Multiple Relation Knowledge Mapping Using Rectangular and Polar Coordinates," in Proceeding on PGNET'09, the 10th Annual Postgraduate Symposium on The Convergence of Telecommunications, Networking and Broadcasting, Liverpool John Moores University, UK, 21 – 22 June 2009.
[7] J. Watthananon and A. Mingkhwan, "A Magnify Classification technique for group of Knowledge Using DDC-MR," in Proceeding on NCCIT'09, the 5th National Conference on Computing and Information Technology, Thailand, 22 – 23 May 2009.
[8] A. Klippel, F. Hardisty, R. Li and C. Weaver, "Color Enhanced Star Plot Glyphs Can Salient Shape Characteristics be Overcome?," Journal of Cartographica, 44:3, pp. 217 – 231, 2009.
[9] J. Peng, W. Yang and Y. Li, "Shape Classification Based on Histogram Representation in Curvature Scale Space," IEEE Transaction on Computational Intelligence and Security, vol. 2, pp. 1722 – 1725, 2006.
[10] L. M. Spencer and S. M. Spencer, "Competence at Work – Models for Superior Performance," Wiley, New York, NY, 1993.
[11] P. Chintanaporn and A. Mingkhwan, "A Study of Information Retrieval Algorithm for Multiple Relation Competency Team Recruitment," in Proceeding on PGNET'09, the 10th Annual Postgraduate Symposium on The Convergence of Telecommunications, Networking and Broadcasting, Liverpool John Moores University, UK, 21 – 22 June 2009.
[12] N. Sukumar and A. Tabarraei, "Polygonal Interpolants: Construction and Adaptive Computations on Quadtree Meshes," in Proceeding on ECCOMAS 2004, July 2004, pp. 1 – 9.
[13] K. Mohseni and T. Colonius, "Numerical Treatment of Polar Coordinate Singularities," Journal of Computational Physics, vol. 157, pp. 787 – 795, 2000.
[14] J. Demsar, G. Leban and B. Zupan, "FreeViz-An intelligent multivariate visualization approach to explorative analysis of biomedical data," Journal of Biomedical Informatics, vol. 40, pp. 661 – 671, 2007.
[15] M. Meyer, H. Lee, A. Barr and M. Desbrun, "Generalized Barycentric Coordinates on Irregular Polygons," Journal of Graphics Tools, vol. 7, pp. 13 – 22, November 2002.
[16] A. Klippel, F. Hardisty and C. Weaver, "Star Plots: How Shape Characteristics Influence Classification Tasks," Cartography and Geographic Information Science, 36:2, pp. 149 – 163, 2009.
[17] S. N. Patnaik and D. A. Hopkins, "Completed Beltrami-Michell formulation in Polar coordinates," International Journal of Physical Sciences, vol. 2 (5), pp. 128 – 139, May 2007.

AUTHORS PROFILE

Julaluk Watthananon received a B.Sc. degree in Information System from Rajamangala University of Technology Thanyaburi and an M.S. degree in Information Technology from King Mongkut's University of Technology North Bangkok. Currently, she is a Ph.D. candidate in the Faculty of Information Technology at King Mongkut's University of Technology North Bangkok. Her current research interests are Knowledge Management, Data Mining and Information Retrieval.

Sageemas Na Wichian received a Ph.D. in Educational Research Methodology from Chulalongkorn University. She has been an educator for over ten years, teaching in universities in the areas of diversity, Psychology and Research for Information Technology. She is currently working as a lecturer at King Mongkut's University of Technology North Bangkok, Thailand. Her research focus is on advanced research methodology and Industrial and Organizational psychology.

Anirach Mingkhwan received a B.Sc. degree in Computer Science from King Mongkut's University of Technology North Bangkok, an M.Sc. degree in Computer and Information Technology from King Mongkut's Institute of Technology Ladkrabang and a Ph.D. degree in Computer Network from Liverpool John Moores University. His current research interests are Computer Network and Information Security, Wireless and Mobile Ad Hoc Networks, Knowledge Management and Ontology. Currently, he is an Assistant Professor in the Faculty of Industrial and Technology Management at King Mongkut's University of Technology North Bangkok, Thailand.