									    International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Volume 1, Issue 3, September – October 2012                                    ISSN 2278-6856

        Clustering of Engineering Materials Data Sets
                     Using Fuzzy System
                                        Sarakutty.T.K1, Dr.M.Hanumanthappa2
                                             Department of Computer Science & Applications,
                                               Dayananda Sagar College, Bangalore, India
                                             Department of Computer Science & Applications,
                                                 Bangalore University, Bangalore, India

Abstract: Data mining enables efficient knowledge extraction from large
datasets in order to discover hidden or non-obvious patterns in data.
Clustering of engineering material data sets deals with the systematic
categorization of materials based on distinguishing characteristics and
criteria. Material informatics deals with real-world material data sets with
high dimensionality and complex structure. Fuzzy approaches can play an
important role in data mining because they can deal with complex,
high-dimensional data and are capable of producing comprehensible results. A
fuzzy clustering method is used to cluster the materials data set based on
similarities and performance. The knowledge extracted from the engineering
material data sets is proposed for effective decision making in advanced
engineering materials design applications.

Keywords: Data mining, Material Informatics, Clustering, Fuzzy C-Means

1. INTRODUCTION
Materials play an important role in the construction and manufacturing of
equipment/tools, transportation, housing, clothing, communication, recreation
and food production. Historically, the development and advancement of
societies have been intimately tied to their members' ability to produce and
manipulate materials to fill their needs.
During the last decades many new materials and material types have been
developed. At present, on the order of 100,000 engineering materials exist. In
addition, many materials have successively obtained improved properties. This
has been possible not only due to the development of the materials but also
due to the appearance of new production methods. As a consequence of this
rapid development, many material types can be used for a given component.
Computational tools assist in making decisions by analyzing the data and
discovering useful patterns for predicting future trends. In the Materials
Science domain it is imperative to connect materials suppliers, automobile
companies, heat treatment industries, universities, researchers, aerospace
agencies, manufacturing companies and other users [1]. Exchange of knowledge
among these users enables them to make faster and more effective decisions.
For example, prior knowledge that distortion is likely to occur in a part when
it is heat treated under certain conditions is useful in selecting parameters
so as to minimize distortion in an industrial heat treatment process. This in
turn helps to optimize processes and make better products, hence improving
business by satisfying customers. Thus, on the whole, E-business is promoted
by facilitating worldwide exchange of knowledge useful in the domain for
supporting various aspects of decision support [2].
Coupling of computational material science and informatics is essential in
order to
   - accelerate the insertion of materials into engineering systems
   - establish new structure-property correlations among large, heterogeneous
     and distributed data sets
   - discover new chemistries and compounds
   - formulate and/or refine new theories for materials behavior
   - rapidly identify critical data and theoretical needs for future problems
The research areas of materials informatics are mainly focused on the
following tasks: data standards, organization and management of material data,
and data mining on materials data [3],[4]. Materials informatics is very
likely to become a major force because of enormous improvements in the
efficiency and capabilities of computational methods for materials and the
recent progress in data mining techniques.
This research aims to establish whether data mining techniques can be used to
assist in the clustering of materials by finding the meaningful patterns that
exist across various materials. The materials are clustered based on their
properties. The resulting clusters, and the classifications that can be
developed from them, depend on the selected attributes and to some extent on
the method of clustering. Grouping materials allows a designer to assess the
similarity of two materials, stimulating innovation and suggesting
substitutions. The knowledge extracted in this way is proposed for effective
decision making in advanced engineering materials design applications.


2. LITERATURE SURVEY
Materials informatics has been a subject of materials science since the
international conference of Materials Informatics [5]. It is a new subject
that leverages information technology and computer network technology to
represent, parse, store, manage and analyze material data, in order to
realize the sharing and knowledge mining of materials data for uncovering the
essence of materials and to accelerate new material discovery and design [5].

  2.1 Materials informatics
Data quality plays a central role in compiling valid and reliable plans to
make the right decisions. At the same time, it is acknowledged that planning
processes are both data and knowledge intensive and characterized by the
human-computer interface. Informatics is a science where a new knowledge
system is built up by collecting and classifying information using computers
and networks. It is the integration of computer science, information science,
and some domain area to provide new understandings and to facilitate
knowledge discovery [6]. Materials informatics can be thought of as a tool
for material scientists to gain new understandings of their data through the
use of a myriad of machine learning approaches, integrated with new
visualization schemes, more human-like interactions with the data, and
guidance by domain experts. It can also accelerate the research process and
minimize data handling. All of this is fuelled by the unprecedented growth in
the field of information technology and is driving the interest in the
application of knowledge representation, knowledge discovery, machine
learning, information retrieval, semantic technology etc. [7].
The main issues to be addressed regarding the development of materials
informatics are
   - Redefinition of database formats, aiming at improved data sharing
   - Database networking and the development of software for data sharing
   - Development of data analysis software and visualization software
   - Development of software for data mining from databases
   - Prediction of new functions by the combination of data mining and
     computation science
   - Standardization of platforms that integrate all these factors [6]

   2.2 Previous Work
A comparative study of different classification algorithms is presented in
[8], where the Fuzzy C-Means algorithm performs well on unsupervised data
with uncertainty. Cluster analysis [9] is used as an analytical tool in
materials design to cluster materials and the processes that shape them,
using their attributes as indicators of relationship. The Naïve Bayesian
classification algorithm [5] is used to classify engineering materials data
sets consisting of only categorical attribute values. Here we use a fuzzy
system to classify engineering materials data sets consisting of both
numerical and categorical attribute values. Considering both numerical and
categorical attribute values makes higher classification accuracy possible,
since many of the material properties are expressed numerically. This
technique reduces complexity and helps expose hidden order and deeply buried
patterns in data.

3. PROPOSED METHOD
Clustering is an unsupervised learning method used to find a structure in a
collection of unlabeled data. Clustering is the process of organizing objects
into groups whose members are similar in some way. A cluster is therefore a
collection of objects which are similar to one another and dissimilar to the
objects belonging to other clusters. Clustering of data is a method by which
large sets of data are grouped into clusters of smaller sets of similar data.
Clustering is used to quickly and easily seed the process of taxonomy
generation. It provides a way of understanding how attributes of
high-dimensional data are organized and related.
Clustering and fuzzy logic together provide simple, powerful techniques to
model complex systems. Fuzzy clustering provides a robust and resilient
method of classifying collections of data elements by allowing the same data
point to reside in multiple clusters with different degrees of membership.
Interpretations of membership degrees include similarity, preference, and
uncertainty. In contrast to classical set theory, in which an object or a
case either is or is not a member of a given set defined by some property,
fuzzy set theory makes it possible for an object or a case to belong to a set
only to a certain degree. Fuzzy clustering can state how similar an object or
case is to a prototypical one, it can indicate preferences between suboptimal
solutions to a problem, or it can model uncertainty about the true situation
if this situation is described in imprecise terms. In general, due to their
closeness to human reasoning, solutions obtained using fuzzy approaches are
easy to understand and to apply. Due to these strengths, fuzzy systems are
the method of choice if linguistic, vague, or imprecise information has to be
modeled. There are many different clustering algorithms that could be used;
we have relied on the Fuzzy C-Means algorithm because it is fast and
straightforward. Fuzzy C-Means is a data clustering technique in which a
dataset is grouped into a specified number of clusters, with every data point
in the dataset belonging to every cluster to a certain degree. For example, a
data point that lies close to the center of a cluster will have a high degree
of membership in that cluster, and a data point that lies far away from the
center of a cluster will have a low degree of membership in that cluster.
In our
study, the Fuzzy C-Means algorithm is used to deal with unsupervised data
with uncertainty. The goal of the Fuzzy C-Means algorithm is to group the
objects into clusters based only on their observable features, such that each
cluster contains objects that share some important properties [8].
The Fuzzy C-Means algorithm used in the proposed model for clustering
engineering materials is given below.
Algorithm: Fuzzy C-Means
Input:  Data - data set to be clustered; each row is a sample data point
        Cluster n - number of clusters c (greater than one)
Output: Center - coordinates of the final cluster centers
        Obj_fcn - values of the objective function during the iterations
Let X = {x_1, x_2, x_3, ..., x_n} be the set of data points and
V = {v_1, v_2, v_3, ..., v_c} be the set of centers.
     1) Randomly select c cluster centers.
     2) Calculate the fuzzy membership \mu_{ij} using
        \[ \mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \tag{1} \]
     3) Compute the fuzzy centers v_j using
        \[ v_j = \frac{\sum_{i=1}^{n} (\mu_{ij})^{m} x_i}{\sum_{i=1}^{n} (\mu_{ij})^{m}}, \quad j = 1, 2, \ldots, c \tag{2} \]
     4) Repeat steps 2) and 3) until the minimum J value is achieved or
        \| U^{(k+1)} - U^{(k)} \| < \beta.
        k is the iteration step.
        \beta is the termination criterion between [0, 1].
        U = (\mu_{ij})_{n \times c} is the fuzzy membership matrix.
        J is the objective function to be minimized:
        \[ J(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} (\mu_{ij})^{m} \, \| x_i - v_j \|^{2} \tag{3} \]
        where \| x_i - v_j \| is the Euclidean distance between the i-th data
        point and the j-th cluster center (written d_{ij} in (1)) and m > 1
        is the fuzziness exponent [10].
A block diagram summarizing the FCM clustering algorithm is given in figure 1.

 Figure 1: Block diagram summarizing FCM Clustering

4. EXPERIMENTAL SETUP & RESULTS
The materials database is compiled from a popular materials website [11] and
from published peer-reviewed research papers. The atomic and electronic
structure of a material determines its properties. A typical training sample
data set is shown in table 1, which contains the properties of metals with
respect to steel: specific gravity, Young's modulus, thermal conductivity,
linear expansion coefficient, melting point and electrical resistivity. The
properties are assumed at 20 °C.

                Table 1: Material Properties

Material property charts are two-dimensional plots using pairs of material
properties as the variables. The idea of seeking clusters in two dimensions
is to plot the two variables as if they were x, y coordinates. Material 1
appears as the point x=X11, y=Y11 [9]. Figure 2 shows a cluster diagram using
the values of two technical attributes, specific gravity and Young's modulus,
for metals selected with respect to steel.
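
As a rough illustration of the Fuzzy C-Means iteration of Section 3
(Equations 1 to 3) applied to a pair of properties, the following Python/NumPy
sketch may help. It is not the MATLAB routine used in the study: the function
name fuzzy_c_means, its parameters (m, beta, max_iter, seed, init) and the
sample points are assumptions made for illustration, and the points are
synthetic placeholders, not the values of Table 1.

```python
import numpy as np


def fuzzy_c_means(data, c, m=2.0, beta=1e-5, max_iter=100, seed=0, init=None):
    """Sketch of Fuzzy C-Means; all parameter names are illustrative.

    Returns (centers, U, obj_fcn): centers has shape (c, d), U is the
    (n, c) membership matrix, obj_fcn holds J after every iteration.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Step 1: start from c distinct data points, chosen at random
    # unless explicit initial centers are supplied.
    if init is None:
        rng = np.random.default_rng(seed)
        centers = data[rng.choice(n, size=c, replace=False)].copy()
    else:
        centers = np.asarray(init, dtype=float).copy()
    U = np.zeros((n, c))
    obj_fcn = []
    for _ in range(max_iter):
        # d[i, j] = Euclidean distance between point i and center j.
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero at a center
        # Step 2, Eq. (1): mu_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        # Step 3, Eq. (2): centers are membership-weighted means.
        w = U_new ** m
        centers = (w.T @ data) / w.sum(axis=0)[:, None]
        # Eq. (3): objective value with the updated memberships and centers.
        d_new = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        obj_fcn.append(float((w * d_new ** 2).sum()))
        # Step 4: stop once the membership matrix changes by less than beta.
        converged = np.linalg.norm(U_new - U) < beta
        U = U_new
        if converged:
            break
    return centers, U, obj_fcn


# Synthetic two-property points (placeholders, not data from Table 1).
points = np.array([[1.0, 1.2], [1.1, 0.9], [0.9, 1.0],
                   [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]])
# Fixed initial centers make the run reproducible; omit init for random ones.
centers, U, obj_fcn = fuzzy_c_means(points, c=2, init=points[[0, 3]])
labels = U.argmax(axis=1)  # hard assignment: cluster of highest membership
print(centers)
print(np.round(U, 3))
```

Each row of U sums to one, so a point's memberships can be read as shares
across the two clusters; taking the largest share per row gives the hard
grouping that a cluster diagram displays.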


                 Figure 2: Cluster diagram
The fuzzy clustering algorithm outputs the final cluster centers and the
value of the objective function for each iteration. The clustering process
stops when the improvement in the objective function between two consecutive
iterations is less than the specified minimum improvement, which is 1e-5,
with an accuracy of 0.99.
The result obtained by applying Fuzzy C-Means clustering with 2 cluster
centers using MATLAB is shown in figure 3.

                     Figure 3: Results
Here we have taken two properties, specific gravity and Young's modulus, and
clustered the metals into two groups. Based on the clusters it is possible to
select the metals which have similar values for the selected properties. The
same analysis can be continued with different properties so that we get
clusters based on those properties.

5. CONCLUSION & FUTURE WORKS
Fuzzy C-Means was used for classifying the engineering materials for better
business decisions. It helps to identify which engineering material belongs
to which category by using numerical properties and clustering the materials
data set based on similarities and performance. This can be extended by
repeating the same analysis first with different properties and then with
different materials. This exploratory analysis suggests how a designer might
be able to use such an analysis to suggest materials that are similar to each
other. The same module can be used to cluster and classify the different
engineering materials to take business decisions.

REFERENCES
[1] Begley E.F, “National Institute of Standards and Technology Report”,
     USA, Jan 2003.
[2] Aparna S. Varde, Makiko Takahashi, Elke A. Rundensteiner, Matthew O.
     Ward, Mohammed Maniruzzaman and Richard D. Sisson, “Apriori Algorithm
     and Game-of-Life for Predictive Analysis in Materials Science”
[3] Rajan, “Informatics and Integrated Computational Materials Engineering:
     Part II”, JOM, Vol. 61, pp. 47-47, 2009.
[4] Wei, Q.Y., Peng, X.D., Liu, X.G., Xie, W.D., “Materials informatics and
     study on its further development”, Chinese Science Bulletin, Vol. 51,
     pp. 498-504, 2006.
[5] Doreswamy, Hemanth.K.S, “Hybrid Data Mining Technique for Knowledge
     Discovery from Engineering Materials Data Sets”, International Journal
     of Database Management Systems, Vol.3, No.1, February 2011.
[6] Toyohiro Chikyow, “Trends in Materials Informatics in Research on
     inorganic materials”, quarterly review No 20, July 2006.
[7] R. L. King, O. Abuomar, H. Rhee, A. Konstantinidis, N. Pavlidou and M.
     Petrou, “On materials informatics and pattern formation in materials”,
     ENOC 2011, 24-29, July 2011.
[8] P. Bhargavi, Dr. S. Jyothi, “Soil Classification Using Data Mining
     Techniques: A Comparative Study”, International Journal of Engineering
     Trends and Technology- July to Aug Issue 2011
[9] K.W. Johnson, P.M. Langdon, M.F. Ashby, “Grouping materials and
     processes for the designer: an application of cluster analysis”,
     Elsevier Science Ltd, 2002
[10] Mohanad Alata, Mohammad Molhim, and Abdullah Ramini, “Optimizing of
     Fuzzy C-Means Clustering Algorithm Using GA”, World Academy of Science,
     Engineering and Technology, 2008.
[11] etals.htm

AUTHORS
Sarakutty T K received an MCA degree from Bharathiar University and an M.Phil
in Computer Science from M S University. She is working in the Department of
Computer Science and Applications at Dayananda Sagar College, Bangalore,
India. She has 15 years of teaching experience in the field of computer
science and applications, and her research areas include Data Mining,
Predictive Analytics and Algorithms.



Dr. M. Hanumanthappa is currently working as a faculty member as well as
chairman in the Dept. of Computer Science and Applications, Bangalore
University, Bangalore. He has over 16 years of teaching (Post Graduate) as
well as industry experience. His areas of interest include mainly Data
Mining, Information Retrieval and Programming Languages. Besides, he has
conducted a number of training programmes and workshops for Computer Science
students. He is also the Principal Investigator of a UGC-Major Research
Project; he has published nearly 50 research papers in national and
international journals and conferences. Currently he is guiding students for
the Ph.D in Computer Science under Bangalore University. He is also a member
of the Board of Studies as well as the Board of Examiners for various
Universities of
