A Fuzzy Clustering-Based Approach for Fuzzy Modeling

					           A Fuzzy Clustering-Based Algorithm for Fuzzy Modeling

                                  THOMAS MAVROFIDES
                Laboratory of Image Processing and Multimedia Applications,
       Department of Cultural Technology and Communication, University of the Aegean,
                  Faonos & Harilaou Trikoupi Str., 81100, Mytilene, Greece
                       Tel: +301-2251-0-36631, Fax: +301-2251-0-36609

Abstract: Fuzzy rules have a simple structure within a multidimensional vector space, and they are produced by
partitioning this space into fuzzy subspaces. One of the most efficient ways to produce fuzzy partitions in a vector
space is fuzzy clustering analysis. This paper proposes a fuzzy clustering-based algorithm that
generates fuzzy rules from a set of input-output data. The algorithm is based on the assumption that, when an
input fully matches the premise part of a specific fuzzy rule, the corresponding output should fully
participate in the consequent part. Certain conditions are derived in order to accomplish this. The application of
the algorithm to a test case that is regarded as a benchmark in fuzzy modeling shows
that the produced models are compact and accurate.

Key-Words: Fuzzy clustering; Fuzzy modeling; System identification; Model parameter estimation; Fuzzy
           partition; Crisp partition.

1. Introduction
The basic issue in fuzzy modeling is the identification procedure that is employed. Fuzzy model identification consists of structure identification and parameter estimation. Structure identification is directly related to determining the appropriate number of rules [1,2]. Parameter estimation, on the other hand, concerns the calculation of model parameter values that provide an accurate system description. Structure identification and parameter estimation are usually carried out via a training procedure. So far, a wide spectrum of methods has been proposed to train fuzzy systems. Many of these methods use heuristic approaches [3], self-learning and adaptive schemes [4,5], or gradient descent algorithms [6].
One of the most efficient fuzzy modeling procedures is the use of fuzzy clustering analysis. Fuzzy clustering provides a certain advantage over other approaches, since the partition of the input (or the product) space is obtained as a direct result [7]. The method developed in [8] uses fuzzy clustering analysis to detect multidimensional reference fuzzy areas, where the number of rules is determined by reducing the model parameters, based on a system performance index. In [9] an algorithm is proposed that yields clusters in the mapping space by incorporating the nature of the functional relationships into an objective function. In [10] the structure identification is obtained via hyper-ellipsoidal clustering with simultaneous use of human intuition, while in [11] the hyper-ellipsoidal subspaces are replaced by spherical fuzzy areas where the membership functions determine the structure of the rules.
In this paper, a novel fuzzy clustering-based method is proposed for system identification. The proposed algorithm decomposes the input space into a certain number of subspaces (clusters), each of which is assigned to a specific fuzzy rule. Then, the output space is correspondingly partitioned into the same number of clusters in such a way that certain conditions are satisfied.

2. The Proposed Algorithm
In this section the proposed algorithm is analyzed in detail. The algorithm generates fuzzy rules from a set of n input-output data pairs of the form $(x_k; y_k)$ $(1 \le k \le n)$. The basic design issues of the proposed method are described in the next subsections.

2.1 Partitioning the Input Space by Using Fuzzy Clustering Analysis
A major issue in fuzzy modeling is the reduction of computational complexity, and simplified fuzzy models are useful in this respect because they use fewer parameters. In our approach we adopt the simplified fuzzy model introduced in [3], which is described by the following fuzzy rules,

$$R^i: \text{If } x_1 \text{ is } X_1^i \text{ and } x_2 \text{ is } X_2^i \text{ and } \ldots \text{ and } x_p \text{ is } X_p^i \text{ Then } y \text{ is } b^i \quad (1 \le i \le c) \tag{1}$$

where p is the number of inputs, c is the number of rules, $X_j^i$ $(1 \le i \le c;\ 1 \le j \le p)$ are fuzzy sets, and $b^i$ are real numbers. The above fuzzy model can approximate any nonlinear function to arbitrary accuracy on a compact set [3].
Based on fuzzy reasoning, it is evident that even when an input linguistic variable does not appear in the premise part of a rule, a fuzzy set can be assigned to it with a firing degree of unity. This remark suggests a uniform structure of the premise part of the rule base, where all the input linguistic variables participate in all of the fuzzy rules. In addition, a more credible fuzzy rule base can be created by assuming that the output variable participates in each rule with a normal fuzzy set, meaning that at least one element belongs to the fuzzy set with membership degree of unity. Considering that c fuzzy rules are needed to describe a nonlinear system, the uniform structure of the premise part of the rule base enables us to partition the input space X into c fuzzy subspaces $X^1, X^2, \ldots, X^c$. Each of these subspaces is assigned to only one fuzzy rule. Therefore, the fuzzy rule in (1) can be modified as,

$$R^i: \text{If } x \text{ is } X^i \text{ Then } y \text{ is } b^i \quad (1 \le i \le c) \tag{2}$$

where $x = [x_1, x_2, \ldots, x_p]^T$ and $X^i \subset X$ with $X^i = \{X_1^i, X_2^i, \ldots, X_p^i\}$. Since our model is described by fuzzy rules of the form (2), we can produce a constrained fuzzy c-partition of the input space X by applying the well-known fuzzy c-means algorithm to the input training data set. The fuzzy c-means algorithm is based on the minimization of the following objective function [12],

$$J_m = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, \| x_k - v_i \|^2 \tag{3}$$

under the following equality constraint,

$$\sum_{i=1}^{c} u_{ik} = 1, \quad \forall k \tag{4}$$

where n is the number of training data vectors, c is the number of clusters, $u_{ik}$ is the membership degree of the k-th training vector to the i-th cluster, $m \in (1, \infty)$ is a factor that adjusts the membership degree weighting effect, $x_k \in \Re^p$ are the input training data vectors, and $v_i \in \Re^p$ are the cluster centers. The cluster centers and the respective membership degrees that solve the above constrained optimization problem are respectively given by the following equations [12],

$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m \, x_k}{\sum_{k=1}^{n} (u_{ik})^m}, \quad 1 \le i \le c \tag{5}$$

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\| x_k - v_i \|}{\| x_k - v_j \|} \right)^{2/(m-1)}}, \quad 1 \le i \le c, \ 1 \le k \le n \tag{6}$$

Eqs (5) and (6) constitute an iterative optimization procedure.
By applying the above minimization procedure to the input training data vectors, these vectors are classified into c fuzzy clusters, where the i-th cluster corresponds to the i-th fuzzy subspace. Since a single fuzzy subspace corresponds to a specific fuzzy rule, the number of clusters coincides with the total number of fuzzy rules. Consequently, the membership degree of the training vector $x_k$ to the i-th fuzzy subspace $X^i$ is the membership degree $u_{ik}$. In the rest of the paper, the term "fuzzy cluster" is used in place of "fuzzy subspace"; the two terms refer to the same concept.

2.2 Model Parameter Initialization
Based on the analysis presented in the previous section, the premise part of each rule consists of multidimensional fuzzy clusters, the membership functions of which are given in eq. (6). The form of this equation indicates that the membership function is interpreted as the membership degree that is assigned to the input vector $x_k$ by the center element $v_i$ of the cluster $X^i$. Thus, the width of the cluster $X^i$ is not included in the membership function and, therefore, it is not taken into account in the parameter estimation either. Another important issue is the presence of the parameter m. This parameter controls the fuzziness of the resulting partition and thus affects the overlapping degree between the multidimensional fuzzy clusters. More specifically, as this parameter increases, the overlapping degree also increases. This means that for a specific value of the parameter m the overlapping degree between the clusters is known, and therefore the locations of the cluster centers indicate the distances between the clusters. Thus, the premise parameter identification only concerns the estimation of the appropriate cluster centers. To this end, the premise parameter estimation is based on iteratively applying the eqs (5)
and (6) to the input training data, where the resulting cluster centers provide the fuzzy rule premise parameters and the respective membership degrees provide the firing degrees of the fuzzy rules. Therefore, the output of the fuzzy model can be calculated as,

$$\tilde{y}_k = \frac{\sum_{i=1}^{c} u_{ik} \, b^i}{\sum_{i=1}^{c} u_{ik}} \quad (1 \le k \le n)$$

Taking into account eq. (4), the above equation is modified as follows,

$$\tilde{y}_k = \sum_{i=1}^{c} u_{ik} \, b^i \quad (1 \le k \le n) \tag{7}$$

With the fuzzy c-partition of the input space introduced, the output space should be partitioned in a similar way. Moreover, this partition should be based on the following conditions [11, 12],

Condition 1: If in the i-th fuzzy rule the vector $x_k$ is the center element of the cluster $X^i$, then the output $y_k$ should satisfy the rule's consequent with a truth degree equal to unity.

Condition 2: If in the i-th fuzzy rule the vector $x_k$ is not the center element of the cluster $X^i$, then the output $y_k$ should satisfy the rule's consequent with a truth degree less than unity.

The above conditions refer to the matching degree between the premise and the consequent part of each fuzzy rule. One feasible way to satisfy these two conditions is to perform clustering analysis in the product space (i.e. the input-output space) and then induce fuzzy sets by projecting the resulting clusters on each dimension. Such approaches are investigated in [7,8,9]. However, the main drawback of these approaches is that the consequent parameters are not calculated by the use of an optimizing criterion. In order to solve this problem, we introduce the following condition,

Condition 3: The consequent parameters should be estimated by minimizing the sum of squared errors (SSE) criterion.

The above condition has to be satisfied together with conditions 1 and 2. The SSE criterion is given as,

$$J_1 = \sum_{k=1}^{n} (y_k - \tilde{y}_k)^2 \tag{8}$$

With the premise parameters known, the respective consequent parameters can be obtained by minimizing $J_1$ over the n input-output data pairs. Using eq. (7), eq. (8) gives,

$$J_1 = \sum_{k=1}^{n} \left( y_k - \sum_{i=1}^{c} u_{ik} \, b^i \right)^2 \tag{9}$$

One feasible way to minimize $J_1$ is to employ the well-known least squares algorithm. However, the use of this algorithm does not guarantee that conditions 1 and 2 will be satisfied. Therefore, we introduce the following procedure.

Theorem 1
If $m \to 1^+$ then the objective function $J_1$, given in eq. (9), can be calculated as,

$$J_1 = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^2 \, (y_k - b^i)^2 \tag{10}$$

Proof
For $1 \le i \le c$ and $1 \le k \le n$, from eq. (6) we obtain,

$$\lim_{m \to 1^+} u_{ik} = \lim_{m \to 1^+} \left[ \sum_{j=1}^{c} \left( \frac{\| x_k - v_i \|}{\| x_k - v_j \|} \right)^{2/(m-1)} \right]^{-1} = \begin{cases} 1, & \text{if } \| x_k - v_i \| < \| x_k - v_j \| \ \ \forall j \ne i \\ 0, & \text{otherwise} \end{cases}$$

Thus, as $m \to 1^+$ the membership degrees in the input space are given as follows,

$$u_{ik} = \begin{cases} 1, & \text{if } x_k \in X^i \\ 0, & \text{otherwise} \end{cases} \tag{11}$$

where $X = X^1 \cup X^2 \cup \ldots \cup X^c$ is a crisp partition of X.
From eq. (11) it follows that there are $k_1$ input data vectors that belong to the cluster $X^1$, $k_2$ data vectors that belong to the cluster $X^2$, ..., and $k_c$ data vectors that belong to the cluster $X^c$, such that,

$$k_1 + k_2 + \ldots + k_c = n \tag{12}$$

Therefore, the following relation holds,

$$\left( y_k - \sum_{i=1}^{c} u_{ik} \, b^i \right)^2 = \left( y_k - u_{l_i k} \, b^{l_i} \right)^2 = (u_{l_i k})^2 \, (y_k - b^{l_i})^2 \tag{13}$$

where the index $l_i$ corresponds to the crisp cluster $X^{l_i}$ to which $x_k$ belongs. Based on eqs (11), (12), and (13) the objective function in (9) can be modified as follows,

$$J_1 = \sum_{k=1}^{k_1} (u_{1k})^2 (y_k - b^1)^2 + \sum_{k=1}^{k_2} (u_{2k})^2 (y_k - b^2)^2 + \ldots + \sum_{k=1}^{k_c} (u_{ck})^2 (y_k - b^c)^2$$

which means that,

$$J_1 = \sum_{i=1}^{c} \sum_{k=1}^{k_i} (u_{ik})^2 \, (y_k - b^i)^2 \tag{14}$$

The i-th crisp cluster $X^i$ includes $k_i$ training vectors, and therefore the remaining $(n - k_i)$ training data vectors are assigned membership degrees equal to zero by $X^i$. Therefore, the following relation holds,

$$\sum_{k=1}^{k_i} (u_{ik})^2 (y_k - b^i)^2 = \sum_{k=1}^{k_i} (u_{ik})^2 (y_k - b^i)^2 + \sum_{k=1}^{n-k_i} (u_{ik})^2 (y_k - b^i)^2 = \sum_{k=1}^{n} (u_{ik})^2 (y_k - b^i)^2 \tag{15}$$

Substituting eq. (15) into eq. (14) we can derive eq. (10). This completes the proof of Theorem 1.

The next theorem provides the values of the consequent parameters that minimize the objective function in (10).

Theorem 2
For $1 \le i \le c$: if the values of the membership degrees $u_{ik}$ $(1 \le k \le n)$ are fixed, then the values of the consequent parameters $b^i$ that minimize the objective function $J_1$, given in eq. (10), are calculated as,

$$b^i = \frac{\sum_{k=1}^{n} (u_{ik})^2 \, y_k}{\sum_{k=1}^{n} (u_{ik})^2} \tag{16}$$

Proof
Setting the partial derivative $\partial J_1 / \partial b^i$ equal to zero and solving with respect to $b^i$, we can easily derive eq. (16). This completes the proof of Theorem 2.

Summarizing, the premise parameters are calculated by eq. (5) and the consequent parameters by eq. (16).

2.3 Fine Tuning of the Model Parameters
In this section the model parameters obtained in the previous step are further tuned by using a gradient descent approach. The objective function that is used for this purpose is given as,

$$J_2 = \frac{1}{2n} \sum_{k=1}^{n} (y_k - \tilde{y}_k)^2$$

By substituting eq. (7) into the above function we obtain,

$$J_2 = \frac{1}{2n} \sum_{k=1}^{n} \left( y_k - \sum_{i=1}^{c} u_{ik} \, b^i \right)^2$$

In order to minimize $J_2$ the premise parameters have to be adjusted as follows,

$$\Delta v_i = \frac{\beta_1}{n} \sum_{k=1}^{n} \left[ (y_k - \tilde{y}_k) \, b^i \, \frac{\partial u_{ik}}{\partial v_i} \right] \tag{18}$$

where, based on (6), the partial derivative is given as,

$$\frac{\partial u_{ik}}{\partial v_i} = \frac{2 \, (x_k - v_i)}{(m-1) \, \| x_k - v_i \|^2} \cdot \frac{\sum_{j=1, \, j \ne i}^{c} \left( \dfrac{\| x_k - v_i \|}{\| x_k - v_j \|} \right)^{2/(m-1)}}{\left[ \sum_{j=1}^{c} \left( \dfrac{\| x_k - v_i \|}{\| x_k - v_j \|} \right)^{2/(m-1)} \right]^2} \tag{19}$$

Similarly, the learning rule for the consequent parameters is as follows,

$$\Delta b^i = \frac{\beta_2}{n} \sum_{k=1}^{n} \left[ (y_k - \tilde{y}_k) \, u_{ik} \right] \tag{20}$$

In the above equations, the parameters $\beta_1$ and $\beta_2$ are the gradient descent learning rates.

2.4 The Identification Algorithm
Based on the previous analysis, the proposed fuzzy modeling algorithm is now given as follows.
The Proposed Algorithm
Suppose we are given n input-output data pairs of the form $(x_k; y_k)$ $(1 \le k \le n)$. Initially select a small value for the parameter m, close to unity. Set the number of rules c = 2, and select values for the terminal condition parameters $\varepsilon_1$ and $\varepsilon_2$.

Step 1). Randomly initialize the premise parameters $v_i$ $(1 \le i \le c)$ and the consequent parameters $b^i$ $(1 \le i \le c)$.

Step 2). For k = 1, 2, ..., n and i = 1, 2, ..., c: use eq. (6) to calculate the membership degrees $u_{ik}$.

Step 3). For i = 1, 2, ..., c: update the premise parameters $v_i$ using eq. (5).

Step 4). For i = 1, 2, ..., c: calculate the consequent parameters using eq. (16).

Step 5). Calculate the distance $\| b - b_p \|$, where $b = [b^1, b^2, \ldots, b^c]^T$ and $b_p$ is the previous state of b.

Step 6). If $\| b - b_p \| \le \varepsilon_1$ then go to step 7; else go to step 2.

Step 7). Employ the gradient descent approach to minimize $J_2$, where the model parameter learning rules are given by eqs (18) and (20).

Step 8). Calculate the performance index of the model: $PI = \sum_{k=1}^{n} (y_k - \tilde{y}_k)^2 / n$. If $PI \le \varepsilon_2$ then stop; else set c = c + 1 and go to step 1.

The final result of the above iterative optimization is that, when an input fully matches the premise part of one of the rules, the corresponding output satisfies the consequent completely, meaning that the truth degree of each fuzzy rule is equal to unity. Thus, eq. (7) can be used to infer the output from a specific input data vector.

3. Simulation Study
In this section the proposed algorithm is applied to the well-known Box and Jenkins data set [2], which consists of 296 input-output measurements of a gas-furnace process, obtained using a sampling interval of 9 s. At each sampling time k the input x(k) of this process is the gas flow rate and the output y(k) is the output CO2 concentration. The proposed method was used to design a fuzzy model for this process with 6 inputs: x(k), x(k-1), x(k-2), y(k-1), y(k-2), y(k-3) and one output: y(k). In order to compare our method with other approaches, we performed two experimental cases, namely case 1 and case 2.

Figure 1: Original and predicted values for the training data set of the Box and Jenkins system (Case 1). [Plot of model predictions and original data versus sample index, samples 1 to 141.]

Figure 2: Original and predicted values for the test data set of the Box and Jenkins system (Case 1). [Plot of model predictions and original data versus sample index, samples 149 to 289.]

In case 1 we used the first 148 input-output data as training data to build the fuzzy model and the last 148 as test data to validate its performance. The terminal conditions were selected as $\varepsilon_1 = 10^{-4}$ and $\varepsilon_2 = 10^{-2}$, and the learning rates for the gradient descent method were $\beta_1 = \beta_2 = 0.55$. The final number of rules was c = 3. The predicted and the original output values for the training data are given in Fig. 1, where the corresponding Mean Square Error (MSE) was equal to 0.045. Fig. 2 shows the predicted and the actual values for the validation data, for which the MSE was equal to 0.251. The MSEs obtained for the same case study by the method developed in [14] were 0.071 for the training data and 0.261 for the test data, so our model attains a lower error than that method on both sets.
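As a rough illustration only (not the author's implementation), Steps 2 to 6 of the algorithm in Section 2.4 for a fixed number of rules c, together with the six lagged inputs described in this section, could be sketched in plain Python as follows. The function names, the deterministic choice of initial centers (the paper initializes them randomly), the iteration cap `max_iter`, and the minimum-distance scaling inside `memberships` (an algebraically equivalent rewriting of eq. (6), used here to avoid overflow when m is close to 1) are assumptions made for this sketch. The gradient descent fine tuning of Step 7 and the outer loop over c in Step 8 are omitted.

```python
import math

def lagged_pairs(x, y):
    """Pairs ([x(k), x(k-1), x(k-2), y(k-1), y(k-2), y(k-3)], y(k))."""
    return [([x[k], x[k-1], x[k-2], y[k-1], y[k-2], y[k-3]], y[k])
            for k in range(3, len(y))]

def memberships(X, V, m):
    """Eq. (6): U[i][k] is the membership of vector X[k] in cluster i."""
    e = 2.0 / (m - 1.0)
    U = [[0.0] * len(X) for _ in V]
    for k, xk in enumerate(X):
        d = [math.dist(xk, v) for v in V]
        dmin = min(d)
        if dmin == 0.0:
            # x_k coincides with one or more centers: share full membership
            hits = [i for i, di in enumerate(d) if di == 0.0]
            for i in hits:
                U[i][k] = 1.0 / len(hits)
            continue
        # Assumption: scale by dmin so every ratio is <= 1 (no overflow)
        w = [(dmin / di) ** e for di in d]
        s = sum(w)
        for i in range(len(V)):
            U[i][k] = w[i] / s
    return U

def centers(X, U, m):
    """Eq. (5): membership-weighted means of the input vectors."""
    p = len(X[0])
    V = []
    for ui in U:
        w = [u ** m for u in ui]
        s = sum(w)
        V.append([sum(wk * xk[j] for wk, xk in zip(w, X)) / s
                  for j in range(p)])
    return V

def consequents(U, y):
    """Eq. (16): b^i = sum_k u_ik^2 y_k / sum_k u_ik^2."""
    return [sum(u * u * yk for u, yk in zip(ui, y)) / sum(u * u for u in ui)
            for ui in U]

def predict(U, b):
    """Eq. (7): model output for every sample."""
    return [sum(U[i][k] * b[i] for i in range(len(b)))
            for k in range(len(U[0]))]

def initialize(X, y, c, m=1.2, eps1=1e-4, max_iter=200):
    """Steps 2 to 6 for a fixed c; returns centers V and consequents b."""
    n = len(X)
    # Assumption: start from evenly spaced training vectors, not random ones
    V = [list(X[(i * (n - 1)) // max(c - 1, 1)]) for i in range(c)]
    b_prev = [0.0] * c
    for _ in range(max_iter):
        U = memberships(X, V, m)           # Step 2
        V = centers(X, U, m)               # Step 3
        b = consequents(U, y)              # Step 4
        if math.dist(b, b_prev) <= eps1:   # Steps 5 and 6
            break
        b_prev = b
    return V, b

if __name__ == "__main__":
    # Toy data (hypothetical, not the gas-furnace measurements)
    X = [[0.0], [0.1], [0.2], [9.8], [9.9], [10.0]]
    y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
    V, b = initialize(X, y, c=2)
    y_hat = predict(memberships(X, V, 1.2), b)
    mse = sum((a - t) ** 2 for a, t in zip(y_hat, y)) / len(y)
    print(V, b, mse)
```

With real data, `lagged_pairs` would be applied to the recorded x(k) and y(k) sequences, and the resulting pairs split into training and test portions before calling `initialize`.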
In case 2 we used the whole data set both to build the fuzzy model and to validate its performance. The terminal conditions were selected as $\varepsilon_1 = 10^{-4}$ and $\varepsilon_2 = 10^{-2}$, and the learning rates for the gradient descent method were $\beta_1 = \beta_2 = 0.3$. The final number of rules was c = 4. Fig. 3 depicts the predicted and the actual values, where the MSE was equal to 0.1398. Table 4 compares the performance of the produced fuzzy model to other models that can be found in the literature. The table shows that our model attains the lowest MSE among the listed models.

Figure 3: Original and predicted values for the Box and Jenkins system (Case 2). [Plot of model predictions and original data versus sample index, samples 1 to 295.]

Table 4: Comparison results for the Box-Jenkins example (Case 2)

     Model                          Number of rules   MSE
     Box and Jenkins [13]                 ---         0.2020
     Chen et al. [11]                      3          0.2678
     Sugeno and Yasukawa [2]               6          0.1900
     Xu and Lu [5]                        25          0.3280
     Gomez-Skarmeta et al. [7]             2          0.1570
     Kroll [8]                             2          0.1495
     Our Model                             4          0.1398

4. Conclusions
In this paper we have proposed a novel method to train fuzzy models. The method is developed so that emphasis is given to both the accuracy and the size of the produced model. In order to achieve these targets, the method follows a number of steps that are independent of each other and arranged so that the result of each step becomes the input of the next step. The basic design issue of the algorithm is that both the premise and the consequent parts make an equal contribution to the firing degree of each rule. In order to accomplish this, certain conditions are taken into account. The application of the algorithm to a test case shows that the algorithm achieves good predictive accuracy, while keeping the size of the model within reasonable and acceptable levels.

References
[1] T. Takagi and M. Sugeno, Fuzzy identification of systems and its application to modeling and control, IEEE Trans. Systems Man Cybern., Vol. 15 (1), 1985, pp. 116-132.
[2] M. Sugeno and T. Yasukawa, A fuzzy-logic-based approach to qualitative modeling, IEEE Trans. Fuzzy Syst., Vol. 1 (2), 1993, pp. 7-31.
[3] K. Nozaki, H. Ishibuchi, and H. Tanaka, A simple but powerful method for generating fuzzy rules from numerical data, Fuzzy Sets and Systems, Vol. 86, 1997, pp. 251-270.
[4] J.S.R. Jang, ANFIS: Adaptive-Network-based Fuzzy Inference Systems, IEEE Trans. Systems Man Cybern., Vol. 23 (3), 1993, pp. 665-685.
[5] C.W. Xu and Y.Z. Lu, Fuzzy Model Identification and Self-Learning for Dynamic Systems, IEEE Trans. Systems Man Cybern., Vol. 17 (4), 1987, pp. 683-689.
[6] E. Kim, H. Lee, M. Park, and M. Park, A simply identified Sugeno-type fuzzy model via double clustering, Information Sciences, Vol. 110, 1998, pp. 25-39.
[7] A.G. Skarmeta, M. Delgado, and M. Vila, About the use of fuzzy clustering techniques for fuzzy model identification, Fuzzy Sets and Systems, Vol. 106, 1999, pp. 179-188.
[8] A. Kroll, Identification of functional fuzzy models using multidimensional reference fuzzy sets, Fuzzy Sets and Systems, Vol. 80, 1996, pp. 149-158.
[9] K. Hirota and W. Pedrycz, Directional fuzzy clustering and its application to fuzzy modeling, Fuzzy Sets and Systems, Vol. 80, 1996, pp. 315-326.
[10] Y. Nakamori and M. Ryoke, Identification of fuzzy prediction models through hyper-ellipsoidal clustering, IEEE Trans. Systems Man Cybern., Vol. 24 (8), 1994, pp. 1153-1173.
[11] J. Chen, Y. Xi, and Z. Zhang, A clustering algorithm for fuzzy model identification, Fuzzy Sets and Systems, Vol. 98, 1998, pp. 319-329.
[12] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, N.Y., 1981.
[13] G.E. Box and G.M. Jenkins, Time Series Analysis, Forecasting and Control, San Francisco, CA: Holden Day, 1970.
[14] Y. Lin and G.A. Cunningham, A new approach to fuzzy-neural modeling, IEEE Trans. Fuzzy Systems, Vol. 3, 1995, pp. 190-197.