# NEW APPROACH TO STRUCTURE OPTIMIZATION OF WAVELET NEURAL NETWORK BASED ON ROUGH SETS THEORY


INTERNATIONAL JOURNAL OF INFORMATION AND SYSTEMS SCIENCES, Volume 1, Number 3-4, Pages 382–389. © 2005 Institute for Scientific Computing and Information.


WEI DONG, YUELING ZHAO, JIANHUI WANG, AND SHUSHENG GU

Abstract. In this paper, a new approach to constructing and training wavelet networks is proposed, based on time-frequency analysis and rough sets theory, and a learning algorithm is presented. The suggested algorithm makes full use of the time-frequency information contained in the training data, determines the number of hidden layer nodes and the weights of the wavelet neural network (WNN), and solves the wavelet network structure optimization problem. Based on rough sets theory, a new wavelet network with fewer nodes is constructed during the process. The simulation results show that the proposed method is simple and effective.

Key Words. rough sets, wavelet frames, wavelet network, significance of attributes.

1. Introduction
Over the last decade, wavelet theory has developed rapidly in mathematics. Wavelets are mathematical functions that cut data up into different frequency components and then study each component with a resolution matched to its scale. Unlike the Fourier transform, the wavelet transform is localized in both frequency and time. These characteristics make wavelets an active subject with many exciting applications, not only in pure mathematics but also in acoustics, image compression, turbulence, human vision, radar, earthquake prediction, fluid mechanics and chemical analysis.
The idea of using wavelets in neural networks was proposed by Zhang and Benveniste [1]. Grounded in wavelet theory, the wavelet neural network (WNN) has excellent function approximation ability, that is, the ability to construct a model. Because its model construction algorithm differs from the common back-propagation algorithm of neural networks, it can effectively overcome intrinsic defects of common neural networks. The WNN has been proposed as a novel universal tool for function approximation, showing surprising effectiveness in solving the problems of poor convergence or even divergence encountered in other kinds of neural networks, and it can dramatically increase convergence speed. The type of wavelet network Zhang and Benveniste [1] studied can be considered a special case of the Radial Basis Function (RBF) network: the structure of the wavelet network is similar to that of the RBF network, except that the radial basis functions are replaced by orthonormal scaling functions that are not necessarily radially symmetric.
The use of wavelet frames provides a good and viable framework for determining the structure of the RBF network. The wavelet network performs well, compares favorably with the Multilayer Perceptron (MLP) and RBF networks, and has been applied successfully in many other areas [2].

Received by the editors July 7, 2004 and, in revised form, January 22, 2005. This research was supported by the National Natural Science Foundation of China (Grant Nos. 60274024, 60474040).
In this study we focus on wavelet frames rather than on orthogonal wavelet bases, because the use of orthogonal wavelet bases requires certain restrictive conditions that are seldom feasible. In addition, the training data are normally sparse, further limiting the application of orthonormal wavelet bases. Wavelet functions possess a special property: any function of L2(R) can be approximated to any prescribed accuracy by a finite sum of wavelets. It has been proved that families of wavelet functions, particularly wavelet frames, are universal approximators, which provides the theoretical basis for the use of wavelet functions in function approximation and process modeling. Wavelet frames (with better regularity) are therefore considered the better choice in our study. However, using wavelet frames has one disadvantage: frames are redundant.
The greatest problem with wavelet networks based on wavelet frames is that, because the frame functions can be linearly dependent, a network whose hidden-node activation functions are frame wavelets intercepted over the regular time-frequency area typically contains a great deal of redundancy. Structure optimization of frame-based wavelet networks is therefore significant.
Some researchers [3, 4] have addressed the frame redundancy issue by proposing a structure optimization method for wavelet networks based on an adaptive projection algorithm. Backward deletion [5] applies only to wavelet networks whose weights differ greatly in magnitude. However, these methods are relatively involved and complicated.
Rough sets theory was proposed by Pawlak [6] for knowledge discovery in databases and experimental data sets, and has been widely accepted since the early 1980s. It is especially useful as a tool to deal with inexact, uncertain or vague knowledge in many artificial intelligence applications [6, 7]. The theory of rough sets has been investigated in the context of expert systems, decision support systems, machine learning, inductive reasoning and pattern classification. Data analysis in terms of rough sets has been shown to be effective for revealing relationships within imprecise data, and the technique makes no attempt to build precise numerical models of those relationships. Some hybrid methods have also been proposed; for example, Ling et al. [9] used neural network optimization based on rough sets theory in fault diagnosis systems. We propose to study the use of rough sets as a tool for structuring wavelet neural networks. The resulting algorithm reduces the number of hidden layer nodes of the network significantly, determines the associated weights, and solves the wavelet network structure optimization problem in a new way. It is shown to be simple and efficient.
The rest of this paper is organized as follows: in section 2, we provide some
deﬁnitions and introduce the basic theory; section 3 contains the proposed method.
A simple and yet practical example with computation results is presented in section
4. We also demonstrate the eﬃciency of our proposed procedure by comparing our
results with earlier works. Section 5 concludes.

2. Basic notions
First, let us provide some basic notions and deﬁnitions.

2.1. Rough sets and rules. Rough sets theory is based on the concepts of upper and lower approximations of a set, the approximation space and models of sets. Rough sets have been employed to remove redundant conditional attributes from discrete-valued data sets while retaining their information content.
The rough sets approach to data analysis has many important advantages, among them: providing efficient algorithms for finding hidden patterns in data; finding minimal sets of data (data reduction); evaluating the significance of data; generating sets of decision rules from data; and offering straightforward interpretation of the obtained results.
Knowledge representation can be viewed as a two-dimensional input data table called an attribute-value table. Rows of the table are labeled by objects (e.g., states, processes, events, etc.), and columns are labeled by attributes that represent information about the corresponding objects. In rough sets theory [8, 9], a decision table is denoted by S = (U, A, C, D), where U is the universe of discourse, A is a set of primitive attributes, and C, D ⊆ A are two subsets of attributes called the condition and decision attributes, respectively.
Let a ∈ A and P ⊆ A. A binary relation IND(P), called the indiscernibility relation, is defined as follows:

(1) IND(P) = {(x, y) ∈ U × U : a(x) = a(y) for all a ∈ P}.

Let U/IND(P) denote the family of all equivalence classes of the relation IND(P). For simplicity, U/P will be written instead of U/IND(P). The equivalence classes of U/IND(C) and U/IND(D) will be called condition and decision classes, respectively.
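To make the notions concrete, the partition U/IND(P) of equation (1) can be computed directly from an attribute-value table. The sketch below is illustrative only; the table and the attribute names a, b and d are invented:

```python
from collections import defaultdict

def equivalence_classes(table, attrs):
    """Group row indices of `table` by their values on `attrs` (U/IND(attrs))."""
    groups = defaultdict(list)
    for i, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].append(i)
    return [frozenset(g) for g in groups.values()]

# Hypothetical decision table: condition attributes 'a', 'b'; decision 'd'.
U = [
    {"a": 1, "b": 0, "d": 0},
    {"a": 1, "b": 0, "d": 0},
    {"a": 1, "b": 1, "d": 1},
    {"a": 0, "b": 1, "d": 1},
]
print(sorted(map(sorted, equivalence_classes(U, ["a", "b"]))))
# → [[0, 1], [2], [3]]  (objects 0 and 1 are indiscernible by {a, b})
```

Two objects land in the same class exactly when every attribute in P assigns them the same value, which is the relation IND(P) above.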
Lower approximation: Let R ⊆ C and X ⊆ U. The R-lower approximation of X is the set of all elements of U which can be classified with certainty as elements of X, assuming knowledge R. It can be presented formally as:

(2) RX = ∪{Y ∈ U/R : Y ⊆ X}.
Positive region: The C-positive region of D is the set of all objects from the universe U that can be classified with certainty to classes of U/D employing attributes from C, i.e.

(3) POS_C(D) = ∪_{X ∈ U/D} CX,

where CX denotes the lower approximation of the set X with respect to C, i.e., the set of all objects from U that can be classified with certainty as elements of X based on the attributes from C.
Let c ∈ C. A feature c is dispensable in S if POS_{C−{c}}(D) = POS_C(D); otherwise feature c is indispensable in S. If c is an indispensable feature, deleting it from S will cause S to become inconsistent. S is independent if all c ∈ C are indispensable.
Reduct: A set of features R ⊆ C is called a reduct of C if S′ = (U, A, R, D) is independent and POS_R(D) = POS_C(D). In other words, a reduct is a minimal feature subset preserving the above condition.
Core: The set of all features indispensable in C is denoted by CORE(C). We have

(4) CORE(C) = ∩ RED(C),

where RED(C) is the set of all reducts of C.
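Equations (2) and (3) and the dispensability test above can be sketched in a few lines; the decision table and attribute names are again hypothetical:

```python
from collections import defaultdict

def classes(rows, attrs):
    # U/IND(attrs): partition of row indices by attribute values
    g = defaultdict(set)
    for i, r in enumerate(rows):
        g[tuple(r[a] for a in attrs)].add(i)
    return list(g.values())

def lower(cls, X):
    # R-lower approximation, equation (2): union of classes wholly inside X
    out = set()
    for Y in cls:
        if Y <= X:
            out |= Y
    return out

def pos(rows, C, D):
    # C-positive region of D, equation (3)
    return set().union(*(lower(classes(rows, C), X) for X in classes(rows, D)))

def dispensable(rows, C, D, c):
    return pos(rows, [a for a in C if a != c], D) == pos(rows, C, D)

# Hypothetical table: 'a' carries no information about 'd' beyond 'b'.
U = [{"a": 1, "b": 0, "d": 0}, {"a": 1, "b": 0, "d": 0},
     {"a": 1, "b": 1, "d": 1}, {"a": 0, "b": 1, "d": 1}]
print(dispensable(U, ["a", "b"], ["d"], "a"))   # → True:  'a' is dispensable
print(dispensable(U, ["a", "b"], ["d"], "b"))   # → False: 'b' is indispensable
```

Here CORE({a, b}) = {b}, and {b} alone is already a reduct, since the positive region is unchanged when 'a' is dropped.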

All indispensable features should be contained in an optimal feature subset, because removing any of them causes inconsistency in the decision table. As defined, CORE(C) is the set of all indispensable features, hence the process of searching for indispensable features is that of finding the CORE.
A feature selection method using rough sets theory can be regarded as finding a reduction R that gives the best classification; R is then used instead of C in a rule discovery algorithm. Selecting an optimal reduction from all subsets of features is not easy: considering the combinations among N features, the number of possible reductions can be as large as 2^N, so selecting the optimal reduction from all possible reductions is NP-hard. For this reason, various methods for finding approximate results have been proposed. The indispensable features must be included both in an optimal result and in any approximate result: if the accuracy of the decision table is to remain unchanged, the indispensable features in the CORE cannot be deleted from C, so every feature in the CORE must be a member of the selected feature subset. Note, however, that not all features in an optimal feature subset must be indispensable.
In summary, the problem of feature subset selection becomes how to select features from among the dispensable features to form the best reduction together with the CORE. We use the CORE as the initial feature subset; if the CORE is not a reduction, some of the dispensable features must be selected and added to it to make a reduction.
Let

(5) γ(C, D) = card(POS_C(D)) / card(U),

where card denotes the cardinality of a set.
Significance of attribute: For an arbitrary attribute a ∈ C − R, the significance of the attribute [10] is:

(6) SIG(a, R, D) = γ(R ∪ {a}, D) − γ(R, D).

The higher the value of SIG(a, R, D), the more important the attribute a is for the decision attributes D, given that the subset R is already known.
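Equations (5) and (6) can be sketched as follows; exact rational arithmetic via fractions avoids rounding, and the table is again hypothetical, with 'b' alone deciding 'd':

```python
from collections import defaultdict
from fractions import Fraction

def partition(rows, attrs):
    g = defaultdict(set)
    for i, r in enumerate(rows):
        g[tuple(r[a] for a in attrs)].add(i)
    return list(g.values())

def gamma(rows, C, D):
    # Quality of classification, equation (5): card(POS_C(D)) / card(U)
    pos = set()
    for X in partition(rows, D):
        for Y in partition(rows, C):
            if Y <= X:
                pos |= Y
    return Fraction(len(pos), len(rows))

def sig(rows, R, a, D):
    # Significance of attribute a with respect to R, equation (6)
    return gamma(rows, list(R) + [a], D) - gamma(rows, list(R), D)

# Hypothetical table in which 'b' alone decides 'd'.
U = [{"a": 1, "b": 0, "d": 0}, {"a": 0, "b": 0, "d": 0},
     {"a": 1, "b": 1, "d": 1}, {"a": 0, "b": 1, "d": 1}]
print(sig(U, [], "b", ["d"]), sig(U, ["b"], "a", ["d"]))   # → 1 0
```

Adding 'b' to the empty subset raises γ from 0 to 1, while adding 'a' once 'b' is known changes nothing, matching the intuition that SIG ranks attributes by how much classification quality they contribute.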

2.2. Wavelet analysis and wavelet frames. Two categories of wavelet func-
tions, orthogonal wavelets and wavelet frames, were developed separately by dif-
ferent groups. Orthogonal wavelet decomposition is usually associated with the
theory of multi-resolution analysis. The fact that orthogonal wavelets cannot be
expressed in closed form is a serious drawback for their applicability to function
approximation and process modeling. Unlike orthogonal wavelets, wavelet frames are constructed by simple translation (time location) and dilation (scale) operations on a single fixed function called the mother wavelet, which needs to satisfy conditions much less stringent than the orthogonality conditions.
A wavelet ψ_{a,b}(x) is derived from its mother wavelet ψ(x) by translation and dilation. More precisely,

(7) ψ_{a,b}(x) = (1/√a) ψ((x − b)/a),

where the time location factor b and the scale factor a are real numbers in R and R+, respectively.
The Continuous Wavelet Transform (CWT) is defined as follows:

(8) CWT{f(x); a, b} = ∫ f(x) ψ*_{a,b}(x) dx,

where the asterisk stands for the complex conjugate. Time remains continuous, but the time-scale parameters (b, a) are sampled on a so-called "dyadic" grid in the time-scale plane (b, a). Let

(9) C_{j,k} = CWT{f(x); a = 2^j, b = k·2^j}, j, k ∈ Z.
The wavelet basis functions are:

(10) ψ_{j,k}(x) = 2^{−j/2} ψ(2^{−j}x − k).
A family ψ_{j,k}(x) is said to be a frame of L2(R) if there exist two constants A > 0 and B < +∞ such that for any square integrable function f(x) the following inequalities hold:

(11) A‖f‖^2 ≤ Σ_{j,k} |⟨ψ_{j,k}, f⟩|^2 ≤ B‖f‖^2,

where ‖f‖ denotes the norm of the function f and ⟨·, ·⟩ the inner product of functions. Families of wavelet frames of L2(R) are generally considered universal approximators. If f(x) ∈ L2(R), then

(12) f(x) = Σ_{j,k ∈ Z} C_{j,k} ψ_{j,k}.
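Equation (10) is straightforward to evaluate numerically. A minimal sketch, assuming the Mexican hat of equation (15) (introduced in section 4) as the mother wavelet and an invented dyadic grid of (j, k) values:

```python
import numpy as np

def mexican_hat(x):
    # Mother wavelet of equation (15), assumed here as the frame generator
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

def psi_jk(x, j, k):
    # Dyadic family of equation (10): psi_{j,k}(x) = 2^{-j/2} psi(2^{-j} x - k)
    return 2.0 ** (-j / 2.0) * mexican_hat(2.0 ** (-j) * x - k)

# A small slice of the frame, sampled on a grid of x values
x = np.linspace(-8.0, 8.0, 400)
family = np.stack([psi_jk(x, j, k) for j in (0, 1, 2) for k in (-2, 0, 2)])
```

Larger j produces wider, lower-amplitude wavelets; k shifts the wavelet along the time axis in steps that scale with 2^j, which is exactly the dyadic sampling of equation (9).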

3. Construction of wavelet networks using rough sets theory
3.1. Construction of wavelet networks based on wavelet frames. From wavelet theory and recent research reports, we know that an arbitrary function f(x) ∈ L2(R^n) can be approximated closely using wavelet frames as:

(13) f(x) ≈ Σ_{t=1}^{n} w_t ψ((x − b_t)/a_t).

Formula (13) is a general mathematical description of the WNN, where n is the number of network nodes, ψ(·) is the basic wavelet, b_t is the translation factor, a_t is the dilation factor and w_t is the link weight of the network.
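Once the dilation and translation factors a_t, b_t are fixed (the paper obtains them from time-frequency analysis), the weights w_t of formula (13) solve a linear least-squares problem, as used in step (4) of section 3.3. The sketch below is illustrative only: the grid of (a_t, b_t), the toy target sin(x), and the use of the Mexican hat of equation (15) as ψ are all assumptions:

```python
import numpy as np

def mexican_hat(x):
    # Mother wavelet of equation (15)
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

def wnn_design(x, a, b):
    """Hidden-layer matrix of formula (13): column t is psi((x - b_t) / a_t)."""
    return mexican_hat((x[:, None] - b[None, :]) / a[None, :])

x = np.linspace(-5.0, 5.0, 200)
y = np.sin(x)                               # toy target (not equation (14))
a = np.repeat([1.0, 2.0], 6)                # dilation factors a_t (assumed grid)
b = np.tile(np.linspace(-5.0, 5.0, 6), 2)   # translation factors b_t
H = wnn_design(x, a, b)
w, *_ = np.linalg.lstsq(H, y, rcond=None)   # weights w_t by least squares
resid = np.max(np.abs(H @ w - y))           # residual of the fit on the grid
```

Because the output is linear in the weights, no back-propagation is needed at this stage; only the choice of which hidden nodes to keep is nonlinear, which is where the rough-set pruning of section 3 enters.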
According to equation (12), we develop a wavelet network based on wavelet frames. The structure of the proposed wavelet network is shown in Fig. 1. The WNN consists of three layers: an input layer, a hidden layer and an output layer. The connections between the hidden units and the output units carry the weights w_t. In the input layer, the number of nodes equals the number of input variables. According to the requirements of the problem, we select an input layer with one node, a hidden layer with k nodes, and an output layer with one node. The activation function of a hidden node is a wavelet function; the weight between the input and hidden nodes is given by the scale factor, and the threshold by the time location factor. These factors are obtained through time-frequency analysis.

Figure 1. The model structure of WNN

3.2. Basic principles of structure optimization. First, construct an original wavelet network according to the time-frequency information of the input samples and the wavelet frames. We regard the output of the WNN as the decision attribute of a decision table, and the products of the hidden-node outputs and their weight values as the condition attributes. Using rough sets (RS) theory, we calculate the significance of each condition attribute with respect to the decision attribute. Since this significance reflects each hidden node's contribution to the network output, we can delete the one or several least significant hidden nodes after each iteration. A small weight value causes a large number of condition attribute values to fall into the same region, so incompatible rules increase greatly and the dependency of attributes drops. This method is therefore better than methods that take only weight values into consideration.
Rough sets theory is a new mathematical tool for dealing with imprecise, incomplete and inconsistent data, but it can handle only discrete attributes, not continuous ones. The discretization of continuous attributes consists of (1) setting up a number of demarcation points in the value range of each continuous attribute, and (2) dividing that value range into a number of small areas. The key issue in discretization is how to determine the number and positions of the demarcation points. Many discretization algorithms have been suggested by researchers. A discretization algorithm based on fuzzy clustering, used by Zhong and Yu [11], is simple, reliable and efficacious compared with others.
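The fuzzy-cluster algorithm of [11] is not reproduced here; as a simple stand-in, equal-frequency (quantile) binning illustrates the same two-step scheme — choose demarcation points, then map each continuous value to the small area it falls in. The data values below are invented:

```python
import numpy as np

def discretize(values, n_bins=3):
    """Equal-frequency binning: demarcation points at empirical quantiles,
    then each continuous value is mapped to the interval it falls in.
    (A stand-in for the fuzzy-cluster discretization of [11].)"""
    cuts = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(values, cuts), cuts

# Invented continuous attribute values
codes, cuts = discretize(np.array([0.1, 0.4, 0.35, 2.0, 5.0, 5.1]), n_bins=3)
print(codes.tolist())   # → [0, 1, 0, 1, 2, 2]
```

The discrete codes, not the raw values, then serve as condition-attribute values in the decision table of section 3.2.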

3.3. Specific steps.
(1): Confirm the time-frequency concentrated area of the signal.
(2): Confirm the time-frequency concentrated area of the wavelet frame ψ_{j,k}(x).
(3): Construct an original wavelet network according to the time-frequency information of the input samples and the wavelet frames.
(4): Adopt the least squares method to train on the input data, and construct the decision table by the method described in section 3.2.
(5): Following the rules of RS theory, calculate the significance of each condition attribute with respect to the decision attribute; the significance reflects each hidden node's contribution to the network output. After each round, delete the one or several least significant hidden nodes. After retraining, test whether the performance index meets the demands. If the criteria are not satisfied, go back to step (4); otherwise, the wavelet neural network obtained in the last round is the final result.
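The iterate-train-score-prune loop of steps (4) and (5) can be sketched as follows. This is an assumption-laden illustration, not the paper's exact procedure: significance is approximated by the increase in mean squared error when a node is removed, a stand-in for the rough-set attribute significance of equation (6), since building the full discretized decision table is beyond a short sketch.

```python
import numpy as np

def mexican_hat(x):
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

def prune_wnn(x, y, a, b, tol=1e-2, drop=1):
    """Sketch of steps (4)-(5): refit the weights by least squares, score each
    hidden node, and delete the least significant node(s) while the error
    criterion still holds."""
    keep = list(range(len(a)))
    while True:
        H = mexican_hat((x[:, None] - b[keep][None, :]) / a[keep][None, :])
        w, *_ = np.linalg.lstsq(H, y, rcond=None)
        err = float(np.mean((H @ w - y) ** 2))
        if len(keep) <= drop:
            return keep, w, err
        sig = []                        # error increase when node t is dropped
        for t in range(len(keep)):
            Ht = np.delete(H, t, axis=1)
            wt, *_ = np.linalg.lstsq(Ht, y, rcond=None)
            sig.append(float(np.mean((Ht @ wt - y) ** 2)) - err)
        least = set(np.argsort(sig)[:drop])
        trial = [keep[i] for i in range(len(keep)) if i not in least]
        Ht = mexican_hat((x[:, None] - b[trial][None, :]) / a[trial][None, :])
        wt, *_ = np.linalg.lstsq(Ht, y, rcond=None)
        if np.mean((Ht @ wt - y) ** 2) > tol:
            return keep, w, err         # deleting more nodes would violate tol
        keep = trial

# Redundant 6-node network; only the nodes at b = -2 and b = 2 matter.
x = np.linspace(-5.0, 5.0, 100)
a = np.ones(6)
b = np.array([-4.0, -2.0, 0.0, 2.0, 4.0, 0.0])
y = mexican_hat(x + 2.0) + 2.0 * mexican_hat(x - 2.0)
keep, w, err = prune_wnn(x, y, a, b, tol=1e-6)
```

On this synthetic example the loop prunes the redundant nodes while the two nodes that actually generated the target survive, mirroring the reduction from 57 to 24 hidden nodes reported in section 4.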
4. A simulation study
In order to illustrate the performance of the proposed algorithm, we apply it to the following function:

(14) f(x) = 0.504x + 5.008,                      for −10 ≤ x < −2;
     f(x) = x^2,                                  for −2 ≤ x < 0;
     f(x) = 12 e^{−0.5(x−1)} sin(0.3x^2 + 0.7x),  for 0 ≤ x < 10.
The Mexican hat wavelet function is chosen:

(15) ψ(x) = (1 − x^2) e^{−x^2/2}.
The combination of time-frequency analysis and RS is adopted. The original wavelet network has 57 nodes in the hidden layer; based on rough sets theory, we construct a wavelet network with only 24 hidden layer nodes. The simulation result is displayed in Fig. 2, comparing the RS-based WNN with the original function: curve 1 represents the function f and curve 2 shows the approximation. We can see that the wavelet network gives a very good approximation.
The simulation results show that a wavelet network based on rough sets theory is a viable technique for constructing the neural network optimally, and that our proposed algorithm is simple and efficient.

Figure 2. Contrast of outputs and error between Original func-
tion and the WNN (Curve 1- original function; Curve 2- construc-
tion WNN; Curve 3-error)

5. Conclusion
In this paper we studied the issues related to constructing and training wavelet networks based on wavelet frames. With wavelet networks, the major difficulty is that the number of nodes in the hidden layer is generally very large. Based on rough sets theory, we proposed a new learning algorithm for optimizing the structure of wavelet networks. Our learning algorithm reduces the number of hidden layer nodes of the network significantly, determines the associated weights, and solves the wavelet network structure optimization problem in a new way. Compared with other algorithms and methods, the proposed algorithm is shown to be simple and efficient. As we are at the beginning of this project, we cannot yet report on a real-world application; this is a subject of our future work. We will explore the limitations of this approach in a number of domains, and we hope to show that the idea is extendible to many other AI problems.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (Grant Nos. 60274024, 60474040).
References
[1] Zhang, Q.H. and Benveniste, A., Wavelet Networks, IEEE Transactions on Neural Networks,
Vol. 3, pp: 889-898, 1992.
[2] Hornik, K., Stinchcombe, M. and White, H., Multilayer Feedforward Networks are Universal
Approximators, Neural Networks, Vol. 2, pp: 359-366, 1989.
[3] Zhang, Z.S., Liu, G.Z. and Liu, F., A Construction and Learning Algorithm on Adaptive
Wavelet Neural Networks. Science in China (Series E), Vol. 31(2), pp:172-181, 2001.
[4] Li, Y.G., Shen, J. and L, Z.Z., New Approach to Structure Optimization of Wavelet Neural
Network. Control Theory and Applications, Vol.20, No.3, pp: 329-331, 2003.
[5] Zhang, Q.H., Using Wavelet Network in Nonparametric Estimation. IEEE Trans on Neural
Networks, Vol. 8(2), pp: 227-236, 1997.
[6] Pawlak, Z., Rough Sets, International Journal of Information and Computer Science, Vol.11,
No.5, pp: 341-356, 1982.
[7] Pawlak, Z., Rough Sets-theoretical Aspects of Reasoning about Data, Kluwer Academic Pub-
lishers, Dordrecht, 1991
[8] Shi, F., Lou, Z.L. and Zhang, Y.Q., A Modiﬁed Heuristic Algorithm of Attribute Reduction
in Rough Set, Journal of Shanghai Jiaotong University, Vol.36(4), 2002.
[9] Ling, W.Y., Jia, M.P., et al. Optimizing Strategy on Rough Set Neural Network Fault Diag-
nosis System, Proceedings of the CSEE, Vol.23(5), pp: 98-102, 2003.
[10] Ras, Z.W. and Zemankova, M., Methodologies for Intelligent Systems, Elsevier Science Pub-
lishing Co.Inc. NewYork, pp: 325-391, 1987.
[11] Zhong, W. and Yu, J.S., Study on Soft Sensing Modeling via Fcm-based Multiple Models,
Journal of East China University of Science and Technology, Vol.26(1), pp: 83-87, 2000.

College of Information Science and Engineering, Northeastern University, Shenyang 110004, P.R. China
E-mail: dongweisy@yahoo.com.cn

College of Information Science and Engineering, Northeastern University, Shenyang 110004, P.R. China
E-mail: wjh570202@163.com
