INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND TECHNOLOGY (IJARET)

ISSN 0976 - 6480 (Print)
ISSN 0976 - 6499 (Online)
Volume 4, Issue 3, April 2013, pp. 93-100
© IAEME: www.iaeme.com/ijaret.asp
Journal Impact Factor (2013): 5.8376 (Calculated by GISI)
www.jifactor.com




              TRAINING THE NEURAL NETWORK USING
       LEVENBERG-MARQUARDT’S ALGORITHM TO OPTIMIZE THE
         EVACUATION TIME IN AN AUTOMOTIVE VACUUM PUMP

                           Vijayashree 1*, Kolla Bhanu Prakash 2 and T.V. Ananthan 3
       1, 2, 3 Department of Computer Science and Engineering, Dr. MGR Educational and Research
                    Institute University, Maduravoyal, Chennai 600 095, India


  ABSTRACT

         Neural networks have been used for engine computations in the recent past. One reason for using
  neural networks is to capture the accuracy of experimental data while saving computational time, so
  that system simulations can be performed within a reasonable time frame. The main aim of this study is
  to optimize and arrive at a design base for a vacuum pump in an automotive engine using the
  Levenberg-Marquardt (LM) algorithm for Artificial Neural Networks (ANN). Design bases are
  created from previous products and by benchmarking. Effortless brake application is a preferred
  comfort feature in automotive applications. To provide an easy and effective pedal feel, the braking
  mechanism needs to be assisted with external energy. The evacuation time is optimized with the LM
  algorithm and the trained neural network to arrive at the optimum value.

  Index Terms: automotive engine, braking system, evacuation time, Levenberg-Marquardt’s (LM)
  Algorithm, neural networks, vacuum pump.

  I. INTRODUCTION

          Effortless brake application is a preferred comfort feature in automotive applications. To
  provide an easy and effective pedal feel, the braking mechanism needs to be assisted with external energy.
  A vane-type vacuum pump serves exactly this purpose: it produces vacuum by evacuating the air in the
  vacuum booster. This vacuum is used to actuate the booster for the power brakes in diesel-powered and
  Gasoline Direct Injection (GDI) automobiles. The capacity of the vacuum pump varies with the weight
  and brake booster capacity of the vehicle. Therefore, it is necessary to have a design base with a proven
  technique, which will serve as a basis for faster product development.
          Neural networks and other machine learning algorithms are increasingly being used for engine
  applications [1]. These applications can be categorized as either real time control/diagnostic methods





or predictive tools for design purposes. Some applications have even moved downstream of the engine
[2]. The present work uses a neural network trained with the LM algorithm to arrive at the appropriate
evacuation time, which is a critical parameter. The particular task selected is to minimize the
evacuation time in a vane-type vacuum pump. The dataset comprises experimental results obtained at
UCAL Fuel Systems Ltd., Chennai.

II. VACUUM PUMP

       The vane-type vacuum pump has a unique profile in which an eccentrically mounted rotor rotates
the vanes, as shown in Fig. 1. The movement of the vanes creates a pressure difference, which creates
vacuum in the brake booster. Air enters the pump through the inlet check valve assembly. Oil is
circulated inside the pump to lubricate the rotating parts and to maintain sealing between the
high-pressure and low-pressure regions [3, 4, 5]. The air and oil mixture is then expelled from the pump
through the reed valve. The performance of the pump is specified by the evacuation time of a specified
tank volume [3]:
                                 Evacuation time, t = (Vt / Q) ln(p1 / p2)
where Vt is the tank volume, Q is the pump flow rate, p1 is atmospheric pressure and p2 is the required
pressure.
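   As an illustrative MATLAB calculation of this relation (all values below are assumed for
demonstration; only the 100 cc tank and 3 cc pump capacities come from the data of Table 1):

   % Illustrative evaluation of the evacuation-time relation (assumed values):
   Vt = 100;               % tank volume, cc (Table 1 tank capacity)
   Q  = 3 * 1000 / 60;     % pump flow, cc/s: 3 cc/rev at an assumed 1000 rpm
   p1 = 101.3;             % atmospheric pressure, kPa
   p2 = 30;                % assumed required booster pressure, kPa
   t  = (Vt / Q) * log(p1 / p2)   % evacuation time, about 2.4 s

This is in the same range as the measured times in Table 1 for 1000 rpm.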
     The results obtained from the existing pump were used for training the ANN with the LM algorithm
to create the design base for any future design. Fig. 1 shows a vacuum pump of capacity 110 cc.


                          Fig. 1 Photograph of vacuum pump of capacity 110 cc

III. LEVENBERG-MARQUARDT’S ALGORITHM

       The LM algorithm is an iterative technique that locates a local minimum of a multivariate
function that is expressed as the sum of squares of several non-linear, real-valued functions. It has
become a standard technique for nonlinear least-squares problems, widely adopted in various
disciplines for dealing with data-fitting applications. LM can be thought of as a combination of steepest
descent and the Gauss-Newton method. When the current solution is far from a local minimum, the
algorithm behaves like a steepest descent method: slow, but guaranteed to converge. When the current
solution is close to a local minimum, it becomes a Gauss-Newton method and exhibits fast
convergence.
Input:
  A vector function f : Rm → Rn with n ≥ m, a measurement vector x ∈ Rn and an initial parameter
  estimate p0 ∈ Rm.
Output:
  A vector p+ ∈ Rm minimizing ||x − f(p)||²
Algorithm:
  k := 0; ν := 2; p := p0;
  A := JTJ; εp := x − f(p); g := JT εp;
  stop := (||g||∞ ≤ ε1); µ := τ · maxi=1, …, m (Aii);
  while (not stop) and (k < kmax)
    k := k + 1;
    repeat
      Solve (A + µI) δp = g;
      if (||δp|| ≤ ε2 ||p||)
        stop := true;
      else
        pnew := p + δp;
        ρ := (||εp||² − ||x − f(pnew)||²) / (δpT (µδp + g));
        if ρ > 0
          p := pnew;
          A := JTJ; εp := x − f(p); g := JT εp;
          stop := (||g||∞ ≤ ε1);
          µ := µ · max(1/3, 1 − (2ρ − 1)³); ν := 2;
        else
          µ := µ · ν; ν := 2ν;
        endif
      endif
    until (ρ > 0) or (stop)
  endwhile
  p+ := p;

         The above is the Levenberg-Marquardt nonlinear least squares algorithm. ρ is the gain ratio,
defined as the ratio of the actual reduction in the error ||εp||² that corresponds to a step δp to the
reduction predicted for δp by the linear model of Eq. (1). See the text and [6, 7] for details.
       In the following, vectors and arrays appear in boldface and T is used to denote transposition. Also,
||.|| and ||.||∞ respectively denote the 2 and infinity norms. Let f be an assumed functional relation which
maps a parameter vector p ∈ Rm to an estimated measurement vector x̂ = f(p), x̂ ∈ Rn. An initial
parameter estimate p0 and a measured vector x are provided, and it is desired to find the vector p+ that
best satisfies the functional relation f locally, i.e. minimizes the squared distance εTε with ε = x − x̂,
for all p within a sphere having a certain, small radius. The basis of the LM algorithm is a linear
approximation to f in the neighborhood of p. Denoting by J the Jacobian matrix ∂f(p)/∂p, a Taylor
series expansion for a small ||δp|| leads to the following approximation:
                                  f(p + δp) ≈ f(p) + J δp                                   (1)
Like all non-linear optimization methods, LM is iterative. Initiated at the starting point p0, it produces a
series of vectors p1, p2, … that converge towards a local minimizer p+ of f. Hence, at each iteration, it is
required to find the step δp that minimizes the quantity
                  ||x − f(p + δp)|| ≈ ||x − f(p) − J δp|| = ||ε − J δp||                    (2)





  The sought δp is thus the solution to a linear least-squares problem: the minimum is attained when
J δp − ε is orthogonal to the column space of J. This leads to JT (J δp − ε) = 0, which yields the
Gauss-Newton step δp as the solution of the so-called normal equations:
                                      JT J δp = JT ε                                        (3)
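  In MATLAB notation the Gauss-Newton step of Eq. (3) is a single backslash solve; a brief sketch,
assuming the Jacobian J and the current residual vector res = x − f(p) are already available:

   % Sketch: Gauss-Newton step from the normal equations, Eq. (3).
   dp = (J' * J) \ (J' * res);   % solves J'J * dp = J' * res for the step dp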

        Ignoring the second derivative terms, matrix JTJ in Eq. (3) approximates the Hessian of ½ εTε
[6]. Note also that JTε is along the steepest descent direction, since the gradient of ½ εTε is −JTε.
The LM method actually solves a slight variation of Eq. (3), known as the augmented normal equations:
                          N δp = JTε,  with N ≡ JTJ + µI and µ > 0                          (4)
where I is the identity matrix. The strategy of altering the diagonal elements of JTJ is called
damping, and µ is referred to as the damping term. If the updated parameter vector p + δp with δp
computed from Eq. (4) leads to a reduction in the error εTε, the update is accepted and the process
repeats with a decreased damping term. Otherwise, the damping term is increased, the augmented
normal equations are solved again and the process iterates until a value of δp that decreases the error is
found. The process of repeatedly solving Eq. (4) for different values of the damping term until an
acceptable update to the parameter vector is found corresponds to one iteration of the LM algorithm.
   In LM, the damping term is adjusted at each iteration to assure a reduction in the error. If the damping
is set to a large value, matrix N in Eq. (4) is nearly diagonal and the LM update step δp is near the
steepest descent direction JTε. Moreover, the magnitude of δp is reduced in this case, ensuring that
excessively large Gauss-Newton steps are not taken.
        Damping also handles situations where the Jacobian is rank deficient and JTJ is therefore
singular [4]. The damping term can be chosen so that matrix N in Eq. (4) is nonsingular and, therefore,
positive definite, thus ensuring that the δp computed from it is in a descent direction. In this way, LM
can defensively navigate a region of the parameter space in which the model is highly nonlinear. If the
damping is small, the LM step approximates the exact Gauss-Newton step. LM is adaptive because it
controls its own damping: it raises the damping if a step fails to reduce εTε; otherwise, it reduces the
damping. By doing so, LM is capable of alternating between a slow descent approach when far from the
minimum and fast, quadratic convergence in the minimum's neighborhood [8].
The LM algorithm terminates when at least one of the following conditions is met:

1. The gradient’s magnitude drops below a threshold ε1.
2. The relative change in the magnitude of δP drops below a threshold ε2.
3. A maximum number of iterations kmax is reached.
   The complete LM algorithm is shown in the above pseudocode; more details regarding it can be
found in [6]. The initial damping factor is chosen equal to the product of a parameter τ with the
maximum element of JTJ on the main diagonal. Indicative values for the user-defined parameters are
τ = 10^−3, ε1 = ε2 = 10^−2, kmax = 100.
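   A minimal MATLAB sketch of the above pseudocode follows. It assumes the caller supplies the
model f, its Jacobian, the measurement vector and a starting point; the exponential fit in the trailing
comment is purely illustrative and not from the paper.

   function p = lm_fit(f, jacobian, x, p0, tau, e1, e2, kmax)
   % Minimal Levenberg-Marquardt sketch following the pseudocode above.
   % f        : handle mapping parameters p to predicted measurements f(p)
   % jacobian : handle returning the n-by-m Jacobian J at p
   % x        : measured vector;  p0 : initial parameter estimate
   if nargin < 5, tau = 1e-3; e1 = 1e-2; e2 = 1e-2; kmax = 100; end
   k = 0; nu = 2; p = p0;
   J = jacobian(p); A = J' * J;
   ep = x - f(p);   g = J' * ep;
   stop = (norm(g, inf) <= e1);
   mu = tau * max(diag(A));                     % initial damping from max diagonal of J'J
   while ~stop && k < kmax
       k = k + 1;
       while true                               % inner repeat-until loop
           dp = (A + mu * eye(numel(p))) \ g;   % augmented normal equations, Eq. (4)
           if norm(dp) <= e2 * norm(p)
               stop = true;
           else
               pnew = p + dp;
               rho = (norm(ep)^2 - norm(x - f(pnew))^2) / (dp' * (mu * dp + g));
               if rho > 0                       % step accepted: refresh linearization, shrink damping
                   p = pnew;
                   J = jacobian(p); A = J' * J;
                   ep = x - f(p);   g = J' * ep;
                   stop = (norm(g, inf) <= e1);
                   mu = mu * max(1/3, 1 - (2*rho - 1)^3); nu = 2;
               else                             % step rejected: grow damping and retry
                   mu = mu * nu; nu = 2 * nu;
               end
           end
           if stop || rho > 0, break; end
       end
   end
   end

   % Example use (illustrative): fit y = a*exp(b*t) to data vectors t, y:
   %   f = @(p) p(1)*exp(p(2)*t);  Jf = @(p) [exp(p(2)*t), p(1)*t.*exp(p(2)*t)];
   %   p = lm_fit(f, Jf, y, [1; 0]);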

IV. METHODOLOGY OF NEURAL NETWORKS IN VACUUM PUMP PERFORMANCE
OPTIMIZATION

     The performance of the vacuum pump is determined by the time required to evacuate air from the
reservoir. The evacuation time depends on various parameters such as temperature, oil pressure and
rotation speed. Vacuum pump development requires a procedure for developing a pump of any capacity
based on customer requirements.
     In the first training stage, the inputs and the desired outputs are given to the NN. The weights are
modified to minimize the error between the NN predictions and the expected outputs. Different types of
learning algorithms have been developed, but the most common and robust one is back-propagation.
The goal of the training is to minimize the error, and consequently to optimize the NN solution. Each
iterative step in which the weights are recalculated is called an epoch. When the minimum is achieved,
the weights are fixed and the training process ends. Once a neural network has been trained to a
satisfactory level, it may be used as a predictive tool for new data. To do this, only the inputs are given
to the NN, and the NN predicted outputs are calculated using the previously determined
error-minimizing weights.
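   This train-then-predict workflow maps onto the MATLAB Neural Network Toolbox as sketched
below; the variable names (inputs, targets, newInputs) are illustrative assumptions, not from the paper:

   % Hedged sketch of the workflow described above:
   net = feedforwardnet(10, 'trainlm');   % one hidden layer of 10 neurons, LM training
   net = train(net, inputs, targets);     % weights adjusted each epoch to reduce error
   predicted = net(newInputs);            % trained net used as a predictive tool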

V. RESULTS AND DISCUSSION

     The dataset used was obtained from UCAL Fuel Systems Ltd, Chennai. There were 4 sets of
training data, each set corresponding to a different combination of pump and tank capacity, speed,
pressure and evacuation time.
     There were 21 × 6 training data points and 4 input features. The target values were the 21 × 6
evacuation times normalized by the minimum possible evacuation time. There were 10 such sets for
testing too. No tuning set needed to be extracted from the training data: owing to the large number of
training data points, the training error as well as the tuning error decreased asymptotically beyond a few
hundred epochs, and early stopping did not occur. The MATLAB neural network toolbox was used to
build the baseline neural networks. The Levenberg-Marquardt algorithm [9, 10] was used with the
back-propagation algorithm. The chosen configuration had twenty-five hidden layers, each with an
optimal 10 neurons having sigmoid activation functions, and an output layer of ten neurons with a linear
activation function. The Nguyen-Widrow method was used to initialize the weights. Evacuation time
predictions were made using this configuration (baseline case).
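   A sketch of how this baseline might be set up in the toolbox follows. Where the text is ambiguous,
details are assumed: the output size is inferred from the targets at training time, and the tansig and
Nguyen-Widrow settings (the toolbox defaults) are made explicit here for clarity.

   % Baseline configuration sketch: 25 hidden layers of 10 sigmoid neurons,
   % linear output layer, Nguyen-Widrow initialization, LM training.
   net = feedforwardnet(repmat(10, 1, 25), 'trainlm');
   for i = 1:25
       net.layers{i}.transferFcn = 'tansig';   % sigmoid hidden activations
       net.layers{i}.initFcn = 'initnw';       % Nguyen-Widrow weight initialization
   end
   net.layers{end}.transferFcn = 'purelin';    % linear output layer
   net = train(net, inputs, targets);          % inputs/targets are assumed variable names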

   The reasons to incorporate a physical model into a neural network are:
1. To make the network more robust. Even if confronted with a set of conditions very different from
   those encountered in the training data, the network should output realistic results.
2. To reduce dependence on training data, i.e. to enable the network to form a reasonable hypothesis,
   from small datasets.
3. To improve the prediction accuracy.

          Table 1 Experimental data (evacuation time, s) for tank capacity 100 cc and pump capacity 3 cc

   Temperature (°C)  |                      Speed (rpm)
                     |    400     |    1000    |    1500    |    2300
   ------------------+------------+------------+------------+----------
          50         |    3.47    |    1.97    |    1.7     |    1.61
          90         |    3.53    |    1.98    |    1.8     |    1.7
         120         |    3.92    |    2.08    |    1.8     |    1.75
         150         |    4.77    |    2.16    |    1.17    |    1.72


     Table 2 ANN predicted evacuation time (s) for tank capacity 100 cc and pump capacity 3 cc
                                     (hidden layers: 25)

   Temperature (°C)  |                      Speed (rpm)
                     |    400      |    1000     |    1500     |    2300
   ------------------+-------------+-------------+-------------+-----------
          50         |   3.47912   |   1.7302    |   1.9189    |   1.60273
          90         |   3.53071   |   1.98974   |   1.32223   |   1.67414
         120         |   3.90548   |   2.18308   |   0.84523   |   1.73175
         150         |   4.90085   |   1.78111   |   2.24074   |   1.67527

  The reported error is the mean square error over normalized evacuation time values. It is always the
test error, unless otherwise mentioned. It was noticed from the error plots that most of the error occurred
over the −0.2396 region (Fig. 2). The other regions had much smaller errors, and this error region was
therefore chosen for comparison with the three new methods.




                                          Fig.2 Error histogram

       The mean square error between the model output and the target output is a typical measure of
neural network performance. However, it was found that there are practical difficulties in establishing
acceptance criteria for the mean square error. Therefore, a normalised version of the mean square error
was implemented. This normalised mean square error used the nearer-specification-limit concept,
modified to encompass the definition of an acceptable percentage error level. Here, the acceptable
error was equated to the typical level of propagated error that one would expect from the
instrumentation measuring the engine performance. This is consistent with the idea that it is
reasonable not to expect a higher standard of inference from the model than one could expect from
direct measurement of the engine performance.
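   The exact normalisation formula is not given in the text; one purely illustrative reading is to scale
the mean square error by the square of an acceptable error level derived from the instrumentation
accuracy:

   % Hypothetical sketch only; 'acceptableError' is an assumed instrumentation-derived level.
   acceptableError = 0.05;                                    % assumed acceptable error level
   nmse = mean((targets - outputs).^2) / acceptableError^2;   % normalised mean square error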

  The performance values obtained during training were:
  Performance = 0.1601
  trainPerformance = 8.4504e-008
  valPerformance = 0.4123
  testPerformance = 0.2283

       During training, the progress is constantly updated in the training window. Of most interest are
the performance, the magnitude of the gradient of performance and the number of validation checks.
The magnitude of the gradient and the number of validation checks are used to terminate the training.
The gradient will become very small as the training reaches a minimum of the performance. If the
magnitude of the gradient is less than 1e-5, the training will stop (Fig.3). This limit can be adjusted by
setting the parameter net.trainParam.min_grad. The number of validation checks represents the number
of successive iterations that the validation performance fails to decrease. If this number reaches 6 (the
default value), the training will stop.








                                            Fig.3 Gradient plot

  The performance plot (Fig. 4) shows the value of the performance function versus the iteration
number (epochs). It plots the training, validation and test performances. The best validation performance
is 0.17081 at epoch 1.




                                           Fig.4 Performance plot

  The training state plot shows the progress of other training variables, such as the gradient magnitude
and the number of validation checks (Fig. 5). The error histogram plot shows the distribution of the
network errors. The regression plot shows the regression between network outputs and network targets.




                                       Fig. 5 Training regression plots

        The three axes represent the training, validation and testing data. The dashed line in each axis
represents the perfect result (outputs = targets). The solid line represents the best-fit linear regression
line between outputs and targets. The R value is an indication of the relationship between the outputs
and targets. If R = 1, there is an exact linear relationship between outputs and targets. If R is close to
zero, then there is no linear relationship between outputs and targets. For this example, the training data
indicate a good fit. The validation and test results also show R values greater than 0.9. The scatter plot
is helpful in showing that certain data points have poor fits. Here, the R values for training, validation,
testing and all three combined are 0.083294, 0.13655, 0.80023 and 0.080557, respectively.
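   The quoted R values can be recomputed from network outputs and targets; a brief sketch using the
toolbox regression function (variable names assumed):

   [r, m, b] = regression(targets, outputs);   % correlation r, slope m and intercept b of best-fit line
   plotregression(targets, outputs);           % draws a regression plot like Fig. 5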

VI. CONCLUSION

       From the results obtained, it can be concluded that the Levenberg-Marquardt algorithm works
satisfactorily in optimizing the evacuation time in automotive engines. The optimization has been
validated and found to be accurate to within 5%. The deviation of the NN-optimized values from the
experimental results was also found to be within 5%.

VII. ACKNOWLEDGEMENT

      We wish to acknowledge Mr. J. Suresh Kumar, Deputy General Manager of UCAL Fuel Systems
Ltd, Chennai, for his help in conducting the experiments, generating the dataset for this project, and
validating the results on their prototype.

REFERENCES

[1] Indranil Brahma, Yongsheng He and Christopher J. Rutland, "Improvement of Neural Network Accuracy for
    Engine Simulations", SAE Paper 2003-01-3227, 2003.
[2] He, Y. and Rutland, C.J., "Application of Artificial Neural Network for Integration of Advanced Engine
    Simulation Methods", Proceedings of the 2000 Fall Technical Conference of the ASME Internal Combustion
    Engine Division, ICE-Vol. 35-1, pp. 53-64, Paper No. 2000-ICE-304, 2000.
[3] Chambers, A., Fitch, R.K. and Halliday, B.S., "Basic Vacuum Technology", ISBN 0-75-030495-2, 1998.
[4] Nagendiran, S., Sivanantham, R. and Kumar, J., "Improvement of the Performance of Cam-Operated Vacuum
    Pump for Multi Jet Diesel Engine", SAE Technical Paper 2009-01-1462, 2009, doi:10.4271/2009-01-1462.
[5] Nagendiran, S.R., Arun Subramanian, J. Suresh Kumar and Ramalingam Sivanantham, "Designing of
    Automotive Vacuum Pumps - Development of Mathematical Model for Critical Parameters and Optimization
    using Artificial Neural Networks", SAE Paper No. 2012-01-0779. See also: K. Madsen, H. Nielsen and
    O. Tingleff, "Methods for Non-Linear Least Squares Problems", Technical University of Denmark, 2004,
    lecture notes, available at http://www.imm.dtu.dk/courses/02611/nllsq.pdf.
[6] Manolis I.A. Lourakis and Antonis A. Argyros, "Is Levenberg-Marquardt the Most Efficient Optimization
    Algorithm for Implementing Bundle Adjustment?", Proceedings of the Tenth IEEE International Conference on
    Computer Vision (ICCV'05), IEEE Computer Society, 2005.
[7] J. Dennis and R. Schnabel, "Numerical Methods for Unconstrained Optimization and Nonlinear Equations",
    Classics in Applied Mathematics, SIAM Publications, Philadelphia, 1996.
[8] Indranil Brahma, Yongsheng He and Christopher J. Rutland, "Improvement of Neural Network Accuracy for
    Engine Simulations", SAE Paper 2003-01-3227, 2003.
[9] Hagan, M.T. and Menhaj, M.B., "Training Feedforward Networks with the Marquardt Algorithm", IEEE
    Transactions on Neural Networks, Vol. 5, No. 6, pp. 989-993, 1994.
[10] Pallavi H. Agarwal, P.M. George and L.M. Manocha, "Comparison of Neural Network Models on Material
    Removal Rate of C-SiC", International Journal of Design and Manufacturing Technology (IJDMT), Volume 3,
    Issue 1, 2012, pp. 1-10, ISSN Print: 0976-6995, ISSN Online: 0976-7002.
[11] Dharmendra Kumar Singh, Moushmi Kar and A.S. Zadgaonkar, "Analysis of Generated Harmonics Due to
    Transformer Load on Power System Using Artificial Neural Network", International Journal of Electrical
    Engineering & Technology (IJEET), Volume 4, Issue 1, 2013, pp. 81-90, ISSN Print: 0976-6545,
    ISSN Online: 0976-6553.



