# Generalized Algorithm To Perform Short Term Load Forecasting

1   Introduction
Electric power demand has been growing exponentially in the recent past. This
is credited to the development of industries and the growth of population in urban
and rural areas. This growth has led to more random variations in the daily load
consumption pattern, making the operation of the power system a more
cumbersome process. Short-term load forecasting also becomes difficult, and hence
an accurate and robust methodology is the need of the hour. Short-term load
forecasting helps the utility in operational planning, unit commitment, maintenance
scheduling and the allotment of spinning reserves. Apart from these operational
benefits, the utility will also be in a position to bring down its revenue
losses. This has gained more importance with the advent of open access in the
deregulated electricity market.

Traditionally, short term load forecasting has been carried out using time
series analysis such as Box and Jenkins’ method, ARIMA model and regression
analysis. In this type of analysis, predicted load demand is modeled as a function
of previous historical loads. Also artificial intelligence techniques like genetic
algorithm, neural network, expert systems, fuzzy logic have been employed for
short term load forecasting.

The above mentioned methods focused on developing different strategies and
aimed at solving the problem by presenting the data differently. However, they did
not emphasize neural network design for arriving at the optimal solution. The
neural network architecture was selected on a random basis, and hence it cannot be
concluded that the solution obtained is optimal.
1.2 Proposed Method
The algorithm proposed in our project concentrates on network
optimization by predicting the number of neurons in the hidden layer and the
best-suited training algorithm. The proposed algorithm adopts a two-step approach:
the first step yields the optimal training algorithm and the corresponding network
size, and the second step uses that training algorithm on the given dataset,
incorporating the predicted number of neurons, to obtain optimal results.

The above mentioned attributes of the artificial neural network, obtained from
the first step of the algorithm, are used for the implementation of short-term
load forecasting. The neural network, after being assigned random weights and
biases, is trained with a sufficient amount of data using the best
training algorithm.
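The two-step approach described above can be outlined as a simple grid search. This is only an illustrative sketch, not the project's actual implementation; the helper `train_and_score` (which trains one configuration and returns its validation error) is hypothetical.

```python
# Illustrative sketch only: `train_and_score` is a hypothetical helper that
# trains one network configuration and returns its validation error.

def select_network(algorithms, max_hidden, data, train_and_score):
    """Step 1: search over training algorithms and hidden-layer sizes,
    returning the pair with the lowest validation error (used in step 2)."""
    best = None
    for algo in algorithms:
        for n_hidden in range(1, max_hidden + 1):
            err = train_and_score(algo, n_hidden, data)
            if best is None or err < best[0]:
                best = (err, algo, n_hidden)
    return best[1], best[2]
```

Step 2 then trains the final network with the returned algorithm and hidden-layer size.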

The Short-term load forecasting results that are obtained from the proposed
algorithm are validated by comparing the results with the actual load curves. This
is done by forecasting for Tamil Nadu state for the year 2011.

The proposed algorithm paves the way for obtaining optimal results for short-
term load forecasting using an artificial neural network. The following chapters
deal with the different aspects involved in our project.

2.1 Introduction
Load forecasting is defined as the prediction of the electrical load demand on the
power station at a particular time in the future. Load forecasting helps an electric
utility to make important decisions, including decisions on purchasing and
generating electric power, load switching, and infrastructure development. Load
forecasts are of extreme importance for energy suppliers, financial institutions and
other participants in electric energy generation, transmission, distribution and
markets. Load forecasting has always been critical for the planning and
operational decisions taken by utility companies. However, with the
deregulation of the energy industries, load forecasting has become even more
important. With supply and demand fluctuating, weather conditions changing,
and energy prices increasing by a factor of ten or more during peak
situations, load forecasting is vitally important for utilities.

2.2 Types of Load Forecasting
Load forecasting can be divided into three categories:

- Short-Term Load Forecasting
- Medium-Term Load Forecasting
- Long-Term Load Forecasting

Short-term forecasts are usually made for a period ranging from one hour to one
week. Short-term load forecasting can help to estimate load flows and to make
decisions that prevent overloading of the power system. Timely
implementation of such decisions leads to improved network reliability
and to fewer equipment failures and blackouts. Various
functions of the power plant engineer, such as unit commitment, operational planning
and power purchase, are based on the short-term load forecasting results.

Medium-term forecasts are carried out for a period usually ranging from a week
to a year. Medium-term forecasting serves the purposes of power system planning
and operation; many of the decisions made in planning and operation depend on
the results of medium-term load forecasting.

Long-term forecasts are done for a period longer than a year. Long-term load
forecasting results help in the future commissioning of generating stations and also
in determining the capacity of each generating unit.

2.3 Factors affecting Load Forecasting
For short-term load forecasting several factors should be considered, such as
time factors, weather data, and possible customers’ classes. The medium- and
long-term forecasts take into account the historical load and weather data, the
number of customers in different categories, the appliances in the area and their
characteristics including age, the economic and demographic data and their
forecasts, the appliance sales data, and other factors.

Weather conditions also influence the load. In fact, forecasted weather
parameters are the most important factors in short-term load forecasts. Various
weather variables could be considered for load forecasting. Temperature and
humidity are the most commonly used load predictors.

Historical load data such as load at the previous hour, load at the same hour in
the previous week and also seasonal components etc. influence the forecasting
results. Hence load forecasting algorithm should consider the aforesaid factors so
that it could perform accurately and reliably.
2.4 Methods for load forecasting
Over the last few decades a number of forecasting methods have been
developed. Two of the methods namely end-use and econometric approaches are
broadly used for medium and long-term forecasting.

A variety of methods are used for short-term forecasting. The development,
improvement and investigation of appropriate mathematical tools will lead to
more accurate load forecasting techniques. Some of the
methods used are:

- Regression analysis
- Stochastic time series (AR, ARIMA models)
- Neural Network
- Fuzzy Logic
- Knowledge-based Expert Systems

2.5 Conclusion
In this project, short-term load forecasting is performed using an artificial
neural network, chosen from the different methods mentioned above. The attributes
of the artificial neural network used for performing short-term load
forecasting are explained in detail in the next section.
3    Artificial Neural Networks

3.1 Introduction
An artificial neural network is a massively parallel distributed processor made up of
simple processing units, which has a natural propensity for storing experiential
knowledge and making it available for use. It resembles the brain in two respects:

1. Knowledge is acquired by the network from its environment through a
learning process
2. Interneuron connection strengths, known as synaptic weights, are used to
store the acquired knowledge

The procedure used to perform the learning process is called a learning
algorithm, the function of which is to modify the synaptic weights of the network
in an orderly fashion to attain a desired design objective.

3.2 Basic Neuron Model
A neuron is an information processing unit that is fundamental to the operation
of a neural network. The neuron model shown in Fig. 1 forms the basis of an
artificial neural network. Here we identify three basic
elements of the neuronal model:

i. A set of synapses, each of which is characterized by a weight of its own.
Specifically, a signal xj at the input of the synapse j connected to neuron k
is multiplied by the synaptic weight wkj. The first subscript refers to the
neuron in question and the second subscript refers to the input end of the
synapse to which the weight refers.
ii. An adder for summing up the input signals, weighted by the respective
synapses of the neuron.
iii. An activation function for limiting the amplitude of the output of a
neuron. The activation function is also referred to as the squashing
function in that it squashes the permissible amplitude range of the output
signal to some finite value. Typically, the normalized amplitude range of
the output of a neuron is written as the closed unit interval [0, 1] or
alternatively [-1, 1]. In mathematical terms, we may describe a neuron k by
writing the following pair of equations.

$$u_k = \sum_{j=1}^{m} w_{kj}\,x_j$$

and

$$y_k = \varphi(u_k + b_k)$$

where $x_1, x_2, \ldots, x_m$ are the input signals; $w_{k1}, w_{k2}, \ldots, w_{km}$ are the synaptic weights
of neuron k; $u_k$ is the linear combiner output due to the input signals; $b_k$ is the bias;
$\varphi(\cdot)$ is the activation function; and $y_k$ is the output signal of neuron k.

Fig. 1 Basic neuron model
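The pair of equations above can be written directly in code. A minimal Python sketch of the neuron model (the tanh activation here is just one possible squashing function):

```python
import math

def neuron(x, w, b, phi=math.tanh):
    """Neuron k of Fig. 1: a linear combiner u_k = sum_j w_kj * x_j,
    then an activation (squashing) function, y_k = phi(u_k + b_k)."""
    u = sum(w_j * x_j for w_j, x_j in zip(w, x))  # linear combiner output
    return phi(u + b)
```

With `phi=math.tanh` the output is squashed into [-1, 1]; a logistic function would squash it into [0, 1] instead.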
3.3 Network Architecture
The manner in which the neurons of a neural network are structured is intimately
linked with the learning algorithm used to train the network. We may therefore
speak of learning algorithms (rules) used in the design of neural networks as being
structured.

In general we may identify three fundamentally different classes of network
architectures:

1. Single layer feedforward networks
In a layered neural network the neurons are organized in the form of layers.
In the simplest layered network, an input layer of source nodes projects
onto an output layer of neurons, but not vice versa; such a network is
acyclic. Here, “single layer” refers to the output layer of computation
nodes (neurons).

2. Multilayer feedforward networks
This distinguishes itself by the presence of one or more hidden layers, whose
computation nodes are correspondingly called hidden neurons or hidden
units whose function is to intervene between the external input and the
network output in some useful manner. The source nodes in the input layer
of the network supply respective elements of the activation pattern (input
vector), which constitute the input signals applied to the neurons
(computation nodes) in the second layer (i.e. the first hidden layer). The
output signals of the second layer are used as inputs to the third layer, and so
on for the rest of the network. Typically the neurons in each layer of the
network have as their inputs the output signals of the preceding layer only.
The set of output signals of the neurons in the output (final) layer of the
network constitutes the overall response of the network to the activation
pattern, supplied by the source nodes in the input (first) layer.

3. Recurrent networks
A recurrent network distinguishes itself from a feedforward neural network
in that it has at least one feedback loop. A recurrent network may consist of
a single layer of neurons with each neuron feeding its output signal back to
the inputs of all the other neurons. The feedback loops involve the use of
particular branches composed of unit-delay elements which result in a
nonlinear dynamical behavior, assuming that the neural network contains
nonlinear units.

3.4 Training
Learning is a process by which the free parameters of a neural network are
adapted through a process of stimulation by the environment in which the network
is embedded. The type of learning is determined by the manner in which the
parameter changes takes place. A prescribed set of well-defined rules for solution
of a learning problem is called a training (learning) algorithm.

Once the network weights and biases are initialized, the network is ready for
training (learning). The multilayer feedforward network can be trained for function
approximation (nonlinear regression) or pattern recognition. The training process
requires a set of examples of proper network behavior—network inputs p and
target outputs t.

The process of training a neural network involves tuning the values of the
weights and biases of the network to optimize network performance, as defined by
the network performance function. The default performance function for
feedforward networks is mean square error (mse), the average squared error
between the network's outputs a and the target outputs t.

Backpropagation is a gradient descent algorithm in which the network weights are
moved along the negative of the gradient of the performance function.

Backpropagation Algorithm

Step 1: Pick the synaptic weights and biases from a uniform distribution whose
mean is zero and whose variance is chosen to make the standard deviation
of the induced local fields of the neurons lie at the transition between the
linear and saturated parts of the sigmoid activation function.

Step 2: Present the network with an epoch of training examples. For each example
in the set, perform the sequence of forward and backward computation
described in step 3 and step 4.

Step 3: Compute the induced local fields and function signals of the network by
proceeding forward through the network, layer by layer.

Compute the error signal

$$e_j(n) = d_j(n) - y_j(n)$$

where $d_j(n)$ is the desired response of neuron j and $y_j(n)$ is the output of
neuron j after the forward computation.

Step 4: Compute the local gradient $\delta_j^{(l)}(n)$ for all the neurons
present in the network. Adjust the synaptic weights of the network in
layer $l$ according to the generalized delta rule

$$w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \eta\,\delta_j^{(l)}(n)\,y_i^{(l-1)}(n)$$

where $\eta$ is the learning-rate parameter.

Step 5: Repeat step 3 and step 4, i.e. the forward and backward computations, by
presenting new epochs of training examples until the stopping criterion is
met.
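Steps 1 to 5 can be condensed into a small sketch. This is a generic two-layer sigmoid network trained sample by sample with the delta rule, not the project's forecasting network; the layer sizes, learning rate and epoch count are arbitrary choices for illustration.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_backprop(samples, n_hidden=3, eta=0.5, epochs=2000, seed=1):
    """Steps 1-5 above for a network with one hidden layer:
    random initialization, then repeated forward/backward passes."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: small random weights; the last entry of each row is the bias.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
          for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1])
             for row in W1]
        y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + W2[-1])
        return h, y

    for _ in range(epochs):                      # Steps 2 and 5: epochs
        for x, d in samples:
            h, y = forward(x)                    # Step 3: forward pass
            e = d - y                            # error signal e_j(n)
            delta_o = e * y * (1.0 - y)          # Step 4: local gradient
            for j, hj in enumerate(h):
                delta_h = hj * (1.0 - hj) * delta_o * W2[j]
                for i, xi in enumerate(x):       # generalized delta rule
                    W1[j][i] += eta * delta_h * xi
                W1[j][-1] += eta * delta_h
            for j, hj in enumerate(h):
                W2[j] += eta * delta_o * hj
            W2[-1] += eta * delta_o
    return lambda x: forward(x)[1]
```

Training on a simple linearly separable target (logical OR) converges quickly under these settings.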

The following training algorithms, which are variants of the backpropagation
algorithm, are used for training. The variants differ in how the computed
gradient is used to update the weights. The update rule for each algorithm is
explained below.

3.4.1 Gradient Descent Backpropagation
Gradient Descent Backpropagation is a network training function that
updates weight and bias values according to gradient descent. It can train any
network as long as its weight, net input, and transfer functions have derivative
functions.

Backpropagation is used to calculate derivatives of the performance function
with respect to the weight and bias variables X. Each variable is adjusted
according to gradient descent

dX = lr*dperf/dX

3.4.2 Gradient descent with momentum backpropagation
This training algorithm provides faster convergence than the previous
algorithm. Momentum allows a network to respond not only to the local gradient,
but also to recent trends in the error surface.
Backpropagation is used to calculate derivatives of performance function with
respect to the weight and bias variables. Each variable is adjusted according to
gradient descent with momentum

dX = mc*dXprev + lr*(1-mc)*dperf/dX
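As an elementwise illustration (not the toolbox implementation), the momentum update can be written as:

```python
def momentum_step(dX_prev, grad, lr=0.01, mc=0.9):
    """Gradient descent with momentum: the new step blends the previous
    step (weighted mc) with the current gradient term (weighted lr*(1-mc)),
    mirroring dX = mc*dXprev + lr*(1-mc)*dperf/dX."""
    return [mc * dp + lr * (1.0 - mc) * g for dp, g in zip(dX_prev, grad)]
```

Because the previous step carries over, a consistent gradient direction accumulates speed, while short-lived sign flips are damped.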

3.4.3 Gradient Descent with Adaptive learning rate
The performance of the gradient descent algorithm can be improved if the learning
rate is allowed to change during the training process. An adaptive learning rate
attempts to keep the learning step size as large as possible while keeping the
learning stable. This can train any network as long as its weight, net input, and
transfer functions have derivative functions.

Backpropagation is used to calculate derivatives of the performance function
with respect to the weight and bias variables. Each variable is adjusted
according to gradient descent

dX = lr*dperf/dX

where the learning rate lr is itself adapted during training.
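One common adaptation rule, sketched here with assumed parameter names and default values (lr_inc, lr_dec, max_perf_inc), grows the rate while the error falls and shrinks it, discarding the step, when the error rises too much:

```python
def adapt_learning_rate(lr, new_perf, old_perf,
                        lr_inc=1.05, lr_dec=0.7, max_perf_inc=1.04):
    """Return the updated learning rate and whether to keep the step.
    Parameter names and defaults are illustrative assumptions."""
    if new_perf > old_perf * max_perf_inc:
        return lr * lr_dec, False   # error rose too much: reject the step
    if new_perf < old_perf:
        return lr * lr_inc, True    # error fell: accept and speed up
    return lr, True                 # otherwise accept, keep the rate
```

This keeps the step size near the largest value at which training remains stable.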

3.4.4 Gradient descent with momentum and adaptive learning rate
backpropagation
This algorithm combines the adaptive learning rate with momentum training.

Backpropagation is used to calculate derivatives of the performance function
with respect to the weight and bias variables. Each variable is adjusted
according to gradient descent with momentum and an adaptive learning rate

dX = mc*dXprev + lr*mc*dperf/dX
3.4.5 Resilient Backpropagation
The purpose of Resilient Backpropagation is to eliminate the harmful effects
of the magnitudes of the partial derivatives. Only the sign of the derivative is used
to determine the direction of the weight update.

Backpropagation is used to calculate derivatives of performance function
with respect to the weight and bias variables. Each variable is adjusted according
to the following

dX = deltaX.*sign(gX);

where the elements of deltaX are all initialized to del0, and gX is the
gradient. In each iteration the elements of deltaX are modified in the following
manner.

If an element of gX changes sign in successive iterations, then the
corresponding element of deltaX is decreased by deltadec. If an element of gX
maintains the same sign in successive iterations, then the corresponding element of
deltaX is increased by deltainc.
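The per-element rule can be sketched as follows. This takes the step dX = deltaX*sign(gX) from above; applying it against the gradient is left to the caller, and the deltainc/deltadec defaults are illustrative assumptions.

```python
def rprop_element(delta, g, g_prev, delta_inc=1.2, delta_dec=0.5):
    """One Rprop element: the step size delta grows while the gradient
    keeps its sign and shrinks when it flips; only sign(g) enters the
    returned step dX = delta * sign(g)."""
    if g * g_prev > 0:
        delta *= delta_inc          # same sign: take a larger step
    elif g * g_prev < 0:
        delta *= delta_dec          # sign change: overshot, back off
    sign = (g > 0) - (g < 0)
    return delta * sign, delta
```

Because the gradient magnitude never enters, tiny gradients on flat sigmoid tails do not stall training.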

3.4.6 Conjugate Gradient Algorithm with Fletcher-Reeves Updates
In most of the conjugate gradient algorithms, the step size is adjusted at each
iteration. A search is made along the conjugate gradient direction to determine the
step size that minimizes the performance function along that line.

Each variable is adjusted according to the following:

X = X + a*dX

where dX is the search direction. The parameter a is selected to minimize the
performance along the search direction. The line search function is used to locate
the minimum point. The first search direction is the negative of the gradient of
performance. In succeeding iterations the search direction is computed from the
new gradient and the previous search direction, according to the formula

dX = -gX + dXold*Z

where gX is the gradient. The parameter Z can be computed in several
different ways. For the Fletcher-Reeves variation of conjugate gradient it is
computed according to

Z = normnew2/norm2

where norm2 is the norm square of the previous gradient and normnew2 is
the norm square of the current gradient.
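The Fletcher-Reeves update above, written out for a weight vector (an illustrative sketch):

```python
def fletcher_reeves_direction(gX, dX_old, gX_prev):
    """New search direction dX = -gX + Z*dX_old, with the Fletcher-Reeves
    ratio Z = ||gX||^2 / ||gX_prev||^2 (normnew2/norm2 above)."""
    normnew2 = sum(g * g for g in gX)       # norm square, current gradient
    norm2 = sum(g * g for g in gX_prev)     # norm square, previous gradient
    Z = normnew2 / norm2
    return [-g + Z * d for g, d in zip(gX, dX_old)]
```

The first direction is simply the negative gradient; each later direction mixes in the previous one through Z.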

3.4.7 Conjugate Gradient Algorithm with Polak-Ribiére update
The storage requirements for the Polak-Ribiére update are slightly higher than for
the Fletcher-Reeves update.

Each variable is adjusted according to the following:

X = X + a*dX

where dX is the search direction. The parameter a is selected to minimize the
performance along the search direction. The line search function is used to locate
the minimum point. The first search direction is the negative of the gradient of
performance. In succeeding iterations the search direction is computed from the
new gradient and the previous search direction according to the formula

dX = -gX + dXold*Z

The parameter Z can be computed in several different ways. For the Polak-
Ribiére variation of conjugate gradient, it is computed according to the formula
Z = ((gX - gXold)'*gX)/norm2;

where norm2 is the norm square of the previous gradient, and gXold is the
gradient on the previous iteration.

3.4.8 Conjugate Gradient Algorithm with Powell-Beale Restarts
In this algorithm, each variable is adjusted according to the following:

X = X + a*dX;

where dX is the search direction. The parameter a is selected to minimize the
performance along the search direction. The line search function is used to locate
the minimum point. The first search direction is the negative of the gradient of
performance. In succeeding iterations the search direction is computed from the
new gradient and the previous search direction according to the formula

dX = -gX + dXold*Z;

where gX is the gradient. The parameter Z can be computed in several
different ways. In the Powell-Beale variation, the search direction is
periodically reset to the negative of the gradient; a restart occurs whenever
very little orthogonality remains between the current gradient and the previous
gradient. Between restarts, Z is computed as for the Polak-Ribiére update:

Z = ((gX - gXold)'*gX)/norm2;

where norm2 is the norm square of the previous gradient, and gXold is the
gradient on the previous iteration.

3.4.9 Scaled Conjugate Gradient
The line search used in the other conjugate gradient algorithms is
computationally expensive. The scaled conjugate gradient algorithm is designed
to avoid this line search.
This algorithm can train any network as long as its weight, net input, and
transfer functions have derivative functions. Backpropagation is used to calculate
derivatives of performance with respect to the weight and bias variables X.

3.4.10 BFGS quasi-Newton backpropagation
Newton’s method converges faster than conjugate gradient methods. Each
variable is adjusted according to the following:

X = X + a*dX;

where dX is the search direction. The parameter a is selected to minimize the
performance along the search direction. The line search function is used to locate
the minimum point. The first search direction is the negative of the gradient of
performance. In succeeding iterations the search direction is computed according
to the following formula:

dX = -H\gX;

where gX is the gradient and H is an approximate Hessian matrix.

3.4.11 One Step Secant Algorithm
This algorithm does not compute the full approximate Hessian matrix as the BFGS
algorithm does. Instead, it assumes that at each iteration the previous Hessian
was the identity matrix. Each variable is adjusted according to the following

X = X + a*dX;

where dX is the search direction. The parameter a is selected to minimize the
performance along the search direction. The line search function is used to locate
the minimum point. The first search direction is the negative of the gradient of
performance. In succeeding iterations the search direction is computed from the
new gradient and the previous steps and gradients, according to the following
formula:

dX = -gX + Ac*Xstep + Bc*dgX;

where gX is the gradient, Xstep is the change in the weights on the previous
iteration, and dgX is the change in the gradient from the last iteration.

3.4.12 Levenberg-Marquardt Algorithm
Backpropagation is used to calculate the Jacobian jX of performance
function with respect to the weight and bias variables X. Each variable is adjusted
according to Levenberg-Marquardt,

jj = jX * jX

je = jX * E

dX = - (jj+I*mu) \ je

where E is all errors and I is the identity matrix.

The adaptive value mu is increased by muinc until the change above results in
a reduced performance value. The change is then made to the network and mu is
decreased by mudec.

The parameter mem_reduc indicates how to trade memory for speed when
calculating the Jacobian jX. Higher values continue to decrease the amount of
memory needed but increase training times.
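For a single parameter, the Levenberg-Marquardt step above reduces to one line. This scalar sketch (not the toolbox code) shows how mu interpolates between a Gauss-Newton step (mu near 0) and a small gradient-descent step (large mu):

```python
def lm_step_scalar(jX, E, mu):
    """dX = -(jj + mu) \\ je with jj = J'J and je = J'E,
    for a network with a single adjustable parameter."""
    jj = sum(j * j for j in jX)                 # J'J (a scalar here)
    je = sum(j * e for j, e in zip(jX, E))      # J'E
    return -je / (jj + mu)
```

Increasing mu (by muinc) shrinks the step until performance improves; decreasing it (by mudec) lets the faster Gauss-Newton behavior dominate near a minimum.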

3.4.13 Bayesian Regularization
Bayesian regularization minimizes a linear combination of squared errors
and weights. It also modifies the linear combination so that at the end of training
the resulting network has good generalization qualities.
This Bayesian regularization takes place within the Levenberg-Marquardt
algorithm. Backpropagation is used to calculate the Jacobian jX of performance
function with respect to the weight and bias variables X. Each variable is adjusted
according to Levenberg-Marquardt,

jj = jX * jX

je = jX * E

dX = - (jj+I*mu) \ je

where E is all errors and I is the identity matrix.

The adaptive value mu is increased by muinc until the change shown above
results in a reduced performance value. The change is then made to the network,
and mu is decreased by mudec.

The parameter mem_reduc indicates how to trade memory for speed when calculating
the Jacobian jX. If mem_reduc is 1, then the Levenberg-Marquardt algorithm runs the
fastest, but it can require a lot of memory. Increasing mem_reduc to 2 cuts some of the
memory required by a factor of two, but slows the Levenberg-Marquardt algorithm
somewhat. Higher values continue to decrease the amount of memory needed and
increase the training times.

3.5 Conclusion
A multilayer neural network is used for the implementation of short-term
load forecasting. The above mentioned training algorithms have been used for the
short-term load forecasting. The implementation of Short-term load forecasting
using Artificial Neural Network is elaborated in the next section.
4   Implementation of Artificial Neural Network for Short-term Load
Forecasting

4.1 Introduction
Load forecasting can be performed using many methods such as regression
analysis, stochastic time series models, neural networks, fuzzy logic and
knowledge-based expert systems. Of all these methods, the neural network stands
out for the implementation of load forecasting. Its ability to model non-linear
relationships and its use of parallel computing make it more favorable
than the other methods. Because of parallel computing, the process is computed at a
faster pace, and a failure in one part of the network is tolerated while the rest of
the computation continues toward the end result. The
implementation of a neural network for the purpose of short-term load forecasting
is discussed in the following sections.

4.2 Collection of Data
Before beginning the network initialization process, you first collect and
prepare sample data. It is generally difficult to incorporate prior knowledge into
a neural network, and therefore the network can only be as accurate as the data that
is used to train it. It is important that the data cover the range of inputs
for which the network will be used. Multilayer networks can be trained to
generalize well within the range of inputs for which they have been trained.
However, they do not have the ability to accurately extrapolate beyond this range,
so it is important that the training data span the full range of the input space. After
the data is collected, there are two steps that need to be performed before the data
are used to train the network: the data need to be preprocessed, and they need to be
divided into subsets. The next two sections describe these two steps.
4.3 Pre-processing and Post-processing of Data
Neural network training can be made more efficient if certain preprocessing
steps are performed on the network inputs and targets. The most common of
these preprocessing techniques are provided automatically when you create a
network, and they become part of the network object, so that whenever the network
is used, the data coming into the network is preprocessed in the same way.

For example, in multilayer networks, sigmoid transfer functions are
generally used in the hidden layers. These functions become essentially saturated
when the net input is greater than three. If this happens at the beginning of the
training process, the gradients will be very small, and the network training will be
very slow. In the first layer of the network, the net input is the product of the input
and the weight, plus the bias. If the input is very large, then the weight must be
very small in order to prevent the transfer function from becoming saturated. For
this reason, it is standard practice to normalize the inputs before applying them
to the network.

Generally, the normalization step is applied to both the input vectors and
the target vectors in the data set. In this way, the network output always falls into
a normalized range. The network output can then be reverse transformed back into
the units of the original target data when the network is put to use in the field. It is
easiest to think of the neural network as having a preprocessing block that appears
between the input and the first layer of the network and a post-processing block that
appears between the last layer of the network and the output, as shown in
Fig. 4.1.

Input → Pre-processing → Neural Network → Post-processing → Output

Fig. 4.1 Processing of data

Normalization is done by using functions such as mapminmax, mapstd,
processpca, fixunknowns and removeconstantrows. Usually the mapminmax function
is preferred on both the input and the output side, though other functions may
also be used. Normalization is done between the limits 0.2 and 0.8 using the
mapminmax function, which is given by the formula

$$x_n = \frac{(y_{\max} - y_{\min})(x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min}$$
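This mapminmax-style scaling can be sketched in Python. It is a simplified stand-in for the MATLAB toolbox function, assuming the limits y_min = 0.2 and y_max = 0.8 used in this project:

```python
def normalize(x, ymin=0.2, ymax=0.8):
    """xn = (ymax - ymin)*(x - xmin)/(xmax - xmin) + ymin, elementwise."""
    xmin, xmax = min(x), max(x)
    return [(ymax - ymin) * (xi - xmin) / (xmax - xmin) + ymin for xi in x]

def denormalize(xn, xmin, xmax, ymin=0.2, ymax=0.8):
    """Reverse transform back into the units of the original data
    (the post-processing block of Fig. 4.1)."""
    return [(v - ymin) * (xmax - xmin) / (ymax - ymin) + xmin for v in xn]
```

The same (xmin, xmax) found on the training data must be reused when normalizing new inputs and reversing network outputs.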

4.4 Division of Data
When training multilayer networks, the general practice is to first divide the
data into three subsets. The first subset is the training set, which is used for
computing the gradient and updating the network weights and biases. The second
subset is the validation set. The error on the validation set is monitored during the
training process. The validation error normally decreases during the initial phase of
training, as does the training set error. However, when the network begins to
overfit the data, the error on the validation set typically begins to rise. The network
weights and biases are saved at the minimum of the validation set error. The test
set error is not used during training, but it is used to compare different models. It is
also useful to plot the test set error during the training process. If the error on the
test set reaches a minimum at a significantly different iteration number than the
validation set error, this might indicate a poor division of the data set.

There are four functions provided for dividing data into training, validation
and test sets. They are dividerand (divides the data randomly), divideblock (divides
the data into contiguous blocks), divideint (divides data into an interleaved
selection), and divideind (divides the data by index). The data division is normally
performed automatically when you train the network.

In our project, dividerand is used for the division of data. When
net.divideFcn is set to 'dividerand' (the default), the data is randomly divided into
the three subsets using the division parameters net.divideParam.trainRatio,
net.divideParam.valRatio and net.divideParam.testRatio. The fraction of data that
is placed in the training set is trainRatio/(trainRatio+valRatio+testRatio), with
a similar formula for the other two sets. The default ratios for training, testing and
validation are 0.7, 0.15 and 0.15 respectively.
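The effect of dividerand can be mimicked in a few lines (a sketch, not the toolbox function): shuffle the sample indices, then cut them according to the three ratios.

```python
import random

def divide_rand(n, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=0):
    """Randomly split indices 0..n-1 into training/validation/test subsets
    in proportion trainRatio : valRatio : testRatio."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = train_ratio + val_ratio + test_ratio
    n_train = round(n * train_ratio / total)
    n_val = round(n * val_ratio / total)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Each index lands in exactly one subset, so the training, validation and test sets never overlap.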

4.5 Creation and Initialization of the Network

4.5.1 Creation of Network
After the collection and preparation of data, the next step is to create the
network object. To create a custom network, start with an empty network and set
its properties as desired:

net = network;

The above statement creates an empty network whose properties are then
modified as follows. The first two properties to set are the number of
inputs and the number of layers the network needs. net.numInputs and
net.numLayers allow the user to set these parameters respectively.

net.numInputs = 1;

net.numLayers = 2;
Now the network will have one input and two layers beyond the input. We
next designate the number of neurons in each layer. The input layer
will have eight neurons, which is equal to the number of input variables.
Initially the number of neurons in the hidden layer is assigned to be one. This is
done by the following statements:

net.inputs{1}.size = 8;

net.layers{1}.size = 1;

net.layers{2}.size = 1;

The hidden layer is the first of the two layers mentioned and the output layer
will form the second and the last layer. Next step involves the connection of layers.
The inputs should be connected to the input layer. This is done by the command

net.inputConnect(1) = 1;

Similarly, the output connection is established by net.outputConnect(i), where
i refers to the ith layer. The connection between the hidden layer and the
output layer is given by net.layerConnect(j,i), where the output of the ith
layer is connected to the jth layer. These two connections are expressed as
follows.

net.outputConnect(2) = 1;

net.layerConnect(2,1) = 1;

This concludes the creation of the network and the modification of the network
parameters.
4.5.2 Initialization of Network
Before training a feedforward network, we must initialize the weights and
biases. The init function automatically initializes the weights, but you might
want to reinitialize them. It takes a network object as input and returns a
network object with all weights and biases initialized. Here is how a network
is initialized (or reinitialized):

net.biasConnect = [1;1];

net = init(net);

The first statement indicates the attachment of biases to both the layers of the
network. The network is now initialized with random weights and biases and is
ready to be trained.
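For illustration, a comparable 8-input network with one hidden neuron can be mocked up outside the toolbox. The following Python sketch (the field names IW, LW, b1, b2 are loose analogues of toolbox properties, not its internals) builds and randomly initializes such a structure:

```python
import random

def create_and_init(n_inputs=8, n_hidden=1, n_outputs=1, seed=0):
    """Build a feedforward weight/bias structure and fill it with small
    random values (loose analogue of net = network; ...; net = init(net))."""
    rng = random.Random(seed)
    rand_matrix = lambda rows, cols: [[rng.uniform(-1, 1) for _ in range(cols)]
                                      for _ in range(rows)]
    return {
        "IW": rand_matrix(n_hidden, n_inputs),                 # input-to-hidden weights
        "LW": rand_matrix(n_outputs, n_hidden),                # hidden-to-output weights
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],   # hidden-layer biases
        "b2": [rng.uniform(-1, 1) for _ in range(n_outputs)],  # output-layer biases
    }

net = create_and_init()
print(len(net["IW"]), len(net["IW"][0]))  # 1 hidden neuron, 8 input connections
```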

4.6 Training the Network
Once the network weights and biases are initialized, the network is ready for
training. The multilayer feedforward network can be trained for function
approximation (nonlinear regression) or pattern recognition. The training process
requires a set of examples of proper network behavior—network inputs p and
target outputs t.

The process of training a neural network involves tuning the values of the
weights and biases of the network to optimize network performance, as defined by
the network performance function net.performFcn. The default performance
function for feedforward networks is the mean square error (mse), but here we
use the mean absolute error (mae) performance function.
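The difference between the two performance measures can be seen in a small Python sketch (not the toolbox's performFcn implementation):

```python
def mse(errors):
    """Mean square error: penalizes large errors quadratically."""
    return sum(e * e for e in errors) / len(errors)

def mae(errors):
    """Mean absolute error: penalizes all errors linearly."""
    return sum(abs(e) for e in errors) / len(errors)

errs = [0.1, -0.2, 0.3, -0.4]
print(round(mse(errs), 6), round(mae(errs), 6))  # 0.075 0.25
```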

There are two different ways in which training can be implemented:
incremental mode and batch mode. In incremental mode, the gradient is computed
and the weights are updated after each input is applied to the network. In batch
mode, all the inputs in the training set are applied to the network before the
weights are updated.
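A minimal Python sketch of the two modes, using a one-parameter linear model as a stand-in for the network, might look like:

```python
def grad(w, x, t):
    """Gradient of the squared error 0.5 * (w*x - t)**2 with respect to w."""
    return (w * x - t) * x

def train_incremental(w, samples, lr):
    """Incremental mode: update w after each (input, target) pair."""
    for x, t in samples:
        w -= lr * grad(w, x, t)
    return w

def train_batch(w, samples, lr):
    """Batch mode: accumulate the gradient over all pairs, then update once."""
    g = sum(grad(w, x, t) for x, t in samples) / len(samples)
    return w - lr * g

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # target relation t = 2x
w_batch = w_inc = 0.0
for _ in range(200):  # one pass over the training set per iteration
    w_batch = train_batch(w_batch, samples, lr=0.1)
    w_inc = train_incremental(w_inc, samples, lr=0.05)
print(round(w_batch, 4), round(w_inc, 4))  # both converge to 2.0
```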

For training multilayer feedforward networks, any standard numerical
optimization algorithm can be used to optimize the performance function, but
there are a few key ones that have shown excellent performance for neural
network training. These optimization methods use either the gradient of the
network performance with respect to the network weights, or the Jacobian of the
network errors with respect to the weights.

The gradient and the Jacobian are calculated using a technique called the
backpropagation algorithm, which involves performing computations backward
through the network. The backpropagation training algorithm and its variants,
such as the Levenberg-Marquardt, quasi-Newton, conjugate gradient and scaled
conjugate gradient methods, are used for training. These algorithms have
already been discussed in detail in the previous chapter.

4.7 Conclusion
This chapter describes the need of Artificial Neural Network for the
implementation of Short-term Load Forecasting and also puts forth the key aspects
that are to be considered during the implementation of Artificial Neural Network.
The Algorithm that is being proposed for Short-term Load Forecasting using
Artificial Neural Network is discussed in detail in the next section.
6 Development of Algorithm for Artificial Neural Network based Short-term Load Forecasting

6.1 Introduction

An algorithm is developed to overcome the random selection of the training
algorithm and of the number of neurons in the hidden layer, a problem faced by
the traditional methods mentioned in the previous sections. The next two
paragraphs discuss the need for concentrating on the training algorithm and the
hidden layer size.

There are numerous training algorithms that are variants of the
backpropagation algorithm, and each performs best under different conditions.
Picking a training algorithm at random is questionable, as another algorithm
may provide better results on the same dataset. A systematic selection of the
training algorithm is therefore necessary to remove this ambiguity from the
process.

The number of neurons in the hidden layer also affects the performance of the
network. A randomly chosen number merely provides a result, with no assurance
that it is optimal. The hidden layer should not have too few neurons, as that
would prevent the network from generalizing the nonlinear relationship; too
many neurons, on the other hand, increase the computational time and make the
process ineffective.

The proposed algorithm is aimed at achieving optimal results for short-term
load forecasting, keeping in mind the computational efficiency of the entire
process. These two attributes of the network are considered in the algorithm for
providing improvement in the performance of the neural network.
The development of the algorithm is a two-step process. The first step involves
training and simulating the network for the various training algorithms and
neuron counts, which yields the performance of each training algorithm at each
hidden layer size for the same dataset. The second step uses the selected
training algorithm and the optimal number of neurons to obtain the short-term
load forecast.
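The first, exploratory step can be sketched in Python as follows (the fake_mape lambda is a hypothetical stand-in for "train and simulate, then compute MAPE"; the real implementation trains the toolbox network):

```python
def select_best(algorithms, neuron_range, evaluate):
    """Step 1: evaluate the MAPE for every (training algorithm, hidden
    layer size) pair and return the combination with the minimum MAPE."""
    best = None
    for algo in algorithms:
        for n in neuron_range:
            mape = evaluate(algo, n)
            if best is None or mape < best[0]:
                best = (mape, algo, n)
    return best

# Hypothetical stand-in for "train and simulate, then compute MAPE".
fake_mape = lambda algo, n: ({"trainrp": 3.5, "trainbr": 3.2}.get(algo, 4.0)
                             + 0.01 * abs(n - 22))
best_mape, btf, n_opt = select_best(["traingd", "trainrp", "trainbr"],
                                    range(8, 41), fake_mape)
print(btf, n_opt)  # Step 2 retrains with this pair to produce the forecast
```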

6.2 Algorithm

Step 1: Start

Step 2: Enter the number of input variables “a”

Step 3: Initialize i = 1

Step 4: Read the data of the input variable

Step 5: If i = a, go to step 6 else increment i by one and go to step 4

Step 6: Initialize “s” as the number of values in each variable

Step 7: Initialize i = 1

Step 8: Initialize j = 1

Step 9: Calculate the change in demand using the formula

∆P^d = (P^(d−1) − P^(d−2)) / P^(d−2)

Step 10: Normalize ∆P^d between 0.2 and 0.8 using the formula

y = (y_max − y_min) × (x − x_min) / (x_max − x_min) + y_min

Step 11: If j = s, go to step 12 else increment j by 1 and go to step 9
Step 12: Increment i by 1

Step 13: Initialize j = 1

Step 14: Calculate the change in the input variable using the formula

∆X_d = X_d − X_(d−1)

Step 15: Normalize ∆X_d between 0.2 and 0.8 using the formula

y = (y_max − y_min) × (x − x_min) / (x_max − x_min) + y_min

Step 16: If j = s, go to next step else increment j by 1 and go to step 14

Step 17: If i = a, go to next step else increment i by 1 and go to step 13

Step 18: Set layer {1}.size = a

Step 19: Set TrainFcn = 1

Step 20: Create the network and initialize weights and biases randomly

Step 21: Initialize n = 1

Step 22: Train and simulate the network

Step 23: Calculate the Mean Absolute Percentage Error (MAPE) using the formula

MAPE = (1/n) × Σ |P_actual − P_predicted| / P_actual × 100 %

Step 24: If n = 50, go to step 26 else go to next step.

Step 25: If n = 5*a, go to next step, else increment n by 1 and go to step 22

Step 26: Store the MAPE calculated
Step 27: If TrainFcn = 13 go to next step, else increment TrainFcn by 1 and go to
step 20

Step 28: Display BTF = the training algorithm corresponding to the minimum MAPE
and N = the number of neurons corresponding to the minimum MAPE

Step 29: Set TrainFcn = BTF

Step 30: Set Neurons = N

Step 31: Train and simulate the network

Step 32: Print MAPE and the predicted value

Step 33: Stop
6.3 Flowchart

[Flowchart of the proposed algorithm: entry of the input data and the two
preprocessing loops (change in demand, change in each input variable, and
normalization between 0.2 and 0.8), creation of the network with random weights
and biases, training and simulation over hidden layer sizes n = 1 to 5*a
(capped at 50) for each of the 13 training algorithms, storage of the MAPE,
selection of the best training function BTF and neuron count N, and a final
training and simulation run that displays the MAPE and the predicted value.]
5   Test Case and Analysis of Results

5.1 Introduction
The implementation of the proposed algorithm is based on the load data obtained
from the Southern Regional Load Dispatch Centre (SRLDC) and the weather data
obtained from Wunderground. The data used and the analysis of the results
obtained from the implementation of the neural network are discussed in detail
in this chapter.

5.2 Test Data
To implement load forecasting using an artificial neural network, a sufficient
amount of data bearing some relationship to the final output, the load demand
in this case, is needed to train the network. The peak load demand data for the
state of Tamil Nadu was obtained from the Southern Regional Load Dispatch
Centre (SRLDC) for the years 2010 and 2011. Since additional inputs can improve
the performance of the neural network, weather data comprising temperature,
humidity, dew point and wind speed was obtained from www.wunderground.com. The
peak demand data consisted of one peak demand value per day, while the weather
data included the maximum temperature, minimum temperature, maximum humidity,
minimum humidity, dew point and wind speed.

5.3 Treatment of Data
The data to be sent into the neural network is preprocessed. The peak load
demand is preprocessed using the formula

∆P^d = (P^(d−1) − P^(d−2)) / P^(d−2)

Other data that includes previous week peak demand for the same day
(calculated from peak demand itself) and also the weather information for the day
such as maximum temperature, minimum temperature, maximum humidity,
minimum humidity, dew point and wind speed are preprocessed using the formula

∆X_d = X_d − X_(d−1)

Once the preprocessing of data is done, it should be normalized between
certain limits before they are fed into the neural network. For normalization
mapminmax function is used which involves the formula

y = (y_max − y_min) × (x − x_min) / (x_max − x_min) + y_min
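The preprocessing and normalization steps can be sketched in Python as follows (variable names are illustrative; the actual implementation uses MATLAB's mapminmax):

```python
def relative_change(p):
    """Change in peak demand: (P[d-1] - P[d-2]) / P[d-2] for each day d."""
    return [(p[d - 1] - p[d - 2]) / p[d - 2] for d in range(2, len(p))]

def difference(x):
    """Change in a weather variable: X[d] - X[d-1] for each day d."""
    return [x[d] - x[d - 1] for d in range(1, len(x))]

def mapminmax_like(x, y_min=0.2, y_max=0.8):
    """Scale values into [y_min, y_max] with the min-max formula above."""
    x_min, x_max = min(x), max(x)
    return [(y_max - y_min) * (v - x_min) / (x_max - x_min) + y_min for v in x]

demand = [100.0, 110.0, 99.0, 104.5]          # illustrative daily peak demand
norm = mapminmax_like(relative_change(demand))
print(round(norm[0], 6), round(norm[-1], 6))  # extremes map to 0.8 and 0.2
```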

Normalizing within these limits keeps the input variables away from the
saturated regions of the activation functions. The activation function used on
the input (hidden) side is the logsig transfer function, and the one used on
the output side is the purelin transfer function. These activation functions
have been explained in the previous chapters.
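For reference, the two transfer functions can be written as a short Python sketch:

```python
import math

def logsig(x):
    """Log-sigmoid transfer function: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def purelin(x):
    """Linear transfer function: passes its input through unchanged."""
    return x

print(logsig(0.0), purelin(1.5))  # 0.5 1.5
```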

5.4 Training and Simulation of the Network
After the treatment of data, it is fed into the neural network. The network is
trained with the assigned training algorithm and hidden layer size, and
simulation is performed alongside. The process is repeated for every training
algorithm as the size of the hidden layer is varied from a to 5a, where a is
the number of input variables. The Mean Absolute Percentage Error (MAPE) is
calculated at each simulation and the results are stored.

The training algorithm that yields the minimum MAPE is selected as the best
training function (BTF), and the corresponding number of neurons is selected as
the optimal hidden layer size. Training and simulation are then performed again
with these selected parameters.
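The MAPE measure used throughout can be expressed as a small Python sketch of the formula given earlier (the sample values are illustrative):

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error:
    (1/n) * sum(|actual - predicted| / actual) * 100."""
    n = len(actual)
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) * 100.0 / n

actual = [100.0, 200.0, 400.0]    # illustrative demand values
predicted = [90.0, 210.0, 400.0]
print(round(mape(actual, predicted), 4))  # 5.0
```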
[Fig 7.1 block diagram: eight inputs — peak demand, previous week demand,
maximum temperature, minimum temperature, maximum humidity, minimum humidity,
dew point and wind speed — pass through a pre-processing stage into the
artificial neural network, whose output passes through a post-processing
stage.]

Fig 7.1 Implementation of Artificial Neural Network

5.5 Results
Mean Absolute Percentage Error is taken as the measure of accuracy, and a
tabulation of the MAPE for all the training algorithms, for neuron counts
ranging from 8 to 40, is shown in Tables 7.1(a) and (b).

The network performs differently for each training algorithm and arrives at
different results. The variation of the MAPE for each is shown graphically in
Fig. 7.2. Of the 13 algorithms implemented in the neural network, Bayesian
Regularization (trainbr) produced the best results for this particular dataset.
The results obtained are shown below.

Training algorithm: Bayesian Regularization (trainbr)

Number of neurons: 22

MAPE: 3.1889
Neurons   traingd traingda traingdm traingdx traincgb traincgp
8   3.5566 3.6142       3.6446 3.6542 4.2308 3.5538
9   3.6765 3.6773       4.9320 3.6950 3.7074 3.6926
10   3.3942 3.4636       3.5834 3.4933 3.4077 3.3898
11   3.7281 3.9953       4.4414 3.7557 3.7298 3.7357
12   3.6878 3.7062       4.3873 3.7035 3.7120 3.6050
13   3.6011 3.6626       3.6806 3.6246 3.4646 3.6141
14   3.7652 3.8461       4.1154 3.7971 3.8231 3.7647
15   4.1631 4.6565       4.5875 4.4307 4.6440 3.6785
16   3.8235 3.8845       4.1297 3.8503 3.8665 3.8031
17   4.3082 4.7334       5.7101 4.3180 4.3517 3.8279
18   4.0229 4.1954       4.8219 4.0568 3.7689 3.5565
19   4.3741 4.7705       4.7745 4.4445 3.6634 4.1703
20   3.7945 3.9384       5.7914 3.8190 3.6428 3.7795
21   4.1720 3.9922       4.0701 3.9848 4.2744 3.8421
22   4.1954 4.1806       5.2781 4.1830 3.7481 3.5589
23   4.8444 4.6714       5.2312 4.6697 3.6607 4.7133
24   4.4193 4.0606       4.2972 3.9868 3.9832 3.5944
25   4.9920 4.2516       4.6786 4.2094 4.5154 3.8038
26   3.9280 4.2385       4.4956 3.9376 3.8658 3.8714
27   4.5996 4.0278       5.9912 3.9121 3.8988 3.8919
28   4.8294 4.1418       4.6231 3.8996 4.4652 3.9469
29   5.1908 4.7105       5.1845 4.4469 4.3724 4.3734
30   4.9180 4.8041       5.3129 4.3819 3.7792 4.0353
31   5.2324 4.6641       4.6601 4.2932 3.9074 3.7326
32   4.8248 4.6545       4.5299 4.4363 3.9348 5.1736
33   4.6442 3.9915       5.5154 3.9617 4.6660 3.9471
34   4.4819 5.3709       5.8031 4.4482 4.4374 4.0862
35   4.8206 4.3382       4.4090 4.0333 3.7811 3.8015
36   5.9694 5.0785       4.9501 4.8601 5.0868 4.9670
37   5.1436 4.5973       5.2020 4.3688 3.6085 4.3352
38   4.6616 4.4122       4.6985 4.2381 3.8670 5.1917
39   4.9155 4.3986       4.9893 4.4027 4.6731 3.7218
40   5.5223 4.5349       4.6636 4.4342 3.7643 4.3519

Table 7.1(a) Variation of MAPE with the number of neurons
Neurons traincgf trainscg trainbfg trainoss trainrp trainbr trainlm
8 4.7305 3.5883 3.4412 3.5584 3.5222 3.3565 3.4780
9 3.9929 3.7028 3.7031 3.6962 3.6777 3.4012 3.5798
10 3.3918 3.3974 3.6649 3.3967 3.6146 3.3516 3.5709
11 3.8604 3.7508 3.7455 3.7815 3.5758 3.3224 3.5955
12 4.0028 3.6876 3.6912 3.6802 3.5802 3.2661 3.5821
13 3.8865 3.6096 3.5603 3.6519 3.4322 3.3417 3.5565
14 3.7632 3.8586 3.9183 3.7674 3.5679 3.2754 3.4304
15 5.5560 4.4004 3.6911 3.8419 3.6346 3.3654 3.7166
16 3.7545 3.8280 3.7086 3.8071 4.0016 3.3028 3.3985
17 3.8708 4.3831 3.9054 4.1713 3.9973 3.3168 3.6861
18 4.0418 3.8278 3.7321 3.6377 3.5565 3.2964 3.4083
19 4.1743 4.4441 3.5869 3.6464 3.5954 3.3501 3.5942
20 3.7802 3.7764 3.7031 3.7956 4.0947 3.3865 3.4476
21 3.8287 3.8554 3.7748 3.9916 4.2762 3.3504 3.6972
22 3.6790 3.9917 3.6842 3.6223 3.7509 3.1889 3.6890
23 4.6951 4.6795 3.8679 4.4104 3.6749 3.3988 3.7893
24 3.9901 3.9931 3.6208 3.9717 3.4682 3.3668 3.4117
25 3.4732 4.1719 3.6072 4.2780 4.2328 3.3750 4.1287
26 3.8884 3.8687 3.9019 3.8548 3.9721 3.2984 3.7376
27 3.8959 3.8949 3.7717 3.8754 3.4286 3.5958 3.3967
28 4.0499 3.8996 3.7976 3.8311 3.6527 3.3411 3.9200
29 4.3707 4.3106 3.9719 4.3626 3.7276 3.3721 3.6024
30 4.5084 4.1593 3.8026 3.8609 4.0198 3.3363 3.9529
31 5.3186 3.8768 3.7899 3.8667 3.5337 3.3318 3.4641
32 4.4387 4.4367 4.3636 4.4299 3.5577 3.3407 3.5315
33 3.9484 4.0093 3.6347 4.1600 3.8811 3.4154 3.8207
34 3.4397 4.3936 3.6877 4.5409 4.2813 3.3603 4.0805
35 4.3266 4.0263 3.7373 3.8333 3.8174 3.2435 3.7092
36 5.5198 4.9339 3.7556 4.1364 4.1922 3.3147 4.0773
37 4.3309 4.3346 3.7799 3.5802 3.6243 3.2823 3.6581
38 5.6628 3.6954 3.6069 4.1825 4.5596 3.4827 3.5465
39 3.7005 4.3383 3.8234 4.3441 3.9268 3.3728 4.4555
40 4.4426 4.3850 4.0083 3.9271 3.8938 3.4401 3.9163

Table 7.1(b) Variation of MAPE with the number of neurons
[Fig. 7.2: line plot of % MAPE (y-axis, roughly 3 to 6) against the number of
neurons (x-axis, 8 to 40) for all 13 training algorithms: traingd, traingda,
traingdm, traingdx, traincgb, traincgp, traincgf, trainscg, trainbfg, trainoss,
trainrp, trainbr and trainlm.]

Fig. 7.2 Variation of MAPE with the number of neurons

Fig. 7.3 Actual Demand vs. Predicted Demand
5.6 Conclusion
The implementation of the neural network for short-term load forecasting was
carried out with 13 training algorithms, and the results show that, for this
particular dataset, trainbr (Bayesian Regularization) proves to be the best
training algorithm, providing the lowest MAPE of all the algorithms tested.

```