DATA CLASSIFICATION USING NEURAL NETWORKS

					LARGE DATA CLASSIFICATION USING
            NEURAL NETWORKS
                           A

                    MINI PROJECT

                    SUBMITTED TO

     COMPUTER SCIENCE DEPARTMENT OF THE
     UNIVERSITY OF AGRICULTURE, ABEOKUTA

                          BY

ADELANI DAVID IFEOLUWA, 06/1166

AIGBERUA TOBI DEBORAH, 06/1172

IWOBHO AKHAZE ANTHONY, 06/1199

OWODUNNI ELIAS ADEFARASIN, 06/1223


COURSE: CSC 328 (COMPUTER APPLICATIONS)




                    SUPERVISED BY:

                DR ADEWOLE PHILIPS.
ABSTRACT

A three-layer artificial neural network (ANN) model with the back-propagation (BP) algorithm
was used to classify the customers of a German automobile company into various categories. The
classification covers three divisions of the company, located in Germany, South Africa and the
Maldives. The primary aim was to divide the customers into three categories, namely good,
average and below average, on the basis of invoicing data; such a classification is an important
component of data mining. The classification of data using neural networks takes the customers'
day-to-day invoicing data as its base. Intelligent data is obtained from raw data through a process
of data cleaning and relevance analysis. Extraction of the data depends on a number of criteria,
such as the customers who order the largest invoicing quantity in each of the three source
systems. The intelligent data then undergoes conditioning, averaging, preparation and
normalization. Normalization makes the data suitable for use in a three-layer feed-forward ANN
trained with the back-propagation algorithm. Over a number of iterations of the "supervised"
input/output training pairs, the ANN learns to master the classification of the customer data. The
error in each iteration is fed back to adjust the weights in the previous layer, which makes the
network an accurate classifier. The ANN uses different learning-rate annealing schedules,
various numbers of nodes in the hidden layer and different activation functions, which not only
provide various rates of error convergence but also measure the confidence and support used in
data mining to predict the classification of a new customer of the company.
TABLE OF CONTENTS

1.0   INTRODUCTION

2.0   LITERATURE REVIEW

3.0   RESEARCH METHOD

4.0   RESULTS AND DISCUSSION

5.0   CONCLUSION


      REFERENCES
1.0                                  INTRODUCTION

An Artificial Neural Network (ANN) is an information processing paradigm (a model that forms
the basis for a theory) inspired by the way biological nervous systems, such as the brain, process
information. The key element of this paradigm is the novel structure of the information
processing system: a large number of highly interconnected processing elements (neurons)
working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is
configured for a specific application, such as pattern recognition or data classification, through a
learning process. Each neuron basically consists of inputs (like synapses), which are multiplied
by weights (the strength of the respective signals) and then passed through a mathematical
function that determines the activation of the neuron. Depending on the weights, the
computation of the neuron will differ. By adjusting the weights of an artificial neuron we can
obtain the output we want for specific inputs. But when an ANN has hundreds or thousands of
neurons, it would be quite complicated to find all the necessary weights by hand. This process of
adjusting the weights is called learning or training. Neural networks can be categorized into
single-layer neural networks (just one layer of neurons, the output layer, with no hidden layers)
and multilayer neural networks (two or more layers of neurons, e.g. one hidden and one output
layer).
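As a minimal illustration of the weighted-sum-plus-activation computation described above, consider the following sketch (Python for illustration only; the inputs, weights and bias are made up, and this is not the MATLAB implementation used later in this work):

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of the inputs,
    then a logistic (sigmoid) activation function."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# two inputs with hypothetical weights; adjusting the weights changes the output
out = neuron([0.5, 0.2], [0.8, -0.4], 0.1)
```

Changing any weight changes `out`, which is exactly what training exploits: the weights are adjusted until the neuron produces the desired output for the training inputs.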




There are numerous examples of commercial applications of neural networks, including fraud
detection, telecommunications, medicine, marketing, bankruptcy prediction, insurance and data
classification, among others. Neural networks are used in these applications because of their
advantages: high accuracy, noise tolerance, independence from prior assumptions, ease of
maintenance, the ability to overcome some limitations of other statistical methods while
generalizing them, and performance that can be highly automated, minimizing human
involvement.


        Data classification is the categorization of data for its most effective and efficient use. In
a basic approach to storing computer data, data can be classified according to its critical value or
how often it needs to be accessed. This kind of classification tends to optimize the use of data
storage for multiple purposes: technical, administrative, legal and economic. Data can be
classified according to any criteria, not only relative importance or frequency of use. Computer
programs exist that can help with data classification. Neural networks can classify data
efficiently using the back-propagation algorithm, which trains the network to classify data
effectively.

      In this work, the customers of the company are classified into three categories, namely
good, average and below average. Three locations were considered for the sales invoice data:
Germany (VSET), South Africa (SAFRI) and the Maldives (MAVSI). The data is recorded in
three dimensions: the quantity ordered by the customer (invoice quantity), the net sales value (in
currency) of the product, and the time (date, month and year) at which the transaction through
invoices takes place. Data cleaning is performed by collecting a sufficient amount of data from
only three of the many available locations, which leads to relevance analysis of the data. The
extracted data undergoes normalization and data transformation. The transformed data is then
presented to a Multi Layer Feed Forward Neural Network (MLFFNN) over a set of iterations.
The iterations allow the neural network to master the classification of the customers' invoicing
data within a tolerance limit of the error.




Keywords: Artificial Neural Networks, Large Data Classification, Back Propagation Algorithm,
Data Cleaning, Relevance Analysis, Normalization, Multilayer Feed Forward Neural Network,
MATLAB, Epoch, Mean Square Error, Percent Error, Performance and Confusion Matrix.
2.0                     LITERATURE REVIEW

2.1     PREVIOUS WORKS: A number of initiatives have been implemented in the
classification of data using neural networks. A few are discussed below:

2.1.1   Classification of wine samples by means of artificial neural networks and
discrimination analytical methods, Li-Xian Sun et al (1996). A three-layer artificial neural
network (ANN) model with back-propagation (BP) of error was used to classify wine samples
from six different regions based on measurements of trace amounts of B, V, Mn, Zn, Fe, Al, Cu,
Sr, Ba, Rb, Na, P, Ca, Mg and K obtained with an inductively coupled plasma optical emission
spectrometer (ICP-OES). The ANN architecture and parameters were optimized. The results
obtained with the ANN were compared with those obtained by cluster analysis, principal
component analysis, the Bayes discrimination method and the Fisher discrimination method. A
satisfactory prediction result (100%) was obtained for the classification of the six categories of
wine samples by an artificial neural network using the jackknife leave-one-out procedure.

2.1.2   Fuzzy Neuro Systems for Machine Learning for Large Data Sets, Rahul Kala et al
(2009). Artificial neural networks have found a variety of applications covering almost every
domain, and their increasing use in machine learning has led to a huge amount of research and to
the creation of large data sets used for training. Handwriting recognition, speech recognition,
speaker recognition and face recognition are some of the varied areas of application. Larger
training data sets are a big boon to these systems, as performance gets better and better as the
data set grows; however, a larger training set also drastically increases the training time, and it is
possible that the artificial neural network does not train at all on very large data sets. The paper
proposes a novel way of dealing with these scenarios: a hierarchical model in which the training
data set is first clustered, with each cluster having its own neural network. When an unknown
input is given to the system, the system first finds the cluster to which the input belongs; the
input is then processed by the individual neural network of that cluster. The general structure of
the algorithm is similar to a hybrid system in which fuzzy logic and an artificial neural network
are applied one after the other. The system has huge applications in all the areas where artificial
neural networks are used extensively, since the big databases produced by research over the
years are impractical for a single artificial neural network to train on. The system also takes us
closer to an imitation of the human brain, which has specialized segments for all sorts of
scenarios. To test the proposed system, the authors applied it to a synthetic dataset built from
random inputs and to the problem of face recognition. In both cases they obtained better learning
and higher efficiency, and the time required to train the system was also much less than with the
original single-network structure. This shows the impact of the algorithm.

2.1.3   Classification Study on DNA Microarray with Feedforward Neural Network
Trained by Singular Value Decomposition, Hieu Trung Huynh et al (2009). DNA microarray
is a multiplex technology used in molecular biology and biomedicine. It consists of an arrayed
series of thousands of microscopic spots of DNA oligonucleotides, called features, whose results
must be analyzed by computational methods. Analyzing microarray data using intelligent
computing methods has attracted many researchers in recent years. Several approaches have
been proposed, in which machine-learning-based approaches play an important role for
biomedical research such as gene expression interpretation and classification and prediction for
cancer diagnosis. The paper presents an application of a single hidden-layer feedforward neural
network (SLFN) trained by the singular value decomposition (SVD) approach for DNA
microarray classification; the activation function of the hidden units is 'tansig'. Experimental
results show that the SVD-trained feedforward neural network is simple in both training
procedure and network structure; it has low computational complexity and can produce better
performance with a compact network architecture.
2.1.4   Biological data mining with neural networks: implementation and application of a
flexible decision tree extraction algorithm to genomic problem domains, Antony Browne et
al (2003). In the past, neural networks were viewed as classification and regression systems
whose internal representations were extremely difficult to interpret. It is now becoming apparent
that algorithms can be designed to extract understandable representations from trained neural
networks, enabling them to be used for data mining, i.e. the discovery and explanation of
previously unknown relationships present in data. The paper reviews existing algorithms for
extracting comprehensible representations from neural networks and describes research to
generalize and extend the capabilities of one of these algorithms. The algorithm was generalized
for application to bioinformatics datasets, including the prediction of splice site junctions in
human DNA sequences. Results generated on these datasets are compared with those generated
by a conventional data mining technique (C5) and conclusions are drawn.
2.1.5   A hybrid artificial neural network model for data visualisation, classification, and
clustering, Teh Chee Siong (2006). This thesis presents research into a hybrid Artificial Neural
Network (ANN) model that is able to produce a topology-preserving map, akin to the theoretical
explanation of the brain map, for data visualisation, classification and clustering. The proposed
hybrid ANN model integrates the Self-Organising Map (SOM) and the kernel-based Maximum
Entropy learning rule (kMER) into a unified framework, and is termed SOM-kMER. A series of
empirical studies comprising benchmark and real-world problems is employed to evaluate the
effectiveness of SOM-kMER. The experimental results demonstrate that SOM-kMER achieves a
faster convergence rate than kMER and produces visualisations with fewer dead units than
SOM; it is also able to form an equiprobabilistic map at the end of its learning process. The
research also proposes a variant of SOM-kMER, probabilistic SOM-kMER (pSOM-kMER), for
data classification. The pSOM-kMER model is able to operate in a probabilistic environment
and to implement the principles of statistical decision theory in undertaking classification
problems; in addition to performing classification, a distinctive feature of pSOM-kMER is its
ability to generate visualisations of the underlying data structures. Performance evaluation using
benchmark datasets shows that the results of pSOM-kMER compare favourably with those from
a number of machine learning systems. Based on SOM-kMER, the research further expands
from data classification to data clustering, tackling problems with unlabelled data samples: a
new lattice disentangling monitoring algorithm is coupled with SOM-kMER for density-based
clustering, and the empirical results show that SOM-kMER with the new lattice disentangling
monitoring algorithm accelerates the formation of the topographic map when compared with
kMER. By capitalising on the efficacy of SOM-kMER in data classification and clustering, the
applicability of SOM-kMER (and its variants) to decision support problems is demonstrated.
The results obtained reveal that the proposed approach is able to integrate (i) human knowledge,
experience and/or subjective judgements and (ii) the capability of the computer to process data
and information objectively into a unified framework for undertaking decision-making tasks.
2.2    THE BACK-PROPAGATION ALGORITHM
The following discussion is based on a lecture note (NeuralNet2002.pdf) compiled from Neural
Networks for Pattern Recognition, Bishop Christopher (1995), and Neural Networks in Finance
and Investing, Trippi et al (1996).

We will discuss the backprop algorithm for classification problems. There is a minor adjustment
for prediction problems, where we are trying to predict a continuous numerical value: in that
situation we change the activation function of the output-layer neurons to the identity function,
whose output value equals its input value. (An alternative is to rescale and recenter the logistic
function so that the outputs are approximately linear in the range of dependent-variable values.)
The backprop algorithm cycles through two distinct passes, a forward pass followed by a
backward pass through the layers of the network. The algorithm alternates between these passes
several times as it scans the training data. Typically, the training data has to be scanned several
times before the network "learns" to make good classifications.


Forward Pass: Computation of the outputs of all the neurons in the network
The algorithm starts with the first hidden layer, using as input values the independent variables
of a case (often called an exemplar in the machine learning community) from the training data
set. The neuron outputs are computed for all neurons in the first hidden layer by performing the
relevant sum and activation function evaluations. These outputs are the inputs for neurons in the
second hidden layer. Again the relevant sum and activation function calculations are performed
to compute the outputs of the second-layer neurons. This continues layer by layer until we reach
the output layer and compute its outputs. These output values constitute the neural net's guess at
the value of the dependent variable. If we are using the neural net for classification and we have
c classes, we will have c neuron outputs from the activation functions, and we use the largest
value to determine the net's classification. (If c = 2, we can use just one output node with a
cut-off value to map a numerical output value to one of the two classes.)
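The forward pass described above can be sketched as follows (an illustrative Python sketch with made-up weights and inputs, not the trained network used in this work):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(inputs, weights, biases):
    """One layer: each neuron computes a weighted sum of all the
    layer's inputs followed by the sigmoid activation."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

def forward_pass(x, layers):
    """Propagate one exemplar layer by layer; the outputs of each
    layer become the inputs of the next."""
    for weights, biases in layers:
        x = layer_forward(x, weights, biases)
    return x

# a hypothetical 2-input, 2-hidden, 2-output network (c = 2 classes)
layers = [
    ([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1]),   # hidden layer
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),   # output layer
]
outputs = forward_pass([0.9, 0.4], layers)
predicted_class = outputs.index(max(outputs))   # largest output decides the class
```

The final line implements the rule stated above: with c classes and c output neurons, the class with the largest output is taken as the net's classification.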
3.0                           RESEARCH METHOD

USING NEURAL NETWORKS TO CLASSIFY LARGE DATA

3.1    DATA CLEANING
Data cleaning is the process of removing noise (erroneous data) from the normal data,
identifying the outliers in the data and finally producing data that is consistent and sufficient for
analysis. A large quantity of data for analysis purposes brings the possibility of a large number
of errors on account of noise and outliers. The large data set was simplified by considering
customer invoicing data from three locations: Germany, South Africa and the Maldives. This
choice provided data that was not only sufficient for analysis but could also be cured of noise
and outliers. Figure 1 below shows how this reduction was achieved with the help of simple
SQL statements.




       The SQL query used for this data cleaning operation can be framed as follows:
SELECT * INTO INV_TAB FROM INVOICE_SUMMARY_FILE WHERE SS_CODE IN
('SAFRI', 'MAVSI', 'VSET')


3.2    RELEVANCE ANALYSIS AND DATA SELECTION
Relevance analysis was performed on the cleaned data. The core theme of this analysis lies in
the fact that the cleaned data was initially recorded under three different dimensions,
representing the locations from which the data was obtained. These three dimensions were used
to frame five potential questions for the relevance analysis and data selection of the customer
invoicing data. The five questions were:
1. Customers who make the maximum number of invoices in each of the three source systems.
2. Customers who make the maximum consolidated purchase in terms of net sales value (in some
currency) in each of the three source systems.
3. Customers who order the maximum amount of invoicing quantity in each of the three source
systems.
4. Customers who have made the maximum number of invoices in the past two months in each
of the three source systems.
5. Customers who have the maximum amount (in some currency) spent on a single product or
part in each of the three source systems.


3.3     DATA TRANSFORMATION (NORMALIZATION) AND AVERAGING
The relevant data extracted above for the top 15 customers of the three source systems, based on
the five criteria mentioned, serves well as the "Training Set" for our MLFFNN (Multi-layer Feed
Forward Neural Network), but on inspecting the data it was found that there is no control over
the units. The solution to this problem of comparison lies in data transformation (normalization)
and, further, in the concept of averaging. For example, in the source system VSET the top
customer, number 4100, has a net sales value of 167206.96 EURO, whereas the top customer in
the MAVSI source system, number I669625C, has a net sales value of 6140 EURO, making the
two very difficult to compare. Thus there is a need for normalization of the data.
3.3.1   NORMALIZATION
In this work, each value in the result was divided by the largest value, keeping a check on the
data type of the result, so that every normalized value lies between 0 and 1. The normalization
was done for the three locations, namely SAFRI, MAVSI and VSET.
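A sketch of this divide-by-the-largest normalization step (Python for illustration; only the top VSET value of 167206.96 EURO is taken from the text, the other sales figures are hypothetical):

```python
def normalise(values):
    """Divide each value by the largest so that the results fall in
    (0, 1] regardless of the original units."""
    top = max(values)
    return [round(v / top, 4) for v in values]

# net sales values in EURO: the first figure is the top VSET customer
# quoted in the text, the others are made up for illustration
scaled = normalise([167206.96, 110022.18, 8360.35])
```

Because every value is divided by the maximum, the top customer always maps to exactly 1.0, which makes customers from different source systems directly comparable.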
3.3.2   AVERAGING

In averaging, the values in the data were summed and divided by the number of values used in
the sum. For the customer invoicing data set of the top 15 customers, the averaging was done
across source systems and across classification questions. This means that the first five values
resulting from a classification question (out of the 15) were taken from one source system; these
values were summed and divided by five to get a single averaged value for that classification
question. The same was done for the remaining two source systems, giving a total of three
averaged values for the first classification question, one from each source system. These three
averaged values were further summed and divided by three to get a final figure averaged across
the three source systems for a single classification question. The same technique was followed
for the next five values (out of the 15) and for the last five values (out of the 15), per question
and per source system. The result is that for each classification question, three values were
obtained (the first for good customers, the second for average customers and the third for
below-average customers) across the three source systems. As we have a set of five questions
and each question results in three normalized averaged values, we get a total of 15 values: the
first five are for good customers, the next five for average customers and the last five for
below-average customers. A typical calculation for the above-mentioned classification question
No. 3 is shown below in Table 4 (Calculation for Question 3) for the three source systems.
Sum/15 refers to the sum of the totals divided by 15. The value 0.30841 is for a good customer,
0.01223 is for an average customer and 0.00249 is for a below-average customer.

Table 4. Calculation for Question 3

            MAVSI      SAFRI      VSET       Sum/15
Good customers
1           1          1          1
2           0.277      0.1708     0.658
3           0.148      0.0968     0.05
4           0.072      0.0317     0.0251
5           0.041      0.0314     0.0235
Total       1.539      1.3307     1.7566     0.30841
Average customers
1           0.041      0.0193     0.0126
2           0.031      0.0117     0.008
3           0.01       0.0065     0.0078
4           0.009      0.0062     0.0046
5           0.007      0.006      0.0034
Total       0.097      0.0497     0.0364     0.01223
Below-average customers
1           0.006      0.0032     0.0032
2           0.006      0.0001     0.0028
3           0.004      0          0.0017
4           0.004      0          0.001
5           0.004      0          0.0006
Total       0.025      0.0033     0.0093     0.00249
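The Sum/15 figures in Table 4 can be reproduced with a short sketch (Python for illustration, using the per-source totals from the table):

```python
def sum_over_15(totals):
    """Average a classification question across the three source
    systems: five values per source, three sources, hence /15."""
    return sum(totals) / 15.0

# per-source totals from Table 4 (MAVSI, SAFRI, VSET)
good          = sum_over_15([1.539, 1.3307, 1.7566])
average       = sum_over_15([0.097, 0.0497, 0.0364])
below_average = sum_over_15([0.025, 0.0033, 0.0093])
```

The results agree with the table's 0.30841, 0.01223 and 0.00249 to within rounding, since the tabulated totals are themselves rounded.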



3.4    THE MULTI LAYER FEED FORWARD NEURAL NETWORK (MLFFNN)
The results of the above analysis serve as inputs for a MLFFNN. To classify the customer
invoicing data we use the MLFFNN with a modified back-propagation algorithm. Below we
describe the conventional back-propagation algorithm and mention the modification wherever it
is made. The MLFFNN is shown below.
3.4.1   ARCHITECTURE OR TOPOLOGY
The 5 input nodes correspond to the 3 sets of 5 values each (of the customer invoicing data)
applied to the neural network as training samples. The 3 nodes in the output layer correspond to
the 3 levels of classification of customers: good, average and below average. For the purpose of
this three-fold classification, 100 is used as the "desired output" for a good customer, 010 for an
average customer and 001 for a below-average customer.
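The 100/010/001 coding and the winner-takes-all decision at the output layer can be sketched as follows (Python for illustration; the sample output vector is made up):

```python
CLASSES = ["good", "average", "below average"]

def desired_output(label):
    """One-hot desired output: 100 for good, 010 for average,
    001 for below average."""
    return [1 if c == label else 0 for c in CLASSES]

def classify(outputs):
    """Read a class back off a network output vector:
    the largest component wins."""
    return CLASSES[outputs.index(max(outputs))]

target = desired_output("average")            # the 010 pattern
label = classify([0.9993, 0.0007, 0.0001])    # a hypothetical network output
```

A trained network never produces exact 0s and 1s, which is why classification is read off via the largest output rather than an exact match against the desired pattern.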
3.4.2   INITIALIZING THE WEIGHTS AND TRAINING SAMPLE
                                       TRAINING INPUT
  Question INPUT A                         INPUT B                   INPUT C
  1           0.4301266                    0.0802533                 0.04794
  2           0.4888733                    0.09396                   0.0460733
  3           0.308413                     0.012233                  0.002493
  4           0.80125                      0.701267                  0.306189
  5           0.485115                     0.098404                  0.040798

                                       TRAINING RESULT
                           OUTPUT A         OUTPUT B      OUTPUT C
                           1                0             0
                           0                1             0
                           0                0             1
Where A stands for the training data for good customers, B stands for the training data for
average customers and C stands for the training data for below average customers.

3.4.3   TERMINATING CONDITION
      • The mean squared error is below a threshold value (5.0e-7).
      • The percent error is zero.
3.4.4   THE TEST SAMPLE
A testing sample is applied (as input) once the network has been trained adequately and the
mean squared error has fallen below the desired level; it serves as a test of the effectiveness of
the analysis model. A test case is prepared from any other unused source system or location. The
effectiveness is measured by how closely the MLFFNN is able to classify a test sample
correctly. A typical test sample T for the purpose of analysis is given below:
(0.3001266, 0.4088733, 0.208413, 0.85125, 0.65115)
For this test sample T, the MLFFNN should correctly classify the customer invoicing data in the
category of good customers.


3.5      IMPLEMENTATION OF THE NEURAL NETWORK USING MATLAB 2008
MATLAB is software that can easily be used to implement artificial neural networks. Other
programming languages, such as Java and C#, can also be used to implement neural networks,
but MATLAB makes it possible to program mathematical functions and neural networks in an
easy form without writing too much code. In this work, a MAT-file (which allows you to input
an "m x n" matrix in the format of an Excel file) was created holding the training input
(INPUT), the training output data (RESULT), and the test input and test output. The data is
given in 3.4.2 (Initializing the weights and training sample).
A neural network can be implemented in three ways in MATLAB:
      (1) Using command-line functions
      (2) Using the Neural Network Toolbox TM pattern recognition tool GUI (nprtool)
      (3) Using the graphical user interface (nntool).
This discussion is limited to the first two methods of implementation because the graphical user
interface (nntool) is basically used for the general multilayer perceptron.


      1. Using command-line functions
After inputting all the data needed, the next steps are as follows:
      • Create a new M-file, which opens in the form of a text editor.
      • Write a neural network code for pattern recognition using a particular training
        algorithm; here the Scaled Conjugate Gradient (trainscg) algorithm was used.
      • The code is given below.
                                         PROGRAM

% Back-propagation program for a multilayer feed-forward artificial neural network

fprintf('INPUT represents training input while RESULT represents training output');
INPUT
RESULT
net = newpr(INPUT,RESULT,4,{},'trainscg');   % using Scaled Conjugate Gradient (trainscg)
[net,tr] = train(net,INPUT,RESULT);          % training of the network

fprintf('to test the neural network, the RESULT needs to be tested with the result of the network - which appears below');
outInput = sim(net,INPUT)
testINPUT
testRESULT
fprintf('the result of testing data appears below');
outTest = sim(net,testINPUT)
if round(outTest) == [1; 0; 0]
    disp('this implies that the customer is in a GOOD category');
end
plotperf(tr)
plotconfusion(RESULT,outInput)
[y_out,I_out] = max(outTest);    % index of the predicted class
[y_t,I_t] = max(testRESULT);     % index of the true class
diff = I_t - 3*I_out;            % a unique value for each (true, predicted) pair
g_g = length(find(diff==-2));    % good classified as good
g_a = length(find(diff==-5));    % good classified as average
g_b = length(find(diff==-8));    % good classified as below average
a_g = length(find(diff==-1));    % average classified as good
a_a = length(find(diff==-4));    % average classified as average
a_b = length(find(diff==-7));    % average classified as below average
b_g = length(find(diff==0));     % below average classified as good
b_a = length(find(diff==-3));    % below average classified as average
b_b = length(find(diff==-6));    % below average classified as below average
N = size(testINPUT,2);           % number of testing samples
fprintf('Total testing samples: %d\n', N);
cm = [g_g g_a g_b; a_g a_a a_b; b_g b_a b_b]   % confusion matrix (rows: true class)
cm_p = (cm ./ N) .* 100                        % classification matrix in percentages
fprintf('Percentage correct classification : %f%%\n', 100*(cm(1,1)+cm(2,2)+cm(3,3))/N);
fprintf('Percentage incorrect classification : %f%%\n', 100*(cm(1,2)+cm(2,1)+cm(1,3)+cm(3,1)+cm(2,3)+cm(3,2))/N);
The output of the code appears on the command line; an interface of the training run and the
confusion matrix are also shown, and the performance of the network is plotted. These are
shown below.
      OUTPUT OF THE PROGRAM IN COMMAND-LINE
INPUT represents training input while RESULT represents training output

INPUT =

  0.4301 0.0803 0.0479

  0.4889 0.0940 0.0461

  0.3084 0.0122 0.0025

  0.8013 0.7013 0.3062

  0.4851 0.0984 0.0408

RESULT =

  1   0      0

  0   1      0

  0   0      1

To test the neural network, the RESULT needs to be tested with the result of the network -
which appears below

outInput =

  1.0000 0.0003 0.0000

  0.0007 0.9993 0.0007

  0.0001 0.0006 0.9994

testINPUT =

  0.3001

  0.4089

  0.2084

  0.8512

  0.6512

testRESULT =

  1

  0

  0

the result of testing data appears below

outTest =

  1.0000

  0.0025

  0.0001

this implies that the customer is in a GOOD category

Total testing samples: 1

cm =

      1     0       0

      0     0       0

      0     0       0

cm_p =

 100            0       0

      0     0       0

      0     0       0

Percentage Correct classification : 100.000000%

Percentage Incorrect classification : 0.000000%

>>




(2)       Using the Neural Network Toolbox TM pattern recognition tool GUI.
(i) The data inputted in the MAT-file is used.
(ii) The command "nprtool" is typed on the command line, which opens the interface of the
       Neural Network Pattern Recognition tool used for classification of data.
(iii) The input and target data are entered; they can be accessed directly from the system
       using the "browse" button.
(iv) By pressing next, you input the number of neurons in the hidden layer; for our project we
       inputted four neurons.
(v) Next, you have an interface where you press the "train" button to train the neural
       network. From the training interface in fig. 2, the performance and the confusion matrix
       can be plotted, which give the same output as the command-line method.
(vi) The test input and test output are fed into the neural network and the network is tested
       (test network).
          Fig. 2


The training toolbox has the following characteristics:
   (1) Performance plotting: this plots the mean square error (MSE) against the epoch (one
       presentation of the entire training set to the neural network), as in the graph shown below
          Fig. 3


Confusion matrix: this plots the output matrix (the output of the neural network) against the
target matrix (the output of the training data). The values on the diagonal of the matrix (green
colour) represent data that are well classified, while those in red represent data that are
misclassified.
Simulate data: the command "sim" is used to check, on the input data, whether the neural
network has truly learned from the training.
Mean square error: the average squared difference between outputs and targets. Lower values
are better; zero means no error.

Percent error: the fraction of samples that are misclassified. A value of 0 means no
misclassifications; 100 indicates maximum misclassification.
Fig. 4
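The two error measures can be computed directly. A small sketch (in Python rather than the project's MATLAB), using the output of the test run shown earlier:

```python
def mse(outputs, targets):
    """Average squared difference between network outputs and targets; lower is better."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)

def percent_error(predicted, actual):
    """Fraction of misclassified samples as a percentage; 0 means none misclassified."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)

out = [1.0000, 0.0025, 0.0001]   # network output for the test sample
tgt = [1, 0, 0]                  # target: the GOOD class
print(mse(out, tgt))             # small (about 2.1e-6): outputs are close to the targets
print(percent_error([0], [0]))   # 0.0
```

Note that the MSE stays slightly above zero even when every sample is classified correctly, because the sigmoid outputs never reach exactly 0 or 1; the percent error, by contrast, only counts whether the winning class is right.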
4.0     RESULTS AND DISCUSSION

The neural network has proved to be a good classifier of the data. Its efficiency depends on the
number of neurons in the hidden layer. There is no formula for selecting this number; it is chosen
mainly by trial and error. Increasing the number of neurons generally makes the network more
accurate, but it should be noted that too large a number can complicate the classification. Having
tried different numbers of neurons, we used four (4), which provided accurate classification
since our input data set is not too large.

The neural network output is verified to be an accurate classifier using the mean square error
(MSE) and the percent error. The MSE obtained was 1.26518e-7, the average squared difference
between the output (the neural network result for the input, outInput in the code) and the target
(RESULT in the code); when working from code, the function 'sim()' (simulate) is used to obtain
the output data. Hence the classification is good. An error of exactly zero cannot be obtained
because the neural network cannot be 100% accurate. The percent error is zero, which indicates
no misclassification; this is shown diagrammatically in the confusion matrix. The test data gave
an MSE of 4.90938e-7, and its percent error is also zero since the neural network classified the
test data correctly.
5.0    CONCLUSION

From the entire analysis of the classification of customer invoicing data with the help of a
multi-layer feed-forward neural network (MLFFNN), the following conclusions were made:
1. The framework for classifying data into distinct classes is independent of the entities used
as examples (customers, parts, etc.), and thus the analysis is very general in nature.
2. Once the MLFFNN has learned to classify the customer invoicing data, it can serve as a
forecasting tool: early invoicing data for an unknown customer can be used to forecast that
customer's classification in the days to come.
The neural network has proven to be a good classifier, predictor and forecaster of data. It
overcomes the limitations of statistical methods for analysing data and also provides the
advantages of high accuracy, noise tolerance and ease of maintenance.
REFERENCES

Antony Browne, Brian D. Hudson, David C. Whitley, Martyn G. Ford and Philip Pictoni (2003).
Biological data mining with neural networks: implementation and application of a flexible
decision tree extraction algorithm to genomic problem domains. Guildford: University of Surrey,
School of Computing.

Bishop, Christopher (1995). Neural Networks for Pattern Recognition. Oxford: Oxford University
Press.

Christos Stergiou and Dimitrios Siganos (1996). Neural Networks.
www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

Hieu Trung Huynh, Jung-Ja Kim and Yonggwan Won (2009). Classification Study on DNA
Microarray with Feedforward Neural Network Trained by Singular Value Decomposition,
Korea: Chonnam National University.

Li-Xian Sun, Klaus Danzer and Gabriela Thiel (1996). Classification of wine samples by means of
artificial neural networks and discrimination analytical methods. China: Hunan Normal
University.

Portia A. Cerny. Data mining and Neural Networks from a Commercial Perspective. Australia:
University of Technology Sydney.

Rahul Kala, Anupam Shukla and Ritu Tiwari (2009). Fuzzy Neuro Systems for Machine Learning
for Large Data Sets. India: Indian Institute of Information Technology and Management.

Teh Chee Siong (2006). A hybrid artificial neural network model for data visualisation,
classification, and clustering. Malaysia: Universiti Sains Malaysia.

Trippi, Robert and Turban, Efraim (eds.) (1996). Neural Networks in Finance and Investing.
McGraw Hill.

Varun Dutt , V. Thiagaraj.The Concept of Classification in Data Mining using Neural Networks,
TamilNadu: Annamalai University.

(2007). What is data classification?
www.searchdatamanagement.techtarget.com/sDefinitions/0,,sid91_gci1152474,00.html

				