Comparison Of Neural Network And Multivariate Discriminant Analysis In Selecting New Cowpea Variety

					                                                        (IJCSIS) International Journal of Computer Science and Information Security,
                                                        Vol. 8, No. 4, July 2010


                                                    Adewole, Adetunji Philip *

                             Department of Computer Science, University of Agriculture, Abeokuta


                                                          Sofoluwe, A. B.

                                 Department of Computer Science, University of Lagos, Akoka

                                                Agwuegbo, Samuel Obi-Nnamdi

                                  Department of Statistics, University of Agriculture, Abeokuta



In this study, a neural network (NN) algorithm and a multivariate discriminant analysis (MDA) based model were developed to classify
ten (10) varieties of cowpea widely planted in Kano. To demonstrate the validity of the models, we used the case study to build a
multilayer feedforward neural network and compared its classification performance against multivariate discriminant analysis. Two
groups of data (Spray and No-spray) were used, and twenty kernels served as the training and test data sets for classifying the cowpea
seed varieties. The neural network classified the new cowpea seed varieties based on the information it was trained with. Finally, the
two methods were compared for their strengths and weaknesses. NN performed better than MDA, so NN could be considered a
support tool in the process of selecting new cowpea varieties.

KEYWORDS: Cowpea, Multivariate Discriminant Analysis (MDA), Neural Network (NN), Perceptron.

                                                                                                   ISSN 1947-5500

1.0       Introduction

The history of neural networks begins with the earliest model of the biological neuron given by [5]. This model describes a neuron as
a linear threshold computing unit with multiple inputs and a single output of either 0, if the nerve cell remains inactive, or 1, if the cell
fires. A neuron fires if the sum of the inputs exceeds a specified threshold. In functional form, this gives f(x) = 1 for x greater than
some threshold, and f(x) = 0 otherwise (this is commonly known as the indicator function). In theory, such a "system" of neurons
presents a possible model for biological neural networks such as the human nervous system.
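The threshold behaviour described above can be written as a few lines of code (a sketch, not drawn from the paper): the unit fires, outputting 1, only when the summed input exceeds the threshold.

```python
# A minimal McCulloch-Pitts-style threshold unit: the neuron "fires"
# (outputs 1) when the sum of its inputs exceeds a specified threshold,
# and remains inactive (outputs 0) otherwise.
def threshold_neuron(inputs, threshold):
    total = sum(inputs)
    return 1 if total > threshold else 0

# The unit stays inactive until the summed input crosses the threshold.
print(threshold_neuron([1, 0, 1], threshold=1.5))  # fires -> 1
print(threshold_neuron([1, 0, 0], threshold=1.5))  # inactive -> 0
```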

The [5] model was utilized in the development of the first artificial neural network by [12] in 1958. This network was based on a unit
called the perceptron, which produces an output scaled as 1 or -1 depending upon a weighted, linear combination of its inputs.
Variations on the perceptron-based artificial neural network were further explored during the 1960s by [12] and by [15], among others.
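The perceptron unit just described can be sketched as follows; the error-correction training rule and the toy data are illustrative, not taken from the paper.

```python
# A perceptron in the sense described above: the output is scaled as
# +1 or -1 depending on the sign of a weighted, linear combination of
# the inputs, plus a bias term.
def predict(weights, bias, x):
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= 0 else -1

def train(samples, labels, epochs=10, lr=1):
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            if predict(weights, bias, x) != target:
                # Nudge the weights toward the misclassified example.
                weights = [w + lr * target * xi for w, xi in zip(weights, x)]
                bias += lr * target
    return weights, bias

# A linearly separable toy problem (logical AND, labels in {-1, +1}):
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b = train(X, y)
print([predict(w, b, x) for x in X])  # -> [-1, -1, -1, 1]
```

On linearly separable data like this, the update rule converges to a separating set of weights; on inseparable data such as XOR it never settles, which is precisely the limitation discussed next.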

In 1969, [6] demonstrated that the perceptron was incapable of representing simple functions that are not linearly separable, including
the "exclusive or" (XOR). Because of this fundamental limitation (and Rosenblatt's untimely death), the study of neural networks fell
into decline during the 1970s. The limitation was overcome in the early 1980s. According to [11], the post-perceptron era began with
the realization that adding (hidden) layers to the network could yield significant computational versatility. This produced a
considerable revival of interest in ANNs (especially multilayered feedforward structures), which continues to this day.

Presently, much research on neural networks is taking place within two areas: (a) the aforementioned multilayered feed-forward
networks, also known as multilayer perceptrons, and (b) symmetric recurrent networks, also known as attractor neural networks or
Hopfield nets. The former model is used for classification problems, while the latter is used for developing associative memory
systems. The investigation into neural network structures and performance has taken on a substantially pragmatic feel in recent years.
There is greater interest in using neural networks as problem-solving algorithms than in developing them as accurate representations
of the human nervous system. ANNs have been implemented to solve a variety of problems involving pattern classification and
pattern recognition.

2.0       Neural Networks and Standard Statistical Techniques

Similarities between ANNs and statistical methods certainly exist. Indeed, neural networks have been categorized as a form of
nonlinear regression. It has also been observed that multiple linear regression, a standard statistical tool, can be expressed as a simple
ANN node. For example, given the linear equation y = b0 + b1x1 + ... + bnxn, the xi can be taken as the inputs to a node, the bi as the
corresponding weights, and b0 as the bias (intercept) term, with an identity activation. There are at least two key differences between
ANNs and statistical methods. Often remarked upon as a major drawback of ANNs is the fact that their internal functional structure
remains unknown once they have been trained. In effect, a neural network remains a "black box" that may produce useful results but
cannot be precisely understood. Statistical procedures do not exhibit this sort of opaque design. The construction of a neural network
is also something of an ad-hoc process, whereas statistics offers formalized guidelines for fitting the best model. The performance of
ANNs has been extensively compared to that of various statistical methods within the areas of prediction and


classification. In particular, a fair amount of literature has been generated on the use of ANNs in time series forecasting. In an
examination of two time series without noise, [4] concluded that basic neural networks substantially outperform conventional
statistical methods. [7] found that a neural network and the Box-Jenkins forecasting system performed about the same for the analysis
of 75 different time series. Interestingly, the memory of a time series has been demonstrated to influence relative performance of
ANNs and the Box-Jenkins Model. The Box-Jenkins model slightly outperforms ANNs for time series with long memory, while the
reverse tends to be true for time series with short memory. Stern has also concluded that for time series analysis "NNs work best in
problems with little or no stochastic component". Neural network and statistical approaches to pattern classification have been
compared by a number of researchers. For the most part, reviews seem to be mixed. For instance, [3] concluded that neural networks
show little promise as real-world classifiers, while a case study examined by Yoon points to the superiority of ANNs over classical
discriminant analysis.
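The correspondence noted above between multiple linear regression and a single ANN node can be sketched directly; the numbers are illustrative.

```python
# Sketch of the observation that multiple linear regression,
# y = b0 + b1*x1 + ... + bn*xn, is a single ANN node: the x_i are the
# inputs, the b_i the connection weights, and b0 the bias (intercept)
# term, with an identity activation.
def linear_node(bias, weights, inputs):
    return bias + sum(w * x for w, x in zip(weights, inputs))

# With b0 = 1 and weights (2, 3), input (4, 5) gives 1 + 8 + 15 = 24.
print(linear_node(1.0, [2.0, 3.0], [4.0, 5.0]))  # -> 24.0
```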

In a comprehensive study of classification techniques, [6] rated the performance of a large selection of neural network, statistical, and
machine learning algorithms on a variety of data sets. In the analysis of their results, they presented the top five algorithms for
twenty-two different data sets based on error rates. Though not conclusive, the study by Michie would seem to suggest that neural
networks are not necessarily replacements for, or even preferable alternatives to, standard statistical classification techniques.

2.1 Multivariate Discriminant Analysis
The term multivariate discriminant analysis refers to several different types of analyses. Classificatory discriminant analysis is used to
classify observations into two or more known groups on the basis of one or more quantitative variables. Classification can be done by
either a parametric or a nonparametric method. A parametric method is appropriate only for approximately normal within-class
distributions; it generates either a linear or a quadratic discriminant function. When the distribution within each group cannot be
assumed to be multivariate normal, nonparametric methods can be used to derive classification criteria. These methods include the
kernel method and nearest-neighbor methods. The kernel method uses uniform, normal, bi-weight, or tri-weight kernels to estimate
the group-specific density at each observation; either the within-group covariance matrices or the pooled covariance matrix can be
used to scale the data. The performance of a discriminant function can be evaluated by estimating error rates (probabilities of
misclassification), using either error-count estimates or posterior-probability error-rate estimates. In multivariate statistical
applications, the data collected often come from distributions other than the normal. Various forms of nonnormality can arise, such as
qualitative variables or variables with underlying continuous but nonnormal distributions. If the multivariate normality assumption is
violated, parametric discriminant analysis may not be appropriate: when a parametric classification criterion (linear or quadratic
discriminant function) is derived from a nonnormal population, the resulting error-rate estimates might be biased.
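As an illustration of the nonparametric route mentioned above, a 1-nearest-neighbor classifier can be sketched in a few lines; the data points are hypothetical, and no normality assumption is made.

```python
# One of the nonparametric alternatives mentioned above, sketched as a
# 1-nearest-neighbor classifier: assign each observation to the group
# of its closest training point (squared Euclidean distance).
def nearest_neighbor(train_points, labels, x):
    dists = [sum((a - b) ** 2 for a, b in zip(p, x)) for p in train_points]
    return labels[dists.index(min(dists))]

# Two small groups measured on two variables:
points = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (3.8, 4.0)]
groups = ["A", "A", "B", "B"]
print(nearest_neighbor(points, groups, (1.1, 1.0)))  # -> A
print(nearest_neighbor(points, groups, (4.1, 4.1)))  # -> B
```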

2.1.1   Discriminant Function
A simple linear discriminant function transforms an original set of measurements on a sample into a single discriminant score. The
score, or transformed variable, represents the sample's position along the line defined by the linear discriminant function. We can
therefore think of the discriminant function as a way of collapsing a multivariate problem down into a problem which involves only one


variable. One method that can be used to find the discriminant function is regression, with the dependent variable consisting of the
differences between the multivariate means of the groups.

In matrix notation, we must solve an equation of the form

[Sp2][λ] = [D] ………………………………… (1)

where Sp2 is the m×m matrix of pooled variances and covariances of the m variables, and λ is the vector of coefficients of the
discriminant equation. Solving for the coefficients,

[λ] = [Sp2]⁻¹[D] ………………………………… (2)

To compute the discriminant function, we must determine the various entries in the matrix equation. The mean differences are found
simply from the group means:

Dj = Āj − B̄j = (Σi Aij)/na − (Σi Bij)/nb …………………… (3)

In expanded form,

    | D1 |   | Ā1 |   | B̄1 |
    | D2 | = | Ā2 | − | B̄2 |
    | ⋮  |   | ⋮  |   | ⋮  |
    | Dm |   | Ām |   | B̄m |

We must also construct, for each group, the matrix of sums of squares and cross products of all variables:

SPA(j,k) = Σi AijAik − (Σi Aij)(Σi Aik)/na …………… (4)

SPB(j,k) = Σi BijBik − (Σi Bij)(Σi Bik)/nb …………… (5)

Then the matrix of pooled variance can be found as

Sp2 = (SPA + SPB)/(na + nb − 2) ………….. (6)

It could be observed that this equation for the pooled variance is exactly the same as that used in the T2 test of the equality of
multivariate means. Although the amount of mathematical manipulation needed to calculate the coefficients of a discriminant function
appears large, each step is straightforward matrix arithmetic.


The resulting coefficients are the entries in the discriminant function, which has the form:

   R = λ1ψ1 + λ2ψ2 + λ3ψ3 + … + λmψm …………………. (7)

The discriminant index, R0, is the point along the discriminant function line which is exactly halfway between the center of group A
and the center of group B: substituting the multivariate mean of group A into the equation (that is, setting ψj = Āj) gives RA,
substituting the mean of group B gives RB, and R0 = (RA + RB)/2.
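Equations (1) through (7) can be exercised on a small hypothetical two-variable data set. The sketch below computes the coefficients λ and the index R0, then classifies a new observation by comparing its score against R0; the SSCP matrices are computed from deviations about the group means, which is algebraically equivalent to the raw-sum form of equations (4) and (5).

```python
# A numerical sketch of equations (1)-(7) for m = 2 variables, with the
# 2x2 matrix inverse written out by hand. The data are hypothetical.
def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def sscp(rows):
    # Corrected sums of squares and cross products for one group.
    mu = mean(rows)
    m = len(mu)
    return [[sum((r[j] - mu[j]) * (r[k] - mu[k]) for r in rows)
             for k in range(m)] for j in range(m)]

def discriminant(A, B):
    na, nb = len(A), len(B)
    mA, mB = mean(A), mean(B)
    D = [a - b for a, b in zip(mA, mB)]                     # equation (3)
    SPA, SPB = sscp(A), sscp(B)
    Sp = [[(SPA[j][k] + SPB[j][k]) / (na + nb - 2)          # equation (6)
           for k in range(2)] for j in range(2)]
    det = Sp[0][0] * Sp[1][1] - Sp[0][1] * Sp[1][0]
    inv = [[Sp[1][1] / det, -Sp[0][1] / det],
           [-Sp[1][0] / det, Sp[0][0] / det]]
    lam = [inv[j][0] * D[0] + inv[j][1] * D[1]              # equation (2)
           for j in range(2)]
    RA = lam[0] * mA[0] + lam[1] * mA[1]                    # score at A's mean
    RB = lam[0] * mB[0] + lam[1] * mB[1]                    # score at B's mean
    return lam, (RA + RB) / 2                               # midpoint index R0

A = [(2.0, 3.0), (3.0, 4.0), (2.5, 3.5)]
B = [(6.0, 7.0), (7.0, 8.0), (6.5, 7.2)]
lam, R0 = discriminant(A, B)
# Score a new observation with equation (7) and compare against R0;
# with D = mean(A) - mean(B), scores above R0 fall on A's side.
score = lam[0] * 2.2 + lam[1] * 3.1
print("group A" if score > R0 else "group B")  # -> group A
```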

2.2    Multilayer Feedforward Neural Network
Feedforward neural networks (FF networks) are the most popular and most widely used models in many practical applications. Feed-
forward ANNs allow signals to travel one way only, from input to output; there is no feedback (loops), i.e. the output of any layer
does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs, and they
are extensively used in pattern recognition. This type of organisation is also referred to as bottom-up or top-down, and such networks
are known by many different names, such as "multi-layer perceptrons". The manner in which the neurons of a neural network are
structured is intimately linked with the learning algorithm used to train the network. Network architectures can be considered in
terms of two fundamentally different classes: single-layer feed-forward and multi-layer feed-forward networks.

The diagram below describes the structure of the multilayer feed-forward neural network.

             [Figure 1: Input Layer → Hidden Layer → Output Layer]

      a.   The activity of the input units represents the raw information that is fed into the network.
      b.   The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between
           the input and the hidden units.
      c.   The behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output
           units. This simple type of network is interesting because the hidden units are free to construct their own representations of the
           input. The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these
           weights, a hidden unit can choose what it represents.
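The three-layer structure described in (a) through (c) amounts to two successive weighted sums, each passed through a nonlinearity. A minimal forward pass, with illustrative weights (not the paper's trained values), might look like:

```python
import math

# A minimal forward pass through the layered structure described above:
# the inputs feed the hidden layer, the hidden activities feed the
# output layer, and signals travel one way only.
def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weights, biases):
    # One unit per row of weights; each unit applies a weighted sum
    # plus bias, then the sigmoid activation.
    return [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
            for ws, b in zip(weights, biases)]

def feedforward(x, hidden_w, hidden_b, out_w, out_b):
    hidden = layer(x, hidden_w, hidden_b)   # input -> hidden
    return layer(hidden, out_w, out_b)      # hidden -> output

# Two inputs, two hidden units, one output unit:
y = feedforward([1.0, 0.0],
                hidden_w=[[0.5, -0.4], [0.3, 0.8]], hidden_b=[0.0, -0.1],
                out_w=[[1.2, -0.7]], out_b=[0.2])
print(y)  # a single activation strictly between 0 and 1
```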


3.0      Implementation
A package was developed and implemented using java and S-plus as the computing environment and run a on Pentium IV 1.80GHz of
processor, 512MB of RAM and 40GB of local disk. These packages can work on any windows based operating system but for best
performance, window XP professional is recommended.

The implementation is started by launching the interface, followed by the menu editor, which gives the user access to the whole
neural network application. The two classes of data (No Spray, Spray) were first tested for similarity/dissimilarity, and a correlation
test was performed in S-Plus to generate the training weights from the given data; the results are shown below:

Table: Results of the analysis.

 Variable               RankSumSquare (RSS)    Weight    Std. Error      Intercept
 Good Seed                        84900.20       0.52    0.000000e+00
 Bad Seed                         84900.20       0.45    0.000000e+00
 Germinating Seed                  1824.32       0.40    0.000000e+00    spray.sw
 Seed Weight                       7339.87       0.25    0.000000e+00
 Days to flowering                12295.91      -0.31    0.000000e+00    spray.df
 Days to maturity                  1828.34       0.27    0.000000e+00
 Plant stand/hectare              20272.38       0.24    0.000000e+00

Degrees of freedom: 30 total; 28 residual
Residual standard error (on weighted scale): 55.06496
The model for the correlation and the neural network is as follows:

Y = cowpea yield

Y = X·W + Error

where W = (XᵀX)⁻¹XᵀY.
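The least-squares weight formula W = (XᵀX)⁻¹XᵀY can be checked on a small made-up design matrix; the sketch below writes out the 2×2 inverse by hand.

```python
# Least-squares weights W = (X'X)^(-1) X'Y for a two-column design
# matrix. The data are made up: Y was generated as 2*x1 + 3*x2, so the
# recovered weights should be (2, 3).
def ols_weights(X, Y):
    # X'X and X'Y
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(2)] for j in range(2)]
    xty = [sum(r[j] * y for r, y in zip(X, Y)) for j in range(2)]
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]
    return [inv[j][0] * xty[0] + inv[j][1] * xty[1] for j in range(2)]

X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
Y = [2.0, 3.0, 5.0, 7.0]
print(ols_weights(X, Y))  # -> [2.0, 3.0] (up to rounding)
```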

The figure below shows the application as it appears when it first starts up. Initially, the network activation weights are all zero; the
network has learned no patterns at this point.


                                                  Fig 2: Neural network before training

To train the network to recognize the pattern 1000000, enter 1000000 under “Input pattern to run or train” and click the “Train”
button. Notice that the weight matrix adjusts to absorb the new knowledge.

                                               Fig 3: Neural network after the first training

Now the network can be tested. Enter the pattern 1000000 into “Input pattern to run or train” (it should still be there from your
training). The output will be “1000000”: this is an auto-associative network, so it echoes the input if it recognizes it.

Now try something that does not match the training pattern exactly. Enter the pattern “0100000” and click “Run”. The output will
now be “0111000”. The neural network did not recognize “0100000”; the closest thing it knew was “0111000”. Now test the side
effect mentioned previously. Enter “0111111”, which is the binary inverse of the pattern the network was trained with (“1000000”).
The network is always trained for the binary inverse too, so if you enter “0111111”, the network will recognize it.

Then, the final test. Enter “0000000”, which is not close to anything the neural network knows. The neural network responds with
“0111000”: it tried to correct the input, but it has no idea what you mean. You can play with the network more. It can be


taught more than one pattern; as you train new patterns, it builds upon the matrix already in memory. Pressing “Clear” clears out the
weight matrix.
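The behaviour in this walkthrough (echoing recognized patterns, and also recognizing the binary inverse of a trained pattern) is characteristic of a Hebbian auto-associative network. The sketch below is an illustration of that mechanism, not the paper's actual implementation.

```python
# A sketch of the auto-associative behaviour in the walkthrough: a
# Hebbian (outer-product) weight matrix over bipolar units. Training on
# a pattern also stores its binary inverse, and recall maps an input to
# the closest stored pattern. Patterns here are 7 bits, as in the text.
def to_bipolar(bits):
    return [1 if b == "1" else -1 for b in bits]

def train(W, bits):
    # Hebbian outer-product update, zero diagonal.
    v = to_bipolar(bits)
    n = len(v)
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] += v[i] * v[j]

def run(W, bits):
    # One synchronous recall step: threshold each unit's weighted input.
    v = to_bipolar(bits)
    n = len(v)
    return "".join("1" if sum(W[i][j] * v[j] for j in range(n)) > 0 else "0"
                   for i in range(n))

n = 7
W = [[0] * n for _ in range(n)]
train(W, "1000000")
print(run(W, "1000000"))  # echoes the trained pattern -> 1000000
print(run(W, "0111111"))  # the binary inverse is also stored -> 0111111
```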

4.0        Discussion of Results

Ten (10) varieties of cowpea were used for classification and the results are shown in Table 2 below.

Table 2: Classification of the ten cowpea varieties by MDA and the neural network.

                Variety           MDA       Neural Network               Test
                IT845-224-4        A                A                       A
                IT86D-716          R                A                       A
                IT90K-277-2        A                A                       A
                Tvu-13743          A                A                       A
                Tvu-1890           A                A                       A
                Tvu-14476          A                A                       A
                IT86D-715          A                A                       A
                IT86D-719          A                A                       A
                Tvu13731           A                A                       A
                TvNu72             A                A                       A

               A = Accept
               R = Reject
For the NN, the results may be considered very promising. Of the ten (10) varieties tested, the neural network accepted all of them, in
agreement with the test; one variety (IT86D-716) was rejected by the multivariate discriminant analysis but accepted by the neural
network.

5.0    Conclusion
To explore new ways to support the selection of new cowpea varieties, a neural network (NN) algorithm and a multivariate
discriminant analysis (MDA) based model were developed to classify ten (10) varieties of cowpea widely planted in Kano. The
comparison of the two methods showed that neural networks can be considered a promising technique for developing support tools
for the selection of new cowpea varieties: of the ten (10) varieties tested, the NN accepted all of them, in agreement with the test,
while one (IT86D-716) was rejected by the multivariate discriminant analysis but accepted by the neural network.


References

[1]        Lapedes, A. S. and Farber, R. M. (1987). How Neural Nets Work. NIPS 1987: 442-456.

[2]        Elisa, R. and Marco, B. (2007). Multivariate Statistical Tools for the Evaluation of Proteomic 2D-maps. Proteomics 4: 53-66.


[3]        Jyhshyan Lan, Michael Y. Hu, B. Eddy Patuwo, G. Peter Zhang: An investigation of neural network classifiers with unequal
misclassification costs and group sizes. Decision Support Systems 48(4): 582-591 (2010)

[4]     Lapedes, A. & Farber, R. (1987): Nonlinear signal processing using neural network. International Journal of Forecasting.
v14. 323-337.

[5]        McCulloch W.S. and Pitts W. (1943), "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of
Mathematical Biophysics, 5, 115-133.
[6]        Minsky M. and Papert S. (1969): Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, Mass.

[7]        Patil, P. N. and Ratnaparkhi, M. V. (2000). A nonparametric test for size biasedness. Jour. of Indian Statist. Assoc., 38, 369-

[8]        Patil, P. N. and Speckman, P. (2004). Constrained kernel regression. Jour. of Indian Statist. Assoc., 42, 87-98.

[9]        Potts W.J.E. (2000), Neural Network Modeling Course Notes, Cary: SAS Institute, Inc.
[10]       Qian, N. and Sejnowski, T. J. (1988). "Predicting the Secondary Structure of Globular Proteins Using Neural Network
Models," Journal of Molecular Biology, 202, 865-884.
[11]       Robert J. Schalkoff: (1987): Analysis of the weak solution approach to image motion estimation. Pattern Recognition 20 (2):
[12]     Rosenblatt, F. (1958). “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain,”
Psychol. Rev., Vol. 65, p. 386.
[13]     Rosenblatt, F. (1960). “Perceptron Simulation Experiments,” Proc. IRE, Vol. 48, pp. 301–309.
[14]     Trumbo B.E. Norton J.A. Freerks L. (1999), "Using CIS/ED to Make a Reading List on Neural Nets," STATs Magazine,
winter, 1999.
[15]     Widrow, B. and Hoff, M. E., Jr. (1960). “Adaptive Switching Circuits,” IRE WESCON Conv. Rec., Part 4, pp. 96–104.
[16] Zhang, G., B. Eddy Patuwo, et al. (1998). "Forecasting with artificial neural networks: The state of the art." International Journal
     of Forecasting 14(1): 35-62.

