
A SEMINAR REPORT ON ARTIFICIAL NEURAL NETWORKS

CSC 427
Department of Computer Science
College of Natural Sciences (COLNAS)
University of Agriculture, Abeokuta (UNAAB)

ARTIFICIAL INTELLIGENCE
BY: OBI KENNETH ABANG, COURTESY OF 2010 SETS

CONTENTS

INTRODUCTION
THE BRAIN, NEURAL NETWORKS AND COMPUTERS
    The human brain
    How the human brain learns
    ARTIFICIAL NEURONES
    Neural network behaviour
    Unsupervised learning
    Human neural networks versus conventional computers
APPLICATION OF ARTIFICIAL NEURAL NETS
    Real life applications
    Neural network software
    Learning paradigms
CURRENT RESEARCH
    Culture and Research
    Fundamentals of wavelet theory
    Wavelet neural networks
    Wavelet back propagation neural networks
    Competitive neural networks
    Parallel wavelet back propagation neural networks
    ERROR ESTIMATION
TYPES OF MODELS
    THE MULTI LAYER FEED FORWARD NEURAL NETWORK (MLFFNN)
    Initializing the weights and training sample
    The modified back propagation algorithm
    Different learning rate annealing schedules
    Terminating condition
    The test sample
    SELF ORGANIZING MAPS
    Unsupervised learning
    GENERAL IDEA OF THE SOM MODEL
    THE SOM ALGORITHM
    SELF-ORGANISING MAP learning
    Map quality measures
    Mapping precision
    VISUALIZING THE SOM
    Applications of SOM
    Disadvantages of SOM
LEARNING ALGORITHMS
    Simulated annealing
    The basic iteration
    Evolutionary computation
    Evolutionary algorithms
    Expectation maximization algorithm
    Description
    Applications
NEURAL NETWORK SOFTWARE
    Simulators
    SNNS research neural network simulator
    Data analysis simulators
STRENGTHS AND WEAKNESSES OF NEURAL NETWORK MODELS
CONCLUSION

INTRODUCTION

One type of network sees the nodes as 'artificial neurons'. These are called artificial neural networks (ANNs). An artificial neuron is a computational model inspired by natural neurons. Natural neurons receive signals through synapses located on the dendrites or membrane of the neuron.
When the signals received are strong enough (they surpass a certain threshold), the neuron is activated and emits a signal through the axon. This signal might be sent to another synapse, and might activate other neurons.

Figure 1. Natural neurons (artist's conception).

The complexity of real neurons is highly abstracted when modelling artificial neurons. These basically consist of inputs (like synapses), which are multiplied by weights (the strength of the respective signals), and then computed by a mathematical function which determines the activation of the neuron. Another function (which may be the identity) computes the output of the artificial neuron (sometimes in dependence on a certain threshold). ANNs combine artificial neurons in order to process information.

The higher the weight of an artificial neuron is, the stronger the input which is multiplied by it will be. Weights can also be negative, so we can say that the signal is inhibited by a negative weight. Depending on the weights, the computation of the neuron will be different. By adjusting the weights of an artificial neuron we can obtain the output we want for specific inputs. But when we have an ANN of hundreds or thousands of neurons, it would be quite complicated to find all the necessary weights by hand. Instead, we can use algorithms which adjust the weights of the ANN in order to obtain the desired output from the network. This process of adjusting the weights is called learning or training.

The number of types of ANNs and their uses is very high. Since the first neural model by McCulloch and Pitts (1943), hundreds of different models considered as ANNs have been developed. When creating a functional model of the biological neuron, there are three basic components of importance. First, the synapses of the neuron are modeled as weights.
The strength of the connection between an input and a neuron is given by the value of the weight. Negative weight values reflect inhibitory connections, while positive values designate excitatory connections [Haykin]. The next two components model the actual activity within the neuron cell. An adder sums up all the inputs modified by their respective weights; this activity is referred to as linear combination. Finally, an activation function controls the amplitude of the output of the neuron.

THE BRAIN, NEURAL NETWORKS AND COMPUTERS

Artificial neural networks may either be used to gain an understanding of biological neural networks, or for solving artificial intelligence problems without necessarily creating a model of a real biological system. If we desire to build an intelligent machine, what better way to start than by imitating the human mind, the product of evolution's most intelligent species.

The human brain

The human brain contains about 10 billion nerve cells, or neurons. On average, each neuron is connected to other neurons through about 10,000 synapses: the most massively connected network yet known. The brain's network of neurons forms a massively parallel information processing system. Many popular accounts suggest that humans use less than 10 percent of their brains' potential power, though this anecdotal claim has never been scientifically proven. Traditionally, the neuron has been regarded as simply a switch of some sort, giving an output for a particular combination of inputs, very like a computer logic gate. This view is, however, very wrong. Recent research has shown that neurons perform considerable processing in both space and time; the neuron output is the result of a vast computation, perhaps equivalent to one of our own supercomputers.
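The three components of the functional neuron model described above (synaptic weights, an adder forming the linear combination, and an activation function controlling the output amplitude) can be sketched in a few lines of Python. This is a minimal illustration: the weights, bias and inputs are arbitrary values, and the sigmoid stands in for a generic activation function.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """One artificial neuron: a weighted sum of the inputs (the 'adder'),
    passed through an activation function that bounds the output."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias  # linear combination
    return 1.0 / (1.0 + math.exp(-s))                       # sigmoid activation

# Positive weights excite the neuron, negative weights inhibit it:
print(neuron([1.0, 1.0], [2.0, -1.0]))   # about 0.73: moderately activated
print(neuron([1.0, 1.0], [-2.0, -2.0]))  # about 0.02: strongly inhibited
```

Changing the weights changes the computation the neuron performs, which is exactly the handle that learning algorithms use.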
A neuron is itself a cell, and some researchers now believe each cell contains microtubule computers (thousands per cell, each operating at perhaps 10 million cycles per second).

How the human brain learns

Much is still unknown about how the brain trains itself to process information, so theories abound. In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites (the input zone). The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. This spiking event is also called depolarization, and is followed by a refractory period, during which the neuron is unable to fire. At the end of each branch (the output zone), a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurones. Transmission of an electrical signal from one neuron to the next is effected by neurotransmitters, chemicals which are released from the first neuron and which bind to receptors in the second. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes, i.e., by altering the strengths of connections between neurons, and by adding or deleting connections between neurons. Furthermore, neurons learn "on-line", based on experience.

ARTIFICIAL NEURONES

We construct these neural networks by first trying to deduce the essential features of neurones and their interconnections. We then typically program a computer to simulate these features.
However, because our knowledge of neurones is incomplete and our computing power is limited, our models are necessarily gross idealisations of real networks of neurones. It is possible to create a universal computer with a neural network architecture, but this is well beyond current abilities, so let us start with something simple. Any computation requires an input, a process and an output. This three-stage design can be emulated by having a set of input neurons (connected to a sensing device), these in turn connected to a set or sets of (hidden) neurons to process the inputs, which are themselves connected to a set of output neurons (driving a display device). Each set of neurons is called a layer. The number of neurons used for each layer, their interconnections and the number of layers optimal for any particular task are subject to much debate.

Neural network behaviour

Now let us compare this human activity with neural networks. Whenever we create a new neural network, it is like giving birth to a child; afterwards, we start to train the network. Not surprisingly, we may have created the neural network for certain applications or purposes. Here, the difference between childbirth and neural networks is obvious: first we decide why we need a neural net, and then we create it. In the same way that a child becomes an expert in an area, we train the neural network to become expert in an area. Several techniques have been suggested for this, broadly grouped into two classes. The first assumes that we know what the result should be (like a teacher instructing a pupil). In this case we can present the input, check what the output shows and then adjust the strengths/connections until the correct output is given.
This can be repeated with all available test inputs until the network gets as close to error-free as possible. This type of technique is known as back-propagation (also as feedforward, its normal recognition mode). Many other forms of the technique are also used, with varying degrees of support and success. These supervised learning methods use manual reinforcement (strengthening of correct connections, weakening of poor ones) but are slow to train and have many other drawbacks, including an inability to innovate (to go beyond what is known).

In our daily life, in many instances we have already transferred decision-making processes to computers. For example, say you attempt to purchase a product using a credit card over the Internet. For some reason, the billing address does not match the mailing address; it may be due to missing letters, misspelled words or other reasons. Although you are the correct person using a valid credit card, the purchase does not go through because the seller's computer does not allow transactions with a mismatch in the address. Although instances such as this happen daily in our lives, we tend to forget the computer's role in the decision.

Unsupervised learning

The complexity of our own brains means that we can achieve multiple categorisation; we recognise many aspects of any object at the same time. As yet, neural network systems are very limited in comparison, but simple network structures are known to have the ability to self-organise. The second class of techniques makes use of this idea. This type of unsupervised learning mimics the more interesting aspects of human behaviour: our ability to learn for ourselves, to add one and one and make three. In these cases we need the network to recognise features of the input data itself (categorise it) and to display its findings in some way as to be of use (which may include movement or other actions).
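The supervised procedure described above (present an input, compare the output with the known answer, then strengthen or weaken connections) can be sketched with a single threshold neuron learning the logical AND function. The learning rate and epoch count are arbitrary illustrative choices:

```python
# Error-correction learning: repeat over all test inputs, nudging the
# weights whenever the output disagrees with the teacher's answer.
def step(s):
    return 1 if s >= 0 else 0

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND truth table
w, b, lr = [0.0, 0.0], 0.0, 0.1

for epoch in range(25):
    for (x1, x2), target in data:
        out = step(w[0] * x1 + w[1] * x2 + b)
        err = target - out                    # the teacher's correction
        w[0] += lr * err * x1                 # strengthen/weaken connections
        w[1] += lr * err * x2
        b += lr * err

print([step(w[0] * x1 + w[1] * x2 + b) for (x1, x2), _ in data])  # [0, 0, 0, 1]
```

After a handful of passes over the test inputs, the weights settle and the network reproduces the correct output for every case; the same loop structure, applied layer by layer with gradients, underlies back-propagation proper.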
Kohonen developed an algorithm (the Self-Organizing Map, or SOM) to mimic the brain's ability to self-organise, and this forms the basis of most types of self-learning neural network. In this method, arrays of data (initially random) are compared to the input signal and the closest match found is adjusted slightly to improve the fit. This is repeated for all input options, gradually leading to the network weights converging upon the set of input options encountered.

Human neural networks versus conventional computers

Both can learn and become expert in an area, and both are mortal, with an energy usage of about 30 W. But they can be compared under several headings:

1. Processing elements: even though the element sizes are comparable, the brain has up to 10^14 synapses, compared with about 10^8 transistors in a computer.

2. Processing speed: consider the time taken for each elementary operation. Neurons typically operate at a maximum rate of about 100 Hz, while a conventional CPU carries out several hundred million machine-level operations per second.

3. Style of computation: as a parallel, distributed system, the brain is composed of a large number of highly interconnected neurones working in parallel to solve a specific problem. Neural networks learn by example; they cannot be programmed to perform a specific task. The disadvantage is that because the network finds out how to solve the problem by itself, its operation can be unpredictable. On the other hand, conventional computers use a cognitive approach to problem solving: the way the problem is to be solved must be known and stated in small unambiguous instructions, which are then converted to a high-level language program and then into machine code that the computer can understand. These machines are totally predictable.

4. Fault tolerance: humans can forget, but neural networks cannot.
Once fully trained, a neural net will not forget; whatever a neural network learns is hard-coded and becomes permanent, whereas a human's knowledge is volatile and may not become permanent. The other difference is accuracy: once a particular application or process is automated through a neural network, the results are repeatable and accurate. If the process is replicated "n" times, the results will be the same and as accurate as calculated the first time. Human beings are not like that: the first 10 runs may be accurate, but later mistakes may happen.

5. Learning: the brain can learn (reorganize itself) from experience. This means that partial recovery from damage is possible if healthy units can learn to take over the functions previously carried out by the damaged areas. But without thorough supervised learning, the computer is known to work only within what it knows. Scientists have managed to develop genetic algorithms and fuzzy logic to create systems relying on probabilistic matching. That is, we cannot be certain of the results we obtain; each result is merely more probable than the alternatives, and the system simply chooses the result with the highest likelihood. That may seem a drawback, yet it almost certainly relates more closely to the actual workings of our own brains.

6. Intelligence and consciousness: this property of the human brain is what makes humans the highest of the animals and the most dynamic of all living organisms. For artificial neural networks it is the basis of frequent research, conducted up to the present day, into how a computer can have 'a mind of its own'. That is, it has not been achieved yet.

APPLICATION OF ARTIFICIAL NEURAL NETS

The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations and also to use it.
This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.

Real life applications

The tasks to which artificial neural networks are applied tend to fall within the following broad categories:

- Function approximation, or regression analysis, including time series prediction and modelling.
- Classification, including pattern and sequence recognition, novelty detection and sequential decision making.
- Data processing, including filtering, clustering, blind signal separation and compression.

Application areas of ANNs include system identification and control (vehicle control, process control), game-playing and decision making (backgammon, chess, racing), pattern recognition (radar systems, face identification, object recognition, etc.), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial applications, data mining (or knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering. Moreover, some brain diseases, e.g. Alzheimer's, are apparently, and essentially, diseases of the brain's natural neural network, damaging necessary prerequisites for the functioning of the mutual interconnections between neurons and/or glia.

Neural network software

Neural network software is used to simulate, research, develop and apply artificial neural networks, biological neural networks and, in some cases, a wider array of adaptive systems.

Learning paradigms

There are three major learning paradigms, each corresponding to a particular abstract learning task: supervised learning, unsupervised learning and reinforcement learning. Usually any given type of network architecture can be employed in any of those tasks.
Supervised learning: In supervised learning, we are given a set of example pairs and the aim is to find a function f in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data.

Unsupervised learning: In unsupervised learning we are given some data x and a cost function to be minimized, which can be any function of x and the network's output f. The cost function is determined by the task formulation. Most applications fall within the domain of estimation problems such as statistical modeling, compression, filtering, blind source separation and clustering.

Reinforcement learning: In reinforcement learning, data x is usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of long-term cost, i.e. the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated. ANNs are frequently used in reinforcement learning as part of the overall algorithm. Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision-making tasks.

CURRENT RESEARCH

In recent years, neural networks have come to be considered appropriate techniques for solving complex and time-consuming problems, and they are broadly utilized in civil and structural engineering applications.
Determining the dynamic time history responses of structures under earthquake loadings is one such time-consuming problem, with a huge computational burden. In the present study, neural networks are employed to predict the time history responses of structures. Neural networks such as radial basis function (RBF), generalized regression (GR), counter propagation (CP), back propagation (BP) and wavelet back propagation (WBP) neural networks are used in civil and structural engineering applications [1-3]. As has been shown, the performance generality of WBP for approximating structural time history responses is better than that of the RBF, GR, CP and BP neural networks. Therefore, this study focuses on WBP neural networks and their improvements. The most important phase in neural network training is data generation. As emphasized in the relevant professional literature, there is no explicit method to select the training samples, so this job is usually accomplished on a random basis.

Current research areas in neural networks include:

- the function block to programmable logic controller library function
- wavelet back propagation neural networks for structural dynamic analysis

amongst others. For the purposes of this paper we will take a deeper look at research in the second area.

Culture and Research

In large-scale problems, selection of proper training data may therefore require significant computer effort. Also, in the case of such problems, many training samples must be selected to train a robust neural network. In the present paper, we introduce a new neural system for eliminating the main difficulties that occur in the training mode of WBP neural networks. The new system is designed in two main phases. In the first phase, the input space is classified based on one criterion using a competitive neural network. In the second phase, one distinct WBP neural network is trained for each class using the data located in it.
In this manner, a set of parallel WBP neural networks is substituted for a single WBP neural network. The neural system is called parallel wavelet back propagation (PWBP) neural networks. The numerical results indicate that the performance generality of PWBP is better than that of a single WBP neural network.

Fundamentals of wavelet theory

Wavelet theory is the outcome of multi-disciplinary endeavours that brought together mathematicians, physicists and engineers. This relationship creates a flow of ideas that goes well beyond the construction of new bases or transforms. The term wavelet means a little wave. A function h ∈ L2(R) (the set of all square integrable, or finite energy, functions) is called a wavelet if it has zero average on (−∞, +∞):

    ∫_{−∞}^{+∞} h(t) dt = 0        (1)

This little wave must have at least a minimum oscillation and a fast decay to zero in both the positive and negative directions of its amplitude. These three properties are the Grossmann-Morlet admissibility conditions of a function that is required for the wavelet transform. The wavelet transform is an operation which transforms a function by integrating it with modified versions of some kernel function. The kernel function is called the mother wavelet and the modified versions are its daughter wavelets. A function h ∈ L2(R) is admissible if:

    c_h = ∫_{−∞}^{+∞} |H(ω)|² / |ω| dω < ∞        (2)

where H(ω) is the Fourier transform of h(t). The constant c_h is the admissibility constant of the function h(t). For a given h(t), the condition c_h < ∞ holds only if H(0) = 0. The wavelet transform of a function f ∈ L2(R) with respect to a given admissible mother wavelet h(t) is defined as:

    W_f(a, b) = ∫_{−∞}^{+∞} f(t) h*_{a,b}(t) dt        (3)

where * denotes the complex conjugate. However, most wavelets are real valued.
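The zero-average condition of Eq. (1) can be checked numerically for a candidate wavelet. The function below, h(t) = t·exp(−t²/2), is an illustrative choice: an odd, fast-decaying "little wave" whose integral over the whole line vanishes.

```python
import math

def h(t):
    # An odd, fast-decaying candidate wavelet (illustrative choice)
    return t * math.exp(-t * t / 2)

# Trapezoidal approximation of the integral in Eq. (1); the function decays
# so fast that integrating over [-10, 10] captures essentially everything.
a, b, n = -10.0, 10.0, 20000
dt = (b - a) / n
integral = sum(0.5 * (h(a + i * dt) + h(a + (i + 1) * dt)) * dt for i in range(n))
print(abs(integral) < 1e-6)  # True: the average over (-inf, +inf) is zero
```

Because h is odd, its Fourier transform satisfies H(0) = 0 as well, which is exactly the condition the admissibility constant requires.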
Sets of wavelets are employed for the approximation of a signal, and the goal is to find the set of daughter wavelets, constructed by dilating and translating the original (mother) wavelet, that best represents the signal. The daughter wavelets are generated from a single mother wavelet h(t) by dilation and translation:

    h_{a,b}(t) = h((t − b) / a)        (4)

where a > 0 is the dilation factor and b is the translation factor.

Wavelet neural networks

Wavelet neural networks (WNN), employing wavelets as the activation functions, have recently been researched as an alternative approach to neural networks with sigmoidal activation functions. The combination of wavelet theory and neural networks has led to the development of WNNs. WNNs are feed forward neural networks using wavelets as activation functions. In WNNs, both the position and the dilation of the wavelets are optimized, besides the weights. Wavenet is another term used to describe WNNs. Originally, wavenets referred to neural networks using wavelets; in wavenets, the position and dilation of the wavelets are fixed and the weights are optimized.

Wavelet back propagation neural networks

The BP network is now the most popular mapping neural network, but it has a few problems, such as trapping into local minima and slow convergence. Wavelets are powerful signal analysis tools. They can approximately realize time-frequency analysis using a mother wavelet. The mother wavelet has a square window in the time-frequency space, and the size of the window can be freely varied by two parameters. Thus, wavelets can identify the localization of unknown signals at any level. The activation function of the hidden layer neurons in a BP neural network is the sigmoidal function shown in Fig. 1a.
To design the wavelet back propagation (WBP) neural network, we substitute the hidden layer sigmoidal activation function of BP with the POLYWOG1 wavelet:

    h_POLYWOG1(t) = e^{1/2} · t · exp(−t² / 2)        (5)

A plot of POLYWOG1 with a = 1 and b = 0 is shown in Fig. 1b. In the resulting WBP neural network, the position and dilation of the wavelets serving as the activation function of the hidden layer neurons are fixed, and the weights of the network are optimized using the SCG algorithm. In this study, we obtain good results considering b = 0 and a = 2.5. The activation function of the hidden layer neurons is then as in (6):

    h_POLYWOG1(t) = e^{1/2} · (t / 2.5) · exp(−(t / 2.5)² / 2)        (6)

Therefore, WBP is a modified back propagation neural network with the POLYWOG1 activation function in its hidden layer neurons, and adjusting the weights of the neural network is performed using the SCG algorithm. The typical topology of WBP is shown in Fig. 2.

Competitive neural networks

Fig. 1: a) Sigmoidal function, b) POLYWOG1 wavelet
Fig. 2: Typical topology of WBP

Some applications need to group data that may, or may not, be clearly definable. Competitive neural networks can learn to detect regularities and correlations in their input and adapt their future responses to that input accordingly. The neurons of competitive networks learn to recognize groups of similar input vectors; a competitive network automatically learns to classify input vectors. However, the classes obtained by the competitive network depend only on the distance between input vectors. If two input vectors are very similar, the competitive network will probably put them in the same class. There is no mechanism in a strictly competitive network design to say whether or not any two input vectors are in the same class or different classes; a competitive network simply tries to identify groups as best it can. Training of a competitive network is based on the Kohonen [10] self-organization algorithm.
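This unsupervised competitive training (each input is matched against every neuron's weight vector, and only the winning, closest neuron has its weights nudged toward the input) can be sketched as follows. The cluster data, learning rate, network size and random seed are all illustrative assumptions:

```python
import random

def train_competitive(inputs, n_neurons=2, lr=0.3, epochs=50, seed=1):
    """Winner-take-all competitive learning: the closest neuron's weights
    move a fraction lr of the way toward each presented input vector."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for _ in range(epochs):
        for x in inputs:
            # Competition: find the neuron whose weights best match the input
            dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
            win = dists.index(min(dists))
            # Update only the winner: w <- w + lr * (x - w)
            weights[win] = [wi + lr * (xi - wi)
                            for xi, wi in zip(x, weights[win])]
    return weights

clusters = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
print(train_competitive(clusters))  # one weight vector settles near each cluster
```

No teacher labels the inputs; the grouping emerges purely from the distances between input vectors, which is exactly why two similar vectors tend to end up in the same class.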
A key difference between this network and many other networks is that the competitive network learns without supervision. During training, the weights of the winning neuron are updated according to:

w_ij(k + 1) = w_ij(k) + α [x_i(k) - w_ij(k)]   (7)

where w_ij is the weight of the competitive layer from input i to the winning neuron j, x_i is the ith component of the input vector, α is the learning rate and k is the discrete time.

Parallel wavelet back propagation neural networks

As mentioned previously, for large problems it is necessary to select an adequate number of training data in order to train a neural network with acceptable generalization, so considerable computational effort is required in the training phase. To attain appropriate generalization with little effort, we propose the PWBP neural networks. First, the selected input-target training pairs are classified into a number of classes based on a specific criterion; in other words, the input and target spaces are divided into subspaces such that the data located in each subspace have similar properties. We can then train a small WBP neural network for each subspace using its assigned training data. With this simple strategy, a single WBP neural network trained over the whole input space is replaced by a set of parallel WBP neural networks, each trained for one segment of the classified input space. In PWBP, each WBP neural network has a specific dilation factor which may differ from those of the other WBP neural networks; therefore, the generalization performance of PWBP neural networks is higher than that of a single WBP neural network. Improving the generalization of PWBP is also performed very rationally and economically in comparison with that of a single WBP neural network.
In other words, improving generalization and retraining some of the small parallel WBP neural networks requires little effort compared with doing the same for a single WBP neural network. Furthermore, it is very probable that some of the parallel WBP neural networks of PWBP will require no generalization improvement at all. Selecting a proper criterion for classifying the input space depends on the nature of the problem and its variables; recognizing the arguments that make a criterion efficient therefore has a significant influence on the generality of PWBP. Determining the number of classes depends on the complexity and size of the input space, and there are no special criteria for this purpose.

ERROR ESTIMATION

In the present study, the error between the exact and approximate results is evaluated by the root mean squared error (RMSE):

RMSE = sqrt( (1/n) * Σ_{i=1..n} (x_i - x̂_i)^2 )   (8)

where x_i and x̂_i are the ith components of the exact and approximated vectors, respectively, and n is the dimension of the vectors. To measure how successful a fit is achieved between the exact and approximate responses, the R-square statistic is employed; a value closer to 1 indicates a better fit:

Rsquare = 1 - Σ_{i=1..n} (x_i - x̂_i)^2 / Σ_{i=1..n} (x_i - x̄)^2   (9)

where x̄ is the mean of the components of the exact vector.

TYPES OF MODELS

THE MULTI LAYER FEED FORWARD NEURAL NETWORK (MLFFNN)

This is perhaps the most popular network architecture in use today, due originally to Rumelhart and McClelland. In this type of network, each unit performs a biased weighted sum of its inputs and passes this activation level through a transfer function to produce its output; the units are arranged in a layered feed-forward topology. The network thus has a simple interpretation as a form of input-output model, with the weights and thresholds (biases) as the free parameters of the model.
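The RMSE and R-square measures of Eqs. (8) and (9) translate directly into code; a minimal sketch:

```python
import numpy as np

def rmse(exact, approx):
    """Root mean squared error between exact and approximated vectors, Eq. (8)."""
    exact, approx = np.asarray(exact, float), np.asarray(approx, float)
    return float(np.sqrt(np.mean((exact - approx) ** 2)))

def rsquare(exact, approx):
    """R-square statistic, Eq. (9); values closer to 1 indicate a better fit."""
    exact, approx = np.asarray(exact, float), np.asarray(approx, float)
    ss_res = np.sum((exact - approx) ** 2)
    ss_tot = np.sum((exact - exact.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

x_exact = [1.0, 2.0, 3.0, 4.0]
x_approx = [1.1, 1.9, 3.2, 3.8]
print(rmse(x_exact, x_approx), rsquare(x_exact, x_approx))
```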
Such networks can model functions of almost arbitrary complexity, with the number of layers and the number of units in each layer determining the complexity of the function. Important issues in Multilayer Perceptron (MLP) design include the specification of the number of hidden layers and the number of units in these layers. The number of input and output units is defined by the problem; there may be some uncertainty about precisely which inputs to use, a point to which we will return later. For the moment, however, we will assume that the input variables are intuitively selected and are all meaningful. The number of hidden units to use is far from clear. As good a starting point as any is to use one hidden layer, with the number of units equal to half the sum of the number of input and output units. Again, we will discuss how to choose a sensible number later.

Figure 1: Architecture or Topology

The example is a 3-layer feed-forward Artificial Neural Network (ANN) using the back propagation algorithm as a framework to classify the customers of a company into different categories. The MLFFNN shown in Figure 1 above for the classification of customer invoicing data is a feed-forward neural network with a variable architecture. The 5 input nodes correspond to the 3 sets of 5 values each (of the customer invoicing data) applied to the neural network as training samples. The 3 nodes in the output layer correspond to the 3 levels of classification of customers as good, average and below average. For the purpose of this 3-fold classification, a code is used: 100 is the "desired output" on the 3 output nodes for a good customer, 010 is the code for an average customer, and 001 is the desired output for the category of below-average customers.
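The 3-fold coding scheme just described amounts to one-hot target vectors, one output node per category; a small sketch (the dictionary keys are our own labels, not from the report):

```python
import numpy as np

# One output node per category: the "desired output" codes above.
CODES = {
    "good":          [1, 0, 0],
    "average":       [0, 1, 0],
    "below_average": [0, 0, 1],
}

def encode(category):
    """Return the 3-node desired-output vector for a customer category."""
    return np.array(CODES[category])

for c in CODES:
    print(c, encode(c))
```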
Initializing the weights and training samples

The weights in the network are initialized to small random numbers. Each unit has a bias associated with it; the biases are similarly initialized to small random numbers. Each training sample X is:

(0.4301266, 0.4888733, 0.308413, 0.80125, 0.485115, A, 1, 0, 0)
(0.0802533, 0.09396, 0.012233, 0.701267, 0.098404, B, 0, 1, 0)
(0.04794, 0.0460733, 0.002493, 0.306189, 0.040798, C, 0, 0, 1)

where A stands for the training data for good customers, B for the training data for average customers and C for the training data for below-average customers. The processing of the above samples in training mode is given below.

The modified back propagation algorithm

(1) Initialize all weights and biases in the network;
(2) While the terminating condition is not satisfied {
(3)   For each training sample X in the samples {
(4)     // Propagate the inputs forward:
(5)     For each hidden- or output-layer unit j {
(6)       Ij = Σi wij * Oi + θj;   // net input of unit j with respect to the previous layer i
(7)       Oj = a / (1 + e^(-b*Ij)); }   // a and b are control parameters used for epoch control and analysis
(8)     // Back propagate the errors:
(9)     For each unit j in the output layer
(10)      Errj = Oj * (a - Oj) * (b/a) * (Tj - Oj);   // compute the error
(11)     For each unit j in the hidden layers, from the last to the first hidden layer
(12)      Errj = Oj * (a - Oj) * (b/a) * Σk Errk * wjk;   // error with respect to the next higher layer k
(13)     For each weight wij in the network {
(14)      Δwij = l * Errj * Oi;   // weight increment
(15)      wij = wij + Δwij; }   // weight update
(16)     For each bias θj in the network {
(17)      Δθj = l * Errj;   // bias increment
(18)      θj = θj + Δθj; }   // bias update
}}

Different learning-rate annealing schedules

The variable l is the learning-rate parameter, a constant typically having a value between 0.0 and 1.0.
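The algorithm above can be made runnable for a single hidden layer. The sketch below follows steps (4)-(18) with the parametrized sigmoid O = a / (1 + e^(-b*I)) and a constant learning rate; the layer sizes, iteration count and initialization are our own illustrative choices:

```python
import numpy as np

def act(I, a=1.0, b=1.0):
    """Step (7): O = a / (1 + exp(-b * I))."""
    return a / (1.0 + np.exp(-b * I))

def train_step(X, T, W1, b1, W2, b2, lr=0.5, a=1.0, b=1.0):
    """One forward/backward pass of the algorithm above for one sample."""
    H = act(X @ W1 + b1, a, b)                           # hidden outputs, steps 5-7
    O = act(H @ W2 + b2, a, b)                           # network outputs
    err_out = O * (a - O) * (b / a) * (T - O)            # step 10
    err_hid = H * (a - H) * (b / a) * (err_out @ W2.T)   # step 12
    W2 += lr * np.outer(H, err_out); b2 += lr * err_out  # steps 13-18
    W1 += lr * np.outer(X, err_hid); b1 += lr * err_hid
    return O

rng = np.random.default_rng(1)
W1, b1 = rng.normal(0.0, 0.1, (5, 4)), np.zeros(4)   # 5 inputs, 4 hidden units
W2, b2 = rng.normal(0.0, 0.1, (4, 3)), np.zeros(3)   # 3 output nodes
X = np.array([0.4301266, 0.4888733, 0.308413, 0.80125, 0.485115])
T = np.array([1.0, 0.0, 0.0])                        # code for a good customer
for _ in range(500):
    O = train_step(X, T, W1, b1, W2, b2)
print(O)  # drifts toward the desired code 100
```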
We have used variable learning-rate annealing schedules for the learning-rate parameter. Ideally, we would like the MLFFNN to learn at a rapid rate initially (l = constant, where the value of the constant is around 0.8), but as the MLFFNN proceeds through a number of epochs we want its learning to slow down. This slow-down in the learning process is achieved by using a search-then-converge schedule, defined by Darken and Moody (1992) as

l = η0 / (1 + n/τ)

where η0 is given a value of 0.8 and analysis is performed to find the optimal value of τ (a value of 10000 causes convergence in the error space in an optimal number of epochs for the customer invoicing data).

Terminating condition

• The mean squared error falls below a threshold value.
• A pre-specified number of epochs has expired.

The test sample

A testing sample (as input) is applied once we have trained the network adequately and the mean squared error has fallen below the desired level; it serves as a test of the effectiveness of the analysis model. A test case is prepared from any other unused source system or location (BSS is used in our invoicing example). The effectiveness is measured on the basis of how closely the MLFFNN is able to classify a test sample correctly. A typical test sample T for the purpose of analysis is given below:

(0.3001266, 0.4088733, 0.208413, 0.85125, 0.65115)

For this test sample T, the MLFFNN should correctly classify the customer invoicing data in the category of good customers.

SELF ORGANIZING MAPS

Unsupervised learning

The complexity of our own brains means that we can achieve multiple categorisation: we recognise many aspects of any object at the same time. As yet, neural network systems are very limited in comparison, but simple network structures are known to have the ability to self-organise.
GENERAL IDEA OF THE SOM MODEL

The Self-Organizing Map (SOM) was introduced by Teuvo Kohonen in 1982. In contrast to many other neural networks, which use supervised learning, the SOM is based on unsupervised learning. A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional) representation of the input space of the training samples, called a map. The SOM can thus serve as a clustering tool for high-dimensional data. Because of its typical two-dimensional shape, it is also easy to visualize. Another important feature of the SOM is its capability to generalize: it can interpolate between previously encountered inputs.

THE SOM ALGORITHM

In a self-organizing map, the neurons are placed at the nodes of a lattice, and they become selectively tuned to various input patterns (vectors) in the course of a competitive learning process. In reality, the SOM belongs to the class of vector-coding algorithms: a fixed number of codewords are placed into a higher-dimensional input space, thereby facilitating data compression. An integral feature of the SOM algorithm is the neighbourhood function centred around the neuron that wins the competitive process. The algorithm exhibits two distinct phases in its operation:

1. The ordering phase, during which the topological ordering of the weight vectors takes place.
2. The convergence phase, during which the computational map is fine-tuned.

The SOM algorithm exhibits the following properties:

1. Approximation of the continuous input space by the weight vectors of the discrete lattice.
2. Topological ordering, exemplified by the fact that the spatial location of a neuron in the lattice corresponds to a particular feature of the input pattern.
3.
The feature map computed by the algorithm reflects variations in the statistics of the input distribution.
4. The SOM may be viewed as a nonlinear form of principal components analysis.

SELF-ORGANISING MAP learning

The basic idea of the SOM is simple yet effective. The SOM defines a mapping from a high-dimensional input data space onto a regular two-dimensional array of neurons. Every neuron i of the map is associated with an n-dimensional reference vector, where n denotes the dimension of the input vectors. The reference vectors together form a codebook. The neurons of the map are connected to adjacent neurons by a neighbourhood relation, which dictates the topology, or structure, of the map. The most common topologies in use are rectangular and hexagonal. Adjacent neurons belong to the neighbourhood Ni of the neuron i. In the basic SOM algorithm, the topology and the number of neurons remain fixed from the beginning. The number of neurons determines the granularity of the mapping, which has an effect on the accuracy and generalization of the SOM. During the training phase, the SOM forms an elastic net that folds onto the "cloud" formed by the input data. The algorithm controls the net so that it strives to approximate the density of the data: the reference vectors in the codebook drift to the areas where the density of the input data is high, and eventually only a few codebook vectors lie in areas where the input data is sparse. The learning process of the SOM goes as follows:

1. One sample vector x is randomly drawn from the input data set, and its similarity (distance) to the codebook vectors is computed using e.g. the common Euclidean distance measure; the closest codebook vector is the best-matching unit (BMU).
2. After the BMU has been found, the codebook vectors are updated. The BMU itself, as well as its topological neighbours, are moved closer to the input vector in the input space, i.e. the input vector attracts them.
The magnitude of the attraction is governed by the learning rate. As the learning proceeds and new input vectors are given to the map, the learning rate gradually decreases to zero according to the specified learning-rate function type.
3. The reference vector of each updated unit i is moved according to the SOM update rule.
4. Steps 1 to 3 together constitute a single training step, and they are repeated until the training ends. The number of training steps must be fixed prior to training the SOM, because the rate of convergence of the neighbourhood function and the learning rate is calculated accordingly.

After the training is over, the map should be topologically ordered. This means that n topologically close (using some distance measure, e.g. Euclidean) input data vectors map to n adjacent map neurons, or even to the same single neuron.

Map quality measures

After the SOM has been trained, it is important to know whether it has properly adapted itself to the training data. Because it is obvious that one optimal map for the given input data must exist, several map quality measures have been proposed. Usually, the quality of the SOM is evaluated based on the mapping precision and the topology preservation.

Mapping precision

The mapping-precision measure describes how accurately the neurons 'respond' to the given data set. For example, if the reference vector of the BMU calculated for a given testing vector xi is exactly the same xi, the error in precision is then 0. Normally, the number of data vectors exceeds the number of neurons, and the precision error is thus always different from 0. A common measure that calculates the precision of the mapping is the average quantization error over the entire data set.

VISUALIZING THE SOM

The SOM is easy to visualize, and over the years several visualization techniques have been devised.
Due to the inherently intricate nature of the SOM, however, none of the visualization methods devised so far has proven superior to the others. At times, several different visualizations of the same SOM are needed to see the full state of the map. From this it can be concluded that every existing visualization method has its merits and demerits. The unified distance matrix, or u-matrix, is perhaps the most popular method of displaying SOMs.

Applications of SOM

The most important practical applications of SOMs are in exploratory data analysis, pattern recognition, speech analysis, robotics, industrial and medical diagnostics, and instrumentation and control. The SOM can also be applied to hundreds of other tasks where large amounts of unclassified data are available.

Disadvantages of SOM

One major problem with SOMs is getting the right data. Unfortunately, you need a value for each dimension of each member of the samples in order to generate a map. Sometimes this simply is not possible, and often it is very difficult to acquire all of this data, so this is a limiting feature of the use of SOMs, often referred to as missing data. Another problem is that every SOM is different and finds different similarities among the sample vectors. SOMs organize sample data so that, in the final product, the samples are usually surrounded by similar samples; however, similar samples are not always near each other. If you have a lot of shades of purple, you will not always get one big group with all the purples in one cluster; sometimes the clusters will be split and there will be two groups of purple. Using colours, we could tell that those two groups are in reality similar and just got split, but with most data those two clusters will look totally unrelated. So a lot of maps need to be constructed in order to get one final good map.
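The SOM learning process described above fits in a short loop. The sketch below uses a rectangular lattice, a Gaussian neighbourhood function, and linearly decaying learning rate and radius; all of these are illustrative choices, not the report's prescriptions:

```python
import numpy as np

def som_train(data, grid=(5, 5), steps=500, eta0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM training loop: draw a sample, find the BMU,
    then pull the BMU and its lattice neighbours toward the sample."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows * cols, data.shape[1]))              # codebook
    coords = np.array([(r, c) for r in range(rows)
                       for c in range(cols)], dtype=float)    # lattice positions
    for k in range(steps):
        x = data[rng.integers(len(data))]                     # step 1: draw a sample
        bmu = int(np.argmin(np.linalg.norm(W - x, axis=1)))   # Euclidean BMU
        eta = eta0 * (1.0 - k / steps)                        # decaying learning rate
        sigma = sigma0 * (1.0 - k / steps) + 1e-3             # shrinking radius
        d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)      # lattice distances
        h = np.exp(-d2 / (2.0 * sigma ** 2))                  # neighbourhood function
        W += eta * h[:, None] * (x - W)                       # step 2: update
    return W

data = np.random.default_rng(1).random((200, 2))
codebook = som_train(data)
print(codebook.shape)  # one 2-D reference vector per lattice neuron
```

After training, the codebook vectors approximate the density of the input cloud, and the average quantization error discussed above can be computed from the distances between each sample and its BMU.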
LEARNING ALGORITHMS

There are many algorithms for training neural networks; most of them can be viewed as a straightforward application of optimization theory and statistical estimation. They include back propagation by gradient descent, Rprop, BFGS, CG, etc. Evolutionary computation methods, simulated annealing, expectation maximization and non-parametric methods are among the other commonly used methods for training neural networks. This is related to machine learning. Recent developments in this field have also seen particle swarm optimization and other swarm intelligence techniques used in the training of neural networks. Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data, such as data from sensors or databases. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviours given all possible inputs is too complex to describe generally in programming languages, so that in effect programs must automatically describe programs. Artificial intelligence is a closely related field, as are probability theory and statistics, data mining, pattern recognition, adaptive control, and theoretical computer science.

Simulated annealing

Simulated annealing (SA) is a generic probabilistic metaheuristic for the global optimization problem of applied mathematics, namely locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete (e.g., all tours that visit a given set of cities).
For certain problems, simulated annealing may be more effective than exhaustive enumeration, provided that the goal is merely to find an acceptably good solution in a fixed amount of time rather than the best possible solution. The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. The heat causes the atoms to become unstuck from their initial positions (a local minimum of the internal energy) and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one. By analogy with this physical process, each step of the SA algorithm replaces the current solution by a random "nearby" solution, chosen with a probability that depends on the difference between the corresponding function values and on a global parameter T (called the temperature) that is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when T is large, but increasingly "downhill" as T goes to zero. The allowance for "uphill" moves saves the method from becoming stuck at local optima, which are the bane of greedier methods. In the simulated annealing (SA) method, each point s of the search space is analogous to a state of some physical system, and the function E(s) to be minimized is analogous to the internal energy of the system in that state. The goal is to bring the system from an arbitrary initial state to a state with the minimum possible energy.

The basic iteration

At each step, the SA heuristic considers some neighbour s' of the current state s, and probabilistically decides between moving the system to state s' or staying in state s. The probabilities are chosen so that the system ultimately tends to move to states of lower energy.
Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.

The neighbours of a state

The neighbours of a state are new states of the problem that are produced by altering the given state in some particular way. For example, in the travelling salesman problem, each state is typically defined as a particular permutation of the cities to be visited. The neighbours of some particular permutation are the permutations produced, for example, by interchanging a pair of adjacent cities. The action taken to alter the solution in order to find neighbouring solutions is called a "move", and different moves give different neighbours. These moves usually result in minimal alterations of the solution, as the previous example depicts, in order to help the algorithm optimize the solution to the maximum extent while retaining the already optimal parts of the solution and affecting only the suboptimal parts. In the previous example, the parts of the solution are the parts of the tour. Searching for neighbours of a state is fundamental to optimization because the final solution will come after a tour of successive neighbours. Simple heuristics move by finding best neighbour after best neighbour, and stop when they have reached a solution which has no better neighbours. The problem with this approach is that a solution without better immediate neighbours is not necessarily the optimum; it would be the optimum only if it were shown that no kind of alteration of the solution, and not just a particular kind, gives a better solution. For this reason it is said that simple heuristics can only reach local optima, not the global optimum.
Metaheuristics, although they also optimize through the neighbourhood approach, differ from heuristics in that they can move through neighbours that are worse solutions than the current solution. Simulated annealing in particular does not even try to find the best neighbour. The reason for this is that the search can no longer stop in a local optimum, and in theory, if the metaheuristic can run for an infinite amount of time, the global optimum will be found.

The annealing schedule

Another essential feature of the SA method is that the temperature is gradually reduced as the simulation proceeds. Initially, T is set to a high value (or infinity), and it is decreased at each step according to some annealing schedule, which may be specified by the user but must end with T = 0 towards the end of the allotted time budget. In this way, the system is expected to wander initially towards a broad region of the search space containing good solutions, ignoring small features of the energy function; then drift towards low-energy regions that become narrower and narrower; and finally move downhill according to the steepest-descent heuristic. It can be shown that for any given finite problem, the probability that the simulated annealing algorithm terminates with the global optimal solution approaches 1 as the annealing schedule is extended. This theoretical result, however, is not particularly helpful, since the time required to ensure a significant probability of success will usually exceed the time required for a complete search of the solution space.

Pseudocode

The following pseudocode implements the simulated annealing heuristic, as described above, starting from state s0 and continuing to a maximum of kmax steps or until a state with energy emax or less is found.
The call neighbour(s) should generate a randomly chosen neighbour of a given state s; the call random() should return a random value in the range [0, 1]. The annealing schedule is defined by the call temp(r), which should yield the temperature to use, given the fraction r of the time budget that has been expended so far.

s ← s0; e ← E(s)                                 // Initial state, energy.
sbest ← s; ebest ← e                             // Initial "best" solution.
k ← 0                                            // Energy evaluation count.
while k < kmax and e > emax                      // While time left and not good enough:
    snew ← neighbour(s)                          //   Pick some neighbour.
    enew ← E(snew)                               //   Compute its energy.
    if enew < ebest then                         //   Is this a new best?
        sbest ← snew; ebest ← enew               //     Save 'new neighbour' to 'best found'.
    if P(e, enew, temp(k/kmax)) > random() then  //   Should we move to it?
        s ← snew; e ← enew                       //     Yes, change state.
    k ← k + 1                                    //   One more evaluation done.
return sbest                                     // Return the best solution found.

Actually, the "pure" SA algorithm does not keep track of the best solution found so far: it does not use the variables sbest and ebest, it lacks the first if inside the loop, and, at the end, it returns the current state s instead of sbest. While saving the best state is a standard optimization that can be used in any metaheuristic, it breaks the analogy with physical annealing, since a physical system can "store" only a single state. In strict mathematical terms, saving the best state is not necessarily an improvement, since one may have to specify a smaller kmax in order to compensate for the higher cost per iteration. However, the step sbest ← snew happens on only a small fraction of the moves; therefore, the optimization is usually worthwhile, even when state-copying is an expensive operation.
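The pseudocode above can be made runnable by choosing the standard Metropolis acceptance probability P = exp(-(enew - e) / T); the toy objective and the cooling schedule below are our own illustrative choices:

```python
import math
import random

def simulated_annealing(E, neighbour, s0, kmax, temp, seed=0):
    """Simulated annealing as in the pseudocode above, with Metropolis
    acceptance: always accept downhill moves, and accept uphill moves
    with probability exp(-(enew - e) / T)."""
    rnd = random.Random(seed)
    s, e = s0, E(s0)
    sbest, ebest = s, e
    for k in range(kmax):
        T = temp(k / kmax)                     # annealing schedule
        snew = neighbour(s, rnd)               # pick some neighbour
        enew = E(snew)                         # compute its energy
        if enew < ebest:
            sbest, ebest = snew, enew          # save new best
        if enew < e or rnd.random() < math.exp(-(enew - e) / T):
            s, e = snew, enew                  # move to the neighbour
    return sbest, ebest

# Toy example: minimise E(x) = x^2 starting far from the optimum.
E = lambda x: x * x
neighbour = lambda x, rnd: x + rnd.uniform(-1.0, 1.0)
temp = lambda r: max(1e-3, 1.0 - r)            # linear cooling, floored above zero
best, ebest = simulated_annealing(E, neighbour, s0=10.0, kmax=5000, temp=temp)
print(best, ebest)
```

Early on, when T is large, the acceptance probability is close to 1 even for uphill moves; as T approaches the floor, the walk becomes almost pure descent.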
Evolutionary computation

In computer science, evolutionary computation is a subfield of artificial intelligence (more particularly, computational intelligence) that involves combinatorial optimization problems. Evolutionary computation uses iterative progress, such as growth or development in a population. This population is then selected in a guided random search using parallel processing to achieve the desired end. Such processes are often inspired by biological mechanisms of evolution. The use of Darwinian principles for automated problem solving originated in the fifties, but it was not until the sixties that three distinct interpretations of this idea started to be developed in three different places. Evolutionary programming was introduced by Lawrence J. Fogel in the USA, while John Henry Holland called his method a genetic algorithm. In Germany, Ingo Rechenberg and Hans-Paul Schwefel introduced evolution strategies. These areas developed separately for about 15 years; from the early nineties on they have been unified as different representatives ("dialects") of one technology, called evolutionary computing. Also in the early nineties, a fourth stream following the general ideas emerged: genetic programming. These terminologies denote the field of evolutionary computing and consider evolutionary programming, evolution strategies, genetic algorithms, and genetic programming as sub-areas.

Evolutionary algorithms

Evolutionary algorithms form a subset of evolutionary computation in that they generally only involve techniques implementing mechanisms inspired by biological evolution such as reproduction, mutation, recombination, natural selection and survival of the fittest. Candidate solutions to the optimization problem play the role of individuals in a population, and the cost function determines the environment within which the solutions "live" (see also fitness function).
Evolution of the population then takes place after the repeated application of the above operators. In this process, there are two main forces that form the basis of evolutionary systems: recombination and mutation create the necessary diversity and thereby facilitate novelty, while selection acts as a force increasing quality. Many aspects of such an evolutionary process are stochastic. The pieces of information changed by recombination and mutation are randomly chosen. Selection operators, on the other hand, can be either deterministic or stochastic; in the latter case, individuals with a higher fitness have a higher chance of being selected than individuals with a lower fitness, but typically even the weak individuals have a chance to become a parent or to survive.

Expectation maximization algorithm

In statistics, an expectation-maximization (EM) algorithm is a method for finding maximum likelihood estimates of parameters in statistical models where the model depends on unobserved latent variables. The EM algorithm was explained and given its name in a classic 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin. They pointed out that the method had been "proposed many times in special circumstances" by earlier authors. EM is an iterative method which alternates between performing an expectation (E) step, which computes the expectation of the log-likelihood evaluated using the current estimate for the latent variables, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step. However, the convergence analysis of the Dempster-Laird-Rubin paper was flawed; a correct convergence analysis was published by C. F. Jeff Wu in 1983.
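The alternating E and M steps just described can be illustrated on a toy problem, fitting a two-component one-dimensional Gaussian mixture. This is an illustrative sketch of the general method, not an example from the report; the initialization and synthetic data are our own choices:

```python
import numpy as np

def em_gmm_1d(x, steps=50):
    """EM for a 2-component 1-D Gaussian mixture: the E step computes
    responsibilities (the posterior of the latent component labels z),
    the M step re-estimates means, variances and mixing weights."""
    mu = np.array([x.min(), x.max()])         # spread-out initial means
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(steps):
        # E step: r[i, k] = P(z_i = k | x_i, current parameters)
        d2 = (x[:, None] - mu) ** 2
        p = pi * np.exp(-d2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M step: maximize the expected complete-data log-likelihood
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
mu, var, pi = em_gmm_1d(x)
print(sorted(mu))  # means recovered near the true centres -3 and 3
```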
Wu's proof established the EM method's convergence outside of the exponential family, as claimed by Dempster-Laird-Rubin.

Description

Given a likelihood function L(θ; x, z), where θ is the parameter vector, x is the observed data and z represents the unobserved latent data or missing values, the maximum likelihood estimate (MLE) is determined by the marginal likelihood of the observed data L(θ; x); however, this quantity is often intractable. The EM algorithm seeks to find the MLE of the marginal likelihood by iteratively applying the following two steps:

Expectation step: calculate the expected value of the log-likelihood function, with respect to the conditional distribution of z given x under the current estimate of the parameters θ(t):

Q(θ | θ(t)) = E_{z | x, θ(t)} [ log L(θ; x, z) ]

Maximization step: find the parameter which maximizes this quantity:

θ(t+1) = argmax_θ Q(θ | θ(t))

Applications

EM is frequently used for data clustering in machine learning and computer vision. In natural language processing, two prominent instances of the algorithm are the Baum-Welch algorithm (also known as forward-backward) and the inside-outside algorithm for unsupervised induction of probabilistic context-free grammars. In psychometrics, EM is almost indispensable for estimating item parameters and latent abilities of item response theory models. With its ability to deal with missing data and observe unidentified variables, EM is becoming a useful tool to price and manage the risk of a portfolio. The EM algorithm (and its faster variant OS-EM) is also widely used in medical image reconstruction, especially in positron emission tomography and single photon emission computed tomography.

NEURAL NETWORK SOFTWARE

Simulators

Neural network simulators are software applications that are used to simulate the behaviour of artificial or biological neural networks.
They focus on one or a limited number of specific types of neural networks. They are typically stand-alone and not intended to produce general neural networks that can be integrated into other software. Simulators usually have some form of built-in visualization to monitor the training process; some also visualize the physical structure of the neural network.

[Figure: SNNS research neural network simulator]

Historically, the most common type of neural network software was intended for researching neural network structures and algorithms. The primary purpose of this type of software is, through simulation, to gain a better understanding of the behavior and properties of neural networks. Today, in the study of artificial neural networks, simulators have largely been replaced by more general component-based development environments as research platforms. Commonly used artificial neural network simulators include the Stuttgart Neural Network Simulator (SNNS), Emergent, JavaNNS and Neural Lab.

In the study of biological neural networks, however, simulation software is still the only available approach. In such simulators the physical, biological and chemical properties of neural tissue, as well as the electromagnetic impulses between the neurons, are studied. Commonly used biological network simulators include Neuron, GENESIS, NEST and Brian. Other simulators are XNBC and the BNN Toolbox for MATLAB.

Data analysis simulators

Unlike the research simulators, data analysis simulators are intended for practical applications of artificial neural networks. Their primary focus is on data mining and forecasting. Data analysis simulators usually have some form of preprocessing capabilities. Unlike the more general development environments, data analysis simulators use a relatively simple, static neural network that can be configured. A majority of the data analysis simulators on the market use self-organizing maps as their core.
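To illustrate the kind of algorithm such tools build on, here is a minimal sketch of a one-dimensional self-organizing map trained on scalar data. This is a generic Kohonen-style update rule under assumed hyperparameters, not the implementation used by any of the products named above.

```python
import math
import random

def train_som_1d(data, n_units=10, epochs=100, lr0=0.5, radius0=3.0):
    """Train a 1-D self-organizing map on scalar data.
    Each unit holds one weight; neighbours of the winning unit
    are pulled toward the input as well, which orders the map."""
    weights = [random.random() for _ in range(n_units)]
    for epoch in range(epochs):
        # Decay the learning rate and neighbourhood radius over time.
        lr = lr0 * (1 - epoch / epochs)
        radius = 1.0 + radius0 * (1 - epoch / epochs)
        for x in data:
            # Find the best-matching unit (BMU): the closest weight.
            bmu = min(range(n_units), key=lambda i: abs(weights[i] - x))
            # Pull the BMU and its neighbours toward the input,
            # with a Gaussian falloff over grid distance.
            for i in range(n_units):
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                weights[i] += lr * h * (x - weights[i])
    return weights

random.seed(1)
data = [random.uniform(0.0, 1.0) for _ in range(200)]
som = train_som_1d(data)
print(som)  # learned codebook; values spread across the data range
```

The shrinking neighbourhood is what makes the map "self-organize": early, wide updates give the units a coarse global ordering, and late, narrow updates fine-tune each unit to its local region of the data.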
The advantage of this type of software is that it is relatively easy to use. This, however, comes at the cost of limited capability.

STRENGTHS AND WEAKNESSES OF NEURAL NETWORK MODELS

Philosophers are interested in neural networks because they may provide a new framework for understanding the nature of the mind and its relation to the brain (Rumelhart and McClelland 1986, Chapter 1). Connectionist models seem particularly well matched to what we know about neurology. The brain is indeed a neural net, formed from massively many units (neurons) and their connections (synapses). Furthermore, several properties of neural network models suggest that connectionism may offer an especially faithful picture of the nature of cognitive processing.

Neural networks exhibit robust flexibility in the face of the challenges posed by the real world. Noisy input or destruction of units causes graceful degradation of function: the net's response is still appropriate, though somewhat less accurate. In contrast, noise and loss of circuitry in classical computers typically result in catastrophic failure. Neural networks are also particularly well adapted to problems that require the resolution of many conflicting constraints in parallel. There is ample evidence from research in artificial intelligence that cognitive tasks such as object recognition, planning, and even coordinated motion present problems of this kind. Although classical systems are capable of multiple constraint satisfaction, connectionists argue that neural network models provide much more natural mechanisms for dealing with such problems.

Over the centuries, philosophers have struggled to understand how our concepts are defined. It is now widely acknowledged that trying to characterize ordinary notions with necessary and sufficient conditions is doomed to failure.
Exceptions to almost any proposed definition are always waiting in the wings. For example, one might propose that a tiger is a large black and orange feline. But then what about albino tigers? Philosophers and cognitive psychologists have argued that categories are delimited in more flexible ways, for example via a notion of family resemblance or similarity to a prototype. Connectionist models seem especially well suited to accommodating graded notions of category membership of this kind. Nets can learn to appreciate subtle statistical patterns that would be very hard to express as hard-and-fast rules. Connectionism promises to explain the flexibility and insight found in human intelligence using methods that cannot be easily expressed in the form of exception-free principles (Horgan and Tienson 1989, 1990), thus avoiding the brittleness that arises from standard forms of symbolic representation.

Despite these intriguing features, there are some weaknesses in connectionist models that bear mentioning. First, most neural network research abstracts away from many interesting and possibly important features of the brain. For example, connectionists usually do not attempt to explicitly model the variety of different kinds of brain neurons, nor the effects of neurotransmitters and hormones. Furthermore, it is far from clear that the brain contains the kind of reverse connections that would be needed if it were to learn by a process like backpropagation, and the immense number of repetitions needed for such training methods seems far from realistic. Attention to these matters will probably be necessary if convincing connectionist models of human cognitive processing are to be constructed.

A more serious objection must also be met.
It is widely felt, especially among classicists, that neural networks are not particularly good at the kind of rule-based processing that is thought to undergird language, reasoning, and higher forms of thought. (For a well-known critique of this kind see Pinker and Prince 1988.) We will discuss the matter further when we turn to the systematicity debate.

Another common criticism of neural networks, particularly in robotics, is that they require a large diversity of training examples for real-world operation. A. K. Dewdney, a former Scientific American columnist, wrote in 1997, "Although neural nets do solve a few toy problems, their powers of computation are so limited that I am surprised anyone takes them seriously as a general problem-solving tool." Arguments for Dewdney's position are that implementing large and effective software neural networks requires committing substantial processing and storage resources. While the brain has hardware tailored to the task of processing signals through a graph of neurons, simulating even a highly simplified form on Von Neumann technology may compel a neural network designer to fill many millions of database rows for its connections, which can lead to excessive RAM and disk requirements. Furthermore, the designer will often need to simulate the transmission of signals through many of these connections and their associated neurons, which often demands enormous amounts of CPU processing power and time. While neural networks often yield effective programs, they too often do so at the cost of time and money efficiency.

Arguments against Dewdney's position are that neural nets have been successfully used to solve many complex and diverse tasks, ranging from autonomously flying aircraft to detecting credit card fraud. Other criticisms have come from proponents of hybrid models (combining neural networks and symbolic approaches).
They advocate intermixing these two approaches and believe that hybrid models can better capture the mechanisms of the human mind.

CONCLUSION

As was shown in this write-up, artificial neural networks have a broad field of applications. They can do classification, clustering, experimental design, modeling, mapping, etc. ANNs are quite flexible in adapting to different types of problems and can be custom-designed for almost any type of data representation. A warning, however, should be issued here: the reader should not get over-excited about a new tool just because it is new. A method itself, no matter how powerful it may seem, can fail easily: first, if the data do not represent, or are not correlated well enough with, the information sought; second, if the user does not know exactly what should be achieved; and third, if other standard methods have not been tried as well, just in order to gain as much insight as possible into the measurement and information space of the data set. Neural networks require a lot of study, good knowledge of the theory behind them, and, above all, a lot of experimental work before they are applied to their full extent and power.

The computing world has a lot to gain from neural networks. Their ability to learn by example makes them very flexible and powerful. Furthermore, there is no need to devise an algorithm in order to perform a specific task; i.e., there is no need to understand the internal mechanisms of that task. They are also very well suited for real-time systems because of their fast response and computational times, which are due to their parallel architecture. Neural networks also contribute to other areas of research such as neurology and psychology.
They are regularly used to model parts of living organisms and to investigate the internal mechanisms of the brain. Perhaps the most exciting aspect of neural networks is the possibility that some day 'conscious' networks might be produced. A number of scientists argue that consciousness is a 'mechanical' property and that 'conscious' neural networks are a realistic possibility. I would like to state that even though neural networks have huge potential, we will only get the best out of them when they are integrated with computing, AI, fuzzy logic and related subjects. Finally, although neural networks are not perfect in their predictions, they outperform many other methods and provide hope that one day we can more fully understand dynamic, chaotic systems such as the stock market.
