Intro to Neural Networks
Abhijit Kedia (firstname.lastname@example.org)
Batch 2002-06
6th March '05

Why would anyone want a 'new' sort of computer?
What are (everyday) computer systems good at... and not so good at?
• Good at: fast arithmetic; doing precisely what the programmer programs them to do
• Not so good at: massive parallelism; fault tolerance; adapting to circumstances; dealing with noisy data or data from the environment

Where can neural network systems help?
• Where we can't formulate an algorithmic solution
• Where we can get lots of examples of the behaviour we require
• Where we need to pick out the structure from existing data

What is a Neural Network?
"The question 'What is a neural network?' is ill-posed." -- Pinkus (1999)
A method of computing based on the interaction of multiple connected processing elements.

What is a neural network?
Neural networks are a form of multiprocessor computer system, with
• simple processing elements
• a high degree of interconnection
• simple scalar messages
• adaptive interaction between elements

Biological Motivation
• Biological learning systems are built of very complex webs of interconnected neurons.
• The information-processing abilities of biological neural systems must follow from highly parallel processes operating on representations that are distributed over many neurons.
• ANNs attempt to capture this mode of computation.

The biological inspiration
• The brain has been extensively studied by scientists.
• Its vast complexity prevents all but a rudimentary understanding.
• Even the behaviour of an individual neuron is extremely complex.

What can a Neural Net do?
• Compute a known function
• Approximate an unknown function
• Pattern recognition
• Signal processing
• Learn to do any of the above

Brain and Machine
• The Brain: pattern recognition, association, complexity, noise tolerance
• The Machine: calculation, precision, logic

Features of the Brain
• Ten billion (10^10) neurons
• Neuron switching time ~10^-3 seconds
• Face recognition takes ~0.1 seconds
• On average, each neuron has several thousand connections
• Hundreds of operations per second
• High degree of parallel computation
• Distributed representations
• Neurons die off frequently (and are never replaced); the brain compensates through massive parallelism

Basic Concepts
A neural network generally maps a set of inputs (Input 0 ... Input n) to a set of outputs (Output 0 ... Output m). The number of inputs and outputs is variable. The network itself is composed of an arbitrary number of nodes with an arbitrary topology.

Basic Concepts
Definition of a node: a node is an element that takes inputs x_0 ... x_n weighted by W_0 ... W_n, plus a bias weight W_b, and performs the function
  y = f_H(∑ W_i x_i + W_b)
[Diagram: weighted inputs feeding a summation, then the activation f_H(x), then the output]

Simple Perceptron
A binary logic application:
• f_H(x) = u(x) [linear threshold / unit step]
• W_i = random(-1, 1)
• Y = u(W_0 X_0 + W_1 X_1 + W_b)
Now how do we train it?

Basic Training
Perceptron learning rule:
  ΔW_i = η (D − Y) X_i
where η is the learning rate and D is the desired output. Adjust the weights based on how well the current output matches the desired output.

Logic Training
Expose the network to the logical OR operation, updating the weights after each epoch:

  X0  X1  D
   0   0  0
   0   1  1
   1   0  1
   1   1  1

As the output approaches the desired output for all cases, ΔW_i will approach 0.

Results
[Plot: the weights W0, W1, Wb over the course of training]
The network converges on a hyper-plane decision surface in the (X0, X1) plane:
  X_1 = −(W_0/W_1) X_0 − (W_b/W_1)
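As a concrete illustration, here is a minimal Python/NumPy sketch of the perceptron above, trained on the OR table with the learning rule just given. It updates after every pattern (a common online variant of the per-epoch update described on the slide), and the learning rate and epoch cap are illustrative choices, not values from the slides.

import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # OR truth table inputs X0, X1
D = np.array([0, 1, 1, 1])                      # desired outputs D

w = rng.uniform(-1, 1, size=2)   # W0, W1 = random(-1, 1)
wb = rng.uniform(-1, 1)          # bias weight Wb
eta = 0.1                        # learning rate eta (illustrative choice)

def u(x):
    # unit step: the linear-threshold activation fH(x) = u(x)
    return 1 if x >= 0 else 0

for epoch in range(100):
    all_correct = True
    for x, d in zip(X, D):
        y = u(w @ x + wb)            # Y = u(W0*X0 + W1*X1 + Wb)
        w = w + eta * (d - y) * x    # dWi = eta * (D - Y) * Xi
        wb = wb + eta * (d - y)      # bias treated as a weight on a constant input 1
        all_correct = all_correct and (y == d)
    if all_correct:                  # dWi = 0 for every pattern: training has converged
        break

print("epoch:", epoch, "weights:", w, "bias:", wb)

Because OR is linearly separable, the run should converge within a few epochs to weights describing a separating line of the form given above.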
Typical Activation Functions
  F(x) = 1 / (1 + e^(−k ∑ W_i x_i))
[Plot: F(x) shown for k = 0.5, 1 and 10]
Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions.

Back-Propagated Delta Rule Networks (BP)
Inputs are put through a 'hidden layer' of nodes H_0 ... H_m before the output layer O_0 ... O_o. All nodes are connected between layers.

BP Network Details
• Forward pass: the error is calculated from the outputs and used to update the output weights.
• Backward pass: the error at the hidden nodes is calculated by back-propagating the error at the outputs through the new weights; the hidden weights are then updated.

In Matrix Form
For n inputs, m hidden nodes and q outputs:
• o_lk is the output of the l-th neuron for the k-th of p patterns
• v_k is the output of the hidden layer
• o_k is the network's output vector (t_k below denotes the true outputs)

Matrix Tricks
  E(A, B) = ∑_{k=1}^{p} (t_k − o_k)^T (t_k − o_k)
where t_k denotes the true output vectors. The optimal weight matrix B can be computed directly if f_H^{-1}(t) is known:
  B′ = f_H^{-1}(t) v^T (v v^T)^*
So E(A, B) = E(A, B(A)) = E′(A), which makes our weight space much smaller.

Backpropagation: Purpose and Implementation
Purpose: to compute the weights of a feedforward multilayer neural network adaptively, given a set of labeled training examples.
Method: by minimizing the following cost function (the sum of squared errors):
  E = (1/2) ∑_{n=1}^{N} ∑_{k=1}^{K} [y_k^n − f_k(x^n)]^2
where N is the total number of training examples, K the total number of output units (useful for multiclass problems), and f_k is the function implemented by the neural net.

Backpropagation: Overview
Backpropagation works by applying the gradient descent rule to a feedforward network. The algorithm is composed of two parts that get repeated over and over until a pre-set maximal number of epochs, EP_max, is reached.
• Part I, the feedforward pass: the activation values of the hidden and then output units are computed.
• Part II, the backpropagation pass: the weights of the network are updated with respect to the sum-of-squares error, through a series of weight updates, starting with the hidden-to-output weights and followed by the input-to-hidden weights.

Backpropagation: The Delta Rule
For the hidden-to-output connections (the easy case):
  Δw_kj = −η ∂E/∂w_kj
        = η ∑_{n=1}^{N} [y_k^n − f_k(x^n)] g′(h_k^n) V_j^n
        = η ∑_{n=1}^{N} δ_k^n V_j^n
with
• η corresponding to the learning rate (an extra parameter of the neural net)
• h_k^n = ∑_{j=0}^{M} w_kj V_j^n, where M is the number of hidden units and d the number of input units
• V_j^n = g(∑_{i=0}^{d} w_ji x_i^n)
• δ_k^n = g′(h_k^n) (y_k^n − f_k(x^n))

Backpropagation: The Delta Rule II
For the input-to-hidden connections (the hard case: there are no pre-fixed target values for the hidden units):
  Δw_ji = −η ∂E/∂w_ji
        = −η ∑_{n=1}^{N} (∂E/∂V_j^n) (∂V_j^n/∂w_ji)   (chain rule)
        = η ∑_{k,n} [y_k^n − f_k(x^n)] g′(h_k^n) w_kj g′(h_j^n) x_i^n
        = η ∑_{n=1}^{N} δ_j^n x_i^n
with
• h_j^n = ∑_{i=0}^{d} w_ji x_i^n
• δ_j^n = g′(h_j^n) ∑_{k=1}^{K} w_kj δ_k^n
• and all the other quantities as already defined.
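To make the two delta rules concrete, here is a single-pattern Python/NumPy sketch. The layer sizes, input and target are arbitrary illustrations, and the bias terms (the index-0 entries in the slides' summations) are omitted for brevity.

import numpy as np

def g(h):
    # sigmoid activation: the k = 1 case of F(x) = 1 / (1 + e^(-kx))
    return 1.0 / (1.0 + np.exp(-h))

def g_prime(h):
    # derivative of the sigmoid, written in terms of h
    s = g(h)
    return s * (1.0 - s)

d, M, K = 3, 4, 2                            # input, hidden, output sizes (arbitrary)
rng = np.random.default_rng(0)
w_ji = rng.uniform(-0.5, 0.5, size=(M, d))   # input -> hidden weights
w_kj = rng.uniform(-0.5, 0.5, size=(K, M))   # hidden -> output weights
x = rng.random(d)                            # one training pattern x^n (made up)
y = rng.random(K)                            # its target y^n (made up)

# Feedforward pass
h_j = w_ji @ x                    # h_j^n = sum_i w_ji x_i^n
V = g(h_j)                        # V_j^n = g(h_j^n)
h_k = w_kj @ V                    # h_k^n = sum_j w_kj V_j^n
f = g(h_k)                        # network outputs f_k(x^n)

# Deltas
delta_k = g_prime(h_k) * (y - f)             # delta_k = g'(h_k)(y_k - f_k)
delta_j = g_prime(h_j) * (w_kj.T @ delta_k)  # delta_j = g'(h_j) sum_k w_kj delta_k

# Single-pattern weight increments
eta = 0.5                                    # learning rate (illustrative)
dw_kj = eta * np.outer(delta_k, V)           # dw_kj = eta * delta_k * V_j
dw_ji = eta * np.outer(delta_j, x)           # dw_ji = eta * delta_j * x_i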
BP: The Algorithm
1. Initialize the weights to small random values; create a random pool of all the training patterns; set EP, the number of epochs of training, to 0.
2. Pick a training pattern from the remaining pool of patterns and propagate it forward through the network.
3. Compute the deltas, δ_k, for the output layer.
4. Compute the deltas, δ_j, for the hidden layer by propagating the error backward.
5. Update all the connections, such that
   w_ji^New = w_ji^Old + Δw_ji  and  w_kj^New = w_kj^Old + Δw_kj
6. If any pattern remains in the pool, then go back to Step 2. If all the training patterns in the pool have been used, then set EP = EP + 1; if EP < EP_max, create a new random pool of patterns and go to Step 2; if EP = EP_max, stop.

Hybrid LS RS/SA/GA Training
• Delta-rule training may converge to a local minimum.
• Hybrid Global Learning (HGL) will converge on a global minimum:
  • Randomize A in [−0.5, 0.5]
  • Minimize the error function E′(A)

BP: The Momentum
To this point, backpropagation has the disadvantage of being too slow if η is small, and it can oscillate too widely if η is large. To solve this problem, we can add a momentum term to give each connection some inertia, forcing it to change in the direction of the downhill "force".
New delta rule:
  Δw_pq(t+1) = −η ∂E/∂w_pq + α Δw_pq(t)
where p and q are any input and hidden, or hidden and output, units; t is a time step or epoch; and α is the momentum parameter which regulates the amount of inertia of the weights. A code sketch combining the algorithm with the momentum term follows below.
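Putting it together, here is a Python/NumPy sketch of Steps 1-6 with the momentum term included. The hidden-layer size M, η, α and EP_max are illustrative assumptions, not values from the slides; each unit's bias is handled as a weight on a constant input (index 0), matching the summations above; and the momentum is applied per pattern update rather than per epoch.

import numpy as np

def g(h):
    # sigmoid activation g(h) = 1 / (1 + e^(-h))
    return 1.0 / (1.0 + np.exp(-h))

def train(X, Y, M=5, eta=0.5, alpha=0.9, ep_max=2000, seed=0):
    # X: (N, d) training inputs; Y: (N, K) targets.
    rng = np.random.default_rng(seed)
    N, d = X.shape
    K = Y.shape[1]
    # Step 1: small random weights; column 0 is the bias (constant input 1)
    w_ji = rng.uniform(-0.5, 0.5, size=(M, d + 1))
    w_kj = rng.uniform(-0.5, 0.5, size=(K, M + 1))
    dw_ji_prev = np.zeros_like(w_ji)        # previous increments, for momentum
    dw_kj_prev = np.zeros_like(w_kj)

    for ep in range(ep_max):                # Step 6: repeat for EP_max epochs
        for n in rng.permutation(N):        # Step 2: random pool of patterns
            x = np.concatenate(([1.0], X[n]))           # x_0 = 1 (bias input)
            V = np.concatenate(([1.0], g(w_ji @ x)))    # V_0 = 1 (bias unit)
            f = g(w_kj @ V)                             # forward-pass outputs
            delta_k = f * (1 - f) * (Y[n] - f)          # Step 3: output deltas
            delta_j = V[1:] * (1 - V[1:]) * (w_kj[:, 1:].T @ delta_k)  # Step 4
            # Step 5, with momentum: dw(t+1) = eta * (delta rule) + alpha * dw(t)
            dw_kj = eta * np.outer(delta_k, V) + alpha * dw_kj_prev
            dw_ji = eta * np.outer(delta_j, x) + alpha * dw_ji_prev
            w_kj += dw_kj
            w_ji += dw_ji
            dw_kj_prev, dw_ji_prev = dw_kj, dw_ji
    return w_ji, w_kj

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
w_ji, w_kj = train(X, Y)
for x, y in zip(X, Y):
    xa = np.concatenate(([1.0], x))
    f = g(w_kj @ np.concatenate(([1.0], g(w_ji @ xa))))
    print(x, y, np.round(f, 2))   # outputs should approach 0, 1, 1, 0

XOR is used as the example here because, unlike OR, it is not linearly separable, so the single perceptron from earlier cannot represent it, while the hidden layer can.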
Other methods
• Simulated annealing: more accurate results, but much slower.
• Genetic algorithms: more accurate results, slower.
For details on methods and results, see: S. Cho, T. Chow, and C. Leung, "A neural-based crowd estimation by hybrid global learning algorithm", IEEE Transactions on Systems, Man and Cybernetics, Part B, pp. 535-541.

Alternative Activation Functions
Radial basis functions (RBFs): square, triangle, and, above all, Gaussian. In a network with a hidden layer of f_RBF(x) units feeding f_H(x) output units, the parameters (μ, σ) can be varied at each hidden node to guide training.

Alternate Topologies
Inputs can analyze a signal at multiple points in time. RBF functions may be used to select a 'window' in the input data.

Typical Topologies
A set of inputs, a set of hidden nodes, and a set of outputs. Too many nodes makes the network hard to train.

Supervised vs. Unsupervised
• The previously discussed networks are 'supervised': they need to be trained ahead of time with lots of data.
• Unsupervised networks adapt to the input. They have applications in clustering and in reducing dimensionality, but learning may be very slow.

Self Organizing Maps
The basic Self-Organizing Map (SOM) can be visualized as a sheet-like neural-network array, the cells (or nodes) of which become specifically tuned to various input signal patterns or classes of patterns in an orderly fashion.

Current Applications
• Investment analysis: predicting the movement of stocks, replacing earlier linear models.
• Signature analysis: bank checks, VISA, etc.
• Process control: chemistry-related.
• Monitoring: sensor networks may gather more data than can be processed by operators. Inputs: cues from camera data, vibration levels, sound, radar, lidar, etc. Outputs: the number of people at a terminal, an engine warning light, a control for a light switch.

How to Go About It?
Web resources, and books:
• Neural Networks: Simon Haykin
• Neural Networks and Fuzzy Logic: B. Kosko
• Building Neural Networks: Skapura
• Neural Networks for Pattern Recognition: ??

How to Go About It? Fundamentals
• A very strong background in mathematics; no biology required at all.
• Linear algebra, calculus, probability, and even Fourier series to a certain extent.
• For ECE guys: signals and systems, DSP.

References
• L. Smith, ed. (1996, 2001), "An Introduction to Neural Networks", URL: http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html
• W. S. Sarle, ed. (1997), "Neural Network FAQ", URL: ftp://ftp.sas.com/pub/neural/FAQ.html
• StatSoft, "Neural Networks", URL: http://www.statsoftinc.com/textbook/stneunet.html
• S. Cho, T. Chow, and C. Leung (1999), "A Neural-Based Crowd Estimation by Hybrid Global Learning Algorithm", IEEE Transactions on Systems, Man and Cybernetics, Part B, No. 4.

Questions, if Any?