PAGES: 93 POSTED ON: 11/19/2010
Computational Intelligence in Transportation Applications
Ondřej Přibyl
Czech Technical University in Prague, Faculty of Transportation Sciences, Department of Applied Mathematics
pribylo@fd.cvut.cz

Organization
- Introduction, definition, history
- Theoretical basics: Artificial Neural Networks, Fuzzy Systems, ANFIS (Adaptive Neuro-Fuzzy Inference System), Genetic Algorithms
- Some real-world applications: overview
- Discussion and conclusions

Introduction to Computational Intelligence
- Introduction
- Definition of terms
- Brief history

What is intelligence?
- The ability to learn or understand from experience
- The ability to acquire and retain knowledge
- The ability to respond quickly and successfully to a new situation
- The ability to use reason to solve problems

Is a system really intelligent?
- Turing test: a proposal for a test of a machine's capability to perform human-like conversation (Turing, 1950)
- Principle: place both a human and a machine mimicking human responses outside the field of direct observation and use an unbiased interface to interrogate them. If the responses are distinguishable, the machine is not displaying intelligence.
(Source: Wikipedia, the free encyclopedia)

Terminology
- Artificial Intelligence: "Subject dealing with computational models that can think and act rationally"; strong symbolic manipulation (expert systems)
- Soft Computing: "Emerging approach to computing which parallels the ability of the human mind to reason and learn in an environment of uncertainty and imprecision" (L. Zadeh)

Brief History (Jang et al. 1997)
1943: invention of the computer
- 1940s: Cybernetics (Norbert Wiener) | McCulloch-Pitts neuron model | |
- 1950s: Artificial Intelligence | Perceptron | |
- 1960s: Lisp language | Adaline, Madaline | Fuzzy sets (Zadeh) |
- 1970s: Knowledge engineering (expert systems) | Back-propagation algorithm, Cognitron | Fuzzy controller | Genetic algorithm
- 1980s: | Self-organizing map, Hopfield network, Boltzmann machine | Fuzzy modeling (TSK model) | Artificial life, immune modelling
- 1990s: | | Neuro-fuzzy modeling, ANFIS | Genetic programming
(columns: Conventional AI | Neural Networks | Fuzzy Systems | Other)

Characteristics of soft computing
- Biologically inspired computing models
- Uses human expertise: IF-THEN rules or conventional knowledge representation
- New optimization techniques
- Numerical computation
- New application domains: adaptive control, non-linear system identification, pattern recognition, ...

Characteristics of soft computing (cont.)
- Model-free learning
- Intensive computation
- Fault tolerance
- Goal-driven characteristics
- Real-world applications

Artificial Neural Networks (ANN): Theoretical Background

Neuron
[Figure: a biological neuron and the corresponding model of a neuron]

Multilayer feedforward network (a two-layered feedforward NN)
[Figure: input layer x1 ... xn, hidden layer with weights w11 ... wnm, output weights v1 ... vm, output y]
- The strength is in interconnectivity!
- Each unit applies a sigmoid activation: y = 1 / (1 + exp(-v)), where v = f(w, x) is the weighted sum of the unit's inputs

Learning
- Initialisation: Wi = rand
- Updating: Wnew = Wold + dW
- dW = ? It is given by the learning rule, here gradient descent, scaled by a learning rate
- A performance measure compares the desired output d from the training data set with the network output y
[Table: example training data set with inputs x1, x2, x3, desired output d and network output y]

Overfitting: a problem of ANN
[Figure: four plots of f(t) over t; an overfitted model passes through every training point but fails on the testing data set]

How to solve the problem of overfitting?
- Split the available data into a training data set and a testing data set
[Figure: the training error keeps decreasing with the number of training epochs, while the error on the testing data set starts to grow again; a further split into training, validation and testing data sets can also be used]

Features of ANN
- Fault tolerance: neural networks are robust. Information, in the form of weights, is distributed all over the network, so they can survive the failure of some nodes, and their performance degrades gracefully under faults.
- Flexibility and adaptability: they can deal with information that is fuzzy, probabilistic, inconsistent and noisy; they can adapt intelligently to previously unseen situations; they learn from examples presented to them and do not need to be programmed.
- Parallelism: they embody parallel computing, which makes it possible to build parallel processing hardware for implementing them. No logical operations are used after training, so extremely fast computation can be achieved.
- Learning delay: training neural networks is often time-consuming, but after training they can operate in real time.

Features of ANN (cont.)
- Model free: no rules are required to be given in advance, and there is no need to assume an underlying data distribution, as is usually done in statistical modeling.
- Size and complexity: large-scale implementations of ANN need massive arrays of neurons.
- "Black box": individual relations between the input variables and the output variables are not developed by engineering judgment, and knowledge extraction is difficult.

When to consider using ANN
- Input is high-dimensional (raw sensor data)
- Possibly noisy data
- Training time is unimportant
- The form of the target function is unknown
- Human readability of the result is unimportant
- When facing multivariate non-linear problems
- Common application fields: pattern recognition, function approximation, prediction and forecasting

Applications of ANN to transportation
- Travel behavior: modelling driver behavior at signalised urban intersections; driver decision-making model
- Traffic flow: intersection control; estimation of the speed-flow relationship; traffic management; trip generation model; urban public transport equilibrium
- Incident detection
- Prediction of parking characteristics
- Travel time prediction

Fuzzy Systems (FS): Theoretical Background

Why Fuzzy Systems?
- The knowledge in computers is usually binary coded, using Aristotelian logic
- It is difficult to understand the representation of data inside computers
- How can we teach computers to understand human expressions?
- Is it possible to design a model directly from expert rules?

Fuzzy Set Theory
- Aristotelian logic: "tall" is a crisp set; a person is tall IF height > 180 cm, and membership takes values in {0, 1}. In the example, p1 = 0 and p2 = 1.
- Fuzzy logic: a membership function assigns each object a grade of membership between 0 and 1. In the example, p1 = 0.4 and p2 = 0.6.
[Figure: membership of the set "Tall" plotted over height 150-200 cm, crisp vs. fuzzy]

Linguistic variables
[Figure: membership grades of the linguistic values Very Young, Young, Middle Aged, Old and Very Old over X = age (0-100)]

Fuzzy Inference System (FIS)
- Also known as a fuzzy model
- Three components:
  - Rule base: IF-THEN rules
  - Database (dictionary): defines the membership functions, ...
  - Reasoning mechanism: defines the defuzzification, ...
- Different types: Mamdani FIS, Takagi-Sugeno FIS, ...

Fuzzy Inference System (Takagi-Sugeno model)
Rules (inputs: temperature x, pressure y):
- IF x = A1 AND y = B1 THEN f1 = p1*x + q1*y + r1
- IF x = A2 AND y = B2 THEN f2 = p2*x + q2*y + r2

Features of FS
- Knowledge is represented in the form of comprehensible linguistic rules
- Transparent control systems
- Able to deal with uncertain and imprecise information
- Suitable for problems involving human behavior
- Suitable for non-linear problems
- Uses expert knowledge
Problems:
- No standard method for the transformation of human knowledge or experience into a FIS
- Even when human operators exist, their knowledge is often incomplete and episodic, rather than systematic
- No general procedure for calibrating the system; a good method is needed for tuning the membership functions in order to maximize a performance index
- Curse of dimensionality

Design of Fuzzy Systems
- Expert knowledge
- Grid partitioning
- Cluster analysis
- Least-squares identification
- Decision-tree techniques

Applications of Fuzzy Systems
- When human reasoning and decision-making are involved: supervising, planning, scheduling
- When various types of information are involved: measurements and linguistic information
- Problems using natural language
- Very complex systems
- When there is some prior heuristic knowledge
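The Takagi-Sugeno inference described above can be sketched in a few lines of code: each rule's firing strength is the AND (minimum) of its antecedent membership grades, and the crisp output is the weighted average of the linear consequents. All membership-function shapes, input ranges and rule coefficients below are illustrative assumptions, not values from the lecture.

```python
# Minimal first-order Takagi-Sugeno fuzzy inference with two rules:
#   IF x = A1 AND y = B1 THEN f1 = p1*x + q1*y + r1
#   IF x = A2 AND y = B2 THEN f2 = p2*x + q2*y + r2

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], flat on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Linguistic terms for the two inputs (assumed, hypothetical ranges)
A1 = lambda t: trapezoid(t, 0, 0, 15, 25)        # temperature "low"
A2 = lambda t: trapezoid(t, 15, 25, 40, 40)      # temperature "high"
B1 = lambda p: trapezoid(p, 0, 0, 1.0, 2.0)      # pressure "low"
B2 = lambda p: trapezoid(p, 1.0, 2.0, 3.0, 3.0)  # pressure "high"

# Rule consequents f = p*x + q*y + r (coefficients assumed for illustration)
rules = [
    (A1, B1, (1.0, 2.0, 0.5)),
    (A2, B2, (3.0, 1.0, -1.0)),
]

def sugeno(x, y):
    """Weighted average of rule outputs; firing strength = min of the antecedents."""
    num = den = 0.0
    for mu_a, mu_b, (p, q, r) in rules:
        w = min(mu_a(x), mu_b(y))        # rule firing strength (AND)
        num += w * (p * x + q * y + r)   # strength * linear consequent
        den += w
    return num / den if den > 0 else 0.0

print(sugeno(20.0, 1.5))  # both rules fire with strength 0.5 -> output 42.0
```

Note that there is no separate defuzzification step: the weighted average of the linear consequents already yields a crisp value, which is exactly why the Takagi-Sugeno form is the one used later in ANFIS.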
Applications of Fuzzy Systems to transportation: examples
- Human choice and decisions: route choice, mode choice
- Driver behavior: car-following behavior, lane choice
- Control: parking space forecasting, ramp metering, intersection control, incident detection
- Other: vehicle routing problem, vehicle assignment problem, air traffic flow management

Adaptive Neuro-Fuzzy Inference System (ANFIS): Theoretical Background

Why ANFIS?
- Drawbacks of artificial neural networks:
  - Prior rule-based knowledge cannot be used; learning starts from scratch
  - "Black box": it is difficult to extract knowledge
  - Requires a large training data set
- Drawbacks of fuzzy systems:
  - They cannot learn
  - There is no standard way to represent human knowledge in the rule base of a FIS
  - Human knowledge is often incomplete
  - No known method for designing the membership functions

ANFIS
- An adaptive network that is functionally equivalent to a FIS
- The parameters (i.e. the membership functions) are modified from examples: the learning step
- The process:
  1. Design a fuzzy system (using prior knowledge)
  2. Convert it into an adaptive network
  3. Train the network (modify its parameters based on examples)
  4. Convert it back into a fuzzy system

Sugeno FIS and its ANFIS equivalent
[Figure: a Sugeno FIS drawn as an equivalent adaptive network]

Learning in ANFIS (hybrid learning)
                                 forward pass     backward pass
MF parameters (premise)          fixed            back-propagation
Rule parameters (consequent)     least-squares    fixed

Advantages of ANFIS
- More robust than ANN
- Rule-based representation
- Uses prior knowledge
- Adaptive learning!

Applications of ANFIS
- Similar to the application fields of FS and ANN

Overall comparison of different systems

Key features of particular systems (adapted from Gray and MacDonell, 1997)
[Table: the techniques Least-squares regression, Neural networks, Fuzzy systems and ANFIS are each rated Yes / No / Partially on the criteria: model free, can resist outliers, explains output, suits small data sets, can be adjusted for new data, reasoning process is visible, suits complex models, includes known facts]

Genetic Algorithms (GA): Theoretical Background

What are genetic algorithms?
- A probabilistic search algorithm
- Based on Darwin's evolutionary theory: survival of the fittest, natural selection
- Terminology:
  - Population: a set of solutions
  - Chromosome: defines a solution, usually in binary representation
  - Fitness function: expresses the "quality of a solution"

Hill-climbing methods
- Problems:
  - The function must have "nice" properties (a differentiable function)
  - They do not always find the global extreme; the initial conditions influence the result

[Figure: a two-dimensional optimisation problem. Source: Frederic Dreier, July 2002]

GA-based methods
- Advantages of GA-based approaches: no requirements on the function
- Disadvantages of GA-based approaches: no guarantee of the result; sensitive to parameter settings
- In general we look for a compromise between:
  - Exploration (crossover and selection): on its own, the search never converges
  - Local improvements (mutation and selection): on their own, the search only finds local extremes

The principle of GA
1. Create the initial population
2. Until the stopping criterion is satisfied:
   - Selection
   - Crossover
   - Mutation
   - t = t + 1

GA operators
- Selection: roulette wheel, proportionate to the fitness function (alternatives: tournament selection, rank selection, ...)
  [Figure: roulette wheel with slices of 40 %, 25 %, 20 % and 15 %]
- Crossover (single-point, two-point, uniform): two parents produce offspring
- Mutation

Example (1): random initialization
- One parameter, therefore each individual is a vector of length 1

Example (2): operators
- Real-value crossover
- Mutation makes random changes in some individuals of the new generation: we set our mutation rate to 0.05 and draw a random number between 0 and 1 for each individual; if the number is smaller than the mutation rate, we change a parameter of the vector at random

Example (3): next generation
- One parameter, therefore each individual is a vector of length 1

Problems of GA: many parameters to be set
- The number of individuals in a population?
- Which selection operator?
- Which crossover operator?
- Which mutation operator?
- Probabilities of selection, crossover and mutation?
- Stopping criterion?
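The GA loop above (initial population, then selection, crossover and mutation until a stopping criterion) can be sketched compactly. The toy objective, chromosome length and parameter values are illustrative assumptions; tournament selection is used here in place of the roulette wheel, as one of the alternatives the slides list.

```python
# A minimal genetic-algorithm sketch: binary chromosomes, tournament selection,
# single-point crossover, bit-flip mutation, fixed number of generations.
import random

random.seed(1)  # for reproducibility of the sketch

BITS, POP, GENS, P_MUT = 16, 40, 60, 0.05

def decode(bits):
    """Map a bit string to a real number in [0, 1)."""
    return int("".join(map(str, bits)), 2) / 2**BITS

def fitness(bits):
    # Toy objective: maximize f(x) = x * (1 - x), optimum at x = 0.5
    x = decode(bits)
    return x * (1 - x)

def tournament(pop):
    """Tournament selection: the fitter of two random individuals wins."""
    a, b = random.sample(pop, 2)
    return a if fitness(a) > fitness(b) else b

def crossover(p1, p2):
    """Single-point crossover."""
    cut = random.randrange(1, BITS)
    return p1[:cut] + p2[cut:]

def mutate(bits):
    """Flip each gene independently with probability P_MUT."""
    return [1 - g if random.random() < P_MUT else g for g in bits]

# Create the initial population, then iterate selection -> crossover -> mutation
pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):  # stopping criterion: a fixed number of generations
    pop = [mutate(crossover(tournament(pop), tournament(pop))) for _ in range(POP)]

best = max(pop, key=fitness)
print(round(decode(best), 2))  # should land close to the optimum x = 0.5
```

The parameter questions on the slide (population size, operator choice, mutation probability, stopping criterion) correspond exactly to the constants `POP`, the operator functions, `P_MUT` and `GENS` here; changing any of them changes how reliably and how quickly the search converges.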
Application fields
- NP-hard problems (Traveling Salesman Problem, ...)
- Search
- Optimization
- Learning

Applications of GA to transportation
- Genetic fuzzy systems (see fuzzy systems): genetic algorithms for automated tuning of fuzzy controllers
- Genetic case-based reasoning (G-CBR)
- Multi-criteria transportation problems
- Vehicle routing problems
- Traveling Salesman Problem
- Optimization

Examples of Applications
- ANN: forecasting travel time with neural networks; neural network for travel demand forecasting
- FS: adaptive fuzzy ramp metering
- GA: clustering of activity patterns using genetic algorithms; data reduction using CGA; GA for the Traveling Salesman Problem

Forecasting travel time with neural networks
- Objective: estimation of travel time using an automatic vehicle identification (AVI) system
[Figure: principle of electronic toll collection]
- Study site: Texas TransGuide system in San Antonio; 53 AVI antennas covering 94 links, updated every 5 minutes (rolling average)
[Figure: neural network structure]
[Figure: example of results, prediction one time step ahead]
- Comparison of ANN to other methods using MAPE (mean absolute percentage error)

GA for the Traveling Salesman Problem
- What is the TSP? The determination of a closed tour (starting and finishing at the same node) so that every node is visited exactly once and the total cost (arc length) is minimized
- An NP-hard problem: no efficient exact algorithm is known, only heuristics
[Figure: a small network with arc costs]
- Application areas: collection and delivery problems (UPS, FedEx, USPS), soft drink vendors

Representation: random-key GA
- Standard notation (represents the order in which to visit the nodes): single-point crossover of the tours 143|25 and 245|32 produces the offspring 143|32 and 245|25, which visit some nodes twice and skip others. Not feasible!
- Random-key GA: each gene is a random number from [0, 1), and nodes are visited in ascending order of their genes
  Random key: 0.42  0.06  0.38  0.48  0.81
  Decode as:  3     1     2     4     5

Comparison
- GI: Generalized Initialization, Insertion and Improvement heuristics

Adaptive Fuzzy Ramp Metering
- What is ramp metering? Preventing or delaying critical flow breakdown can have a huge benefit at a relatively inexpensive implementation cost. Ramp metering smoothes the merge onto the freeway, reduces mainline congestion by reducing the turbulence caused by merging platoons, and prevents downstream bottlenecks.
[Figure: principle of fuzzy ramp metering]
- Learning, two approaches: ANN theory (ANFIS) and evolutionary algorithms (a genetic fuzzy system)
[Figure: results]

Clustering of Activity Patterns Using Genetic Algorithms
- Objective: find individuals with similar activity patterns (helps to understand and to model activity behavior)
- Activity pattern: the sequence of all activities within a given time period, usually 24 hours
- Representation: each pattern is a vector of 144 categorical values (corresponding to 10-minute intervals)
[Figure: Model 1, final medoids 3, 20, 25, 37, 48; activity patterns plotted as D/M/W/H bands over 24 hours]
- Legend: D ... discretionary activities; M ... maintenance (shopping, etc.);
  W ... work-related activities; H ... all in-home activities (the horizontal axis shows time in hours)

Medoid-based clustering: principle
- Each object is a vector of categorical values, which limits the usage of some common methods (e.g. the k-means algorithm)
- Example: 12 objects (N), 2 medoids (K)
- Each object in the data set belongs to exactly one medoid; all objects belonging to the same medoid form a cluster
- The objects in each cluster are more similar to each other (based on a given dissimilarity measure) than to objects in any other group

GA representation
- Each chromosome is a vector of length K (the number of clusters)
- Every element is drawn from a uniform distribution on the range (1, N), where N is the size of the data set
- The i-th value is the index of the i-th medoid
- Example: K = 2, Chrom = [k1, k2], with ki in (1, N)

GA population management
- A multiple-population approach is applied: the algorithm performs several iterations, and the population size is doubled in every iteration by adding a randomly initiated population (the inserted population has the same size as the current population)
- Why insertion?
- To increase the diversity in the population
- To decrease sensitivity to the setting of the parameters
- To decrease the computational time

Example of the progress of the best and average fitness functions
[Figure: Model 2, best and average fitness plotted over the population number; best fitness = 3747]
- Number of objects N = 300; number of clusters K = 5; number of iterations = 4
- Size of the initial population NP = 10; size of the final population FinalP = 80
- The algorithm performs GN = 40 runs in each iteration

Comparison of the GA (CGA) to the standard PAM

             CGA                      PAM
             Fitness   Elapsed time   Fitness   Elapsed time
DATA 50      471       0.2 min        471       0.5 min
DATA 100     1222      2-3 min        1232      1.8 min
DATA 300     3721      3 min          3721      20 min
DATA 600     7239      3 min          7239      86 min
DATA 1000    11780     3-5 min        11780     170 min

Conclusion
- A genetic algorithm with a modified selection operator and repeated insertion of randomly generated individuals was used for clustering of hierarchical data
- The algorithm performed well for different sizes of the data sets
- The developed algorithm is rather robust towards the setting of its parameters

CGA for data reduction
[Figure: similarity between time series measured on different detectors of an urban road; smoothed traffic-volume profiles of detectors 502 and 503 between 7:00 and 9:00 (x-axis: time, y-axis: traffic volume)]
[Figure: study area with 18 numbered detectors]

What is the optimal number of clusters?
- Silhouette width, with the number of clusters K taking values between 2 and 8
[Figure: the resulting clusters]

Neural Network for Travel Demand Forecasting
- Travel demand forecast: evaluation of the future needs of an urban area
- Urban Transportation Planning System (UTPS): the study area is divided into traffic analysis zones (TAZ), each described with socio-demographic indicators
- UTPS does not take land use into consideration! This work uses ANN, remote sensing (RS) and GIS to overcome this limitation

Problem formulation
- Occupied area (m2): RLU residential land use, CLU commercial land use, SLU service land use
- Transportation system (m): RTS road, BTS bus, STS subway, TTS train transportation system
- Spatial distribution of the TAZ: distance Dij (m)
- These quantities form the input vector Xij

The model
- Structure 1: the NN is a function approximator; it learns the relation between the trips Tij and the input vector Xij as a non-linear mapping function and outputs the forecasted trip value
- Structure 2: the NN is a pattern classifier; the output is the level of urban movements, such as high, medium, low, ...; the output vector of the training/testing data set must be quantized into z levels

Integration of ANN, RS and GIS
- Data obtainment: aerial photographs are stored in the RS DB; O/D matrices are stored in the trips DB; maps containing transportation system information (roads, subway, ...) are stored in the map DB
- The RS data are used in a GIS environment for multispectral analysis and aerial photo interpretation, generating land use patterns

Case study
- Boston metropolitan area: about 1,400 square miles, 3 million people
- MassGIS database; black-and-white digital orthophotos (1992); bus route maps from the MBTA; data from a 1990 survey (trips as well as the TAZ definition)
[Figure: study area with the selected TAZ]

Steps
- The TAZ were defined
- Land use patterns were obtained following the USGS classification system
- The transportation system was transformed into digital format inside the GIS
- Data: 289 data vectors; 75 % training data set, 25 % testing data set

Results, Structure 1
- Four-layered structure (15 and 7 nodes in the hidden layers); MSE = 10

Results, Structure 2
- Number of levels Z = 5; also a four-layered structure (15 and 7 nodes in the hidden layers)
- The training data set was imbalanced: 87 % of the total is in level 5
- Balancing: generating new data vectors by adding Gaussian noise N(0, 0.001) for categories 1 to 4, and reducing the number of vectors in group 5 using the LBG algorithm
- Recognition rate per level; MSE = 0.15 and MSE = 0.10

Conclusion
- The formulation with a linear output proved to be less exact than the quantized one
- The models proved to be suitable for the given problem

Final Remarks
- Soft computing is not a method to solve all problems
- We have to apply it carefully to the right set of problems!