Computational Intelligence
in Transportation
Applications

Ondřej Přibyl
Czech Technical University in Prague
Faculty of Transportation Sciences
Department of Applied Mathematics
pribylo@fd.cvut.cz

                                       1
Organization
 Introduction, definition, history
 Theoretical Basics
     Artificial Neural Networks
     Fuzzy System
     ANFIS - Adaptive Neuro-Fuzzy Inference System
     Genetic algorithms

 Some Real World Applications – overview
 Discussion and Conclusions
                                                2
Introduction to
Computational
intelligence
Introduction
Definition of terms
Brief history
                      3
 What is intelligence?
 The ability to learn or understand from
  experience
 The ability to acquire and retain knowledge
 The ability to respond quickly and
  successfully to a new situation
 The ability to use reason to solve problems




                                                4
      Is a system really intelligent?
      Turing test
       a proposal for a test of a machine's capability to perform
       human-like conversation (Turing 1950)
      Principle:
       Place both a human and a machine mimicking
       human responses outside the field of direct
       observation and use an unbiased interface to
       interrogate them. If the responses are
       distinguishable, the machine is not displaying
       intelligence.
                                                                    5
Source: Wikipedia, the free encyclopedia.
Terminology

   Artificial Intelligence – “Subject dealing with
    computational models that can think and act rationally”
       Strong symbolic manipulation (Expert systems)


   Soft Computing – “Emerging approach to computing
    which parallels the ability of human mind to reason and learn
    in an environment of uncertainty and imprecision” (L. Zadeh)



                                                              6
  Brief History (Jang et al. 1997)
1943 Invention of a computer

        Conventional AI           Neural Networks          Fuzzy Systems      Other

1940s   Cybernetics               McCulloch-Pitts
        (Norbert Wiener)          neuron model

1950s   Artificial Intelligence   Perceptron

1960s   Lisp language             Adaline, Madaline        Fuzzy sets
                                                           (Zadeh)

1970s   Knowledge engineering     Back-propagation alg.,   Fuzzy controller   Genetic
        (expert systems)          Cognitron                                   algorithms

1980s                             Self-organizing map,     Fuzzy modeling     Artificial life,
                                  Hopfield net,            (TSK model)        immune
                                  Boltzmann machine                           modeling

1990s                                                      Neuro-fuzzy        Genetic
                                                           modeling,          programming
                                                           ANFIS
                                                                                           7
 Characteristics of soft computing
 Biologically inspired computing models
 Uses human expertise: IF-THEN rules or
  conventional knowledge representation
 New optimization techniques
 Numerical computation
 New application domains
     adaptive control, non-linear system identification,
     pattern recognition, …
                                                       8
 Characteristics of soft computing
 (cont.)
 Model-free learning
 Intensive computation
 Fault tolerance
 Goal driven characteristics
 Real world applications




                                     9
Artificial Neural
Networks (ANN)


Theoretical Background

                         10
Neuron

[Figure: a biological neuron and the corresponding mathematical model
of a neuron]
                                          11
 Multilayer feedforward network
 (a two-layer feedforward NN)

 [Figure: output y produced from hidden-layer units 1…m through weights
 v1…vm; inputs x1…xn feed the hidden layer through weights w11…wnm]

 The strength is in the interconnectivity!

 Output unit (sigmoid of the weighted sum of hidden activations):

     y = 1 / (1 + exp( -v · f(w, x) ))
                                                                                         12
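The output formula above can be checked with a short Python sketch of a two-layer feedforward pass. The network sizes and weight values below are illustrative assumptions, not taken from the slide:

```python
import math

def sigmoid(v):
    # logistic activation: 1 / (1 + exp(-v))
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, W_hidden, v_out):
    """One forward pass: sigmoid hidden units, then a sigmoid output unit."""
    # each row of W_hidden holds the input weights of one hidden unit
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
    # output: sigmoid of the v-weighted sum of hidden activations
    return sigmoid(sum(v * h for v, h in zip(v_out, hidden)))
```

With all-zero weights every unit outputs sigmoid(0) = 0.5, which makes a convenient sanity check.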
Learning

[Figure: a single neuron with inputs x1, x2, x3, weights Wi … Wn and
output y]

Mathematically:
    Initialisation: Wi = rand
    Updating: Wnew = Wold + ΔW
    ΔW = ? … given by the learning rule
    Performance measure
    Learning rule – gradient descent
    Learning rate

Training data set
      Input             desired output   network output
    x1   x2   x3        d                y
    12    2    6        3                3
    15    4    4        1                2
     4    3    4        5                .
    13    5    8        3                .
                                                                                    13
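The update rule Wnew = Wold + ΔW can be sketched for a single linear neuron trained by gradient descent (the delta rule). The zero initialisation and the toy data below are assumptions made for reproducibility; the slide initialises the weights randomly:

```python
def train_linear_neuron(data, eta=0.05, epochs=1000):
    """Delta-rule (gradient-descent) learning for one linear neuron.
    data: list of (inputs, desired_output) pairs.
    eta is the learning rate from the slide."""
    n = len(data[0][0])
    w = [0.0] * n                     # zero init (slide uses random init)
    for _ in range(epochs):
        for x, d in data:
            y = sum(wi * xi for wi, xi in zip(w, x))   # network output
            err = d - y                                # desired - actual
            # Wnew = Wold + eta * error * input
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    return w
```

On toy data generated by d = x1, the weights converge towards [1, 0].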
Overfitting – a problem of ANN

[Figure: four plots of f(t) showing fits of different complexity; the
most complex model matches the training points but fails on the
testing data set]
                                                   14
How to solve the problem of overfitting?

Split the available data:
    Training data set – used to fit the network weights
    Testing data set – used to check how the network generalizes
    Validation data set – used to decide when to stop training

[Figure: error plotted against the number of training epochs – the
training-set error keeps decreasing, while the testing-set error
starts to rise once the network overfits]
                                                                  15
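A minimal sketch of the early-stopping idea behind the validation split: stop once the validation error has stopped improving. The patience parameter is an assumption of this sketch, not part of the slide:

```python
def early_stopping_epoch(val_errors, patience=3):
    """Pick the epoch to stop training at. The validation error typically
    falls, then rises as the network starts to overfit the training set;
    stop after it has failed to improve for `patience` epochs."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch
```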
    Features of ANN
   Fault Tolerance: Neural networks are robust
      Information in the form of weights is distributed all over the network
      They can survive the failure of some nodes and their performance degrades
       gracefully under faults
   Flexibility and Adaptability
      They can deal with information that is fuzzy, probabilistic, inconsistent and
       noisy
      They can adapt intelligently to previously unseen situations
      They can learn from examples presented to them and do not need to be
       programmed
   Parallelism: They embody parallel computing
        Makes it possible to build parallel processing hardware for implementing them
        No logical operations are used after training
        Extremely fast computation can be achieved
   Learning delay
        Training neural networks is often time consuming but after training they can
         operate in real time
                                                                                  16
    Features of ANN (cont.)
   Model free
       No rules are required to be given in advance
      There is no need to assume an underlying data
       distribution such as usually is done in statistical modeling
   Size and complexity
      For large scale implementations of ANN we need massive
       arrays of neurons
   “Black box”
       Individual relations between the input variables and the
        output variables are not developed by engineering
        judgment
      The knowledge extraction is difficult

                                                                17
    When to consider using ANN
   Input is high-dimensional (raw sensor data)
   Possibly noisy data
   Training time is unimportant
   Form of target function is unknown
   Human readability of result is unimportant
   When facing multivariate non-linear problems
   Common application fields
     Pattern recognition
     Function approximation
     Prediction, forecasting


                                                   18
    Application of ANN to transportation
   Travel Behavior
         Modelling driver behavior at signalised urban intersections
        Driver decision making model
   Traffic Flow
        Intersection control
        Estimation of Speed-Flow relationship
   Traffic management
        Trip generation model
        Urban public transport equilibrium
        Incident detection
         Prediction of parking characteristics
        Travel time prediction

                                                                       19
Fuzzy Systems (FS)


Theoretical Background

                         20
 Why Fuzzy Systems?
 The knowledge in computers is usually binary
  coded – using Aristotelian logic
 It is difficult to understand the representation
  of data inside computers
 How to teach computers to understand
  human expressions?
 Is it possible to design a model using direct
  expert rules?
                                               21
    Fuzzy Set Theory

   Aristotelian logic
       Tall person = 1 IF height > 180 cm
       [Figure: crisp membership function “Tall” – a step from 0 to 1
        at 180 cm on a height (cm) axis; person p1 = 0, person p2 = 1]

   Fuzzy logic
       Tall person x
       [Figure: fuzzy membership function “Tall” – a gradual curve over
        height (cm); person p1 = 0.4, person p2 = 0.6]

   Membership functions
       Assign each object a grade of membership: crisp sets map to
        { 0, 1 }, fuzzy sets to the whole interval [0, 1]
                                                                        22
Linguistic variables

[Figure: membership grades (0–1) over X = age (0–100) for the fuzzy
sets Very Young, Young, Middle Aged, Old, and Very Old]
                                                                                                 23
    Fuzzy Inference System (FIS)
   Also known as Fuzzy Model
   Three components
        Rule base – IF THEN rules
        Database (dictionary) – defines membership functions, …
        Reasoning mechanism – defines the defuzzification, …
   Different types
      Mamdani  FIS
      Takagi-Sugeno FIS
     …


                                                            24
Fuzzy Inference System (Takagi-Sugeno Model)
                            RULES:
                            IF x = A1 AND y = B1 THEN f = p1·x + q1·y + r1
                            IF x = A2 AND y = B2 THEN f = p2·x + q2·y + r2

[Figure: the two-rule inference over the two inputs, temperature and
pressure]
                                                             25
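A sketch of first-order Takagi-Sugeno inference over rules of the form above: each rule fires with strength w = μA(x)·μB(y) (AND taken as product), and the output is the firing-strength-weighted average of the linear consequents:

```python
def tsk_inference(x, y, rules):
    """First-order Takagi-Sugeno inference.
    rules: list of (mu_A, mu_B, (p, q, r)) where mu_A(x) and mu_B(y) are
    membership functions and f = p*x + q*y + r is the rule consequent."""
    num = den = 0.0
    for mu_A, mu_B, (p, q, r) in rules:
        w = mu_A(x) * mu_B(y)            # firing strength (AND as product)
        num += w * (p * x + q * y + r)   # weighted consequent
        den += w
    return num / den                     # weighted average over the rules
```

With two equally-firing constant rules f=2 and f=4, the output is their average, 3.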
    Features of FS
   Knowledge is represented in the form of comprehensive
    linguistic rules
        Transparent control systems
   Able to deal with uncertain and imprecise information
   Suitable for problems involving human behavior
   Suitable for non-linear problems
   Uses expert knowledge
   Problems
        No standard method for transformation of human knowledge or
         experience into the FIS
        Even when human operators exist, their knowledge is often
         incomplete and episodic, rather than systematic
        No general procedure for calibrating the system
        Need for a good method for tuning the membership functions in order
         to maximize a performance index
        Curse of dimensionality
                                                                          26
 Design of Fuzzy Systems
 Expert knowledge
 Grid partitioning
 Cluster analysis
 Least square identification
 Decision tree technique




                                27
 Applications of Fuzzy Systems
   When human reasoning and decision-making
    are involved
       Supervising, planning, scheduling
   Various types of information are involved
     Measurements    and linguistic information
 Problems using natural language
 Very complex systems
 When there is some prior heuristic knowledge
                                                   28
    Application of Fuzzy Systems to
    transportation - examples
   Human choice and decisions
        Route choice
        Mode choice
   Driver behavior
        Car-following behavior
        Lane-choice
   Control
        Parking space forecasting
        Ramp metering
        Intersection control
        Incident detection
   Other
        Vehicle routing problem
        Vehicle assignment problem
        Air traffic flow management   29
Adaptive Neuro-Fuzzy
Inference System
(ANFIS)

Theoretical Background

                         30
    Why ANFIS?
   Drawbacks of artificial neural networks
        Prior rule-based knowledge cannot be used
        Learning from scratch
       “Black box”
       Difficult to extract knowledge
        Requires large training data set
   Drawbacks of fuzzy systems
       Cannot learn
        There is no standard way to represent human knowledge in the rule
         base of a FIS
        The human knowledge is often incomplete
        No known method for designing the membership functions


                                                                      31
    ANFIS
   An adaptive network that is functionally equivalent to FIS

   The parameters (i.e. the membership functions) are modified
    from examples – learning step
   The process
       Design a fuzzy system (using prior knowledge)
       Convert it into an adaptive network
       Train the network (modify its parameters based on examples)
       Convert it back into the fuzzy system




                                                                      32
Sugeno FIS and its ANFIS equivalent

[Figure: a two-rule Sugeno FIS redrawn as the equivalent layered
adaptive network]
                                      33
           Learning in ANFIS

                                 forward pass      backward pass

  MF param. (premise)            fixed             back-propagation

  Rule param. (consequence)      least-squares     fixed

                                                    34
 Advantages of ANFIS
 More robust than ANN
 Rule based representation
 Uses prior knowledge
 Adaptive learning!




                              35
Application of ANFIS



   Similar to the application fields of FS and ANN




                                               36
Overall Comparison of
different systems




                   37
       Key features of particular systems

  Criteria compared: model free · can resist outliers · explains output ·
  suits small data sets · can be adjusted for new data · reasoning
  process is visible · suits complex models · includes known facts

  Techniques compared: least-squares regression, neural networks,
  fuzzy systems, ANFIS

  [Table: each technique rated Yes / No / Partially on each criterion;
  the ratings were shown as symbols in the original slide]

Adapted from Gray and MacDonell, 1997                                                                          38
Genetic Algorithms
(GA)


Theoretical Background

                         39
    What are genetic algorithms?
   Probabilistic search algorithm
   Based on Darwin’s Evolutionary theory
        Survival of the fittest
        Natural selection
   Terminology
      Population
         Set of solutions

      Chromosome
         Defines a solution

         Usually binary representation

      Fitness function
         Expresses the “quality of a solution”
                                                  40
Hill climbing methods
   Problems
     The function must have “nice” properties
      (e.g. a differentiable function)
     Does not always find the global extreme –
      the initial conditions influence the result
                                               41
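A one-dimensional hill-climbing sketch that illustrates the second problem: the climber only ever moves to an improving neighbour, so it stops at whatever extreme the initial condition leads to. The step size is an assumption of this sketch:

```python
def hill_climb(f, x0, step=0.1, max_iters=1000):
    """Greedy hill climbing in one dimension: move to whichever neighbour
    improves f; stop when neither does. It gets stuck in local maxima,
    which is the weakness GA-based methods are meant to avoid."""
    x = x0
    for _ in range(max_iters):
        left, right = f(x - step), f(x + step)
        if left <= f(x) >= right:       # no improving neighbour: stop
            break
        x = x - step if left > right else x + step
    return x
```

On a bimodal function with a local maximum at t=1 and a better one at t=5, a start at t=0 ends at the local maximum.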
A two-dimensional optimisation problem




Source: Frederic Dreier July 2002   42
GA-Based methods
   Advantages of GA-based approaches
       No requirements on the function
   Disadvantages of GA-based approaches
       No guarantee of the result
       Sensitive to parameter setting
   In general we look for a compromise between:
       Exploration (crossover and selection)
           too much exploration never converges
       Local improvements (mutation and selection)
           too many local improvements find only local extremes

                                                           43
                 The principle of GA

  Create initial population
  Repeat until the stopping criterion is satisfied (t = t + 1):
      Selection
      Crossover
      Mutation
                                           44
    GA Operators

   Selection – roulette wheel: each slice is proportionate to the
    individual's fitness (e.g. 40 %, 25 %, 20 %, 15 %)
    (alternatives: tournament selection, rank selection, …)

   Crossover (single point, two-point, uniform)
    [Figure: two parent chromosomes exchange segments to produce
    offspring]

   Mutation
                                                                 45
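The three operators can be sketched in Python for binary chromosomes. This is a minimal, illustrative implementation, not tied to any particular GA library:

```python
import random

def roulette_select(population, fitness):
    # probability of selection proportionate to fitness
    total = sum(fitness)
    pick = random.uniform(0.0, total)
    cum = 0.0
    for individual, f in zip(population, fitness):
        cum += f
        if pick <= cum:
            return individual
    return population[-1]

def single_point_crossover(parent1, parent2, point):
    # swap the tails of the two chromosomes after `point`
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def mutate(chromosome, rate=0.05):
    # flip each bit independently with probability `rate`
    return [1 - g if random.random() < rate else g for g in chromosome]
```

A full GA would repeat selection, crossover, and mutation each generation, as in the flow on the previous slide.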
    Example (1) – Random initiation
   one parameter, therefore each individual is a vector of length 1




                                                                       46
    Example (2) – operators

   Real value
    crossover

   Mutation
     Makes random changes to some individuals of the new generation:
      we set the mutation rate to 0.05 and draw a random number between 0 and 1
      for each individual.
     If the number is smaller than the mutation rate, we change a parameter of the
      vector at random.




                                                                                47
    Example (3) – next generation
   one parameter, therefore each individual is a vector of length 1




                                                                       48
Problem of GA
   Many parameters to be set
       The number of individuals in a population?
       Which selection operator?
       Which crossover operator?
       Which mutation operator?
       Probabilities of selection, crossover, mutation?
       Stopping criterion?


                                                       49
Application field
   NP-hard problems (Traveling Salesman Problem, …)
     Search
     Optimization
     Learning




                                                  50
 Application of GA to transportation
   Genetic Fuzzy Systems (see Fuzzy systems)
     Genetic  Algorithms for Automated Tuning of Fuzzy
      Controllers
 Genetic Case-Based Reasoning (G-CBR)
 Multi-criteria transportation problems
 Vehicle Routing Problems
 Traveling Salesman Problem
 Optimization

                                                          51
Examples of
Applications

Artificial Neural Networks
Fuzzy Systems
Genetic Algorithms
                             52
    Examples
   ANN
      Forecasting travel time with neural networks
     Neural Network for Travel Demand Forecasting

   FS
     Adaptive   Fuzzy Ramp Metering
   GA
      Clustering of Activity Patterns Using Genetic Algorithms
     Data reduction using CGA
     GA for Traveling Salesman Problem


                                                            53
Forecasting travel
time with neural
networks



                     54
    Introduction
   Objective – estimation of travel time using an automatic vehicle identification (AVI) system
   Principle of Electronic Toll Collection




                                                                                                55
    Study site




   Texas TransGuide System in San Antonio
   53 AVI antennas covering 94 links
   updated every 5 minutes (rolling average)
                                                56
Neural network structure




                           57
Example of results – prediction one
time step ahead




                                  58
    Comparison of ANN to other methods
   MAPE - Mean Absolute Percentage Error




                                            59
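For reference, MAPE is the mean of |actual − predicted| / actual, expressed in percent:

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.
    Undefined when an actual value is zero."""
    assert len(actual) == len(predicted)
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)
```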
GA for Traveling
Salesman Problem




                   60
What is TSP?
   The determination of a closed tour (starting
    and finishing at the same node) so that every
    node is visited exactly once, and the total cost
    (arc length) is minimized.
   NP-hard problem
       No efficient (polynomial-time) algorithm is known
       Only heuristics
   Application areas
       Collection and delivery problems
            UPS, FedEx, USPS
            Soft drink vendors

   [Figure: a five-node example graph with arc lengths]
                                                           61
Representation: Random-Key GA
   Standard notation (represents the order in which to visit the nodes)
    Crossover example:
        Parents:  1 4 3 | 2 5     Offspring:  1 4 3 | 3 2     Not feasible!
                  2 4 5 | 3 2                 2 4 5 | 2 5
     Random-Key GA
       each gene is a random number from [0,1)
       visit nodes in ascending order of their genes

Random key:          0.42 0.06 0.38 0.48 0.81
Decode as:             3    1   2     4    5
                                                                 62
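Decoding a random-key chromosome is a small sorting exercise: each node's rank in ascending gene order gives its position in the tour. This sketch reproduces the example above (gene 0.06 is the smallest, so node 2 is visited first):

```python
def decode_random_keys(genes):
    """Decode a random-key chromosome: node i is visited in the position
    given by the rank of its gene when genes are sorted ascending."""
    order = sorted(range(len(genes)), key=lambda i: genes[i])
    ranks = [0] * len(genes)
    for position, node in enumerate(order, start=1):
        ranks[node] = position
    return ranks
```

Because any vector of keys decodes to a valid permutation, standard crossover never produces an infeasible tour.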
    Comparison
   GI – Generalized
    Initialization, Insertion,
    and Improvement
    heuristics




                                 63
Adaptive Fuzzy Ramp
Metering




                 64
 What is ramp metering?
 Preventing or delaying a critical flow breakdown
  can have a huge benefit, at a relatively
  inexpensive implementation cost
 Smoothes the merge onto the freeway
 Reduces mainline congestion by reducing the
  turbulence caused by merging platoons
 Prevents downstream bottlenecks


                                           65
Principle of fuzzy ramp metering




                                   66
    Learning – 2 approaches
   ANN theory – ANFIS
   Evolutionary algorithms – Genetic Fuzzy system




                                                     67
Results




          68
Clustering of Activity
Patterns Using
Genetic Algorithms



                         69
    Objectives
   Find individuals with similar activity patterns
    (helps to understand and to model activity behavior)
   Activity Patterns
        Sequence of all activities within a
         given time period, usually 24 hours
        Representation:
         Each pattern is a vector of 144
         (corresponding to 10-minute-long
         intervals) categorical values
             D … discretionary activities
             M … maintenance (shopping, etc.)
             W … work-related activities
             H … all in-home activities

   [Figure: Model 1 – final medoids 3, 20, 25, 37, 48; five activity
    patterns drawn as D/M/W/H levels against time (hours 2–24)]
                                                                                             70
Medoid-based clustering – principle

Each object is a vector of categorical values
    • this limits the usage of some common
      methods (e.g. the k-means algorithm)

[Figure: 12 objects (activity patterns) grouped around 2 medoids]

   12 objects (N), 2 medoids (K)
   Each object in the data set belongs to exactly one medoid
   All objects belonging to the same medoid form a cluster
   The objects in each cluster are more similar (based on a
    given dissimilarity measure) to each other than to objects in
    any other group
                                                                         71
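A sketch of the clustering criterion: with categorical vectors a simple matching (Hamming-style) dissimilarity works where Euclidean distance does not, and the fitness a medoid-based GA minimises is the total dissimilarity of every object to its nearest medoid. The choice of dissimilarity measure here is an assumption for illustration:

```python
def hamming(a, b):
    # simple matching dissimilarity for categorical vectors
    return sum(x != y for x, y in zip(a, b))

def cluster_fitness(objects, medoid_idx):
    """Total dissimilarity of each object to its nearest medoid --
    the quantity a medoid-based GA would minimise."""
    medoids = [objects[i] for i in medoid_idx]
    return sum(min(hamming(obj, m) for m in medoids) for obj in objects)
```

Here a chromosome is just the list of medoid indices, matching the GA representation on the next slide.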
       GA representation

    Each chromosome is a vector of length K
     (the number of clusters)
    Every element is drawn from a uniform
     distribution over the range (1, N), where N is the size of the data set
    The ith value is the index of the ith medoid

     Example: K=2
                    Chrom = [k1, k2],   ki ∈ (1, N)

                                                      72
        GA – population management

   Multiple population approach is applied
     The algorithm performs several iterations
     The population size is doubled in every iteration, by
      adding a randomly initiated population
     The inserted population has the same size as the
      current population
   Why insertion?
     To increase the diversity in the population
     To decrease sensitivity to the setting of the parameters
     To decrease the computational time
                                                              73
               Example of progress of the best
               and average fitness functions

 • Number of objects: N = 300
 • Number of clusters: K = 5
 • Number of iterations: 4
 • Size of the initial population: NP = 10
 • Size of the final population: FinalP = 80
 • The algorithm performs GN = 40 runs for each iteration

[Figure: Model 2 – best and average fitness plotted against population
number (0–150); the fitness drops from about 5200 to the best fitness
of 3747]
                                                                                                            74
Comparison of GA to standard PAM

                         CGA                       PAM
               Fitness         Elapsed   Fitness         Elapsed
                                 time                      time
   DATA 50      471            0.2 min    471            0.5 min
   DATA 100    1222            2-3 min   1232            1.8 min
   DATA 300    3721             3 min    3721             20 min
   DATA 600    7239             3 min    7239             86 min
   DATA 1000   11780           3-5 min   11780           170 min




                                                                   75
        Conclusion

   A genetic algorithm with modified selection
    operator and repeated insertion of randomly
    generated individuals was used for clustering of
    hierarchical data
   The algorithm performed well for different sizes of
    the data sets
   The developed algorithm is rather robust towards
    the setting of its parameters

                                                      76
CGA for data
reduction




               77
Similarity between time series measured
on different detectors of an urban road
[Figure: comparison of smoothed traffic-intensity profiles from two detectors ("Vyhlazena 502" and "Vyhlazena 503") on an urban road; x-axis: time, 7:00–8:57; y-axis: traffic intensity, 0–2000]
                                                                                                                                                                  78
Study area

[Map: study area road network with numbered detectors 1–18]

                                 79
    What is the optimal number of
    clusters?



   Silhouette width
   K … number of clusters
      values between 2 and 8 were evaluated
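A minimal sketch of how the silhouette width could be computed for each candidate K; the points and labels below are toy placeholders, not the lecture's detector data:

```python
import math

def silhouette_width(points, labels):
    """Mean silhouette width s = (b - a) / max(a, b) over all points,
    where a is the mean intra-cluster distance and b is the smallest
    mean distance to any other cluster."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)

    scores = []
    for p, l in zip(points, labels):
        own = [q for q in clusters[l] if q is not p]
        if not own:                  # singleton cluster: silhouette is 0
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(sum(math.dist(p, q) for q in c) / len(c)
                for k, c in clusters.items() if k != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated groups: silhouette width close to 1.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(silhouette_width(pts, [0, 0, 0, 1, 1, 1]))
```

In practice the width would be evaluated for every K from 2 to 8 and the K with the largest value picked.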




                                    80
The resulting clusters




                         81
Neural Network for Travel
Demand Forecasting




                        82
    Travel demand forecast
   Evaluation of future needs of an urban area
   Urban Transportation Planning System (UTPS)
   Study area is divided into TAZ (traffic analysis
    zones)
     Each   described with socio-demographic indicators
   UTPS does not take land use into consideration!

   This work uses ANN, remote sensing (RS) and GIS
    to overcome this limitation
                                                           83
    Problem formulation
   Occupied area (m2)                   Transportation system (m)
        RLU - Residential land use          RTS – Road transp. system
        CLU - Commercial land use           BTS – Bus transp. system
        SLU – Service land use              STS – Subway transp. system
   Spatial distribution of TAZ              TTS – Train transp. system
        Distance Dij (m)

     Input vector, Xij:
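One plausible way to assemble the input vector from the slide's indicators; the field values and the exact ordering are assumptions, since the slide does not spell them out:

```python
# Hypothetical descriptors for zones i and j; names mirror the slide.
zone_i = {"RLU": 1.2e6, "CLU": 3.5e5, "SLU": 2.0e5,            # area (m2)
          "RTS": 4.1e4, "BTS": 1.8e4, "STS": 0.0, "TTS": 5.0e3}  # length (m)
zone_j = {"RLU": 8.0e5, "CLU": 6.1e5, "SLU": 1.1e5,
          "RTS": 3.9e4, "BTS": 2.2e4, "STS": 7.0e3, "TTS": 0.0}
D_ij = 6500.0  # distance between zones i and j (m)

KEYS = ["RLU", "CLU", "SLU", "RTS", "BTS", "STS", "TTS"]

def input_vector(zi, zj, dij):
    """Concatenate both zones' indicators with their distance."""
    return [zi[k] for k in KEYS] + [zj[k] for k in KEYS] + [dij]

X_ij = input_vector(zone_i, zone_j, D_ij)
print(len(X_ij))  # 15 features
```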




                                                                     84
    The model
   Structure 1                               Structure 2
       NN is a function approximator             NN is a pattern classifier
       Relation between trips Tij and            Output
        input vector Xij                               Level of urban movements,
                                                        such as high, medium, low, …
                                                       The output vector of
                                                        training/testing data set must
                                                        be quantified into z levels
             Tij = f(Xij)
              Tij … forecasted trip value
              f(·) … non-linear mapping function
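Structure 2's quantization of a continuous trip count into z discrete levels might be sketched as follows; the bin boundaries here are assumptions for illustration, not the lecture's actual thresholds:

```python
def quantize(trips, edges):
    """Map a continuous trip count onto a discrete level 1..z,
    where `edges` holds the z-1 upper bin boundaries."""
    for level, upper in enumerate(edges, start=1):
        if trips <= upper:
            return level
    return len(edges) + 1

EDGES = [50, 150, 400, 1000]   # hypothetical boundaries for z = 5 levels
print(quantize(30, EDGES), quantize(2000, EDGES))  # 1 5
```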




                                                                                 85
    Integration of ANN, RS and GIS
   Data obtainment
      Aerial photographs are stored in the RS DB
      O/D matrices are stored in the Trips DB
      Maps containing transportation system information (roads,
       subway, …) are stored in the Map DB
    RS data are processed in the GIS environment (multispectral
     analysis and aerial photo interpretation) to generate
     land-use patterns


                                                            86
    Case study
   Boston Metropolitan area
     about  1400 square miles
     3 million people

   MassGIS database
      Black-and-white digital orthophotos (1992)
     Bus route maps from MBTA
     Data from 1990 survey – trips as well as TAZ definition




                                                            87
Study area with the selected TAZ




                                   88
    Steps
   TAZ were defined
   Land-use patterns were obtained following the USGS
    classification system
   The transportation system was converted into digital
    format within the GIS
   Data
     289   data vectors
          75% training dataset, 25% testing dataset
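The 75/25 split of the 289 vectors could be reproduced along these lines; the shuffling and seed are assumptions, since the lecture does not say how the split was drawn:

```python
import random

def split_dataset(vectors, train_frac=0.75, seed=42):
    """Shuffle indices and split into training and testing subsets."""
    rng = random.Random(seed)
    idx = list(range(len(vectors)))
    rng.shuffle(idx)
    cut = round(train_frac * len(vectors))
    train = [vectors[i] for i in idx[:cut]]
    test = [vectors[i] for i in idx[cut:]]
    return train, test

data = [[float(i)] for i in range(289)]   # stand-in for the 289 data vectors
train, test = split_dataset(data)
print(len(train), len(test))  # 217 72
```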



                                                       89
    Results - Structure 1
   Four-layer structure (15 and 7 nodes in the hidden layers)
   MSE = 10




                                                                90
    Results - Structure 2
   Number of levels, Z=5
   Also a four-layer structure (15 and 7 nodes in the hidden layers)
   Training data set
        Imbalanced: 87% of total is in level 5
   Balancing
      Generating new data vectors by adding Gaussian noise
       N(0, 0.001) for categories 1 to 4
      Reducing the number of vectors in group 5 using the LBG algorithm
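The oversampling half of the balancing step might look like this sketch, which jitters copies of minority-class vectors with N(0, 0.001) noise as on the slide; the toy vectors are assumptions, and the LBG reduction of level 5 is omitted:

```python
import random

def augment(vectors, target_count, sigma=0.001 ** 0.5, seed=0):
    """Grow a minority class by appending jittered copies of its
    vectors, each perturbed with zero-mean Gaussian noise of
    variance 0.001 (sigma = sqrt(0.001))."""
    rng = random.Random(seed)
    out = list(vectors)
    while len(out) < target_count:
        base = rng.choice(vectors)
        out.append([x + rng.gauss(0.0, sigma) for x in base])
    return out

level_1 = [[0.10, 0.20], [0.12, 0.18]]   # toy feature vectors for level 1
balanced = augment(level_1, target_count=10)
print(len(balanced))  # 10
```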


                              Recognition rate for levels

                                                                       MSE = 0.15
                                                                       MSE = 0.10
                                                                               91
    Conclusion
   The linear-output formulation proved to be less
    accurate than the quantized one
   Both models proved to be suitable for the given
    problem




                                                       92
 Final Remarks
 Soft computing is not a method to solve all
  problems
 We have to apply it carefully to the right set of
  problems!




                                                93