PReVENT Fusion Forum e-Journal
Editorial
by Aris Polychronopoulos
Dear reader,
This is the first volume of the Sensor Fusion e-Journal. Let me first explain the motivation for this initiative. First of all, we think that fusion is a very challenging topic; it is always fun to deal with huge sets of sensor data, get the best out of them and make useful decisions for safety and comfort applications. It is widely recognized that more attention should be paid to R&D activities, and more space given in the literature, for sensor data fusion. Therefore, we decided to create this e-Journal, which combines unique scientific peer-reviewed work on sensor fusion with other material such as technical correspondence, technical articles and news on the topic. The Journal falls under the activities of the PReVENT Fusion Forum, which was established in September 2005 and is active on sensors and data fusion. You are welcome to read and comment, or to contribute by sending any material related to sensor fusion for the 2nd Volume, which will be published by June 2007. The call for contributions is in the news section. Have fun with fusion!
e-Journal - Volume 1, September 2006
Editor: Dr. Aris Polychronopoulos, arisp@iccs.gr
Inside this issue:
- ProFusion Sensor Data Fusion activities
- PReVENT Sensor Data Fusion activities
- Research Papers
- Technical Correspondence
- Sensor Data Fusion news
- Contact
Message from PReVENT Management
by Maxime Flament
The integrated project PReVENT is the European Commission’s flagship project aimed at exploring the next generation of preventive and active safety systems. It contributes to road safety by developing and demonstrating applications and technologies that will save many lives in the future. The research on sensors and sensor data fusion is receiving the highest priority within PReVENT, as complex detection of the vehicle environment is the basis of all our investigated safety applications. The ProFusion PReVENT subproject contributes to a more reliable perception, detecting moving vehicles and vulnerable road users all around the ego-vehicle. Even if it adds some overhead, ProFusion benefits greatly from the integrated structure of the project, bringing the field of sensor data fusion beyond the current state of the art. Indeed, all concepts and methods proposed by ProFusion are tested on different PReVENT experimental vehicles dealing with longitudinal and lateral support, intersection safety and collision mitigation. ProFusion can therefore learn from the concrete demands and requirements of advanced safety applications. In turn, the PReVENT applications can rely on better perception capabilities at the same or lower cost, which will hopefully bring preventive and active safety faster to all types of vehicles. Keep up the good work, ProFusion!
Editing by Niki Boutsikaki Niki@iccs.gr
Fusion Forum e-Journal Editorial board
Chief Editor: Dr. Aris Polychronopoulos, arisp@iccs.gr
Associate Editors:
- Dr. Angelos Amditis, a.amditis@iccs.gr
- Prof. Olivier Aycard, olivier.aycard@inrialpes.fr
- Dr. Erich Fuchs, fuchse@forwiss.uni-passau.de
- Kay Furstenberg, kf@ibeo-as.de
- Dr. Su-Birm Park, su.birm.park@delphi.com
- Dr. Ullrich Scheunert, ullrich.scheunert@etit.tu-chemnitz.de
- Thomas Tatschke, tatschke@forwiss.uni-passau.de
www.prevent-ip.org prevent@mail.ertico.com
Message from ProFusion Coordinator
by Su-Birm Park
Dear Sensor Fusion Expert,
This is the first e-Journal of the Sensor Fusion Forum, which is an initiative of the ProFusion2 consortium. The intention of the Fusion Forum is to streamline the research activities in the field of sensor data fusion, as carried out by ProFusion2 inside the IP PReVENT. Besides the Fusion Workshop, which is also organized by the Fusion Forum, this e-Journal is the ideal platform for you to exchange and discuss your topics in the automotive field of sensor data fusion. You can stay updated on all activities by subscribing to the Fusion Forum on the ProFusion home page (http://www.prevent-ip.org/profusion/). While reading this e-Journal, you will see that the focus of this edition is on the sensor data fusion activities inside the IP PReVENT. But there are also important activities outside, and one is the IST 2006 in Helsinki, which will be one major defining step for the Framework 7 projects. The Fusion Forum is organizing a networking session on sensor data fusion there and needs your support; you will find more information in the Sensor Data Fusion News section at the end of this e-Journal.
Important dates in the near future:
- Special session on sensor data fusion at the ITS World Congress in London (10th of October 2006)
- Networking session on sensor data fusion at the IST 2006 conference (FP7) in Helsinki (21st to 23rd of November 2006)
- 2nd Fusion Workshop in Paris (14th and 15th of March 2007)
- Next e-Journal (will be announced via the Fusion Forum)
I hope our e-Journal helps you to get the right contacts, have fruitful discussions and a lot of new ideas. I would like to thank all supporters of and contributors to this e-Journal.
Message from the European Commission
by Irmgard Heiber
Sensing the environment is the cornerstone of any system providing active, passive and preventive safety in vehicles. Today, safety systems are limited, as they quite often rely on only a single sensor. Integrating data from multiple sensors is not a trivial task and requires an elaborate fusion mechanism for a more precise and more reliable description of the vehicle environment. Sensor data fusion that takes into account all onboard sensors and hands over the relevant information to the different safety systems is regarded as one of the major challenges for advanced safety systems. The European Commission has therefore strongly supported the implementation of a horizontal subproject on sensor data fusion in the integrated project PReVENT. Profusion1 defined the areas of common interest for PReVENT, taking into account the concrete needs of vertical subprojects working on safety systems, whereas Profusion2 then started the targeted activity on sensor data fusion. One of the main achievements is a flexible and modular fusion architecture, which is used as the basis for sensor data fusion in PReVENT and is currently being implemented and tested in different subprojects to demonstrate the feasibility and advantages of sensor data fusion. Projects on cooperative systems like SAFESPOT have taken over this sensor data fusion approach and are adapting it to their area. The European Commission is well aware of the potential of sensor data fusion as a basis for more effective safety systems and has foreseen it as one of the future research topics in the domain of "ICT for the Intelligent Car" of the 7th framework programme.
e-Journal Contents
Section 1: ProFusion Sensor Data Fusion activities
- Grid based Fusion, by Olivier Aycard
- Early Fusion approach, by Thomas Tatschke
- Track-Level Fusion, by Nikos Floudas
- Multi Level Fusion, by Ullrich Scheunert
Section 2: PReVENT Sensor Data Fusion activities
- LATERAL SAFE Subproject, by Angelos Amditis
- Data Fusion at SASPENCE Subproject, by Heiko Cramer
- Sensor Data Fusion at COMPOSE, by Thomas Tatschke
- A radar driven fusion with vision for vehicle detection, by Alberto Broggi and Pietro Cerri
Section 3: Research papers
- Three-Level Early Fusion for Road User Detection, by Rudi Lindl and Leonhard Walchshäusl
- Precise Host Localization in Urban Areas, by Thorsten Weiss et al.
- Feature Level Fusion for Object Classification, by Stefan Wender et al.
- ACC Vehicle Tracking with Joint Multisensor Multitarget Filtering of State and Existence, by Mirko Maehlisch et al.
Section 4: Technical Correspondence
- SEFS - A Swedish IVSS Initiative on Sensor Data Fusion, by Malte Ahrholdt
Section 5: Sensor Data Fusion News
- Networking Session on the IST 2006 conference in Helsinki, by Su-Birm Park
- Sensor Data Fusion Workshop call for papers in the IEEE Intelligent Vehicles Symposium, by Heiko Cramer and Aris Polychronopoulos
- 1st Fusion Forum Workshop shows the path for sensor fusion deployment, by Aris Polychronopoulos
- Call For Papers for the 2nd Volume of the e-Journal, by ProFusion Consortium
- Announcement for the 2nd Fusion Forum Workshop, by ProFusion Consortium
- Contact
ProFusion Sensor Data Fusion activities
Grid based Fusion by Olivier Aycard, INRIA
1. Introduction
At the end of the 1980s, a new framework for multisensor fusion called occupancy grids (OGs) was introduced. An OG is a stochastic tessellated representation of spatial information that maintains probabilistic estimates of the occupancy state of each cell in a lattice. In this framework, each cell is considered separately for each sensor measurement, and the only difference between cells is the position in the grid. The main advantage of this approach is the ability to integrate several sensors in the same framework, taking the inherent uncertainty of each sensor reading into account, contrary to the Geometric Paradigm. The major drawback of the geometric approach is the number of different data structures, one for each geometric primitive, that the mapping system must handle: segments, polygons, ellipses, etc. Taking into account the uncertainty of the sensor measurements for each sequence of different primitives is very complex, whereas the cell-based framework is generic: it can fit every kind of shape and can be used to interpret any kind and any number of sensors. For sensor data integration, OGs only require a sensor model, which is the description of the probabilistic relationship that links a sensor measurement to a cell state, occupied (occ) or empty (emp). The eMotion group (http://emotion.inrialpes.fr) has a strong background in building sensor models to map the environment using OGs for Intelligent Transports. In this paper, we describe how, in previous projects, we built sensor models for low and high level sensor data fusion.

2. Fusion in occupancy grids
Bayesian fusion for a grid cell and several sensors.

i. Probabilistic variable definitions
- $Z = (Z_1, \ldots, Z_N)$, a vector of $N$ random variables, one variable for each sensor. We consider that each sensor $i$ can return measurements from a set $Z_i$. (For a variable $V$, we write the variable in upper case, one of its realizations $v$ in lower case, and $p(v)$ for $P([V = v])$, the probability of a realization of the variable.)
- $O_{x,y} \in O = \{occ, emp\}$. $O_{x,y}$ is the state of the bin $(x,y)$, where $(x,y) \in \mathbb{Z}^2$. $\mathbb{Z}^2$ is the set of indexes of all the cells in the monitored area.

ii. Joint probabilistic distribution
The lattice of cells is a type of Markov field, and many assumptions can be made about the dependencies between cells, especially adjacent cells in the lattice. In this article, sensor models are used for independent cells, i.e. without any dependencies. This is a strong hypothesis, but it is very efficient in practice since all computations can be carried out for each cell separately. It leads to the following expression for each cell, in which the joint distribution $P(O_{x,y}) \prod_{i} P(Z_i \mid O_{x,y})$ is normalized by the constant $1/Z$:

$$P(O_{x,y} \mid Z_1, \ldots, Z_N) = \frac{1}{Z}\, P(O_{x,y}) \times \prod_{i=1}^{N} P(Z_i \mid O_{x,y})$$

Given a vector of sensor measurements $z = (z_1, \ldots, z_N)$, we apply the Bayes rule to derive the probability for cell $(x,y)$ to be occupied:

$$P(O_{x,y} \mid Z_1, \ldots, Z_N) = \frac{P(O_{x,y}) \times \prod_{i=1}^{N} P(Z_i \mid O_{x,y})}{P(O_{x,y} = occ) \times \prod_{i=1}^{N} P(Z_i \mid O_{x,y} = occ) + P(O_{x,y} = emp) \times \prod_{i=1}^{N} P(Z_i \mid O_{x,y} = emp)}$$

For each sensor $i$, the two conditional distributions $P(Z_i \mid occ)$ and $P(Z_i \mid emp)$ must be specified. This is called the sensor model definition.

3. Sensor Model for high level fusion in occupancy grids
Sensor model
In the PUVAME project, we had to track pedestrians in a car park using a set of off-board cameras. We use the ParkView platform, which is composed of a set of six off-board analogue cameras installed in a car park such that their fields of view partially overlap. For the application described here, as we used high level data, all data preprocessing basically consists of detecting pedestrians. Therefore, the video stream of each camera is processed independently by a dedicated detector. The role of the detectors is to convert each incoming video frame into a set of bounding rectangles, one per target detected in the image plane.

Figure (a): The ParkView platform.
Figure (b): An image of a moving object acquired by one of the off-board video cameras and the associated bounding box found by the detector.

A sensor model then has to be constructed for the detector observations given by the different cameras. In this section, we only give an overview of the construction of this sensor model. The problem is that detector observations give information in the image space, whereas we seek knowledge in the ground plane. We solve this problem by projecting the bounding box onto the ground plane, assuming that the ground is a plane, that all the vulnerable road users (VRUs) stand on the ground and that each VRU is completely visible to the camera. We first segment the ground plane into three types of region, occupied, occluded and free zones, using the bounding box information. Then, we introduce uncertainty management, using a Gaussian convolution, to deal with the position errors of the detector. Finally, we convert this information into probability distributions.
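As a minimal sketch of how the per-cell Bayes rule combines the three region types reported by several cameras, the following Python snippet fuses the observations of two cameras for a single cell. The likelihood values in `SENSOR_MODEL`, the uniform prior and the function name `fuse_cell` are illustrative assumptions, not the values or code used in PUVAME; only the structure of the update follows the Bayes rule for independent sensors.

```python
from math import prod

# Hypothetical sensor model: (P(z | occ), P(z | emp)) for each region label
# that a camera can report for a cell. The numbers are assumptions.
SENSOR_MODEL = {
    "occupied": (0.8, 0.2),
    "occluded": (0.5, 0.5),  # assumed to carry no information
    "free": (0.1, 0.9),
}

def fuse_cell(observations, prior_occ=0.5):
    """Return P(occ | z_1..z_N) for one cell, assuming independent sensors."""
    like_occ = prod(SENSOR_MODEL[z][0] for z in observations)
    like_emp = prod(SENSOR_MODEL[z][1] for z in observations)
    numerator = prior_occ * like_occ
    denominator = numerator + (1.0 - prior_occ) * like_emp
    return numerator / denominator

# One cell as seen by two cameras, for three typical situations.
both_occupied = fuse_cell(["occupied", "occupied"])
occluded_free = fuse_cell(["occluded", "free"])
both_free = fuse_cell(["free", "free"])
```

With these assumed likelihoods, a cell reported as occupied by both cameras ends up with the highest probability, a cell that is occluded for one camera and free for the other with a low one, and a cell seen as free by both cameras with the lowest.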
Figure: The resulting probability that the cells are occupied after the inference process with two cameras.

Results
The figure shows the same pedestrian seen by two cameras. The red area corresponds to the most probable position of the pedestrian: this area is the result of the fusion of the two yellow areas given by the two cameras. The 3 green areas around the pedestrian correspond to the fusion of the occluded area of one camera with the free area of the other one. The area seen as free by the two cameras has a very low probability of occupancy. The 4 areas seen as free by one camera and out of the field of view of the second camera have a low probability of occupancy.

4. Sensor model for Low Level Fusion in occupancy grids
Sensor model
In this section, we summarize how we build a sensor model for the raw data of a laser range-finder. The objective is to build a single OG of the surroundings of an intelligent vehicle (the V-grid) equipped with several telemetric sensors. Every telemetric sensor uses the time-of-flight of a wave and records detection events in a polar coordinate system, due to the intrinsic polar geometry of wave propagation. Building a single Cartesian occupancy grid therefore involves converting the sensor map (the Z-grid) into a local Cartesian map (the L-grid) and then transforming the L-grid into the V-grid with the correct orientation and position. The method to build the V-grid is divided into the following steps:
1. First, for each ray of the laser range-finder, we define a 1D sensor model.
2. Second, this 1D sensor model is used to build an OG for each ray.
3. Third, we define an efficient and precise algorithm to build a local Cartesian map for each laser range-finder.
4. Finally, we transform the L-grid into the V-grid.

Results
An implementation on a graphics processing unit (GPU) has been proposed and realized to build a real-time, high-resolution V-grid. We made some tests with a Cycab equipped with 4 laser range-finders, one placed on each corner of the Cycab. The construction of the V-grid is done in real time with high resolution. An illustration is given in the V-grid figure.
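The first three steps of the V-grid construction can be sketched in Python as follows. This is only an illustration under stated assumptions: the 1D model in `ray_occupancy` (fixed probabilities `p_hit` and `p_free`, 0.5 for unknown cells) and the naive nearest-ray resampling in `polar_to_cartesian` are hypothetical simplifications, not the efficient and precise algorithm referred to above, and the final L-grid to V-grid transform (step 4) is omitted.

```python
import math

def ray_occupancy(z, n_cells, cell_size, p_hit=0.9, p_free=0.1):
    """Steps 1-2: 1D occupancy estimate along one ray for a measured range z."""
    hit = int(z / cell_size)
    # Before the echo: probably empty; at the echo: probably occupied;
    # behind the echo: unknown (0.5).
    return [p_free if k < hit else p_hit if k == hit else 0.5
            for k in range(n_cells)]

def polar_to_cartesian(z_grid, angles, cell_size, half_width, beam_tol):
    """Step 3: resample the polar grid (one row per ray) into a square
    Cartesian L-grid centred on the sensor, by nearest-ray lookup."""
    n = 2 * half_width + 1
    l_grid = [[0.5] * n for _ in range(n)]
    for iy in range(n):
        for ix in range(n):
            x = (ix - half_width) * cell_size
            y = (iy - half_width) * cell_size
            theta = math.atan2(y, x)
            # Angular distance to every ray, wrapped to [-pi, pi].
            diffs = [abs(math.remainder(a - theta, math.tau)) for a in angles]
            j = diffs.index(min(diffs))
            k = int(math.hypot(x, y) / cell_size)
            if diffs[j] <= beam_tol and k < len(z_grid[j]):
                l_grid[iy][ix] = z_grid[j][k]
    return l_grid

# Three rays (angles in radians) with measured ranges in metres.
angles = [-0.2, 0.0, 0.2]
ranges = [2.0, 3.0, 2.0]
z_grid = [ray_occupancy(z, n_cells=5, cell_size=1.0) for z in ranges]
l_grid = polar_to_cartesian(z_grid, angles, cell_size=1.0,
                            half_width=4, beam_tol=0.1)
```

Cells that no ray covers keep the value 0.5, so that a later Bayesian fusion with other sensors is not biased by them.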
Architecture of the perception system
5. Conclusion
In this paper, we gave an overview of the grid based fusion approach developed by our group in the ProFusion2 project. We showed that the grid is a framework in which both high level fusion and low level fusion can be performed. In ProFusion2, we will develop new sensor models for high level fusion in collaboration with Volvo Technology and for low level fusion in collaboration with DaimlerChrysler. Our main objective is to obtain a robust perception, using multi-sensor approaches to track the different objects surrounding a car. The whole architecture is depicted in the figure "Architecture of the perception system". This architecture is composed of two distinct parts: a grid based fusion (described in this paper) and extraction level, and a tracking level. In the first level, we fuse the data given by different sensors to build a map of the current environment (i.e. a snapshot of the current environment). In a second step, using this map, we search for the objects currently present in the environment. Finally, in the tracking level, we associate this list of objects with the list of pedestrians previously present in the environment.
Olivier Aycard is with GRAVIR-IMAG & INRIA Rhone-Alpes, Grenoble, France, Olivier.Aycard@inrialpes.fr
Figure: (a) V-grid with only the first sensor's measurements. (b) V-grid with the fusion of the first two sensors' measurements. (c) V-grid with the fusion of the four sensors.
Early Fusion approach by Thomas Tatschke, FORWISS
The PReVENT subproject ProFusion2 addresses research work of common interest related to sensor data fusion, including a modular architecture for environment perception. One of the fusion approaches inside this architecture is the promising early fusion methodology originating from the application-oriented subproject COMPOSE. Research and development activities are followed up within ProFusion2 to put this fusion method into a broader context, the ProFusion2 fusion framework. Furthermore, the basics of this approach are extended as described below, after a short discussion of the concept.

1. Early Fusion Concept
In contrast to state-of-the-art fusion, early fusion combines data provided by multiple and even diverse sensors at an early stage of the data processing chain and performs a joint data interpretation on aggregated data with respect to a common model basis. Therefore, the input raw/feature data should have been determined based on the signal of the respective sensor alone (with models describing the sensing principle), but without any further modeling or tracking.
In doing so, signatures of various sub-threshold findings in the data processing chain may interfere constructively and thereby contribute to an above-threshold result, forming a distinctive, well-recognized object instantiation. Thus, an increase of robustness, reliability and consistency in the environment perception is expected, as the input from an individual sensor can be processed in view of, and with the help of, the other sensors from the very beginning.

Figure 1: Early fusion methodology (panels: state-of-the-art fusion, early fusion; blocks: single sensor preprocessing, sensor data aggregation, hypotheses consolidation, fusion processing, tracks).

2. Sensor Platform
The sensor configuration consists of a FIR camera, two long range and two short range RADARs as well as a laser scanner device (from the BMW COMPOSE demonstrator), with a ProFusion2 supplement of a bifocal stereo vision camera (see figure 2) and a 3D range camera (from the subproject UseRCams). The early fusion activities in ProFusion2 also include studies assessing different subsets of the sensor platform with regard to accuracy, robustness and reliability of the perception output.

Figure 2: Processing of camera images

3. Object Models
Besides the conventional modeling of the objects' dynamic behavior, this fusion approach requires object models of a special kind: these models specify in which way an object (i.e. what kind of data from an object) can be perceived using a certain sensor technology (see figure 3). In the early fusion system these models build the link between the sensor data and the objects to be detected in the following way. On the one hand, these object models are used to deduce predicted measurement data for every sensor type from existing objects, with the help of the underlying dynamic model and the filter's time prediction instructions. On the other hand, these object models implicitly perform the fusion of the different sensors' information: once the real data from the different sensors is mapped to the predicted measurements of the object, the obtained data updates the state of the originating object (with the help of the filter's measurement correction procedure) and thus combines the information from the different sensors in each object. Additionally, these object models are used for the generation of object hypotheses and therefore form the basis of the object instantiation process: whenever sensor data cannot be associated with already existing objects, information from all sensors is aggregated and tested for significant matches with different kinds of object models and their particular sensor representations. As these models, which capture the shape, the respective sensor perception and the dynamics as well as geometrical properties of the object, contain the basic information on the underlying entity, they are well suited for further classification issues (e.g. walking pedestrian, speeding car, etc.). As this kind of object modeling is essential for the early fusion methodology and its fusion procedure, the work within ProFusion2 also concentrates on an extension to new sensor types (e.g. see figure 2) and on a refinement of already modeled objects.

4. Data Association
As the early fusion approach does not fuse tracks from sensor devices but their raw/feature data, new data association methods are needed to cope with this challenging task. Besides specialized gating methods, which are essential to limit the huge number of possible mappings between sensor data and predicted measurements, sensor and data specific association measures are developed. In doing so, different mapping methods, from the simple Global Nearest Neighbor approach up to advanced algorithms from assignment theory, can be applied at the same time to the multi-sensor data, depending on the underlying situation. As the object models allow a prediction of further data attributes (e.g. gradient of edges, length of laser scanner segments, etc.) in some cases, this information can also be used to support the data association step.

5. Filtering
The fusion of raw/feature data is also a special environment for state-of-the-art filters. Therefore, the analysis and assessment of different kinds of filtering methods (from the conventional Extended Kalman Filter, used within COMPOSE, to Monte Carlo or sampling-based filter algorithms, e.g. particle filters) with regard to their performance is part of the studies in ProFusion2. In doing so, particular attention has to be paid to the time consumption, as the number of filter steps is proportional to the amount of associated measurement data.
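The data association step of Section 4, gating followed by a Global Nearest Neighbor assignment between predicted and real measurements, can be sketched as follows. This is a minimal illustration and not the ProFusion2 implementation: the gate value `GATE`, the scalar variance in `distance`, the penalty for unassigned objects and the brute-force search (only practical for a handful of objects) are all assumptions.

```python
from itertools import permutations

GATE = 9.0  # hypothetical gate on the squared normalised distance

def distance(pred, meas, var):
    """Squared distance between a predicted and a real measurement,
    normalised by an assumed scalar measurement variance."""
    return sum((p - m) ** 2 for p, m in zip(pred, meas)) / var

def global_nearest_neighbour(predicted, measured, var=1.0, gate=GATE):
    """Return {object index: measurement index} minimising the total cost.
    Pairs outside the gate are forbidden; an unassigned object costs `gate`."""
    n_obj = len(predicted)
    # Pad with None so that every object may also stay unassigned.
    slots = list(range(len(measured))) + [None] * n_obj
    best, best_cost = {}, float("inf")
    for perm in set(permutations(slots, n_obj)):
        cost, assign, feasible = 0.0, {}, True
        for i, j in enumerate(perm):
            if j is None:
                cost += gate
                continue
            d = distance(predicted[i], measured[j], var)
            if d > gate:
                feasible = False
                break
            cost += d
            assign[i] = j
        if feasible and cost < best_cost:
            best, best_cost = assign, cost
    return best

# Two predicted measurements (from existing objects) and three real ones.
predicted = [(0.0, 0.0), (5.0, 0.0)]
measured = [(4.6, 0.2), (0.3, -0.1), (20.0, 20.0)]
assignment = global_nearest_neighbour(predicted, measured)
unassigned = [j for j in range(len(measured)) if j not in assignment.values()]
```

In this sketch the measurements left in `unassigned` are the ones that would be handed to the object hypotheses generation described in Section 3.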
Additionally, the filter has to cope with the non-linear correlation of data, states, object behavior and movement in order to tap the full potential of low-level data fusion, which makes this process quite challenging.

6. Data management
Due to their large amounts of data, low-level fusion approaches demand exceptional data management, as already mentioned. In the case of early fusion this is more than true, as the main task of this approach, the joint data interpretation and common object instantiation process, cannot easily be parallelized without losing the additional synergetic effect of raw/feature data and the consistent object modeling.

7. Confidence Information
Automotive safety applications, for instance a collision mitigation application, require on the one hand an accurate environment perception output (e.g. for the calculation of the time-to-collision quantity). On the other hand, such applications have to know about the reliability of the respective perception result (in particular, if an autonomous action like braking is triggered).
Figure 3: Illustration of a simplified car object model (geometrical/topological vehicle model in the object coordinates Xobject, Yobject and Zobject, with RADAR view, laser scanner view, FIR view and camera view).

When applying inference algorithms to uncertain data, we are interested in the precision and correctness of the results. Precision is a measure of the deviation of the results if the measurements and the inference method are applied under fixed observation conditions. Correctness, on the other hand, reflects the affiliation of the result to the class of correct object descriptions, e.g. the suitability of a particular object model instance inferred from a concrete measurement. The latter correctness measure is called a confidence measure.
In the field of environment perception, runtime correctness analysis is rather rare. The current standard evaluation technique is to provide statistics of algorithms applied to real or synthetic data, depending on problem parameters. In ProFusion2, a theoretical framework for combining the empirical and model-derived information in a hierarchical manner is investigated in order to achieve reliable confidence information with respect to the algorithmic result at runtime, without exhaustive statistical testing and evaluation. The original contribution of ProFusion2 is basically the combination of a hierarchical object setup with classical stochastic approaches for confidence estimation. Informally, confidence can be defined as a function that, given a model of some object inferred from feature data or aggregated from other objects, provides a value which reflects the representation quality of the model with respect to the observed data. The work of ProFusion2 includes a formal definition of the term confidence in the context of the early fusion approach. Furthermore, a fast evaluation technique for the presented confidence measure is given, and a hierarchical modeling technique, the so-called inference tree, is presented, which admits uncertainty propagation for complex aggregated scenes.
Thomas Tatschke is with FORWISS Passau, tatschke@forwiss.uni-passau.de
Track-Level Fusion by Nikos Floudas, ICCS
1. Object Refinement
The track-based fusion within the object refinement layer is a distributed fusion approach. It assumes that tracking is carried out inside each individual sensor or system, and that the resulting tracks feed the track level fusion algorithms. It can be applied to automotive sensor networks with complementary and/or redundant fields of view. The advantage of the approach is that it ensures system modularity and allows benchmarking, as it does not allow feedback and loops inside the processing. Expected results from research and development in track level fusion are consistency and the avoidance of spurious or invalid perception information. The output of the track level fusion is a set of aggregated tracks in the union of the sensors' fields of view. The internal track level fusion architectural modules are depicted in Figure 1. A set of track arrays enters the fusion system, while the output of the object refinement process consists of the fused object list. The internal functionalities in this architecture are the association (spatial track assignment and 2-D and N-D association), the track to track update (fusion) and the fused object management.
Figure 1: The object refinement internal functional module architecture (levels: track level, object level, situation level; blocks: S1, S2, ..., Sn tracks, spatial track assignment, track to track association, 2-D assignment, N-D assignment, track fusion, object management, multi-sensor tracks, objects, motion models, environment data structure, situation refinement, environment description).

Fusion Area Track Assignment
This is the first function applied to the tracks when they enter the fusion system. Its main objective is to decrease the computational load of the overall procedure and also to ensure the configurability and the interoperability of the procedure. A set of sensor configuration parameters is necessary for this module to work properly: at least the sensors' maximum range, FOV, direction, location, accuracy and resolution are required. The main process of this module is to partition the sensor-covered area around the ego vehicle and consequently to separate the available tracks according to the sub-area they belong to. These sub-areas can be blind areas not surveyed by any sensor, areas covered by one sensor and areas observed by two or more sensors. The main result of this process is to divide the fusion problem into a number of smaller fusion sub-problems.

Track to Track Association
The tracks that belong to areas with no coverage, or covered by only one sensor, simply pass to the output without any additional processing. For the tracks that are within the common sensor areas (2 sensors or more), an association measure is formed. This metric is used to generate the association hypotheses between tracks; the resulting association matrix (or other metric) then passes to the next level, where the track to track assignment takes place.

Figure 2: Track-to-Track association (Sensor 1: x1i (i = 1..N1); Sensor 2: x2i (i = 1..N2); ...; Sensor S: xSi (i = 1..NS); 2-sensor association areas; S-sensor association areas (>2); solution).

2-D Assignment
In the case of tracks from 2 sensors, the 2-D association problem is solved. The input to this module is the output of the track to track association, and its output consists of the pairs of tracks that are suitable for fusion and the unassigned tracks, which simply pass to the next module.

S-D Assignment
In the case of tracks coming from more than 2 sensors, the N-D assignment problem, with N equal to 3 or more, is solved in this module. Usually this involves the sequential generation of a 2-D problem out of the N-D problem, after which the solution is similar to that obtained in the 2-D assignment.

Track fusion
The assigned tracks (2 or more) that come from the output of the assignment modules are fused by this module.
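The track fusion step, in which the assigned tracks of one object are merged into a single fused state and covariance, can be sketched as follows. The source does not state which fusion rule is used, so this is only an illustrative assumption: an information-weighted average that treats the track errors as independent and the covariances as diagonal. A real track-to-track fusion would also have to account for cross-correlation between the sensor tracks. The function name `fuse_tracks` and all numbers are hypothetical.

```python
def fuse_tracks(tracks):
    """Fuse the assigned tracks of one object.

    Each track is a pair (state, variances): two lists of equal length that
    hold the state estimate and the diagonal of its covariance matrix.
    """
    dim = len(tracks[0][0])
    fused_state, fused_var = [], []
    for k in range(dim):
        # Sum of inverse variances (information) over all tracks.
        info = sum(1.0 / var[k] for _, var in tracks)
        weighted = sum(state[k] / var[k] for state, var in tracks)
        fused_var.append(1.0 / info)
        fused_state.append(weighted / info)
    return fused_state, fused_var

# Two tracks of the same object: [x, y] in metres, variances in m^2.
radar_track = ([20.0, 1.0], [1.0, 4.0])  # assumed more accurate in x
laser_track = ([21.0, 0.0], [4.0, 1.0])  # assumed more accurate in y
state, var = fuse_tracks([radar_track, laser_track])
```

The fused state lies closer to the more accurate track in each coordinate, and the fused variance is smaller than either input variance, which is the behaviour expected from combining redundant fields of view.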
SD Assignment Matrix
ASSIGNED S-PLETS
YES Is the gap sufficiently small? fdual =max(fdual ,J 2*) gap = |JS-fdual|/|fdual| JS u Snew
Figure 3: 2D Assignment
S-D problem
Object Management
Successive Constraint Relaxation Phase
(S-1)-D subproblem
J S-1
u S-1new
In this module the fused and non-fused tracked objects are formatted into the final object list, which is the output of the object refinement process. All objects have an ID, and the initialisation, update and deletion of objects based on ID information take place in this module. Moreover, in a final step this module handles object management issues such as duplicated objects, objects in blind areas, the transition of objects between different areas and any other relevant problems that might appear.
2. Situation Refinement
The research and development process for situation analysis consists of two main components: the first is to develop the appropriate level of domain-specific knowledge for the road elements (e.g. road borders, lanes, obstacles), and the second is to develop a decision-making process that is able to codify and manipulate this knowledge. In situation refinement the system is aware not only of the states of the road elements but also of their relationships. The outcome of situation refinement enriches the environment model with additional attributes of the ego-vehicle and the obstacles (predicted paths, object-to-lane assignment, evidence for vehicle manoeuvres, etc.). The internal situation refinement modules of the ICCS approach are depicted in the figure above. The output of the object refinement process is the main input to this module. The internal functionalities of this architecture are the assignment of objects to a specific lane and the prediction of the path of the ego-vehicle and of the other vehicles (moving objects). The final output is the manoeuvre classification of the ego-vehicle and the moving objects, together with a confidence index. The output of this module is passed to the application or the HMI.
Lane Assignment
This module uses object position information and the available lane geometry to assign a lane index to each vehicle, accompanied by a confidence index. This information is very useful for the manoeuvre classification modules.
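As a rough illustration of lane assignment, the following Python sketch derives a lane index from an object's lateral offset and a simple confidence from its distance to the nearest lane boundary. The straight, equally wide lanes, the function name assign_lane, the default lane width and the confidence heuristic are all assumptions made for this example; the text does not specify how the module computes either value.

```python
import math


def assign_lane(lateral_offset_m, lane_width_m=3.5, ego_lane_index=0):
    """Return (lane_index, confidence) for an object.

    lateral_offset_m: lateral position of the object relative to the centre
    of the ego lane (positive to the left). Assumes straight, parallel lanes
    of equal width, which is a simplification of real lane geometry.
    """
    # Number of whole lanes between the object and the ego lane centre.
    lane_shift = math.floor(lateral_offset_m / lane_width_m + 0.5)
    # Offset of the object from the centre of its assigned lane.
    offset_in_lane = lateral_offset_m - lane_shift * lane_width_m
    # Heuristic confidence: 1 at the lane centre, 0 on a lane boundary.
    confidence = 1.0 - abs(offset_in_lane) / (lane_width_m / 2.0)
    return ego_lane_index + lane_shift, confidence
```

An object close to a lane boundary receives a low confidence, which a downstream manoeuvre classifier could read as weak evidence for either of the two neighbouring lanes.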
Figure 4: S-D Assignment (flowchart: the constraint sets r+1, ..., 3 are successively relaxed via the Lagrangian multiplier vectors u_{r+1}, ..., u_3, passing through the (r+1)-D, r-D and 3D subproblems down to a 2D assignment with an optimal solution; each constraint set is then enforced again as a 2D problem and its multiplier vector is updated)
Objects manoeuvre classification (Behaviour/situation analysis)
This module analyses the behaviour of the objects and classifies them according to a predefined discrete set of classes (e.g. overtaking, exceeding speed, parallel to the lane, lane change, etc.). The module assumes the existence of an environment model, i.e. descriptions of the road attributes and the lane properties, together with the output of the objects' path prediction and the lane assignment; it analyses relationships between "objects" and produces a new structure.
Ego manoeuvre classification (Behaviour/situation analysis)
This module analyses the behaviour of the ego-vehicle and classifies it according to a predefined discrete set of classes (e.g. overtaking, lane change, lane drifting, etc.). The module assumes the existence of an environment model, i.e. descriptions of the driver, the ego-vehicle dynamics, the road attributes and the lane properties, together with the output of the ego path prediction; it analyses relationships between "objects" and produces a new structure.
Nikos Floudas is with the Institute of Communications and Computer Systems, nfloudas@iccs.gr.
www.prevent-ip.org prevent@mail.ertico.com
updated and generate a fused object state and covariance that replaces the existing sensor level tracks.
ProFusion Sensor Data Fusion activities
Multi Level Fusion by U. Scheunert, Chemnitz University of Technology
In the Multi Level Fusion approach, information about objects is distributed over different levels of abstraction and is fused both within and between these levels. In detail these levels are, for example (from low to high abstraction): signal level; feature level; track level; object level; situation level. Raw or pre-processed data coming from the individual sensors can be found at signal level. They are transferred to the processing chain, where they are processed and fused with data from the same, a higher or a lower level. The chosen level of fusion is strongly connected to the specific object and also depends on the model of the object itself. For that reason a certain hierarchical fusion strategy can be defined for every object.
Figure 1: Multi Level Fusion functional architecture
In doing this, the tracking of an object can be supplied with data from tracked features, untracked features and from signal level. Multi Level Fusion makes it possible to introduce back loops between the levels. Back loops can be used to return to a lower level of abstraction in order to reprocess certain data or to adapt fusion parameters. A special case of Multi Level Fusion is processing on adaptively chosen levels. This allows the fusion strategy and the selection of a certain fusion level to depend on the actual sensor data and the observation situation of an object. As a result, a better processing strategy can be achieved in most cases.
Bottom Up and Top Down Strategy
There is no unidirectional way of performing Multi Level Fusion. One can apply a bottom-up strategy by using specific model knowledge about a real world situation to fuse primitive objects into less primitive objects at the same level or at a higher level of abstraction. The other way round, a top-down strategy reverts to a model of an object which contains all relevant information about the physical properties of the object itself, called features. Using the knowledge about which component can be detected by which sensor, the data of a lower level can be analysed in detail to increase or decrease the degree of trust in the detected object.
Confidence in ML-Fusion
The often raised question of how much to believe a certain (intermediate) processing result can be addressed by estimating confidence along the whole chain, from the confidence in a sensor's measurement data up to the degree of trust in the computed higher-order objects. By evaluating these confidence measures one can decide whether to continue in the processing chain or to return to a lower level of abstraction. This kind of back loop can be used to reprocess the given data and to increase the evidence for a previously detected object. By rating the final processing result with confidence information, a final decision-maker can decide whether to accept or reject the concluding outcome of the processing chain.
Confidence-Fusion
Valuating the intermediate results of the individual levels of abstraction introduces the need to combine single values of confidence at lower levels into values of confidence at higher levels. This is strongly related to the combination of primitive objects at lower levels into less primitive objects at higher levels of abstraction. It is part of future research to find suitable ways of, for example, combining objects with high confidence with objects that have a lower degree of certainty. One also has to evaluate how the degree of composition of objects interferes with the confidence fusion of different objects on different levels.
Conclusion
Multi Level Fusion implements High-Level to Low-Level and/or Low-Level to High-Level fusion strategies and makes use of parallel or sequential processing chains. It includes sensor information, feature information and procedural knowledge in a model representation. The challenges in using Multi Level Fusion are to build a unified data structure that is able to handle the given and acquired knowledge, to find suitable features for robust object description, and to come up with methods to reduce the computational complexity of the system.
PReVENT Sensor Data Fusion activities
LATERAL SAFE Subproject by Angelos Amditis
1. Introduction
LATERAL SAFE is a subproject of the EU-funded IP PReVENT that aims at developing a cluster of safety applications that prevent lateral/rear area related accidents and assist the driver in adverse or low visibility conditions and blind spot areas. The main objectives of LATERAL SAFE, as reflected in its three applications, are:
• LRM: A Lateral and Rear area Monitoring application enhancing the driver’s perception and decreasing the risk of collision in the lateral and rear area of the vehicle.
• LCW: A Lateral Collision Warning application that detects and tracks obstacles in the lateral and rear field and warns the driver about an imminent risk of accident (collision, road departure, merging etc.).
• LCA: A stand-alone Lane Change Assistance system with integrated blind spot detection, assisting the driver in lane change manoeuvres while driving on roads with more than one lane per direction.
The work in LATERAL SAFE intends to allow the extension of the operative scenarios of the Advanced Driver Assistance Systems beyond their current limits and to allow for a data fusion that coordinates the sensor modules of the application. The approach presupposes high sensor and sensor network processing for each sensor module. Then the track arrays are transmitted to the fusion system, which deals with updating and fusing tracks belonging to the same object, maintaining identity and kinematics information for objects crossing the FOVs of the different sensor systems, and providing an all-around perception object list to the safety applications. Moreover, this has to be performed in a sensible amount of time, so that the safety applications can extract the warnings and inform the driver in time.
Sensor-level distributed fusion was selected since it offers a set of significant advantages that, besides the improvement over a single tracker, make it preferable to most other fusion techniques, with the exception of hybrid fusion. Briefly, a sensor-level fusion system is flexible in the number and types of sensors and therefore allows the addition, removal or substitution of sensors or sensor systems without having to alter the fundamental structure of the fusion algorithm. In addition, the processing at the sensor level before data entry into the fusion processor reduces the computational load on the central fusion processor. Moreover, since many sensors provide only tracks and not raw data, Track Fusion appears to be essential.
2. LATERAL SAFE Sensor Systems
This section describes the algorithms of the modules comprising the perception layer of the PReVENT/LATERAL SAFE application. The perception layer is the layer in the LATERAL SAFE architecture (and in PReVENT applications in general) that mediates between the sensor system and the applications. This layer gives a realistic representation of the environment and aims at enhancing the performance of single sensor systems, providing more robust output to the application.
Figure 1: LATERAL SAFE system architecture (inputs: LRR, SRR1 ... SRRn, Camera1, Camera2, Vehicle (VCAN); Perception Layer: LRR processing, SRR processing, SVIP processing and FUSION; Application/HMI Layer: LCA, LCW, LRM and HMI)
The role of the perception layer is to:
• Carry out “perception enhancement" tasks independent of the application (generic)
• Describe in a formal way the environment and the traffic scenario (semantics)
• Support LATERAL SAFE functions under request (specific)
• Act as a gateway between sensor systems and applications with well-defined interfaces and I/O protocols.
The Perception Layer is always active (“ON”) and it always monitors and models the environment. The LATERAL SAFE Perception Layer focuses on the modeling and representation of the obstacles and their
kinematics in the supervised areas requested by the applications. The perception layer modules are: Long Range Radar (LRR), Short Range Radar network signal processing (SRR), Synthesized Vision Image Processing (SVIP) and Tracking and Data Fusion (FUS). The perception layer modules and their role in the LATERAL SAFE applications are depicted in the system architecture figure (Figure 1). The three processing modules provide four track arrays: the LRR tracked objects, the SRR network tracked objects at the left and right sides, and the SVIP tracked objects. The fusion algorithm generates global fused objects and fulfils the generic perception layer objectives summarised above.
3. Long Range Radar Tracking
The sensor used in the LATERAL SAFE prototypes is the second-generation Bosch ACC2 LRR. The LRR tracking is a conventional tracking system separated into three main sub-modules: data association, track management, and filtering & prediction. The basic filtering approach in the LRR tracking is the EKF. For data association two working modes are available, GNN and JPDA, using 1-to-1 and N-to-1 measurement-to-track assignment respectively. The standard 2D assignment problem is solved via the auction algorithm. Track management is handled with ad-hoc rules based on “hits” and “misses” of measurements.
4. Short Range Radar Network Processing
SRR network signal processing is based on sensor arrays installed on both sides of the vehicle. For this purpose, three perpendicularly oriented sensors are placed on each side of the car. The raw sensor data (measurements) are filtered and single-sensor multi-target tracking is realised for each sensor independently by applying an Extended Kalman Filter with mixed coordinates. The single-sensor tracked targets obtained by all sensors belonging to the same array are integrated by multi-sensor data integration. Whenever the gates of n >= 2 single-sensor tracked targets intersect, the n targets are associated with each other and merged into an integrated target. Based on the integrated targets, multi-sensor multi-target tracking is achieved by applying an EKF. A Converted Measurement Kalman Filter (CMKF) is used in order to keep the errors due to the transformation from polar into rectangular coordinates small.
5. Synthesized Vision Image Processing
The vision system consists of both mono- and stereovision. The stereovision provides lateral location and distance for objects in the region where the fields of view (FOV) of the cameras overlap. Monovision provides an indication of the presence of vehicles in the blind-spot region, complementing the coverage area of stereovision. The combination of stereo- and monovision is provided with the use of three cameras: one in each side mirror and one looking backwards located at
the center in the back. Detected vehicles from the stereo- and mono-vision processing are combined in a tracker, resulting in a single output to the fusion module. Range information (depth) is extracted by triangulating corresponding image points between the two images. In addition, elevation and azimuth information is obtained by combining the distance with the locations of the image points. Several steps can be distinguished in sensor systems based on stereo vision: determining the (relative) positions and orientations of the cameras (calibration), finding correspondences between the stereo images (disparity estimation) and computing the distance from these correspondences (distance computation).
6. Fusion algorithm
This section describes the data synthesis algorithm, which provides the object list of the rear and lateral area to the applications. The algorithm gets input from three sensor systems and their respective track arrays, namely: the LRR track array, the SRR left track array, the SRR right track array and the SVIP track array. The fusion algorithm includes the following steps: the synchronisation of the track arrays to a common time reference and the propagation of track information to this time, and the transformation into a common spatial base according to the chosen vehicle-centred coordinate system. Then the tracks are separated into those that are likely to be fused and those that are not; this is possible because the FOV areas of the sensor systems are known and predefined. Track-to-track association follows for the “fuseable” tracks, together with the fused object management (initialisation, confirmation and deletion) and the fusion update applied to the associated track arrays. As already mentioned, the algorithm takes into account the FOVs of the sensors and possible complementarities and redundancies. The time cycle of the LRR tracked objects is 100 ms, that of the SVIP tracked objects is about 40 ms, and that of the SRR tracked objects is 40 ms.
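The synchronisation step listed above (propagating each track array to a common time reference) can be illustrated with a minimal Python sketch that extrapolates one track to the fusion time. It assumes a one-dimensional constant-velocity model with state [position, velocity] and a white-noise-acceleration process noise of intensity q; the motion model, the function name propagate_track and all numeric values are assumptions for illustration, since the article does not state which model is used.

```python
def propagate_track(x, P, t_track, t_fusion, q=1.0):
    """Extrapolate state x = [pos, vel] and 2x2 covariance P to t_fusion.

    Uses x' = F x and P' = F P F^T + Q with F = [[1, dt], [0, 1]].
    q is an assumed process-noise intensity (white-noise acceleration).
    """
    dt = t_fusion - t_track  # time delay of this track array
    pos, vel = x
    x_new = [pos + vel * dt, vel]

    (p00, p01), (p10, p11) = P
    # F P F^T written out for the 2x2 constant-velocity case.
    n00 = p00 + dt * (p01 + p10) + dt * dt * p11
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11
    # Discretised white-noise-acceleration process noise Q.
    P_new = [
        [n00 + q * dt**3 / 3.0, n01 + q * dt**2 / 2.0],
        [n10 + q * dt**2 / 2.0, n11 + q * dt],
    ]
    return x_new, P_new


# Example: a track stamped 40 ms before the fusion time.
x_sync, P_sync = propagate_track([10.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], 0.06, 0.10)
```

Because the covariance grows with the time delay, tracks from slower sensors enter the association step with larger uncertainty than freshly updated ones.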
The fusion algorithm follows the time step of the LRR tracked objects; the other track arrays are updated to acquire a common time. State vectors and covariances are extrapolated to the “fusion time” after the time delays have been calculated. Then the track arrays are transformed to the common vehicle-centred coordinate system, and the fusion problem is divided into several sub-problems according to the area of each object. For the fusion of two tracks, which is always the case here, data association is carried out using the cross-covariance matrix method. Assuming two tracks i, j with state vectors x_i, x_j and covariance matrices P_i, P_j respectively, the state difference and the cross-covariance matrix of the estimation errors of the tracks are defined as follows:
$$\tilde{x}_{ij} = \hat{x}_i - \hat{x}_j$$
$$P_{ij}(l,m) = \rho \cdot \left[ P_i(l,m) \cdot P_j(l,m) \right]^{1/2}$$
where l, m index the elements of the covariance matrices one-to-one, and ρ is the correlation coefficient. A statistical distance between tracks i, j using the cross-covariance is defined as:
$$d_{ij}^2 = \tilde{x}_{ij}^T \cdot \left[ P_i + P_j - P_{ij} - P_{ij}^T \right]^{-1} \cdot \tilde{x}_{ij}$$
Figure 2: Synchronisation procedure of LS track arrays
After the calculation of the distance (and consequently of the distance matrix), the assignment problem can be solved in a similar manner to that of 2D data association. The fused object update is done with the Covariance Intersection method, which deals with the problem of invalid incorporation of redundant information. The fused state and covariance are calculated as:
$$x_f = P_f \cdot \left[ w \cdot P_1^{-1} \cdot x_1 + (1 - w) \cdot P_2^{-1} \cdot x_2 \right]$$
$$P_f = \left[ w P_1^{-1} + (1 - w) P_2^{-1} \right]^{-1}$$
where w lies in the interval [0,1]. The final step of fusion in the LATERAL SAFE application is the maintenance of the global fusion ID, for which the tracked object IDs are used. A tracked object can be fused with a tracked object of another track array if it belongs to one of the areas 2, 4 and 6; tracked objects that are not fused with others are simply added to the fused objects array. Every fused object observed for the first time receives a fusion ID, and the ID(s) of the track(s) that generated it are also saved. If the track ID(s) used later to update the fused object coincide with one of the saved track IDs, the fusion ID does not change.
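The association distance and the Covariance Intersection update described in this section can be sketched in plain Python for two-dimensional states. This is a minimal illustration under stated assumptions, not the LATERAL SAFE implementation: the helper names, the restriction to 2x2 matrices, the value rho = 0.4 and the default weight w = 0.5 are all chosen for the example, and the element-wise products P_i(l,m) · P_j(l,m) are assumed to be non-negative so that the square root exists.

```python
import math


def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def combine(A, B, wa=1.0, wb=1.0):
    """Element-wise wa*A + wb*B for 2x2 matrices."""
    return [[wa * A[r][c] + wb * B[r][c] for c in range(2)] for r in range(2)]


def matvec(M, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def track_distance(xi, Pi, xj, Pj, rho=0.4):
    """Squared statistical distance d_ij^2 with approximated cross-covariance."""
    # P_ij(l,m) = rho * sqrt(P_i(l,m) * P_j(l,m)); rho is an assumed value.
    Pij = [[rho * math.sqrt(Pi[l][m] * Pj[l][m]) for m in range(2)] for l in range(2)]
    PijT = [[Pij[m][l] for m in range(2)] for l in range(2)]
    # S = P_i + P_j - P_ij - P_ij^T
    S = combine(combine(Pi, Pj), combine(Pij, PijT), 1.0, -1.0)
    dx = [xi[0] - xj[0], xi[1] - xj[1]]
    y = matvec(inv2(S), dx)
    return dx[0] * y[0] + dx[1] * y[1]


def covariance_intersection(x1, P1, x2, P2, w=0.5):
    """Fused state and covariance for a weight w in [0, 1]."""
    I1, I2 = inv2(P1), inv2(P2)
    Pf = inv2(combine(I1, I2, w, 1.0 - w))
    v1, v2 = matvec(I1, x1), matvec(I2, x2)
    info = [w * v1[k] + (1.0 - w) * v2[k] for k in range(2)]
    return matvec(Pf, info), Pf


identity = [[1.0, 0.0], [0.0, 1.0]]
d2 = track_distance([0.0, 0.0], identity, [1.0, 0.0], identity)
xf, Pf = covariance_intersection([0.0, 0.0], identity, [1.0, 0.0], identity)
```

In practice w is often chosen to minimise the trace or determinant of the fused covariance; the fixed value used here only keeps the sketch short.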
Angelos Amditis is with the Institute of Communications and Computer Systems, a.amditis@iccs.gr
Figure 3: Fusion algorithm flowchart
Figure 4: Areas of fused objects
Data Fusion at SASPENCE by Heiko Cramer, Technical University of Chemnitz
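Figure: SASPENCE architecture (sensors array: long-range radar, lane detection, maps & navigation, V2V communication, DGPS, ego-data; fusion for object estimation; fusion for road geometry estimation; position and ego-state estimation; ego-path prediction; path prediction for vehicle; scenario assessment; reference manoeuvre; warning strategies; HMI)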
In this paper, the data fusion approaches of the SASPENCE subproject within the PReVENT project are presented. The aim of the SASPENCE subproject is to develop an innovative system able to implement a reliable and comfortable Safe Speed and Safe Distance concept, which helps the driver to avoid dangerous situations related to excessive speed or too small a headway. To reach that aim, a reliable reconstruction of the scene around the own vehicle is necessary. Therefore, the vehicle is equipped with several sensor systems. A scanning Long Range Radar operating at a frequency of 77 GHz is used. The radar is able to detect obstacles like cars, trucks, guardrails and trees up to a distance of 150 m. A lane detection system provides information about the own lane using a 12-bit grayscale camera. Besides the information about the ego motion of the vehicle, a DGPS is used to deliver the position relative to the world coordinate system. The knowledge of the geometry of the road is generated from maps, which are commercially available for today's car navigation systems.
Basically, the system contains two modules for the reconstruction of the scene in front of the own vehicle. One is the fusion for object estimation module, which gets information from the long range radar as well as information delivered via the communication link. The second module is the fusion for road geometry estimation, which uses information from the lane detection system and the long range radar, as well as from the DGPS and the digital maps.
The fusion for object estimation module uses the information from the long range radar to estimate the position and velocity of every obstacle in the field of view of the sensors. A multi-sensor multi-target tracking system is used for this purpose. In this tracking system, the estimation is realised using an extended Kalman filter with a kinematic model for the objects. The model assumes constant speed of the obstacles in two Cartesian coordinate directions. As the radar delivers range and angle for every detection, the measurement equation becomes nonlinear.
The information about obstacles in the sensors' field of view can be exchanged with another vehicle equipped with the same system. Thereby, obstacles can be tracked even when the sensors of the own vehicle cannot detect them. If the other vehicle (v2) detects obstacles which are invisible to the own vehicle, a transformation is necessary to transform the object positions from the vehicle coordinates of the other vehicle into the vehicle coordinates of the own vehicle (v1):
$${}^{v1}x_o(t_k) = g_{v2 \to v1}\left( {}^{v2}x_o(t_k),\ {}^{v1}x_{v2}(t_k),\ t_k \right)$$
Here, ${}^{v2}x_o(t_k)$ denotes the estimated position of the obstacle o at time $t_k$ in the vehicle coordinates of v2, and ${}^{v1}x_{v2}(t_k)$ is the estimated position of v2 in the coordinates of v1.
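The article does not spell out the function g. A minimal Python sketch, assuming a flat two-dimensional world and that the relative heading of v2 with respect to v1 is available in addition to its position, would be a rotation followed by a translation. The function name v2_to_v1 and the heading argument are assumptions introduced for this example.

```python
import math


def v2_to_v1(obstacle_in_v2, v2_position_in_v1, v2_heading_in_v1):
    """Map an obstacle position from v2's vehicle frame into v1's frame.

    Assumes a flat 2D world; v2_heading_in_v1 is the yaw of v2 relative to
    v1 in radians (an assumed input, not named in the article).
    """
    ox, oy = obstacle_in_v2
    px, py = v2_position_in_v1
    c, s = math.cos(v2_heading_in_v1), math.sin(v2_heading_in_v1)
    # Rotate into v1's axes, then translate by v2's position in v1.
    return (px + c * ox - s * oy, py + s * ox + c * oy)
```

For example, an obstacle 10 m ahead of v2, with v2 located at (20, 5) in v1's frame and driving in the same direction, maps to (30, 5) in v1's frame.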
The second fusion module is the fusion for road geometry estimation module. In this module, a fusion method is used to combine information from a vision sensor, a radar system, digital maps, a DGPS sensor and odometric sensors into one common description of the road geometry in front of the vehicle. Thereby, the sensor information is used to localise the own vehicle relative to the map data.
The first step of the processing is the approximation of the map data by a parametric road geometry description. This approximation algorithm fits a circular arc spline with G1 continuity to the polygonal map data. The arc spline is formed by joining several biarcs. The biarcs are calculated iteratively with a maximum error of 0.5 m with respect to the original map points.
In the next step, the position of the vehicle relative to the map is estimated using an extended Kalman filter. Besides the exact position and orientation in the map, the state space x contains the lane width. All other information about the road geometry (e.g. curvature, number of lanes) is provided by the approximated map. A dynamic model based on the bicycle model is used for the Kalman filter state prediction. The position and ego state estimation module provides input data for the measurement function. As position, velocity and yaw rate are members of the state space, the measurement equation is linear. The position information is combined with the measurement of the lane detection sensor. The figure shows the principle of this map matching process. The measurement equation becomes nonlinear in this case. Additionally, the stationary detections from the long range radar (detections from the road border) are used to initialise the position estimation.
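Figure: Principle of the map matching process (labels: polygonal map data, x_LaneDetect, L_R, L_L, VC, (x_CC, y_CC); annotation: $L = \sqrt{(x_{CC} - x)^2 + (y_{CC} - y)^2}$)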
Sensor Data Fusion at COMPOSE by Thomas Tatschke
Once the real data from the different sensors is mapped to the predicted measurements of the object, the obtained data updates the state of the originating object (with the help of the filter’s measurement correction procedure) and thus combines the information from the different sensors in each object.
1. Introduction
Within the general context of PReVENT, COMPOSE addresses the problems of pre-crash, collision mitigation and protection of vulnerable road users. In this article, we focus on how the traffic situation is assessed. Due to the extreme criticality of this task, it has been widely acknowledged that resorting to more than one sensor enhances the robustness of perception. The process of merging data stemming from heterogeneous sensors is referred to as sensor data fusion, and turns out to be a key aspect of multisensorial perception. Here we summarise the concepts, results and algorithms of the two data fusion strategies employed concurrently in COMPOSE: (i) the track-based fusion approach and (ii) the early fusion approach.
2. Track-based fusion
Track-based fusion can be seen as a medium-level fusion scheme. This means that the relevant sensors feed their raw signals to dedicated processing units, which in turn analyse these signals in order to extract candidate events (or detected objects) from each measurement. In a second step, assuming the temporal consistency of objects in the environment, they chain events from successive measurements into coherent tracks, representing the relative kinematics of detected objects. At this point, tracks are delivered to track-level fusion. Fusion will attempt to build multi-mode tracks by matching tracks coming from different sensors when relevant.
Probabilistic fusion framework
We recall that the first step in fusion consists in attempting to match single-mode tracks coming from different sensors. This can be done primarily on the basis of spatial probability distributions, which will hopefully give many cues for barring incompatible candidate couples whenever no sufficient probability values can be found simultaneously for both sensors at the same location. Where areas of sufficient probability do intersect, we evaluate precisely the probability that both tracks correspond to the same object, and select exclusive matching couples based on some global additive criterion using these probabilities. Matching tracks already reinforces the credit of the involved tracks, i.e. finding that the (supposedly) same object has been detected by two sensors makes its detection far more reliable. Fusion can actually do much more than just confirmation of tracks, as filtering dual-mode tracks exploits the observed information even further and provides a way to refine it, that is, to concentrate higher probabilities on a smaller support area. A classical illustration is the case of radar used in conjunction with a far infrared camera, which is depicted in figure 1. The case shown here should pass the previous matching test with flying colours, and now that we assume that both distributions pertain to the same scene object, we try to derive a combined distribution from both original single-sensor distributions. A rule of thumb suggests that it should roughly correspond to the intersection of the single-mode areas of sufficient probability. This is sometimes true, but the actual derivation goes through a probabilistic computational framework which requires a bit more complexity.
Figure 1: Canonical shapes of spatial distributions coming from a radar device and a FIR camera (labels: obstacle, radar field of view, FIR field of view, representation of distributions around positions of objects, dual mode distribution).
Implementation
The fusion module receives sensor tracks that can either be matched to tracks from the alternate sensor, thus forming ’couples’ referred to as dual-mode tracks, or stay unmatched as single-mode tracks (’singles’). First, the module receives new information from any sensor. Then it updates its own sensor track database to reflect the input information. Next the dual-mode (’couple’) database is updated, so that whenever any member of a dual-mode track was deleted in the previous step, the dual-mode track no longer holds, and the matching sensor track turns back to single (’widow’).
Once the new data has been taken into account (‘Update from sensors’ part), track analysis can begin. It consists of three main operations:
• Association: when the dissimilarity between two single-mode tracks from different sensors is below some threshold, they are candidates for a match.
The best candidate pairs are then matched into dual-mode tracks,
• Diverging modes: this operation detects whenever two sensor tracks belonging to a dual-mode track drift away from each other with time. In this case, they are returned to the single-mode pool,
• Consolidation: the purpose of consolidation is to improve our knowledge of dual-mode tracks using independent data sources.
The ‘output formatting’ action creates the environment description, i.e. the output of the fusion module, synthesising its internal state. In fact, the trusted tracks sent to the application layer are in general the dual-mode tracks and the singles located in exclusive sensor areas.
3. Early fusion
Another approach realised in COMPOSE favours an ‘early fusion’ paradigm. The term “early” means combining data provided by multiple and even diverse sensors at an early stage of the data processing chain. These input data can be slightly pre-processed, but are limited to untracked raw or feature-level sensor data. During the subsequent fusion algorithm, data from one sensor is assessed with regard to the relevance of its information, always in the light of data provided by the other sensors. Thereby the early fusion approach ensures consistency of models in the whole processing chain. In particular, one common environmental model is used for describing the same aspect of reality perceived from different sensors.
Early Fusion Processing
Generally speaking, an abstract inference problem is composed of three cyclic steps, namely time prediction, data association and measurement update. On top of this basic pattern we added further steps to cope with multi-sensor and multi-object demands.
Data Acquisition
As most of the sensors used work at different clock rates and time is crucial in collision mitigation, we preserve a high time resolution by semi-asynchronous data acquisition. The sensor with the fastest refresh rate is used to trigger this step.
The actual data acquisition is done by polling every sensor for new data within each cycle. In addition, a slight pre-processing of the raw data, e.g. noise reduction or edge extraction, is performed.

Time Prediction and Filtering
Starting from every object’s state (position, orientation, velocity, etc.) at the last cycle, the states have to be predicted to the current time. This prediction, as well as the respective filtering of information, is performed via a standard Extended Kalman Filter.

Predicted Measurement Generation
In the previous step an updated representation (state) with regard to the current time was generated for every object. These predicted states are the basis for the following calculation, which estimates what each sensor would perceive under the assumption that every object’s state was correctly predicted. In the following a rough overview of the general task of “predicted measurement generation” is given. The basic principle is illustrated in figure 2. So far all object models in our prototype are composed of simple polyhedrons. For all valid sensors within the current cycle the following steps are performed:
- Sensor Transformation: All known objects (with their underlying models) are mapped into the sensor coordinate system.
- Back-face Culling: Object parts which cannot be perceived by the sensor, because they are occluded by other parts of the same object, are temporarily removed.
- Clipping: All objects residing outside the view-port of the sensor are ignored.
- Occlusion Test: All objects which are “invisible” because they are occluded by other objects are discarded.
- Measurement Generation: For the remaining objects or object parts the object-specific predicted measurements are computed. This requires that all object models have their own sensor-specific representation (shape model).

Data Association
The next step within the aforementioned fusion cycle is the data association, which extracts and assigns corresponding pairs of real and predicted measurements. Because the amount of data is large compared to most track-based fusion systems, and determining the matching pairs is correspondingly complex, gating mechanisms are essential for finding data correspondences quickly.

Hypothesis Generation
The main goal of the hypothesis generation is a direct and complete detection of all so far untracked and possibly relevant objects in the sensors’ ranges. To limit the cost of computation, the hypothesis generation focuses on salient and unmatched sensor data in the detection range. Currently these unmatched salient data, at which new object assumptions are placed, are RADAR responses, LIDAR segments and special vertical edges from the far infrared imaging device. To limit the number of assumptions, a first coarse pre-classification step rejects impractical assumptions and a second aggregation step tries to combine overlapping hypotheses.
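The gating mentioned under Data Association can be illustrated with a minimal sketch. It assumes a Mahalanobis-distance gate on 2D position measurements, which is a common choice but is not stated in the text; the function names, the covariance and the threshold (roughly the 99% chi-square quantile for two degrees of freedom) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]


def mahalanobis_sq(real: Vec2, predicted: Vec2, cov: Mat2) -> float:
    """Squared Mahalanobis distance between a real and a predicted 2D measurement."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must have a positive determinant")
    dx = real[0] - predicted[0]
    dy = real[1] - predicted[1]
    # v^T S^-1 v with S^-1 = (1 / det) * [[d, -b], [-c, a]]
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def gate(reals: Sequence[Vec2], predictions: Sequence[Vec2],
         cov: Mat2, threshold: float = 9.21) -> List[Tuple[int, int]]:
    """Return all index pairs (real, predicted) that fall inside the gate."""
    return [
        (i, j)
        for i, z in enumerate(reals)
        for j, p in enumerate(predictions)
        if mahalanobis_sq(z, p, cov) <= threshold
    ]


# Hypothetical example: two real and two predicted position measurements.
identity: Mat2 = ((1.0, 0.0), (0.0, 1.0))
pairs = gate([(0.0, 0.0), (10.0, 10.0)], [(1.0, 1.0), (10.5, 9.5)], identity)
```

Only the pairs that survive the gate need to be considered by the subsequent assignment, which is what keeps the association tractable for large amounts of data.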
Thomas Tatschke is with FORWISS Passau, tatschke@forwiss.unipassau.de
Figure 2: Generation of predicted measurements
“In particular, one common environmental model is used for describing the same aspect of reality perceived from different sensors”.
www.prevent-ip.org prevent@mail.ertico.com
PReVENT Sensor Data Fusion activities
A radar driven fusion with vision for vehicle detection by Alberto Broggi and Pietro Cerri, University of Parma
This paper describes a vehicle detection system that fuses radar and vision data and is ready to be used for ACC. Radar data are used to locate areas of interest in the images. The vehicle search in these areas is mainly based on vertical symmetry. All vehicles found in the different image areas are combined and a series of filters is applied in order to delete false detections. The current algorithms analyze images on a frame-by-frame basis, without any temporal correlation. Results and problems are discussed, and directions for future enhancements are provided.
1. Introduction
Research on preventive safety functions now feeds into several driver assistance systems. The SeiSS study (Exploratory Study on the potential socio-economic impact of the introduction of Intelligent Safety Systems in Road Vehicles: SeiSS final report) estimated that Adaptive Cruise Control (ACC), which performs longitudinal control, could prevent up to 4,000 accidents in 2010 if only 3% of the vehicles were equipped. ACC, like any other system used for safety applications, needs precise vehicle localization. Relying on radar alone is problematic for measuring vehicle dimensions and lateral position. The fusion of radar and vision can provide position measurements with good longitudinal and lateral accuracy. The advantages and the problems of fusing radar and camera data for vehicle detection are well known [1]; methods differ mainly in the fusion level: low, intermediate, and high level fusion have all been shown to reach good results. Low level fusion combines several sources of raw data to produce new raw data that is expected to be more informative and synthetic than the inputs [2]. This work is developed using high level fusion and focuses on the validation of radar targets, as shown by Sole [3]. In this context, a radar target may or may not correspond to a vision target, in our case a vehicle; different vision algorithms can be used to decide this.
The search for vehicle features provides a simplified way of localizing vehicles: symmetry is a characteristic that is common to most vehicles. Some research groups have already used symmetry to localize vehicles [4], and have used a variety of methods to find symmetry in images: edges, pixel intensity, and other features. The vehicle detection algorithm used in this work is based on symmetry [5] and uses radar data in order to localize areas of interest. Data fusion operates at high level: the vision system is used to validate radar data and to increase their accuracy.
2. Fusion
The first step of the algorithm converts radar objects into the image reference system, using a perspective mapping transformation that projects the radar point onto the object base. This transformation is performed using calibration data obtained by fine measurement of the intrinsic and extrinsic camera parameters, as well as radar calibration. Since the parameters are measured only once, at system setup, and no stabilization is currently applied, errors may occur when the extrinsic parameters change (mainly vehicle pitch) due to road roughness or vehicle acceleration. Moreover, radar may intrinsically provide an incorrect lateral position: points may not be centered on the obstacle shape or may even fall outside it. In the definition of the image area used by vision to validate radar objects, wide margins are used on both its left and right sides in order to compensate for possibly inaccurate radar data (vertical and lateral offset). The area height is defined to be half of its width; the area bottom is positioned at a fixed percentage of the height below the radar points, so that the vehicle is included even in case of strong pitch variations. Only radar data that refer to points inside the image are considered; since the horizontal angular field of view of the chosen radar is approximately the same as that of the camera, almost all radar points can be remapped into the image. In order to simplify and speed up the following steps of the algorithm and to remove the details of vehicles that are too close, all the areas are resampled to a fixed size.
3. Interest area evaluation
In this project the generated interest areas are used to localize vehicles, but they could also be used to search for road features and other obstacles; the system has been tested for guard rail and pedestrian search with promising results.
3.1. Symmetry computation
Symmetry computation is the basis of the algorithm, and the most time-consuming part as well. Only binarized edges are used in order to reduce execution time. First of all the Sobel operator is used to find edge magnitude and orientation; then two images are built, one containing the almost-vertical edges and the other the almost-horizontal edges. The symmetry is computed for every column of the vertical edges image, on differently sized bounding boxes whose height matches the image height and whose width ranges from 1 to a predetermined maximum value. The computed value is saved in a 2D data structure (hereinafter referred to as an image) whose coordinates are determined as follows: the column is the same as the symmetry axis and the row depends on the considered bounding box width. This image is then used to search for interesting columns.
3.2. Interesting columns
An interesting column is defined as having a high value in the symmetry image. A column-wise histogram is then used to locate candidate columns. In correspondence to these columns the vertical edges
symmetry is checked to obtain the expected vehicle width. More specifically, if a high value of symmetry is present for smaller widths too, the algorithm has detected a small object; in this case the column is discarded.
3.3. Bounding Boxes generation
Up to now the algorithm provides information about the vehicle’s center position, but since vehicles need to be detected with high precision, a precise bounding box detection is mandatory. Each peak in the vertical edges symmetry image that survived the previous filtering is used: the width of the symmetry box is given by the distance between the peak itself and the top of the symmetry image; the box is then centered within the column. The shadow under the car is a strong invariant which is always present, even on dark days. The algorithm looks for the vehicle shadow in order to find its base. Since other shadows are present on the road as well, the algorithm looks for a high concentration of edges above the horizontal edge; if no base can be detected in correspondence to the peak, the column is discarded.
3.4. Results mixing
When all radar data have been examined, all the boxes framing the detected vehicles are resampled to their original size and combined. Using an inverse perspective mapping transformation, the real width and position of the vehicles can be computed. In the computation of these values, radar provides distance while vision provides position and width, so that the radar precision in distance measurement and the refinement ability of vision are exploited together. Unfortunately not all detected boxes are correct: some false positives caused by road signs or other objects in the scene can be present as well. A filter is used to discard some false positives: it removes boxes that are too large or too small to represent a vehicle.
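The symmetry computation of Section 3.1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the symmetry score of a box is the number of mirrored pairs of vertical-edge pixels, and it parameterizes the box by its half-width around the axis; the function name and the toy edge image are hypothetical.

```python
from typing import List


def symmetry_image(edges: List[List[int]], max_half_width: int) -> List[List[int]]:
    """Build the symmetry image of a binarized vertical-edge image.

    Entry [half - 1][axis] holds the number of mirrored edge-pixel pairs
    inside the full-height box of the given half-width centered on `axis`.
    """
    height = len(edges)
    width = len(edges[0]) if height else 0
    sym = [[0] * width for _ in range(max_half_width)]
    for axis in range(width):
        score = 0
        for half in range(1, max_half_width + 1):
            left, right = axis - half, axis + half
            if left < 0 or right >= width:
                break  # the box would leave the image
            # Widening the box adds only the pairs of the two new outer columns.
            score += sum(1 for row in edges if row[left] and row[right])
            sym[half - 1][axis] = score
    return sym


# Hypothetical 3x5 edge image with a symmetric structure around column 2.
toy_edges = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
]
toy_sym = symmetry_image(toy_edges, max_half_width=2)
best_axis = max(range(5), key=lambda c: max(row[c] for row in toy_sym))
```

Columns with a high score only for large widths are the candidates the text calls interesting; a high score already at small widths indicates a small object.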
It is also possible that a vehicle is detected in more than one search area (this happens when the radar returns multiple points for a single object), so overlapping results may be present. Only one box per vehicle is expected as a final result, so a further step is required to merge similar boxes and eliminate redundant ones.
4. Results
This vehicle detection system was tested in extra-urban and highway environments with good results, covering many possible scenarios. It is important to remember that no tracking is used at present; it will be introduced at a later stage of the project. The system has been shown to localize the closest preceding vehicle with good precision in almost all situations. Other vehicles can be detected as well. The system capability is not restricted to preceding vehicles: approaching vehicles can also be detected. System performance decreases in heavy traffic and noisy scenarios. Figure 1 shows good results obtained in different scenarios.
5. Conclusions
In this paper a method to fuse radar data and vision is described. This method reaches good results in both extra-urban and highway environments. The system can localize the closest preceding vehicle with good precision, and can localize a large part of the other vehicles as well: it is ready for ACC application and is also promising for other safety applications that need localization of all the preceding vehicles. Hardware or software image stabilization might provide a more precise perspective mapping transformation, while a tracking algorithm might help to increase the robustness of the system and the persistence of the detections.
References
[1] J. Laneurit, R. C. C. Blanc, and L. Trassoudaine, “Multisensorial data Fusion for global Vehicle and Obstacles absolute Positioning,” in Procs. IEEE Intelligent Vehicles Symposium 2003, Columbus, USA, June 2003, pp. 138–143.
[2] M. Maehlisch, R. Schweiger, W. Ritter, and K. Dietmayer, “Sensor fusion Using Spatio-Temporal Aligned Video and Lidar for Improved Vehicle Detection,” in Procs. IEEE Intelligent Vehicles Symposium 2006, Tokyo, Japan, June 2006.
[3] A. Sole, O. Mano, G. Stein, H. Kumon, Y. Tamatsu, and A. Shashua, “Solid or not solid: Vision for radar target validation,” in Procs. IEEE Intelligent Vehicles Symposium 2004, Parma, Italy, June 2004, pp. 819–824.
[4] C. Hoffman, T. Dang, and C. Stiller, “Vehicle Detection fusing 2D Visual Features,” in Procs. IEEE Intelligent Vehicles Symposium 2004, Parma, Italy, June 2004, pp. 280–285.
[5] A. Broggi, P. Cerri, and P. C. Antonello, “Multi-Resolution Vehicle Detection using Artificial Vision,” in Procs. IEEE Intelligent Vehicles Symposium 2004, Parma, Italy, June 2004, pp. 310–314.
Fig. 1. Examples of correct results: the algorithm works reliably in simple cases (a); it detects both vehicles moving away and approaching (b); it works even in hard cases, such as rain (c) and noisy scenarios (note the double radar detection) (d); it can detect multiple cars (e) and trucks (f).
Research Paper
Three-Level Early Fusion for Road User Detection
Rudi Lindl and Leonhard Walchshäusl BMW Group Research and Technology, 80992 Munich, Germany email: {rudi.lindl, leonhard.walchshaeusl}@bmw.de
Abstract
This paper presents a novel three-level sensor fusion approach for detecting and tracking cars and pedestrians. The underlying perception system is composed of a far infrared imaging device, a laser scanner and several radar sensors, all integrated into a BMW sedan. Fusion is applied at three different levels to generate a robust and accurate description of the area in front of the vehicle. Based on this environment perception a preventive safety application is outlined, which brakes autonomously in case of an inevitable accident.
1. INTRODUCTION
Statistical evidence from the European Union shows that the highest percentage of accidents resulting in fatalities or serious injuries is caused by collisions of cars with vulnerable road users. This fact points to the urgent need for active and passive automotive safety systems as a significant contribution to overall road safety. For this purpose the Preventive and Active Safety Applications project (PReVENT), a European automotive industry activity co-funded by the Sixth Framework Programme of the European Commission (EC), was established. Within the PReVENT subproject COMPOSE, one conceptual application aims at collision mitigation by means of autonomous braking in case of inevitable pedestrian accidents or rear-end collisions in urban areas. However, an erroneous application of emergency braking caused by false alarms would greatly impede road safety improvement, not least because of the major setback such an incident would represent for driver acceptance. Therefore, an active autonomous intervention in the process of driving requires an outstanding degree of perception performance, particularly with regard to availability, robustness and accuracy. Current off-the-shelf single sensor approaches can hardly fulfil these challenging demands. Accordingly, the potential of a multi-sensor system in combination with a novel three-level early fusion approach is investigated in this paper.
1.1 Related Work
Takizawa et al. [TYI04] propose a fusion method for the detection of vehicles. Lidar data and image features are combined into a fusion vector which is classified by a principal component analysis. Although detected vehicles are tracked by a Kalman filter, fusion is only utilized at a single level (within the classification process). A sensor fusion architecture based on Bayesian networks is offered by Kawasaki [KK04]. A Bayesian network describes the fusion system in a causality model, which makes the fusion algorithm easy to understand.
The proposed architecture and algorithm were tested with a perception system composed of a millimeter wave radar and a vision sensor for vehicle tracking.
1.2 Overview
This paper focuses on a novel three-level early fusion approach based on only slightly pre-processed sensor data. In chapter 2 we briefly give a taxonomy of different sensor fusion techniques. The envisaged safety application is presented in chapter 3. In chapter 4 the sensor configuration and the resulting sensor data are discussed. Chapter 5 is dedicated to the novel three-level early fusion approach. After a short motivation with respect to early fusion, section 5.1 gives an overview of the tracking cycle and explains the three levels of fusion. Finally, the last section gives an overview of the system architecture and implementation details of the fusion system.
2 SENSOR FUSION TAXONOMY
Sensor fusion comprises a very wide domain with many varieties. Elmenreich [Elm02] proposes a universal definition: “Sensor Fusion is the combining of sensory data or data derived from sensory data such that the resulting information is in some sense better than would be possible when these sources were used individually.” There are several ways to categorize sensor fusion approaches, for example by the point in time when the fusion is performed (see figure 1) or by the interaction role between two sensors (see Brooks [BI97]).
2.1 Time based taxonomy
Figure 1: Time based fusion taxonomy.
Raw data fusion: In early or raw data fusion systems, data provided by multiple and even diverse sensors is combined at an early stage of the data processing chain. In addition, a
joint data interpretation with respect to a common model basis is performed. Data from one sensor is assessed with regard to the relevance of its information, always in the context of data provided by other sensors.
Feature fusion: Feature fusion combines various features such as edges, corners, segments or positions. These features are generated by a pre-processing step which acts independently of the other sensors.
Track fusion: In track-based or decision fusion approaches several sensor data streams are processed independently of each other until the level of object data is reached. Based on these independent results a common decision has to be made. At this point object data rather than sensor data is combined.
2.2 Interaction based taxonomy
Complementary fusion: If two sensors work independently of each other the fusion is called complementary. For example, two sensors surveying the environment in two non-overlapping areas work in a complementary fashion.
Cooperative fusion: In cooperative fusion systems multiple sensors work together in a tightly coupled manner. Two cameras used for 3D reconstruction based on stereo computer vision algorithms are an example.
Competitive fusion: Competition is introduced in a fusion system if sensors operate redundantly, that is to say two or more sensors estimate the same object property. Strategies have to be introduced in order to resolve the conflicts which arise if sensors disagree about object properties.
3 COLLISION MITIGATION APPLICATION
The target application of the demonstration system is collision mitigation by means of autonomous braking. According to a German accident analysis [Bun01], most of the accidents with vulnerable road users happen in urban areas on straight, unprotected roads. Therefore, the collision mitigation application will mainly focus on these scenarios. The basis for the envisaged autonomous braking is a probabilistic situation assessment.
The system engages the brakes autonomously only if an inevitable collision is detected. Provided that a perfect environment perception were achievable, this would have a high potential to attenuate or even prevent accidents, since machines are capable of reacting much faster and more efficiently than human drivers.
4 PERCEPTION SYSTEM
The central challenge for many advanced driver assistance systems is an adequate perception of the vehicle’s environment with a high degree of reliability and, last but not least, a high degree of measurement precision. One of the key factors in meeting these requirements is a multi-sensor perception system, which is explained in the following section.
4.1 Sensor Configuration
BMW has set up an experimental car with the following sensor configuration (see figure 2) to investigate the potential of multi-sensor perception.
Figure 2: BMW experimental car equipped with the following sensor configuration: (a) laser scanner, (b) long range radars, (c) grey-scale camera, (d) far infrared camera, (e) short range radars.
Concentrating on the surveillance of the area in front of the vehicle, these cooperative sensors, which operate on the basis of distinct physical principles, complement each other both in effective range and in spatial accuracy. The use of a far infrared (FIR) sensor allows both perception in bad lighting conditions and straightforward vehicle and pedestrian detection, since both have a characteristic temperature signature (the exhaust system for vehicles, uninsulated body parts such as head and limbs for pedestrians). As most pedestrian scenarios covered by the experimental vehicle are situated in the area to the right side of the road, this sensor is mounted at the right of the front bumper. Long and short range radar sensors survey the environment ahead, providing a seamless transition in distance and field of view resolution. Moreover, a laser scanning (lidar) device is mounted beneath the number plate to enhance the detection and tracking quality for both pedestrians and vehicles. The visual grey-scale cameras are currently used for supervising and controlling purposes only.
4.2 Sensor Data
In the following a short survey of the different types of sensor data and their pre-processing is given.
4.2.1 Radar
The radar sensors provide information about the relative position p_r and relative velocity u_r of an object (see figure 3a and 3b). Accordingly, a radar measurement is defined as follows: (1)
Figure 3: Radar reflection of a vehicle. Green and blue boxes represent short range radar reflections. Red and magenta triangles are long range radar reflections. (a) Radar reflections projected into grey-scale image. (b) Radar reflections within the virtual 3D environment.
4.2.2 Lidar
The lidar sensor is capable of providing up to 1400 reflection points of the scanned environment. In order to reduce the cost of computation in the subsequent tracking process, correlated raw measurements are aggregated to single lines. Several connected lines l_1, ..., l_n are combined to segments (see figure 4b). In the nomenclature of graph theory such a segment represents a simple path. A lidar segment measurement is defined as follows: (2)
Figure 4: Reflections and segment data of a vehicle generated by a four-layer lidar sensor. Green, red, yellow and blue boxes represent the lidar echo at different layers. The red line illustrates the result of the pre-processing (segment data). (a) Lidar responses projected into grey-scale image. (b) Lidar reflections and segments within the virtual 3D environment.
4.2.3 Far infrared
Vertical edges (Ê+ and Ê-) with positive and negative gradient, respectively, are extracted from the far infrared image by a Sobel operator (see figure 5). A subsequent coarse pre-classification step rejects irrelevant edges. Within the early fusion processing (see section 5) a common three-dimensional sensor data description is necessary. Therefore, a projection converts image plane edges Ê into their corresponding three-dimensional representation E. Accordingly, the height H of an image edge Ê is calculated. A far infrared measurement M_f+, respectively M_f-, is defined as follows: (3) (4)
Figure 5: Linearised edges with positive (red lines) and negative (blue lines) gradient extracted from a far infrared image. (a) Edges of a pedestrian. (b) Edges of a vehicle.
5 THREE LEVEL EARLY FUSION SYSTEM
In contrast to track-based fusion (see section 2), “early fusion” combines data provided by multiple and even diverse sensors at an early stage of the data processing chain and performs a joint data interpretation with respect to a common model basis (cf. [WLVT06]). In doing so, signatures of various sub-threshold findings in the data processing chain may interfere constructively and thereby contribute to an above-threshold result that forms a distinctive, well-recognized object instantiation. Thus, an increase in robustness, reliability and consistency of the environment perception is expected, as the input from an individual sensor can be processed in view of and with the help of the other sensors. To meet this early fusion demand we enhanced several of the basic tracking steps described in the following.
5.1 Levels of Fusion
Generally speaking, tracking can be performed by three circular steps, namely time prediction, data association (data matching) and measurement update (correction) [FP02]. On top of this basic pattern we added further steps to cope with early fusion and multi-object demands (see figure 6). Fusion is utilized at three different levels of the tracking system: hypotheses generation, classification and measurement update (see highlighted steps of figure 6). Firstly, fusion during the hypotheses generation improves the system response time as the initial guess can be estimated more precisely. Secondly, the classification of objects is more robust if features of all available sensors are taken into account. Finally, fusion at the filtering level provides more precise and highly available tracking results as redundant and complementary sensor data is combined. These extensions as well as the fundamental structure of the system are discussed in the subsequent sections.
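The three circular tracking steps named in section 5.1 (time prediction, data association, measurement update) can be sketched for a single object as follows. This is a deliberately simplified illustration: it swaps the Extended Kalman Filter of the paper for a linear Kalman filter with a one-dimensional constant-velocity model and direct position measurements, and all names, noise values and the gate width are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Track:
    pos: float
    vel: float
    # 2x2 estimation error covariance, row-major
    cov: List[List[float]] = field(
        default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])


def predict(track: Track, dt: float, q: float = 0.0) -> None:
    """Time prediction with a constant-velocity model: P <- F P F^T + Q."""
    (p00, p01), (p10, p11) = track.cov
    track.pos += track.vel * dt
    track.cov = [
        [p00 + dt * (p10 + p01) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]


def associate(track: Track, measurements: List[float],
              gate: float) -> Optional[float]:
    """Data association: closest position measurement inside the gate."""
    inside = [z for z in measurements if abs(z - track.pos) <= gate]
    return min(inside, key=lambda z: abs(z - track.pos)) if inside else None


def update(track: Track, z: float, r: float) -> None:
    """Measurement update for a direct position measurement (H = [1, 0])."""
    (p00, p01), (p10, p11) = track.cov
    s = p00 + r                 # innovation covariance
    k0, k1 = p00 / s, p10 / s   # Kalman gain
    residual = z - track.pos
    track.pos += k0 * residual
    track.vel += k1 * residual
    track.cov = [
        [(1 - k0) * p00, (1 - k0) * p01],
        [p10 - k1 * p00, p11 - k1 * p01],
    ]


def cycle(track: Track, measurements: List[float], dt: float,
          r: float = 1.0, gate: float = 3.0) -> None:
    """One pass through the three circular steps."""
    predict(track, dt)
    z = associate(track, measurements, gate)
    if z is not None:
        update(track, z, r)


# Hypothetical example: one plausible measurement and one far outlier.
demo = Track(pos=0.0, vel=1.0)
cycle(demo, [2.0, 50.0], dt=1.0)
```

The additional steps of the paper (predicted measurement generation, hypotheses generation, classification) would slot in between these three calls.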
Figure 7: Example of hypotheses generation and aggregation. The cyan boxes represent newly instantiated hypotheses. An aggregation has occurred with the car hypothesis ahead.
Classification: Classification occurs at three different phases in the data processing pipeline (see figure 8). Depending on the particular demands of the recognition task, suitably adapted classifiers are used.
Figure 6: Overview of the tracking cycle. The cycle starts at the blue circle with the data acquisition. The emphasized components (hypotheses generation, classification, and measurement update) mark the three levels of fusion.
Data acquisition: As most of the sensors used run at different clock rates and time is crucial in collision mitigation applications, we preserve a high time resolution by a semi-asynchronous data acquisition. The actual data acquisition is done by polling every sensor for new data.
Time update: Starting from every object’s state (position, orientation, velocity, etc.) at the previous cycle, the states have to be estimated for the current time. This can be performed, for example, on the basis of the objects’ underlying dynamic models.
Predicted measurement generation: In the previous step an updated representation (state) with regard to the current time is generated for every object. These estimated states are the basis for the following step, which predicts what each sensor would measure under the assumption that every object’s state was correctly estimated.
Data association: The next step within the tracking cycle is the data association, which extracts and assigns corresponding pairs of real and predicted measurements. An algorithm by Hopcroft and Karp [HK73] for maximum matching in bipartite graphs is used for this purpose.
Hypotheses generation: A priority goal of the hypotheses generation is a direct and complete detection of all so far untracked and possibly relevant objects in the sensors’ ranges. To this end, a high rate of errors of the second kind is consciously accepted. Usually, a subsequent classification procedure as well as an observation of the objects over time can select and eliminate irrelevant assumptions. To limit the cost of computation, the hypotheses generation focuses on salient and unmatched sensor data.
Currently, the unmatched salient measurements at which new assumptions are placed are radar responses, lidar segments within certain dimensions and pairs of vertical edges from the far infrared imaging device. An aggregation step tries to combine overlapping hypotheses into one hypothesis. The initial state of this new hypothesis is composed of all measurements from all involved sensors (see figure 7). As a result, the oscillating phase of the Extended Kalman Filter may be shortened.
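The assignment of real to predicted measurements in the data association step is a maximum matching problem in a bipartite graph. The sketch below is not the Hopcroft-Karp algorithm cited in the text; it uses the simpler augmenting-path method, which finds a matching of the same maximum size but with a worse worst-case running time. The input format (a list of compatible predicted measurements per real measurement, e.g. after gating) and all names are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set


def maximum_matching(candidates: Dict[int, List[int]],
                     n_predicted: int) -> Dict[int, int]:
    """Maximum-cardinality matching of real to predicted measurements.

    candidates[i] lists the indices of the predicted measurements that are
    compatible with real measurement i. Returns a map real -> predicted.
    """
    owner: List[Optional[int]] = [None] * n_predicted

    def try_assign(real: int, visited: Set[int]) -> bool:
        for pred in candidates.get(real, []):
            if pred in visited:
                continue
            visited.add(pred)
            # Take a free predicted measurement, or re-route its current owner.
            current = owner[pred]
            if current is None or try_assign(current, visited):
                owner[pred] = real
                return True
        return False

    for real in candidates:
        try_assign(real, set())
    return {real: pred for pred, real in enumerate(owner) if real is not None}


# Hypothetical example: three real and three predicted measurements.
compatible = {0: [0, 1], 1: [0], 2: [1, 2]}
assignment = maximum_matching(compatible, n_predicted=3)
```

In the example, real measurement 0 first takes predicted measurement 0 and is then re-routed to 1 so that real measurement 1, which has no alternative, can be matched as well.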
Figure 8: Classification phases of an object life cycle. (1) coarse pre-classification, (2) hypotheses classification, (3) object revalidation.
1. Hypotheses are initialized on salient and unmatched sensor data. A first coarse pre-classification step rejects impractical assumptions by checking whether the width or height of a hypothesis lies below a threshold. This simple criterion ensures very efficient processing and thus a high throughput. Furthermore, a high rate of errors of the second kind is consciously accepted, since a direct and complete detection of relevant objects is mandatory.
2. Relevant objects (pedestrians and motorcars) are determined by a recognition process on active hypotheses. For this purpose certain state components of a hypothesis are taken as feature input for a decision tree. The feature vector is composed of the hypothesis age, velocity, dimension, variance and the existence of adjacent far infrared image edges. These features are derived from different sensor measurements.
3. Classified objects and hypotheses are checked for their confidence values at regular time intervals. This step is performed by an object revalidation process. Hypotheses and classified objects are removed from the virtual environment if they are no
longer supported by sensor data over a certain period of time.
Measurement update: A conventional Extended Kalman Filter (EKF) (see [WB95] for instance) has been chosen since it handles the nonlinearities of this application quite well. For every assigned pair of real and predicted measurement calculated before, a measurement update on the underlying object is performed. In doing so, the information of several measurements enhances the states by updating the objects’ state values and, furthermore, lowering the estimation error covariance. Thus, a measurement update step is conducted for each assigned sensor datum before the next cycle starts with the object’s state prediction in time. With the notation of [BW95] the equations at time step k of the EKF’s measurement update can be written as
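For reference, the standard EKF measurement update in the notation of Welch and Bishop is sketched below; treating it as the form of equations (5) to (7) is an assumption. Here \hat{x}_k^- and P_k^- are the predicted (a priori) state and error covariance, h is the measurement function, H_k and V_k are its Jacobians with respect to the state and the measurement noise, R_k is the measurement noise covariance, and z_k is the real measurement.

```latex
\begin{align}
K_k &= P_k^- H_k^T \left( H_k P_k^- H_k^T + V_k R_k V_k^T \right)^{-1} \\
\hat{x}_k &= \hat{x}_k^- + K_k \left( z_k - h(\hat{x}_k^-, 0) \right) \\
P_k &= \left( I - K_k H_k \right) P_k^-
\end{align}
```

Under this reading, the term h(\hat{x}_k^-, 0) is the predicted measurement, so the state update only needs the difference between each matched real and predicted measurement.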
(5)
(6)
(7)
The following specific term:
has already been evaluated during the calculation of the predicted measurements, and thus equation (6) can be written as
(8)
for every pair (y_k, ŷ_k) of measurement and predicted measurement matched by the data association process. As all sensor data is projected into the 3D global world coordinate system, the entries of the Jacobian can be easily deduced from the underlying object model without any further complex and time-consuming calculations.
5.2 Implementation Details
A cyclic top-down architecture has been implemented to facilitate the detection, classification and tracking of relevant road users over time. The real-world vehicle surroundings and the sensor configuration are reflected by a virtual environment, which is modelled as a hierarchical scene-graph structure [BW95], ensuring centralized data access and efficient spatial dependency processing. Furthermore, a vector-quaternion-scalar (VQS) [RH94] representation has been chosen in order to achieve coordinate system transformations between the entities of the scene-graph. The topological object modelling is based on a winged-edge representation [BGZ02]. Essential tasks like sensor coordinate transformation, clipping or occlusion testing can be easily performed and adapted for a specific sensor, since both the topological and the spatial modelling are widely used in computer graphics. To allow an efficient graph traversal as well as a decoupling of algorithm and data portions, the so-called Visitor Design Pattern [BMRS96] has been used extensively. The visualization component (see figure 9) was implemented in OpenGL. This allows for acceleration by a 3D graphics adapter and consequently relieves the central processing unit.
Figure 9: Screenshots of our demonstrator iFuse visualizing the virtual automotive environment. (a) Bird's eye view with own car (blue), sensor data and detected objects. (b) Same scene as (a) seen from a perspective projection with far infrared visualisation and grey-scale camera image.
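The decoupling of algorithms from the scene-graph data by means of the Visitor Design Pattern, as described in section 5.2, can be sketched as follows. The node types, method names and the example graph are hypothetical and only illustrate the pattern; they are not taken from the iFuse demonstrator.

```python
from typing import List, Optional


class Visitor:
    """Base class: one method per node type, all doing nothing by default."""

    def visit_group(self, node: "Node") -> None:
        pass

    def visit_sensor(self, node: "Node") -> None:
        pass

    def visit_object(self, node: "Node") -> None:
        pass


class Node:
    """Generic scene-graph node; subclasses choose the visitor method."""

    def __init__(self, name: str,
                 children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.children = children or []

    def dispatch(self, visitor: Visitor) -> None:
        visitor.visit_group(self)

    def accept(self, visitor: Visitor) -> None:
        # Depth-first traversal: handle this node, then its children.
        self.dispatch(visitor)
        for child in self.children:
            child.accept(visitor)


class SensorNode(Node):
    def dispatch(self, visitor: Visitor) -> None:
        visitor.visit_sensor(self)


class ObjectNode(Node):
    def dispatch(self, visitor: Visitor) -> None:
        visitor.visit_object(self)


class NameCollector(Visitor):
    """Example algorithm kept outside the node classes."""

    def __init__(self) -> None:
        self.sensors: List[str] = []
        self.objects: List[str] = []

    def visit_sensor(self, node: Node) -> None:
        self.sensors.append(node.name)

    def visit_object(self, node: Node) -> None:
        self.objects.append(node.name)


# Hypothetical scene: the ego vehicle carries a sensor, the world two objects.
scene = Node("world", [
    Node("ego", [SensorNode("lidar")]),
    ObjectNode("car"),
    ObjectNode("pedestrian"),
])
collector = NameCollector()
scene.accept(collector)
```

A new operation, such as clipping against a sensor view-port, would be added as another Visitor subclass without touching the node classes.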
6 CONCLUSION
This paper proposed a novel three-level early fusion approach to detect and track cars and pedestrians in real time. Early fusion is applied at three different levels of a common tracking approach. Firstly, fusion during the hypotheses generation has been shown to improve the system response time as the initial guess can be estimated more precisely. Secondly, the classification of objects is more robust since features of all available sensors are taken into account. Finally, fusion at the filtering level provides more precise and highly available tracking results as redundant and complementary sensor data is combined. However, it has to be considered that, due to the large amount of raw sensor data, real-time demands are difficult to meet.
7 FURTHER WORK
The classification quality and the system response time can be further improved by utilizing more complex classification algorithms like neural networks or support vector machines. Further research is needed in order to evaluate the suitability of these algorithms with respect to multi-sensor demands. In addition, an increased set of object types like trucks, cyclists or motorcyclists will improve the granularity of the perception system and could allow for the conceptual implementation of other applications. Within the hypothesis generation step, the potential of a Kalman filtered measurement aggregation has to be evaluated. Object-dependent filtering techniques like a particle filter will be applied in order to achieve a more robust and granular tracking. In addition to these improvements, an extensive evaluation of the perception system is planned.
8 ACKNOWLEDGEMENT
The three-level fusion approach presented in this paper is part of the main results achieved in the COMPOSE project, which is an application-driven subproject of the PReVENT Integrated Project, an automotive initiative co-funded by the European Commission’s Sixth Framework Programme for active road safety. COMPOSE aims at collision mitigation and protection of vulnerable road users by (semi-)automated braking and to this end develops robust and reliable environment perception systems. The system based on a novel three-level early fusion approach is the one presented in this paper.
REFERENCES
[BI97] Richard R. Brooks and S. Sitharamar Iyengar. Real-time distributed sensor fusion for time-critical sensor readings. In Optical Engineering, volume 36, pages 767–779, March 1997.
[BGZ02] H.J. Bungartz, M. Griebel, and C. Zenger. Einführung in die Computergraphik. Wiesbaden: Vieweg, 2002.
[BMRS96] Frank Buschmann, Regine Meunier, Hans Rohnert, and Peter Sommerlad. Pattern-Oriented Software Architecture: A System of Patterns, volume 1. John Wiley and Sons Ltd, 1996.
[Bun01] Statistisches Bundesamt. Verkehrsunfälle, Dezember und Jahr 2001. Technical Report 8, Statistisches Bundesamt, 2001. Reihe 7.
[BW95] B.D. Allen, Gary Bishop, and Greg Welch. Tracking: Beyond 15 minutes of thought: Siggraph 2001 course 11. Technical report, University of North Carolina at Chapel Hill, 1995.
[Elm02] Wilfried Elmenreich. Sensor Fusion in Time-Triggered Systems. PhD thesis, Technische Universität Wien, 2002.
[FP02] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice Hall, 2002.
[HK73] John E. Hopcroft and Richard M. Karp. An n^{5/2} algorithm for maximum matchings in bipartite graphs. SIAM Journal on Computing, 2(4):225–231, 1973.
[KK04] Naoki Kawasaki and Uwe Kiencke. Standard platform for sensor fusion on advanced driver assistance system using bayesian network. In 2004 IEEE Intelligent Vehicles Symposium Proceedings, pages 250–255, University of Parma, Italy, June 2004. IEEE.
[RH94] Warren Robinett and Richard Holloway. The visual display transformation for virtual reality. Technical Report TR94-031, University of North Carolina at Chapel Hill, Department of Computer Science, Chapel Hill, NC, USA, October 1994.
[TYI04] Hiroomi Takizawa, Kenichi Yamada, and Toshio Ito. Vehicles detection using sensor fusion. In Proceeding of IEEE Intelligent Vehicles Symposium 2004, pages 238–243, Parma, Italy, June 14-17 2004. IEEE.
[WB95] Greg Welch and Gary Bishop. An introduction to the Kalman Filter. Technical Report TR95-041, University of North Carolina at Chapel Hill, Department of Computer Science, Chapel Hill, NC 27599-3175, 1995.
[WLVT06] Leonhard Walchshäusl, Rudi Lindl, Katrin Vogel, and Thomas Tatschke. Detection of road users in fused sensor data streams for collision mitigation. In Jürgen Valldorf and Wolfgang Gessner, editors, Proceedings of the 10th International Forum on Advanced Microsystems for Automotive Applications (AMAA'06), pages 53–65, Berlin, April 2006. VDI/VDE/IT.
Precise Host Localization in Urban Areas
Thorsten Weiss+, Stefan Wender+, Kay C. Fuerstenberg*, Klaus C. J. Dietmayer+
+ University of Ulm, Albert-Einstein-Allee 41, 89081 Ulm, Germany {thorsten.weiss, stefan.wender, klaus.dietmayer}@uni-ulm.de
* IBEO Automobile Sensor GmbH, Fahrenkroen 125, 22179 Hamburg, Germany kf@ibeo-as.de
Abstract - Robust ego-localization is an essential technology for future intelligent vehicles and cooperative systems. In this paper a multi-sensor fusion framework for the precise determination of the position and the orientation of a host vehicle is proposed. The data of different sensor systems is combined, such as laser scanners, high-accuracy digital maps, GPS, and the vehicle's on-board yaw rate sensor and wheel speed encoders. In order to increase the accuracy of the host position and orientation estimation, stationary objects, such as posts of traffic lights and traffic signs, are used as landmarks, which are detected by an automotive laser scanner. With known position of the landmarks, the position and orientation of the vehicle [...]
1. INTRODUCTION
Cooperative systems using vehicle-to-vehicle or infrastructure-to-vehicle communication are a main topic in today's research on intelligent vehicles [1]. In order to use these systems efficiently, precise information about the position and orientation of the communicating vehicles is indispensable for relating the communication partners to each other. However, the requirements concerning the position accuracy of the communication partners differ between scenarios. Highway applications, such as communication-based ACC (Adaptive Cruise Control) or highway merging assistants, depend on a correct association of the vehicles to the lanes and a longitudinal position accuracy in the range of several meters, for instance [1]. Commercially available SBAS DGPS receivers, which will be integrated in modern cars, provide position measurements with an accuracy in the range of several meters [2]. Therefore, associating a vehicle with the correct lane using a GPS-only system is not practicable. By adding environmental sensors, such as video systems or laser scanners, to the GPS system, the overall positioning accuracy can be improved significantly [3, 4].
However, in urban areas the requirements for positioning systems are more demanding. GPS outages and less accurate or invalid position measurements in street canyons and tunnels impede a precise determination of the vehicle's position and orientation [2]. For this reason, a definite association of the vehicle to a lane or the precise positioning of a vehicle on an intersection is a hard task in urban areas, even with the help of common navigation maps and map matching approaches. Observing lane markings with video sensors to determine the correct position of a vehicle in urban areas is difficult, because lane markings are often occluded by preceding cars and the robust detection of the various lane marking types is difficult.
Therefore, the challenge is to develop a precise positioning system that works robustly even in highly dynamic urban environments in order to provide cooperative systems with the necessary information. Furthermore, a detailed environmental description of the stationary traffic infrastructure in the form of digital maps will be helpful. In this work a localization framework is proposed which fulfils these requirements and allows for the determination of the vehicle's position at the 10 cm level. The basic idea of this approach is to use stationary traffic infrastructure objects as landmarks (features), such as posts of traffic lights and signs as well as house walls. These features are registered offline in a detailed digital map. The laser scanner is able to detect the objects online. With known positions of these objects in the laser scan, the position of the vehicle is determined accurately at the 10 cm level relative to the digital map. After the determination of the position and orientation of the vehicle relative to the map, the positions of traffic infrastructure elements, such as the accurate position of lanes, sidewalks or house walls, which were also registered in the map, can be related to the vehicle as shown in Figure 1. Furthermore, the laser scanner provides the positions and the dynamic states of moving objects, such as cars, trucks, and pedestrians [5]. This allows for distinguishing a pedestrian walking on the sidewalk from another one walking in the street, for instance. The information provided by the laser scanner, the detailed digital map and communicating road users gives an accurate representation of the environment of the host vehicle.
Figure 1: Intersection scenario. The grey objects are detected by an environmental sensor. The green car transmits its position via vehicle-to-vehicle communication. With the help of a detailed digital map in conjunction with the precisely known position and orientation of the host vehicle, the objects are related to the environmental description.
Beyond cooperative systems, this approach gives important information to future driver assistance and safety applications such as urban ACC, PreCrash, turning and intersection crossing assistants as well as intelligent situation assessment algorithms, especially in urban areas. Furthermore, classification algorithms can be improved by using a detailed digital map containing the precise positions of lanes in a street section [6].
2. LASER SCANNER
The multi-layer laser scanner ALASCA XT (Automotive LAser SCAnner) by IBEO Automobile Sensor GmbH acquires distance profiles of the vehicle's environment with a horizontal field of view of up to 240° at a variable scan frequency from 10 Hz to 25 Hz. The range resolution is 4 cm and the angular resolution is 0.1° to 1°. The laser scanner has four individual scan planes with a vertical opening angle of 3.2°, which allows for compensating pitch automatically. The ALASCA XT works in bad weather conditions like fog and rain using four-target and four-echo technology [7]. The ALASCA XT is shown in Figure 2. The laser scanner is integrated in a rugged enclosure in the front bumper of our testing vehicle. Thus, the field of view of the laser scanner is reduced to about 160° horizontally.
Figure 2: Left: The ALASCA XT laser scanner. Right: Testing vehicle with the integrated laser scanner.
3. MAP BUILDING
Today's commercial digital maps fulfil the requirements of navigation applications and fleet management systems. These maps contain information about the traffic infrastructure, such as the number of lanes, the positions of intersections, points of interest, and attributes such as the permitted driving direction. The maps are generated from different data sources. Surveying departments in most countries generate accurate land register maps, in which streets, forests, and buildings are registered accurately. Work is also being done on extracting map features from aerial photographs taken from satellites or airplanes. The most cost-intensive and time-consuming method of obtaining important information for digital maps is field exploration, which is indispensable for providing correct navigation information [8]. However, the position accuracy of these methods is at the meter level. In order to obtain more accurate and more detailed maps, additional data is required. Mapping approaches using laser scanners or video cameras are in the focus of mobile robotics research [9, 10]. Many approaches consider the problem of generating consistent grid maps.
Within the approach proposed in this work, stationary objects are used to determine the precise position and orientation of a vehicle in a street scenario. These stationary objects must be detectable by a laser scanner. The laser scanner acquires horizontal distance profiles of the vehicle's environment. Hence, vertically elongated objects are suitable landmarks. Posts of traffic lights or signs, as well as walls or crash barriers, are stationary objects which can be used as landmarks. These objects remain at the same position for a long period of time and they keep their shape, in contrast to hedges, trees or other non-stationary objects. For this reason, the positions and shapes of these objects are determined and stored in a digital map. The extracted stationary objects which are suitable for the positioning approach are called map features.
In order to achieve high position accuracy on the one hand and a rapid mapping process on the other hand, a semi-automatic mapping approach using laser scanners was developed and implemented. The basic idea of the mapping approach is to generate accurate grid maps of the region to be mapped. For this purpose, the region is subdivided into small boxes. The distance profiles of the laser scanner are accumulated and the distance measurement points are registered to the corresponding boxes. The position and the orientation of the mapping vehicle must be known precisely in order to generate an accurate and consistent grid map.
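The accumulation step described above can be illustrated with a minimal sketch. It assumes a sparse grid stored as a dictionary, a known sensor pose (x, y, yaw) in a local metric map frame, and scan points given as (range, bearing) pairs; the function name `accumulate_scan` and the cell size of 0.1 m are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def accumulate_scan(grid, pose, scan, cell_size=0.1):
    """Register the points of one laser scan into a sparse grid map.

    grid      -- dict mapping a cell index (ix, iy) to a hit counter
    pose      -- (x, y, yaw) of the sensor in the map frame [m, m, rad]
    scan      -- iterable of (range_m, bearing_rad) in the sensor frame
    cell_size -- edge length of one grid box in metres (assumed value)
    """
    x0, y0, yaw = pose
    for rng, bearing in scan:
        # Transform the measurement from the sensor frame into the map frame.
        xm = x0 + rng * math.cos(yaw + bearing)
        ym = y0 + rng * math.sin(yaw + bearing)
        # Determine the box the point falls into and count the hit.
        cell = (math.floor(xm / cell_size), math.floor(ym / cell_size))
        grid[cell] += 1
    return grid


grid = defaultdict(int)
accumulate_scan(grid, (2.1, 3.1, math.pi / 2), [(1.25, 0.0), (1.3, 0.0)], cell_size=0.5)
```

Cells that collect hits over many scans correspond to stationary structures, which is why an error in the pose directly smears the map; this is the reason the text requires a precisely known vehicle pose.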
For this reason the mapping vehicle in this prototype implementation is equipped with a Novatel RTK-GPS (Real Time Kinematic GPS) receiver providing position measurements at the centimetre level. A stand-alone RTK-GPS receiver does not provide accurate position measurements in all urban scenarios, because at times too few satellites are in the field of view of the receiver or the signals from the satellites are temporarily interrupted. Especially in tunnels or street canyons, RTK-GPS must be supported by an Inertial Measurement Unit (IMU) to assure accurate position measurements during GPS outages. Another possibility for the mapping of limited regions is to use the integrated yaw rate sensor and wheel speed encoders with a dead reckoning approach, or other precise localization techniques.
In order to relate the grid map to a world coordinate system, a reference coordinate point and the orientation of the grid map relative to a reference coordinate system must be defined. In this approach the geographic WGS-84 coordinate system is used as the reference coordinate system, as most GPS receivers provide their position measurements in this datum. The positions of features in common digital navigation maps are also often stored in WGS-84 coordinates. For this reason the WGS-84 coordinates of the map features are determined, which allows for adding the map features to common navigation map databases [11]. Figure 3 shows an exemplary grid map of an intersection scene with some extracted map features. With the help of this system the positions of map features on an intersection can be determined within a few minutes with very high accuracy. Reference measurements using RTK-GPS in static mode have shown that the position accuracy of the features is at the 10 cm level.
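How a feature position in the local grid could be related to WGS-84 can be sketched as follows. This is only an illustration under stated assumptions: a small-area approximation around the reference point, a grid orientation given as the angle of the grid x-axis measured counter-clockwise from east, and a hypothetical function name `grid_to_wgs84`. The paper does not specify its conversion in this form.

```python
import math

A = 6378137.0            # WGS-84 semi-major axis [m]
F = 1.0 / 298.257223563  # WGS-84 flattening
E2 = F * (2.0 - F)       # first eccentricity squared


def grid_to_wgs84(x, y, ref_lat_deg, ref_lon_deg, grid_yaw_deg=0.0):
    """Convert a local grid offset (x, y) in metres to WGS-84 degrees.

    Small-area approximation, valid for offsets of a few hundred metres.
    grid_yaw_deg is the (assumed) angle of the grid x-axis against east.
    """
    yaw = math.radians(grid_yaw_deg)
    # Rotate the grid offset into east/north components.
    east = x * math.cos(yaw) - y * math.sin(yaw)
    north = x * math.sin(yaw) + y * math.cos(yaw)
    lat0 = math.radians(ref_lat_deg)
    s2 = math.sin(lat0) ** 2
    m = A * (1.0 - E2) / (1.0 - E2 * s2) ** 1.5   # meridian radius
    n = A / math.sqrt(1.0 - E2 * s2)              # prime vertical radius
    lat = ref_lat_deg + math.degrees(north / m)
    lon = ref_lon_deg + math.degrees(east / (n * math.cos(lat0)))
    return lat, lon
```

With the reference point and grid orientation fixed once per map, every extracted feature can be converted in this way before it is written to a navigation map database.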
Figure 3: Grid map of an intersection. Three exemplary features were extracted and their WGS-84 coordinates were determined with respect to the precisely known origin (green circle). The video refer[...]
4. LOCALIZATION
4.1 System Architecture
The localization approach uses several sensor systems to determine the position and orientation of the host vehicle. Standard DGPS receivers will be integrated in vehicles equipped with a navigation system and they provide position measurements in the range of 5-10 m in urban areas [2]. The vehicle's onboard sensors, such as yaw rate and cross acceleration sensors as well as wheel speed encoders, are used to determine the translation and rotation of the vehicle between two laser scans using a dead reckoning algorithm. The laser scanner in conjunction with detailed digital maps containing the positions of map features allows for determining the position of the vehicle precisely. Figure 4 shows the architecture of the localization algorithm. In the following sections the algorithms of the modules are introduced.
Figure 4: System Architecture. Several sensor systems are combined in order to determine the position and orientation of the vehicle relative to a digital map.
4.2 GPS Data Processing
The Temporal Prediction module compensates for the time delay between the point of time a GPS position measurement was performed and the point of time the data is made available to the application. The reasons for this delay are the computation time of the receiver for the calculation of its position and a transmission delay. The time delay is up to several hundred milliseconds, which leads to position errors when used in moving vehicles. For a vehicle travelling at 100 km/h and a GPS receiver with a time delay of 500 ms, the position error due to the latency is 13.9 m. The point of time the measurement is performed is determined using the PPS (pulse per second) signal of the GPS receiver. With the help of this signal the time delay ΔT_GPS is determined after the position data is made available to the application. The Temporal Prediction module performs a position prediction from the point of time the measurement was performed to the subsequent laser scan after receiving the position message of the GPS receiver. For this purpose, a bicycle model defined in a geographic coordinate system is applied. With respect to the length of the semi-major axis a = 6378137.0 m, the length of the semi-minor axis b = 6356752.3142 m and the eccentricity e, with e^2 = (a^2 - b^2)/a^2, of the WGS-84 ellipsoid, its meridian radius of curvature M_κ(λ_m) and its cross radius of curvature N_κ(λ_m), as well as the position measurement φ_m, λ_m, h_m, the actual course angle α_m of the vehicle with respect to geographic north and its velocity v_m, the predicted longitude φ_pr, latitude λ_pr and altitude h_pr are determined:
$$\lambda_{pr} = \lambda_m + \frac{v_m \cdot \Delta T_{GPS} \cdot \cos\alpha_m}{M_\kappa(\lambda_m) + h_m} \tag{1}$$
$$\varphi_{pr} = \varphi_m + \frac{v_m \cdot \Delta T_{GPS} \cdot \sin\alpha_m}{\left(N_\kappa(\lambda_m) + h_m\right)\cos\lambda_m} \tag{2}$$
$$h_{pr} = h_m \tag{3}$$
$$M_\kappa(\lambda_m) = \frac{a\,(1 - e^2)}{\sqrt{\left(1 - e^2 \sin^2\lambda_m\right)^3}} \tag{4}$$
$$N_\kappa(\lambda_m) = \frac{a}{\sqrt{1 - e^2 \sin^2\lambda_m}} \tag{5}$$
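The latency figure and the temporal prediction can be checked with a short sketch. It assumes a straight-line prediction over the delay with constant speed and course, angles in radians, and the standard WGS-84 radii of curvature; the longitude step is divided by the cosine of the latitude, which is the usual scaling for the radius of the parallel circle. The function names are hypothetical.

```python
import math

A = 6378137.0        # WGS-84 semi-major axis [m]
B = 6356752.3142     # WGS-84 semi-minor axis [m]
E2 = (A**2 - B**2) / A**2   # first eccentricity squared


def latency_error(speed_kmh, delay_s):
    """Distance travelled during the GPS latency [m]."""
    return speed_kmh / 3.6 * delay_s


def meridian_radius(lat):
    """Radius of curvature along the meridian at latitude lat [rad]."""
    return A * (1.0 - E2) / (1.0 - E2 * math.sin(lat) ** 2) ** 1.5


def prime_vertical_radius(lat):
    """Radius of curvature perpendicular to the meridian at lat [rad]."""
    return A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)


def predict_position(lat, lon, h, v, course, dt):
    """Predict latitude, longitude [rad] and altitude [m] after dt seconds.

    v is the speed [m/s]; course is measured clockwise from geographic
    north [rad]. The altitude is assumed to stay constant.
    """
    north = v * dt * math.cos(course)
    east = v * dt * math.sin(course)
    lat_pr = lat + north / (meridian_radius(lat) + h)
    lon_pr = lon + east / ((prime_vertical_radius(lat) + h) * math.cos(lat))
    return lat_pr, lon_pr, h
```

For example, `latency_error(100.0, 0.5)` evaluates 100 / 3.6 · 0.5, i.e. roughly 13.9 m, which matches the value quoted in the text.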
Figure 5 shows an example of a drive through a roundabout. With the help of this simple temporal prediction algorithm, the position error due to the latency time can be reduced significantly.
Figure 5: Results of the GPS module of a drive through a roundabout. The velocity of the vehicle was about 40 km/h. The time delay of the DGPS receiver in the prototype implementation alternates between 360 ms and 720 ms. (Axes: longitude [°], latitude [°]; plotted: spline, GPS measurement, temporal prediction, association.)
4.3 Landmark Navigation
This section introduces the algorithms for the determination of the position and orientation of the vehicle using laser scanners and map features.
4.3.1. WGS-84 Coordinate Adjustment
The Motion Estimation module determines the translation and rotation of the vehicle between two consecutive laser scans in Cartesian coordinates using the onboard sensor data. The WGS-84 Coordinate Adjustment module determines the change of the geographic coordinates with respect to this data.
4.3.2. Segment Feature Association with Mesh-TrAss (Mesh-Triangle Association)
The main challenge of landmark navigation in urban areas is the localization of map features in the distance profile of a laser scanner in highly dynamic environments. In order to solve this problem a new association algorithm was developed. In a first step, map features which are potentially in the field of view of the laser scanner are transformed into the laser scanner coordinate system with respect to an initial position. In the test setup DGPS is used for the initialization of the module. Other positioning systems, such as local radio beacons or GSM-based localization systems, can also be used if GPS is not available. After the transformation into the laser scanner coordinate system, the transformed features appear rotated and translated relative to the corresponding objects in the distance profile, because the initial position and orientation measurements are inaccurate. Therefore, a ROI (Region Of Interest) around each transformed feature is defined (see Figure 6a). The dimension of the ROI must be chosen large enough to assure that the landmarks can be found within the ROIs, depending on the accuracy of the positioning system which is used for initialization. If standard GPS receivers are used, the ROI must be chosen up to 30 m x 30 m.
In the second step an algorithm clusters close laser scanner distance measurements into segments [11]. Because of the large dimensions of the ROIs, many segments are inside the ROIs in urban areas. The challenge is to identify those segments which correspond to the transformed features. If the dimension of a segment in the ROI of a feature exceeds the dimension of the feature significantly, the segment is excluded from subsequent steps (see Figure 6a, segment S2).
The basic idea for solving this association problem is the definition of a translation and rotation invariant representation of geometric arrangements using triangles. Triangles are completely characterized by their three side lengths. The side lengths of a triangle do not depend on the position and the orientation of the triangle. In our approach a set of three transformed features represents the vertices of the triangle. The three side lengths are charted in a 3D diagram, where each of the three side lengths is plotted on one axis as shown in Figure 6b. Therefore, each arrangement of three features has a specific point (feature point) in the 3D chart. This is also done with the segments in the ROIs. Consequently, for each combination of three segments a segment point can be defined. The distance between the feature point and a segment point (TrAss-Distance) characterizes the similarity of the arrangement of the three features and the three segments. The segment point which has the lowest distance to the feature point in the 3D chart represents the segment arrangement which is most similar to the feature arrangement. If the lowest TrAss-Distance falls below a threshold, the features are associated to the segments.
Figure 6: Left: The features are transformed into the laser scanner coordinate system with respect to an initial position measurement provided by GPS. Close scan points are clustered. Right: A set of three features is represented by one specific point in the 3D chart.
If there are more than three features in the field of view of the laser scanner, a mesh is built using Delaunay triangulation as shown in Figure 7c. For each feature triple a segment triple is searched. In a post-processing step, the meshes are compared to exclude invalid associations. The computation time of the Mesh-TrAss algorithm in urban, highly
dynamic environments is about 2 ms on a Pentium 4-M / 1.6 GHz. Consequently, the approach is applicable in real time.
Figure 7: Triangles are defined with the segments as their vertices (a). Each segment triple is represented as a specific point in the 3D chart in Figure 6b. The segment triangle which is most similar to the feature triple is associated (b). The feature mesh and the segment mesh are compared in a post-processing step (c).
4.3.3. Local and Global Pose Correction
A position correction algorithm determines the translation t = [t_x, t_y]^T and the rotation φ between the feature and the segment arrangement after the association. For n pairs of points, a distance function E_dist between the points p_i (coordinates of the segments) and the points p'_i (coordinates of the transformed features) is defined:
$$E_{dist} = \sum_{i=1}^{n} \left\| R(\varphi)\, p_i + t - p'_i \right\|^2 \tag{6}$$
A closed-form solution for the minimization of E_dist is used to determine the translation vector t and the rotation angle φ in the Local Pose Correction module [13]. In the Global Pose Correction module the WGS-84 coordinates of the vehicle are adjusted.
4.4 Position Tracking
The position and orientation determined by the localization approach in section 4.3 are tracked by a Kalman filter based Position Tracker.
4.5 Traffic Infrastructure Maps
With known position and orientation of the vehicle, additional traffic infrastructure objects which are not detected by the laser scanner, such as lane markings or sidewalks, are transformed into the laser scanner coordinate system. Moving objects detected by the laser scanner are related to the infrastructure description accurately.
5. RESULTS
With the help of the RTK-GPS (Real Time Kinematic) receiver Novatel DL-4 plus, we are able to determine the position of the vehicle with an accuracy of about 3 cm in kinematic mode. Therefore, the RTK-GPS receiver is used as a reference system in order to analyze the position accuracy of the localization framework. In order to assure high position accuracy of RTK-GPS, the measurements were performed on a free testing field without any nearby buildings, bridges, or trees. Several map features were placed and the positions of these map features were determined using the approach presented in section 3 (see Figure 9). The scene was passed by the testing vehicle at different speeds and yaw rates. The result of the localization approach was compared to the RTK-GPS position measurement. In Figure 8 the absolute position error of the localization framework is shown for different runs through the scene. The absolute error is in the range of about 10 cm. Consequently, the framework allows the localization of a
vehicle at the 10 cm level relative to a digital map even in highly dynamic urban environments. To achieve this, the segment feature association has to work properly. Many runs in real urban intersection scenes have shown that the association works robustly and reliably even in highly dynamic environments with many moving objects, and also when landmarks were removed.
Figure 8: Measurements were performed on a free testing field. Two representative runs are shown. In run 1, the vehicle drove straight through the intersection at low speed (30 km/h). In run 2 the vehicle passed the scene with a higher speed and a high curve speed. The green bars in the diagram show the periods in which features could be associated and the position of the vehicle was determined with the help of the map features. The red bars show the period in which the car has passed the scene and no landmark is in the field of view of the laser scanner. Here, the position is determined using the onboard sensors, which is why the error increases in run 2. The absolute position error of many other drives was in the same range. (Plots: absolute position error [m] over time [s], for localization with map features and for localization with DGPS and onboard sensors.)
Figure 9: Test intersection. Six map features were registered in a digital map. The intersection was passed several times. The results of two exemplary test drives are shown in Figure 8.
CONCLUSIONS
In this work a framework for the localization of vehicles in urban areas using laser scanners and map features is proposed. The Mesh-TrAss algorithm allows the robust association of map features to the clustered distance profile of the laser scanner even in highly dynamic environments. The algorithm also works for scenarios in which map features were removed or are occluded. With the help of reference measurements using RTK-GPS, the high position accuracy of the localization approach, at the 10 cm level, was shown.
Acknowledgement
This work is supported and partly funded by IBEO Automobile Sensor GmbH.
REFERENCES
Dirk Reichardt, Maurizio Miglietta, Lino Moretti, Peter Morsink, Wolfgang Schulz, CarTALK 2000: Safe and Comfortable Driving Based Upon Inter-Vehicle-Communication, Proceedings of IEEE Intelligent Vehicles Symposium 2002, Versailles, France, June 2002.
Ahmed El-Rabbany, Introduction to GPS: The Global Positioning System, Artech House, 2001.
Klaus Dietmayer, Nico Kaempchen, Kay Fuerstenberg, Joerg Kibbel, Roadway Detection and Lane Detection using Multilayer Laserscanner, Advanced Microsystems for Automotive Applications, Berlin, Germany, March 2005.
Jean Laneurit, Roland Chapuis, Frédéric Chausse, Accurate Vehicle Positioning on a Numerical Map, International Journal of Control, Automation, and Systems, vol. 3, no. 1, pp. 15-31, March 2005.
N. Kaempchen, M. Buehler, K. Dietmayer, Feature-Level Fusion for Free-Form Object Tracking using Laserscanner and Video, Proceedings of IEEE Intelligent Vehicles Symposium 2005, Las Vegas, USA, 2005.
S. Wender, T. Weiss, Improved Object Classification of Laserscanner Measurements at Intersections using Precise High Level Maps, Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, Vienna, Austria, September 13-16, 2005.
Website: IBEO AS Automobile Sensor GmbH: http://www.ibeoas.de, 2006.
Website: Navteq http://www.opengeospatial.org/, 2006.
Sebastian Thrun, Learning Occupancy Grid Maps With Forward Sensor Models, Autonomous Robots, 15:111-127, 2003.
K. Fürstenberg, T. Weiss, Feature-Level Map Building and Object Recognition for Intersection Safety Applications, Proceedings of IEEE Intelligent Vehicles Symposium 2005, Las Vegas, USA, June 2005.
K. Ch. Fuerstenberg, K. Dietmayer, Fahrzeugumfelderfassung mit mehrzeiligen Laserscannern, Technisches Messen 71 (2004) 3, Oldenbourg Verlag, Munich, 2004.
Feng Lu, Evangelos Milios, Robust Pose Estimation in Unknown Environments by Matching 2D Range Scans, Department of Computer Science, York University, North York, Canada, 1994.
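The triangle association of Section 4.3.2 and the pose correction of Section 4.3.3 can be illustrated together with a minimal sketch. It is not the authors' implementation: it handles a single feature triple by brute force over all ordered segment triples (no ROI filtering, no Delaunay mesh), uses the Euclidean distance in the side-length chart as the TrAss-Distance, and solves the 2D least-squares alignment in the standard closed form. The function names and the threshold of 0.5 m are assumptions.

```python
import itertools
import math


def side_lengths(p0, p1, p2):
    """Side lengths of the triangle p0-p1-p2 in a fixed vertex order."""
    return (math.dist(p0, p1), math.dist(p1, p2), math.dist(p2, p0))


def associate_triple(features, segments, threshold=0.5):
    """Indices of the segment triple most similar to the three features.

    Every triangle is one point in a 3D chart of side lengths; the distance
    between two such points serves as the TrAss-Distance. Returns None if
    the best candidate is not below the (assumed) threshold in metres.
    """
    target = side_lengths(*features)
    best_idx, best_dist = None, math.inf
    for idx in itertools.permutations(range(len(segments)), 3):
        candidate = side_lengths(*(segments[i] for i in idx))
        d = math.dist(target, candidate)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx if best_dist < threshold else None


def pose_correction(points, targets):
    """Closed-form phi, t minimising sum |R(phi) p_i + t - p'_i|^2 in 2D."""
    n = len(points)
    pcx = sum(p[0] for p in points) / n
    pcy = sum(p[1] for p in points) / n
    qcx = sum(q[0] for q in targets) / n
    qcy = sum(q[1] for q in targets) / n
    s_cos = s_sin = 0.0
    for (px, py), (qx, qy) in zip(points, targets):
        px, py, qx, qy = px - pcx, py - pcy, qx - qcx, qy - qcy
        s_cos += px * qx + py * qy   # coefficient of cos(phi)
        s_sin += px * qy - py * qx   # coefficient of sin(phi)
    phi = math.atan2(s_sin, s_cos)
    c, s = math.cos(phi), math.sin(phi)
    # The translation maps the rotated centroid of the points onto the
    # centroid of the targets.
    t = (qcx - (c * pcx - s * pcy), qcy - (s * pcx + c * pcy))
    return phi, t
```

In use, `associate_triple` would be called with the three transformed features and the segment centres found in their ROIs; the matched segment coordinates and the feature coordinates are then passed to `pose_correction` as `points` and `targets`. Matching ordered side lengths also fixes which segment belongs to which feature, as long as the feature triangle is not isosceles or equilateral.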
Feature Level Fusion for Object Classification
Stefan Wender+, Thorsten Weiss+, Kay C. Fuerstenberg*, Klaus C. J. Dietmayer+
+ University of Ulm, Albert-Einstein-Allee 41, 89081 Ulm, Germany, {stefan.wender, thorsten.weiss, klaus.dietmayer}@uni-ulm.de
* IBEO Automobile Sensor GmbH, Fahrenkroen 125, 22179 Hamburg, Germany, kf@ibeo-as.de

Abstract – A sensor fusion system is proposed that can simultaneously serve multiple advanced driver assistance systems. The prototype implementation fuses information from a laser scanner, ESP sensors, a DGPS and a high level map at the feature level. The entire system consists of sensor data preprocessing, tracking, classification and application dependent optimization of the system’s output. Since the preprocessing and tracking algorithms were described in former works, this paper concentrates on the classification algorithms and the optimization of the system’s output. The classification benefits from both pattern classification and rule based a priori knowledge. Furthermore, a temporal filter is applied to the classification result. The system simultaneously adapts its output to the requirements of different applications in a very efficient manner. Finally, the performance of the prototype implementation is evaluated for different applications in urban as well as in highway scenarios.

1. INTRODUCTION

Modern cars are available with several advanced driver assistance systems, and additional systems are being researched for future cars. Each of today’s systems consists of an exclusively used sensor, which observes the environment of the vehicle, sensor data processing hardware and an actuator. Realizing a large number of systems in one car by means of this concept will be complex and expensive, because redundant hardware and software will have to be installed. A more efficient integration of advanced driver assistance systems in future cars will benefit from a common sensing and data preprocessing platform. A small number of appropriate sensors should be fused to provide a common vehicle environment description, which can be exploited by many different applications simultaneously.

This work proposes a common data preprocessing unit, which manages the necessary tasks of data preprocessing, object tracking and classification. Since other works already considered the task of preprocessing and sensor fusion for object tracking [1], the focus here is on the object classification task.

Fig. 1. Overview of the vehicle environment perception system: Several sensors are fused at the feature level. The system performs preprocessing, object tracking and classification. The output is optimized for different applications.

The object classification enhances the information content of the provided vehicle environment description. Advanced driver assistance systems can benefit from this additional information through enhanced robustness and reliability, which includes a reduction of false alarms, for instance. Four types of objects should be distinguished from the background. The foreground objects “Pedestrian”, “Bike”, “Car”, and “Truck” should be correctly detected and classified. All remaining objects should be assigned to the class “Background”.

In general, there are two opposing methods of sensor fusion. The high level fusion approach, which is often called track-to-track fusion, combines sensors at the object level after the application of tracking and classification algorithms. Examples of high level fusion can be found in [2] and [3]. The advantage of this approach is the reusability of tracking and classification algorithms and the scalability of the fusion system. Unfortunately, this method of sensor fusion cannot consider all available information, since only the abstract object layer is passed to the fusion system. The fusion system cannot exploit the detailed information of the sensors’ raw data to resolve possible inconsistencies in the fused object lists.

In contrast, the low level fusion approach fuses the sensors’ raw data. The fusion system is based on a common object model for all sensors and provides one consistent environment description. It can benefit from redundant measurements of different sensors, at the cost of a higher communication load between sensors and fusion system. If the sensors’ raw data is slightly preprocessed, the fusion is often called feature level fusion [1] or early fusion [4].

The proposed fusion system prefers the second fusion type and performs a feature level fusion for tracking and classification. Both parts of the system additionally focus on the scalability of the fusion architecture by keeping it as general as possible.
www.prevent-ip.org prevent@mail.ertico.com
Research Paper
Page 32 PReVENT Fusion Forum e-Journal
[Figure 2 (block diagram). Recoverable labels: feature extraction with static, dynamic and map based features; pattern classifier; rule based a priori knowledge; temporal filter; each stage outputs membership values in [0, 1] for “Pedestrian”, “Bike”, “Car”, “Truck”, and “Background”.]
[Figure 3. Recoverable labels: reflectors, l, w.]
Fig. 3. A vehicle’s laser scanner measurements in the bird’s eye view: The features “object size” and
Fig. 2. Object classification: Real valued object features are calculated based on all information of the different sensors. A pattern classifier is combined with rule based a priori knowledge to calculate a membership value for each class. The membership vectors are

2. SYSTEM OVERVIEW

The entire system consists of preprocessing, tracking and object classification (Figure 1). One part of the laser scanner data preprocessing is the segmentation. The laser scanner’s measurements are clustered based on the distance between these measurements [5]. The object tracking observes objects over time. This is performed using Kalman Filter estimation [1]. The object classification combines all available information to estimate the object class. The output of the system consists of a list of observed, tracked and classified objects, which is optimized for each served application.

3. OBJECT CLASSIFICATION

The object classification fuses the sensor information at the feature level. Several real valued object features are obtained from synchronous and calibrated sensors. These features are exploited by all parts of the object classification. A pattern classifier combines the features and calculates membership values between 0 and 1 for each of the distinguished classes “Pedestrian”, “Bike”, “Car”, “Truck”, and “Background”. Afterwards, rule based a priori knowledge is applied to verify and correct the output of the pattern classifier. A temporal filter includes classification results of former time steps. The complete configuration of the object classification is given in Figure 2.

3.1 Feature Extraction

This part of the classification collects information from all available sensors. Partly redundant features in terms of real values are extracted from all available sensors for each observed object. These features represent the input of the classification. Three different feature types have to be distinguished.
3.1.1 Static Features

The first feature group is calculated directly from the measurements of the laser scanner. The sensor provides information about the observed object position as well as its width and length. If an object is partly occluded, the size of the occluding object will be evaluated. This information is exploited to estimate the maximal size of the occluded object. In addition, the laser scanner allows for the identification of retro reflectors, which can be observed at vehicles’ license plates or rear lights. The size of detected reflectors and their position on the object contour provide significant features to identify cars.

3.1.2 Dynamic Features

The object tracking provides information about the object speed. Absolute object speeds are estimated, since the host vehicle’s motion is compensated within the tracking. This compensation is based on sensor information of the ESP (electronic stability program). The dynamic information is stored in several features, which cover object speed, maximum speed, average speed, object age, and covered distance since the first observation of the object.

Fig. 4. High level map of an intersection with applied uncertainties: The uncertainties of ego position and orientation result in several areas of confidence with respect to the road. The white area certainly belongs to the road. The light gray area partly belongs to the road, but parts of this area are also located outside the road. The dark gray area certainly does not belong to the road. In addition, there are black areas, which are ignored by all tracking and classification algorithms.
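As an illustration of the dynamic features listed in Section 3.1.2, the following minimal sketch accumulates them per tracked object. It is not the authors' implementation: the class name, the update signature, and the reading of "covered distance" as accumulated path length are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DynamicFeatures:
    """Hypothetical container for the dynamic features of one tracked object."""
    speed: float = 0.0             # current absolute speed in m/s
    max_speed: float = 0.0         # highest speed seen so far in m/s
    avg_speed: float = 0.0         # mean speed over all observations in m/s
    age: int = 0                   # number of observations of this object
    covered_distance: float = 0.0  # assumed: path length since first observation in m
    _speed_sum: float = 0.0
    _last_xy: Optional[Tuple[float, float]] = None

    def update(self, x: float, y: float, vx: float, vy: float) -> None:
        """Update the features with one ego-motion-compensated tracker estimate."""
        self.speed = math.hypot(vx, vy)
        self.max_speed = max(self.max_speed, self.speed)
        self.age += 1
        self._speed_sum += self.speed
        self.avg_speed = self._speed_sum / self.age
        if self._last_xy is not None:
            self.covered_distance += math.hypot(x - self._last_xy[0],
                                                y - self._last_xy[1])
        self._last_xy = (x, y)
```

Each call to `update` corresponds to one tracking cycle; the resulting attribute values would be passed to the classifier as real valued features.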
3.1.3 Map Based Features

The combination of high level maps, Differential GPS and ego localization is exploited to obtain additional features. The sensor information is based on a precise map of the vehicle’s environment. This map contains precise information about the position of the road. In addition, the evaluated field of view can be reduced by defining areas that should be ignored by tracking and classification algorithms. These areas usually do not contain relevant objects, but houses or gardens, for example.

The combination of DGPS and the current laser scanner measurements allows for a precise determination of the host vehicle’s position in this map, with an accuracy on the order of 10 cm and 1° [6]. This information is accompanied by uncertainties of position and orientation. These uncertainties have to be applied to the high level map. Thereby several areas of confidence with respect to the high level map are calculated [7]. These areas are illustrated in Figure 4. The white area in the center certainly belongs to the road. The light gray area may belong to the road, but it may partly be outside the road. The dark gray area certainly does not belong to the road. Finally, the black regions describe areas that should certainly be ignored by tracking and classification algorithms.

The host vehicle’s position in this map allows the positions of observed objects in the map to be determined. While measurements in the black areas are removed directly, the other areas are utilized to calculate object features. Two binary features are calculated for each object. The first feature indicates that the object is in the white area, which means the object is on the road. The second feature marks objects in the dark gray area. These objects are located outside the road. Both features are analyzed in the rule based part of the system.
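The mapping from confidence areas to the two binary features can be sketched as follows. This is only an illustration under stated assumptions: the enum, its member names, and the function name are hypothetical, and the lookup of the area for a given object position is left out.

```python
from enum import Enum
from typing import Optional, Tuple


class Area(Enum):
    """Hypothetical labels for the confidence areas of Figure 4."""
    ROAD = "white"            # certainly belongs to the road
    UNCERTAIN = "light gray"  # may or may not belong to the road
    OFF_ROAD = "dark gray"    # certainly does not belong to the road
    IGNORED = "black"         # ignored by tracking and classification


def map_features(area: Area) -> Optional[Tuple[bool, bool]]:
    """Return (on_road, off_road), or None if the measurement is discarded."""
    if area is Area.IGNORED:
        return None  # measurements in black areas are removed directly
    return (area is Area.ROAD, area is Area.OFF_ROAD)
```

Objects in the light gray area get both features set to false, so neither of the map based rules is triggered for them.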
3.2 Pattern Classification

The pattern classifier interprets the extracted object features to estimate the class of the object. It calculates a membership value between 0 and 1 for each class (Figure 5). The value 0 means that the class is improbable. The membership value 1 indicates a very probable class. The class with the highest value is the most probable class for the processed object. This type of output was chosen instead of a direct class selection, as it is much more flexible with respect to additional manipulation. The following parts of the system can directly filter and change the membership values based on additional knowledge.

Fig. 5. Pattern classifier: This part of the classification evaluates the object features to calculate membership values between 0 and 1 for each class. The value 0 indicates an improbable class, while 1 corresponds to a probable class.

The pattern classifier is designed according to the one-vs-all principle. It consists of five separate classifiers, which are based on artificial neural networks (ANN). Each of the networks calculates one membership value by trying to separate one class from all other classes. The networks’ parameters were estimated by a training procedure with appropriately labeled sample data. More details about the pattern classifier can be found in [8]. The object features are passed together with the membership values to the following parts, which manipulate this basic estimation of the membership.

3.3 Rule Based Part

The output of the pattern classifier is verified and corrected by applying a priori knowledge, if available. This part can enforce all requirements on the object classification that can be expressed as rules. For example, a car cannot have a width of 4 m. The implementation follows a quite simple concept: It assigns the membership value 0 to all implausible classes of an object (Figure 6).
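A minimal sketch of such a rule based correction is given below. The width limits are hypothetical values chosen for illustration only (the paper states merely that a car cannot be 4 m wide); the two map rules follow the paper's description of the rule based part, and the function and parameter names are assumptions.

```python
# Hypothetical maximum plausible widths in metres; not taken from the paper.
MAX_WIDTH_M = {"Pedestrian": 1.2, "Bike": 1.5, "Car": 2.5, "Truck": 3.5}


def apply_rules(membership, width_m, on_road, off_road):
    """Set the membership value of every implausible class to 0.

    membership: dict mapping class name to a value in [0, 1]
    width_m:    measured object width in metres
    on_road:    True if the object is in the white map area
    off_road:   True if the object is in the dark gray map area
    """
    corrected = dict(membership)  # leave the classifier output untouched
    for cls, limit in MAX_WIDTH_M.items():
        if width_m > limit:
            corrected[cls] = 0.0
    if on_road:
        corrected["Background"] = 0.0  # background is not expected on the road
    if off_road:
        corrected["Car"] = 0.0         # vehicles usually drive on the road
        corrected["Truck"] = 0.0
    return corrected
```

Because the rules only ever force values to 0, the ordering of the rules does not matter and further rules (for example on speed) could be appended in the same way.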
[Figure 6 (block diagram). Recoverable labels: object features and membership values in [0, 1] for “Pedestrian”, “Bike”, “Car”, “Truck”, and “Background” entering and leaving the block “Rule based a priori knowledge”.]
Fig. 6. Application of rule based a priori knowledge: The membership values are manipulated by exploiting rule based restrictions to the object features.
[Figure 7 (block diagram). Recoverable labels: “Classification”, followed for each application 1 to n by “Operating point selection”, “Application object filter”, and “Max”, yielding the selected class for that application.]
Fig. 7. Application dependent optimization: The system output is optimized for each served application. The tradeoff between small numbers of false alarms and high detection rates is made for each application. Afterwards, the output can be filtered with respect to requirements to regions of interest and object classes. Finally, a maximum operator selects the output class.
Restrictions on the object size and speed for each object class are applied. In addition, the map based features of the observed object are analyzed. If an object is on the road, the membership value of the class “Background” will be set to 0, because background objects are not supposed to appear on the road. If an object is certainly located outside the road, the membership values of the classes “Car” and “Truck” will be set to 0, because these vehicles usually drive on the road.

3.4 Temporal Filter

Classification results of former time steps are included in the current classification by a temporal filter. This filter applies a temporal mean to each of the object’s membership values. Thus, the classification becomes smoother, and single misclassifications caused by object features being temporarily unobservable lose their influence on the classification.

4. APPLICATION DEPENDENT OPTIMIZATION

The vehicle environment description is optimized and filtered according to the requirements of each served application. This part performs a tradeoff decision between high detection rates and low numbers of false alarms. The number of output classes is also reduced to those the application requests. In addition, it is possible to reduce the output to given regions of interest. Figure 7 shows a block diagram of the application dependent optimization.

4.1 Classifier performance

The system performance is a tradeoff between a high number of detected objects and a small number of false alarms. The first aim is indicated by the detection rate (Equation (1)), which describes the percentage of all correctly detected and classified foreground objects (“Pedestrian”, “Bike”, “Car”, and “Truck”).

$$\text{detection rate} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \tag{1}$$

The number of false alarms is usually described by the false positive rate, which is closely connected with the number of background objects:

$$\text{false positive rate} = \frac{\text{false positives}}{\text{true negatives} + \text{false positives}} \tag{2}$$

Unfortunately, the number of background objects is not clearly defined in the vehicle environment perception task. The exact number of objects is questionable if, for instance, unstructured areas like woods with many closely spaced objects are observed. In addition, many background objects are not observed at all by the sensors. For this reason, another measure is calculated to indicate the false alarms. The false detection rate describes the percentage of errors among all detections of the classifier:

$$\text{false detection rate} = \frac{\text{false positives}}{\text{true positives} + \text{false positives}} \tag{3}$$

4.2 Applications’ requirements

Usually, different applications have different requirements on the information about the vehicle’s environment. Some require extremely low false detection rates. Others can handle higher numbers of false alarms, but prefer higher detection rates. Furthermore, some applications have smaller regions of interest than others or only need one or two of the foreground object classes. The results in the following section show that an optimization of the system’s output to the application’s requirements can significantly improve the classification result. The subsequent subsections show an efficient way to perform this task.

4.3 Operating point selection

The application of the system to a set of labeled test data yields one detection rate and one false detection rate. The tradeoff between these two measures can be manipulated by shifting the a priori classification probabilities of the five classes. The shifting is realized by adding an a priori classification weight vector w to the calculated membership vector m of an object (Equation (4)). For example, a high value in the last entry of this vector will increase the probability of classifying an object as a truck. The a priori classification weight vectors can also remove unnecessary object classes by putting negative values with a high absolute value in the corresponding entries of the vector.

$$m' = m + w = \begin{pmatrix} m_{\text{Background}} \\ m_{\text{Pedestrian}} \\ m_{\text{Bike}} \\ m_{\text{Car}} \\ m_{\text{Truck}} \end{pmatrix} + \begin{pmatrix} w_{\text{Background}} \\ w_{\text{Pedestrian}} \\ w_{\text{Bike}} \\ w_{\text{Car}} \\ w_{\text{Truck}} \end{pmatrix} \tag{4}$$

Appropriate a priori classification weight vectors are determined by numeric optimization. This optimization maximizes the detection rate for given false detection rates with respect to the application’s requirements. An operating point curve is created by plotting the achieved detection rates over the corresponding false detection rates. This curve is similar to the well-known Receiver Operating Characteristic (ROC) and describes the system’s performance with respect to the application’s requests. Nevertheless, it must be emphasized that the two curves are not directly comparable, since the ROC is based on the false positive rate instead of the false detection rate. The implemented method of adapting the system’s output to the applications’ requirements is very efficient and effective, since the tradeoff between few false alarms and high detection rates is performed online with a single vector addition per application.

4.4 Application Dependent Object Filter

Many applications do not need all the information from the complete field of view of the sensors. This part allows for selecting a region of interest that fits the application’s requirements. This filter can significantly reduce the number of false alarms.
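The measures of Equations (1) and (3), the temporal mean of Section 3.4, and the weight vector shift of Equation (4) followed by a maximum operator can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the filter window is assumed to be whatever history is passed in, and the weight values in the usage note are made up.

```python
# Class order as in Equation (4).
CLASSES = ("Background", "Pedestrian", "Bike", "Car", "Truck")


def detection_rate(true_pos, false_neg):
    """Equation (1); assumes at least one foreground object in the test set."""
    return true_pos / (true_pos + false_neg)


def false_detection_rate(true_pos, false_pos):
    """Equation (3); assumes the classifier produced at least one detection."""
    return false_pos / (true_pos + false_pos)


def temporal_mean(history):
    """Mean of each membership value over a list of membership vectors."""
    n = len(history)
    return [sum(step[i] for step in history) / n for i in range(len(CLASSES))]


def select_class(m, w):
    """Apply m' = m + w (Equation (4)) and pick the class with the highest value."""
    shifted = [mi + wi for mi, wi in zip(m, w)]
    best = max(range(len(CLASSES)), key=shifted.__getitem__)
    return CLASSES[best], shifted
```

For a pedestrian-only application, a weight vector such as `[0, 0, -10, -10, -10]` would suppress the classes "Bike", "Car" and "Truck", so that the maximum operator can only return "Background" or "Pedestrian". Serving a further application only requires another call to `select_class` with a different `w`.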
4.5 Output Class

This part selects the estimated class for the object output. The system selects the class that corresponds to the highest value in the final membership vector. An output of the membership vector is also possible.

5. RESULTS

The system was evaluated for different use cases. The results were obtained from several test sets, which were not used for the classifier training process. The test sets were chosen depending on the application. An operating point curve was created for each analyzed application. Each curve benefits from application dependent optimizations, which are described in the corresponding subsection.

5.1 Urban areas

This part of the system evaluation is performed on a sample data set with urban sequences and intersections. These are the most complex scenarios of vehicle environment perception.

5.1.1 General Object Classification Without Optimization

The first evaluation is for a general environment description. The corresponding output requires a complete classification of all classes. This is the most difficult task, because the high number of alternative classes causes many possibilities of misclassification. No special optimizations of the classifier’s output can be applied. The achieved operating point curve is shown in Figure 8.

[Plot: detection rate over false detection rate, both axes from 0 to 1.]
Fig. 8. Operating point curve for the general object classification: This plot shows the detection rate over the false detection rate. This evaluation was performed without special adaptation to an application.

5.1.2 Pedestrian Detection in Urban Areas

The second optimization is for pedestrian protection. For example, this adaptation can serve an application that performs a pedestrian warning or lifts the engine hood to mitigate a crash with a pedestrian. Only pedestrians have to be detected. The remaining objects should be classified as background. Therefore, all other object classes are interpreted as background. The evaluation analyzes only objects that are located in a region of interest. This ROI is defined by the sensor requirements of the European project SAVE-U [9]. Figure 9 shows the results. The performance of the general classification on the task of pedestrian detection is shown by the dashed line. The comparison shows the improvement achieved by the adaptation.

[Plot: detection rate over false detection rate, both axes from 0 to 1.]
Fig. 9. Operating point curve for pedestrian detection: The red line shows the results, if the output is optimized for the pedestrian detection application. The dashed blue line shows the results for the pedestrian detection with the general classification without optimizations.

5.2 Highways and Motorways

The third evaluation was performed to detect vehicles on highways and motorways. This adaptation only classifies cars and trucks as vehicles. There is no distinction between cars and trucks. The remaining objects are interpreted as background objects. The output can be used to classify objects as target vehicles for current adaptive cruise control applications (e.g. in [10]). The optimization is performed with a highway data set. The evaluated objects are restricted by the following rules: A classification is not performed before the object has been observed three times. Only objects in a field of view of ±30° are analyzed, because object candidates should be in front of the test vehicle. All objects have to move faster than 5 m/s in the direction of ±45°. It does not seem to be useful to follow objects in other directions.

Again, the performance of the general classification is shown in comparison. Since the general classification was evaluated on a general sample data set, it is not possible to compare the results directly. Therefore, the calculated operating points of the general classification are applied to the highway data set. This result is compared to the performance of the specialized classification in Figure 10.

[Plot: detection rate over false detection rate, both axes from 0 to 1.]
Fig. 10. Operating point curve for ACC: The red line shows the results, if the output is optimized for the adaptive cruise control application. Only vehicles are detected. The dashed blue line shows the results for the same task with the general unmodified object classification.
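The three restriction rules of the highway evaluation can be written as a simple object filter. The sketch below rests on several assumptions that are not stated in the paper: positions and velocities are given in a host vehicle frame with x pointing forward, both angular limits are interpreted as degrees relative to the host vehicle's longitudinal axis, and the function and parameter names are hypothetical.

```python
import math


def is_acc_candidate(observations, x, y, vx, vy,
                     min_observations=3, fov_deg=30.0,
                     min_speed=5.0, heading_deg=45.0):
    """Return True if an object passes the highway evaluation rules.

    observations: number of times the object has been observed so far
    x, y:         object position in the host frame in metres (x forward)
    vx, vy:       absolute object velocity in the host frame in m/s
    """
    if observations < min_observations:
        return False  # not yet observed three times
    bearing = math.degrees(math.atan2(y, x))
    if abs(bearing) > fov_deg:
        return False  # outside the assumed +/-30 degree field of view
    if math.hypot(vx, vy) <= min_speed:
        return False  # not faster than 5 m/s
    heading = math.degrees(math.atan2(vy, vx))
    return abs(heading) <= heading_deg  # moving roughly in the host's direction
```

A vehicle 20 m straight ahead that has been seen three times and drives forward at 10 m/s passes the filter, while the same vehicle crossing at 90° to the host's direction is rejected.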
5.3 Processing Time

The system was evaluated on an Intel Pentium M 1.7 GHz. The complete system needs 17 ms processing time in the worst case. The classification and the output optimization consume only a small part of this time: both parts together need only 2 ms in the worst case. The applied sensor sample rates were 10 Hz and 20 Hz, which correspond to cycle times of 100 ms and 50 ms. Since the worst case processing time of 17 ms is below both cycle times, the system is applicable in real time for both sensor sample rates. Every adaptation to a new application only needs an additional vector addition. Therefore, additional applications can be served without a significant increase of processing time.

CONCLUSION

A sensor fusion system for the purpose of vehicle environment perception was introduced. The sensor information is fused at the feature level to benefit from all available sensor information while keeping the fusion architecture as general as possible. A prototype implementation was developed, which fuses a laser scanner, the vehicle’s onboard sensors, a DGPS and detailed maps. A classification module was proposed, which fuses measurements of multiple sensors in terms of real valued features. The classification combines methods of pattern matching and rule based a priori knowledge. The provided vehicle environment description can simultaneously serve different applications. A significant improvement of this environment description’s quality was achieved by optimizing the output to the requirements of the applications. This optimization is easy to implement, has very low computational costs and can be performed simultaneously for each served application in real time. The prototype implementation was evaluated for different applications with labeled test sequences. This includes evaluations in urban areas as well as on highways. The improvement due to the application dependent optimizations was highlighted for each evaluation.

Acknowledgement

This work was supported and partly funded by IBEO Automobile Sensor GmbH in Germany.

REFERENCES

[1] N. Kaempchen, M. Buehler, K. C. J. Dietmayer, “Feature-Level Fusion for Free-Form Object Tracking using Laserscanner and Video”, Proceedings of 2005 IEEE Intelligent Vehicles Symposium, Las Vegas, USA, 2005
[2] R. Moebus, U. Kolbe, “Multi-Target Multi-Object Tracking, Sensor Fusion of Radar and Infrared”, Proceedings of 2004 IEEE Intelligent Vehicles Symposium, Parma, Italy, 2004
[3] R. Labayrade, C. Royere, D. Aubert, “A Collision Mitigation System using Laser Scanner and Stereovision Fusion and its Assessment”, Proceedings of 2005 IEEE Intelligent Vehicles Symposium, Las Vegas, USA, 2005
[4] L. Walchshaeusl, R. Lindl, K. Vogel, “Detection of Road Users in Fused Sensor Data Streams for Collision Mitigation”, Proceedings of Advanced Microsystems for Automotive Applications 2006, Berlin, Germany, April 2006
[5] S. Wender, K. Ch. Fuerstenberg, K. Dietmayer, “Object Tracking and Classification for Intersection Scenarios Using A Multilayer Laserscanner”, Proceedings of ITS 2004, 11th World Congress on Intelligent Transportation Systems, Nagoya, Japan, 2004
[6] T. Weiss, N. Kaempchen, K. C. J. Dietmayer, “Precise Ego-Localization in Urban Areas using Laserscanner and High Accuracy Feature Maps”, Proceedings of 2005 IEEE Intelligent Vehicles Symposium, Las Vegas, USA, 2005
[7] S. Wender, T. Weiss, K. C. Fuerstenberg, K. C. J. Dietmayer, “Object Classification exploiting High Level Maps of Intersections”, Proceedings of Advanced Microsystems for Automotive Applications 2006, Berlin, Germany, April 2006
[8] S. Wender, K. C. J. Dietmayer, “Statistical Approaches for Vehicle Environment Classification at Intersections with a Laserscanner”, Proceedings of ITS 2005, 12th World Congress on Intelligent Transportation Systems, San Francisco, USA, 2005
[9] “Strategies in terms of pedestrian protection”, Deliverable 6 of the EC project SAVE-U, http://www.save-u.org/file_html/library.htm
[10] R. Schulz, “Laserscanner based Advanced Adaptive Cruise Control for speed ranges between expressway and full stop”, Proceedings of ITS Europe, Hannover, Germany, 2005
ACC Vehicle Tracking with Joint Multisensor Multitarget Filtering of State and Existence
Mirko Maehlisch1, Werner Ritter2 and Klaus C. J. Dietmayer1
1 University of Ulm, Dept. of Measurement, Control and Microtechnology, Ulm, Germany
2 DaimlerChrysler AG, Research and Technology, Dept. REI/AI, Ulm, Germany
uni-ulm.maehlisch@daimlerchrysler.com
Abstract – In this contribution we describe a vehicle detection and tracking system incorporating multibeam lidar, vision and ESP data that does not rely on target motion in the detection stage. As the system is intended for Adaptive Cruise Control (ACC) and Automatic Emergency Braking (AEB) applications, this is a crucial feature, especially in traffic jam situations. The system utilizes vision based appearance classification with the Viola-Jones classifier supported by lidar range constraints in the detection stage, a motion compensation of the host vehicle from ESP data, and the recently introduced Probability Hypothesis Density Filter, which allows for temporal filtering of target states and classification hints simultaneously.

1. INTRODUCTION

Several driver assistance systems for ACC and AEB have already been introduced to the market, while research and development are currently focusing on improving performance with sensor fusion approaches. Almost all of the known systems implement the classical sequential approach of detection, tracking and validation or classification. The benefits of this approach are the ability to distribute the processing stages over different nodes of the sensor network and the information reduction into object and track lists, allowing efficient real time processing and low bus load. However, making decisions in the detection stage by thresholding features to compute an object list, possibly for each sensor separately, discards information. The so-called low-level or feature-level sensor fusion approaches address this problem by incorporating features from all sensors together for decision making. But even these approaches do not use the temporal prior knowledge from past feature values for detection purposes.

There are circumstances where one cannot afford to lose any available information for object detection. This is evidently the situation in environments with a low sensor Signal-to-Noise Ratio (SNR), where thresholding features will persistently lead to false object lists. Trying to detect vehicles with stealth technology is a low-SNR example, which is why military scientists were the first to develop the concept of Track-before-Detect (TBD). The other case where no information should be omitted arises from intolerably high costs of false application decisions based on false environment descriptions provided by the sensor system. Considering applications like ACC stop-and-go, where missed detections are disastrous, and AEB, where braking for nothing can cause severe accidents, these concepts are interesting for automotive sensing, too.

The key idea of TBD is to filter detection hints together with the position and velocity measurements to compute a temporally smoothed estimate for both existence and state. Ronald Mahler developed the Finite Set Statistics (FISST, [1]) formalism to jointly address the detection, the data association and the tracking problem. In this contribution, we present a vehicle detection system based on lidar and vision that implements the Probability Hypothesis Density Filter (PHD, [2]), a computationally tractable formulation of FISST. Furthermore, we extend the object-list interface to the detection stage of the original PHD Filter to allow the temporal filtering of individual target detection features.

2. SENSORS

2.1 Sensor Setup

The sensor system incorporates data from a multibeam lidar, a vision sensor and the host vehicle’s inertial sensors from the ESP system. The lidar has 16 measurement beams arranged in one degree steps. Each beam diverges by 1° horizontally and 2.4° vertically, so the total Field of View (FOV) is 16 degrees. The sensor is able to detect targets at up to 200 meters in advantageous atmospheric conditions. Although the sensor internally tracks targets, our system accesses the raw lidar echoes, provided with angle, distance and amplitude.
The vision sensor is an automotive near-infrared sensitive CMOS camera mounted behind the windshield.

Fig. 1. Left: Research vehicle with sensor integration points, Right: Lidar sensor module

2.2 Sensor Alignment

In order to use the data of different sensors together, they must be aligned in the spatial and temporal [3] domains. In the absence of FlexRay, a real-time automotive bus architecture, the temporal alignment of a sensor fusion network is difficult. We chose to use a central processing node for timestamp generation and synchronization (Fig. 2). The Real-Time Clock node (RTC) synchronizes to GPS UTC time and can produce trigger events for certain sensors. A limitation of this architecture is the necessity to estimate the internal sensor processing latencies between measurement and bus activity.

The spatial sensor alignment is the task of determining the three rotational and three translational degrees of freedom between the local sensor coordinate systems. Figure 3a shows the pinhole camera model with the camera coordinate system and the 16 measurement beams of the lidar sensor, reduced to rays starting in the origin of the local lidar coordinate system. Assuming that the alignment between the camera and the vehicle is known, either from static extrinsic calibration or from on-line determination, perhaps by a lane recognition module, a procedure for computing the inter-sensor alignment is presented next.

The lidar sensor emits radiation at 905 nm, and the vision sensor was taken from a night vision system and hence is sensitive in the Near-Infrared (NIR) waveband. This frequency overlap makes it possible to visualize the lidar emissions with the imager just by putting a flat wall in front of the vehicle, provided that the visible waveband is cut out with filters or the light is dimmed (Fig. 4). After extracting the lidar echoes with morphological image processing operations, the center of gravity of each echo is stored as a measurement of the point where the medial beam axis of the lidar sensor model penetrates the calibration wall. The procedure proceeds by changing the position and/or orientation of the wall relative to the vehicle and by recording an additional sixteen feature points per iteration.

Fig. 4. Reflection spots of the lidar sensor visualized with the NIR camera (left), connected components analysis (right)
By hypothesizing a certain alignment T between the sensors, the pinhole camera model projection B allows for the projection of the lidar medial axis rays into the image domain.
$$\vec{g}_i(T) = B\,T\,\vec{0} + q\left[\,B\,T\cdot(\cos\alpha_i,\ \sin\alpha_i,\ 0,\ 1)^{\top} - B\,T\,\vec{0}\,\right], \qquad q \in \mathbb{R},\quad i = 1..16 \tag{1}$$
Fig. 2. Temporal sensor alignment – the central clock [block diagram labels: Video, Lidar, Radar, ESP, GPS, microcontroller (ISRs; 1. Gateway, 2. Clock, 3. Trigger), PC; interfaces CAN, TTL, RS232; timestamp and data messages]
Fig. 3. (a) Sensor models: local vehicle coordinate system (pink), pinhole camera model (turquoise, green), lidar sensor model (blue, red). (b) Result of fitting the projected medial lidar beam axes (blue) to the measurement points (red crosses)
To compute the alignment estimate T between the sensors, a least squares fitting between the N measured feature points and the M projected lidar model rays is defined.
$$\hat{T} = \arg\min_{T} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} d\big(p_{ij},\ g_i(T)\big)^2 \tag{2}$$
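The cost of Eq. (2) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the alignment T is parameterized as yaw, pitch, roll and a translation, the pinhole model uses an arbitrary focal length and principal point, the camera frame is taken as x forward, y left, z up, and every function name is hypothetical. Each beam line g_i(T) is built from two projected points on the beam rather than from the projected lidar origin, which yields the same image line while avoiding a division by a small depth.

```python
import math

def rot_zyx(yaw, pitch, roll):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll) as nested lists."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def to_camera(params, p):
    """Map a lidar-frame point into the camera frame with alignment T = (R, t)."""
    yaw, pitch, roll, tx, ty, tz = params
    R = rot_zyx(yaw, pitch, roll)
    t = (tx, ty, tz)
    return [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]

def project(p, f=800.0, cu=320.0, cv=240.0):
    """Assumed pinhole model: x is the optical axis, y points left, z points up."""
    x, y, z = p
    return (cu - f * y / x, cv - f * z / x)

def image_point(params, alpha, rng):
    """Image position of the point at range rng on the beam with angle alpha."""
    beam_point = (rng * math.cos(alpha), rng * math.sin(alpha), 0.0)
    return project(to_camera(params, beam_point))

def point_line_distance(p, a, b):
    """Distance of image point p from the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    return abs(dx * (py - ay) - dy * (px - ax)) / math.hypot(dx, dy)

def alignment_cost(params, beam_angles, measured, r_near=5.0, r_far=50.0):
    """Sum of squared point-to-line distances, as in Eq. (2).

    measured[i] holds the feature points p_ij recorded for beam i.
    """
    cost = 0.0
    for alpha, points in zip(beam_angles, measured):
        a = image_point(params, alpha, r_near)
        b = image_point(params, alpha, r_far)
        for p in points:
            cost += point_line_distance(p, a, b) ** 2
    return cost
```

A generic non-linear least squares routine would minimize `alignment_cost` over `params`. Evaluating the cost with the translation scaled by any positive factor returns the same value, which illustrates why the sensor distance cannot be recovered from the projected beam fan alone.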
The optimization w.r.t. T is non-linear due to the projection of the pinhole camera model, and therefore iterative techniques are required. Although the optimization converges to a certain alignment, it can be shown that there are infinitely many local lidar coordinate system alignments whose projections result in the same projected image coordinate fan. All these possible preimages have the same orientation, but their origins vary along the ray given by the origin of the camera coordinate system and the estimated translation vector. This is why the presented scheme is only able to estimate five of the six degrees of freedom of the inter-sensor alignment, and the Euclidean sensor distance has to be taken from CAD data or measured by other means. If all six degrees of freedom are required from the calibration method, the calibration wall must be equipped with optical markers, which sacrifices the main advantage of the method: the ability to use any simple wall in any undefined position and orientation as a calibration object. With the help of the estimated alignment (Fig. 3b), it is possible to project the echoes of the lidar sensor into the image domain (Fig. 5). The main limitation of the presented cross calibration method is the fact that the camera must be NIR sensitive, but most manufacturers introduce video cameras in a night vision context first.
Fig. 5. Spatial sensor alignment between lidar and video (left) and application of the sub-window classifier (right)
3. VEHICLE DETECTION
Depending on their distance, lidar echoes cover more or less area in the image domain. If the vertical and horizontal beam expansion angles are known and the lidar echoes are assumed to be elliptic beam cross sections parallel to the YZ-plane of the local lidar coordinate system, each echo can be represented by drawing the two extremal points of its surrounding ellipse in both coordinate directions. The sensor alignment T and the camera model B allow the transformation of these four points into the image domain, visualized as diamonds in Figure 5. The projected lidar echoes generate Regions of Interest (ROI) for an image sub-window classifier. The three-dimensional sub-window search space, consisting of the horizontal and vertical box positions and the window scale, can be drastically reduced by skipping each sub-window candidate without a significant overlap with one of the echo projections. The distance measurements from the lidar sensor, together with a statistic of relevant object heights, allow the computation of an expected object height interval in the image domain, which further restricts the size of the sub-window candidates. The classifier itself is the popular AdaBoost classifier cascade based on Haar-wavelet-like features introduced by Viola and Jones [4] and modified in our research group [5]. Adjacent positively classified boxes are clustered together according to a scale-invariant distance measure derived from the box positions and sizes. Each entry in the final object list is a cluster of detection boxes with its associated ROI-generating lidar echo.
4. VEHICLE TRACKING
In theory, the combined state space of tracking an unknown, varying number of other vehicles with positions relative to the host vehicle and velocities over ground is given by the concatenation of the dynamic state of the host vehicle X_ego and a Random Finite Set (RFS, [1]) Γ containing the individual target states of the other vehicles.
$$X = (\Gamma,\ X_{ego}) \tag{3}$$
4.1 Ego-Motion Estimation
To estimate the dynamic state of the host vehicle, we applied an Extended Kalman Filter (EKF) to the nonlinear measurement equations resulting from the Ackermann steering geometry model (Fig. 6).
The state space vector consists of the velocity magnitude of the midpoint of the rear axle, its acceleration, the yaw rate and the yaw acceleration:
$$X_{ego} = (v_e,\ \dot{v}_e,\ \dot{\varphi}_e,\ \ddot{\varphi}_e) \tag{4}$$
Fig. 6. Ackermann steering geometry
The prediction equation uses a discrete Wiener process noise acceleration model with process noise covariance:
$$Q_{ego} = \begin{pmatrix} G\,\sigma_{\dot{v}}^2\,G' & 0 \\ 0 & G\,\sigma_{\ddot{\varphi}}^2\,G' \end{pmatrix}, \qquad G = (\Delta t,\ 1) \tag{5}$$
The measurement vector is composed of the four wheel revolutions and the yaw rate obtained from the ESP system:
$$Z_{ego} = (R_{FL},\ R_{FR},\ R_{RL},\ R_{RR},\ \omega)' \tag{6}$$
The measurement equations derive from the Ackermann steering model:
$$R_{FL/FR} = \frac{1}{2u}\sqrt{4 v_e^2 \pm 4\, l_w v_e \dot{\varphi}_e + l_w^2 \dot{\varphi}_e^2 + 4\, l_l^2 \dot{\varphi}_e^2}$$
$$R_{RL/RR} = \frac{2 v_e \pm l_w \dot{\varphi}_e}{2u}$$
$$\omega = \frac{180}{\pi}\,\dot{\varphi}_e \tag{7}$$
The advantage of the Ackermann ego-motion model, compared to the commonly used bicycle model [6], is the ability to use the differences of the four measured wheel revolutions to infer the host vehicle yaw rate, as all four tires run at different radii in a constant rolling turn (Fig. 6). Therefore, the model would be able to determine the dynamic state even without the ESP yaw-rate sensor measurement ω.
4.2 Multi-Target Tracking
The tracking stage implements the Probability Hypothesis Density (PHD) filter with a particle representation. The PHD D(x) of a random finite set is the first-order statistical moment of the multi-target posterior p(.):
$$D(x) = \sum_{n=0}^{\infty} \frac{1}{n!} \int p(\{x, y_1, y_2, \ldots, y_n\})\, dy_1 \ldots dy_n \tag{8}$$
Thus, the PHD is defined in the single-target state space and integrates to the expected number of targets:
$$E(|\Gamma|) = \int D(x)\, dx \tag{9}$$
The time-forward prediction operator of the PHD filter recursion is given by:
$$D_{k|k-1}(x_k \mid Z_{k-1}) = B_{k|k-1}(x_k) + (1 - p_{death}) \int f_{k|k-1}(x_k \mid x_{k-1})\, D_{k-1|k-1}(x_{k-1} \mid Z_{k-1})\, dx_{k-1} \tag{10}$$
B_{k|k-1}(x) denotes the birth PHD, i.e. the intensity of target births between time k-1 and k at the state space location x, and p_death is the probability of target death, so that (1 − p_death) is the survival probability. The function f_{k|k-1}(x_k|x_{k-1}) is the ordinary single-target Markov state transition probability implicitly given by the stochastic difference equation:
$$x_k = f(\Delta t, x_{k-1}, c_{k-1}, v_{k-1}) = E(c_{k-1}, \Delta t)\, T(\Delta t)\, x_{k-1} + v_{k-1} \tag{11}$$
The individual target state vector was chosen according to the free motion model in homogeneous coordinates. As the application requires velocities to be estimated over ground, the prediction equation incorporates the estimate of the ego-motion EKF, predicted to the environmental measurement timestamp, as a control vector
$$c_{k-1} = (v_e,\ \dot{\varphi}_e)'$$
While T is the ordinary first-order Taylor transition matrix, E corrects the predicted target states by subtracting the host vehicle position and orientation change:
$$E(v_e, \dot{\varphi}_e, \Delta t) = \begin{pmatrix} r_c & 0 & r_s & 0 & t_x \\ 0 & r_c & 0 & r_s & 0 \\ -r_s & 0 & r_c & 0 & t_y \\ 0 & -r_s & 0 & r_c & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \tag{12}$$
with
$$r_s = \sin(\dot{\varphi}_e \Delta t), \quad r_c = \cos(\dot{\varphi}_e \Delta t), \quad \Delta x_e = v_e\, \dot{\varphi}_e^{-1}\, r_s, \quad \Delta y_e = v_e\, \dot{\varphi}_e^{-1}\, (1 - r_c),$$
$$t_x = -\Delta y_e\, r_s - \Delta x_e\, r_c, \quad t_y = -\Delta y_e\, r_c + \Delta x_e\, r_s$$
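The particle form of the prediction step in Eqs. (10) to (12) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the state ordering (x, vx, y, vy, 1) is inferred from the block structure of E, existing particles keep the survival fraction 1 − p_death of their weight, birth particles are simply appended, and a guard for straight driving (yaw rate near zero) is added because Eq. (12) divides by the yaw rate. All function and parameter names are hypothetical.

```python
import math
import random

def ego_compensation(v_e, yaw_rate, dt):
    """Matrix E of Eq. (12) for the assumed state order (x, vx, y, vy, 1)."""
    rs = math.sin(yaw_rate * dt)
    rc = math.cos(yaw_rate * dt)
    if abs(yaw_rate) > 1e-9:
        dx = v_e / yaw_rate * rs
        dy = v_e / yaw_rate * (1.0 - rc)
    else:
        # Added guard: limit of the expressions above for straight driving.
        dx, dy = v_e * dt, 0.0
    tx = -dy * rs - dx * rc
    ty = -dy * rc + dx * rs
    return [
        [rc, 0.0, rs, 0.0, tx],
        [0.0, rc, 0.0, rs, 0.0],
        [-rs, 0.0, rc, 0.0, ty],
        [0.0, -rs, 0.0, rc, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ]

def transition(dt):
    """First-order (constant velocity) transition matrix T."""
    return [
        [1.0, dt, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, dt, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def predict(particles, v_e, yaw_rate, dt, p_death=0.05, sigma_a=0.0,
            births=(), rng=random):
    """Predict weighted particles (state, weight) to the next time step."""
    E = ego_compensation(v_e, yaw_rate, dt)
    T = transition(dt)
    predicted = []
    for x, w in particles:
        x_new = matvec(E, matvec(T, x))
        # Process noise: one random acceleration per axis.
        for pos, vel in ((0, 1), (2, 3)):
            a = rng.gauss(0.0, sigma_a)
            x_new[pos] += 0.5 * dt * dt * a
            x_new[vel] += dt * a
        predicted.append((x_new, (1.0 - p_death) * w))
    predicted.extend(births)  # birth PHD represented by extra particles
    return predicted
```

Following Eq. (9), the sum of the particle weights after `predict` approximates the expected number of targets before the measurement update.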
The process noise is sampled from a Gaussian distribution parameterized according to the discrete white noise acceleration model. As the two environmental sensors measure with different frequencies, the system performs different update steps for each sensor. Since any clustering operation on the poor-resolution lidar scan is error-prone, the system treats the lidar data as an unresolved distance scan without extracting objects. The lidar innovation is the ordinary Sequential Importance Resampling (SIR) filter update with a Gaussian measurement likelihood function with empirically determined radial and angular uncertainties. The subsequent re-sampling step preserves the total PHD weight. While the lidar sensor only provides spatial evidence, mainly in longitudinal direction, the video sensor is able to contribute precise position information in lateral direction as well as existence evidence.
$$p_p(z) = p(\mathrm{vehicle} \mid f_a \wedge f_s) = \frac{p(f_a \mid \mathrm{vehicle})\, p(f_s \mid \mathrm{vehicle})}{p(f_a \mid \mathrm{vehicle})\, p(f_s \mid \mathrm{vehicle}) + p(f_a \mid \mathrm{clutter})\, p(f_s \mid \mathrm{clutter})} \tag{13}$$
For the video update, we use a novel innovation for the PHD filter, which allows the incorporation of individual detection probabilities for each object hypothesis from the detection stage instead of approximating the detector characteristics from a Poisson false alarm model:
$$D_{k|k}(x_k \mid Z_k) = (1 - p_s)\, D_{k|k-1}(x_k \mid Z_{k-1}) + \sum_{z \in Z_k} \frac{p_p(z)\, f(z \mid x_k)\, D_{k|k-1}(x_k \mid Z_{k-1})}{\int f(z \mid y)\, D_{k|k-1}(y \mid Z_{k-1})\, dy} \tag{14}$$
The value p_s is the detector sensitivity, i.e. the probability that a hypothesis is generated for a real-world object at all, and can be approximated from ground truth data. The detection precision p_p(z) is the probability that a hypothesis is a true positive and not a false alarm. To determine these values, the system analyzes two features in a Bayesian framework to compute an existence probability per hypothesis (Eq. 13). The feature f_s is the size of the detection box cluster, i.e. the number of boxes it contains. As the sub-window classifier quantization steps for scanning the image are defined in camera coordinates and projected into image coordinates, this is a scale- and distance-independent feature. Figure 7 (left) shows the value distributions of the cluster sizes determined from ground truth data. The cluster sizes for true positives (green curve) are fairly equally distributed within the interval of observed values, while almost all sizes of the false alarms show smaller values. The second feature f_a is the attenuation-corrected echo amplitude of the ROI-generating lidar echo associated with the video detection box cluster. Although this feature is not as discriminating as the size feature, it can be observed that the amplitudes of true positives tend towards higher values, whereas those of false positives are quite equally distributed.
Fig. 7. The vehicle detection utilizes the detection box cluster size (left) and the lidar echo amplitude feature (right).
The measurement likelihood function f(z|x) of the video sensor is a Gaussian model, whose mean and covariance are individually computed for each detection cluster from the spread of the bottom-line midpoints of the member boxes (Fig. 8). The matrices B and T_VC2CC are the pinhole camera model projection matrix and the transformation matrix from the vehicle to the camera coordinate system in homogeneous coordinates.
$$f(z \mid x) = \mathcal{N}\big(B\,T_{VC2CC} \cdot x,\ z,\ \Sigma(z)\big) \tag{15}$$
Fig. 8. Video measurement with projected lidar echo (diamonds) and detection cluster (cyan)
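The existence probability of Eq. (13) and the video update of Eq. (14) can be sketched on a particle set as shown here. The sketch rests on assumptions that are not taken from the paper: the integral in Eq. (14) is approximated by a weighted sum over the particles, the covariance Σ(z) is taken as diagonal, and the function `h` is a hypothetical stand-in for the projection B·T_VC2CC·x of Eq. (15). The feature likelihoods passed to `detection_precision` would come from ground truth histograms such as those in Fig. 7.

```python
import math

def detection_precision(lik_a_vehicle, lik_s_vehicle, lik_a_clutter, lik_s_clutter):
    """Eq. (13): probability that a detection is a vehicle, given the
    class-conditional likelihoods of the amplitude (a) and size (s) features."""
    vehicle = lik_a_vehicle * lik_s_vehicle
    clutter = lik_a_clutter * lik_s_clutter
    return vehicle / (vehicle + clutter)

def gaussian_2d(z, mean, sigmas):
    """Axis-aligned 2-D Gaussian density (diagonal covariance assumed)."""
    du = (z[0] - mean[0]) / sigmas[0]
    dv = (z[1] - mean[1]) / sigmas[1]
    norm = 2.0 * math.pi * sigmas[0] * sigmas[1]
    return math.exp(-0.5 * (du * du + dv * dv)) / norm

def video_update(particles, detections, p_s, h):
    """Eq. (14) on weighted particles.

    particles  : list of (state, weight)
    detections : list of (z, sigmas, p_p) with image position z, per-detection
                 standard deviations and detection precision p_p(z)
    p_s        : detector sensitivity
    h          : maps a state to its expected image position (hypothetical)
    """
    likelihoods = [
        [gaussian_2d(z, h(x), sigmas) for x, _ in particles]
        for z, sigmas, _ in detections
    ]
    # Particle approximation of the integral in the denominator of Eq. (14).
    norms = [
        sum(l * w for l, (_, w) in zip(row, particles)) for row in likelihoods
    ]
    updated = []
    for i, (x, w) in enumerate(particles):
        gain = 1.0 - p_s
        for row, norm, (_, _, p_p) in zip(likelihoods, norms, detections):
            if norm > 0.0:
                gain += p_p * row[i] / norm
        updated.append((x, gain * w))
    return updated
```

In this form each detection adds exactly p_p(z) to the total PHD mass, so a detection with low existence probability raises the expected target count only slightly.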
4.3 Target Extraction
The extraction of targets, and thereby the decision making, is the final task within the system. To extract multi-target state estimates from the PHD surface that is given by a particle approximation, the EM algorithm [7] is a popular
choice. The EM algorithm approximates the PHD surface with a Gaussian mixture in order to supply the mean and covariance of the target states to the application. We modified the EM algorithm commonly used in the literature [8] so that the mixture weights represent the PHD mass covered by the associated kernel instead of summing to unity. Thus the mixture weights are interpreted as existence probabilities and are therefore the natural choice to be thresholded for the final detection decision. All current measurements as well as the extracted targets of the previous time step define the initial kernel positions μ_j in the state space. All kernels start with the same covariance P_j and have equal weights v_j. Given a weighted particle set (x_i, w_i), the EM algorithm iteratively computes the following equations until no significant change in μ_j, P_j or v_j occurs between two steps:
$$e_{ij} = \frac{w_i\, v_j\, \mathcal{N}(x_i, \mu_j, P_j)}{\sum_{l=1}^{M} v_l\, \mathcal{N}(x_i, \mu_l, P_l)} \tag{16}$$
$$v_j = \sum_{i=1}^{N} e_{ij}, \qquad \mu_j = \frac{1}{v_j} \sum_{i=1}^{N} e_{ij}\, x_i, \qquad P_j = \frac{1}{v_j} \sum_{i=1}^{N} e_{ij}\, (x_i - \mu_j)(x_i - \mu_j)'$$
Fig. 9. Receiver operating characteristics curve of the vehicle detection system (left) and evaluation set sizes (right)
            IMAGES   SEQUENCES
learn set     2862          55
test set      2766          15
total         5628          70
Fig. 10. Algorithm result (top): a Gaussian mixture approximates position (blue crosses) and velocity (arrows) from the joint multi-target PHD particle set (pink); bottom: estimated target positions (red asterisks) and the host vehicle trace (blue line) recorded over time for four targets
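The modified EM iteration of Eq. (16) can be illustrated with a short sketch. To stay within the Python standard library it uses a one-dimensional state, so the covariance P_j reduces to a variance; it also runs a fixed number of iterations instead of testing for convergence. Both are simplifications made here, and the names `em_phd`, `var0` and `min_var` are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_phd(particles, means, var0=1.0, iters=50, min_var=1e-6):
    """Fit a Gaussian mixture whose weights v_j carry PHD mass (Eq. 16).

    particles : list of (x_i, w_i) with scalar states
    means     : initial kernel positions mu_j
    Returns a list of kernels (v_j, mu_j, var_j).
    """
    total = sum(w for _, w in particles)
    # All kernels start with the same variance and equal weights.
    kernels = [(total / len(means), mu, var0) for mu in means]
    for _ in range(iters):
        # E-step: e_ij shares the particle weight w_i among the kernels.
        resp = []
        for x, w in particles:
            dens = [v * normal_pdf(x, mu, var) for v, mu, var in kernels]
            s = sum(dens)
            resp.append([w * d / s if s > 0.0 else 0.0 for d in dens])
        # M-step: weights are not normalized, so they sum to the PHD mass.
        updated = []
        for j in range(len(kernels)):
            v = sum(r[j] for r in resp)
            if v <= 0.0:
                continue
            mu = sum(r[j] * x for r, (x, _) in zip(resp, particles)) / v
            var = sum(r[j] * (x - mu) ** 2 for r, (x, _) in zip(resp, particles)) / v
            if var < min_var:
                continue  # kernel with singular covariance is deleted
            updated.append((v, mu, var))
        kernels = updated
    return kernels
```

Applying a threshold such as `[k for k in em_phd(particles, means) if k[0] > 0.5]` then keeps only the kernels whose accumulated PHD mass is high enough to be reported as targets; the value 0.5 is an arbitrary example.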
Gaussian kernels whose covariance became singular during the iterations are deleted from the mixture set. To detect targets, a threshold is applied to the remaining kernel weights.
5. RESULTS AND FURTHER WORK
The detection performance of the system was measured in terms of sensitivity and precision and is shown as a Receiver Operating Characteristic (ROC) curve in Figure 9. The ROC was computed by varying the decision probability threshold; the demonstration system works at an operating point of 83% sensitivity with no false alarms on the test set. Figure 10 shows the system output in a traffic scene where four targets are tracked simultaneously.
A real-time feature-level sensor fusion system incorporating spatio-temporally aligned vision and multi-beam lidar measurements for robust vehicle detection has been developed and evaluated. The detection process does not rely on target motion, hence the system is applicable to traffic jam and stop-and-go applications. Further development focuses on radar-video cross calibration and the integration of radar features in the decision process to improve the system performance, especially in bad weather conditions. Besides showing the system's capability for night vision ACC, we will investigate closed-form solutions of the PHD filter with linearized process and measurement equations, as all models are Gaussian.
ACKNOWLEDGEMENT
The presented research was supported by NIRWARN (Near-Infrared Warning), BMBF 01M3157B, Germany.
REFERENCES
[1] R. Mahler, “Random sets: Unification and computation for information fusion - a retrospective assessment”, In Proc. Intl. Conf. on Inf. Fusion, pages 1–20, Stockholm, Sweden, 2004.
[2] T. Zajic, R. Mahler, “A particle-systems implementation of the PHD multitarget tracking filter”, In Proceedings of SPIE, Signal Processing, Sensor Fusion, and Target Recognition XII, pages 291–299, 2003.
[3] N. Kaempchen, K.C.J. Dietmayer, “Data synchronization strategies for multi-sensor fusion”, Proceedings of ITS 2003, 10th World Congress on Intelligent Transport Systems, Madrid, Spain, November 2003.
[4] P. Viola, M. Jones, “Robust real-time object detection”, In Second International Workshop on Statistical and Computational Theories of Vision - Modeling, Learning, Computing, and Sampling, Vancouver, Canada, July 13, 2001.
[5] I. Kallenbach, R. Schweiger, G. Palm, O. Loehlein, “Multi-Class Object Detection in Vision Systems Using a Hierarchy of Cascaded Classifiers”, In Proc. IEEE Intelligent Vehicles Symposium, Tokyo, 2006.
[6] B. Mourllion, D. Gruyer, A. Lambert, “Variance Behaviour and Signification in Probabilistic Framework Applied to Vehicle Localization”, In Proc. IEEE Intelligent Vehicles Symposium, Tokyo, 2006.
[7] A.P. Dempster, N.M. Laird, D.B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, J. Royal Statist. Soc. Ser. B, 39:1–38, 1977.
[8] O. Erdinc, P. Willett, Y. Bar-Shalom, “Probability hypothesis density filter for multitarget multisensor tracking”, In Proceedings of International Conference on Information Fusion, Philadelphia, USA, 2005.
Technical Correspondence
SEFS — A Swedish IVSS Initiative on Sensor Data Fusion
The research initiative “SEFS – SEnsor Fusion for Safety systems” is part of the Swedish IVSS (Intelligent Vehicle Safety Systems) programme.
Project Aim
In current automotive safety systems, the state of the own vehicle is often well known, while information about the vehicle’s surroundings is missing. This makes it very hard to detect potentially dangerous traffic situations. For this reason, external sensors have been introduced. Typical examples are radar and camera systems. These different sensing technologies have individual strengths and weaknesses. To fulfil the objectives of future automotive safety systems, methods for fusing sensor data are required both to reduce cost, system complexity and the number of components involved, and to increase the accuracy and confidence of sensing. The SEFS project therefore focuses on the development of methods and algorithms for sensor data fusion to provide a consistent perception of the environment that can be utilized by a set of different safety and comfort applications.
Project Partners
The SEFS project is a cooperation of industrial partners at Volvo Technology AB, Volvo 3P, Volvo Car Corporation and Mecel AB as well as researchers at Chalmers University of Technology and at Linköping University. SEFS involves four Ph.D. students both at universities and industrial partners. The project started in January 2005 and will continue until December 2009.
Methods
The objective of the SEFS project is to determine a consistent representation of the environment of the ego vehicle based on different sensor observations. The perception of the environment includes detecting, tracking and classifying surrounding objects as well as recognizing lanes and observing the own position on the road. The data fusion task has been structured as shown in the SEFS fusion structure in Figure 1. From the sensors, observations are received mainly on an object level. Based on them, an approach with data association and state estimation is applied with a parallel classification of objects and scenarios. Recent research results on scenario classification are described in [1] and [2]. From the estimated states and classification results, a description of the environment is derived that combines all of the original sensor observations into a consistent representation.
In the SEFS project, two demonstrators, a car and a truck (see Figure 2), will each be equipped with a set of sensors. For the Volvo car, the aim is to provide a perception all around the ego vehicle by different types of sensors, with a focus on vision and radar systems. The area in front of the car is of special interest and is therefore observed by both short- and long-range sensors. For the Volvo truck, sensor observations of the areas in front of and at the side of the vehicle are considered. In both cases, sensors with complementary observation areas are used to achieve good coverage. Additionally, particularly relevant areas are observed by a set of different sensors in parallel to increase the reliability of the resulting environment description.
Acknowledgement
The work of SEFS is funded by the Swedish IVSS initiative [3]. This support is hereby gratefully acknowledged.
Contact
Project Manager: Dr. Malte Ahrholdt, Volvo Technology
Email: malte.ahrholdt@volvo.com
References
[1] Lennart Svensson, Joakim Gunnarsson. A New Motion Model for Tracking of Vehicles. Proceedings of the 14th IFAC Symposium on System Identification, Newcastle, Australia, March 2006.
[2] Thomas B. Schön, Andreas Eidehall, Fredrik Gustafsson. Lane Departure Detection for Improved Road Geometry Estimation. Proceedings of the IEEE Intelligent Vehicles Symposium, Tokyo, Japan, June 2006.
[3] IVSS programme web site: http://www.ivss.se/
SEFS Demonstrator Vehicles and Sensor Observation Areas
Figure 1 – SEFS Fusion Structure
Sensor Data Fusion News
Networking session at the IST 2006 conference in Helsinki, November 2006
The IST 2006 conference takes place in Helsinki from the 21st to the 23rd of November 2006 (http://europa.eu.int/information_society/istevent/2006/conference/index_en.htm). IST 2006 is being held as the Commission launches FP7, its Seventh Framework Programme for Research and Development, so one of the main themes of the event will be FP7's ICT objectives and procedures. The Fusion Forum is organizing a networking session on the topic of sensor data fusion in order to continue the research and development beyond ProFusion2. The networking sessions at IST 2006 provide an open forum for exchanging views and ideas on how to address these challenges, and a meeting place for communities that possibly never worked together before. If you are interested in guiding, cooperating in or following the sensor data fusion development, go to the IST 2006 web site, register and express your interest in the networking session on Sensor Data Fusion (http://ec.europa.eu/information_society/istevent/2006/cf/itemlist.cfm?type=Session). The session will take place in Helsinki on the 22nd of November 2006, from 11:00am to 12:30pm. We are looking forward to meeting you there.
Sensor data fusion workshop at IEEE Intelligent Vehicles Symposium, June 2007
The 2007 IEEE Intelligent Vehicles Symposium (IV'07), an annual forum sponsored by the IEEE Intelligent Transportation Systems Society, will take place in Istanbul on June 13-15, 2007. The Fusion Forum will organize a workshop on the 12th of June, one day before the official opening of the Conference. The workshop includes two sessions of 4-5 presentations each, focusing on Sensor Data Fusion for automotive safety applications. All papers of the sessions pass through the peer-review procedure of IEEE and will be published in the Conference Proceedings. The General Chair of the 2 sessions is Dr. Heiko Cramer from the Technical University of Chemnitz.
The workshop will take place on June 12, 2007 at the Gümüssuyu, Taksim campus of Istanbul Technical University, within walking distance of the symposium venue. Information on the Conference can be found at: http://www.iv2007.itu.edu.tr/ You will soon receive more information on the submission procedure for the workshop.
1st Fusion Forum Workshop shows the path for sensor fusion deployment 8th of March 2006, Brussels
More than 70 sensor fusion experts from 50 organisations gathered on the 8th of March at the Volvo premises in Brussels for the first Fusion Forum workshop. A presentation from the ProFusion2 Consortium explaining its progress in creating a framework for data fusion was met with great interest. Such a framework will allow safety applications to combine sensory data derived from various sources such as radars, cameras, etc. This will give the applications more reliable and accurate information to work with, resulting in better and more dependable systems. The goal is to obtain a wide view in space and time of the vehicle’s environment, using a limited number of low-cost sensors.
The ProFusion2 workshop took place under the activities of the Fusion Forum, which includes organizations and experts from Europe. Through the Fusion Forum, ProFusion2 will ensure the exchange of expertise from diverse groups, such as OEMs, suppliers and research communities, and complement the expertise of PReVENT partners by bringing them together with sensor data fusion experts.
The topics of the 1st Fusion Forum Workshop were:
• Research advances on sensor data fusion in automotive safety applications
• Deployment of sensor data fusion in future applications
• Presentation of the ProFusion2 project and its preliminary results
• Establishment and promotion of the Fusion Forum activities
In the workshop, two keynote speeches were given. Prof. Fredrik Gustafsson from Linköping University gave a talk on “Virtual sensors and situational awareness”, while Dr. David Schwartz from DELPHI gave a talk on “Practical Issues in Sensor Fusion for Automotive Safety”.
“Virtual sensors and situational awareness”
Abstract: Future versions of advanced driver assistance systems and collision avoidance systems can benefit from the progress of sensor fusion. The goal here is to make better use of available sensors, facilitate the introduction of new sensors and provide a flexible modular sensor architecture. The outputs from sensor fusion can either be virtual sensors based on indirect measurements of the physical quantity of interest, or a high-precision vehicle state estimate. The presentation will exemplify how a unified approach to target tracking, road prediction and vehicle state estimation can improve all tasks compared to truly decentralized solutions. Several examples of virtual sensors will also be given.
“Practical Issues in Sensor Fusion for Automotive Safety”
Abstract: “Practical issues in sensor fusion for automotive safety are presented based on Delphi's experience developing adaptive cruise control, warning, and mitigation systems. Many of the classical issues in defence systems for multisensor data fusion are not significant issues in the automotive applications; however there are equally many new challenging problems. We will review our current fusion approach and discuss both what currently works, where there are key needs for improvement, and suggestions of possible approaches.”
In the workshop, two panel discussions took place in parallel. The participants of the workshop were encouraged to attend the session of their interest and contribute actively to the discussions. The panel discussions were moderated by members of the ProFusion2 Consortium; fusion experts were invited to give short talks and introduce fusion topics for discussion. The outcome of the sessions was presented by the moderators in the plenary session.
Session: Research activities on Sensor Data Fusion
The panel discussion on research on Sensor Data Fusion began with 3 presentations, by Michel Parent (INRIA) on “Sensor Fusion on Road
Vehicles: State of the Art and Future Needs”, Hermann Rohling (TUHH) on “Pedestrian Recognition based on Data Fusion Techniques” and Alexander Kirchner (VW) on “Fusion Architecture – What is the best Fusion System?”. After these 3 presentations, a discussion on the need for research on Sensor Data Fusion took place. The most important points of this discussion are reported below.
To design Safety and Comfort Functions for automotive applications, there is a need to perceive the vehicle's surroundings. Today, high-performance sensors are available for Safety Functions on board vehicles. But for most functions, no sensor alone can fulfil the requirements of Safety Functions. So, combinations of sensors and Sensor Data Fusion are needed to achieve the level of reliability required for Safety Functions.
The first part of the discussion focused on the technical framework for robust and reliable multi-sensor perception designed in ProFusion1. This framework addresses several needs: it should be seen as a conceptual framework that provides a common language regarding perception and Sensor Data Fusion. It also proposes an organisation of the different components of a perception system. Finally, it is a first step towards standardisation (at least on a conceptual level), answering a strong need for standardisation of the different components of a perception system and of the interfaces between these components.
Another important issue addressed during the discussion is the need for a common physical environment model. This physical environment model constitutes a common interface between the perception system and the application. It should also be able to provide the required information about the environment for the Safety and Comfort applications.
In conclusion, the panel discussion addressed both the design and development of a technical framework for robust and reliable multi-sensor perception and the design of a common physical environment model.
Olivier Aycard, GRAVIR-IMAG & INRIA, Session Chair
Session: Application Requirements for Sensor Data Fusion
The panel discussion on application requirements was introduced by presentations of Reiner Wertheimer (BMW) on “Motivation and Requirements for Sensor Data Fusion”, Dirk Linzmeier (DaimlerChrysler) on “Sensor Data Fusion applications” and Uwe Kaiser-Dieckhoff (BOSCH) on “Difference in requirements for safety and convenience driver assistance functions on sensor data fusion”. The panel discussion is summarized below.
The main reason why environment sensors are considered in automotive applications is the demand for a perception of the vehicle's surroundings for active safety and comfort applications. Data fusion methods are then employed to systematically combine information from different sources into a common perception. One requirement on sensor data fusion systems identified during this session is a modular, scalable and configurable system framework.
From the viewpoint of applications, different functions place different requirements on the data fusion system. Uwe Kaiser-Dieckhoff (BOSCH) illustrated this by comparing comfort and safety functions. Here, different requirements with regard to both the observation and the processing models can be found, such that the data fusion system, if used for several functionalities, has to be compatible with a set of different applications.
Another, perhaps even more important, issue is the question of reliability. One of the main reasons for the use of data fusion architectures is the demand for high-reliability perception. On the other hand, data fusion also increases system complexity, which in turn might jeopardize the expected reliability gain. The question arises how single sensor failures or misdetections shall be dealt with. This question cannot be answered without returning to the applications’ requirements, since different applications have different expectations, e.g. on the sensor system’s false alarm rate. Reiner Wertheimer (BMW) suggested addressing this issue by using low-level sensor data instead of hard detection decisions made by each individual sensor’s signal processing. For the modelling part, robust approaches have been requested to allow the sensor system to deal even with complex situations or misleading perceptions.
In summary, both a modular framework and a reliable and robust fusion approach were the core requirements identified during the panel discussion.
Malte Ahrholdt, Volvo Technology, Session Chair
PReVENT Fusion Forum Call For Papers for the 2nd e-Journal
PReVENT is an Integrated Project (IP) co-funded by the European Commission DG INFSO to contribute to road safety. ProFusion2 is the European automotive sensor data fusion research arena. It is the PReVENT project which works on sensor data fusion (SDF), developing a common SDF framework for automotive active safety applications and carrying out research on environment modeling and data fusion algorithms.

OBJECTIVES
ProFusion2 will publish 3 Volumes of electronic scientific journals in July 2006, June 2007 and January 2008 on “Advances on Sensor Data Fusion for Automotive Safety Applications”. The objectives of the ProFusion2 e-Journal are:
• Publish unique peer-reviewed research papers on sensor data fusion for automotive environment and models.
• Promote research activities and developments of the ProFusion2 project and, in general, inform about ProFusion2 and PReVENT sensor data fusion activities.
• Promote the use of sensor data fusion to the automotive industry by invited or submitted contributions from the industry.
All volumes will be sent to the ProFusion2 Fusion Forum, i.e. the key people in automotive sensor data fusion. The e-Journal will also be available on-line at http://www.prevent-ip.org/profusion

Topics of interest for the research contributions are (not limited to):
• data association
• target tracking
• environment models
• sensor data fusion
• target recognition
• target classification
• situation awareness
• fusion ontologies
• performance evaluation of SDF
• SDF architectures
• sensor networks
• theoretical advances on SDF
• artificial intelligence

SUBMISSION PROCEDURE
Regular papers as well as correspondences in the above topics are actively encouraged. Authors are invited to submit papers describing advances, applications and new concepts on sensor data fusion. Manuscripts should be submitted in MS Word format to PreventFusionForum@iccs.gr. The research papers should not exceed 10 pages (Times New Roman, one column, single-spaced text, 12pt); the industrial and other contributions should not exceed 2 pages.
Topics of interest for the industrial contributions are (not limited to):
• deployment and market issues
• applications development using multisensor systems
• cost benefit analysis
• products and prototypes
• R&D industrial projects

CLOSING DATE
The closing date for all submissions for the 2nd Volume is the 1st of April 2007. Notification of acceptance is the 1st of May 2007. Final manuscripts are due on the 1st of June 2007. All other material (news, technical reports, correspondence, etc.) that you may want to publish should be sent to PreventFusionForum@iccs.gr 1 month prior to the publication of the e-Journal. For the 2nd Volume, the closing date for such material is the 1st of May 2007.
www.prevent-ip.org prevent@mail.ertico.com
PReVENT Fusion Forum 2nd Workshop
Paris 14-15 March 2007
The PReVENT/ProFusion2 Consortium is pleased to invite you to attend the 2nd Fusion Forum Workshop, which will take place at the DELPHI Paris Office on the 14th and the 15th of March 2007. The first Fusion Forum workshop took place successfully in Brussels in March 2006 and gathered 80 key persons from industry and academia. More information is available at http://www.prevent-ip.org/profusion (Events section).

PReVENT is an Integrated Project (IP) co-funded by the European Commission DG INFSO to contribute to road safety. ProFusion2 is a sub-project of PReVENT and works on sensor data fusion (SDF), developing a common SDF framework for automotive active safety applications and carrying out research on environment modeling and data fusion algorithms for object tracking.

Objectives
The topics of the 2nd Fusion Forum Workshop are:
• Research advances on sensor data fusion in automotive safety applications
• Deployment of sensor data fusion in future applications
• Presentation of the ProFusion2 project and its results
• Establishment and promotion of the Fusion Forum activities
The topics will be addressed through:
• Keynote speeches from academia and industry worldwide on sensor data fusion advances and deployment
• Presentations from the ProFusion2 Consortium on sensor data fusion activities
• A poster session demonstrating results from sensor data fusion activities

The workshop will start at 10.30am on the 14th of March and will finish at 5pm on the 15th of March. A detailed agenda will be announced soon on the ProFusion website. The workshop will host only plenary talks; one hour per day will be devoted to a poster/demonstration session. Should you want to present your work through a poster or a demonstration, please contact us at PreventFusionForum@iccs.gr.
Workshop Location
Delphi France
64 avenue de la Plaine de France
BP 65059 Tremblay en France, 95972 ROISSY CDG Cedex

Registration
Please register for the workshop at http://www.prevent-ip.org/profusion. For any additional information, please send an e-mail to PreventFusionForum@iccs.gr. We recommend that you register as soon as possible, since space is limited and priority will be given to the earliest registrations received. Closing date: 28/2/2007

Registration Fee
There is no registration fee for attending the workshop.

Contact
ProFusion2 leader: Dr. Su-Birm Park, DELPHI ELECTRONICS & SAFETY, Tel: +49 202 291 4484, e-mail: su.birm.park@delphi.com
ProFusion2 dissemination manager: Dr. Aris Polychronopoulos, ICCS, GREECE, Tel: +30 2107723865, e-mail: arisp@iccs.gr
ProFusion
For several years, research and development (R&D) activities have continued to develop and improve preventive safety equipment and functions. More recent advances have led to an ‘agreement’ on the utilization of common sets of sensors for distinct safety applications and on sensor data fusion to improve the performance and robustness of functions. As IP PReVENT addresses all preventive safety issues and develops functions that widely rely on multi-sensor approaches and sensor data fusion, it is planned to have cross-functional activities that ensure and encourage the use of multi-sensor systems beyond the current state-of-the-art. This is the role of the ProFusion subproject, which follows a two-step procedure. During a short initial phase of 6 months, the subproject ProFusion I was in close connection with all vertical subprojects (VSPs) in PReVENT that lead activities related to sensors and sensor data fusion. At the end of this preparatory phase, recommendations for a proposal on sensors and sensor data fusion concepts were given. In a second phase, the R&D-oriented project ProFusion II is focusing on research work of common interest in the field of sensors and sensor data fusion.
Contact Details
Fusion Forum e-Journal Editorial board
Chief Editor: Dr. Aris Polychronopoulos, arisp@iccs.gr
Associate Editors:
Dr. Angelos Amditis, a.amditis@iccs.gr
Dr. Olivier Aycard, olivier.aycard@inrialpes.fr
Dr. Erich Fuchs, fuchse@forwiss.uni-passau.de
Kay Furstenberg, kf@ibeo-as.de
Dr. Su-Birm Park, su.birm.park@delphi.com
Dr. Ullrich Scheunert, ullrich.scheunert@etit.tu-chemnitz.de
Thomas Tatschke, tatschke@forwiss.uni-passau.de
Editing by Niki Boutsikaki (ICCS)
Fusion Forum contact: PreventFusionForum@iccs.gr