Data Fusion

In a plausible sensor deployment for Smart Mobility in a Smart City, several traffic data collection technologies and data sources coexist. How "smart" the resulting information is depends on the sophistication and computational effort applied to processing the data. Roughly, the technologies that may coexist in this scenario are:

  • Magnetic loops, which generally measure flow, occupancy, speed and vehicle class
  • Alternative technologies that provide measurements similar to those of magnetic loops, such as magnetometers, radars and CCTV with image processing
  • Vehicle identification (electronic footprint) technologies, such as cameras with license plate recognition, Bluetooth/WiFi sensors and readers of electronic IDs (tags). These sensors measure the travel time and speed between consecutive sensors.
  • Automatic Vehicle Location: GPS+3G or similar
  • Cooperative systems: mid- to long-term technological possibilities that fall into two generic families, V2V (vehicle-to-vehicle) and V2I (vehicle-to-infrastructure)
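
As an illustration of how identification sensors yield travel times, the following minimal sketch matches device IDs seen at two consecutive sensors and converts the time difference into a speed. The input layout, the function name `segment_speeds_kmh` and the sample values are assumptions made for this example, not part of the Betterways implementation.

```python
from datetime import datetime

def segment_speeds_kmh(upstream, downstream, distance_km):
    """Match device IDs seen at two consecutive sensors; return speeds in km/h.

    upstream / downstream: dict mapping an anonymized device ID to the
    datetime at which that sensor detected it (hypothetical input layout).
    """
    speeds = {}
    for device_id, t_up in upstream.items():
        t_down = downstream.get(device_id)
        if t_down is None or t_down <= t_up:
            continue  # not re-identified, or travelling in the other direction
        travel_time_h = (t_down - t_up).total_seconds() / 3600.0
        speeds[device_id] = distance_km / travel_time_h
    return speeds

# Made-up detections at two sensors placed 1 km apart.
upstream = {
    "a1": datetime(2024, 1, 1, 8, 0, 0),
    "b2": datetime(2024, 1, 1, 8, 0, 30),
    "c3": datetime(2024, 1, 1, 8, 1, 0),   # never seen downstream
}
downstream = {
    "a1": datetime(2024, 1, 1, 8, 2, 0),   # 120 s after the upstream detection
    "b2": datetime(2024, 1, 1, 8, 3, 30),  # 180 s after the upstream detection
}
speeds = segment_speeds_kmh(upstream, downstream, distance_km=1.0)
```

Only re-identified devices contribute a speed, which is why the penetration rate of the technology determines how many samples are available per interval.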

In practice, all these scenarios share the following situation:

  • Different types of sensors that measure the same traffic variables with different degrees of precision and/or time aggregation
  • Different types of sensors that measure different traffic variables with different precision and/or time aggregation

The result is a set of heterogeneous measurements, to which complementary data such as weather data, events or seasonal variations may be added.

The immediate questions are: what should be done with these heterogeneous data sets? How can the most complete, accurate and efficient traffic information be extracted from them?

The Betterways proposal relies on two components:

  • INTEGRATION: creation of profiles that correspond to homogeneous typologies, defined by factors external to the traffic system that shape them, such as day type, time slice or weather conditions. The number and variety of profiles depends on the amount of historical data and on the typologies observed.
  • MULTISENSOR DATA FUSION: perhaps the definition that best captures the concept is “A technique by which data from several sensors are combined through a centralized data processor to provide comprehensive and accurate information” or, as Professor Van Lint of Delft University puts it concisely, “The science of extracting valuable information from raw traffic data”.

The first steps of the process generate the integrated database, which combines data from the different technologies with data on the different types of events in order to generate the profiles.
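
Profile generation can be sketched as a grouping of historical records by the external factors that define each typology. The record layout, the grouping keys (day type and time slice) and the use of a plain average are illustrative assumptions, not a description of the actual Betterways database.

```python
from collections import defaultdict
from statistics import mean

def build_profiles(records):
    """Average historical flow per (day_type, time_slice) key.

    records: iterable of (day_type, time_slice, flow_veh_h) tuples; the
    field names and grouping keys are assumptions made for this sketch.
    """
    grouped = defaultdict(list)
    for day_type, time_slice, flow in records:
        grouped[(day_type, time_slice)].append(flow)
    return {key: mean(values) for key, values in grouped.items()}

# Made-up historical flows in vehicles per hour.
history = [
    ("weekday", "08:00", 1200), ("weekday", "08:00", 1300),
    ("weekday", "08:15", 1400),
    ("weekend", "08:00", 500), ("weekend", "08:00", 700),
]
profiles = build_profiles(history)
```

Further keys, such as weather conditions, could be added to the grouping tuple as long as enough historical data exists for each combination.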

There is broad consensus among experts on the fundamental benefits of Data Fusion techniques, which Dailey (1996) and Ou (2011) summarize as follows:

  • Increased reliability: more than one sensor can confirm the same target
  • Reduced ambiguity: joint information from multiple sensors narrows the set of hypotheses about the target
  • Improved detection: integrating multiple measurements of the same target improves the signal-to-noise ratio and thus the confidence in the detection
  • Increased robustness: one sensor can provide information where others are absent, out of service or ineffective
  • Improved spatial and temporal coverage: one sensor works where or when another cannot
  • A single data sequence produced from several input series
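
The gain in reliability and signal-to-noise ratio can be illustrated with inverse-variance weighting, one standard way of combining independent measurements of the same quantity. This is a minimal sketch: the variances and readings are made up, and the source does not state that Betterways uses this particular estimator.

```python
def fuse_inverse_variance(measurements):
    """Fuse independent estimates of one quantity.

    measurements: list of (value, variance) pairs. Returns the pair
    (fused_value, fused_variance), assuming independent, unbiased errors.
    """
    weights = [1.0 / variance for _, variance in measurements]
    total = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, measurements)) / total
    return fused, 1.0 / total

# Hypothetical speed readings (km/h) for the same road section.
loop_speed = (52.0, 4.0)        # loop detector, error variance 4
bluetooth_speed = (48.0, 12.0)  # Bluetooth travel time, error variance 12
fused_speed, fused_var = fuse_inverse_variance([loop_speed, bluetooth_speed])
```

Under these assumptions the fused variance is always smaller than the smallest input variance, and the function still returns an estimate when only one sensor reports.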

There is also broad consensus on classifying Data Fusion techniques into three levels. Dailey (1996) describes their generic characteristics as follows:

| Fusion Level | General Method | Specific Technology |
|---|---|---|
| Level 1 | Data association | Gating techniques |
| Level 1 | Positional estimation | Kalman filter |
| Level 2 | Fusion through ID | Bayesian decision theory, Dempster-Shafer evidential reasoning, adaptive neural networks |
| Level 2 | Pattern recognition | Clustering methods |
| Level 3 | Artificial intelligence | Knowledge-based systems, blackboard architecture, fuzzy logic |

Varshney (1997) proposes a more elaborate version that is better suited to Data Fusion applications on traffic data.

| Fusion Level | Purpose | Method |
|---|---|---|
| Level 1 | Raw data processing | State estimation methods, digital filters, Kalman filter, particle filters, etc. |
| Level 2 | Derivation of distinctive characteristics and behavioural patterns | Classification/inference methods (statistical pattern recognition, Dempster-Shafer evidential reasoning, Bayesian methods, neural networks, correlation measurements, fuzzy set theory, ...) |
| Level 3 | Decision making and event/incident detection | Decision support systems, knowledge-based systems |

Level 1 fusion processes the raw data coming from sensors and estimates the basic states of the system under analysis. Its objective is to translate these data into the basic information that determines the traffic state of the system: flows, occupancies and speeds. The objective of Level 2 fusion is to derive distinctive characteristics and behavioural patterns from the Level 1 estimates. For vehicular traffic systems this means short-term prediction of the evolution of the traffic state, incident detection, etc. Level 3 fusion can be regarded as the decision-making level, based on the information provided by Level 1 and Level 2 fusion.
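
A scalar Kalman filter gives a feel for Level 1 state estimation. The sketch below tracks the speed on one road section under an assumed random-walk model; the noise variances, initial values and measurements are made up, and the filter is far simpler than anything a production system would use.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: prior state estimate and its variance
    z:    new measurement
    q, r: process and measurement noise variances (assumed known)
    The state model is a random walk: x_k = x_{k-1} + noise.
    """
    p_pred = p + q                  # predict: uncertainty grows by q
    gain = p_pred / (p_pred + r)    # Kalman gain
    x_new = x + gain * (z - x)      # correct the estimate with the innovation
    p_new = (1.0 - gain) * p_pred   # uncertainty shrinks after the update
    return x_new, p_new

# Start from a rough guess and assimilate four noisy speed readings (km/h).
speed, var = 50.0, 25.0
for z in [47.0, 46.0, 48.0, 45.0]:
    speed, var = kalman_step(speed, var, z, q=1.0, r=9.0)
```

After a few measurements the estimate moves towards the observed values and its variance falls below that of a single measurement.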

Ou (2011) proposes characterizing traffic data fusion methods as made up of two main components: a core and a shell.

The CORE represents the physical laws and hypotheses that underpin traffic flow theory, for instance:

  • Physical laws: conservation of the number of vehicles
  • Hypothesis: traffic flow is homogeneous in a given region of space-time
  • Laws and hypotheses: those that underpin first-order traffic flow models (continuity equations), the macroscopic fundamental diagram, etc.
  • Linear models
  • Hypothesis of Gaussian distributions for the model variables and the measurement errors

The SHELL represents the assimilation techniques, such as statistical and Kalman filter techniques, that combine models and data in an optimal way.
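
In symbols, one textbook instance of each component could look as follows. The notation is an assumption introduced here: density ρ, flow q and speed v for the core (written for a road stretch without entries or exits), and the standard linear Kalman measurement update for the shell, with state estimate x̂, covariance P, measurement z, observation matrix H and measurement noise covariance R.

```latex
% Core: conservation of vehicles (first-order continuity equation)
\frac{\partial \rho(x,t)}{\partial t} + \frac{\partial q(x,t)}{\partial x} = 0,
\qquad q = \rho\, v

% Shell: Kalman filter measurement update
K_k = P_{k|k-1} H^{\top} \left( H P_{k|k-1} H^{\top} + R \right)^{-1}
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( z_k - H \hat{x}_{k|k-1} \right)
P_{k|k} = \left( I - K_k H \right) P_{k|k-1}
```

The Kalman update is optimal in the least-squares sense precisely under the linear-model and Gaussian-error hypotheses listed for the core.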

Betterways has followed the approaches of Varshney (1997) and Ou (2011) to implement a data processing method with three fusion levels.

  • Level 1 processes the raw data provided by each sensor to obtain a valid and complete data set: free of outliers, and with no gaps caused by sensor errors or by the removal of outliers. Missing values are filled from a sequence of predicted data that is consistent with the measured sequence for the time interval being processed. A robust estimate and prediction of the traffic state can then be obtained from that valid data set.
  • The traffic state estimate feeds the Level 2 fusion procedures, which update the historical profiles with the newly obtained data. The historical profiles therefore keep improving, since they are continuously updated.
  • Finally, Level 3 fusion supports decision making through the use of dynamic traffic models.

Level 1 requires two preliminary processes:

  • Filtering of raw data: however refined and precise a technology is, there will always be outliers that must be filtered out. Otherwise the data series would contain biased observations, with deviations distorted by the outliers.
  • Missing data supply: removing outlier measurements leaves gaps in the observation time series, which must be filled to obtain complete series. This is the purpose of missing data models.
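
The two processes above can be sketched as follows. The median/MAD rule for outliers and the use of a historical profile to fill gaps are simplifying assumptions chosen for brevity; the source does not specify which filter or which missing data model Betterways applies.

```python
from statistics import median

def filter_outliers(series, threshold=3.0):
    """Replace outliers with None using a median/MAD rule (one possible choice)."""
    valid = [v for v in series if v is not None]
    med = median(valid)
    mad = median([abs(v - med) for v in valid])
    if mad == 0:
        return list(series)  # no spread to judge against; keep the series as is
    # 1.4826 scales the MAD to the standard deviation of a Gaussian.
    limit = threshold * 1.4826 * mad
    return [v if v is not None and abs(v - med) <= limit else None
            for v in series]

def fill_gaps(series, profile):
    """Fill missing values (None) with the profile value of the same time slot."""
    return [p if v is None else v for v, p in zip(series, profile)]

# Made-up flows (veh/h): one sensor dropout (None) and one spurious reading.
raw = [1200, 1250, None, 1230, 9000, 1280]
profile = [1190, 1240, 1260, 1250, 1270, 1290]
filtered = filter_outliers(raw)
complete = fill_gaps(filtered, profile)
```

The filtering step creates the second gap itself, which is why the two processes have to run in this order.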

A complementary application of the methods implemented by Betterways is space-time traffic state reconstruction, which interpolates between data points and removes high-frequency noise while preserving most of the relevant dynamic information. Applying this method to fuse heterogeneous data sources makes the reconstruction more robust and yields estimates at locations where a single data source would not be enough.
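
The idea of interpolating and smoothing over space-time can be sketched with a simple kernel-weighted average. This is a deliberately simplified, isotropic smoother with assumed kernel widths and source weights, not the reconstruction method used by Betterways.

```python
import math

def reconstruct_speed(x, t, observations, sigma_km=0.3, tau_min=1.0):
    """Kernel-weighted average of speed observations around the point (x, t).

    observations: list of (x_km, t_min, speed_kmh, weight) tuples, where
    weight expresses how much a data source is trusted (assumed values).
    """
    num = den = 0.0
    for x_i, t_i, v_i, w_i in observations:
        # Exponential kernel: nearby observations in space and time count more.
        kernel = math.exp(-abs(x - x_i) / sigma_km - abs(t - t_i) / tau_min)
        num += w_i * kernel * v_i
        den += w_i * kernel
    return num / den if den > 0 else None

# Made-up observations: two loop readings and one Bluetooth-derived speed.
observations = [
    (0.0, 0.0, 60.0, 1.0),  # loop at km 0.0
    (1.0, 0.0, 20.0, 1.0),  # loop at km 1.0
    (0.5, 2.0, 35.0, 0.5),  # Bluetooth trip, trusted less in this sketch
]
midpoint_speed = reconstruct_speed(0.5, 0.0, observations)
```

Because every source enters the same weighted average, a location with no loop nearby can still receive an estimate from Bluetooth data, and vice versa.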

Figure: (a) Loop data points in space-time (b) Average speeds of trips between Bluetooth sensors (c) Space-time traffic state reconstruction from loop data (d) Space-time traffic state reconstruction from Bluetooth data (e) Space-time traffic state reconstruction for a section of the network from all data sources (data from loops and trips between three Bluetooth sensors on Gran Vía avenue, 03/02/2012)

Finally, the estimation of the Macroscopic Fundamental Diagram (MFD) of Geroliminis and Daganzo (2008) is also included. From local data provided by roadside units (RSUs) deployed across the transport network, plus GPS data from vehicles, it generates a new piece of network-wide information that can be read as a measure of the global capacity of the network and of its state at a given moment.
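
One point of an MFD can be sketched as the length-weighted network averages of link flow and link density for a given time interval, a common way of aggregating local measurements. The link values below are made up, and the function name `mfd_point` is hypothetical.

```python
def mfd_point(links):
    """Length-weighted network averages of density and flow for one interval.

    links: list of (length_km, flow_veh_h, density_veh_km) tuples, one per
    monitored link (hypothetical input layout).
    """
    total_length = sum(length for length, _, _ in links)
    flow = sum(length * q for length, q, _ in links) / total_length
    density = sum(length * k for length, _, k in links) / total_length
    return density, flow

# Made-up link measurements for a single time interval.
links = [
    (1.0, 1000.0, 20.0),
    (2.0, 400.0, 50.0),
    (1.0, 800.0, 30.0),
]
network_density, network_flow = mfd_point(links)
```

Repeating the calculation for every interval of the day and plotting flow against density gives the diagram, whose maximum indicates the network-wide capacity.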