Data Hub

The ICM Data Hub, shown in red in the figure below, receives data from each of the field sources.

The primary functions of the Data Hub are:

  • Receive data from field elements via existing corridor traffic management centers and regional data networks: Data receivers accept incoming data and prepare it for processing by the ICM system. Each receiver is generally expected to be built for the specific interface defined by its field data source, covering both the transmission method (REST or SOAP web services, socket-based streaming, file-based, or others) and the initial processing path the data must follow within the system. The preferred method of information transfer for the system is the Traffic Management Data Dictionary (TMDD) standard developed by the Institute of Transportation Engineers (ITE), currently at version 3.03d (with certain modifications).
  • Process data received from field elements: Data from field elements must be validated for completeness and data quality prior to use by downstream system components. Because many different sources often report on the same type of field element, data must also be transformed into a common set of data semantics. The Data Hub converts all incoming data into a standardized format (TMDD for transportation assets, GTFS for transit information, or others depending upon the data being received) with a common set of data definitions, such as a single naming standard for all streets within the corridor. Note, however, that although this standardized format maintains the TMDD data structures, it does not generally maintain an XML data format, and internal communications within the Data Hub (and the DSS) do not use SOAP protocols.
  • Data messaging and communications: All data within the ICM system, including data provided to downstream systems and internal system command, control, and status data, is made available via an internal data bus. The specific messaging technology is chosen based on the type, size, frequency, and message persistence requirements of the data, data producer, and data consumers. Exceptions to these methods within the Data Hub are limited and are generally used for bulk data archiving or transport of large data blocks between persistence stores. To accommodate multiple CMS vendors, and to provide communication between the Data Hub's Kafka messaging system and the DSS ActiveMQ messaging system, the Data Hub includes an Apache Camel based interface to the DSS and CMS systems.
  • Data persistence: The Data Hub also provides data persistence capabilities, allowing raw and processed data, system command/control/status, and other system information to be stored in a central repository. This data persistence is broken into different time layers, including live real-time data, young data (0-90 days), aggregated data, and archived data. Because data persistence within the Data Hub is limited to operational uses, and the architecture is based on a pattern of core services connected by messaging, the Data Hub uses multiple data persistence technologies, each selected for the needs of the system and the data being stored.
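
As an illustration of the processing and persistence functions listed above, the following minimal sketch validates a raw field reading for completeness and quality, maps its street name onto a single naming standard, and assigns it to a time layer by age. It is not the Data Hub implementation. The record fields, the alias table, the 0-120 mph plausibility range, and the 5-minute "live" window are all hypothetical, chosen only to make the example concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical alias table: maps source-specific spellings to one corridor-wide name.
STREET_ALIASES = {
    "main st": "Main Street",
    "main street": "Main Street",
    "first ave": "First Avenue",
    "1st ave": "First Avenue",
}

# Hypothetical set of fields a reading must carry to be considered complete.
REQUIRED_FIELDS = ("source_id", "street", "timestamp", "speed_mph")


@dataclass(frozen=True)
class DetectorReading:
    """Common internal record shared by all sources (illustrative only)."""
    source_id: str
    street: str
    timestamp: datetime
    speed_mph: float


def normalize_street(name: str) -> Optional[str]:
    """Return the standard street name, or None if the name is unknown."""
    key = " ".join(name.lower().replace(".", "").split())
    return STREET_ALIASES.get(key)


def process(raw: dict) -> Optional[DetectorReading]:
    """Validate and normalize one raw reading; return None if it is rejected."""
    # Completeness check: every required field must be present.
    if any(raw.get(field) is None for field in REQUIRED_FIELDS):
        return None
    street = normalize_street(str(raw["street"]))
    if street is None:
        return None
    try:
        speed = float(raw["speed_mph"])
        ts = datetime.fromisoformat(raw["timestamp"])
    except (TypeError, ValueError):
        return None
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    # Quality check: assumed plausibility range for a speed reading.
    if not 0.0 <= speed <= 120.0:
        return None
    return DetectorReading(str(raw["source_id"]), street, ts, speed)


def persistence_layer(ts: datetime, now: datetime) -> str:
    """Pick an age-based layer; aggregated data is derived separately, not by age."""
    age = now - ts
    if age <= timedelta(minutes=5):  # assumed width of the "live" window
        return "live"
    if age <= timedelta(days=90):  # "young data (0-90 days)"
        return "young"
    return "archived"
```

A receiver would call process() on each decoded message and publish only non-None results to the internal data bus. Returning None rather than raising keeps one malformed message from interrupting a stream, although a production system would more likely route rejected records to a quality log.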

The figure below shows additional design details for the Data Hub: