Multi-Nodal, Multi-Variable, Spatio-Temporal Datasets

April 21, 2011

Multi-Nodal, Multi-Variable, Spatio-Temporal Datasets are large-scale datasets encountered in real-world data-intensive environments.

Example Dataset #1

A basic example is the heat distribution within a factory chimney. Heat sensors are distributed throughout the chimney, and readings are taken at periodic intervals. Since the laws of thermodynamics within a chimney are well understood, the interaction between the monitoring devices can be modeled. Predictive analysis could conceivably be performed on the dataset, and chimney cracks could be detected, or even predicted, in real time.

In this scenario, the dataset consists of 1) multiple sensors or data acquisition devices, 2) multiple spatial locations, and 3) temporally separated samples. When a sensor fails, it is simply excluded from processing until it is repaired (during plant maintenance).
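As a rough illustration, the three dimensions and the failed-sensor rule could be sketched in Python as below. This is only a minimal sketch: the `Reading` record, the `active_readings` helper, the one-dimensional sensor height, and the temperature values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One sample: which sensor, where it sits, when it was taken, and the value."""
    sensor_id: str        # hypothetical sensor name
    height_m: float       # position along the chimney (assumed 1-D layout)
    timestamp_s: float    # sample time in seconds
    temperature_c: float  # illustrative value, not real data


def active_readings(readings, failed_sensors):
    """Keep only readings from sensors that are not currently marked as failed."""
    return [r for r in readings if r.sensor_id not in failed_sensors]


readings = [
    Reading("s1", 2.0, 0.0, 180.5),
    Reading("s2", 6.0, 0.0, 165.2),
    Reading("s3", 10.0, 0.0, 150.9),
]

# Sensor s2 has failed, so it stays out of processing until it is repaired.
failed = {"s2"}
usable = active_readings(readings, failed)
usable_ids = [r.sensor_id for r in usable]
```

Once the sensor is repaired, removing its name from `failed` brings its readings back into processing.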

Example Dataset #2

A second example is the interconnected river and lake levels within a single geographic area. Distinct monitoring points are located at specific geo-spatial locations, and these points are related to one another through interconnected transfer functions and models. Each monitoring point consists of multiple data acquisitions, each sampled at random (or predetermined) intervals.

As a result, the dataset consists of 1) multiple sensors, 2) multiple spatial locations, and 3) temporally separated samples. In this scenario, sensors may fail or go temporarily offline in a random, unpredictable manner. Such sensors must be excluded from processing until their data becomes valid again. Because the sensor locations are interconnected and the sensors are interrelated, sufficient redundant data may be present to permit suitable analytical processing even when some data is missing.
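To make the redundancy idea concrete, here is a minimal Python sketch that estimates the level at an offline monitoring point from its valid neighbors. It uses inverse-distance weighting purely as a stand-in for the transfer functions mentioned above, which are not specified here. The `estimate_level` name, the coordinates, and the levels are hypothetical.

```python
import math


def estimate_level(target_xy, neighbors):
    """Inverse-distance-weighted estimate of the level at target_xy.

    neighbors is a list of ((x, y), level) pairs; a level of None marks a
    station that is offline and must be left out of the calculation.
    Returns None when no valid neighbor is available.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), level in neighbors:
        if level is None:
            continue  # offline station: excluded until its data is valid again
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return level  # a valid station sits exactly at the target
        weight = 1.0 / distance
        weighted_sum += weight * level
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Two valid stations at equal distance and one offline station (made-up values).
neighbors = [((1.0, 0.0), 10.0), ((-1.0, 0.0), 12.0), ((0.0, 2.0), None)]
estimate = estimate_level((0.0, 0.0), neighbors)
```

A real system would replace the distance weights with the hydrological model linking the rivers and lakes. The point is only that the offline station drops out while the remaining stations still produce an estimate.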

Example Dataset #3

The most complex example could be aerial chemical contamination sampling. In this scenario, the chemical distribution changes continuously as a result of understood, but not fully predictable, weather behavior. Sampling devices would consist of 1) airborne sampling devices (balloons) providing specific, limited sample sets, 2) ground-based mobile sampling units (trucks) providing extensive sample sets, and 3) fixed (pole-mounted) sampling units whose data is downloaded at relatively long intervals (hours or days).

In this scenario, multiple non-uniform data sampling elements are placed at non-uniformly distributed (and sometimes mobile) positions, with data collection performed in a fully asynchronous fashion. This data cannot be stored in flat-table structures, and the dataset must carry enough relevant information to fill in the gaps in the data.
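One possible way to picture a non-flat layout is sketched below in Python. Each sample carries its own position and its own set of measured quantities, and the asynchronous per-platform streams are merged into a single time-ordered sequence. The `Sample` record, the platform names, the coordinates, and the measurement values are assumptions made for illustration, not real data.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Sample:
    """One sample; ordering uses the timestamp only."""
    timestamp_s: float
    platform: str = field(compare=False)      # hypothetical platform name
    position: tuple = field(compare=False)    # (lat, lon, altitude_m) at sample time
    measurements: dict = field(compare=False) # varies by platform, so no fixed columns


# Each platform reports on its own schedule; each list is already time-ordered.
balloon = [
    Sample(10.0, "balloon-1", (45.00, -93.00, 1200.0), {"so2_ppb": 4.1}),
    Sample(70.0, "balloon-1", (45.01, -93.02, 1350.0), {"so2_ppb": 3.8}),
]
truck = [
    Sample(5.0, "truck-1", (45.10, -93.10, 0.0), {"so2_ppb": 6.0, "no2_ppb": 11.2}),
    Sample(40.0, "truck-1", (45.12, -93.08, 0.0), {"so2_ppb": 5.7, "no2_ppb": 10.9}),
]
pole = [
    Sample(60.0, "pole-1", (45.05, -93.05, 8.0), {"so2_ppb": 5.0}),
]

# Merge the sorted streams into one timeline without forcing a common schema.
timeline = list(heapq.merge(balloon, truck, pole))
order = [s.platform for s in timeline]
```

Because every sample keeps its own position and measurement set, a moving truck and a fixed pole can sit in the same timeline, and a later gap-filling step can see exactly where and when each value was taken.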

