Archive for the ‘Real-Time’ category

Making of a Fly

February 6, 2012

A TED video about algorithms mentioned an unrealistic price on Amazon. Apparently two retailers had an out-of-control computer feedback loop.

One company, with lots of good customer points, is in the habit of selling products a little higher than the competition. Anyone’s guess why, but facts are facts – they routinely price merchandise about 25% higher than the competition (and rely on the customer experience points to pull customers away?).

Well, the competition routinely prices merchandise a little lower than the highest priced competitor: about 1% less.

So these computer programs began a game of one-upmanship. A $10.00 product was listed for $12.70 by the first company. Later in the day, the second company’s computer listed the same product for 1% less – $12.57. So the process repeated: $15.96 and $15.80. Then $20.07 and $19.87. The process continued until the book was listed for $23,698,655.93, plus shipping. (All numbers are illustrative.)

This story illustrates one of the challenges of automated feedback loops. An engineering instructor once explained it: if the loop gain is a positive value greater than 1, the output will either oscillate or latch up.
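The runaway loop is easy to reproduce. Below is a minimal Python sketch using the markups implied by the illustrative numbers above (seller A prices about 27% above the competition, seller B prices 1% below seller A); the combined gain of roughly 1.27 × 0.99 ≈ 1.257 per cycle is what drives the price toward infinity.

```python
# Sketch of the runaway pricing loop: each cycle, seller A reprices
# ~27% above seller B, then seller B reprices 1% below seller A.
def runaway_pricing(start_price, rounds):
    price_b = start_price
    history = []
    for _ in range(rounds):
        price_a = round(price_b * 1.27, 2)   # A: ~27% above the competition
        price_b = round(price_a * 0.99, 2)   # B: 1% below A
        history.append((price_a, price_b))
    return history

for a, b in runaway_pricing(10.00, 3):
    print(f"A: ${a:,.2f}   B: ${b:,.2f}")
```

Run forward a few cycles and the first three pairs reproduce the story’s figures ($12.70/$12.57, $15.96/$15.80, $20.07/$19.87); run long enough and the price climbs without bound, since the loop gain exceeds 1.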

More on feedback controls for real systems another day.

The Value of Real-Time Data, Part 2

September 1, 2011

Previously, predictive analytics was summarized as “system anticipates”, but that left a lot unsaid. Predictive analytics is a combination of statistical analysis, behaviour clustering, and system modeling. No one piece of predictive analytics can exist in a vacuum; the real-time system must be statistically analyzed, its behaviour grouped or clustered, and finally a system modeled that can use real-time data to anticipate the future – near term and longer.

Examples of predictive analytics in everyday life include credit scores, hurricane forecasts, etc. In each case, past events are analyzed, clustered, and then predicted.

The result of predictive analytics is, therefore, a decision tool. And the decision tree will, to some degree, take into account a predictive analysis.

The output of Predictive Analytics will be descriptive or analytic – subjective or objective. Both outputs are reasonable and viable. Looking at the hurricane predictions, there are analytical computer models (including the so-called spaghetti models) that seek to propose a definitive resulting behaviour; then there are descriptive models that seek to produce a visualization and comprehension of the discrete calculations. By extension, one can generalize that descriptive predictions must be the result of multiple analytic predictions. Perhaps this is true.

Returning to the idea that predictive analytics comprises statistical analysis, clustering analysis, and finally system modeling, we see that a sub-field of analytics could be considered: reactive analytics. Reactive analytics seeks to understand the statistical analysis, and even the clustering analysis, with an eye to adapting processes and procedures – but not in real time. Reactive analytics is, therefore, the Understanding portion of the Data-Information hierarchy. Predictive analytics is, therefore, the Wisdom portion of the Data-Information hierarchy.
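As a toy illustration of the “system anticipates” idea, the sketch below fits a least-squares trend to recent real-time readings and extrapolates it forward. It assumes evenly spaced samples, and the simple linear model stands in for the full statistics/clustering/modeling pipeline described above; the function name and data are invented for the example.

```python
# Minimal "system anticipates" sketch: fit a least-squares line to
# recent, evenly spaced real-time readings and extrapolate ahead.
def predict_next(readings, steps_ahead=1):
    n = len(readings)
    mean_x = (n - 1) / 2            # mean of x = 0 .. n-1
    mean_y = sum(readings) / n
    # Ordinary least-squares slope and intercept over x = 0..n-1.
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)

# A steadily rising sensor value: the fit anticipates the trend.
print(predict_next([10.0, 12.0, 14.0, 16.0], steps_ahead=2))  # → 20.0
```

A real predictive-analytics system would cluster behaviours and select a model per cluster; this sketch shows only the final anticipate-the-future step.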

The Physics of Measurements

June 13, 2011

An NPR report this morning addressed the new rotary-action mechanical heart. Instead of pulsing, it pumps like a conventional pump. (KUHF News Article) As interesting as this mechanical marvel is, an obscure quote discussing the process of measurement deserves some recognition.

Dr. Billy Cohn at the Texas Heart Institute says “If you listened to [the cow’s] chest with a stethoscope, you wouldn’t hear a heartbeat. If you examined her arteries, there’s no pulse. If you hooked her up to an EKG, she’d be flat-lined. By every metric we have to analyze patients, she’s not living. But here you can see she’s a vigorous, happy, playful calf licking my hand.”

The point to be made here is that neither the process of measurement, nor the device performing the measurement, is the process or event being measured; they are intrusive and will affect the reading. In fact, the measurement will always impact, in some way, the activity or event being measured. For example, electronic voltage measurement must withdraw a small stream of electrons to perform that measurement. This small stream represents power – measurable, demonstrable power that can, and does, modify the electronics being measured. Likewise in health, blood pressure cuffs, in a very real way, will alter the blood flow and resultant blood pressure reading. In fact, users of blood pressure cuffs are told that, after two failures at getting a clear reading, one should stop trying because the results will be skewed.

Generally, measurements are performed either in true real time or after the fact. Electrical voltage in the previous example is measured in real time. But the speed of a vehicle is actually determined after the fact, by measuring either the distance travelled over a specific interval or the number of related events (magnetic actuator readings from a rotating wheel). Similarly, blood pressure may be an instantaneous measurement, but pulse rate is actually the number of pulses detected over a period of time (e.g., 15 seconds) or the time between pulses (a reciprocal measurement).

Having said that, most physical-world measurements (including voltage, blood pressure, vehicle speed, etc.) are actually filtered and processed so that random events or mis-measurements are effectively removed from the resulting display. For example, a doctor may count 14 beats during a 15-second period, leading him to declare a pulse rate of 14×4, or 56 beats per minute. But what if he barely missed a pulse before starting the 15-second window, and barely missed the next pulse at the end? Perhaps the correct reading should be 59! Further complicate this error by mis-counting the number of pulses or by misreading the second hand of the watch.
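The counting-window versus reciprocal distinction can be sketched in a few lines of Python. The beat timestamps and rates here are invented for illustration: a 15-second counting window quantizes the answer to multiples of 4 bpm, while averaging the inter-beat intervals (the reciprocal measurement) does not.

```python
# Two ways to estimate pulse rate from the same beat timestamps (seconds).
def rate_by_window(beat_times, window=15.0):
    beats = sum(1 for t in beat_times if 0 <= t < window)
    return beats * (60.0 / window)      # bpm, quantized to multiples of 4

def rate_by_interval(beat_times):
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / (sum(intervals) / len(intervals))   # bpm, reciprocal

# Beats roughly once per second, slightly slower than 60 bpm.
beats = [0.05 + i * 1.005 for i in range(20)]
print(rate_by_window(beats))    # counting gives 60.0
print(rate_by_interval(beats))  # reciprocal gives ~59.7
```

The window count lands on 60 bpm because it can only move in 4-bpm steps; the interval average recovers the true, slightly slower rate.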

Typically, normal measurement inaccuracy is compensated for through various corrections, filters, gap fillers, and spurious-event removal. In this heart-rate example, the actual reading can be improved by

  • removing outliers (measured events that do not cluster) – typically the ‘top 5%’ and ‘bottom 5%’ of events are removed as extraneous
  • filtering the results (more complex than simple averages, but with the same goal in mind)
  • gap-filling (inserting an imagined pulse that was not sensed while a patient fidgets)
  • spurious removal (ignoring an unexpected beat)
  • increasing the size of the sample set

Discounting the fact that an unexpected beat should not be ignored by the doctor’s staff, the above illustrates how measurements are routinely shaped by post-measurement processing.
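The first item in the list above can be sketched as a trimmed mean: drop the extreme 5% at each end, then average the rest. The 5% trim fraction follows the rule of thumb stated above, and the sample values (inter-beat intervals with one spuriously long and one spuriously short gap) are invented for illustration.

```python
# Trimmed mean: drop the top and bottom `trim` fraction of samples as
# outliers, then average what remains.
def trimmed_mean(samples, trim=0.05):
    ordered = sorted(samples)
    k = int(len(ordered) * trim)        # samples to drop at each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

# 20 interval readings (seconds): two spurious values – a missed beat
# read as a double-length gap, and a fidget read as a near-zero gap.
readings = [1.0] * 18 + [2.1, 0.1]
print(trimmed_mean(readings))   # outliers trimmed → 1.0
```

The untrimmed average of these readings is 1.01 s; trimming removes both spurious gaps and recovers the underlying 1.0-second interval.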

Finally, adjustments must be made to compensate for the impact of the measurement action, or of the environment. In the case of the voltage measurement, the engineer may have to mathematically adjust the actual reading to reflect the true, unmetered performance. The experimental container may also require compensation. Youth swimming does this all of the time – some swim meets occur in pools that are 25 yards long while others occur in pools that are 25 meters long. Although this does not change the outcome of any individual race, it does affect the coach’s tracking of contestant improvement and the league’s tracking of record setting – both seasonal and over time.

So the cow in the NPR article may be clinically dead while grazing in the meadow. Neither fact is in dispute – the cow is alive, and the cow is clinically dead. The fault here lies in the measurement. And once again, otherwise valid and reliable scientific principles and techniques are suddenly neither valid nor reliable.

Real-Time Data in an Operations/Process Environment

May 16, 2011

The operations/process environment differs from the administrative and financial environments in that operations is charged with getting the job done. As such, the requirements placed on computers, information systems, instrumentation, controls, and data are different too. Data is never ‘in balance’, data always carries uncertainty, and the process cannot stop. Operations personnel have learned to perform their jobs while waiting for systems to come online, waiting for systems to upgrade, or even waiting for systems to be invented.

Once online, systems must be up 100% of the time, but aren’t. Systems must process data from a myriad of sources, but those sources are frequently intermittent or sporadic. Thus the processing, utilization, storage, and analysis of real-time data is a challenge totally unlike that seen in administrative or financial operations.

Real time systems must address distinct channels of data flow – from the immediate to the analysis of terabytes of archived data.

Control and Supervision: Real-time data is used to provide direct HMI (human-machine-interface) and permit the human computer to monitor / control the operations from his console. The control and supervision phase of real-time data does not, as part of its function, record the data. (However, certain data logs may be created for legal or application development purposes.) Machine control and control feedback loops require, as a minimum, real-time data of sufficient quality to provide steady operational control.

Forensic Analysis and Lessons Learned: Captured data (and, to a lesser extent, data and event logs) are utilized to investigate specific performance metrics and operations issues. Generally, this data is kept in some form for posterity, but it may be filtered, processed, or purged. Nevertheless, the forensic utilization does represent post-operational analytics. Forensic analysis is also critical to prepare an operator for an upcoming similar process – similar in function, geography, or sequence.

Data Mining: Data mining is used to research previous operational events to locate trends, identify areas for improvement, and prepare for upcoming operations. Data mining is used to identify a bottleneck or problem area as well as to correlate events that are less than obvious.

Proactive / Predictive Analytics: The utilization of data streams, both present and previous, in an effort to predict the immediate (or distant) future requires historical data, data mining, and the application of learned correlations. Data mining may provide correlated events and properties, but predictive analytics will provide the conversion of those correlations into positive, immediate performance and operational changes. (This utilization is not, explicitly, AI – artificial intelligence – but the two are closely related.)

The data-information-knowledge-understanding-wisdom paradigm: Within the data-to-wisdom paradigm, real-time data is just that – data. The entire tree breaks out as:

  • data – raw, untempered data from the operations environment (elemental data filtering and data quality checks are, nevertheless, required).
  • information – presentation of the data in human comprehensible formats – the control and supervision phase of real-time data.
  • knowledge – forensic analytics, data mining, and correlation analysis
  • understanding – proactive and forward-looking changes in behavior characteristic of the proactive / predictive analytics phase.
  • wisdom – the wisdom phase remains the domain of the human computer.

Related Posts:

Data Mining and Data, Information, Understanding, Knowledge

The Digital Oilfield, Part 1

The Data-Information Hierarchy

Multi-Nodal, Multi-Variable, Spatio-Temporal Datasets

April 21, 2011

Multi-Nodal, Multi-Variable, Spatio-Temporal Datasets are large-scale datasets encountered in real-world data-intensive environments.

Example Dataset #1

A basic example would be the heat distribution within a chimney at a factory. Heat sensors are distributed throughout the chimney and readings are taken at periodic intervals. Since the laws of thermodynamics within a chimney are well understood, the interaction between the monitoring devices can be modeled. Predictive analysis could, conceivably, be performed on the dataset, and chimney cracks could be detected, or even predicted, in real-time.

In this scenario, data points consist of 1) multiple sensors or data acquisition devices, 2) multiple spatial locations, 3) temporally separated samples. When a sensor fails, it is simply removed from the processing and kept out of the processing until the sensor is repaired (during plant maintenance).

Example Dataset #2

Another example would be the interconnected river and lake levels within a single geographic area. Distinct monitoring points are located at specific geo-spatial locations; geo-spatial points with interconnected transfer functions and models. Each of the monitoring points consists of multiple data acquisitions, and each data acquisition is sampled at random (or predetermined) intervals.

As a result, data points consist of 1) multiple sensors, 2) multiple spatial locations, and 3) temporally separated samples. In this scenario, sensors may fail – or become temporarily offline in a random, unpredictable manner. Sensors must be taken out of the processing until data validity returns. Due to the interconnectedness of the sensor locations, and the interrelationships between the sensors, sufficient redundant data could be present to permit suitable analytical processing in the absence of data.

Example Dataset #3

The most complex example could be aerial chemical contamination sampling. In this scenario, the chemical distribution is continuously changing as the result of understood, but not fully predictable, weather behavior. Sampling devices would consist of 1) airborne sampling devices (balloons) providing specific, limited sample sets, 2) ground-based mobile sampling units (trucks) providing extensive sample sets, and 3) fixed-base (pole-mounted) sampling units whose data is downloaded at relatively long intervals (hours or days).

In this scenario, multiple, non-uniform sampling elements occupy non-uniform (and mobile) positions, with data collection performed in fully asynchronous fashion. This data cannot be stored in flat-table structures, and it must provide enough relevant information to fill in the gaps in the data.
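One plausible way to hold such multi-nodal, multi-variable, spatio-temporal samples in code, rather than a flat table, is a per-sample record plus queries that tolerate sensors dropping offline, as all three example datasets require. This is purely an illustrative sketch; the record fields, sensor names, and staleness threshold are all assumptions.

```python
# Each sample carries its own sensor id, location, timestamp, and value,
# so sensors can report asynchronously and at different rates.
from dataclasses import dataclass

@dataclass
class Sample:
    sensor_id: str
    lat: float
    lon: float
    timestamp: float      # seconds since epoch
    value: float

def latest_valid(samples, now, max_age=3600.0):
    """Most recent reading per sensor, dropping stale (offline) sensors."""
    latest = {}
    for s in samples:
        if now - s.timestamp > max_age:
            continue          # sensor offline: remove from the processing
        if s.sensor_id not in latest or s.timestamp > latest[s.sensor_id].timestamp:
            latest[s.sensor_id] = s
    return latest

data = [
    Sample("balloon-1", 29.76, -95.37, 1000.0, 0.42),
    Sample("truck-7",   29.80, -95.40, 3500.0, 0.55),
    Sample("pole-3",    29.75, -95.35,  100.0, 0.61),   # stale: offline
]
print(sorted(latest_valid(data, now=4000.0)))  # → ['balloon-1', 'truck-7']
```

Failed or offline sensors simply age out of the working set, matching the “taken out of the processing until data validity returns” behavior described above; interpolating across the remaining interrelated sensors would be the next layer.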

The Digital Oilfield, Part 2

January 30, 2011

(originally posted on blogspot January 18, 2010)

“The improved operational performance promised by a seamless digital oil field is alluring, and the tasks required to arrive at a realistic implementation are more specialized than might be expected.”

Seamless integrated operations require a systematic view of the entire exploration process. But the drilling operation may be the largest generator of diverse operational and performance data, and may produce more downstream data and information than any other process. Additionally, the drilling process is one of the most legally exposed processes performed in energy production – BP’s recent Gulf disaster is an excellent example.

The seamless, integrated, digital oilfield is data-centric. Data is at the start of the process, and data is at the end of the process. But data is not the objective. In fact, data is an impediment to information and knowledge. But data is the base of the information and knowledge tree – data begets information, information begets knowledge, knowledge begets wisdom. Bringing data up to the next level (information) or the subsequent level (knowledge) requires a systematic and root knowledge of the data available within an organization, the data which should be available within an organization, and the meaning of that data.

Data mining is the overarching term used in many circles to define the process of developing information and knowledge. In particular, data mining is taking the data to the level of knowledge. Converting data to information is often no more complex than producing a pie chart or an x-y scatter chart. But that information requires extensive operational experience to analyze and understand. Data mining takes data into the knowledge tier: it will extract the tendencies of operational metrics to foretell an outcome.

Fortunately, there are several bright and shining examples of entrepreneurs developing the data-to-knowledge conversion. One bright and promising star is Verdande’s DrillEdge product. Although this blog does not support or advocate this technology as a matter of policy, this technology does illustrate an example of forward thinking and systematic data-to-knowledge development.

A second example is PetroLink’s modular data acquisition and processing model. This product utilizes modular vendor-agnostic data accumulation tools (in particular, interfacing to Pason and MD-TOTCO), modular data repositories, modular equation processing, and modular displays. All of this is accomplished through the WITSML standards.

Future blogs will consider the movement of data, latency, reliability, and synchronization.

The Digital Oilfield, Part 1

January 30, 2011

(originally posted on blogspot January 17, 2010)

The oil business (or bidniz as the old-hands call it) has evolved in drilling, containment, control, and distribution. But the top-level system view has gone largely ignored. Certainly there are pockets of progress. And certainly there are several quality companies producing centralized data solutions. But even these solutions focus on the acquisition of the data while ignoring the reason for the data.

“Simply put, Digital Energy or Digital Oilfields are about focusing information technology on the objectives of the petroleum business.” (January 17, 2011)

Steve Hinchman, Marathon’s Senior VP of World Wide Production, in a speech to the 2006 Digital Oil Conference, said “Quality, timely information leads to better decisions and productivity gains” and “Better decisions lead to better results, greater credibility, more opportunities, greater shareholder value.”

“Petroleum information technology (IT), digitized real-time downhole data and computer–aided practices are exploding, giving new impetus to the industry. The frustrations and hesitancy common in the 1990s are giving way to practical solutions and more widespread use by the oil industry. Better, cheaper and more secure data transmission through the Internet is one reason why.” (The Digital Oilfield, Oil and Gas Investor, 2004)

Future Digital Oilfield development will include efforts to integrate drilling data into its engineering and decision making. This integration consists of:

  1. Developing and integrating the acquisition of data from all phases of the drilling operation. The currently disjoint data will be brought together (historical and future) into a master data store architecture consisting of a Professional Petroleum Data Model, various legacy commercial systems, and various internal custom data stores.
  2. Developing a systematic real-time data approach including data processing, analysis, proactive actioning, and integrated presentations. Such proactive, real-time processing includes collision avoidance, pay-zone tracking and analysis, and rig performance. Included is a new technology we are pushing for analysis of, and recommendations for, the best rig configuration and performance.
  3. Developing a systematic post-drill data analysis and centralized data recall for field analysis, offset well comparison, and new well engineering decisions. Central to this effort will be data analysis, data mining, and systematic data-centric decision making.

Watch the Digital Oilfield over the next few months as the requirements for better control, better prediction, and better decision making take a more significant center-stage.
