Posted tagged ‘Data’

What is Content?

September 8, 2011

Several internet articles and blogs address the meaning of content from an internet perspective. From this perspective, content is the (meaningful) stuff on a page, the presentation of information to the seeker.

But content within an operations-centric perspective is entirely different. The databases and operational tools must hold content: data reflecting the information being sought in the pursuit of knowledge. Thus, paraphrasing Scottie Claiborne, “content is the stuff in your operations system; good content is useful information”.

Therefore, content is the meaningful data and the presentation of this data as information.

Content can, and should be, redundant. Not redundant from a back-up perspective; redundant from an information theory perspective – data that is inter-related and inter-correlated. (Data that is directly calculated need not be stored; however, the method of calculation may change, and therefore the original calculation may prove useful.) Data that is inter-correlated may be thought of in terms of weather: wind speed, temperature, pressure, humidity, etc. are individual, measurable values, but they inter-relate, and perfectly valid inferences may be made in the absence of one or more of these datums. When the historical (temporal) and adjacent (geospatial) datums are brought into the content, then, according to information theory, more and more redundancy exists within the dataset.

Having identified the basis of content, the operations system designer should perform content analysis. Content analysis is both qualitative and quantitative, but careful attention to systems design and systems management will permit increased quantification of the results. In its most basic form, content analysis is the designer asking the questions “What is the purpose of the data? What outcomes are expected from the data? How will the data be imparted to produce the desired behavior?”

So how do we quantify the importance of specific data / content? How do we choose which data / content to retain? These questions are so difficult to answer that the normal response is to save everything, forever. And since data not retained is data lost, and lost forever, this approach seems reasonable in a world of diminishing data storage costs. But, then, the cost and complexity of information retrieval increase.

The concept and complexity of data retrieval is left for another day…


The Data-Information Hierarchy, Part 3

August 31, 2011

The Data-Information Hierarchy is frequently represented as
Data –> Information –> Knowledge –> Understanding –> Wisdom.

Or it is sometimes shortened to 4 steps, omitting Understanding. But, in fact, there are two predecessor steps: chaos and symbol. These concepts have been discussed in prior blogs.

Chaos is that state of lack of understanding best compared to the baby first perceiving the world around him. There is no comprehension of quantities or values, but a perception of large and small.

Symbol (or symbolic representation) represents the first stages of quantification. As such, symbolic representation and quantification concepts form the predecessor to Data.

So the expanded Data-Information hierarchy is represented in the seven steps:

Chaos –>
          Symbol –>
                    Data –>
                              Information –>
                                        Knowledge –>
                                                  Understanding –>
                                                            Wisdom

Continuing with this Data-Hierarchy paradigm, we can represent the five primary steps with the simple explanation:

  • Data and Information : ‘Know What’
  • Knowledge : ‘Know How’
  • Understanding : ‘Know Why’
  • Wisdom : ‘Use It’

The Physics of Measurements

June 13, 2011

An NPR report this morning addressed the new rotary-action mechanical heart. Instead of pulsing, it pumps like a conventional pump. (KUHF News Article) As interesting as this mechanical marvel is, an obscure quote discussing the process of measurement deserves some recognition.

Dr. Billy Cohn at the Texas Heart Institute says “If you listened to [the cow’s] chest with a stethoscope, you wouldn’t hear a heartbeat. If you examined her arteries, there’s no pulse. If you hooked her up to an EKG, she’d be flat-lined. By every metric we have to analyze patients, she’s not living. But here you can see she’s a vigorous, happy, playful calf licking my hand.”

The point to be made here is that neither the process of measurement, nor the device performing the measurement, is the process or event being measured; they are intrusive and will affect the reading. In fact, the measurement will always impact, in some way, the activity or event being measured. For example, electronic voltage measurement must withdraw a small stream of electrons to perform that measurement. This small stream represents power – measurable, demonstrable power that can, and does, modify the electronics being measured. Likewise in health, blood pressure cuffs, in a very real way, will alter the blood flow and resultant blood pressure reading. In fact, users of blood pressure cuffs are told that, after two failures at getting a clear reading, one should stop trying because the results will be skewed.

Generally, measurements are performed either in true real-time or post real-time. The electrical voltage measurement in the previous example is performed in real-time. But the speed of a vehicle is actually measured after the fact, either by measuring the distance travelled over a specific interval or by counting the number of related events (magnetic actuator readings from a rotating wheel). Similarly, blood pressure may be an instantaneous measurement, but blood pulse rate is actually the number of pulses detected over a period of time (e.g., 15 seconds) or the time between pulses (a reciprocal measurement).

Having said that, most physical world measurements (including voltage, blood pressure, vehicle speed, etc.) are actually filtered and processed so that random events or mis-measurements are effectively removed from the resulting display. For example, a doctor may read 14 beats during a 15 second period, leading him to declare a pulse rate of 14×4 or 56 (beats per minute). But what if he barely missed a pulse before starting the 15 second window, and barely missed the next pulse at the end? Perhaps the correct reading should be 59! Further complicate this error by mis-counting the number of pulses or by misreading the second-hand of the watch.
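The gap between the two readings can be illustrated with a small sketch. The beat timestamps below are hypothetical (a steady pulse with a 1.02 second period, chosen so that one beat is barely missed on each side of the 15 second window), and the function names are illustrative only:

```python
def rate_from_count(beats_counted, window_s):
    """Beats per minute from a simple count over a fixed window."""
    return beats_counted * 60.0 / window_s


def rate_from_intervals(beat_times_s):
    """Beats per minute from the mean time between successive beats."""
    if len(beat_times_s) < 2:
        raise ValueError("need at least two beats to form an interval")
    span = beat_times_s[-1] - beat_times_s[0]
    mean_interval = span / (len(beat_times_s) - 1)
    return 60.0 / mean_interval


# Hypothetical data: the previous beat fell 0.12 s before the window opened,
# and the beat after the last one falls 0.18 s after the window closes.
period_s = 1.02
beat_times = [0.9 + i * period_s for i in range(14)]

count_estimate = rate_from_count(len(beat_times), 15.0)   # the "14 x 4" reading
interval_estimate = rate_from_intervals(beat_times)       # the reciprocal reading

print(f"count-based:    {count_estimate:.1f} bpm")
print(f"interval-based: {interval_estimate:.1f} bpm")
```

Under these assumed timestamps the count-based reading is 56 while the interval-based reading is close to 59, which is the kind of discrepancy described above.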

Typically, normal measurement inaccuracy is compensated for through various corrections, filters, gap fillers, and spurious-event removal. In this heart rate example, the actual reading can be improved by

  • removing outliers (measured events that do not cluster); typically the ‘top 5%’ and ‘bottom 5%’ of events are removed as extraneous
  • filtering the results (more complex than simple averages, but with the same goal in mind)
  • gap-filling (inserting an imagined pulse that was not sensed while a patient fidgets)
  • spurious removal (ignoring an unexpected beat)
  • increasing the size of the sample set

Discounting the fact that an unexpected beat should not be ignored by the doctor’s staff, the above illustrates how measurements are routinely processed by post measurement processes.
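A minimal sketch of such post-measurement processing follows, applied to beat-to-beat intervals. The thresholds (1.6× and 0.5× the median), the sample intervals, and the function name are assumptions made for illustration, and the final filter is a plain average rather than the more complex filtering a real instrument would use:

```python
from statistics import mean, median


def clean_intervals(intervals_s, trim_fraction=0.05):
    """Return beat-to-beat intervals after simple post-measurement processing.

    Gap-filling:      an interval well above the median is treated as one
                      missed beat and split into two equal intervals.
    Spurious removal: an interval well below the median is treated as an
                      unexpected beat and folded into the next interval.
    Outlier removal:  the lowest and highest `trim_fraction` are dropped.
    """
    typical = median(intervals_s)
    repaired = []
    carry = 0.0
    for interval in intervals_s:
        interval += carry
        carry = 0.0
        if interval < 0.5 * typical:        # assumed spurious-beat threshold
            carry = interval
        elif interval > 1.6 * typical:      # assumed missed-beat threshold
            repaired.extend([interval / 2, interval / 2])
        else:
            repaired.append(interval)
    repaired.sort()
    k = int(len(repaired) * trim_fraction)
    return repaired[k:len(repaired) - k] if k else repaired


# Hypothetical intervals (seconds): one missed beat (2.04) and one
# spurious beat that split a normal interval into 0.3 + 0.7.
raw = [1.0, 1.02, 0.98, 2.04, 1.0, 0.3, 0.7, 1.01, 0.99, 1.0]
cleaned = clean_intervals(raw)
bpm = 60.0 / mean(cleaned)
print(f"{len(cleaned)} intervals, {bpm:.1f} bpm")
```

Increasing the size of the sample set is simply a matter of passing more intervals; with only ten samples, a 5% trim removes nothing, which is itself an argument for the larger sample.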

Finally, adjustments must be made to compensate for the impact of the measurement action, or the environment. In the case of the voltage measurement, the engineer may have to mathematically adjust the actual reading to reflect a true, unmetered performance. The experimental container may also require compensation. Youth swimming does this all of the time – some swim meets occur in pools that are 25 yards long while others occur in pools that are 25 meters long. Although this does not change the outcome of any individual race, it does affect the coach’s tracking of contestant improvement and the league’s tracking of record setting – both seasonal and over time.
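For the voltage case, one common way to make this adjustment is to model the circuit under test as a voltage source with a series resistance and the meter as a finite input resistance, which together form a voltage divider. The sketch below assumes that simple model; the function name and the resistance values are hypothetical:

```python
def unloaded_voltage(measured_v, source_resistance_ohm, meter_resistance_ohm):
    """Estimate the voltage that would be present with no meter attached.

    Assumes a simple divider model:
        measured = true * R_meter / (R_source + R_meter)
    """
    if meter_resistance_ohm <= 0:
        raise ValueError("meter resistance must be positive")
    total = source_resistance_ohm + meter_resistance_ohm
    return measured_v * total / meter_resistance_ohm


# Hypothetical values: a 100 kilohm source read by a 10 megohm meter.
true_v = unloaded_voltage(4.95, 100_000.0, 10_000_000.0)
print(f"corrected reading: {true_v:.4f} V")
```

With these assumed values the meter reads about 1% low, so a displayed 4.95 V corresponds to roughly 4.9995 V in the unmetered circuit.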

So the cow in the NPR article may be clinically dead while grazing in the meadow. Neither fact is in dispute: the cow is alive, and the cow is clinically dead. The fault here lies in the measurement. And once again, valid and reliable scientific principles and techniques are suddenly neither valid nor reliable.

Real-Time Data in an Operations/Process Environment

May 16, 2011

The operations/process environment differs from the administrative and financial environments in that operations is charged with getting the job done. As such, the requirements placed on computers, information systems, instrumentation, controls, and data are different too. Data is never ‘in balance’, data always carries uncertainty, and the process cannot stop. Operations personnel have learned to perform their job while waiting for systems to come online, waiting for systems to upgrade, or even waiting for systems to be invented.

Once online, systems must be up 100% of the time, but aren’t. Systems must process data from a myriad of sources, but those sources are frequently intermittent or sporadic. Thus the processing, utilization, storage, and analysis of real-time data is a challenge totally unlike the systems seen in administrative or financial operations.

Real time systems must address distinct channels of data flow – from the immediate to the analysis of terabytes of archived data.

Control and Supervision: Real-time data is used to provide direct HMI (human-machine-interface) and permit the human computer to monitor / control the operations from his console. The control and supervision phase of real-time data does not, as part of its function, record the data. (However, certain data logs may be created for legal or application development purposes.) Machine control and control feedback loops require, as a minimum, real-time data of sufficient quality to provide steady operational control.

Forensic Analysis and Lessons Learned: Captured data (and, to a lesser extent, data and event logs) are utilized to investigate specific performance metrics and operations issues. Generally, this data is kept in some form for posterity, but it may be filtered, processed, or purged. Nevertheless, the forensic utilization does represent post-operational analytics. Forensic analysis is also critical to prepare an operator for an upcoming similar process – similar in function, geography, or sequence.

Data Mining: Data mining is used to research previous operational events to locate trends, identify areas for improvement, and prepare for upcoming operations. Data mining is used to identify a bottleneck or problem area as well as to correlate events that are less than obvious.

Proactive / Predictive Analytics: The utilization of data streams, both present and previous, in an effort to predict the immediate (or distant) future requires historical data, data mining, and the application of learned correlations. Data mining may provide correlated events and properties, but the predictive analytics will provide the conversion of the correlations into positive, immediate performance and operational changes. (This utilization is not explicitly AI, artificial intelligence, but the two are closely related.)

The data-information-knowledge-understanding-wisdom paradigm: Within the data –> wisdom paradigm, real-time data is just that – data. The entire tree breaks out as:

  • data – raw, untempered data from the operations environment (elemental data filtering and data quality checks are, nevertheless, required).
  • information – presentation of the data in human comprehensible formats – the control and supervision phase of real-time data.
  • knowledge – forensic analytics, data mining, and correlation analysis
  • understanding – proactive and forward-looking changes in behavior characteristic of the proactive / predictive analytics phase.
  • wisdom – the wisdom phase remains the domain of the human computer.


Multi-Nodal, Multi-Variable, Spatio-Temporal Datasets

April 21, 2011

Multi-Nodal, Multi-Variable, Spatio-Temporal Datasets are large-scale datasets encountered in real-world data-intensive environments.

Example Dataset #1

A basic example would be the heat distribution within a chimney at a factory. Heat sensors are distributed throughout the chimney and readings are taken at periodic intervals. Since the laws of thermodynamics within a chimney are well understood, the interaction between the monitoring devices can be modeled. Predictive analysis could, conceivably, be performed on the dataset, and chimney cracks could be detected, or even predicted, in real-time.

In this scenario, data points consist of 1) multiple sensors or data acquisition devices, 2) multiple spatial locations, 3) temporally separated samples. When a sensor fails, it is simply removed from the processing and kept out of the processing until the sensor is repaired (during plant maintenance).

Example Dataset #2

An example would be the interconnected river and lake levels within a single geographic area. Distinct monitoring points are located at specific geo-spatial locations; geo-spatial points with interconnected transfer functions and models. Each of the monitoring points consists of multiple data acquisitions, and each data acquisition is sampled at random (or predetermined) intervals.

As a result, data points consist of 1) multiple sensors, 2) multiple spatial locations, and 3) temporally separated samples. In this scenario, sensors may fail – or become temporarily offline in a random, unpredictable manner. Sensors must be taken out of the processing until data validity returns. Due to the interconnectedness of the sensor locations, and the interrelationships between the sensors, sufficient redundant data could be present to permit suitable analytical processing in the absence of data.
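As a rough illustration of how redundancy can stand in for an offline sensor, the sketch below estimates a missing level reading from its valid neighbours using inverse-distance weighting. This is a generic stand-in chosen for brevity, not the transfer functions and models mentioned above; the coordinates, readings, and function name are hypothetical:

```python
import math


def estimate_missing(target_xy, readings):
    """Estimate a value at target_xy from neighbouring sensors.

    `readings` is a list of ((x, y), value) pairs; a value of None marks a
    sensor that is currently offline and is skipped. Weights fall off with
    the square of the distance (inverse-distance weighting).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in readings:
        if value is None:
            continue                      # sensor taken out of the processing
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return value                  # a valid sensor sits on the target
        weight = 1.0 / distance ** 2
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no valid neighbouring readings")
    return weighted_sum / weight_total


# Hypothetical river-level sensors (positions in km, levels in m).
neighbours = [((1.0, 0.0), 10.0), ((0.0, 2.0), 14.0), ((3.0, 4.0), None)]
estimate = estimate_missing((0.0, 0.0), neighbours)
print(f"estimated level: {estimate:.2f} m")
```

A physically based model of the interconnected waterways would replace the distance weights with the known relationships between monitoring points, but the handling of offline sensors would look much the same.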

Example Dataset #3

The most complex example could be aerial chemical contamination sampling. In this scenario, the chemical distribution is continuously changing as the result of understood, but not fully predictable, weather behavior. Sampling devices would consist of 1) airborne sampling devices (balloons) providing specific, limited sample sets, 2) ground-based mobile sampling units (trucks) providing extensive sample sets, and 3) fixed-base (pole-mounted) sampling units whose data is downloaded at relatively long intervals (hours or days).

In this scenario, multiple, non-uniform data sampling elements are located at non-uniform (and sometimes mobile) positions, with data collection performed in fully asynchronous fashion. This data cannot be stored in flat-table structures, and it must provide enough relevant information to fill in the gaps in the data.
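One possible shape for such data is a record per observation that carries its own time, position, and whatever quantities that particular device happened to measure. The sketch below is only an illustration of that idea; the class, field names, device identifiers, and readings are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One sample from one device; each record is self-describing."""
    source_id: str
    platform: str                 # e.g. "balloon", "truck", "pole"
    timestamp_s: float            # seconds since an agreed reference time
    position: tuple               # (latitude, longitude, altitude_m)
    measurements: dict = field(default_factory=dict)


def series_for(observations, quantity):
    """Time-ordered (timestamp, value) pairs for one measured quantity,
    drawn only from the observations that actually recorded it."""
    pairs = [(o.timestamp_s, o.measurements[quantity])
             for o in observations if quantity in o.measurements]
    return sorted(pairs)


# Hypothetical observations arriving out of order from different platforms.
observations = [
    Observation("balloon-1", "balloon", 120.0, (29.76, -95.37, 800.0),
                {"benzene_ppb": 3.1}),
    Observation("truck-7", "truck", 95.0, (29.70, -95.40, 0.0),
                {"benzene_ppb": 4.4, "toluene_ppb": 1.2, "wind_mps": 5.0}),
    Observation("pole-3", "pole", 30.0, (29.80, -95.30, 10.0),
                {"toluene_ppb": 0.9}),
]
benzene = series_for(observations, "benzene_ppb")
print(benzene)
```

Because each record names its own quantities, a device with a limited sample set and a device with an extensive one can share the same collection without padding a fixed set of columns with empty cells.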

The Data-Information Hierarchy, Part 2

February 11, 2011

Data, as has been established, is the organic, elemental source quantities. Data, by itself, does not produce cognitive information and decision-making ability. But without it, the chain Data –> Information –> Knowledge –> Understanding –> Wisdom is broken before it starts. Data is recognized for its discrete characteristics.

Information is the logical grouping and presentation of the data. Information, in a more general sense, is the structure and encoded explanation of phenomena. It answers the who, what, when, and where questions but does not explain these answers, nor instill the wisdom necessary to act on the information. “Information is sometimes associated with the idea of knowledge through its popular use rather than with uncertainty and the resolution of uncertainty.” (An Introduction to Information Theory, John R. Pierce)

The leap from Information through Knowledge and Understanding into Wisdom is the objective sought by data / information analysis, data / information mining, and knowledge systems.

What knowledge is gleaned from the information? How can we understand the interactions (particularly at a system level)? And how is this understanding used to create correct decisions and navigate a course to a desired end point?

Wisdom is the key. What good are the historical volumes of data unless informed decisions result? (a definition of Wisdom)

How do informed decisions (Wisdom) result if the data and process are not Understood?

How do we achieve understanding without the knowledge?

How do we achieve the Knowledge without the Information?

But in a very real sense, Information is not the who, what, when, where answers to the questions assimilating data. Quantified, Information is the measure of the order of the system; or conversely the measure of the lack of disorder, the measure of the entropy of the system.
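In information theory this quantity is usually computed as Shannon entropy. The sketch below is a minimal illustration using the standard formula over symbol frequencies; the function name and the sample strings are assumptions chosen for the example:

```python
import math
from collections import Counter


def shannon_entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of a sequence of symbols."""
    total = len(symbols)
    if total == 0:
        raise ValueError("cannot compute the entropy of an empty sequence")
    counts = Counter(symbols)
    # H = sum over symbols of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


ordered = shannon_entropy_bits("AAAAAAAA")   # a fully predictable stream
mixed = shannon_entropy_bits("ABCDABCD")     # four equally likely symbols
print(ordered, mixed)
```

The fully ordered stream carries zero bits per symbol, while four equally likely symbols carry two bits each: the more disorder in the source, the higher the entropy.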

Taken together, information is composed of the informational context, the informational content, and the informational propositions. Knowledge, then, is the informed assimilation of the information. And the cognitive conclusion is the understanding.

Thus Data –> Information –> Knowledge –> Understanding –> Wisdom through the judicious use of:

  • data acquisition,
  • data retention,
  • data transition,
  • knowledge mining,
  • cognitive processing, and
  • the application of this combined set into a proactive and forward action plan.

The Data-Information Hierarchy

January 31, 2011

(originally posted on blogspot January 29, 2010)

I see much internet attention given to the data –> information –> knowledge –> understanding –> wisdom tree. Most will omit the step of understanding. Many will overlook data or wisdom. But all five stages are required to move from total oblivion to becoming the true productive member of society that pushes us forward. To paraphrase, “it is a process”.

“The first sign of wisdom is to get wisdom; go, give all you have to get true knowledge.” This central and key verse in Proverbs rings true today. Wisdom is the objective, but the progression to wisdom must begin with assimilating the data and information into knowledge. And from knowledge comes understanding, from understanding comes wisdom.

In keeping with the philosophical / religious examination of knowledge and wisdom, one source explains it as:

Wisdom is the principal thing – the most important matter in life. Wisdom is the power of right judgment – the ability to choose the correct solution for any situation. It is knowing how to think, speak, and act to please both God and men. It is the basis for victorious living. Without wisdom, men make choices that bring them pain, poverty, trouble, and even death. With it, men make choices that bring them health, peace, prosperity, and life.

Understanding is connected to wisdom, and it is also an important goal. Understanding is the power of discernment – to see beyond what meets the eye and recognize the inherent faults or merits of a thing. Without understanding, men are easily deceived and led astray. Without it, men are confused and perplexed. With it, men can see what others miss, and they can avoid the snares and traps of seducing sins. With it, life’s difficulties are simple.

As great as this biblical basis is, business and scientific endeavours require all five steps. But, as Cliff Stoll said, “Data is not information, Information is not knowledge, Knowledge is not understanding, Understanding is not wisdom.” Data is sought, information is desired, but wisdom is the objective.

“We collect and organize data to achieve information; we process information to absorb knowledge; we utilize the knowledge to gain understanding; and we apply understanding to achieve wisdom.” (Mark Reynolds)

Russell Ackoff provides a definition of the five stages

  1. Data: symbols
  2. Information: data that are processed to be useful; provides answers to “who”, “what”, “where”, and “when” questions
  3. Knowledge: application of data and information; answers “how” questions
  4. Understanding: appreciation of “why”
  5. Wisdom: evaluated understanding.

Data itself is not able to be absorbed by the human. It is individual quanta of substance, neither understandable nor desirable.

Information is the recognizable and cognitive presentation of the data. The reason that the Excel chart is so popular is that it allows the manipulation of data but the perception of information. Information is processable by the human, but is only a historical concept.

Knowledge is the appropriate collection of information, such that its intent is to be useful. Knowledge is a deterministic process. To correctly answer such a question requires a true cognitive and analytical ability that is only encompassed in the next level… understanding. In computer parlance, most of the applications we use (modeling, simulation, etc.) exercise some type of stored knowledge.

Understanding is an interpolative and probabilistic process. It is cognitive and analytical. It is the process by which I can take knowledge and synthesize new knowledge from the previously held knowledge. The difference between understanding and knowledge is the difference between “learning” and “memorizing”.

Wisdom is the only part of the five stages that is future based, forward looking. Systems Engineering courses have wisdom as the basis for their existence, whether they recognize it or not. An alternate definition of Systems Engineering could be the effort to put knowledge and understanding to use. And as such, a large part of the Systems Engineering course work is to teach the acquisition of knowledge and understanding while pressing for the student’s mind to explore its meaning and function.

Neil Fleming observes:

  • A collection of data is not information.
  • A collection of information is not knowledge.
  • A collection of knowledge is not wisdom.
  • A collection of wisdom is not truth.

Where have we come from, what are we doing here, where are we going? These questions look like philosophical questions. But they form the basis for the data, information, knowledge, understanding, wisdom tree. And once the level of wisdom is achieved in any study, the question of where are we going is effectively answered.
