What is Content?

Posted September 8, 2011 by ProfReynolds
Categories: Data - Information - Knowledge - Understanding - Wisdom, Information Theory, Operations

Several internet articles and blogs address the meaning of content from an internet perspective. From this perspective, content is the (meaningful) stuff on a page, the presentation of information to the seeker.

But content from an operations-centric perspective is entirely different. The databases and operational tools must contain data reflecting the information being sought in the pursuit of knowledge. Thus, paraphrasing Scottie Claiborne (http://www.successful-sites.com/articles/content-claiborne-content1.php), “content is the stuff in your operations system; good content is useful information”.

Therefore, content is the meaningful data and the presentation of this data as information.

Content can, and should, be redundant. Not redundant from a back-up perspective; redundant from an information theory perspective – data that is inter-related and inter-correlated. (Data that is directly calculated need not be stored; however, the method of calculation may change, and therefore the original calculation may prove useful.) Data that is inter-correlated may be thought of in terms of weather: wind speed, temperature, pressure, humidity, etc. are individual, measurable values, but they inter-relate, and perfectly valid inferences may be made in the absence of one or more of these values. When the historical (temporal) and adjacent (geospatial) data are brought into the content, then, according to information theory, more and more redundancy exists within the dataset.
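The weather illustration can be made concrete with a small sketch. It assumes, purely for illustration, that two of the measured values are roughly linearly related; the readings are invented, and the helper names fit_line and infer_missing are hypothetical. A least-squares line fitted to the historical pairs is then used to infer a reading that is missing.

```python
# Hypothetical historical readings: (pressure in hPa, temperature in deg C).
# The numbers are invented for illustration only.
history = [(1022.0, 12.0), (1018.0, 14.1), (1013.0, 16.4),
           (1009.0, 18.2), (1004.0, 20.5)]

def fit_line(pairs):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def infer_missing(pairs, x_known):
    """Estimate the missing y value from the correlated x value."""
    slope, intercept = fit_line(pairs)
    return slope * x_known + intercept

# The temperature sensor is offline, but the pressure reading is available.
print(infer_missing(history, 1011.0))
```

The stronger the correlation between the stored values, the more redundancy the dataset carries, and the better such an inference will be. Real weather relationships are not this simple; the sketch only shows the mechanism.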

Having identified the basis of content, the operations system designer should perform content analysis. Content analysis is both qualitative and quantitative, but careful attention to systems design and systems management will permit increased quantification of the results. In its most basic form, content analysis is the designer asking the questions: “What is the purpose of the data? What outcomes are expected from the data? How will the data be imparted to produce the desired behavior?”

So how do we quantify the importance of specific data / content? How do we choose which data / content to retain? These questions are so difficult to answer that the normal response is to save everything, forever. And since data not retained is data lost, and lost forever, this approach seems reasonable in a world of diminishing data storage costs. But then information retrieval becomes more costly and more complex.

The concept and complexity of data retrieval is left for another day…

The Value of Real-Time Data, Part 2

Posted September 1, 2011 by ProfReynolds
Categories: Data - Information - Knowledge - Understanding - Wisdom, Predictive Analytics, Real-Time

Previously, predictive analytics was summarized as “system anticipates” (http://profreynolds.wordpress.com/2011/08/31/the-value-of-real-time-data/). But that left a lot unsaid. Predictive analytics is a combination of statistical analysis, behaviour clustering, and system modeling. No one piece of predictive analytics can exist in a vacuum; the real-time system must be statistically analyzed, its behaviour grouped or clustered, and finally a system modeled that can use real-time data to anticipate the future – near term and longer.

Examples of predictive analytics in everyday life include credit scores, hurricane forecasts, etc. In each case, past events are analyzed and clustered, and future events are then predicted from them.

The result of predictive analytics is, therefore, a decision tool. And the decision tree will, to some degree, take into account a predictive analysis.

The output of Predictive Analytics will be descriptive or analytic – subjective or objective. Both outputs are reasonable and viable. Looking at the hurricane predictions, there are analytical computer models (including the so-called spaghetti models) that seek to propose a definitive resulting behaviour; then there are descriptive models that seek to produce a visualization and comprehension of the discrete calculations. By extension, one can generalize that descriptive predictions must be the result of multiple analytic predictions. Perhaps this is true.

Returning to the idea that predictive analytics comprises statistical analysis, clustering analysis, and finally system modelling, we see that a sub-field of analytics could be considered: reactive analytics. Reactive analytics seeks to understand the statistical analysis, and even the clustering analysis, with an eye to adapting processes and procedures – but not in real-time. Reactive Analytics is, therefore, the Understanding portion of the Data-Information hierarchy (http://profreynolds.wordpress.com/2011/08/31/the-data-information-hierarcy-part-3/). Predictive Analytics is, therefore, the Wisdom portion of the Data-Information hierarchy.

The Value of Real-Time Data

Posted August 31, 2011 by ProfReynolds
Categories: Data and Data Mining, Forensic Analysis, Information Theory, Knowledge Systems, Predictive Analytics

Real-Time data is a challenge to any process-oriented operation. But the functionality of the data is difficult to describe in a way that team members not well versed in data management can follow. Toward that end, four distinct phases of data have been identified:

  1. Real-Time: streaming data
    visualized and considered – system responds
  2. Forensic: captured data
    archived, condensed – system learns
  3. Data Mining: consolidated data
    hashed and clustered – system understands
  4. Predictive Analytics: patterned data
    compared and matched – system anticipates

A more detailed explanation of these phases may be:

Control and Supervision: Real-time data is used to provide direct HMI (human-machine-interface) and permit the human computer to monitor / control the operations from his console. The control and supervision phase of real-time data does not, as part of its function, record the data. (However, certain data logs may be created for legal or application development purposes.) Machine control and control feedback loops require, as a minimum, real-time data of sufficient quality to provide steady operational control.

Forensic Analysis and Lessons Learned: Captured data (and, to a lesser extent, data and event logs) are utilized to investigate specific performance metrics and operations issues. Generally, this data is kept in some form for posterity, but it may be filtered, processed, or purged. Nevertheless, the forensic utilization does represent post-operational analytics. Forensic analysis is also critical to prepare an operator for an upcoming similar process – similar in function, geography, or sequence.

Data Mining: Data mining is used to research previous operational events to locate trends, identify areas for improvement, and prepare for upcoming operations. Data mining is used to identify a bottleneck or problem area as well as to correlate events that are less than obvious.

Proactive / Predictive Analytics: The utilization of data streams, both present and previous, in an effort to predict the immediate (or distant) future requires historical data, data mining, and the application of learned correlations. Data mining may provide correlated events and properties, but the predictive analytics will provide the conversion of the correlations into positive, immediate performance and operational changes. (This utilization is not explicitly AI, artificial intelligence, but the two are closely related.)

The Data-Information Hierarchy, Part 3

Posted August 31, 2011 by ProfReynolds
Categories: Data - Information - Knowledge - Understanding - Wisdom, Information Theory, Knowledge Systems, Uncategorized

The Data-Information Hierarchy is frequently represented as
Data –> Information –> Knowledge –> Understanding –> Wisdom.

Or it is sometimes shortened to 4 steps, omitting Understanding. But, in fact, there are two predecessor steps: chaos and symbol. These concepts have been discussed in prior blogs (http://profreynolds.wordpress.com/2011/01/31/the-data-information-hierarcy/
http://profreynolds.wordpress.com/2011/02/11/the-data-information-hierarcy-part-2/).

Chaos is that state of lack of understanding best compared to the baby first perceiving the world around him. There is no comprehension of quantities or values, but a perception of large and small.

Symbol (or symbolic representation) represents the first stages of quantification. As such, symbolic representation and quantification concepts form the predecessor to Data.

So the expanded Data-Information hierarchy is represented in the seven steps:

Chaos –>
          Symbol –>
                    Data –>
                              Information –>
                                        Knowledge –>
                                                  Understanding –>
                                                            Wisdom

Continuing with this Data-Hierarchy paradigm, we can represent the five primary steps with the simple explanation:

  • Data and Information : ‘Know What’
  • Knowledge : ‘Know How’
  • Understanding : ‘Know Why’
  • Wisdom : ‘Use It’

The Physics of Measurements

Posted June 13, 2011 by ProfReynolds
Categories: Industry and Applications, Real-Time

An NPR report this morning addressed the new rotary-action mechanical heart. Instead of pulsing, it pumps like a conventional pump. (KUHF News Article) As interesting as this mechanical marvel is, an obscure quote discussing the process of measurement deserves some recognition.

Dr. Billy Cohn at the Texas Heart Institute says “If you listened to [the cow's] chest with a stethoscope, you wouldn’t hear a heartbeat. If you examined her arteries, there’s no pulse. If you hooked her up to an EKG, she’d be flat-lined. By every metric we have to analyze patients, she’s not living. But here you can see she’s a vigorous, happy, playful calf licking my hand.”

The point to be made here is that neither the process of measurement, nor the device performing the measurement, is the process or event being measured; they are intrusive and will affect the reading. In fact, the measurement will always impact, in some way, the activity or event being measured. For example, electronic voltage measurement must withdraw a small stream of electrons to perform that measurement. This small stream represents power – measurable, demonstrable power that can, and does, modify the electronics being measured. Likewise in health, blood pressure cuffs, in a very real way, will alter the blood flow and resultant blood pressure reading. In fact, users of blood pressure cuffs are told that, after two failures at getting a clear reading, one should stop trying because the results will be skewed.

Generally, measurements are performed either as true real-time or as post real-time. The electrical voltage measurement in the previous example is performed in real-time. But the speed of a vehicle is actually determined after the fact, by measuring either the distance travelled over a specific interval or the number of related events (magnetic actuator readings from a rotating wheel). Similarly, blood pressure may be an instantaneous measurement, but blood pulse rate is actually the number of pulses detected over a period of time (e.g., 15 seconds) or the time between pulses (a reciprocal measurement).

Having said that, most physical world measurements (including voltage, blood pressure, vehicle speed, etc.) are actually filtered and processed so that random events or mis-measurements are effectively removed from the resulting display. For example, a doctor may read 14 beats during a 15 second period, leading him to declare a pulse rate of 14×4 or 56 (beats per minute). But what if he barely missed a pulse before starting the 15 second window, and barely missed the next pulse at the end? Perhaps the correct reading should be 59! Further complicate this error by mis-counting the number of pulses or by misreading the second-hand of the watch.
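The pulse-counting error described above can be reproduced in a few lines. This is only a sketch under stated assumptions: the pulse train is invented (one beat every 1.01 seconds, the first falling just before the window opens), and the function names rate_by_count and rate_by_interval are hypothetical. It contrasts the counting method with the reciprocal (time between pulses) method.

```python
def rate_by_count(beat_times, window_start, window_length):
    """Count the beats that fall inside the window and scale to beats per minute."""
    window_end = window_start + window_length
    count = sum(1 for t in beat_times if window_start <= t < window_end)
    return count * 60.0 / window_length

def rate_by_interval(beat_times):
    """Average the time between consecutive beats and take the reciprocal."""
    intervals = [later - earlier
                 for earlier, later in zip(beat_times, beat_times[1:])]
    return 60.0 / (sum(intervals) / len(intervals))

# Hypothetical pulse train: one beat every 1.01 s; the first beat occurs
# 0.05 s before the 15 second window opens, so it is barely missed.
beats = [-0.05 + 1.01 * k for k in range(17)]

print(rate_by_count(beats, 0.0, 15.0))        # 56.0, the doctor's 14 x 4
print(round(rate_by_interval(beats), 1))      # 59.4, the underlying rate
```

The counting method can only report multiples of 4 when a 15 second window is used, so a pulse that barely misses either edge of the window shifts the result by a full step. The interval method does not share that quantization, but it requires a time stamp for each beat.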

Typically, normal measurement inaccuracy is compensated for through various corrections, filters, gap fillers, and spurious-event removal. In this heart rate example, the actual reading can be improved by

  • removing outliers (measured events that do not cluster); typically the ‘top 5%’ and ‘bottom 5%’ of events are removed as extraneous
  • filtering the results (more complex than simple averages, but with the same goal in mind)
  • gap-filling (inserting an imagined pulse that was not sensed while a patient fidgets)
  • spurious removal (ignoring an unexpected beat)
  • increasing the size of the sample set

Discounting the fact that an unexpected beat should not be ignored by the doctor’s staff, the above illustrates how measurements are routinely processed by post measurement processes.
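Two of the post-measurement steps listed above, outlier removal and gap-filling, might be sketched as follows. The assumptions are loud ones: the measurements are taken to be times between beats in seconds, a missed beat is assumed to show up as an interval close to a whole multiple of the median interval, and the function names are hypothetical. This is an illustration, not a clinical algorithm.

```python
import statistics

def trim_outliers(values, fraction=0.05):
    """Drop the lowest and highest `fraction` of the values (the 'bottom 5%' and 'top 5%')."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    return ordered[k:len(ordered) - k]

def fill_gaps(intervals):
    """Split an interval that spans roughly n median intervals into n equal beats.

    This inserts the 'imagined pulse' that was not sensed.
    """
    typical = statistics.median(intervals)
    filled = []
    for gap in intervals:
        beats = max(1, round(gap / typical))
        filled.extend([gap / beats] * beats)
    return filled

def heart_rate(intervals):
    """Beats per minute from the mean time between beats."""
    return 60.0 / statistics.mean(intervals)

# Hypothetical recording: steady 1.0 s intervals with one missed beat (2.0 s).
recorded = [1.0] * 18 + [2.0] + [1.0]
print(heart_rate(recorded))             # biased low by the missed beat
print(heart_rate(fill_gaps(recorded)))  # 60.0 after gap-filling
```

Increasing the sample set, the last item in the list, corresponds here to simply recording more intervals before trimming and averaging.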

Finally, adjustments must be made to compensate for the impact of the measurement action, or the environment. In the case of the voltage measurement, the engineer may have to mathematically adjust the actual reading to reflect the true, unmetered performance. The experimental container may also require compensation. Youth swimming does this all of the time – some swim meets occur in pools that are 25 yards long while others occur in pools that are 25 meters long. Although this does not change the outcome of any individual race, it does affect the coach’s tracking of contestant improvement and the league’s tracking of record setting – both seasonal and over time.
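For the voltage case, the adjustment can be sketched with the standard voltage-divider model: the circuit under test is treated as an ideal source behind a source resistance, and the meter as a finite input resistance connected across it. The component values and function names below are hypothetical and chosen only for illustration.

```python
def loaded_reading(open_circuit_volts, source_ohms, meter_ohms):
    """Voltage the meter displays once its own input resistance loads the circuit."""
    return open_circuit_volts * meter_ohms / (source_ohms + meter_ohms)

def unloaded_estimate(reading_volts, source_ohms, meter_ohms):
    """Invert the divider to recover the unmetered voltage from the reading."""
    return reading_volts * (source_ohms + meter_ohms) / meter_ohms

# Hypothetical values: a 10 V node behind 100 kilohms, read by a 10 megohm meter.
reading = loaded_reading(10.0, 100e3, 10e6)
print(round(reading, 3))                              # slightly below 10 V
print(unloaded_estimate(reading, 100e3, 10e6))        # back to about 10 V
```

The correction is only as good as the knowledge of the two resistances, which is why the text speaks of the engineer adjusting the reading rather than the instrument doing so automatically.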

So the cow in the NPR article may be clinically dead while grazing in the meadow. Neither fact is in dispute: the cow is alive, and the cow is clinically dead. The fault here lies in the measurement. And once again, valid and reliable scientific principles and techniques are suddenly no longer either valid or reliable.

The Big Crew Change

Posted May 17, 2011 by ProfReynolds
Categories: Data and Data Mining, Digital Oilfield, Industry and Applications, Information Systems, Knowledge Systems, Uncategorized

“The Big Crew Change” is an approaching event within the oil and gas industry when the mantle of leadership will move from the “calculators and memos” generation to the “connected and Skype” generation. In a blog 4 years ago, Rembrandt observes:

“The retirement of the workforce in the industry is normally referred to as “the big crew change”. People in this sector normally retire at the age of 55. Since the average age of an employee working at a major oil company or service company is 46 to 49 years old, there will be a huge change in personnel in the coming ten years, hence the “big crew change”. This age distribution is a result of the oil crises in ‘70s and ‘80s as shown in chart 1 & 2 below. The rising oil price led to a significant increase in the inflow of petroleum geology students which waned as prices decreased.”

Furthermore, a Society of Petroleum Engineers study found:

“There are insufficient personnel or ‘mid-careers’ between 30 and 45 with the experience to make autonomous decisions on critical projects across the key areas of our business: exploration, development and production. This fact slows the potential for a safe increase in production considerably”

A study undertaken by Texas Tech University makes several points about the state of education and the employability of graduates during this crew change:

  • Employment levels at historic lows
  • 50% of current workers will retire in 6 years
  • Job prospects: ~100% placement for the past 12 years
  • Salaries: Highest major in engineering for new hires

The big challenge: Knowledge Harvesting. “The loss of experienced personnel combined with the influx of young employees is creating unprecedented knowledge retention and transfer problems that threaten companies’ capabilities for operational excellence, growth, and innovation.” (Case Study: Knowledge Harvesting During the Big Crew Change).

In a blog by Otto Plowman, “Retaining knowledge through the Big Crew Change”, we see that

“Finding a way to capture the knowledge of experienced employees is critical, to prevent “terminal leakage” of insight into decisions about operational processes, best practices, and so on. Using of optimization technology is one way that producers can capture and apply this knowledge.”

When the retiring workforce fails to convey the important (critical) lessons learned, the gap is filled by data warehouses, knowledge systems, adaptive intelligence, and innovation. Perhaps the biggest challenge is innovation. Innovation will drive the industry through the next several years. Proactive intelligence, coupled with terabyte upon terabyte of data, will form the basis.

The future: the nerds will take over from the wildcatter.

Real-Time Data in an Operations/Process Environment

Posted May 16, 2011 by ProfReynolds
Categories: Data - Information - Knowledge - Understanding - Wisdom, Data and Data Mining, Forensic Analysis, Industry and Applications, Predictive Analytics, Real-Time

The operations/process environment differs from the administrative and financial environments in that operations is charged with getting the job done. As such, the requirements placed on computers, information systems, instrumentation, controls, and data are different too. Data is never ‘in balance’, data always carries uncertainty, and the process cannot stop. Operations personnel have learned to perform their jobs while waiting for systems to come online, waiting for systems to upgrade, or even waiting for systems to be invented.

Once online, systems must be up 100% of the time, but aren’t. Systems must process data from a myriad of sources, but those sources are frequently intermittent or sporadic. Thus the processing, utilization, storage, and analysis of real-time data is a challenge totally unlike the systems seen in administrative or financial operations.

Real time systems must address distinct channels of data flow – from the immediate to the analysis of terabytes of archived data.

Control and Supervision: Real-time data is used to provide direct HMI (human-machine-interface) and permit the human computer to monitor / control the operations from his console. The control and supervision phase of real-time data does not, as part of its function, record the data. (However, certain data logs may be created for legal or application development purposes.) Machine control and control feedback loops require, as a minimum, real-time data of sufficient quality to provide steady operational control.

Forensic Analysis and Lessons Learned: Captured data (and, to a lesser extent, data and event logs) are utilized to investigate specific performance metrics and operations issues. Generally, this data is kept in some form for posterity, but it may be filtered, processed, or purged. Nevertheless, the forensic utilization does represent post-operational analytics. Forensic analysis is also critical to prepare an operator for an upcoming similar process – similar in function, geography, or sequence.

Data Mining: Data mining is used to research previous operational events to locate trends, identify areas for improvement, and prepare for upcoming operations. Data mining is used to identify a bottleneck or problem area as well as to correlate events that are less than obvious.

Proactive / Predictive Analytics: The utilization of data streams, both present and previous, in an effort to predict the immediate (or distant) future requires historical data, data mining, and the application of learned correlations. Data mining may provide correlated events and properties, but the predictive analytics will provide the conversion of the correlations into positive, immediate performance and operational changes. (This utilization is not explicitly AI, artificial intelligence, but the two are closely related.)

The data-information-knowledge-understanding-wisdom paradigm: Within the data—>wisdom paradigm, real-time data is just that – data. The entire tree breaks out as:

  • data – raw, untempered data from the operations environment (elemental data filtering and data quality checks are, nevertheless, required).
  • information – presentation of the data in human comprehensible formats – the control and supervision phase of real-time data.
  • knowledge – forensic analytics, data mining, and correlation analysis
  • understanding – proactive and forward-looking changes in behavior characteristic of the proactive / predictive analytics phase.
  • wisdom – the wisdom phase remains the domain of the human computer.

Related Posts:

Data Mining and Data, Information, Understanding, Knowledge
http://profreynolds.wordpress.com/2011/01/30/data-mining-and-data-information-understanding-knowledge/

The Digital Oilfield, Part 1
http://profreynolds.wordpress.com/2011/01/30/the-digital-oilfield-part-1/

The Data-Information Hierarchy
http://profreynolds.wordpress.com/2011/01/31/the-data-information-hierarcy/

Multi-Nodal, Multi-Variable, Spatio-Temporal Datasets

Posted April 21, 2011 by ProfReynolds
Categories: Data and Data Mining, Digital Oilfield, Real-Time, Uncategorized

Multi-Nodal, Multi-Variable, Spatio-Temporal Datasets are large-scale datasets encountered in real-world data-intensive environments.

Example Dataset #1

A basic example would be the heat distribution within a chimney at a factory. Heat sensors are distributed throughout the chimney and readings are taken at periodic intervals. Since the laws of thermodynamics within a chimney are well understood, the interaction between the monitoring devices can be modeled. Predictive analysis could conceivably be performed on the dataset, and chimney cracks could be detected, or even predicted, in real-time.

In this scenario, data points consist of 1) multiple sensors or data acquisition devices, 2) multiple spatial locations, 3) temporally separated samples. When a sensor fails, it is simply removed from the processing and kept out of the processing until the sensor is repaired (during plant maintenance).

Example Dataset #2

An example would be the interconnected river and lake levels within a single geographic area. Distinct monitoring points are located at specific geo-spatial locations; geo-spatial points with interconnected transfer functions and models. Each of the monitoring points consists of multiple data acquisitions, and each data acquisition is sampled at random (or predetermined) intervals.

As a result, data points consist of 1) multiple sensors, 2) multiple spatial locations, and 3) temporally separated samples. In this scenario, sensors may fail, or go temporarily offline, in a random, unpredictable manner. Sensors must be taken out of the processing until data validity returns. Due to the interconnectedness of the sensor locations, and the interrelationships between the sensors, sufficient redundant data could be present to permit suitable analytical processing in the absence of data.
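One way to picture processing "in the absence of data" is to estimate the reading of an offline monitoring point from its neighbors. The sketch below uses inverse-distance weighting, which is a simple stand-in chosen for illustration and not the transfer functions and models the example refers to. The coordinates, levels, and the function name estimate_offline are all hypothetical.

```python
import math

def estimate_offline(target_xy, neighbors, power=2.0):
    """Inverse-distance-weighted estimate for a monitoring point that is offline.

    `neighbors` is a list of ((x, y), value) pairs; a value of None marks a
    neighbor that is itself offline and is therefore skipped.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbors:
        if value is None:
            continue
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return value  # a valid sensor sits exactly at the target location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no valid neighboring sensors")
    return weighted_sum / weight_total

# Hypothetical water levels (meters) at points around an offline gauge at (0, 0).
levels = [((1.0, 0.0), 10.0), ((-1.0, 0.0), 20.0), ((0.0, 5.0), None)]
print(estimate_offline((0.0, 0.0), levels))  # 15.0
```

A hydrological model would weight the neighbors by how the water actually flows between them rather than by straight-line distance; the point here is only that correlated neighbors can stand in for a missing sensor.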

Example Dataset #3

The most complex example could be aerial chemical contamination sampling. In this scenario, the chemical distribution is continuously changing as a result of understood, but not fully predictable, weather behavior. Sampling devices would consist of 1) airborne sampling devices (balloons) providing specific, limited sample sets, 2) ground-based mobile sampling units (trucks) providing extensive sample sets, and 3) fixed (pole-mounted) sampling units whose data is downloaded at relatively long intervals (hours or days).

In this scenario, multiple, non-uniform data sampling elements are placed at non-uniform (and mobile) locations, with data collection performed in fully asynchronous fashion. This data cannot be stored in flat-table structures, and it must provide enough relevant information to fill in the gaps in data.
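A rough sketch of a record shape that avoids the flat table is given below. Every name in it (Sample, readings, window, the source identifiers, and the analyte names) is hypothetical. Each sample carries its own time stamp, position, and whatever subset of measurements that device happens to provide, so balloons, trucks, and pole-mounted units can coexist in one collection.

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    source: str       # hypothetical device identifier
    timestamp: float  # seconds since some agreed epoch
    position: tuple   # (latitude, longitude, altitude in meters)
    readings: dict = field(default_factory=dict)  # analyte name -> measured value

def window(samples, start, end):
    """Return the samples taken in [start, end), ordered by time."""
    selected = [s for s in samples if start <= s.timestamp < end]
    return sorted(selected, key=lambda s: s.timestamp)

# Invented samples from three kinds of device, arriving out of order.
samples = [
    Sample("truck-2", 130.0, (29.76, -95.37, 0.0), {"benzene": 0.4, "toluene": 0.1}),
    Sample("balloon-1", 95.0, (29.80, -95.40, 350.0), {"benzene": 0.2}),
    Sample("pole-7", 3600.0, (29.70, -95.30, 8.0), {"benzene": 0.3}),
]

for s in window(samples, 0.0, 600.0):
    print(s.source, s.timestamp, s.readings)
```

Because each record is self-describing, a late download from a pole-mounted unit can be merged into the same collection without reshaping anything that was stored earlier.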

Knowledge as an Asset

Posted February 23, 2011 by ProfReynolds
Categories: Data - Information - Knowledge - Understanding - Wisdom, Data and Data Mining

This is the information age. So we are deluged with information (and sometimes just data). Organizations are only now beginning to grasp the fundamental concept of Knowledge as an Asset.

What constitutes the knowledge asset?

“Unlike information, knowledge is less tangible and depends on human cognition and awareness. There are several types of knowledge – ‘knowing’ a fact is little different from ‘information’, but ‘knowing’ a skill, or ‘knowing’ that something might affect market conditions is something, that despite attempts of knowledge engineers to codify such knowledge, has an important human dimension. It is some combination of context sensing, personal memory and cognitive processes. Measuring the knowledge asset, therefore, means putting a value on people, both as individuals and more importantly on their collective capability, and other factors such as the embedded intelligence in an organisation’s computer systems.”  (http://www.skyrme.com/insights/11kasset.htm)

Tackling the concept of knowledge is challenging. Knowledge Engineering was first conceptualized in the early 1980s (http://en.wikipedia.org/wiki/Knowledge_engineering) – coincident with the advent of the inexpensive personal computer. Since that time, the paradigm has shifted for all aspects of business. Seat-of-the-pants management is no longer suitable. And the requisite empowering of employees resulting from this transformation has completely changed the workplace.

Having said that, some of the most important issues in knowledge acquisition are as follows: (http://epistemics.co.uk/Notes/63-0-0.htm)

  • Most knowledge is in the heads of experts
  • Experts have vast amounts of knowledge
  • Experts have a lot of tacit knowledge
    • They don’t know all that they know and use
    • Tacit knowledge is hard (impossible) to describe
  • Experts are very busy and valuable people
  • Each expert doesn’t know everything
  • Knowledge has a “shelf life”

Moving knowledge within the organization is perhaps the most challenging aspect of corporate knowledge. Knowledge handoff must occur laterally and temporally. Laterally in that the knowledge must be shared in order to convey the necessary understanding; when a critical mass of users gains the understanding, the corporate wisdom will result. Temporally in that knowledge must be handed off at the shift change (this is true for 24/7 operations as well as global operations).

But the ability to use technology to share the results of technology is lagging. Frequently, the hand-off notes are the only tool at anyone’s disposal – very inefficient. Recently, SharePoint sites and Wikis have become popular for inter-departmental information and knowledge sharing.

Jerome J. Peloquin wrote an interesting essay (“Knowledge as a Corporate Asset”) in which he opens with “Virtually every business in the world faces the same fundamental problem: Maintenance of their competitive edge through the application and formation of knowledge.” He makes two statements in his conclusion which serve well to wrap up the premise of this essay (Knowledge as an Asset):

  • Information is useless unless we can act upon it, and that implies that it must first be transformed into knowledge.
  • The knowledge asset combines a number of factors which can be objectively proven by the observation and accomplishment of a specific set of criteria.

Enter Knowledge Information Management. Gene Bellinger writes that

“In an organizational context, data represents facts or values of results, and relations between data and other relations have the capacity to represent information. Patterns of relations of data and information and other patterns have the capacity to represent knowledge. For the representation to be of any utility it must be understood, and when understood the representation is information or knowledge to the one that understands. Yet, what is the real value of information and knowledge, and what does it mean to manage it?”

If Knowledge is an Asset, then, like any corporate asset, it must be managed, secured, maintained, and made available as a tool to the employees. If information is the core of business, then the resulting knowledge is the value of the business. And the wisdom (coming from the understanding) is the driving force and the profitability of the business. Smaller, faster, cheaper may be the mantra; knowledge is the asset.

The Data-Information Hierarchy, Part 2

Posted February 11, 2011 by ProfReynolds
Categories: Data - Information - Knowledge - Understanding - Wisdom, Data and Data Mining, Information Theory

Data, as has been established, consists of the organic, elemental source quantities. Data, by itself, does not produce cognitive information and decision-making ability. But without it, the chain Data –> Information –> Knowledge –> Understanding –> Wisdom is broken before it starts. Data is recognized for its discrete characteristics.

Information is the logical grouping and presentation of the data. Information, in a more general sense, is the structure and encoded explanation of phenomena. It answers the who, what, when, where questions (http://www.systems-thinking.org/dikw/dikw.htm) but does not explain these answers, nor instill the wisdom necessary to act on the information. “Information is sometimes associated with the idea of knowledge through its popular use rather than with uncertainty and the resolution of uncertainty.” (An Introduction to Information Theory, John R. Pierce)

The leap from Information through Knowledge and Understanding into Wisdom is the objective sought by data / information analysis, data / information mining, and knowledge systems.

What knowledge is gleaned from the information? How can we understand the interactions (particularly at a system level)? And how is this understanding used to create correct decisions and navigate a course to a desired end point?

Wisdom is the key. What good are the historical volumes of data unless informed decisions result? (a definition of Wisdom)

How do informed decisions (Wisdom) result if the data and process are not Understood?

How do we achieve understanding without the knowledge?

How do we achieve the Knowledge without the Information?

But in a very real sense, Information is not the who, what, when, where answers to the questions assimilating data. Quantified, Information is the measure of the order of the system; or conversely the measure of the lack of disorder, the measure of the entropy of the system.
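The quantified view of information can be illustrated with Shannon's entropy, H = −Σ p·log2(p), computed over the relative frequencies of the symbols in a message. The sketch below is only an illustration of that standard formula; the function name shannon_entropy and the sample strings are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy, in bits per symbol, of the observed symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy("AAAA"))  # perfectly ordered: zero bits per symbol
print(shannon_entropy("ABAB"))  # two equally likely symbols: one bit
print(shannon_entropy("ABCD"))  # four equally likely symbols: two bits
```

A perfectly ordered sequence carries no uncertainty to resolve, while the most disordered one carries the most; this is the sense in which the measure of order and the measure of entropy describe the same quantity from opposite ends.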

Taken together, information is composed of the informational context, the informational content, and the informational propositions. Knowledge, then, is the informed assimilation of the information. And the cognitive conclusion is the understanding.

Thus Data –> Information –> Knowledge –> Understanding –> Wisdom through the judicious use of:

  • data acquisition,
  • data retention,
  • data transition,
  • knowledge mining,
  • cognitive processing, and
  • the application of this combined set into a proactive and forward action plan.
