Posts Tagged ‘Wisdom’

Thoughts on Data Mining

March 10, 2012

Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information (see prior blogs, including The Data Information Hierarchy series). The term is overused and conjures impressions that do not reflect the true state of the industry. Knowledge Discovery in Databases (KDD) is more descriptive and not as misused – but the base meaning is the same.

Nevertheless, this is a very general definition and does not convey the different aspects of data mining and knowledge discovery.

The basic types of Data Mining are:

  • Descriptive data mining, and
  • Predictive data mining

Descriptive Data Mining generally seeks groups, subgroups, and clusters. Algorithms are developed that draw associative relationships from which actionable results may be derived (e.g., a snake with a diamond-shaped head should be treated as venomous).

Generally, a descriptive data mining result will appear as a series of if – then – elseif – then … conditions. Alternatively, a system of scoring may be used, much like some magazine-based self-assessment quizzes. Regardless of the approach, the end result is a clustering of the samples with some measure of quality, as the sketch below illustrates.
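To make this concrete, here is a minimal Python sketch of a magazine-style scoring rule; the questions, point values, and cluster thresholds are all invented for illustration:

    # Descriptive mining output as if-then scoring rules (hypothetical).
    def score_respondent(answers):
        """Sum simple per-question points, magazine-quiz style."""
        score = 0
        if answers["exercises_weekly"]:
            score += 2
        if answers["sleeps_8_hours"]:
            score += 1
        if answers["smokes"]:
            score -= 3
        return score

    def assign_cluster(score):
        """Map a score onto a named cluster - the 'measure of quality'."""
        if score >= 3:
            return "healthy"
        elif score >= 0:
            return "average"
        else:
            return "at-risk"

    sample = {"exercises_weekly": True, "sleeps_8_hours": False, "smokes": False}
    print(assign_cluster(score_respondent(sample)))  # -> "average"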

Predictive Data Mining, then, is performing analysis on previous data to derive a prediction of the next outcome. For example, new business incorporations tend to look for credit card merchant solutions. This may seem obvious, but someone had to discover this tendency – and then exploit it.
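A hedged sketch of that idea, fitting a simple model to invented past outcomes and predicting the next one (the features, the data, and the choice of scikit-learn's LogisticRegression are all illustrative assumptions):

    # Predictive mining sketch: learn from past outcomes, predict the next.
    from sklearn.linear_model import LogisticRegression

    # Each row: [months_since_incorporation, expected_monthly_sales_k] - invented.
    X_past = [[1, 10], [2, 25], [18, 5], [24, 8], [3, 40], [30, 3]]
    y_past = [1, 1, 0, 0, 1, 0]  # 1 = sought a card-merchant solution

    model = LogisticRegression().fit(X_past, y_past)

    new_business = [[2, 30]]  # a fresh incorporation with healthy sales
    print(model.predict(new_business))        # predicted outcome, e.g. [1]
    print(model.predict_proba(new_business))  # the confidence behind it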

Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature: 1) massive data collection, 2) powerful multiprocessor computers, and 3) data mining algorithms (http://www.thearling.com/text/dmwhite/dmwhite.htm).

Kurt Thearling identifies five types of data mining (definitions taken from Wikipedia):

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal. If in practice decisions have to be taken online with no recall under incomplete knowledge, a decision tree should be paralleled by a probability model as a best choice model or online selection model algorithm. Another use of decision trees is as a descriptive means for calculating conditional probabilities.
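A minimal sketch of a decision tree used as a decision-support tool, choosing the option with the best expected utility over chance outcomes; the options, probabilities, and payoffs are entirely hypothetical:

    # Each branch: a list of (probability, payoff) chance outcomes.
    options = {
        "launch_now":   [(0.6, 120.0), (0.4, -50.0)],
        "wait_quarter": [(0.9,  60.0), (0.1, -10.0)],
    }

    def expected_utility(outcomes):
        """Probability-weighted payoff of one branch of the tree."""
        return sum(p * payoff for p, payoff in outcomes)

    best = max(options, key=lambda name: expected_utility(options[name]))
    for name, outcomes in options.items():
        print(name, expected_utility(outcomes))
    print("choose:", best)  # wait_quarter (53.0) edges out launch_now (52.0)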

Nearest neighbour or shortest distance is a method of calculating distances between clusters in hierarchical clustering. In single linkage, the distance between two clusters is computed as the distance between the two closest elements in the two clusters.
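A short sketch of the single-linkage calculation on toy 2-D clusters:

    import math

    def single_linkage(cluster_a, cluster_b):
        """Distance between the two closest members of the two clusters."""
        return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

    a = [(0.0, 0.0), (1.0, 1.0)]
    b = [(4.0, 4.0), (1.5, 1.0)]
    print(single_linkage(a, b))  # 0.5 - the (1,1)/(1.5,1) pair is closest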

The term neural network was traditionally used to refer to a network or circuit of biological neurons. The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes.
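A minimal sketch of one artificial neuron (node) – a weighted sum of inputs passed through a nonlinear activation; the weights here are arbitrary:

    import math

    def neuron(inputs, weights, bias):
        """One node: weighted sum, then a sigmoid squashing function."""
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-z))

    # A network chains many of these, feeding outputs into further nodes.
    print(neuron([0.5, -1.0], [0.8, 0.2], bias=0.1))  # a value in (0, 1)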

Rule induction is an area of machine learning in which formal rules are extracted from a set of observations. The rules extracted may represent a full scientific model of the data, or merely represent local patterns in the data.
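One pragmatic way to see rule extraction in action (an illustrative choice, not the only method) is to fit a shallow decision tree and read its branches off as if-then rules:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

    # Prints the induced rules as nested if-then tests on feature values.
    print(export_text(tree, feature_names=list(iris.feature_names)))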

Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.
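A brief sketch using k-means, one of many clustering algorithms; the points are toy data chosen to form two obvious groups:

    from sklearn.cluster import KMeans

    points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

    print(km.labels_)           # e.g. [1 1 1 0 0 0] - two clear groups
    print(km.cluster_centers_)  # the centre of each discovered cluster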


Predictive Analytics

September 9, 2011

Predictive analytics is used in actuarial science, financial services, insurance, telecommunications, retail, travel, healthcare, pharmaceuticals, and other fields (Wikipedia). But operations – manufacturing, processing, etc. – have been a little slower to embrace the concept. A drilling engineer friend of mine says, “just put my hand on the brake lever and I’ll drill that well.” He probably can, but few of the rest of us can, or want to.

We want to see operating parameters, performance metrics, and process trends – all because we want the information and knowledge needed to assimilate understanding and invoke our skill set (wisdom). In this scenario we are responding to stimulus; we are applying “reactive analytics”. But systems get more complex, operations become more intertwined, and performance expectations become razor-thin. With this complexity grows the demand for better assistance from technology. In this case the software performs the integrated analysis, and the result is “predictive analytics”. And with predictive analytics come the close cousins: decision models and decision trees.

Sullivan McIntyre, in his article From Reactive to Predictive Analytics, makes an observation about predictive analytics in social media that is mirrored in operations:

There are three key criteria for making social data useful for making predictive inferences:

  • Is it real-time? (Or as close to real-time as possible)
  • Is it metadata rich?
  • Is it integrated?

Having established these criteria – real-time data in place and historical data migrated into the real-time stream – predictive analytics becomes achievable.

The Value of Real-Time Data, Part 2

September 1, 2011

Previously, predictive analytics was summarized as “system anticipates” (https://profreynolds.wordpress.com/2011/08/31/the-value-of-real-time-data/). But that left a lot unsaid. Predictive analytics is a combination of statistical analysis, behaviour clustering, and system modeling. No one piece of predictive analytics can exist in a vacuum; the real-time system must be statistically analyzed, its behaviour grouped or clustered, and finally a system modeled that can use real-time data to anticipate the future – near term and longer.

Examples of predictive analytics in everyday life include credit scores, hurricane forecasts, etc. In each case, past events are analyzed, clustered, and then predicted.
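A hedged sketch of that three-part recipe – statistical analysis (standardization), behaviour clustering (k-means), and a simple system model that anticipates trouble from a new reading; the sensor fields and numbers are invented:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Past readings: [vibration, temperature] - invented history.
    history = np.array([[0.9, 70], [1.1, 72], [1.0, 69], [3.2, 95], [3.0, 99]])

    scaler = StandardScaler().fit(history)                  # statistical analysis
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(
        scaler.transform(history))                          # behaviour clustering

    # System model: the cluster holding the known-hot history is "trouble".
    trouble = km.predict(scaler.transform([[3.1, 97]]))[0]

    def anticipate(reading):
        """Flag a real-time reading that falls in the trouble cluster."""
        return km.predict(scaler.transform([reading]))[0] == trouble

    print(anticipate([1.0, 71]))  # False - normal behaviour
    print(anticipate([3.3, 98]))  # True  - anticipate trouble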

The result of predictive analytics is, therefore, a decision tool. And the decision tree will, to some degree, take into account a predictive analysis.

The output of Predictive Analytics will be descriptive or analytic – subjective or objective. Both outputs are reasonable and viable. Looking at the hurricane predictions, there are analytical computer models (including the so-called spaghetti models) that seek to propose a definitive resulting behaviour; then there are descriptive models that seek to produce a visualization and comprehension of the discrete calculations. By extension, one can generalize that descriptive predictions must be the result of multiple analytic predictions. Perhaps this is true.

Returning to the idea that predictive analytics comprises statistical analysis, clustering analysis, and finally system modelling, we see that a sub-field of analytics could be considered: reactive analytics. Reactive analytics seeks to understand the statistical analysis, and even the clustering analysis, with an eye to adapting processes and procedures – but not in real-time. Reactive Analytics is, therefore, the Understanding portion of the Data-Information hierarchy (https://profreynolds.wordpress.com/2011/08/31/the-data-information-hierarcy-part-3/). Predictive Analytics, in turn, is the Wisdom portion.

The Data-Information Hierarchy, Part 3

August 31, 2011

The Data-Information Hierarchy is frequently represented as
Data –> Information –> Knowledge –> Understanding –> Wisdom.

Or it is sometimes shortened to four steps, omitting Understanding. But, in fact, there are two predecessor steps: chaos and symbol. These concepts have been discussed in prior blogs (https://profreynolds.wordpress.com/2011/01/31/the-data-information-hierarcy/ and https://profreynolds.wordpress.com/2011/02/11/the-data-information-hierarcy-part-2/).

Chaos is that state of lacking understanding, best compared to a baby first perceiving the world around him. There is no comprehension of quantities or values, only a perception of large and small.

Symbol (or symbolic representation) represents the first stage of quantification. As such, symbolic representation and quantification concepts form the predecessor to Data.

So the expanded Data-Information hierarchy is represented in the seven steps:

Chaos –>
          Symbol –>
                    Data –>
                              Information –>
                                        Knowledge –>
                                                  Understanding –>
                                                            Wisdom

Continuing with this Data-Hierarchy paradigm, we can represent the five primary steps with simple explanations:

  • Data and Information : ‘Know What’
  • Knowledge : ‘Know How’
  • Understanding : ‘Know Why’
  • Wisdom : ‘Use It’

Real-Time Data in an Operations/Process Environment

May 16, 2011

The operations/process environment differs from the administrative and financial environments in that operations is charged with getting the job done. As such, the requirements placed on computers, information systems, instrumentation, controls, and data are different too. Data is never ‘in balance’, data always carries uncertainty, and the process cannot stop. Operations personnel have learned to perform their jobs while waiting for systems to come online, waiting for systems to upgrade, or even waiting for systems to be invented.

Once online, systems must be up 100% of the time, but aren’t. Systems must process data from a myriad of sources, but those sources are frequently intermittent or sporadic. Thus the processing, utilization, storage, and analysis of real-time data is a challenge totally unlike that seen in administrative or financial systems.

Real time systems must address distinct channels of data flow – from the immediate to the analysis of terabytes of archived data.

Control and Supervision: Real-time data is used to provide direct HMI (human-machine-interface) and permit the human computer to monitor / control the operations from his console. The control and supervision phase of real-time data does not, as part of its function, record the data. (However, certain data logs may be created for legal or application development purposes.) Machine control and control feedback loops require, as a minimum, real-time data of sufficient quality to provide steady operational control.
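A minimal sketch of such a control feedback loop, with a simulated plant standing in for the real sensors and actuators (the setpoint and gain are arbitrary):

    # Proportional control: read value, compare to setpoint, nudge output.
    setpoint = 100.0      # desired process value
    gain = 0.4            # proportional gain
    process_value = 80.0  # simulated sensor reading

    for step in range(10):
        error = setpoint - process_value  # deviation shown on the console
        process_value += gain * error     # simulated plant response
        print(f"step {step}: pv={process_value:.2f} err={error:.2f}")
        # A log like this print may be kept for legal or development use,
        # but recording is not part of the control function itself.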

Forensic Analysis and Lessons Learned: Captured data (and, to a lesser extent, data and event logs) are utilized to investigate specific performance metrics and operations issues. Generally, this data is kept in some form for posterity, but it may be filtered, processed, or purged. Nevertheless, the forensic utilization does represent post-operational analytics. Forensic analysis is also critical to prepare an operator for an upcoming similar process – similar in function, geography, or sequence.

Data Mining: Data mining is used to research previous operational events to locate trends, identify areas for improvement, and prepare for upcoming operations. Data mining is used to identify a bottleneck or problem area as well as to correlate events that are less than obvious.

Proactive / Predictive Analytics: The utilization of data streams, both present and previous, in an effort to predict the immediate (or distant) future requires historical data, data mining, and the application of learned correlations. Data mining may provide correlated events and properties, but predictive analytics provides the conversion of those correlations into positive, immediate performance and operational changes. (This utilization is not, explicitly, artificial intelligence, but the two are closely related.)

The data-information-knowledge-understanding-wisdom paradigm: Within the data –> wisdom paradigm, real-time data is just that – data. The entire tree breaks out as:

  • data – raw, untempered data from the operations environment (elemental data filtering and data quality checks are, nevertheless, required).
  • information – presentation of the data in human comprehensible formats – the control and supervision phase of real-time data.
  • knowledge – forensic analytics, data mining, and correlation analysis
  • understanding – proactive and forward-looking changes in behavior characteristic of the proactive / predictive analytics phase.
  • wisdom – the wisdom phase remains the domain of the human computer.

Related Posts:

Data Mining and Data, Information, Understanding, Knowledge
https://profreynolds.wordpress.com/2011/01/30/data-mining-and-data-information-understanding-knowledge/

The Digital Oilfield, Part 1
https://profreynolds.wordpress.com/2011/01/30/the-digital-oilfield-part-1/

The Data-Information Hierarchy
https://profreynolds.wordpress.com/2011/01/31/the-data-information-hierarcy/

Knowledge as an Asset

February 23, 2011

This is the information age. So we are deluged with information (and sometimes just data). Organizations are only now beginning to grasp the fundamental concept of Knowledge as an Asset.

What constitutes the knowledge asset?

“Unlike information, knowledge is less tangible and depends on human cognition and awareness. There are several types of knowledge – ‘knowing’ a fact is little different from ‘information’, but ‘knowing’ a skill, or ‘knowing’ that something might affect market conditions is something, that despite attempts of knowledge engineers to codify such knowledge, has an important human dimension. It is some combination of context sensing, personal memory and cognitive processes. Measuring the knowledge asset, therefore, means putting a value on people, both as individuals and more importantly on their collective capability, and other factors such as the embedded intelligence in an organisation’s computer systems.”  (http://www.skyrme.com/insights/11kasset.htm)

Tackling the concept of knowledge is challenging. Knowledge Engineering was first conceptualized in the early 1980s (http://en.wikipedia.org/wiki/Knowledge_engineering) – coincident with the advent of the inexpensive personal computer. Since that time, the paradigm has shifted for all aspects of business. Seat-of-the-pants management is no longer suitable. And the requisite empowering of employees resulting from this transformation has completely changed the workplace.

Having said that, some of the most important issues in knowledge acquisition are as follows: (http://epistemics.co.uk/Notes/63-0-0.htm)

  • Most knowledge is in the heads of experts
  • Experts have vast amounts of knowledge
  • Experts have a lot of tacit knowledge
    • They don’t know all that they know and use
    • Tacit knowledge is hard (impossible) to describe
  • Experts are very busy and valuable people
  • Each expert doesn’t know everything
  • Knowledge has a “shelf life”

Moving knowledge within the organization is perhaps the most challenging aspect of corporate knowledge. Knowledge handoff must occur laterally and temporally. Laterally, in that the knowledge must be shared in order to convey the necessary understanding; when a critical mass of users gain that understanding, corporate wisdom results. Temporally, in that knowledge must be handed off at the shift change (this is true for 24/7 operations as well as global operations).

But the ability to use technology to share the results of technology is lagging. Frequently, hand-off notes are the only tool at anyone’s disposal – very inefficient. Recently, SharePoint sites and wikis have become popular for inter-departmental information and knowledge sharing.

Jerome J. Peloquin wrote an interesting essay (“Knowledge as a Corporate Asset”) in which he opens with “Virtually every business in the world faces the same fundamental problem: Maintenance of their competitive edge through the application and formation of knowledge.” He makes two statements in his conclusion which serve well to wrap up the premise of this essay (Knowledge as an Asset):

  • Information is useless unless we can act upon it, and that implies that it must first be transformed into knowledge.
  • The knowledge asset combines a number of factors which can be objectively proven by the observation and accomplishment of a specific set of criteria.

Enter Knowledge Information Management. Gene Bellinger writes that

“In an organizational context, data represents facts or values of results, and relations between data and other relations have the capacity to represent information. Patterns of relations of data and information and other patterns have the capacity to represent knowledge. For the representation to be of any utility it must be understood, and when understood the representation is information or knowledge to the one that understands. Yet, what is the real value of information and knowledge, and what does it mean to manage it?”

If Knowledge is an Asset, then, like any corporate asset, it must be managed, secured, maintained, and made available as a tool to the employees. If information is the core of business, then the resulting knowledge is the value of the business. And the wisdom (coming from the understanding) is the driving force and the profitability of the business. Smaller, faster, cheaper may be the mantra; knowledge is the asset.

The Data-Information Hierarchy, Part 2

February 11, 2011

Data, as has been established, comprises the organic, elemental source quantities. Data, by itself, does not produce cognitive information and decision-making ability. But without it, the chain Data –> Information –> Knowledge –> Understanding –> Wisdom is broken before it starts. Data is recognized for its discrete characteristics.

Information is the logical grouping and presentation of the data. Information, in a more general sense, is the structure and encoded explanation of phenomena. It answers the who, what, when, and where questions (http://www.systems-thinking.org/dikw/dikw.htm) but does not explain these answers, nor instill the wisdom necessary to act on the information. “Information is sometimes associated with the idea of knowledge through its popular use rather than with uncertainty and the resolution of uncertainty.” (An Introduction to Information Theory, John R. Pierce)

The leap from Information through Knowledge and Understanding into Wisdom is the objective sought by data/information analysis, data/information mining, and knowledge systems.

What knowledge is gleaned from the information? How can we understand the interactions (particularly at a system level)? And how is this understanding used to make correct decisions and navigate a course to a desired end point?

Wisdom is the key. What good are historical volumes of data unless informed decisions result? (That is one definition of Wisdom.)

How do informed decisions (Wisdom) result if the data and process are not Understood?

How do we achieve understanding without the knowledge?

How do we achieve the Knowledge without the Information?

But in a very real sense, Information is not just the who, what, when, where answers to the questions assimilating data. Quantified, Information is the measure of the order of the system – equivalently, the measure of the lack of disorder – a quantity tied directly to the entropy of the system.
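Shannon's formula makes that measure concrete, as this small sketch shows (lower entropy means a more ordered, more predictable system):

    import math

    def entropy(probabilities):
        """Shannon entropy H = -sum(p * log2(p)), in bits."""
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    print(entropy([0.5, 0.5]))    # 1.0 bit - a maximally disordered coin
    print(entropy([0.99, 0.01]))  # ~0.08 bits - a highly ordered system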

Taken together, information is composed of the informational context, the informational content, and the informational propositions. Knowledge, then, is the informed assimilation of the information. And the cognitive conclusion is the understanding.

Thus Data –> Information –> Knowledge –> Understanding –> Wisdom through the judicious use of:

  • data acquisition,
  • data retention,
  • data transition,
  • knowledge mining,
  • cognitive processing, and
  • the application of this combined set into a proactive and forward action plan.
