Archive for the ‘Information Theory’ category

Artificial Intelligence vs Algorithms

February 9, 2012

I first considered aspects of artificial intelligence (AI) in the 1980s while working for General Dynamics as an Avionics Systems Engineer on the F-16. Over the following three decades, I continued to follow the field until I reached a realization – AI, as delivered, is just an algorithm. Certainly the goals of AI will one day be reached, but the metric by which we will recognize its arrival is not well defined.


Consider the Denver International Airport. The baggage handling system was state of the art and touted as AI based; it delayed the airport's opening by 16 months and cost $560M to fix. In the end, the entire system was replaced with a more stable system based not on a learning or deductive system, but upon much more basic routing and planning algorithms that could be deterministically designed and tested.

Consider the Houston traffic light system. Mayors have been elected on the promise to apply state-of-the-art computer intelligence: interconnected traffic lights, traffic prediction, automatic traffic redirection. Yet the desired AI materialized as identifiable computer algorithms with definitive behavior and expectations. Certainly an improvement, but not a thinking machine. The closest thing to automation is the remote triggering feature used by commuter rail and emergency vehicles.

So algorithms form the basis for computer advancement. And these algorithms may be applied, with human interaction, to learn the new lessons so necessary to achieving behavioral improvement with computers. Toward this objective, distinct fields of study are untangling interrelated elements – clustering, neural networks, case-based reasoning, and predictive analytics are just a few. Clustering, for example, reduces to a deterministic, testable procedure, as sketched below.
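
As a concrete illustration, here is a minimal sketch of one such deterministic algorithm – k-means clustering. It is not drawn from any production system; with a fixed seed it runs the same way every time, so it can be designed and tested without any "thinking machine":

```python
# Minimal k-means clustering sketch -- a deterministic, testable algorithm.
# Illustrative only; a real system would use a library such as scikit-learn.
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups by iteratively refining centroids."""
    random.seed(seed)                      # fixed seed keeps the run repeatable
    centroids = random.sample(points, k)   # start from k existing points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            i = min(range(k), key=lambda c: (x - centroids[c][0]) ** 2
                                          + (y - centroids[c][1]) ** 2)
            clusters[i].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

data = [(1, 1), (1.2, 0.8), (5, 5), (5.1, 4.9), (9, 1)]
centroids, clusters = kmeans(data, k=2)
print(centroids)
```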

When AI is achieved, it will be revolutionary. But until that time, deterministic algorithms, data mining, and predictive analytics will be at the core of qualitative and quantitative advancement.


What is Content?

September 8, 2011

Several internet articles and blogs address the meaning of content from an internet perspective. From this perspective, content is the (meaningful) stuff on a page, the presentation of information to the seeker.

But content within an operations-centric perspective is entirely different. The databases and operational tools must hold content data that reflects the information being sought in the pursuit of knowledge. Thus, paraphrasing Scottie Claiborne (http://www.successful-sites.com/articles/content-claiborne-content1.php), “content is the stuff in your operations system; good content is useful information”.

Therefore, content is the meaningful data and the presentation of this data as information.

Content can, and should be, redundant. Not redundant from a back-up perspective; redundant from an information theory perspective – data that is inter-related and inter-correlated. (Data that is directly calculated need not be stored; however, the method of calculation may change, and therefore the original calculation may prove useful.) Inter-correlated data may be thought of in terms of weather: wind speed, temperature, pressure, humidity, etc. are individual, measurable values, but they inter-relate, and perfectly valid inferences may be made in the absence of one or more of these datums. When the historical (temporal) and adjacent (geospatial) datums are brought into the content, then, according to information theory, more and more redundancy exists within the dataset.
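
A rough sketch of the weather example, using made-up readings: because temperature and humidity are correlated, a missing humidity datum can be inferred from temperature alone – the redundancy in the content is what makes the inference possible.

```python
# Sketch of inference from inter-correlated data (illustrative numbers only):
# estimate a missing humidity reading from temperature via least squares.
import numpy as np

temperature = np.array([20.0, 22.5, 25.0, 27.5, 30.0])   # deg C (hypothetical)
humidity    = np.array([80.0, 74.0, 68.0, 61.0, 55.0])   # % RH (hypothetical)

# The redundancy shows up as a strong correlation between the two series.
r = np.corrcoef(temperature, humidity)[0, 1]
print(f"correlation: {r:.3f}")

# Fit humidity = a * temperature + b, then infer the missing datum.
a, b = np.polyfit(temperature, humidity, 1)
print(f"inferred humidity at 24 deg C: {a * 24.0 + b:.1f} %")
```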

Having identified the basis of content, the operations system designer should perform content analysis. Content analysis is both qualitative and quantitative, but careful attention to systems design and systems management will permit increased quantification of the results. Content analysis, in its most basic form, is the designer asking the questions: “What is the purpose of the data? What outcomes are expected from the data? How will the data be imparted to produce the desired behavior?”

So how do we quantify the importance of specific data / content? How do we choose which data / content to retain? These questions are so difficult to answer that the normal response is to save everything, forever. And since data not retained is data lost, and lost forever, this approach seems reasonable in a world of diminishing data storage costs. But then the cost and complexity of information retrieval grow.

The concept and complexity of data retrieval is left for another day…

The Value of Real-Time Data

August 31, 2011

Real-time data is a challenge to any process-oriented operation. But the functionality of the data is difficult to describe in a way that team members not well versed in data management can grasp. Toward that end, four distinct phases of data have been identified:

  1. Real-Time: streaming data
    visualized and considered – system responds
  2. Forensic: captured data
    archived, condensed – system learns
  3. Data Mining: consolidated data
    hashed and clustered – system understands
  4. Predictive Analytics: patterned data
    compared and matched – system anticipates

A more detailed explanation of these phases follows:

Control and Supervision: Real-time data is used to provide the direct HMI (human-machine interface) and permit the human operator to monitor and control the operations from his console. The control and supervision phase of real-time data does not, as part of its function, record the data. (However, certain data logs may be created for legal or application development purposes.) Machine control and control feedback loops require, as a minimum, real-time data of sufficient quality to provide steady operational control.
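
A minimal sketch of the control-and-supervision idea, with a hypothetical read_sensor() standing in for a live feed: the system responds to each value as it arrives, but nothing is archived.

```python
# Minimal control-and-supervision sketch: respond to a live reading without
# recording it. The read_sensor() source and the alarm limit are hypothetical.
import random
import time

HIGH_LIMIT = 80.0   # alarm threshold (assumed units)

def read_sensor():
    """Stand-in for a real-time data source."""
    return 70.0 + random.uniform(-15.0, 15.0)

def monitor(cycles=10):
    for _ in range(cycles):
        value = read_sensor()
        if value > HIGH_LIMIT:
            print(f"ALARM: {value:.1f} exceeds {HIGH_LIMIT}")  # operator display
        time.sleep(0.1)    # polling interval; nothing is archived

monitor()
```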

Forensic Analysis and Lessons Learned: Captured data (and, to a lesser extent, data and event logs) are utilized to investigate specific performance metrics and operations issues. Generally, this data is kept in some form for posterity, but it may be filtered, processed, or purged. Nevertheless, the forensic utilization does represent post-operational analytics. Forensic analysis is also critical to prepare an operator for an upcoming similar process – similar in function, geography, or sequence.

Data Mining: Data mining is used to research previous operational events to locate trends, identify areas for improvement, and prepare for upcoming operations. Data mining is used to identify a bottleneck or problem area as well as to correlate events that are less than obvious.

Proactive / Predictive Analytics: The utilization of data streams, both present and previous, in an effort to predict the immediate (or distant) future requires historical data, data mining, and the application of learned correlations. Data mining may provide correlated events and properties, but predictive analytics provides the conversion of those correlations into positive, immediate performance and operational changes. (This utilization is not, explicitly, AI – artificial intelligence – but the two are closely related.)
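
A toy sketch of that predictive step, using invented hourly readings: a trend mined from the historical data is converted into a forecast the operation can act on before the next reading arrives.

```python
# Predictive-analytics sketch: turn a learned correlation (here, a simple
# linear trend in hypothetical hourly readings) into a forecast.
import numpy as np

hours    = np.arange(10)                                  # elapsed time
readings = np.array([5.0, 5.4, 5.9, 6.1, 6.8,
                     7.0, 7.5, 7.9, 8.2, 8.8])            # hypothetical history

slope, intercept = np.polyfit(hours, readings, 1)         # the "mined" trend
next_hour = 10
forecast = slope * next_hour + intercept                  # anticipate, then act
print(f"forecast for hour {next_hour}: {forecast:.2f}")
```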

The Data-Information Hierarchy, Part 3

August 31, 2011

The Data-Information Hierarchy is frequently represented as
Data –> Information –> Knowledge –> Understanding –> Wisdom.

Or it is sometimes shortened to four steps, omitting Understanding. But, in fact, there are two predecessor steps: chaos and symbol. These concepts have been discussed in prior blogs (https://profreynolds.wordpress.com/2011/01/31/the-data-information-hierarcy/ and https://profreynolds.wordpress.com/2011/02/11/the-data-information-hierarcy-part-2/).

Chaos is that state of lacking understanding, best compared to a baby first perceiving the world around him. There is no comprehension of quantities or values, only a perception of large and small.

Symbol (or symbolic representation) represents the first stage of quantification. As such, symbolic representation and quantification concepts form the predecessor to Data.

So the expanded Data-Information hierarchy is represented in the seven steps:

Chaos –>
          Symbol –>
                    Data –>
                              Information –>
                                        Knowledge –>
                                                  Understanding –>
                                                            Wisdom

Continuing with this Data-Hierarchy paradigm, we can represent the five primary steps with the simple explanation:

  • Data and Information : ‘Know What’
  • Knowledge : ‘Know How’
  • Understanding : ‘Know Why’
  • Wisdom : ‘Use It’

The Data-Information Hierarchy, Part 2

February 11, 2011

Data, as has been established, comprises the organic, elemental source quantities. Data, by itself, does not produce cognitive information and decision-making ability. But without it, the chain Data –> Information –> Knowledge –> Understanding –> Wisdom is broken before it starts. Data is recognized for its discrete characteristics.

Information is the logical grouping and presentation of the data. Information, in a more general sense, is the structure and encoded explanation of phenomena. It answers the who, what, when, and where questions (http://www.systems-thinking.org/dikw/dikw.htm) but does not explain these answers, nor instill the wisdom necessary to act on the information. “Information is sometimes associated with the idea of knowledge through its popular use rather than with uncertainty and the resolution of uncertainty.” (An Introduction to Information Theory, John R. Pierce)

The leap from Information through Knowledge and Understanding into Wisdom is the objective sought by data / information analysis, data / information mining, and knowledge systems.

What knowledge is gleaned from the information? How can we understand the interactions (particularly at a system level)? And how is this understanding used to make correct decisions and navigate a course to a desired end point?

Wisdom is the key. What good are the historical volumes of data unless informed decisions result? (a definition of Wisdom)

How do informed decisions (Wisdom) result if the data and process are not Understood?

How do we achieve understanding without the knowledge?

How do we achieve the Knowledge without the Information?

But in a very real sense, Information is not merely the who, what, when, where answers to the questions assimilating data. Quantified, Information is a measure of the disorder of the system – or, conversely, of its lack of order: the measure of the entropy of the system.
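
In Shannon's formulation this measure is the entropy, H = -Σ p·log2(p), in bits per symbol. A minimal sketch:

```python
# Shannon's quantification of information: the entropy of a source,
# H = -sum(p * log2(p)), measured in bits per symbol.
from math import log2

def entropy(probabilities):
    return -sum(p * log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit  -- maximum disorder for two symbols
print(entropy([0.9, 0.1]))   # ~0.47    -- more order, less surprise per symbol
```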

Taken together, information is composed of the informational context, the informational content, and the informational propositions. Knowledge, then, is the informed assimilation of the information. And the cognitive conclusion is the understanding.

Thus Data –> Information –> Knowledge –> Understanding –> Wisdom through the judicious use of:

  • data acquisition,
  • data retention,
  • data transition,
  • knowledge mining,
  • cognitive processing, and
  • the application of this combined set into a proactive and forward action plan.

The Data-Information Hierarchy

January 31, 2011

(originally posted on blogspot January 29, 2010)

I see much internet attention given to the data –> information –> knowledge –> understanding –> wisdom tree. Most will omit the step of understanding. Many will overlook data or wisdom. But all five stages are required to move from total oblivion to becoming the true productive member of society that pushes us forward. To paraphrase, “it is a process”.

“The first sign of wisdom is to get wisdom; go, give all you have to get true knowledge.” (www.basicenglishbible.com/proverbs/4.htm) This central and key verse in proverbs rings true today. Wisdom is the objective, but the progression to wisdom must begin with assimilating the data and information into knowledge. And from knowledge comes understanding, from understanding comes wisdom.

In keeping with the philosophical / religious examination of knowledge and wisdom, www.letgodbetrue.com/proverbs/04_07.htm explains it as:

Wisdom is the principal thing – the most important matter in life. Wisdom is the power of right judgment – the ability to choose the correct solution for any situation. It is knowing how to think, speak, and act to please both God and men. It is the basis for victorious living. Without wisdom, men make choices that bring them pain, poverty, trouble, and even death. With it, men make choices that bring them health, peace, prosperity, and life.

Understanding is connected to wisdom, and it is also an important goal. Understanding is the power of discernment – to see beyond what meets the eye and recognize the inherent faults or merits of a thing. Without understanding, men are easily deceived and led astray. Without it, men are confused and perplexed. With it, men can see what others miss, and they can avoid the snares and traps of seducing sins. With it, life’s difficulties are simple.

As great as this biblical basis is, the efforts of business and scientific endeavours require all five steps. But, as Cliff Stoll said, “Data is not information, Information is not knowledge, Knowledge is not understanding, Understanding is not wisdom.” Data is sought, information is desired, but wisdom is the objective.

“We collect and organize data to achieve information; we process information to absorb knowledge; we utilize the knowledge to gain understanding; and we apply understanding to achieve wisdom.” (Mark Reynolds)

Russell Ackoff provides a definition of the five stages:

  1. Data: symbols
  2. Information: data that are processed to be useful; provides answers to “who”, “what”, “where”, and “when” questions
  3. Knowledge: application of data and information; answers “how” questions
  4. Understanding: appreciation of “why”
  5. Wisdom: evaluated understanding.

Data itself is not able to be absorbed by the human. It is individual quanta of substance, neither understandable nor desirable.

Information is the recognizable and cognitive presentation of the data. The reason that the Excel chart is so popular is that it allows the manipulation of data but the perception of information. Information is processable by the human, but is only a historical concept.

Knowledge is the appropriate collection of information, such that its intent is to be useful. Knowledge is a deterministic process. To answer questions beyond that stored knowledge requires a true cognitive and analytical ability that is only encompassed in the next level – understanding. In computer parlance, most of the applications we use (modeling, simulation, etc.) exercise some type of stored knowledge. (www.systems-thinking.org/dikw/dikw.htm)

Understanding is an interpolative and probabilistic process. It is cognitive and analytical. It is the process by which I can take knowledge and synthesize new knowledge from previously held knowledge. The difference between understanding and knowledge is the difference between “learning” and “memorizing”. (www.systems-thinking.org/dikw/dikw.htm)

Wisdom is the only part of the five stages that is future based, forward looking. Systems Engineering courses have wisdom as the basis for their existence, whether they recognize it or not. An alternate definition of Systems Engineering could be the effort to put knowledge and understanding to use. And as such, a large part of the Systems Engineering course work is to teach the acquisition of knowledge and understanding while pressing the student's mind to explore its meaning and function.

Neil Fleming observes: (www.vark-learn.com/english/page.asp?p=biography)

  • A collection of data is not information.
  • A collection of information is not knowledge.
  • A collection of knowledge is not wisdom.
  • A collection of wisdom is not truth.

Where have we come from, what are we doing here, where are we going? These questions look like philosophical questions. But they form the basis for the data, information, knowledge, understanding, wisdom tree. And once the level of wisdom is achieved in any study, the question of where we are going is effectively answered.

Information Theory and Information Flow

January 30, 2011

(originally posted on blogspot January 28, 2010)

Information is the core, the root, of any business. But exactly what is information? Many will immediately begin explaining computer databases. But only a small portion of information theory is actually computer databases.

Information is a concrete substance in that it is a quantity that is sought, it is a quantity that can be sold, and it is a quantity that is protected.

Wikipedia’s definition: “Information is any kind of event that affects the state of a dynamical system. In its most restricted technical sense, it is an ordered sequence of symbols. As a concept, however, information has many meanings. Moreover, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation.” (http://en.wikipedia.org/wiki/Information)

Information Theory then is not the study of bits and bytes; it is the study of information and, moreover, its quantification. Fundamental to Information Theory is the acquisition of information, along with the extraction of the true information from the extraneous. In Electrical Engineering, this process is addressed by signal conditioning and noise filtering. In the mathematical sciences (and specifically the probability sciences), the acquisition of information is the investigation into the probability of events and the correlation of events – both as simultaneous events and as cause-effect events. Process control looks to the acquisition of information to lead to more optimal control of the processes.
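
As a sketch of that signal-conditioning step, here is a simple moving-average filter applied to a synthetic noisy signal; the signal, noise level, and window size are all arbitrary choices for illustration:

```python
# Signal-conditioning sketch: a moving-average filter extracting the
# underlying signal (the true information) from additive noise.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
signal = np.sin(2 * np.pi * 3 * t)            # the "true" information
noisy  = signal + rng.normal(0, 0.4, t.size)  # the extraneous, added noise

window = 10
kernel = np.ones(window) / window
filtered = np.convolve(noisy, kernel, mode="same")  # smooth toward the signal

print(f"noise power before: {np.mean((noisy - signal)**2):.3f}")
print(f"noise power after:  {np.mean((filtered - signal)**2):.3f}")
```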

So the acquisition of a clear signal, the predictive nature of that information, and the utilization of that information is at the root of information theory.

C. E. Shannon published a paper in 1948, A Mathematical Theory of Communication, which served to introduce the concept of Information Theory to modern science. His tenet is that communication systems (the means for dispersal of information) are composed of five parts:

  1. An information source (radio signal, DNA, industrial meter)
  2. A transmitter
  3. The channel (medium used to transmit)
  4. A receiver
  5. A recipient.

Since the information flow must be as distraction- and noise-free as possible, digital systems are often employed for industrial and parameterized data. Considerations then focus on data precision, latency, clarity, and storage.

Interestingly, the science of cryptography actually looks for obfuscation – data purity, but hidden. Within cryptography, the need for precise, timely, and clear information is as important as ever, but the encapsulation of that information into chunks of meaningless drivel is the objective.

But then the scientist (as well as the code breaker) is attempting to achieve just the opposite: finding patterns, tendencies, and clues. These patterns, tendencies, and clues are the substance of the third phase of Data –> Information –> Knowledge –> Understanding –> Wisdom. And finding these patterns, tendencies, and clues is what provides the industrial information user the ability to improve performance and, as a result, profitability.

The Digital Oilfield is a prime example of the search for more and better information. As the product becomes harder to recover – shale gas, undersea petroleum, horizontal drilling, etc. – the importance of the ability to mine patterns, tendencies, and clues is magnified.

Hence the Digital Oilfield is both lagging in the awakening to the need for information and leading in the resources to uncap the information.

