Archive for January 2011

The Data-Information Hierarchy

January 31, 2011

(originally posted on blogspot January 29, 2010)

I see much internet attention given to the data –> information –> knowledge –> understanding –> wisdom tree. Most discussions omit the step of understanding. Many overlook data or wisdom. But all five stages are required to move from total oblivion to becoming a truly productive member of society who pushes us forward. To paraphrase, “it is a process”.

“The first sign of wisdom is to get wisdom; go, give all you have to get true knowledge.” This central and key verse in Proverbs rings true today. Wisdom is the objective, but the progression to wisdom must begin with assimilating data and information into knowledge. From knowledge comes understanding, and from understanding comes wisdom.

In keeping with the philosophical / religious examination of knowledge and wisdom, one commentary explains it as:

Wisdom is the principal thing – the most important matter in life. Wisdom is the power of right judgment – the ability to choose the correct solution for any situation. It is knowing how to think, speak, and act to please both God and men. It is the basis for victorious living. Without wisdom, men make choices that bring them pain, poverty, trouble, and even death. With it, men make choices that bring them health, peace, prosperity, and life.

Understanding is connected to wisdom, and it is also an important goal. Understanding is the power of discernment – to see beyond what meets the eye and recognize the inherent faults or merits of a thing. Without understanding, men are easily deceived and led astray. Without it, men are confused and perplexed. With it, men can see what others miss, and they can avoid the snares and traps of seducing sins. With it, life’s difficulties are simple.

As great as this biblical basis is, business and scientific endeavours require all five steps. But, as Cliff Stoll said, “Data is not information, Information is not knowledge, Knowledge is not understanding, Understanding is not wisdom.” Data is sought, information is desired, but wisdom is the objective.

“We collect and organize data to achieve information; we process information to absorb knowledge; we utilize the knowledge to gain understanding; and we apply understanding to achieve wisdom.” (Mark Reynolds)

Russell Ackoff provides a definition of the five stages:

  1. Data: symbols
  2. Information: data that are processed to be useful; provides answers to “who”, “what”, “where”, and “when” questions
  3. Knowledge: application of data and information; answers “how” questions
  4. Understanding: appreciation of “why”
  5. Wisdom: evaluated understanding.
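Ackoff’s first two stages can be sketched in a few lines of code. This is an illustrative sketch only; the records and field names below are hypothetical, not from any real system:

```python
# Data: bare symbols -- tuples of (date, source, value) with no context.
raw_data = [
    ("2011-01-05", "well-7", 412),
    ("2011-01-05", "well-9", 388),
    ("2011-01-06", "well-7", 425),
]

# Information: data processed to be useful -- it now answers
# "what happened, where, and when" questions.
def summarize(records):
    totals = {}
    for date, source, value in records:
        totals[source] = totals.get(source, 0) + value
    return totals

print(summarize(raw_data))  # {'well-7': 837, 'well-9': 388}
```

The raw tuples are data; the totals, which answer “what happened where”, are information.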

Data itself cannot be absorbed by a human. It is individual quanta of substance, neither understandable nor desirable on its own.

Information is the recognizable and cognitive presentation of the data. The reason the Excel chart is so popular is that it allows the manipulation of data but the perception of information. Information is processable by a human, but it is only a historical concept.

Knowledge is the appropriate collection of information, such that its intent is to be useful. Knowledge is a deterministic process; it answers the “how” questions. To correctly answer a “why” question requires a true cognitive and analytical ability that is only encompassed in the next level… understanding. In computer parlance, most of the applications we use (modeling, simulation, etc.) exercise some type of stored knowledge.

Understanding is an interpolative and probabilistic process. It is cognitive and analytical. It is the process by which I can take knowledge and synthesize new knowledge from previously held knowledge. The difference between understanding and knowledge is the difference between “learning” and “memorizing”.

Wisdom is the only part of the five stages that is future based, forward looking. Systems Engineering courses have wisdom as the basis for their existence, whether they recognize it or not. An alternate definition of Systems Engineering could be the effort to put knowledge and understanding to use. And as such, a large part of the Systems Engineering course work is to teach the acquisition of knowledge and understanding while pressing the student’s mind to explore its meaning and function.

Neil Fleming observes:

  • A collection of data is not information.
  • A collection of information is not knowledge.
  • A collection of knowledge is not wisdom.
  • A collection of wisdom is not truth.

Where have we come from, what are we doing here, where are we going? These look like philosophical questions. But they form the basis for the data, information, knowledge, understanding, wisdom tree. And once the level of wisdom is achieved in any study, the question of where we are going is effectively answered.


Protecting Digital Identities, Part 1

January 31, 2011

A Digital Identity is the mechanism used to identify an individual to computers, networks, the internet, and social media. In the general case, a digital identity is the digital fingerprint of an individual – or of an entity other than an individual – generically called the Digital Subject. In either case, it consists of properties, relationships, attributes, and authentication.

Properties are the characteristics of the digital subject. Within Facebook, properties may include name, age, and marital status. Within a corporate network, the properties may include employment date, withholding exemptions, and supervisor.

Relationships are the correlation between digital subjects. Within Facebook, relationships include friends, family, schools, employers, and special interests. Within the corporate environment, relationships refer to directory access rights, functional groups, etc.

Attributes are special characteristics of the digital subject and are not too different from properties. Attributes include login name, password, and home server. Generally, attributes are not shared outside the digital authority.

Authentication is the process for verifying the legitimacy of the digital subject. Generally, username and password form the first line of defense. But authentication factors include:

  • what you know (password)
  • what you have (passkey)
  • who you are (fingerprint, retina)
  • what you can do (this is relatively new and is generally seen in the form of captcha)
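As a rough sketch, the factors above can be thought of as independent boolean checks, with policy deciding how many must pass. The function and threshold here are hypothetical illustrations, not a real authentication API:

```python
# Illustrative only: each argument is the pass/fail result of one
# authentication factor (know / have / are / can do).
def authenticate(knows_password, has_passkey, biometric_ok, challenge_ok):
    factors = [knows_password, has_passkey, biometric_ok, challenge_ok]
    # Policy: require at least two independent factors ("two-factor").
    return sum(factors) >= 2

print(authenticate(True, True, False, False))   # True: password + passkey
print(authenticate(True, False, False, False))  # False: password alone
```

The design point is that the factors are independent: stealing a password does not also steal a fingerprint or a passkey.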

The protection of digital identity must address many facets. And the laws, ethics, and policies surrounding these protections do not encompass all aspects nor do they form a seamless shield.

As the digital identity becomes more and more integral to the existence of people in modern societies, the protection and reliability of the digital identity become paramount.

Protecting the authentication. Authentication protection is the responsibility of both the digital subject and the central account store. And this responsibility is frequently handled poorly. The digital subject has shown laziness and disregard toward passwords in numerous scenarios. People tend to use only a couple of passwords, making their entire digital life accessible once a single account store has been violated. And within the central account store, passwords may be kept in unencrypted form, they may be encrypted with a breakable two-way cipher, or they may be broken through simple, brute-force dictionary comparisons. By far, the best solution is many passwords that use a combination of lowercase, uppercase, numbers, and symbols. But these are nearly impossible to remember.
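On the storage side, a central account store can avoid both unencrypted passwords and breakable two-way ciphers by keeping only a salted, slow, one-way hash. The sketch below uses Python’s standard library; the parameters (PBKDF2-SHA256, 200,000 iterations) are illustrative assumptions, not a prescription:

```python
import hashlib
import hmac
import os

def store_password(password: str):
    # Random per-user salt defeats precomputed dictionary comparisons;
    # the iteration count makes brute force slow.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest  # only salt + hash are persisted, never the password

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    # Constant-time comparison avoids timing leaks.
    return hmac.compare_digest(attempt, digest)

salt, digest = store_password("Tr0ub4dor&3")
print(verify_password("Tr0ub4dor&3", salt, digest))  # True
print(verify_password("password", salt, digest))     # False
```

Even if the account store is violated, the attacker recovers salted hashes rather than reusable passwords.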

Protecting the data. All of the protection of the authentication is meaningless if the digital data itself is unprotected. Unencrypted social security numbers, addresses, and credit card numbers remain pervasive throughout the commercial industries. Remarkably, the medical community is making significant progress toward true information security. This progress is accomplished through the disappearance of paper records and the integration of digital-only records. The significance of this is that any view of the records requires 1) an authenticated user and 2) tracking of all access. (Three hospital employees were fired for improperly accessing the records of shooting victims in Arizona.)

Ensuring reliability. Safe and authenticated data is meaningless if not accurate. And accuracy has not received the same level of attention as authentication and protection. Mistyped court records and outdated address and employment records are typical examples. Invalid properties, relationships, and attributes will cost money, jobs, relationships, and productivity. And typically, no one is held responsible. But the inaccuracies affect all of us.

Summary. Digital identities require multifaceted oversight. Failure at any level of protection, accountability, or reliability will render the records useless and affect the lives of many people. As the inventiveness of the nefarious groups improves, so must the determination of the shepherds of the data.

Information Theory and Information Flow

January 30, 2011

(originally posted on blogspot January 28, 2010)

Information is the core, the root, of any business. But exactly what is information? Many will immediately begin explaining computer databases. But only a small portion of information theory is actually computer databases.

Information is a concrete substance in that it is a quantity that is sought, it is a quantity that can be sold, and it is a quantity that is protected.

Wikipedia’s definition: “Information is any kind of event that affects the state of a dynamical system. In its most restricted technical sense, it is an ordered sequence of symbols. As a concept, however, information has many meanings. Moreover, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation.”

Information Theory, then, is not the study of bits and bytes. It is the study of information – moreover, the quantification of information. Fundamental to Information Theory is the acquisition of information, along with the extraction of the true information from the extraneous. In Electrical Engineering, this process is addressed by signal conditioning and noise filtering. In the mathematical sciences (and specifically the probability sciences), the acquisition of information is the investigation into the probability of events and the correlation of events – both as simultaneous events and as cause-effect events. Process control looks to the acquisition of information to enable more optimal control of the processes.
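The signal-conditioning step can be as simple as a moving-average filter that suppresses noise to recover the underlying signal. A minimal sketch, with a window size and sample values chosen purely for illustration:

```python
# Moving-average filter: each output point is the mean of a sliding
# window over the raw samples, smoothing out measurement noise.
def moving_average(samples, window=3):
    out = []
    for i in range(len(samples) - window + 1):
        out.append(sum(samples[i:i + window]) / window)
    return out

noisy = [10, 12, 8, 11, 9, 13]  # hypothetical noisy sensor readings
print([round(v, 2) for v in moving_average(noisy)])  # [10.0, 10.33, 9.33, 11.0]
```

More serious conditioning (low-pass filters, Kalman filters) follows the same idea: separate the true information from the extraneous.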

So the acquisition of a clear signal, the predictive nature of that information, and the utilization of that information is at the root of information theory.

C. E. Shannon published a 1948 paper, A Mathematical Theory of Communication, which served to introduce the concept of Information Theory to modern science. His tenet is that communication systems (the means for dispersal of information) are composed of five parts:

  1. An information source (radio signal, DNA, industrial meter)
  2. A transmitter
  3. The channel (medium used to transmit)
  4. A receiver
  5. A recipient.

Since the information flow must be as distraction and noise free as possible, digital systems are often employed for industrial and parameterized data. Considerations then focus on data precision, latency, clarity, and storage.

Interestingly, the science of cryptography actually looks for obfuscation: data purity, but hidden. Within cryptography, the need for precise, timely, and clear information is as important as ever, but the encapsulation of that information into chunks of meaningless drivel is the objective.

But then the scientist (as well as the code breaker) is attempting to achieve just the opposite: finding patterns, tendencies, and clues. These patterns, tendencies, and clues are the substance of the third phase of the Data –> Information –> Knowledge –> Understanding –> Wisdom progression. And finding these patterns, tendencies, and clues is what gives the industrial information user his ability to improve performance and, as a result, profitability.

The Digital Oilfield is a prime example of the search for more and better information. As the product becomes harder to recover – shale gas, undersea petroleum, horizontal drilling, etc. – the importance of the ability to mine patterns, tendencies, and clues is magnified.

Hence the Digital Oilfield both lags in awakening to the need for information and leads in the resources to unlock it.

The Digital Oilfield, Part 2

January 30, 2011

(originally posted on blogspot January 18, 2010)

“The improved operational performance promised by a seamless digital oil field is alluring, and the tasks required to arrive at a realistic implementation are more specialized than might be expected.”

Seamless integrated operations require a systematic view of the entire exploration process. The drilling operation may be the largest generator of diverse operational and performance data and may produce more downstream data and information than any other process. Additionally, the drilling process is one of the most legally exposed processes in energy production – BP’s recent Gulf disaster is an excellent example.

The seamless, integrated, digital oilfield is data-centric. Data is at the start of the process, and data is at the end of the process. But data is not the objective. In fact, data is an impediment to information and knowledge. But data is the base of the information and knowledge tree – data begets information, information begets knowledge, knowledge begets wisdom. Bringing data up to the next level (information) or the subsequent level (knowledge) requires a systematic and root knowledge of the data available within an organization, the data which should be available within an organization, and the meaning of that data.

Data mining is the overarching term used in many circles for the process of developing information and knowledge. In particular, data mining takes data to the level of knowledge. Converting data to information is often no more complex than producing a pie chart or an x-y scatter chart. But that information requires extensive operational experience to analyze and understand. Data mining takes data into the knowledge tier: it will extract the tendencies of operational metrics to foretell an outcome.
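As an illustrative sketch of extracting the tendencies of operational metrics, a least-squares line fit over a metric’s history exposes its trend. The metric name and values below are hypothetical:

```python
# Ordinary least-squares fit of y = slope * x + intercept over
# equally spaced samples; the slope is the metric's tendency.
def linear_trend(ys):
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

rop = [30, 28, 27, 25, 24]  # e.g. a declining rate-of-penetration metric
slope, intercept = linear_trend(rop)
print(slope < 0)  # True: the declining tendency foretells a problem
```

Real data-mining products layer far more sophistication on top (case-based reasoning, pattern matching), but the principle is the same: turn accumulated data into a forward-looking indication.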

Fortunately, there are several bright and shining examples of entrepreneurs developing the data-to-knowledge conversion. One promising star is Verdande’s DrillEdge product. Although this blog does not support or advocate this technology as a matter of policy, it does illustrate an example of forward thinking and systematic data-to-knowledge development.

A second example is PetroLink’s modular data acquisition and processing model. This product utilizes modular vendor-agnostic data accumulation tools (in particular, interfacing to Pason and MD-TOTCO), modular data repositories, modular equation processing, and modular displays. All of this is accomplished through the WITSML standards.

Future blogs will consider the movement of data, latency, reliability, and synchronization.

The Digital Oilfield, Part 1

January 30, 2011

(originally posted on blogspot January 17, 2010)

The oil business (or bidniz as the old-hands call it) has evolved in drilling, containment, control, and distribution. But the top-level system view has gone largely ignored. Certainly there are pockets of progress. And certainly there are several quality companies producing centralized data solutions. But even these solutions focus on the acquisition of the data while ignoring the reason for the data.

“Simply put, Digital Energy or Digital Oilfields are about focusing information technology on the objectives of the petroleum business.” (January 17, 2011)

Steve Hinchman, Marathon’s Senior VP of World Wide Production, in a speech to the 2006 Digital Oil Conference, said: “Quality, timely information leads to better decisions and productivity gains” and “Better decisions lead to better results, greater credibility, more opportunities, greater shareholder value.”

“Petroleum information technology (IT), digitized real-time downhole data and computer–aided practices are exploding, giving new impetus to the industry. The frustrations and hesitancy common in the 1990s are giving way to practical solutions and more widespread use by the oil industry. Better, cheaper and more secure data transmission through the Internet is one reason why.” (The Digital Oilfield, Oil and Gas Investor, 2004)

Future Digital Oilfield development will include efforts to integrate drilling data into its engineering and decision making. This integration consists of:

  1. Developing and integrating the acquisition of data from all phases of the drilling operation. The currently disjoint data will be brought together (historical and future) into a master data store architecture consisting of a Professional Petroleum Data Model, various legacy commercial systems, and various internal custom data stores.
  2. Developing a systematic real-time data approach including data processing, analysis, proactive actioning, and integrated presentations. Such proactive, real-time processing includes collision avoidance, pay-zone tracking and analysis, and rig performance. Included is a new technology we are pushing for analysis and recommendations for the best rig configuration and performance.
  3. Developing a systematic post-drill data analysis and centralized data recall for field analysis, offset well comparison, and new well engineering decisions. Central to this effort will include data analysis, data mining, and systematic data-centric decision making.

Watch the Digital Oilfield over the next few months as the requirements for better control, better prediction, and better decision making take a more significant center-stage.

Wikipedia turns 10

January 12, 2011

On January 15, Wikipedia will turn 10. Yes, 10 years old!

Looking back over time, the search engine game has been one of the most rapid evolutions on the internet.

Gopher dates back to the earliest days of the internet. Primarily a text-based (command-line) interface, it served well when academia was the primary beneficiary and searching scholarly papers was the intention.

Closely related early search engines were named Archie, Jughead, and Veronica. Archie is generally considered to be the earliest search engine focusing on FTP hosted files.

During the mid-to-late 1990s, Infoseek was a popular service. It was bought by the Walt Disney company and eventually became the Go.Com search engine. Later in the 1990s, Ask Jeeves was extremely popular with people posting actual questions – and the answers tended to be accurate!

Web 2.0 has both consolidated the search engines and fragmented the information on the web. Google remains the most popular search engine followed by Yahoo and Bing.

The cyclical high-tech companies

January 5, 2011

Throughout my career, I’ve seen behemoth companies that were on the top of the world and had the world beating a path to their door, only to be displaced by upstarts.

WordStar (anyone remember this?) owned the word processing market but was replaced by WordPerfect, and then Microsoft Word.

Same thing with spreadsheets: VisiCalc –> Lotus –> Excel. And databases: dBase –> SQL.

Companies have also risen and fallen. Banyan was the premier networking company, displaced by Novell, which was in turn displaced by Microsoft.

Now people say Microsoft and Google are too large, too entrenched, too commanding. But look closely at Microsoft – it is losing edge in Office to Google apps and in networking to the ubiquitous Internet.

Google is large, having displaced Yahoo, which displaced Ask Jeeves, which displaced Infoseek.

But I venture to say, in 5-10 years, that both Microsoft and Google will be scratching and clawing to hold onto their former glory. Maybe not. Maybe 15 years.
