Thoughts on Data Mining

March 10, 2012

Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information (see prior posts, including The Data Information Hierarchy series). The term is overused and conjures impressions that do not reflect the true state of the industry. Knowledge Discovery in Databases (KDD) is more descriptive and less misused, but the base meaning is the same.

Nevertheless, this definition is very general and does not convey the different aspects of data mining and knowledge discovery.

The basic types of Data Mining are:

  • Descriptive data mining, and
  • Predictive data mining

Descriptive Data Mining generally seeks groups, subgroups and clusters. Algorithms are developed that draw associative relationships from which actionable results may be derived (e.g., a snake with a diamond-shaped head should be considered poisonous).
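One common way to put a number on an associative relationship is to measure its support and confidence. The sketch below is only an illustration: the observations are invented, and the trait names are hypothetical labels chosen to match the snake example.

```python
# Hypothetical observations: each record is the set of traits noted for one snake.
observations = [
    {"diamond_head", "poisonous"},
    {"diamond_head", "poisonous"},
    {"diamond_head"},
    {"round_head"},
    {"round_head", "poisonous"},
    {"round_head"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    hits = sum(1 for record in records if itemset <= record)
    return hits / len(records)


def confidence(antecedent, consequent, records):
    """Of the records matching the antecedent, the fraction that also match the consequent."""
    base = support(antecedent, records)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, records) / base


# How often is a diamond-headed snake poisonous, versus snakes in general?
rule_conf = confidence({"diamond_head"}, {"poisonous"}, observations)
baseline = support({"poisonous"}, observations)
print(f"rule confidence: {rule_conf:.2f}, baseline rate: {baseline:.2f}")
```

A rule is only interesting when its confidence is clearly above the baseline rate; with so few made-up records, the numbers here show the mechanics and nothing more.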

Generally, a descriptive data mining result will appear as a series of if, then, elseif, then … conditions. Alternatively, a scoring system may be used, much like the self-assessment quizzes found in some magazines. Regardless of the approach, the end result is a clustering of the samples with some measure of quality.
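Both forms of result can be sketched in a few lines. Everything here is assumed for illustration: the traits, the weights, the threshold and the labels are hypothetical and were not derived from real data.

```python
def classify_by_rules(sample):
    """Rule-chain form: the first condition that matches decides the group."""
    if sample["head_shape"] == "diamond":
        return "treat as poisonous"
    elif sample["has_rattle"]:
        return "treat as poisonous"
    else:
        return "probably harmless"


# Scoring form: each observed trait adds points, like a magazine quiz.
SCORE_WEIGHTS = {
    "diamond_head": 3,
    "has_rattle": 3,
    "slit_pupils": 2,
    "bright_bands": 1,
}


def classify_by_score(traits, threshold=3):
    """Sum the points for the observed traits and compare against a cutoff."""
    score = sum(SCORE_WEIGHTS.get(trait, 0) for trait in traits)
    label = "treat as poisonous" if score >= threshold else "probably harmless"
    return label, score


print(classify_by_rules({"head_shape": "round", "has_rattle": True}))
print(classify_by_score({"slit_pupils", "bright_bands"}))
```

The rule chain is easier to read aloud, while the score lets several weak signals add up to a decision and doubles as a rough measure of how firmly a sample sits in its group.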

Predictive Data Mining analyzes previous data to derive a prediction of the next outcome. For example, newly incorporated businesses tend to look for credit card merchant solutions. This may seem obvious, but someone had to discover this tendency and then exploit it.
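A very small version of this idea is to count past outcomes for each kind of business and predict the most common one. The history below is invented, and `fit` and `predict` are hypothetical helper names, not part of any library.

```python
from collections import Counter, defaultdict

# Hypothetical history: (business type, did it seek a merchant solution?)
history = [
    ("new_incorporation", True),
    ("new_incorporation", True),
    ("new_incorporation", True),
    ("new_incorporation", False),
    ("established", False),
    ("established", False),
    ("established", True),
]


def fit(records):
    """Count how often each outcome occurred for each business type."""
    counts = defaultdict(Counter)
    for business_type, outcome in records:
        counts[business_type][outcome] += 1
    return counts


def predict(counts, business_type):
    """Return (most common past outcome, its observed rate), or (None, 0.0) if unseen."""
    seen = counts.get(business_type)
    if not seen:
        return None, 0.0
    outcome, n = seen.most_common(1)[0]
    return outcome, n / sum(seen.values())


model = fit(history)
prediction = predict(model, "new_incorporation")
print(prediction)
```

Real predictive models use many more attributes and guard against overfitting, but the shape is the same: learn from previous outcomes, then apply what was learned to the next case.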

Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature: 1) massive data collection, 2) powerful multiprocessor computers, and 3) data mining algorithms.

Kurt Thearling identifies five types of data mining (definitions taken from Wikipedia):

Data Mining and Data, Information, Knowledge

January 4, 2011

Throughout my Computer Science courses, I strive to teach the difference between data and information.

Information is what the user wanted when he asked for the data.

Seldom does an end user want reams of data or Excel spreadsheets hundreds or thousands of rows long. The user needs a discernible, digestible presentation so that he can cognitively absorb the concept being presented.

Should your supervisor ask for the 3rd quarter sales data in the Midwest, do not unload a truckload of sales receipts on his desk. He is looking for a presentation that represents the significance of the data.
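As a small illustration of the difference, the sketch below turns a pile of raw receipts into the handful of numbers the supervisor actually wants. The receipts, the region names and the `summarize` helper are all made up for the example.

```python
from collections import defaultdict

# Hypothetical raw data: (region, month, sale amount in dollars)
receipts = [
    ("Midwest", "Jul", 1200.0),
    ("Midwest", "Aug", 950.0),
    ("Midwest", "Sep", 1430.0),
    ("Northeast", "Jul", 2100.0),
    ("Midwest", "Jul", 300.0),
]


def summarize(records, region):
    """Collapse individual receipts into one total per month for a single region."""
    totals = defaultdict(float)
    for rec_region, month, amount in records:
        if rec_region == region:
            totals[month] += amount
    return dict(totals)


monthly = summarize(receipts, "Midwest")
quarter_total = sum(monthly.values())

for month, total in monthly.items():
    print(f"{month}: ${total:,.2f}")
print(f"3rd quarter total: ${quarter_total:,.2f}")
```

The receipts are the data; the three monthly totals and the quarterly figure are the information.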

But Information is not the end of the story either. An excellent article by Gene Bellinger et al. extends the concept of Data –> Information to be
Data –> Information –> Knowledge –> Understanding –> Wisdom

Presenting Data as Information has long been the objective of spreadsheets, reports, and even three-dimensional presentations. But moving from Information to Knowledge has been a little more challenging.

Data Mining is a tool to move to this next level. A great overview of this is provided by Bill Palace.

But Data Mining has advanced past Knowledge into Understanding through the advancement of 'semantics' (often referred to as the Semantic Web). The Wikipedia article provides good coverage of this topic.

