Unsupervised learning and its role in the knowledge discovery process

Unlike supervised learning, unsupervised learning does not work with labeled data: the machine is never shown the correct answer. Instead, different algorithms let the machine form its own connections by studying and observing the data. Learning from the structure of the data itself, rather than from supplied answers, is the key to unsupervised learning.

The knowledge discovery process, in turn, is the part of data mining concerned with developing methods, techniques and algorithms that can make sense of the available data. It is useful for finding trends, patterns, correlations and anomalies in databases, which helps in making more accurate decisions for the future.

Knowledge discovery consists of an iterative sequence of the following steps:

  1. Understand your goal or domain, then create and select the dataset
  2. Clean the selected dataset and transform it into a form appropriate for mining
  3. Apply intelligent methods to the transformed dataset in order to extract data patterns
  4. Evaluate, interpret and visualize the extracted patterns to identify those that represent knowledge
  5. Present the discovered knowledge to the user and manage it
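The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the records, the function names and the z-score threshold are all hypothetical, and the "mining" step is a deliberately simple anomaly check rather than a full data mining method.

```python
from statistics import mean, pstdev

# Hypothetical raw records: (customer_id, monthly_spend); None marks a missing value.
RAW = [("a", 20.0), ("b", 22.0), ("c", None), ("d", 21.0), ("e", 95.0), ("f", 19.0)]


def select(records):
    """Step 1: keep only the field relevant to the goal (here, spend)."""
    return [spend for _, spend in records]


def clean(values):
    """Step 2 (clean): drop missing values."""
    return [v for v in values if v is not None]


def transform(values):
    """Step 2 (transform): standardize to z-scores so values are comparable."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def mine(z_scores, threshold=1.5):
    """Step 3: flag positions whose z-score is far from the mean (assumed threshold)."""
    return [i for i, z in enumerate(z_scores) if abs(z) > threshold]


def evaluate(values, flagged):
    """Step 4: map flagged positions back to the original values for interpretation."""
    return [values[i] for i in flagged]


def run_pipeline(records):
    values = clean(select(records))
    flagged = mine(transform(values))
    return evaluate(values, flagged)


if __name__ == "__main__":
    # Step 5: present the discovered pattern to the user.
    print("Unusual spend values:", run_pipeline(RAW))
```

Because the process is iterative, in practice the output of the evaluation step would often send you back to an earlier step, for example to select different fields or adjust the threshold.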

Unsupervised learning is one of the core techniques of the knowledge discovery process because it amounts to learning without a teacher (without any labeled data) and to modelling the probability density of the inputs. Supervised learning could be used to predict a certain outcome, but we stand a better chance of finding something new if we try unsupervised learning: the machine studies and observes what may be millions of data points and creates its own clusters. One of the key requirements for unsupervised learning is access to large amounts of data. The more data you have, the easier it is for the machine to observe and study trends that might lead to a worthwhile cluster.
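As a rough illustration of what modelling the probability density of the inputs can mean, the sketch below fits a single normal distribution to unlabeled values using Python's standard `statistics.NormalDist`. The readings are made up, and a single Gaussian is an assumption chosen for brevity; real data often needs a richer model.

```python
from statistics import NormalDist

# Hypothetical unlabeled sensor readings; no "correct answer" is attached to any of them.
readings = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]

# Estimate the mean and standard deviation directly from the observations.
model = NormalDist.from_samples(readings)

# The fitted density is high near typical values and low for unusual ones.
density_typical = model.pdf(5.0)
density_unusual = model.pdf(9.0)

if __name__ == "__main__":
    print(f"estimated mean: {model.mean:.2f}")
    print("typical value is more probable:", density_typical > density_unusual)
```

A value with very low density under the fitted model is a candidate anomaly, which is one way such a model supports knowledge discovery.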

The most common unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data. A clustering algorithm groups objects with similar properties, and each such group is called a cluster. A cluster is therefore a collection of objects that are similar to one another and dissimilar to objects belonging to other clusters. With the help of clustering we determine the intrinsic grouping in a set of unlabeled data. Common clustering algorithms include:

Hierarchical clustering: builds a multilevel hierarchy of clusters by creating a cluster tree

k-Means clustering: partitions data into k distinct clusters based on distance to the centroid of a cluster

Gaussian mixture models: model clusters as a mixture of multivariate normal density components

Self-organizing maps: use neural networks that learn the topology and distribution of the data

Hidden Markov models: use observed data to recover the sequence of hidden states
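To make the k-means entry in the list above concrete, here is a minimal sketch of the usual assign-then-update loop (Lloyd's algorithm) in plain Python. The sample points are hypothetical, and taking the first k points as the initial centroids is an assumption made so the result is reproducible; practical implementations usually choose starting centroids more carefully.

```python
import math


def kmeans(points, k, iterations=100):
    """Partition points into k clusters by distance to each cluster's centroid."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: the centroids no longer move.
        centroids = new_centroids
    return labels, centroids


# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
labels, centroids = kmeans(points, k=2)

if __name__ == "__main__":
    for point, label in zip(points, labels):
        print(point, "-> cluster", label)
```

No labels are supplied at any point: the grouping comes only from the distances between the observations, which is what makes this an unsupervised method.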