This article was written by Mohammad Sajid.
Statistical cluster analysis is an exploratory data analysis technique that groups heterogeneous objects (M.D.) into homogeneous groups. This article covers the basics of cluster analysis from a mathematical point of view.
Cluster Analysis can be done by two methods:
- Hierarchical cluster analysis.
- Non-Hierarchical cluster analysis.
Hierarchical Cluster Analysis (HCA):
- In HCA, the observation vectors (cases) are grouped together on the basis of their mutual distances.
- An HCA is usually visualised through a hierarchical tree called a dendrogram. This tree is a nested set of partitions represented by a tree diagram.
Characteristics of HCA:
- Sectioning a tree at a particular level produces a partition into ‘g’ disjoint groups.
- If two groups are chosen from different partitions, then either the groups are disjoint or one group is wholly contained within the other.
- A numerical value is associated with each point of the tree where branches join together. This value measures the distance or dissimilarity between the two merged clusters.
- Different distance measures give rise to different hierarchical cluster structures.
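To illustrate the last point, here is a minimal sketch showing that the choice of distance measure can change which two cases are merged first. The three points and the helper names (`euclidean`, `manhattan`, `closest_pair`) are made up for this example and are not part of the original article.

```python
import math
from itertools import combinations

def euclidean(p, q):
    """Straight-line distance between two observation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def closest_pair(points, dist):
    """Return the labels of the two cases with the smallest distance."""
    return min(combinations(sorted(points), 2),
               key=lambda pair: dist(points[pair[0]], points[pair[1]]))

# Hypothetical toy data: three labelled cases in the plane.
points = {"A": (0, 0), "B": (3, 3), "C": (-5, 0)}

first_merge_euclidean = closest_pair(points, euclidean)
first_merge_manhattan = closest_pair(points, manhattan)
print(first_merge_euclidean, first_merge_manhattan)
```

Under the Euclidean distance, A and B are the closest pair, while under the city-block distance A and C are closer, so the two hierarchies already differ at the first merge.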
There are two types of approaches for HCA:
- Agglomerative HCA
- Divisive HCA
Agglomerative HCA:
- Operates by successive merges of cases.
- Begin with n clusters, each containing a single case.
- At each stage, merge the two most similar groups to form a new cluster, thus reducing the number of clusters by one.
- Continue (as similarity decreases) until all subgroups are fused into one single cluster.
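The merging steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the Euclidean distance and single linkage (cluster distance = smallest distance between members), neither of which the article prescribes, and the function names `agglomerate` and `cut` are hypothetical. Each merge is recorded with the height at which the branches join, and `cut` replays the merges to section the tree into g disjoint groups.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(c1, c2, points):
    """Assumed linkage rule: smallest distance between members of two clusters."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(i,) for i in range(len(points))]  # start with n singleton clusters
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_linkage(pair[0], pair[1], points))
        height = single_linkage(a, b, points)  # value attached to this join
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        history.append((a, b, height))
    return history

def cut(history, n, g):
    """Replay the first n - g merges to obtain a partition into g disjoint groups."""
    clusters = {(i,) for i in range(n)}
    for a, b, _ in history[: n - g]:
        clusters -= {a, b}
        clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)

# Hypothetical toy data: four cases in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 3.0)]
history = agglomerate(points)
for a, b, height in history:
    print(a, b, round(height, 3))
print(cut(history, len(points), 2))
```

Each pass through the loop removes two clusters and adds one, so n cases always produce exactly n - 1 merges.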
Divisive HCA:
- The divisive method operates by the successive splitting of groups.
- Start with a single group (i.e. one cluster containing all cases).
- Divide the group into two subgroups such that the objects in one subgroup are as far as possible from the objects in the other.
- Continue until there are n groups, each containing a single case.
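A rough sketch of the divisive procedure follows. The article does not say how a group should be split, so the rule used here is an assumption chosen for simplicity: take the two most distant cases in the widest group as seeds and assign every other case to the nearer seed. The names `diameter`, `split` and `divide` are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def diameter(cluster, points):
    """Largest distance between any two members (0 for a single case)."""
    return max(euclidean(points[i], points[j]) for i in cluster for j in cluster)

def split(cluster, points):
    """Assumed splitting heuristic: seed with the two most distant members."""
    s1, s2 = max(((i, j) for i in cluster for j in cluster if i < j),
                 key=lambda pair: euclidean(points[pair[0]], points[pair[1]]))
    # Each seed stays in its own subgroup, so neither subgroup can be empty.
    left = tuple(i for i in cluster
                 if i != s2 and (i == s1 or
                                 euclidean(points[i], points[s1])
                                 <= euclidean(points[i], points[s2])))
    right = tuple(i for i in cluster if i not in left)
    return left, right

def divide(points):
    """Start from one cluster and split until every case stands alone."""
    clusters = [tuple(range(len(points)))]
    history = []
    while len(clusters) < len(points):
        # Only clusters with more than one case can still be split.
        widest = max((c for c in clusters if len(c) > 1),
                     key=lambda c: diameter(c, points))
        left, right = split(widest, points)
        clusters.remove(widest)
        clusters += [left, right]
        history.append((left, right))
    return history

# Hypothetical toy data: four cases in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 3.0)]
history = divide(points)
for left, right in history:
    print(left, right)
```

As with the agglomerative approach, n cases require n - 1 steps, here splits rather than merges, before every group holds a single case.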