
Cluster sampling: A probability sampling technique

Image source: Statistical Aid

Cluster sampling is a sampling method in which a population is divided into multiple clusters that share similar characteristics, and every cluster has an equal chance of being part of the sample. A simple random sample of clusters is then drawn from the population. Because the selection is random, this is a probability sampling procedure.


Area sampling: Area sampling is a method of sampling used when no complete sampling frame is available. The total area under investigation is divided into small sub-areas, which are sampled at random or according to a restricted process (for example, stratified sampling). Each chosen sub-area is then fully inspected and enumerated, and may form the basis for further sampling if desired.
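
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the area is a rectangle split into square grid cells, sub-areas are chosen by simple random sampling, and the function name `area_sample` and the household coordinates are hypothetical.

```python
import math
import random

def area_sample(points, width, height, cell_size, n_cells, seed=None):
    """Split a width x height area into square cells, choose n_cells at
    random, and fully enumerate the points inside the chosen cells.

    Assumes every point satisfies 0 <= x < width and 0 <= y < height.
    """
    rng = random.Random(seed)
    cols = math.ceil(width / cell_size)
    rows = math.ceil(height / cell_size)
    all_cells = [(r, c) for r in range(rows) for c in range(cols)]
    chosen = set(rng.sample(all_cells, n_cells))
    # Full enumeration: keep every point that lies in a chosen sub-area.
    enumerated = [
        (x, y) for (x, y) in points
        if (int(y // cell_size), int(x // cell_size)) in chosen
    ]
    return chosen, enumerated

# Hypothetical household locations in a 3 x 3 area (units are arbitrary).
households = [(0.5, 0.5), (1.5, 0.5), (2.5, 2.5), (0.2, 2.8)]
cells, surveyed = area_sample(households, 3, 3, 1, n_cells=4, seed=1)
```

No list of households is needed up front: only the grid of sub-areas has to be known before sampling, and the units are listed after the sub-areas are chosen.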

Types of cluster sampling

There are three types, as follows:

Single-stage Cluster: In this process, sampling is applied only once: clusters are selected at random, and every unit in the selected clusters is included. For example, an NGO wants to create a sample of girls across five neighboring towns to provide education. Using single-stage sampling, the NGO randomly selects towns (clusters) to form a sample and extends help to the girls deprived of education in those towns.
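
A minimal sketch of single-stage selection, loosely following the NGO example. The town names, the identifiers of the girls, and the function name `single_stage_cluster_sample` are all made up for illustration.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Single stage: no further sampling inside the chosen clusters.
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample

# Hypothetical data: five towns, each listing the girls who live there.
towns = {
    "Town A": ["A1", "A2", "A3"],
    "Town B": ["B1", "B2"],
    "Town C": ["C1", "C2", "C3", "C4"],
    "Town D": ["D1"],
    "Town E": ["E1", "E2"],
}
chosen_towns, girls = single_stage_cluster_sample(towns, n_clusters=2, seed=42)
```

Every girl in a chosen town ends up in the sample, so the sample size depends on which towns are drawn.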

Two-stage Cluster: In this process, clusters are chosen first, and then a sample is drawn from within each chosen cluster using simple random sampling or another procedure. For example, a business owner wants to explore the performance of his/her plants, which are spread across various parts of the U.S. The owner groups the plants into clusters and then selects random samples from these clusters to conduct the research.
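
The two stages can be sketched as follows, assuming simple random sampling at both stages. The region names, plant identifiers, and the function name `two_stage_cluster_sample` are hypothetical and not taken from the example.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        # Take the whole cluster if it has fewer units than requested.
        k = min(n_per_cluster, len(units))
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical data: plants grouped into regional clusters.
plants_by_region = {
    "Northeast": ["NE-1", "NE-2", "NE-3"],
    "Midwest": ["MW-1", "MW-2", "MW-3", "MW-4"],
    "South": ["S-1", "S-2"],
    "West": ["W-1", "W-2", "W-3"],
}
plants = two_stage_cluster_sample(plants_by_region, 2, 2, seed=7)
```

Unlike the single-stage case, only part of each chosen cluster is examined, which keeps the sample size under control when clusters are large.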

Multistage Cluster: When a few more steps are added to two-stage sampling, it is called multistage cluster sampling. For example, an organization intends to conduct a survey to analyze the performance of smartphones across Germany. It can divide the entire country's population into cities (clusters), select the cities with the highest population, and then filter for the people who use mobile devices.
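
One way to sketch a multistage design is a recursive function that samples at each level of a nested structure. Note the assumptions: this sketch selects at random at every stage (rather than by population, as in the example), and the three levels used here (cities, districts, users), the data, and the function name `multistage_sample` are hypothetical.

```python
import random

def multistage_sample(node, sizes, rng):
    """Sample from nested dicts that end in lists of units.

    sizes holds one sample size per stage, outermost stage first.
    """
    k = min(sizes[0], len(node))
    if isinstance(node, dict):
        # Intermediate stage: pick sub-clusters, then recurse into each.
        picked = rng.sample(sorted(node), k)
        result = []
        for key in picked:
            result.extend(multistage_sample(node[key], sizes[1:], rng))
        return result
    # Final stage: pick individual units.
    return rng.sample(node, k)

# Hypothetical data: city -> district -> smartphone users.
germany = {
    "Berlin": {"Mitte": ["u1", "u2", "u3"], "Pankow": ["u4", "u5"]},
    "Hamburg": {"Altona": ["u6", "u7", "u8"], "Eimsbuettel": ["u9", "u10"]},
    "Munich": {"Schwabing": ["u11", "u12"], "Sendling": ["u13", "u14", "u15"]},
}
rng = random.Random(7)
# 2 cities, then 1 district per city, then 2 users per district.
respondents = multistage_sample(germany, sizes=[2, 1, 2], rng=rng)
```

Adding another stage only requires one more level of nesting in the data and one more entry in `sizes`.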

Advantages of cluster sampling

· Requires less time and cost
· Convenient access
· Minimal loss of data accuracy
· Ease of implementation