Ward’s Method for clustering in SAS

Ward’s Method:

Ward's method treats cluster analysis as an analysis-of-variance problem: at each step it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares. It is an agglomerative (bottom-up) algorithm that starts with n clusters of size 1 and keeps merging until all the observations belong to a single cluster. It is most appropriate for quantitative variables, not binary ones.
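
To make the analysis-of-variance view concrete, the merge cost that Ward's method minimizes can be written as follows. The notation is an assumption for illustration and does not come from the post: A and B are two current clusters, n_A and n_B their sizes, and \bar{x}_A and \bar{x}_B their centroids.

```latex
\Delta(A, B)
  = \mathrm{ESS}(A \cup B) - \mathrm{ESS}(A) - \mathrm{ESS}(B)
  = \frac{n_A \, n_B}{n_A + n_B} \, \lVert \bar{x}_A - \bar{x}_B \rVert^2,
\qquad
\mathrm{ESS}(C) = \sum_{i \in C} \lVert x_i - \bar{x}_C \rVert^2
```

The pair with the smallest \Delta(A, B) is merged at each step. Because the cost depends on squared Euclidean distances between centroids, variables on larger scales would dominate it.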

1.  Standardize the data. Ward's method is based on Euclidean distance, so all the risk factors need to be on the same scale.

/* Rescale every variable to mean 0 and standard deviation 1 */
proc standard data=mydata mean=0 std=1 out=mydata1;
var x1 x2 ... xn;
run;
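
A quick check that the standardization worked might look like the sketch below. It assumes the mydata1 data set created above; `x1 x2 ... xn` is the post's placeholder for the real variable list and has to be replaced before the step will run.

```sas
/* Sketch: every listed variable should show mean ~0 and std dev ~1 */
proc means data=mydata1 n mean std min max;
var x1 x2 ... xn;   /* placeholder list from the post */
run;
```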

2.   Determine the number of clusters from the CCC plot.

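ods graphics on;
/* CCC and PSEUDO request the cubic clustering criterion and the pseudo F / t-squared statistics; */
/* PROC CLUSTER writes its output data set with OUTTREE= (it has no OUT= option) */
proc cluster data=mydata1 outtree=determineK method=ward ccc pseudo;
var x1 x2 ... xn;
run;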
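
Besides reading the plot, the same statistics can be listed as numbers. The sketch below assumes the tree data set is named determineK and that the CCC and PSEUDO options were specified, in which case it should contain the variables _NCL_, _CCC_, _PSF_ and _PST2_. The cutoff of 15 clusters and the name ccc_hist are arbitrary choices for illustration.

```sas
/* Sketch: statistics for the last merges (15 clusters down to 2) */
proc sort data=determineK out=ccc_hist;
by _ncl_;
where 2 <= _ncl_ <= 15;   /* arbitrary window */
run;

proc print data=ccc_hist noobs;
var _ncl_ _ccc_ _psf_ _pst2_;
run;
```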

3.  For big data: pick the turning point on the cubic clustering criterion (CCC) plot to determine K, then pass K to PROC FASTCLUS. For small data: a rule of thumb for the number of clusters is sqrt(n/2), where n is the sample size.
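
For the small-data rule of thumb, K can be computed from the data rather than by hand. This is a minimal sketch, assuming the standardized data set is mydata1. The macro variable name K and the choice to round to the nearest integer are assumptions, since the post only gives sqrt(n/2).

```sas
/* Sketch: K = sqrt(n/2), rounded, stored in macro variable K */
proc sql noprint;
select round(sqrt(count(*) / 2)) into :K trimmed
from mydata1;
quit;

%put NOTE: rule-of-thumb number of clusters K=&K;
```

The value could then be supplied to PROC FASTCLUS as maxclusters=&K.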

/* Replace K with the chosen number of clusters */
proc fastclus data=mydata1 out=temp1 radius=0 replace=full maxclusters=K maxiter=60 mean=temp2 list distance;
var x1 x2 ... xn;
id personID;
run;

/* With no DATA= option, SAS uses the most recently created data set */
proc cluster method=ward outtree=tree plots=dendrogram(height=rsq);
run;
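
The tree data set does not assign observations to clusters by itself. One way to get assignments is to cut the tree with PROC TREE, as in the sketch below. It assumes the outtree=tree data set from the step above; the value 3 for NCLUSTERS= and the name tree_clusters are hypothetical.

```sas
/* Sketch: cut the Ward tree into a fixed number of clusters */
proc tree data=tree nclusters=3 out=tree_clusters noprint;
run;

/* CLUSTER in the OUT= data set holds the assigned cluster number */
proc freq data=tree_clusters;
tables cluster;
run;
```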

4. By-product: if you want to identify outlier clusters as well as classify the observations into groups, you can set a threshold on cluster size, for example flagging any cluster that contains fewer than 0.1% of the n observations.
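
The size threshold can be applied to the FASTCLUS output as sketched below. It assumes temp1 is the OUT= data set from PROC FASTCLUS, which carries the assigned cluster in the variable CLUSTER. The names cluster_sizes and outlier_clusters are hypothetical.

```sas
/* Sketch: count observations per cluster */
proc freq data=temp1 noprint;
tables cluster / out=cluster_sizes;   /* adds COUNT and PERCENT */
run;

/* Keep clusters holding less than 0.1% of all observations */
data outlier_clusters;
set cluster_sizes;
where percent < 0.1;   /* PERCENT is on a 0-100 scale */
run;

proc print data=outlier_clusters noobs;
run;
```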
