Clustering analysis is another standard method available with SAP BW Data Mining. Clustering models based on this method can combine various parameters (e.g., maximum number of clusters, minimum fraction of inter-cluster hops per iteration) to implement different clustering approaches. The method's clustering-specific reporting makes it possible to analyze the modeling results. In this paper we discuss extensions to the standard reporting that improve insight into the results of clustering modeling. These extensions are implemented via the following analytics:
We will focus the discussion on the method-specific (not problem-specific) indicators that are included in the standard clustering reporting in BW Data Mining. In other words, we will not consider the part of the standard BW Data Mining reporting that visualizes clusters and their attributes (the variables participating in the definitions of clusters), nor the cluster influence coefficients for particular clustering models. Instead, we will focus on the indicators that provide insight into the volume of the models’ input data and into the quality of the segmentation achieved by the models.
The method-specific indicators mentioned above can be viewed via either the modeling results overviews of models involved in analysis processes (transaction RSANWB, display the analysis process, right-click on the model and select to view modeling results) or directly via the modeling results overviews in model definitions (transaction RSDMWB, display the model, choose the modeling results button in the model’s toolbar).
An example of the visualization available via the standard reporting of modeling results is provided in the screenshot below:
The standard visualization functionality covers the basic needs of a user who would like to obtain insight into the results of clustering modeling. Based on our practical experience with clustering modeling in SAP BW Data Mining, the following additional business requirements could be suggested:
The implementation of the above business requirements in the “SAP BW Data Mining Clustering Reporting” dashboard combines the functionality of the “SAP BW Data Mining Model Reporting” dashboard (find more details on this dashboard in SAP BW Data Mining Analytics: Model Reporting) with insight that is specific to the SAP_CLUSTERING method.
At startup, the “SAP BW Data Mining Clustering Reporting” dashboard displays four tabs:
The selectors of the dashboard match the columns of the lists and allow limiting the models and variables visualized via the lists to specific criteria. Each time a specific value is selected, the respective selector’s status indicator turns green.
The following columns have been enabled in the list at the Model Master tab (see the screenshot below):
The following columns have been enabled in the list at the Clustering Models – Distances Analysis tab (see the screenshot below):
The following columns have been enabled in the list at the Clustering Models – Values Analysis tab (see the screenshot below):
Finally, the Clustering Models – Graph tab provides a graphical visualization of the average intra-cluster distance indicators for the individual clusters of a specific model. The screenshot below shows all of the clusters generated by the PIO_MRO_CL model (the X-axis corresponds to the cluster IDs, the Y-axis to the average intra-cluster distances, and the bubble size reflects the count of data points assigned to a specific cluster):
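To make the three quantities in the graph concrete, here is a minimal Python sketch that derives them from a small set of labeled points. It assumes that the average intra-cluster distance is the mean Euclidean distance of a cluster's points to its centroid; the exact definition used by BW Data Mining is not stated here, and the function name, sample points and centroids are purely illustrative.

```python
from collections import defaultdict
from math import dist


def cluster_summary(points, labels, centroids):
    """Return {cluster_id: (average distance to centroid, point count)}.

    Assumption: "average intra-cluster distance" is taken as the mean
    Euclidean distance between a cluster's points and its centroid.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for point, cluster_id in zip(points, labels):
        totals[cluster_id] += dist(point, centroids[cluster_id])
        counts[cluster_id] += 1
    return {cid: (totals[cid] / counts[cid], counts[cid]) for cid in counts}


# Illustrative data: two clusters in a two-dimensional space.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
labels = [1, 1, 2]
centroids = {1: (0.0, 1.0), 2: (10.0, 10.0)}

summary = cluster_summary(points, labels, centroids)
for cluster_id, (avg_distance, count) in sorted(summary.items()):
    # X-axis: cluster ID, Y-axis: average distance, bubble size: count.
    print(cluster_id, avg_distance, count)
```

In a bubble chart built from such a summary, a small Y value indicates a compact cluster, and the bubble size shows how many data points the cluster holds.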
The following are examples of typical use cases in which the SAP BW Data Mining Clustering Reporting dashboard could bring benefits:
1) A data mining specialist would like to visualize the models that have Y as a predictable variable and to study the method-specific indicators of those that are based on the SAP_CLUSTERING method.
Use scenario: in the Model Master tab, select the records that correspond to the SAP_CLUSTERING method using the Modeling Method selector, then further limit your selection by choosing Y in the Model Field Name selector and X in the Field Is Predictable selector. The dropdown list of the Model ID selector will then contain the technical names of the models we are interested in. Choose those models one by one in the Model ID selector and study their method-specific indicators in the Clustering Models – Distances Analysis, Clustering Models – Values Analysis and Clustering Models – Graph tabs.
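The chain of selectors in this scenario behaves like a series of equality filters over the rows of the Model Master list. The following Python sketch illustrates that logic only; it is not how the dashboard is implemented, and the row layout, field names and model IDs are hypothetical.

```python
# Hypothetical rows of the Model Master list (field names are illustrative).
model_master = [
    {"model_id": "M1", "method": "SAP_CLUSTERING", "field": "Y", "predictable": "X"},
    {"model_id": "M2", "method": "SAP_CLUSTERING", "field": "Y", "predictable": ""},
    {"model_id": "M3", "method": "OTHER_METHOD", "field": "Y", "predictable": "X"},
    {"model_id": "M4", "method": "SAP_CLUSTERING", "field": "Z", "predictable": "X"},
]


def apply_selectors(rows, **selected):
    """Keep only the rows whose values match every selected value."""
    return [
        row for row in rows
        if all(row[name] == value for name, value in selected.items())
    ]


# Modeling Method, Model Field Name and Field Is Predictable selectors.
matches = apply_selectors(
    model_master, method="SAP_CLUSTERING", field="Y", predictable="X"
)

# Corresponds to the values left in the Model ID dropdown.
model_ids = sorted({row["model_id"] for row in matches})
print(model_ids)
```

Each additional selector narrows the list further, which is why the Model ID dropdown ends up containing only the models that satisfy all three criteria.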
2) A data mining specialist would like to visualize the models based on the SAP_CLUSTERING method that contain the variable Y and have generated up to 10 clusters.
Use scenario: in the Model Master tab, select the records that correspond to the SAP_CLUSTERING method using the Modeling Method selector, and then further limit your selection by choosing Y in the Model Field Name selector. The model list in the Model Master tab will display the technical names of the models matching all of the above criteria except for having generated up to 10 clusters. To apply this last criterion, switch to the Clustering Models – Distances Analysis tab and choose 10 in the Cluster ID selector (if 10 is not available, there are no models that match this criterion). The model list in the Clustering Models – Distances Analysis tab will display the models satisfying all of the above criteria.
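This scenario combines a selection made in one tab with a selection made in another. The Python sketch below shows that two-step logic under the assumption that both lists share a model ID and that choosing a value in the Cluster ID selector keeps the models that have a record for that cluster ID. All row layouts, field names and model IDs are hypothetical.

```python
# Hypothetical rows of the Model Master list.
master_rows = [
    {"model_id": "M1", "method": "SAP_CLUSTERING", "field": "Y"},
    {"model_id": "M2", "method": "SAP_CLUSTERING", "field": "Y"},
    {"model_id": "M3", "method": "SAP_CLUSTERING", "field": "Z"},
]

# Hypothetical rows of the distances list: one row per model and cluster.
cluster_rows = (
    [{"model_id": "M1", "cluster_id": i} for i in range(1, 11)]
    + [{"model_id": "M2", "cluster_id": i} for i in range(1, 5)]
    + [{"model_id": "M3", "cluster_id": i} for i in range(1, 11)]
)


def models_with_cluster_id(master, clusters, method, field, cluster_id):
    """Apply the Model Master criteria, then the Cluster ID criterion."""
    candidates = {
        row["model_id"] for row in master
        if row["method"] == method and row["field"] == field
    }
    return sorted({
        row["model_id"] for row in clusters
        if row["model_id"] in candidates and row["cluster_id"] == cluster_id
    })


result = models_with_cluster_id(
    master_rows, cluster_rows, "SAP_CLUSTERING", "Y", 10
)
print(result)
```

If no row carries the chosen cluster ID, the result is empty, which mirrors the case in which the value 10 is not offered by the Cluster ID selector.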
3) A data mining specialist would like to visually evaluate the quality of the clustering produced by the model M from the point of view of the homogeneity of the data points assigned to clusters and of the distribution of the data points across the clusters.
Use scenario: in the Model Master tab, select M in the Model ID selector to limit the evaluation to the model M. Then switch to the Clustering Models – Graph tab to interpret the graphical visualization of the clusters of the model M (let us assume that the visualization we find there is identical to the one presented in the screenshot below):
Based on the visual analysis of the above graph, we could make the following evaluation: