A solution for classification rules management toward actionable analytics

Introduction

The analytics community has long debated whether analytics is an art or a science. Analytics is more an art than a science in its ability to create the conditions that drive a business toward action, taken with confidence that the action will improve business performance. This ability to be actionable has recently been recognized as the most important aspect of analytics [1]. The concept known as Prescriptive Analytics [2] shares some ideas with Actionable Analytics, but there are also meaningful differences.

Classification rules and actionable analytics

Classification rules play a significant role in practical predictive analytics. Their main advantage is that, of all pattern types, classification rules are the closest to business rules and the most comprehensible for business managers. Classification rules offer not only a high-quality prediction of what will happen but also an explanation of why it will happen: exploring the rules yields knowledge about the fundamental reasons behind a predicted outcome.

Mining classification rules from data is an important analytics task, and further analysis of the mined rules can provide crucial knowledge that helps analytics project managers tune the project's efforts across its different phases [3].

Generally, classification rules are analyzed using a set of objective interestingness measures [4]. Objective measures depend on the structure of a pattern and can be quantified with traditional statistical methods. They are the starting point for a business manager's examination of classification rules, and that examination is the first step in the subjective evaluation of the rules, aimed at discovering actionable knowledge and prompting the business user to act. Consistent success in this activity can move the organization through the phases of analytics maturity [5].
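A few standard objective measures from the literature surveyed in [4] can be written in terms of simple counts. The notation is ours and is introduced only for illustration: for a rule A → B evaluated on N instances, n_A counts the instances that satisfy the rule's conditions, n_B the instances of class B, and n_AB the instances that satisfy the conditions and belong to class B.

```latex
\begin{aligned}
\text{support}(A \rightarrow B)        &= \frac{n_{AB}}{N} \\
\text{confidence}(A \rightarrow B)     &= \frac{n_{AB}}{n_A} \\
\text{lift}(A \rightarrow B)           &= \frac{n_{AB}/n_A}{n_B/N} = \frac{N\, n_{AB}}{n_A\, n_B} \\
\text{class coverage}(A \rightarrow B) &= \frac{n_{AB}}{n_B}
\end{aligned}
```

For example, with N = 14, n_B = 7, and a rule whose conditions hold for n_A = 3 instances, all of class B (n_AB = 3), the confidence is 1, the lift is (14 · 3)/(3 · 7) = 2, and the class coverage is 3/7 ≈ 0.43.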

The Problem

In practical terms, each classifier or tool comes with its own rule presentation and a very limited set of measures describing an individual classification rule or a ruleset. Even when a classification model supports the PMML Tree Model [6] or PMML RuleSet [7] format, some work is still needed to calculate all the relevant interestingness measures.
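To illustrate that extra work, here is a minimal Python sketch that reads per-rule statistics from a PMML RuleSet. The fragment is hypothetical and simplified: it omits the PMML header and namespace (a real document would need namespace-aware lookups), and its rules and counts are invented for illustration. It relies on the SimpleRule attributes recordCount and nbCorrect; the function name rule_statistics is ours.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified PMML RuleSet fragment (header and namespace omitted).
PMML_FRAGMENT = """
<RuleSet defaultScore="yes">
  <RuleSelectionMethod criterion="firstHit"/>
  <SimpleRule id="R1" score="no" recordCount="3" nbCorrect="3">
    <CompoundPredicate booleanOperator="and">
      <SimplePredicate field="outlook" operator="equal" value="sunny"/>
      <SimplePredicate field="humidity" operator="greaterThan" value="75"/>
    </CompoundPredicate>
  </SimpleRule>
  <SimpleRule id="R2" score="no" recordCount="6" nbCorrect="4">
    <SimplePredicate field="windy" operator="equal" value="TRUE"/>
  </SimpleRule>
</RuleSet>
"""


def rule_statistics(xml_text):
    """Extract per-rule counts and derive a count-based confidence for each rule."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for rule in root.iter("SimpleRule"):
        record_count = float(rule.get("recordCount"))
        nb_correct = float(rule.get("nbCorrect"))
        rows.append({
            "id": rule.get("id"),
            "class": rule.get("score"),
            "covered": record_count,
            "correct": nb_correct,
            # Share of covered records whose class matches the rule's score.
            "confidence": nb_correct / record_count if record_count else 0.0,
        })
    return rows


for stats in rule_statistics(PMML_FRAGMENT):
    print(stats)
```

Only counts local to each rule are used here. Measures such as lift or class coverage also need the class distribution of the underlying data, which this fragment does not carry, so further computation is required.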

To address all the goals named above, a solution must satisfy the following requirements:

  • Calculate the required measures from the classification rule statistics provided by state-of-the-art algorithms or by the PMML standard
  • Present classification rulesets in a unified way
  • Make the ruleset presentation suitable for analyzing and managing classification rules
  • Include the most important objective interestingness measures at both the rule and the ruleset level

Toward a Classification Rules Management System

To demonstrate the idea behind the DeActoRules solution, let's use R's implementation [10] of the C5.0 decision tree algorithm [9] and the play-tennis dataset [8], adapted for illustration purposes. The goal is to predict the "play" feature:

day  outlook   temperature  humidity  windy  play
1    sunny     85           85        FALSE  no
2    sunny     80           90        TRUE   no
3    overcast  83           86        FALSE  yes
4    rainy     70           96        FALSE  yes
5    rainy     68           80        FALSE  yes
6    rainy     65           70        TRUE   no
7    overcast  64           65        TRUE   no
8    sunny     72           95        FALSE  no
9    sunny     69           70        FALSE  yes
10   rainy     75           80        FALSE  no
11   sunny     75           70        TRUE   yes
12   overcast  72           90        TRUE   yes
13   overcast  81           75        FALSE  yes
14   rainy     71           91        TRUE   no

Figure 1: Playing tennis dataset

Applying R's C5.0 tree algorithm to this dataset produces the following decision tree:

Figure 2: Decision Tree model by C5.0 on Playing tennis dataset

The tree can be presented as a set of if-then classification rules as follows:

Figure 3: If-then presentation based on Decision Tree model
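Turning a tree into if-then rules amounts to enumerating its root-to-leaf paths: the tests along a path form the rule's conditions, and the leaf gives the predicted class. The minimal Python sketch below assumes a hypothetical nested-dictionary tree; it is not the C5.0 tree of Figure 2, and the names TREE and tree_to_rules are ours.

```python
# Hypothetical tree for illustration only; NOT the C5.0 tree shown in Figure 2.
TREE = {
    "attribute": "outlook",
    "branches": [
        ("=", "overcast", {"leaf": "yes"}),
        ("=", "sunny", {
            "attribute": "humidity",
            "branches": [
                ("<=", 75, {"leaf": "yes"}),
                (">", 75, {"leaf": "no"}),
            ],
        }),
        ("=", "rainy", {
            "attribute": "windy",
            "branches": [
                ("=", True, {"leaf": "no"}),
                ("=", False, {"leaf": "yes"}),
            ],
        }),
    ],
}


def tree_to_rules(node, conditions=()):
    """Enumerate root-to-leaf paths; each path becomes one (conditions, class) rule."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    rules = []
    for op, value, child in node["branches"]:
        test = (node["attribute"], op, value)
        rules.extend(tree_to_rules(child, conditions + (test,)))
    return rules


for rule_id, (conds, cls) in enumerate(tree_to_rules(TREE), start=1):
    text = " AND ".join(f"{attr} {op} {val}" for attr, op, val in conds)
    print(f"Rule {rule_id}: IF {text} THEN play = {cls}")
```

Each path yields exactly one rule, so this five-leaf tree produces five rules whose conditions are implicitly combined with AND.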

Assuming that the predicates within each rule are joined by conjunction (AND), the ruleset can be presented in relational form as follows:

Figure 4: Relational presentation of classification rules
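A minimal Python sketch of such a relational layout, assuming one row per predicate with a rule identifier and the predicted class repeated on each row; rows that share a rule identifier are combined by conjunction. The rules, the column names, and the helper functions are hypothetical and do not reproduce Figure 4.

```python
from collections import namedtuple

# One row per predicate; "cls" stands in for CLASS ("class" is a Python keyword).
RuleRow = namedtuple("RuleRow", "rule_id attribute operator value cls")

# Hypothetical if-then rules for illustration; not the rules of Figures 3 and 4.
IF_THEN_RULES = [
    ([("outlook", "=", "sunny"), ("humidity", ">", 75)], "no"),
    ([("windy", "=", True)], "no"),
]

OPS = {
    "=": lambda a, b: a == b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
}


def to_relational(rules):
    """Flatten (conditions, class) pairs into one row per predicate."""
    return [
        RuleRow(rule_id, attribute, op, value, cls)
        for rule_id, (conditions, cls) in enumerate(rules, start=1)
        for attribute, op, value in conditions
    ]


def rule_fires(rows, rule_id, instance):
    """Conjunction: the rule fires only if all of its predicates hold."""
    preds = [r for r in rows if r.rule_id == rule_id]
    return bool(preds) and all(
        OPS[r.operator](instance[r.attribute], r.value) for r in preds
    )


table = to_relational(IF_THEN_RULES)
for row in table:
    print(row)
```

Because every rule is just a set of rows, rules from different classifiers can share the same table, which is what makes a unified presentation possible.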

This relational form allows a unified ruleset presentation, visualization, and calculation of rule counts, confidences, probability estimates, and other interestingness measures. A portion of them is shown below, where CLASS is the predicted value of the target feature:

Figure 5: Classification ruleset evaluation
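Once the rules are in a uniform form, counts and confidences follow from matching each rule against the original data. The Python sketch below uses the Figure 1 data with three hypothetical rules (not those of Figure 5). As a probability estimate it applies the Laplace correction (correct + 1) / (covered + k) for k classes, a common choice but an assumption here; the column names other than CLASS are also ours.

```python
# Figure 1 data: (outlook, temperature, humidity, windy, play).
DATA = [
    ("sunny", 85, 85, False, "no"), ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"), ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"), ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, True, "no"), ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"), ("rainy", 75, 80, False, "no"),
    ("sunny", 75, 70, True, "yes"), ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"), ("rainy", 71, 91, True, "no"),
]

# Hypothetical rules: (rule id, antecedent as a predicate on a row, predicted class).
RULES = [
    (1, lambda r: r[0] == "sunny" and r[2] > 75, "no"),
    (2, lambda r: r[3], "no"),
    (3, lambda r: r[0] == "overcast", "yes"),
]


def evaluate(rules, data, n_classes=2):
    """Count covered and correctly classified rows, then derive estimates."""
    results = []
    for rule_id, condition, cls in rules:
        covered = [row for row in data if condition(row)]
        correct = sum(1 for row in covered if row[-1] == cls)
        n = len(covered)
        results.append({
            "RULE_ID": rule_id,
            "CLASS": cls,
            "COVERED": n,
            "CORRECT": correct,
            "CONFIDENCE": correct / n if n else 0.0,
            # Laplace-corrected probability that a covered row belongs to cls.
            "LAPLACE": (correct + 1) / (n + n_classes),
        })
    return results


for result in evaluate(RULES, DATA):
    print(result)
```

With this data, the first rule covers 3 instances and classifies all of them correctly, giving a confidence of 1.0 and a Laplace estimate of 4/5 = 0.8; the correction pulls estimates for rules with small coverage toward the uniform prior.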

Referring to Figure 5, the LIFT_CNT measure is the ratio of the proportion of the predicted class among the instances covered by the rule to the proportion of that class in the original dataset. The CLASS_COVERAGE measure is the ratio of the number of instances of the predicted class covered by the rule to the total number of instances of that class in the original dataset. Generally, when both measures are high, we are likely dealing with an interesting rule. By this criterion, the rule with ID = 5 appears to be very important for predicting the "no" value, and its analysis would continue.
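Both measures can be computed directly from the definitions just given. The Python sketch below follows that wording (share of the class among covered instances divided by its share in the whole dataset; covered instances of the class divided by all instances of the class). It uses a hypothetical rule rather than rule 5, and the function name is ours.

```python
from collections import Counter

# Figure 1 data: (outlook, temperature, humidity, windy, play).
DATA = [
    ("sunny", 85, 85, False, "no"), ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"), ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"), ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, True, "no"), ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"), ("rainy", 75, 80, False, "no"),
    ("sunny", 75, 70, True, "yes"), ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"), ("rainy", 71, 91, True, "no"),
]


def lift_cnt_and_class_coverage(condition, cls, data):
    """Return (LIFT_CNT, CLASS_COVERAGE) for the rule 'condition -> cls'."""
    class_counts = Counter(row[-1] for row in data)
    covered = [row for row in data if condition(row)]
    hits = sum(1 for row in covered if row[-1] == cls)
    if not covered or class_counts[cls] == 0:
        return 0.0, 0.0
    share_in_rule = hits / len(covered)              # class share among covered rows
    share_in_data = class_counts[cls] / len(data)    # class share in the whole dataset
    return share_in_rule / share_in_data, hits / class_counts[cls]


# Hypothetical rule: outlook = sunny AND humidity > 75 -> play = no.
lift, coverage = lift_cnt_and_class_coverage(
    lambda r: r[0] == "sunny" and r[2] > 75, "no", DATA
)
print(f"LIFT_CNT = {lift:.2f}, CLASS_COVERAGE = {coverage:.2f}")
```

This rule covers days 1, 2 and 8, all labeled "no". Since the dataset has 7 "no" instances out of 14, LIFT_CNT is 1.0 / 0.5 = 2.0 and CLASS_COVERAGE is 3/7 ≈ 0.43: the rule is very precise but explains less than half of the "no" cases.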

When linked together, the three data sources, namely the original dataset (Figure 1), the ruleset presentation (Figure 4), and the ruleset evaluation (Figure 5), provide a powerful framework for exploring and managing classification rules. This enhanced presentation of classification rules is a step toward the design of a classification rule management system that benefits analytics project managers and business users.
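One way to link the original dataset with the ruleset is to record, for every instance, which rules cover it. The Python sketch below does this with hypothetical rules (not those of Figures 4 and 5) and derives three ruleset-level diagnostics of our own choosing: overall coverage, instances covered by more than one rule, and instances whose covering rules predict different classes. The function names are ours.

```python
# Figure 1 data with the day column: (day, outlook, temperature, humidity, windy, play).
DATA = [
    (1, "sunny", 85, 85, False, "no"), (2, "sunny", 80, 90, True, "no"),
    (3, "overcast", 83, 86, False, "yes"), (4, "rainy", 70, 96, False, "yes"),
    (5, "rainy", 68, 80, False, "yes"), (6, "rainy", 65, 70, True, "no"),
    (7, "overcast", 64, 65, True, "no"), (8, "sunny", 72, 95, False, "no"),
    (9, "sunny", 69, 70, False, "yes"), (10, "rainy", 75, 80, False, "no"),
    (11, "sunny", 75, 70, True, "yes"), (12, "overcast", 72, 90, True, "yes"),
    (13, "overcast", 81, 75, False, "yes"), (14, "rainy", 71, 91, True, "no"),
]

# Hypothetical ruleset: rule id -> (antecedent, predicted class).
RULES = {
    1: (lambda r: r[1] == "sunny" and r[3] > 75, "no"),
    2: (lambda r: r[4], "no"),
    3: (lambda r: r[1] == "overcast", "yes"),
}


def link(data, rules):
    """Map each day to the ids of the rules whose antecedent it satisfies."""
    return {row[0]: [rid for rid, (cond, _) in rules.items() if cond(row)]
            for row in data}


def ruleset_summary(data, rules):
    """Ruleset-level diagnostics derived from the instance-to-rule links."""
    covering = link(data, rules)
    uncovered = [day for day, ids in covering.items() if not ids]
    overlapping = [day for day, ids in covering.items() if len(ids) > 1]
    conflicting = [day for day, ids in covering.items()
                   if len({rules[i][1] for i in ids}) > 1]
    return {
        "coverage": 1 - len(uncovered) / len(data),
        "uncovered_days": uncovered,
        "overlapping_days": overlapping,
        "conflicting_days": conflicting,
    }


print(ruleset_summary(DATA, RULES))
```

With these rules, days 4, 5, 9 and 10 remain uncovered, and days 7 and 12 are covered by rules that disagree, which points to where the ruleset needs attention.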

The solution can be extended in the following ways:

  • Incorporating additional, highly promising rule evaluation measures based on both subjective and objective interestingness
  • Complete treatment of missing data and cost-sensitive learning, for which simple rule counts are not sufficient from a theoretical standpoint
  • Support for the PMML standard
  • Support for associative classification (AC) rules [11]
  • Using the framework for operational, tactical, and strategic business recommendations toward achieving the vision of actionable analytics

Please share your opinion:

  • Would this solution benefit your analytics?
  • Are you aware of a similar solution or tool?
  • Which of the extensions named above seems to you the most critical?

References

[1] L. Cao, "Actionable Knowledge Discovery and Delivery," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 149-163, March 2012.

[2] A. Basu, "Five Pillars of Prescriptive Analytics Success," Analytics, INFORMS, March/April 2013.

[3] S. Sharma and K. M. Osei-Bryson, "Toward an integrated knowledge discovery and data mining process model," The Knowledge Engineering Review, 25(1), pp. 49-67, 2010.

[4] L. Geng and H. J. Hamilton, "Interestingness measures for data mining: A survey," ACM Computing Surveys, 38(3), Article 9, 2006.

[5] T. H. Davenport, J. G. Harris, and R. Morison, Analytics at Work: Smarter Decisions, Better Results. Harvard Business Press, 2010.

[6] PMML Tree Model, http://www.dmg.org/v4-2-1/TreeModel.html

[7] PMML RuleSet Model, http://www.dmg.org/v4-2-1/RuleSet.html

[8] J. R. Quinlan, "Induction of Decision Trees," Machine Learning, 1(1), pp. 81-106, March 1986.

[9] "Data Mining Tools See5 and C5.0", http://rulequest.com/see5-info.html

[10] R's C5.0 implementation, http://cran.r-project.org/web/packages/C50/index.html

[11] B. Liu, W. Hsu, and Y. Ma, "Integrating classification and association rule mining," In Proc. of the Knowledge Discovery and Data Mining Conference (KDD), New York, pp. 80-86, 1998.

© 2019 Data Science Central ®