Here’s a fun fact: the average human being (probably an adult) makes close to 30,000 conscious decisions every day. That isn’t entirely true, though; in fact, I just made that number up. Still, I could be right. If you think about it, how many decisions would you say you make on a day-to-day basis? The number obviously varies widely depending on who you are, and you know best. We all make n decisions every day: what to do, eat, buy, or hit. The real question, however, is whether our daily choices depend solely on our consciousness. Are there other factors at play that influence our decision-making process? Are these choices always straightforward, or do we sometimes get “nudged” into them?
Nudge theory basically states that by understanding how people think and what drives their decisions, we can use those factors to steer them into making different decisions through positive reinforcement. Research has shown that presenting choices differently, rather than legislating them, can influence people into making specific desired choices. The theory is widely used in behavioral economics, where subtle nudges are presented to influence how people think about financial products. It was, however, initially more of a moral idea meant to help people make better decisions in life, not a tool for commercial gain. Over years of practice, different applications of the theory have emerged.
Now that we have a basic understanding of what nudge theory is about, we can explore an applicable example. This post mainly focuses on a short research project I happened to be part of at my first hackathon, hosted by Safaricom PLC. Let’s dive in!
This photo, taken by a teammate at the hackathon, contains the problem statement for the challenge:
Our Twitter data was fetched using R; I have done a separate post on setting up a Twitter API to fetch Twitter data. R has several packages (such as “twitteR” and “rtweet”) that one can use to stream data from Twitter. Our data cleaning and pre-processing was mainly done in Python.
Note: To keep this post concise, code for the workings has been minimized. The source code for this post can be found here, for anyone interested in trying out the same process. The code is well commented for easier understanding as well.
The team agreed on a few terms to query from Twitter. For an unbiased range of topics, we settled on fetching tweets under trending topics and a few more from random words. We had tweets from or containing the following:
A total of 7000 tweets were captured. The data frame had a total of 88 columns, which we treated as variables for the research. However, not all variables were used in the research, so we had to do some data cleaning. Here is a preview of the variables in our raw data.
This stage involved cleaning up our data by removing the unwanted columns/variables. We decided to make do with a select few variables we thought would be most appropriate for our case study. We chose the following seven variables:
Code for the data cleanup and variables setting that was done in Python can be found here.
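As a rough illustration of the column clean-up described above, here is a minimal sketch using only the Python standard library. The column names and the sample rows are hypothetical stand-ins, since the post does not list the variables that were kept, and this is not the team's actual clean-up script.

```python
import csv
import io

# Hypothetical column names; the post does not list the variables it kept.
KEEP_COLUMNS = ["text", "location", "verified", "followers_count"]


def select_columns(rows, keep):
    """Return new row dicts containing only the columns named in `keep`."""
    return [{name: row[name] for name in keep} for row in rows]


# A tiny invented stand-in for the raw export (the real data had 88 columns).
raw_csv = io.StringIO(
    "text,location,verified,followers_count,lang,source\n"
    "hello world,Nairobi,False,120,en,web\n"
    "good morning,,True,900,en,android\n"
)

rows = list(csv.DictReader(raw_csv))
cleaned = select_columns(rows, KEEP_COLUMNS)
print(cleaned[0])
```

Listing the columns to keep, rather than the ones to drop, keeps the script short when most of the raw columns are unwanted.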
After cleaning up the data, we imported it into R. The code chunk shows a preview of the top 4 rows of the input data.
We still had to do some data pre-processing for the models which involved checking for and removing NULL values if present. Below is a sample table of the final data set used in the analysis.
From the table above, we can observe a new column “Characters”. This was an additional variable derived by counting the number of characters in the tweet text.
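The two pre-processing steps above (dropping rows with missing values and deriving the “Characters” column) could look roughly like this. It is a sketch under assumptions: the field names and sample rows are invented, and `len()` counts Unicode code points, which may differ from how Twitter itself counts characters.

```python
def drop_missing(rows, required):
    """Keep only rows where every required field is present and non-empty."""
    return [
        row for row in rows
        if all(row.get(field) not in (None, "") for field in required)
    ]


def add_character_count(rows, text_field="text"):
    """Return copies of the rows with a 'Characters' column added."""
    return [{**row, "Characters": len(row[text_field])} for row in rows]


# Invented sample rows; field names are hypothetical.
sample = [
    {"text": "Traffic on Thika Road again", "verified": False},
    {"text": None, "verified": True},      # missing text, dropped
    {"text": "Hello", "verified": None},   # missing flag, dropped
]

prepared = add_character_count(drop_missing(sample, ["text", "verified"]))
print(prepared)
```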
Given the nature of our problem (we had several uncorrelated variables), we decided to do a classification analysis. This means we had to come up with a classifier model that predicts our dependent variable, the Location variable, from the n other variables. The main challenge with classifier models is knowing what really goes on inside them that leads to the final output. Even when a model reaches a high level of accuracy, it can be quite difficult to understand the paths it follows. However, using Random forests and Decision Tree classifiers can give us a graphical representation of the criteria followed by the models to arrive at a given output. Another advantage of decision tree models is that they require minimal data cleaning, which makes them less time-consuming.
For the training and test data sets, we randomly split our data set into two sets. Usually, the best practice is to train the model with a larger proportion of the data set. We therefore took 80% for training and 20% for test purposes.
We trained our decision tree model to predict the class “location”: whether a tweet is geotagged or not, based on whether the user is verified, is protected, has over 500 followers, is retweeted by another verified user, and the number of characters in their tweet. Below is the visual output of the trained model.
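A random 80/20 split like the one described can be sketched in a few lines of plain Python. The function name, the seed and the placeholder data below are assumptions made for illustration; the original split was done with the team's own tooling.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of `rows` and cut it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data: 1000 row ids standing in for the cleaned tweets.
rows = list(range(1000))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before cutting matters here because the tweets were fetched topic by topic, so an unshuffled cut could leave whole topics out of the training set.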
When interpreting a decision tree, you start at the root node, which is the one at the top of the tree. Since what we want are the nodes with geotagged locations, it is safe to ignore the non-tagged nodes. Note that our highest entropy level was observed on one variable only, the number of characters in the tweet text. This might not always be the case with decision trees, though; it is possible to have more than one factor. In such situations, it is best to run several decision trees to build a random forest and make a decision based on the most prevalent variables.
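To make the idea of an entropy-based split more concrete, here is a minimal sketch of how a tree could pick a threshold on a single numeric variable such as the character count. The data is invented, and the code is a simplification: real implementations usually test midpoints between sorted values, and many default to Gini impurity rather than entropy.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of boolean class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting into value < threshold and value >= threshold."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Return the observed value that gives the largest information gain."""
    return max(sorted(set(values)), key=lambda t: information_gain(values, labels, t))


# Toy data invented for illustration: character counts and geotag flags.
characters = [40, 60, 90, 110, 130, 140, 150, 200]
geotagged = [False, False, False, False, True, True, True, True]

threshold = best_threshold(characters, geotagged)
print(threshold, information_gain(characters, geotagged, threshold))
```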
For our case, we only focus on what we found:
At the top node, we can see the overall probability of a user geotagging their tweets: 75 percent of the users in the training set geotagged their tweets.
Our second node asks whether the number of characters is more than 134 and leads to depth 2, where we can observe that the largest share of users, 80 percent, tweeted more than 134 characters, with an 80 percent probability of geotagging their tweets.
Node 3 checks if the number of characters in a tweet is less than 134. If yes, head to depth 3, where we can see that 20 percent of users had fewer than 134 characters, with a 50 percent probability of geotagging their tweets.
Finally, looking at depth 4, which originates from the node that checks if the number of characters is equal to or more than 122, we can see that 12 percent of users had tweets with a character count equal to or more than 124, with an 88 percent probability of geotagging their tweets.
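As a quick sanity check on the node figures reported above, the shares and probabilities of the two depth-2 and depth-3 branches should roughly reproduce the root probability. The sketch below uses only the rounded percentages quoted in the text, so a small gap from 75 percent is expected.

```python
# Rounded figures as quoted from the tree: (share of users, P(geotagged)).
root_rate = 0.75
branches = [
    (0.80, 0.80),  # tweets with more than 134 characters
    (0.20, 0.50),  # tweets with fewer than 134 characters
]

total_share = sum(share for share, _ in branches)
implied_root = sum(share * rate for share, rate in branches)

print(f"shares sum to {total_share:.2f}")
print(f"implied root probability {implied_root:.2f} vs reported {root_rate:.2f}")
```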
With our model trained and outputs observed, we were able to run a test with our test subset. Here is our confusion matrix.
From the confusion matrix above, we can observe that the model had 90 true negatives. That is, 90 tweets were correctly predicted as not geotagged. There were 248 false positives, where the model wrongly predicted that tweets were geotagged when in reality they were not. For the tagged tweets, we had 2 false negatives against 1043 true positives. This means that our model correctly predicted 1043 geotagged tweets from the test data. The accuracy of the model turned out pretty good, at 82 percent. Accuracy is the sum of the true positives and the true negatives divided by the total number of predictions in the confusion matrix.
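The accuracy figure can be reproduced directly from the four counts in the confusion matrix. The sketch below also computes precision, recall and specificity; those three are not reported in the original analysis and are added here only as an illustration of what else the same counts can tell us.

```python
# Counts taken from the confusion matrix in the post.
tn, fp, fn, tp = 90, 248, 2, 1043

total = tn + fp + fn + tp

accuracy = (tp + tn) / total   # share of all predictions that were right
precision = tp / (tp + fp)     # of tweets predicted geotagged, how many were
recall = tp / (tp + fn)        # of geotagged tweets, how many were found
specificity = tn / (tn + fp)   # of non-geotagged tweets, how many were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f}")
```

With these counts the accuracy rounds to 82 percent, matching the figure above. The specificity comes out at about 27 percent, which suggests the model predicts “geotagged” for most tweets, so the accuracy is worth reading alongside the 75 percent base rate.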
For a better accuracy level, the model’s hyper-parameters can be tweaked to improve performance. Another option is implementing a random forest model.
With our decision tree model, we were able to attain a high level of accuracy for a model that tests whether users with tweets containing characters equal to or above 122 are likely to geotag their tweets. Our nudge, in this case, is the number of characters in a tweet, and precisely, 124 or more. Our recommendation, therefore, would be to encourage users to tweet longer or to engage them in trending topics that require one to write more, for example a TT like # MyLifeHistoryInANutshell…, in the hope that a user will eventually geotag their tweet.
Come to think of it, did Twitter really increase the number of characters just for tweeps to tweet more and, as they said, to get more people to join Twitter? I have a theory: it was a NUDGE!
Thaler, R.H., Sunstein, C.R., and Balz, J.P. Choice Architecture. SSRN Electronic Journal (2010), 1–18.
Thaler, R.H. and Sunstein, C.R. Nudge: Improving Decisions About Health, Wealth, and Happiness. Yale University Press, New Haven, CT, and London, U.K., 2008.