# Trick-or-Treat a Data Scientist

“Dad, can you help me get more candy this Halloween?” Abdul Rehman, my 12-year-old, asked me the night before last Halloween. His simple question got me thinking about how I, as a data scientist, could optimize his route for collecting candy.

I had to tell my son to rely on his vampire costume this year, but if he could get me some Data Candy, I might be able to help him next year. Let’s define Data Candy as small pieces of information that kids can readily collect while trick-or-treating their way through the neighborhood. My son had his assignment.

As he went from door to door, he kept jotting down the number and brand of candy he got from each house. That’s pretty quantitative, right? But wait: from his brief presence at each doorstep, he also had to figure out whether the house had one or more daughters. Don’t worry, it gets creepier.

After a three-hour trick-or-treating spree, joined by my wife and his two siblings, Abdul Rehman came back with buckets full of candy from more than a hundred houses. I got the candy I was interested in: little handwritten Post-its, all squeezed up against their rather more dashing, sugary counterparts.

Americans spend 6.9 billion dollars on Halloween each year: 1.96 billion on decorations, 2.6 billion on costumes, and 330 million on pet costumes. Americans also spend 12.6 billion dollars per year on chocolate alone.

As the kids went to bed, I started sorting both types of candy.

As I populated a spreadsheet, I looked up related information in publicly available datasets and used whatever helped the analysis. I got the exact house addresses from Google Maps; home values and monthly rent estimates from Zillow; the names, genders, ages, and political affiliations of the residents of the houses my family visited from the public voter database; each resident’s likely birth state by matching names against the Social Security Administration’s publicly available popular names by state and decade; and the retail stores carrying each candy brand by checking the online inventories of stores near our neighborhood.

Here is what I found:

Out of 117 houses, 32 (27%) did not participate in Halloween. I have crossed them off this year’s optimized trick-or-treat route for my children.
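
A minimal sketch of that bookkeeping in Python, assuming each visited house is recorded with an address and a yes/no participation flag. The `House` record, the helper names, and the sample addresses are hypothetical; the post does not publish its spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class House:
    address: str        # hypothetical field: street address from the notes
    participated: bool  # True if the house handed out treats last Halloween


def participation_rate(houses):
    """Fraction of visited houses that handed out treats (0.0 if none visited)."""
    if not houses:
        return 0.0
    return sum(h.participated for h in houses) / len(houses)


def optimized_route(houses):
    """Keep the original visiting order but drop houses that did not participate."""
    return [h.address for h in houses if h.participated]


# Headline figure from the post: 32 of 117 houses did not participate.
print(f"{32 / 117:.0%} of houses skipped")  # 27% of houses skipped

# Hypothetical street, just to show the route filter.
street = [
    House("101 Maple St", True),
    House("103 Maple St", False),
    House("105 Maple St", True),
]
print(optimized_route(street))  # ['101 Maple St', '105 Maple St']
```

Filtering rather than re-sorting keeps the walking order the kids already know; a real route planner would also reorder stops by distance.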

The top Halloween treat in our neighborhood turned out to be lollipops (12.3% of all treats), followed by Twizzlers, Snickers, M&M’s, Twix, Milky Way, Dove, KitKat, Whoppers, and Tootsie Rolls (only 3.4%).

The treats that came home to my house spanned 33 distinct brands. The chart shows that no consideration was paid to health or calories (sugar kills anyway, right?): Butterfinger, at an ideal 45 calories, is not the star of the list; Stardust comes in at about 160, Reese’s Peanut Butter Cups at 180, and Twizzlers, at 150 calories per pack, rank second among the most-given treats.

The households that my son identified as having daughters showed a very clear difference in the treats and brands they gave away. Intuitively, brands like KitKat, Snickers, M&M’s, and Smarties shout ‘boys’, while Hershey’s, Butterfinger, Twix, and Dove are the preferred brands of households with daughters. Since we don’t have a daughter, I could customize a route depending on whether my sons would rather collect more KitKats than Twix, or the other way around.

Here is one way to estimate a house’s probability of participating in Halloween. If the wife is aged between 41 and 50 and the husband between 51 and 60, there is a 100% chance that you will get a treat. That drops dramatically to 25% if the wife is older than 51 and the husband has passed his 60th birthday. Similarly, if either partner is older than 71, you are less likely to get any treats.
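
One way to arrive at rates like these is to bucket each couple’s ages into ten-year brackets and compute the participation rate for every (wife, husband) bracket pair. The sketch below assumes brackets of the form 41-50, 51-60, and so on; the function names and the toy households are hypothetical, chosen only so the output mirrors the 100% and 25% figures above.

```python
from collections import defaultdict


def age_bracket(age):
    """Map an adult age to an assumed ten-year bracket, e.g. 45 -> '41-50'."""
    low = ((age - 1) // 10) * 10 + 1
    return f"{low}-{low + 9}"


def participation_by_brackets(households):
    """households: iterable of (wife_age, husband_age, participated).

    Returns {(wife_bracket, husband_bracket): participation rate}.
    """
    counts = defaultdict(lambda: [0, 0])  # [houses that participated, houses seen]
    for wife_age, husband_age, participated in households:
        key = (age_bracket(wife_age), age_bracket(husband_age))
        counts[key][0] += int(participated)
        counts[key][1] += 1
    return {key: yes / total for key, (yes, total) in counts.items()}


# Hypothetical toy data, not the author's survey.
sample = [
    (45, 55, True),
    (48, 58, True),
    (55, 62, True),
    (53, 65, False),
    (52, 66, False),
    (58, 63, False),
]
rates = participation_by_brackets(sample)
print(rates[("41-50", "51-60")])  # 1.0
print(rates[("51-60", "61-70")])  # 0.25
```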

This chart shows the number of houses in my neighborhood by the age of the couples living in them. Of the four houses with residents aged 71 or older, only one participated in Halloween last year.

I found no correlation between birth location or house value/rent and either the treats given or participation. I guess, in the end, it is your heart, not your pocket, that drives you to celebrate such events.

Here is where it gets interesting: this chart groups households into clusters by buying preference, that is, which treats are bought alongside which. For example, houses in Cluster 5 buy Twizzlers, Twix, and Hershey’s and nothing else, while Cluster 3 buys Twizzlers, Snickers, and KitKat and nothing else. In Cluster 9, 87% of households bought lollipops. The highlighted brands are the ones that make each cluster unique.
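
The post does not say how the clusters were formed, so the sketch below does not attempt to reproduce them. Assuming each household already carries a cluster label, it profiles each cluster by the share of its households handing out each brand, which is the kind of figure behind the 87% lollipop share in Cluster 9. The 0.8 cut-off for a cluster’s dominant brands, the function names, and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_brand_shares(households):
    """households: iterable of (cluster_id, brands_given).

    Returns {cluster_id: {brand: share of that cluster's households giving it}}.
    """
    sizes = Counter()
    brand_counts = defaultdict(Counter)
    for cluster, brands in households:
        sizes[cluster] += 1
        brand_counts[cluster].update(set(brands))  # count each brand once per house
    return {c: {b: n / sizes[c] for b, n in brand_counts[c].items()} for c in sizes}


def dominant_brands(shares, cluster, threshold=0.8):
    """Brands handed out by at least `threshold` of a cluster's households."""
    return sorted(b for b, s in shares[cluster].items() if s >= threshold)


# Hypothetical toy records, not the author's data.
sample = [
    (5, {"Twizzlers", "Twix", "Hershey's"}),
    (5, {"Twizzlers", "Twix", "Hershey's"}),
    (3, {"Twizzlers", "Snickers", "KitKat"}),
    (9, {"Lollipops"}),
    (9, {"Lollipops", "Dove"}),
    (9, {"Milky Way"}),
]
shares = cluster_brand_shares(sample)
print(dominant_brands(shares, 3))  # ['KitKat', 'Snickers', 'Twizzlers']
print(round(shares[9]["Lollipops"], 2))  # 0.67
```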

We can also map the clusters back to the households’ declared political affiliations in the voter database. For example, Democrats come only from Clusters 1 and 4, while Republicans are fairly evenly distributed across all clusters.

I also learned that people who have just moved into the neighborhood tend to give more treats, as do parents with small children at home. I estimate spending on treats at \$30-50 for households with children and \$10-20 for those without. A similar pattern holds for spending on decorations and costumes. The average Halloween decoration spend in our neighborhood comes to \$37, and the average costume spend to \$89. Those two categories alone (\$126) are well above the national average of \$75.03 per household for the entire Halloween holiday (decorations + treats + costumes), as published by the National Retail Federation.

This chart shows candy preference against the declared political affiliation of each household. Blue means at least one partner in the house is a registered Democrat (the other may or may not be UNA, i.e., unaffiliated). Red means at least one partner is a Republican (again, the other may or may not be UNA). Somehow, lollipops and KitKats are strongly associated with Democrats, while Snickers and Butterfinger are associated with Republicans. Give me the candy, and I can tell you who you are going to vote for. I’m not very optimistic about the future, though; after all, it’s either a lollipop or a (butter)Finger.
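
To check an association like this, one could tabulate, for each party, the share of its households handing out each brand. A minimal sketch, assuming voter-file party codes `DEM`, `REP`, and `UNA` and hypothetical records; it implements the blue/red rule described above and leaves out households that fit neither color (mixed Democrat/Republican or fully unaffiliated).

```python
from collections import Counter, defaultdict


def household_party(partner_parties):
    """'D' if at least one partner is DEM and the rest are DEM or UNA,
    'R' likewise for REP, otherwise None (mixed or all unaffiliated)."""
    parties = set(partner_parties)
    if "DEM" in parties and parties <= {"DEM", "UNA"}:
        return "D"
    if "REP" in parties and parties <= {"REP", "UNA"}:
        return "R"
    return None


def brand_share_by_party(households):
    """households: iterable of (partner_parties, brands_given).

    Returns {party: {brand: share of that party's households giving it}}.
    """
    totals = Counter()
    counts = defaultdict(Counter)
    for parties, brands in households:
        label = household_party(parties)
        if label is None:
            continue  # neither blue nor red under the chart's rule
        totals[label] += 1
        counts[label].update(set(brands))
    return {p: {b: n / totals[p] for b, n in counts[p].items()} for p in totals}


# Hypothetical toy records, not the author's data.
sample = [
    (["DEM", "UNA"], {"Lollipops", "KitKat"}),
    (["DEM", "DEM"], {"Lollipops"}),
    (["REP", "UNA"], {"Snickers", "Butterfinger"}),
    (["REP", "REP"], {"Snickers"}),
    (["DEM", "REP"], {"Twix"}),  # mixed household: excluded
]
shares = brand_share_by_party(sample)
print(shares["D"]["Lollipops"], shares["R"]["Snickers"])  # 1.0 1.0
```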


Comment by Ahsan Iftikhar Qureshi on October 2, 2015 at 8:17pm

Just one request: kindly clarify how the clusters have been defined. Also, could the spreadsheet be emailed to [email protected]?

Comment by Ahsan Iftikhar Qureshi on October 2, 2015 at 8:15pm

Too good analysis MashALLAH. I am a bit uncertain about how the clusters have been defined (or whether that was part of the data-gathering exercise); nevertheless, taking a real-life example and turning it into a comprehensive analysis makes this a real treat to read and understand.

Comment by Muhammad Shoaib on October 2, 2015 at 2:06am

Mash'Allah Awesome article Sir

Comment by Achyutha Mohan on September 30, 2015 at 11:54am

Could you provide details on how you performed the cluster analysis?

Comment by Max Galka on September 29, 2015 at 4:47pm

Awesome post! Out of curiosity, why did you ask your son to check for daughters? Intuition that there would be a relationship btw gender and candy type?

Comment by Shankaran Sitarama on September 29, 2015 at 2:55pm

Do you have the data in the spreadsheet to share for people to try out or as a training exercise?

Comment by Shankaran Sitarama on September 29, 2015 at 2:55pm

Awesome.
