Super Bowl Analytics: A Closer Look at #WhosGonnaWin

Excerpt reprinted with permission from ckmadvisors.com

This weekend sees Super Bowl XLVIII come to New York (yes, we're well aware that the stadium is technically in New Jersey). Earlier this week, one of our data scientists noticed the Empire State Building's lights putting on quite a show. A quick search revealed that the iconic building's lights are being used as a giant 'sentiment meter' to show the results of a Twitter war between Broncos and Seahawks fans. Fans were asked to answer a series of questions about the upcoming game in tweets that included the hashtag #WhosGonnaWin, and the results were announced on Verizon's website.

We've posted previously about how our data scientists often tinker with Twitter data as a testing ground for natural language code. Most of these tweets come from Twitter's Streaming API, and we happened to be collecting tweets about the Super Bowl over the same period. The results posted on Verizon's website were vague and lacked details on the precise logic used to determine the winners. The site says:

Verizon is monitoring all Super Bowl tweets and counts those conversations that use key words relevant to the #WhosGonnaWin daily question. Verizon then evaluates how many tweets are positive for each team. The percentages that you see on the graph demonstrate the overall measure of positive fan sentiment for each team.

Subsequent results were simply posted as single-number scores, with little explanation or elaboration on the underlying data (e.g., "Seahawks Win 50.39% to 49.61%"). Given that we had some of the data on hand, we couldn't resist taking a closer look.
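Verizon doesn't publish its exact formula, but its description reads like a simple share: each team's count of positive tweets divided by the positive tweets for both teams. The sketch below assumes exactly that (sentiment_share is a hypothetical name), and the counts are invented for illustration, not taken from Verizon or from our data. It also shows why a bare score says little: the same split comes out whether ten thousand or a million tweets sit behind it.

```python
def sentiment_share(positive_counts):
    """Each team's share of positive tweets, as a percentage.

    Hypothetical reading of Verizon's score:
    positive tweets for one team / positive tweets for both teams.
    """
    total = sum(positive_counts.values())
    if total == 0:
        raise ValueError("no positive tweets to compare")
    return {team: 100.0 * n / total for team, n in positive_counts.items()}


# Invented counts at two very different scales.
small = sentiment_share({"Seahawks": 5039, "Broncos": 4961})
large = sentiment_share({"Seahawks": 503900, "Broncos": 496100})
for team in small:
    print(f"{team}: {small[team]:.2f}% (small) vs {large[team]:.2f}% (large)")
```

Both scales print the same percentages, which is exactly the information a single-number score hides.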

First, we wanted to see who was participating in the Twitter war. We found 414k tweets in our database with the hashtag #whosgonnawin, captured between January 27th and February 2nd. In our experience, only roughly 10% of tweets are tagged with GPS details; in this particular dataset the figure was closer to 1.5% (on the order of 6,000 tweets). While that's a tiny proportion of the overall volume, it's still worth looking at where this data is coming from. We started with geo-tweets whose subject is the Broncos alone.
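The selection described above can be sketched roughly as follows. This is a minimal illustration rather than the analysis code behind the plots: it assumes tweets have already been parsed from the Streaming API's JSON into Python dicts using the v1.1 field names (created_at, text, entities.hashtags, coordinates), it treats the date window as UTC calendar days in 2014, and it uses a crude keyword test (the hypothetical mentions_only) as a stand-in for whatever logic decided that a tweet was "only about" one team.

```python
from datetime import datetime, timezone

# Format of the v1.1 "created_at" field, e.g. "Tue Jan 28 15:04:05 +0000 2014".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Assumed window: January 27th through February 2nd, 2014, as UTC calendar days.
WINDOW_START = datetime(2014, 1, 27, tzinfo=timezone.utc)
WINDOW_END = datetime(2014, 2, 3, tzinfo=timezone.utc)  # exclusive bound


def has_hashtag(tweet, tag):
    """Case-insensitive match against hashtag entities (stored without the '#')."""
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return any(h["text"].lower() == tag.lower() for h in hashtags)


def in_window(tweet):
    created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
    return WINDOW_START <= created < WINDOW_END


def geo_fraction(tweets):
    """Share of tweets carrying a GPS point (v1.1 'coordinates' is null otherwise)."""
    if not tweets:
        return 0.0
    return sum(t.get("coordinates") is not None for t in tweets) / len(tweets)


def mentions_only(tweet, team, rival):
    """Hypothetical keyword stand-in for 'the tweet is only about this team'."""
    text = tweet["text"].lower()
    return team in text and rival not in text


def make_tweet(created_at, text, tag, coords=None):
    # Minimal dict holding only the v1.1 fields used above; coords are [lon, lat].
    return {
        "created_at": created_at,
        "text": text,
        "entities": {"hashtags": [{"text": tag}]},
        "coordinates": {"type": "Point", "coordinates": coords} if coords else None,
    }


# Invented sample tweets.
sample = [
    make_tweet("Tue Jan 28 15:04:05 +0000 2014", "#WhosGonnaWin Broncos all the way",
               "WhosGonnaWin", [-104.99, 39.74]),
    make_tweet("Wed Jan 29 09:00:00 +0000 2014", "Seahawks or Broncos? #whosgonnawin",
               "whosgonnawin"),
    make_tweet("Sun Jan 26 12:00:00 +0000 2014", "#WhosGonnaWin Broncos",
               "WhosGonnaWin"),  # falls before the window
]

campaign = [t for t in sample if has_hashtag(t, "whosgonnawin") and in_window(t)]
broncos_only = [t for t in campaign if mentions_only(t, "broncos", "seahawks")]
print(len(campaign), geo_fraction(campaign), len(broncos_only))
```

A real topic filter would need to handle nicknames, misspellings and negation, which is why the keyword test above should be read only as a placeholder.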

Geographic plot of #WhosGonnaWin tweets mentioning the Broncos

Not surprisingly, we see a large concentration of tweets around the Denver area. We also notice a few localized spikes scattered in less expected places, including an area north of Toronto. Closer inspection reveals that these anomalies are caused by single users posting many times from the same location. To keep any one user from getting too much attention, we limited each user to one geo-tweet during the timeframe of our analysis. After applying this filter, these odd geographic spikes disappear:
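A per-user cap like this needs little more than a set of user IDs already seen. In the sketch below, the function name one_geo_tweet_per_user is hypothetical, and keeping each user's first geotagged tweet is an assumption, since the post doesn't say which tweet survived. It assumes v1.1-style dicts where user.id_str identifies the author and coordinates is None for tweets without GPS data.

```python
def one_geo_tweet_per_user(tweets):
    """Keep at most one geotagged tweet per user (the first one encountered)."""
    seen_users = set()
    kept = []
    for tweet in tweets:
        if tweet.get("coordinates") is None:
            continue  # no GPS point, so nothing to plot
        user_id = tweet["user"]["id_str"]
        if user_id in seen_users:
            continue  # this user already has a point on the map
        seen_users.add(user_id)
        kept.append(tweet)
    return kept


def point(lon, lat):
    # GeoJSON order, as in the v1.1 "coordinates" field: [longitude, latitude]
    return {"type": "Point", "coordinates": [lon, lat]}


# One prolific user posting repeatedly from the same spot north of Toronto,
# plus a Denver user and a user with no GPS data. All values are invented.
sample = (
    [{"user": {"id_str": "111"}, "coordinates": point(-79.5, 44.0)} for _ in range(40)]
    + [{"user": {"id_str": "222"}, "coordinates": point(-104.99, 39.74)}]
    + [{"user": {"id_str": "333"}, "coordinates": None}]
)
deduped = one_geo_tweet_per_user(sample)
print(len(sample), "->", len(deduped))
```

The forty identical points collapse to one, which is the effect that makes the spike north of Toronto vanish from the map.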

Geographic plot of Broncos #WhosGonnaWin tweets, but allowing only one geo-tweet per user 

After applying the same analysis to Seahawks tweets, we start to see clear concentrations of tweets for each team coming out of the Seattle and Denver metro regions.
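One way to turn deduplicated geo-tweets into per-team metro volumes is to count how many fall within a fixed distance of each city center, using the haversine great-circle distance. The 50 km radius, the approximate center coordinates and the names metro_counts and haversine_km are all assumptions chosen for illustration; the post doesn't say how, or whether, metro regions were delimited. You would run metro_counts separately on the Broncos-only and Seahawks-only subsets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
RADIUS_KM = 50.0  # assumed metro radius, chosen for illustration only

# Approximate city-center coordinates as (latitude, longitude).
METRO_CENTERS = {
    "Denver": (39.7392, -104.9903),
    "Seattle": (47.6062, -122.3321),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def metro_counts(geo_tweets):
    """Count geo-tweets falling within RADIUS_KM of each metro center."""
    counts = {name: 0 for name in METRO_CENTERS}
    for tweet in geo_tweets:
        lon, lat = tweet["coordinates"]["coordinates"]  # GeoJSON order: [lon, lat]
        for name, (c_lat, c_lon) in METRO_CENTERS.items():
            if haversine_km(lat, lon, c_lat, c_lon) <= RADIUS_KM:
                counts[name] += 1
    return counts


def geo(lon, lat):
    return {"coordinates": {"type": "Point", "coordinates": [lon, lat]}}


# Invented points: downtown Denver, Aurora, downtown Seattle, north of Toronto.
sample = [geo(-104.99, 39.74), geo(-104.83, 39.73), geo(-122.33, 47.61), geo(-79.5, 44.0)]
print(metro_counts(sample))
```

A fixed radius is crude next to official metro boundaries, but Denver and Seattle are far enough apart that no point can be counted for both.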
