By: Nicholas Hartman, Director at CKM Advisors
Today we'd like to share some fun charts that have come out of our internal linguistics research efforts, specifically a study of weather events based on social media traffic from Twitter.
We do not specialize in social media and most of our data analytics work focuses on the internal operations of leading organizations. Why then would we bother playing around with Twitter data? In short, because it's good practice. Twitter data mimics a lot of the challenges we face when analyzing the free text streams generated by complex processes. Specifically:
In this exercise, tweets from the JSON stream of Twitter's streaming API were scanned in near real time for two properties: whether they 1) could be pinpointed to a specific location and 2) provided potential details on local weather conditions. The vast majority of tweets passing through our code failed at least one of these tests. The tweets that remained were analyzed to determine the type of precipitation being discussed.
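To make the two-stage filter concrete, here is a minimal sketch of how one line of the stream might be screened and labeled. It assumes each line is a tweet object with a `text` field and a GeoJSON `coordinates` point (`[longitude, latitude]`). The keyword lists, the `mixed` label, and the helper names `locate`, `classify`, and `process_line` are hypothetical illustrations; the post does not describe the actual terms or logic used, and plain keyword matching is far cruder than real linguistic analysis.

```python
import json
import re

# Hypothetical keyword lists (not from the original analysis).
SNOW_TERMS = re.compile(r"\b(snow|snowing|snowy|blizzard|flurries)\b", re.IGNORECASE)
RAIN_TERMS = re.compile(r"\b(rain|raining|rainy|drizzle|downpour)\b", re.IGNORECASE)


def locate(tweet):
    """Return (lon, lat) if the tweet carries an exact GeoJSON point, else None."""
    coords = tweet.get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]
        return lon, lat
    return None


def classify(text):
    """Label text 'snow', 'rain', 'mixed' (both), or None if no weather term matches."""
    snow = bool(SNOW_TERMS.search(text))
    rain = bool(RAIN_TERMS.search(text))
    if snow and rain:
        return "mixed"
    if snow:
        return "snow"
    if rain:
        return "rain"
    return None


def process_line(line):
    """Parse one stream line; keep it only if it is both locatable and weather-related."""
    try:
        tweet = json.loads(line)
    except json.JSONDecodeError:
        return None  # blank keep-alive lines or malformed records
    if not isinstance(tweet, dict) or "text" not in tweet:
        return None  # non-tweet messages carry no text
    point = locate(tweet)
    if point is None:
        return None  # test 1 failed: cannot be pinpointed
    label = classify(tweet["text"])
    if label is None:
        return None  # test 2 failed: no weather detail
    return point, label


sample = ('{"text": "Heavy snow on I-95 right now", '
          '"coordinates": {"type": "Point", "coordinates": [-75.16, 39.95]}}')
print(process_line(sample))  # ((-75.16, 39.95), 'snow')
```

Word-boundary matching keeps words like "rainbow" from counting as rain, but it still cannot tell "no snow yet" from "snowing hard", which is one reason a real pipeline would need more than a keyword list.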
The figure at the top of this post summarizes the analysis for the afternoon of 14 December 2013. Around this time, a major storm system was moving up the eastern seaboard, dumping heavy rain and snow along I-95. Twitter commentary indicating locally snowy conditions is displayed in blue, while commentary indicating rainy conditions is displayed in green. The 'rain/snow' line extending from New York City down towards Philadelphia and Washington, DC is clearly visible. There are some anomalies, such as the blue in southern CA and FL, but this snow noise is small relative to the signal coming out of the Northeast.
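One simple way to turn individual labeled tweets into a map like this, while keeping isolated outliers from dominating, is to bin points into a latitude/longitude grid and color each cell by its majority label. The sketch below is a hypothetical illustration, not the method behind the figure: the cell size (`size_deg`), the minimum tweet count per cell (`min_count`), and the function names are all assumptions.

```python
import math
from collections import Counter, defaultdict


def grid_cell(lon, lat, size_deg=0.5):
    """Snap a point to the integer index of its size_deg x size_deg grid cell."""
    return math.floor(lon / size_deg), math.floor(lat / size_deg)


def summarize(points, size_deg=0.5, min_count=3):
    """Map each grid cell to its majority label, dropping sparse cells as noise.

    points: iterable of ((lon, lat), label) pairs, e.g. label 'snow' or 'rain'.
    """
    counts = defaultdict(Counter)
    for (lon, lat), label in points:
        counts[grid_cell(lon, lat, size_deg)][label] += 1
    summary = {}
    for cell, tally in counts.items():
        if sum(tally.values()) < min_count:
            continue  # too few tweets in this cell to trust its label
        label, _ = tally.most_common(1)[0]
        summary[cell] = label
    return summary


pts = [((-74.0, 40.7), "snow"), ((-73.9, 40.8), "snow"),
       ((-74.0, 40.7), "rain"), ((-118.2, 34.0), "snow")]
print(summarize(pts))  # {(-148, 81): 'snow'}
```

Here the three New York City tweets share a cell and resolve to snow, while the lone Los Angeles snow tweet falls below `min_count` and is dropped. A threshold like this trades away some sparse but genuine reports in exchange for suppressing scattered noise.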