By CKM Advisors Natural Language Analysis Team
Last year we posted a popular piece offering our view on defining the characteristics of a data scientist. Perhaps we should have added to that the quality: "during the NFL playoffs, they have one hand in the chips-n-dip and the other typing away in an Emacs terminal."
OK, well, it may not describe all data scientists, but at least some of ours couldn't resist the opportunity to analyze the gridiron action with code. Specifically, in this case they applied language analysis algorithms to study tweets exchanged during yesterday's NFC and AFC championship games. We occasionally test our code on Twitter data, and we recently wrote about another example studying weather patterns.
Someone always pipes up to ask "why do you bother with this analysis?" Simply put, Twitter offers a good source of high-velocity, unpolished language. Training natural language algorithms on polished prose like books is a very poor simulation of the types of free text we often encounter in our analysis (e.g., e-mails and free response fields within systems). Flexing our analytics muscles on a variety of problems helps ensure we can deliver the best results for our clients on actual business problems. For example, in one recent project we used natural language algorithms to scan through free-text communications contained within millions of IT incident tickets. Our client needed to study the occurrence of issues not captured by its pre-defined reporting categories. By sifting through all this free text we were able to rapidly and accurately report on the previously un-reportable.
Enough talk... let's take a look at some of the data captured yesterday:
Yesterday was all about determining who will play in the Super Bowl two weeks from now. In studying conversations throughout the day, the evolution of that matchup quickly becomes apparent. The chart below looks at the relative volume of chatter about each team and its chances of playing in the Super Bowl (times in EST for Sunday, January 20th). Following the AFC Championship game, we see a big spike in activity for the victorious Denver Broncos. That chatter then calms down as everyone settles in for the NFC game to determine the Broncos' opponent. When the Seattle Seahawks emerged victorious shortly before 10 PM Eastern time, the Super Bowl chatter erupted again for both the Broncos and the Seahawks.
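For readers who want to try something similar, here is a minimal sketch of how relative chatter volume per team could be computed from timestamped tweets. It is not our production code: the keyword lists, the `relative_volume` function, the 30-minute bins, and the choice to scale every count against the busiest bin are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical keyword lists; real matching rules would be richer than this.
TEAM_KEYWORDS = {
    "Broncos": ("broncos", "denver"),
    "Seahawks": ("seahawks", "seattle"),
}


def relative_volume(tweets, bin_minutes=30):
    """Count Super Bowl mentions per team per time bin, scaled to the peak bin.

    tweets: iterable of (datetime, text) pairs.
    bin_minutes: bin width; assumed to divide 60 evenly.
    Returns {bin_start: {team: count / peak_count}}.
    """
    counts = defaultdict(Counter)
    for timestamp, text in tweets:
        lowered = text.lower()
        if "super bowl" not in lowered:
            continue  # keep only chatter about the Super Bowl matchup
        bin_start = timestamp.replace(
            minute=(timestamp.minute // bin_minutes) * bin_minutes,
            second=0,
            microsecond=0,
        )
        for team, keywords in TEAM_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                counts[bin_start][team] += 1

    # Scale against the single busiest (bin, team) cell so values fall in [0, 1].
    peak = max((n for c in counts.values() for n in c.values()), default=0)
    if peak == 0:
        return {}
    return {
        bin_start: {team: c[team] / peak for team in TEAM_KEYWORDS}
        for bin_start, c in sorted(counts.items())
    }
```

Scaling against the peak, as opposed to computing each team's share within a bin, lets two teams spike at the same time, which is the pattern described above once both conference champions were known.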
Within all this game chatter there's a lot more detail that can be teased out. By searching conversations for comments indicating that a scoring event took place, we plotted the occurrence of 'touchdown' and 'field goal' events relative to overall chatter about the playoffs. The chart below shows [read more...]