By Nicholas Hartman, Director
Recent revelations regarding the National Security Agency's (NSA) extensive data interception and monitoring practices (aka PRISM) have brought one branch of Big Data research into the broader public light. The basic premise of such work is that computer algorithms can study vast quantities of digitized communications to identify potential activities and persons of interest for national security purposes.
A few days ago we wondered what could be found by applying such Big Data monitoring of communications to track the conversational impact of the NSA story on broader discussions about Big Data. This brief technical note highlights some of our most basic findings.
Our communication analytics work is usually directed at process optimization and risk management. However, in this case we applied some of the most basic components of our analytics tools towards public social media conversations—specifically tweets collected via Twitter's streaming API. Starting last summer, we devoted a small portion of our overall analytical compute resources towards monitoring news and social media sites for evolving trends within various sectors including technology. A subset of this data containing tweets on topics related to Big Data is analyzed below.
Background Data Volume
Twitter's streaming API provides real-time access to public tweets. The two graphs below show the relative volume of communications collected for conversations related to Big Data between 30 May and 12 July 2013. The collection rate was highest on weekdays and during working hours in US time zones. This observation is consistent with the expectation that most of the Big Data conversation takes place in a business context, among the companies applying Big Data to business problems. Conversations on other topics (e.g., leisure activities) would likely have a very different profile.
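As an illustration of how a weekday and working-hours profile like this can be derived, the sketch below buckets tweet timestamps by local weekday and hour. It is a minimal example under stated assumptions rather than the tooling used for this note: the function names, the sample timestamps, and the choice of a fixed UTC-4 offset (US Eastern Daylight Time, which applies throughout the collection window) are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset (US Eastern Daylight Time), which holds
# for the whole 30 May to 12 July 2013 window. Other US zones would shift
# the hourly profile by one to three hours.
US_EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))


def volume_profile(utc_timestamps, tz=US_EASTERN_DAYLIGHT):
    """Count tweets per (weekday, hour) bucket in the given timezone.

    Weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    counts = Counter()
    for ts in utc_timestamps:
        local = ts.astimezone(tz)
        counts[(local.weekday(), local.hour)] += 1
    return counts


def weekday_share(counts):
    """Return the fraction of tweets that fall on Monday through Friday."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    weekday_total = sum(n for (day, _hour), n in counts.items() if day < 5)
    return weekday_total / total


# Hypothetical sample timestamps (UTC), not data from this study.
sample = [
    datetime(2013, 6, 10, 14, 0, tzinfo=timezone.utc),   # Monday 10:00 local
    datetime(2013, 6, 10, 14, 30, tzinfo=timezone.utc),  # Monday 10:30 local
    datetime(2013, 6, 11, 18, 15, tzinfo=timezone.utc),  # Tuesday 14:15 local
    datetime(2013, 6, 16, 2, 0, tzinfo=timezone.utc),    # Saturday 22:00 local
]

profile = volume_profile(sample)
print(profile[(0, 10)])        # tweets on Monday in the 10:00 hour
print(weekday_share(profile))  # fraction of tweets posted on weekdays
```

Converting to local time before bucketing matters: the last sample tweet is a Sunday in UTC but a Saturday evening for a US East Coast author, so counting in UTC would blur the boundary between working and non-working hours.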