As part of my Technology and Innovation MBA program at the Ted Rogers School of Management, I took a data and knowledge management course which teaches students the principles and practices of knowledge management. The second part of the course delves into the tools used in data management and analytics. Although the theoretical part of the course was a bit dry, the hands-on portion was very interesting and exposed students to several different tools to capture, clean and analyze data.
One of the tasks given to students was to capture and analyze Twitter data. Although students had access to Netlytic, a neat cloud-based text and social network analysis tool that also collects Twitter data, we were encouraged to find other ways to collect Twitter data.
One of the platforms I ended up learning about was Bluemix, IBM's cloud platform (PaaS), which supports many programming languages and hosts an ecosystem of services and runtime frameworks. It enables developers to quickly build, deploy and manage their applications. OK, enough promotion of IBM here before this starts to sound like a paid blog!
After creating my 30-day trial account, I proceeded to research how to use the Bluemix platform. At first, the console was overwhelming to look at, but I was able to learn relatively quickly as there was enough documentation available to assist me where needed. (Tip: developerWorks is a useful forum to look for help.)
In this post, I share the steps I took to capture Twitter data in Bluemix. I have not listed every step, but if you do end up using this as a guide and get stuck or, even better, have some feedback for me, do get in touch!
1. Create Project in Bluemix
Click on CREATE APP to create a Cloud Foundry Application.
Select the type of app (in this case, Web).
Select the runtime you want to use (for this project I used Liberty for Java).
Choose an app name (mine was "Twitter MBA Project").
My project has now been created and I am ready to add the services needed to collect Twitter data and store it in a database. At this point you can also set the memory dedicated to your app.
2. Add Services - Insights for Twitter & dashDB
On the dashboard of the Twitter Project, click on ‘ADD A SERVICE OR API’ and add the Insights for Twitter and dashDB services. You should find both services in the Data & Analytics category in the left navigation pane. You can read about each service below the image.
Now you have the services needed not just to capture Twitter data but also to store it.
3. Launch IBM dashDB and select Twitter as data source
Click on the IBM dashDB service in the left navigation pane and then launch the service using the icon at the top right.
This opens the console below, where you can either load a new data set or go to an existing one. Click on "Load your data".
You get the option to select where your data comes from. Options include data sets on your local computer or stored in the cloud. Select Twitter as your data source.
Select the Insights for Twitter service as the source of your tweets and proceed to step 2, where you can specify your search.
You can now enter a Twitter search query. At the time, I was interested in analyzing tweets containing the #scotiabank hashtag or tweeted by the official Scotiabank handle. The "Search help" link was useful in learning the search string syntax. There is also a handy Get Tweet Count button, which tells you how many tweets would be collected if you went ahead and populated the database with the search results.
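For illustration, here is a minimal Python sketch of how a search string like the one described above could be assembled. The `from:` operator and the `OR` keyword are assumptions on my part rather than confirmed Insights for Twitter syntax, so check them against the "Search help" link before relying on them. `build_query` is a hypothetical helper, not something Bluemix provides.

```python
def build_query(hashtags, handles):
    """Join hashtag terms and author terms into a single OR query string.

    Assumption: the search box accepts `#tag`, `from:handle` and `OR`.
    Verify the real operators in the service's "Search help" page.
    """
    # Normalise input so both "scotiabank" and "#scotiabank" are accepted.
    terms = [f"#{tag.lstrip('#')}" for tag in hashtags]
    # Same idea for handles: accept "scotiabank" or "@scotiabank".
    terms += [f"from:{handle.lstrip('@')}" for handle in handles]
    return " OR ".join(terms)


query = build_query(["scotiabank"], ["@scotiabank"])
print(query)  # prints: #scotiabank OR from:scotiabank
```

The resulting string is what you would paste into the search box before pressing Get Tweet Count.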
The Insights for Twitter service, when used with the trial Bluemix account, does not return all the tweets matching your search. With the free trial account, users get access to the Decahose stream, which is a 10% random sample of the Twitter Firehose. If you upgrade to a paid version, you get access to the PowerTrack stream, which allows you to filter the entire Twitter Firehose.
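Because the trial stream is a 10% sample, the count you see understates the real volume. A rough back-of-the-envelope sketch in Python, assuming the sample is uniformly random, is shown here. `estimate_full_volume` is a hypothetical helper for illustration only, and the result is an estimate, not an exact figure.

```python
def estimate_full_volume(sample_count, sample_rate=0.10):
    """Scale a sampled tweet count up to an estimated full-stream count.

    Assumption: tweets are sampled uniformly at `sample_rate`
    (0.10 for the 10% Decahose sample described above).
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in the range (0, 1]")
    # Dividing by the sampling rate inverts the sampling step.
    return round(sample_count / sample_rate)


# Example: 250 tweets returned by the 10% sample suggests roughly 2500 overall.
print(estimate_full_volume(250))
```

For small counts the estimate is noisy, so treat it as an order-of-magnitude guide.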
Provide a prefix that will be used to name the tables containing the Twitter data.
Once the data is loaded, you are provided with a basic summary of the data including the number of tweets over time, location of tweeters and sentiment information.
4. Analyze the data
There are several ways to analyze the newly saved Twitter data, including the option of launching RStudio as shown below. More details on using R for analysis here.
There is a lot more to talk about here, but this post is getting too long! I plan on posting another one soon about another analytics tool I learned in this data management class: Watson Analytics. In the meantime, please connect if you want to chat about this topic or just technology in general.