Guest post by Mike Waldron.
This guide was originally posted on the AYLIEN Blog. It was written as a how-to guide for using RapidMiner and AYLIEN to scrape and analyze online content.
One of the major challenges with mining the Web and Social Media for insights is trying to get all of your data into one place. To do this, you need to extract information from multiple sources in order to gain an accurate and holistic view.
Combining multiple data sources and analyzing their content can be a daunting task, but thankfully data mining frameworks such as RapidMiner and Weka make it quick and straightforward to pull that information together.
In this blog post, we're going to show you how to use AYLIEN's Text Analysis API from within RapidMiner to analyze text gathered from sources on the web.
The Web Mining extension for RapidMiner provides access to internet sources like web pages, RSS feeds, and web services. In this tutorial, we're going to use it to make HTTP requests to the Text Analysis API. In part 2 we will use it to scrape information from websites such as Rotten Tomatoes.
The Web Mining package provides you with an operator for invoking external web services. This operator is called "Enrich Data by Webservice" and can be found in the Operators panel under Web Mining > Services > Enrich Data by Webservice.
url: "https://api.aylien.com/api/v1/sentiment?mode=tweet&text=<%text%>" or if you're using Mashape: "https://aylien-text.p.mashape.com/sentiment?mode=tweet&text=<%text%>"
request method: POST
query type: XPath
Here we are calling the /sentiment endpoint of the Text Analysis API to analyze the sentiment of some text and find out whether it's positive, negative or neutral.
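To make the url parameter more concrete, here is a minimal Python sketch of what the <%text%> placeholder amounts to: the value of the "text" attribute is dropped into the URL for each row. The function name fill_template is hypothetical, and URL-encoding the value is an assumption about what a well-formed request needs, not a description of RapidMiner's internals. The sketch only builds the URL; it does not send the request or deal with API credentials.

```python
from urllib.parse import quote

# URL template copied from the operator settings above.
URL_TEMPLATE = "https://api.aylien.com/api/v1/sentiment?mode=tweet&text=<%text%>"


def fill_template(template: str, row: dict) -> str:
    """Replace each <%name%> placeholder with the URL-encoded value of row[name].

    Hypothetical helper for illustration only; RapidMiner performs its own
    substitution inside the "Enrich Data by Webservice" operator.
    """
    url = template
    for name, value in row.items():
        # safe="" percent-encodes everything except unreserved characters,
        # so spaces and punctuation cannot break the query string.
        url = url.replace(f"<%{name}%>", quote(str(value), safe=""))
    return url


if __name__ == "__main__":
    # One example row with a single "text" attribute.
    print(fill_template(URL_TEMPLATE, {"text": "I love puppies!"}))
```

Each row of the input data produces one such URL, which is why the operator needs to know which attribute holds the text.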
Now that our API call is set up, we need to provide the operator with some input text.
Set the text attribute parameter to "text"
Set the url attribute parameter to "text"
Now that we have everything set up, it's time to run our process by clicking the Run button.
As you can see, "I love puppies!" was deemed to be positive and the result is now accessible in RapidMiner for further analysis and reporting. You could use one of the many other methods provided in the Text Processing package to generate any number of documents and analyze their sentiment in the same fashion. Also, by changing the url parameter in the API call you can access any other endpoint from the Text API (Concept Extraction, Classification, Summarization and so on).
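As a rough illustration of that last point, the sketch below swaps the final path segment of the url parameter while leaving the host and the <%text%> placeholder alone. The helper name switch_endpoint is hypothetical, and the endpoint name "concepts" is an assumption used only as an example; check the Text Analysis API documentation for the actual endpoint paths and the query parameters each one accepts.

```python
from urllib.parse import urlsplit, urlunsplit

# The sentiment URL used earlier in this tutorial.
SENTIMENT_URL = "https://api.aylien.com/api/v1/sentiment?mode=tweet&text=<%text%>"


def switch_endpoint(url_template: str, endpoint: str, query: str = None) -> str:
    """Return url_template with its last path segment replaced by endpoint.

    Hypothetical helper: `endpoint` is whatever path the API documentation
    lists. Pass `query` to replace the query string, since parameters such
    as mode=tweet may not apply to other endpoints (an assumption).
    """
    parts = urlsplit(url_template)
    # Drop the final path segment, e.g. "/api/v1/sentiment" -> "/api/v1".
    base_path = parts.path.rsplit("/", 1)[0]
    new_query = parts.query if query is None else query
    return urlunsplit(
        (parts.scheme, parts.netloc, f"{base_path}/{endpoint}", new_query, parts.fragment)
    )


if __name__ == "__main__":
    # "concepts" is an assumed endpoint name, used purely for illustration.
    print(switch_endpoint(SENTIMENT_URL, "concepts", query="text=<%text%>"))
```

The resulting string can be pasted into the operator's url parameter in place of the sentiment URL.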
In the 2nd part of this series, we're going to crawl Rotten Tomatoes with RapidMiner to extract movie reviews and analyze their sentiment to gain some interesting insights.