Analytics on Unstructured data – Twitter, Facebook and Social Media

Quoting Wikipedia: "Unstructured Data (or unstructured information) refers to information that either does not have a pre-defined data model and/or does not fit well into relational tables. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional computer programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents."

Yes, most big data sources, including Facebook and Twitter, hold unstructured data, and almost no analytics technique can work on it directly. Unstructured data is the starting point, but it has to be transformed into a structured format before any analytics technique can be applied. So what is the process?

Business requirement: Tweets, Facebook posts and other social comments have to be analysed to determine the sentiment of the population.

Creating semi-structured or structured data (which can fit into relational tables) involves dissecting the text into words and phrases, which can then be categorised from ‘good’ to ‘bad’ and everything in between. Converted to numbers, each category falls on a scale from +1 to -1. This fully numeric data set is then ready for use, and all the usual analysis techniques can be applied to it to arrive at results. Thus, a new step, extracting structured data from unstructured data, gets added to the analysis process.
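As a rough illustration of this extraction step, here is a minimal Python sketch of word-level scoring. The lexicon, its polarity values, and the function names are all hypothetical and chosen only for the example; a real project would use a much larger, validated word list and would also need to handle phrases, negation and sarcasm.

```python
import re

# Hypothetical lexicon mapping words to a polarity between -1 and +1.
# The words and values are illustrative assumptions, not a real resource.
LEXICON = {
    "good": 0.5,
    "great": 0.75,
    "fantastic": 1.0,
    "bad": -0.5,
    "hate": -0.75,
    "awful": -1.0,
}


def tokenize(text):
    """Dissect free text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Average the polarity of the known words; 0.0 if none are found."""
    scores = [LEXICON[word] for word in tokenize(text) if word in LEXICON]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def to_rows(posts):
    """Turn raw posts into rows that would fit a relational table."""
    return [
        {"post_id": i, "text": text, "sentiment": sentiment_score(text)}
        for i, text in enumerate(posts)
    ]


if __name__ == "__main__":
    sample_posts = [
        "What a fantastic day",
        "I hate this, awful!",
        "Just landed in Mumbai",
    ]
    for row in to_rows(sample_posts):
        print(row)
```

Each row now carries a numeric sentiment column, so the usual techniques (averages by day, segmentation, regression and so on) can be applied to it like any other structured field.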

 

Thus, all the analytics skills and techniques remain valid in this new paradigm. Only the type of data, its source and our general understanding of it have to be revamped. Most of us analysts can breathe a sigh of relief.

So does this process of converting unstructured data to structured data have to be manual, through heuristics, or machine-driven, through algorithms? Algorithms generally reduce accuracy but increase scale. So a judicious choice between the two, or a gradual shift from manual to algorithmic conversion, can be used to standardise this process within the organisation.
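One possible way to manage such a gradual shift is to have analysts label a sample by hand and check how often the algorithm agrees with them. The sketch below assumes this approach; the function names and the 0.9 threshold are hypothetical, and each organisation would have to pick its own acceptable level.

```python
def agreement_rate(manual_labels, algorithm_labels):
    """Share of items where the algorithm matches the manual label."""
    if len(manual_labels) != len(algorithm_labels):
        raise ValueError("Both label lists must have the same length")
    if not manual_labels:
        return 0.0
    matches = sum(m == a for m, a in zip(manual_labels, algorithm_labels))
    return matches / len(manual_labels)


def ready_to_automate(manual_labels, algorithm_labels, threshold=0.9):
    """True if agreement on the sample meets the (assumed) threshold."""
    return agreement_rate(manual_labels, algorithm_labels) >= threshold


if __name__ == "__main__":
    # Hypothetical labels for four posts: one disagreement out of four.
    manual = ["positive", "negative", "negative", "positive"]
    algorithm = ["positive", "negative", "positive", "positive"]
    print(agreement_rate(manual, algorithm))
    print(ready_to_automate(manual, algorithm))
```

If agreement on the sample is too low, the manual process stays in place while the algorithm or its word list is improved; once it clears the threshold, more of the volume can be moved to the algorithm.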

In fact, in this whole landscape, the decision on what data to delete becomes of paramount importance, and there is a new set of data guardians who are experts at helping organisations retain only relevant data.

This understanding, and the coming together of all these bits and pieces, has given a lot of confidence to the decision-making process on dynamic and big data.

About the Author: Subhashini is currently active in the Analytics Training (http://jigsawacademy.com/), Blogging and Consulting arena, and has a decade of experience across roles in Analytics in Retail Finance and Banking. These roles have been across Risk Management, Collections Strategy, Fraud Control and Marketing. Her area of interest is the integration of results / outputs of Analytics with Business Decisions – Tactics and Strategy.

(Link to profile: http://in.linkedin.com/pub/subhashini-s-tripathi/3/405/77b)


Comment by Daden Limited on October 22, 2012 at 12:35am

Our data visualisation application datascape handles unstructured data and allows you to visualise it in 3D. There is a free community version of the software that you can try; the link http://bit.ly/DSC100th lets you compare the free and the pro versions. With the pro version you can also add a Twitter streamer (and other social media) module, which allows you to visualise, filter and drill down into live tweets (or captured social media data). You can also highlight key words as they are mentioned in tweets (e.g. tweets containing words such as good, great or fantastic would be visualised in one colour, and those containing words like bad, hate or awful in another). One of our earliest blogs on data visualisation discussed sentiment analysis: http://www.daden.co.uk/adventures-in-opensim-data-visualisation-vis...

The great thing about this inexpensive tool is that it gives the user complete flexibility over what they want to visualise and how they want to visualise it: you can plot more than 65,000 data entities and assign data fields to X/Y/Z coordinates, colour, shape, size, rotation, etc. You can create translation tables to map data values to colours, shapes and textures with autofill. One minute you could be visualising unstructured data, the next genetic/scientific data and the next financial data. All on the same tool. :)


© 2018 Data Science Central ®