Facebook has launched a Kaggle competition to hire a data scientist:
“This competition tests your text skills on a large dataset from the Stack Exchange sites. The task is to predict the tags (a.k.a. keywords, topics, summaries), given only the question text and its title. The dataset contains content from disparate stack exchange sites, containing a mix of both technical and non-technical questions.”
The task is very similar to building a taxonomy that classifies user questions into a number of categories, called tags. Interestingly, we face the same problem at Data Science Central: automatically attaching tags (from a set of 5,000 potential tags such as big data, analytics, Hadoop, and career) to all the blog posts published on our network since 2007. We might hire someone to do this! In short, this is nothing more than automatically imposing a structure on unstructured text.
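To make the idea concrete, here is a minimal sketch of a keyword-based tagger in Python. It is only an illustration, not the method used at Facebook or at Data Science Central: the tiny `TAG_KEYWORDS` vocabulary, the `suggest_tags` function, and the choice to weight title matches three times as much as body matches are all assumptions made for this example. A competitive solution would more likely learn the tag associations from the training data.

```python
import re
from collections import Counter

# Hypothetical tag vocabulary: each tag maps to phrases that suggest it.
# A real system would have thousands of tags and learn these associations.
TAG_KEYWORDS = {
    "big data": ["big data", "hadoop", "mapreduce"],
    "analytics": ["analytics", "analysis"],
    "hadoop": ["hadoop", "hdfs"],
    "career": ["career", "job", "hire", "salary"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_phrase(tokens, phrase_tokens):
    """Count how often a (possibly multi-word) phrase occurs in a token list."""
    n = len(phrase_tokens)
    return sum(
        1
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == phrase_tokens
    )


def suggest_tags(title, body, max_tags=3, title_weight=3):
    """Rank tags by weighted phrase matches in the title and the body."""
    title_tokens = tokenize(title)
    body_tokens = tokenize(body)
    scores = Counter()
    for tag, phrases in TAG_KEYWORDS.items():
        for phrase in phrases:
            phrase_tokens = tokenize(phrase)
            # Title matches count more than body matches (an assumption).
            scores[tag] += title_weight * count_phrase(title_tokens, phrase_tokens)
            scores[tag] += count_phrase(body_tokens, phrase_tokens)
    # Keep tags with at least one match; break score ties alphabetically.
    ranked = sorted(
        ((score, tag) for tag, score in scores.items() if score > 0),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [tag for _, tag in ranked[:max_tags]]


if __name__ == "__main__":
    print(suggest_tags(
        "How do I install Hadoop on Ubuntu?",
        "I want to run MapReduce jobs on HDFS for a big data project.",
    ))
```

Exact token matching is deliberately crude: "jobs" does not match the keyword "job", for instance. Stemming, synonyms, and tag co-occurrence statistics are the obvious next refinements.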
You will find useful information in three of our articles, regarding this problem:
I wish I had the time to help potential candidates win this contest (for a fee), as this is my domain of expertise. I'm wondering if there is a market for statistical consultants to help candidates win these contests.