This article has been recovered from our archives. Author: Kirk Borne. Published in February 2014.
Recommender systems are among the most fun and profitable applications of data science in the big data world. Training data (corresponding to the historical search, browse, purchase, and customer feedback patterns of your customers) can be converted into golden opportunities for ROI (i.e., Return On Innovation and Investment). The predictive analytics tools of data science yield a bonanza of mechanisms to engage your customers and enrich their customer experience. What better loyalty program can there be than one that offers customers what they want before they ask (and sometimes even before they think of it for themselves)? Yes, we know of some cases that have gone bad (such as the secretly pregnant teen and the targeted coupons that Target sent to her father), and we recognize that there is a fine line between being intimate with your customers and being intimidating, but usually people do like to receive offers for great products that they love.
A new O’Reilly book (by Ted Dunning and Ellen Friedman) on “Practical Machine Learning – Innovations in Recommendation” takes a look at the nuts & bolts, the mechanics and the implementation, and the theory and the practice of recommender engines. They describe the design of a simple recommender using Apache Mahout, based upon the co-occurrence analysis of customers’ product purchases.
The book is available as a free download from the MapR website. These are the chapters in the book:
1. Practical Machine Learning
2. Careful Simplification
3. What I Do, Not What I Say
4. Co-Occurrence and Recommendation
5. Deploy the Recommender
6. Example: Music Recommender
7. Making it Better
8. Lessons Learned
In two separate articles, I examine recommender systems — their underlying principles, data science, history, and design patterns. Here are links to the articles, plus a short excerpt from each one:
(a) Design Patterns for Recommendation Systems – Everyone Wants a Pony
We identify four different design patterns that are useful in recommender engines for predicting customer behavior in the customer experience environment (e.g., online store, browser, smartphone app, or whatever): co-occurrence matrices, vector space models, Markov models, and “everyone gets a pony (the most popular item).”
The co-occurrence matrix, described in Dunning and Friedman’s book, is the cross-matrix of all possible product pairs A and B that were co-purchased by prior customers. Analysis of non-zero elements in this matrix identifies which co-occurrences are anomalous, that is, are more frequent than you’d expect by independent occurrence of items. These anomalous co-occurrences become indicators for potential offers of product B for customers who buy product A. This approach is based upon the association rule mining algorithm (a limited form of an approach called market basket analysis).
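To make the idea of an anomalous co-occurrence concrete, here is a minimal Python sketch. The baskets are invented toy data, the function names are hypothetical, and a plain observed-to-expected ratio (often called lift) stands in for whatever statistical test a production system would use. It is not the Mahout implementation described in the book.

```python
from collections import Counter
from itertools import combinations

# Toy purchase histories, invented purely for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "cereal"},
    {"milk", "cereal", "bread"},
    {"butter", "jam"},
]


def cooccurrence_counts(baskets):
    """Count how often each item, and each unordered item pair, is purchased."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        # Sorting gives every pair a single canonical (a, b) key.
        for a, b in combinations(sorted(basket), 2):
            pair_counts[(a, b)] += 1
    return item_counts, pair_counts


def lift(pair, item_counts, pair_counts, n_baskets):
    """Observed co-purchases divided by the count expected under independence."""
    a, b = pair
    expected = item_counts[a] * item_counts[b] / n_baskets
    return pair_counts[pair] / expected


item_counts, pair_counts = cooccurrence_counts(baskets)
n_baskets = len(baskets)

# Keep only pairs bought together more often than independence predicts.
indicators = {
    pair: lift(pair, item_counts, pair_counts, n_baskets)
    for pair in pair_counts
    if lift(pair, item_counts, pair_counts, n_baskets) > 1.0
}
```

On this toy data, only the butter/jam and cereal/milk pairs survive the filter, so a customer who buys butter would be a candidate for a jam offer. With real data, sparse counts make a raw ratio noisy, which is one reason to prefer a proper significance test over this simplification.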
Vector space models are useful for both customer modeling and product modeling. This begins with building a feature vector, consisting of either a set of features that describe a customer (e.g., products of interest, features of interest, manufacturers of interest, purchase frequency, price range, etc.) or a set of features that describe a product (e.g., content, author/creator, theme/genre, etc.). Cosine similarity calculations are then made against these feature vectors to identify similar customers (X,Y) and similar products (A,B). In the first case, products are offered to customer X based upon the purchase history of similar customer Y. In the second case, the customer is offered product A based upon its similarity to product B that the customer has previously purchased or has recently looked at (but not purchased).
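The product-similarity case can be sketched in a few lines of Python. The feature vectors and product names below are hypothetical, and the three features (say, how strongly a title is science fiction, romance, or action) are assumptions chosen only to make the arithmetic easy to follow.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_u * norm_v)


# Hypothetical product features: [science fiction, romance, action]
products = {
    "A": [1.0, 0.0, 1.0],
    "B": [1.0, 0.0, 0.8],
    "C": [0.0, 1.0, 0.1],
}


def most_similar(target, catalog):
    """Return the name of the catalog item closest to `target` by cosine similarity."""
    others = (name for name in catalog if name != target)
    return max(others, key=lambda name: cosine_similarity(catalog[target], catalog[name]))


# A customer who bought product B would be offered its nearest neighbor.
suggestion = most_similar("B", products)
```

The same function works unchanged for the customer-similarity case: swap the product vectors for customer feature vectors, find the nearest customer Y, and draw offers from Y's purchase history.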
(b) Personalization – It’s Not Just for Hamburgers Anymore
Many years ago (don’t ask me how I know this!) the hamburger chain Burger King began branding itself with this slogan: “Have it your way!” It was pure marketing genius! … …
In 2006, Netflix offered a one million dollar prize to anyone who could improve upon its recommendation engine’s algorithm by at least 10%. There were over 44,000 entries in this contest, from over 41,000 teams, representing approximately 51,000 contestants. A winner was declared soon after July 26, 2009, when the “BellKor’s Pragmatic Chaos” team submitted an algorithm that delivered a 10.06% improvement. Another team matched their score on the test dataset, but the winning team scored best on the “hidden dataset” that Netflix used to score contestants’ entries.
This latter detail provides a classic instructional example of how to avoid overfitting in a predictive analytics model that is built against a training dataset: you find the “best solution” (the one that works best on a general set of data) through error measurement and verification of the algorithm against previously unseen data.
The “BellKor’s Pragmatic Chaos” algorithm won the Netflix Prize on the basis of having the best performance on the hidden dataset, but it may have won in another category too (though this cannot be verified): its total length must have been among the leaders. The winning algorithm was presented in detail in a 90-page scientific research paper. It is an emphatic example of the type of algorithm that seems to be a frequent winner in Kaggle.com crowdsourced data science competitions: ensembles. Ensemble algorithms are in fact an amalgamation of multiple algorithms; they combine the predictions from large numbers of different algorithms. The proof is in the prizes: ensemble learning is one of the most accurate [and prize-winning] machine learning methodologies for big data analytics problems…
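The two ideas in that excerpt, scoring on previously unseen data and blending several models, can be illustrated with a toy Python sketch. All numbers here are made up, the two "models" are just hard-coded prediction lists, and the simple average is an assumption for illustration; it is not the winning team's method, which was far more elaborate.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predictions and true ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def blend(*prediction_lists):
    """Simplest possible ensemble: average the models' predictions item by item."""
    return [sum(ps) / len(ps) for ps in zip(*prediction_lists)]


# Invented ratings that neither model saw during training (the "hidden" set).
held_out_ratings = [4.0, 3.0, 5.0, 2.0]

# Invented predictions from two hypothetical models whose errors partly cancel.
model_a = [4.5, 2.5, 4.5, 2.5]
model_b = [3.5, 3.5, 5.0, 1.0]

blended = blend(model_a, model_b)

error_a = rmse(model_a, held_out_ratings)
error_b = rmse(model_b, held_out_ratings)
error_blend = rmse(blended, held_out_ratings)
```

Here the blended predictions have a lower held-out error than either model alone, because the toy errors were chosen to point in different directions. Blending is not guaranteed to help when the models make the same mistakes, which is why the comparison has to be made on data that none of the models were fitted to.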
I encourage you to take a look at the new book and at the full-length versions of the above articles.