As a follow-up to my previous post "Using Machine Learning to predict Customer Behaviour", I wanted to address a similar topic, but from an e-commerce perspective. How do you predict the behaviour of visitors to your online store? And, more importantly, how do you leverage this knowledge to optimize your traffic, conversion, profit, or whatever KPI you're using?

As a former business analyst in the e-commerce space, I used to look at four key analytics segments to provide data-driven insights:

- Product Analytics,
- Site Analytics,
- Marketing Analytics and
- Customer Analytics.

Let's look at how Machine Learning can help you address the challenges posed by each of these four segments. To keep this post short, I've decided to split it into four parts, one per segment. Let's start with product analytics.

**Product Analytics is all about understanding your demand**

Now that you've acquired your visitor (at a great cost), how do you make sure you get a positive ROI on him? In other words, how do you make sure he converts? Sure, he needs to reach the right page, and I'll cover that part in the next post of the series. But once he's there, how do you make sure he sees the product he wants to buy, let alone purchases it?

This might appear to be a trivial question in some cases. Say you want to purchase a book. You know the title, you know the author, so it shouldn't be too hard, right? Amazon takes good care of that.

However, let's look at the following situations:

- You know what you want to buy, but you can't find it. Are there any suitable substitute products which you could purchase instead?
- You know what you want to buy and you've found it, but there might be complementary products you would also be interested in.
- You don't really know what you're looking for (some inspiration, maybe), so which products are you most likely to purchase?

Needless to say, being able to answer these questions would definitely help improve the conversion.

Conceptually speaking, each of these questions can be answered using data engineering. Let's have a closer look at this.

**Recommender Engines: The Go-to Algorithm**

As a data scientist, thinking about product recommendation should automatically lead you to the recommender engine class of algorithms. The general idea is to leverage the opinions of other customers to find possible recommendations, an approach known as collaborative filtering. Typically, if customer A liked both product 1 and product 2, and customer B likes product 1, it is reasonable to recommend product 2 to customer B. Makes sense, right? These algorithms have been widely researched, and they even helped popularize Data Science through the famous $1M Netflix Prize. There is also a public research group, GroupLens, which uses a global movie database to provide personalized movie recommendations (see links below).
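To make the "customer A / customer B" reasoning concrete, here is a minimal sketch of item-based collaborative filtering in plain Python. The `likes` data, the function names and the choice of cosine similarity over binary "liked" vectors are all my own assumptions for illustration; a production engine would work from much larger and noisier data.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: the set of products each customer liked.
likes = {
    "A": {"product_1", "product_2"},
    "B": {"product_1"},
    "C": {"product_2", "product_3"},
}


def item_similarity(likes):
    """Cosine similarity between products, using binary 'liked' vectors."""
    item_users = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            item_users[item].add(user)

    sims = defaultdict(dict)
    for i, users_i in item_users.items():
        for j, users_j in item_users.items():
            if i == j:
                continue
            overlap = len(users_i & users_j)
            if overlap:
                sims[i][j] = overlap / sqrt(len(users_i) * len(users_j))
    return sims


def recommend(user, likes, top_n=3):
    """Rank products the user has not liked yet by similarity to liked ones."""
    sims = item_similarity(likes)
    scores = defaultdict(float)
    for liked in likes[user]:
        for other, sim in sims[liked].items():
            if other not in likes[user]:
                scores[other] += sim
    # Sort by descending score, then by name to keep ties deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("B", likes))  # ['product_2']
```

Customer B gets product 2 because customer A liked it together with product 1, which is exactly the situation described above.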

With a sufficient volume of data, these algorithms work incredibly well. However, it is difficult to tell whether the recommended product is a good substitute or a complement to the initial product. Furthermore, what if the visitor needs inspiration and doesn't know what he's looking for?

To pursue the movie analogy, a recommender engine might tell you whether you'll like Star Wars VII, but it will fail to identify this movie as part of the Star Wars universe. In fact, if you've seen and liked the original trilogy, it might recommend The Force Awakens to you, but it might also blindly recommend the J.J. Abrams Star Trek reboot without distinction (and it probably will).

**Finding substitutes vs Suggesting complements**

For almost two decades now, Amazon has been pushing state-of-the-art product recommendations. They work amazingly well, but I've sometimes found (and still do) the "Customers also purchased these items" feature a bit wobbly. Very often I get substitute goods instead of complementary goods, which is kind of pointless really. Why would I purchase Pepsi if I already have Coca-Cola in my shopping basket?

So how do we identify which is which? There is always the good old microeconomic rule for substitutes and complements: increase the price of product A. If the demand for product B increases, B is a substitute; if it decreases, B is a complement. The catch is that precisely measuring this cross-price elasticity of demand is quite hard and requires careful isolation of exogenous factors with A/B tests and such. But used in conjunction with a recommender engine, it can yield satisfying results.
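The price rule can be written down as a small calculation. The sketch below uses the midpoint (arc) formula for the cross-price elasticity of demand; the function names, the sample figures and the `threshold` used to ignore near-zero effects are assumptions of mine, and it presumes the before/after quantities come from a properly controlled experiment.

```python
def cross_price_elasticity(qty_b_before, qty_b_after,
                           price_a_before, price_a_after):
    """Midpoint cross-price elasticity: % change in demand for B
    divided by % change in the price of A."""
    pct_qty = (qty_b_after - qty_b_before) / ((qty_b_after + qty_b_before) / 2)
    pct_price = (price_a_after - price_a_before) / (
        (price_a_after + price_a_before) / 2
    )
    return pct_qty / pct_price


def classify(elasticity, threshold=0.1):
    """Positive elasticity suggests a substitute, negative a complement.
    The threshold is an arbitrary guard against measurement noise."""
    if elasticity > threshold:
        return "substitute"
    if elasticity < -threshold:
        return "complement"
    return "unrelated"


# Hypothetical A/B test: price of A raised from 10.0 to 11.0.
rises = cross_price_elasticity(100, 110, 10.0, 11.0)  # demand for B goes up
falls = cross_price_elasticity(100, 90, 10.0, 11.0)   # demand for B goes down
print(classify(rises), classify(falls))
```

In practice the hard part is not this arithmetic but obtaining clean quantities: seasonality, promotions and stock-outs all move demand for B independently of the price of A.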

Another option would be to mine frequent itemsets. Using the Market Basket Model, let's consider as an example the itemsets {A, C1, C2, C3, C4,...,Cn} and {B, C1, C2, C3, C4,...,Cn}. If both are frequent, it might be that, under some specific conditions, A and B are substitutes and C1-Cn are complements to both A and B. This is the field of Data Mining, and there are many algorithms that can efficiently mine frequent patterns and find associations or correlations within the data, such as Apriori, ECLAT, and FP-Growth.
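Here is a rough sketch of that idea: a small level-wise, Apriori-style search for frequent itemsets, followed by a heuristic that flags A and B as possible substitutes when they appear in frequent itemsets differing only by A versus B and never show up in the same basket. The basket data, the `min_support` value and the "never bought together" condition are assumptions I chose for illustration, not a standard rule.

```python
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"cola_a", "chips", "salsa"},
    {"cola_b", "chips", "salsa"},
    {"cola_a", "chips", "salsa"},
    {"cola_b", "chips", "salsa"},
    {"cola_a", "chips"},
    {"cola_b", "salsa"},
]


def frequent_itemsets(baskets, min_support):
    """Return {itemset: support} for itemsets found in >= min_support baskets."""
    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket)

    items = {item for basket in baskets for item in basket}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        size += 1
        # Candidate generation: join frequent itemsets of the previous size.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: every (size - 1)-subset must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
            and support(c) >= min_support
        }
    return frequent


def possible_substitutes(frequent, baskets):
    """Pairs (A, B) sharing a frequent context but never bought together."""
    pairs = set()
    for s1 in frequent:
        for s2 in frequent:
            if len(s1) < 2 or len(s1) != len(s2):
                continue
            only_1, only_2 = s1 - s2, s2 - s1
            if len(only_1) == 1 and len(only_2) == 1:
                pair = only_1 | only_2
                if not any(pair <= basket for basket in baskets):
                    pairs.add(pair)
    return pairs


frequent = frequent_itemsets(baskets, min_support=2)
print(possible_substitutes(frequent, baskets))
```

With this toy data, the two colas are flagged as possible substitutes, while chips and salsa (frequent alongside either cola) look like complements to both.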

**Getting the right inspiration**

Regarding the third question, getting the right recommendation is much harder, as the visitor has not yet identified the product he might be interested in. An option here would be to classify his current behaviour (site navigation path, search terms) into a product purchase likelihood using a supervised learning algorithm. I'll discuss this in greater detail in the second part of this series, on Site Analytics.
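As a small taste of what that could look like, here is a sketch that turns session events into per-product purchase probabilities. I picked a multinomial naive Bayes classifier with add-one smoothing only because it fits in a few lines; the post does not prescribe a specific algorithm, and the session data, token format and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import exp, log

# Hypothetical labelled sessions: (observed events, product purchased).
sessions = [
    (["search:tent", "page:camping", "page:sleeping-bags"], "tent"),
    (["search:tent", "page:camping"], "tent"),
    (["search:boots", "page:hiking", "page:camping"], "boots"),
    (["page:hiking", "search:boots"], "boots"),
]


def train(sessions):
    """Count how often each event appears in sessions ending in each purchase."""
    class_counts = Counter(label for _, label in sessions)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in sessions:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def purchase_likelihood(tokens, model):
    """Return {product: probability} for the events seen so far in a session."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        denom = sum(token_counts[label].values()) + len(vocab)
        score = log(count / total)
        for token in tokens:
            if token in vocab:  # events never seen in training are ignored
                score += log((token_counts[label][token] + 1) / denom)
        log_scores[label] = score
    # Normalise the log scores into probabilities.
    top = max(log_scores.values())
    weights = {label: exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


model = train(sessions)
probs = purchase_likelihood(["search:tent", "page:camping"], model)
print(max(probs, key=probs.get))  # tent
```

The output is a ranking over products, which can then feed the "inspiration" slots on the page even though the visitor never named a product.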

**Wrapping up**

As we saw above, there are many good ways to optimize the conversion of your product offering, and very often the choice of one method over another is driven by constraints such as computing power. Given enough resources, it is feasible to train a classification algorithm over thousands of possible outcomes, but ultimately the trained model has to be applied to live visitors in real time (a visitor navigating the website won't wait for his recommendation). This is sometimes where one has to trade off accuracy for simplicity.

I mentioned in the introduction that we'd only be looking at predicting behaviour from a product perspective. I'll discuss the site, marketing and customer perspectives in upcoming articles.

Link to original post: https://logomachie.wordpress.com/2016/04/20/how-to-improve-your-onl...

**Sources**

- http://movielens.org
- http://grouplens.org/datasets/movielens/
- http://www.netflixprize.com/
- Jiawei Han, Micheline Kamber, and Jian Pei. *Data Mining: Concepts and Techniques, 3rd Edition*. Waltham: Morgan Kaufmann, 2011.

You can get the first chapters on frequent pattern mining for free here: https://d396qusza40orc.cloudfront.net/patterndiscovery/Han_Data%20M...
