
How to analyze a retail dataset with text/categorical variables?


I am currently in the final year of my Business Economics (MBA) course. I am involved in a project/competition where clustering and predictive modelling are to be done on a retail dataset with text (type of discount and coupon applied) and categorical variables.
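As a rough illustration of one common starting point, the text and categorical columns can be turned into numeric features before any clustering or modelling is done. This is a plain-Python sketch with no third-party libraries. The column names (`channels`, `discounts`) and sample values are hypothetical, and one-hot encoding plus simple word counts is only one of several reasonable encodings (scikit-learn's `OneHotEncoder` and `CountVectorizer` do the same job at scale).

```python
import re


def one_hot(values):
    """Encode a categorical column as 0/1 indicator vectors (one column per category)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


def bag_of_words(texts):
    """Encode free-text fields as token counts over a shared vocabulary."""
    tokenized = [re.findall(r"[a-z0-9]+", t.lower()) for t in texts]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok in toks:
            row[index[tok]] += 1
        rows.append(row)
    return vocab, rows


# Hypothetical sample: a categorical sales channel and a free-text discount description.
channels = ["online", "store", "online"]
discounts = ["10% off coupon", "buy one get one", "coupon free shipping"]

categories, channel_rows = one_hot(channels)
vocab, discount_rows = bag_of_words(discounts)

# Concatenate both encodings into one numeric feature vector per transaction.
features = [c + d for c, d in zip(channel_rows, discount_rows)]
```

The resulting `features` rows can then be fed to a clustering or predictive model. With many distinct categories or words the vectors get wide, so in practice rare values are often grouped or dropped first.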

Can anyone suggest a source that would give me an idea of how to proceed? Currently I am using Kaggle competition kernels to learn how to analyze such datasets.

If anyone is interested in this competition, we can form a team and start working on it.

Help is needed.



Replies to This Discussion

Just a couple of resource suggestions that may be helpful: is a pretty good cloud-based environment with lots of tutorials and an alternative to Kaggle Kernels; also has a lot of helpful business-oriented information, is subscription-based, and has an inexpensive trial offer. These are only suggestions and may require more time than you have to be useful.


I think the first objective is to understand the main algorithms and data mining models, because implementing them is comparatively easy.
There are many ways of doing it.
In this case, for clustering you can use K-means, and for the predictive part decision trees; for retail sales in particular there are models such as RFM or Market Basket Analysis.
If you are using Kaggle Kernels, I assume you know a programming language such as R or Python; otherwise, there is a Data Mining add-in for Excel.
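A minimal sketch of K-means (Lloyd's algorithm) in plain Python, assuming the rows have already been encoded as numeric vectors, for example with one-hot columns. The toy `points` are hypothetical, and in practice `sklearn.cluster.KMeans` in Python (or `kmeans()` in R) would normally be used instead of a hand-written loop.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    # Start from k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j, p=p: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical encoded rows: two obvious groups of customers.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2)
```

Because the result depends on the random starting centroids, it is worth trying a few `seed` values (or several values of `k`) and comparing the clusters you get.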

If you want to investigate the main data mining models, I recommend taking a look at:

For information on Market Basket Analysis (or Next Best Offer):
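To give a feel for what Market Basket Analysis computes, here is a small sketch that scores association rules of the form `lhs -> rhs` between two items using support (share of baskets containing both), confidence (how often `rhs` appears when `lhs` does) and lift (confidence divided by how common `rhs` is overall). The baskets are hypothetical, and the sketch only looks at pairs of items; algorithms such as Apriori extend the same counting to larger itemsets. Treating each discount or coupon type as an extra "item" in the basket is one possible, untested way to bring those variables in.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Score every rule lhs -> rhs between two items by support, confidence and lift."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1: bought together more than chance
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest associations (highest lift) first.
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Hypothetical transactions, one list of items per basket.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk", "eggs"],
]
rules = pair_rules(baskets, min_support=0.5)
```

Here every basket with butter also has bread, so `butter -> bread` gets confidence 1.0, while `bread -> milk` has a lift below 1 because milk is common anyway.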

For information on RFM (Recency, Frequency, Monetary) analysis:
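A sketch of RFM scoring under simple assumptions: each transaction is a hypothetical `(customer_id, date, amount)` tuple, recency is the number of days before an `as_of` date, and each measure is turned into a 1..`bins` score by rank, where a higher score means a better customer. Real RFM work often uses quantiles with explicit tie handling; here ties are simply broken by sort order.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer: (recency_days, frequency, monetary)} from (customer, date, amount) rows."""
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        last_seen[customer] = max(last_seen.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((as_of - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


def rank_scores(values, bins, higher_is_better=True):
    """Score customers 1..bins by rank (bins = best); ties keep their sort order."""
    order = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(order)
    return {c: 1 + (i * bins) // n for i, c in enumerate(order)}


def rfm_scores(transactions, as_of, bins=3):
    table = rfm_table(transactions, as_of)
    # Recent buyers are better, so recency uses higher_is_better=False.
    r = rank_scores({c: v[0] for c, v in table.items()}, bins, higher_is_better=False)
    f = rank_scores({c: v[1] for c, v in table.items()}, bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, bins)
    return {c: (r[c], f[c], m[c]) for c in table}


# Hypothetical transactions: (customer_id, purchase date, amount spent).
transactions = [
    ("A", date(2020, 1, 5), 50.0),
    ("A", date(2020, 3, 1), 30.0),
    ("B", date(2019, 11, 20), 20.0),
    ("C", date(2020, 2, 15), 200.0),
    ("C", date(2020, 2, 28), 120.0),
    ("C", date(2020, 3, 2), 80.0),
]
scores = rfm_scores(transactions, as_of=date(2020, 3, 31))
```

The three scores per customer can be used directly as segments (e.g. 3-3-3 as the best customers) or as input features for the clustering step.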

I hope this information helps you.




© 2020 TechTarget®
