
Mining Customer Reviews to Drive Business Growth

A passionate customer will often leave detailed feedback about a favorite product, especially when it strikes an emotional chord.

Product reviews contain a wealth of information. Analyzing review texts can unearth many hidden data points about both the customer and the product, and such insights can help grow the business and increase revenue.

Let's look at a specific example.

Our customer Bob decides to buy a wedge pillow.


He provides in-depth feedback after using the pillow.

I have suffered with Gerd, Gastritis and Esophagitis for 1yr now and have been to several doctors and taken numerous medicine. All doctors told me to sleep on an incline and add blocks under my bed but I did not want to elevate both me and my wife so I slept on 3 pillows for over a year. Now I have arthritis in my neck and sleeping on 3 pillows have not done much to keep the acid down out of my throat. This wedge pillow does a good job of not just elevating your head but it raises your entire upper abdomen to keep heartburn away from this area. I used to get up every night because of heartburn, bloating and stomach pain …

So what do we learn when we read the whole text?

Our customer is not entirely happy, but his review comments provide interesting insights.

Let's now try to extract the key signals and categorize them.

Health Concerns -> now my neck has become very stiff and painful

Product Reference -> Get Rolled-up Cheap Pillow

Positive Feedback -> This pillow keeps food down and acid down

Missing Feature -> does not have a steep incline

It would be great to build a system that automatically extracts such signals and shares the insights through interactive visualizations.
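As a first approximation, the categorization above can be sketched with keyword rules. The Python below is a minimal sketch, not a production extractor: `SIGNAL_CUES`, `split_sentences`, and `extract_signals` are hypothetical names chosen for illustration, and only three of the four categories are covered. A real system would learn such cues or draw them from the taxonomies and vocabularies listed in the workflow.

```python
import re

# Hypothetical cue phrases per signal category (illustration only).
SIGNAL_CUES = {
    "Health Concern": ["neck", "heartburn", "acid", "arthritis", "bloating", "stomach pain"],
    "Positive Feedback": ["does a good job", "keeps food down", "keep heartburn away"],
    "Missing Feature": ["does not have", "doesn't have", "wish it had"],
}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_signals(review: str) -> dict[str, list[str]]:
    """Map each signal category to the review sentences containing one of its cues."""
    signals: dict[str, list[str]] = {}
    for sentence in split_sentences(review):
        lowered = sentence.lower()
        for category, cues in SIGNAL_CUES.items():
            if any(cue in lowered for cue in cues):
                signals.setdefault(category, []).append(sentence)
    return signals


if __name__ == "__main__":
    review = (
        "Now my neck has become very stiff and painful. "
        "This pillow does a good job. "
        "It does not have a steep incline."
    )
    for category, sentences in extract_signals(review).items():
        print(f"{category} -> {sentences}")
```

Keyword rules like these are brittle: they miss paraphrases and ignore negation, and one sentence can land in several categories. They can still serve as a quick baseline before the NLP models in the workflow take over.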

Here is a quick high-level view of the system components:


Technical Workflow

  • Ingest review streams (real-time) [Kafka -> Spark]
  • Store raw text in a document index store for free-form text search
  • Analyze incoming data asynchronously
    • Text analysis [NLP using Spark ML]
      • Tokenize (lowercase, split)
      • Clean (remove stop words)
      • Normalize (lemmatize, stem)
    • vectorize attributes and look up historical vectorized data to run a periodic NLP model training workflow
    • match significant product terms by referring to [Product Taxonomy]
    • match buyer preferences [Buyer's Profile]
    • match medical terms [Medical Ontology and Vocabularies]
    • discover new products and topics using LDA (Latent Dirichlet Allocation)
    • detect positive and negative features
    • sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner)
    • enrich the results by combining them with product ratings, product attribute ratings, and review votes
    • extract and match user interests
    • it's very important to detect plagiarism and …
  • Store current insights in Redis / DynamoDB for quick lookup, and also stream them to WebSocket clients
  • Visualize real-time insights
  • Historical analysis [Elasticsearch / Hadoop]
    • periodically aggregate the above insights
    • refine product offerings based on historical insights
    • compare product popularity by category
    • generate demand based on signals
    • recommend products based on attributes
    • find hidden customers (channels / stores) that need to buy in bulk, and supply items to them
    • grow inventory and replenish items in local stores
    • retain customers through personalized offers based on what each user liked and didn't like
    • sell in bulk to channels discovered from product texts, and offer discounted prices
    • extract health concerns, correlate them with medical conditions, drug information, and safety warnings, and generate health recommendations and an aggregated health score
  • Store aggregated and structured results in Cassandra or a Redshift data warehouse
  • Visualize summary reports, insights, and trends
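The text-analysis steps (tokenize, clean, normalize) followed by taxonomy and vocabulary matching can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the stop-word list, `PRODUCT_TAXONOMY`, and `MEDICAL_VOCAB` are tiny hypothetical stand-ins, and `stem` is a toy suffix stripper rather than a real stemmer or lemmatizer. In Spark ML the first two steps roughly correspond to the `Tokenizer` and `StopWordsRemover` transformers in `pyspark.ml.feature`; Spark ML has no built-in stemmer, so that step would need another library (for example NLTK's Porter stemmer).

```python
import re

# Tiny illustrative stop-word list (real lists have hundreds of entries).
# "not" is deliberately kept, since negation matters for sentiment.
STOP_WORDS = {"a", "an", "and", "the", "i", "my", "me", "to", "of", "on", "for",
              "have", "has", "been", "it", "this", "is", "with", "but", "so", "your"}

# Hypothetical lookup tables, keyed by the stemmed term.
PRODUCT_TAXONOMY = {"pillow": "Bedding > Pillows", "wedg": "Bedding > Pillows > Wedge"}
MEDICAL_VOCAB = {"heartburn": "GERD symptom", "arthriti": "Joint condition",
                 "gastriti": "Digestive condition"}


def tokenize(text):
    # Lowercase, then split on anything that is not a letter (drops punctuation too).
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def clean(tokens):
    # Remove stop words.
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    # Toy suffix stripper standing in for a real stemmer or lemmatizer.
    for suffix in ("ing", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    # Tokenize -> clean -> normalize, then match terms against the lookup tables.
    terms = [stem(t) for t in clean(tokenize(text))]
    return {
        "terms": terms,
        "products": sorted({PRODUCT_TAXONOMY[t] for t in terms if t in PRODUCT_TAXONOMY}),
        "medical": sorted({MEDICAL_VOCAB[t] for t in terms if t in MEDICAL_VOCAB}),
    }


if __name__ == "__main__":
    print(analyze("This wedge pillow keeps heartburn away; now I have arthritis."))
```

Matching on normalized terms is what lets "wedge" and "wedges", or "arthritis" in different sentences, resolve to the same taxonomy entry.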

To extract the hidden patterns above together with their correlated signals, we should use the best available techniques, including Recurrent Neural Networks (RNNs).
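At its core, an RNN reads a review one word vector at a time and carries a hidden state forward, so later words are interpreted in the context of earlier ones. The sketch below shows only the forward pass of a plain (Elman) RNN cell, h_t = tanh(W_x x_t + W_h h_{t-1} + b), in dependency-free Python. The weights come from a hypothetical `random_matrix` helper and are untrained placeholders; a real model would learn them with backpropagation through time, and in practice one would more likely use LSTM or GRU cells in a deep-learning framework.

```python
import math
import random


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, W_x, W_h, b):
    """Elman RNN forward pass: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0.
    Returns the hidden state after every time step."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        pre = [a + c + d for a, c, d in zip(matvec(W_x, x), matvec(W_h, h), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


def random_matrix(rows, cols, scale=0.1, seed=0):
    # Hypothetical helper: small random weights, seeded for repeatability.
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    # Three hypothetical 4-dimensional word vectors for a three-token review.
    review_vectors = random_matrix(3, 4, scale=1.0, seed=1)
    W_x, W_h = random_matrix(3, 4, seed=2), random_matrix(3, 3, seed=3)
    final_state = rnn_forward(review_vectors, W_x, W_h, b=[0.0, 0.0, 0.0])[-1]
    print(final_state)  # untrained weights: these values carry no meaning yet
```

The final hidden state summarizes the whole sequence and is what a downstream classifier (for example, one predicting a signal category) would consume.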


Word Embeddings [1]

• Document vs. Word Representations

• Word2Vec vs Med2Vec

• GloVe

• Embeddings in Deep Learning

• Visualizing Word Vectors: t-SNE
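Whatever method produces the word vectors (for example Word2Vec or GloVe), they are commonly compared with cosine similarity, which is how related terms such as symptoms or product synonyms can be grouped. The 3-dimensional vectors in `EMBEDDINGS` below are hypothetical toy values chosen for illustration; trained embeddings typically have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors; real embeddings are learned from a large corpus.
EMBEDDINGS = {
    "heartburn": [0.9, 0.1, 0.0],
    "reflux": [0.8, 0.2, 0.1],
    "pillow": [0.0, 0.9, 0.3],
    "cushion": [0.1, 0.8, 0.4],
}


def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(word, k=1):
    """Return the k other words whose vectors are most cosine-similar to `word`."""
    query = EMBEDDINGS[word]
    scored = [(w, cosine(query, v)) for w, v in EMBEDDINGS.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]


if __name__ == "__main__":
    print(nearest("heartburn"), nearest("pillow"))
```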


VADER (Valence Aware Dictionary and sEntiment Reasoner) can help evaluate variations in buyer sentiment, the positive/negative feedback ratio, and feature attribute weighting, and its scores can factor into all the different product metrics described above.
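In the `vaderSentiment` Python package, `SentimentIntensityAnalyzer().polarity_scores(text)` returns a dict with `neg`, `neu`, `pos`, and a normalized `compound` score in [-1, 1]; the package's README suggests treating compound >= 0.05 as positive and compound <= -0.05 as negative. The sketch below turns such scores into a positive/negative feedback ratio and a per-attribute average. To keep it runnable without the package, compound scores are passed in directly, and the attribute names in the usage are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# With vaderSentiment installed, a compound score would come from:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
# Here the scores are passed in directly so the sketch runs without it.


def label(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a label using the +/-0.05 convention."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def feedback_ratio(compounds):
    """Positive-to-negative review ratio, or None if there are no negative reviews."""
    labels = [label(c) for c in compounds]
    negatives = labels.count("negative")
    return labels.count("positive") / negatives if negatives else None


def attribute_sentiment(mentions):
    """Average compound score per product attribute from (attribute, score) pairs."""
    by_attribute = defaultdict(list)
    for attribute, compound in mentions:
        by_attribute[attribute].append(compound)
    return {attribute: mean(scores) for attribute, scores in by_attribute.items()}


if __name__ == "__main__":
    print(feedback_ratio([0.6, 0.4, -0.5, 0.0]))  # 2 positive / 1 negative -> 2.0
    print(attribute_sentiment([("incline", -0.4), ("incline", -0.2), ("comfort", 0.5)]))
```

Per-attribute averages are one simple way to weight features: an attribute such as the incline that is consistently mentioned negatively stands out as a missing or weak feature.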

Finally, we can generate useful visualizations and use them to enhance products and improve the overall buyer experience.

Let's get back to the original feedback on the wedge pillow and see what insights we can gain.

It's noteworthy how easily one can spot an opportunity to sell wedge pillows to rehabilitation centers that need them for their patients.

Many customers who buy wedge pillows have had some sort of knee problem.
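Observations like these are aggregates: they come from counting how often a condition or setting is mentioned across many reviews of the same product, not from a single review. A minimal sketch, assuming a hand-picked list of single-word terms and a hypothetical `min_rate` cut-off for flagging a term as a potential sales opportunity; the sample reviews in the usage are made up for illustration.

```python
import re
from collections import Counter


def mention_rates(reviews, terms):
    """Fraction of reviews that mention each single-word term at least once."""
    counts = Counter()
    for review in reviews:
        words = set(re.findall(r"[a-z]+", review.lower()))
        counts.update(term for term in terms if term in words)
    return {term: counts[term] / len(reviews) for term in terms}


def flag_opportunities(rates, min_rate=0.25):
    """Terms mentioned in at least `min_rate` of reviews, most frequent first."""
    flagged = [term for term, rate in rates.items() if rate >= min_rate]
    return sorted(flagged, key=lambda term: rates[term], reverse=True)


if __name__ == "__main__":
    sample = [
        "Bought this after knee surgery.",
        "Great for my knee and back.",
        "Helps my acid reflux.",
    ]
    rates = mention_rates(sample, ["knee", "surgery", "reflux", "rehab"])
    print(flag_opportunities(rates))
```

Counting each term at most once per review keeps one very talkative reviewer from dominating the rate.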

To appreciate the power of the knowledge that can be extracted from reviews, let's quickly look at the insights gained from a set of reviews of 'Cream of Wheat: Whole Grain Hot Cereal'.


It's striking to discover that this particular food item helps Alzheimer's patients, and that it is preferred mostly by older people and people with throat problems.


Mining product review data can be a lot of fun, and it can turn customer feedback into a continuous source of revenue.