Online reviews are a valuable source of information that can support users in their decision making. An estimated 92% of online shoppers read online reviews, 88% trust them as much as personal recommendations, and shoppers typically read more than 10 reviews to form an opinion. The objective is to propose a framework that improves the user experience when users face an otherwise unmanageable number of online reviews, and that automatically rates those reviews on a 5-star scale.
The framework consists of five modules: (1) linguistic preprocessing, (2) topic modeling, (3) sentence classification against the topics extracted in the previous module, (4) sentiment analysis, and (5) rating against the topics based on the sentiment of the corresponding sentences. The proposed method is unsupervised, i.e., it does not require an annotated training dataset. It is also domain independent and can therefore be applied to any domain for which online reviews are available.
module1: linguistic pre-processing
To prepare the raw text, which lacks formal structure and is written in an informal style, we employed the following linguistic pre-processing steps:
• Removing stop words.
• Correcting spelling mistakes and typographical errors.
• Converting slang and abbreviations to the corresponding
• Stemming to aggregate words with related meaning.
• Removing punctuation, special characters, hyperlinks, etc.
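The steps listed above can be sketched as a small pipeline. This is a minimal illustration and not the authors' implementation: the stop-word list, the slang dictionary and the suffix-stripping `crude_stem` helper are hypothetical stand-ins for real resources, and spelling correction is left out because it needs an external dictionary.

```python
import re

# Hypothetical, deliberately tiny resources; a real system would use fuller lists.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "in", "it"}
SLANG = {"gr8": "great", "u": "you", "thx": "thanks"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude stand-in for a real stemmer


def crude_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, clean, and tokenize one review into stemmed tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())  # drop hyperlinks
    text = re.sub(r"[^a-z0-9\s]", " ", text)           # drop punctuation and special characters
    tokens = [SLANG.get(tok, tok) for tok in text.split()]  # expand slang and abbreviations
    tokens = [tok for tok in tokens if tok not in STOP_WORDS]  # remove stop words
    return [crude_stem(tok) for tok in tokens]


print(preprocess("The rooms were gr8, thx! See http://example.com"))
```

The order of the steps is a design choice in this sketch: hyperlinks are removed before punctuation so that a URL is dropped as a whole rather than split into fragments.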
module2: topic modeling
Table 1 shows three examples of topics, each represented by the 10 most relevant words within that topic. Intuitively, according to the given words, one may assume that topic T1 is related to amenities, whereas T2 and T3 are more about the location.
module3: text classification
Once the topic model has been generated, each sentence can be checked against the model to obtain information on topic distribution, which can be used to classify the sentence into an appropriate topic (see Table 2 for examples).
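As an illustration of this step, the sketch below assigns a sentence to the topic with the highest probability in its topic distribution. The argmax rule and the function name `classify_sentence` are assumptions made for this example; the text above does not state the exact decision rule.

```python
def classify_sentence(topic_distribution: dict[str, float]) -> str:
    """Return the topic with the highest probability for one sentence.

    topic_distribution maps a topic label (e.g. "T1") to the probability
    that the topic model assigns to it for this sentence.
    """
    return max(topic_distribution, key=topic_distribution.get)


# Toy distribution for a single sentence (made-up numbers).
print(classify_sentence({"T1": 0.7, "T2": 0.2, "T3": 0.1}))
```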
module4: sentiment analysis
We operate under the assumption that the rating is correlated with sentiment strength. To calculate the overall sentiment, each sentence is analyzed separately using the weighted word embeddings method. The following steps provide more detail about our sentiment analysis approach:
Step 1: The sentiment score of each word, represented by a vector, is calculated based on the cosine similarity between the word's vector and the vectors of seed words of positive and negative sentiment.
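One way to read Step 1 is sketched below: the score of a word is its average cosine similarity to the positive seed vectors minus its average cosine similarity to the negative seed vectors. Taking the difference of the two averages is an assumption of this sketch, and the function names and the 2-dimensional toy vectors are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_sentiment(vec, pos_seeds, neg_seeds) -> float:
    """Assumed scoring rule: mean similarity to positive seeds minus mean similarity to negative seeds."""
    pos = sum(cosine(vec, seed) for seed in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(vec, seed) for seed in neg_seeds) / len(neg_seeds)
    return pos - neg


# Toy 2-D "embeddings": the first axis stands for positive, the second for negative.
positive_seeds = [[1.0, 0.0]]
negative_seeds = [[0.0, 1.0]]
print(word_sentiment([0.9, 0.1], positive_seeds, negative_seeds))
```

Under this rule the score lies in [-2, 2] in general and is positive when the word is closer to the positive seeds than to the negative ones.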
Step 2: Negation Handling – Negation words and punctuation marks are used to determine the context affected by negation.
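A minimal sketch of one common way to delimit negation scope: every token after a negation word is flagged as negated until the next punctuation mark. The word and punctuation lists and the function name `mark_negated` are assumptions for illustration, not the lists used in the original work.

```python
# Hypothetical negation cues and scope-ending punctuation marks.
NEGATIONS = {"not", "no", "never", "n't", "cannot"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def mark_negated(tokens: list[str]) -> list[bool]:
    """Return one flag per token: True if the token lies inside a negation scope."""
    flags = []
    negated = False
    for tok in tokens:
        low = tok.lower()
        if low in PUNCTUATION:
            negated = False          # punctuation closes the current scope
            flags.append(False)
        elif low in NEGATIONS:
            negated = True           # a negation word opens a scope
            flags.append(False)
        else:
            flags.append(negated)
    return flags


tokens = ["the", "room", "was", "not", "clean", ",", "but", "staff", "were", "friendly"]
print([tok for tok, flag in zip(tokens, mark_negated(tokens)) if flag])
```

In this example only "clean" is flagged; a downstream step could then, for instance, invert the sentiment score of flagged words.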
Step 3: Part-of-Speech Tagging – Not every word is equally important for sentiment analysis, e.g. most sentiment words are adjectives, adverbs, nouns and verbs.
Step 4: Having calculated the sentiment of individual words as described in Step 1, the sentiment of a sentence is calculated using the following formula
module5: rating
Once the topic model has been extracted from a corpus of reviews, each sentence is classified into an appropriate topic. To rate a review on a 5-star scale (1 star being very negative and 5 stars being very positive), we first normalize the sentiment score of each sentence. The normalization effectively maps the sentiment of each sentence to a real number between 0 and 5. For each topic in turn, we aggregate the normalized scores of all sentences within the topic to obtain the average score.
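The normalization and per-topic averaging can be sketched as follows. The sketch assumes that raw sentence sentiment lies in [-1, 1] and that the mapping to [0, 5] is linear; neither detail is given in the text above, and the function names are hypothetical.

```python
from collections import defaultdict


def normalize(score: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Linearly map a score from [lo, hi] to [0, 5], clipping values outside the range."""
    clipped = max(lo, min(hi, score))
    return 5.0 * (clipped - lo) / (hi - lo)


def rate_topics(sentences) -> dict[str, float]:
    """Average the normalized scores per topic.

    sentences: iterable of (topic, raw_sentiment) pairs, one per classified sentence.
    """
    by_topic = defaultdict(list)
    for topic, score in sentences:
        by_topic[topic].append(normalize(score))
    return {topic: sum(vals) / len(vals) for topic, vals in by_topic.items()}


# Made-up sentence scores for one review.
review = [("location", 1.0), ("location", 0.0), ("amenities", -1.0)]
print(rate_topics(review))
```

With these toy inputs the two "location" sentences normalize to 5.0 and 2.5 and average to 3.75, while the single "amenities" sentence normalizes to 0.0.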
For more details, please download the original PDF from ACM.