Is data science bad at detecting bogus Amazon or Yelp reviews?

There are lies, damn lies, and Amazon reviews. Why are so many Amazon or Yelp reviews bogus? Do these companies have bad data scientists who can't detect fraudulent reviews? No, they have unethical CEOs ready to do anything to make money in the short term, who then complain about being unable to find real data scientists to solve their problems. This is a challenge for ethical data scientists who want to create value but get punished by top management for not condoning its misdeeds.

In the case of Amazon, most items found on its website have excellent (but too often bogus) reviews, because Amazon makes money selling them - a big conflict of interest. In the case of Yelp, if your business is listed but you don't purchase advertising from them, your business will only get (bogus) bad reviews. Both cases are worth a class action lawsuit. Amazon will also generate (bogus) bad reviews (easy to detect - see below) against authors who do not comply with its ridiculous policies. As a data scientist, is it worth working for such companies?

In the case of Amazon, bad reviews can also arise because of publisher wars - Wiley against Elsevier or O'Reilly - or because of disruptive content, like proving that SQRT(2) is an irrational number. The author of the proof,  Aristotle, was murdered 2,000 years ago for stating and proving this fact; interestingly, an algorithm to generate all digits of this number was recently found. Even today, things haven't changed.

Anyway, here are four of these bogus reviews that any working brain could easily detect - no need for advanced data science algorithms! Dr Granville has promised to add detection of bogus reviews as a project in our DSA program.

Examples of bogus Amazon reviews

Note how short these reviews are, providing no explanations, no facts.

  • "[This book] Never arrived." This two-word comment got upvoted, probably by another fake profile paid to trash the book in question. The poster (maybe a robot) is too dumb to realize that he/she/it is actually bad-mouthing Amazon, not the author.
  • "Please don't waste your time or money on reading this tosh. I question the author's motivations for putting this garbage out in the first place, and Wiley's lack of quality control. Who is this guy??" The commenter seems to be trashing only Wiley books.
  • "You won't find a lick of rigor in the 300+ pages." Either someone commenting on a scientific book he read by mistake, or maybe another fake profile, paid to post?
  • "This is shocking! Author does not give any real solutions." Obviously written by someone who did not read the book, since 50% of the book is about real solutions.
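The two signals mentioned above (very short reviews with no explanations, and a commenter who only trashes one publisher) can be sketched in a few lines of Python. This is only an illustration, not a method used by Amazon or described by the author: the function names and the 25-word threshold are assumptions made up for the example and would need tuning on real data.

```python
import re
from collections import Counter

def word_count(text):
    """Count word-like tokens in a review."""
    return len(re.findall(r"[A-Za-z0-9']+", text))

def is_suspiciously_short(text, min_words=25):
    """Flag reviews too short to contain explanations or facts.

    min_words is an arbitrary, untuned threshold (assumption).
    """
    return word_count(text) < min_words

def single_target_share(publishers):
    """Share of a reviewer's negative reviews aimed at the publisher
    they target most often (1.0 means a single publisher only)."""
    if not publishers:
        return 0.0
    counts = Counter(publishers)
    return counts.most_common(1)[0][1] / len(publishers)

# Two of the reviews quoted above
reviews = [
    "Never arrived.",
    "You won't find a lick of rigor in the 300+ pages.",
]
flags = [is_suspiciously_short(r) for r in reviews]
```

A flag raised by either function is only a reason to look closer at a review or a profile, not proof that it is fake.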

I think it is time to build a new Yelp or ignore both Amazon and Yelp reviews. These companies have many other problems, see e.g.

Comment by Catherine Dalzell on February 10, 2016 at 7:05am

The SQRT(2) proof was done by a Pythagorean mathematician, not Aristotle, and he had to leave town fast because other Pythagoreans were not pleased. They believed the whole universe was governed by "number", which, to them, meant that all distances needed to be an integer multiple of a common root. There was a common "number", according to them, and everything was built out of that. sqrt(2) is manifestly the length of something (the hypotenuse of a right triangle with unit legs), but not commensurable with them, hence the panic. Galileo got around this problem by the dubious method of assuming infinitely short distances. The Greeks were smart. Renaissance calculus and physics work only because everyone avoided the nasty philosophical questions that they raise. Basically, Medieval mathematicians became comfortable working with infinitesimals, thus paving the way for the scientific revolution. But the bottom line is that physics is weird in any jurisdiction.

Comment by Dr. Dimitrios Geromichalos on June 20, 2015 at 11:48pm

Also very important, especially in countries and industries with skill shortages, is the question of how fake reviews on employer review sites like Glassdoor (or Kununu in Germany) can be detected. Here, I would suggest the following patterns to detect fake reviews posted by companies:

1. Are there conspicuously many reviews compared to other similar companies?
2. Are there conspicuously many reviews compared to the size of the company / number of employees?
3. Are reviews uniformly positive in all aspects?
4. Is "employer speak" used ("socially acceptable aspects", "long-term opportunities for development")?
5. Are superlatives used ("super", "the best") suspiciously often?
6. Are there clusters of dates on which reviews were written (June 2011,...)?
7. Are some departments overrepresented (Marketing, IT,...)?
Of course, the list is far from complete.
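As a rough illustration, patterns 2, 3, 5, 6 and 7 from the list could be turned into simple numeric signals along the following lines. This is a minimal sketch under stated assumptions: the function names, the superlative word list and the 1-to-5 rating scale are all hypothetical, and nothing here reflects how Glassdoor or Kununu actually store or screen reviews.

```python
from collections import Counter

# Illustrative word list only (assumption); extend per language and site.
SUPERLATIVES = {"super", "best", "greatest", "perfect", "amazing"}

def reviews_per_employee(n_reviews, n_employees):
    """Pattern 2: review volume relative to company size."""
    return n_reviews / n_employees if n_employees else 0.0

def all_aspects_top_rated(ratings, top=5):
    """Pattern 3: every rated aspect has the maximum score.

    ratings maps an aspect name to a score; a 1-5 scale is assumed.
    """
    return bool(ratings) and all(score == top for score in ratings.values())

def superlative_rate(text):
    """Pattern 5: fraction of words that are superlatives."""
    words = [w.strip(".,!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in SUPERLATIVES for w in words) / len(words)

def top_share(values):
    """Patterns 6 and 7: share of the most common value, for example
    the busiest month or the most represented department."""
    if not values:
        return 0.0
    return Counter(values).most_common(1)[0][1] / len(values)

# Hypothetical data: three of four reviews were written in June 2011
month_share = top_share(["2011-06", "2011-06", "2011-06", "2012-01"])
```

Each signal only becomes meaningful when compared against similar companies (pattern 1), so the cut-off values are best learned from data rather than fixed by hand.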

PS: Aristotle was not murdered, nor did he prove the irrationality of sqrt(2).

Comment by Anke Audenaert on October 5, 2014 at 7:40am

I agree with you, it is time to build a new Yelp. I am one of the founders of Favrit (www.favritapp.com), and our premise is that you shouldn't let people review, you should just let them bookmark the places they like. In the aggregate, the good ones will emerge, and the bad ones won't get 'liked', hence their score will go down. It is harder to game the system, as an individual account is required per 'like'. We are eager to analyze the data as they emerge...
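A like-based score of the kind described here can be sketched as a count of distinct accounts per place. This is only a guess at the general idea, not Favrit's actual implementation: the function name, the data layout and the sample places are all made up for the example.

```python
from collections import defaultdict

def like_counts(likes):
    """Count likes per place, with each account counted at most once.

    likes is an iterable of (account_id, place) pairs (assumed layout).
    """
    accounts_by_place = defaultdict(set)
    for account_id, place in likes:
        accounts_by_place[place].add(account_id)
    return {place: len(accounts) for place, accounts in accounts_by_place.items()}

# Hypothetical data: account a1 likes the same place twice
likes = [
    ("a1", "Cafe Uno"),
    ("a1", "Cafe Uno"),
    ("a2", "Cafe Uno"),
    ("a3", "Bar Due"),
]
ranking = sorted(like_counts(likes).items(), key=lambda kv: kv[1], reverse=True)
```

Because repeated likes from one account collapse into one, inflating a score requires creating many accounts rather than clicking many times.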

© 2017 Data Science Central