Seven tricky sentences for NLP and text mining algorithms

Actually, these are 7 types of language patterns that are difficult to analyse with automated algorithms:

  1. "A land of milk and honey" becomes "A land of Milken Honey" (the algorithm was trained on Wall Street Journal text from the 1980s, where Michael Milken was mentioned much more often than milk)
  2. "She threw up her dinner" vs. "She threw up her hands"
  3. "I ate a tomato with salt" vs. "I ate a tomato with my mother" or "I ate a tomato with a fork"
  4. Words ending with -ing, e.g. "They were entertaining people" ("entertaining" can be read as a verb or as an adjective)
  5. "He washed and dried the dishes" vs. "He drank and smoked cigars" (in the latter case he did not drink cigars)
  6. "The lamb was ready to eat": was the lamb ready to be eaten, or was it hungry and wanting some grass?
  7. Words with multiple meanings (e.g. a bay can be a color, a type of window, or a body of water)
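
To make item 7 concrete, here is a minimal sketch of one classic way to pick a word sense: choose the sense whose dictionary gloss shares the most words with the sentence (a simplified version of the Lesk approach). It is not from the original post. The sense inventory, the glosses, the stopword list and the function names are all made up for illustration; a real system would draw glosses from a lexical resource.

```python
import re

# Hypothetical toy sense inventory for "bay". The glosses are illustrative
# assumptions, not taken from a real dictionary or from the post.
SENSES = {
    "color": "reddish brown color of a horse coat",
    "window": "window that projects outward from the wall of a house or room",
    "water": "body of water partly enclosed by land along a coast or sea",
}

# Small hand-picked stopword list (also an assumption).
STOPWORDS = {"a", "an", "the", "of", "or", "in", "on", "by", "from",
             "that", "and", "to", "was", "is"}


def tokens(text):
    """Lowercase the text and return its non-stopword words as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose gloss shares the most words with the sentence,
    or None when no gloss overlaps at all (the context gives no evidence)."""
    context = tokens(sentence)
    scores = {name: len(context & tokens(gloss)) for name, gloss in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    for s in ("The ship anchored in the bay near the coast",
              "She rode a bay horse",
              "We sat by the bay window of the house",
              "I like the bay"):
        print(s, "->", disambiguate(s))
```

The last sentence returns None because nothing in the context overlaps any gloss, which is the underlying difficulty: when the surrounding words carry no signal, a word-overlap method has nothing to work with.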

© 2017   Data Science Central