This article was written by ML bot2 on Machine Learning in Action.
Text mining (deriving information from text) is a wide field that has gained popularity as the volume of generated text data has grown. Machine learning models have been used to automate a number of applications, such as sentiment analysis, document classification, topic classification, text summarization, and machine translation.
Spam filtering is a beginner’s example of a document classification task: classifying an email as spam or non-spam (a.k.a. ham). The spam folder in your Gmail account is a familiar example of this. So let’s get started building a spam filter on a publicly available mail corpus. I have extracted an equal number of spam and non-spam emails from the Ling-spam corpus.
We will walk through the following steps to build this application:
1. Preparing the text data.
2. Creating a word dictionary.
3. Feature extraction process.
4. Training the classifier
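
To make the four steps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the four toy messages stand in for the Ling-spam emails, the helper names (tokenize, build_dictionary, extract_features, train_nb, predict, classify) are hypothetical, the dictionary cap of 3000 words is arbitrary, and the classifier is a hand-written multinomial Naive Bayes with Laplace smoothing, which is one common choice for this task and not necessarily the model the full article uses.

```python
import math
import re
from collections import Counter

# Toy stand-in for the mail corpus (hypothetical data, not taken from Ling-spam).
# Label 1 = spam, label 0 = ham.
TRAIN = [
    ("win free money now claim your free prize", 1),
    ("free offer click now to win cash", 1),
    ("meeting agenda for the linguistics workshop", 0),
    ("please review the conference paper draft", 0),
]

def tokenize(text):
    # Step 1: prepare the text. Lowercase it and keep alphabetic
    # tokens longer than one character.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 1]

def build_dictionary(texts, size=3000):
    # Step 2: map the most frequent words to column indices.
    # The cap of 3000 is an arbitrary assumption.
    counts = Counter(w for t in texts for w in tokenize(t))
    return {w: i for i, (w, _) in enumerate(counts.most_common(size))}

def extract_features(text, dictionary):
    # Step 3: turn a message into a word-count vector over the dictionary.
    vec = [0] * len(dictionary)
    for w in tokenize(text):
        if w in dictionary:
            vec[dictionary[w]] += 1
    return vec

def train_nb(X, y, alpha=1.0):
    # Step 4: multinomial Naive Bayes with Laplace smoothing (alpha).
    # For each label, store the log prior and per-word log likelihoods.
    model = {}
    n_features = len(X[0])
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * n_features
        model[label] = (
            math.log(len(rows) / len(X)),
            [math.log((t + alpha) / denom) for t in totals],
        )
    return model

def predict(model, x):
    # Pick the label with the highest log posterior score.
    def score(label):
        prior, loglik = model[label]
        return prior + sum(c * l for c, l in zip(x, loglik))
    return max(model, key=score)

texts = [t for t, _ in TRAIN]
labels = [lab for _, lab in TRAIN]
dictionary = build_dictionary(texts)
X = [extract_features(t, dictionary) for t in texts]
model = train_nb(X, labels)

def classify(text):
    return predict(model, extract_features(text, dictionary))

print(classify("claim your free cash prize now"))
print(classify("draft agenda for the workshop"))
```

With a real corpus, the same shape applies: read each email file into a string, build the dictionary from the training emails only, and evaluate on held-out emails so the reported accuracy is not inflated.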
To check out all this information, click here.