This article, a complete tutorial on data exploration, was posted by Sunil Ray. Sunil is a Business Analytics and Intelligence professional with deep experience in the Indian insurance industry.
There are no shortcuts for data exploration. If you believe that machine learning can sail you away from every data storm, trust me, it won’t. At some point, you’ll find yourself struggling to improve your model’s accuracy. In such situations, data exploration techniques will come to your rescue.
I can say this confidently because I’ve been through such situations many times.
I have been a Business Analytics professional for close to three years now. In my early days, one of my mentors suggested that I spend significant time exploring and analyzing data. Following his advice has served me well.
I’ve created this tutorial to help you understand the underlying techniques of data exploration. As always, I’ve tried my best to explain these concepts in the simplest manner. For better understanding, I’ve included a few examples to demonstrate the more complicated concepts.
Table of Contents:
1 Steps of Data Exploration and Preparation
2 Missing Value Treatment
- Why is missing value treatment required?
- Why does data have missing values?
- What are the methods to treat missing values?
3 Techniques of Outlier Detection and Treatment
- What is an outlier?
- What are the types of outliers?
- What are the causes of outliers?
- What is the impact of outliers on a dataset?
- How to detect outliers?
- How to remove outliers?
4 The Art of Feature Engineering
- What is Feature Engineering?
- What is the process of Feature Engineering?
- What is Variable Transformation?
- When should we use variable transformation?
- What are the common methods of variable transformation?
- What is feature/variable creation and what are its benefits?