Big Data are like your children. You should always love them, whether they're naughty or nice. Both dirty data and clean data can convey important actionable information if properly handled and modeled. Consequently, always try to follow these five fundamental principles of Data Science:
1) Begin with the end in mind.
2) Know your data.
3) Remember that this *is* science.
4) Data are never perfect, but love your data anyway.
5) Overfitting is a sin against data science.
Read all about these five concepts, and more, at:
Also, remember this final data science commandment:
6) Honor thy data's first mile and last mile.
In other words, "the first mile is the hardest" because of the challenges of collecting, cleaning, and preparing distributed, heterogeneous, complex, dirty data. But "the last mile is the hardest" too, because of the challenges of deriving timely and actionable insights from massive streams of data. Therefore, do not underestimate the amount of time that you will spend in the first mile and the last mile of a big data analytics project. Devote quality time and attention to those phases of the project (giving them proper respect) in order to produce lasting value and tangible ROI.