Domino habits for data science

Why do Navy SEALs make their own beds?

In a recent address to students, Naval Adm. William H. McRaven, ninth commander of U.S. Special Operations Command, mentioned the importance of making your own bed.

Some of the reasons for this practice can be summarized as follows:

  • Completing the simple, the menial, and the repetitive – these tasks are important
  • Inculcating discipline – doing something you don’t really need to do.
  • Staying grounded – little things matter
  • Sense of achievement – completed the first task of the day
  • The domino effect – encourages you to do another task and then another

This got me thinking about similar activities in a data science project. What are the equivalent domino habits for a data science project? What are the repetitive, mundane, unglamorous parts of data science?

Surprisingly, the number of simple, menial, and repetitive tasks in the ‘sexiest job’ of the 21st century is high, very high. So this can be a really, really long list.

So, for the sake of brevity, let’s focus only on the activities after your data is pulled and basic cleaning is done. You know, that point where you are about to bring out the ‘big guns’ from the R and/or Python packages. Below are some of the data science tasks that, in my opinion, make the list:

  • The simple, menial, and repetitive [Data Cleaning and Exploration] – Explore your data. What do the distributions look like? What are the central tendencies? Are there outliers? Missing values?
  • Inculcating discipline [Understanding business justification] – Explore and document ‘why’ your data is there. What technical systems or business processes generated this data? Have you talked to the people who decided to log it? Who uses the data?
  • Staying grounded and staying updated – Have you revisited the concepts and read up on the best practices (again)? Have you checked the math? Read the book again? Checked the papers? Has something changed since you last applied the methods? Are you aware of how ‘your’ methods are changing, a little bit at a time?
  • Sense of achievement [Version Control] – Make sure you have a way to communicate with the rest of the team, and use it frequently. Check out, modify, and check in: simple tasks that in the long run can be the difference between a successful deployment and a scramble for code. At the same time, they give you the sense of having made some (usually simple) change that will move the needle. So make those one-line code changes as fast as possible and share them with the team.
  • The domino effect – Now that you have achieved your little victory, checked in your code, and understood the data and the process, it is time to get to your next set of (not so small) tasks. So go on, bring out the learning, machine learning, and deep learning packages, and enjoy.
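
To make the first habit in the list a little more concrete, here is a minimal sketch of the questions it asks (central tendencies, outliers, missing values) for a single numeric column. It uses only Python's standard library. The `summarize` helper, the `orders` data, and the 1.5 × IQR outlier rule are my own illustrative assumptions, not a prescribed method, and in a real project you would more likely reach for your usual data-frame tooling.

```python
import math
import statistics


def summarize(values):
    """Return a quick profile of one numeric column.

    `values` is a list in which None or NaN marks a missing entry.
    (Hypothetical helper, written only for this illustration.)
    """
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    missing = len(values) - len(present)

    # Quartiles from the standard library (default 'exclusive' method).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1

    # Assumed rule of thumb: flag points more than 1.5 * IQR outside the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "count": len(present),
        "missing": missing,
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "outliers": [v for v in present if v < low or v > high],
    }


# Made-up column: daily order counts with one gap and one suspicious spike.
orders = [12, 15, 11, None, 14, 13, 250, 12, 16]
profile = summarize(orders)
print(profile)
```

On this made-up column, the profile counts one missing value and flags the spike of 250 as an outlier. Notice how far the mean sits from the median: exactly the kind of small observation worth writing down before the big guns come out.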

Do comment and let me know which steps I should add to the routine.