At Machinalis we work daily on projects that fall within the area known today as Data Science. Here are 6 tips and lessons learned for people who want to provide a sustainable data science service and want to avoid the mistakes that we made.
- Provide a Service, Not Just Data Science: It might sound obvious, but if you love the technical aspects of your work, you risk falling in love with the challenge itself. The ultimate goal of a data science project is to build a tool that serves a purpose. Direct all your efforts towards that goal while keeping the client and the actual user of the product satisfied, and resist the temptation to explore the subject further in search of interesting research avenues.
- Use Adequate Evaluation Criteria: The evaluation criteria (from baseline to target performance) ought to be finite, completely understood, and agreed upon by both you and the client. From then on, all you do is work towards optimizing those criteria. Any ambiguity in the criteria leads to rework and loss of client confidence.
- Describe "The Thing": Find out if the object of analysis has certain desirable qualities (manageable size, ease of measurement, reasonable distribution, etc.) before starting any work. In this regard, spend some time trying to come up with a minimal descriptive statistic before tackling more complex approaches, always looking for unexpected results in the behavior you’re trying to predict. We all want to apply the fancy new machine learning methods, but we have to be patient and make sure that the universe we’re working on is even worth it.
- Get the Domain Specialist / Product Owner Involved: Validate your assumptions about the feasibility of the object of study with the product owner or domain specialist as soon as possible. Things that are strange to you might end up being normal. Did you find inconsistencies in the data? Share them with the team and the product owner. Need to reduce the size of the data? Validate the new data set with the domain specialist or product owner.
- Illustrate / Visualize: Visualization is essential. Minimalistic proposals that demonstrate the main idea are key. If you’re just starting in the area of visualization, rely upon existing products before investing effort in your own solution.
- Sometimes Data Science is just a Stage: In the context of a traditional software project, you might sometimes need to tackle a data science problem. Try to follow the tips mentioned above, at least until you solve that specific problem. For the other, non-data-science aspects of the project, you can use whatever process works for you.
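Two of the tips above, describing "The Thing" and agreeing on evaluation criteria, lend themselves to a small illustration. What follows is only a minimal sketch using Python's standard library, not a description of how we actually work: the function names `describe` and `judge`, the sample data, and the "higher is better" reading of the score are all assumptions made for the example.

```python
import math
import statistics


def describe(values):
    """Minimal descriptive statistics for one numeric column (hypothetical helper)."""
    # Treat None and NaN as missing values.
    present = [v for v in values if v is not None and not math.isnan(v)]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
    }


def judge(score, baseline, target):
    """Place a score against the agreed baseline and target.

    Assumes a metric where higher is better.
    """
    if score >= target:
        return "target met"
    if score > baseline:
        return "above baseline, below target"
    return "no better than baseline"


# Made-up data: order amounts with one gap and one suspicious value.
amounts = [12.0, 15.0, None, 14.0, 13.0, 980.0]
summary = describe(amounts)
print(summary)

# Made-up criteria: baseline 0.70 and target 0.85 agreed with the client.
print(judge(0.80, baseline=0.70, target=0.85))
```

In this made-up sample the mean (206.8) sits far above the median (14.0), which is exactly the kind of unexpected result worth raising with the domain specialist before any modeling starts.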
To summarize: there’s nothing wrong with loving Data Science, and being passionate about it can get you a long way, but keep the objectives in focus and direct most of your effort towards them.
Thanks to Agustin Barto for his help in writing this article.