22 common, easy-to-fix mistakes for data scientists

These apply to software engineers and data analysts as well. In no particular order:

  1. Not being able to work well in a team
  2. Being elitist
  3. Using jargon that stakeholders don't understand
  4. Being a perfectionist: in the business world, perfection is associated with negative ROI. Roughly 20% of your time on a project yields 80% of the value; the remaining 80% yields the last 20% (also known as the law of diminishing returns)
  5. Not spending enough time documenting your analyses, spreadsheets and code (documentation should take about 25% of your time, and be done on the fly, not after completing a project). Without proper documentation, nobody, not even you, will know how to replicate, extract value from, or understand what you've done six months later
  6. Not spending enough time prioritizing and updating to-do lists. To fix this, talk to your stakeholders (but don't overwhelm them with a bunch of tiny requests) and spend 30 minutes per day on prioritization, using calendars and project management tools
  7. Not respecting or knowing the lifecycle of data science projects
  8. Not creating reusable procedures or code, and instead spending too much time on one-off analyses
  9. Using old techniques on big data, for instance time-consuming clustering algorithms when an automated tagging or indexation algorithm would work millions of times faster to create taxonomies
  10. Too much or too little computer science: learn how to deploy distributed algorithms and how to optimize algorithms for speed and memory usage, but talk to your IT team before going too deep into this
  11. Creating local, internal data marts for your analyses without asking: your sysadmin will hate you. Discuss the topic with her first, before using these IT resources
  12. Behaving like a startup guy in a big company, or the other way around. Big companies like highly specialized experts (something that can be dangerous for your career); startups like jacks of all trades who have mastered some of those trades
  13. Producing poor charts
  14. Focusing on tools rather than business problems
  15. Planning communication last
  16. Analyzing data without a question or a plan
  17. Failing to simplify
  18. Not selling your work well
  19. Running projects with no measurable yield: talk to stakeholders to learn what the success metrics are, rather than guessing at them
  20. Not identifying the right people to talk to, inside and outside your organization
  21. Working in silos, including data silos. Be proactive about finding data sources, and participate in database design conversations
  22. Failing to automate tasks such as exploratory data analysis
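Points 8 and 22 pair naturally: the fix for one-off analyses is a reusable helper, and automated exploratory data analysis is a good first candidate. As a minimal sketch (plain Python, no external libraries; the function name and table format are illustrative assumptions, not from the original article), a single profiling function can replace many ad-hoc column summaries:

```python
from statistics import mean

def profile_columns(rows):
    """Automated-EDA sketch: summarize each column of a list-of-dicts table.

    Returns {column: {"count", "missing", "min", "max", "mean"}}, where the
    numeric statistics are None for non-numeric columns.
    """
    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v is not None]
        # Only int/float values feed the numeric statistics.
        numeric = [v for v in present if isinstance(v, (int, float))]
        summary[col] = {
            "count": len(present),
            "missing": len(values) - len(present),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
            "mean": mean(numeric) if numeric else None,
        }
    return summary

# Example: profile a tiny table before any modeling work.
data = [
    {"age": 34, "city": "Paris"},
    {"age": 28, "city": "Lyon"},
    {"age": None, "city": "Nice"},
]
report = profile_columns(data)
```

Written once and documented, a helper like this runs on any new dataset in seconds, which is exactly the reusable, automated habit the list recommends.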


Comment by Vivek Gupta on September 27, 2016 at 12:54am

I agree with most of these points.

Comment by Rana S Gautam on September 7, 2015 at 7:06pm

This is a good article. I agree with most of the points, but not all of them. The magic of information is that it brings out uncertainty and addresses it; it is dynamic and changing, so there is a constant shift happening. As long as the information need does not change, having reproducible code works well. But once the information need changes, that approach becomes inadequate. If we could add something to the list to address changing information needs, it would be a great list.
