
These tips come from Dr Granville, who has 20 years of varied, data-intensive experience working with successful start-ups, small companies in various industries, and eBay, Visa, Microsoft, GE, and Wells Fargo.

  1. Leverage external data sources: tweets about your company or your competitors, or data from your vendors (for instance, customizable newsletter eBlast statistics available via vendor dashboards or by submitting a ticket).
  2. Nuclear physicists, mechanical engineers, and bioinformatics experts can make great data scientists.
  3. State your problem correctly, and use sound metrics to measure the yield (improvement over baseline) delivered by data science initiatives.
  4. Use the right KPIs (key metrics) and the right data from the beginning, in any project. Fixing bad foundations later is very costly. This requires careful analysis of your data to create useful databases.
  5. Fast delivery is better than extreme accuracy. All data sets are dirty anyway. Find the right compromise between perfection and fast return.
  6. With big data, strong signals (extremes) will usually be noise. Here's a solution.
  7. Big data has less value than useful data.
  8. Use big data from third-party vendors for competitive intelligence.
  9. You can build cheap, great, scalable, robust tools pretty fast, without using old-fashioned statistical science. Think about model-free techniques.
  10. Big data is easier and less costly than you think. Get the right tools! Here's how to get started.
  11. Correlation is not causation. This article might help you with this issue. Read also this blog and this book.
  12. You don't have to store all your data permanently. Use smart compression techniques, and keep only statistical summaries for old data. Don't forget to adjust your metrics when your data changes, to keep them consistent for trending purposes.
  13. A lot can be done without databases, especially for big data.
  14. Always include EDA and DOE (exploratory data analysis / design of experiments) early on in any data science project. Always create a data dictionary. And follow the traditional life cycle of any data science project.
  15. Data can be used for many purposes:
    • quality assurance
    • finding actionable patterns (stock trading, fraud detection)
    • resale to your business clients
    • optimizing decisions and processes (operations research)
    • investigation and discovery (IRS, litigation, fraud detection, root cause analysis)
    • machine-to-machine communication (automated bidding systems, automated driving)
    • predictions (sales forecasts, growth and financial predictions, weather)
  16. Don't dump Excel. Embrace light analytics.
  17. Data + models + gut feelings + intuition is the perfect mix. Don't remove any of these ingredients in your decision process.
  18. Leverage the power of compound metrics: KPIs derived from database fields that have far better predictive power than the original database metrics. For instance, your database might include a single keyword field that does not distinguish between user queries and search categories (sometimes because data comes from various sources and is blended together). Detect the issue, and create a new metric called keyword type (or data source). Another example is IP address category, a fundamental metric that should be created and added to all digital analytics projects.
  19. When do you need true real-time processing? When fraud detection is critical, or when processing sensitive transactional data (credit card fraud detection, 911 calls). Otherwise, delayed analytics (with a latency of a few seconds to 24 hours) is good enough.
  20. Make sure your sensitive data is well protected. Make sure your algorithms cannot be tampered with by criminal hackers or business hackers (who spy on your business, steal everything they can, legally or illegally, and jeopardize your algorithms, which translates into severe revenue loss). An example of business hacking can be found in section 3 of this article.
  21. Blend multiple models together to detect many types of patterns. Average these models. Here's a simple example of model blending.
  22. Ask the right questions before purchasing software.
  23. Run Monte-Carlo simulations before choosing between two scenarios.
  24. Use multiple sources for the same data: your internal source, and data from one or two vendors. Understand the discrepancies between these sources to get a better idea of what the real numbers should be. Big discrepancies sometimes occur when a metric definition is changed by one of the vendors or internally, or when the data itself has changed (some fields are no longer tracked). A classic example is web traffic data: use internal log files, Google Analytics, and another vendor (say, Accenture) to track this data.
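Tip 3 asks for yield measured over a baseline. One simple way to express that is relative lift: the metric with the data science initiative compared with the same metric under a baseline such as a control group. The function name and the conversion rates below are hypothetical, a minimal sketch rather than a prescribed formula.

```python
def lift_over_baseline(treatment_value: float, baseline_value: float) -> float:
    """Relative improvement of a metric over its baseline (0.25 means +25%)."""
    if baseline_value == 0:
        raise ValueError("baseline must be non-zero to compute a relative lift")
    return (treatment_value - baseline_value) / baseline_value


# Hypothetical numbers: 2.5% conversion with the new model vs. 2.0% baseline.
lift = lift_over_baseline(0.025, 0.020)
print(f"Lift over baseline: {lift:.0%}")  # prints "Lift over baseline: 25%"
```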
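Tip 6 warns that, with big data, the strongest signals will usually be noise. The solution linked in the original is not reproduced here; this sketch only illustrates the phenomenon. It generates a purely random target and many purely random features, then reports the largest absolute correlation found. The sample size, feature counts, and seed are arbitrary assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n_obs, n_features, seed=0):
    """Largest |correlation| between a random target and n_features random features."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_obs)]  # pure noise, no real link
        best = max(best, abs(pearson(feature, target)))
    return best


# The more variables you scan, the stronger the best "signal" looks, even though all are noise.
for k in (10, 100, 1000):
    print(k, round(max_spurious_correlation(50, k), 2))
```

Because the same seed is reused, each larger run scans a superset of the features of the smaller run, so the maximum can only grow with the number of features scanned.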
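Tip 9 recommends model-free techniques. The bootstrap is one well-known example (it is not necessarily what the author has in mind): it estimates uncertainty by resampling the observed data, without assuming a statistical distribution. The function name, the 2,000 resamples, and the order values are assumptions for illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for `stat`, with no distributional model."""
    rng = random.Random(seed)
    # Resample the observed data with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    # Read the interval off the empirical percentiles of the resampled statistics.
    lower = estimates[int(n_resamples * alpha / 2)]
    upper = estimates[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical, skewed order values: one large order would distort a mean-based analysis.
orders = [12, 15, 9, 40, 11, 13, 250, 14, 10, 16]
print(bootstrap_ci(orders))
```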
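Tip 12 suggests keeping only statistical summaries for old data. A minimal sketch, assuming old records arrive as hypothetical `(day, value)` pairs, collapses them into one small summary per day:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Summary:
    """Compact statistical summary kept in place of raw records."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def summarize(records):
    """Collapse (day, value) records into one Summary per day."""
    summaries = defaultdict(Summary)
    for day, value in records:
        summaries[day].add(value)
    return dict(summaries)


# Hypothetical raw records that are old enough to be archived.
raw = [("2014-01-01", 12.0), ("2014-01-01", 30.0), ("2014-01-02", 7.5)]
daily = summarize(raw)
print(daily["2014-01-01"].count, daily["2014-01-01"].mean)  # prints "2 21.0"
```

Storing the count and total, rather than only the mean, keeps the daily summaries combinable later, for example into weekly or monthly figures for trending.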
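Tip 18 mentions IP address category as a compound metric. The original does not define the categories, so the buckets below are an assumption built on properties of Python's standard `ipaddress` module; a real project might use richer categories. The function name `ip_category` is hypothetical.

```python
import ipaddress


def ip_category(raw_ip: str) -> str:
    """Hypothetical coarse category derived from a raw IP address field."""
    try:
        ip = ipaddress.ip_address(raw_ip.strip())
    except ValueError:
        return "invalid"  # malformed or missing values get their own bucket
    # Loopback is checked first because loopback addresses are also flagged as private.
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private"
    if ip.is_multicast:
        return "multicast"
    if ip.is_global:
        return "public"
    return "other"


for value in ["8.8.8.8", "10.0.0.5", "127.0.0.1", "not-an-ip"]:
    print(value, ip_category(value))
```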
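Tip 21 suggests blending models by averaging them. The example linked in the original is not reproduced here; this is a minimal sketch of plain (optionally weighted) averaging, with two hypothetical toy models standing in for real ones.

```python
from typing import Callable, Optional, Sequence

Model = Callable[[float], float]


def blend(models: Sequence[Model], weights: Optional[Sequence[float]] = None) -> Model:
    """Return a model whose prediction is the weighted average of the input models."""
    if not models:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * len(models)  # equal weights: a simple average
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")
    total = sum(weights)

    def blended(x: float) -> float:
        return sum(w * m(x) for w, m in zip(weights, models)) / total

    return blended


# Two hypothetical models: a linear trend and a flat baseline.
def linear_trend(x: float) -> float:
    return 2.0 * x + 1.0


def flat_baseline(x: float) -> float:
    return 10.0


ensemble = blend([linear_trend, flat_baseline])
print(ensemble(3.0))  # prints 8.5, i.e. (7.0 + 10.0) / 2
```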
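Tip 23 recommends Monte Carlo simulations before choosing between two scenarios. In this sketch, both pricing scenarios, their costs, and their demand distributions are invented for illustration; demand is assumed normal and floored at zero. The simulation reports expected profit for each scenario and how often the first one comes out ahead.

```python
import random
import statistics


def simulate_profit(price, unit_cost, demand_mean, demand_sd, rng):
    """One random draw of profit: demand is uncertain (normal, floored at zero)."""
    demand = max(0.0, rng.gauss(demand_mean, demand_sd))
    return (price - unit_cost) * demand


def compare_scenarios(n_runs=10_000, seed=42):
    rng = random.Random(seed)
    profits_a, profits_b = [], []
    for _ in range(n_runs):
        # Scenario A (hypothetical): lower price, steadier demand.
        profits_a.append(simulate_profit(10.0, 6.0, 1000, 100, rng))
        # Scenario B (hypothetical): higher price, lower and more volatile demand.
        profits_b.append(simulate_profit(12.0, 6.0, 800, 300, rng))
    share_a_wins = sum(a > b for a, b in zip(profits_a, profits_b)) / n_runs
    return statistics.mean(profits_a), statistics.mean(profits_b), share_a_wins


mean_a, mean_b, a_wins = compare_scenarios()
print(f"mean profit A: {mean_a:.0f}, B: {mean_b:.0f}, A beats B in {a_wins:.0%} of runs")
```

In this made-up setup, scenario B has the higher expected profit, yet A still wins in a sizeable share of runs. Looking at the whole simulated distribution, not just the mean, is what makes the risk of each choice visible.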
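Tip 24 suggests comparing several sources for the same metric. A minimal sketch, assuming daily figures per source and using the cross-source median as the reference point (an assumption; with only two sources the median is simply their average), flags days where one source deviates by more than a chosen threshold. Source names and traffic numbers are hypothetical.

```python
import statistics


def discrepancies(sources, threshold=0.10):
    """sources: {source_name: {day: value}}. Return (day, source, value, median) tuples
    where a source deviates from the cross-source median by more than `threshold`."""
    flagged = []
    # Only compare days that every source reports.
    days = set.intersection(*(set(series) for series in sources.values()))
    for day in sorted(days):
        values = {name: series[day] for name, series in sources.items()}
        median = statistics.median(list(values.values()))
        for name, value in values.items():
            if median and abs(value - median) / median > threshold:
                flagged.append((day, name, value, median))
    return flagged


# Hypothetical daily visits from three sources.
traffic = {
    "internal_logs": {"2014-06-01": 10500, "2014-06-02": 11000},
    "google_analytics": {"2014-06-01": 9800, "2014-06-02": 10200},
    "third_party_vendor": {"2014-06-01": 10100, "2014-06-02": 6100},
}
print(discrepancies(traffic))  # prints [('2014-06-02', 'third_party_vendor', 6100, 10200)]
```

A flagged day is a prompt to investigate (a changed metric definition, a field no longer tracked), not proof that the flagged source is wrong.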

Comment by Aarón Castillo on September 13, 2016 at 9:50am

Hello everyone, I have a question: regarding tip number 2, what exactly are the desirable skills found in nuclear physicists, mechanical engineers, or bioinformatics experts?

As a computer scientist, I want to become a solid data scientist and data engineer, so I want to improve and polish my skills and learn from others as well.

If any expert could answer my question, I would be deeply grateful.

Comment by Ishaq Mohammed on December 9, 2014 at 8:10pm

This is a great list.

© 2016 Data Science Central