The discovery process used by data scientists commonly consists of four steps (see also Figure 1):

- **Data acquisition:** In this first step, data is collected from various data sources. Data scientists select the data sources that may be useful and relevant for their study.
- **Data preparation:** In this step, data is transformed, aggregated, integrated and cleansed until it has the form that data scientists need for their study. For example, for many data mining algorithms, it can be useful to transform real-life values to binary values.
- **Data analysis:** In this step, data is analyzed using various types of techniques, including simple reporting techniques; classic statistical techniques, such as forecasting, predictive modeling and clustering; advanced data mining techniques; data visualization techniques, such as affinity visualization, path visualization, scatter clouds and geo-visualization; and time-series analysis.
- **Data interpretation:** When the techniques and tools present results and insights, it's still the responsibility of the data scientist to determine whether the results make sense. This requires in-depth knowledge of the business and the data, and it demands common sense.
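As a rough illustration of the data preparation example above (transforming real-life values to binary values), here is a minimal Python sketch. The function name `binarize`, the temperature readings and the threshold of 20.0 are all made up for illustration; a real study would choose the cut-off based on the data and the algorithm being used.

```python
def binarize(values, threshold):
    """Map each numeric value to 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


# Hypothetical daily temperature readings (degrees Celsius).
temperatures = [12.5, 18.0, 25.3, 30.1]

# 1 marks a "warm day", 0 marks any other day (illustrative cut-off).
warm_day = binarize(temperatures, threshold=20.0)
print(warm_day)
```

The binary column can then be fed to algorithms that expect yes/no attributes rather than continuous measurements.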

The discovery process deployed by data scientists has the following characteristics:

- **The discovery result consists of rules.** The result of a discovery process is in most situations insights, and these insights are formulated as a set of *rules*. These rules can be simple if-then rules. For example, if two payments are made with the same credit card within 10 seconds, they are probably fraudulent. Rules can also be advanced statistical formulas indicating the relationship between specific variables. For example, a 10-degree rise in temperature increases sales of barbecue meat by 300%. Sometimes rules are sophisticated, self-learning data mining models that can predict customer behavior by combining historical and new incoming data.
- **The discovery process is an iterative process.** Figure 1 suggests that the discovery process is a serial process: when one step is finished, the next one starts, and we never return to a previous step. However, that is far from the truth. The discovery process is highly iterative. For example, when a data analysis step has been finished, the conclusion may be to collect more data and start all over again. Even a data preparation step may lead to a return to the data acquisition step. In fact, this entire four-step process may have to be repeated several times before the right insights rise to the surface.
- **Discovery results should be actionable.** When a discovery process is finished, the organization has experienced no advantages yet: no money has been made, and there is no ROI. The discovery process has to be followed up by a step called *Act*. In this step, the gained insights have to be used or implemented. Examples of implementing insights are: organization policies are changed, decision rules are embedded in operational applications, business processes are optimized, customers are offered special discounts and so on. Without the Act step, the entire discovery exercise has been for nothing. In other words, it's important that discovery results are *actionable*. Note that the data scientist is not always involved in the Act step.
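To make the idea of a simple if-then rule concrete, the credit card example above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `flag_rapid_repeats`, the `(card_id, timestamp_in_seconds)` tuple layout and the sample payments are hypothetical, and "within 10 seconds" is interpreted here as a gap of at most 10 seconds between consecutive payments on the same card.

```python
def flag_rapid_repeats(payments, window_seconds=10):
    """Return the indices of payments that fall within window_seconds
    of the previous payment made with the same card.

    payments: list of (card_id, timestamp_in_seconds) tuples (assumed layout).
    """
    last_by_card = {}  # card_id -> (timestamp, index) of the latest payment
    flagged = set()

    # Process payments in chronological order, remembering original indices.
    order = sorted(range(len(payments)), key=lambda i: payments[i][1])
    for i in order:
        card_id, ts = payments[i]
        if card_id in last_by_card:
            prev_ts, prev_i = last_by_card[card_id]
            if ts - prev_ts <= window_seconds:
                # The rule fires: both payments are marked as suspicious.
                flagged.update((prev_i, i))
        last_by_card[card_id] = (ts, i)

    return sorted(flagged)


# Hypothetical sample data.
payments = [
    ("card-A", 100),
    ("card-B", 103),
    ("card-A", 106),  # 6 seconds after the first card-A payment
    ("card-B", 200),
]
suspicious = flag_rapid_repeats(payments)
print(suspicious)
```

A rule in this form is also easy to embed in an operational application, which is one of the ways an insight can be put to use in the Act step.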


© 2020 Data Science Central ®
