What are we trying to predict? Where is the data? How will we measure whether we are getting it right? Which departments are part of this Data Science project? Who is my internal customer? When does this have to be completed? How clean is the data?
Data Science projects seem to go through a natural set of phases which model the inquisitive spirit of Data Science. The REASON method is a framework that lays out the ideal phases a business can expect in a Data Science project. Using a framework like the REASON method benefits not just the Data Scientist but the entire organization. It brings more transparency to an already opaque discipline riddled with fancy terms like cross-validation, support vector machines, and area-under-the-curve.
Table 1.1 The REASON Method – Data Science phases and their meaning
Every Data Science project should start here. It is vitally important to get the business objectives correct. Take the time to understand what the business environment is like so as to understand what you will be working with. Translate the business needs into the Data Science objectives, and from that produce a Data Science project plan.
Here we start to go down the path of describing, exploring, and verifying the data.
At this juncture, Data Scientists should be selecting, cleaning, constructing, and integrating data. In some cases, the data also needs to be formatted so that the model can make use of it.
Many Data Scientists start prematurely at this phase and, by skipping over the earlier phases, miss the richness and clarity of the problem. Sometimes this is done for speed or for a quick model test. This phase covers selecting appropriate modeling techniques, generating a test design, building the models, and assessing them.
This phase may be the most important of all the phases. Here we evaluate the other phases as deliverables and determine if we are meeting the goals of the project. It is important to be aware that not continuing the Data Science project past this point is acceptable and just as valid as deciding to continue.
This represents the final phase of the Data Science life-cycle. Here we discuss techniques to plan for the deployment of the Data Science model. In addition, a plan for monitoring and maintenance will help Data Science professionals be successful long into the future. We also cover aspects of archiving your learnings for future projects, allowing for shared learning, and speedups for future Data Science projects.
Each of these phases includes its own generic tasks. The business should expect that a Data Scientist will be able to translate specific tasks into more generic tasks to deliver on the expectations of the organization. For those who are part of a Data Science project, it is helpful to understand the interrelationship between phases in a typical Data Science project, as seen in figure 1.1.
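To make a few of the generic tasks in Table 1.1 more concrete, here is a minimal Python sketch of verifying and cleaning data, generating a test design, building and assessing a model, and then making the continue-or-stop decision. It is an illustration under stated assumptions, not the REASON method's prescribed implementation. The toy records, the column names `visits` and `churned`, every function name, and the 0.8 target are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: None marks a missing value.
ROWS = [
    {"visits": 1, "churned": 1},
    {"visits": 2, "churned": 1},
    {"visits": 9, "churned": 0},
    {"visits": 8, "churned": 0},
    {"visits": 3, "churned": 1},
    {"visits": 7, "churned": 0},
    {"visits": 2, "churned": 0},
    {"visits": 4, "churned": 1},
    {"visits": None, "churned": 0},
    {"visits": 5, "churned": None},
]


def count_missing(rows):
    """Describe and verify: count missing (None) values per column."""
    return {col: sum(1 for r in rows if r[col] is None) for col in rows[0]}


def drop_incomplete(rows):
    """Clean: keep only rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def k_fold_indices(n, k):
    """Test design: deal row indices into k interleaved folds (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_direction(train):
    """Build: learn whether higher `visits` goes with label 1 (+1) or label 0 (-1)."""
    pos = mean(r["visits"] for r in train if r["churned"] == 1)
    neg = mean(r["visits"] for r in train if r["churned"] == 0)
    return 1 if pos >= neg else -1


def auc(scores, labels):
    """Assess: share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(rows, k=2):
    """Train on k-1 folds, score the held-out fold, and average the AUC."""
    results = []
    for held_out in k_fold_indices(len(rows), k):
        train = [r for i, r in enumerate(rows) if i not in held_out]
        test = [rows[i] for i in held_out]
        direction = fit_direction(train)
        scores = [direction * r["visits"] for r in test]
        results.append(auc(scores, [r["churned"] for r in test]))
    return mean(results)


def go_no_go(measured_auc, target_auc):
    """Evaluate: compare the result with a target agreed with the business."""
    return "continue" if measured_auc >= target_auc else "stop"


if __name__ == "__main__":
    print("missing values:", count_missing(ROWS))
    clean_rows = drop_incomplete(ROWS)
    score = cross_validated_auc(clean_rows, k=2)
    print("cross-validated AUC:", score)
    print("decision:", go_no_go(score, target_auc=0.8))
```

The one-feature "model" is deliberately trivial so that the shape of each task stays visible. In a real project each function would be replaced by whatever tooling the team already uses, and the target would come from the business objectives set in the first phase.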
The REASON Method gives a bird’s eye view of how things should get done. Mapping the Data Science workflow can improve efficiency, particularly in Data Science teams where establishing a core process is important.
Remember the phrase “it seemed otherwise to the gods” (dis aliter visum). In other words, the gods have different plans than mortals, and so events do not always play out as Data Scientists wish them to. However, a field-tested framework should serve any professional well and prove far less frustrating than the alternative.