Focus on problems, not tools: a project cycle in data science

By MabAlam

As a data scientist in an organization, you frequently find yourself in a couple of situations:

  • you have a business problem and want to find a data-driven solution

Figure: Data science project cycle (@DataEnthus)

The problem

In step 1, you have a question or problem. If it’s a big one, you can break it down into smaller pieces. For example, if the question is about forecasting sales growth over the next 10 years, you could break it down into sub-questions such as: What have historical sales been? How are sales currently trending? How is demand trending in the market? How are competitors doing? And so on.
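Once a big question is broken down, each sub-question can often be answered by a small, checkable computation. The sketch below covers only the first two sub-questions of the sales example, using made-up annual sales figures; the data, the function names, and the choice of "average growth over the last two years" as the current trend are illustrative assumptions, not part of the original method. The market-demand and competitor questions would need external data and are not sketched here.

```python
# Hypothetical annual sales figures (illustrative data, not from the article).
SALES = {2018: 120.0, 2019: 132.0, 2020: 125.0, 2021: 140.0, 2022: 154.0}


def yoy_growth(sales: dict[int, float]) -> dict[int, float]:
    """Year-over-year growth rate for every year that has a predecessor."""
    years = sorted(sales)
    return {
        year: (sales[year] - sales[prev]) / sales[prev]
        for prev, year in zip(years, years[1:])
    }


def recent_trend(sales: dict[int, float], window: int = 2) -> float:
    """Average growth over the last `window` years, a rough 'current trend'."""
    growth = yoy_growth(sales)
    recent = [growth[year] for year in sorted(growth)[-window:]]
    return sum(recent) / len(recent)


# Each sub-question maps to one small computation (hypothetical mapping).
sub_answers = {
    "What have historical sales been?": SALES,
    "How are sales currently trending?": recent_trend(SALES),
}

print(f"Recent trend: {recent_trend(SALES):.1%}")  # → Recent trend: 11.0%
```

Keeping each sub-answer separate makes it easy to see which pieces are solid and which still need data before they are combined into an answer to the big question.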

The process

In step 2, no, you are not thinking about which models, tools, or visualization techniques to use; not yet. You are thinking about a methodological process that will guide you toward an answer. You list the datasets you need, identify where to find them, and perhaps note some tools that might be useful. Even if you haven’t decided on the specific data or tools, having an overall process in your head or on paper helps a lot, even if it changes later as more information comes in. This is similar to academic research, where you write a research proposal before actually carrying out the research; plans often change along the way as you study the problem more deeply.

The tools

In step 3, you now think about which tools can help answer the question. If it’s a forecasting problem, would a time-series model be useful, or is it better framed as a linear regression problem? Do you need GIS technology? Is there a good package in R or Python?
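Whether a time-series or a regression approach fits better can often be settled empirically rather than by habit. The following is a minimal sketch, assuming hypothetical yearly sales and a one-year holdout: it compares an ordinary least-squares linear trend against a naive "repeat last year's value" time-series baseline by mean absolute error. Both models and all names here are illustrative stand-ins; real candidates (for example ARIMA or exponential smoothing) would normally come from an R or Python package.

```python
# Hypothetical annual sales (illustrative data, not from the article).
YEARS = [2018, 2019, 2020, 2021, 2022]
SALES = [120.0, 132.0, 125.0, 140.0, 154.0]


def fit_linear_trend(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def backtest(years, sales, holdout=1):
    """Score two candidate models on the last `holdout` observations."""
    train_x, test_x = years[:-holdout], years[-holdout:]
    train_y, test_y = sales[:-holdout], sales[-holdout:]

    # Candidate 1: linear regression on time.
    intercept, slope = fit_linear_trend(train_x, train_y)
    linear_preds = [intercept + slope * x for x in test_x]

    # Candidate 2: naive time-series baseline, repeat the last observed value.
    naive_preds = [train_y[-1]] * len(test_x)

    def mae(preds):
        """Mean absolute error against the held-out values."""
        return sum(abs(p - y) for p, y in zip(preds, test_y)) / len(test_y)

    return {"linear_trend": mae(linear_preds), "naive_last_value": mae(naive_preds)}


print(backtest(YEARS, SALES))
```

On these made-up numbers the linear trend has the lower error, but a single held-out year proves little. The point of the sketch is that the comparison, not a favorite tool, drives the choice.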

The answer

You have broken your big problem into smaller pieces and answered each one. Taken together, do those answers solve the problem you started with? If yes, kudos. If not, go back to step 2.

The bottom line

In summary, being a scientist means going through a process of exploration and discovery. We often get hung up on the tools and models we already know and on how to fit them to the data. As we have seen, selecting the right tool is only a small part of the problem-solving process. It’s always problems first, tools later.

Originally posted here.