Most people think data science is smart people doing very smart stuff. Well, that’s not it. Data science is just another subject with its own set of subtle complexities, which have to be handled with knowledge and an innovative approach. JUST LIKE COOKING.
Getting the data
Cooking is both art and science. So is analytics. Both start with getting the right ingredients. No matter how many spices and cooking techniques you apply, the dish won’t taste right if the correct ingredients aren’t present. You have to know the right places, where they keep the good stuff, and tap into them. Getting hold of quality ingredients is the first step in cooking. Similarly, getting your hands on the right kind of data is the most important prerequisite for analytics.
Settling for the wrong ingredients ultimately leaves you a victim of unhealthy food, pesticide-laden vegetables and a whole lot of diseases. Likewise, in a data science project you cannot blindly trust the data to be authentic. It has to be checked first; otherwise, all the effort spent downstream will go to waste.
Cleansing the data
This part of the workflow is about getting all the ingredients ready. They have to be washed, peeled, cut to the right shape and size, and then maybe washed again. Bringing the data into the right shape is just as important if the analytics is to work on it properly. So you need to know the techniques for slicing and dicing the data, removing the unwanted weeds in it, and bringing it into a sleek shape by studying and processing it thoroughly.
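To make the washing and peeling concrete, here is a minimal sketch in plain Python. The record layout (a "name" and a "price" field) and the function name clean_records are made up for illustration, and real projects usually lean on a dedicated library for this. The idea is simply to trim the text, drop rows that are missing something, and throw out duplicates.

```python
def clean_records(records):
    """Wash, peel and trim raw records into one consistent shape."""
    cleaned = []
    seen = set()
    for rec in records:
        # Trim stray whitespace and normalise the capitalisation.
        name = (rec.get("name") or "").strip().title()
        raw_price = rec.get("price")
        # Drop rows with no name or no price (the "weeds").
        if not name or raw_price in (None, ""):
            continue
        try:
            price = float(str(raw_price).replace("$", "").strip())
        except ValueError:
            continue  # the price could not be read as a number
        # Skip rows that repeat one we have already kept.
        key = (name, price)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "price": price})
    return cleaned


# Hypothetical raw data, messy on purpose.
raw = [
    {"name": "  tomato ", "price": "$1.50"},
    {"name": "Tomato", "price": "1.5"},
    {"name": "", "price": "2.00"},
    {"name": "onion", "price": "n/a"},
    {"name": "Garlic", "price": None},
    {"name": "basil", "price": 3},
]
print(clean_records(raw))
```

Out of six messy rows only two survive, which is often how it goes with real data too.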
The cooking part
Here’s where the magic happens in both cases. In the data science world, this step is equivalent to applying the right algorithm so that the right results are achieved. In the kitchen, the ingredients have to be processed sequentially or in parallel, the right spices need to be added at the right time, and the cooking needs to be given sufficient time. You need to keep tasting the dish now and then to check that the ingredients are balanced, make minor corrections and make sure it’s all going in the right direction.
It’s the same with data science. Once the data has been cleansed, the right algorithm needs to be applied with the right configuration. The models need to be trained thoroughly and, to be sure, validated at intervals along the way. Once this is done, the dish is ready to be served, i.e. the system is ready to be deployed!
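Tasting the dish while it cooks looks roughly like this in code. The sketch below assumes the simplest possible "algorithm", a straight line fitted by least squares, and uses made-up numbers. The names fit_line and mean_abs_error are illustrative, not from any particular library. Part of the data is used for training, and the rest is held back to check that the model also works on data it has not seen.

```python
def fit_line(points):
    """Least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(model, points):
    """Average distance between the predictions and the true values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


# Made-up measurements that roughly follow y = 2x + 1.
data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8),
        (4, 9.1), (5, 11.0), (6, 12.9), (7, 15.2)]

# Simple hold-out: train on the first six points, taste-test on the last two.
# Real projects usually shuffle the data before splitting it.
train, held_out = data[:6], data[6:]

model = fit_line(train)
print("training error:  ", mean_abs_error(model, train))
print("validation error:", mean_abs_error(model, held_out))
```

If the validation error comes out much worse than the training error, the recipe needs adjusting before anything is served.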
In the kitchen we often use multiple sources of heat: the microwave oven, the conventional oven, the burners on the stove and so on. These sources are used independently and in parallel to get the job done quickly. In data science that’s exactly how things are done. This part of the process is called parallel or distributed computing, and it is the underlying concept behind “map reduce”, a term you might have heard of and assumed was difficult to understand. It is nothing but a lot of calculations happening simultaneously so that a lot of data is processed in a short span of time.
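Here is a toy version of the map reduce idea, counting words across a few chunks of text. It is a sketch that runs on one machine using Python's standard thread pool, so it only illustrates the shape of the idea. Real systems spread the chunks across many processes or machines. The function names and the sample text are made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(chunk):
    """Map step: count the words in one chunk, independently of the others."""
    return Counter(chunk.lower().split())


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(chunks, workers=4):
    # Each chunk goes to its own "burner"; the partial results come back
    # in the same order as the chunks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))
    # Combine all the partial counts into the final dish.
    return reduce(reduce_counts, partials, Counter())


chunks = ["salt pepper salt", "pepper garlic", "salt garlic garlic"]
totals = word_count(chunks)
print(totals.most_common())
```

The map step never needs to know about the other chunks, which is exactly why the work can be split up and handed out.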
Data science is not rocket science. Anyone with the right amount of talent and the hunger to learn and explore can solve some really great problems. The right data science feature in your product will leave the user savoring the taste of machine intelligence. So strap on your aprons and start having fun with data!