The way we think about DATA has changed drastically over the decades, and more so in recent years. Data that was once collected for mere bookkeeping is now treated as a wise investment, a gold mine for the future. Hence sectors such as government, corporations and academia are investing heavily in getting their Enterprise Information Management architectures in place.

The prospect of tapping into the DATA generated in an enterprise's backyard or on the World Wide Web has suddenly spurred demand for a unique kind of talent. The names given to that talent range from fancy ones like Data Scientist/Big Data Engineer, to humbler ones like Data Engineer/BI Engineer/Data Architect, to old-school ones like Data/Business Analyst. Currently these designations are used more or less interchangeably and ambiguously in the industry for work that fundamentally involves storing data from varied sources, summarizing it, reporting on it over varied time windows, and building insights from it to evaluate a problem or an opportunity and build a use case around it.

These designations are sometimes fuzzy relative to the actual work because the gap between the Information Management and Decision Science teams is narrowing. As data-driven business strategies gradually evolve, these roles are carving out distinct responsibilities and eligibility criteria, so that their strengths and expectations are not confused or overlapped. But in that list the Data Scientist role stands out, as it’s a role where the overlap of those strengths is a mandate rather than a consequence. And because of the rarity of, and rising demand for, this interdisciplinary talent, Harvard Business Review labeled it the sexiest job of the 21st century. Today, when everybody is waking up to the call of DATA-driven fortune making, corporate houses are desperately seeking this One-Man-Army called the Data Scientist, who can visualize and execute an end-to-end data-driven strategy to solve complex business problems and tap into growth opportunities.

Now, after an almost decade-long tryst with “data”-driven development, I am adding my own thoughts to give a broader view of what Data Science is and what it takes to get started on the path to becoming a Data Scientist. A Data Scientist is someone with a deliberate dual personality, who can first build a curious business case defined with a telescopic vision and can then dive deep with a microscopic lens to sift through DATA to reach the goal, while defining and executing all the intermediate tasks. Each of those intermediate stages requires knowledge of tools, techniques and domains, which can sometimes be very diverse. Some of the broader topics that are used in those stages, and that are a must for an aspiring Data Scientist, are listed below.

  • Technical Geek – To handle the fundamental needs of digital data (to be stored, retrieved, moved around and transformed for consumption), one cannot be “Technically Challenged”. Knowledge of Algorithm Design, Database Systems, Distributed Systems, Cloud Technologies and Information Retrieval, along with a strong command of programming languages suited to every stage of data flow, is a must. So apart from English, you should be able to speak at least Python, Java, SQL and R. :)
  • Mathematics – When the data is ready for consumption, the ability to sniff out obvious summaries and trends goes a long way in choosing the right data sets for the right purpose. Knowledge of Descriptive Statistics, Probability Theory, Algebra and Calculus helps you get through basic data analysis quickly. The challenge sometimes is not in knowing the math behind the analysis but in interpreting the results, which drive the further course of action. Usually the data transformation results in a descriptive summary that captures the essence of the data.
  • Artificial Intelligence – The summaries and patterns identified from raw data have to be correlated with historical data to predict the future with some level of confidence. This is where the complex math of Data Mining and Machine Learning techniques comes into the picture. One needs to be strong in Matrix Decomposition, Optimization and Machine Learning algorithms. This is where the output of descriptive analysis is used to build and test Predictive as well as Prescriptive analytical models.
  • Artist – Having dived deep into the ocean of data to bring out meaningful and actionable insights, one needs to convey those insights using Data Visualization techniques so that they become absolutely intuitive to the layman. Even without looking at the data, the visualization should convey the gist of What-Was:What-Is:What-Can-Be of the data. Sometimes even the raw data, when visualized, tells many stories that help solve business cases. This is where the artistic flair of an individual comes into play. One needs to be familiar with libraries like D3 to represent data visually.
  • 3Cs – Curiosity, Common Sense and Communication – The role of Data Science is strongly driven by inquisitiveness. Many times there is no “PROBLEM” definition; it’s only curiosity that pulls out insights into opportunities. At the same time, the knack of being comfortable with ambiguity, and the ability to mitigate it with curiosity and common sense, goes a long way. Common sense here can be driven by domain knowledge and business acumen, and it plays a vital role as the starting point for formulating any hypothesis or problem definition. Also, as the saying goes, “Correlation does not imply causation”, so one needs to first understand what is analyzed, why it is analyzed and how it is analyzed. While analyzing the data at hand, communication with stakeholders is an absolute must, as it keeps the course of action on the right path; it’s very easy to get distracted while sifting through data.
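
As a rough illustration of how the Mathematics and Artificial Intelligence stages above connect, the sketch below first summarizes a small series with descriptive statistics and then fits a least-squares line to predict the next value. The data is made up, the names `describe` and `fit_line` are hypothetical, and simple linear regression is just one convenient stand-in for a predictive model; none of this is prescribed by the article.

```python
import statistics

# Hypothetical monthly sales figures (made-up data for illustration only).
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]


def describe(values):
    """Descriptive stage: summarize the data with a few basic statistics."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def fit_line(xs, ys):
    """Predictive stage: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


summary = describe(sales)
slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept

print(summary)
print(f"forecast for month 7: {forecast_month_7:.1f}")
```

The made-up series is perfectly linear, so the fit is exact here; with real data one would also check how well the line explains the observations before trusting any forecast.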

The above list is just for benchmarking one's mindset and gives a broad overview of the minimum one needs to get started on the core Data Science road map. The list can get quite exhaustive with specifics, but that is not the intent of this article. Many Data Scientists and evangelists have taken a stab at this, and two attempts that I personally love are Drew Conway’s Venn Diagram and Swami Chandrasekaran’s Metro Map.

Hope this adds some value to your own thoughts. Feel free to comment, as I am a learner too and would love to know your views.

Originally posted on my website Sapanpatel.in

4-September-2014

Author: Sapan Patel, Data Engineer @ Amazon || Sapanpatel.in - © 2014, all rights reserved

Comment by Sapan Patel on September 27, 2014 at 8:57pm

Hi Sayed,

There are many schools that have put together comprehensive Data Science programs, and these programs are now maturing. A few of them are listed below.

http://datascience.berkeley.edu/
http://www.cmu.edu/graduate/data-science/index.html
http://idse.columbia.edu/masters
http://datascience.nyu.edu/academics/programs/

Apart from these, there are many more schools across the globe that offer such programs. I think that at the bachelor's level you should focus on strengthening your fundamentals in Computer Science, Maths (Stats, Calculus etc.) and Information Theory.

Hope this helps.

Thank you,
Warm Regards

Sapan

Comment by Sayed Hamdani on September 26, 2014 at 8:44am

Hello Sapan,

Are there any schools where you can graduate with a Data Science major?

I want to get my bachelor's in Data Science, but I don't know where to start.

Thanks in advance

Sayed.
