"Data Science" (DS) is nothing new except for the term itself and the recent level of interest in it. As a commercial (not academic) practice it has existed for more than 25 years, since the early 1990s, mainly under the names "Data Mining" (DM) and "predictive analytics" (PA). DM and PA originally gained a lot of traction in the financial, telecom, and retail industries, which had a lot of granular historical data. Like anything that gets sudden attention and interest, DS has been misused and abused in a variety of ways. Given the rapid surge in market demand over the last several years, many people claim to be, or want to be, data scientists. True data scientists and DS managers who have had to screen DS resumes can testify to the level of noise (false positives) in that application process.
"Data Science" tries to be an umbrella field that covers more than what data mining and predictive analytics practices have covered. That is justified: with the growth of data of all kinds in the recent decade and what is expected in the coming years, we need many more people with relevant DS skill sets. The challenge, however, has been the definition of that "skill set." What makes a good data scientist?
In my previous post, "What is BuDAI?", I explained that a successful DS project requires the involvement of the data science team throughout the whole cycle. The core deliverable of a data science project is the insight and decisions coming out of the analytics. The analytics could be trivial (generally an aggregated view of the data, looking at only a handful of variables together), in which case there would be no need for DS; that would be in the realm of a data or business analyst. DS usually comes into the picture where:
- More sophisticated analytics approaches are required
- More complex transformations are required to prepare the data
- Granular or atomic analysis of entities of interest is required
- The analytics could be straightforward, but big data is involved, requiring attention to optimization of the analytics
Within the BuDAI process, the DS team has to interact with business stakeholders, data engineers, data architects, project managers, and product managers, to name a few. Aside from relevant technical skills and knowledge in math, statistics, machine learning, programming, databases, and systems (the breadth and depth depend on the seniority of the data scientist), through the years I have found the following ten traits to be as important as technical skills for junior hires and absolutely essential for senior data scientists.
- Problem-solving ability
- Business acumen
- Ability to question one's own work and the work of others
- Passion for data (the more data, the better)
- Attention to detail and ability to validate one's own work in multiple ways
- Statistical thinking (a thinker who knows when to reason deterministically and when not to)
- Passion for exploration and discovery (learns quickly from failures)
- Ability to devise optimal ways to experiment with new ideas (finding novel, useful insight is laborious, and there is never a sure way to find it)
- Presentation ability (written and oral)
- Ability to simplify complex concepts when explaining them to others
The required technical abilities are the subject of another blog post; given today's broad coverage of data science, they vary greatly.
I discuss these topics in detail in my book. Visit the book site for "High-Performance Data Mining and Big Data Analytics: The Story of I..." (http://bigdataminingbook.info).