Data Science - The Process of Capturing, Analyzing and Presenting Business Intelligence with Skill DataReality


Programming and Database skills 

Familiarity with, and the ability to use, Hadoop, Java, Python, SQL, Hive, and Pig are core essentials. Programming, and computer science in general, is the very starting point of gathering data and understanding how to "get" data and piece it together. Just moving data is its own specialty, reserved for ETL (extract, transform, load) specialists. ETL tools include Informatica, MS SSIS, and Teradata bulk loading tools, among others. If you can't GET data, you sure can't analyze it. And you sure can't expect somebody else to capture it for you.
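
To make the extract, transform, load idea concrete, here is a minimal sketch in Python using only the standard library. The CSV text, the `sales` table and its column names are made up for illustration, and dedicated ETL tools do far more than this.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """region,amount
 East ,100.50
west,20
EAST,4.5
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and lowercase region names, cast amounts to float."""
    return [(r["region"].strip().lower(), float(r["amount"])) for r in rows]

def load(conn, records):
    """Load: write the cleaned records into a (hypothetical) sales table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# Once the data is loaded, a plain SQL query can summarize it.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall())
print(totals)
```

Keeping the three steps as separate functions is only one possible design; it makes each stage easy to test on its own.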

Business Domain Expertise and Knowledge

Understanding the business data itself is its own special domain expertise that only comes with working in that data domain. Medical data is different from ecological data which is different from all the varieties of business data. This only comes from studying and asking lots of questions while working in that particular field.

Data Modeling, Warehouse, and Unstructured Data Skills

Knowing the difference between a fact table that is put together well and one that is faulty, with semi-structured, unconstrained keys, makes all the difference in how easily you can trust and massage the data you're trying to capture. Knowing the validity and proper use of each of the dimensions is also key to leveraging any star-schema data structure. Unstructured data is another story: you may have to design and build a staging layer yourself before the data is even useful. If you can't get through these things, you can't begin making propositional sense of the data you want to analyze.
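
One quick way to judge whether a fact table can be trusted is to look for fact rows whose keys match nothing in the dimension. The sketch below assumes a hypothetical `fact_sales` table and `dim_product` dimension in an in-memory SQLite database; the names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
-- Deliberately no FOREIGN KEY constraint, to mimic a loosely built fact table.
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO fact_sales VALUES (10, 1, 9.99), (11, 2, 5.00), (12, 99, 1.25);
""")

# Fact rows whose product_id matches no dimension row ("orphans").
orphans = conn.execute("""
    SELECT f.sale_id, f.product_id
    FROM fact_sales AS f
    LEFT JOIN dim_product AS d ON d.product_id = f.product_id
    WHERE d.product_id IS NULL
""").fetchall()
print(orphans)
```

An empty result suggests the keys are consistent; any rows returned are facts that will silently drop out of an inner join against the dimension.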


Statistical Tool Skills 

Using R, Excel, SAS, or other tools to piece together your propositions and discover potential patterns and correlations through statistics is the heart of working data, and it is where you apply your creativity. This is where true genius can shine, but learning the tools is the first essential grind. If you can't use the tools, you can't analyze the data. You could use paper and pencil or even a fancy calculator if you've got the math skills down cold.

Math skills

Understanding correlation, multivariate regression and all aspects of massaging data together to look at it from different angles for use in predictive and prescriptive modeling is the backbone knowledge that's really step one of revealing intelligence. Nothing more to say. If you don't have this, all the data collection and presentation polishing in the world is meaningless.
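
For readers who want to see what sits underneath the tools, here is a small sketch of Pearson correlation and a multiple regression fitted by ordinary least squares, written in plain Python. The data is made up (generated from y = 1 + 2*x1 + 3*x2 with no noise), and solving the normal equations by Gaussian elimination is just one simple choice; statistical packages use more robust numerical methods.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)

# Made-up data generated from y = 1 + 2*x1 + 3*x2 (no noise).
features = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
target = [1 + 2 * x1 + 3 * x2 for x1, x2 in features]

coeffs = ols(features, target)
r = pearson([x1 for x1, _ in features], target)
print([round(c, 6) for c in coeffs], round(r, 3))
```

Because the target was built without noise, the fitted coefficients should come back very close to the generating values; with real data they would only be estimates.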

Visualization Tool Skills

Potential list includes Flare, HighCharts, AmCharts, D3.js, Processing, Google Visualization API, Tableau, Excel, PowerPoint and Raphael.js (?). Most of those I admittedly don't know. Tableau and Excel should provide you with basic enough tools. Heck, if you're good, MS Paint will work just fine.

Storytelling Skills

This is that special set of soft skills that nobody can quite pin down. It's the artistic, communicative, holistic human side of the complete data scientist package. This is what makes the difference between a geek scientist and a business-savvy Data Scientist of the sexy bent, the kind who is valued highly, with the pay and executive respect to match. When you can come into a meeting and throw up a PowerPoint presentation with an introduction, a proposition, and a revelation in business terms that tells the business what's wrong, what's right, and how money is being made and lost, you've earned your income. The trick and the value lie in that elusive, almost lost art of storytelling.
Go sit on the porch with Grandpa and get him to tell some stories. Listen to how he sets them up, builds upon them, and then delivers the punch lines. You can still learn if you can put your analytic mind aside for a while. It's the ART of the holistic ART of Data Science. Without it, you might as well just wear a lab coat. With it, you can wear your sunglasses at night.

Other Opinions and Lists


Is the garden-variety spreadsheet jockey a data scientist? Yes, to the extent that they build statistical models and use the tool to find non-obvious patterns in structured data, they are engaging in a form of data science. But if this exploration is not their primary job function, they are merely dabbling, not specializing.

Is BI report-building or OLAP cube-development data science? No. Those endeavors, although important, revolve around obvious data patterns — obvious in the sense that an organization has chosen to embed them in repeatable views and access patterns.

Data science is all about asking questions. You engage in it whenever you interactively and iteratively search for deep, hidden patterns.

Top Skill-set Requirements to be a Data Scientist

  • Analytical skill-set
  • Mathematics / statistics (including experimental design)
  • Domain knowledge (i.e. industry-specific processes where analytics are applied)
  • Technology / data
  • Communication skills (story-telling)
  • Curiosity (willingness to challenge the status quo)
  • Collaboration
  • Commercial acumen/ Strategic
  • Customer-centric
  • Problem-solving skills
  • Proactive
  • Diverse Technologies: Hadoop, Java, Python, C++, ECL, NoSQL, HBase, CouchDB
  • Mathematics
  • Business Skills
  • Visualization: Flare, HighCharts, AmCharts, D3.js, Processing, Google Visualization API, and Raphael.js
  • Innovation

How to hire data scientists and get hired as one

  • SQL
  • Statistics
  • Predictive modeling
  • Programming (probably Python)

Further advice on what it takes to be a Data Scientist from practitioners at Netflix, Orbitz and Hortonworks:
  • Know the core competencies
  • Know a little more
  • Embrace online learning
  • Learn to tell a story
  • Prepare to be tested  (aka “Your pedigree means nothing”).
  • Exercise creativity

Data Scientist Skills Needed

  • Commitment
  • Creativity
  • Business savvy
  • Presentation
  • Intuition

From the O'Reilly Strata Conference

  • Open-source tools (G)
  • Statistics (A)
  • Presentation (P)

Winning with Big Data

Michael Driscoll, Secrets of the Successful Data Scientist

Northwestern University - Master of Science in Predictive Analytics degree

Core Curriculum (which tells you a lot of what they think it takes in skills to do this stuff):
  • CIS 317-DL Database Systems Design & Impl -- This course covers the fundamentals of database design and management. Topics include the principles and methodologies of database design, database application development, normalization, referential integrity, security, relational database models, and database languages. Principles are applied by performing written assignments and a project using an SQL database system
  • CIS 435-0 Data Warehouse & Data Mining -- This course provides an introduction to data mining, with a few hours of focus on data warehousing as one of the commonly used data sources for data-mining applications. Students learn data-mining applications, core concepts, and algorithms. Among these algorithms are supervised (Naive Bayes, Decision Tree, and Neural Network) and unsupervised (Association Rules, commonly used for market basket analysis, and Clustering) algorithms. Students learn via experimentation; they observe the outcome of applying data-mining algorithms to real-life data.
  • PREDICT 401-DL Statistical Analysis -- Students learn to apply statistical techniques to the processing and interpretation of data from various industries and disciplines. Topics covered include probability, descriptive statistics, study design and linear regression. Emphasis will be placed on the application of the data across these industries and disciplines and serve as a core thought process through the entire Predictive Analytics curriculum.
  • PREDICT 410-DL Predictive Modeling I -- This course introduces statistical models as they are used in predictive analytics. The course reviews traditional linear and generalized linear models, including multiple regression and logistic regression. It addresses issues of model specification and model selection, as well as best practices in developing models for management. The course also demonstrates the application of multivariate methods in predictive analytics
  • PREDICT 411-DL Predictive Modeling II -- Drawing upon examples from economics and business, this course provides an in-depth review of modeling practice. Special attention is paid to linear predictor and error structure specification for time series models. The course reviews econometric methods, including maximum likelihood estimation, two-stage and three-stage least squares, seemingly unrelated regressions, and simultaneous equation estimation. The course shows how to use autoregressive integrated moving average (ARIMA) models in time series forecasting. The course also demonstrates the application of survival/duration analysis in predictive analytics
  • LEADERS 481-DL Leadership -- The purpose of this course is to identify the fundamental leadership behaviors that enable people to excel in their careers, and to help students apply these behaviors to personal and professional success. The course builds from the basic premise that leadership is learned, and looks at the theory and practice of leadership at the individual and organizational level. The course will explore definitions of leadership, the importance of leadership, leadership styles, the role of vision and integrity, the importance of giving and receiving feedback, how to lead change and solve problems, effective teamwork, and communication strategies
  • PREDICT 402-DL Analytics and Data Collection -- This course will describe the appropriate uses of analytics and its limitations while defining how to approach the various stakeholders within an organization. Included will be a review of the ethical, regulatory, and compliance issues related to a given business problem and/or solution. Time will be spent interpreting performance-based organizational issues while concurrently identifying solutions for these same performance-based organizational issues. In addition, time will be spent identifying best practices to plan for engaging, implementing, and sustaining organizational change.
Happy modeling! :)
-------------------- end of original article ----------------------------------
Addendum UPDATE links for 2017

Looks like it’s time to update this article with the generous and insightful contributions of thought from others. Here are the skills recommendations I’m seeing from other working data scientists.

Linda Burtch on KDnuggets

Technical Skills: Analytics

  • Education – Data scientists are highly educated – 88% have at least a Master’s degree and 46% have PhDs – and while there are notable exceptions, a very strong educational background is usually required to develop the depth of knowledge necessary to be a data scientist. Their most common fields of study are Mathematics and Statistics (32%), followed by Computer Science (19%) and Engineering (16%).
  • SAS and/or R – In-depth knowledge of at least one of these analytical tools; for data science, R is generally preferred.

Technical Skills: Computer Science

  • Python Coding – Python is the most common coding language I typically see required in data science roles, along with Java, Perl, or C/C++.
  • Hadoop Platform – Although this isn’t always a requirement, it is heavily preferred in many cases. Having experience with Hive or Pig is also a strong selling point. Familiarity with cloud tools such as Amazon S3 can also be beneficial.
  • SQL Database/Coding – Even though NoSQL and Hadoop have become a large component of data science, it is still expected that a candidate will be able to write and execute complex queries in SQL.
  • Unstructured data – It is critical that a data scientist be able to work with unstructured data, whether it is from social media, video feeds or audio.

Non-Technical Skills

  • Intellectual curiosity – No doubt you’ve seen this phrase everywhere lately, especially as it relates to data scientists. Frank Lo describes what it means, and talks about other necessary “soft skills” in his guest blog posted a few months ago.
  • Business acumen – To be a data scientist you’ll need a solid understanding of the industry you’re working in, and know what business problems your company is trying to solve. In terms of data science, being able to discern which problems are important to solve for the business is critical, in addition to identifying new ways the business should be leveraging its data.
  • Communication skills – Companies searching for a strong data scientist are looking for someone who can clearly and fluently translate their technical findings to a non-technical team, such as the Marketing or Sales departments. A data scientist must enable the business to make decisions by arming them with quantified insights, in addition to understanding the needs of their non-technical colleagues in order to wrangle the data appropriately. Check out our recent flash survey for more information on communication skills for quantitative professionals.

Seamus Breslin on KDnuggets

  1. SQL
  2. Data Visualization
  3. Communication Skills
  4. Hadoop
  5. Spark
  6. Python
  7. Statistics
  8. R
  9. Creativity

Dave Holtz at Udacity

  1. Basic Tools: R or Python, and a database querying language like SQL.
  2. Basic Statistics: At least a basic understanding of statistics.
  3. Machine Learning: This can mean things like k-nearest neighbors, random forests, ensemble methods – all of the machine learning buzzwords.
  4. Multivariable Calculus and Linear Algebra: Understanding these concepts is most important at companies where the product is defined by the data and small improvements in predictive performance or algorithm optimization can lead to huge wins for the company.
  5. Data Munging: Some examples of data imperfections include missing values, inconsistent string formatting (e.g., ‘New York’ versus ‘new york’ versus ‘ny’), and date formatting (‘2014-01-01’ vs. ‘01/01/2014’, unix time vs. timestamps, etc.).
  6. Data Visualization & Communication: When it comes to communicating, this means describing your findings or the way techniques work to audiences, both technical and non-technical.
  7. Software Engineering: If you’re interviewing at a smaller company and are one of the first data science hires, it can be important to have a strong software engineering background.
  8. Thinking Like A Data Scientist: Companies want to see that you’re a (data-driven) problem solver.
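
The data munging item above mentions inconsistent city strings and mixed date formats. A minimal sketch of cleaning both in Python follows. The alias table, the assumption that slash-separated dates are month-first, and the fallback to Unix seconds are all illustrative choices, not rules from the article.

```python
from datetime import datetime, timezone

# Hypothetical alias table; a real project would maintain a fuller mapping.
CITY_ALIASES = {"new york": "New York", "ny": "New York", "nyc": "New York"}

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed: slash dates are month-first

def clean_city(raw):
    """Map differently formatted city strings to one canonical spelling."""
    key = raw.strip().lower()
    return CITY_ALIASES.get(key, raw.strip())

def clean_date(raw):
    """Return an ISO date string, or None when the value is missing."""
    if raw is None or str(raw).strip() == "":
        return None
    text = str(raw).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    # Fall back to treating the value as Unix time in seconds (UTC).
    # Anything that is not an integer raises ValueError here.
    return datetime.fromtimestamp(int(text), tz=timezone.utc).date().isoformat()

rows = [
    ("New York", "2014-01-01"),
    ("new york", "01/01/2014"),
    (" ny ", "1388534400"),
    ("Boston", ""),
]
cleaned = [(clean_city(city), clean_date(date)) for city, date in rows]
print(cleaned)
```

The first three rows end up with the same city and the same date, which is the point: after cleaning they can be grouped and compared.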


How to Become a Data Scientist | Data Scientist Salary

Technical Skills

  1. Math (e.g. linear algebra, calculus and probability)
  2. Statistics (e.g. hypothesis testing and summary statistics)
  3. Machine learning tools and techniques (e.g. k-nearest neighbors, random forests, ensemble methods, etc.)
  4. Software engineering skills (e.g. distributed computing, algorithms and data structures)
  5. Data mining
  6. Data cleaning and munging
  7. Data visualization (e.g. ggplot and d3.js) and reporting techniques
  8. Unstructured data techniques
  9. R and/or SAS languages
  10. SQL databases and database querying languages
  11. Python (most common), C/C++, Java, Perl
  12. Big data platforms like Hadoop, Hive & Pig
  13. Cloud tools like Amazon S3
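
Of the machine learning techniques named in the list above, k-nearest neighbors is simple enough to sketch in a few lines of plain Python. The points and labels are made up, and `knn_predict` is a hypothetical helper written for this illustration, not a library function.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2-D points with two classes.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

Sorting the whole training set for every query is fine for a toy example; larger data sets usually call for a library implementation with a spatial index.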

Business Skills

  1. Analytic Problem-Solving: Approaching high-level challenges with a clear eye on what is important; employing the right approach/methods to make the maximum use of time and human resources.
  2. Effective Communication: Detailing your techniques and discoveries to technical and non-technical audiences in a language they can understand.
  3. Intellectual Curiosity: Exploring new territories and finding creative and unusual ways to solve problems.
  4. Industry Knowledge: Understanding the way your chosen industry functions and how data are collected, analyzed and utilized


Certified Analytics Professional (CAP)

Cloudera Certified Professional: Data Scientist (CCP:DS)

EMC: Data Science Associate (EMCDSA)

SAS Certified Predictive Modeler using SAS Enterprise Miner 7

Professional Organizations for Data Scientists

Data Science Association

International Institute for Analytics (IIA)

International Machine Learning Society (IMLS)

Institute for Operations Research and the Management Sciences (INFORMS)


What are the most valuable skills to learn for a data scientist now? On Quora from Raja Tanveer Iqbal

  1. Curiosity About Data and Passion for Domain: If you are not passionate about the domain/business and curious about data then it is unlikely that you will succeed in a data scientist role.
  2. Soft Skills: Communication and influencing without authority. Being a good storyteller is also something that helps.
  3. Math/Theory: Machine Learning. Stats and Probability 101. Optimization would be icing on the cake.
  4. CS/Programming: At least one scripting language (I prefer Python). Decent algorithms and data structures skills, to be able to write code that can analyze a lot of data efficiently.
  5. Big Data and Distributed Systems: Understanding of basic MapReduce concepts, Hadoop and the Hadoop file system, and at least one language like Hive/Pig.
  6. Visualization: Ability to create simple yet elegant and meaningful visualization.
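
The basic MapReduce concepts mentioned in the list above can be sketched on a single machine with a word count. The function names and the two sample documents are invented for illustration; Hadoop runs the same three phases distributed across many nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

documents = ["big data big insight", "data tells a story"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Because each map call looks at one document and each reduce call looks at one key, both phases can run in parallel, which is the reason the model scales.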


Comment by Prasanna Dixit J on October 13, 2016 at 11:26pm

 A good overview of the core skills to get the basics right, and the clutter around the Data Engineering and practices.

Comment by kingsley mbonu on March 30, 2015 at 11:49pm

A little bit. I just need to know how to start. I have finished the coding using R and I have a lot of descriptive plots; I just want to know how to start. If I knew all this I would not be asking. Thanks. I will be grateful for any more ideas.

Comment by Mitchell A. Sanders on March 30, 2015 at 2:43pm

@kingsley ... quality storytelling is composition art, so there is no step-by-step algorithm. Saying that, I have found that the "Exploratory" aspect of the Capture phase can yield "Discoveries" which should be noted/written down as elements for your STORY-LINE. The Modeling done in Analysis phase should give you some "Insights" (take notes) into the data. And once you've made some Deliverables with your Present stage, you should end up with some "Prescriptive Analysis" (aka advice backed by facts). 

These 3 elements: 1.) Discoveries, 2.) Insights, and 3.) Prescriptions are all the elements you need to tell a good story to your data consumers. Start with the Problem, end with the Solution, fill in with the 3 elements above, and voilà, you have your outline for a five-paragraph essay story.

Hope this helps a bit.


Comment by kingsley mbonu on March 30, 2015 at 11:12am

Can someone give me an idea of how to tell a beautiful story with a dataset? I have a presentation to give and I am trying to get started. I want guidelines on how to start so that I can be confident that I know what I am doing. I will be glad to get any help.

Thank you

Comment by Sayed Hamdani on September 26, 2014 at 10:11am

Thank You,

Great information. Which schools offer a Bachelor's degree in Data Science? I have tried Google and Bing but no luck.

Thank You,


Comment by Martin Squires on July 29, 2014 at 12:01am

I'm intrigued how the articles on this subject seem to divide between computer science/PhD statistics/coding skills viewpoints and storytelling/business acumen/commercial skills viewpoints. This fits with all sorts of questions I've been wrestling with: Can we really expect individuals to exhibit every skill we need, or are we going to have to develop new operating models and team structures to blend the skills we need? What is the skill set required to lead this type of team in the future? Where should the data scientists fit within an organisation?


If anyone has any answers I'd love to hear them!

Comment by Srinivas G on May 25, 2014 at 8:27am

Nice article for startups. Good going.

Comment by Pradyumna S. Upadrashta on September 27, 2013 at 10:35am

Accountability is the highest measure of relevancy. I think there should be a bubble added to the chart above to address this. A non-data scientist being held accountable for good data science is necessarily at a huge disadvantage. A data scientist should also function as the chief risk manager when the models drive business decisions: they have to be in a position of authority/accountability with some weight. If the accountability is separated from the modeling, you have a recipe for disaster. While BI is an element of data science, it is not the main thing. I've run into plenty of "Big (Sparse) Data" posing as "Big Data" because IT thought it was a good idea to store everything storable with no (or very little) data governance. Data Scientists (the practical end users of data) should have more say in what gets stored, how it gets stored, why it gets stored, and for how long it gets stored. To answer these questions, you have to know what kinds of problems you want to (or can) solve, and what kinds of data are required to solve these problems, so some forethought about future applications from a modeler's perspective is a key missing input to current BI practices.

Comment by Frank Banin on September 27, 2013 at 9:23am

As a BI consultant, my typical day could be one or a combination of capturing, analyzing or presenting data. It is the kind of analysis that I am doing or can do that pushes me into the realm of data science. Data analysis is broad: statistics (descriptive, inferential, correlation), grouping, searching and mathematical modeling. As long as you can do them using the right set of tools, you qualify to call yourself a data scientist. The only question then becomes how experienced you are.

Comment by Liu Shuanglu on September 27, 2013 at 3:59am
Seems like BI is different from data science. What's the difference? What is a typical day like for people in BI?

© 2017 Data Science Central