As an academic discipline, data science has matured at a rate that should be measured in light years. Although it's really only about 10 years old as a field of study – with the first Ph.D. program in the country emerging just four years ago – most major universities across the world have integrated data science into their portfolio of degree options. Universities – not typically known for nimbleness or even for their receptivity to market forces – have responded to the calls to develop the talent needed to translate increasingly massive amounts of structured and unstructured data into information.

Over the last 10 years as an academic engaged in data science programs at my own university, I have seen a shift in the conversations that take place at conferences, in panel discussions and in coffee shops. A few years ago, the conversations related to data science centered on computational skills – students needed programming languages, combined with “deep learning”, “machine learning”, and “predictive analytics”. It was all about what data science students could do with data.

Today, the myriad of issues related to Facebook’s use of personal data (e.g., the Cambridge Analytica data leak, their emotional contagion testing, fake quiz apps) or the inherent racial bias that has manifested through facial recognition... have pivoted these conversations away from what students can do and toward what they should (or should not) do. Part of this conversation includes a thread about what responsibility academics have to teach ethics as part of a data science curriculum. Most academics are unsure how to engage this thread.

I think it is helpful to frame this shift, from a conversation about computational skills to one about ethics in data science, in the context of Maslow’s Hierarchy of Needs. Put simply, this foundational theory in psychology places human needs into five tiers – from basic physiological needs (e.g., food and water) up to “self-actualization” (i.e., the ability to achieve one’s full potential). The needs are typically placed in a hierarchical pyramid, where progression to any tier is contingent on achieving the tier below it. In other words, you are not worried about whether you look better in mauve or purple if you have not eaten in two days.

I propose an academic “Maslow’s Hierarchy of Data Science”:

In the Hierarchy of Data Science as an academic discipline, students MUST learn the basics of mathematics as a starting point. This is true because the higher-level concepts of statistics, computer science and programming are grounded in the basics of algebra, matrices and discrete optimization (and graph theory and calculus don’t hurt).

Only AFTER students understand the core concepts from statistics and computer science (note that this is an “and” rather than an “or”) will the skills related to modeling and classification, followed by the ability to communicate results (particularly to a non-scientific audience), actually make sense. Borrowing from the point above, you really should not be worried about visualizing the performance of the model if you have no idea which features (variables) were actually used. Or why. Or in what form. Or if they are biased.

While the concept of the “citizen data scientist” has its place, I believe that the national conversations related to ethics in data science have emerged partially as a function of academic programs trying to shortcut this hierarchy.

In the rush to become a “data scientist”, students are given opportunities to take the one-year certificate or the online course, which will allow them to update their LinkedIn profile. In the process, they skip the science disciplines and go right to the business disciplines of analytical modeling (through a point-and-click interface) and visualization, without any understanding of HOW the algorithms they are “pointing and clicking” actually work. Or if they even make sense. Frequently, this generates meaningless output – like the highly paid analyst who used Social Security number as a predictor (a true story) – or, in the worst case, algorithms (unintentionally) built on racially biased data.
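To make the Social Security number example concrete, here is a minimal Python sketch of the kind of sanity check a point-and-click workflow never prompts for. It is only an illustration under stated assumptions: the function name `flag_identifier_like_columns`, the 0.95 threshold, and the toy records are all hypothetical, and the heuristic (flag any column whose values are almost all distinct) is one simple way to spot identifier-like fields, not a standard method.

```python
from typing import Dict, List, Sequence


def flag_identifier_like_columns(
    rows: Sequence[Dict[str, object]], threshold: float = 0.95
) -> List[str]:
    """Return columns whose values are (almost) all distinct.

    A column with nearly one unique value per row behaves like a record
    identifier, so it usually carries no signal that generalizes.
    The 0.95 threshold is an arbitrary assumption for this sketch.
    """
    if not rows:
        return []
    flagged = []
    for column in rows[0]:
        values = [row[column] for row in rows]
        unique_ratio = len(set(values)) / len(values)
        if unique_ratio >= threshold:
            flagged.append(column)
    return flagged


# Hypothetical toy records; the SSN-style values are made up for illustration.
records = [
    {"ssn": "000-00-0001", "age_band": "30-39", "defaulted": 0},
    {"ssn": "000-00-0002", "age_band": "30-39", "defaulted": 1},
    {"ssn": "000-00-0003", "age_band": "40-49", "defaulted": 0},
    {"ssn": "000-00-0004", "age_band": "40-49", "defaulted": 1},
]

print(flag_identifier_like_columns(records))
```

On these toy records only the `ssn` column is flagged. The check is deliberately crude: a genuinely continuous measurement would also be flagged, so deciding whether a flagged column is an identifier or a real feature still takes the statistical judgment that the hierarchy is meant to build.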

The national conversations related to ethics in data science are much needed and are a manifestation of the maturing of the discipline. However, an important thread in these conversations, on conference panels and in coffee shops, needs to include an acknowledgment that shortcutting the science disciplines has contributed to the problem.

© 2020 Data Science Central ®
