Genevera I. Allen (left) is a professor in the Departments of Statistics and of Electrical and Computer Engineering at Rice University. Corinne Cath (right) is a doctoral student at the Alan Turing Institute, the national institute for data science in the UK. Below are extracts from recent interviews that are most relevant to our audience. Links to the full interviews are provided.
Genevera, what do you think of the shift from “Statistics” to “Statistical Learning and Data Science” in the statistics community (the “Data vs Math” question)?
I am a huge proponent of data science. I think it’s critical to scientific discussions, and it is especially important that statisticians are involved. As a discipline (especially in academic circles), we place a preponderance of emphasis on mathematical and statistical rigor and theory, and this often comes at the expense of focusing on the intricacies of the data itself. Data should be paramount and the focus of everything that we do. Rich mathematics and theory are important in statistics, but they should be put in the context of supporting our understanding of data. Specifically, I think that the really important questions in science (the ones that need to be addressed by data) are not addressed by methods that were developed on small, clean, toy data sets that lend themselves to the assumptions of clean statistical theory. The data sets that are most important, the ones we need to be working with, are big, messy, and complex; they present so many initial statistical challenges that you don’t know where to start. Just because a data set doesn’t lend itself to beautiful, clean math doesn’t mean that it should be ignored. Instead, statisticians should feel encouraged to tackle these complex problems and put data challenges first; the nice mathematics can follow in its own time. Also, practical, effective, or heuristic methods should not be discarded just because we don’t yet understand all of their statistical properties. If a method is effective, there’s generally a very good mathematical and scientific reason why. In this respect, we could be a lot more like engineers and computer scientists, because they go out and do things that are useful. I’m a very big proponent of “do something”. Then we can go back and study the mathematical and theoretical statistical properties of the practical, effective methods.
Corinne, few young women take up technology subjects and careers; just 16% of the graduates in computer studies are women and the figure is 14% for engineering and technology*. Why do you think this is the case?
I think one of the main problems is, to stay within Internet parlance, that there are very strong institutionalized “memes” about which types of people are expected to excel in which kinds of professions. Unfortunately, these memes are often based on flawed stereotypes and prejudices. This holds not just for gender, but also for class, race, and a host of other issues. In practice, it means that girls, for instance, are often not encouraged to the same extent as boys to pursue certain interests or careers at a young age. Over the years this adds up, not just on a personal level, but on an institutional level as well.
The challenges and obstacles that women face in accessing scientific fields vary. As in most sectors, sexism remains a serious obstacle. But we need to be careful about defining gender inequality only as the barriers that women face. In my opinion, gender inequality can only be tackled when we understand gender as a broad term that includes a variety of gender identities and expressions, and when such gender-based inequality is addressed in tandem with other forms of structural discrimination like racism, classism, or ableism.