In the nascent field of Data Science, myths abound. Here are my top 10, scoured from the internet (where better to find a myth or two?).
Myth #1: It’s a male dominated field.
This one is only part myth. Historically, women have been discouraged from entering the computing sciences for many reasons unrelated to talent (see my previous post, On being a Female Data Scientist), and in 1975, the Good Ol’ Boys’ Club was in full swing. But this isn’t 1975; it’s 2019, a new era where women are welcome.
While it’s true that around 70-80% of data scientists today are male (BetterBuy puts this figure at 74%), that doesn’t mean it’s fair to call it a “male-dominated field”. To put this in perspective, roughly three quarters of K-12 teachers are women. But teaching is rarely described as a “female-dominated field”, and men wouldn’t be discouraged from a career as a teacher. Many great teachers are men, and many great data scientists are women.
In fact, some of the best data scientists today are women, including those that Forbes describes as “Leading The Pack in Data Analytics”:
- Pamela Baker, project management lead for Lean Six Sigma and others,
- Brenda L. Dietrich, IBM Fellow and Vice President, Data Science,
- Vivian Shangxuan Zhang, CTO of NYC Data Science Academy.
If you love statistics or programming, or something in between, don’t let misconceptions steer you away from DS. And if you want inspiration, consider attending Stanford University’s Women in Data Science (WiDS) conference, a female data scientist extravaganza with 100,000 participants and 150+ worldwide regional events.
Myth #2: You have to know how to code
I don’t code (other than a smattering of R and HTML that I’ve picked up over the years), nor do countless other data scientists. That’s not to say you shouldn’t bother learning to code, because it’ll certainly help you in your career path. But, thanks in part to the development of many popular algorithms, coding is no longer the necessity it once was. Even if you’re aspiring to get a master’s degree in Data Science, you don’t necessarily have to have a programming background. For example, Notre Dame’s Master’s program doesn’t have any programming prerequisites. For a full explanation of why you don’t need to know how to code, take a look at my post from last week: Can you be a data scientist without coding?
Myth #3: You have to be an egghead to become a data scientist
You don’t have to be a great statistician or mathematician to become a DS, but it helps. DS is all about teamwork, and team members have strengths and weaknesses. Many Data Scientists are cross-disciplinarians, with some knowledge of stats and coding, plus a dash of business acumen/ethics/interpersonal skills thrown in. As well as working with data, you’ve got to be able to share the results of your amazing analysis with colleagues in an understandable way. No one is going to have all the skills necessary to wear all of the DS hats at the same time; even if you excel at both coding and statistics, you might be lacking in creativity or interpersonal skills. And, if you’re a great coder, it’s completely possible to become a data scientist with only a basic understanding of mathematics and statistics.
Myth #4: A Master’s degree in Data Science = Data Scientist
If you think it just takes a degree to become a data scientist, then think again. A master’s degree will get you closer to your goal, but it’s not a final destination. Working with real data, involving real people, is a lot different from working with hypothetical scenarios in school. In order to call yourself a DS, you’ve got to dive into the real world. If you’re lucky, you might step into a DS position right away, but you won’t actually be a real DS until many months later. You probably won’t be a good DS until several years after that, when you know everything about the particular data you’re working with.
Myth #5: “Data Scientist” and “Business Analyst” are the same thing
The term “Data Scientist” is only 10 years old, and hasn’t actually been defined well yet. So it’s not surprising there’s confusion with other job titles. Perhaps the most common misconception is that a DS is just a new term for an old-school BA.
A Business Analyst isn’t a Data Scientist, but a Data Scientist is part Business Analyst. Data Scientist Emily Thompson explains that an old-school BA was basically given nice, cleaned data, ready for analysis. In addition to data analysis, the modern Data Scientist has to find/create that data in the first place. Business analysts can answer the question “When did it happen?”, while the Data Scientist can tell you if it will happen again.
Myth #6: There’s a Shortage of Data Scientists.
According to a 2016 article by Thomas Davenport (The Myth of the Data Scientist Shortage), this was true in 2013 but not so in 2016. Back in 2013, DS was still in its infancy, with few college programs. By 2016, more than 100 DS programs were in existence, including at well-known institutions like Northwestern, NYU, and UC Berkeley. Around the same time, McKinsey Analytics produced a report that seemed to indicate data scientists were in short supply, but this report was based on data from preceding years. Much has happened since 2016 in the realm of DS.
Fast forward to 2019, and the education offerings are staggering, with tens of thousands of new graduates each year. In 2013, you probably had to attend a big school on the East or West Coast to find DS offerings. Nowadays, it doesn’t matter whether you live in New York City or Alabama: there are programs near you. For example, The University of Alabama at Birmingham, The University of Tennessee at Knoxville, and The University of Utah all offer graduate degrees in data science. Massive open online courses, like the DS Master’s from Coursera & Michigan State, have even made a DS degree possible from the comfort of a couch.
Don’t let the glut of data science graduates put you off reaching your goal. While the shortage of a few years back has been eased somewhat by the myriad of course offerings, remember that there is always a shortage of great data scientists.
Myth #7: Data Scientists Earn the Big Bucks.
Data scientists tend to be well compensated, but don’t sign up for that $50k master’s degree program before doing your salary homework.
2018’s 5th-annual Burtch Works Study: Salaries of Data Scientists reported median base salaries of $90k to $165k, depending on experience. Managers earned even more, from $145k to $250k. While that sounds enticing, those upper-end salaries tended to go to those with PhDs. In other words, you can expect to earn a median base salary of $90k as a new graduate with a master’s in data science. Consider, though, that small word inserted into the data: median. If you know something about statistics, you can probably sense what I’m going to say next: half of all data scientists earn less than the median salaries reported. So don’t let the promise of a high salary tempt you into an expensive education program; make sure this is something you actually want to do for a living, even if your salary is less than wonderful.
Indeed’s latest figures (as of March 2019) put national average salaries anywhere from $41k to $262k. As a top-level manager with a PhD, working in San Francisco, you can expect to earn close to that $262k figure (but you’ll have those exorbitant housing costs to deal with). If you’re in a small town in the South, your salary will likely be closer to the bottom. The employer will also matter. For example, Amex averages $200k, while the University of Pittsburgh will pay you less than a third of that ($59k). Again, bear in mind that we’re talking about “averages” here: many data scientists are earning a lot less than those numbers, so research salaries in your town before enrolling in a master’s degree program.
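If the median-versus-average distinction feels abstract, here is a rough Python sketch of it. The salary list is invented purely for illustration (it is not taken from Burtch Works, Indeed, or any other survey), and `summarize` is just a hypothetical helper name. It shows two things: half of the sample sits below the median, and a couple of very high earners can pull the mean well above it.

```python
from statistics import mean, median

# Hypothetical base salaries in $k. These numbers are made up for
# illustration only; they are not from any salary survey.
salaries_k = [41, 59, 72, 85, 88, 92, 110, 145, 165, 262]


def summarize(values):
    """Return (median, mean, share of values strictly below the median)."""
    mid = median(values)
    share_below = sum(1 for v in values if v < mid) / len(values)
    return mid, mean(values), share_below


mid, avg, share_below = summarize(salaries_k)
print(f"median: ${mid:.0f}k, mean: ${avg:.1f}k, below median: {share_below:.0%}")
```

In this made-up sample the mean lands above the median because of the few salaries at the top, which is one reason a headline "average" can look rosier than what a typical person takes home.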
Myth #8: AI Will Replace the Data Scientist
A DS is part number magician, part desktop philosopher. AI can definitely provide the tools a DS needs, but we’re a few iterations away from replacement. Google’s Cloud Machine Learning (ML) Engine can “…design and evaluate model architectures to achieve an intelligent solution faster and without expert”. But it can’t make ethical decisions, decisions that involve personal risk, or make any nuanced judgment calls (well, perhaps it can, but perhaps you wouldn’t want it to!).
While an AI can process data at roughly 2 million times the speed of a human, it’s nowhere close to the creativity of Picasso, Mozart or Stephen King. While computers have become pretty competent at facial recognition (like Microsoft Azure), choosing appropriate wardrobe items (à la Stitch Fix) and detecting fraudulent credit card activity, they fail at true creativity, which is at the core of Data Science. Currently, computers have trouble copying creativity, and are “…very far from long narrative arcs” (Google Brain researcher Douglas Eck, cited in a 2016 Quartz article).
Myth #9: It’s all about the tools
Granted, a little working knowledge of SAS or R can be a great addition to your toolbox, but they are just that: tools in your toolbox. While you won’t get electrocuted by using the wrong data analytics tool, you might corrupt data or cause your entire team to work the weekend to fix the damage. It’s not just about the tools: it’s about applying your knowledge to the specific data at hand, for the specific business purpose you’re tasked with. It’s about communication, people management, and ethics. So fill your toolbox with a few multi-purpose tools, but learn how to use those tools creatively and on the job.
Myth #10: Data Science is a lifetime career.
No job title is forever. If you don’t evolve over your career, you’ll become one of those sad, dated dinosaurs before you’re out of your thirties. When I first started in computing (back in the 1980s), I worked in the data library (yes, get that chuckle out of the way now). My main task was to mount tape reels and cassettes. It was unfathomable at the time that artificial intelligence could replace the tape mounter, running around at full speed with an armful of tapes, frantically mounting, dismounting, and re-stocking the tape shelves. But the robot tape library was already in the works. By the early 90s, the human tape mounter was history, replaced by the robotic tape silo. In 2019, cloud-based tape mounting, like the virtualized tape storage seen on IBM’s Z servers, is fast replacing those robots. Other notable computing-related careers that are on their way to the purge queue:
- Data entry personnel (fast being replaced with scanning technology).
- Computer programming (not quite dead, but not the hot career it once was, according to Moneywise).
- Computer operators (will decline by 19% in the next decade, according to BLS projections).
Note that thirty years ago, most people working in computing were working in one of those fields. If there’s one thing you know for sure when it comes to computing, it’s that nothing lasts forever. While it’s a good bet that the data scientist has a slightly longer shelf life than the tape librarian, don’t count on it, and make sure you keep up with the times!
Academia to Industry: Data Science Myths and Truths
The Myth of the Data Scientist Shortage
Universities Get Creative with Data Science Education
AI Joins The Fight To End Credit Card Fraud
Google is launching a new research project to see if computers can …
Burtch Works Study: Salaries of Data Scientists