Can you be a Data Scientist without coding?

Whether you can call yourself a data scientist if you can’t code is as hotly debated as Brexit. Type the question “Can you be a Data Scientist without coding?” into Google and you’ll get a hundred different answers, and the opinions vary wildly depending on whether the author is a coder or a non-coder. Search the job listings, and you won’t find a definitive answer there either. A Glassdoor survey of data science job postings showed that nine out of every ten required at least Python, R, and/or SQL skills. So while coding is a “common” requirement, is it a necessity in today’s ever-changing machine learning landscape?

Rather than fill the internet with yet another opinion (as I am not a coder, my opinion would be quite biased), I thought I would perform a little meta-analysis. For this post, I pooled data from about sixty different sources to uncover current thinking on what is quite a debatable topic. You won’t find every source listed below, as I have no wish to bloat a blog post with a dissertation-worthy reference list. My methodology was simply to type the question into Google and click on the first 25 search results for each category (opinion sites, business sites, and .edu sites). I figured that would capture current, popular thinking. I filtered out any “wiki”-style sites (as it would be difficult to tell whether those posts reflected the opinions of a business person, an academic, or someone else), along with multiple posts from the same site.

Can you be a Data Scientist without coding? The NO camp

Opinion sites (like newspapers and magazines) and bloggers are split firmly down the middle of the “coding or no coding” debate. The analytics dude Eric Hulbert answers the question “Can you be a data scientist without knowing how to code?” with a single word: “Nope” (although he does go on to explain the “nuances” of that statement). Rachael Tatman, writing on freeCodeCamp, states that every data scientist should be able to “write code for statistical computing and machine learning.” Ronald Van Loon agrees, giving a rather lengthy list of required technical skills, including knowledge of programming languages like Python, Perl, C/C++, SQL, and Java, plus expertise in tools like SAS, Hadoop, Spark, Hive, and Pig.

Also in the “No” camp is the executive recruiting company Burtch Works, which lists Python coding (along with Java, Perl, or C/C++) and machine learning among the “Must-Have Skills” that employers are looking for. Experience with Hadoop, Hive, or Pig is a “strong selling point.”

In education, many university websites are planted firmly in the “No” camp, but many of these articles are outdated when you consider that data science has only been a “thing” for about a decade. For example, this 2013 article on Columbia University’s website is titled “Statistics is the least important part of data science”.

Ouch.

In today’s ever-changing data science landscape, one could argue that statistics is one of the most important parts of data science. In 2018, it’s certainly not the least important.

Georgetown University’s Advice to Future Data Scientists also stands firmly in the “No” camp: “Write Code, Any Code.” That said, further down the page the article states “Focus on what you’re good at. Not everyone is a programmer. Not everyone is a statistician…Whatever interests you, whatever talent you have, augment your assignments with that.” At first, that sounds like you might be able to get away with just being a statistician. But notice the word augment in there: they are still telling you to code, code, code, and then augment that code with whatever else you bring (like statistics).

Can you be a Data Scientist without coding? The YES camp

Blogger Tom Wentworth, writing for RapidMiner, puts it plainly: “Yes, you can do real data science without writing code.” Perhaps the most important question here is: why might you not need to code to be a data scientist? Many people give solid reasons. Here are a few of the more popular:

  • Common algorithms are already known, coded and optimized.
  • Explicit coding is being replaced with drag-and-drop interfaces, like Trifacta and Tableau.
  • Data science is becoming more automated with options like Google’s Cloud AutoML or DataRobot, both of which help you to find the right algorithm. Google promises that you can “Train high-quality custom machine learning models with minimum effort and machine learning expertise.”
  • Speaking of automation, the Google Duplex demo hinted at the future of AI. The future data scientist might simply have a conversation with a machine rather than code one.

Also in the Yes camp is Dana Parks’ 2017 dissertation, Defining Data Science and Data Scientist. Parks’ dissertation is an attempt to “[establish] set guidelines on how data science is performed and measured.” Its important findings include that the “Ability to write code using Java, C, C++ and HTML [is] common among practitioners.” However, “The academic community did not consider this skill as necessary for data scientists.” This is a fairly up-to-date paper, and a much more in-depth analysis of current thinking on the coding debate than I can squeeze into a blog post, so it’s worth a read (the full text is available online).

Can you be a Data Scientist without coding? The “Maybe” camp

Bob Violino offers little hope for the non-coder: “top-notch data scientists know how to write code”. While that means you can’t rise to the very top without knowing how to code, you can probably still get a foot in the door. And as you make your way up the ranks, Violino offers this nugget of advice: “If a data scientist doesn’t understand how to code, it helps to be surrounded by people who do [like a developer].”

Although it says little explicitly about coding requirements, this 2017 article on the University of Notre Dame website states that “employers are increasingly valuing data scientists who can take on more responsibility beyond their coding or technical duties.” Even more interesting is the trend linking higher salaries not to coding expertise but to the amount of time spent in meetings: the top salaries tend to go to data scientists who spend more than 20 hours a week in meetings.

Image: 2017 Data Scientist Salary Survey from O’Reilly, posted on the University of Notre Dame website.

If that makes you want to brush up on your interpersonal skills instead of your coding, don’t snap your laptop shut yet. This University of Wisconsin article notes that “the highest data scientist salaries belong to those who code four to eight hours per week; the lowest salaries belong to those who don’t code at all.” The biggest coding factor that affects salary? Knowledge of Apache Hadoop, which affects salary by 8%. Before you get alarmed, though, bear in mind that many institutions are merely using statistics to sell their wares. The University of Wisconsin notes that “only a data science masters degree will give you the precise education you need to be ready for a career in data science,” which places them firmly in the “Maybe” camp.

Conclusion

There are strong voices on both sides of the data science and coding debate. Perhaps the two antipodean camps are a product of how recent data science is and the lack of a solid definition of what exactly a “Data Scientist” is. Pinning down a solid definition for “Data Scientist” is like pinning down a universal definition for “expert”. John D. Cook attempts to clarify the definition by splitting data science into two realms: statisticians who can code (which Cook calls “Type A”) and strong coders or software engineers who know a little statistics (“Type B”).

The definition is hard to pin down and seems to be ever-changing. Ten years ago, very few people used the job title “data scientist”, and there was no such thing as a “data science” degree. The field was still in its infancy, and coding was a must. Nowadays, with many algorithms already worked out and universally available online, coding doesn’t seem to be a “must” anymore.

If you’re a non-coder, then, what do you need? Meta S. Brown, author of Data Mining for Dummies, wrote in a Forbes article that “…a person with a Bachelor’s degree, some worthwhile industry experience, and skills in statistics or programming…has a chance to get into the data science game.” So if you decide you’re not a coder and you want an alternative path to becoming a data scientist, then statistics is the way to go.

My two cents? Ignore (for the most part) the confusing “advice” dished out on the internet, and simply do what you love. If statistics is your thing, then study that and forget coding (for now). Like coding? Forget the statistics (for now) and code your happy heart out.

References