Data Science has a Terminology Problem

Having thoroughly enjoyed the debate around Bernard Marr's post, Why so many fake data scientists?, it occurred to me that "data scientist" is not the only problematic term in our industry. Many of the most common data-related terms and concepts are also ambiguous or poorly defined.

Here are some of the terms that cause me frustration.

Why is data science the only field whose practitioners are called "scientists?"

In every other field, a "scientist" is a researcher, often in an academic setting. To take an example from a related field, a computer scientist is someone who works on the theoretical aspects of programming and computation. The practitioners who apply computer science to design algorithms and build software are called engineers.

By analogy, shouldn't data scientists actually be called data engineers? And that leads to the next question.

Why is someone who designs data infrastructure called a data engineer?

An engineer who designs physical infrastructure such as bridges and roads is not called a traffic engineer. He or she is a civil engineer.

Why is the infrastructure that controls the flow of data any different? By the same logic, shouldn't a data engineer really be called a data systems engineer, or something similar?

What is the definition of big data?

Does it mean: 

  • A whole lot of regular data?
  • A general trend toward more available data / the exponential growth of the digital universe?
  • A specific class of data that is so big/diverse/complex that traditional analysis methods no longer work?

Here are two businesses that should know the definition better than anyone, and even they do not agree.

"Big Data refers to technologies and initiatives that involve data that is too diverse, fast-changing or massive for conventional technologies, skills and infrastructure to address efficiently."

MongoDB

"Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured."

SAS

The term "big data" is far too commonly used to be so poorly defined.

What does data mining mean?

Since you're reading a post on Data Science Central, chances are you know what data mining is. It involves the extraction of knowledge/insight from raw data. However, when I see someone use the term online, what they usually mean is "extracting raw data from the web."

There is no question which definition is technically right. But in practical terms, unless you are speaking with someone well versed in the subject, using the term "data mining" is likely to lead to confusion.

If the purpose of data visualization is to make data science more accessible, why is the term so technical sounding?

The results of data science are only as valuable as what the practitioner is able to communicate, which is why data visualization is such an important piece. It has the ability to take complex concepts and present them in a way that is accessible to anyone.

However, the term data visualization itself is about as inviting and digestible as a computer science textbook. I like the idea of making data accessible to people who are scared of data, and I run a website dedicated to that purpose. But when I describe it as "data visualization," people's eyes glaze over.

If the entire purpose of data visualization is to remove technical barriers from communication, shouldn't it have a less technical sounding name?

Can a room full of data scientists agree on what data is (or is it "data are")?

According to IBM, 90% of the world's data was created in the last 2 years. But if there is no differentiation between useful and useless data, do those numbers have any practical meaning?

In this case, IBM is referring to the entire digital universe, which would include all the cat videos ever uploaded to YouTube. When discussing the importance of business intelligence, would anyone feel comfortable pointing to that and calling it data?

Would love to hear some other opinions on this topic. Do you find that data terminology gets in the way of communication?

I am a [questionably "fake"?] data scientist, financial/insurance modeler, software engineer, and entrepreneur. I also write about data and data visualization at Metrocosm and as a contributor for the Huffington Post.

Connect with me on Twitter at @galka_max

Comment by Sione Palu on August 10, 2015 at 2:07am

The late, great Richard Feynman made a snide remark about data science over 30 years ago.

https://www.youtube.com/watch?v=IaO69CF5mbY

Comment by Javier Alonso on August 8, 2015 at 8:50am

A "scientist," by definition, is one who applies the rules of the scientific method to search for the truth. Academics, amateurs, PhDs, or plain citizens are scientists if they apply The Method to the quest of understanding things.

That said, of course there is a confusing lot of terms showing up. Remember the case of "Operational Research"? What was that really? A bunch of techniques, a know-how pool of recipes?...
