There has been much discussion and debate about the definition of data science and the new rare breed of sexy bird called the data scientist. The Data Science Association defines "Data Science" as the scientific study of the creation, validation and transformation of data to create meaning; and the "Data Scientist" as a professional who uses scientific methods to liberate and create meaning from raw data.
While these definitions may appear overbroad, think about the definitions of a lawyer or physician. A lawyer is a legal professional who can help prevent or solve legal issues and a physician is a health professional who can help prevent or cure health issues. Like law and medicine, which professionalized over the past hundred years, data science is at the very beginning of becoming a profession - with competency standards and a Data Science Code of Professional Conduct.
This means that data science will evolve into a profession where data scientists specialize in different areas - like lawyers and physicians. When you need to hire a lawyer you usually consider the special area of law that a lawyer practices. If you have a tax problem you hire a tax lawyer, not a divorce lawyer. If you have a heart problem you do not hire a gynecologist.
The simple truth is that data science is a vast and complicated field and - like law and medicine - much too big and complex for a person to master in one lifetime. My colleague Gary Mazzaferro has been exploring the concepts and ideas surrounding data science, treating its definitions as formalizations that align with knowledge economies and the knowledge / science / technology maturity models. Gary has (to date) defined the following data science specializations and types of data scientists:
Data Science: A field of systematic interdisciplinary study to elucidate relationships across and within Formal, Social, Natural and Special Sciences phenomena through the application of scientific methods. Interdisciplinary areas include analytical processes, mathematics, probability and statistics, logic, modeling, machine learning, algorithms, communications, traditional sciences, business, public policy and philosophy.
Blue Sky Data Science: A purely curiosity-driven, exploratory branch of Data Science oriented toward developing and establishing an understanding of relationships across and within phenomena, with no focus on specific goals or immediate application.
Basic Data Science: A branch of Data Science research focused on clearly defined goals and oriented toward developing and establishing an understanding of relationships across and within phenomena.
Applied Data Science: A branch of Data Science oriented toward the development of practical applications, technologies and other interventions, including engineering practices. Applied Data Science bridges the gap between Basic Data Science and the engineering domains to provide predictable, usable tools to industries, including standard methods and practices.
Data Science Practice: The regular performance of Applied Data Science activities and methods for private and public organizations. Practitioners may work externally or internally. Practice may necessitate additional disciplines based on the needs of the organization, including domain expertise and the communication skills that support presentation and reporting activities.
Data Scientist: A person that studies or has expert knowledge of the interdisciplinary field of Data Science.
Blue Sky Data Scientist: A person that studies or researches in the branch of Blue Sky Data Science.
Basic Data Scientist: A person that studies, researches or has expert knowledge in the branch of Basic Data Science.
Applied Data Scientist: A person that studies or researches in the branch of Applied Data Science.
Note that this is a preliminary list and is not complete. The profession of data science will evolve to create many specializations. After all, it took law and medicine over one hundred years to evolve as professions with different specialties.