In this article, I share some of my unusual views about big data and data science. My next article will be about new trends in data science.
Big data is necessary in more applications than people think
And it is already being used with success in many of them.
Healthcare, HR, and law will also benefit from big data: to create customized drugs and reduce healthcare costs, to hire the right candidates by automatically analyzing their tweets, and to better identify criminals (currently, many cases are not pursued because the evidence, based on expert opinion rather than on data, is not strong enough for a conviction).
Small companies can leverage external data
They don't necessarily need internal data. Third-party research data (though not always accurate or filtered) will tell you a lot about your market and your competitors (in my case, I use Quantcast and similar vendors). It is a great source of competitive intelligence. If you outsource some of your processes (newsletter management, for instance) to a vendor, the vendor will provide detailed reports about conversions, clicks, and unsubscribes, broken down per segment. This data becomes more valuable if you combine it with Google Analytics tracking of your websites.
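As a rough illustration of combining a vendor report with web tracking data, here is a minimal Python sketch. Everything in it is an assumption made for the example: the segment names, the field names, the numbers, and the `combine_by_segment` helper are hypothetical. It does not reflect the export format of any particular newsletter vendor or of Google Analytics.

```python
from collections import defaultdict

# Hypothetical per-segment newsletter report, as a vendor might export it.
vendor_report = [
    {"segment": "analysts", "sent": 1000, "clicks": 120, "unsubscribes": 5},
    {"segment": "managers", "sent": 500, "clicks": 30, "unsubscribes": 10},
]

# Hypothetical web analytics rows for the same campaign, tagged by segment.
analytics_rows = [
    {"segment": "analysts", "sessions": 100, "conversions": 8},
    {"segment": "managers", "sessions": 25, "conversions": 1},
    {"segment": "analysts", "sessions": 10, "conversions": 2},
]


def combine_by_segment(vendor_rows, web_rows):
    """Join the vendor report with web analytics totals, one entry per segment."""
    # Sum the web analytics rows per segment first.
    totals = defaultdict(lambda: {"sessions": 0, "conversions": 0})
    for row in web_rows:
        totals[row["segment"]]["sessions"] += row["sessions"]
        totals[row["segment"]]["conversions"] += row["conversions"]

    combined = {}
    for row in vendor_rows:
        web = totals[row["segment"]]
        clicks = row["clicks"]
        combined[row["segment"]] = {
            "click_rate": clicks / row["sent"],
            "unsubscribe_rate": row["unsubscribes"] / row["sent"],
            # Guard against segments with no clicks at all.
            "conversions_per_click": web["conversions"] / clicks if clicks else 0.0,
        }
    return combined


if __name__ == "__main__":
    for segment, metrics in combine_by_segment(vendor_report, analytics_rows).items():
        print(segment, metrics)
```

The point of the join is that neither source alone shows how newsletter clicks turn into on-site conversions. In practice, the key shared by both sources (here, a segment label) is the part that needs the most care.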
Interestingly, for us as a digital publisher, data is our #5 asset, after our members, the people working with us, our content, and the volume and quality of our traffic. We use data to understand our community, to identify opportunities and trends, and to produce research reports that are valuable to our clients (these reports influence what material they promote with us, help them understand trends and opportunities, and eventually boost the ROI that they derive from us).
Many still don't understand what data science is
Some think that a data scientist is a statistician or a machine learning guy. Data science overlaps with many fields but has its own core. See the following two articles for details:
Here are three technical articles that show how the data science approach is different from the statistical approach:
Much of data science is actually about automating the job of the statistician or other analytic experts, providing simple, robust, black-box solutions that can be used reliably by non-experts. Data scientists also have business and domain knowledge. If you hire a pure geek because he has R, Hadoop, and Python on his resume, and pay him a $160k salary, your ROI on this employee might not be positive. Plenty of great candidates go undetected by HR radars because of the way automated resume filtering currently works. In the end, companies believe that data scientists are unicorns, and data scientists believe that jobs are scarce and hard to get. I often suggest that data scientists become consultants, create their own company, sell data, become a publisher, or develop data-intensive apps or systems (for instance, a platform that would display the price of most medical procedures for each hospital, based on sample data and crowdsourcing).
Data science training must change
Because many traditional data science programs are just a relabeling of operations research, statistics, or computer science curricula, taught by adjuncts who are paid very low salaries and have no business experience, we now have a bunch of candidates who are not real data scientists, compounding the myth that (real) data scientists are unicorns.
Things are changing for the better: programs like Zipfian Academy (sponsored by LinkedIn, Facebook, etc.) or our Data Science Apprenticeship are project-based, free, online, and on-demand, last only a few months, and allow students to work on projects that benefit the parent organization. It is a win-win for the students and the training organization. Finally, a lot of free data science material can be found on DataScienceCentral: some state-of-the-art, some research-level, but most of it applied to real, modern business problems.