In this article, I share some of my unusual views about big data and data science. My next article will be about new trends in data science.

Big data is necessary in more applications than people think

It is also used with success. Applications include:

  • Scoring transactions (credit card purchases) in real time
  • Automated piloting
  • Monitoring car traffic (sending alerts, finding optimal routes); the same applies to air traffic
  • Digital publishing (feed management, click and revenue attribution, tagging and content categorization)
  • Prediction of earthquakes, volcanic eruptions, local or long-term weather, and solar flares
  • Automated investment strategies (Wealthfront is doing that for Google employees)
  • Census data
  • Automated bidding strategies (stock market, AdWords)
  • Relevancy engines, detection of fake reviews
  • NSA, network security, tax fraud, insurance, healthcare fraud detection
  • Military intelligence: satellite image analysis to detect weapons being moved around (tanks, missiles etc.), detecting and analyzing financial transactions from state-sponsored terrorists
  • NASA - discovery of new planets (astrophysics)
  • Analysis of mobile data (text messages sent by users) or tweets, to monitor the spread of a disease (Ebola, etc.)

Healthcare, HR, and law will also benefit from big data: to create customized drugs and reduce healthcare costs, to hire the right candidates by automatically analyzing their tweets, and to better identify criminals (currently, many cases are not pursued because the evidence, based on expert opinion rather than on data, is not strong enough for a conviction).

Small companies can leverage external data

They don't necessarily need internal data. Third-party research data (though not always accurate or filtered) will tell you a lot about your market and your competitors (in my case, I use Quantcast and similar vendors). It is a great source of competitive intelligence. If you outsource some of your processes (for example, newsletter management) to a vendor, your vendor will provide detailed reports about conversions, clicks, and unsubscribes, broken down per segment. This data can be enhanced if you combine it with Google Analytics tracking of your websites.
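As a rough illustration of combining a vendor report with web tracking data, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the segment names, the field names, the numbers, and the idea that both sources can be keyed on the same segment label (for instance because newsletter links are tagged per segment). It does not reflect any particular vendor's report format or the Google Analytics export format.

```python
# Hypothetical per-segment rows, as a newsletter vendor might report them.
vendor_report = [
    {"segment": "analysts", "sent": 1000, "clicks": 120, "unsubscribes": 4},
    {"segment": "executives", "sent": 400, "clicks": 30, "unsubscribes": 6},
]

# Hypothetical per-segment rows exported from a web analytics tool.
web_analytics = [
    {"segment": "analysts", "sessions": 95, "conversions": 12},
    {"segment": "executives", "sessions": 22, "conversions": 5},
]


def combine_by_segment(vendor_rows, analytics_rows):
    """Join both sources on 'segment' and derive a few simple rates."""
    analytics_by_segment = {row["segment"]: row for row in analytics_rows}
    combined = []
    for row in vendor_rows:
        # A segment missing from the analytics export counts as zero conversions.
        web = analytics_by_segment.get(row["segment"], {})
        conversions = web.get("conversions", 0)
        clicks = row["clicks"]
        combined.append({
            "segment": row["segment"],
            "click_rate": clicks / row["sent"],
            "unsubscribe_rate": row["unsubscribes"] / row["sent"],
            "conversions_per_click": conversions / clicks if clicks else 0.0,
        })
    return combined


report = combine_by_segment(vendor_report, web_analytics)
for line in report:
    print(line)
```

The vendor report alone tells you who clicked; joined with on-site tracking, it also shows which segments go on to convert, which is the kind of enhancement described above.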

Interestingly, for us as a digital publisher, data is our #5 asset, after our members, the people working with us, our content, and the volume and quality of our traffic. We use data to understand our community, to identify opportunities and trends, and to produce research reports that are valuable to our clients (they influence what material clients promote with us, help them understand trends and opportunities, and eventually boost the ROI that they derive from us).

Many still don't understand what data science is

Some think that a data scientist is a statistician or a machine learning guy. Data science overlaps with many fields, and has its own core. See the following two articles for detail:

Here are three technical articles that show how the data science approach is different from the statistical approach:

Much of data science is actually about automating the job of the statistician or other analytic experts, providing simple, robust, black-box solutions that can be used reliably by non-experts. Data scientists also have business and domain knowledge. If you hire a pure geek because he has R, Hadoop, and Python on his resume, and pay him a $160k salary, your ROI on this employee might not be positive. Plenty of great candidates go undetected by HR radars because of the way automated resume filtering currently works. In the end, companies believe that data scientists are unicorns, and data scientists believe that jobs are scarce and hard to get. I often suggest that data scientists become consultants, create their own company, sell data, become a publisher, or develop data-intensive apps or systems (for instance, a platform that would display the price of most medical procedures for each hospital, based on sample data and crowdsourcing).

Data science training must change

Because many traditional data science programs are just a relabeling of operations research, statistics, or computer science curricula, taught by adjuncts who are paid very low salaries and have no business experience, we now have a bunch of candidates who are not real data scientists, compounding the myth that (real) data scientists are unicorns.

Things are changing for the better: programs like Zipfian Academy (sponsored by LinkedIn, Facebook, etc.) or our Data Science Apprenticeship are project-based, free, online, on-demand, last only a few months, and allow students to work on projects that benefit the parent organization. It is a win-win for the students and the training organization. Finally, a lot of free data science material can be found on DataScienceCentral, some state-of-the-art, some research-level, but most applied to real, modern business problems.


Comment by Sione Palu on November 10, 2014 at 1:00pm

I believe that Renaissance Technologies, a top Wall Street hedge fund, was already doing big data analytics on market data for automated/algorithmic trading more than a decade ago, before big data became as common as it is today:

"Renaissance hedge fund: Only scientists need apply"

