
Do you know what is bigger than Big Data?

From episode 10 of my Naked Analyst Channel on YouTube.

I think I do - and it is the ‘appification’ of analytics. What I mean by this is the reduction of a complex analytic activity, such as market segmentation, down to a single button on your computer interface. Very much like the apps on your smartphone, tablet or, increasingly, your desktop.

That’s what it looks like, but the impacts are more profound. That’s because it makes it possible for analytics to be done successfully by people who may not understand how it works, but do understand the ‘why’ and ‘when’ of doing it.

For example, a marketer in a company can access more sophisticated views of their campaigns without the need for a specialist analyst. Appification extends the range of analytic things that a non-specialist can do.

This appification is made possible because of three things that have emerged in recent years:

  1. The rapid increase in the number and sophistication of APIs (Application Programming Interfaces).
  2. The rise of open source analytic platforms like R. These platforms have created vast libraries of algorithms, freely available to anyone who knows how to use the platform.
  3. The sheer number of people and organisations involved in creating open source analytic platforms like R.

The last enabler needs a little further explanation. R is a free programming language and software environment for statistical computing and graphics. It contains thousands of packages (around 10,000) specializing in topics like econometrics, data mining, spatial analysis, and bio-informatics. Nobody knows how many R users there are, but reliable estimates put the number in the millions, and many thousands of people have helped R develop over the years. I think that this sort of large-scale, self-organising open source effort is beginning to teach the world how to use analytic algorithms.
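To make the ‘appification’ idea concrete, here is a minimal sketch of what hiding a complex analytic task behind a single call looks like. Everything here is hypothetical and illustrative: the function name, the toy k-means routine, and the data are mine, not from any product mentioned in this post.

```python
# A sketch of "appification": market segmentation reduced to one call.
# The toy k-means below is deliberately simple; a real app would wrap a
# production library behind the same single-call interface.

def segment_customers(points, k=2, iterations=20):
    """Cluster 2-D customer records into k segments (toy k-means)."""
    # Seed centroids with the first k points (deterministic for the demo).
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, (x, y) in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2
                              + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points))
                       if assignment[i] == c]
            if members:
                centroids[c] = [sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)]
    return assignment

# The "one button" for the marketer: spend vs. visits for six customers.
data = [(1, 2), (1, 1), (2, 1), (9, 9), (8, 9), (9, 8)]
print(segment_customers(data))  # two clearly separated segments
```

The marketer never sees the loop or the distance calculation; they see one button labelled ‘segment my customers’, which is exactly the point.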

The above is all supposition, but I can back it up with evidence. Here are four examples of algorithm markets - or at least services that exhibit varying degrees of ‘algorithm marketness’.

  • DataXu - senses and reacts in real time to changing consumer behavior. Openness is at the technical core of DataXu: the platform is a flexible technology stack with open APIs that make it easy to integrate and extend functionality.
  • - delivers machine learning for streaming data. The cloud platform makes it easy to use machine learning algorithms to classify streaming data from connected devices, turning the Internet of Things into the Internet of Action.
  • Snapanalytx - aims to provide predictive analytics for all and make them more accessible and affordable.
  • Algorithmia - is building a community around state-of-the-art algorithm development, where users can create, share, and build on other algorithms, then instantly make them available as a web service.

In conclusion, there is still one issue needing resolution before algorithm markets take off: how will the world’s business people get data into and out of these algorithm apps? I’m not sure yet, but I think the answer will be more apps - apps that themselves appify the transformation and loading of data into and out of algorithm apps.

Confused? Well, don’t be: like most new and shiny things, what we are talking about is just the next generation of ETL - something business intelligence people like myself have been building for the last 20 or more years.
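For readers who have not built one, the ETL pattern referred to above can be sketched in a few lines. This is a deliberately tiny illustration - the data, function names, and the ‘warehouse’ list are all invented for the example, not a real pipeline:

```python
import csv
import io

# Extract / Transform / Load in miniature: parse raw records, reshape
# them into what a downstream algorithm expects, and load the result
# into a destination store (here, just a Python list).

RAW = """customer,spend
alice,120.50
bob,80.00
carol,210.25
"""

def extract(text):
    """Extract: parse raw CSV text into row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types and derive a simple feature."""
    out = []
    for r in rows:
        spend = float(r["spend"])
        out.append({"customer": r["customer"],
                    "spend": spend,
                    "high_value": spend > 100})
    return out

def load(rows, destination):
    """Load: append the cleaned rows to the destination store."""
    destination.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract(RAW)), warehouse)
print(loaded, warehouse[0]["high_value"])  # 3 True
```

An ‘ETL app’ in the sense above would hide exactly this extract-transform-load chain behind one button, pointed at the user’s own data source.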

What’s old is new again? Maybe, but from my perspective the future looks exciting for analytics.

My YouTube channel contains tips from a 25-year veteran of the analytics profession.




Comment by Sione Palu on October 14, 2014 at 1:17pm

Here's a free Matlab dimensionality-reduction API that may be useful to users:

The author's paper on the review of the algorithms is here:

"Dimensionality Reduction: A Comparative Review"

Comment by Sione Palu on October 13, 2014 at 5:05pm

I think that Matlab has wider coverage in numerical computing than any other language. Not only does the product have extensive libraries and toolboxes of its own, but free Matlab APIs made available by academic researchers are abundant. For example, one can find free state-of-the-art dimensionality-reduction algorithms (recently published in the literature) on the net in Matlab - not just one algorithm but tons of them. One can find such algorithms in R or Python packages, but not many variants (perhaps at most four, counting standard ones like PCA, SVD, etc.). The reason is that Matlab was built from day one to be as wide as possible; it is the dominant tool in most engineering schools at universities around the world, with APIs in signal processing, statistics, image processing, control systems, econometrics, system identification and many more. R, by contrast, was originally developed only to target statistical analysis, though it has since widened its domain of application.

Comment by Alex Esterkin on October 11, 2014 at 9:02pm

A good blog, but I have to respectfully disagree that, with novel enabling data analytics products, platforms, and SaaS, data scientists will be replaced by full automation of data science. Arguably, these emerging products and solutions will empower data scientists to unlock greater value from continuously growing big data volume, variety, and velocity, rather than replacing them.

Comment by Sean McClure on October 9, 2014 at 7:33am

R is unlikely to make much headway in terms of deploying real-life solutions, as it is more of a "lab language" than something used in productization. This is why Python is rapidly becoming the data science language of choice. Black-boxing data science with the expectation that someone from business can simply push a button is a dangerous proposition and not something that is going to help data science ROI. There will be some tasks that users in, say, BI could handle, but unless the product is fully automated, as in HFT, business users should not be managing algorithms or their use on their business models. It is more about sitting down with domain experts and ensuring they understand what assumptions the models are making and where naive aspects of a model could fail. Business data is in constant flux, and the idea that a button can simply be pressed to act on that data implies a regular and sustained effort by many data science professionals actively working to make sure that button does something valid.
