The growth of the digital economy has resulted in torrents of data. This trend will only continue, because data is the language of technology. As companies increase their reliance on technology, the data they create, and their need to analyze it, will also increase.

The growth of data has given rise to a class of problems that we call, for lack of a better term, big data analytics. The common requirements for solving this class of problems, loosely, are:

  • Tell me what’s in my data
  • What are some outcomes that I can track? (Machine failure, network slowdown, etc.)
  • What indicators are related to these outcomes?
  • How can I respond to these indicators and influence these outcomes?

The broad approach to these kinds of problems is search- or query-based analytics. The approach is rooted in traditional statistics, where a central tenet of the scientific method is hypothesis testing. If we do not know what’s in the data, we present a hypothesis and then use queries, or questions, to piece a solution together.

A result of this lineage is modern business intelligence, an ad hoc analysis designed to answer a single business question. The answer to this question is typically a statistical model, analytic report, or other type of data summary delivered on demand to the business user.

SAS, the reigning giant of statistical modeling software, defines big data analytics as “(T)he process of examining big data to uncover hidden patterns, unknown correlations and other useful information that can be used to make better decisions.”

But the number of possible queries in a data set is very large.

Analysts and data scientists continue to discover new ways to store more data and make our queries run faster, but the additional complexity of more data very quickly outpaces our ability to create more and better queries.
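To get a rough feel for how quickly the number of possible queries grows, here is a minimal sketch. It assumes a deliberately simplified notion of a "query": a conjunction of equality filters in which each column is either left unconstrained or pinned to one of its distinct values. The function name and the example column counts are illustrative assumptions, not figures from any real data set.

```python
def count_filter_queries(distinct_values_per_column):
    """Count non-empty conjunctive equality queries over a table.

    Each column contributes (k + 1) choices: one of its k distinct
    values, or "no constraint". Subtract 1 to exclude the query that
    constrains nothing at all.
    """
    total = 1
    for k in distinct_values_per_column:
        total *= k + 1
    return total - 1


# Hypothetical tables: 5, 10, and 20 columns with 10 distinct values each.
for n_columns in (5, 10, 20):
    n_queries = count_filter_queries([10] * n_columns)
    print(f"{n_columns} columns -> {n_queries:,} possible queries")
```

Even under this narrow definition the count grows exponentially with the number of columns, and real queries (ranges, joins, aggregations) only enlarge the space further.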

Gartner stated at a recent conference,

“Data is inherently dumb. It doesn’t do anything unless you know how to use it, how to act on it, because algorithms is where the real value lies. Algorithms define action.”

The No Query approach requires that the algorithm compute the candidate queries itself and rank them by relevance (much as Google’s PageRank algorithm ranks web pages).
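As a toy illustration of that idea, the sketch below enumerates every single-column equality filter in a small table and ranks the filters by how far the outcome rate among matching rows departs from the overall rate. This is not PageRank and not any vendor's actual method. The scoring rule, the function name `rank_queries`, and the sample machine records are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_queries(rows, outcome):
    """Generate all (column == value) filters and rank them by relevance.

    Relevance here is an assumed, simplistic score: the absolute gap
    between the outcome rate among matching rows and the overall rate.
    """
    base_rate = sum(1 for row in rows if row[outcome]) / len(rows)

    # (column, value) -> [rows matched, matched rows where outcome is true]
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        for column, value in row.items():
            if column == outcome:
                continue
            stats = counts[(column, value)]
            stats[0] += 1
            if row[outcome]:
                stats[1] += 1

    scored = [
        (abs(hits / matched - base_rate), column, value)
        for (column, value), (matched, hits) in counts.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(column, value, round(score, 3)) for score, column, value in scored]


# Hypothetical machine records; "failed" is the outcome being tracked.
rows = [
    {"temp": "high", "site": "A", "failed": True},
    {"temp": "high", "site": "B", "failed": True},
    {"temp": "normal", "site": "A", "failed": False},
    {"temp": "normal", "site": "B", "failed": False},
    {"temp": "normal", "site": "A", "failed": False},
    {"temp": "high", "site": "B", "failed": True},
]

ranked = rank_queries(rows, "failed")
for column, value, score in ranked:
    print(f"{column} == {value!r}: relevance {score}")
```

In this made-up data the temperature filters rank above the site filters, so the analyst is pointed at an indicator without having to guess which question to ask first. A practical system would need a more robust relevance measure and a way to prune the query space rather than enumerate it exhaustively.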

I would love to hear what kinds of tools you use and how you query your data. What challenges do you face in querying it?

The original blog can be seen here.

Tags: analytics, discovery, hadoop
