According to recent IBM case studies, 2.7 zettabytes of data exist in the digital universe today. Yes, you read that right: zettabytes, or 10^21 bytes. A zettabyte is 1,000 times larger than an exabyte (which is itself a billion times larger than a gigabyte). From a visual perspective, if the cup of coffee on your desk were equivalent to one gigabyte, a zettabyte would have the same volume as the Great Wall of China. Now multiply that by 2.7.
Now that’s a lot of data – but is it actually possible to make sense out of it?
Most of us have heard the term Big Data, but there is still a lot of confusion about what it actually means. In some spheres, it has been defined as datasets so large that they are "constantly moving" — in other words, collections of data too large to be measured with conventional software analysis tools. In other spheres, it has been described as a new class of economic asset, like gold or currency: data that businesses can exploit for its predictive power. Such data monitors and measures people and machines with clever algorithms and has the potential to predict behavior of all kinds.
Whichever definition you adhere to, the key to understanding Big Data is that it is a revolution in data measurement. It encompasses the advancing trends in technology for dealing with data of unmanageable proportions. MIT economist Erik Brynjolfsson compared this revolution to the impact of the microscope: "Data measurement is the modern equivalent of the microscope…the microscope, invented four centuries ago, allowed people to see and measure things as never before – at the cellular level."
Can you imagine what the world would be like if our data could be seen in a new way, like a scientist's revelations on first seeing an organism under a microscope? Unfortunately, becoming "scientific" in data measurement is not an easy task, and mishandled Big Data can be very bad for business. Let's look at the top three challenges organizations are facing:
Companies that implement Big Data analytics platforms are often simply “data hoarders.” This means they collect vast amounts of information, but are then unable to delete any data for fear that they can mine some sort of value out of it.
According to survey results from the Compliance, Governance, and Oversight Council, organizations on average need to archive only about 2 to 3% of their data for legal hold, 5 to 10% to meet regulatory requirements, and 25% for business analysis and insights. Hoarding costs arise because businesses simply do not know what information to keep and what to trash, so they end up moving the data around rather than solving the underlying problem.
Keeping poor data around can cost businesses 20 to 35% of their operating revenue. What's worse, bad or poor-quality data costs U.S. businesses $600 billion annually.
To solve expenditure problems with Big Data, focus is being shifted from Big Data to Smart Data. This comes in the form of using visualization tools associated with the data, driving automation processes, or democratizing analytics tools so that managing the data is a collaborative process. In short, Smart Data is data that can be used to make decisions.
For example, connecting product lifecycle management (PLM), service lifecycle management (SLM), enterprise resource planning (ERP), and content management systems (CMS) can streamline delivery of up-to-the-minute data. With access to this information, users can make more informed decisions that minimize downtime and reduce the risk of error.
Another fear is that improperly managed data can offer false truths. Kate Crawford, a Microsoft researcher, noted that Big Data may not be giving us the whole picture. "In fact," she stated, "we may be getting drawn into particular kinds of algorithmic illusions."
Illusions arise when sophisticated mathematical models are applied to highly aggregated data. Small bits of essential information can be lost in aggregation, changing the meaning of any given analysis. Put simply, we assume Big Data offers a faithful descriptive summary of the statistical information being collected. The "trap" occurs because we don't realize that particularly large datasets often include an unintentional bias, tabulating results that don't accurately represent the full collection.
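A classic instance of this aggregation trap is Simpson's paradox: a trend that holds in every subgroup can reverse when the subgroups are combined. The sketch below uses invented numbers (the segment names and figures are purely illustrative, not from any real dataset) to show how an aggregate comparison can contradict every one of its parts.

```python
# Hypothetical illustration of an "algorithmic illusion": Simpson's paradox.
# A treatment outperforms a control in every subgroup, yet looks worse in
# the aggregate, because the subgroups differ in size and difficulty.
# All numbers are invented for demonstration only.

groups = {
    # segment: (treatment_wins, treatment_total, control_wins, control_total)
    "small accounts": (81, 87, 234, 270),
    "large accounts": (192, 263, 55, 80),
}

# Within each segment, the treatment's success rate beats the control's.
for name, (tw, tt, cw, ct) in groups.items():
    print(f"{name}: treatment {tw / tt:.1%} vs control {cw / ct:.1%}")

# Aggregating across segments reverses the comparison.
tw = sum(v[0] for v in groups.values())
tt = sum(v[1] for v in groups.values())
cw = sum(v[2] for v in groups.values())
ct = sum(v[3] for v in groups.values())
print(f"overall: treatment {tw / tt:.1%} vs control {cw / ct:.1%}")
```

Here the treatment wins in each segment individually but loses overall — exactly the kind of result that, read only at the aggregate level, leads an analyst to the wrong conclusion.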
The misinterpretation of Big Data extends even further with the fact that data is always embedded in a context. Basically, numbers don’t speak for themselves; researchers, data scientists, and even employees must interpret unstructured data and place it in the right context for their given purpose or business decision. Thus, in order for Big Data to work effectively, the context must be realized along with its limitations and biases.
Eliminating “blindness” in Big Data requires it to be modeled in parallel with small data in order to offer the required depth to eliminate illusions. But how can this be done?
One solution is to provide a single data aggregation system that can filter information delivery according to defined user roles. In this way, users are better able to organize, access, and view large datasets at both high and low levels. With multiple user role options, such as Viewer, Publisher, Admin, Reviewer, Librarian, and Commenter, a harmonious system of checks and balances is established.
Top-level user roles control the data at the highest levels, while analysis of the data is broken down into specific steps. Individual roles, with distinct tasks and security settings, are limited to certain modifications or additions to the data. The ability to publish, create, search, filter, and edit the dataset is divided among users. This hierarchy creates an efficient, repeatable workflow that enables users to obtain the most relevant data available.
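The hierarchy described above can be sketched as a simple role-to-permission mapping. This is a minimal illustration using the six role names mentioned earlier; the specific permission sets assigned to each role are assumptions for the example, not the configuration of any real product.

```python
# Minimal sketch of role-based access to a dataset, assuming the six
# roles named above. The permission assignments are illustrative only.

ROLE_PERMISSIONS = {
    "Viewer":    {"search", "filter", "view"},
    "Commenter": {"search", "filter", "view", "comment"},
    "Reviewer":  {"search", "filter", "view", "comment", "review"},
    "Publisher": {"search", "filter", "view", "create", "publish"},
    "Librarian": {"search", "filter", "view", "create", "edit", "organize"},
    "Admin":     {"search", "filter", "view", "create", "edit", "publish",
                  "organize", "comment", "review", "manage_users"},
}

def can(role: str, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())

# A Publisher may publish, but a Viewer may only read and search.
assert can("Publisher", "publish")
assert not can("Viewer", "edit")
```

Checks like `can(role, action)` at every access point are what establish the "checks and balances": no single role can both modify and approve the same data unless explicitly granted both permissions.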
With so much emphasis on interpreting data with depth and at low cost, the world needs people who can understand data intelligently. The problem is that there is a large gap between the number of individuals who can perform this task and the demand for them. A recent report cited workforce demand for an additional 140,000 to 190,000 professionals with deep analytical talent in the next five years alone. So how can organizations fill the need to analyze Big Data today, when the manpower simply isn't available?
Until that shortage of data analysis professionals is filled, organizations must rely on innovative tools that give them access to the information necessary to make critical business decisions.
At TerraXML, we have a clear understanding of how to turn Big Data into Smart Data. Our employees come from a variety of backgrounds rooted in mathematics, computer science, and enterprise consulting, but we are united in our ability to leverage data to help organizations make sense of their information. TerraXML partners with organizations to develop individual solutions that address specific business needs. Our solutions are designed to be customizable, allowing us to cost-effectively address an almost limitless array of enterprise needs. This eliminates the need for solutions to be developed internally, which can come at high cost and with high potential for error.
For more information on improving information access and delivery, visit us at terraxml.com.
* Please note that this blog post was originally authored by Michelle Hastings. The original blog post can be found on the TerraXML website at: http://www.terraxml.com/community/blog/bid/274592/is-big-data-bad-f...