Grave Mistakes that Companies Make in Big Data Projects

Guest blog by Suha Emma:
Big data and analytics are now ubiquitous: just about every organization deploys them to improve business outcomes. One of the primary purposes of a big data implementation is to incorporate additional data sets into the existing data infrastructure, so that the company can ask virtually any question of the combined data. But big data is not just about handling large volumes of data, and there are common mistakes that enterprises need to avoid when implementing big data projects if they want better decision support and sharper analytical insights.
 
Lack of Business Case
 
Integrating big data into a company's decision support platform requires a proper business case with well-developed requirements for every gap it is meant to close. For instance, a logistics company using social media data for brand monitoring and for understanding customer expectations needs many variables in its business case, including geospatial information about social media users, competitive brand analysis, and market analytics, if it is to regain market share and customer confidence. Without such a business case, the project has no clear target and the effort is hindered from the start.
 
Minimizing Data Relevance
 
It is important to understand the relevance of big data sets to specific business requirements. Today, big data is available in diverse shapes and sizes:
 
  • Unstructured data: audio, text, video, and images.
  • Semi-structured data: spreadsheets, email, earnings reports, and software modules.
  • Structured data: actuarial models, machine data, sensor data, financial models, mathematical model outputs, and risk models.
While most enterprises have access to these data sets, they generally do not understand their relevance to business analytics. Without the appropriate relevance and context, the analytics tend to be skewed heavily (and unnecessarily) by the additional data, as in the sketch below.
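
As a minimal illustration (the data set and column names below are invented for this sketch, and pandas is assumed), one guard against relevance skew is to project incoming data down to just the fields that map to a stated business question before any analytics run:

    import pandas as pd

    # Hypothetical social media feed; the columns are illustrative only.
    posts = pd.DataFrame({
        "user_id": [1, 2, 3],
        "text": ["late delivery again", "love the new app", "package lost"],
        "follower_count": [120, 95000, 40],
        "favorite_color": ["red", "blue", "green"],  # present, but irrelevant
    })

    # Keep only the fields that answer the stated business question
    # ("what are customers saying about our service?"); dropping
    # irrelevant columns up front keeps downstream analytics from
    # being skewed by data with no business meaning.
    RELEVANT = ["user_id", "text", "follower_count"]
    print(posts[RELEVANT])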
 
Underestimating Data Quality
 
Poor data quality ruins analytics, especially in big data projects. Integrating unstructured and semi-structured data into existing data sets can degrade quality considerably, so it is important to understand the impact of data quality and to resolve problems before processing the big data. For instance, with unstructured data, organizations may use taxonomies, semantic libraries, ontologies, third-party sources, and reliable end-user input to enhance the quality of video and image data acquired from the internet and other sources. Similarly, semi-structured data with numeric or text values needs to be processed to ensure its accuracy and validity. Skipping this step results in skewed data and damages the enterprise's analytical systems.
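
A minimal sketch of that validation step for semi-structured numeric fields, assuming pandas is available (the feed and its values are invented):

    import pandas as pd

    # Hypothetical semi-structured feed (e.g. values pulled from
    # spreadsheets or email): numeric fields arrive as free text.
    raw = pd.DataFrame({
        "order_id": ["A1", "A2", "A3", "A4"],
        "amount":   ["19.99", "N/A", "23,50", "-"],
    })

    # Normalize a common decimal-comma style, then coerce anything
    # that still fails to parse to NaN instead of silently keeping
    # bad strings that would skew the analytics.
    amount = raw["amount"].str.replace(",", ".", regex=False)
    raw["amount_clean"] = pd.to_numeric(amount, errors="coerce")

    # Quantify the damage before deciding how to repair or exclude it.
    print(raw)
    print("unparseable:", f"{raw['amount_clean'].isna().mean():.0%}")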
 
Overlooking Data Granularity
 
Big data, particularly semi-structured and textual data, is highly ambiguous, and the grain of the data (the level of detail each record represents) is rarely defined within the acquired data itself. This ambiguity surfaces only when organizations discover the granularity while processing the data sets, leaving them unable to associate and process the hierarchy levels linked to their metrics and producing erroneous result sets that skew analytical outputs. Hierarchies also turn elastic when rolled-up and jagged data appear in the same set, and associating the wrong data grains in a relationship creates errors throughout integration and analysis.
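
To make the mixed-grain problem concrete, here is a small hypothetical example (the records are invented, pandas assumed): aggregating a set that mixes detail rows with pre-rolled totals double-counts unless the grains are separated first:

    import pandas as pd

    # Hypothetical sales records at mixed grains: some rows are
    # per-store figures, one is a pre-rolled regional total.
    rows = pd.DataFrame({
        "region": ["North", "North", "North", "South"],
        "store":  ["N-01",  "N-02",  None,    "S-01"],  # None = rolled-up row
        "sales":  [100,     150,     250,     80],
    })

    # Summing everything would report North = 500 by counting the
    # rolled-up row on top of its own detail. Separate the grains
    # first, then aggregate each at its own level.
    store_level  = rows[rows["store"].notna()]
    region_level = rows[rows["store"].isna()]

    print(store_level.groupby("region", as_index=False)["sales"].sum())
    # region  sales
    # North     250
    # South      80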
 
Improper Contextualization of Data
 
Contextualization is the fundamental logic behind executing text analytics and processing textual data. Without proper context, the data is processed inaccurately and produces erroneous analytics. Beyond contextualization, many other steps, such as resolving alternate spellings, disambiguating homographs, and categorizing text, have to be undertaken to improve the data and let organizations derive more value from their processing efforts.
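
As an illustrative sketch (the variant map and categories below are invented), here is one of those steps in miniature: collapsing alternate spellings to a canonical form before categorizing text:

    # Map alternate spellings to one canonical form.
    VARIANTS = {
        "colour": "color",
        "cancelled": "canceled",
        "e-mail": "email",
    }

    # Toy keyword-based categories for customer feedback.
    CATEGORIES = {
        "delivery": {"shipping", "package", "courier"},
        "billing":  {"invoice", "refund", "charge", "canceled"},
    }

    def normalize(tokens):
        return {VARIANTS.get(t.lower(), t.lower()) for t in tokens}

    def categorize(text):
        tokens = normalize(text.split())
        return [cat for cat, words in CATEGORIES.items() if tokens & words]

    # Without normalization, "cancelled" would never match "canceled".
    print(categorize("Refund for cancelled package"))
    # ['delivery', 'billing']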
 
Ignoring Data Preparation
 
Big data processing requires essential data preparation, both before processing begins and during the processing cycles, including additional input for metadata and taxonomies where needed. In most cases, organizations ignore the preparation steps that govern how acquired data is named, parsed, enriched, and associated with metadata, and they fail to pay special attention to date/time formats, ambiguous data, master/metadata, or column values. Inadequate preparation before downstream processing routinely causes problems for big data operators and users.
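
A minimal sketch of one such preparation step, again assuming pandas (the records are invented): enforcing an expected date format up front so ambiguous values are flagged instead of silently propagated downstream:

    import pandas as pd

    # Hypothetical incoming records; the date column mixes a clean
    # ISO value with entries that need explicit handling.
    raw = pd.DataFrame({
        "event": ["signup", "purchase", "churn"],
        "when":  ["2015-09-10", "2015-09-11", "not a date"],
    })

    # Enforce one expected format; failures become NaT so bad values
    # surface explicitly rather than corrupting downstream joins.
    raw["when_parsed"] = pd.to_datetime(raw["when"], format="%Y-%m-%d",
                                        errors="coerce")

    # Unparseable rows become a work queue for data stewards.
    print(raw[raw["when_parsed"].isna()])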
 
Beyond the familiar velocity, volume, and variety, the complexity of big data brings many other risks to the implementation of big data programs. However, careful learning and planning go a long way toward making these programs successful.
 
All the best!

Author: Suha Emma


