Data is growing at an unimaginable speed, and it has been, and will remain, one of the most crucial driving forces behind the day-to-day activities we carry out. As we speak, exabytes of data are being added and processed each day, an astronomical amount that is hard to handle. Therein lies the major difficulty: because the volume is growing far faster than computing resources, we struggle to store and process data that increases exponentially every day.
But that is not the point here. Big data, however difficult to handle, is a key player today in every major sector. Often misjudged, the term Big Data has more to it than meets the eye. When you dig deeper, you will find the underlying properties that are the building blocks of what big data is today. Let's discuss them briefly first.
Variety: the data produced each day does not belong to one single category. It includes not only traditional data but also semi-structured data from sources such as web pages, web log files, social media sites, email, documents, and sensor data from both active and passive devices. The result is a mix of raw, structured, semi-structured, and unstructured data that traditional analytic systems struggle to handle.
Volume: the name Big Data itself points to volume. The data that exists today is measured in petabytes and is expected to grow to zettabytes in the near future. As a matter of fact, social networking sites alone produce and process terabytes of data.
Velocity: this is the speed at which data arrives from various sources. It covers not just the rate of incoming data but the speed at which data flows through a system. Take sensor devices: their readings stream continuously into the data store, and that volume is anything but small.
Variability: this aspect concerns inconsistencies in the data flow. Data loads can become challenging to manage, especially as growing social media usage causes peaks in load during major events.
Complexity: to link, match, cleanse, and transform data pouring in from various sources, it is necessary to connect and correlate relationships, hierarchies, and multiple data linkages; otherwise the data can quickly spiral out of control.
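The linking-and-cleansing step can be sketched in a few lines of Python. All the record fields and values below are invented purely for illustration; the point is that matching across sources requires normalizing keys before joining.

```python
# A minimal sketch of record linkage across two hypothetical sources.
# The field names, values, and cleaning rules here are illustrative
# assumptions, not a reference to any specific system.

def normalize_key(name: str) -> str:
    """Cleanse a linking key: trim, lowercase, collapse inner whitespace."""
    return " ".join(name.strip().lower().split())

# Two sources describing the same people with inconsistent formatting.
clinic_records = [
    {"patient": "  Jane DOE ", "diagnosis": "influenza"},
    {"patient": "John Smith", "diagnosis": "asthma"},
]
claims_records = [
    {"patient": "jane doe", "claim_amount": 120.0},
    {"patient": "JOHN  SMITH", "claim_amount": 75.5},
]

# Match and transform: index one source by normalized key, then join.
claims_by_key = {normalize_key(r["patient"]): r for r in claims_records}
linked = [
    {**rec, **claims_by_key.get(normalize_key(rec["patient"]), {})}
    for rec in clinic_records
]
```

Without the normalization step, none of these records would match, which is exactly how unlinked data "spirals out of control."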
Once you clearly understand what Big Data is, it is easy to see how a particular sector can benefit from it.
Big Data aims to improve overall public health by providing insights into the causes and effects of disease, better drug targets for precision medicine, and enhanced disease prediction and prevention. Scientists and researchers everywhere are using this information to promote health and wellness. Big Data has immensely improved our understanding of health-related behaviors such as smoking, drinking, excessive sleeping, and sleep deprivation, and it helps accelerate the knowledge-to-diffusion cycle.
However, there are times when we make 'big' errors with Big Data. In 2013, when influenza hit the United States, analysis of flu-related Internet searches drastically overestimated peak flu levels relative to those determined by traditional public health surveillance. This also highlights the potential for many false alarms triggered by large-scale examination of speculative associations with disease outcomes.
As a result, spurious correlations and ecological fallacies can multiply. There are numerous such examples, like "the number of people who drowned by falling into a pool correlates with the number of films Nicolas Cage appeared in."
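To see how easily such a spurious correlation arises, here is a toy Python sketch, with entirely made-up numbers, in which two unrelated yearly series correlate strongly simply because both happen to trend upward over time:

```python
# A toy illustration (not real data) of how a shared upward trend
# produces a high correlation between two causally unrelated series.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Two invented yearly counts that merely both grow over the decade.
swimming_pool_sales = [100, 105, 103, 110, 115, 112, 120, 125, 123, 130]
streaming_subscriptions = [2, 3, 3, 4, 5, 5, 6, 7, 7, 8]

r = pearson(swimming_pool_sales, streaming_subscriptions)
print(f"r = {r:.2f}")  # → r = 0.99, despite no causal link
```

The near-perfect correlation comes entirely from the shared time trend, which is exactly the ecological trap large-scale data mining can fall into.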
To strengthen outcomes in the healthcare sector, a strong epidemiological foundation is necessary. Big Data analytics currently focuses on convenience samples of people or on information available on the web. When associations are drawn between perfectly measured data, like a genome sequence, and poorly measured data, like administrative health claims, research accuracy is dictated by the weakest link.
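This "weakest link" effect can be sketched with a small simulation. The variables and noise levels below are assumptions chosen for illustration, not real health data; the point is that measurement error in just one variable attenuates the observed correlation.

```python
# Illustrative simulation: measurement noise in one variable weakens
# an observed association, even when the true relationship is strong.
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

random.seed(42)
true_exposure = [float(i) for i in range(200)]             # "perfectly measured" variable
outcome = [x + random.gauss(0, 5) for x in true_exposure]  # strongly related outcome

# The same exposure recorded with heavy measurement error
# (standing in for, e.g., administrative claims data).
noisy_exposure = [x + random.gauss(0, 60) for x in true_exposure]

r_clean = pearson(true_exposure, outcome)
r_noisy = pearson(noisy_exposure, outcome)
# r_noisy is markedly lower than r_clean: the poorly measured
# variable dictates the accuracy of the whole association.
```

No amount of extra data in the well-measured variable repairs this; only better measurement of the weak link does.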
Big Data is observational in nature and is replete with biases, including selection bias, confounding, and lack of generalizability. To counter these, Big Data analytics can be embedded in epidemiologically well-characterized and representative populations.
Harnessing Big Data for public health is where the last piece of the puzzle falls into place. The combination of a strong epidemiologic foundation, robust knowledge integration, the principles of evidence-based medicine, and an expanded translational research agenda can put Big Data on the right course.