In a nutshell, Moore's law says that every two years, computer capacity (memory, speed and so on) increases by a factor of 2. How does this apply to big data? Big data also seems to be growing exponentially; nobody will contest this statement.
Although data is growing exponentially, does information follow the same path? Information is extracted from data: it is the essence of data, what makes data valuable. If Information = F(data), where data is measured in petabytes and information (say) in entropy units, what is the shape of the function F? If F is linear, information is also growing exponentially. If F is a logarithm, information is growing only linearly.
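To make the two cases concrete, assume (as an illustration only, not a measured fact) that data doubles every two years, like the capacity in Moore's law, so that data volume is D(t) = D_0 · 2^{t/2} with t in years. The constants a and c below are hypothetical positive scale factors:

```latex
\begin{aligned}
D(t) &= D_0 \, 2^{t/2} \\
F(D) = aD \;&\Rightarrow\; I(t) = a D_0 \, 2^{t/2} && \text{(exponential in } t\text{)} \\
F(D) = c \log_2 D \;&\Rightarrow\; I(t) = c \log_2 D_0 + \tfrac{c}{2}\, t && \text{(linear in } t\text{)}
\end{aligned}
```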
By information, I mean information that (1) has been found in big data (sometimes called insights), as opposed to invisible or undetected information, and (2) has been used to provide added value.
My guess is that information is growing far more slowly than big data, but faster than linearly over time (that is, super-linearly). It's a bit like the growth of the Windows operating system. Windows might be 1,000 times bulkier than 30 years ago, but most of the new features added to Windows are rarely used, it still feels slow at times, and Excel spreadsheets are still limited to about 1,000,000 rows (1,048,576, to be exact). Sure, your machine is much faster and has far more memory, but that's not thanks to Windows. In short, multiplying data size by 2 does not multiply useful information by 2.
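A minimal sketch of that last point: compute F(2D) / F(D), the factor by which information grows when data doubles, for a few candidate shapes of F. The candidates (linear, square root, base-2 logarithm) are hypothetical illustrations, not measured shapes, and D = 2^50 bytes (1 PiB) is an arbitrary choice.

```python
import math

# Hypothetical candidate shapes for F (information as a function of data volume D).
# None of these is claimed to be the true F; they only illustrate the argument.
CANDIDATES = {
    "linear": lambda d: d,
    "square root": lambda d: math.sqrt(d),
    "logarithm": lambda d: math.log2(d),
}


def doubling_gain(f, d):
    """Return F(2D) / F(D): how much information grows when data doubles."""
    return f(2 * d) / f(d)


if __name__ == "__main__":
    d = 2 ** 50  # 1 PiB, in bytes
    for name, f in CANDIDATES.items():
        print(f"{name:12s} F(2D)/F(D) = {doubling_gain(f, d):.3f}")
```

Only the linear F doubles. The square root gives about 1.414 whatever D is; applied to data that doubles every two years, it still grows exponentially but at half the rate (doubling every four years), which is one way to get information that grows far more slowly than data yet super-linearly over time. The logarithm gives 51/50 = 1.02 at 1 PiB.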
How would you measure information growth?
I think it's a similar process to pouring an excessive amount of nutrients into a pond: the algae start to react. It takes a bit of time, but in the end you can see only algae. We react slowly at first to the excessive amount of data because of our slow education system. Most jobs require an MSc, and gaining experience also takes time. If we could accelerate our learning rate with programs like Data Science Apprenticeship, Coursera and Udacity (but with complete course systems, like a university), and keep them resilient and in step with the cutting edge of technology, our response time would decrease dramatically.
Another way would be to work on automated data processing methods, something like a universal artificial intelligence.
I predict we'll keep pace with this huge amount of data within 10-15 years.
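I think the curves will look like this (green is "amount of data processed per year", blue is "amount of data arising per year"):
[Figure not available: amount of data processed per year (green) vs. amount of data arising per year (blue)]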
There are formal definitions of information in a few fields of science and in mathematics (A. M. Yaglom & I. M. Yaglom, 1983, Probability and Information). Defining information as it applies to big data requires specific work.
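One standard formal definition is Shannon entropy, measured in bits. A minimal sketch, assuming the "data" is a sequence of symbols and using the empirical symbol frequencies as probabilities (a simplifying assumption; real big-data sources are rarely this simple):

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Shannon entropy H = -sum(p * log2(p)) of the empirical symbol distribution, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, "abcd" scores 2 bits per symbol and "aaaa" scores 0. Because this estimate looks only at symbol frequencies, "abab" repeated a thousand times still scores 1 bit per symbol: it ignores ordering and repetition, so for big data a model-based or compression-based estimate may be more realistic.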
Comparing the growth of zipped (compressed) file sizes with the growth of uncompressed volume would be a good indicator that data volume grows faster than information.
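A minimal sketch of this indicator, using Python's standard zlib (DEFLATE) module as a stand-in for "zipped" size. Two assumptions: compressed size is only a rough proxy for information content, and the hypothetical growth model below increases volume purely by storing more copies of the same content. The 4 KiB block is kept well inside DEFLATE's 32 KiB window so that the copies can be detected.

```python
import hashlib
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes after DEFLATE compression (zlib, level 9): a rough proxy for information content."""
    return len(zlib.compress(data, 9))


def make_block(n_bytes: int = 4096) -> bytes:
    """Deterministic pseudo-random (hence nearly incompressible) block, built from SHA-256 digests."""
    digests = (hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(n_bytes // 32))
    return b"".join(digests)


if __name__ == "__main__":
    block = make_block()
    # Hypothetical growth model: volume doubles by storing more copies of the same content.
    for copies in (1, 2, 4, 8):
        raw = block * copies
        print(f"copies={copies}  raw={len(raw):6d} B  compressed={compressed_size(raw):5d} B")
```

Raw volume grows eightfold while the compressed size grows only slightly. Real data growth is not pure duplication, so on real files the gap between the two curves, rather than its size in this toy, is what would be worth tracking.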
Well, one success criterion might be the increase in technology and related patents.
Maybe F can be obtained as the solution of a differential equation, by looking at what happens when data is increased by a small amount.
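One way to set this up, as a sketch: write I = F(D) for data volume D and specify the marginal information dF/dD gained from a small increase dD in data. Both right-hand sides below are hypothetical choices, with constants c, k > 0 and 0 < α < 1, and with I_0 = F(D_0) at a reference volume D_0:

```latex
\begin{aligned}
\frac{dF}{dD} = \frac{c}{D}
  \;&\Longrightarrow\; F(D) = I_0 + c \ln\frac{D}{D_0} \\
\frac{dF}{dD} = k\, D^{\alpha - 1}
  \;&\Longrightarrow\; F(D) = I_0 + \frac{k}{\alpha}\left(D^{\alpha} - D_0^{\alpha}\right)
\end{aligned}
```

The first gives a logarithmic F, so information grows only linearly in time when data grows exponentially. With data growing as D_0 e^{rt}, the second gives information that still grows exponentially, but at rate αr: slower than data, yet faster than linear.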
Maybe one way to quantify or measure any change in the rate of information output versus the growth in data is to look at productivity. With the ubiquity of data and data-capture technology, one would think we are also extracting more information and insights from all this data mess and subsequently becoming more productive. It would be interesting to find out whether any analysis is currently available on the correlation between data growth rates and productivity at various levels of the economy. Is it as simple as correlating national GDP with data growth rates? I understand that certain conditions will have to be accounted for, but one has to start somewhere, right?
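A minimal sketch of that starting point: correlate year-over-year growth rates of two annual series with the Pearson coefficient. The inputs (a data-volume series and a GDP or productivity series for the same years) are hypothetical; no real figures are included here, and the function names are illustrative.

```python
import math


def growth_rates(series):
    """Year-over-year relative growth: (x[t] - x[t-1]) / x[t-1]."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def growth_correlation(data_volume, gdp):
    """Correlate the growth rate of data volume with the growth rate of GDP (or another productivity measure)."""
    if len(data_volume) != len(gdp):
        raise ValueError("series must cover the same years")
    return pearson(growth_rates(data_volume), growth_rates(gdp))
```

Correlating growth rates rather than raw levels avoids the spurious correlation that any two steadily rising series show. A high coefficient would still not establish that data growth drives productivity; the other conditions mentioned above would need to be controlled for.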
© 2019 Data Science Central ®