It is hard to imagine that some data element could contain less information than a bit (a binary digit, equal to either 0 or 1). Yet examples are abundant. Indeed, I wonder whether we should create a unit of information called the microbit, or the nanobit.

The first examples that come to mind are irrational numbers such as Pi. Its digits are widely believed to be indistinguishable from pure noise, yet a short computer program can generate as many of them as you want, so they carry essentially no information. There is not enough data storage in the universe to hold all of them, even if you could put a trillion digits in each atom; yet all these digits together contain far less information than a simple yes-or-no answer to any meaningful question. To put it differently, if you compressed a large message with a standard data compression algorithm, and the compressed bits matched the first trillion binary digits of Pi, your original message was pure gibberish, with no meaning and no extractable information.
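A minimal Python sketch (not from the original article) makes the Pi argument concrete by contrasting two views of the same digits. The generator uses Jeremy Gibbons' unbounded spigot algorithm, a well-known method swapped in here for illustration, to produce decimal digits of Pi from a few lines of code. The helper `zlib_bits_per_digit` is a hypothetical name that measures how well the general-purpose zlib compressor does on those digits. Pure noise over the ten decimal digits needs log2(10) ≈ 3.32 bits per digit, and zlib, which only sees statistical patterns, should not get below that.

```python
import math
import zlib
from itertools import islice


def pi_digits():
    """Yield decimal digits of Pi one at a time (Gibbons' unbounded spigot algorithm)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain: emit it and rescale the state.
            yield n
            q, r, t, k, n, l = 10 * q, 10 * (r - n * t), t, k, (10 * (3 * q + r)) // t - 10 * n, l
        else:
            # Not enough precision yet: fold in one more term of the series.
            q, r, t, k, n, l = q * k, (2 * q + r) * l, t * l, k + 1, (q * (7 * k + 2) + r * l) // (t * l), l + 2


def zlib_bits_per_digit(digits: str) -> float:
    """Compressed size, in bits per decimal digit, achieved by zlib at its maximum level."""
    return 8 * len(zlib.compress(digits.encode("ascii"), 9)) / len(digits)


digits = "".join(str(d) for d in islice(pi_digits(), 1000))
print(digits[:11])                            # 31415926535
print(round(math.log2(10), 3))                # 3.322 bits per digit for pure noise
print(round(zlib_bits_per_digit(digits), 2))  # zlib finds nothing to exploit
```

The generator's source is a few hundred bytes, yet it can emit any number of digits: that fixed program size, not the length of the output, bounds how much information the digits really contain.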

*DNA: An example of nano-structure (though information-rich)*

Communications based on Blockchain technology use a similar idea. When you mine bitcoins, the hash of a new block, to be valid, must start with a pre-specified number of zeros (more precisely, it must fall below a target value). This is accomplished by adding noise, in the form of an arbitrary number called a nonce, to the block of data to be transmitted, and trying again until the noise produces enough leading zeros once the block is hashed. The search is similar in spirit to finding gibberish text that, once compressed, matches the digits of Pi.
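The mining step can be sketched as a toy proof of work, under simplifying assumptions: Bitcoin actually applies SHA-256 twice to an 80-byte binary block header and compares the result to a numeric target, whereas the hypothetical `mine` function below hashes a text block with a decimal nonce appended and checks for leading hexadecimal zeros. Only Python's standard `hashlib` module is used.

```python
import hashlib


def mine(block: str, difficulty: int, max_nonce: int = 10_000_000) -> tuple[int, str]:
    """Toy proof of work: find the smallest nonce such that SHA-256(block + nonce)
    starts with `difficulty` hexadecimal zeros."""
    target_prefix = "0" * difficulty
    for nonce in range(max_nonce):
        # The nonce is the "noise" appended to the block before hashing.
        digest = hashlib.sha256(f"{block}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
    raise RuntimeError("no valid nonce found; increase max_nonce")


nonce, digest = mine("Alice pays Bob 1 BTC", difficulty=4)
print(nonce, digest)
```

Each required hexadecimal zero multiplies the expected number of attempts by 16, so `difficulty=4` needs about 65,536 hashes on average, while verifying the result takes a single hash.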

Another example is steganography, a technique used to hide messages in images or videos for safe transmission. The visible image itself carries no valuable information, and if the actual message, encrypted and scattered randomly throughout the image, represents a small portion of the data, you can say that each pixel (assuming a black-and-white picture, with one bit per pixel) carries much less than one bit of information.
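A minimal sketch of the scheme described above, assuming a black-and-white image stored as a flat list of 0/1 pixels. The hypothetical functions `embed` and `extract` use a shared integer key to seed a pseudo-random generator that picks the pixels carrying the message; the encryption step is omitted, and random bits stand in for the ciphertext. Python's `random` module is not cryptographically secure, so this illustrates the bookkeeping rather than a safe tool.

```python
import random


def embed(cover: list[int], message_bits: list[int], key: int) -> list[int]:
    """Hide message bits in a black-and-white image (flat list of 0/1 pixels).
    The key seeds a PRNG that chooses which pixels carry the message."""
    if len(message_bits) > len(cover):
        raise ValueError("message longer than cover image")
    positions = random.Random(key).sample(range(len(cover)), len(message_bits))
    stego = list(cover)
    for pos, bit in zip(positions, message_bits):
        stego[pos] = bit
    return stego


def extract(stego: list[int], n_bits: int, key: int) -> list[int]:
    """Recover the hidden bits: the same key reproduces the same pixel positions."""
    positions = random.Random(key).sample(range(len(stego)), n_bits)
    return [stego[pos] for pos in positions]


width, height = 100, 100
noise = random.Random(0)
cover = [noise.randint(0, 1) for _ in range(width * height)]
secret = [noise.randint(0, 1) for _ in range(64)]  # stands in for an encrypted 8-byte message

stego = embed(cover, secret, key=2019)
print(extract(stego, len(secret), key=2019) == secret)  # True
print(len(secret) / len(cover))  # 0.0064 bits of hidden message per pixel
```

With 64 hidden bits spread over a 100 by 100 image, each pixel carries 0.0064 bits of the message, far less than one bit, and the fraction shrinks further as the image grows.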

Finally, I first came up with this concept when researching numeration systems. I designed a system (analogous to the decimal system) whose binary digits are highly correlated: even for the number 0, you would need to compute many digits to get just a couple of correct decimals in base 10. It is a system with built-in redundancy. All number representation systems with a base smaller than 2 share this feature, though to a lesser degree. In my new numeration system, even finding a set of digits that corresponds to an actual number is very hard, as the set of valid digit combinations is extremely sparse. In that sense, each digit in that system carries much less than one bit of information. See details here.
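The author's own numeration system is not spelled out here, so this sketch illustrates only the weaker claim about bases smaller than 2, using the golden-ratio base φ = (1 + √5)/2 ≈ 1.618 as a swapped-in example. Because φ² = φ + 1, the digit strings 100 and 011 denote the same number, and the standard form of a base-φ representation forbids two adjacent 1s. The number of valid n-digit strings is then the Fibonacci number F(n + 2), which grows like φⁿ, so each digit carries only about log2 φ ≈ 0.694 bits. The function names are hypothetical.

```python
import math


def count_valid_strings(n: int) -> int:
    """Number of 0/1 strings of length n with no two adjacent 1s
    (the digit strings allowed in standard base-phi notation)."""
    ends_in_0, ends_in_1 = 1, 0  # the empty string, treated as ending in 0
    for _ in range(n):
        # A 0 may follow anything; a 1 may only follow a 0.
        ends_in_0, ends_in_1 = ends_in_0 + ends_in_1, ends_in_0
    return ends_in_0 + ends_in_1


def bits_per_digit(n: int) -> float:
    """Information per digit when all valid n-digit strings are equally likely."""
    return math.log2(count_valid_strings(n)) / n


phi = (1 + math.sqrt(5)) / 2
print([count_valid_strings(n) for n in range(1, 8)])  # [2, 3, 5, 8, 13, 21, 34]
print(round(bits_per_digit(1000), 3), round(math.log2(phi), 3))  # 0.694 0.694
```

An unconstrained binary string of length n has 2ⁿ possibilities, one full bit per digit; the ban on adjacent 1s is exactly the redundancy that pushes each base-φ digit below one bit.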

To summarize, just as units of distance cover a big spectrum, from light years to nanometers, so do units of information. Human beings are impressed by terabytes and petabytes, but at the other end of the spectrum, we have nanobits. Micro-information could become an interesting area of research for data scientists, with applications such as those described in this article. I did a Google search for the words microbits and nanobits, but found no interesting results. Both words are trademarked, though in a different context.

*For related articles from the same author, visit www.VincentGranville.com. Follow me on LinkedIn.*

© 2019 Data Science Central ®