How big your data is depends on the quantity of information it contains (measured with entropy metrics), not on the number of terabytes. Huge data that is sparse or shallow is not truly huge, and it can be compressed very efficiently. What do you think?
Here's Stefan Conrady's viewpoint (founder of Bayesia Networks):
If I may cross-post the following from our blog at www.conradyscience.com, which speaks to the same point:
Learning = Data Compression
"It has long been understood that even when confronted with a ten-gigabyte file containing data to be statistically analyzed, the actual information-theoretic amount of information in the file might be much less, perhaps merely a few hundred megabytes. This insight is currently most commonly used by data analysts to take high-dimensional real-valued datasets and reduce their dimensionality using principal components analysis, with little loss of meaningful information. This can turn an apparently intractably large data mining problem into an easy problem."
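To make that concrete, here is a small sketch of the idea (the dataset is synthetic and the 99%-variance cutoff is my own choice, not from the quoted text): 50 measured attributes that in fact lie near a 3-dimensional subspace, recovered via PCA using an SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 1,000 samples in 50 dimensions that actually
# live on a 3-dimensional subspace, plus a little measurement noise.
latent = rng.normal(size=(1000, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 50))

# PCA via SVD on the centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = (s ** 2) / (s ** 2).sum()

# Keep just enough components to explain 99% of the variance;
# here that is 3, so the 50-column dataset compresses to 3 columns.
k = int((np.cumsum(explained) < 0.99).sum()) + 1
X_reduced = Xc @ Vt[:k].T
print(k, X_reduced.shape)
```

The "ten gigabytes vs. a few hundred megabytes" point is exactly this ratio: the stored representation shrinks roughly by the factor 50/3 while retaining almost all of the meaningful variation.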
As an alternative to dimension reduction, we can exploit existing regularities in the data to create a more compact and thus more tractable representation with Bayesian networks. "In the context of Bayesian network learning, we describe the data using DAGs [Directed Acyclic Graphs] that represent dependencies between attributes. A Bayesian network with the least MDL [Minimum Description Length] score (highly compressed) is said to model the underlying distribution in the best possible way. Thus the problem of learning Bayesian networks using MDL score becomes an optimization problem." Consequently, learning Bayesian networks is inherently a form of data compression.
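A minimal sketch of the MDL scoring idea (my own toy construction, not from the quoted blog): with two correlated binary attributes, the DAG containing the edge A → B pays for one extra parameter but describes the data in far fewer bits than the empty DAG, so its MDL score is lower.

```python
import math
import random

random.seed(0)

# Hypothetical data: B copies A 90% of the time, so A and B are
# strongly dependent and the edge A -> B should compress the data.
N = 2000
data = []
for _ in range(N):
    a = random.random() < 0.5
    b = a if random.random() < 0.9 else (not a)
    data.append((int(a), int(b)))

def entropy_bits(counts):
    """Shannon entropy (in bits) of an empirical distribution."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def mdl_empty(data):
    """MDL score of the empty DAG: A and B modeled independently.
    Two free parameters: P(A=1) and P(B=1)."""
    a_counts = [sum(1 for a, b in data if a == v) for v in (0, 1)]
    b_counts = [sum(1 for a, b in data if b == v) for v in (0, 1)]
    model_bits = 2 * math.log2(N) / 2  # log2(N)/2 bits per parameter
    data_bits = N * (entropy_bits(a_counts) + entropy_bits(b_counts))
    return model_bits + data_bits

def mdl_edge(data):
    """MDL score of the DAG A -> B.
    Three free parameters: P(A=1), P(B=1|A=0), P(B=1|A=1)."""
    a_counts = [sum(1 for a, b in data if a == v) for v in (0, 1)]
    data_bits = N * entropy_bits(a_counts)
    for pa in (0, 1):  # encode B conditionally on each value of A
        sub = [b for a, b in data if a == pa]
        counts = [sum(1 for b in sub if b == v) for v in (0, 1)]
        data_bits += len(sub) * entropy_bits(counts)
    model_bits = 3 * math.log2(N) / 2
    return model_bits + data_bits

print(mdl_empty(data), mdl_edge(data))
```

Structure learning then searches over DAGs for the lowest total score, which is why the quoted passage frames it as an optimization problem.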
Davies, S., and A. Moore. "Bayesian networks for lossless dataset compression." In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 391, 1999.
Hamine, Vikas. "Learning Optimal Augmented Bayes Networks" (n.d.). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.6100.