
Also see the related question and answer "How do you quantify data as large, big, or huge?" at


Replies to This Discussion

Not quite quantified, but here's an opinion on categorization:

A big factor is the utilization rate.

  • Huge - the fire hose. Lots of activity, noise, dimensionality, and velocity. Data at the point of use is usually heavily sampled, or reported at a high level of aggregation. The first order of business is to reduce it to something you can work with. I usually use this for exploring the fuzzy front end, or for early identification of problems (thank you, Hadoop contributors).

  • Big data - you use a little of it, broadly distributed. You're getting a sense of the valuable parts and reducing dimensionality. This data has been processed or selected, and you can manage the whole of it.
  • Large data - you use and re-use more of what you have, and sample seldom. The business problems are well defined. This is the realm of data warehouses: facts and dimensions. At this level, you've discussed the data, defined it at an enterprise level, and incorporated it into the BI realm. This enterprise data can also be quite large but, happily, it is well-behaved and accessible.
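The "reduce the fire hose to something you can work with" step at the huge end is often just uniform sampling over a stream you can't hold in memory. As a minimal sketch (reservoir sampling, my illustration rather than anything from the discussion):

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep each later item with probability k / (i + 1),
            # evicting a uniformly chosen current occupant.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Only the k-item reservoir is ever held in memory, so the same code works whether the "stream" is a list, a file, or a socket, which is the point of the fire-hose category: decide what to keep before you can afford to store it all.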




© 2020 Data Science Central®
