The assumptions on which the RDBMS is based have changed: data and code

In general, computer scientists treat code and data in two very different ways. Virtual memory was originally developed to run big programs (code) in small memory, whereas data is kept in external storage and must be retrieved into memory before computing. As a result, today’s application developers instinctively think in terms of a programming model based on storage and explicit data retrieval. This model, referred to as storage-based computing, has served transactional applications such as banking and ERP systems well, since in those applications data integrity is the primary concern and the data size (per transaction) is assumed to be smaller than the code size.

Over the last decade, the weight of applications has gradually shifted from ‘transactional’ to ‘analytic’, and data sizes have grown from a few kilobytes to megabytes, gigabytes, terabytes or more, while the code footprint has remained relatively unchanged. The assumption that the data is smaller than the code is no longer valid. With the landscape between code and data changed in this way, storage-based computing causes serious performance problems:

  • Data movement: Large data retrievals incur excessive I/O and network overhead.
  • Juggling: Because the data is too large to fit in limited main memory, application developers are forced to design algorithms (i.e. out-of-core or external-memory algorithms) that process the data in small pieces and later merge the results.
  • Swapping: The main-memory data structures that hold retrieved data in the computing space also live in virtual memory and are therefore subject to swapping.

The worst case is the combined effect of juggling and swapping, which leads to a special type of double-paging anomaly.
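
To make the "juggling" pattern concrete, here is a minimal Python sketch of an out-of-core aggregation. It is an illustration only and is not taken from the whitepaper: the file layout (raw 64-bit floats), the function names and the chunk size are all assumptions chosen for the example.

```python
import array
import os
import tempfile


def write_demo_file(n):
    """Hypothetical helper: write the values 0.0 .. n-1 as raw float64 to a temp file."""
    values = array.array("d", (float(i) for i in range(n)))
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        values.tofile(f)
    return path


def chunked_sum_and_count(path, chunk_items=1024):
    """Sum the float64 values in a file, holding at most chunk_items values in memory."""
    item_size = array.array("d").itemsize
    total, count = 0.0, 0
    with open(path, "rb") as f:
        while True:
            # Explicit retrieval: copy one small piece from storage into memory.
            raw = f.read(chunk_items * item_size)
            if not raw:
                break
            chunk = array.array("d")
            chunk.frombytes(raw)
            # Merge this piece's partial result into the running totals.
            total += sum(chunk)
            count += len(chunk)
    return total, count


if __name__ == "__main__":
    demo_path = write_demo_file(10_000)
    try:
        total, count = chunked_sum_and_count(demo_path)
        print("mean:", total / count)
    finally:
        os.remove(demo_path)
```

Even for a simple sum, the developer has to manage the chunk size, the read loop and the merge step by hand. Each chunk buffer also lives in virtual memory, so it remains subject to the swapping described above.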

The storage-based computing model has been deeply rooted in developers’ minds for more than forty years, even as the landscape has gradually changed. Observing the shift in data size and code footprint from transactional applications to big data analytics, we raise the first question for big data computing:

“Instead of moving data to the computing space, is it possible to move programs into the data space and perform computing tasks where the data is stored?”

For more details, please see the technical whitepaper.

© 2019 Data Science Central ®