Three Myths About Today’s In-Memory Databases

In-memory database technology has become fashionable in recent years as the price of RAM has dropped substantially and gigabyte-scale memory chips have become affordable. By taking advantage of the cost-performance value of RAM, leading-edge database developers are boosting the performance of next-generation databases with in-memory technology. However, many developers who intend to adopt in-memory technology think of it only in terms of the speed of RAM and do not exploit its true power.

The argument here is that in-memory technology means not only taking advantage of the speed of RAM, but also a new way of trading space for time.

Many developers apply the technology in limited ways, as the following three myths show:

1. In-memory databases are cool since RAM runs faster than disk. But is that all?

We understand that RAM offers far lower latency than disk storage, including the newer SSDs. So when implementing an in-memory database, we may simply load the data tables into RAM during system initialization and take full advantage of memory speed at run time. This seems straightforward and intuitive to most people. However, the speedup is still limited, because the amount of virtual memory that can actually be allocated is bounded by RAM plus the disk swap space – swap is usually configured at one to two times the size of RAM – and the size of the dataset is therefore constrained by that budget. Once the data size approaches the limits of the swap space, processing time jumps exponentially and the computation begins to bottleneck. In spite of the drop in memory prices, given the data sizes we work with today, the hardware investment this approach requires is still cost prohibitive.
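To make the limitation concrete, here is a minimal C sketch of the naive heap-loading approach. The code and the 8GB table size are purely illustrative, not from the original article; because the buffer is backed by anonymous memory, the program only works while the dataset fits within the RAM-plus-swap budget described above.

/* A minimal sketch of the naive "load it all into RAM" approach.
 * The 8GB table size is hypothetical. The malloc'd buffer is backed by
 * anonymous memory, so it must fit within RAM plus swap; beyond that,
 * allocation fails or the initialization loop starts thrashing. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    size_t table_bytes = 8UL * 1024 * 1024 * 1024;   /* e.g. an 8GB table */

    char *table = malloc(table_bytes);
    if (table == NULL) {
        fprintf(stderr, "allocation failed: dataset exceeds the RAM+swap budget\n");
        return 1;
    }

    /* Touching every page forces the OS to commit physical or swap pages;
     * this is where performance collapses once swap is exhausted. */
    memset(table, 0, table_bytes);

    puts("table loaded into anonymous (swap-backed) memory");
    free(table);
    return 0;
}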

2. From 32-bit to 64-bit, is recompiling the code enough?

64-bit architecture plays a critical role in in-memory technology, because it allows a process to address more than 4GB of memory. Without 64-bit architecture, in-memory databases would be far less effective and certainly would not stand up as today's breakthrough solution. Ever since the 64-bit CPU was introduced, we have seen efforts to recompile code written for 32-bit architecture and port it to the new platform. But while 2^32 is a limited number, 2^64 is virtually infinite. When address space is nearly unlimited, it opens the possibility of redesigning computational algorithms so that we can trade space for time.
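As an illustration of what 64-bit addressing changes, the following C sketch reserves a virtual address range far larger than physical RAM without committing a single page up front – exactly the freedom that makes trading space for time possible. The 1TB figure is arbitrary, and success also depends on the kernel's overcommit settings; a 32-bit process, capped at 4GB of address space, cannot do this at all.

/* Reserving a large virtual address range on 64-bit Linux. */
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t reserve_bytes = 1UL << 40;   /* 1TB of address space */

    /* MAP_NORESERVE asks for address space only; physical pages are
     * committed lazily, one at a time, as they are actually touched. */
    void *region = mmap(NULL, reserve_bytes, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (region == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    printf("reserved %zu GB of virtual address space at %p\n",
           (size_t)(reserve_bytes >> 30), region);
    munmap(region, reserve_bytes);
    return 0;
}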

3. Given the definition of Virtual, why is there a limitation for Virtual Memory?

Peter Denning blew our minds in 1970 with his work on virtual memory, which manages memory resources far more efficiently. But if the concept is deployed properly, why do we still see "out of memory" issues today? Shouldn't we be enjoying 2^64 bytes of virtual memory space? Despite what the name "virtual" implies, memory space is preconfigured at OS installation with a limited swap space – usually one to two times the size of RAM – to back the memory allocated from the heap. Once the data size hits this limit, computations bog down and performance begins to suffer.

Now that the 64-bit CPU has arrived, and given that a Linux system supports a 48-bit address space (256TB), there is ample room for today's big data needs. By using an adequate memory-mapping approach, one commodity machine can address up to 256TB – far more than the swap space that current in-memory technologies rely on.
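Here is a rough C sketch of such a memory-mapping approach (the file name table.dat is hypothetical). Because the mapping is backed by the file itself rather than by swap, the addressable dataset is limited only by the address space and the size of the file.

/* File-backed memory mapping: pages are faulted in from the file on
 * demand and dirty pages are written back to the file, so the mapping
 * is not limited by RAM plus swap. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void) {
    int fd = open("table.dat", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return 1; }

    char *data = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* The mapped file can now be accessed like ordinary in-memory data
     * (assuming the file is non-empty). */
    printf("first byte of the mapped table: %d\n", data[0]);

    munmap(data, (size_t)st.st_size);
    close(fd);
    return 0;
}

The trade-off is that cold pages are served at disk speed, while hot pages run at RAM speed; the win is that the total dataset can grow far beyond RAM plus swap.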

Imagine this: you are working on a 4x4 sliding puzzle. You might easily solve it in five minutes, but what about a 5x5, a 6x6, or even a 10x10? As the complexity increases, the computational problem quickly gets out of hand. Now, what if the puzzle were frameless? You could put all the pieces aside, reorganize them, and then reassemble them to solve the puzzle. You are no longer constrained by space and are free to arrange the pieces at your leisure. That is what we mean by trading space for time.

When the assumption that “memory is a scarce resource” is no longer valid, we need to think outside the box when solving today’s big data issues. With virtual memory theory, 64-bit architecture, and affordable memory modules, you are now fully armed to face the overwhelming challenges that big data poses.


Comment by jaap Karman on March 14, 2014 at 12:19pm

For your question at point 3:
The background for that is: http://en.wikipedia.org/wiki/Thrashing_(computer_science)
By setting a limit, the goal is to prevent the situation where the only remaining activity is memory management, without any progress on the real workload.


With queuing or workload balancing between coupled processors, a similar negative effect is possible. That is the area where the chip manufacturers are seeking improvements. There will always be limits, although they keep changing, and understanding those limits is always necessary. Recent news item:

http://newsroom.intel.com/community/intel_newsroom/blog/2014/02/18/...

As for the conclusion about thinking outside the old box, I agree with it.

Comment by Yuanjen Chen on March 10, 2014 at 8:24pm

Thanks for the reminder. I've corrected them.

Comment by James Lin on March 10, 2014 at 7:40am

typos?  232 -> 2^32  and 264 -> 2^64
