MUMPS – The Most Important Database You (Probably) Never Heard Of

Summary:  What if I told you there’s a database in wide use today that does everything RDBMS and Hadoop can do but is 50 years old?  Never heard of MUMPS?  Check out these startling facts.

 

If you’ve never heard of MUMPS don’t feel like the lone ranger.  A colleague mentioned it to me and drove me to a bit of research.  What I found is really astounding.

  • MUMPS was born in 1966 to solve the problem of massive data flowing into multi-user systems in the healthcare industry.
  • It predates RDBMS but has all the features of NoSQL, including (in its modern form) massively parallel processing, horizontal scaling, and the ability to run on commodity hardware.
  • It easily models all four types of NoSQL DBs (key-value, column, document, graph).  In the 70s and 80s it was modified to model RDBMS and handle SQL queries.
  • It shares features with NewSQL in that it is fully ACID compliant.
  • It’s alive and well today, used in a wide variety of healthcare patient information systems, banking, the European Stock Exchange, and the travel industry, among others.

If you’ve been to your doctor or to a hospital, or used an ATM, it’s likely that the data was processed and stored in a MUMPS-based system.  Although 2016 marks its 50th anniversary, the original design basics of MUMPS still meet commercial needs today and show little sign of being displaced in healthcare or large financial institutions by either RDBMS or NoSQL.  It would not be inaccurate to say that MUMPS was NoSQL long before NoSQL became a gleam in the eye of Google researchers.

A Little Background. 

Originally designed in 1966 and constantly updated over the years, MUMPS derives its name from the Massachusetts General Hospital Utility Multi-Programming System; it is alternatively known as M.  As you might gather from the name, the original need came from large hospitals (and ultimately banks) that required high-throughput, multi-user transaction processing.  As RDBMS emerged (and ultimately NoSQL and NewSQL), MUMPS remained not only viable but superior in performance and capabilities, a position it still holds today.

The original problem to be solved was how to receive, store, and process the wide array of tests and other variables being rapidly generated and collected on a single ICU patient in just one day.  That would include at least 12 different variables including temperature, heart rate, blood oxygen, blood pH, and others.  The data comes from sensors (electrodes) measuring many factors in real time, plus lab tests performed multiple times per day per patient.  On average the data needs to be accessed by about 20 doctors and medical staff for each patient, and there are hundreds of thousands of patients.

The thing that particularly strikes me is how closely this resembles the IoT streaming data problems that we are only recently solving with Spark and Storm, yet which MUMPS solved perfectly adequately 40 and 50 years ago.

MUMPS by Any Other Name

The original copyrights on MUMPS expired about a decade ago.  An improved successor version is actively marketed by InterSystems Corp. under the name Caché.  A version known as GT.M is available for Linux under a Free Open Source license.  Googling either of these names will be as efficient as looking under MUMPS.  There was also a movement some years back to simply call it “M” and you will sometimes see it identified as MUMPS/M.

Is It a Data Base With Its Own Language or a Language With Its Own Data Base?

This begins to look like that old peanut butter in my chocolate or chocolate in my peanut butter meme, but it’s important for understanding why MUMPS is both efficient and successful.  MUMPS is both things at once.  Specifically, it’s a database with an integrated language optimized for accessing and manipulating that database.  Although the language has been criticized as archaic, modern users compare it favorably to Python.

Having a ‘built in’ database gives MUMPS high-level access to storage that is not available in other programs or DBs.  This access uses ‘variables’ (keys) and ‘arrays’ (tables), which are sparse.  The default structure is key-value (though MUMPS can easily be scripted to work as document, columnar, graph, or even RDBMS) and has a modern parallel in JSON.  The structure is schema-less and the data is stored in multidimensional hierarchical sparse arrays (also known as key-value nodes, sub-trees, or associative memory).  Each array may have up to 32 subscripts, or dimensions.  Holy cow Batman!  Sounds like we just found the sacred headwaters of Hadoop.
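To make the idea of a sparse, hierarchical array more concrete, here is a minimal Python sketch (not MUMPS code).  The class name `SparseArray` and its methods are hypothetical, invented for this illustration; `next_subscript` only loosely imitates what the MUMPS `$ORDER` function does, and the patient numbers and readings are made-up sample data.

```python
from bisect import bisect_right


def _as_key(subscripts):
    """Accept a single subscript or a tuple of subscripts."""
    return subscripts if isinstance(subscripts, tuple) else (subscripts,)


class SparseArray:
    """Toy sparse hierarchical array: only nodes that were set take up space."""

    def __init__(self):
        self._nodes = {}  # maps a tuple of subscripts to a value

    def __setitem__(self, subscripts, value):
        self._nodes[_as_key(subscripts)] = value

    def get(self, subscripts, default=None):
        return self._nodes.get(_as_key(subscripts), default)

    def children(self, prefix=()):
        """Sorted, distinct subscripts that exist directly below `prefix`.

        Subscripts at one level must be mutually comparable (all ints or all
        strings) for the sort to work in this sketch.
        """
        prefix = _as_key(prefix)
        n = len(prefix)
        return sorted({k[n] for k in self._nodes if len(k) > n and k[:n] == prefix})

    def next_subscript(self, subscripts):
        """Next sibling after the last subscript given, or None at the end."""
        key = _as_key(subscripts)
        prefix, current = key[:-1], key[-1]
        siblings = self.children(prefix)
        i = bisect_right(siblings, current)
        return siblings[i] if i < len(siblings) else None


# Hypothetical ICU readings: patient id, variable name, time of reading.
vitals = SparseArray()
vitals[1001, "heart_rate", "08:00"] = 72
vitals[1001, "heart_rate", "08:05"] = 75
vitals[1001, "temp_c", "08:00"] = 37.1
vitals[2002, "heart_rate", "08:00"] = 88

variables_for_1001 = vitals.children(1001)
next_reading = vitals.next_subscript((1001, "heart_rate", "08:00"))
next_patient = vitals.next_subscript(1001)
```

Only the nodes that were explicitly set exist.  Asking for `vitals.get((1001, "blood_ph", "08:00"))` simply returns `None`, with no table or schema declared in advance.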

A key to its speed and efficiency is that the database is accessed directly through the variables rather than through queries or retrievals.  A feature of the MUMPS language/DB is that accessing volatile memory and accessing non-volatile storage use the same basic syntax, enabling a function to work on either local (volatile) or global (non-volatile) variables.  Practically, this provides for extremely high-performance data access.  Michael Byrne, writing on Motherboard, does a good job of explaining this.

"Variables (or keys, in this case) are just addresses of different memory locations within those arrays, which are called globals in MUMPS-speak. A MUMPS system, which might be made up of many computers, has its own collection of global arrays stored in non-volatile memory. So, unlike an array created in a language like C++, which exists only for the duration of the program or the program's existence within a computer's RAM address space, a MUMPS global sticks around on a server, accessible at any given time to a computer within the system. We say that it's persistent."

"The result of this is that a MUMPS programmer can tap a database directly rather than using a query. This is faster on its face, eliminating the query abstraction, but direct access also allows a bunch of alternative programming ideas. For one thing, as a programmer, I can take an item stored in one of those globals and give it "children," which might be some additional properties of that item. So, we wind up with lists of different things that can be described and added to in different ways on the fly. The relationships are hierarchical."
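The point about one syntax for volatile and persistent data can be approximated in Python, again as an analogy rather than actual MUMPS.  In this sketch the standard-library `shelve` module stands in for a persistent global, a plain `dict` stands in for a local variable, and the helper names `record_vital` and `read_vital` are hypothetical.  In MUMPS itself the distinction is made by a caret prefix on the variable name (for example `^Patient`), not by a separate library.

```python
import os
import shelve
import tempfile


def record_vital(store, patient_id, name, value):
    """Write one reading; `store` can be any dict-like mapping."""
    store[f"{patient_id}|{name}"] = value


def read_vital(store, patient_id, name):
    """Read one reading back, or None if that node was never set."""
    return store.get(f"{patient_id}|{name}")


# "Local" data: an ordinary dict that disappears when the program ends.
local_store = {}
record_vital(local_store, 1001, "heart_rate", 72)

# "Global" data: a shelf on disk, written with exactly the same function.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "patients")  # hypothetical file name

    with shelve.open(path) as global_store:
        record_vital(global_store, 1001, "heart_rate", 72)

    # Reopen the shelf, as a second program might, and the value is still there.
    with shelve.open(path) as global_store:
        persisted = read_vital(global_store, 1001, "heart_rate")
        missing = read_vital(global_store, 1001, "blood_ph")
```

The same two helper functions serve both stores, which is the property the quoted passage describes: the code that manipulates the data does not need a separate query layer to reach the persistent copy.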

Who’s Using It Today

The MUMPS claim to fame is the Veterans Health Information Systems and Technology Architecture (VistA), a vast suite of some 80 different software modules supporting the largest medical system in the United States.  It maintains the electronic health records for 8 million veterans, which are used by some 180,000 medical personnel across 163 hospitals, over 800 clinics, and 135 nursing homes.  It's considered a model for current efforts to create a nationwide medical health records network.

Other government healthcare users include

  • Indian Health Service
  • Major parts of the Department of Defense CHCS hospital system

Large healthcare companies currently using MUMPS include

  • Care Centric
  • Allscripts
  • Epic
  • Coventry Healthcare
  • EMIS
  • Partners HealthCare (including Massachusetts General Hospital)
  • MEDITECH
  • GE Healthcare (formerly IDX Systems and Centricity)
  • Sunquest Information Systems
  • Many reference laboratories such as DASA
  • Quest Diagnostics
  • Dynacare

Among financial institutions

  • Ameritrade, the largest online trading service in the US with over 12 billion transactions per day
  • Bank of England
  • Barclays Bank

In 2010, the European Space Agency selected MUMPS/Caché to support the Gaia mission to map the Milky Way with unprecedented precision.

Strengths

MUMPS checks all the same boxes as NoSQL and is clearly very mature.

  • Elastic horizontal scaling across multiple low-cost commodity servers.
  • Designed to support Big Data quantities of data beyond the capabilities of RDBMS with extremely high performance.
  • Extremely simple to administer requiring essentially no DBAs.
  • Very low cost resulting from commodity hardware and open source code.
  • Flexible data modeling able to easily duplicate the features of RDBMS, key value, columnar, document, and graph architectures.
  • Readily supports advanced analytics and BI with SQL.
  • Full ACID for OLTP.
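
As a rough illustration of the flexible data modeling bullet, the Python sketch below lays a relational-style table and a secondary index over subscripted keys in a key-value store.  It is a toy built on assumptions: the key layout, the function names, and the sample patients are all invented here, and a real MUMPS system would walk an ordered tree rather than scan a dict.

```python
# A flat dict keyed by subscript tuples stands in for the hierarchical store.
store = {}


def insert_patient(patient_id, name, ward):
    """Store one 'row' as separate nodes, plus an index node for ward lookups."""
    store[("patient", patient_id, "name")] = name
    store[("patient", patient_id, "ward")] = ward
    store[("patient_by_ward", ward, patient_id)] = ""  # index entry; value unused


def get_row(patient_id):
    """Reassemble the row (or document) from the nodes under one patient."""
    return {key[2]: value for key, value in store.items()
            if key[:2] == ("patient", patient_id)}


def patients_in_ward(ward):
    """Answer a 'WHERE ward = ...' style question from the index nodes alone."""
    return sorted(key[2] for key in store if key[:2] == ("patient_by_ward", ward))


# Hypothetical sample data.
insert_patient(1001, "Patient A", "ICU")
insert_patient(2002, "Patient B", "Cardiology")
insert_patient(3003, "Patient C", "ICU")

icu_ids = patients_in_ward("ICU")
row = get_row(1001)
```

The same nodes can be read as a table row, as a small document, or as plain key-value pairs, which is the sense in which one storage structure can imitate several data models.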

Weaknesses

  • Not widely known.  Small market share outside of healthcare and finance.
  • Small programmer cadre.
  • Pretty much no code libraries outside of what InterSystems (the commercial vendor) supplies.

Is this a Technology in Need of an Upgrade?

You might be tempted to ask, why not marry MUMPS with Hadoop?  The fact is that MUMPS will scale and perform in all the ways that Hadoop will.  Bolting the two together just seems unnecessarily complicated, with no appreciable gain in scale or performance.  It would also require technical people who understand both systems to keep them in sync.  So no, MUMPS is fine just the way it is.

Is There an Opportunity Here?

On the one hand, would anyone choose to start a new project electing MUMPS instead of Hadoop or RDBMS?  Probably not.  There’s not enough awareness or for that matter enough programmers to go around.

However, if you’re working in healthcare or finance, especially where MUMPS is already in use, consider this.  A search of LinkedIn yields only 699 MUMPS developers and 77 Caché developers.  If you’ve already mastered NoSQL and are looking for a competitive edge, this is a vanishingly small pool of competitors.  Mastering MUMPS could easily translate into good pay and job security.  There’s a good MUMPS coding tutorial here.

 

 

About the author:  Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist and commercial predictive modeler since 2001.  He can be reached at:

Bill@DataScienceCentral.com


Comment by William Vorhies on February 2, 2016 at 7:30am

I continue to be surprised by the volume of comments coming in by email from obviously enthusiastic practitioners of MUMPS.  Here's another one:

------------------------------------------------------

   Your blog was shared on Facebook, and I was surprised that you selected Kevin O'Kane's documentation on MUMPS.  It is good, but a bit brief and perhaps a bit incomplete.  MUMPS has been in use for over 50 years.  It has been used for a wide range of applications and problems (even AI problems).  The beauty of MUMPS is that there is no pre-allocation of anything in this language; it sorts very quickly and is easy to use to establish lists on an ad hoc basis.

   Currently, I have written MUMPS solutions for the Knight's Tour problem, a solver for SUDOKU, and a corrected model of the German Enigma machine that can be used with any of the character sets represented in UTF-8.  I am also working on making VistA international and supporting multiple languages.  The VA wrote VistA, its integrated hospital system, which has been keeping patient records for nearly 40 years (come 2017, it will be 40 years).  There are over 160 different areas of the hospital that are part of this more-than-Open-Source software suite (it is actually FOIA, and is available as a world domain package).  The US taxpayers paid for this software, and it is amazing in that it was designed to be adopted by the people who will be using it.  The power of it is that the end user can direct the way that information is collected and displayed and what information needs to be added.  Please contact me and I can send you a disk containing over 5 gigabytes of Open Source documentation and white papers.  I can also supply a virtualized VistA and MUMPS configuration that can be started up from VirtualBox.

   Oh, by the way, GT.M (Greystone Technologies MUMPS) is available from SourceForge with documentation.  This is a production-grade MUMPS that is running banks and credit unions (and now hospitals) all over the world.  It was made Open Source by the people at FIS (Fidelity Information Services): "FIS™ is the world’s largest global provider dedicated to banking and payments technologies. FIS empowers the financial world with payment processing and banking solutions, including software, services and technology outsourcing. FIS’ more than 55,000 worldwide employees are passionate about moving our clients’ business forward."

Features

  • Key-value database files into the TB range (unlimited aggregate database sizes)
  • ACID (Atomic, Consistent, Isolated, Durable) transactions
  • Large scale replication for business continuity
  • Thousands of concurrent users at largest production sites
  • Plug-in architecture for database encryption



    Best wishes;


      Chris Richardson

Comment by Mike Rowe on February 1, 2016 at 10:32am

You omitted one of the largest users of MUMPS, Epic.  They maintain health records for tens of millions (or maybe over 100 million) of people worldwide.  Their base products are written in MUMPS.

Comment by William Vorhies on February 1, 2016 at 10:28am

This comment came in by email from a MUMPS practitioner and has some great insights so I will repeat the email here:

--------------------------------------------

I haven’t seen a fair or accurate depiction of MUMPS in a very long time.  Your article did both, so thank you.  It was a great write-up.

 

One weakness that you didn’t mention, which I feel is the most important one – MUMPS is very, very easy to learn, but incredibly difficult to master.  I’ve been working with it for 18  years and still consider myself a mid-level guy.  That’s because MUMPS is very wrapped up in the system – be it VistA, CHCS, Epic, etc.  Everyone has APIs that have to be learned.  Most of the APIs were built to be as flexible as possible – which results in a single named API having 20 different purposes in some cases.  You have to focus your expertise in one or two systems.  I chose the government systems, since they are all the same under the hood…and are the basis for several of the commercial flavors.  I know a few Epic programmers and what they talk about is very different than what I usually do.

 

Still, now that I know how few of us MUMPS programmers there are, I think I’ll raise my hourly rate!  Time to start asking for lawyer rates!  Huge thanks for that!

 

C B Farley

Consultant

Comment by K.S. Bhaskar on January 31, 2016 at 6:49am
Thank you for the recognition. I manage GT.M (http://fis-gtm.com - and click on the User Documentation tab for current user documentation) and would be glad to help anyone get started with it. A good way to get started with GT.M DevOps is the GT.M Acculturation Workshop, a series of self-paced exercises using a virtual machine. Go to https://sourceforge.net/projects/fis-gtm/files/GT.M%20Acculturation... and get the latest version (0.9 as of today).
