
MUMPS – The Most Important Database You (Probably) Never Heard Of

Summary: What if I told you there’s a database in wide use today that does everything RDBMS and Hadoop can do, but is 50 years old? Never heard of MUMPS? Check out these startling facts.

 

If you’ve never heard of MUMPS, don’t feel like the Lone Ranger. A colleague mentioned it to me, which prompted a bit of research. What I found is really astounding.

  • MUMPS was born in 1966 to solve the problem of massive data flowing into multi-user systems in the healthcare industry.
  • It predates RDBMS but has all the features of NoSQL, including (in its modern form) massively parallel processing, horizontal scaling, and the ability to run on commodity hardware.
  • It easily models all four types of NoSQL DBs (key-value, column, document, graph). In the 1970s and 1980s it was modified to model RDBMS and handle SQL queries.
  • It shares features with NewSQL in that it is fully ACID compliant.
  • It’s alive and well today, used in a wide variety of healthcare patient information systems, banking, the European Stock Exchange, and the travel industry, among others.
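To make the third bullet concrete, here is a minimal Python sketch (not MUMPS code). It shows how one sparse store of subscripted nodes can act as a key-value, columnar, document, or graph database, using nothing but the layout of its keys. Each dict stands in for a global. A MUMPS program would write the first graph edge below as SET ^EDGE("alice","bob")="follows". Several pieces are simplifying assumptions for illustration: the global names, the `children` helper, and the collation rule (numbers before strings, which is roughly how standard M orders subscripts).

```python
# Each "global" is a sparse dict mapping a tuple of subscripts to a value.


def collate(sub):
    """Approximate M collation: numeric subscripts sort before string subscripts."""
    return (0, sub) if isinstance(sub, (int, float)) else (1, sub)


def children(store, *prefix):
    """Distinct next-level subscripts under prefix, in collation order (like $ORDER)."""
    n = len(prefix)
    subs = {key[n] for key in store if len(key) > n and key[:n] == prefix}
    return sorted(subs, key=collate)


# Key-value: a single subscript per entry.
kv = {("session42",): "alice"}

# Column family: column name first, then row id.
col = {("heart_rate", 1): 72, ("heart_rate", 2): 75, ("temperature", 1): 37.0}

# Document: nested fields become deeper subscripts.
doc = {
    ("p1", "name"): "Jane Doe",
    ("p1", "address", "city"): "Boston",
    ("p1", "address", "zip"): "02114",
}

# Graph: ^EDGE(from, to) = label behaves as an adjacency list.
edge = {("alice", "bob"): "follows", ("alice", "carol"): "follows", ("bob", "carol"): "follows"}

print(kv[("session42",)])              # alice
print(children(col, "heart_rate"))     # [1, 2]
print(children(doc, "p1"))             # ['address', 'name']
print(children(doc, "p1", "address"))  # ['city', 'zip']
print(children(edge, "alice"))         # ['bob', 'carol']
```

A real MUMPS engine keeps these keys in a persistent B-tree, so walking the children of a node is an ordered range scan. The sketch above just filters a dict.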

If you’ve been to your doctor or to a hospital, or used an ATM, it’s likely that the data was processed and stored in a MUMPS-based system. Although 2016 marks its 50th anniversary, the original design basics of MUMPS are still meeting commercial needs today and show little evidence of being displaced in healthcare or large financial institutions by either RDBMS or NoSQL. It would not be inaccurate to say that MUMPS was NoSQL long before NoSQL was even a gleam in the eye of Google researchers.

A Little Background

Originally designed in 1966 and constantly updated over the years, MUMPS takes its name from the Massachusetts General Hospital Utility Multi-Programming System, and is also known simply as M. As the name suggests, the original need came from large hospitals (and ultimately banks) that required high-throughput, multi-user transaction processing. As RDBMS emerged (and later NoSQL and NewSQL), MUMPS remained not only viable but superior in performance and capabilities, as it still is today.

The original problem to be solved was how to receive, store, and process the wide array of tests and other variables rapidly generated and collected on a single ICU patient in just one day. That would include at least 12 different variables, including temperature, heart rate, blood oxygen, blood pH, and others. Sensors (electrodes) measure many of these factors in real time, and lab tests are done multiple times per day per patient. On average, the data needs to be accessed by about 20 doctors and medical staff for each patient, and there are hundreds of thousands of patients.
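Here is a rough, hypothetical Python sketch of why this problem suits a sparse hierarchical store. Each ICU reading is filed under patient, then time, then variable, the way a global such as ^ICU(patient,minute,variable) might be laid out. Nested dicts stand in for the database tree. The names `record`, `series`, and `latest`, along with the sample readings, are invented for this example. A node exists only where a reading was actually taken, so an occasional lab value costs nothing at the minutes where it is absent.

```python
from collections import defaultdict

# Hypothetical layout mirroring a global like ^ICU(patient, minute, variable) = value.
# Nested dicts form a sparse tree: a node exists only where something was recorded.
icu = defaultdict(lambda: defaultdict(dict))


def record(patient, minute, variable, value):
    """Store one reading under patient -> minute -> variable."""
    icu[patient][minute][variable] = value


def series(patient, variable):
    """Walk one patient's subtree in time order, keeping a single variable."""
    by_minute = icu.get(patient, {})  # .get does not create missing patients
    return [(m, by_minute[m][variable]) for m in sorted(by_minute) if variable in by_minute[m]]


def latest(patient, variable):
    """Most recent reading of a variable, or None if it was never recorded."""
    readings = series(patient, variable)
    return readings[-1] if readings else None


# The bedside monitor streams a heart rate every minute; a lab pH arrives only once.
record("pt001", 0, "heart_rate", 88)
record("pt001", 0, "temperature", 38.1)
record("pt001", 1, "heart_rate", 91)
record("pt001", 1, "blood_ph", 7.31)
record("pt001", 2, "heart_rate", 86)

print(series("pt001", "heart_rate"))  # [(0, 88), (1, 91), (2, 86)]
print(latest("pt001", "blood_ph"))    # (1, 7.31)
print(latest("pt002", "heart_rate"))  # None
```

Putting the patient first in the key means that everything about one patient sits in one subtree. Twenty clinicians reading that chart all touch the same small branch.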

The thing that particularly strikes me is how this resembles the IoT streaming data problems that we are only recently solving with Spark and Storm, but which MUMPS solved perfectly adequately 40 to 50 years ago.

MUMPS by Any Other Name

The original copyrights on MUMPS expired about a decade ago. An improved successor version is actively marketed by InterSystems Corp. under the name Caché. A version known as GT.M is available for Linux under a free, open-source license. Googling either of these names will be as productive as searching for MUMPS. There was also a movement some years back to simply call it “M”, and you will sometimes see it identified as MUMPS/M.

Is It a Database With Its Own Language or a Language With Its Own Database?

This begins to look like the old “chocolate in my peanut butter or peanut butter in my chocolate” meme, but it’s important for understanding why MUMPS is both efficient and successful. MUMPS is both things at once. Specifically, it’s a database with an integrated language optimized for accessing and manipulating that database. Although the language has been criticized as archaic, modern users compare it favorably to Python.

Having a built-in database gives MUMPS high-level access to storage that is not available in other languages or DBs. This access uses ‘variables’ (keys) and ‘arrays’ (tables), which are sparse. The default structure is key-value (though MUMPS can easily be scripted to work as document, columnar, graph, or even RDBMS) and has a modern parallel in JSON. The structure is schema-less, and the data is stored in multidimensional, hierarchical, sparse arrays (also known as key-value nodes, sub-trees, or associative memory). Each array may have up to 32 subscripts, or dimensions. Holy cow, Batman! Sounds like we just found the sacred headwaters of Hadoop.
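Since the paragraph compares MUMPS arrays to JSON, here is a small, hypothetical Python sketch that folds flat subscripted nodes into a nested JSON object. The sketch has to handle one real mismatch. A MUMPS node may hold a value and also have children, which JSON cannot express directly. The sketch therefore parks such a value under an invented "_value" key. The 32-subscript check simply uses the limit stated in the paragraph; the exact limit varies by implementation.

```python
import json

MAX_SUBSCRIPTS = 32  # limit as stated in the article; real implementations differ


def to_tree(nodes):
    """Fold flat {subscript-tuple: value} nodes into nested dicts ready for JSON.

    A MUMPS node can have a value *and* children; JSON cannot, so such a value
    is kept under a hypothetical "_value" key. JSON keys must be strings, so
    numeric subscripts are converted with str().
    """
    root = {}
    for subs, value in nodes.items():
        if not 0 < len(subs) <= MAX_SUBSCRIPTS:
            raise ValueError(f"unsupported subscript count: {len(subs)}")
        node = root
        for sub in subs[:-1]:
            child = node.setdefault(str(sub), {})
            if not isinstance(child, dict):  # a leaf is gaining children
                child = node[str(sub)] = {"_value": child}
            node = child
        last = str(subs[-1])
        if isinstance(node.get(last), dict):  # a parent is receiving its own value
            node[last]["_value"] = value
        else:
            node[last] = value
    return root


# Sparse nodes, as a program might SET them: only these four exist.
patient = {
    ("p1",): "active",
    ("p1", "name"): "Jane Doe",
    ("p1", "labs", 3, "ph"): 7.31,
    ("p1", "labs", 7, "ph"): 7.38,
}
print(json.dumps(to_tree(patient), indent=2, sort_keys=True))
```

Lab results 4 through 6 simply do not exist. Nothing is pre-allocated for them, either in the sparse nodes or in the resulting JSON.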

A key to its speed and efficiency is that the database is accessed directly through the variables rather than through queries or retrievals. A feature of the MUMPS language/DB is that accessing volatile memory and non-volatile storage uses the same basic syntax, enabling a function to work on either local (volatile) or global (non-volatile) variables. In practice, this provides extremely high-performance data access. Michael Byrne, writing on Motherboard, does a good job of explaining this.

"Variables (or keys, in this case) are just addresses of different memory locations within those arrays, which are called globals in MUMPS-speak. A MUMPS system, which might be made up of many computers, has its own collection of global arrays stored in non-volatile memory. So, unlike an array created in a language like C++, which exists only for the duration of the program or the program's existence within a computer's RAM address space, a MUMPS global sticks around on a server, accessible at any given time to a computer within the system. We say that it's persistent."

"The result of this is that a MUMPS programmer can tap a database directly rather than using a query. This is faster on its face, eliminating the query abstraction, but direct access also allows a bunch of alternative programming ideas. For one thing, as a programmer, I can take an item stored in one of those globals and give it "children," which might be some additional properties of that item. So, we wind up with lists of different things that can be described and added to in different ways on the fly. The relationships are hierarchical."

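Python can illustrate both the local/global distinction and the idea of adding "children" on the fly, with the caveat that this is an analogy and not how a MUMPS engine works. In MUMPS the only syntactic difference between the two is the caret. SET X("item42")="Stethoscope" writes a process-local array, while SET ^X("item42")="Stethoscope" writes a persistent global. In the sketch below, the same hypothetical helpers operate either on a plain dict (standing in for a local) or on a file opened with `shelve` from Python's standard library (standing in for a persistent global). The `data` helper mimics the return codes of M's $DATA function (0, 1, 10, 11). Unlike a real global, a shelf has no journaling, no transactions, and no safe multi-user access.

```python
import json
import os
import shelve
import tempfile


def _key(subs):
    # shelve accepts only str keys, so encode the subscript list as JSON text.
    return json.dumps(list(subs))


def set_node(arr, *args):
    """Like SET arr(sub1,...,subN)=value; arr may be a local dict or a persistent shelf."""
    *subs, value = args
    arr[_key(subs)] = value


def get_node(arr, *subs):
    return arr.get(_key(subs))


def data(arr, *subs):
    """Mimic M's $DATA: 0 = no node, 1 = value only, 10 = children only, 11 = both."""
    prefix, n = list(subs), len(subs)
    has_value = _key(subs) in arr
    has_children = any(
        len(k) > n and k[:n] == prefix for k in map(json.loads, arr.keys())
    )
    return (1 if has_value else 0) + (10 if has_children else 0)


def build_inventory(arr):
    set_node(arr, "item42", "Stethoscope")
    # No schema change: give the existing item "children" on the fly.
    set_node(arr, "item42", "color", "black")
    set_node(arr, "item42", "vendor", "name", "Acme")


local = {}  # like an M local: disappears when the process ends
build_inventory(local)

path = os.path.join(tempfile.mkdtemp(), "inventory")
with shelve.open(path) as persistent:  # like an M global (^...): lives on disk
    build_inventory(persistent)

with shelve.open(path) as persistent:  # reopen: the nodes are still there
    color = get_node(persistent, "item42", "color")
    flags = (data(persistent, "item42"), data(persistent, "item42", "vendor"))

print(color, flags)           # black (11, 10)
print(data(local, "item99"))  # 0
```

The point of the design is that `build_inventory` never checks which kind of storage it was handed. The local/global distinction lives entirely in the caller, much as the caret does in MUMPS.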
Who’s Using It Today

MUMPS’s claim to fame is the Veterans Health Information Systems and Technology Architecture (VistA), a vast suite of some 80 different software modules supporting the largest medical system in the United States. It maintains the electronic health records for 8 million veterans and is used by some 180,000 medical personnel across 163 hospitals, over 800 clinics, and 135 nursing homes. It’s considered a model for current efforts to create a nationwide medical health records network.

Other government users include:

  • Indian Health Service
  • Major parts of the Department of Defense CHCS hospital system

Large healthcare companies currently using MUMPS include:

  • Care Centric
  • Allscripts
  • Epic
  • Coventry Healthcare
  • EMIS
  • Partners HealthCare (including Massachusetts General Hospital)
  • MEDITECH
  • GE Healthcare (formerly IDX Systems and Centricity)
  • Sunquest Information Systems
  • Many reference laboratories such as DASA
  • Quest Diagnostics
  • Dynacare

Among financial institutions:

  • Ameritrade, the largest online trading service in the US with over 12 billion transactions per day
  • Bank of England
  • Barclays Bank

In 2010, the European Space Agency selected MUMPS/Caché to support the Gaia mission to map the Milky Way with unprecedented precision.

Strengths

MUMPS checks all the same boxes as NoSQL and is clearly very mature.

  • Elastic horizontal scaling across multiple low-cost commodity servers.
  • Designed to support Big Data volumes beyond the capabilities of RDBMS, with extremely high performance.
  • Extremely simple to administer, requiring essentially no DBAs.
  • Very low cost resulting from commodity hardware and open source code.
  • Flexible data modeling able to easily duplicate the features of RDBMS, key value, columnar, document, and graph architectures.
  • Readily supports advanced analytics and BI with SQL.
  • Full ACID for OLTP.

Weaknesses

  • Not widely known.  Small market share outside of healthcare and finance.
  • Small programmer cadre.
  • Pretty much no code libraries outside of what InterSystems (the commercial vendor) supplies.

Is this a Technology in Need of an Upgrade?

You might be tempted to ask: why not marry MUMPS with Hadoop? The fact is that MUMPS will scale and perform in all the ways that Hadoop will. Trying to bolt these together just seems unnecessarily complicated, all for no appreciable gain in scale or performance. Plus, you would need technical people who understand both systems to keep them in sync. So no, MUMPS is fine just the way it is.

Is There an Opportunity Here?

On the one hand, would anyone choose to start a new project with MUMPS instead of Hadoop or RDBMS? Probably not. There’s not enough awareness or, for that matter, enough programmers to go around.

However, if you’re working in healthcare or finance, especially where MUMPS is already in use, consider this. A search of LinkedIn yields only 699 MUMPS developers and 77 Caché developers. If you’ve already mastered NoSQL and are looking for a competitive edge, this is a vanishingly small pool of competitors. Mastering MUMPS could easily translate into good pay and job security. There’s a good MUMPS coding tutorial here.

 

 

About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist and commercial predictive modeler since 2001.


Comment by Rob Tweed on Monday

The important, unique and powerful part of Mumps isn't the language, but the database.  Indeed, criticisms of Mumps as a technology always relate to the language.  The problem is that the database baby is thrown out with the language bath-water - a great shame because, as your article correctly explains, the database has real power and relevance to today's requirements, and there's nothing in the Mumps language that can't be achieved by modern languages. As such, the database is not just something that's limited to healthcare.

With that in mind, you should probably take a look at this: https://github.com/robtweed/ewd-redis-globals

It's an emulation of the Mumps database, but using Redis.  Through the ewd-document-store Node.js module this is able to then be abstracted as persistent JavaScript objects and a fine-grained Document Database - capabilities that wouldn't normally be associated with Redis, and that are normally only possible with the hierarchical Global Storage of a Mumps database engine.

ewd-redis-globals is a pure database. There's no Mumps language processor for it. It's accessed via JavaScript, and as such it is a rejuvenation of this powerful database technology, cast in a modern technology setting.

Comment by Donald Mayfield on November 6, 2016 at 4:56pm

I was programming in FOCAL12 on a PDP12 when I heard of MUMPS.  I did not have a chance to use it.  

Comment by Stephen Wilson on September 30, 2016 at 7:27am

Employees do not always represent the views of their company. That is a disclaimer often found on blog sites of people who are known to work for large corporations. With respect to Catherine Marenghi's comment, we have a Laboratories Information Management System that is still in use today but was first developed in 1978. We are an Intersystems Caché customer and we have enjoyed a good business relationship with them for many years. Most of our software routines are *.int files that make use of the DSM-11 standard. Open M used to be an Intersystems product until 1997, when Caché was announced. Open M is more of an umbrella term for the DSM, MSM and DTM flavours of MUMPS. Intersystems cannot distance themselves from MUMPS; they bought out their rivals, including the DSM product line from DEC and DTM from DataTree, to name a couple. Intersystems' own documentation for Caché 2016.1 has information on Open M compatibility. Without this compatibility, our software would not work and we would not be an Intersystems customer. I understand that Intersystems do not want to use the term "legacy" in their product line, and maybe that was the reason for the Caché rebrand. Intersystems are happy to label Open M as legacy and provide consultation services to "migrate from legacy Open M systems to Caché". My point here is that Caché didn't magically appear out of nowhere; it is an evolution of Open M. Just don't let Intersystems hear you call it that. They are not the only company to rebrand on Open M. George James Software produced a product called VC/M. Think of it as Version Control for MUMPS. In recent years, the company has announced they are no longer supporting certain versions of MUMPS (see release notes) and re-branded their product Deltanji.

Comment by Catherine Marenghi on September 6, 2016 at 9:09am

With all due respect, this article is not accurate.  Full disclosure -- I work at InterSystems - and we do not sell MUMPS.  Period. We sell Caché to Epic, the ESA and every customer you name.  I appreciate that MUMPS has its enthusiasts, but we just don't sell it any more, and we haven't since the 1990s.  Please do not refer to our product as MUMPS/Caché. That does not exist.

Comment by William Vorhies on February 2, 2016 at 7:30am

I continue to be surprised by the volume of comments coming in by email from obviously enthusiastic practitioners of MUMPS. Here's another one:

------------------------------------------------------

   Your blog was shared on Facebook, and I was surprised that you selected Kevin O'Kane's documentation on MUMPS. It is good, but a bit brief and perhaps a bit incomplete. MUMPS has been used for over 50 years. It has been used for a wide range of applications and problems (even AI problems). The beauty of MUMPS is that there is no pre-allocation of anything in this language, it sorts very quickly, and it is easy to use to establish lists on an ad hoc basis.

   Currently, I have written MUMPS solutions for the Knight's Tour problem, a SUDOKU solver, and a corrected model of the German Enigma machine, done in MUMPS, that can be used with any of the character sets represented in UTF-8. I am also working on making VistA international and able to support multiple languages. The VA wrote VistA, their integrated hospital system that has been keeping patient records for nearly 40 years (come 2017, it will be 40 years). There are over 160 different areas of the hospital that are part of this more-than-open-source software suite (it is actually FOIA, and is available as a world domain package). US taxpayers paid for this software, and it is amazing in that it was designed to be adopted by the people who will be using it. The power of it is that the end user can direct the way information is collected and displayed and what information needs to be added. Please contact me and I can send you a disk containing over 5 gigabytes of open source documentation and white papers. I can also supply a virtualized VistA and MUMPS configuration that can be started up from VirtualBox.

   Oh, by the way, GT.M (Greystone Technologies MUMPS) is available from SourceForge with documentation. This is a production-grade MUMPS that is running banks and credit unions (and now hospitals) all over the world. It was made open source by the people at FIS (Fidelity Information Services): "FIS™ is the world’s largest global provider dedicated to banking and payments technologies. FIS empowers the financial world with payment processing and banking solutions, including software, services and technology outsourcing. FIS’ more than 55,000 worldwide employees are passionate about moving our clients’ business forward."

Features

  • Key-value database files into the TB range (unlimited aggregate database sizes)
  • ACID (Atomic, Consistent, Isolated, Durable) transactions
  • Large scale replication for business continuity
  • Thousands of concurrent users at largest production sites
  • Plug-in architecture for database encryption



    Best wishes;


      Chris Richardson

Comment by Mike Rowe on February 1, 2016 at 10:32am

You omitted one of the largest users of MUMPS, Epic. They maintain health records for tens (or maybe over 100) of millions of people worldwide. Their base products are written in MUMPS.

Comment by William Vorhies on February 1, 2016 at 10:28am

This comment came in by email from a MUMPS practitioner and has some great insights so I will repeat the email here:

--------------------------------------------

I haven’t seen a fair or accurate depiction of MUMPS in a very long time.  Your article did both, so thank you.  It was a great write-up.

 

One weakness that you didn’t mention, which I feel is the most important one: MUMPS is very, very easy to learn, but incredibly difficult to master. I’ve been working with it for 18 years and still consider myself a mid-level guy. That’s because MUMPS is very wrapped up in the system, be it VistA, CHCS, Epic, etc. Everyone has APIs that have to be learned. Most of the APIs were built to be as flexible as possible, which results in a single named API having 20 different purposes in some cases. You have to focus your expertise on one or two systems. I chose the government systems, since they are all the same under the hood…and are the basis for several of the commercial flavors. I know a few Epic programmers, and what they talk about is very different from what I usually do.

 

Still, now that I know how few of us MUMPS programmers there are, I think I’ll raise my hourly rate!  Time to start asking for lawyer rates!  Huge thanks for that!

 

C B Farley

Consultant

Comment by K.S. Bhaskar on January 31, 2016 at 6:49am

Thank you for the recognition. I manage GT.M (http://fis-gtm.com - and click on the User Documentation tab for current user documentation) and would be glad to help anyone get started with it. A good way to get started with GT.M DevOps is the GT.M Acculturation Workshop, a series of self-paced exercises using a virtual machine. Go to https://sourceforge.net/projects/fis-gtm/files/GT.M%20Acculturation... and get the latest version (0.9 as of today).

© 2016 Data Science Central