5 Machine Learning Research Studies To Understand & Predict Length of Stay in Hospitals

Length of Stay (LOS) is a critical factor in managing hospital quality & economic outcomes in Healthcare. The metric is calculated by summing the total number of days for all discharges & dividing it by the total number of discharges. Insurance programs such as Medicare are moving to a model where they are compensating Hospitals the same amount for a specific surgery (e.g. Joint replacement) regardless of the number of days spent in the hospital. Therefore, hospitals & the overall healthcare ecosystem are motivated to reduce LOS.
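As a rough illustration of the calculation described above, here is a minimal Python sketch. The function name and the sample stays are hypothetical and are not taken from any real claims data.

```python
def average_length_of_stay(stay_days):
    """Average LOS = total inpatient days across discharges / number of discharges."""
    if not stay_days:
        raise ValueError("need at least one discharge")
    return sum(stay_days) / len(stay_days)


# Hypothetical stays (in days) for five discharges; illustrative only.
stays = [2, 3, 5, 4, 6]
avg_los = average_length_of_stay(stays)
print(f"Average LOS: {avg_los} days")
```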

Dexur analyzed a large-scale medical claims data set to identify LOS by discharge code for all hospitals. We also aggregated & summarized the raw data to make it easier to build machine learning models that predict LOS. If you are a healthcare researcher who wants access to these data sets, please contact us & we can collaborate on a project. A simple illustration of the top Diagnosis-Related Groups (DRGs) by Length of Stay at Mayo Clinic at Rochester is given below.

To get your creative juices going, here are five Machine learning research projects that you can read to better understand how to predict LOS in Hospitals. 

1)      Length of Stay Prediction and Analysis through a Growing Neural Gas...: Length of stay (LoS) prediction is considered an important research field in Healthcare Informatics as it can help to improve hospital bed and resource management. The health cost containment process carried out in Italian local healthcare systems makes this problem particularly challenging in healthcare services management. In this work a novel unsupervised LoS prediction model is presented which performs better than other ones commonly used in this kind of problem. The developed model autonomously detects the subset of non-class attributes to be considered in these classification tasks, and the structure of the trained self-organizing network can be analysed in order to extract the main factors that lead to exceeding the regional LoS threshold.

2)      IMPROVED PREDICTION OF HOSPITAL LENGTH OF STAY FOR SEVERE INJURY: There are limited beds in hospital trauma wards, and yet there is a constant demand for these beds by the inflow of severely injured patients. Many patients are initially allocated to these beds when they could be better treated in another specialised ward. If we could accurately classify patients with hospital length of stay (LOS) of 2 days or less versus those who require longer stays, we could make a more informed decision whether or not to place them in another ward when they are admitted, rather than wasting time and resources transferring them to another ward later. We systematically investigate feature transformation and selection techniques in the construction of a LOS prediction model for trauma patients. We also apply and evaluate a comprehensive range of classification algorithms on data from the trauma domain as well as from a general hospital setting. In addition, we propose a new nearest-neighbour (NN) algorithm, ranked NN, which takes into account the predictive relevance of features when computing the distance to the nearest neighbours.

3)      MACHINE LEARNING TECHNIQUES FOR PREDICTING HOSPITAL LENGTH OF STAY ...: In this paper, we compare three different machine learning techniques for predicting length of stay (LOS) in Pennsylvania Federal and Specialty hospitals. Using real-world data on 88 hospitals, we compare the performance of Classification and Regression Tree (CART), Chi-Square Automatic Interaction Detection (CHAID) and Support Vector Regression (SVR), and find that there is no significant difference in the performance of these three techniques. However, CART provides a decision tree that is easy to understand and interpret. The results from CART indicate that psychiatric care hospitals typically have higher LOS than non-psychiatric care hospitals. For non-psychiatric care hospitals, the LOS depends on hospital capacity (beds staffed), with larger hospitals with beds staffed over 329 having average LOS of 13 weeks vs. smaller hospitals with average LOS of about 3 weeks.

4)      A Comparison of Supervised Machine Learning Techniques for Predicti...: Diabetes is a life-altering medical condition that affects millions of people and results in many hospitalizations per year. Consequently, predicting the length of stay of in-hospital diabetic patients has become increasingly important for staffing and resource planning. Although statistical methods have been used to predict length of stay in hospitalized patients, many powerful machine learning techniques have not yet been explored. In this paper, we compare and discuss the performance of various supervised machine learning algorithms (i.e., multiple linear regression, support vector machines, multi-task learning, and random forests) for predicting long versus short-term length of stay of hospitalized diabetic patients.

5)      Real-time prediction of inpatient length of stay for discharge prio...: Hospitals are challenged to provide timely patient care while maintaining high resource utilization. This has prompted hospital initiatives to increase patient flow and minimize non-value-added care time. Real-time demand capacity management (RTDC) is one such initiative whereby clinicians convene each morning to predict patients able to leave the same day and prioritize their remaining tasks for early discharge. Our objective is to automate and improve these discharge predictions by applying supervised machine learning methods to readily available health information. The authors use supervised machine learning methods to predict patients’ likelihood of discharge by 2 p.m. and by midnight each day for an inpatient medical unit. Using data collected over 8000 patient stays and 20 000 patient days, the predictive performance of the model is compared to clinicians using sensitivity, specificity, Youden’s Index (i.e., sensitivity + specificity - 1), and aggregate accuracy measures.
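Youden's Index, mentioned in the last abstract, is simple to compute from a confusion matrix. The sketch below is only an illustration: the function name and the counts are hypothetical and do not come from the paper.

```python
def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual discharges predicted correctly
    specificity = tn / (tn + fp)  # share of actual non-discharges predicted correctly
    return sensitivity + specificity - 1


# Hypothetical counts for a "discharged by 2 p.m." classifier; illustrative only.
j = youden_index(tp=40, fn=10, tn=80, fp=20)
print(f"Youden's Index: {j:.2f}")
```

A value of 0 means the classifier does no better than chance, and 1 means it separates the two groups perfectly.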

Comment by Douglas A Dame on December 31, 2016 at 12:59am

Those five papers include some pretty weird stuff. More comments on that below. 

There is a LONG history (in US healthcare) of predictive modeling of LOS. When used for the purpose of reporting on the efficiency of hospitals, this process is generally called "risk adjustment." Risk adjustment is most commonly used for the purposes of reimbursement (setting fair payment rates), and for outcomes (mortality) or efficiency (LOS, cost) reporting, but is also used for "leveling the field" for some other measurements of quality.

One of the key things to consider in LOS modeling is that it is exceedingly difficult to get good predictive models when many different kinds of patients from a clinical/diagnostic perspective are included. So LOS models often subset down to a narrow slice of patients ... e.g. one of these papers deals only with patients on a trauma nursing unit, and another looks only at diabetes patients.

When it's necessary to come up with LOS predictions that work across all kinds of diverse patients, the more or less "industry standard" approach is to stratify the patients into many subgroups, and then run a more or less similar model on each of the subgroups. Commonly these subgroups are "base DRGs." For example, one base DRG is "Intracranial Hemorrhage." Most flavors of DRGs (the CMS-Medicare version, 3M's APR-DRGs, and others) will have 2 or 3 or 4 separate DRGs for intracranial hemorrhage, following some hierarchy of severity. For a general LOS model, you'd label all 2/3/4 of those DRGs as being in one clinically relevant group. So for general purpose work, you usually don't have just one LOS model, you have a family of probably 300+ models that are trained on different subsets of the main database. 
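To make the "family of models" idea concrete, here is a minimal Python sketch that fits one small OLS model per base DRG. It is only an illustration under loud assumptions: the group labels, the single predictor (age), and the toy records are all hypothetical, and real risk-adjustment models use many more predictors.

```python
from collections import defaultdict


def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # no variation in x: fall back to the group mean
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_stratified(records):
    """Train one model per base DRG; records are (base_drg, age, los_days)."""
    groups = defaultdict(list)
    for base_drg, age, los in records:
        groups[base_drg].append((age, los))
    models = {}
    for base_drg, rows in groups.items():
        xs, ys = zip(*rows)
        models[base_drg] = fit_ols(xs, ys)
    return models


def predict_los(models, base_drg, age):
    """Look up the model for this patient's base DRG and apply it."""
    intercept, slope = models[base_drg]
    return intercept + slope * age


# Hypothetical toy data; the group names and values are made up for illustration.
records = [
    ("intracranial_hemorrhage", 50, 5.0),
    ("intracranial_hemorrhage", 60, 6.0),
    ("intracranial_hemorrhage", 70, 7.0),
    ("joint_replacement", 55, 2.0),
    ("joint_replacement", 65, 3.0),
    ("joint_replacement", 75, 4.0),
]
models = fit_stratified(records)
print(predict_los(models, "intracranial_hemorrhage", 80))
print(predict_los(models, "joint_replacement", 65))
```

The design point is the dictionary of models keyed by stratum: each patient is scored only by the model trained on clinically similar patients.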

A pretty decent write-up of a well-regarded industry-standard approach can be found here: Mortality Risk Adjustment Methodology for University Health System’... . (Although the focus of this is the mortality models, the LOS modeling process is very similar, except that the LOS models use OLS regression rather than logistic regression.)

On to the weirdness of these five articles:

#1 - Growing Neural Gas approach. It's inexplicable to me why the authors would choose to use an unsupervised method to predict LOS or attempt to identify the underlying factors that influence LOS. They started with a population of 274,962 patient records, but sampled down to just 1,374 to do their training. If you want to prove the potential value of some new approach, it's always good to compare your results to the existing industry-standard approach if there is one. Remember the "stratify" comment above, which I said is industry-standard? These guys chose to limit their training data to approx 5 cases per clinically relevant stratum. This seems to be more of a regional capacity model than a (patient-level) LOS model. Or maybe an algorithm looking for a problem.

#2 - Pu (?2014, a conference poster more than a paper per se) - Improved Prediction of Hospital Length of Stay for Severe Injury. This focuses only on patients in a trauma unit. It doesn't really model LOS in a conventional sense; it tries to predict which patients will only be on the unit for 1 or 2 days vs longer, on the heroically misbegotten (IMO) assumption that patients who were only in a trauma unit for 2 days should never have been there in the first place. Pretty sure there are already well-researched and validated clinical scoring algorithms for trauma patients that serve as guidelines as to whether the patient has clinical/nursing needs sufficient to justify care on a trauma unit.

#3 - Pendharkar & Khurana (2014) - Machine Learning techniques for predicting hospital length of stay in Pennsylvania federal and specialty hospitals. They don't have patient-level data; they have just 1 data point for each of 88 hospitals, which include 5 VERY different classes of hospitals. They would have been better served to have spent 5 minutes in Excel making a pivot table showing Avg LOS by type of hospital, and making a set of box charts or histograms to show the LOS distributions for each hospital type. And then just stopping.

#4 - Morton et al (?2014) - A Comparison of Supervised Machine Learning Techniques for Predicting Short-Term In-Hospital Length of Stay Among Diabetic Patients.

Decent for what it attempts to do. (Which is limited; it's apparently the result of a month-long class project.) The decision to model short (3 days or less) vs long admissions, rather than actual LOS, is baffling. I'm having trouble thinking of a real-world reason you'd want to do that. Makes me think that changing the "Y" may have been a re-think when the initial results didn't show what was wanted. Or maybe they just really wanted to do classification models. Re-using just the predictors found significant for a conventional linear model by a prior study (Patel) has the effect of handicapping these non-linear models ... they probably would have had better results by doing their own feature selection. The main point of using non-linear approaches is that they can find and exploit subtle relationships that linear models don't detect.

#5 - Barnes et al (2015) -  Real-time prediction of inpatient length of stay for discharge prioritization - is easily the most interesting paper of the five. And of course, this one isn't an academic exercise, it's a real-world solution to a real-world problem that people in hospitals need answers to every day. One thing that's critical here is that the authors recognized that for their business problem, patient-level accuracy isn't the key thing for the model's usefulness ... what's most important is how well it predicts on the aggregate level, i.e., how many beds will we have empty by 2 pm (so that new patients can arrive) and by midnight? And if the machine can make aggregate predictions that are as accurate as the highly skilled humans, then using the model saves a lot of time. Which allows those highly skilled humans to spend more time on patient care, and less on the administrative/paperwork part of their jobs. This paper I learned something from. 
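To illustrate the aggregate-level point, here is a minimal Python sketch. It is not the method from the paper: it simply assumes a model that outputs a discharge probability per patient and uses the fact that the expected number of discharges is the sum of those probabilities. All names and numbers are hypothetical.

```python
def expected_discharges(probabilities):
    """Expected count of discharges = sum of per-patient discharge probabilities."""
    return sum(probabilities)


# Hypothetical predicted probabilities of discharge by 2 pm for five patients.
probs = [0.9, 0.8, 0.3, 0.2, 0.3]
# Hypothetical outcomes: 1 = patient actually left by 2 pm, 0 = did not.
actual = [1, 1, 0, 0, 1]

expected_beds = expected_discharges(probs)
aggregate_error = abs(expected_beds - sum(actual))

# Patient-level view for contrast: threshold each probability at 0.5.
patient_hits = sum((p >= 0.5) == bool(a) for p, a in zip(probs, actual))

print(f"Expected beds freed: {expected_beds:.1f}, actual: {sum(actual)}")
print(f"Aggregate error: {aggregate_error:.1f} beds")
print(f"Patient-level correct calls: {patient_hits} of {len(actual)}")
```

In this toy case the model misses one patient individually, yet the unit-level bed count is off by only half a bed, which is the number a bed manager actually plans around.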

Hope this commentary is useful to somebody.



Comment by Jianhua/Jason Li on December 28, 2016 at 6:16pm

I am working on this field as a developer and would be very interested in this kind of project. 
