Predicting Heart Disease using Machine Learning? Don’t!

The “dive straight into the problem” syndrome

This is the first mistake many people make: jumping straight into the problem and deciding which machine learning algorithm to apply. Doing EDA as part of this process is not *thinking* about the problem. Rather, it is a sign that you have already accepted the notion that the problem needs a data science solution. Instead, one of the pertinent questions to ask before starting any analysis is, “Is this problem even predictable through the application of machine learning?”

Blind faith in Data

This is an extension of the first point. Diving straight into the problem means you have blind faith in the data. People assume the data to be true and make no effort to scrutinize it. For example, the dataset provided only systolic blood pressure. Any doctor, or even a paramedic, would tell you that systolic blood pressure alone does not give the full picture; the diastolic level matters too. Many don’t even ask the question, “Are these features enough to predict the outcome, or are more features needed?”

Not enough data per patient

Let’s take a look at the data set above. Notice that there is only one data point per feature for each patient. The fundamental problem here is that features like blood pressure, cholesterol, and heart rate are not static; they vary. A person’s blood pressure changes from hour to hour and from day to day, and so does heart rate. So when it comes to the prediction problem, there is no telling whether a blood pressure of 135 mmHg or 140 mmHg was one of the factors behind the heart disease, all while the data set might be reporting 130 mmHg. Ideally, multiple measurements should be taken for each feature for each patient.
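To make the point concrete, here is a minimal sketch of what repeated measurements could look like for one patient. The readings are made-up numbers chosen for illustration, not values from the dataset discussed here, and `summarize` is a hypothetical helper rather than part of any library.

```python
from statistics import mean, stdev

# Hypothetical systolic readings (mmHg) for ONE patient over a few days.
# These values are invented for illustration only.
readings = [130, 135, 140, 128, 138, 133]


def summarize(values):
    """Collapse repeated measurements into per-patient summary features."""
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "stdev": stdev(values),
        "n": len(values),
    }


summary = summarize(readings)

# A one-row-per-patient dataset would keep only a single snapshot like this.
single_snapshot = readings[0]

print(f"single snapshot: {single_snapshot} mmHg")
print(f"range over repeated readings: {summary['min']}-{summary['max']} mmHg")
print(f"mean: {summary['mean']:.1f} mmHg, stdev: {summary['stdev']:.1f} mmHg")
```

The single snapshot says nothing about the spread. With several readings per patient, one could at least feed a mean and a measure of variability into a model instead of an arbitrary point in the range.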

Now let’s come to the crux of the matter

Applying algorithms without domain experience

One reason for the high failure rate of data science applications in healthcare is that the data scientists applying the algorithms do not have adequate medical knowledge.

Believing they have solved a real healthcare problem

Last but not least, many believe that by fitting an ML algorithm to a *healthcare* data set and getting some accuracy metrics, they have solved a real healthcare problem. Nothing could be further from the truth, especially in the healthcare domain.
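One way to see why an accuracy number alone proves little is the sketch below. It assumes a hypothetical group of 100 patients in which 10% have heart disease; that prevalence and the labels are invented for illustration. The “model” ignores every feature and always predicts “no disease”.

```python
# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
# The 10% prevalence is an assumption made for this illustration.
y_true = [1] * 10 + [0] * 90

# A "model" that ignores all features and always predicts "no disease".
y_pred = [0] * len(y_true)


def accuracy(truth, pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def recall(truth, pred):
    """Fraction of actual disease cases that the model flags."""
    preds_for_positives = [p for t, p in zip(truth, pred) if t == 1]
    return sum(preds_for_positives) / len(preds_for_positives)


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)

print(f"accuracy: {acc:.2f}")
print(f"recall on disease cases: {rec:.2f}")
```

Under these assumptions the do-nothing model is right for 90 of 100 patients yet finds none of the patients who actually have the disease, so a respectable-looking accuracy can coexist with a model that is clinically useless.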

In conclusion:

There are perhaps thousands of business problems that genuinely warrant data science or machine learning solutions. But at the same time, one should not fall into the trap of “To a person with a hammer, everything looks like a nail.” Seeing every problem as a nail (a data science problem) to be hit with a hammer (machine learning algorithms) can be very counterproductive. Much of the 80% failure rate in data science applied to business problems could be attributed to this mindset.


Comment by Venkat Raman on November 17, 2020 at 10:32am

Thank you, Leonardo. Yes, you are right, the same problem is prevalent in other domains too. I just happened to notice this because a lot of 'Data Science Gurus' are teaching naïve students that one can predict heart disease via myriad ML algorithms, not to mention without any medical knowledge.

Comment by leonardo auslender on November 17, 2020 at 10:07am

Thank you for a very good article. While your focus is on healthcare, the same problems arise in crime analysis and law, among many other areas. Since I teach data science and statistics, I know that it is very difficult to convey the need to know the context of the problem at hand. In parallel, most data science methods are not easily interpretable, and thus it is easy to just take refuge in a black box and consider the task finished.

Comment by Venkat Raman on November 15, 2020 at 5:16am

Thank you Kurt Cagle.

Comment by Kurt A Cagle on November 14, 2020 at 8:19pm


This is a superb article, and very true. Understanding algorithms does no good if you do not understand the domain, and every poor piece of analysis simply decreases the respect and expectations of reliability and utility that most people have about data science, in any field.
