
10 Great Healthcare Data Sets

  • Patnab 


Healthcare will be one of the biggest beneficiaries of big data and analytics. Here are 10 great data sets to start playing around with and sharpen your healthcare data analytics chops.

Big Cities Health Inventory Data

The Big Cities Health Inventory (BCHI) Data Platform is an open data platform that lets users access and analyze health data from 26 cities, covering 34 health indicators across six demographic indicators. It is the sixth edition of a report first developed by the Chicago Department of Public Health to present epidemiologic data specific to large cities. The previous BCHI was published in 2007, and this edition is the first to be produced and issued by the Big Cities Health Coalition.
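
Because the platform slices each health indicator by city and demographic group, a handy first step is to pivot a long-format export into a city-by-indicator table for one demographic slice. This is a minimal sketch that assumes a CSV with the hypothetical columns `city`, `indicator`, `demographic`, `year`, and `value`; the real export's headers may differ, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict

def pivot_indicators(rows, demographic="All", year=None):
    """Return {city: {indicator: value}} for a single demographic slice.

    `rows` is any iterable of dicts keyed by the hypothetical columns
    city, indicator, demographic, year, value (not the official schema).
    """
    table = defaultdict(dict)
    for row in rows:
        if row["demographic"] != demographic:
            continue
        if year is not None and row["year"] != str(year):
            continue
        if not row["value"].strip():
            continue  # skip suppressed or missing estimates
        table[row["city"]][row["indicator"]] = float(row["value"])
    return dict(table)

# Made-up toy rows for illustration only; these are not BCHI figures.
SAMPLE = """city,indicator,demographic,year,value
City A,Indicator X,All,2015,30.1
City A,Indicator X,Female,2015,31.0
City B,Indicator X,All,2015,
City B,Indicator Y,All,2015,15.5
"""

if __name__ == "__main__":
    print(pivot_indicators(csv.DictReader(io.StringIO(SAMPLE)), year=2015))
```

Skipping blank values rather than treating them as zero keeps suppressed estimates from silently dragging down comparisons between cities.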

Healthcare Cost and Utilization Project (HCUP)

The Healthcare Cost and Utilization Project (HCUP, pronounced “H-Cup”) is a family of healthcare databases and related software tools and products developed through a Federal-State-Industry partnership and sponsored by the Agency for Healthcare Research and Quality (AHRQ). It is the largest collection of longitudinal hospital care data in the United States.


Data.gov

Data.gov data sets come from across the federal government, with the goal of improving the health and lives of all Americans. Many industry players have already seen the benefits of advanced healthcare data warehouses, which make it easier to integrate, store, process, and analyze patient-related data.

Kent Ridge Bio-medical Dataset

This is an online repository of high-dimensional biomedical data sets, including gene expression data, protein profiling data, and genomic sequence data, that relate to classification problems and were recently published in Science, Nature, and other prestigious journals.
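
Since these are classification data sets with many features per sample, a nearest-centroid classifier makes a simple baseline to try before anything fancier. This is a stdlib-only sketch that assumes each sample has already been parsed into a list of floats with a class label; the four-feature toy vectors and the labels `class_a` and `class_b` are made up for illustration.

```python
import math
from collections import defaultdict

def fit_centroids(samples, labels):
    """Average the feature vectors belonging to each class label."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

# Toy 4-feature vectors, made up for illustration (real sets have thousands of features).
train = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0], [0.0, 0.1, 0.9, 1.0]]
labels = ["class_a", "class_a", "class_b", "class_b"]

if __name__ == "__main__":
    centroids = fit_centroids(train, labels)
    print(predict(centroids, [1.0, 0.0, 0.1, 0.0]))
```

This sketch does no feature selection or shrinkage, which data sets with far more features than samples often need.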


HealthData.gov

The goal of HealthData.gov is to make high-value health data more accessible to entrepreneurs, researchers, and policy makers in the hope of better health outcomes for all. Alongside data sets from agencies across the federal government, the site also provides tools and applications.

MHEALTH Dataset

The MHEALTH (Mobile HEALTH) dataset comprises body motion and vital signs recordings for ten volunteers with diverse profiles while they performed several physical activities. Sensors placed on each subject’s chest, right wrist, and left ankle measure the motion experienced by these body parts, namely acceleration, rate of turn, and magnetic field orientation. The chest sensor also provides 2-lead ECG measurements, which could potentially be used for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG.
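
A quick way to get a feel for the sensor recordings is to summarize motion intensity per activity. The sketch below computes the mean chest-acceleration magnitude for each activity label. It assumes a column layout you should verify against the dataset's own documentation: whitespace-separated numeric columns, chest acceleration x/y/z in the first three columns, and the activity label in the last column, with 0 meaning no labeled activity. The file name in the usage comment is illustrative.

```python
import math
from collections import defaultdict

# Assumed layout (check the dataset's own documentation): one sample per line,
# whitespace-separated numbers, chest acceleration x/y/z in the first three
# columns, and the activity label in the last column, where 0 means "no
# labeled activity".
CHEST_ACC = slice(0, 3)

def mean_chest_magnitude(lines):
    """Return {activity_label: mean chest-acceleration magnitude}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        values = [float(f) for f in fields]
        label = int(values[-1])
        if label == 0:
            continue  # drop unlabeled samples
        ax, ay, az = values[CHEST_ACC]
        totals[label] += math.sqrt(ax * ax + ay * ay + az * az)
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}

# Usage (the file name is illustrative):
# with open("mHealth_subject1.log") as f:
#     print(mean_chest_magnitude(f))
```

The function accepts any iterable of lines, so it works equally well on an open file or a list of strings during testing.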

Surveillance, Epidemiology & End Results (SEER)-Medicare Health Outcomes Survey (MHOS)

The SEER-MHOS is a survey-level analysis file organized chronologically by earliest survey administration date. It includes all Medicare Advantage enrollees from Cohorts 1 to 14 who have completed at least one MHOS. Each cohort consists of a baseline survey and a two-year follow-up survey. Beneficiaries who responded to a baseline survey may or may not have completed a follow-up survey. Some beneficiaries were sampled in more than one cohort, resulting in multiple baseline and follow-up surveys per person. The survey records are arranged chronologically, starting with the earliest completed survey, and include the survey date.
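
Because one person can appear in several cohorts, and a baseline may lack its follow-up, a common preprocessing step is to pair each baseline with its follow-up per person and cohort. This is a minimal sketch; the `Survey` record, its `person_id` and `kind` fields, and the `"baseline"`/`"follow_up"` coding are hypothetical stand-ins for the file's actual variables, and the records are made up.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple

class Survey(NamedTuple):
    person_id: str      # hypothetical person-level linkage key
    cohort: int
    kind: str           # hypothetical coding: "baseline" or "follow_up"
    survey_date: date

def pair_surveys(records):
    """Map each person to [(cohort, baseline, follow_up), ...], earliest first.

    follow_up is None when the person answered only the baseline survey.
    """
    by_cohort = defaultdict(dict)
    for r in records:
        by_cohort[(r.person_id, r.cohort)][r.kind] = r
    people = defaultdict(list)
    for (pid, cohort), kinds in by_cohort.items():
        baseline = kinds.get("baseline")
        if baseline is None:
            continue  # ignore follow-ups without a matching baseline
        people[pid].append((cohort, baseline, kinds.get("follow_up")))
    for entries in people.values():
        entries.sort(key=lambda e: e[1].survey_date)  # chronological by baseline
    return dict(people)

# Made-up toy records: p1 appears in two cohorts; some baselines lack a follow-up.
records = [
    Survey("p1", 7, "baseline", date(2004, 6, 1)),
    Survey("p1", 3, "baseline", date(2000, 5, 1)),
    Survey("p1", 3, "follow_up", date(2002, 5, 1)),
    Survey("p2", 5, "baseline", date(2002, 3, 1)),
]

if __name__ == "__main__":
    for pid, entries in pair_surveys(records).items():
        print(pid, [(c, f is not None) for c, _, f in entries])
```

Keeping unmatched baselines (with `None` for the follow-up) rather than dropping them lets you measure attrition between the two survey waves.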

The Human Mortality Database (HMD)

The Human Mortality Database (HMD) was created to provide detailed mortality and population data to researchers, students, journalists, policy analysts, and others interested in the history of human longevity. Presently, the database contains detailed data for over 35 countries. A companion project is the Human Life-Table Database, which contains data for 40 countries.
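
Mortality data of this kind come as death counts and exposure-to-risk by age, from which death rates follow as m_x = D_x / E_x. As a rough illustration of the longevity questions the database supports, here is a simple single-year period life table that turns those inputs into life expectancy at birth. It uses textbook simplifications (deaths fall mid-year on average, and the last age group is open-ended) and is not the HMD's own methods protocol.

```python
def life_expectancy(deaths, exposures):
    """Period life expectancy at birth from single-year deaths and exposures.

    Textbook simplifications, not the HMD methods protocol:
    deaths fall mid-year on average (a_x = 0.5) and the last age
    group is open-ended. Exposures must be positive, and the final
    group needs at least one death.
    """
    rates = [d / e for d, e in zip(deaths, exposures)]  # m_x = D_x / E_x
    survivors = 1.0       # l_x with a radix of 1
    person_years = 0.0    # running total of L_x
    last = len(rates) - 1
    for age, m in enumerate(rates):
        if age == last:
            person_years += survivors / m  # open interval: L = l / m
            break
        q = m / (1 + 0.5 * m)   # probability of dying between x and x+1
        died = survivors * q
        person_years += survivors - 0.5 * died
        survivors -= died
    return person_years  # equals e_0 because l_0 = 1
```

For example, `life_expectancy([10, 5], [100, 10])` treats age 0 with a death rate of 0.1 and an open-ended group with a rate of 0.5, giving 58/21, or about 2.76 years.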

Child Health and Development Studies

The Child Health and Development Studies (CHDS) is committed to investigating how health and disease are passed on between generations–not just by genes, but also through social, personal, and environmental surroundings.

Medicare Provider Utilization and Payment Data: Physician and Other…

This public data set contains information about services and procedures provided to Medicare beneficiaries by physicians and other healthcare professionals, with information about utilization, payment, and submitted charges organized by National Provider Identifier (NPI), Healthcare Common Procedure Coding System (HCPCS) code, and place of service.
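
Because each row is keyed by provider, procedure code, and place of service, rolling rows up to the HCPCS level is a natural first analysis. This sketch assumes the dollar amounts are reported as averages per service, so totals are the average times the service count; check that against the file's data dictionary. The column names (`npi`, `hcpcs_code`, `services`, `avg_submitted_charge`, `avg_medicare_payment`) are hypothetical simplifications, and the rows, NPIs, and codes are made-up placeholders.

```python
import csv
import io
from collections import defaultdict

def totals_by_hcpcs(rows):
    """Estimate total services, submitted charges, and payments per HCPCS code.

    Each row is one NPI x HCPCS code x place-of-service combination whose
    dollar amounts are assumed to be averages per service, so
    totals = average * services. Column names are hypothetical.
    """
    totals = defaultdict(lambda: {"services": 0.0, "charged": 0.0, "paid": 0.0})
    for row in rows:
        n = float(row["services"])
        t = totals[row["hcpcs_code"]]
        t["services"] += n
        t["charged"] += n * float(row["avg_submitted_charge"])
        t["paid"] += n * float(row["avg_medicare_payment"])
    return dict(totals)

# Made-up toy rows; the NPIs and codes are placeholders, not real records.
SAMPLE = """npi,hcpcs_code,place_of_service,services,avg_submitted_charge,avg_medicare_payment
0000000001,CODE1,O,10,100,40
0000000002,CODE1,F,5,120,50
0000000001,CODE2,O,2,300,200
"""

if __name__ == "__main__":
    for code, t in totals_by_hcpcs(csv.DictReader(io.StringIO(SAMPLE))).items():
        # Ratio of what Medicare paid to what was billed for this code
        print(code, t["services"], round(t["paid"] / t["charged"], 3))
```

Weighting by the service count matters here: simply averaging the per-row averages would give a low-volume provider the same influence as a high-volume one.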

This list was compiled by Patnab
