
Building on my last post, I want to use the same healthcare data to demonstrate some useful R packages. R packages are installed into libraries, and many come pre-installed. Reaching the next level of skill, though, means knowing when to bring in a new package and what it offers. With that, let’s get to our example.

Useful function: gsub

When working with character vectors, especially during data cleaning, gsub makes the job much simpler: it replaces every match of a pattern in each string. In my healthcare data, I wanted to convert dollar values to integers (i.e., $21,000 to 21000), so I used gsub as seen below.

[Screenshot: gsub code and output]
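Since the original code survives only as a screenshot, here is a minimal sketch of the same cleanup step. The sample values and the variable names `payment_text`, `payment_clean`, and `payment_num` are hypothetical; the real column in the healthcare dataset is not shown in the post.

```r
# Hypothetical dollar strings standing in for the dataset's payment column.
payment_text <- c("$21,000", "$1,250", "$305")

# "[$,]" is a regex character class matching either "$" or ",";
# gsub() removes every occurrence in each string.
payment_clean <- gsub("[$,]", "", payment_text)

# gsub() returns character strings, so convert them to integers explicitly.
payment_num <- as.integer(payment_clean)
print(payment_num)
```

Note that gsub only strips the characters. The `as.integer()` call is what actually turns "21000" into the number 21000.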

Package: reshape2

Looking at the data, I wanted to focus on the Payment estimate, so I used the melt() function from reshape2. melt() reshapes a wide data frame into long format (one row per identifier/variable pair), which is the reverse of a pivot table, and it restructures the data without losing any values.

[Screenshot: melt code and output]
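As a rough illustration of what melt() does, the sketch below melts a small made-up data frame. The columns `state`, `condition`, `payment_estimate`, and `lower_estimate` are hypothetical stand-ins, not the names used in the original dataset.

```r
library(reshape2)

# Hypothetical wide data: one row per state, one column per numeric measure.
wide <- data.frame(
  state            = c("AL", "AK"),
  condition        = c("Heart Attack", "Heart Attack"),
  payment_estimate = c(21000, 23500),
  lower_estimate   = c(19000, 21000),
  stringsAsFactors = FALSE
)

# Keep state and condition as identifiers; every other column becomes
# a (measure, value) pair, so 2 rows x 2 measures -> 4 rows.
long <- melt(wide, id.vars = c("state", "condition"),
             variable.name = "measure", value.name = "value")
print(long)

# Focus on the payment estimate only.
payment_only <- long[long$measure == "payment_estimate", ]
print(payment_only)
```

Long format is convenient here because filtering and grouping on a single `measure` column is simpler than juggling several wide columns.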

Package: sqldf

With my data melted, I wanted the average payment estimate for heart attack patients by state. That is a classic SQL GROUP BY query, and sqldf lets you run SQL statements directly against R data frames.

[Screenshot: sqldf code and output]
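A minimal sketch of that query, assuming melted data shaped like the hypothetical `long` data frame below (the column names and values are illustrative, not taken from the original dataset):

```r
library(sqldf)

# Hypothetical long-format data, similar to what melt() produces.
long <- data.frame(
  state     = c("AL", "AL", "AK", "AK"),
  condition = c("Heart Attack", "Pneumonia", "Heart Attack", "Heart Attack"),
  measure   = "payment_estimate",
  value     = c(21000, 15000, 23000, 24000),
  stringsAsFactors = FALSE
)

# sqldf() looks up the data frame named in FROM and runs the query
# in a temporary SQLite database, returning the result as a data frame.
avg_by_state <- sqldf("
  SELECT state, AVG(value) AS avg_payment
  FROM long
  WHERE condition = 'Heart Attack' AND measure = 'payment_estimate'
  GROUP BY state
  ORDER BY state
")
print(avg_by_state)
```

The same result is available in base R with `aggregate()`, but sqldf is handy when the logic is easier to express, or to share with colleagues, as SQL.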

My data is now in good shape to visualize with a map overlay, and ggplot2 and maps are two other R packages that would be useful for that. I plan to discuss them in a future post.

About: Divya Parmar is a recent college graduate working in IT consulting. He publishes new posts every week; subscribe to his blog for more. He can also be found on LinkedIn and Twitter.


Tags: R, SQL, healthcare



© 2021 TechTarget, Inc.
