Map the Life Expectancy in United States with data from Wikipedia with R

Original post is published at DataScience+

Recently, I became interested in scraping data from webpages, such as Wikipedia, and visualizing it with R. As in my previous post, I use the rvest package to get the data from the webpage and the ggplot2 package to visualize it.
In this post, I will map the life expectancy of White and African-American people in the US.
First, load the required packages.

## LOAD THE PACKAGES ####
# map_data() also needs the maps package installed, and coord_map() needs mapproj
library(rvest)
library(ggplot2)
library(dplyr)
library(scales)

Import the data from Wikipedia.

## LOAD THE DATA ####
le = read_html("https://en.wikipedia.org/wiki/List_of_U.S._states_by_life_expectancy")
le = le %>%
  html_nodes("table") %>%
  .[[2]] %>%
  html_table(fill = TRUE)
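
The index `[[2]]` reflects the page layout at the time of writing and may change if the Wikipedia article is edited. A minimal sketch of how one might inspect every table before choosing an index, using a small made-up HTML string (not the live page) so it runs offline:

```r
library(rvest)

# Made-up HTML with two tables, standing in for the Wikipedia page
page = read_html("<html><body>
  <table><tr><th>Note</th></tr><tr><td>navigation box</td></tr></table>
  <table><tr><th>State</th><th>LE</th></tr><tr><td>State A</td><td>80</td></tr></table>
</body></html>")

tables = page %>% html_nodes("table")

# How many tables were found, and what are the dimensions of each?
length(tables)
lapply(tables, function(tb) dim(html_table(tb)))

# Pick the table that actually holds the data
demo = tables[[2]] %>% html_table()
demo
```

Running the same two inspection lines on the real page shows which index holds the life expectancy table.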

Now I have to clean the data. The comments below explain what each line of code does.

## CLEAN THE DATA ####
# select only the columns with data
le = le[c(1:8)]
# take the names from the 3rd row and use them as column names
names(le) = le[3,]
# delete the rows and columns I am not interested in
le = le[-c(1:3), ]
le = le[, -c(5:7)]
# rename the 4th and 5th columns
names(le)[c(4,5)] = c("le_black", "le_white")
# convert the variables to numeric
le = le %>%
  mutate(
    le_black = as.numeric(le_black),
    le_white = as.numeric(le_white))
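
To see what these cleaning steps do without hitting Wikipedia, here is a small sketch on a toy data frame. The values and the "n/a" placeholder are made up for illustration; the real table may mark missing values differently.

```r
# Toy data frame (made-up values): the header text sits in row 1
raw = data.frame(V1 = c("State", "A", "B"),
                 V2 = c("le_black", "74.1", "n/a"),
                 stringsAsFactors = FALSE)

# Promote the header row to column names, then drop that row
names(raw) = unname(unlist(raw[1, ]))
raw = raw[-1, ]

# Cells that are not numbers become NA (R normally warns about this coercion)
raw$le_black = suppressWarnings(as.numeric(raw$le_black))
raw
```

The NA values produced here are the ones that later show up as grey states on the map.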

Since life expectancy differs between White and African-American people, I will calculate the difference and map it as well.

le = le %>% mutate(le_diff = (le_white - le_black))

I will load the map data and merge the datasets together.

## LOAD THE MAP DATA ####
states = map_data("state")
# create a lowercase state name to match the region column of the map data
le$region = tolower(le$State)
# merge the datasets
states = merge(states, le, by = "region", all.x = TRUE)
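
One thing worth checking after the merge: `merge()` sorts its result by the key and does not promise to keep the original row order, while `geom_polygon()` connects points in row order. If the map outlines look scrambled, re-sorting by the `order` column that `map_data()` provides is a common fix. A minimal sketch with toy data frames (made-up values standing in for the map outline and the scraped table):

```r
# Toy stand-ins (made-up values) for the map outline and the scraped table
outline = data.frame(region = c("b", "b", "a", "a"),
                     order  = 1:4,
                     stringsAsFactors = FALSE)
values  = data.frame(region = "a", le_black = 75,
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps every outline row; regions without data get NA
merged = merge(outline, values, by = "region", all.x = TRUE)

# merge() sorted the rows by region, so restore the drawing order
merged = merged[order(merged$order), ]
merged
```

Applied to the real data, the equivalent line would be `states = states[order(states$order), ]`.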

Now it's time to make the plot. First I will plot the life expectancy of African Americans in the US. For a few states we don't have data, so I will color them grey.

## MAKE THE PLOT ####
# Life expectancy in African American
ggplot(states, aes(x = long, y = lat, group = group, fill = le_black)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49",
                      guide = "colorbar", na.value = "#eeeeee",
                      breaks = pretty_breaks(n = 5)) +
  labs(title = "Life expectancy in African American") +
  coord_map()

Here is the plot:
[Figure: map of life expectancy of African Americans by US state]

The code below does the same for White people in the US.

# Life expectancy in White American
ggplot(states, aes(x = long, y = lat, group = group, fill = le_white)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49",
                      guide = "colorbar", na.value = "Gray",
                      breaks = pretty_breaks(n = 5)) +
  labs(title = "Life expectancy in White") +
  coord_map()

Here is the plot:
[Figure: map of life expectancy of White Americans by US state]

Finally, I will map the differences between White and African American people in the US.

# Differences in Life expectancy between White and African American
ggplot(states, aes(x = long, y = lat, group = group, fill = le_diff)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49",
                      guide = "colorbar", na.value = "#eeeeee",
                      breaks = pretty_breaks(n = 5)) +
  labs(title = "Differences in Life Expectancy between \nWhite and African Americans by States in US") +
  coord_map()

Here is the plot:
[Figure: map of the difference in life expectancy between White and African Americans by US state]
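
The three maps differ only in the fill column and the title, so the repeated code could be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original workflow: `plot_le` is a hypothetical name, and the toy polygons are made-up data so the example runs without the `states` object.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: draws one choropleth for the column named in fill_var
plot_le = function(data, fill_var, title) {
  ggplot(data, aes(x = long, y = lat, group = group, fill = .data[[fill_var]])) +
    geom_polygon(color = "white") +
    scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49",
                        guide = "colorbar", na.value = "#eeeeee",
                        breaks = pretty_breaks(n = 5)) +
    labs(title = title)
}

# Toy polygons (made-up values): two unit squares, one with missing data
toy = data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  value = rep(c(75, NA), each = 4)
)
p = plot_le(toy, "value", "Toy example")
p
```

With the merged data, a call such as `plot_le(states, "le_diff", "Differences in Life Expectancy") + coord_map()` should give a map equivalent to the last one above.
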
On my previous post, I got a comment asking me to add a pop-up effect when hovering over the states. This is a simple task, as Andrea explained in his comment. You install the plotly package, store the ggplot in an object, and then pass that object to ggplotly(), as in ggplotly(map_plot).

library(plotly)
map_plot = ggplot(states, aes(x = long, y = lat, group = group, fill = le_black)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49",
                      guide = "colorbar", na.value = "#eeeeee",
                      breaks = pretty_breaks(n = 5)) +
  labs(title = "Life expectancy in African American") +
  coord_map()
ggplotly(map_plot)

Here is the plot:
[Figure: interactive plotly version of the African American life expectancy map]
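
By default the pop-up lists every mapped aesthetic (longitude, latitude, group and fill). If you only want a short label, `ggplotly()` has a `tooltip` argument that selects which aesthetics to show. The sketch below is an assumption-laden illustration on made-up toy polygons: the `label` column and its text are hypothetical, and how the hover text is displayed for polygons may vary with the plotly version.

```r
library(ggplot2)
library(plotly)

# Toy polygons (made-up values): two unit squares with a hypothetical label column
toy = data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  value = rep(c(75, 78), each = 4),
  label = rep(c("Region A: 75 years", "Region B: 78 years"), each = 4)
)

# Map the label to the text aesthetic, which ggplotly() can use for tooltips
toy_plot = ggplot(toy, aes(x = long, y = lat, group = group,
                           fill = value, text = label)) +
  geom_polygon(color = "white")

# Show only the text aesthetic in the pop-up
p_int = ggplotly(toy_plot, tooltip = "text")
p_int
```

For the real map, the same idea would mean building a label column in `states` (for example from `region` and `le_black`) before creating `map_plot`.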
That's all! Leave a comment below if you have any questions.
