Original post is published at DataScience+
Recently, I become interested to grasp the data from webpages, such as Wikipedia, and to visualize it with R. As I did in my previous post, I use rvest package to get the data from webpage and ggplot package to visualize the data.
 In this post, I will map the life expectancy in White and African-American in US.
 Load the required packages.
 
 ## LOAD THE PACKAGES ####
 library(rvest)
 library(ggplot2)
 library(dplyr)
 library(scales)
Import the data from Wikipedia.
 
 ## LOAD THE DATA ####
 le = read_html("https://en.wikipedia.org/wiki/List_of_U.S._states_by_life_expectancy")
 le = le %>%
 html_nodes("table") %>%
 .[[2]]%>%
 html_table(fill=T)
 Now I have to clean the data. Below I have explain the role of each code.
 
 ## CLEAN THE DATA ####
 # select only columns with data
 le = le[c(1:8)]
 # get the names from 3rd row and add to columns
 names(le) = le[3,]
 # delete rows and columns which I am not interested
 le = le[-c(1:3), ]
 le = le[, -c(5:7)]
 # rename the names of 4th and 5th column
 names(le)[c(4,5)] = c("le_black", "le_white")
 # make variables as numeric
 le = le %>% 
 mutate(
 le_black = as.numeric(le_black), 
 le_white = as.numeric(le_white))
Since there are some differences in life expectancy between White and African-American, I will calculate the differences and will map it.
 
 le = le %>% mutate(le_diff = (le_white - le_black))
I will load the map data and will merge the datasets togather.
 ## LOAD THE MAP DATA ####
 states = map_data("state")
 # create a new variable name for state
 le$region = tolower(le$State)
 # merge the datasets
 states = merge(states, le, by="region", all.x=T)
Now its time to make the plot. First I will plot the life expectancy in African-American in US. For few states we don’t have the data, and therefore I will color it in grey color.
 ## MAKE THE PLOT #### 
 # Life expectancy in African American
 ggplot(states, aes(x = long, y = lat, group = group, fill = le_black)) + 
 geom_polygon(color = "white") +
 scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49", guide = "colorbar", na.value="#eeeeee", breaks = pretty_breaks(n = 5)) +
 labs(title="Life expectancy in African American") +
 coord_map()
 Here is the plot:
 
The code below is for White people in US.
 
 # Life expectancy in White American
 ggplot(states, aes(x = long, y = lat, group = group, fill = le_white)) + 
 geom_polygon(color = "white") +
 scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49", guide = "colorbar", na.value="Gray", breaks = pretty_breaks(n = 5)) +
 labs(title="Life expectancy in White") +
 coord_map()
 Here is the plot:
 
Finally, I will map the differences between white and African American people in US.
 
 # Differences in Life expectancy between White and African American
 ggplot(states, aes(x = long, y = lat, group = group, fill = le_diff)) + 
 geom_polygon(color = "white") +
 scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49", guide = "colorbar", na.value="#eeeeee", breaks = pretty_breaks(n = 5)) +
 labs(title="Differences in Life Expectancy between \nWhite and African Americans by States in US") +
 coord_map()
 Here is the plot:
 
 On my previous post I got a comment to add the pop-up effect as I hover over the states. This is a simple task as Andrea exmplained in his comment. What you have to do is to install the plotly package, to create a object for ggplot, and then to use this function ggplotly(map_plot) to plot it.
 
 library(plotly)
 map_plot = ggplot(states, aes(x = long, y = lat, group = group, fill = le_black)) + 
 geom_polygon(color = "white") +
 scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49", guide = "colorbar", na.value="#eeeeee", breaks = pretty_breaks(n = 5)) +
 labs(title="Life expectancy in African American") +
 coord_map()
 ggplotly(map_plot)
 Here is the plot:
 
 Thats all! Leave a comment below if you have any question.
Original post: Map the Life Expectancy in United States with data from Wikipedia

