
How to make any plot with ggplot2?

ggplot2 is the most elegant and aesthetically pleasing graphics framework available in R, and it has a nicely planned structure to it. This tutorial focuses on exposing that underlying structure, which you can use to make any ggplot. However, the way you make plots in ggplot2 is very different from base graphics, which makes the learning curve steep. So leave what you know about base graphics behind and follow along. You are just 5 steps away from cracking the ggplot puzzle.

The distinctive feature of the ggplot2 framework is the way you make plots through adding ‘layers’. The process of making any ggplot is as follows.

1. The Setup

First, you need to tell ggplot what dataset to use. This is done using the ggplot(df) function, where df is a dataframe that contains all features needed to make the plot. This is the most basic step. Unlike base graphics, ggplot doesn’t take vectors as arguments.

Optionally, you can add whatever aesthetics you want to apply to your ggplot (inside the aes() argument), such as the X and Y axes, by specifying the respective variables from the dataset. The variable based on which the color, size, shape and stroke should change can also be specified here. The aesthetics specified here will be inherited by all the geom layers you add subsequently.

If you intend to add more layers later on, maybe a bar chart on top of a line graph, you can specify the respective aesthetics when you add those layers.

Below, I show a few examples of how to set up ggplot using the diamonds dataset that comes with ggplot2 itself. However, no data will be plotted until you add the geom layers; at most you get an empty panel.

Examples:
library(ggplot2)
ggplot(diamonds)  # if only the dataset is known.
ggplot(diamonds, aes(x=carat))  # if only X-axis is known.
ggplot(diamonds, aes(x=carat, y=price))  # if both X and Y axes are fixed for all layers.
ggplot(diamonds, aes(x=carat, color=cut))  # color will now vary based on `cut`
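To see for yourself that the setup step carries no layers yet, here is a minimal sketch that stores a setup in a variable and inspects it. The variable names are only for illustration, and reading p$layers relies on the internal structure of the plot object, which I am assuming here rather than treating as a documented interface.

```r
library(ggplot2)

# Setup only: dataset plus aesthetics, no geoms yet.
p <- ggplot(diamonds, aes(x = carat, y = price))
n_layers_before <- length(p$layers)

# Adding a geom appends one layer to the same setup.
p_points <- p + geom_point()
n_layers_after <- length(p_points$layers)

print(c(before = n_layers_before, after = n_layers_after))
```

Keeping the setup in a variable like this also lets you reuse it with different geoms later.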

The aes argument stands for aesthetics. ggplot2 considers the X and Y axes of the plot to be aesthetics as well, along with color, size, shape, fill etc. If you want the color, size etc. to be fixed (i.e. not vary based on a variable from the dataframe), you need to specify it outside aes(), in the geom layer it applies to, like this.

ggplot(diamonds, aes(x=carat, y=price)) + geom_point(color="steelblue")  # fixed color, set outside aes() in the geom

See this color palette for more colors.
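The difference between a mapped and a fixed aesthetic can be checked with a small sketch. It uses layer_data(), which returns the data computed for a layer; the variable names are my own, and I set the fixed color in the geom layer because that is where it reliably takes effect.

```r
library(ggplot2)

# Mapped: color is inside aes(), so it varies with the `cut` variable.
p_mapped <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = cut))

# Fixed: color is outside aes(), so every point gets the same value.
p_fixed <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(color = "steelblue")

# Collect the distinct colours that each layer ends up using.
mapped_colours <- unique(layer_data(p_mapped, 1)$colour)
fixed_colours  <- unique(layer_data(p_fixed, 1)$colour)

print(length(mapped_colours))  # one colour per level of `cut`
print(fixed_colours)
```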

2. The Layers

The layers in ggplot2 are also called ‘geoms’. Once the base setup is done, you can append the geoms one on top of the other. The documentation provides a comprehensive list of all available geoms.

ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point() + geom_smooth() # Adding scatterplot geom (layer1) and smoothing geom (layer2).

We have added two layers (geoms) to this plot: geom_point() and geom_smooth(). Since the X axis, the Y axis and the color were defined in the ggplot() setup itself, these two layers inherited those aesthetics. Alternatively, you can specify those aesthetics inside the geom layers, as shown below.

ggplot(diamonds) + geom_point(aes(x=carat, y=price, color=cut)) + geom_smooth(aes(x=carat, y=price, color=cut)) # Same as above but specifying the aesthetics inside the geoms.

Notice the X and Y axes and how the color of the points varies based on the value of the cut variable. The legend was added automatically. I would like to propose a change though. Instead of having a separate smoothing line for each level of cut, I want to integrate them all under one line. How to do that? Removing the color aesthetic from the geom_smooth() layer accomplishes that.

library(ggplot2)
ggplot(diamonds) + geom_point(aes(x=carat, y=price, color=cut)) + geom_smooth(aes(x=carat, y=price)) # Remove color from geom_smooth

ggplot(diamonds, aes(x=carat, y=price)) + geom_point(aes(color=cut)) + geom_smooth() # same but simpler
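The effect of where the color aesthetic is declared can be verified by counting the groups the smoothing layer is fitted on. This is only a sketch: I pass method = "lm" and an explicit formula to keep the fit quick, which is my assumption and not what the default geom_smooth() above uses, and the variable names are illustrative.

```r
library(ggplot2)

# Color mapped in ggplot(): both layers inherit it, so one smooth per cut.
p_per_cut <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Color mapped only in geom_point(): the smooth sees no grouping variable.
p_single <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = cut)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Count the groups that the smoothing layer (layer 2) was fitted on.
groups_per_cut <- length(unique(layer_data(p_per_cut, 2)$group))
groups_single  <- length(unique(layer_data(p_single, 2)$group))
print(c(per_cut = groups_per_cut, single = groups_single))
```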



Here is a quick challenge for you. Can you make the shape of the points vary with the color variable of the dataset?

Though setting up took us quite a bit of code, adding further complexity such as the layers, a distinct color for each cut etc. was easy. Imagine how much code you would have had to write to make this in base graphics. Thanks to ggplot2!

# Answer to the challenge.
ggplot(diamonds, aes(x=carat, y=price, color=cut, shape=color)) + geom_point()
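One caveat with this answer: the default shape scale offers only six shapes, while the color variable has seven levels (D to J), so the plot above is likely to come with a warning and one level left undrawn. A possible workaround, sketched below, is to supply the shapes yourself with scale_shape_manual(). The seven shape codes are an arbitrary choice of mine.

```r
library(ggplot2)

# Seven arbitrary point shapes, one for each level of `color` (D to J).
seven_shapes <- c(16, 17, 15, 3, 7, 8, 4)

p_shapes <- ggplot(diamonds, aes(x = carat, y = price, color = cut, shape = color)) +
  geom_point() +
  scale_shape_manual(values = seven_shapes)

# Every row should now have a shape assigned, with no missing values.
shapes_used <- layer_data(p_shapes, 1)$shape
print(sort(unique(shapes_used)))
```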

3. The Labels

Now that you have drawn the main parts of the graph, you might want to add the plot’s main title and perhaps change the X and Y axis titles. This can be accomplished using the labs layer, meant for specifying the labels. However, manipulating the size and color of the labels is the job of the ‘Theme’.

library(ggplot2)
gg <- ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point() + labs(title="Scatterplot", x="Carat", y="Price")  # add axis labels and plot title.
print(gg)


The plot’s main title is added and the X and Y axis labels are capitalized.

Note: If you are showing a ggplot inside a function, you need to explicitly save it and then print it using print(gg), like we just did above.
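Here is a minimal sketch of that note. The helper plot_scatter() is hypothetical (it is not part of ggplot2), and it assumes the data frame passed in has carat, price and cut columns.

```r
library(ggplot2)

# Hypothetical helper: draws a price-vs-carat scatterplot for any data frame
# that has `carat`, `price` and `cut` columns.
plot_scatter <- function(df, title) {
  gg <- ggplot(df, aes(x = carat, y = price, color = cut)) +
    geom_point() +
    labs(title = title, x = "Carat", y = "Price")
  print(gg)       # explicit print, so the plot is drawn from inside the function
  invisible(gg)   # return the plot so the caller can modify it further
}

gg_fn <- plot_scatter(diamonds, "Scatterplot")
```

Returning the plot invisibly is a design choice on my part: it avoids drawing the plot twice while still letting you add layers to gg_fn afterwards.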

4. The Theme

Almost everything is set, except that we want to increase the size of the labels and change the legend title. Adjusting the size of the labels can be done using the theme() function by setting plot.title, axis.text.x and axis.text.y. The sizes need to be specified inside element_text(). If you want to remove any of them, set it to element_blank() and it will vanish entirely.

Adjusting the legend title is a bit tricky. If your legend is that of a color attribute and it varies based on a factor, you need to set the name using scale_color_discrete(), where the color part refers to the color attribute and the discrete part is there because the legend is based on a factor variable.

gg1 <- gg + theme(plot.title=element_text(size=30, face="bold"), axis.text.x=element_text(size=15), axis.text.y=element_text(size=15), axis.title.x=element_text(size=25), axis.title.y=element_text(size=25)) + scale_color_discrete(name="Cut of diamonds")  # add title and axis text, change legend title. 

print(gg1) # print the plot
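To illustrate the element_blank() option mentioned above, the following sketch enlarges the title but removes both axis titles. The object names are illustrative, and inspecting gg_blank$theme relies on the internal structure of the plot object, which I am assuming here.

```r
library(ggplot2)

gg <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  labs(title = "Scatterplot", x = "Carat", y = "Price")

# Enlarge the title, but remove the axis titles entirely.
gg_blank <- gg +
  theme(plot.title   = element_text(size = 30, face = "bold"),
        axis.title.x = element_blank(),
        axis.title.y = element_blank())

# theme() only records settings on the plot object; list which ones were set.
theme_settings <- names(gg_blank$theme)
print(theme_settings)
```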

If the legend shows a shape attribute based on a factor variable, you need to change it using scale_shape_discrete(name="legend title"). Had the legend been based on a continuous variable, the function name would end in _continuous instead, e.g. scale_color_continuous(name="legend title").

So now, can you guess the function to use if your legend is based on a fill attribute on a continuous variable?

The answer is scale_fill_continuous(name="legend title").
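As a quick sketch of that answer, the example below switches to the faithfuld dataset, which also ships with ggplot2 and has a continuous density column; that dataset, the geom and the legend title are my choices for illustration. Reading the name back through p_fill$scales relies on internals of the plot object.

```r
library(ggplot2)

# `faithfuld` ships with ggplot2; `density` is a continuous variable.
p_fill <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster() +
  scale_fill_continuous(name = "Estimated density")

# Look up the fill scale attached to the plot and read back its name.
fill_scale_name <- p_fill$scales$get_scales("fill")$name
print(fill_scale_name)
```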

5. The Facets

In the previous chart, you had the scatterplot for all different values of cut plotted in the same chart. What if you want one chart for each cut?

gg1 + facet_wrap( ~ cut, ncol=3)  # columns defined by 'cut'

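facet_wrap(formula) takes a formula as its argument. With a one-sided formula such as ~ cut, you get one panel per level of the variable on the RHS, wrapped into as many rows and columns as needed (or as set with ncol or nrow). With a two-sided formula, you get one panel for each combination of the LHS and RHS variables, again wrapped into a rectangular layout.

gg1 + facet_wrap(color ~ cut)  # one panel per combination of color and cut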


In facet_wrap, the scales of the X and Y axes are fixed to accommodate all points by default. This makes comparison across panels meaningful because they all share the same scale. However, it is possible to let the scales roam free, so that each panel uses the range of its own data, by setting the argument scales="free".

gg1 + facet_wrap(color ~ cut, scales="free")  # each panel gets its own X and Y scales
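Besides "free", the scales argument also accepts "free_x" and "free_y" if you want to free only one axis. The sketch below compares the variants. The object names are illustrative, and the last lines peek at how the facet specification is stored internally, which is an assumption about the object structure and not a documented interface.

```r
library(ggplot2)

scatter <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()

p_shared <- scatter + facet_wrap(~ cut)                      # default: shared scales
p_free   <- scatter + facet_wrap(~ cut, scales = "free")     # both axes free
p_free_y <- scatter + facet_wrap(~ cut, scales = "free_y")   # only Y is free

# The facet specification records which axes are free (internal structure).
free_flags <- p_free_y$facet$params$free
print(free_flags)
```

Freeing only the Y axis keeps carat comparable across panels while letting each panel use its own price range.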

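For comparison purposes, you can also put all the plots in a grid using facet_grid(formula), where the item on the LHS of the formula defines the rows and the item on the RHS defines the columns.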

gg1 + facet_grid(color ~ cut)   # In a grid

Note that the headers above the individual plots are gone; the row and column labels now sit along the margins of the grid, leaving more space for the plotting area.
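To compare how the two faceting functions arrange their panels, this sketch builds the plots without drawing them and reads the panel layout table. The object names are illustrative, and the $layout$layout table is part of the built plot's internal structure, so treat that access as an assumption.

```r
library(ggplot2)

scatter <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()

# Build the plots without drawing them and read the panel layout table.
wrap_layout <- ggplot_build(scatter + facet_wrap(~ cut, ncol = 3))$layout$layout
grid_layout <- ggplot_build(scatter + facet_grid(color ~ cut))$layout$layout

# facet_wrap: the levels of cut are wrapped into 3 columns.
print(c(panels = nrow(wrap_layout), rows = max(wrap_layout$ROW), cols = max(wrap_layout$COL)))

# facet_grid: one row per color and one column per cut.
print(c(panels = nrow(grid_layout), rows = max(grid_layout$ROW), cols = max(grid_layout$COL)))
```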

This post was originally published at r-statistics.co. The full post is available in this ggplot2 tutorial.