There are several thousand languages in the world, and they all have one thing in common: each is defined and explained by a set of rules. This set of rules is called a grammar, and with its help individual words (like nouns and verbs) are combined into correct and meaningful sentences. A similar methodology can be applied to creating graphics. If we imagine that each graph is made up of basic parts, or components, we can define a basic set of rules by which we define and combine those components into correct and meaningful visualizations (graphs). This set of rules is called the grammar of graphics, and in this article we'll explain the methodology and syntax of one of the most famous graphics packages in R: ggplot2.
The main idea behind the grammar of graphics is that every plot can be built from the same few components. Those components are data, aesthetics, geometrical shapes, statistics, scales, facets and themes.
Each component has its own set of rules and specific syntax (the so-called grammar of components), and together they form a single entity called a graph.
In the next sections we will introduce the rules on two levels: the rules for each individual component, and the rules for combining the components into a graph.
The first step in mastering the grammar is understanding the individual components and the rules needed to define and control them. Below is a short explanation of each component.
Data frame object as input data source
Description: This component defines the input data set that will be used in the visualization.
Syntax: The data set name is given inside the ggplot() function. ggplot() initializes a new graph, and the data set is usually its first argument:
#graph initialization and data source
ggplot(data_frame_name)
Set of rules: ggplot2 requires that data are stored in a tidy data frame. The data frame is the standard R object for storing tabular data: you can imagine it as a table with variables in the columns and observations in the rows. Other objects, such as matrices or lists, should be converted to a data frame before they are passed to ggplot2.
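As a minimal sketch of the data component, the snippet below builds a small data frame and hands it to ggplot(). The data frame `purchases` and its values are invented for illustration (the column names echo the age, amount and gender variables used in the next caption), and the matrix conversion is just one possible way to prepare non-data-frame input.

```r
library(ggplot2)

# Hypothetical data: one row per observation, one column per variable
purchases <- data.frame(
  age    = c(23, 35, 41, 52, 29, 47),
  amount = c(120, 340, 280, 410, 150, 390),
  gender = c("F", "M", "F", "M", "M", "F")
)

# Graph initialization: the data frame is the first argument of ggplot()
p <- ggplot(purchases)

# A matrix is converted to a data frame before it is plotted
m <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("x", "y")))
m_df <- as.data.frame(m)
p_m <- ggplot(m_df, aes(x, y)) + geom_point()
```

At this stage `p` is only an empty canvas: no aesthetics or geometrical shapes have been added yet.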
Mapping x,y to age and amount variable. Third variable gender is shown through color
Description: This component represents the mapping between variables and visual properties such as axes, size, shape and color. What will the axes on my plot represent? Besides the variables on the axes, do we want to show any additional information? Through this component two variables are mapped to the horizontal and vertical axes, and additional variables can be shown through color, shape and size.
Syntax: Aesthetics are defined inside the aes() function.
Set of rules: aes() can be placed inside ggplot() or inside other components such as geometrical shapes and statistics. If aes() is placed inside ggplot(), its definition is shared by all components (for example, the x and y axes will be the same for all geometrical shapes on the graph). Otherwise, its definition applies only inside that specific component.
#aesthetics common to all components (points and text)
#aesthetics specified only for points
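The two placements of aes() can be sketched as follows. This is an illustrative example rather than the article's original code: the data frame `purchases` and its values are invented, and the choice of geom_text() for the text layer is an assumption.

```r
library(ggplot2)

# Hypothetical data with the variables age, amount and gender
purchases <- data.frame(
  age    = c(23, 35, 41, 52, 29, 47),
  amount = c(120, 340, 280, 410, 150, 390),
  gender = c("F", "M", "F", "M", "M", "F")
)

# aes() inside ggplot(): x, y and label are shared by points and text
p_common <- ggplot(purchases, aes(x = age, y = amount, label = gender)) +
  geom_point() +
  geom_text(vjust = -1)

# aes() inside geom_point(): the colour mapping applies to the points only
p_local <- ggplot(purchases, aes(x = age, y = amount)) +
  geom_point(aes(colour = gender)) +
  geom_text(aes(label = gender), vjust = -1)
```

In `p_local` the text labels keep the default color because the colour mapping lives only in the point layer.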
Description: This component defines the plot type. More precisely, it defines how our observations (points) will be displayed on the graph. There are many different types (bar chart, histogram, scatter plot, box plot, ...), and each type is designed for a specific number and type of variables.
Syntax: Function names start with geom_; commonly used shapes include geom_point(), geom_line(), geom_bar(), geom_histogram() and geom_boxplot().
Set of rules:
#two geom shapes (geom_line and geom_point) used on one graph
ggplot(GOT, aes(x = Episode, y = Number_of_viewers, colour = Season, group = Season)) + geom_line() + geom_point()
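The GOT data set itself is not included in the article, so the sketch below builds a small stand-in with the same column names to make the example runnable. The numbers are invented placeholders, not real viewership figures.

```r
library(ggplot2)

# Placeholder data with the column names used above; the values are invented
GOT <- data.frame(
  Season = factor(rep(c(1, 2), each = 3)),
  Episode = rep(1:3, times = 2),
  Number_of_viewers = c(2.2, 2.4, 2.5, 3.8, 3.7, 3.9)
)

p <- ggplot(GOT, aes(x = Episode, y = Number_of_viewers,
                     colour = Season, group = Season)) +
  geom_line() +   # first layer: one line per season
  geom_point()    # second layer: one point per episode
```

Both layers share the aesthetics defined in ggplot(), so the lines and the points use the same axes and the same color per season.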
Famous statistical transformation — smoothing
Description: This component transforms the data (summarizes it in some manner) before visualization. Many of these transformations are used behind the scenes when geometrical shapes are created. Often we don't define them directly; ggplot2 does that for us.
Syntax: The syntax depends on the transformation used. Statistics functions start with stat_, for example stat_smooth() for smoothing.
Set of rules:
#define stat_*() function and geom argument inside that function
#define geom_*() function and stat argument inside that function
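The two ways of requesting a statistical transformation can be sketched like this. The example is an assumption, not the article's original code: it uses the `mpg` data set that ships with ggplot2 and computes the mean highway mileage per vehicle class with the summary statistic.

```r
library(ggplot2)

# Option 1: stat_*() function with the geom given as an argument
p_stat <- ggplot(mpg, aes(x = class, y = hwy)) +
  stat_summary(fun = mean, geom = "point")

# Option 2: geom_*() function with the stat given as an argument
p_geom <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_point(stat = "summary", fun = mean)
```

Both plots draw one point per class at the class mean; which form to use is mostly a matter of whether the statistic or the shape is the focus.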
Controlling the colors with scaling
Description: With aesthetics we define what we want to see on the graph, and with scales we define how we want to see those aesthetics. We can control colors, sizes, shapes and positions. Scales also provide the tools that let us read the plot: the axes and legends (we can customize axis titles, labels and their positions). ggplot2 automatically creates a default scale for each aesthetic we define. However, if we want to customize a scale, we can modify each scale component ourselves.
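Syntax: The basic syntax follows the naming pattern scale_<aesthetic>_<type>(), for example scale_colour_manual() or scale_x_continuous().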
Basic scaling syntax
Here are the scales for different types of aesthetics:
Set of rules:
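As an illustration of replacing default scales, the sketch below overrides the color scale and customizes the x axis. The data frame `purchases`, its values and the chosen colors are assumptions made for this example.

```r
library(ggplot2)

# Hypothetical data with the variables age, amount and gender
purchases <- data.frame(
  age    = c(23, 35, 41, 52, 29, 47),
  amount = c(120, 340, 280, 410, 150, 390),
  gender = c("F", "M", "F", "M", "M", "F")
)

p <- ggplot(purchases, aes(x = age, y = amount, colour = gender)) +
  geom_point() +
  # replace the default colour scale with hand-picked colours
  scale_colour_manual(values = c(F = "darkorange", M = "steelblue")) +
  # customize the x axis title and tick positions
  scale_x_continuous(name = "Age (years)", breaks = seq(20, 60, by = 10))
```

The aesthetic mapping (colour = gender) stays the same; only the way the mapped values are displayed changes.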
Description: With faceting we divide the data into subsets by some discrete variable and display the same type of graph for each subset.
Syntax: The facet_wrap() or facet_grid() function is used to display subsets of the data.
Faceting — sub-plotting by col1 variable
ggplot(data_set, aes(col1, col2)) + geom_point() +
  facet_wrap(~col1)  # assumed completion: one panel per value of col1
Set of rules:
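A runnable sketch of both faceting functions is shown below. It uses the `mpg` data set that ships with ggplot2; the choice of `drv` and `cyl` as faceting variables is an assumption made for illustration.

```r
library(ggplot2)

# facet_wrap(): one panel per value of the discrete variable drv
p_wrap <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~drv)

# facet_grid(): panels arranged in rows (drv) and columns (cyl)
p_grid <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
```

Every panel shows the same type of graph (a scatter plot), drawn only from the rows that belong to that subset.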
Changing background color of the plot
Description: Themes control the non-data elements of the graph. With this component we don't change the type of graph, the scale definitions or the aesthetics used. Instead, we change things like fonts, ticks, panel strips and background colors.
Syntax: There are several predefined themes, such as theme_grey() (the default), theme_bw(), theme_minimal() and theme_classic().
Each of these themes sets all theme elements to values that are designed to work together harmoniously (the complete theme is changed, not just individual elements). However, if we want to change individual elements (for example just the background color or just the font of the title), we can use the theme() function and specify the exact element we want to change.
Theme and element function
Set of rules: Each theme element (controlled via the arguments of theme()) is associated with an element function, which describes the visual properties of that element. For example, to set the background color you define the fill argument inside element_rect(). To change the appearance of the axis labels (font, size, color) you define it inside element_text(); the label text itself is not set here. Most arguments of theme() are defined with the help of one of these element_*() functions.
There are four basic element functions, element_text(), element_line(), element_rect() and element_blank(), and each is used in combination with specific theme arguments.
Here is an example of how we combine arguments with element functions:
ggplot(data_set_name, aes(col1,col2)) + geom_point() +
#panel background is used with element_rect()
theme(panel.background = element_rect(fill = "white",colour = "grey"))
Usually you'll use predefined themes, but it is useful to know that you can change each individual element with the theme() function.
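A predefined theme and individual overrides can also be combined, as in the sketch below. The data set (`mpg`, shipped with ggplot2), the title and the specific element settings are assumptions chosen for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  ggtitle("Engine size and highway mileage") +
  theme_bw() +   # complete predefined theme
  theme(
    plot.title       = element_text(face = "bold", size = 14),  # text element
    panel.grid.minor = element_blank(),                         # removes the element
    axis.line        = element_line(colour = "black")           # line element
  )
```

The order matters: theme() is added after theme_bw(), so the individual settings override the corresponding values of the complete theme.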
With that, we have explained the basic rules for each component of the graph. The next question is: how are these components combined into a single entity called a graph?
Combining the components
Having defined each component separately, we need to combine them into a proper and meaningful composition called a graph.
Basic set of rules for combining:
Pseudo code is presented below:
ggplot(data_frame_name, aes()) +
component_for_geom2_*() + … +
component_for_themes_*() + ...
To finish, here is one real example:
ggplot2 — sub-plotting bar-charts
The result is a graph that looks like this:
ggplot2 — faceting bar-charts
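A faceted bar chart in the spirit of the captions above can be sketched as follows. This is not the article's original example: the data set (`mpg`, shipped with ggplot2), the variables, the colors and the labels are all assumptions, chosen so that every component appears at least once.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, fill = drv)) +      # data + aesthetics
  geom_bar() +                                      # geometry (counts rows via stat_count)
  scale_fill_manual(                                # scale for the fill aesthetic
    values = c("4" = "grey40", f = "steelblue", r = "darkorange"),
    name = "Drive"
  ) +
  facet_wrap(~year) +                               # faceting: one panel per year
  labs(x = "Vehicle class", y = "Number of cars") + # axis titles
  theme_minimal() +                                 # predefined theme
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # individual element
```

Each component is added with `+`, starting from ggplot() and ending with the theme, which mirrors the order of the pseudo code.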
In this article we showed how ggplot2 relies on the grammar of graphics. It may seem complex at the beginning because there are a lot of rules and topics to master. First you need to understand each component separately: its meaning, syntax and rules. After that, you need to learn how to properly combine those components into a single entity called a graph. There is a lot of theory behind the scenes, but once you have mastered it you can control and modify anything you like on your plot, so that nothing is left to chance. After mastering the grammar, the distance from mind to "paper" becomes really short: almost every idea you have can be accurately transposed onto the screen.