Introduction

This is a heavily modified version of a much more involved workshop developed for the Data Science Services statistical software workshops at the Institute for Quantitative Social Science at Harvard. I’ve used much of their code and data. The original version can be found here: http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html

Starting At The End

By the end of the workshop you will be able to reproduce a (highly) modified version of a graphic from the Economist:

Our version will look like this:

Get the Data

Download today’s datasets and put them into a sub-folder of your working directory named datasets.

What Is The Grammar Of Graphics?

The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want. Building blocks of a graph include:

  • data
  • aesthetic mapping
  • geometric object
  • statistical transformations
  • scales
  • coordinate system
  • position adjustments
  • faceting
  • themes

…and more! Today, we’ll be looking at data, aesthetic mapping, geometric object, scales, and themes.
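As a rough sketch of how these building blocks combine, the snippet below assembles one plot from the five blocks covered today. It uses R’s built-in mtcars data rather than the workshop datasets, and the object name p_blocks is only illustrative.

```r
library(ggplot2)

# data + aesthetic mapping: which columns drive x, y and color
p_blocks <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point() +                                 # geometric object
  scale_color_continuous(name = "Horsepower") +  # scale
  theme_minimal()                                # theme
p_blocks
```

Each block is specified independently, so any one of them can be swapped (a different geom, another theme) without touching the rest.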

Example Data: Housing prices

Let’s look at housing prices.

  housing <- read.csv("datasets/landdata-states.csv")
  head(housing[1:5])

(Data from https://www.lincolninst.edu/subcenters/land-values/land-prices-by-state.asp)

ggplot2 VS Base Graphics

Compared to base graphics, ggplot2

  • is more verbose for simple / canned graphics
  • is less verbose for complex / custom graphics
  • does not have methods (data should always be in a data.frame)
  • uses a different system for adding plot elements

Geometric Objects And Aesthetics

Aesthetic Mapping

In ggplot land, aesthetic means “something you can see”. Examples include:

  • position (i.e., on the x and y axes)
  • color (“outside” color)
  • fill (“inside” color)
  • shape (of points)
  • linetype
  • size

Each type of geom accepts only a subset of all aesthetics; refer to the geom help pages to see what mappings each geom accepts. Aesthetic mappings are set with the aes() function.
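Besides the help pages, one quick way to check what a geom accepts is to inspect the geom objects that ggplot2 exports. This is a sketch rather than the workshop’s own method; it relies on the required_aes field and the aesthetics() method of objects such as GeomPoint and GeomText.

```r
library(ggplot2)

# Aesthetics a geom cannot work without
GeomPoint$required_aes   # x and y
GeomText$required_aes    # x, y and label

# Every aesthetic geom_point() understands, required or optional
GeomPoint$aesthetics()
```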

Geometric Objects (geom)

Geometric objects are the actual marks we put on a plot. Examples include:

  • points (geom_point, for scatter plots, dot plots, etc)
  • lines (geom_line, for time series, trend lines, etc)
  • bars (geom_bar, for bar graphs)

A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.

You can get a list of available geometric objects using the code below:

  help.search("geom_", package = "ggplot2")
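To illustrate adding more than one geom with +, the sketch below layers lines and points on the same plot. It assumes R’s built-in pressure dataset, and p_layers is an illustrative name.

```r
library(ggplot2)

# One plot, two geoms: each `+` adds another layer on top of the previous one
p_layers <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line() +    # first geom: connect the observations
  geom_point()     # second geom: mark each observation
p_layers
```

Both geoms inherit the x and y mappings given in the ggplot() call, so they do not need to be repeated.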

Points (Scatterplot)

Now that we know about geometric objects and aesthetic mapping, we can make a ggplot. geom_point requires mappings for x and y; all others are optional.

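  library(ggplot2)  # load ggplot2 before building any plots
  # keep a single quarter; 20011 assumes Date is coded as year then quarter (2001 Q1)
  hp2001Q1 <- subset(housing, Date == 20011)
  ggplot(hp2001Q1,
         aes(y = Structure.Cost, x = Land.Value)) +
    geom_point()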

Text (Label Points)

Each geom accepts a particular set of aesthetic mappings; for example, geom_text() accepts a label mapping.

  p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))
  p1 + 
    geom_text(aes(label=State), size = 3)

Aesthetic Mapping VS Assignment

Note that variables are mapped to aesthetics with the aes() function, while fixed aesthetics are set outside the aes() call. This sometimes leads to confusion, as in this example:

  p1 +
    geom_point(aes(size = 2), # incorrect! 2 is not a variable
               color = "red") # this is fine -- all points red
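For contrast, here is a sketch of the two correct patterns. It assumes the built-in mtcars data so that it runs on its own; p_demo, p_set and p_map are illustrative names.

```r
library(ggplot2)

p_demo <- ggplot(mtcars, aes(x = wt, y = mpg))

# Setting: constants go outside aes(), so every point is size 2 and red
p_set <- p_demo + geom_point(size = 2, color = "red")

# Mapping: a column goes inside aes(), so size varies with the data
# and ggplot2 adds a legend for it
p_map <- p_demo + geom_point(aes(size = hp), color = "red")
```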

Mapping Variables To Other Aesthetics

Other aesthetics are mapped in the same way as x and y in the previous example.

  p1 +
    geom_point(aes(color=Home.Value, shape = region))

Exercise I

The data for the exercises is available in the datasets/EconomistData.csv file. Read it in with

  dat <- read.csv("datasets/EconomistData.csv")

Original sources for these data are http://www.transparency.org/content/download/64476/1031428 and http://hdrstats.undp.org/en/indicators/display_cf_xls_indicator.cfm?indicator_id=103106&lang=en

These data consist of Human Development Index and Corruption Perception Index scores for several countries from 2011.

  1. Create a scatter plot with CPI on the x axis and HDI on the y axis.
  2. Color the points in the previous plot according to Region.
  3. Change the size of all the points to 2.

Scales

Scales: Controlling Aesthetic Mapping

Aesthetic mapping (i.e., with aes()) only says that a variable should be mapped to an aesthetic. It doesn’t say how that should happen. For example, when mapping a variable to shape with aes(shape = x) you don’t say what shapes should be used. Similarly, aes(color = z) doesn’t say what colors should be used. Describing what colors/shapes/sizes etc. to use is done by modifying the corresponding scale. In ggplot2 scales include

  • position
  • color and fill
  • size
  • shape
  • line type

Scales are modified with a series of functions using a scale_<aesthetic>_<type> naming scheme.

Common Scale Arguments

The following arguments are common to most scales in ggplot2:

  • name: the first argument gives the axis or legend title
  • limits: the minimum and maximum of the scale
  • breaks: the points along the scale where labels should appear
  • labels: the labels that appear at each break

Specific scale functions may have additional arguments; for example, the scale_color_continuous function has arguments low and high for setting the colors at the low and high end of the scale.
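A minimal sketch of the naming scheme and the common arguments, assuming the built-in mtcars data (the labels and the name p_scale are made up for illustration):

```r
library(ggplot2)

p_scale <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point() +
  # scale_<aesthetic>_<type>: the x aesthetic, continuous type
  scale_x_continuous(name = "Weight (1000 lbs)",        # axis title
                     limits = c(1, 6),                   # min and max of the scale
                     breaks = c(2, 4, 6),                # where labels appear
                     labels = c("2k", "4k", "6k")) +     # text shown at each break
  # the color aesthetic, continuous type; name becomes the legend title
  scale_color_continuous(name = "Horsepower")
p_scale
```

Note that breaks and labels must have the same length, since each label is attached to one break.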

Scale Modification Examples

Start by constructing a dotplot showing the distribution of home values by Date and State.

library(scales) 
p3 <- ggplot(housing, aes(x = State, y = Home.Price.Index)) +
        theme(legend.position="top",
              axis.text=element_text(size = 6))
p3
# store the layered plot as p4 so the examples below can build on it
p4 <- p3 + geom_point(aes(color = Date),
                      alpha = 0.5,
                      size = 1.5,
                      position = position_jitter(width = 0.25, height = 0))
p4

Now modify the name of the x axis scale and the breaks and labels of the color scale:

p4 + scale_x_discrete(name="State Abbreviation") +
  scale_color_continuous(name="",
                         breaks = c(1976, 1994, 2013),
                         labels = c("'76", "'94", "'13"))

Next change the low and high values to blue and red:

p4 +
  scale_x_discrete(name="State Abbreviation") +
  scale_color_continuous(name="",
                         breaks = c(1976, 1994, 2013),
                         labels = c("'76", "'94", "'13"),
                         low = "blue", high = "red")

Exercise II

  1. Create a scatter plot of the Economist data with CPI on the x axis and HDI on the y axis. Color the points to indicate region.
  2. Modify the x, y, and color scales so that they have more easily-understood names (e.g., spell out “Human Development Index” instead of “HDI”).
  3. Modify the color scale using scale_color_brewer and specifying one of the qualitative palettes (type ?scale_color_brewer to help you figure this step out).

Themes

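The ggplot2 theme system handles non-data plot elements such as axis text, plot and panel backgrounds, and legend position.

Built-in themes include theme_bw() and theme_minimal(). Add one to a plot with +: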

  p4 + theme_minimal()

Overriding theme defaults

Specific theme elements can be overridden using theme(). For example:

  p4 + theme_minimal() +
    theme(text = element_text(color = "turquoise"))

All theme options are documented in ?theme.

Creating and saving new themes

You can create new themes and assign them to a variable to call later, as in the following example:

theme_new <- theme_bw() +
  theme(plot.background = element_rect(size = 1, color = "blue", fill = "grey"),
        text=element_text(size = 12, color = "ivory"),
        axis.text.y = element_text(colour = "purple"),
        axis.text.x = element_text(colour = "red"),
        panel.background = element_rect(fill = "pink"))
p4 + theme_new

Putting It All Together

Challenge: Recreate This Economist Graph

Let’s try to recreate a modified version of the original Economist graph:

Building off of the graphics you created in the previous exercises, try to make a plot as close as possible to the following:

Different elements to add:

  • Add a title with ggtitle
  • Change the limits (start and end points) and breaks (intervals) on the X and Y axes
  • Manually input new labels for scale_color_brewer (hint: use the \n character for line breaks)
  • Change the shape of geom_point to be outlined circles rather than colored points (hint: http://www.sthda.com/english/wiki/ggplot2-point-shapes) and change the alpha to make them semi-transparent.
  • Specify a theme that has “minimal” features
  • Move the legend to the bottom of the plot (hint: look up legend.position and theme())
  • Save a PNG file of your graph using the following code: ggsave(filename = "images/final.png", plot = final, width = 4, height = 3, units="in", scale=1.7)
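The ggsave() call above writes into an images folder, and saving may fail if that folder does not exist yet. A hedged sketch of the full step follows; the plot assigned to final here is only a stand-in built from the built-in mtcars data, so replace it with your finished graph.

```r
library(ggplot2)

# Stand-in for your finished plot; swap in your own object
final <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Make sure the target folder exists before saving (no warning if it already does)
dir.create("images", showWarnings = FALSE)

ggsave(filename = "images/final.png", plot = final,
       width = 4, height = 3, units = "in", scale = 1.7)
```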