This is a heavily modified version of a much more involved workshop that was made by the Data Science Services statistical software workshops from the Institute For Quantitative Social Science at Harvard. I’ve used much of their code, data, etc. The original version can be found here: http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html
By the end of the workshop you will be able to reproduce a (highly) modified version of a graphic from the Economist:
Our version will look like this:
Download today’s datasets and put them into a sub-folder of your working directory named datasets.
The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want. Building blocks of a graph include:
…and more! Today, we’ll be looking at data, aesthetic mapping, geometric objects, scales, and themes.
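As a rough sketch of how these five building blocks fit together, the snippet below specifies each one separately and combines them with +. The toy data frame and its column names are invented for illustration and are not part of the workshop datasets.

```r
library(ggplot2)

# Toy data invented for this sketch; not one of the workshop datasets.
toy <- data.frame(
  size  = c(1, 2, 3, 4, 5, 6),
  price = c(10, 14, 15, 21, 24, 30),
  group = c("a", "a", "a", "b", "b", "b")
)

p <- ggplot(toy,                                       # data
            aes(x = size, y = price, color = group)) + # aesthetic mapping
  geom_point() +                                       # geometric object
  scale_color_manual(values = c(a = "steelblue",
                                b = "firebrick")) +    # scale
  theme_bw()                                           # theme

print(p)
```

Swapping out any single piece (a different geom, scale, or theme) leaves the rest of the specification untouched.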
Housing prices
Let’s look at housing prices.
housing <- read.csv("datasets/landdata-states.csv")
head(housing[1:5])
(Data from https://www.lincolninst.edu/subcenters/land-values/land-prices-by-state.asp)
ggplot2 VS Base Graphics
Compared to base graphics, ggplot2 expects the data to be in a data.frame.
In ggplot land, aesthetic means “something you can see”. Examples include:
Each type of geom accepts only a subset of all aesthetics; refer to the geom help pages to see what mappings each geom accepts. Aesthetic mappings are set with the aes() function.
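To make this concrete, here is a minimal sketch of what aes() records. The toy data frame and the variable names x, y, and grp are hypothetical. Note that ggplot2 stores the color aesthetic under its British spelling, colour.

```r
library(ggplot2)

# Hypothetical toy data, made up for this sketch.
toy <- data.frame(x = 1:4, y = c(2, 4, 3, 5), grp = c("a", "b", "a", "b"))

# aes() only records which variable goes with which aesthetic;
# it does not draw anything by itself.
m <- aes(x = x, y = y, color = grp)
print(names(m))

# The mapping becomes visible once a geom uses it.
print(ggplot(toy, m) + geom_point())
```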
Geometric Objects (geom)
Geometric objects are the actual marks we put on a plot. Examples include:
points (geom_point, for scatter plots, dot plots, etc.)
lines (geom_line, for time series, trend lines, etc.)
bars (geom_bar, for bar graphs)
A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.
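For instance, a plot with two geoms can be sketched like this, assuming a small made-up data frame (toy, year, and value are hypothetical names). Each + adds one more layer on top of the previous ones.

```r
library(ggplot2)

# Made-up yearly values, for illustration only.
toy <- data.frame(year = 2001:2006, value = c(3, 5, 4, 6, 8, 7))

p <- ggplot(toy, aes(x = year, y = value)) +
  geom_line() +   # first layer: a line through the values
  geom_point()    # second layer: a point at each value

# Each geom added with + becomes one layer of the plot.
print(length(p$layers))
print(p)
```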
You can get a list of available geometric objects using the code below:
help.search("geom_", package = "ggplot2")
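If the help search results are hard to scan, one possible alternative is to list the names exported by ggplot2 that start with geom_. This is only a sketch: getNamespaceExports() is a base R function rather than part of ggplot2, and the exact list depends on the installed ggplot2 version.

```r
# List exported ggplot2 functions whose names start with "geom_".
geoms <- sort(grep("^geom_", getNamespaceExports("ggplot2"), value = TRUE))
print(head(geoms))
```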
Now that we know about geometric objects and aesthetic mapping, we can make a ggplot. geom_point requires mappings for x and y; all others are optional.
library(ggplot2)

# Keep only the rows for the first quarter of 2001
hp2001Q1 <- subset(housing, Date == 20011)
ggplot(hp2001Q1,
       aes(y = Structure.Cost, x = Land.Value)) +
  geom_point()
Each geom accepts a particular set of aesthetic mappings; for example, geom_text() accepts a label mapping.
p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))
p1 +
  geom_text(aes(label = State), size = 3)
Note that variables are mapped to aesthetics with the aes() function, while fixed aesthetics are set outside the aes() call. This sometimes leads to confusion, as in this example:
p1 +
  geom_point(aes(size = 2),  # incorrect! 2 is not a variable
             color = "red")  # this is fine -- all points red
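A minimal sketch of the difference, using a made-up data frame (toy is hypothetical): putting size = 2 inside aes() treats 2 as data and produces a legend, while setting it outside aes() simply fixes the size of every point.

```r
library(ggplot2)

# Made-up data for illustration.
toy <- data.frame(x = 1:5, y = c(2, 3, 5, 4, 6))
base <- ggplot(toy, aes(x = x, y = y))

# Mapped: aes(size = 2) treats the constant 2 as data, so a size legend appears.
mapped <- base + geom_point(aes(size = 2), color = "red")

# Fixed: size = 2 outside aes() sets every point to that size, with no legend.
fixed <- base + geom_point(size = 2, color = "red")

# The layer records the two cases differently.
print(names(mapped$layers[[1]]$mapping))    # aesthetics mapped from "data"
print(names(fixed$layers[[1]]$aes_params))  # aesthetics set to fixed values
```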
Other aesthetics are mapped in the same way as x and y in the previous example.
p1 +
geom_point(aes(color=Home.Value, shape = region))
The data for the exercises is available in the datasets/EconomistData.csv file. Read it in with
dat <- read.csv("datasets/EconomistData.csv")
Original sources for these data are http://www.transparency.org/content/download/64476/1031428 and http://hdrstats.undp.org/en/indicators/display_cf_xls_indicator.cfm?indicator_id=103106&lang=en
These data consist of Human Development Index and Corruption Perception Index scores for several countries from 2011.
Aesthetic mapping (i.e., with aes()) only says that a variable should be mapped to an aesthetic. It doesn’t say how that should happen. For example, when mapping a variable to shape with aes(shape = x) you don’t say what shapes should be used. Similarly, aes(color = z) doesn’t say what colors should be used. Describing what colors/shapes/sizes etc. to use is done by modifying the corresponding scale. In ggplot2, scales include
Scales are modified with a series of functions using a scale_<aesthetic>_<type> naming scheme.
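As an illustration of the naming scheme, the sketch below uses two scale functions that follow it: scale_x_log10() (aesthetic x, type log10) and scale_shape_manual() (aesthetic shape, type manual). The toy data and the particular shape codes are assumptions chosen for the example.

```r
library(ggplot2)

# Made-up data for illustration.
toy <- data.frame(
  x   = c(1, 10, 100, 1000),
  y   = c(2, 3, 5, 4),
  grp = c("a", "b", "a", "b")
)

p <- ggplot(toy, aes(x = x, y = y, shape = grp)) +
  geom_point(size = 3) +
  scale_x_log10() +                              # aesthetic: x, type: log10
  scale_shape_manual(values = c(a = 1, b = 17))  # aesthetic: shape, type: manual

print(p)
```

The mapping in aes() stays the same; only the scales decide how x positions and shapes are actually rendered.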
The following arguments are common to most scales in ggplot2:
Specific scale functions may have additional arguments; for example, the scale_color_continuous function has arguments low and high for setting the colors at the low and high end of the scale.
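A small sketch of these arguments in use, on made-up data. It calls scale_color_gradient(), the gradient scale that takes low and high directly, rather than scale_color_continuous(); that substitution is a choice made for this example, and the toy data frame is hypothetical.

```r
library(ggplot2)

# Made-up data for illustration.
toy <- data.frame(
  x    = 1:6,
  y    = c(3, 1, 4, 1, 5, 9),
  year = c(1980, 1985, 1990, 1995, 2000, 2005)
)

p <- ggplot(toy, aes(x = x, y = y, color = year)) +
  geom_point(size = 3) +
  scale_color_gradient(name = "Year",              # legend title
                       breaks = c(1980, 2005),     # where legend ticks go
                       labels = c("'80", "'05"),   # text shown at those ticks
                       low = "blue", high = "red") # colors at the two ends

print(p)
```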
Start by constructing a dotplot showing the distribution of home values by Date and State.
library(scales)
p3 <- ggplot(housing, aes(x = State, y = Home.Price.Index)) +
  theme(legend.position = "top",
        axis.text = element_text(size = 6))
p3
# Save the layered plot as p4 (used below); the outer parentheses also print it
(p4 <- p3 + geom_point(aes(color = Date),
                       alpha = 0.5,
                       size = 1.5,
                       position = position_jitter(width = 0.25, height = 0)))
Now modify the breaks and labels for the x axis and color scales:
p4 + scale_x_discrete(name = "State Abbreviation") +
  scale_color_continuous(name = "",
                         breaks = c(1976, 1994, 2013),
                         labels = c("'76", "'94", "'13"))
Next change the low and high values to blue and red:
p4 +
  scale_x_discrete(name = "State Abbreviation") +
  scale_color_continuous(name = "",
                         breaks = c(1976, 1994, 2013),
                         labels = c("'76", "'94", "'13"),
                         low = "blue", high = "red")
scale_color_brewer and specifying one of the qualitative palettes (type ?scale_color_brewer to help you figure this step out).
The ggplot2 theme system handles non-data plot elements such as
Built-in themes include:
theme_gray() (default)
theme_bw()
theme_classic()
p4 + theme_minimal()
Specific theme elements can be overridden using theme(). For example:
p4 + theme_minimal() +
theme(text = element_text(color = "turquoise"))
All theme options are documented in ?theme.
You can create new themes and assign them to a variable to call later, as in the following example:
theme_new <- theme_bw() +
  theme(plot.background = element_rect(size = 1, color = "blue", fill = "grey"),
        text = element_text(size = 12, color = "ivory"),
        axis.text.y = element_text(colour = "purple"),
        axis.text.x = element_text(colour = "red"),
        panel.background = element_rect(fill = "pink"))
p4 + theme_new
Economist Graph
Let’s try to recreate a modified version of the original Economist graph:
Building off of the graphics you created in the previous exercises, try to make a plot as close as possible to the following:
Different elements to add:
ggtitle
limits (start and end points) and breaks (intervals) on the X and Y axes
labels for scale_color_brewer (hint: use the \n character for line breaks)
shape of geom_point to be outlined circles rather than colored points (hint: http://www.sthda.com/english/wiki/ggplot2-point-shapes) and change the alpha to make them semi-transparent
legend.position and theme()
ggsave(filename = "images/final.png", plot = final, width = 4, height = 3, units = "in", scale = 1.7)