Digital Education Resources - Vanderbilt Libraries Digital Lab

Previous lesson: introduction to ggplot

R viz using ggplot: Controlling the appearance of plots

In this lesson we will learn how to control details of plots, such as colors used, orientation of features, and descriptive elements associated with the plot.

Learning objectives At the end of this lesson, the learner will be able to:

Total video time: n/a

Links

Lesson R script at GitHub

Lesson slides

ggplot2 book - colour scales and legends

ggplot2 book - position scales and axes


Controlling colors for geom_point (X-Y scatterplot)

Setting fixed colors for a layer

Specifying color by name

ggplot(data = rel_data, mapping = aes(x = limited_proficiency, y = economically_disadvantaged)) +
  geom_point(color = "red")

Specifying color by hex code

ggplot(data = rel_data, mapping = aes(x = limited_proficiency, y = economically_disadvantaged)) +
  geom_point(color = "#349c35")

Setting colors based on a factor using a color aesthetic

Use the color attribute of the aesthetic to vary color based on a factor

ggplot(data = three_level, mapping = aes(x = limited_proficiency, y = economically_disadvantaged)) +
  geom_point(aes(color = as.factor(`School Level`)))

Setting and using a color palette for color vision deficiency

cbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7", "#999999")
ggplot(data = three_level, mapping = aes(x = limited_proficiency, y = economically_disadvantaged)) +
  geom_point(aes(color = as.factor(`School Level`))) + # not required to convert to factor if character
  scale_colour_manual(values=cbPalette)

Note: Use scale_fill_manual instead of scale_colour_manual when the variable is mapped to the fill aesthetic (e.g., the interior of bars) rather than to the color of points.
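
As a minimal sketch of the note above, the same palette can be applied to a filled plot with scale_fill_manual. The built-in mpg data set from ggplot2 and its class column stand in for the lesson data here; that substitution is an assumption made only to keep the example self-contained.

```r
library(ggplot2)

# Palette from the lesson (black first, gray last)
cbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
               "#0072B2", "#D55E00", "#CC79A7", "#999999")

# mpg is a stand-in data set (assumption); class has fewer levels than the palette has colors
fill_plot <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = class), color = "black") +
  scale_fill_manual(values = cbPalette) # fill scale, because the bars are filled

fill_plot
```

The palette must contain at least as many colors as there are levels in the mapped variable.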

Setting colors based on a continuous variable using a color aesthetic

Use the color attribute of the aesthetic to vary color based on a continuous variable

ggplot(data = three_level, mapping = aes(x = limited_proficiency, y = economically_disadvantaged)) +
  geom_point(aes(color = percent_white)) +
  scale_colour_gradientn(colours=rainbow(4)) # the number controls the number of colors included

Note: Omitting scale_colour_gradientn will use the default color ramp.
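
To see the difference the note describes, the sketch below builds one plot with the default ramp and one with a custom two-color ramp made with scale_colour_gradient. The mpg data set and its displ, hwy, and cty columns are stand-ins for the lesson data (an assumption), and the yellow-to-red ramp is only an illustrative choice.

```r
library(ggplot2)

# No scale specified: ggplot2 applies its default continuous color ramp
default_ramp <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(aes(color = cty))

# Same plot with an explicit two-color ramp
custom_ramp <- default_ramp +
  scale_colour_gradient(low = "yellow", high = "red")

default_ramp
custom_ramp
```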

Controlling point markers

Code to generate a table showing all of the available point markers and their numeric codes (from here)

d=data.frame(p=c(0:25,32:127))
ggplot() +
  scale_y_continuous(name="") +
  scale_x_continuous(name="") +
  scale_shape_identity() +
  geom_point(data=d, mapping=aes(x=p%%16, y=p%/%16, shape=p), size=5, fill="red") +
  geom_text(data=d, mapping=aes(x=p%%16, y=p%/%16+0.25, label=p), size=3)

Setting fixed point characteristics

Use shape, size, color, and fill to control the appearance of the markers

ggplot(data = rel_data, mapping = aes(x = limited_proficiency, y = economically_disadvantaged)) +
  geom_point(shape = 24, size = 5, color = "blue", fill = "yellow")

Using an aesthetic to set point characteristics

The size of a point marker can be controlled in an aesthetic by a continuous variable

ggplot(data = rel_data, mapping = aes(x = limited_proficiency, y = economically_disadvantaged)) +
  geom_point(aes(size = percent_white), shape = 2, color = "blue")

The shape of a point marker can only be controlled in an aesthetic by a categorical variable (factor)

ggplot(data = three_level, mapping = aes(x = limited_proficiency, y = economically_disadvantaged)) +
  geom_point(aes(shape = `School Level`), size = 3, color = "blue")
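
When a shape aesthetic is used, ggplot chooses the markers automatically. As a hedged sketch, specific numeric codes from the marker chart above can be assigned to the levels with scale_shape_manual. The mpg data set and its drv column (three levels) are stand-ins for the lesson data, which is an assumption made for self-containment.

```r
library(ggplot2)

# drv has three levels, so three marker codes are supplied
shape_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(aes(shape = drv), size = 3, color = "blue") +
  scale_shape_manual(values = c(16, 17, 15)) # filled circle, triangle, square

shape_plot
```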

Filled plots

Single variable plots

The outline (color) and interior (fill) of a bar can be manually specified by either name or hex code

ggplot(data = schools_data) +
  geom_histogram(mapping = aes(x = Female), binwidth = 100, color = "red", fill = "gray")

ggplot(data = schools_data) +
  geom_histogram(mapping = aes(x = Female), binwidth = 100, color = "#349c35", fill = "#9C349B")

The color of the category bars can be set in the aesthetic by setting the fill argument to the same column as is used for the x variable.

ggplot(data = three_level) +
  geom_bar(mapping = aes(x = `School Level`, fill = `School Level`), color = "black")

The default color scheme can be tweaked by modifying the luminance (lightness) and chroma (saturation; intensity of color) using scale_fill_hue.

ggplot(data = three_level) +
  geom_bar(mapping = aes(x = `School Level`, fill = `School Level`), color = "black") +
  scale_fill_hue(c = 80, l = 20)

Note: Luminance (l) values range from 0 to 100. Chroma (c) values start at 0; the maximum is not fixed and depends on the combination of hue and luminance.
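
One way to inspect the effect of these settings without drawing a plot is to generate the hex codes directly. This sketch assumes the scales package (installed alongside ggplot2) and its hue_pal function, which takes the same c and l arguments as scale_fill_hue; newer releases of scales also offer it under the name pal_hue.

```r
library(scales)

# Three colors with the default settings (c = 100, l = 65)
default_cols <- hue_pal()(3)

# Three colors with the darker, less saturated settings used above
dark_cols <- hue_pal(c = 80, l = 20)(3)

print(default_cols)
print(dark_cols)
```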

Positioning in two-variable geom_bar (counts bar plot)

If one categorical variable is used for x, another variable can be used for sub-bars using fill argument in the aesthetic.

ggplot(data = three_level) +
  geom_bar(mapping = aes(x = as.factor(`Zip Code`), fill = `School Level`))

Notice how this differs from the last example in which x and fill used the same column from the data frame.

The position argument can be used to change the way the sub-bars are arranged. The default, "stack", places the sub-bars on top of each other. A value of "fill" makes the total bars the same height so that the sub-bars represent fractions, and a value of "dodge" places the sub-bars side by side.

ggplot(data = three_level) +
  geom_bar(mapping = aes(x = as.factor(`Zip Code`), fill = `School Level`), position = "fill")

ggplot(data = three_level) +
  geom_bar(mapping = aes(x = as.factor(`Zip Code`), fill = `School Level`), position = "dodge")

geom_col (numeric column plot)

In a geom_col, the aesthetic has a y value that is a continuous variable. The y-axis therefore displays that value, rather than the counts displayed by geom_bar.

ggplot(data = co2) +
  geom_col(mapping = aes(x = State, y= metric_tons, fill = sector))

As with geom_bar, the position argument controls placement of the sub-bars

ggplot(data = co2) +
  geom_col(mapping = aes(x = State, y= metric_tons, fill = sector), position = "dodge")

Controlling category order

The order of levels in a factor determines the order in which elements of the plot are displayed. The order can be changed manually using the factor function.

co2$sector <- factor(co2$sector, c("Transportation", "Electric.Power", "Industrial", "Residential", "Commercial"))

The first argument is the vector to be modified, and the second argument (levels) is a vector of level names in the order they should appear in the reordered factor.
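
The effect of releveling can be checked on a small vector. The values below are made up for illustration and are not taken from the co2 data.

```r
# A small character vector converted to a factor
sector <- factor(c("Industrial", "Commercial", "Transportation", "Commercial"))
levels(sector) # "Commercial" "Industrial" "Transportation" (alphabetical by default)

# Relevel in the order the categories should appear in a plot
sector <- factor(sector, levels = c("Transportation", "Industrial", "Commercial"))
levels(sector) # "Transportation" "Industrial" "Commercial"
```

Any value that is not listed in the levels vector becomes NA, so the names must match the data exactly.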

The order of levels can also be changed by sorting based on some characteristic of a variable as it applies to rows that have that level. This is done with the reorder function. In the following example, the mean CO2 value for all the rows having a particular state level is used to order those state levels.

co2$State <- reorder(co2$State, -co2$Total, mean)

In this example, co2$Total is preceded by a minus sign to make the ordering descending. The final argument (mean) is the function that reduces the set of values belonging to a level to a single number that can be used for sorting.
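
The behavior of reorder can be verified on a toy data frame. The emissions data frame and its values are hypothetical and serve only to make the resulting order easy to check by hand.

```r
# Hypothetical data: two rows per state
emissions <- data.frame(
  State = c("A", "A", "B", "B", "C", "C"),
  Total = c(10, 20, 50, 70, 30, 40)
)

# Mean Total per state: A = 15, B = 60, C = 35
emissions$State <- reorder(emissions$State, -emissions$Total, mean)

levels(emissions$State) # "B" "C" "A" (largest mean first)
```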

For figures in papers that can’t be in color, a grayscale color ramp can be used by adding a scale_fill_grey component to the plot.

ggplot(data = co2) +
  geom_col(mapping = aes(x = State, y= metric_tons, fill = sector), position = "dodge", color = "black") +
  scale_fill_grey()

Descriptive elements

Legends, axis labels, and title

The position of the legend can be controlled using theme(legend.position). Valid values are left, right, top, bottom, and none.

ggplot(data = co2) +
  geom_col(mapping = aes(x = State, y= metric_tons, fill = sector), position = "dodge", color = "black") +
  theme(legend.position = "left")

The legend can also be positioned within the plot using a vector composed of the X and Y position within the plot (ranging between 0 and 1)

ggplot(data = co2) +
  geom_col(mapping = aes(x = State, y= metric_tons, fill = sector), position = "dodge", color = "black") +
  theme(legend.position = c(.8, .8))
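
In ggplot2 3.5.0 and later, passing a numeric vector to legend.position is deprecated. The sketch below shows the replacement form, assuming a recent ggplot2 version; the mpg data set is a stand-in for the co2 data (an assumption made for self-containment).

```r
library(ggplot2)

# Requires ggplot2 >= 3.5.0: the coordinates move to legend.position.inside
inside_legend <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = drv), position = "dodge", color = "black") +
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.8, 0.8) # X and Y position, each between 0 and 1
  )

inside_legend
```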

The labs function is used to set the labels of the X and Y axes and the title of the plot.

ggplot(data = co2) +
  geom_col(mapping = aes(x = State, y= metric_tons, fill = sector), position = "dodge", color = "black") +
  labs(
    x = "states with the largest total",
    y = expression(CO[2] ~ "released (in" ~ 10^6 ~ "metric tons)"), # plotmath; HTML tags such as <sub> are not rendered by default
    title = "Fig. 3. Annual fossil fuel combustion in 2018 (EPA 430-R-18-003)"
    )

Examples of improving descriptive elements

The title of the legend is also changed using an argument of the labs function. The key of that argument is the aspect used in the aesthetic to visualize the variable described in the legend. For example, if a variable is used to vary the color of markers, color is used as the key for the argument:

ggplot(data = three_level, mapping = aes(x = limited_proficiency, y = economically_disadvantaged)) +
  geom_point(aes(color = as.factor(`School Level`))) + # not required to convert to factor if character
  labs(
    x = "percent limited English proficiency",
    y = "percent economically disadvantaged",
    color = "School level"
  )

If the axis labels are crowded together, they can be rotated using guides

ggplot(data = three_level) +
  geom_bar(mapping = aes(x = `Zip Code`, fill = `School Level`)) +
  guides(x = guide_axis(angle = 90)) +
  labs(
    y = "number of schools",
    title = "distribution of schools in Davidson County"
  )

Exporting plots as image files

Like everything else in R, ggplot plots are objects that can be assigned to names (i.e. variables). Assigning a plot to a name avoids an unwieldy amount of code in the call to the export function.

grayscale_figure <- ggplot(data = co2) +
  geom_col(mapping = aes(x = State, y= metric_tons, fill = sector), position = "dodge", color = "black") +
  scale_fill_grey()

The ggsave function, which is part of ggplot2, is used to export plots as image files for use as figures in papers or diagrams in presentations. JPEG and PNG are supported natively. SVG images, which scale without loss of quality and are often preferred by publishers, require installation of the svglite package.

ggsave("figure.png", plot=grayscale_figure, width = 20, height = 30, units = "cm")
ggsave("figure.svg", plot=grayscale_figure, width = 20, height = 30, units = "cm")

Note: ggsave determines the export type from the file extension used in the output file name, unless the device argument is specified.
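
The two ways of choosing the output format can be sketched as follows. The plot, the mpg stand-in data, and the temporary file paths are assumptions for illustration; PDF is used because it needs no extra package.

```r
library(ggplot2)

example_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Format inferred from the ".pdf" extension
inferred_path <- file.path(tempdir(), "figure.pdf")
ggsave(inferred_path, plot = example_plot, width = 20, height = 30, units = "cm")

# No recognizable extension, so the format is given with the device argument
explicit_path <- tempfile(pattern = "figure_")
ggsave(explicit_path, plot = example_plot, device = "pdf",
       width = 20, height = 30, units = "cm")

file.exists(c(inferred_path, explicit_path))
```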


Practice assignment

There are a number of built-in datasets included with the R installation that can be referenced without loading them from an external file. We will use some of them in the practice assignment.
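
As a brief illustration, a built-in data set can be used by name as soon as R starts. The choice of mtcars and of the columns plotted here is an assumption for demonstration and is not part of the assignment.

```r
library(ggplot2)

# mtcars ships with R in the datasets package; no file needs to be loaded
head(mtcars)

# A built-in data frame can be passed straight to ggplot
builtin_plot <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(aes(color = as.factor(cyl)))

builtin_plot
```

Running data() with no arguments lists the data sets that are available.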

  1. Load th

Next lesson: Statistical functions


Revised 2021-09-17

License: CC BY 4.0.
Credit: "Vanderbilt Libraries Digital Lab - www.library.vanderbilt.edu"