Digital Education Resources - Vanderbilt Libraries Digital Lab

Previous lesson: displaying complex data

R viz using ggplot: Controlling plot dimensions

In the plots of continuous data that we have created so far, we have used standard, linear X and Y axes. In this lesson, we will see how to change the number and types of axes.

Learning objectives

At the end of this lesson, the learner will be able to:

Total video time: n/a

Links

Lesson R script at GitHub

Lesson slides

ggplot function reference


Manipulating axes

We can change the scale or direction of either axis if it makes patterns in the data more apparent. We can also control the number of scale lines shown on the plot background.

Reversing axes

To reverse an axis, use scale_x_reverse() or scale_y_reverse().

ggplot(data = lion_noses, aes(x = proportionBlack, y = ageInYears)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_reverse()

Setting the breaks on a scale

The numbered divisions (breaks) on a scale can be controlled by specifying either their interval or their number. Note: The scales:: notation calls a function from the scales package without loading the entire package. scales::breaks_width() sets the width of the intervals between breaks. scales::breaks_extended() takes the desired number of breaks, but this is only a suggestion and the actual number may differ.

ggplot(data = lion_noses, aes(x = proportionBlack, y = ageInYears)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_continuous(breaks = scales::breaks_extended(n = 10)) + # set number of breaks
  scale_y_continuous(breaks = scales::breaks_width(2)) # set width of breaks
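
To see what these helpers do, it may help to call them outside of a plot. Each one returns a function that takes the range of an axis and returns break positions. The sketch below assumes the scales package (version 1.1.0 or later) is installed; the range c(0, 10) is made up for illustration.

```r
# breaks_width() and breaks_extended() return functions, not vectors of breaks.
width_breaks <- scales::breaks_width(2)
extended_breaks <- scales::breaks_extended(n = 10)

# Apply them to a hypothetical axis range running from 0 to 10.
axis_range <- c(0, 10)
width_result <- width_breaks(axis_range)        # evenly spaced, 2 units apart
extended_result <- extended_breaks(axis_range)  # about 10 "nice" breaks (n is a suggestion)

print(width_result)
print(extended_result)
```

Passing these functions to the breaks argument lets ggplot call them with the actual data range when the plot is built.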

Changing to a log scale

The scale of an axis can be changed from linear to logarithmic using scale_x_log10(). For other scale transformations, see this section.

ggplot(data = fish_species, aes(x = poolArea, y = nFishSpecies)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_log10()
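
One consequence worth knowing is that scale transformations are applied before statistics are computed, so geom_smooth(method = "lm") in the plot above fits a line to log10(x) rather than to x. The following sketch checks this with a small made-up data frame (toy, area, and n_species are hypothetical names, not part of the lesson data).

```r
library(ggplot2)

# Hypothetical data spanning several orders of magnitude on x.
toy <- data.frame(area = c(1, 10, 100, 1000), n_species = c(2, 4, 6, 8))

p <- ggplot(toy, aes(x = area, y = n_species)) +
  geom_point() +
  scale_x_log10()

# layer_data() returns the data for a layer as ggplot sees it
# after the scale transformation has been applied.
transformed_x <- layer_data(p, 1)$x
print(transformed_x)  # the log10 of the original area values
```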

Polar axes and pie charts

Some less well-known types of visualizations depend on using polar rather than rectangular coordinates. However, polar axes can also be used to generate a familiar type of plot: pie charts.

Pie charts are not considered a particularly great way to visualize data. However, since they are in widespread use, it’s useful to know how to generate them in ggplot.

The basic strategy is to create a column plot where the X value is a constant, discrete (categorical) “dummy” variable. One way to accomplish this is to give X the value of the empty string "", which ggplot treats as a single category. The position = "fill" argument scales each column to the same total height (1), so the stacked segments show each category's fraction of the total.

ggplot(data = co2, aes(x= "", y=metric_tons, fill = sector)) + # The x variable is a dummy variable
  geom_col(width = 1, color = "black", position = "fill") # "fill" makes the bars fractional
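
To check what position = "fill" does to the values, the built plot data can be inspected. This is a minimal sketch using a made-up data frame (toy_co2 and its values are hypothetical stand-ins for the lesson's co2 data).

```r
library(ggplot2)

# Hypothetical emissions by sector; the totals are arbitrary.
toy_co2 <- data.frame(
  sector = c("transport", "industry", "residential"),
  metric_tons = c(50, 30, 20)
)

p <- ggplot(toy_co2, aes(x = "", y = metric_tons, fill = sector)) +
  geom_col(width = 1, position = "fill")

# Each segment runs from ymin to ymax; its height is the sector's share of the total.
bars <- layer_data(p, 1)
fractions <- bars$ymax - bars$ymin
print(fractions)
print(sum(fractions))  # the segments together fill the whole column
```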

The coord_polar(theta = "y") function is then used to map the y value to the angle of a polar coordinate system rather than to the height of a rectangular one. The x and y axis labels are meaningless, since x is a dummy variable and y is a unitless fraction. The axis labels, break markers, and distracting background colors can therefore be eliminated using the theme_void() function.

ggplot(data = co2, aes(x= "", y=metric_tons, fill = sector)) +
  geom_col(width = 1, color = "black", position = "fill") +
  coord_polar(theta = "y") + # use the y variable for the polar coordinate
  theme_void() # add this to get rid of distracting background and labels

To create a set of side-by-side pie charts for comparison, you can facet on an additional categorical variable.

ggplot(data = co2, aes(x= "", y=metric_tons, fill = sector)) +
  geom_col(width = 1, color = "black", position = "fill") +
  coord_polar(theta = "y") + # use the y variable for the polar coordinate
  facet_wrap(~State) +
  theme_void() # add this to get rid of distracting background and labels

Maps

Simple maps can be generated by treating decimal longitudes and latitudes as typical x and y coordinates and plotting them as polygons on axes with equal scales. This ignores any aspects of projection, which can make the map very distorted if the locations are far from the equator.

The sf package uses a more sophisticated way to encode the vectors that are used to construct lines on maps. It is based on the “simple features” standard of the Open Geospatial Consortium. The sf package supports various projections as well as many other features for manipulating maps. For more information on creating maps using the sf package, see this page.

The ggplot map_data() function relies on a separate package called maps, which provides some basic map outline data that can be plotted as polygons. The maps package must be installed prior to use:

install.packages("maps")

Once installed, you can pull data about particular geographic features into a data frame and use it to create outline maps.

tennessee_counties <- map_data("county", "tennessee")
ggplot() +
  geom_polygon(data = tennessee_counties, mapping = aes(long, lat, group = group), fill = "white", colour = "grey50") + 
  coord_quickmap() # sets an aspect ratio that approximates a map projection at this latitude
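
As a rough illustration of what coord_quickmap() does: a degree of longitude covers less ground than a degree of latitude away from the equator, by a factor of roughly the cosine of the latitude. The sketch below approximates that correction by hand with coord_fixed(). The center latitude of 36 degrees and the one-degree box are assumptions chosen for illustration, and the calculation is a simplification, not the exact method used internally.

```r
library(ggplot2)

# Assumed latitude near the middle of the map, in degrees.
center_lat <- 36

# Approximate ratio of the ground length of 1 degree of latitude
# to 1 degree of longitude at that latitude.
ratio <- 1 / cos(center_lat * pi / 180)
print(ratio)

# Hypothetical 1-degree by 1-degree box drawn with the hand-computed ratio.
box <- data.frame(long = c(-87, -86, -86, -87), lat = c(35.5, 35.5, 36.5, 36.5))
p <- ggplot(box, aes(long, lat)) +
  geom_polygon(fill = "white", colour = "grey50") +
  coord_fixed(ratio = ratio)
```

On the resulting plot the box is drawn taller than it is wide, which is closer to its true shape on the ground than a plot with equal x and y scales would be.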

To overlay point data, use geom_point() as you would for any scatterplot. The color and shape of the points can be controlled using aesthetics based on a third variable.

tennessee_counties <- map_data("county", "tennessee")
# filter Davidson County outline data
davidson <- filter(tennessee_counties, subregion == "davidson")
schools_data <- read_csv("https://github.com/HeardLibrary/digital-scholarship/raw/master/data/gis/wg/Metro_Nashville_Schools.csv") %>%
  filter(`School Level` %in% c("Elementary School", "Middle School", "High School"))
ggplot() +
  geom_polygon(data = davidson, mapping = aes(long, lat, group = group), fill = "white", colour = "grey50") + 
  geom_point(data = schools_data, mapping = aes(x = Longitude, y = Latitude, colour = `School Level`)) + 
  coord_quickmap()

Surface plots for three dimensions

We’ve seen ways to visualize more complex data by eliminating or summarizing categorical variables. If we have a third variable that is continuous, we can visualize its relationship to two other continuous variables using surface plots.

ggplot does not really support true 3D plots. There are other non-ggplot plotting packages like plot3D that will generate actual 3D plots. However, ggplot can visualize a third dimension on a two-dimensional plot by visualizing the “surface” of the third dimension using contour lines or colors.

One method of doing this is geom_contour(), which interpolates between the points and generates “contours” representing lines of equal height. It expects the x and y values to form a regular grid.

ggplot(data = filtered, aes(x = date, y = video_index)) + 
  geom_contour(aes(z = video_views)) 

Another method is geom_tile(), which draws a rectangular tile at each x and y position and represents the height by the tile's fill color:

ggplot(data = filtered, aes(x = date, y = video_index)) + 
  geom_tile(aes(fill = video_views))
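
The filtered data frame used in these examples is not defined in this section. For a self-contained sketch of the same two techniques, the faithfuld dataset that ships with ggplot2 can be substituted; it holds a density estimate on a regular grid of waiting and eruptions values. The variable names p_contour and p_tile are arbitrary.

```r
library(ggplot2)

# faithfuld has three continuous columns: eruptions, waiting, and density.
# Contour lines connect grid positions of equal density.
p_contour <- ggplot(data = faithfuld, aes(x = waiting, y = eruptions)) +
  geom_contour(aes(z = density))

# Tiles show the same surface, with density mapped to fill color.
p_tile <- ggplot(data = faithfuld, aes(x = waiting, y = eruptions)) +
  geom_tile(aes(fill = density))

# In an interactive session, typing p_contour or p_tile draws the plot.
```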

The default single-color ramp (blue) doesn’t distinguish between gradations of the scale very well. scale_fill_gradientn() with a rainbow palette makes the differences easier to see:

ggplot(data = filtered, aes(x = date, y = video_index)) + 
  geom_tile(aes(fill = video_views)) +
  scale_fill_gradientn(colours=rainbow(4)) 

For cases where the data range across orders of magnitude, a log transformation can be applied to the values through the scale's trans argument (trans = "log" uses the natural logarithm):

ggplot(data = filtered, aes(x = date, y = video_index)) + 
  geom_tile(aes(fill = video_views)) +
  scale_fill_gradientn(colours=rainbow(4), trans = "log") 

Practice assignment

There are a number of built-in datasets included with the R installation that can be referenced without loading them from an external file. We will use some of them in the practice assignment.

  1. Load th

Next lesson: Interactive ggplots using Shiny


Revised 2021-10-03

Questions? Contact us

License: CC BY 4.0.
Credit: "Vanderbilt Libraries Digital Lab - www.library.vanderbilt.edu"