Digital Education Resources - Vanderbilt Libraries Digital Lab


# R viz using ggplot: Controlling plot dimensions

In the plots of continuous data that we have created so far, we have used standard, linear X and Y axes. In this lesson, we will see how to change the number and types of axes.

Learning objectives

At the end of this lesson, the learner will be able to:

• control the direction and scale of axes using `scale_x_continuous` and `scale_x_reverse`
• set the number of breaks on a scale
• create plots with log scales using `scale_x_log10`
• use polar coordinates to construct pie charts
• generate a simple map using `coord_quickmap` and overlay formatted points using latitude/longitude data
• create three-dimensional visualizations using `geom_contour` and `geom_tile`
• add text labels to a plot
• add horizontal or vertical lines to a plot

Total video time: n/a

Lesson R script at GitHub

Lesson slides

ggplot function reference

# Manipulating axes

We can change the scale or direction of either axis if it makes patterns in the data more apparent. We can also control the number of scale lines shown on the plot background.

## Reversing axes

To reverse an axis, use `scale_x_reverse()` or `scale_y_reverse()`.

``````ggplot(data = lion_noses, aes(x = proportionBlack, y = ageInYears)) +
geom_point() +
geom_smooth(method = "lm") +
scale_x_reverse()
``````
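The `lion_noses` data frame is not defined in this section, so the following is a minimal, self-contained sketch of the same idea. It uses the built-in `mtcars` dataset as a stand-in (an assumption made here, not part of the lesson data) and stores the plot in a hypothetical variable `p_reversed`.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the lesson's lion_noses data
p_reversed <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_x_reverse()  # the largest wt values now appear on the left

p_reversed
```

Only the direction of the axis changes; the points and their values stay the same.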

## Setting the breaks on a scale

The numbered divisions on a scale can be controlled by specifying their interval or number. Note: the `scales::` notation calls functions from the `scales` package without loading the entire package. `scales::breaks_width()` sets the width of the scale intervals. `scales::breaks_extended()` takes a desired number of breaks, but this is only a suggestion and the actual number may differ.

``````ggplot(data = lion_noses, aes(x = proportionBlack, y = ageInYears)) +
geom_point() +
geom_smooth(method = "lm") +
scale_x_continuous(breaks = scales::breaks_extended(n = 10)) + # set number of breaks
scale_y_continuous(breaks = scales::breaks_width(2)) # set width of breaks
``````
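It may help to see that each of these helpers returns a function, which ggplot later calls with the range of the data. The sketch below calls them directly on made-up ranges; the variable names are hypothetical.

```r
# Each helper returns a function rather than a vector of breaks
width_breaks <- scales::breaks_width(2)
extended_breaks <- scales::breaks_extended(n = 10)

# Passing a data range returns the break positions for that range
width_breaks(c(0, 10))     # breaks spaced exactly 2 units apart
extended_breaks(c(0, 1))   # roughly 10 breaks; the exact count may differ
```

Because `breaks_extended()` favors "nice" round numbers, the number of breaks it returns can be somewhat more or less than `n`.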

## Changing to a log scale

The scale of an axis can be changed from linear to logarithmic using `scale_x_log10()`. For other scale transformations, see this section.

``````ggplot(data = fish_species, aes(x = poolArea, y = nFishSpecies)) +
geom_point() +
geom_smooth(method = "lm") +
scale_x_log10()
``````
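As a self-contained sketch, the same transformation can be tried on a small invented data frame. The `pools` data below is hypothetical and only mimics the shape of `fish_species`, with x values spread over several orders of magnitude.

```r
library(ggplot2)

# Hypothetical data: areas spanning four orders of magnitude
pools <- data.frame(
  area = c(1, 10, 100, 1000, 10000),
  species = c(2, 5, 9, 12, 16)
)

p_log <- ggplot(data = pools, aes(x = area, y = species)) +
  geom_point() +
  scale_x_log10()  # point positions are computed from log10(area)

p_log
```

The axis labels still show the original values, but equal distances along the axis now represent equal ratios rather than equal differences.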

# Polar axes and pie charts

Some less well-known types of visualizations depend on using polar rather than rectangular coordinates. However, polar axes can also be used to generate a familiar type of plot: pie charts.

Pie charts are not considered a particularly great way to visualize data. However, since they are in widespread use, it’s useful to know how to generate them in ggplot.

The basic strategy is to create a column plot where the X value is a discrete (categorical) “dummy” variable that is constant. One way to accomplish this is to give X the value of an empty string `""`, which ggplot treats as a single category. The `position = "fill"` argument rescales each column to the same total height, so the sub-bars show fractional values.

``````ggplot(data = co2, aes(x= "", y=metric_tons, fill = sector)) + # The x variable is a dummy variable
geom_col(width = 1, color = "black", position = "fill") # "fill" makes the bars fractional
``````

The `coord_polar(theta = "y")` function then maps the y value to the angle of a polar coordinate system instead of to a rectangular axis. The x and y axis labels are meaningless, since x is a dummy variable and y is a unitless fraction, so the axis labels, break markers, and distracting background colors can be removed with the `theme_void()` function.

``````ggplot(data = co2, aes(x= "", y=metric_tons, fill = sector)) +
geom_col(width = 1, color = "black", position = "fill") +
coord_polar(theta = "y") + # use the y variable for the polar coordinate
theme_void() # add this to get rid of distracting background and labels
``````

To create a set of side-by-side pie charts for comparison, you can facet on an additional categorical variable.

``````ggplot(data = co2, aes(x= "", y=metric_tons, fill = sector)) +
geom_col(width = 1, color = "black", position = "fill") +
coord_polar(theta = "y") + # use the y variable for the polar coordinate
facet_wrap(~State) +
theme_void() # add this to get rid of distracting background and labels
``````
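The `co2` data frame used above is loaded elsewhere in the lesson. To try the technique in isolation, here is a minimal sketch with an invented data frame; the `emissions` name and its values are hypothetical and chosen only so the fractions are easy to check by eye.

```r
library(ggplot2)

# Hypothetical totals, invented for illustration only
emissions <- data.frame(
  sector = c("transport", "industry", "residential"),
  metric_tons = c(50, 30, 20)
)

p_pie <- ggplot(data = emissions, aes(x = "", y = metric_tons, fill = sector)) +
  geom_col(width = 1, color = "black", position = "fill") +  # rescale the stack to a total of 1
  coord_polar(theta = "y") +                                  # each fraction becomes an angle
  theme_void()

p_pie
```

Removing the `coord_polar()` line shows the intermediate step: a single stacked column whose segments become the pie slices.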

# Maps

Simple maps can be generated by treating decimal longitudes and latitudes as typical x and y coordinates and plotting them as polygons on axes with equal scales. This ignores any aspects of projection, which can make the map very distorted if the locations are far from the equator.

The `sf` package uses a more sophisticated way to encode the vectors that are used to construct lines on maps. It is based on the “simple features” standard of the Open Geospatial Consortium. The `sf` package supports various projections as well as many other features for manipulating maps. For more information on creating maps using the `sf` package, see this page.

The ggplot `map_data()` function draws on a separate package called `maps`, which provides some basic map outline data that can be plotted as polygons. It must be installed prior to use:

``````install.packages("maps")
``````

Once installed, you can pull data about particular geographic features into a data frame and use it to create outline maps.

``````tennessee_counties <- map_data("county", "tennessee")
ggplot() +
geom_polygon(data = tennessee_counties, mapping = aes(long, lat, group = group), fill = "white", colour = "grey50") +
coord_quickmap() # sets the aspect ratio so the map is not stretched at this latitude
``````

To overlay point data, use `geom_point` as you would for any scatterplot. The color and shape of the points can be controlled using aesthetics based on a third variable.

``````tennessee_counties <- map_data("county", "tennessee")
# filter Davidson County outline data
davidson <- filter(tennessee_counties, subregion == "davidson")
# schools_data is assumed to be already loaded, with Longitude, Latitude, and `School Level` columns
schools_data <- filter(schools_data, `School Level` %in% c("Elementary School", "Middle School", "High School"))
ggplot() +
geom_polygon(data = davidson, mapping = aes(long, lat, group = group), fill = "white", colour = "grey50") +
geom_point(data = schools_data, mapping = aes(x = Longitude, y = Latitude, colour = `School Level`)) +
coord_quickmap()
``````
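The layering used above (outline polygon first, points second) can be sketched without the `maps` package or the schools data. Everything in the example below is invented for illustration: the `outline` rectangle, the `sites` points, and the `level` column are hypothetical and do not correspond to real places.

```r
library(ggplot2)

# Hypothetical outline: a rectangle given as longitude/latitude corners
outline <- data.frame(
  long = c(-87.0, -86.5, -86.5, -87.0),
  lat = c(36.0, 36.0, 36.4, 36.4),
  group = 1
)

# Hypothetical point data with a categorical third variable
sites <- data.frame(
  Longitude = c(-86.9, -86.7, -86.6),
  Latitude = c(36.1, 36.3, 36.2),
  level = c("Elementary School", "Middle School", "High School")
)

p_map <- ggplot() +
  geom_polygon(data = outline, mapping = aes(long, lat, group = group),
               fill = "white", colour = "grey50") +
  geom_point(data = sites, mapping = aes(x = Longitude, y = Latitude, colour = level)) +
  coord_quickmap()

p_map
```

Each layer receives its own `data` argument, which is why the call to `ggplot()` itself is left empty.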

# Surface plots for three dimensions

We’ve seen ways to visualize more complex data by eliminating or summarizing categorical variables. If we have a third variable that is continuous, we can visualize its relationship to two other continuous variables using surface plots.

ggplot does not really support true 3D plots. There are other non-ggplot plotting packages like `plot3D` that will generate actual 3D plots. However, ggplot can visualize a third dimension on a two-dimensional plot by visualizing the “surface” of the third dimension using contour lines or colors.

One method of doing this is `geom_contour()`, which interpolates between the points and generates “contours” representing lines of equal height.

``````ggplot(data = filtered, aes(x = date, y = video_index)) +
geom_contour(aes(z = video_views))
``````

Another method is `geom_tile()`, which represents the height of rectangular tiles by color:

``````ggplot(data = filtered, aes(x = date, y = video_index)) +
geom_tile(aes(fill = video_views))
``````
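The `filtered` data frame comes from elsewhere in the lesson. For a runnable sketch of both geoms, the `faithfuld` dataset that ships with ggplot2 can serve as a stand-in (an assumption made here): it is a regular grid of `waiting` and `eruptions` values with a `density` value at each grid point.

```r
library(ggplot2)

# Contour lines connect grid points of equal density
p_contour <- ggplot(data = faithfuld, aes(x = waiting, y = eruptions)) +
  geom_contour(aes(z = density))

# Tiles show the same surface as color, one tile per grid point
p_tile <- ggplot(data = faithfuld, aes(x = waiting, y = eruptions)) +
  geom_tile(aes(fill = density))

p_contour
p_tile
```

Note that the third variable is mapped to `z` for contours but to `fill` for tiles.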

The default single color ramp (blue) doesn’t distinguish between gradations of the scale very well. A multi-color gradient set with `scale_fill_gradientn()`, here using a rainbow palette, works better:

``````ggplot(data = filtered, aes(x = date, y = video_index)) +
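geom_tile(aes(fill = video_views)) +
scale_fill_gradientn(colours = rainbow(5)) # rainbow palette; the number of colours (5) is an assumed value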
``````

For cases where the data ranges across orders of magnitude, a log transformation can be done on the data before applying the scale:

``````ggplot(data = filtered, aes(x = date, y = video_index)) +
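geom_tile(aes(fill = video_views)) +
# log-transform the values before mapping them to colour; 5 colours is an assumed value
# (ggplot2 3.5.0 and later name this argument `transform`)
scale_fill_gradientn(colours = rainbow(5), trans = "log10")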
``````
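An alternative way to get a similar effect, sketched here on the built-in `faithfuld` dataset as a stand-in, is to apply the log inside the aesthetic itself. The choice of five rainbow colours is arbitrary, and `p_rainbow` is a hypothetical name.

```r
library(ggplot2)

p_rainbow <- ggplot(data = faithfuld, aes(x = waiting, y = eruptions)) +
  geom_tile(aes(fill = log10(density))) +      # transform the values in the aesthetic
  scale_fill_gradientn(colours = rainbow(5))   # interpolate across five rainbow colours

p_rainbow
```

One trade-off of this approach is that the legend shows the logged values rather than the original units.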

# Practice assignment

There are a number of built-in datasets included with the R installation that can be referenced without loading them from an external file. We will use some of them in the practice assignment. 