Previous lesson: displaying complex data
In the plots of continuous data that we have created so far, we have used standard, linear X and Y axes. In this lesson, we will see how to change the number and types of axes.
Learning objectives At the end of this lesson, the learner will be able to:
coord_quickmap and overlay formatted points using lat/lon data.
We can change the scale or direction of either axis if it makes patterns in the data more apparent. We can also control the number of scale lines shown on the plot background.
To reverse an axis, use scale_x_reverse() or scale_y_reverse():
ggplot(data = lion_noses, aes(x = proportionBlack, y = ageInYears)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_reverse()
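The same idea applies to the Y axis. A minimal sketch, assuming the built-in mtcars data set as a stand-in for the lesson's lion_noses data (the object name reversed_plot is only illustrative):

```r
library(ggplot2)

# mtcars ships with R; it is used here only as stand-in data
reversed_plot <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_y_reverse()   # the largest mpg values are now drawn at the bottom

reversed_plot
```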
The numbered divisions on a scale can be controlled by specifying their interval or number. Note: The scales:: notation calls functions from the scales package without attaching the entire package with library().
scales::breaks_width() sets the width of the scale intervals.
scales::breaks_extended() takes a desired number of intervals to be shown on the graph, but it is only a suggestion and the actual number may vary.
ggplot(data = lion_noses, aes(x = proportionBlack, y = ageInYears)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_continuous(breaks = scales::breaks_extended(n = 10)) + # set number of breaks
  scale_y_continuous(breaks = scales::breaks_width(2)) # set width of breaks
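Both helpers return a function that turns an axis range into break positions, so it can help to call them directly and inspect the result. A small sketch, assuming a hypothetical axis range of 0 to 10 (the variable names are illustrative only):

```r
# Each helper returns a function: axis limits in, break positions out
width_fun <- scales::breaks_width(2)
extended_fun <- scales::breaks_extended(n = 10)

axis_range <- c(0, 10)   # hypothetical axis limits, for illustration only

width_fun(axis_range)     # breaks spaced exactly 2 units apart
extended_fun(axis_range)  # roughly 10 "nice" breaks; the exact count may differ
```

Passing the helper itself (not its result) to the breaks argument lets ggplot apply it to whatever limits the data produce.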
The scale of an axis can be changed from linear to logarithmic using
scale_x_log10(). For other scale transformations, see this section.
ggplot(data = fish_species, aes(x = poolArea, y = nFishSpecies)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_log10()
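When both variables span several orders of magnitude, both axes can be made logarithmic. A minimal sketch, assuming the msleep data set bundled with ggplot2 rather than the lesson's fish_species data:

```r
library(ggplot2)

# msleep is bundled with ggplot2; body and brain mass both vary over
# several orders of magnitude, so each axis gets a log10 scale.
log_plot <- ggplot(data = msleep, aes(x = bodywt, y = brainwt)) +
  geom_point(na.rm = TRUE) +   # some brainwt values are missing
  scale_x_log10() +
  scale_y_log10()

log_plot
```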
Some less well-known types of visualizations depend on using polar rather than rectangular coordinates. However, polar axes can also be used to generate a familiar type of plot: pie charts.
Pie charts are often discouraged because readers compare angles and areas less accurately than lengths along a common scale. However, since they are in widespread use, it’s useful to know how to generate them in ggplot.
The basic strategy is to create a column plot where the X value is a discrete (categorical) “dummy” variable that is constant. One way to accomplish this is to give X the value of the empty string "", which ggplot treats as a single category label. The position = "fill" argument makes every column the same height, with the sub-bars showing each value as a fraction of the total.
ggplot(data = co2, aes(x = "", y = metric_tons, fill = sector)) + # The x variable is a dummy variable
  geom_col(width = 1, color = "black", position = "fill") # "fill" makes the bars fractional
The coord_polar(theta = "y") function is then used to indicate that the y coordinate should be treated as a polar (angular) coordinate rather than a rectangular one. The x and y axis labels are meaningless, since x is a dummy variable and y is a unitless fraction. So the axis labels, break markers and distracting background colors can be eliminated using the theme_void() function.
ggplot(data = co2, aes(x = "", y = metric_tons, fill = sector)) +
  geom_col(width = 1, color = "black", position = "fill") +
  coord_polar(theta = "y") + # use the y variable for the polar coordinate
  theme_void() # add this to get rid of distracting background and labels
To create a set of side-by-side pie charts for comparison, you can facet on an additional categorical variable.
ggplot(data = co2, aes(x = "", y = metric_tons, fill = sector)) +
  geom_col(width = 1, color = "black", position = "fill") +
  coord_polar(theta = "y") + # use the y variable for the polar coordinate
  facet_wrap(~State) +
  theme_void() # add this to get rid of distracting background and labels
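To try the pie chart recipe without the lesson's co2 data, here is a self-contained sketch. The emissions data frame and its values are hypothetical, invented only for illustration; the fraction column shows by hand what position = "fill" computes for a single column.

```r
library(ggplot2)

# Hypothetical data, invented only to illustrate the technique
emissions <- data.frame(
  sector = c("Transport", "Industry", "Residential"),
  metric_tons = c(50, 30, 20)
)

# position = "fill" rescales the stacked values to fractions of the total;
# these are the same fractions computed by hand
emissions$fraction <- emissions$metric_tons / sum(emissions$metric_tons)

pie_chart <- ggplot(data = emissions, aes(x = "", y = metric_tons, fill = sector)) +
  geom_col(width = 1, color = "black", position = "fill") +
  coord_polar(theta = "y") +   # wrap the single stacked column into a circle
  theme_void()

pie_chart
```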
Simple maps can be generated by treating decimal longitudes and latitudes as typical x and y coordinates and plotting them as polygons on axes with equal scales. This ignores any aspects of projection, which can make the map very distorted if the locations are far from the equator.
The sf package uses a more sophisticated way to encode the vectors that are used to construct lines on maps. It is based on the “simple features” standard of the Open Geospatial Consortium. The sf package supports various projections as well as many other features for manipulating maps. For more information on creating maps using the sf package, see this page.
The ggplot map_data() function draws on a separate package called maps, which provides some basic map outline data that can be plotted as polygons. It must be installed prior to use:
install.packages("maps")
Once installed, you can pull data about particular geographic features into a data frame and use it to create outline maps.
tennessee_counties <- map_data("county", "tennessee")
ggplot() +
  geom_polygon(data = tennessee_counties, mapping = aes(long, lat, group = group), fill = "white", colour = "grey50") +
  coord_quickmap() # sets the aspect ratio so x and y distances are approximately to scale
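The distortion away from the equator arises because a degree of longitude covers less ground as latitude increases. A rough sketch of the arithmetic, assuming a spherical Earth and a latitude of 36 degrees as an approximate value for Tennessee (both are assumptions made for illustration):

```r
# Approximate central latitude for Tennessee (assumed for illustration)
latitude_deg <- 36

# On a sphere, one degree of longitude covers cos(latitude) times the
# ground distance of one degree of latitude.
lon_to_lat_ratio <- cos(latitude_deg * pi / 180)

lon_to_lat_ratio   # about 0.81
```

coord_quickmap() makes an approximate correction of this kind by adjusting the aspect ratio of the plot, which is why it works best for small areas.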
To overlay point data, use geom_point() as you would for any scatterplot. The color and shape of the points can be controlled using aesthetics based on a third variable.
tennessee_counties <- map_data("county", "tennessee")

# filter Davidson County outline data
davidson <- filter(tennessee_counties, subregion == "davidson")

schools_data <- read_csv("https://github.com/HeardLibrary/digital-scholarship/raw/master/data/gis/wg/Metro_Nashville_Schools.csv") %>%
  filter(`School Level` == "Middle School" | `School Level` == "High School" | `School Level` == "Elementary School")

ggplot() +
  geom_polygon(data = davidson, mapping = aes(long, lat, group = group), fill = "white", colour = "grey50") +
  geom_point(data = schools_data, mapping = aes(x = Longitude, y = Latitude, colour = `School Level`)) +
  coord_quickmap()
We’ve seen ways to visualize more complex data by eliminating or summarizing categorical variables. If we have a third variable that is continuous, we can visualize its relationship to two other continuous variables using surface plots.
ggplot does not really support true 3D plots. There are other non-ggplot plotting packages like
plot3D that will generate actual 3D plots. However, ggplot can visualize a third dimension on a two-dimensional plot by visualizing the “surface” of the third dimension using contour lines or colors.
One method of doing this is
geom_contour(), which interpolates between the points and generates “contours” representing lines of equal height.
ggplot(data = filtered, aes(x = date, y = video_index)) + geom_contour(aes(z = video_views))
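For a version that runs without the lesson's filtered data, here is a minimal sketch assuming the faithfuld data set bundled with ggplot2. It holds a regular grid of waiting and eruption times with an estimated density at each grid point, which is the kind of gridded input geom_contour() works with.

```r
library(ggplot2)

# faithfuld is bundled with ggplot2: a regular grid of waiting and
# eruption times with an estimated density at each grid point
contour_plot <- ggplot(data = faithfuld, aes(x = waiting, y = eruptions)) +
  geom_contour(aes(z = density))   # lines connect points of equal density

contour_plot
```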
Another method is
geom_tile(), which represents the height of rectangular tiles by color:
ggplot(data = filtered, aes(x = date, y = video_index)) + geom_tile(aes(fill = video_views))
The default single color ramp (blue) doesn’t distinguish between gradations of the scale very well.
scale_fill_gradientn() works better with a rainbow() palette supplied as the colors:
ggplot(data = filtered, aes(x = date, y = video_index)) + geom_tile(aes(fill = video_views)) + scale_fill_gradientn(colours=rainbow(4))
For cases where the data ranges across orders of magnitude, a log transformation can be done on the data before applying the scale:
ggplot(data = filtered, aes(x = date, y = video_index)) + geom_tile(aes(fill = video_views)) + scale_fill_gradientn(colours=rainbow(4), trans = "log")
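The same pattern can be tried on bundled data. This sketch assumes the faithfuld data set from ggplot2 and uses base 10 ("log10") instead of the natural log; recent ggplot2 releases may prefer the argument name transform over trans, so treat the argument name as version dependent.

```r
library(ggplot2)

# Check how widely the fill variable ranges before choosing a transformation
range(faithfuld$density)

tile_plot <- ggplot(data = faithfuld, aes(x = waiting, y = eruptions)) +
  geom_tile(aes(fill = density)) +
  # log10-transform the fill values before mapping them to the color ramp
  scale_fill_gradientn(colours = rainbow(4), trans = "log10")

tile_plot
```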
There are a number of built-in datasets included with the R installation that can be referenced without loading them from an external file. We will use some of them in the practice assignment.
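As a quick sketch of how to find and use these datasets (mtcars and iris are two well-known examples from the datasets package, chosen here only for illustration):

```r
# List the data sets provided by the "datasets" package that ships with R
available <- data(package = "datasets")
head(available$results[, "Item"])

# Built-in data sets can be referenced by name without reading a file
head(mtcars)
str(iris)
```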
Next lesson: Interactive ggplots using Shiny