Digital Education Resources - Vanderbilt Libraries Digital Lab
Previous lesson: Extracting and changing DataFrame data
Preparing data for analysis and visualization can involve cleaning, reformatting, summarizing, and changing the organization of the data. This data-wranging lessons introduces ways to summarize and rearrange data to “wrangle” your data into a usable form.
Learning objectives At the end of this lesson, the learner will:
.groupby()
and .get_group()
..sum()
or .mean()
with or without .groupby()
..unstack()
.melt()
function..pivot()
method.Total video time: 17m 34s
Lesson Jupyter notebook at GitHub
Grouping is used to group rows that have a common label index. Groups can then be used to collapse the table by summarizing grouped rows using a method like .sum()
, .mean()
, etc.
Examples:
# Group by a column, then view one of the grouping values:
co2_state_grouped = state_co2.groupby(['State'])
co2_state_grouped.get_group(('Texas'))
# Group by a column, then create a collapsed view by summing the values by group:
co2_sector_grouped = state_co2.groupby(['Sector'])
total_co2_sector = co2_sector_grouped.sum()
Example of summing rows, then unstacking to form a “wide” DataFrame:
# Load the data
url = 'https://github.com/HeardLibrary/digital-scholarship/raw/master/data/codegraf/co2_data.xlsx'
state_co2 = pd.read_excel(url)
# Change two generic columns to row index labels
double_label = state_co2.copy().set_index(['Sector', 'State'])
# Collapse the remaining columns by summing their values
year_total = double_label.sum(axis='columns')
# Generate the "wide" DataFrame by unstacking the values and creating columns based on the Sector value
column_df = year_total.unstack('Sector')
Example of changing a “wide” DataFrame to a “long” DataFrame using the melt()
function:
long = pd.melt(wide, ['State'])
Notes:
Example of changing a “long” DataFrame to a “wide” DataFrame using the .pivot()
method:
state_wide = long.pivot('State', 'Sector', 'value')
Notes:
Next lesson: Introduction to plotting
Revised 2022-11-15
Questions? Contact us
License: CC BY 4.0.
Credit: "Vanderbilt Libraries Digital Lab - www.library.vanderbilt.edu"