Digital Education Resources - Vanderbilt Libraries Digital Lab
In this lesson we will introduce for loops, which are ways to step through iterable objects like lists or to repeat an action many times. We will also learn several ways to control the flow through statements in your code. We will use variations on if statements to do general conditional execution and use try … except … statements to trap errors. Finally, we will introduce the Matplotlib library to create plots of data.
Learning objectives
At the end of this lesson, the learner will be able to:
- use a for loop.
- use the .iterrows() method to iterate through the rows of a pandas DataFrame.
- use if, elif, and else to control execution of code blocks in a script.
- use try and except to control the behavior of a script when errors occur in particular lines of code.
Lesson Jupyter notebook at GitHub
A very common task in Python is to repeat some code multiple times. For example, suppose we want to do something with every item in a list. A list is iterable, meaning that you can step through the list and operate on each of the items in the sequence. Here’s an example:
basket = ['apple', 'orange', 'banana', 'lemon', 'lime']
for fruit in basket:
    print('I ate one ' + fruit)
print("I'm full now!")
Each time the script iterates to another item in the list, it repeats the indented code block below the for statement and the value of the iterator (fruit in this case) changes to the next item. Strings are also iterable:
word = 'supercalifragilisticexpialidocious'
print('Spell it out!')
for letter in word:
    print(letter)
print('That wore me out.')
A common strategy in programming is to build a sequence by starting with an empty sequence and adding items to it one at a time. Here’s an example using strings:
list_of_words = ['The ', 'quick ', 'brown ', 'fox ', 'jumps ', 'over ', 'the ', 'lazy ', 'dog ']
sentence = ''
for word in list_of_words:
    sentence = sentence + word # Concatenate the word to the sentence
print(sentence + '!')
The statement pattern sequence = sequence + item is a common pattern in programming. It is so common that there is a shorthand for it: sequence += item. Here’s the same example using the shorthand:
sentence = ''
for word in list_of_words:
    sentence += word
print(sentence + '!')
We can use the same kind of strategy to add numbers:
total = 0
for number in [3, 5, 7, 9]:
    total += number # Add the number to the total
print('The total is', total)
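As a rough cross-check on the accumulation pattern above, the sketch below builds a sentence and a total with loops and then compares the results with Python's built-in str.join() method and sum() function. The built-ins are not part of the lesson text. They are used here only to confirm that the loops produce the expected results.

```python
list_of_words = ['The ', 'quick ', 'brown ', 'fox ']
numbers = [3, 5, 7, 9]

# Build a string by starting with an empty string and adding to it
sentence = ''
for word in list_of_words:
    sentence += word

# Build a total by starting at zero and adding to it
total = 0
for number in numbers:
    total += number

# The built-ins should agree with the results of the loops
print(sentence == ''.join(list_of_words))
print(total == sum(numbers))
```

Both comparisons should print True. Writing the loop out by hand is still worthwhile when each step needs extra logic that a built-in cannot express.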
In both of these cases, we knew the items in advance. We can use the range() function to generate a sequence of numbers that can be used to control the number of times we add items to a sequence. Here’s an example:
bird_list = []
for i in range(4):
    bird = input('Enter a bird name: ')
    bird_list.append(bird) # Append the bird name to the list
print('Your bird list is:', bird_list)
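To see what range() actually generates without typing any input, here is a minimal sketch. The variable name squares is just an illustration and does not come from the lesson.

```python
# range(4) produces the integers 0, 1, 2, 3 (it stops before 4)
print(list(range(4)))

# Use range() to repeat an action a fixed number of times
squares = []
for i in range(4):
    squares.append(i * i)  # Append the square of the counter to the list
print(squares)
```

Because counting starts at zero, range(4) repeats the loop four times but never reaches the value 4.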
Typically, we don’t want to iterate through rows in a pandas DataFrame because it’s faster and more efficient to make changes to an entire column of the table using vectorized operations. However, there are some cases where it is difficult or impossible to do this and it’s necessary to operate on one row at a time. Examples of such situations include:
The .iterrows() method creates an iterable object from the DataFrame. With each iteration, a tuple is generated that contains the label index as its first item and a series containing row data as the second item. (For more information about tuples, see the optional section at the end of the lesson.) The row data series has the column headers as the series label indices and the row values as the series values.
Values in the row series can be referenced as shown in this lesson, either by direct indexing:
row_series['column_header']
or by passing the series label (the column header) into .loc[]:
row_series.loc['column_header']
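The sketch below shows one way these pieces might fit together. It assumes a small DataFrame built from the first three rows of the cat injury table that appears later in this lesson. The names descriptions, stories, and rate are illustrative only.

```python
import pandas as pd

# Small table using the first rows of the cat injury data from this lesson
cat_injuries = pd.DataFrame({
    'stories_fallen': [2, 3, 4],
    'average_injury_rate': [0.7, 1.0, 1.9]
})

descriptions = []
# Each iteration yields a tuple: (index label, Series of row values)
for index, row in cat_injuries.iterrows():
    stories = row['stories_fallen']        # direct indexing
    rate = row.loc['average_injury_rate']  # indexing with .loc[]
    descriptions.append('Row ' + str(index) + ': ' + str(stories) + ' stories, rate ' + str(rate))

for description in descriptions:
    print(description)
```

Because each row is returned as a single Series, pandas may convert the integer stories_fallen values to floating point numbers so that all values in the row share one type.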
Code example and notes:
name = 'Fred Flintstone'
# Evaluate the boolean value of is_micky based on the comparison of name and 'Mickey Mouse'
is_micky = name == 'Mickey Mouse'
print(name)
print(is_micky)
if is_micky:
    print('You are a Disney character')
print('That is all!')
Notes:
- == is a comparison operator that tests for equality. When name == 'Mickey Mouse' is evaluated, the resulting boolean value is assigned to the variable is_micky. Other comparison operators include != (not equal), > (greater than), and <= (less than or equal to).
- The if statement controls whether the code block following the colon is executed or not (don’t forget the colon!). If the expression following the keyword if has a value of True, then the code block is executed. If the value is False, the code block is not executed.
- The print('That is all!') statement is not included in the code block, so it will be executed regardless of the condition.
else code example and notes:
name = input('What is the name of the character? ')
print(name)
if name == 'Mickey Mouse':
    print('You are a Disney character')
    print('You are almost ready to go out of copyright!')
else:
    print('You are not a Disney character')
print('That is all?')
Notes:
- If the name entered is Mickey Mouse, the indented block after the if clause is executed.
- Otherwise, the indented block after the else clause is executed.
We could nest if statements inside else clauses like this:
name = input('What is the name of the character? ')
if name == 'Mickey Mouse':
    print('You are a mouse')
else:
    if name == 'Donald Duck':
        print('You are not a mouse')
    else:
        if name == 'Minnie Mouse':
            print('Your boyfriend is getting old')
        else:
            print('You are not a Disney character')
print("That's all folks!")
Each of the subsequent if clauses is evaluated only if the previous ones were False. One problem with this approach is that the many indentation levels make the code hard to read.
The structure can be simplified by replacing each else that is followed immediately by an if with a different keyword: elif. Each elif clause is evaluated only if the previous clauses are False.
elif code example and notes:
name = input('What is the name of the character? ')
if name == 'Mickey Mouse':
    print('You are a mouse')
elif name == 'Donald Duck':
    print('You are not a mouse')
elif name == 'Minnie Mouse':
    print('Your boyfriend is getting old')
else:
    print('You are not a Disney character')
print("That's all folks!")
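Conditional blocks can also be placed inside a for loop, so that each item in a list is tested in turn. The following sketch combines the two ideas without requiring typed input. The list names and the variable messages are illustrative and are not part of the lesson's own examples.

```python
names = ['Mickey Mouse', 'Donald Duck', 'Fred Flintstone']

messages = []
for name in names:
    # Only the first clause whose condition is True is executed
    if name == 'Mickey Mouse':
        message = 'You are a mouse'
    elif name == 'Donald Duck':
        message = 'You are not a mouse'
    else:
        message = 'You are not a Disney character'
    messages.append(name + ': ' + message)

for message in messages:
    print(message)
```

Note the two indentation levels: the if, elif, and else clauses are indented once to place them inside the loop, and their code blocks are indented a second time.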
If a user enters the wrong type of object, it can cause an error that will interrupt the execution of the script. We can guard against this using try … except. The indented block following the except clause is executed when an error is raised in the try block. Here is an example:
from math import pi
typed_in = input('What is the diameter of your circle? ')
try:
    diameter = float(typed_in)
    print('The circumference is:', diameter * pi)
except:
    print("Sorry, you didn't enter a number.")
Here is another example to handle the situation where the script tries to access a dictionary item whose key does not exist:
catalog = {'1008':'widget', '2149':'flange', '19x5':'smoke shifter', '992':'poiuyt'}
part_number = input("What part do you want to order? ")
try:
    print('You have ordered a ' + catalog[part_number])
except:
    print("Sorry, that part isn't available.")
print("It's been a pleasure doing business with you!")
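A bare except clause traps every kind of error, including ones we did not anticipate. As a hedged alternative, the sketch below names the specific error type expected in each case: float() raises ValueError when given a string that is not a number, and looking up a missing dictionary key raises KeyError. The function names circumference_message and order_message are hypothetical helpers introduced here so that the examples can run without typed input.

```python
from math import pi

catalog = {'1008': 'widget', '2149': 'flange'}

def circumference_message(typed_in):
    # float() raises ValueError when the string is not a number
    try:
        diameter = float(typed_in)
        return 'The circumference is: ' + str(diameter * pi)
    except ValueError:
        return "Sorry, you didn't enter a number."

def order_message(part_number):
    # Looking up a key that is not in the dictionary raises KeyError
    try:
        return 'You have ordered a ' + catalog[part_number]
    except KeyError:
        return "Sorry, that part isn't available."

print(circumference_message('ten'))
print(order_message('1008'))
print(order_message('9999'))
```

Naming the error type means that an unexpected problem, such as a misspelled variable name, still produces a visible error message instead of being silently reported as bad input.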
Matplotlib (https://matplotlib.org/) is a plotting library for Python that is built on the NumPy extension. One part of the library, pyplot, is designed to operate in a fashion that is familiar to users of MATLAB. Typically, the input data for creating plots with Matplotlib are NumPy arrays, which we have not studied, but generic lists of numbers and pandas Series are also accepted as input. There are two interfaces for using Matplotlib. We will use the object-oriented interface, but you sometimes may see examples that look quite different because they use the other interface.
To import pyplot, it is conventional to use:
import matplotlib.pyplot as plt
We will plot some interesting data described in Example 1.2 of Whitlock and Schluter about injuries sustained by cats falling out of apartment building windows. The data are from vet office records (not an experimental manipulation!) and look like this:
average_injury_rate = [0.7, 1.0, 1.9, 2.0, 2.3, 2.4, 1.0]
stories_fallen = [2, 3, 4, 5, 6, 8, 11]
Each corresponding item in the lists represents a particular data point (e.g. average injury rate for falling 2 stories was 0.7).
We could also acquire the values from a pandas DataFrame. If the first few rows of the DataFrame cat_injuries look like this:
stories_fallen | average_injury_rate |
---|---|
2 | 0.7 |
3 | 1.0 |
4 | 1.9 |
we could refer to the columns of the DataFrame like this:
cat_injuries['average_injury_rate']
cat_injuries['stories_fallen']
These expressions would be evaluated as pandas Series objects, which are iterable and can be used as input to Matplotlib functions.
Creating a figure instance sets aside space for all of the plots in the figure. The figsize= argument can be used to set the width and height of the figure. Example:
fig = plt.figure(figsize=(10, 10))
One or more subplot instances can be inserted into the figure. Example:
axes2 = fig.add_subplot(2, 1, 2)
creates two rows of plots with one plot in each row, and instantiates the second plot.
“Axes” is often used to refer to the subplots, hence the use of ax as the name of a subplot object. This use of the term differs from the conventional use to indicate X and Y axes of the plot itself.
If we just want to create a single plot, we can use
ax = fig.add_subplot(1, 1, 1)
The show function is not needed in Jupyter notebooks, but is required in stand-alone Python installations to display the plot.
plt.show()
Matplotlib provides methods to generate simple two-dimensional plot types. The first two arguments are the variables to be plotted as (x, y).
Plots generated from lists:
ax.scatter(stories_fallen, average_injury_rate, color='r') # dot plot (scatterplot) with red color
ax.plot(stories_fallen, average_injury_rate) # line plot (points connected by lines)
ax.bar(stories_fallen, average_injury_rate) # bar plot
# error bar plot with dot markers
ax.errorbar(stories_fallen, average_injury_rate, yerr=[lower_deviation, upper_deviation], fmt='o')
lower_deviation and upper_deviation are one-dimensional data structures (e.g. lists) that have the same length as the data list, with corresponding items.
Plots generated from pandas Series:
ax.scatter(cat_injuries['stories_fallen'], cat_injuries['average_injury_rate'], color='r')
ax.plot(cat_injuries['stories_fallen'], cat_injuries['average_injury_rate'])
ax.bar(cat_injuries['stories_fallen'], cat_injuries['average_injury_rate'])
ax.errorbar(cat_injuries['stories_fallen'], cat_injuries['average_injury_rate'], yerr=[lower_deviation, upper_deviation], fmt='o')
Axis labels are generated using two methods:
ax.set_xlabel('stories fallen')
ax.set_ylabel('average injury per cat')
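Putting the steps above together, a complete script for a single labeled scatterplot might look like the following sketch. It uses the two data lists given earlier in this lesson and leaves plt.show() as a comment, on the assumption that the code may be run in a notebook where it is not needed.

```python
import matplotlib.pyplot as plt

average_injury_rate = [0.7, 1.0, 1.9, 2.0, 2.3, 2.4, 1.0]
stories_fallen = [2, 3, 4, 5, 6, 8, 11]

fig = plt.figure(figsize=(10, 10))  # set aside space for the figure
ax = fig.add_subplot(1, 1, 1)       # insert a single subplot

# x values first, y values second
ax.scatter(stories_fallen, average_injury_rate, color='r')
ax.set_xlabel('stories fallen')
ax.set_ylabel('average injury per cat')

# In a stand-alone script, call plt.show() here to display the plot
```

The order of the steps matters: the figure must exist before a subplot can be added to it, and the subplot must exist before data or labels can be added.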
To overlay a best-fit line or curve, you must calculate the best-fit polynomial using NumPy functions.
import numpy as np
z = np.polyfit(stories_fallen, average_injury_rate, 2) # third argument is the degree, e.g. 1=linear, 2=quadratic, etc.
p = np.poly1d(z)
The created function p() can be plotted as a function of the x variable:
ax.plot(stories_fallen, p(stories_fallen))
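The fitting and plotting steps can be combined into one runnable sketch. One detail here is an assumption that goes beyond the lesson text: instead of evaluating p() only at the seven observed x values, the sketch uses np.linspace() to generate 50 evenly spaced x values (the name x_smooth is illustrative) so that the quadratic curve appears smooth.

```python
import numpy as np
import matplotlib.pyplot as plt

average_injury_rate = [0.7, 1.0, 1.9, 2.0, 2.3, 2.4, 1.0]
stories_fallen = [2, 3, 4, 5, 6, 8, 11]

# Fit a quadratic (degree 2) polynomial and turn it into a function
z = np.polyfit(stories_fallen, average_injury_rate, 2)
p = np.poly1d(z)

# Assumption: 50 evenly spaced x values to draw a smooth curve
x_smooth = np.linspace(min(stories_fallen), max(stories_fallen), 50)

fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(1, 1, 1)
ax.scatter(stories_fallen, average_injury_rate, color='r')  # the data
ax.plot(x_smooth, p(x_smooth))                              # the fitted curve
```

A degree 2 fit returns three coefficients in z, ordered from the highest power of x to the constant term.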
Instructions: The questions for the practice assignment are in this Jupyter notebook at GitHub. Download it to your local drive within the GitHub repository you created for this course.
When you’ve finished the assignment, be sure to save a final time. Do not clear the output so that viewers can see what the output was when you submit the assignment. Commit the changes to the repository and push the changes to the remote repository on GitHub.
Go to the web page for the repository on GitHub and locate the notebook. Submit the link for the notebook web page to Brightspace as instructed.
The following topics introduce some additional concepts that are useful to know, but not critical for this lesson.
Another condition we can test for is whether a particular item is included in an iterable object using the in keyword. Consider the following code:
test = int(input())
if test in range(0,5):
    print(True)
else:
    print(False)
If the number entered is 0, 1, 2, 3, or 4, True will be printed. Here’s another example:
animal = input()
if animal in ['dog', 'cat', 'snake', 'bird']:
    print(True)
else:
    print(False)
If the input animal name is one of those in the list, True will be printed.
This approach provides an alternative way to make sure that a dictionary key exists before trying to retrieve its value. There are several methods associated with dictionaries: .keys(), .values(), and .items(). Each of these methods produces an iterable data structure containing parts of the dictionary. For example, catalog.keys() will enumerate all of the keys in the dictionary and catalog.values() will enumerate all of the values. Here’s how we can use that information to avoid a “KeyError” error:
catalog = {'1008':'widget', '2149':'flange', '19x5':'smoke shifter', '992':'poiuyt'}
print(catalog.keys())
part_number = input("What part do you want to order? ")
if part_number in catalog.keys():
    print('You have ordered a ' + catalog[part_number])
else:
    print("Sorry, that part isn't available.")
print("It's been a pleasure doing business with you!")
The structure of this code is similar to what we used in the try...except... example, except that instead of trying to retrieve the value of the input key and handling the exception (i.e. error) if it fails, we check whether the key exists using an if...in... statement and handle the False condition with the else: code block.
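As a side note that goes slightly beyond the lesson text, the in keyword can also be applied to the dictionary itself, in which case Python checks the keys. The short sketch below compares the different membership tests on the same catalog.

```python
catalog = {'1008': 'widget', '2149': 'flange', '19x5': 'smoke shifter', '992': 'poiuyt'}

# Testing membership in the dictionary itself checks its keys
print('1008' in catalog)
print('1008' in catalog.keys())

# Values are not checked unless we ask for them explicitly
print('widget' in catalog)
print('widget' in catalog.values())
```

The first two tests give the same answer, so the explicit .keys() call is a matter of readability rather than necessity.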
Tuples are a sequential data structure similar to lists. The items in a tuple can be referenced by indices, just like lists, but those items cannot be changed once the tuple is created. Literal tuples can be created by placing a list of objects in parentheses, like this:
animals_tuple = ('dog', 'cat', 'snake', 'bird')
print(animals_tuple[0])
print(animals_tuple[3])
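To illustrate the statement that tuple items cannot be changed, the sketch below tries to assign a new value to an item in a list and in a tuple. Python raises a TypeError for the tuple, which we trap with try and except. The names animals_list and changed are illustrative.

```python
animals_tuple = ('dog', 'cat', 'snake', 'bird')
animals_list = ['dog', 'cat', 'snake', 'bird']

# Items in a list can be replaced
animals_list[0] = 'wolf'

# Trying the same thing with a tuple raises a TypeError
try:
    animals_tuple[0] = 'wolf'
    changed = True
except TypeError:
    changed = False
    print('Tuples cannot be changed after they are created.')

print(animals_list)
print(animals_tuple)
```

The tuple still contains 'dog' as its first item after the failed assignment, while the list now starts with 'wolf'.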
The .items() dictionary method generates an iterable object consisting of a sequence of tuples where each tuple is a pair consisting of a key and its corresponding value. We can use this code to explore how the .items() dictionary method works:
catalog = {'1008':'widget', '2149':'flange', '19x5':'smoke shifter', '992':'poiuyt'}
print(catalog.items())
for item in catalog.items():
    print(item, type(item))
We can unpack a tuple by assigning its parts to a sequence of variable names. The number of variable names must be the same as the number of items in the tuple. Here’s an example:
number, string = (1, 'xyz')
print(number, string)
If we iterate through the item tuples that result from the .items() method, we can unpack them into separate number and part variables. Those variables can be used to print out the catalog to help users make a choice.
catalog = {'1008':'widget', '2149':'flange', '19x5':'smoke shifter', '992':'poiuyt'}
print('number', 'part')
print('-----------')
# Assign each key and value from the dictionary item to its own variable
for number, part in catalog.items():
    print(number, part)
print()
part_number = input("Enter the number of the part you want to order: ")
print()
if part_number in catalog.keys():
    print('You have ordered a ' + catalog[part_number])
else:
    print("Sorry, that part isn't available.")
print("It's been a pleasure doing business with you!")
This is the end of the EES 2580 Beginning Python module.
If you want to continue with lessons on vectorized programming with pandas, you can begin the lesson on NumPy arrays.
Continue to the intermediate series on files and tables
Revised 2024-03-06
License: CC BY 4.0.
Credit: "Vanderbilt Libraries Digital Lab - www.library.vanderbilt.edu"