Digital Education Resources - Vanderbilt Libraries Digital Lab

CodeGraf landing page

Complex data structures

Reading and writing CSV files

The csv module provides objects that can be used to simplify the process of reading from and writing to CSV files. In this lesson, we will see how we can use these tools to create a list of lists or list of dictionaries from a file locally or from an Internet URL.

Learning objectives At the end of this lesson, the learner will be able to:

Total video time: 33 m 46 s

Lesson slides

Lesson Jupyter notebook at GitHub

Lesson Colab notebook


CSV files

About CSV files (7m29s)

An extremely common way to store tabular data is in fielded text files, commonly called “CSV” (comma separated values) files. “Fielded text” is probably a better term, because the fields in the text aren’t always delimited by commas. It is also fairly common for fields to be separated by tabs or some other characters. But it’s still common to call them “CSV” files regardless of the delimiter.

A CSV file is simply a text file where the rows are on separate lines terminated by newlines, and the fields (i.e. columns) within the row are separated by commas. Here’s an example:

given_name,family_name,username,student_id
Jimmy,Zhang,rastaman27,37258
Ji,Kim,kimji8,44947
Veronica,Fuentes,shakira<3,19846

If you save this text with a text editor, then open the file with a spreadsheet program like Excel, Open Office, or Libre Office, it will look something like this:

It’s more common to create the CSV file directly in the spreadsheet program, then use Save As… to save it in the CSV format.

Note: Although it is convenient to use Excel to work with CSV files since most people have Excel on their computers, if you are going to do serious work with CSV files using Excel is NOT recommended. The reason for this is that Excel automatically converts some strings that include dashes (typically ID numbers or other codes) into dates when it imports them. Because CSV files do not contain any metadata describing the format of the files, there is no known way to prevent this problem. It is a frustrating and insidious problem that can ruin a large dataset in ways that are difficult to repair. So it is better to install and use Libre Office (open source) or Open Office and use them instead.


Generating a CSV file from text (1m52s)

When saving a newly-created CSV file, you should use Save As… In Libre or Open Office, there is a checkbox in the dialog where you have the option to Edit Filter Settings.

The subsequent dialog box allows you to select a field delimiter, which will usually default to comma:

There is also a dropdown where you can select UTF-8 as the file encoding:

If your file contains only ASCII characters (Latin alphabet, numeric, and typical symbol characters), the Edit Filter Settings isn’t that critical, but if your text contains any non-Latin language characters, unusual symbols, or letters with diacritics, it’s critical to make sure that the file is saved as UTF-8.

Libre or Open Office tend to assume that you want to use the last delimiter and character encoding that you used previously, so once you have saved a CSV file in this manner, it’s usually safe to open and close it by just saving without going through the Save As... dialog.


CSV reader/writer module

Using the csv.reader() object (2m44s)

Example code:

import csv

with open('students.csv', 'r', newline='', encoding='utf-8') as file_object:
    # The reader() object is instantiated and assigned to a variable.
    reader_object = csv.reader(file_object)
    print(type(reader_object))
    print()
    
    # The reader object is iterable
    for row in reader_object:
        # each iterated item is a Python list
        print(type(row))
        print(row)

Function to read from a CSV into a list of lists (3m44s)

Reusable function to read a CSV file as a list of lists:

import csv

path = '' # uncomment this line for a local installation
#path = '/content/drive/My Drive/' # uncomment this line if using Colab

# The function takes the file path as an argument and returns a list of lists
def read_csv_to_list_of_lists(file_path):
    with open(file_path, 'r', newline='', encoding='utf-8') as file_object:
        reader_object = csv.reader(file_object)
        list_of_lists = []
        for row_list in reader_object:
            list_of_lists.append(row_list)
    return list_of_lists

# Main script
student_info = read_csv_to_list_of_lists(path + 'students.csv')
print(student_info)

Writing a list of lists to a CSV (2m52s)

Example code:

import csv

path = '' # uncomment this line for a local installation
#path = '/content/drive/My Drive/' # uncomment this line if using Colab

data = [ ['name', 'company', 'nemesis'], ['Mickey Mouse', 'Disney', 'Donald Duck'], ['Road Runner', 'Warner Brothers', 'Wile Ethelbert Coyote'] ]
with open(path + 'cartoons.csv', 'w', newline='', encoding='utf-8') as file_object:
    writer_object = csv.writer(file_object)
    for row in data:
        print(row)
        writer_object.writerow(row)

Function to read from a CSV into a list of dictionaries (8m07s)

Example code:

import csv

path = '' # uncomment this line for a local installation
#path = '/content/drive/My Drive/' # uncomment this line if using Colab

with open('cartoons.csv', 'r', newline='', encoding='utf-8') as file_object:
    # The DictReader() object is instantiated and assigned to a variable.
    reader_object = csv.DictReader(file_object)
    # The iterable items in a DictReader object are a special kind of dictionary (OrderedDict).
    # But we can use them like regular dictionary if we ignore that they are ordered.
    print(type(reader_object))
    print()
    
    # If we want to reuse the row dictionaries, we can add them to a list.
    cartoon_table = []
    for row_list in reader_object:
        print(type(row_list))
        print(row_list)
        cartoon_table.append(row_list)

print()
# We refer to items in the row lists by their keys, just as we do for normal dictionaries.
# Because each row has its own dictionary, we must specify the row in the first square brackets.
print(cartoon_table[1]['name'] + ' works for ' + cartoon_table[1]['company'] + '. Its enemy is ' + cartoon_table[1]['nemesis'])
print()
for character in cartoon_table:
    print('Character name:', character['name'], ' company:', character['company'], ' nemesis:', character['nemesis'])

Example code including a reusable function for reading from a CSV file into a list of dictionaries. In the following example, the code to read from the CSV file to a list of dictionaries is placed in a function. The file path is passed in as the only argument of the function. The function returns a single object, a list of dictionaries containing the data from the CSV file. The keys for the dictionaries are taken from the header row of the CSV.

The main script is a modification of the earlier script that looks up cartoon characters. By using a file rather than hard-coding the characters data, it’s easier to include a lot more information and to change it by updating the CSV file as a spreadsheet.

import csv

path = '' # uncomment this line for a local installation
#path = '/content/drive/My Drive/' # uncomment this line if using Colab

# The function takes the file path as an argument and returns a list of lists
def read_csv_to_list_of_dicts(filename):
    with open(filename, 'r', newline='', encoding='utf-8') as file_object:
        dict_object = csv.DictReader(file_object)
        list_of_dicts = []
        for row_dict in dict_object:
            list_of_dicts.append(row_dict)
    return list_of_dicts

# Main script
cartoons = read_csv_to_list_of_dicts(path + 'cartoons.csv')
name = input("What's the character? ")
found = False
for character in cartoons:
    if name.lower() in character['name'].lower():
        if character['nemesis'] == '':
            print("I don't know the nemesis of " + character['name'])
        else:
            print(character['name'] + " doesn't like " + character['nemesis'])
        found = True
if not found:
    print("Sorry, I don't know that character.")

You can download a CSV file with around 4000 cartoon characters from here. Right click on the Raw button and select Save file as.... Save the file in the same directory as your Jupyter notebook if you are using a local installation, or in the root of your Google Drive if using Colab. Note: if your browser changes the file extension to .txt, you may need to change the format from text to All Files, then manually change the extension in the dialog from .txt to .csv.


Writing a list of dictionaries to a CSV file (2m08s)

Reusable function for writing a list of dictionaries to a CSV file (specifying the header names):

import csv

def write_dicts_to_csv_fieldnames(list_of_dicts, file_path, field_names):
    with open(file_path, 'w', newline='', encoding='utf-8') as csv_file_object:
        writer = csv.DictWriter(csv_file_object, fieldnames=field_names)
        writer.writeheader()
        for row_dict in list_of_dicts:
            writer.writerow(row_dict)

field_names = ['name', 'company', 'nemesis']
data = [ {'name': 'Mickey Mouse', 'company': 'Disney', 'nemesis': 'Donald Duck'}, {'name': 'Road Runner', 'company': 'Warner Brothers', 'nemesis': 'Wile Ethelbert Coyote'} ]
path = 'mini-cartoon-table.csv'
write_dicts_to_csv_fieldnames(data, path, field_names)

Reusable function for writing a list of dictionaries to a CSV file (header names taken from the first row):

import csv

def write_dicts_to_csv(list_of_dicts, file_path):
    field_names = list_of_dicts[0].keys()
    with open(file_path, 'w', newline='', encoding='utf-8') as csv_file_object:
        writer = csv.DictWriter(csv_file_object, fieldnames=field_names)
        writer.writeheader()
        for row_dict in list_of_dicts:
            writer.writerow(row_dict)

data = [ {'name': 'Mickey Mouse', 'company': 'Disney', 'nemesis': 'Donald Duck'}, {'name': 'Road Runner', 'company': 'Warner Brothers', 'nemesis': 'Wile Ethelbert Coyote'} ]
path = 'another-cartoon-table.csv'
write_dicts_to_csv(data, path)

Reading a CSV file from the Internet (4m50s)

Nashville schools data on GitHub.

Reusable function for reading a CSV on the Internet into a list of lists via a URL.

import requests
import csv

def url_csv_to_list_of_lists(url):
    r = requests.get(url)
    file_text = r.text.splitlines()
    file_rows = csv.reader(file_text)
    list_of_lists = []
    for row in file_rows:
        list_of_lists.append(row)
    return list_of_lists
        
# Main script
url = 'https://raw.githubusercontent.com/HeardLibrary/digital-scholarship/master/data/gis/wg/Metro_Nashville_Schools.csv'
schools_data = url_csv_to_list_of_lists(url)

# print the IDs and names of all of the schools
print(schools_data[0][2] + '\t' + schools_data[0][3])
for school in range(1, len(schools_data)):
    print(schools_data[school][2] + '\t' + schools_data[school][3])

Reusable function for reading a CSV on the Internet into a list of dictionaries via a URL.

import requests
import csv

def url_csv_to_list_of_dicts(url):
    r = requests.get(url)
    file_text = r.text.splitlines()
    file_rows = csv.DictReader(file_text)
    list_of_dicts = []
    for row in file_rows:
        list_of_dicts.append(row)
    return list_of_dicts

# Main script
url = 'https://raw.githubusercontent.com/HeardLibrary/digital-scholarship/master/data/gis/wg/Metro_Nashville_Schools.csv'
schools_data = url_csv_to_list_of_dicts(url)

# use the dictionary to look up a school ID
school_name = input("What's the name of the school? ")
found = False
for school in schools_data:
    if school_name.lower() in school['School Name'].lower():
        print('The ID number for', school['School Name'], 'is: ' + school['School ID'])
        found = True
if not found:
    print("I couldn't find that school.")

Practice exercises

1. In this lesson segment there is example code showing how to allow a user to enter an indefinite number of items and add each one to a list. Modify this script to allow the user to enter a catalog number, label, and price for each item in a catalog, pressing Enter/Return when finished. Add each characteristic of the item to a dictionary for that item (see this segment if you don’t remember how). Append each dictionary to a list. When the user is done entering the items, write the list of dictionaries to a CSV file. Since every dictionary has the same keys (catalog_number, label, and price), you can get the column headers from the first dictionary in the list rather than having to enter it explicitly. After running your script, open the CSV file to verify that it works.

2. Here is some code that generates a deck of cards and that deals cards to players:

import random

# returns a list of 52 card descriptions
def make_deck():
    suits = ['hearts', 'spades', 'clubs', 'diamonds']

    # generate the deck of cards
    deck = []
    for suit in suits:
        deck.append('A ' + suit)
        for num in range(2,11):
            deck.append(str(num) + ' ' + suit)
        deck.append('J ' + suit)
        deck.append('Q ' + suit)
        deck.append('K ' + suit)
    random.shuffle(deck)
    return deck

# returns a list of lists
def deal_cards(deck, n_players, n_cards):
    hands = []
    for player_number in range(n_players):
        hand = []
        for card_number in range(n_cards):
            hand.append(deck[0])
            del deck[0]
        hands.append(hand)
    return(hands)

Create a deck of cards, then deal poker hands (5 cards) to four players. The deal_cards() function returns a list of lists. Save the list of lists to a CSV file. Open the CSV file. How are the cards organized in the file? Is there a header row?

3. Modify the deal_cards() function above to return both the hands and the leftover cards in the deck. You can do this by putting two values in the return statement like this:

return(hands, deck)

When you call the function, you can assign the returned values to two variables like this:

hands, leftover_cards = deal_cards(deck, 4, 5)

Modify the script you made above to save both the dealt hands and the leftover deck. If you don’t remember how to save a list to a file, see this segment. Now write a second script that will read the leftover cards into a list and read the saved hands into a list of lists. Using nested for loops, show each player the cards that were in their saved hands.


Next lesson: pandas Series or

optional lesson on NumPy arrays


Revised 2021-01-31

Questions? Contact us

License: CC BY 4.0.
Credit: "Vanderbilt Libraries Digital Lab - www.library.vanderbilt.edu"