Digital Education Resources - Vanderbilt Libraries Digital Lab

Previous lesson: Reading and writing CSV files

NumPy arrays

Vectorized computing is introduced in this lesson by working with NumPy arrays. NumPy arrays are a data structure that consists of a single type of data and can be one, two, or many-dimensioned. The lesson shows how to specify single elements and slices of the array and how to carry out several simple types of calculations. Image manipulations are used as a practical example to illustrate these principles.

Learning objectives At the end of this lesson, the learner will:

Total video time: 52m 45s

Lesson Jupyter notebook at GitHub

Lesson Colab notebook

Lesson slides

Recommended reference: Python for Data Analysis, 2nd Edition by Wes McKinney via Vanderbilt’s catalog / direct link to O’Reilly (VUNet login required)

Vectorized computing with NumPy arrays

Review of looping through a list (3m10s)

Code example:

price_list = [3.89, 14.78, 20.01, 99.62, 0.47]
euro_conversion_factor = 0.93

# Use a for loop to multiply each item by the conversion factor
euro_list = []
for price in price_list:
    euro_list.append(price * euro_conversion_factor)

print(euro_list)

Introduction to vectorized computing (7m34s)

Direct link to Chapter 4 of Python for Data Analysis, “NumPy Basics: Arrays and Vectorized Computation” (VUNet login required)

Code example:

import numpy as np
price_list = [3.89, 14.78, 20.01, 99.62, 0.47]
euro_conversion_factor = 0.93

euro_array = price_array * euro_conversion_factor
print(euro_array)

The procedural approach requires the operation to be applied to each item one at a time.

The vectorized approach automatically applies the operation to all elements of the array.

Operations on NumPy arrays are carried out 10 to 100 times faster than on “normal” Python lists.


NumPy ndarrays

Array dimensions (3m52s)

One dimension is like a list:

[ 3.89 14.78 20.01 99.62 0.47]

Two dimensions are like a table:

[[34506 35446 40190 43824 46456]
 [45369 46894 43901 44870 45978] 
 [21554 28745 34369 43593 53982]]

More than two dimensions are possible

Code example:

annual_sales_cars = [34506, 35446, 40190, 43824, 46456]
annual_sales_trucks = [45369, 46894, 43901, 44870, 45978]
annual_sales_suvs = [21554, 28745, 34369, 43593, 53982]

annual_sales_array = np.array([annual_sales_cars, annual_sales_trucks, annual_sales_suvs])
print(annual_sales_array)

print(annual_sales_array.ndim)
print(annual_sales_array.shape)
print(annual_sales_array.dtype)

Array indices and slices (4m46s)

Image from “Python for data analysis: data wrangling with pandas, NumPy, and IPython” by Wes McKinney, Chapter 4

Example:

print(annual_sales_array)
print(annual_sales_array[0, 2])
print(annual_sales_array[0])
print(annual_sales_array[:, 2])

Array-based calculations (5m47s)

Code examples:

# Convert annual sales to daily sales
daily_sales_array = annual_sales_array/365
print(daily_sales_array)

Diagram of the where() function

assessment = np.where(trucks >= 45000, 'good year', 'bad year')
print(assessment)

Images

Representing images as ndarrays (5m54s)

Code example:

import skimage.data as data
import matplotlib.pyplot as plt

camera = data.camera()
print(type(camera)
print(camera)
plt.imshow(camera, cmap='gray')
plt.show();

Manipulating images (8m15s)

Code examples:

# Negative
inverse_camera = 255 - camera
plt.imshow(inverse_camera, cmap='gray')
plt.show();

# Threshold (black and white)
threshold_camera = data.camera()

# Method 1: Set pixels less than mid-gray to black
threshold_camera[threshold_camera < 128] = 0
# Set remaining pixels to white
threshold_camera[threshold_camera > 0] = 255
plt.imshow(threshold_camera, cmap='gray')
plt.show();

# Method 2: Screen pixels using np.where()
threshold_camera = np.where(camera < 128, 0, 255)
plt.imshow(threshold_camera, cmap='gray')
plt.show();

Thanks to Sanjay Mishra for these examples of making simple manipulations to images using NumPy.


Practice

  1. Create a one-dimensional numpy array containing only the annual SUV sales.
  2. Create a one-dimensional array of annual car sales, then calculate the difference between car and SUV sales for each year.
  3. Calculate the total sales for cars, trucks, and SUVs for the second year (index 1) using the np.sum() function, which take an array as its argument.
  4. Generalize the function for lightening a photo by any number of grayscale units. Ask the user for input, then display the picture.

Practice live coding

Practice problem 1 (1m28s)


Practice problem 2 (2m28s)


Practice problem 3 (3m00s)


Practice problem 4 (7m01s)


Next lesson: pandas Series


Revised 2021-01-31

Questions? Contact us

License: CC BY 4.0.
Credit: "Vanderbilt Libraries Digital Lab - www.library.vanderbilt.edu"