Data Overview

The sample images for these notebooks, stored in the cards folder, come from the National Archives and Records Administration of the United States (NARA).

Each card follows the standard IBM 80-column layout (the format punched by keypunches such as the IBM 029), with 80 columns of punch locations that each represent one character of text. In practice, many programs used only the first 72 columns for text and reserved the last eight columns (73-80) for an order (sequence) number. That way, if a deck of cards was spilled on the floor, a card sorter could put the cards back into the proper order. The cards in this NARA dataset do not include an order number of that form.
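Card sorters restored a deck by sorting on one column per pass, starting with the least-significant column. That procedure is a least-significant-digit (LSD) radix sort. The sketch below reproduces it in software, assuming each card is represented as a string of already-decoded text and the sequence number sits in columns 73-80 (a common convention; the NARA cards here carry no such field). The helper radix_sort_deck and the sample deck are hypothetical illustrations.

```python
def radix_sort_deck(deck, first_col=73, last_col=80):
    """Sort card strings by the digits in columns first_col..last_col (1-indexed, inclusive).

    Mimics a card sorter: one stable pass per column, least-significant column first
    (an LSD radix sort). Blank or non-digit columns are treated as 0 (an assumption).
    """
    for col in range(last_col, first_col - 1, -1):
        pockets = [[] for _ in range(10)]  # one output pocket per digit 0-9
        for card in deck:
            ch = card.ljust(80)[col - 1]  # pad short lines so every column exists
            pockets[int(ch) if ch in "0123456789" else 0].append(card)
        deck = [card for pocket in pockets for card in pocket]  # restack pockets 0..9
    return deck


# Hypothetical spilled deck: program text in columns 1-72, sequence number in columns 73-80.
spilled = [
    "      X = 2".ljust(72) + "00000020",
    "      Y = 1".ljust(72) + "00000010",
    "      Z = X + Y".ljust(72) + "00000030",
]
restored = radix_sort_deck(spilled)
for card in restored:
    print(card[:72].rstrip(), "|", card[72:])
```

Because each pass is stable, cards that agree on the current column keep the order established by earlier (less significant) passes, which is what makes the column-by-column approach correct.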

Example of a punchcard front

NARA produced these images as an experiment in processing punchcards algorithmically as images. Most of the cards were scanned on their reverse side to exclude the printed ink guidelines and other marks, leaving only the punched holes on an otherwise blank card.

Example of the back of a punchcard
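Scanning the blank side means a simple brightness threshold can, in principle, separate holes from card stock. A minimal sketch using Pillow, assuming holes appear darker than the card (this depends on what was behind the card during scanning, so check a real image first and flip the comparison if needed). The helper hole_mask, the threshold of 128, and the synthetic card are hypothetical.

```python
from PIL import Image, ImageDraw


def hole_mask(card, threshold=128):
    """Mark pixels darker than `threshold` as 255 (candidate holes) and all others as 0."""
    gray = card.convert("L")  # 8-bit grayscale
    return gray.point(lambda v: 255 if v < threshold else 0)


# Synthetic stand-in for a back-side scan: a light blank card with one dark rectangular hole.
card = Image.new("L", (200, 100), 230)
ImageDraw.Draw(card).rectangle((40, 20, 45, 30), fill=20)

mask = hole_mask(card)
print(mask.getbbox())  # bounding box of all pixels marked as holes
```

On a real scan you would call hole_mask(Image.open("cards/C04D01L-1.png")) and inspect the result before relying on the threshold.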

Loading the Images in Python

We are going to load and manipulate these images in Python, specifically with the Python Imaging Library (PIL). We actually use Pillow, a maintained fork of PIL, but you won't notice the difference except in our requirements.txt file: the code still imports from the PIL package. The first step in using our images is to load a file into an Image object with the Image.open() function from PIL's Image module.

Exercise: Loading Images

Look at the code block below and then execute it to load and display an image. Now adjust the code to display a different punchcard image.

In [1]:
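# Loading the Data
from IPython.display import display  # built into Jupyter; importing it also works outside a notebook
from PIL import Image

# Read the image file into an Image object:
image = Image.open("cards/C04D01L-1.png")
display(image)  # Use the IPython display function to show the image below.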
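To try several cards in turn, it can help to open the whole folder at once. A minimal sketch, assuming the images are PNG files directly inside the cards folder; load_cards is a hypothetical helper, not part of Pillow, and the demo uses blank stand-in images rather than the NARA scans.

```python
import tempfile
from pathlib import Path

from PIL import Image


def load_cards(folder="cards", pattern="*.png"):
    """Open every image in `folder` matching `pattern`, keyed by file name in sorted order."""
    cards = {}
    for path in sorted(Path(folder).glob(pattern)):
        image = Image.open(path)
        image.load()  # read the pixel data now instead of lazily on first use
        cards[path.name] = image
    return cards


# Demo on two blank stand-in images in a temporary folder (not the real NARA scans).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.png", "a.png"):
        Image.new("L", (80, 40), 255).save(Path(tmp) / name)
    demo = load_cards(tmp)

print(list(demo))  # → ['a.png', 'b.png']
```

In the notebook, cards = load_cards() followed by display(cards["C04D01L-1.png"]) should show the same card as the cell above, provided the notebook runs from the directory that contains the cards folder.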

PIL Image Object Attributes

An Image object can do more than display the image on the screen. Its attributes also describe the image, such as its size in pixels and any format-specific metadata stored in the file. See the full list of Image attributes in the Pillow documentation.

In [2]:
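# Here we print out some attributes of the image object:
print(image.size)  # (width, height) in pixels
print(image.info)  # dictionary of format-specific metadata read from the file
(1152, 544)
{'gamma': 0.45455, 'dpi': (150, 150)}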
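A few other attributes are often useful when preparing images. The sketch below gathers them into one dictionary; describe is a hypothetical helper, and the stand-in card is a blank PNG written to memory with a 150 dpi setting, not one of the NARA files. Depending on the Pillow version, the dpi read back from a PNG may be a float close to 150 rather than exactly 150.

```python
import io

from PIL import Image


def describe(image):
    """Collect a few commonly used Image attributes into a dict."""
    return {
        "format": image.format,  # e.g. "PNG" for images read from a file; None for images made in memory
        "mode": image.mode,  # pixel format, e.g. "L" (8-bit grayscale) or "RGB"
        "size": image.size,  # (width, height) in pixels
        "dpi": image.info.get("dpi"),  # present only if the file stored resolution metadata
    }


# Blank stand-in card saved to an in-memory PNG, then reopened like a file on disk.
buf = io.BytesIO()
Image.new("L", (1152, 544), 255).save(buf, format="PNG", dpi=(150, 150))
buf.seek(0)
info = describe(Image.open(buf))
print(info["format"], info["mode"], info["size"])  # → PNG L (1152, 544)
```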

My Dataset Notes

(This area provided for students to record their notes on the dataset.)

Next: Image Preparation
