Data Extraction

In the last notebook we prepared an image so that it could be read by code taken from a 2012 blog post by Michael Hamilton. The code from that post has been modified in two important ways:

  • It was updated for Python 3.
  • The image color mode was changed to 8-bit grayscale.

The modified code has been contributed to the PyPI repository as the module "punchcards", which means that it can be installed with Python packaging tools such as "pip". In this notebook we are going to run a local copy, punchcard.py, which makes this set of notebooks more portable. We'll run the code on an image that was prepared according to the steps in the Image Preparation notebook.

In [2]:
from punchcard import PunchCard
from PIL import Image
image = Image.open('prepared_image.png')
# Use 127 (neutral gray) as the brightness threshold between hole and card
card = PunchCard(image, bright=127)
print('Punchcard Text: {}'.format(card.text))
print()
card.dump('my card')
Punchcard Text: MACON FORT                       4628      NORTH CAROLINA                       

 Card Dump of Image file: my card Format Dump threshold= 127
 123456789-123456789-123456789-123456789-123456789-123456789-123456789-123456789-
 ________________________________________________________________________________ 
/MACON FORT                       4628      NORTH CAROLINA                       |
|.OO...O........................................O.OO...O.O.......................|
|O..OO..OO..................................OOO.....OOO.O........................|
|.........O....................................O.................................|
|.O................................................O.....O.......................|
|...................................O............................................|
|..O......O....................................O..O...O..........................|
|O................................O..............................................|
|....O......................................O...........O........................|
|...O..OO..........................O.........O.......O...........................|
|................................................................................|
|....................................O..........O................................|
|........O....................................O.....O..O.........................|
`--------------------------------------------------------------------------------'
 123456789-123456789-123456789-123456789-123456789-123456789-123456789-123456789-

What Happened?

A detailed walkthrough of the PunchCard class would be an excellent exercise, but it would require a separate series of notebooks. In broad strokes, the PunchCard class takes the following steps:

  1. Finds the edges of the card.
  2. Calculates the positions of the holes.
  3. Measures the brightness at each hole position.
  4. Interprets the holes as characters by looking up each column's punch pattern in a template.
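
Steps 3 and 4 can be illustrated with a small sketch that is independent of the image code. It is not the implementation in punchcard.py. The names ROWS, build_template and decode are hypothetical, the template covers only letters, digits and the space, and it assumes that a hole reads brighter than the threshold. The sample grid is copied from the first ten columns of the card dump above, with "O" treated as a bright hole and "." as dark card stock.

```python
# Row labels of an 80-column card from top to bottom, matching the dump above.
ROWS = ["12", "11", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]


def build_template():
    """Map a set of punched row labels to a character (letters, digits, space)."""
    template = {frozenset(): " "}  # no punches means a blank column
    for digit in "0123456789":
        template[frozenset([digit])] = digit  # digits use a single punch
    for i, ch in enumerate("ABCDEFGHI", start=1):
        template[frozenset(["12", str(i)])] = ch  # row 12 plus rows 1-9
    for i, ch in enumerate("JKLMNOPQR", start=1):
        template[frozenset(["11", str(i)])] = ch  # row 11 plus rows 1-9
    for i, ch in enumerate("STUVWXYZ", start=2):
        template[frozenset(["0", str(i)])] = ch   # row 0 plus rows 2-9
    return template


def decode(brightness, threshold=127):
    """Decode a 12-row grid of 0-255 brightness samples, one per hole position.

    Assumption: a sample brighter than the threshold is a hole.
    Unknown punch patterns are reported as '?'.
    """
    template = build_template()
    chars = []
    for col in range(len(brightness[0])):
        punched = frozenset(
            ROWS[row] for row in range(len(ROWS)) if brightness[row][col] > threshold
        )
        chars.append(template.get(punched, "?"))
    return "".join(chars)


# First ten columns of the dump above, one string per card row.
dump = [
    ".OO...O...",  # row 12
    "O..OO..OO.",  # row 11
    ".........O",  # row 0
    ".O........",  # row 1
    "..........",  # row 2
    "..O......O",  # row 3
    "O.........",  # row 4
    "....O.....",  # row 5
    "...O..OO..",  # row 6
    "..........",  # row 7
    "..........",  # row 8
    "........O.",  # row 9
]
grid = [[255 if c == "O" else 0 for c in row] for row in dump]
print(decode(grid))  # prints: MACON FORT
```

Reading a column of the dump by hand works the same way: column 1 has punches in rows 11 and 4, which the template maps to "M".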

You can inspect the detailed code by opening the "punchcard.py" file.

My Notes

(for student notes)

Return: Main Notebook
