#!/usr/bin/env python
# coding: utf-8
# ## Next Steps
# ***
#
# ### Address the Archival Silence
# While a city directory may seem like a definitive listing of people and businesses, this exercise shows that certain people and businesses were left out or were not represented in the ways they would likely have represented themselves. What stories can we find in the _American Jewish Yearbook_ that aren't in city directories?
# __Step 1:__ Download the _1910 American Jewish Yearbook_ from the Internet Archive.
# * You can start with the [PDF version](https://ia600904.us.archive.org/12/items/americanjewishye5671adle/americanjewishye5671adle.pdf) and run it through an OCR application like ABBYY FineReader to extract the data yourself.
# * Or you can start with the Internet Archive's [txt file](https://ia800904.us.archive.org/12/items/americanjewishye5671adle/americanjewishye5671adle_djvu.txt), which has already gone through OCR.
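#
# If you would rather fetch the txt file from a script than from the browser, the following is a minimal sketch using only the Python standard library. The function names and the local filename are assumptions made for illustration; the URL is the txt link given above.

```python
import urllib.request
from pathlib import Path

# URL copied from the txt-file link above.
TXT_URL = (
    "https://ia800904.us.archive.org/12/items/"
    "americanjewishye5671adle/americanjewishye5671adle_djvu.txt"
)


def download_text(url=TXT_URL, dest="americanjewishyearbook.txt"):
    """Save the file at `url` to `dest` (a hypothetical local filename).

    If `dest` already exists, nothing is downloaded again.
    """
    path = Path(dest)
    if not path.exists():
        with urllib.request.urlopen(url) as response:
            path.write_bytes(response.read())
    return path


def nonblank_lines(path):
    """Return the stripped, non-empty lines of a text file."""
    # errors="replace" keeps going if the OCR text has stray bytes.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

# Calling `download_text()` and then `nonblank_lines(...)` on the returned path gives you a list of raw OCR lines to skim before loading the file into OpenRefine.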
#
# __Step 2:__ Load your dirty dataset into OpenRefine and start transforming.
# * Keep in mind the original book uses abbreviations and shorthand — don't mistake these for OCR errors.
#
# 
#
# * Instead, look through the PDF file for keys to help you translate the data into plain English.
#
# 
#
# * Remember to reuse GREL expressions (and regular expressions, if you'd like) that other people have already created to save time and avoid headaches.
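#
# If you prefer to prototype a transformation outside OpenRefine, the same idea can be sketched in Python with regular expressions. This is an alternative to GREL, not a GREL expression. The abbreviation table below is a hypothetical placeholder: replace its entries with the key you find in the PDF.

```python
import re

# Hypothetical abbreviation key for illustration only.
# Replace these entries with the key printed in the book itself.
ABBREVIATIONS = {
    "Pres.": "President",
    "Sec.": "Secretary",
    "Treas.": "Treasurer",
}


def expand_abbreviations(line, table=ABBREVIATIONS):
    """Replace each abbreviation in `line` with its plain-English form."""
    # Try longer abbreviations first so a short one cannot pre-empt them.
    alternatives = "|".join(
        re.escape(abbr) for abbr in sorted(table, key=len, reverse=True)
    )
    # Only match when the abbreviation is not glued to other word characters.
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda match: table[match.group(1)], line)
```

# Anything not listed in the table is left untouched, so genuine OCR errors stay visible for a separate cleaning pass.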
#
# __Step 3:__ Visualize your clean dataset in Tableau Public and share it.