The 1911 Charlotte City Directory is available on the Internet Archive. To recreate this exercise from scratch, you can download a PDF version of the directory and run it though an optical character recognition (OCR) program such as ABBYY FineReader to start working with the rough data. Or you can start where I did, with this CharlotteStarterData csv file. This file has been manipulated so the data is partially cleaned up from the original OCR output. You can upload all 15,711 rows into OpenRefine, where it will look like this:
I used GREL (General Refine Expression Language) expressions to query the data and identify the records for places of worship. Searching for the term "church" is a good start, but it doesn't capture all of the relevant listings. Pages 22-23 in the 1911 Charlotte City Directory include a list of churches that I used as a starting point to search for keywords, including "chapel," "tabernacle," "salvation army," and "brotherhood." Below is one of the expressions I used to indicate that every entry in the "Orig-copy" column with the word "chapel" in it should be marked as a church.
After using the above expression for each of the keywords, I had my universe of 79 relevant records. I also added a new column named "Church" to make it easier to find these records.
Note: While I queried keywords for other types of organizations, including synagogues and mosques, the data only included information about churches.
With a dataset this big and this messy, there were bound to be some mistakes. After going through my 79 records, I realized that my queries had also captured personal directory listings for people associated with churches, such as pastors. (The image above shows records for both churches and church personnel.) I removed those entries from the "Church" column by removing any records with "Rev" in the "Title" column, which indicates a personal title.
While I tried to use bulk transformations as much as possible, there were a few instances where I had to edit records individually to get the entries as clean as I wanted them. After doing so, I ended up with 71 church records.
I used more GREL expressions to identify and note the denomination of each church; and to clean up the "Company," "Company Details," and "Address" columns to make sure every piece of information was in the appropriate place. Below is one of the expressions I used on the new "Church denomination" column to identify Methodist churches. If I had used this transformation on the full dataset, I surely would have transformed records I didn't mean to transform. But, because I used a facet to limit the data I was working on to the church-related records, I transformed only the records I intended to transform.
In total, it took over 40 transformations to clean up the data. You can see all of the transformations and recreate this result by applying this this JSON file to the CharlotteStarterData file in OpenRefine. The final dataset, CharlotteClean, is also available as a csv file.
A snapshot of a portion of the records is below, including the newly created columns and transformed columns.
Next: Tableau