by JB, NL, AM
Our team of three completed a data modeling project exploring the 1933 portion of Volume 1 of the Henry Morgenthau Press Conference transcripts, housed at the Franklin D. Roosevelt Presidential Library and Museum. The transcripts are part of the Morgenthau Project, which aims to increase scholarly study of the original microfilm through digitization. Our team’s goal was to present our portion of the transcripts in a form that can be better used in research. This portion covers the two months in which Morgenthau was Acting Secretary of the Treasury, before his official appointment to the position. In these transcripts, reporters largely questioned how much power Morgenthau held as Acting Secretary.
Before November 15, 1933, Morgenthau served as governor of the Federal Farm Board, which shaped the topics he discussed most during his later time as Acting Secretary of the Treasury. With that in mind, we searched for the topics most commonly discussed with reporters during the last two months of 1933 and found that he often spoke about the relationship between his work with the Federal Farm Board and the U.S. Treasury during Roosevelt’s presidency. Using a cleaned CSV of the transcripts, we were able to visualize how often Morgenthau spoke on those main topics.
During Morgenthau’s time as Acting Secretary of the Treasury in 1933, the most common topics of discussion were “gold”, “silver”, “deposit insurance”, and “Farm Credit Administration”. We explored how these press conference topics were tied to Morgenthau’s two positions in 1933 and how they affected his later decisions as Secretary of the Treasury. The key content patterns we found show his progression between the two positions and how his relationship with FDR and his personal background influenced his decisions during FDR’s presidency.
To determine the content and scope of the transcripts, we first ran OCR (optical character recognition) on the original PDF scans to produce a file that could be cleaned and analyzed in OpenRefine. Because the PDF scans are not optimized for OCR, the converted output required extensive manual correction before it could be analyzed. This process is detailed below.
The original PDF was first processed with OCR to extract as many recognizable words as possible, reducing the time spent transcribing the text by hand. The team tried two different conversion methods to determine which OCR workflow was best suited to this task.
In the first conversion (by AM), the PDF was uploaded to http://docdrop.org/ocr and its “Redo OCR” function was applied before downloading the final PDF. This PDF was then uploaded to Convertio.co to create a CSV file that could be imported into OpenRefine. The result was a heavily garbled CSV file that had to be cleaned manually, keeping only the desired portion of the PDF scans, before it could be analyzed. The cleaned CSV enabled us to visualize how often Morgenthau spoke on the main topics.
The second conversion (by NL) also aimed to produce a CSV or TXT file for OpenRefine, but used JPEG images in Google Docs as an intermediate step; the PDF pages were converted to JPEGs using a free trial of a website called Open Conversion. The JPEGs were then moved into Google Drive and each was converted manually into a Google Doc, after which their contents were merged manually into one document, a long and laborious step. The automated OCR in Google Docs was used to convert the single merged transcript into a text file, which was uploaded to OpenRefine for cleaning and analysis. This method produced a more accurate file for keyword isolation and allowed us to select only the pages we wanted to analyze.
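The manual merging step in the second conversion could, in principle, be scripted once each page's OCR text exists as its own plain-text file. The sketch below is not the workflow the team used; it assumes hypothetical per-page files named with zero-padded page numbers (`page_001.txt`, `page_002.txt`, ...) in a single folder, so that sorting by filename also sorts by page order.

```python
from pathlib import Path


def merge_pages(page_dir, output_file, pattern="page_*.txt"):
    """Concatenate per-page OCR text files into one transcript file.

    Assumes hypothetical zero-padded filenames (page_001.txt, ...) so that
    alphabetical order matches page order. Returns the number of pages merged.
    """
    pages = sorted(Path(page_dir).glob(pattern))
    parts = []
    for page in pages:
        text = page.read_text(encoding="utf-8").strip()
        # Keep a page marker so OCR errors can be traced back to the original scan.
        parts.append(f"--- {page.stem} ---\n{text}")
    Path(output_file).write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return len(pages)
```

Keeping a marker for each page trades a little extra cleanup in OpenRefine for the ability to check any suspicious word against the scan it came from.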
We used the main keywords found in our portion of the transcripts (Figure 1) to create a visualization (Figure 2) of the topics Morgenthau mentioned most often during these initial press conferences. Besides “Treasury,” the four keywords that appeared most often were “gold”, “silver”, “deposit insurance”, and “Farm Credit Administration”.
OpenRefine allowed us to organize the terms and see how many times each one was used. Viewing the data this way allowed us to determine which topics were most prevalent during Morgenthau’s time in office, which in turn led to a deeper understanding of the significance of the events addressed in his interviews. As shown in our data model (Figure 3), Morgenthau relied on a select few topics during his interviews with the press. In pursuing his goal of regulating key portions of the market through government intervention, Morgenthau helped pave the way for FDR’s New Deal strategies.
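The keyword counts were produced in OpenRefine; for readers who want to reproduce a similar tally outside that tool, the following is a minimal Python sketch, not the team's actual method. It uses the keywords named above, matches them case-insensitively on whole words (so “gold” does not match “golden”), and allows a line break between the words of a multiword phrase, since OCR output often splits phrases such as “deposit insurance” across lines.

```python
import re
from collections import Counter

# Keywords taken from the report; the matching rules below are an assumption.
KEYWORDS = ["Treasury", "gold", "silver", "deposit insurance", "Farm Credit Administration"]


def count_keywords(text, keywords=KEYWORDS):
    """Count case-insensitive, whole-word occurrences of each keyword phrase."""
    counts = Counter()
    for kw in keywords:
        # Any whitespace (including newlines) may separate the words of a phrase;
        # \b word boundaries prevent partial-word matches.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in kw.split()) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


sample = ("Q: Is the Treasury buying gold? A: The Treasury buys gold, "
          "not silver. Deposit\ninsurance comes later.")
print(count_keywords(sample).most_common())
```

On the sample text, “Treasury” and “gold” are each counted twice, “silver” and “deposit insurance” once, and “Farm Credit Administration” zero times.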
Our largest ethical concern was validity and accuracy, since both human and computer error can enter the dataset during conversion. Although OCR reduces the amount of manual data entry, the team still had to review the program’s output and make corrections where necessary. We spent time cleaning the data before building the visual data model to ensure that the results were accurate.
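Part of this cleaning can be automated before the human review. The sketch below is an illustrative assumption rather than the team's process: it normalizes line endings and spacing, rejoins words that OCR split with a hyphen at a line break (for example “Treas-” / “ury”), and collapses runs of blank lines. Rejoining hyphens can occasionally merge a genuine hyphenated compound, which is one reason manual review remains necessary.

```python
import re


def clean_ocr_text(raw):
    """Apply simple, mechanical fixes to raw OCR text before manual review."""
    text = raw.replace("\r\n", "\n")
    # Collapse runs of spaces and tabs within lines.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each line so trailing spaces do not hide a line-end hyphen.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Rejoin words hyphenated across a line break: "Treas-\nury" -> "Treasury".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Reduce three or more consecutive newlines to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


print(clean_ocr_text("The  Treas- \nury   will\n\n\n\nbuy gold.  "))
```

This prints “The Treasury will”, a blank line, and “buy gold.”.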
The second focus of our work was to create an end result that is usable for research. This meant that the end product had to accurately reflect the contents of the transcripts, present a data model showing the key topics of Morgenthau’s press conferences, and tie the content of the documents to the broader historical context. The documents in the Morgenthau Project remain relevant today because they show Morgenthau’s part in Roosevelt’s New Deal and its resulting economic impact. They may also be useful for other research, as Morgenthau was the only Jewish member of Roosevelt’s cabinet and was a close friend of Roosevelt during his time as President. Many of Morgenthau’s actions as Secretary of the Treasury were influenced by strong public opinion regarding his cultural and religious heritage.
The team found that Morgenthau drew on his experience with the Federal Farm Board and his own personal beliefs to inform his decisions as Secretary of the Treasury. He continued the work begun by his predecessor, William H. Woodin, allowing the Treasury to regulate the price of gold and the value of War Bonds. The War Bond System became Morgenthau’s prized project in later years. His first few months as Acting Secretary, during Woodin’s declining health, were the springboard for Morgenthau’s influence on Roosevelt’s New Deal. Despite his disagreements with FDR’s economic plan, Morgenthau was extremely influential on the future of the American economy. (MF)