by WK, MP, MS
In 1934, President Franklin Delano Roosevelt appointed Henry Morgenthau, Jr. as Secretary of the Treasury of the United States. During his tenure, which lasted from 1934 to 1945, Morgenthau made significant direct contributions to the nation’s economic welfare, particularly through aspects of the New Deal and U.S. efforts during World War II. Over those years, Morgenthau held hundreds of press conferences, which were transcribed and preserved on microfilm. The transcripts, totaling 15,000 pages, were compiled into bound volumes that have since been photographed and digitized online (Franklin Delano Roosevelt Presidential Library and Museum, n.d.). Of the twenty-seven volumes, this story focuses on Volumes 10 and 11, which together cover January through December 1938. Within these volumes, we focused on economic/currency-related and geographic (mostly country-specific) terms, both domestic and foreign, counting how often each was mentioned and comparing those counts between the two volumes and across the year as a whole.
Our analysis seeks to uncover patterns in how often these economic/currency-related and geographic terms were mentioned and how their frequencies differ between the two volumes. Using several tools to extract the text, we identified patterns within the volumes and built visualizations comparing the two. Our processing consisted of splitting the PDFs into smaller files, adjusting their contrast and brightness to improve optical character recognition (OCR), and converting the OCR output into CSV files. We then used OpenRefine’s text filtering to tally economic/currency and geographic terms and created visualizations showing how these terms relate to certain international developments of 1938. With a more comprehensive list of terms, future projects could analyze these two volumes in greater depth, as well as the rest of the series (MS).
Historical Context: In 1938, the major events connected to our analysis were the continuing effects of the Tripartite Agreement of 1936 on the price of gold (Faudot, 2022), the Second Sino-Japanese War (U.S. Embassy & Consulates in China, 2018), and the Mexican oil expropriation of 1938 (U.S. Department of State, n.d.) (WK). The U.S. economy was also emerging from a brief recession that lasted from mid-1937 to mid-1938 (Waiwood, n.d.), and the start of World War II in 1939 was not far off (Franklin Delano Roosevelt Presidential Library and Museum, n.d.) (MP).
Within the Morgenthau Press Conference series, Volumes 10 and 11 each consist of a little over 400 scanned pages. Our initial attempts to extract text from each volume yielded inconsistent results, and because of the volumes’ sheer size, we could not extract text from either volume as a whole. Because the digital surrogates are photographs of the original microfilm that were then scanned into a single PDF document, their poor contrast and image quality produced illegible text. To make extraction more consistent, we used Adobe Acrobat to split the volumes into eighteen 50-page PDF files. Although the smaller files yielded more text, much of it was still illegible (MS).
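The split itself was done in Adobe Acrobat. As a tool-agnostic sketch of the arithmetic behind it, the page ranges for 50-page chunks can be planned as below. The 420-page count is a hypothetical stand-in for “a little over 400”; at that length each volume yields nine chunks (the last one shorter than 50 pages), which is consistent with eighteen files for two volumes. Writing each range out to its own file would require a PDF library (for example, pypdf) and is not shown.

```python
def chunk_page_ranges(total_pages, chunk_size=50):
    """Return 1-based, inclusive (first, last) page ranges covering a document.

    The final range may be shorter than chunk_size when the page count
    is not an exact multiple of it.
    """
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    ranges = []
    for first in range(1, total_pages + 1, chunk_size):
        last = min(first + chunk_size - 1, total_pages)
        ranges.append((first, last))
    return ranges


# Hypothetical page count; the source only says "a little over 400" pages.
ranges = chunk_page_ranges(420)
print(len(ranges))  # 9
print(ranges[-1])   # (401, 420)
```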
We then used Adobe Photoshop to adjust the files’ contrast and brightness in order to obtain better OCR results. After this adjustment, we ran the files through DocDrop, an online tool that performs optical character recognition on images and PDFs. Once each PDF had been processed with OCR, we converted the files to CSV format with Convertio so they could be imported into OpenRefine to identify textual patterns throughout each volume (MS).
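Photoshop’s contrast and brightness controls are not necessarily simple linear operations, so the following is only an illustration of the general idea, assuming a basic linear model on 8-bit grayscale values. Raising contrast pushes faded gray text and light-gray background further apart, which gives an OCR engine a clearer boundary between ink and paper. The function names and sample pixel values are hypothetical.

```python
def adjust_pixel(value, contrast=1.0, brightness=0):
    """Linear contrast/brightness adjustment for one 8-bit grayscale value.

    contrast > 1 stretches values away from mid-gray (128);
    brightness shifts the result up or down. The result is clamped to 0-255.
    """
    adjusted = contrast * (value - 128) + 128 + brightness
    return max(0, min(255, round(adjusted)))


def adjust_image(pixels, contrast=1.0, brightness=0):
    """Apply adjust_pixel to a grayscale image stored as a list of rows."""
    return [[adjust_pixel(p, contrast, brightness) for p in row] for row in pixels]


# Hypothetical faded scan: gray text (100) on a light-gray background (170).
faded = [[170, 100, 170],
         [170, 100, 170]]
sharpened = adjust_image(faded, contrast=2.0, brightness=20)
print(sharpened)  # [[232, 92, 232], [232, 92, 232]]
```

In this toy case the gap between text and background grows from 70 to 140 gray levels; values pushed past the ends of the range are clamped, which is why aggressive settings can also erase faint strokes.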
Once the eighteen 50-page PDF files had been successfully loaded into OpenRefine, we could finally begin manipulating the data and analyzing the text of Volumes 10 and 11 (see Figure 1: OpenRefine Data Results). We began with a cursory review of the processed text, compiling a general list of words and terms that appeared repeatedly as a basis for the analysis. Many of the terms on this initial list were economic or geographic. After discussing whether to continue with a single list of terms or two, we ultimately split the list into two categories: economic/currency-related and geographic. We tallied how often each term in each category appeared in each volume using OpenRefine’s Text Filter function, which let us search the processed data for the terms we had identified through brainstorming. After double-checking the counts, we entered the tallies into Excel for further analysis and, later, visualization (MP).
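The tallying step can be approximated outside OpenRefine with Python’s standard csv module. This is a minimal sketch, not the project’s method: it assumes the OCR text sits in a single CSV column named `text` (a hypothetical name, since the layout of the Convertio output is not described here) and that the filter was used as a plain, case-insensitive substring match. Like a row filter, it counts matching rows rather than total occurrences, so a row mentioning “gold” twice counts once.

```python
import csv
import io


def count_matching_rows(rows, terms, column="text"):
    """Count rows whose `column` contains each term, case-insensitively.

    This mimics a plain (non-regex, case-insensitive) substring filter, so
    "gold" also matches "golden"; each row is counted at most once per term.
    """
    counts = {term: 0 for term in terms}
    for row in rows:
        cell = (row.get(column) or "").lower()
        for term in terms:
            if term.lower() in cell:
                counts[term] += 1
    return counts


# Hypothetical CSV excerpt, not actual press conference text.
sample_csv = """page,text
1,The price of gold was discussed.
2,Questions about Mexico and oil.
3,"Gold shipments from France, gold reserves."
4,No relevant terms on this line.
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(count_matching_rows(rows, ["gold", "Mexico", "France"]))
# {'gold': 2, 'Mexico': 1, 'France': 1}
```

Substring matching is simple but can overcount (for example, “gold” inside “golden”); a word-boundary regular expression would be stricter at the cost of missing OCR-mangled forms.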
To visualize the Excel data, we added a column to each table giving each term’s frequency across all of 1938, that is, the sum of its counts in Volumes 10 and 11. We then converted the two tables into Excel bar charts comparing the counts for each volume and for the year. Figure 2 (Geographic Term Occurrence) shows that the most frequently mentioned geographic term is “Mexico” in Volume 10, “China” in Volume 11, and “France” for 1938 as a whole. Figure 3 (Economic Term Occurrence) shows that “gold” is the most frequently mentioned economic term in both Volume 10 and Volume 11, and therefore in 1938 as a whole.
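The extra 1938 column is simply the per-term sum of the two volume counts. The sketch below uses placeholder term names and invented counts (not the project’s tallies) to show how a term can lead the combined year without leading either volume, which is the kind of pattern Figure 2 reports for “France”.

```python
def add_year_totals(table):
    """Add a 'total' entry (vol10 + vol11) to each term's counts, in place."""
    for counts in table.values():
        counts["total"] = counts["vol10"] + counts["vol11"]
    return table


def top_term(table, column):
    """Return the term with the highest count in the given column."""
    return max(table, key=lambda term: table[term][column])


# Hypothetical counts with placeholder names, NOT the project's actual data.
table = {
    "term_a": {"vol10": 30, "vol11": 5},
    "term_b": {"vol10": 25, "vol11": 25},
    "term_c": {"vol10": 5, "vol11": 30},
}
add_year_totals(table)
for column in ("vol10", "vol11", "total"):
    print(column, top_term(table, column))
# vol10 term_a
# vol11 term_c
# total term_b
```

Here term_b is never first within a single volume, but its steady presence in both gives it the highest yearly total (50, versus 35 for the other two).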
The figures and tables created from our data suggest links between the most frequently occurring terms and some of the major economic and international events of 1938. The frequent occurrence of the term “gold” can plausibly be linked to the continuing effects of the Tripartite Agreement of 1936, in which the U.S., France, and Great Britain, later joined by Belgium, Switzerland, and the Netherlands, sought to stabilize their national currencies (Faudot, 2022). The frequent occurrence of “Mexico” in Volume 10 can be linked to the Mexican oil expropriation of March 18, 1938, when “Mexican President Lázaro Cárdenas signed an order that expropriated the assets of nearly all of the foreign oil companies operating in Mexico” (U.S. Department of State, n.d.). The frequent occurrence of “China” in Volume 11 can be linked to the Second Sino-Japanese War, which began when “Japan launched an all-out offensive in China in the summer of 1937,” and to the aid the U.S. offered China at that time and throughout the war (U.S. Embassy & Consulates in China, 2018) (WK).
The volumes of Morgenthau’s press conferences hold a wealth of historical information from a tumultuous period in American and world history. A core commitment of the Society of American Archivists (SAA) is its dedication “to the selection, care, preservation, access to, and administration of historical and documentary records of enduring value for the benefit of current and future generations” (SAA, 2020).
Digitization certainly gives future generations access to the volumes, but the quality of the microfilm photographs that were scanned into PDF files proved to be a roadblock to extracting accurate data. Many of the words in the volumes were misspelled, and some had faded over time, which corrupted the extracted text. With more time, finer contrast and brightness adjustments to the PDF files might help overcome these obstacles, but the experience is a reminder that documents need to be not only accessible but also digitized at sufficient quality (MS).
Through troubleshooting and the eventual processing of Volumes 10 and 11 (with Adobe Acrobat and Photoshop, DocDrop, Convertio, and finally OpenRefine), our group was able to parse the roughly 800 pages of Morgenthau’s 1938 press conference transcripts in order to analyze how his work as Secretary of the Treasury may have changed in response to events around the world. By using OpenRefine’s Text Filter to pull key economic and geographic terms from the processed data, we created visuals that helped us compare Volume 10, which covers the first half of 1938, with Volume 11, which covers the second half of the same year.
Future work could improve on this initial data-modeling effort by drawing a more extensive term list from the processed OpenRefine data. We limited the economic and geographic lists to eight terms each to keep the scope manageable as a starting point, but a more comprehensive set of key terms would likely yield additional insights. These 1938 analyses could also be compared with analyses of the adjacent volumes from 1937 and 1939.