by EM, NR, CR
The Franklin D. Roosevelt Presidential Library & Museum (FDR Library) houses the collection of press conference transcripts of Henry Morgenthau, Jr. while he served as Secretary of the Treasury from January 1934 through July 1945. The FDR Library microfilmed the approximately 15,000 pages of transcribed press conferences and made them available on Franklin, the library’s digital repository (Franklin, n.d.).
This project is a review of the six most frequent topics discussed in Morgenthau’s press conference transcripts from January 4, 1937, through December 30, 1937, as contained in the collection’s volumes 8 and 9 (CR). We tracked six of the most frequent topics in the transcripts: foreign exchange; gold; silver; treasury notes, bills, and bonds; taxes; and the Tripartite Monetary Agreement. The press asked the most questions about taxes (116) and gold (113). Morgenthau answered 48% of questions about these topics on average, answering questions about treasury notes, bills, and bonds at the highest rate (64%) and questions about gold at the lowest rate (39%) (EB). These results suggest that in the later years of the Great Depression, public concern centered on gold and silver reserves; the stability of foreign economies and U.S. involvement in them; and the federal government’s use of taxes (NR).
The main objective of this project, and of the computational treatments used, was to discover the topics Morgenthau discussed most often during press conferences given throughout the 1937 calendar year. The purpose of this endeavor was to take a large number of conference transcriptions, extract data from the provided PDFs using various free-to-use tools, and present our findings to help future researchers gain new insights into the collection. Overall, volumes 8 and 9 contain a total of 705 pages, 26 of which contain an index to subjects discussed within the press conferences. A brief scan of these indexes can give researchers a good starting point for finding information on a given topic, person, or place, but the indexes are by no means comprehensive (NR).
Through our preliminary examination of the documents, we noticed that Morgenthau quite often refused to answer questions about each of the most discussed topics. We then asked ourselves whether, if there were specific topics he was less likely to address, knowing his percentage of answers versus non-answers would provide useful data from which researchers could draw conclusions. For this objective, we employed a method similar to the ‘unique’ mention approach we took for finding the topics. We did not count every line containing a ‘Q’ or an ‘A’ as a separate instance; instead, we excluded what we considered clarifications or repeated statements in order to better capture his response rates.
Each transcript begins with a title, in the format of “Report on Secretary Morgenthau’s Press Conference” followed by the date. Press questions are distinguished by a “Q” and Morgenthau’s answers by an “A” (EB).
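The "Q" and "A" markers make it possible to split a transcript into turns once usable text is available. The following is a minimal sketch rather than the method used in this project, which relied on manual reading. It assumes that each turn starts on its own line with "Q." or "A." (or "Q:" / "A:"); the actual punctuation in the transcripts may differ, and the sample text is invented for illustration.

```python
import re

# Assumption: a turn begins with "Q" or "A" followed by "." or ":".
# The real transcripts may use a different layout.
TURN_START = re.compile(r"^(Q|A)[.:]\s*(.*)$")

def split_turns(text):
    """Return a list of (speaker, utterance) pairs from one transcript."""
    turns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN_START.match(line)
        if match:
            turns.append([match.group(1), match.group(2)])
        elif turns:
            # Continuation of the current turn (wrapped line).
            turns[-1][1] += " " + line
        # Lines before the first turn (title, date) are skipped.
    return [(speaker, utterance) for speaker, utterance in turns]

# Invented sample text, not quoted from the collection.
SAMPLE = """Report on Secretary Morgenthau's Press Conference
January 4, 1937
Q. Anything new on gold?
A. Nothing today.
Q. Will there be a statement
on taxes this week?
A. No comment.
"""

for speaker, utterance in split_turns(SAMPLE):
    print(speaker, "|", utterance)
```

A split like this only separates turns. Deciding whether a turn is a clarification, a repeat, or a genuine answer still requires the kind of human judgment described above.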
To complete our primary objective, we first needed to set some parameters for our examination of the material. The group decided to count only what we considered to be unique mentions of, or statements about, the topics. This would not include every time Morgenthau spoke the word but, instead, each unique idea or response pertaining to the topic. By not confining ourselves to counting how many times he said a specific word within his responses, we believed that we could provide a more complete picture of the most discussed topics throughout 1937 (NR).
A total of five tools were used to gather and present our data: Microsoft Excel, Convertio file converter, Voyant, DocDrop OCR converter, and Infogram’s chart creation software. By using Excel, we could individually record our data and combine our findings into a single spreadsheet. Excel’s ability to manipulate and manage substantial amounts of data made it an ideal program for our efforts to create datasets from our two volumes. Although this project only tackled a portion of the whole collection, Excel could be used to record and manipulate data for any number of volumes.
Convertio, Voyant, and DocDrop, three programs that can convert PDFs to OCR text, were used to turn the Morgenthau collection’s PDFs into a searchable format. Convertio and Voyant provided text-searchable PDFs but failed to capture a majority of the words. We were aware that the PDFs had been produced from old reels of microfilm and that their poor image quality would cause problems for these conversion tools. We found that DocDrop, our third OCR conversion tool, provided higher quality conversions. DocDrop took around 10 minutes to convert each volume. Although it would take approximately 4.5 hours to convert the complete collection this way, DocDrop produced higher quality conversions of the files, making it a better overall tool than the other two.
Lastly, to present our data as visualizations, we used infogram.com. Infogram has both a free version and an upgraded version that requires a subscription. The free version, which was used for this project, allows its users to create visualizations through data entry that mirrors the Excel interface. That allowed us to easily transfer our data from Excel to Infogram and avoid having to convert our datasets into another format. Although its interface did not allow for as much overall data entry as Excel, it was more than sufficient for creating visualizations for our two volumes (NR).
To complete this project, our group used a basic secondary data collection method, in which we examined the digitized transcriptions in order to extract our own data to assist future researchers in interpreting the records in a new light. We began by splitting the two volumes into more manageable sections among the three researchers. By examining our separate sections, we compiled a list of the topics discussed in each individual press conference. We found that the top topics of discussion included gold, silver, taxes, foreign exchange/commerce, Treasury Bonds, and the Tripartite Monetary Agreement. After discussing these topics, we chose to include Treasury Notes and Bills with Treasury Bonds, as they are all related forms of securities sold by the federal government. During this initial examination of the records, we also decided to capture Morgenthau’s response rates per topic.
After finding our most common topics, we concluded that further examination of our sections would be needed to gather specific data about these topics and their use. We decided to capture only what would be considered unique mentions of the chosen six topics. That excluded clarifying remarks or questions and captured only initial statements or answers touching on distinct aspects of the topic. The same standards were used when calculating Morgenthau’s response rates. We chose to capture three things for each topic: whether Morgenthau was prompted on the topic by a particular question, how many unique times the topic was mentioned within the press conference, and whether the questions on the topic were answered.
We individually recorded our results into Excel and combined that data into one spreadsheet once all of the transcriptions had been surveyed (Figure 1: Combined Datasets).
After this initial combining of the material, the data was then converted into a PivotTable, where the overall calculated totals could be viewed (Figure 2: PivotTable of Recorded Data).
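For readers who prefer a script to a spreadsheet, the totals a PivotTable produces can be approximated with a small grouping routine. The sketch below is illustrative only: the project used Excel, and the column names (`month`, `topic`, `unprompted`, `questions`, `answers`) and all numbers are hypothetical stand-ins, not values from our dataset.

```python
from collections import defaultdict

# Hypothetical rows shaped like a combined spreadsheet: one row per press
# conference and topic. Column names and counts are invented for illustration.
rows = [
    {"month": "1937-03", "topic": "taxes", "unprompted": 1, "questions": 4, "answers": 2},
    {"month": "1937-03", "topic": "gold", "unprompted": 0, "questions": 3, "answers": 1},
    {"month": "1937-04", "topic": "taxes", "unprompted": 0, "questions": 2, "answers": 1},
    {"month": "1937-04", "topic": "gold", "unprompted": 2, "questions": 5, "answers": 2},
]

MEASURES = ("unprompted", "questions", "answers")

def pivot(rows, keys):
    """Sum each measure over rows grouped by the given key columns."""
    totals = defaultdict(lambda: dict.fromkeys(MEASURES, 0))
    for row in rows:
        group = tuple(row[key] for key in keys)
        for measure in MEASURES:
            totals[group][measure] += row[measure]
    return dict(totals)

def response_rate(total):
    """Share of press questions that were answered, or None if none were asked."""
    if total["questions"] == 0:
        return None
    return total["answers"] / total["questions"]

by_topic = pivot(rows, ["topic"])
by_month_topic = pivot(rows, ["month", "topic"])

for (topic,), total in sorted(by_topic.items()):
    print(topic, total, response_rate(total))
```

Grouping by `["month", "topic"]` instead of `["topic"]` gives the month-by-month view, which is the same choice made when dragging fields into the rows and columns of a PivotTable.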
The PivotTable offered insight into which topics were most popular each month; for example, discussion of taxes increased significantly in March. Creating the PivotTable also allowed us to create a PivotChart in Excel, which provided the same information and the same customization as the PivotTable but as a visualization. As seen in Figure 3 (PivotChart of Collected Data), the PivotChart shows the sum of Morgenthau’s unprompted remarks on a topic, the sum of the press’s questions about a topic, and the sum of Morgenthau’s answers to press questions.
After the PivotChart process was completed, the PDF transcriptions were converted into searchable text documents using DocDrop.org. We then searched the OCR text for the chosen topics. The searches returned the following overall hit totals: gold, 275; bonds/notes/bills, 297; foreign exchange/commerce, 16; silver, 107; tax/taxes, 219; and the Tripartite Monetary Agreement, 48. While useful for understanding how many times these terms appeared within the text, the hit counts proved less useful in determining overall topic mentions. One major drawback of keyword searching was the inclusion of hits that did not fit within the topics themselves. For example, Treasury bills and legislative bills were both often referred to by the single word ‘bills.’ Keyword searching of the OCR text could not distinguish between terms used in different contexts, so manual examination was necessary to capture reliable data relating to our objectives.
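The limitation of keyword hits can be shown with a short sketch. This is not the search procedure used in the project, which relied on searching the DocDrop output directly; the patterns below are assumed stand-ins for the search terms, and the sample sentence is invented.

```python
import re

# Assumption: illustrative patterns only. The exact search terms used in the
# project are not listed in this report.
TOPIC_PATTERNS = {
    "gold": r"\bgold\b",
    "silver": r"\bsilver\b",
    "tax/taxes": r"\btax(?:es)?\b",
    "bonds/notes/bills": r"\b(?:bonds?|notes?|bills?)\b",
}

def count_hits(text):
    """Count case-insensitive matches of each topic pattern in the text."""
    return {
        topic: len(re.findall(pattern, text, flags=re.IGNORECASE))
        for topic, pattern in TOPIC_PATTERNS.items()
    }

# Invented sentence: "bill" appears once as a security and once as legislation.
sample = "The Treasury bills sold well, but the tax bill in Congress is another matter."
print(count_hits(sample))
```

Both occurrences of "bill" count toward the securities topic even though only one refers to Treasury bills, which is the kind of false hit that manual review was needed to remove.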
To enhance the information gathered by this project, visualizations were then created to help researchers see trends and patterns found within the datasets. Our visualizations were created using infogram.com. They show the answered versus non-answered questions by month (Figure 4: Questions Answered vs. Not Answered, by Month) and the OCR hits versus the manually recorded unique mentions (Figure 5: OCR Hits vs. Unique Topic Mentions).
The Society of American Archivists (SAA) (2020) notes that their core values and code of ethics should be used together to “provide guidance to archivists and address and increase awareness of ethical concerns among archivists,” but they also acknowledge that both are aspirational and “not all members of the profession abide by these beliefs or guidelines.” As aspirational as the values and ethics may be, their guidance is also practical.
SAA lists several core values, among which access and use, history and memory, and preservation are of greatest interest to this computational story. Access and use, SAA notes, means archivists “should promote and provide the widest possible accessibility of materials, while respecting legal and ethical access restrictions.” By providing the scans of the Morgenthau press conference transcriptions online, the FDR Library demonstrates a commitment to accessibility while also clearly stating that the collection is in the public domain (Franklin, n.d.). Regarding history and memory, SAA (2020) explains that “archival materials provide digital and physical surrogates for human memory…and serve as evidence against which individual and social memory can be compared.” This is of particular relevance to the Morgenthau press conferences, considering that Morgenthau represented a department within the United States government. As for preservation, SAA insists that materials need to “be accessible for continued future use,” and the format chosen as a surrogate for that material is important for storage and accessibility. That is seen with the Morgenthau press conferences being first transcribed and then, later, microfilmed and placed online for access by a wider audience.
SAA’s (2020) code of ethics points out that archivists “should embrace principles that foster the transparency of their actions and that inspire confidence in the profession.” Out of the seven ethical areas SAA addresses, those of greatest interest to this computational story are authenticity as well as access and use. Authenticity, SAA explains, requires archivists to “document the unique archival characteristics of records, including their intellectual, digital, and physical integrity,” while also not altering or destroying materials. The digital surrogates of the Morgenthau papers demonstrate authenticity by being unaltered microfilm of the transcribed press conferences, with no missing pages within the two volumes this group reviewed. However, that rests on the presumption that the original typewritten records were unaltered before microfilming. Regarding access and use, SAA notes that archivists need to “strive to minimize restrictions and maximize ease of access” at the same time that they “seek to balance the principles of stewardship, access, and respect.” As was mentioned earlier regarding the preservation core value, the FDR Library is providing digital surrogates of the press conference transcriptions and clearly indicates that the records are in the public domain, thereby working toward minimizing restrictions and maximizing ease of access (CR).
After a visual review of Morgenthau’s press conferences, we selected topics which arose frequently. We then analyzed the transcript, asking the following questions:
We used our data to create visualizations comparing the frequency of Morgenthau’s unprompted mentions, press questions, and Morgenthau’s answers, with respect to each topic, and comparing the frequency of Morgenthau’s answers and non-answers over the course of the year. We also compared the total number of instances of each topic term throughout the transcripts with the number of unique conversations about each topic that we counted. The latter emphasizes the superior ability of human analysis to recognize context, compared with computer analysis of a text.
The data tells us how forthcoming Morgenthau was about the important topics and how his willingness to answer questions about the topics changed over the course of the year, and how interested the press (and presumably the American public) were in each topic. Our data could be of interest to researchers studying the Great Depression, FDR’s presidency, the economic and political conditions of the 1930s, or Morgenthau himself. There are many more questions that can be asked of our data:
There are yet more questions to be asked of our dataset which the data we collected does not address:
Those questions could be answered by converting the transcripts to searchable text using an OCR tool such as DocDrop and analyzing that text with a tool like Voyant, which has a Collocate Tool to find words that appear in proximity to each other (Villanova, 2020). The dataset is a rich source of historical information, and improving the OCR on the digital versions on the FDR Library’s website would increase utility and improve user experience (EB).
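The idea behind a collocate search can be sketched in a few lines. This is a simplified illustration of counting words that fall within a fixed window of a keyword. It is not Voyant's implementation, the window size is an arbitrary assumption, and the sample sentence is invented.

```python
import re
from collections import Counter

def collocates(text, keyword, window=5):
    """Count words appearing within `window` tokens of each keyword occurrence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        start = max(0, i - window)
        # Words before and after the keyword, excluding the keyword itself.
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbours)
    return counts

# Invented sample text, not quoted from the collection.
sample = "Gold imports rose while silver purchases fell. The gold reserve is stable."
print(collocates(sample, "gold", window=2).most_common())
```

A practical version would also filter out common function words such as "the" and "is", and its results would only be as reliable as the underlying OCR text.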