by JB, KK, JO Group Bad Data Battalion
In 1934, Henry Morgenthau, Jr. was appointed Secretary of the Treasury by President Franklin D. Roosevelt. Morgenthau used this position to investigate organized crime and government corruption, but the federal law enforcement system was fragmented and uncoordinated (Wikipedia, 2022a). Morgenthau's investigations eventually led to the prosecution of Al Capone and of political bosses such as Thomas Pendergast and Charles Carrollo (Reppetto, 2005, p. 195).
President Roosevelt did not support Prohibition, and on December 5, 1933, the 21st Amendment was ratified, repealing the 18th Amendment and once again legalizing the manufacture, sale, and consumption of liquor in the United States. To help regulate the newly legal alcohol industry, Roosevelt used an executive order to create the Federal Alcohol Control Administration (FACA). The Department of Agriculture and Department of the Treasury helped guide the process until the Federal Alcohol Administration (FAA) Act went into effect in 1935. The Department of the Treasury established itself as a vital component of alcohol regulation, upholding the sentiment of the FAA Act into the present day while allowing the FAA to operate independently within the Treasury Department (TTB, 2013). Morgenthau was intent on stopping the manufacture, sale, and import of illegal substances in the United States. In 1934 he went on record as saying that “any method will be used to get dope peddlers, smugglers, etc.” (FDR Library, 2022), a statement recorded on one of the three index cards filed under the subject ‘Wire Tapping.’
Largely because Prohibition-era laws had proven difficult to enforce, Morgenthau elected to combine Treasury agencies in a way that concentrated efforts to stop the import of illegal alcohol and narcotics, which is reflected in our dataset as early as 1934. The outcome was the creation of the Committee for the Coordination of Treasury Law Enforcement Activities in 1935, made up of the leaders and sub-leaders of the Coast Guard, the Customs Service, the Alcohol Tax Unit of the Bureau of Internal Revenue, the Bureau of Narcotics, and the Secret Service (Phillips, 1963, pp. 369-370).
In 1935 the Bureau of Investigation was renamed the Federal Bureau of Investigation (FBI); it was led by J. Edgar Hoover, who had directed the Bureau since 1924. Hoover was not one to share power and feared that the consolidated Treasury agencies would overshadow the FBI. Roosevelt, as well as much of the rest of the country, was increasingly terrified of communism and felt that communist sympathizers had no place in the United States. Extensive measures were put in place to investigate government employees’ loyalties, particularly with regard to the payment of income tax.
The Treasury years between 1937 and 1945 focused heavily on tax evasion and taxation, the former being the reason for Al Capone’s imprisonment in 1931. In 1937, President Roosevelt addressed Congress on the topic of tax evasion. It was reported that large amounts of income went unreported, as evidenced by a 1936 study conducted by Morgenthau. Secretary Morgenthau wrote a letter to President Roosevelt outlining eight types of tax avoidance, which he felt were “sufficient to show that there [was] a well-defined purpose and practice on the part of some taxpayers to defeat the intent of Congress to tax incomes in accordance with ability to pay” (Roosevelt, “Message to Congress” 1937). Morgenthau ended his letter to the President by stating that he felt Congress would make the right decision and give the Treasury the authority to complete the tax evasion investigation, which ultimately increased the power of the federal government, a common theme during Roosevelt’s administration. To keep the country moving after the Great Depression, Morgenthau believed in the importance of reducing the overall deficit by increasing taxes for individuals who could afford it (Wikipedia, 2022b).
Morgenthau later donated his 840-volume diary and press conference transcripts to the Franklin D. Roosevelt Presidential Library & Museum. The press conference index was used to visualize connections between said historical information and the mention of crime throughout Morgenthau’s press conference collection. (JO, KK)
Isabella Diamond, the Treasury Librarian, created a subject index of hundreds of Morgenthau’s press conferences beginning in 1936.
The digital press conference transcripts are arranged in a single, chronological series of 27 volumes that mimic the physical collection. Each volume begins with a title page featuring the volume number and its inclusive dates. The title page is followed by a volume-specific table of contents (TOC) that reflects the “various subject headings, sub-headings, and cross-references assigned by Isabella Diamond according to her custom schema” (Carter et al., 2022, p. 841). Diamond also created index card files according to this custom schema, which provide document-level access across the volumes by subject. Diamond’s index cards are arranged alphabetically and feature the following characteristics or anatomy (as shown in Figure 1):
Figure 1. An Example Index Card with its Characteristics Labeled.
After performing a collection analysis, the group determined that our primary research objective would be to manipulate the dataset of index cards to effectively determine patterns related to crime during Morgenthau’s tenure, and then create visual representations of the data to analyze patterns in crime that reflect the historical context of the collection.
The group used the following research questions to aid in the analysis of this collection:
To meet the research objectives, a five-phase research methodology was implemented (as shown in Figure 2). First, a collection survey was completed to determine which index cards in the collection were directly related to crime or criminal activities. Once all related cards were found, their raw data was extracted in the form of a text (.txt) file. Though the FDR Library used Adobe Acrobat Pro as its optical character recognition (OCR) software for the index card PDFs, OCR did not guarantee 100% accuracy when extracting the text from the cards (Carter et al., 2022). As such, a quality check (Phase 3) was performed on the text file to ensure that the extraction process was accurate. Our text file was then parsed and manipulated using OpenRefine, which is an open-source software package for cleaning, manipulating, and transforming data (Delpeuch, 2022). The last phase of our methodology, visualization, used the manipulated data from OpenRefine to create visualizations that aided in analyzing the data and allowed patterns to emerge that tied back to the background context for this collection. (JB)
Figure 2. The Five-Phase Research Methodology Process
The following table (Table 1) details the software and tools used to execute this research methodology. (JB, KK)
When the collection survey was complete and all crime-related index cards found, a PDF document was created that only featured our 55 cards. To aid in text recognition, a batch process in Adobe Photoshop was used to increase the contrast of each PDF page by +75. (JB)
After reviewing the OpenRefine documentation, it became clear that data manipulation inside the software was more efficient when the data were separated into columns. To mimic this column structure in the text file, each index card’s data was placed on one line and the desired columns separated by the | symbol. The following standard format was used for each card in the text file:
Figure 3 illustrates the difference between the raw four-line output of the index card data and the edited one-line output with column separator. (JB)
Figure 3. Increasing the Usability of the Text File by Condensing the Raw Four-Line Output into One Line and Designating Columns Using the | Symbol.
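The condensing step illustrated in Figure 3 was done by editing the text file directly; the following Python sketch only illustrates the same idea. The function names, the sample field values other than the subject entry, and the field order are assumptions made for illustration, not the group's actual card format.

```python
def condense_card(raw_lines, sep=" | "):
    """Join the non-empty lines of one card into a single delimited line."""
    fields = [line.strip() for line in raw_lines if line.strip()]
    return sep.join(fields)


def split_card(line, sep="|"):
    """Split a delimited line back into trimmed column values."""
    return [field.strip() for field in line.split(sep)]


# Hypothetical four-line OCR output for one card (not taken from the collection).
raw_card = [
    "Liquor",
    "Example description of the press conference topic",
    "1/6/36",
    "Vol. 5, p. 12",
]

one_line = condense_card(raw_card)
columns = split_card(one_line)
print(one_line)
print(columns)
```

A separator such as | only works if it never appears inside a field, so a quick search of the raw text for the separator character is a sensible check before condensing.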
Figure 4. Import Settings.
The following steps detail the manipulation process conducted in OpenRefine: (JB)
Figure 12 illustrates the tidied data that resulted from the manipulation phase. The manipulations resulted in six columns, including the following:
Recall that the group decided to use the Year column for visualization purposes, and not the corresponding month and day. Our tidy data was exported from OpenRefine directly into an Excel file for visualization purposes.
Figure 12. Tidy Data in OpenRefine.
The following three visualizations resulted from the tidy data:
The first visualization is a word cloud, which “displays how frequently words appear in a given body of text, by making the size of each word proportional to its frequency” (The Data Visualisation Catalogue, 2022). The word cloud in Figure 13 was created using Word Cloud Generator, which is an open-source web software, and shows a visual representation of the subject headings of the crime-related index cards. The visual provides an efficient way to show how often the word “liquor” was used compared to another word, like “silver.”
Figure 13. A Word Cloud Generated from the Crime-Related Index Cards.
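The group used Word Cloud Generator for Figure 13, but the counts behind any word cloud can be sketched in a few lines of Python. The function name and the four sample entries below are illustrative assumptions; the entries are a small subset of subject headings mentioned in this report, not the full dataset.

```python
import re
from collections import Counter


def term_frequencies(subject_entries):
    """Count lowercase word occurrences across all subject entries."""
    counts = Counter()
    for entry in subject_entries:
        counts.update(re.findall(r"[a-z]+", entry.lower()))
    return counts


# Illustrative subject entries; only a handful, not the full 55-card dataset.
sample_entries = ["Liquor", "Liquor, evasion of tax on", "Whiskey", "Tax, Evasion"]

freq = term_frequencies(sample_entries)
print(freq.most_common(3))
```

This sketch does not filter out short function words such as "of" and "on"; word cloud tools typically do, so the two sets of counts may not match exactly.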
The next visualization is a pie chart (Figure 14), which helps “show proportions and percentages between categories by dividing a circle into proportional segments” (The Data Visualisation Catalogue, 2022). This type of chart was chosen to provide the viewer with a quick idea of the proportional distribution of the crime-related index cards. It is important to note that this pie chart features percentages of crime categories that were assigned by the group to increase comprehension of the visualization. The subject entries that make up each crime category are explained in the Data Analysis section below.
Figure 14. Percentage of Assigned Categories of Crime.
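As a rough illustration of how assigned categories turn into pie chart percentages, the sketch below maps subject entries to categories and computes each category's share. Only the alcohol grouping comes from this report; the function name, the "uncategorized" fallback, and the four-card sample are assumptions for illustration.

```python
from collections import Counter

# Only the alcohol grouping below is stated in this report; any other
# mapping entries would need to come from the group's own category table.
CATEGORY_BY_SUBJECT = {
    "Liquor": "alcohol",
    "Liquor, evasion of tax on": "alcohol",
    "Whiskey": "alcohol",
}


def category_percentages(subject_entries, mapping, default="uncategorized"):
    """Return each category's share of the cards as a percentage (0.1 precision)."""
    categories = [mapping.get(entry, default) for entry in subject_entries]
    total = len(categories)
    if total == 0:
        return {}
    counts = Counter(categories)
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


# Hypothetical mini-dataset of four cards, for illustration only.
cards = ["Liquor", "Whiskey", "Liquor, evasion of tax on", "Tax, Evasion"]
shares = category_percentages(cards, CATEGORY_BY_SUBJECT)
print(shares)
```

Keeping the mapping in one dictionary makes the grouping decisions easy to review and to change if later research suggests that some entries should be split apart.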
A histogram “visualizes the distribution of data over a continuous interval” (The Data Visualisation Catalogue, 2022). As such, this type of graph was chosen because the group wanted to determine the distribution of crime by year. Our histogram (Figure 15) displays the number of crime-related index cards per year.
Figure 15. Number of Crime-Related Index Cards Per Year
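The per-year counts behind a chart like Figure 15 amount to a simple tally of the Year column. The sketch below assumes the years are available as a list of strings; the function name and the sample values are hypothetical and do not reproduce the group's data.

```python
from collections import Counter


def cards_per_year(years):
    """Count index cards per year, returned in chronological order."""
    counts = Counter(int(y) for y in years)
    return dict(sorted(counts.items()))


# Hypothetical Year values; the real column comes from the tidied export.
sample_years = ["1936", "1934", "1936", "1937", "1936"]
per_year = cards_per_year(sample_years)

# A plain-text bar per year stands in for the chart.
for year, n in per_year.items():
    print(year, "#" * n)
```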
The two Series 1 index cards in 1936 with the subject entry of ‘Agency-Related’ reflect the combination of these five agencies’ resources to enforce the country’s laws. Also in 1936, a shift occurred in the focus for illicit vices as Morgenthau publicly announced that alcohol smuggling had been wiped out and opiate narcotics, such as heroin and opium, must be stopped (Schaffer Library of Drug Policy, 2022).
The first visualization presented is a word cloud, as shown in Figure 13, that demonstrates the frequency of the index cards’ subject entry terms. It is clear that ‘liquor’, ‘evasion’, and ‘tax’ are the three most prominent terms, which coincides with the two general categories of this dataset: organized crime and government corruption.
The group attempted to visualize the subject entry categories on a pie chart, but the number of unique entries resulted in a messy visual; therefore the group decided to assign categories to various entries that were related to the same type of crime and present those categories on a pie chart (Figure 14). In some cases, descriptions were added to the index cards that enabled further categorization of the already labeled subject entries to see the larger patterns that were present. Table 3 below details which subject entries fit into each crime category.
Figure 14 shows the frequency of occurrence for the assigned categories in the dataset. Alcohol-related index cards make up almost half (45.5%) of the total number of cards; however, the percentages for tax and investigations, 18.2% and 14.5% respectively, are also substantial. The investigations in question concern tax evasion crimes, as Morgenthau fought against government corruption and strongly believed in the 16th Amendment.
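The reported percentages can be sanity-checked against the 55-card total. The sketch below assumes each percentage is a share of all 55 cards rounded to one decimal place; the implied counts are an inference from that assumption, not figures stated by the group.

```python
TOTAL_CARDS = 55  # size of the crime-related card set reported in this study

# Percentages as reported for Figure 14.
reported = {"alcohol": 45.5, "tax": 18.2, "investigations": 14.5}

# Assumption: each percentage is a share of all 55 cards, rounded to 0.1.
implied_counts = {cat: round(pct / 100 * TOTAL_CARDS) for cat, pct in reported.items()}

# Recompute the percentages from the implied counts as a consistency check.
recomputed = {cat: round(100 * n / TOTAL_CARDS, 1) for cat, n in implied_counts.items()}

# Cards left over for the categories not named in this paragraph.
remaining = TOTAL_CARDS - sum(implied_counts.values())
print(implied_counts, recomputed, remaining)
```

Under that assumption the three named categories account for 25, 10, and 8 cards, leaving 12 cards for the remaining categories.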
When the frequency of crime-related index cards is plotted over time (Figure 15), the year 1936 has the greatest frequency at 17 index cards. The year 1936 seems to be a transition point where the Treasury shifts its focus from alcohol and drugs to income tax evasion. Around this time, President Roosevelt delivered an address to Congress regarding the seriousness of tax evasion, and Morgenthau strongly supported consolidating five agencies under one umbrella to combat organized crime and tax evasion. (KK, JO)
Our computational story puts into practice the Society of American Archivists (SAA) Code of Ethics and Core Values, specifically by:
The Press Conferences Collection was analyzed to determine patterns in the index cards related to crime. The group’s research objectives centered on discovering patterns in how crime was discussed during Morgenthau’s tenure and on how those patterns coincide with the historical context. The crime-related index cards were extracted from the collection, manipulated, and then visualized to reveal patterns in the data. The visualizations clearly show that the two most prevalent crime-related topics during Morgenthau’s tenure were alcohol and taxes. Historical context supports that finding, given Morgenthau’s position on tax evasion and the illegal manufacture of alcohol.
To expand upon our analysis, the group recommends the extraction of Series 2 Press Conference transcripts that are related to crime. Their added details will likely clarify inconsistencies in the index cards and result in a more thorough data analysis. For example, Diamond’s schema for the index cards distinguishes terms such as ‘Liquor’ from ‘Liquor, evasion of tax on’ and ‘Whiskey.’ Additional research is needed to determine whether these categories should remain separate or be combined into an overarching category. For the sake of this project, these alcohol-related entries were combined into the crime category of ‘alcohol.’ A few more involved entries, such as ‘Tax, Evasion’ and ‘Political Activities,’ require more background research and support from the press conference transcripts.
While the data visualizations show a definite decline in crime-related index cards after 1936, which correlates with the actions Morgenthau influenced in his campaign against corruption, more research is needed to enumerate the specific actions and laws that helped reduce instances of crime. Moreover, the types of crime in this dataset were limited, and it may be assumed that because of Morgenthau’s narrow focus on fiscal responsibility and the Treasury, other types of crime, such as violent crime, sexual assault and sex trafficking, domestic violence, and child abuse, were handled at the local level. (JB, JO, KK)