A Data-Driven Approach to Reparative Description at the University of Chicago

Designed by Ashley Gosselar, 2022

Introduction

Reparative description of collections is a burgeoning element of diversity, equity, and inclusion efforts at cultural heritage institutions. Broadly speaking, reparative description aims to remediate or contextualize collection metadata that excludes, silences, harms, or mischaracterizes people. Reparative description practices strive to be accurate, inclusive, culturally competent, and respectful. Reparative description work can take many forms, as illustrated in the graphic below from this presentation about reparative description given by the Native American Archives Section of the Society of American Archivists in 2021.

For many years, archivists at the University of Chicago Library’s Hanna Holborn Gray Special Collections Research Center have made inroads in the “Representation/Under-description” tier of the above illustration through a concerted effort to thoughtfully process and make accessible collections that represent diverse groups of people and organizations. However, due in part to severe staffing shortages, the Center has not invested significant time and resources in other tiers of the illustration. The Center maintains over 1600 online finding aids – a daunting amount of legacy metadata to review and remediate.

To meet this challenge, I proposed a data-driven approach to identify and prioritize finding aids for reparative description. While the work of editing a finding aid through a reparative description lens is not something that can be automated or sensitively done without careful, time-intensive attention by a human being, this project demonstrates that computational thinking can be applied to the planning stages of reparative description work. I tackled the question of “where do we begin?” by treating the Center’s 1600+ finding aids as data that can be harvested, transformed, and analyzed with computational methodologies.

I am not the first archivist to try this. By searching the Internet, archival literature, and GitHub, and by informally surveying 75 members of a reparative description Slack channel for processing archivists, I arrived at the following list of archival institutions and archivists who are using computer scripts to audit their finding aids for potentially biased and harmful language:

Factoring in limitations on my time, a lack of technical support, and my rudimentary understanding of XQuery, XSLT, regular expressions, and Python, I decided not to reinvent the wheel by writing my own script. From this list, I chose Laura Schroffel’s Python script, “XML-Term-Detective,” as my tool for scraping data from UChicago’s finding aids. I chose XML-Term-Detective because it utilizes a GUI and does not require a full understanding of Python to operate. I am grateful to Laura for graciously permitting me to use this code and publicly discuss my experience using it.

This project builds upon the scripting work established by others by adding further data curation steps. Thanks to knowledge gained in the University of Maryland iSchool’s Digital Curation for Information Professionals certificate program, I used a suite of data curation tools to clean, enhance, visualize, and analyze my data. In so doing, I was able to learn not just when and where biased or harmful language may occur in UChicago’s finding aids, but also the scale and scope of the problem and which finding aids require the most remediation.

The following is an explication of my methodology and the inferences and recommendations I drew from the data. It is written with two audiences in mind: my colleagues at the University of Chicago who will use this report to begin the slow but important work of reparative description of our finding aids, and a more general audience of archivists with basic technology skills embarking on reparative description work at their own institutions. By utilizing the format of a Jupyter Notebook, I hope to make this project more easily reproducible and repurposable for others.

For beginners seeking a basic grasp of XQuery and XSLT, I recommend Library Juice Academy’s course Transforming and Querying XML with XSLT and XQuery. For an introduction to Python within a humanistic context, I recommend Melanie Walsh’s Introduction to Cultural Analytics & Python. For a crash course in regular expressions, command line, and Git, I recommend Library Carpentry workshops. I used each of these courses to expand my technical skillset and prepare for this project.

Methodology

Gathering the Data

The Lexicon

XML-Term-Detective finds and counts terms from a csv list across a directory of xml files. UChicago’s finding aids are encoded in EAD (an XML standard) and live in a directory accessible to staff (in other words, I did not need to export XML finding aids out of a system like ArchivesSpace and into a directory). Laura Schroffel helpfully shared the Getty Research Institute’s csv list of terms, which they call “the lexicon.” (Yale also happens to be using the Getty's lexicon to develop metadata action plans across their cultural heritage collections.)
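To make the general approach concrete, here is a minimal Python sketch of what a tool like this does: read a lexicon from a csv file and count term occurrences across a directory of xml finding aids. This is not Laura Schroffel’s actual code, and the file paths (“lexicon.csv,” “findingaids/”) are placeholders.

# Minimal sketch of the general approach (not XML-Term-Detective's actual code).
# "lexicon.csv" and "findingaids/" are placeholder paths.
import csv
import re
from pathlib import Path

with open("lexicon.csv", newline="", encoding="utf-8") as f:
    terms = [row[0].strip() for row in csv.reader(f) if row]

counts = {}  # (filename, term) -> number of occurrences
for xml_file in Path("findingaids").glob("*.xml"):
    text = xml_file.read_text(encoding="utf-8", errors="ignore")
    for term in terms:
        # Word boundaries keep "great" from matching inside "greatest"
        hits = len(re.findall(rf"\b{re.escape(term)}\b", text))
        if hits:
            counts[(xml_file.name, term)] = hits

with open("term-counts.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["FILE", "WORD", "COUNT"])
    for (filename, term), n in sorted(counts.items()):
        writer.writerow([filename, term, n])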

I used the Getty’s lexicon of terms as my starting point. I came to think of the words in the lexicon as “red flag terms.” A word might be outright offensive (such as a racial slur), or it might indicate that the subject matter of a finding aid merits review to ensure that we are describing a collection in an inclusive and culturally competent way.

There are several categories of terms in the lexicon:

  • Terms of aggrandizement (which we especially want to root out of Biographical/Historical notes)
  • Ableist language
  • Terms about race and ethnicity
  • Language about citizenship (“alien” to refer to noncitizens or undocumented immigrants)
  • Words related to class
  • Words about incarceration or forced removal
  • Terms that might tip us off to documentation of colonialism, genocide, or slavery. These finding aids could be reviewed for erasure or misrepresentation of Native Americans and enslaved people, for euphemistic language describing racial violence, or for description that glosses over relationships of power (for instance, if a white person is described as a planter or plantation owner with no allusion to the fact that they enslaved people on said plantation).
  • Sexist language (women described as girls or only as “Mrs. Husband’s Name”)

I edited the Getty lexicon in several ways. I removed some terms that I knew would create “noise” in the data. For instance, I removed the word “great” because the University of Chicago archives holds records of its “Great Books” program that would clutter my dataset. I also removed words that had zero hits in our online finding aids database (words like “midlife” and “codger”).

I also added many words (see Added Terms below). XML-Term-Detective does not automatically look for variations of a word, such as plurals, so I added those myself. I included additional terms of aggrandizement found in Kelly Bolding’s list for Princeton University. I added words that occurred to me as I read the University of North Carolina, Chapel Hill’s Guide to Conscious Editing, and selected additional words from Kayla Heslin’s Legacy Description Audit script for the University of Pittsburgh, which pulls from hatebase.org, a repository of multilingual hate speech. I added terms suggested by colleagues based on their knowledge of our collections.
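Because the tool matches exact strings, simple plural variants can be generated with a small helper before pasting new terms into the lexicon. This is a naive sketch; irregular plurals (such as “negroes”) still need to be added by hand.

def with_plural(term):
    # Naive pluralization; irregular forms must still be added manually.
    if term.endswith(("s", "x", "z", "ch", "sh")):
        return [term, term + "es"]
    return [term, term + "s"]

print(with_plural("alien"))    # ['alien', 'aliens']
print(with_plural("pioneer"))  # ['pioneer', 'pioneers']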

Added Terms

  • accomplished
  • alien
  • aliens
  • aptitude
  • authority
  • coolie
  • coolies
  • creative
  • creativity
  • deformed
  • deformity
  • distinctive
  • dynamic
  • dynamism
  • elite
  • enfeebled
  • girl
  • idiot
  • impressive
  • incisive
  • Indians
  • instrumental
  • interned
  • internment
  • invaluable
  • morons
  • negros
  • negroes
  • nigger
  • niggers
  • notable
  • pioneer
  • pioneers
  • pioneered
  • pioneering
  • plantation
  • planter
  • planters
  • relocate
  • relocated
  • reputation
  • reservation
  • squaw
  • riot
  • slaves
  • tremendous

See Appendix to download final lexicon.

Imperfections in the Lexicon

XML-Term-Detective does not search for phrases. Therefore, some language will be overlooked; for instance, aggrandizing language such as “father of,” “man of letters,” and “leading role.” Duke University’s script appears to handle phrases, so it may be worth taking a second run at our finding aids with this script.
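Until we try Duke’s script, the sketch below gives a rough sense of how phrase searching might work. This is not Duke University’s code; the phrase list and the “findingaids/” directory are placeholders.

import re
from pathlib import Path

# Placeholder phrase list; a real run would read phrases from a csv file.
phrases = ["father of", "man of letters", "leading role"]
pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)

for xml_file in Path("findingaids").glob("*.xml"):
    lines = xml_file.read_text(encoding="utf-8", errors="ignore").splitlines()
    for line_number, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            print(xml_file.name, line_number, match.group())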

Many slurs and much derogatory language are missing. There is some hate speech in the list, but it is not comprehensive. The University of Pittsburgh script searches for a longer list of slurs by pulling from hatebase.org. This is another script worth trying in the future. The slurs that I did include are listed because examples popped up when I searched for them in our finding aids database. Many of the slurs I included in the lexicon appear in old titles (for instance, in published song titles held in our jazz collections).

LGBTQ+ terms do not appear in the lexicon. The original Getty lexicon does not include any LGBTQ+ terms, and the University of Pittsburgh’s script uses only a couple. Duke University’s lexicon does include some LGBTQ+ terms, but not many. Through discussion with UChicago colleagues, I decided that a review of LGBTQ+ collections merited a different approach to reparative description planning. LGBTQ+ vocabulary is vast and evolving, and many derogatory terms have been reclaimed by some members of the LGBTQ+ community. Therefore, flagging LGBTQ+ terms via XML-Term-Detective may not be a useful activity. Additionally, words like “dyke” result in a lot of noise in the data because they are fairly common as surnames or parts of surnames. We need to be very careful not to misgender someone or misidentify someone’s sexuality when the historical record does not clearly spell out how a person identified themselves. (We similarly need to tread lightly around descriptions of a person’s race.) In this context, reaching for XML-Term-Detective calls to mind the adage “if your only tool is a hammer, then every problem looks like a nail.” A better approach to reparative description of LGBTQ-related collections would be to follow models such as the Metadata Best Practices for Trans and Gender Diverse Resources by the Trans Metadata Collective, and the Digital Transgender Archive, which uses the Homosaurus. We also need to ensure that LGBTQ+ collections in our care have finding aids, and that they are not under-described. This is important work, and a job that can’t easily be jump-started with a tool like XML-Term-Detective.

Bias Bias Bias! ⚠️

Finally, I acknowledge that any lexicon, including my own, may be fraught with unconscious bias. Any time we describe or categorize human beings we risk the pitfalls of our own narrow worldview and understanding of the human experience. Computer programs are made by humans and subject to bias. Tools like XML-Term-Detective are not magic bullets for reparative description. I am using XML-Term-Detective as a starting point only. Reparative description is complex, iterative work that cannot be accomplished with “Find and Replace.” Rather, it necessitates slowing down, listening carefully, and thinking critically. My hope for this project is that it helps UChicago and other institutions to begin this work in an efficient way and chart a logical path through a complex landscape.

Using XML-Term-Detective

Now let’s roll up our sleeves and get to work! To use XML-Term-Detective, you need to install Python on your computer. I followed Melanie Walsh’s instructions for installing Anaconda and launching JupyterLab. You also need to download XML-Term-Detective from GitHub.

Next, make sure you know where your lexicon is saved to your computer and that it is saved as a csv file.

Then, make sure that your finding aid xml files are saved to a directory that XML-Term-Detective can access.

After launching JupyterLab from Anaconda, open a terminal in JupyterLab.

In the terminal, type cd XML-Term-Detective

Then type python XMLTermDetective.py into the terminal.

launch-xml-term-detective.jpg

A GUI window opens!
xml-term-detective-gui.png

Follow Laura Schroffel’s instructions in the ReadMe file on GitHub for using XML-Term-Detective.

XML-Term-Detective produces two output files (csv format). One output lists the xml files in which red flag terms were located, the particular term that was located, the line number in the xml in which the term was located, and the text of that line. The other output shows how many times each red flag term was found in an xml file (including zero). See Appendix to download my two output files.
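For readers who prefer to inspect these outputs in Python rather than Excel, they can be loaded with pandas. The filenames below are placeholders; the column names follow the structure described above.

import pandas as pd

# Placeholder filenames for the two XML-Term-Detective outputs.
hits = pd.read_csv("xml-term-detective-output.csv")    # one row per located term
counts = pd.read_csv("xml-term-detective-counts.csv")  # per-file term counts, including zeros

print(hits.columns.tolist())
print(hits.head())
print(counts.head())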

Cleaning and Enhancing the Data

First, a disclaimer. I am a novice data wrangler. Someone with more data science experience would likely have savvier ways of cleaning and transforming this data.

I looked at the data first in Excel. If a term had zero hits in the “count” output, I removed it from the lexicon and ran the revised lexicon back through XML-Term-Detective. Nine terms were removed from my lexicon this way.
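A pandas alternative to scanning the count output in Excel, assuming a long-format counts file with FILE, WORD, and COUNT columns (placeholder names; adjust to match the actual output):

import pandas as pd

counts = pd.read_csv("xml-term-detective-counts.csv")
totals = counts.groupby("WORD")["COUNT"].sum()
zero_hit_terms = totals[totals == 0].index.tolist()
print(zero_hit_terms)  # candidate terms to drop from the lexicon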

I then uploaded the full output csv into a cloud-based data wrangling tool called Trifacta. As of this writing, you can get a 30-day free trial of Trifacta, and if you are a student you can extend that free trial for a year.

Instructions for importing data into Trifacta.

In Trifacta, I sorted the data alphabetically by term (the WORD column). Using the histogram at the top of the “word” column, I looked at which terms were found the most, and started with those. (I eventually went row by row through the entire output.)

trifacta.png

I deleted rows that were “noise” in the data. The following functions proved especially useful in my “recipe”:

  1. This function will delete rows that find specified terms in two columns:
    (Note: This function is case-sensitive so you need to search for uppercase and lowercase spellings.)

    Delete rows where MATCHES([{column name}], 'term or phrase|another term or phrase|another term or phrase', false) && MATCHES([{column name}], 'term', false)

    For example, XML-Term-Detective found many innocent words that include “psycho” such as “psychology,” “psychiatrist,” and “psychoanalyst.” I wanted to remove this noise from my data. So, I wrote my function this way:

    Delete rows where MATCHES([{LINE TEXT}], 'psychol|psychi|psychoa', false) && MATCHES([{WORD}], 'psycho', false)
  2. Certain terms of aggrandizement, such as “notable” and “popular,” create noise in the data. I am most concerned about terms of aggrandizement that appear in the Biog/Hist and Scope notes. I deleted any results where the line text contains unittitle or subject tags, as this indicates a folder heading, related resource, or subject heading containing the term, rather than a paragraph tag within a Biog/Hist or Scope Note containing the term.

    Delete rows where MATCHES([{LINE TEXT}], 'tag|another tag', false) && (MATCHES([{WORD}], 'term', false) || MATCHES([{WORD}], 'another term', false))

    For example,

    Delete rows where MATCHES([{LINE TEXT}], 'unittitle|subject', false) && (MATCHES([{WORD}], 'notable', false) || MATCHES([{WORD}], 'popular', false))
  3. If a collection was creating noise in the data, I eliminated it this way:

    Delete rows where MATCHES([FILE], 'filename', false) && MATCHES([{WORD}], 'term', false)

    For example, I knew that “Indian” in the Chandrasekhar papers referred to people from South Asia, not Native Americans. So, I wanted to eliminate that collection from the data when it was flagged for that term.

    Delete rows where MATCHES([FILE], 'ICU.SPCL.CHANDRASEKHAR', false) && MATCHES([{WORD}], 'Indian', false)

This was a time-intensive step, but worthwhile because it reduced my dataset from nearly 33,000 rows to a little over 11,000 rows (approximately two-thirds of my output was “noise”).
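For readers without access to Trifacta, the same kind of row deletion can be approximated in pandas. The sketch below mirrors the three recipe functions above; the input filename is a placeholder and the regular expressions are examples only.

import pandas as pd

df = pd.read_csv("xml-term-detective-output.csv")  # placeholder filename

# 1. Drop rows where "psycho" was flagged inside innocent words like "psychology."
noise = df["WORD"].str.contains("psycho", case=False, na=False) & df["LINE TEXT"].str.contains(
    "psychol|psychi|psychoa", case=False, na=False)
df = df[~noise]

# 2. Drop rows where "notable" or "popular" appear in unittitle or subject tags.
in_headings = df["WORD"].str.lower().isin(["notable", "popular"]) & df["LINE TEXT"].str.contains(
    "unittitle|subject", case=False, na=False)
df = df[~in_headings]

# 3. Drop a specific collection for a specific term, e.g. "Indian" in the Chandrasekhar papers.
chandrasekhar = df["FILE"].str.contains("ICU.SPCL.CHANDRASEKHAR", regex=False, na=False) & (
    df["WORD"].str.lower() == "indian")
df = df[~chandrasekhar]

df.to_csv("xml-term-detective-cleaned.csv", index=False)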

Before feeding my data into visualization software for analysis, I removed the “LINE NUMBER” and “LINE TEXT” columns.

I also categorized my red flag terms and broke the large dataset down into smaller datasets by category. This anticipates the need to do reparative description in phases utilizing different remediation techniques/approaches (first, we’ll tackle terms of aggrandizement, then we’ll look at ableist language, etc.). Categorizing some of the terms felt fraught with subjectivity (see “Bias Bias Bias!” above). However, breaking the data apart in this way resulted in clearer visualizations and made analysis easier.

I added category columns to my dataset (some words might fit in more than one category) so that I could see which types of “red flag” terms occur the most in our finding aids. I saved my cleaned and enhanced dataset as: xml-term-detective-enhanced.csv. See Appendix to download my cleaned and enhanced XML-Term-Detective output.
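A sketch of this categorization step in pandas follows. The mapping shows only a few terms (the full mapping would cover the whole lexicon), and the column and file names are assumptions based on the workflow described above.

import pandas as pd

# Partial, hand-built mapping from lexicon terms to categories.
categories = {
    "notable": ["Aggrandizement"],
    "girl": ["Sexism"],
    "alien": ["Race, Ethnicity, and Citizenship"],
    "internment": ["Incarceration and Forced Removal"],
    "plantation": ["Colonialism", "Slavery"],  # some terms fit more than one category
}

df = pd.read_csv("xml-term-detective-cleaned.csv")  # placeholder filename
df["CATEGORY1"] = df["WORD"].str.lower().map(lambda w: categories.get(w, [None])[0])
df["CATEGORY2"] = df["WORD"].str.lower().map(
    lambda w: categories[w][1] if len(categories.get(w, [])) > 1 else None)

# Write one csv per category for phased review.
for category, subset in df.groupby("CATEGORY1"):
    slug = category.lower().replace(",", "").replace(" ", "-")
    subset.to_csv(f"redflag-{slug}.csv", index=False)

df.to_csv("xml-term-detective-enhanced.csv", index=False)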

Categories

  • Ableism
  • Aggrandizement
  • Class
  • Colonialism
  • Incarceration and Forced Removal
  • Race, Ethnicity, and Citizenship
  • Sexism
  • Slavery

Analysis

Visualizing the data in Tableau and Neo4j helped me to see trends in the data, enabling me to make decisions about how to prioritize finding aids for reparative description.

Tableau

Tableau is a visual data analytics platform. I used a free trial of Tableau Desktop, saved the dashboards to my computer in case I ever have access to Tableau Desktop again, and uploaded the finished visualizations to Tableau Public. Tableau Public is free and can do most things that Tableau Desktop can do. The downside to Tableau Public is that you cannot export your visualizations or save your work locally to edit projects privately over time.

Tutorial: Get Started with Tableau Desktop

Resources for getting started with Tableau Public

Neo4j

Neo4j is a graph database platform. A graph database stores nodes and relationships instead of tables or documents. I used the Neo4j Desktop app (free one-year trial if you are a student, as of this writing) to visualize the “worst offender” finding aids (those with the highest red flag term counts). A zip file containing all of my “worst offender” Neo4j graphs (more than are rendered in this Notebook) is available for download in the Appendix.

Getting Started Guide for Neo4j

Start a Project in Neo4j and upload your data.

Neo4j uses its own query language, Cypher. (Pro tip: Cypher can easily break if something isn’t typed just right. For each of the Cypher commands that I used, I needed to delete and retype the double or single quotation marks in my command to get it to work.)

To load my data into Neo4j, I used the following cypher command:

LOAD CSV WITH HEADERS FROM "file:///xml-term-detective-enhanced.csv" AS row
MERGE (f:File {Filename: row.FILE})
MERGE (w:Word {RedflagTerm: row.WORD})
MERGE (c:Category {Type: row.CATEGORY1})
MERGE (f)-[rf:Has_redflag_term]->(w)
MERGE (w)-[ct:Has_category]->(c)
RETURN f, w, c, rf, ct

To visualize the red-flag terms and categories in a particular finding aid, I used the following command:
MATCH (f:File {Filename: 'ICU.SPCL.TAXSOL.xml'})-[:Has_redflag_term]->(w:Word)-[:Has_category]->(c:Category) RETURN f, w, c

To see all the red flag terms in a particular category:
MATCH (c:Category {Type: 'aggrandizement'})-[]-(w:Word) RETURN c, w

To see all of the finding aids that contain a particular red flag term:
MATCH (f:File)-[:Has_redflag_term]-(w:Word {RedflagTerm: 'girl'}) RETURN f, w

To see all of the red flag terms in a particular finding aid:
MATCH (f:File {Filename: 'ICU.SPCL.STEINERJ.xml'})-[:Has_redflag_term]-(w:Word) RETURN f, w

Overview of Tableau Visualizations for Full Dataset

I created visualizations for the full dataset and for the smaller sets of categorized data. I created Dashboards for each category, and then strung the Dashboards together into a Tableau Story. (A static PDF version of all Tableau visualizations is available for download in the Appendix.)

I created five visualizations of all “red flag” terms:

  1. A histogram showing the number of times a “red flag” term appears across all of UChicago’s finding aids. Terms are arranged from highest count to lowest. (A rough pandas approximation of this chart is sketched after this list.)
  2. A histogram showing the number of “red flag” terms in each category. (Not in Tableau Story)
  3. A bar graph showing all “red flag” terms from highest count to lowest count, and the finding aids that contain that term.
  4. The same bar graph as above, excluding the top three “red flag” terms, “Indian,” “Indians,” and “Mrs.” These terms far outweigh other terms, and excluding them helped me to see other problem areas.
  5. A histogram showing finding aids arranged from “worst offender” (highest term count) to “least offender” (lowest term count), excluding those flagged for the terms “Indian,” “Indians,” and “Mrs.”
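For anyone without Tableau access, the first chart above can be roughly approximated in pandas and matplotlib, assuming each row of the cleaned output represents one occurrence of a term (filenames are placeholders):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("xml-term-detective-enhanced.csv")  # placeholder filename
term_counts = df["WORD"].value_counts()  # occurrences per term, highest to lowest

term_counts.plot(kind="bar", figsize=(14, 5), title="Red flag terms across all finding aids")
plt.tight_layout()
plt.savefig("redflag-term-counts.png")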

These visualizations indicate that we have the most review work to do for finding aids with language about race, ethnicity, and citizenship. The terms “Indian” and “Indians” make up 71% of the race, ethnicity, and citizenship category. Sexist language is the next-highest category, with the term “Mrs.” accounting for 80% of that category. Terms of aggrandizement are the third-highest category.

If “Indian,” “Indians,” and “Mrs.” are excluded, the top three categories remain the same, but terms of aggrandizement take the lead, followed by race, ethnicity, and citizenship terms, then sexist language.

The following analyses reflect the second histogram and are arranged by category from those with the highest red flag term counts to those with the lowest term counts.

Race, Ethnicity, Citizenship

I created five Tableau visualizations of race, ethnicity, and citizenship terms:

  1. A histogram showing the number of times a “red flag” term in this category appears across all UChicago finding aids.
  2. A bubble graph showing the top terms in this category across all UChicago finding aids.
  3. A bar graph showing terms in this category from highest count to lowest, and the finding aids that contain them.
  4. A histogram showing finding aids arranged from “worst offender” in this category (highest term count) to “least offender” (lowest term count).
  5. A histogram showing finding aids arranged from “worst offender” to “least offender” in this category, with “Indian” and “Indians” excluded.



The terms “Indian,” “Indians,” “negro,” “negroes,” “oriental,” and “primitive” account for 90% of the terms in this category.

The Sol Tax Papers finding aid contains the highest counts of “Indian” and “Indians,” and the most terms in this category overall.

TAXSOL.png

Many finding aids from our Native American Educational Services collections account for some of the highest instances of “Indian” and “Indians,” which is unsurprising.

The Julius Rosenwald Papers finding aid has the highest counts of “negro” and “negroes.”