Social Network Analysis

A Framework for Unlocking and Linking WWII Japanese American Incarceration Biographical Data

Creator: Emily Ping O'Brien

In this notebook, we examine relationships and events at Tule Lake using social network analysis. Social network analysis studies “the behavior of the individual at the micro level, the pattern of relationships (network structure) at the macro level, and the interactions between the two. Social networks are both the cause of and result of individual behavior” [1].

For this case study, social network analysis involves extracting data from WRA records and applying graph algorithms found within networking tools. NetworkX, an open source Python package used to create and study dynamic structures and functions of complex networks [2], was selected for its ability to integrate graphing and modeling functions directly in Jupyter notebooks. Another open source Python graphing library, Plotly, was explored to analyze the same data and verify findings.

The approach of combining the NetworkX framework with Plotly visualizations in Jupyter notebooks provides a model that students and researchers can apply to their own archival science research.

Creators' Note

The creators of this notebook recognize the records used in this project were created and provided by the US government and therefore do not accurately convey the lives and experiences of the incarcerated Japanese Americans in the WWII US Concentration Camps. We also recognize some information from the federal records might contain personal, sensitive, or damaging information. Our work seeks to respect the privacy of individuals and their families, and approach this time in US history with humility and a willingness to learn.

Social Network Models using the NetworkX Python library

The first step involves pulling data from the transcribed National Archives "Internal Security Case Reports" Incident Cards. Although the cards did not record every piece of information consistently, whatever was available was transcribed and entered into the Incident Card dataset used in this notebook. To work with the data in Python, we create a dataframe from the imported dataset.

In [47]:
#Import Incident Card dataset and create Python dataframe

#Import python libraries and functions for the Social Network Analysis project
!pip install openpyxl
import pandas as pd
import networkx
import matplotlib.pyplot as plt
import numpy as np
import math
import plotly.express as px
import plotly.graph_objects as go

from IPython.display import display, HTML

# Bokeh functions to create interactive network visualizations
from bokeh.io import output_notebook, show, save
from bokeh.models import Range1d, Circle, ColumnDataSource, MultiLine
from bokeh.plotting import figure
from bokeh.plotting import from_networkx

#Read the Incident Card csv file into a pandas dataframe
tlincard = pd.read_csv("WRA_incard_2021.csv",dtype=object,na_values=[],keep_default_na=False)
Requirement already satisfied: openpyxl in /opt/conda/lib/python3.8/site-packages (3.0.7)
Requirement already satisfied: et-xmlfile in /opt/conda/lib/python3.8/site-packages (from openpyxl) (1.1.0)

To simplify things, we make a few changes to the dataframe. We rename fields, convert the values in certain fields to strings, and then change those values to all lowercase letters. This helps us compare the data more effectively and avoid issues where data was recorded inconsistently (for example, the same individual's name with inconsistent capitalization).

In [48]:
#Rename dataframe fields, change all values to lowercase letters, make all values strings
pd.set_option('display.max_colwidth', None)

#Rename columns
tlincard.rename(columns={'NEW-DATE':'nd','CASE#':'case','Other':'other','NEW-OFFENSE':'noffense','OFFENSE':'offense','Image#':'inum', 'NAME':'name','Last Name':'lname', 'First Name':'fname','Other Names (known as)':'oname'}, inplace = True)

#Declare values as strings
tlincard['case'] = tlincard['case'].astype(str)
tlincard['other'] = tlincard['other'].astype(str)
tlincard['noffense'] = tlincard['noffense'].astype(str)
tlincard['offense'] = tlincard['offense'].astype(str)
tlincard['inum'] = tlincard['inum'].astype(str)
tlincard['lname'] = tlincard['lname'].astype(str)
tlincard['fname'] = tlincard['fname'].astype(str)
tlincard['oname'] = tlincard['oname'].astype(str)
tlincard['name'] = tlincard['name'].astype(str)

#Update values to all lowercase letters
tlincard['case'] = tlincard['case'].str.casefold().astype('category')
tlincard['noffense'] = tlincard['noffense'].str.casefold().astype('category')
tlincard['offense'] = tlincard['offense'].str.casefold().astype('category')
tlincard['inum'] = tlincard['inum'].str.casefold().astype('category')
tlincard['lname'] = tlincard['lname'].str.casefold().astype('category')
tlincard['fname'] = tlincard['fname'].str.casefold().astype('category')
tlincard['oname'] = tlincard['oname'].str.casefold().astype('category')
tlincard['name'] = tlincard['name'].str.casefold().astype('category')

#Create copies of the data frame
tlincard_v1 = tlincard.copy()
tlincard_v2 = tlincard.copy()
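
The effect of the lowercase step can be illustrated on a small toy dataframe. This is a minimal sketch: the names below are hypothetical placeholders, not records from the Incident Card dataset, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample rows; these names are placeholders, not records from the dataset
sample = pd.DataFrame({"name": ["Doe, Jane", "DOE, JANE", "doe, jane", "Roe, Sam"]})

# Before normalization, each capitalization variant counts as a separate value
distinct_before = sample["name"].nunique()

# casefold() lowercases the text, so the variants collapse into a single value
sample["name"] = sample["name"].astype(str).str.casefold()
distinct_after = sample["name"].nunique()

print(distinct_before, distinct_after)
```

After normalization, a comparison such as `sample["name"] == "doe, jane"` matches all three variants of the same name, which is the behavior the later examples rely on.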

Example 1: Incidents involving Incarcerated Individual

To view all incidents involving a particular individual at the Tule Lake Camp, we create a simple graph containing edges and nodes from the dataframe using the NetworkX .from_pandas_edgelist and .draw functions. In this example, we view all reported incidents involving Tetsuo Abe during his time at the camp. The center node represents the incarceree, and the outer nodes contain the descriptions of the reported incidents.

In [3]:
#Slice the dataframe, selecting cards where the name of the incarcerated individual is Tetsuo Abe
model1 = tlincard_v1.loc[(tlincard_v1["name"]=='abe, tetsuo')]
model1 = networkx.from_pandas_edgelist(model1,'name','other')

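#Plot simple graph of all reported incidents involving Tetsuo Abe

#Differentiate node colors and sizes based on whether the value of the node is equal to "abe, tetsuo"
#(the specific colors and sizes below are assumed values)
color_map = []
node_sizes = []

for node in model1:
    if node == "abe, tetsuo":
        #Center node: the incarcerated individual
        color_map.append('orange')
        node_sizes.append(600)
    else:
        #Outer nodes: incident descriptions
        color_map.append('lightblue')
        node_sizes.append(200)
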
#Add axes functions to display title
ax = plt.gca()
ax.set_title('Incidents involving Incarcerated Individual')

#Draw nodes and edges, indicate font size
networkx.draw(model1, with_labels=True, node_color=color_map, node_size=node_sizes, width=.3, font_size=10, ax=ax)
_ = ax.axis('off')
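
Before drawing, it can help to confirm that the edge list produced the expected star shape, with the individual at the center and one outer node per distinct incident description. The sketch below uses a hypothetical edge list with placeholder values rather than the Incident Card data.

```python
import pandas as pd
import networkx

# Hypothetical edge list: one individual linked to three incident descriptions
edges = pd.DataFrame({
    "name": ["doe, jane", "doe, jane", "doe, jane"],
    "other": ["incident a", "incident b", "incident c"],
})

# Each row becomes one edge between the 'name' value and the 'other' value
toy_graph = networkx.from_pandas_edgelist(edges, "name", "other")

# The individual sits at the center, so its degree equals the number of distinct incidents
center_degree = toy_graph.degree("doe, jane")
outer_nodes = sorted(n for n in toy_graph if n != "doe, jane")

print(center_degree, outer_nodes)
```

Because a NetworkX graph stores each node only once, cards that repeat the same description collapse into a single outer node, so the center node's degree counts distinct descriptions rather than cards.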

Interactive Visualization using Bokeh

The same graph can be made interactive using the Bokeh library. Hovering over a node displays its label.

In [5]:
# Function to display Bokeh interactive visualizations

#Slice the dataframe, selecting cards where the name of the incarcerated individual is Tetsuo Abe
model1 = tlincard_v1.loc[(tlincard_v1["name"]=='abe, tetsuo')]
model1 = networkx.from_pandas_edgelist(model1,'name','other')

#Title of graph
title = 'Incidents involving Incarcerated Individual'

#Categories that will appear when hovering over each node
HOVER_TOOLTIPS = [("Notes","@index")]

#Create plot - set dimensions, toolbar, and title
plot = figure(tooltips = HOVER_TOOLTIPS, 
              tools="pan,wheel_zoom,save,reset", active_scroll='wheel_zoom',
              x_range = Range1d(-10.1, 10.1), y_range=Range1d(-10.1, 10.1), title=title)

network_graph = from_networkx(model1, networkx.spring_layout, scale=10, center=(0, 0))
network_graph.node_renderer.glyph = Circle(size=15, fill_color='violet')
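network_graph.edge_renderer.glyph = MultiLine(line_alpha=0.5, line_width=1)

#Add the network graph to the plot and display it inline in the notebook
plot.renderers.append(network_graph)
output_notebook()
show(plot)
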

Example 2: Individuals Involved in 11/4/43 "Riot"

In the next example, we focus on the date November 4th, 1943, when a large congregation of incarcerated Japanese Americans at Tule Lake was deemed a “riot” by government officials. On that day, as groups began to gather, the project director of the incarceration camp called in the army and the inmates “were picked up by WRA internal security and savagely beaten before being turned over to the military police and imprisoned in ‘bullpen’ area of the hastily assembled stockade” [3].

Fig. 1

Newspaper article about protests at Tule Lake Camp in November 1943, which led to martial law being in place for three months. The article describes camp administration refusing to listen to incarceree demands and violence against staff.

Revolt at Tule Lake: Crop workers blamed for riots; Honolulu Japanese led the mob, 1943-11-04

*Note.* Willard E. Schmidt Papers, Courtesy of San Jose State University Library Special Collections and Archives, photograph, accessed from Densho Digital Repository. [4]

Almost all of the Incident Cards were given a category for the type of reported incident. To view all cards, and therefore all individuals, related to the "riot" event, we locate rows in the dataframe where the category is equal to "riot". Since we know the date of the event, we also target the records where the recorded date is 11/4/43 (stored as 43-11-04 in the dataset). From what we know about this incident, we expect the graph to be quite large. Based on this expectation, we can increase the figure size, decrease the size of the nodes, and reduce the font size for the labels.

In [6]:
#Use incident card dataset and select cards where the offense category is "riot" and the date is 11/4/43
model2 = tlincard_v1.loc[(tlincard_v1["noffense"]=='riot') & (tlincard_v1["nd"]=='43-11-04')]
model2 = networkx.from_pandas_edgelist(model2,'noffense','name')

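#Plot simple graph of all individuals reportedly involved in the "riot"

#Differentiate node colors and sizes based on whether the value of the node is "riot"
#(the colors, sizes, and figure size below are assumed values)
color_map = []
node_sizes = []

for node in model2:
    if node == "riot":
        #Center node: the incident category
        color_map.append('orange')
        node_sizes.append(300)
    else:
        #Outer nodes: incarcerated individuals
        color_map.append('lightblue')
        node_sizes.append(50)

#Increase the figure size to fit the large graph
plt.figure(figsize=(12, 12))
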
#Add axes functions to display title
ax = plt.gca()
ax.set_title("Incarcerated Individuals Reportedly Involved in 'Riot'")

#Draw nodes and edges, indicate font and edge color
networkx.draw(model2, with_labels=True, node_color=color_map, node_size=node_sizes, edge_color = 'dimgrey', width=.1, font_size=6, ax=ax)
_ = ax.axis('off')
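
The two-condition filter used for this graph can be checked on a small toy dataframe before plotting. This is a minimal sketch with hypothetical placeholder rows; only the column names (name, noffense, nd) and the date format follow the notebook.

```python
import pandas as pd

# Hypothetical cards; values are placeholders, not records from the dataset
cards = pd.DataFrame({
    "name": ["doe, jane", "roe, sam", "poe, alex", "doe, jane"],
    "noffense": ["riot", "riot", "theft", "riot"],
    "nd": ["43-11-04", "43-11-04", "43-11-04", "43-12-01"],
})

# Each condition yields a boolean Series; & keeps rows where both are True
is_riot = cards["noffense"] == "riot"
on_date = cards["nd"] == "43-11-04"
matches = cards.loc[is_riot & on_date]

# The number of distinct individuals gives a rough idea of how crowded the graph will be
n_individuals = matches["name"].nunique()

print(n_individuals)
```

Counting the distinct names first is one way to judge, before drawing, whether the figure size, node size, and label font size need adjusting.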