This module focuses on manipulating the geographical data collected in part 1 to explore a variety of structures for visualizing spatial data.
The actions taken in part 1 to locate the place of origin, assembly center, camp relocations, residence at Tule Lake, and final movement for George Kuratomi were repeated for the other 24 selected individuals and aggregated into an Excel spreadsheet, which can be seen below. Latitude and longitude coordinates were then added for all five movements for each of the 25 individuals. A separate spreadsheet, also included below, was created in Excel to structure the latitude and longitude coordinates in a format that will allow us to map out the paths of each person. ***Note: The paths spreadsheet could be created in Python, but that would require substantial reshaping of the data; for ease, the data was formatted in an Excel spreadsheet instead.
To begin working with the geographical data, save the spreadsheets as comma-separated values (.csv) files, then read them into the Jupyter notebook following the same process as in part 1. This process can be seen below. **Note: Excel files can also be read in; which format to use is a matter of personal preference.
# Import libraries used for dataframe (table-like) operations, and numeric data structure operations
import pandas as pd
import numpy as np
# The below command will read your file into your notebook
fullstackeddf = pd.read_csv('python-fullmovements-stacked.csv',dtype=object,na_values=[],keep_default_na=False)
pathsdf = pd.read_csv('python-paths.csv',dtype=object,na_values=[],keep_default_na=False)
Spatial data is geographic information about the earth that typically references a specific geospatial area or location. To perform any kind of spatial analysis, a dataset must include latitude and longitude coordinates. Additional fields such as name, city, state, and dates open the door to other visualizations and provide more context about the dataset.
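Before plotting, it can help to confirm that the coordinates are actually plottable. The sketch below uses toy rows shaped like the table in this module (the column names `lat` and `long` are taken from it); because the CSV was read with `dtype=object`, the coordinates arrive as text and must be converted first.

```python
import pandas as pd

# Toy rows mimicking the movements spreadsheet used in this module
df = pd.DataFrame({
    'name': ['george kuratomi', 'george kuratomi'],
    'lat': ['32.7157', '41.8931'],
    'long': ['-117.1611', '-121.3735'],
})

# Coordinates were read in as text (dtype=object), so convert to numbers first
lat = pd.to_numeric(df['lat'], errors='coerce')
long = pd.to_numeric(df['long'], errors='coerce')

# Valid latitudes fall in [-90, 90] and longitudes in [-180, 180]
valid = lat.between(-90, 90) & long.between(-180, 180)
print(valid.all())
```

Rows that fail this check (typos, swapped columns, blank cells) would surface here before they silently disappear from, or distort, a map.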
The headers for this dataset are name, lat, long, city, state, order, dates, fid, abbrev, Notes, iso_alpha, and iso_no, as shown below.
# The below command shows the first ten rows of the dataset
fullstackeddf.head(10)
name | lat | long | city | state | order | dates | fid | abbrev | Notes | iso_alpha | iso_no | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | george kuratomi | 32.7157 | -117.1611 | san diego | california | origin | 1 | CA | USA | 840 | ||
1 | george kuratomi | 34.1333 | -118.0333 | santa anita | california | assembly | 1942-10-30 | 1 | CA | USA | 840 | |
2 | george kuratomi | 33.3833 | -91.4667 | jerome | arkansas | first camp | 1943-09-26 | 1 | AR | USA | 840 | |
3 | george kuratomi | 41.8931 | -121.3735 | tule lake | california | second camp | 1943-09-30 | 1 | CA | USA | 840 | |
4 | george kuratomi | 40.7300 | -77.9380 | pennsylvania | pennsylvania | final departure | 1946-01-10 | 1 | PA | terminal departure with grant | USA | 840 |
5 | tom (yoshio) kobayashi | 34.0522 | -118.2437 | los angeles | california | origin | 2 | CA | USA | 840 | ||
6 | tom (yoshio) kobayashi | 34.1404 | -118.0442 | santa anita | california | assembly | 1942-09-04 | 2 | CA | USA | 840 | |
7 | tom (yoshio) kobayashi | 44.5167 | -109.0501 | heart mountain | wyoming | first camp | 1943-09-27 | 2 | WY | USA | 840 | |
8 | tom (yoshio) kobayashi | 41.8814 | -121.3556 | tule lake | california | second camp | 1943-09-30 | 2 | CA | USA | 840 | |
9 | tom (yoshio) kobayashi | 46.8000 | -100.7833 | north dakota | north dakota | final departure | 1945-02-11 | 2 | ND | terminal internment | USA | 840 |
To view all reported movements of one person, we can use pandas' contains function to return rows for that particular individual. The data is already structured in a way that makes it easy to explore and plot locations for other individuals or groups.
# The contains function can pull results specific to a name
kuratomi = fullstackeddf[fullstackeddf['name'].str.contains('kuratomi')]
kuratomi
name | lat | long | city | state | order | dates | fid | abbrev | Notes | iso_alpha | iso_no | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | george kuratomi | 32.7157 | -117.1611 | san diego | california | origin | 1 | CA | USA | 840 | ||
1 | george kuratomi | 34.1333 | -118.0333 | santa anita | california | assembly | 1942-10-30 | 1 | CA | USA | 840 | |
2 | george kuratomi | 33.3833 | -91.4667 | jerome | arkansas | first camp | 1943-09-26 | 1 | AR | USA | 840 | |
3 | george kuratomi | 41.8931 | -121.3735 | tule lake | california | second camp | 1943-09-30 | 1 | CA | USA | 840 | |
4 | george kuratomi | 40.7300 | -77.9380 | pennsylvania | pennsylvania | final departure | 1946-01-10 | 1 | PA | terminal departure with grant | USA | 840 |
Similarly, as illustrated above, the contains function can also return results for distinct cities, states, orders, and dates.
This is particularly useful for exploring and viewing the data through a different lens, especially if you want to analyze where individuals or groups were on a particular date or at a particular location. In the example table below, the contains function was used to pull rows whose state contains 'california'. When mapped, the result will show points for individuals wherever their location (point of origin, assigned assembly center, first and/or second incarceration center, or final departure state) was in California.
# The below command displays values that contain california
california = fullstackeddf[fullstackeddf['state'].str.contains('california')]
california
name | lat | long | city | state | order | dates | fid | abbrev | Notes | iso_alpha | iso_no | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | george kuratomi | 32.7157 | -117.1611 | san diego | california | origin | 1 | CA | USA | 840 | ||
1 | george kuratomi | 34.1333 | -118.0333 | santa anita | california | assembly | 1942-10-30 | 1 | CA | USA | 840 | |
3 | george kuratomi | 41.8931 | -121.3735 | tule lake | california | second camp | 1943-09-30 | 1 | CA | USA | 840 | |
5 | tom (yoshio) kobayashi | 34.0522 | -118.2437 | los angeles | california | origin | 2 | CA | USA | 840 | ||
6 | tom (yoshio) kobayashi | 34.1404 | -118.0442 | santa anita | california | assembly | 1942-09-04 | 2 | CA | USA | 840 | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
118 | yukio kobayashi | 41.8936 | -121.3678 | tule lake | california | second camp | 1943-09-30 | 24 | CA | USA | 840 | |
120 | kazuo uneda | 33.8910 | -118.3010 | gardena | california | origin | 25 | CA | USA | 840 | ||
121 | kazuo uneda | 38.5737 | -121.4945 | sacramento (walerga) | california | assembly | 1942-06-21 | 25 | CA | USA | 840 | |
122 | kazuo uneda | 41.8904 | -121.3721 | tule lake | california | first camp | 25 | CA | USA | 840 | ||
123 | kazuo uneda | 41.8936 | -121.3590 | tule lake | california | second camp | 25 | CA | USA | 840 |
74 rows × 12 columns
So far we've only looked at and constructed tables that plot points on a map. An alternative approach is to create tables that show the relative size, or clustering, of given variables.
Clustering data to view relative sizes supports surface-level analysis and can give us a better understanding of where large groups were concentrated at each point in time.
To view the number of individuals at each location across all movements, we can apply the value_counts method introduced in part 1 to return counts of unique values. As seen below, the list does include a few states in the city column, such as California, Pennsylvania, North Dakota, New Mexico, and Hawaii. This was done deliberately for a few of the 25 individuals because missing data in the FAR made their final departure city unclear. Additionally, if those cells had been left blank in the spreadsheet, they would be counted together as a single unique (empty) value when value_counts is performed, and in this specific case we do not want that value included.
# The below command returns count values of the cities
fullstackeddf['city'].value_counts()
tule lake               30
jerome                   9
california               8
sand island              7
new mexico               7
santa anita              6
oahu                     6
topaz                    5
los angeles              4
heart mountain           3
pennsylvania             3
sacramento (walerga)     3
sacramento               3
north dakota             2
salinas                  2
fresno                   2
tanforan                 2
manzanar                 2
hawaii                   2
gardena                  2
san francisco            2
tokyo                    2
none                     1
terminal island          1
menlo park               1
artesia                  1
rohwer                   1
auburn                   1
pomona                   1
tulare                   1
poston                   1
gila river               1
san diego                1
garden grove             1
waikele                  1
Name: city, dtype: int64
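The point about blank cells is easy to see with toy data. Because the CSV was read with keep_default_na=False, empty cells arrive as empty strings, and value_counts would tally them as their own category; the sketch below (a small illustration, not part of the module's dataset) shows that, and one way to exclude them.

```python
import pandas as pd

# Toy 'city' column: two blank cells stand in for missing departure cities
cities = pd.Series(['tule lake', 'tule lake', 'jerome', '', ''])

# With keep_default_na=False, blanks are empty strings and get counted
with_blanks = cities.value_counts()
print(with_blanks)  # the '' category appears with a count of 2

# Dropping the empty strings first keeps the counts to real cities
counts = cities[cities != ''].value_counts()
print(counts)
```

This is why the spreadsheet filled those cells with state names instead of leaving them blank.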
Once the unique value counts are retrieved, the values need to be appended (i.e., added) to the table. This can be achieved using pandas' groupby function.
# The below command counts the rows for each city and stores the result in a new column titled counts
fullstackeddf['counts'] = fullstackeddf.groupby(['city'])['order'].transform('count')
fullstackeddf.head()
name | lat | long | city | state | order | dates | fid | abbrev | Notes | iso_alpha | iso_no | counts | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | george kuratomi | 32.7157 | -117.1611 | san diego | california | origin | 1 | CA | USA | 840 | 1 | ||
1 | george kuratomi | 34.1333 | -118.0333 | santa anita | california | assembly | 1942-10-30 | 1 | CA | USA | 840 | 6 | |
2 | george kuratomi | 33.3833 | -91.4667 | jerome | arkansas | first camp | 1943-09-26 | 1 | AR | USA | 840 | 9 | |
3 | george kuratomi | 41.8931 | -121.3735 | tule lake | california | second camp | 1943-09-30 | 1 | CA | USA | 840 | 30 | |
4 | george kuratomi | 40.7300 | -77.9380 | pennsylvania | pennsylvania | final departure | 1946-01-10 | 1 | PA | terminal departure with grant | USA | 840 | 3 |
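On toy data we can confirm that transform('count') produces the same numbers as value_counts, just broadcast back onto every row so the new column lines up with the original table. This is a small illustrative sketch, not the module's dataset.

```python
import pandas as pd

# Toy frame mirroring the city/order columns used above
df = pd.DataFrame({
    'city': ['tule lake', 'tule lake', 'jerome'],
    'order': ['second camp', 'second camp', 'first camp'],
})

# transform('count') returns one value per row (its group's size),
# unlike value_counts, which returns one value per unique city
df['counts'] = df.groupby('city')['order'].transform('count')
print(df)

# Each row's count equals the value_counts tally for its city
vc = df['city'].value_counts()
assert (df['counts'] == df['city'].map(vc)).all()
```

Using transform rather than a separate lookup table is what lets the counts column be attached directly to the existing dataframe.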
By using the paths dataset we can spatially view and analyze the movement of a person or group in a unique way. Mapping paths lets us connect the points plotted on the map and visually see the routes and the distances between locations. We can also use the paths data to identify if and where individual paths cross, giving us a glimpse of where individuals might have met or at what point families were separated from one another.
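One way to look for crossing paths, sketched below on toy rows shaped like the paths file (the merge-on-shared-leg approach is an assumption of ours, not a step from the original module), is to merge two individuals' rows on the loc1/loc2 pair: any surviving row is a leg both people traveled.

```python
import pandas as pd

# Toy paths rows using the loc1/loc2 column names from this module's paths file
paths = pd.DataFrame({
    'name': ['george kuratomi', 'george kuratomi', 'singer terada'],
    'loc1': ['santa anita', 'jerome', 'santa anita'],
    'loc2': ['jerome', 'tule lake', 'jerome'],
})

# Split out two individuals, then merge on the shared leg (loc1 -> loc2)
a = paths[paths['name'].str.contains('kuratomi')]
b = paths[paths['name'].str.contains('terada')]
shared = a.merge(b, on=['loc1', 'loc2'], suffixes=('_a', '_b'))
print(shared[['name_a', 'name_b', 'loc1', 'loc2']])
```

Here the merge keeps the santa anita-to-jerome leg, which both people traveled; comparing the dates columns on such rows would show whether they moved at the same time.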
As seen below, the contains function can be used to extract path data for one or more persons. The pandas operator "|" (aka OR) combines two conditions so that rows matching either one are returned.
# The below command will return path results for Kuratomi
kuratomipaths = pathsdf[pathsdf['name'].str.contains('kuratomi')]
kuratomipaths
startlat | startlong | endlat | endlong | name | loc1 | loc2 | uid | dates | year | iso_alpha | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 32.7157 | -117.1611 | 34.1333 | -118.0333 | george kuratomi | san diego | santa anita | 1 | 1942-10-30 | 1945 | US-CA |
1 | 34.1333 | -118.0333 | 33.3833 | -91.4667 | george kuratomi | santa anita | jerome | 1 | 1943-09-26 | 1943 | US-CA |
2 | 33.3833 | -91.4667 | 41.8931 | -121.3735 | george kuratomi | jerome | tule lake | 1 | 1943-09-30 | 1943 | US-AR |
3 | 41.8931 | -121.3735 | 40.7300 | -77.9380 | george kuratomi | tule lake | pennsylvania | 1 | 1946-01-10 | 1946 | US-CA |
# The below contains function will allow for searching through the data for two separate variables
kuratomiandterada = pathsdf[pathsdf['name'].str.contains('kuratomi')| pathsdf['name'].str.contains('terada')]
kuratomiandterada
startlat | startlong | endlat | endlong | name | loc1 | loc2 | uid | dates | year | iso_alpha | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 32.7157 | -117.1611 | 34.1333 | -118.0333 | george kuratomi | san diego | santa anita | 1 | 1942-10-30 | 1945 | US-CA |
1 | 34.1333 | -118.0333 | 33.3833 | -91.4667 | george kuratomi | santa anita | jerome | 1 | 1943-09-26 | 1943 | US-CA |
2 | 33.3833 | -91.4667 | 41.8931 | -121.3735 | george kuratomi | jerome | tule lake | 1 | 1943-09-30 | 1943 | US-AR |
3 | 41.8931 | -121.3735 | 40.7300 | -77.9380 | george kuratomi | tule lake | pennsylvania | 1 | 1946-01-10 | 1946 | US-CA |
24 | 34.0430 | -118.2190 | 34.1396 | -118.0430 | singer terada | los angeles | santa anita | 7 | 1942-10-30 | 1942 | US-CA |
25 | 34.1396 | -118.0430 | 33.6284 | -91.3957 | singer terada | santa anita | jerome | 7 | 1943-09-15 | 1943 | US-CA |
26 | 33.6284 | -91.3957 | 41.8866 | -121.3575 | singer terada | jerome | tule lake | 7 | 1943-09-19 | 1943 | US-CA |
27 | 41.8866 | -121.3575 | 40.6230 | -77.8520 | singer terada | tule lake | pennsylvania | 7 | 1946-01-10 | 1946 | US-AZ |
In this second module, I have shown how to use pandas' contains function to search for and pull values from the datasets, combining conditions with the "|" (aka OR) operator. We used the value_counts function to return the number of individuals at each city in our dataset, which lets us view the concentration of groups. Additionally, we filtered the paths dataset to view results for George Kuratomi as well as Singer Terada, which will be saved and used in part 3.
In the following module, we will look at how to use the datasets we created outside of the notebook as well as the data that we processed and prepared in this module to create spatial and graph visualizations.
# The below command lets us save the modified dataframes into a new output csv file.
# This can be useful when using these files for further steps of processing.
kuratomiandterada.to_csv('kuratomiandterada.csv', index=False)
The module below is organized as a sequential set of Python notebooks that allow us to interact with the collections related to the Framework for Unlocking and Linking WWII Japanese American Incarceration Biographical Data, and to explore, clean, prepare, visualize, and analyze them from a historical-context perspective.