Exploring Data from WRA Form 26

The first dataset we will examine comes from Form 26, a record created by the War Relocation Authority (WRA) at the beginning of the Japanese American internment. These are records of individual and family "evacuations" to internment camps. The file WRAForm26.csv was supplied by Densho, an organization dedicated to "preserving, educating, and sharing the story of World War II-era incarceration of Japanese Americans".

Our first step in exploring this data is to load the file into a Pandas data frame.

In [19]:

# Loading the Data
import pandas as pd

# Read the CSV file into a Pandas data frame:
data_Form26 = pd.read_csv("Datasets/WRAForm26.csv")

Now that the data is loaded, the code below will show you the first three rows of index card data, as printed out by the Pandas head() function.

In [20]:

# Show the first three rows
data_Form26.head(3)

Out[20]:

	LastName	FirstName	BirthYear
0	AANAGAWA	MARY	1927
1	AAWATO	HISAKICH	1889
2	ABBEY	ROY	1905

Next we call a few other Pandas data frame functions and attributes. For more information about Pandas data frames, see the documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/frame.html

In [21]:

data_Form26.describe()  # basic summary statistics (numeric columns only by default)

Out[21]:

	BirthYear
count	109192.000000
mean	1912.535131
std	18.914068
min	1851.000000
25%	1897.000000
50%	1918.000000
75%	1926.000000
max	1946.000000
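The summary above covers only BirthYear because describe() skips text columns by default when a numeric column is present. The sketch below shows how the include="all" argument brings the name columns into the summary. It uses a small stand-in frame (the hypothetical name `sample`) built from a few rows that appear in this notebook's head() and tail() output, so the numbers it prints do not describe the full file.

```python
import pandas as pd

# Stand-in frame (hypothetical name `sample`) built from six rows that
# this notebook displays; the real file has 109,192 rows.
sample = pd.DataFrame({
    "LastName": ["AANAGAWA", "AAWATO", "ABBEY", "ZORIKI", "ZORIKI", "ZUICHO"],
    "FirstName": ["MARY", "HISAKICH", "ROY", "MIKE", "JUDY", "FUMIO"],
    "BirthYear": [1927, 1889, 1905, 1922, 1942, 1887],
})

# Default behaviour: only the numeric column is summarized
numeric_summary = sample.describe()

# include="all" adds count / unique / top / freq rows for the text columns
full_summary = sample.describe(include="all")
print(full_summary)
```

On the real data, the same call would be data_Form26.describe(include="all"); the "top" and "freq" rows would then report the most common surname and given name in the file.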

In [22]:

data_Form26.ndim  # how many dimensions are there?

Out[22]:

2

In [23]:

data_Form26.shape  # how many rows and columns are there? (length in each dimension)

Out[23]:

(109192, 3)
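The relationship between ndim and shape can be checked directly: ndim is simply the number of entries in the shape tuple. The following is a minimal sketch on a three-row stand-in frame (the hypothetical name `sample`, using the rows shown by head(3) earlier), not on the full file.

```python
import pandas as pd

# Stand-in frame (hypothetical name `sample`) with the three rows from head(3)
sample = pd.DataFrame({
    "LastName": ["AANAGAWA", "AAWATO", "ABBEY"],
    "FirstName": ["MARY", "HISAKICH", "ROY"],
    "BirthYear": [1927, 1889, 1905],
})

# shape is a (rows, columns) tuple, so it can be unpacked
n_rows, n_cols = sample.shape
print(f"{n_rows} rows x {n_cols} columns, ndim = {sample.ndim}")

# A single column is a one-dimensional Series, so its shape has one entry
birth_years = sample["BirthYear"]
print(birth_years.ndim, birth_years.shape)
```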

In [24]:

data_Form26.dtypes  # what data types are detected in each column?

Out[24]:

LastName     object
FirstName    object
BirthYear     int64
dtype: object
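The detected types can also be used to split the columns into numeric and text groups, which is handy before applying numeric-only operations. This is one possible sketch using select_dtypes() on a three-row stand-in frame; the names `sample`, `numeric_cols`, and `text_cols` are illustrative and not part of the notebook.

```python
import pandas as pd

# Stand-in frame (hypothetical name `sample`) with the three rows from head(3)
sample = pd.DataFrame({
    "LastName": ["AANAGAWA", "AAWATO", "ABBEY"],
    "FirstName": ["MARY", "HISAKICH", "ROY"],
    "BirthYear": [1927, 1889, 1905],
})

# Columns whose dtype is numeric (here, the integer BirthYear column)
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Everything else is treated as text in this sketch
text_cols = [col for col in sample.columns if col not in numeric_cols]

print("numeric:", numeric_cols)
print("text:", text_cols)
```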

You can also summarize the data from individual columns, like this:

In [25]:

data_Form26['BirthYear'].value_counts()  # Counts for each distinct value in "BirthYear" column

Out[25]:

1921    3847
1923    3507
1922    3486
1920    3274
1924    3100
1925    3088
1926    2732
1919    2701
1927    2428
1918    2397
1917    2251
1928    2242
1916    2098
1915    2095
1929    2012
1930    1856
1942    1779
1914    1761
1931    1760
1941    1729
1932    1665
1888    1600
1940    1566
1913    1545
1939    1521
1938    1507
1934    1502
1900    1484
1933    1445
1937    1442
        ... 
1907     837
1909     813
1908     805
1876     672
1875     569
1874     414
1873     348
1872     311
1871     203
1870     169
1869     153
1868     105
1867     101
1866      64
1865      54
1864      38
1863      18
1861      15
1859      13
1862      11
1860       9
1857       6
1856       5
1858       3
1852       3
1854       2
1855       1
1851       1
1946       1
1943       1
Name: BirthYear, Length: 93, dtype: int64
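By default value_counts() orders the result from most to least frequent, which makes it hard to read as a timeline. A possible variation, sketched below on six birth years taken from rows shown in this notebook, sorts the counts chronologically with sort_index() and also groups the years into decades. The variable names are illustrative, and the counts printed here describe only the six-value sample, not the full file.

```python
import pandas as pd

# Six birth years from rows displayed in this notebook (not the full column)
birth_years = pd.Series([1927, 1889, 1905, 1922, 1942, 1887], name="BirthYear")

# Same counts as value_counts(), but ordered by year instead of by frequency
counts_by_year = birth_years.value_counts().sort_index()

# Integer division drops the last digit: 1927 -> 1920, 1889 -> 1880, ...
decade = (birth_years // 10) * 10
counts_by_decade = decade.value_counts().sort_index()

print(counts_by_decade)
```

On the real data, the same steps would start from data_Form26['BirthYear'] in place of the sample series.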

In [26]:

data_Form26.tail(3)  # tail(3) is the opposite of head(3) and shows the last 3 rows

Out[26]:

	LastName	FirstName	BirthYear
109189	ZORIKI	MIKE	1922
109190	ZORIKI	JUDY	1942
109191	ZUICHO	FUMIO	1887

My Dataset Notes (WRA Form 26)

(This area provided for you to record your own notes on the dataset.)

In [18]:

# Add your code here and create more cells if needed.

# Hint: Your first step is to load the FAR CSV file into another data frame.

My Dataset Notes (FAR)

(your notes)