The first dataset we will examine comes from a record (Form 26) created by the War Relocation Authority (WRA) at the beginning of the Japanese American internment. These are records of individual and family "evacuations" to internment camps. The file WRAForm26.csv was supplied by Densho, an organization dedicated to "preserving, educating, and sharing the story of World War II-era incarceration of Japanese Americans".
Our first step in exploring this data is to load the file into a Pandas data frame.
# Loading the Data
import pandas as pd
# Read the CSV file into a Pandas data frame:
data_Form26 = pd.read_csv("Datasets/WRAForm26.csv")
Now that the data is loaded, the code below will show you the first three rows of index card data, as printed out by the Pandas head() function.
# Show the first three rows
data_Form26.head(3)
| | LastName | FirstName | BirthYear |
|---|---|---|---|
| 0 | AANAGAWA | MARY | 1927 |
| 1 | AAWATO | HISAKICH | 1889 |
| 2 | ABBEY | ROY | 1905 |
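The loading step can also be tried without the Densho file. The sketch below is only an illustration, assuming a CSV with the same three columns: the in-memory text `sample_csv` and the names `sample_df` and `first_two` are made up for this example, and the data is just the three rows shown above.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for WRAForm26.csv (the three rows shown above).
sample_csv = io.StringIO(
    "LastName,FirstName,BirthYear\n"
    "AANAGAWA,MARY,1927\n"
    "AAWATO,HISAKICH,1889\n"
    "ABBEY,ROY,1905\n"
)

# read_csv accepts a file-like object as well as a file path.
sample_df = pd.read_csv(sample_csv)

# head(n) returns the first n rows as a new data frame.
first_two = sample_df.head(2)
print(first_two)
```

Because read_csv treats the first line as the header by default, the column names come straight from the file and the row index (0, 1, 2) is generated automatically.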
Next we call a few other Pandas data frame functions. For more information about Pandas data frames, see the Pandas documentation.
data_Form26.describe() # summary statistics for the numeric columns (here, BirthYear)
| | BirthYear |
|---|---|
| count | 109192.000000 |
| mean | 1912.535131 |
| std | 18.914068 |
| min | 1851.000000 |
| 25% | 1897.000000 |
| 50% | 1918.000000 |
| 75% | 1926.000000 |
| max | 1946.000000 |
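With default arguments, describe() on a frame that mixes text and numeric columns reports only the numeric ones, which is why only BirthYear appears above. The sketch below shows one way the name columns could be summarized as well. It assumes a small hand-built frame (`sample_df`, made from the six rows that head(3) and tail(3) print in this section) rather than the full dataset, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical sample: the six rows printed by head(3) and tail(3) in this section.
sample_df = pd.DataFrame({
    "LastName": ["AANAGAWA", "AAWATO", "ABBEY", "ZORIKI", "ZORIKI", "ZUICHO"],
    "FirstName": ["MARY", "HISAKICH", "ROY", "MIKE", "JUDY", "FUMIO"],
    "BirthYear": [1927, 1889, 1905, 1922, 1942, 1887],
})

# Default behaviour: numeric columns only.
numeric_stats = sample_df.describe()

# include="all" adds count/unique/top/freq rows for the text columns.
all_stats = sample_df.describe(include="all")
print(all_stats)
```

In the combined table, statistics that do not apply to a column (for example, the mean of LastName) are shown as NaN.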
data_Form26.ndim # how many dimensions are there?
2
data_Form26.shape # how many rows and columns are there? (length in each dimension)
(109192, 3)
data_Form26.dtypes # what data types are detected in each column?
LastName     object
FirstName    object
BirthYear     int64
dtype: object
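The detected data types can also be used to pick out columns. As a minimal sketch, assuming the same kind of hand-built sample frame (`sample_df` is a made-up name, filled with rows printed elsewhere in this section), select_dtypes() separates the numeric column from the name columns:

```python
import pandas as pd

# Hypothetical sample with the same three columns as the Form 26 data.
sample_df = pd.DataFrame({
    "LastName": ["AANAGAWA", "AAWATO", "ABBEY"],
    "FirstName": ["MARY", "HISAKICH", "ROY"],
    "BirthYear": [1927, 1889, 1905],
})

# Columns whose dtype is numeric (int64, float64, ...).
numeric_cols = sample_df.select_dtypes(include="number").columns.tolist()

# Everything else, here the two name columns.
text_cols = sample_df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols, text_cols)
```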
You can also summarize the data from individual columns, like this:
data_Form26['BirthYear'].value_counts() # Counts for each distinct value in the "BirthYear" column
1921    3847
1923    3507
1922    3486
1920    3274
1924    3100
1925    3088
1926    2732
1919    2701
1927    2428
1918    2397
1917    2251
1928    2242
1916    2098
1915    2095
1929    2012
1930    1856
1942    1779
1914    1761
1931    1760
1941    1729
1932    1665
1888    1600
1940    1566
1913    1545
1939    1521
1938    1507
1934    1502
1900    1484
1933    1445
1937    1442
...
1907     837
1909     813
1908     805
1876     672
1875     569
1874     414
1873     348
1872     311
1871     203
1870     169
1869     153
1868     105
1867     101
1866      64
1865      54
1864      38
1863      18
1861      15
1859      13
1862      11
1860       9
1857       6
1856       5
1858       3
1852       3
1854       2
1855       1
1851       1
1946       1
1943       1
Name: BirthYear, Length: 93, dtype: int64
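value_counts() orders its result by frequency, so the years above are not in chronological order. The sketch below shows two possible follow-ups: sorting the counts by year and grouping birth years into decades. It assumes a tiny hand-built series (`birth_years`, using the six birth years printed by head(3) and tail(3) in this section), so the counts are illustrative and not results from the full dataset.

```python
import pandas as pd

# Hypothetical sample: the six birth years printed by head(3) and tail(3).
birth_years = pd.Series([1927, 1889, 1905, 1922, 1942, 1887], name="BirthYear")

# sort_index() reorders the counts chronologically instead of by frequency.
counts_by_year = birth_years.value_counts().sort_index()

# Integer-divide by 10 and multiply back to map each year to its decade (1927 -> 1920).
decades = (birth_years // 10) * 10
counts_by_decade = decades.value_counts().sort_index()

print(counts_by_decade)
```

Applied to the full column, the same pattern would use `data_Form26['BirthYear']` in place of `birth_years`.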
data_Form26.tail(3) # tail(3) is the opposite of head(3) and shows the last 3 rows
| | LastName | FirstName | BirthYear |
|---|---|---|---|
| 109189 | ZORIKI | MIKE | 1922 |
| 109190 | ZORIKI | JUDY | 1942 |
| 109191 | ZUICHO | FUMIO | 1887 |
(This area provided for you to record your own notes on the dataset.)
# Add your code here and create more cells if needed.
# Hint: Your first step is to load the FAR CSV file into another data frame.
(your notes)