#!/usr/bin/env python
# coding: utf-8

# # Step 3: Legacy of Slavery Certificate of Freedom - Data Visualization and Analysis
# ### Computing the Legacy of Slavery: Applying Computational Thinking to an Archival Dataset
# * **Student Contributors:** K. Sarah Ostrach, Natalie Salive, Olivia Isaacs
# * **Faculty Mentor:** Richard Marciano
# * **Community Mentor:** Ryan Cox (Maryland State Archives)
# * **Source Available:** https://github.com/cases-umd/Legacy-of-Slavery
# * **License:** [Creative Commons - Attribution 4.0 Intl](https://creativecommons.org/licenses/by/4.0/)
# * [Lesson Plan for Instructors](./lesson-plan.ipynb)
# * **Related Publications:**
#   * **IEEE Big Data 2019 CAS Workshop:** [A Case Study in Creating Transparency in Using Cultural Big Data: The Legacy of Slavery Project](https://dcicblog.umd.edu/cas/wp-content/uploads/sites/13/2018/12/12.Cox_-2.pdf)
# * **More Information:**
#   * **SAA Outlook March/April 2019:** [Turning Data into People in Maryland's Slave Records](https://twitter.com/archivists_org/status/1116132520255479809)
#
# We organized the data preparation step around [David Weintrop’s model of computational thinking](https://link.springer.com/content/pdf/10.1007%2Fs10956-015-9581-5.pdf), and we documented this step using a [questionnaire](TNA_Questionnaire.ipynb) developed by The National Archives, London, UK.
#
# ![CT-STEM taxonomy](taxonomy.png "David W.'s CT Taxonomy")
#
# ### **C**omputational Thinking Practices
# * Data Practices
#   * Visualizing Data
#   * Manipulating Data
# * Systems Thinking Practices
#   * Thinking in Levels
#
# ### **E**thics and Values Considerations
# * Historical and Cultural Context Based Exploration and Cleaning
# * Understanding the sensitivity of the data
#
# ### **A**rchival Practices
# * Digital Records and Access Systems
#
# ### Learning Goals
# A step-by-step understanding of applying computational thinking practices to a digitally archived Maryland State Archives Legacy of Slavery dataset collection.

# In[3]:

import pandas as pd
import networkx
import matplotlib.pyplot as plt
import numpy as np

# In[2]:

# Re-import the CSV saved from the previous step (Step 2)
df = pd.read_csv("Datasets/LoS_Prep_Output.csv")
print(df.head(10))

# In[7]:

# Install the libraries used below to plot charts and maps
get_ipython().system('pip install bokeh')
get_ipython().system('pip install cufflinks plotly')

# In[8]:

from bokeh.io import output_notebook, show, save

# In[11]:

# Standard plotly imports
# import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode

# Using plotly + cufflinks in offline mode
import cufflinks
cufflinks.go_offline(connected=True)
init_notebook_mode(connected=True)

# In[12]:

output_notebook()

# In[25]:

# Build a graph whose nodes are freed-person and owner first names,
# with an edge for each certificate linking the two
LoS_CoF = networkx.from_pandas_edgelist(df, 'Freed_FirstName', 'Owner_FirstName')

# In[26]:

from bokeh.io import output_notebook, show, save
from bokeh.models import Range1d, Circle, ColumnDataSource, MultiLine
from bokeh.plotting import figure
from bokeh.plotting import from_networkx

# In[27]:

# Choose a title!
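
# In[ ]:

# A minimal sketch of what `networkx.from_pandas_edgelist` does, using a
# toy DataFrame (illustrative names only, not drawn from the real dataset):
# each unique name becomes a node and each row becomes an undirected edge,
# so a name that appears on several certificates gets a higher degree.

toy = pd.DataFrame({
    'Freed_FirstName': ['Harriet', 'Samuel', 'Harriet'],
    'Owner_FirstName': ['John', 'John', 'Mary'],
})
toy_graph = networkx.from_pandas_edgelist(toy, 'Freed_FirstName', 'Owner_FirstName')

print(toy_graph.number_of_nodes())       # 4 unique names
print(toy_graph.number_of_edges())       # 3 certificate rows
print(dict(toy_graph.degree())['John'])  # 2: 'John' appears in two rows
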
title = 'Legacy Of Slavery Certificates of Freedom - Enslaved, Owner, Witness Network'

# Establish which categories will appear when hovering over each node
HOVER_TOOLTIPS = [("Freed_FirstName", "@index")]

# Create a plot: set dimensions, toolbar, and title
plot = figure(tooltips=HOVER_TOOLTIPS,
              tools="pan,wheel_zoom,save,reset", active_scroll='wheel_zoom',
              x_range=Range1d(-10.1, 10.1), y_range=Range1d(-10.1, 10.1),
              title=title)

# Create a network graph object with spring layout
# https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.drawing.layout.spring_layout.html
network_graph = from_networkx(LoS_CoF, networkx.spring_layout, scale=10, center=(0, 0))

# Set node size and color
network_graph.node_renderer.glyph = Circle(size=15, fill_color='skyblue')

# Set edge opacity and width
network_graph.edge_renderer.glyph = MultiLine(line_alpha=0.5, line_width=1)

# Add the network graph to the plot
plot.renderers.append(network_graph)

show(plot)
#save(plot, filename=f"{title}.html")

# In[18]:

# Interactive histogram of the 'Sex' column (uses cufflinks' iplot)
df['Sex'].iplot(kind='hist')

# In[3]:

# Save the output file
df.to_csv('Datasets/LoS_Viz_Output.csv', index=False)

# # Future Steps:
#
# As next steps, we plan to explore linking the data collections so that we can build networks of connected data elements and surface insights not seen or understood before. We would also like to apply natural language processing to features such as notes and comments, since the transcribers entered valuable information there, and to research in more depth the reasons and rationale behind the different words used for “Prior Status” and “Complexion”.
#
# ## Questions to ponder and resources:
# Questions from the discussion related to the collection:
# * Were the scars used for identification purposes, that is, to determine which enslaved person belonged to which owner?
# * There was a spike in the number of Certificates of Freedom (CoF) from 1831 to 1832, and issuance ceased around the year 1860. Is this because slavery was coming to an end?
# * What is the significance of the differences in the prior status column? There are many records, including “Born Free”, “Free Born”, “Slave”, “Enslaved”, and “Descendant of a white female woman”. Are there differences between these statuses?
# * Skin complexion is very subjective, so how should we divide and classify the multiple different skin tones recorded?
#
# Resources to read:
# * A Guide to the History of Slavery in Maryland (MSA), in particular the sections:
#   * III. Africans to African Americans
#   * VI. Slavery and Freedom in the New Nation

# In[ ]:
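
# In[ ]:

# A sketch of how the 1831-1832 spike in certificates could be checked by
# counting certificates per year. The 'Date' column name and its values
# here are hypothetical stand-ins; the real dataset's date column may be
# named or formatted differently.

toy = pd.DataFrame({'Date': ['1831-03-01', '1832-07-15', '1831-11-20', '1858-01-02']})
per_year = pd.to_datetime(toy['Date']).dt.year.value_counts().sort_index()
print(per_year.to_dict())  # {1831: 2, 1832: 1, 1858: 1}

# per_year.plot(kind='bar') would chart the year-by-year trend with matplotlib
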