#!/usr/bin/env python
# coding: utf-8

# ## Exploratory Data Analysis and Visualization

# The goal of __exploratory data analysis (EDA)__ is to explore attributes across multiple entities to decide what statistical or machine learning techniques to apply to the data. Visualizations are used to assist in understanding the data.

# In[15]:

# loads the pandas library
import pandas as pd
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)  # Ignore Pandas future warnings

# creates data frame named df by reading in the Baltimore csv
df = pd.read_csv("manipulated_baltimore_data.csv")
df.head(n=3)


# The `.describe()` method summarizes a data frame column. The data type of `max_building_age` is currently 'object', which in pandas usually indicates strings, so we first have to convert this attribute to a numeric type before `describe()` can report numeric statistics.

# In[16]:

df['max_building_age'].describe()


# Once `max_building_age` is converted to a numeric type, `describe()` provides __summary statistics__ for this attribute.

# In[17]:

# converts max_building_age to numeric type
df["max_building_age"] = pd.to_numeric(df["max_building_age"])
df['max_building_age'].describe()
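

# The difference between describing an 'object' column and a numeric column can be reproduced on a small example. This is only a minimal sketch: the series `toy` and its values are hypothetical and are not taken from the Baltimore data.

# In[ ]:

import pandas as pd

# hypothetical values stored as strings, with dtype forced to 'object'
toy = pd.Series(["12", "45", "45", "80"], dtype="object", name="max_building_age")

# on an object column, describe() reports count, unique, top and freq
toy_summary_before = toy.describe()

# after conversion, describe() reports count, mean, std, min, quartiles and max
toy_numeric = pd.to_numeric(toy)
toy_summary_after = toy_numeric.describe()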


# We can apply the same operations to `max_annual_income`.

# In[18]:

df['max_annual_income'].describe()


# In[19]:

df['max_annual_income'] = pd.to_numeric(df['max_annual_income'])
df['max_annual_income'].describe()
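

# If a column contained entries that cannot be parsed as numbers, `pd.to_numeric` would raise an error by default. The notebook does not say whether this happens in the Baltimore data, so the following is only a hedged sketch on hypothetical values: passing `errors="coerce"` turns unparseable entries into `NaN` instead.

# In[ ]:

import pandas as pd

# hypothetical income values; "n/a" cannot be parsed as a number
toy_income = pd.Series(["100", "250", "n/a"], dtype="object", name="max_annual_income")

# errors="coerce" replaces unparseable entries with NaN rather than raising
toy_income_clean = pd.to_numeric(toy_income, errors="coerce")

# count how many entries could not be converted
n_unparsed = toy_income_clean.isna().sum()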


# Finally, we create some plots of our data. A __scatter plot__ and a __bar chart__ are shown below.
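
# As a rough illustration of these two chart types, here is a minimal static sketch using matplotlib. It is not the notebook's own interactive figures: the data frame `toy_df`, its values, and the `grade` column are hypothetical and made up for this example.

# In[ ]:

import matplotlib.pyplot as plt
import pandas as pd

# hypothetical data, not taken from the Baltimore csv
toy_df = pd.DataFrame({
    "grade": ["A", "A", "B", "B", "C", "C"],
    "max_building_age": [10, 20, 30, 40, 50, 60],
    "max_annual_income": [9000, 8000, 6000, 5000, 3000, 2000],
})

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# scatter plot: one point per row, building age against annual income
ax_scatter.scatter(toy_df["max_building_age"], toy_df["max_annual_income"])
ax_scatter.set_xlabel("max_building_age")
ax_scatter.set_ylabel("max_annual_income")

# bar chart: mean annual income for each (hypothetical) grade
mean_income = toy_df.groupby("grade")["max_annual_income"].mean()
ax_bar.bar(list(mean_income.index), mean_income.values)
ax_bar.set_xlabel("grade")
ax_bar.set_ylabel("mean max_annual_income")

fig.tight_layout()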

# In[20]:
# ### Exercise 4
# > 1. Hover over different points and explore their additional characteristics. __Note__:`INHABITANTS_F/N` should be multiplied by 100 to be a percent.
# 2. The different points are clustered by grades. Which clusters have the most variation?
# 3. How does `BUILDINGS_Construction` vary across the different points?
# 4. Can you identify a trend overall?

# In[21]:
# ### Exercise 5
# > 1. Recall the preparations done to `INHABITANTS_Foreignborn`. How might these have influenced the outcomes?
# 2. What can you learn from this graph?
# 3. What do you learn about the different grades?

# In[22]:
# ### Exercise 6
# > 1. What can you learn from this graph?
# 2. What are some explanations for the outcomes?
# 3. What can you learn about the different grades?
# 4. Compared to the previous graph, what are the similarities and differences?