#!/usr/bin/env python
# coding: utf-8

# # Data Overview
# Discuss the original sources of this data and any important issues or cautions related to the way this data was created.
#
# If this data is sourced from scanned images, then it may help to include an example image here. This gives students insight into how the data was originally collected in raw form.

# In[1]:


# Loading the Data
import pandas as pd

# Read the CSV file into a Pandas data frame:
df = pd.read_csv("mydata.csv")

# Show the first three rows, then summary statistics.
# A notebook cell only displays the value of its last expression,
# so both results are printed explicitly.
print(df.head(n=3))
print(df.describe())


# After you run the cell above, you will see the first three rows returned by the Pandas head() function, followed by the summary statistics produced by the describe() function. For more information about Pandas data frame functions, see the documentation, for example the page for DataFrame.duplicated():
# * https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html

# ## Dataset Exploration
# To explore the dataset further, we can use Pandas functions and the raw row data to discover the following information:
#
# * variable names
# * number of rows
# * number of missing values per variable
# * numeric variables: mean, max, min
# * categorical variables: levels, count
# * check for duplicate columns
#
# **Activity**: Explore this dataset using Pandas data frame functions and identify the information above. Record what you discover in the Markdown cell below.

# # My Dataset Notes
# (This area provided for students to record their notes on the dataset.)

# In[ ]:
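
# The exploration checklist above could be gathered in one pass with a small helper. This is only a sketch under stated assumptions, not the required solution to the activity: the function name `explore` and the keys of the dictionary it returns are hypothetical, and every non-numeric column is treated as categorical, which may not suit every dataset.

# In[ ]:


import pandas as pd


def explore(frame):
    """Collect the Dataset Exploration checklist items for one data frame.

    Hypothetical helper for illustration; it is not part of Pandas.
    """
    # Assumption: any column that is not numeric is treated as categorical.
    numeric = frame.select_dtypes(include="number")
    categorical = frame.select_dtypes(exclude="number")

    # A column counts as a duplicate when its values repeat an earlier column.
    duplicate_mask = frame.T.duplicated().to_numpy()

    return {
        "variable_names": list(frame.columns),
        "n_rows": len(frame),
        "missing_per_variable": frame.isna().sum().to_dict(),
        "numeric_summary": {
            name: {
                "mean": numeric[name].mean(),
                "max": numeric[name].max(),
                "min": numeric[name].min(),
            }
            for name in numeric.columns
        },
        "categorical_levels": {
            name: categorical[name].value_counts().to_dict()
            for name in categorical.columns
        },
        "duplicate_columns": list(frame.columns[duplicate_mask]),
    }


# With the data frame loaded earlier, calling explore(df) would return a dictionary whose entries can be copied into the notes cell. Transposing the frame to look for duplicate columns is simple but can be slow on very wide or very long datasets.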