Data Overview

Discuss the original sources of this data and any important issues or cautions related to the way this data was created.

If this data is sourced from scanned images, it may help to include an example image here. This gives students insight into how the data was originally collected in raw form.

In [1]:

# Loading the Data
import pandas as pd

# Read the CSV file into a Pandas data frame:
df = pd.read_csv("mydata.csv")

# Show the first three rows
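print(df.head(n=3))

# Show summary statistics for the numeric columns.
# Both results are printed explicitly because a notebook cell
# only displays the value of its last expression on its own.
print(df.describe())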

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-e89869e48309> in <module>()
      3 
      4 # Read the CSV file into a Pandas data frame:
----> 5 df = pd.read_csv("mydata.csv")
      6 
      7 # Show the first three rows

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    438 
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441 
    442     if chunksize or iterator:

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    785             self.options['has_index_names'] = kwds['has_index_names']
    786 
--> 787         self._make_engine(self.engine)
    788 
    789     def close(self):

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1706         kwds['usecols'] = self.usecols
   1707 
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709 
   1710         passed_names = self.names is None

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'mydata.csv' does not exist

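After you run the cell above, you will see the first three rows of the data frame, as returned by the Pandas head() function, followed by the summary statistics produced by the describe() function. If "mydata.csv" is not in the working directory, read_csv() raises a FileNotFoundError instead, so replace the filename with the path to your own data. For more information about Pandas data frame functions, such as duplicated(), see the documentation: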

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html
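
A FileNotFoundError from read_csv() means the file was not found at the given path. The following is a minimal sketch of checking for the file before loading it. The helper name load_dataset is hypothetical and not part of Pandas; the filename is the one used in the loading cell.

```python
from pathlib import Path

import pandas as pd


def load_dataset(csv_path):
    """Hypothetical helper: read a CSV file, or return None if it is missing."""
    path = Path(csv_path)
    if not path.exists():
        # Report the absolute path so a wrong working directory is easy to spot.
        print(f"Cannot find {path.resolve()}; check the filename and working directory.")
        return None
    return pd.read_csv(path)


# "mydata.csv" is the filename from the loading cell; substitute your own path.
df = load_dataset("mydata.csv")
if df is not None:
    print(df.head(n=3))
```

Returning None keeps the notebook running so the message can be read without a long traceback; raising the original error would be an equally reasonable choice.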

Dataset Exploration

To explore the dataset in more depth, we can use Pandas functions and inspect the raw rows to discover the following information:

variable names
number of rows
number of missing values per variable
numeric variables: mean, max, min
categorical variables: levels, counts
check for duplicate columns
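
As a possible starting point, each item in the list above maps onto a standard Pandas call. The sketch below uses a small made-up data frame in place of the real dataset; the column names size, color, and size_copy are invented for illustration, and size_copy deliberately repeats size so that the duplicate check has something to find.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Made-up example data standing in for the real dataset.
df = pd.DataFrame({
    "size": [1.0, 2.5, np.nan, 4.0],
    "color": ["red", "blue", "red", None],
    "size_copy": [1.0, 2.5, np.nan, 4.0],
})

# Variable names and number of rows
variable_names = list(df.columns)
n_rows = len(df)

# Number of missing values per variable
missing_per_variable = df.isna().sum()

# Numeric variables: mean, max, min
numeric_summary = df.select_dtypes(include="number").agg(["mean", "max", "min"])

# Categorical variables: levels and the count of each level
levels = df["color"].dropna().unique()
level_counts = df["color"].value_counts()

# Duplicate columns: compare every pair of columns for identical contents
duplicate_pairs = [
    (a, b) for a, b in combinations(df.columns, 2) if df[a].equals(df[b])
]

print(variable_names, n_rows)
print(missing_per_variable)
print(numeric_summary)
print(level_counts)
print(duplicate_pairs)
```

The pairwise comparison is easy to read but compares every pair of columns, so it is best suited to datasets with a modest number of columns.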

Activity: Explore this dataset using Pandas data frame functions and identify the information above. Record what you discover in the Markdown cell below.

My Dataset Notes

(This area provided for students to record their notes on the dataset.)

In [ ]: