Overview of this week's notebook:

- use pandas to read a CSV file
- learn how to delete rows

In [None]:
import pandas as pd

In [None]:
# load a data file
# note the relative filepath! where is this file located?
# note the dtype argument in the first commented-out line: FIPS codes should be read as strings, otherwise pandas guesses int
# df = pd.read_csv('data/social_explorer.csv', dtype={'GEOID10':str})
# df = pd.read_csv('data/social_explorer.csv')
df = pd.read_csv('../../data/se_essentials.csv')


In [None]:
df.shape

In [None]:
df.head()

In [None]:
# I want to see all the data
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [None]:
df.sample()

In [None]:
# look at the data types
df.dtypes

Wow, that's a lot of objects! Shouldn't they be strings and numbers? Why is this?

And oops! That first row is not a data row. Let's get rid of it.
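To see why a single stray label row can turn every column into `object`, here is a minimal sketch on a tiny made-up file. The column names `FIPS` and `POP` and all the values are assumptions for illustration only, not the contents of the real dataset.

```python
import io
import pandas as pd

# Hypothetical two-column file: line 1 is the real header, line 2 is a
# descriptive label row, and the remaining lines are data.
csv_text = """FIPS,POP
Geo ID,Total Population
06037,100
06059,200
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Collect the columns that pandas could not parse as numbers.
# The text in the label row forces each whole column to be read as text.
non_numeric = [col for col in sample.columns
               if not pd.api.types.is_numeric_dtype(sample[col])]
print(non_numeric)
```

Both columns end up in the list, even though most of their values look like numbers.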

Identify the row you want to drop by its index label. In this case it is `0`, which we pass to `drop` as the list `[0]`.

In [None]:
# drop returns a new DataFrame; df itself is unchanged unless you assign the result
df.drop([0])
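A small sketch of what `drop` does and does not do, using a made-up frame (the column names and values are hypothetical). It suggests two things worth noticing: the original frame keeps all its rows, and the surviving values are still text.

```python
import pandas as pd

# Hypothetical frame whose first row is a label row rather than data
demo = pd.DataFrame({'FIPS': ['Geo ID', '06037', '06059'],
                     'POP': ['Total Population', '100', '200']})

# drop returns a new DataFrame without the row labelled 0
dropped = demo.drop([0])

print(demo.shape)      # the original still has all three rows
print(dropped.shape)   # the copy has one row fewer

# Removing the label row does not convert the remaining text to numbers
pop_is_numeric = pd.api.types.is_numeric_dtype(dropped['POP'])
print(pop_is_numeric)
```

To keep the result you would need to assign it, for example `demo = demo.drop([0])`, and the columns would still need converting afterwards.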

Or... we can re-import the data, skipping the second line of the file (`skiprows=[1]` counts lines starting from 0). This is better. Why?

In [None]:
df = pd.read_csv('data/social_explorer.csv', skiprows=[1])
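As a rough illustration of what `skiprows=[1]` changes, the sketch below reads a tiny made-up file (the `NAME` and `POP` columns and their values are assumptions). Because the label line never reaches the parser, pandas is free to infer a numeric type for the number column.

```python
import io
import pandas as pd

# Hypothetical file: header on line 0, label row on line 1, then data
csv_text = """NAME,POP
Area Name,Total Population
Alpha,100
Beta,200
"""

# skiprows counts lines of the file from 0, so [1] removes the label row
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

print(len(skipped))

# With the label row gone, POP is parsed as numbers and can be summed
pop_is_numeric = pd.api.types.is_numeric_dtype(skipped['POP'])
print(pop_is_numeric)
print(skipped['POP'].sum())
```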

In [None]:
df.head()

Waaaaaiiit... what happened to the leading "0" in the FIPS code?

That's right, pandas automatically makes decisions on import, and interprets "FIPS" as an integer column, thus dropping the leading zeroes. Let's fix this by re-re-importing the data, this time specifying the FIPS column as a string:

In [None]:
df = pd.read_csv('data/social_explorer.csv',
                 skiprows=[1],
                 dtype={'FIPS': str})
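The effect of the `dtype` argument can be seen in isolation with a minimal sketch. The two FIPS-style codes and the `POP` column below are made-up values for illustration.

```python
import io
import pandas as pd

# Hypothetical file with codes that start with a zero
csv_text = """FIPS,POP
06037,100
06059,200
"""

# Default import: pandas infers integers, so the leading zero is lost
as_default = pd.read_csv(io.StringIO(csv_text))

# Same text, but FIPS is explicitly read as a string
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'FIPS': str})

print(as_default['FIPS'].tolist())   # [6037, 6059]
print(as_text['FIPS'].tolist())      # ['06037', '06059']
```

Only the column named in the `dtype` dictionary is affected; the other columns are still inferred as usual.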

In [None]:
df.head()

In [None]:
df.dtypes

In [None]:
df.sample(5)

In [None]:
# output the columns
df.columns

In [None]:
# if there are too many, try outputting as a list:
list(df)

In [None]:
# how many non-null records are in each column?
df.count()

In [None]:
# access a single column like df['col_name']
df['Median Household Income (In 2018 Inflation Adjusted Dollars)'].head(100)