# Socioeconomics and Your Diet
**by Yvonne King**

**Quick Notebook Reference**

1. Project Plan
2. Acquire Data
3. Prepare Data
4. Exploration
5. Modeling
6. Conclusions

## Project Plan

**Acquisition, Prep, and Initial Exploration**

- Collect all files
- Create a dataframe using pandas for each file
- Clean and prepare the data so it can be aggregated and the dataframes merged together
- Remove/repair erroneous data
- Look at the shape of the data

**Exploration**
- Answer the following question:
    > Do more affluent neighborhoods have better restaurant choices?

## Imports

In [1]:
import warnings
warnings.filterwarnings("ignore")

# Data manipulation
import pandas as pd

# Data visualization
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothesis testing
from math import sqrt
from scipy import stats

# Project modules (acquire.py and prepare.py)
import acquire
import prepare

## Wrangle
**Acquire**

The ```acquire.py``` file contains the functions that load each dataset into a dataframe.


**Prepare**

Functions in the ```prepare.py``` file handle all of the steps below:
1. Drop erroneous columns
2. Add the following calculated fields:
    - avg_gross_income
    - avg_total_income
3. Update all datatypes
4. Update postal codes so that all are 5 digits
5. Combine all data into one dataframe
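
The calculated fields and the postal-code fix can be sketched as follows. This is a minimal illustration, not the code in ```prepare.py```: the helper name `add_income_fields` is hypothetical, the formulas are inferred from the sample rows in the IRS output (amount columns such as `A00100` appear to be in thousands of dollars, so dividing by the number of returns and multiplying by 1,000 reproduces the averages), and zero-padding with `str.zfill` is an assumed way to get 5-digit codes.

```python
import pandas as pd


def add_income_fields(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pad zip codes and add per-return income averages."""
    out = df.copy()
    # Zip codes read as integers lose their leading zeros (5068 -> "05068").
    out["zipcode"] = out["zipcode"].astype(str).str.zfill(5)
    # Assumption: amount columns are in thousands of dollars, so scale by 1000.
    out["avg_gross_income"] = out["A00100"] / out["N1"] * 1000
    out["avg_total_income"] = out["A02650"] / out["N02650"] * 1000
    return out


# Two rows copied from the IRS sample output, without the calculated fields.
sample = pd.DataFrame({
    "zipcode": [5068, 86327],
    "N1": [1570, 5020],
    "A00100": [77483.0, 273293.0],
    "A02650": [78876.0, 276717.0],
    "N02650": [1570, 5020],
})
prepared = add_income_fields(sample)
print(prepared[["zipcode", "avg_gross_income", "avg_total_income"]])
```

Storing the zip code as a string keeps the leading zero for New England codes, which matters later if the dataframes are merged on that column.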



### IRS Data
This dataframe gives us the income data for each zipcode.

In [2]:
# Read in the prepared IRS dataframe
irs = prepare.prep_irs_data()

In [3]:
# Take a look at a sample of our data to ensure everything came in as anticipated
irs.sample(3)

| | zipcode | STATE | STATEFIPS | N1 | A00100 | A02650 | N02650 | avg_gross_income | avg_total_income |
|---|---|---|---|---|---|---|---|---|---|
| 1173 | 5068 | VT | 50 | 1570 | 77483.0 | 78876.0 | 1570 | 49352.229299 | 50239.490446 |
| 24868 | 86327 | AZ | 4 | 5020 | 273293.0 | 276717.0 | 5020 | 54440.836653 | 55122.908367 |
| 18435 | 62612 | IL | 17 | 930 | 50641.0 | 51385.0 | 930 | 54452.688172 | 55252.688172 |


In [4]:
# Let's take a quick look at the shape of the data to make sure no rows were dropped
irs.shape

(27658, 9)

In [5]:
# Make sure there are no nulls
irs.isnull().sum()

zipcode             0
STATE               0
STATEFIPS           0
N1                  0
A00100              0
A02650              0
N02650              0
avg_gross_income    0
avg_total_income    0
dtype: int64

In [6]:
irs.isna().sum().sum()

0

**Quick Notes for IRS dataframe**

- Each row in our dataframe represents one zipcode in the US
- A data dictionary is available in our README file
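
The claim that each row represents one zipcode can be checked directly. The sketch below uses a hypothetical helper, `find_duplicate_keys`, and a tiny stand-in frame; with the real data the call would pass `irs` instead.

```python
import pandas as pd


def find_duplicate_keys(df: pd.DataFrame, key: str = "zipcode") -> pd.DataFrame:
    """Return every row whose key value appears more than once (empty if unique)."""
    return df[df.duplicated(subset=key, keep=False)]


# Stand-in for the real `irs` dataframe, using values from the sample output.
demo = pd.DataFrame({
    "zipcode": ["05068", "86327", "62612"],
    "N1": [1570, 5020, 930],
})
dupes = find_duplicate_keys(demo)
print(len(dupes))  # 0 means every zipcode appears exactly once
```

An empty result supports treating `zipcode` as a unique key, which is what a one-to-one merge with other zipcode-level data would rely on.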

### Restaurant Data
This dataframe lists vegan/vegetarian restaurants in the US (*please note that this list may not be comprehensive*).