# Midterm Exam "Food Environment Atlas"

The following questions center around data from the **Food Environment Atlas**. This dataset contains food environment factors for counties in the United States that may influence nutrition, health, and socio-economic factors. ([Link](https://catalog.data.gov/dataset/food-environment-atlas-f4a22))

**Structure of the dataset**:
* Each row is a U.S. county.
* Each column is a food environment factor.

## Cleaning

In [24]:
import pandas as pd

In [25]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [26]:
pd.options.display.float_format = '{:.2f}'.format

In [27]:
data = pd.ExcelFile('DataDownload.xls')

In [28]:
data.sheet_names

['Read_Me',
 'Variable List',
 'Supplemental Data - County',
 'Supplemental Data - State',
 'ACCESS',
 'STORES',
 'RESTAURANTS',
 'ASSISTANCE',
 'INSECURITY',
 'PRICES_TAXES',
 'LOCAL',
 'HEALTH',
 'SOCIOECONOMIC']

In [6]:
def load_sheet(name):
    """Parse one sheet and strip stray whitespace from its column names."""
    return data.parse(name).rename(columns=lambda x: x.strip())

access = load_sheet('ACCESS')
stores = load_sheet('STORES')
restaurants = load_sheet('RESTAURANTS')
health = load_sheet('HEALTH')
local = load_sheet('LOCAL')
socio = load_sheet('SOCIOECONOMIC')

comb = [access, stores, restaurants, local, health, socio]

The sheets are merged on several key columns at once, as discussed in this Stack Overflow thread: https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns

In [7]:
from functools import reduce

food_complete = reduce(lambda left, right: pd.merge(left, right, on=['FIPS','State','County'], how='outer'), comb)
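To make the `reduce` pattern easier to follow, here is a minimal sketch on three tiny, made-up frames. The county names, state codes, and values are hypothetical and only illustrate how an outer merge on `FIPS`, `State`, and `County` behaves.

```python
from functools import reduce

import pandas as pd

# Hypothetical miniature "sheets"; the real ones come from DataDownload.xls.
keys = ['FIPS', 'State', 'County']
sheet_a = pd.DataFrame({'FIPS': [1, 2], 'State': ['AA', 'AA'],
                        'County': ['Alpha', 'Beta'], 'GROC14': [4, 7]})
sheet_b = pd.DataFrame({'FIPS': [2, 3], 'State': ['AA', 'BB'],
                        'County': ['Beta', 'Gamma'], 'FFR14': [10, 3]})
sheet_c = pd.DataFrame({'FIPS': [1, 3], 'State': ['AA', 'BB'],
                        'County': ['Alpha', 'Gamma'], 'FMRKT16': [1, 0]})

# reduce folds the list pairwise: (sheet_a merge sheet_b) merge sheet_c.
merged = reduce(
    lambda left, right: pd.merge(left, right, on=keys, how='outer'),
    [sheet_a, sheet_b, sheet_c],
)

# The outer join keeps every county that appears in any sheet and
# fills the cells a sheet did not supply with NaN.
print(merged)
```

With `how='outer'`, a county missing from one sheet still gets a row, which is why some columns in the merged frame can contain missing values.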

In [8]:
food_complete.columns

Index(['FIPS', 'State', 'County', 'LACCESS_POP10', 'LACCESS_POP15',
       'PCH_LACCESS_POP_10_15', 'PCT_LACCESS_POP10', 'PCT_LACCESS_POP15',
       'LACCESS_LOWI10', 'LACCESS_LOWI15',
       ...
       'PCT_NHPI10', 'PCT_65OLDER10', 'PCT_18YOUNGER10', 'MEDHHINC15',
       'POVRATE15', 'PERPOV10', 'CHILDPOVRATE15', 'PERCHLDPOV10', 'METRO13',
       'POPLOSS10'],
      dtype='object', length=219)

In [9]:
# 'FFRPTH14' is listed twice; after the CSV round trip pandas names the second copy 'FFRPTH14.1'.
export_columns = [
    'State', 'County', 'PCT_OBESE_ADULTS13', 'FMRKT16', 'FFR14',
    'GROC09', 'GROC14', 'GROCPTH14', 'PCT_LACCESS_CHILD15', 'FFRPTH14',
    'FMRKTPTH16', 'FFRPTH14', 'PCT_OBESE_ADULTS08', 'PCT_DIABETES_ADULTS13',
    'PCT_DIABETES_ADULTS08', 'CHILDPOVRATE15', 'PCT_HSPA15',
]
food_complete[export_columns].to_csv('midterm_exam.csv')

## Description

Please run the following line of code to load the dataset.

In [29]:
food = pd.read_csv('midterm_exam.csv', index_col=0)

Please run the following line of code to get an overview of the columns in the dataset. Each question below specifies what each column means.

In [30]:
pd.Series(food.columns)

0                     State
1                    County
2        PCT_OBESE_ADULTS13
3                   FMRKT16
4                     FFR14
5                    GROC09
6                    GROC14
7                 GROCPTH14
8       PCT_LACCESS_CHILD15
9                  FFRPTH14
10               FMRKTPTH16
11               FFRPTH14.1
12       PCT_OBESE_ADULTS08
13    PCT_DIABETES_ADULTS13
14    PCT_DIABETES_ADULTS08
15           CHILDPOVRATE15
16               PCT_HSPA15
dtype: object
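The `FFRPTH14.1` entry most likely arises because `FFRPTH14` was selected twice when the CSV was written. The sketch below reproduces that effect on a one-row toy frame (hypothetical values), using an in-memory buffer instead of a file.

```python
import io

import pandas as pd

# Hypothetical one-row frame standing in for the merged atlas data.
toy = pd.DataFrame({'State': ['AA'], 'FFRPTH14': [0.8]})

# Selecting the same label twice yields two identically named columns.
dup = toy[['State', 'FFRPTH14', 'FFRPTH14']]

# Round-trip through CSV text held in memory.
buffer = io.StringIO()
dup.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col=0)

# read_csv de-duplicates repeated header names by appending '.1', '.2', ...
print(list(reloaded.columns))
```

Both columns hold the same values, so either can be used in the questions that reference `FFRPTH14`.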

The data has been cleaned, and each column holds values of a single type.

In [12]:
food.dtypes

State                     object
County                    object
PCT_OBESE_ADULTS13       float64
FMRKT16                    int64
FFR14                      int64
GROC09                     int64
GROC14                     int64
GROCPTH14                float64
PCT_LACCESS_CHILD15      float64
FFRPTH14                 float64
FMRKTPTH16               float64
FFRPTH14.1               float64
PCT_OBESE_ADULTS08       float64
PCT_DIABETES_ADULTS13    float64
PCT_DIABETES_ADULTS08    float64
CHILDPOVRATE15           float64
PCT_HSPA15               float64
dtype: object

A sample of the data:

In [13]:
food.sample(10,random_state=12345)

Unnamed: 0,State,County,PCT_OBESE_ADULTS13,FMRKT16,FFR14,GROC09,GROC14,GROCPTH14,PCT_LACCESS_CHILD15,FFRPTH14,FMRKTPTH16,FFRPTH14.1,PCT_OBESE_ADULTS08,PCT_DIABETES_ADULTS13,PCT_DIABETES_ADULTS08,CHILDPOVRATE15,PCT_HSPA15
245,CO,Alamosa,23.0,1,14,4,4,0.25,1.68,0.87,0.06,0.87,21.0,6.7,5.6,30.2,
2023,ND,Pembina,32.1,3,1,5,5,0.7,10.16,0.14,0.42,0.14,29.1,12.1,9.2,11.2,25.4
791,IA,Allamakee,33.6,2,3,6,7,0.5,1.65,0.21,0.14,0.21,27.6,9.1,8.8,17.0,
1841,NY,Dutchess,25.9,14,244,99,94,0.32,7.39,0.82,0.05,0.82,26.7,9.6,8.3,12.9,23.3
429,GA,Decatur,30.0,1,21,7,6,0.22,3.41,0.77,0.04,0.77,31.6,12.1,12.2,39.9,
386,FL,Washington,38.5,1,17,2,3,0.12,6.99,0.7,0.04,0.7,29.3,12.9,10.7,34.2,24.1
2917,VA,Bristol,27.3,0,31,6,4,0.23,6.34,1.8,0.0,1.8,25.4,10.5,10.7,33.4,25.1
509,GA,Schley,30.4,0,2,1,1,0.19,0.12,0.39,0.0,0.39,30.0,12.6,11.3,29.0,
389,GA,Bacon,32.7,0,7,3,2,0.18,3.57,0.62,0.0,0.62,29.0,10.4,10.5,36.8,
613,IL,DeKalb,28.3,2,75,15,10,0.09,5.26,0.71,0.02,0.71,26.6,8.3,6.9,16.3,26.8


## Questions

### Question 1

What is the adult obesity rate (`PCT_OBESE_ADULTS13`) for the county at the index `228`? (**Please return only the numeric value.**)

In [14]:
food.loc[228, 'PCT_OBESE_ADULTS13']

19.5

### Question 2

Is the county with the most farmers markets also the county with the most fast food restaurants? (**Please return only `True` or `False`.**)

In [22]:
# Compare row labels rather than county names: names repeat across states,
# and comparing two Series with different indexes raises an error.
food['FMRKT16'].idxmax() == food['FFR14'].idxmax()

True
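As a small illustration of why comparing row labels is more robust than comparing county names, consider this sketch with made-up data in which two different counties share the name "Lake".

```python
import pandas as pd

# Hypothetical data: two distinct counties are both called 'Lake'.
toy = pd.DataFrame({
    'State': ['AA', 'BB', 'CC'],
    'County': ['Alpha', 'Lake', 'Lake'],
    'FMRKT16': [2, 12, 0],
    'FFR14': [40, 30, 950],
})

# idxmax returns the row label of the (first) maximum.
top_markets = toy['FMRKT16'].idxmax()   # row 1: Lake, BB
top_fast_food = toy['FFR14'].idxmax()   # row 2: Lake, CC

same_row = bool(top_markets == top_fast_food)
same_name = bool(toy.loc[top_markets, 'County'] == toy.loc[top_fast_food, 'County'])

print(same_row, same_name)
```

Here the names match although the counties differ, so a name-only comparison would give a misleading answer.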

### Question 3

How many U.S. states have at least one county with **more than 10** grocery stores (`GROC14`) and **fewer than 10** fast food restaurants (`FFR14`)? (**Please return only the numeric value.**)

In [33]:
food[(food['GROC14'] > 10) & (food['FFR14'] < 10)]['State'].nunique()

3
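The combination of a boolean mask with a distinct count can be sketched on a toy frame (state codes and counts are invented for illustration):

```python
import pandas as pd

# Hypothetical counties; two of the qualifying rows belong to the same state.
toy = pd.DataFrame({
    'State': ['AA', 'AA', 'BB', 'CC'],
    'GROC14': [12, 15, 99, 11],
    'FFR14': [3, 9, 244, 5],
})

# Each condition is wrapped in parentheses because & binds tighter than > and <.
mask = (toy['GROC14'] > 10) & (toy['FFR14'] < 10)

# nunique counts each state once, however many of its counties qualify.
n_states = toy.loc[mask, 'State'].nunique()
print(mask.sum(), n_states)
```

Three counties satisfy the condition here, but they fall in only two states, which is the number the question asks for.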

### Question 4

What is the average percentage of children with low access to grocery stores (`PCT_LACCESS_CHILD15`) in California? (**Please return only the numeric value. Round the value to 2 decimals.**)

In [17]:
food[food['State']=='CA']['PCT_LACCESS_CHILD15'].mean().round(2)

4.22

### Question 5

What are the **three** U.S. states (**excluding** Washington D.C.) that have, on average, the highest number of fast food restaurants per 1000 people (`FFRPTH14`)? (**Please return only an array with the state abbreviations `State`**)

In [18]:
food[food['State']!='DC'].groupby('State')['FFRPTH14'].mean().nlargest(3).index.values

array(['UT', 'MA', 'RI'], dtype=object)

### Question 6

In which U.S. state is the county with the most farmers markets (`FMRKT16`)? (**Please return only the state abbreviation `State`.**)

In [19]:
food[food['FMRKT16'] == food['FMRKT16'].max()]['State'].values[0]

'CA'

### Question 7

Which U.S. states have an average adult obesity rate (`PCT_OBESE_ADULTS13`) of **less** than 25 percent across their counties? (**Please return a sorted Series.**)

In [20]:
fg = food.groupby('State')['PCT_OBESE_ADULTS13'].mean()
fg[fg < 25].sort_values()

State
CO   20.68
HI   21.90
DC   22.40
MA   23.29
CA   24.13
Name: PCT_OBESE_ADULTS13, dtype: float64

### Question 8

Calculate the **FM_FF_Ratio**, the ratio of farmers markets per 1000 people to fast food restaurants per 1000 people, as follows:

In [16]:
food['FM_FF_Ratio'] = food['FMRKTPTH16'] / food['FFRPTH14']

What is the **FM_FF_Ratio** for Santa Clara County? (**Please return just the numeric value, rounded to 2 decimals.**)

In [18]:
food[food['County']=='Santa Clara']['FM_FF_Ratio'].iloc[0].round(2)

0.03
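One thing worth checking with a ratio like this is how pandas treats a zero denominator. The sketch below uses invented values; whether any county in the real data has `FFRPTH14 == 0` is not shown here, and replacing infinities with NaN is only one possible convention.

```python
import numpy as np
import pandas as pd

# Hypothetical counties, including zero fast food restaurants per 1000 people.
toy = pd.DataFrame({
    'County': ['Alpha', 'Beta', 'Gamma'],
    'FMRKTPTH16': [0.06, 0.10, 0.00],
    'FFRPTH14': [0.60, 0.00, 0.00],
})

# Float division never raises in pandas:
# positive / 0 gives inf, and 0 / 0 gives NaN.
toy['FM_FF_Ratio'] = toy['FMRKTPTH16'] / toy['FFRPTH14']

# Optional cleanup: treat undefined ratios as missing so that
# aggregates such as mean() are not dominated by inf.
toy['FM_FF_Ratio_clean'] = toy['FM_FF_Ratio'].replace([np.inf, -np.inf], np.nan)
print(toy)
```

A single `inf` makes the mean of the column infinite, so this matters for any later comparison against an average ratio.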

### Question 9

How many Californian counties do not have farmers markets? (**Please return just a numeric value**)

In [23]:
len(food[(food['State']=='CA') & (food['FMRKTPTH16'] ==0)])

2

### Question 10

Is the `FM_FF_Ratio` for Santa Clara below the average `FM_FF_Ratio` for all Californian counties? (**Please return only the boolean value**)

In [24]:
(food[food['County']=='Santa Clara']['FM_FF_Ratio'] < food[food['State']=='CA']['FM_FF_Ratio'].mean()).iloc[0]

True

### Question 11

Which factor whose name does **not** contain the word `ADULT` has the highest correlation with the adult obesity rate (`PCT_OBESE_ADULTS13`)? (**Please return only the name of the factor.**)

In [25]:
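# numeric_only=True skips the text columns (State, County); recent pandas versions raise without it.
factors = food.corr(numeric_only=True)['PCT_OBESE_ADULTS13']
# Filter out the ADULT columns first, then take the largest remaining correlation.
factors[~factors.index.str.contains('ADULT')].idxmax()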

'CHILDPOVRATE15'
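The filter-then-maximize idea can be sketched on a small invented frame. The numbers below are hypothetical and chosen so that one factor rises with obesity while the other falls.

```python
import pandas as pd

# Hypothetical values: CHILDPOVRATE15 increases with obesity, GROCPTH14 decreases.
toy = pd.DataFrame({
    'State': ['AA', 'BB', 'CC', 'DD'],
    'PCT_OBESE_ADULTS13': [20.0, 25.0, 30.0, 35.0],
    'PCT_OBESE_ADULTS08': [19.0, 24.0, 28.0, 33.0],
    'CHILDPOVRATE15': [10.0, 15.0, 22.0, 30.0],
    'GROCPTH14': [0.8, 0.6, 0.5, 0.2],
})

# Correlation of every numeric column with the 2013 obesity rate.
corr = toy.corr(numeric_only=True)['PCT_OBESE_ADULTS13']

# Drop every factor whose name contains 'ADULT' (including the target itself),
# then pick the label with the largest remaining correlation.
candidates = corr[~corr.index.str.contains('ADULT')]
best = candidates.idxmax()
print(best)
```

Filtering before ranking avoids depending on how many `ADULT` columns happen to sit at the top of the ranking.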

### Question 12

What are the names of the counties nationwide that had more than two grocery stores in 2009 (`GROC09`) and had lost all of them by 2014 (`GROC14`)? (**Please return only the array with the names of the counties. Sorting is not required.**)

In [26]:
# A county lost all its stores exactly when it had more than two in 2009 and none in 2014.
lost_all = (food['GROC09'] > 2) & (food['GROC14'] == 0)
food.loc[lost_all, 'County'].values

array(['Lawrence', 'Schuyler', 'Fall River', 'Cannon'], dtype=object)

### Question 13

Identify the states that are among the 10 states with the smallest percentage of physically active high schoolers (`PCT_HSPA15`) **and** among the 10 states with the highest adult obesity rate (`PCT_OBESE_ADULTS13`). (**Please return an array with the state abbreviations `State`.**)

In [35]:
active = food.groupby('State')['PCT_HSPA15'].mean().nsmallest(10)#.index
active
#obese = food.groupby('State')['PCT_OBESE_ADULTS13'].mean().nlargest(10).index
#active[active.isin(obese)]

State
MD   19.50
KY   20.20
HI   20.30
RI   20.30
AK   20.90
MS   21.20
ME   21.60
NH   22.30
VT   23.10
NY   23.30
Name: PCT_HSPA15, dtype: float64
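The cell above only lists the least active states. One way to finish the question is to intersect the two rankings. This is a sketch on invented data with a top 3 instead of a top 10, and the state codes are placeholders.

```python
import pandas as pd

# Hypothetical county-level values for five placeholder states.
toy = pd.DataFrame({
    'State': ['AA', 'AA', 'BB', 'CC', 'DD', 'EE'],
    'PCT_HSPA15': [20.2, 20.2, 21.2, 20.3, 27.8, 23.3],
    'PCT_OBESE_ADULTS13': [34.0, 36.0, 37.5, 21.9, 20.7, 25.9],
})

by_state = toy.groupby('State')[['PCT_HSPA15', 'PCT_OBESE_ADULTS13']].mean()

n = 3  # the exam question uses 10
least_active = by_state['PCT_HSPA15'].nsmallest(n).index
most_obese = by_state['PCT_OBESE_ADULTS13'].nlargest(n).index

# Index.intersection keeps only the labels present in both rankings.
both = least_active.intersection(most_obese).values
print(both)
```

Applied to `food` with `n = 10`, the same three steps would return the requested array of state abbreviations.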

### Question 14

Which state has the largest difference between the highest and lowest adult obesity rate (`PCT_OBESE_ADULTS13`) among its counties? (**Please return the state abbreviation `State`.**)

In [28]:
spread = food.groupby('State')['PCT_OBESE_ADULTS13'].agg(['min','max'])
spread['spread'] = spread['max']-spread['min']
spread['spread'].nlargest(1).index[0]

'VA'
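For intuition, the max-minus-min spread per state can be checked by hand on a tiny made-up example:

```python
import pandas as pd

# Hypothetical county obesity rates for two placeholder states.
toy = pd.DataFrame({
    'State': ['AA', 'AA', 'BB', 'BB', 'BB'],
    'PCT_OBESE_ADULTS13': [20.0, 31.0, 25.0, 27.0, 40.5],
})

grouped = toy.groupby('State')['PCT_OBESE_ADULTS13']

# Range within each state: AA -> 31.0 - 20.0, BB -> 40.5 - 25.0.
spread = grouped.max() - grouped.min()

widest = spread.idxmax()
print(spread, widest)
```

`idxmax` returns the state label directly, which is equivalent to `nlargest(1).index[0]` when there are no ties.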

### Question 15

What share of counties have fewer fast food restaurants per 1000 people (`FFRPTH14`) than the average of grocery stores per 1000 people (`GROCPTH14`) and farmers markets per 1000 people (`FMRKTPTH16`)? (**Please return a percentage of the overall number of counties in the dataset, rounded to 2 decimals.**)

In [29]:
# Row-wise mean of the two per-capita columns; the mean of the boolean mask is the share of True rows.
round((food['FFRPTH14'] < food[['GROCPTH14', 'FMRKTPTH16']].mean(axis=1)).mean() * 100, 2)

8.43

In [36]:
# Same share expressed as a fraction instead of a percentage.
round((food['FFRPTH14'] < food[['GROCPTH14', 'FMRKTPTH16']].mean(axis=1)).mean(), 5)

0.08431
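The calculation combines a row-wise average with a share of rows. A minimal sketch on four invented counties makes each step checkable by hand.

```python
import pandas as pd

# Hypothetical per-1000-people rates for four counties.
toy = pd.DataFrame({
    'FFRPTH14': [0.10, 0.80, 0.30, 0.50],
    'GROCPTH14': [0.70, 0.25, 0.20, 0.10],
    'FMRKTPTH16': [0.42, 0.06, 0.50, 0.02],
})

# Row-wise average of the two columns: 0.56, 0.155, 0.35, 0.06.
benchmark = toy[['GROCPTH14', 'FMRKTPTH16']].mean(axis=1)

# True where the county has fewer fast food restaurants than that average.
below = toy['FFRPTH14'] < benchmark

# The mean of a boolean Series is the fraction of True values.
share = round(below.mean() * 100, 2)
print(share)
```

Rows with a missing `FFRPTH14` compare as `False` yet still count in the denominator, which matches dividing by `len(food)`.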