## Consumer Food Demand Estimation

Team members: Stone Yan, Angela Chen, Yingyin Li, Tia Pappas, Daniela Salinas, Lyuheng(Kasper) Zheng

#### Our Topic and Population of Interest:

Our population of interest is Panama. Our topic is understanding diets in Panama by characterizing the relationship between people’s food choices, budgets, and prices.

#### Our Goals:  

1. Analyze Panamanian households' food choices and expenditures to assess their diets' nutritional content and adequacy.
<br>
2. Estimate food demand in Panama, accounting for household needs and economic factors, and analyze how changes in expenditure and food prices influence dietary choices.
<br>
3. Conduct counterfactual experiments to assess how hypothetical economic scenarios, such as changes in food expenditure, prices, shortages, and income, could affect Panama's nutrition.


### (A) Choice of Population & Supporting Data

#### Data Frame Set Up

In what follows, we estimate Constant Frisch Elasticity (CFE) demand systems for our selected population, Panama. We first installed our prerequisites.

In [None]:
!pip install -r requirements.txt -q
!pip install CFEDemands --upgrade --pre -q
!pip install eep153_tools --upgrade -q

Next, we listed the Google Sheet tabs that hold the data we need. Each entry pairs the spreadsheet key with the name of a tab.

In [None]:
InputFiles = {'Food Expenditures':('1gcAb2jlGQNrD2zrrTEbjL47vbXoxCHkkjHSYzD0-Tiw','Food Expenditures'),
              'Food Prices':('1gcAb2jlGQNrD2zrrTEbjL47vbXoxCHkkjHSYzD0-Tiw','Food Prices'),
              'Household Characteristics':('1gcAb2jlGQNrD2zrrTEbjL47vbXoxCHkkjHSYzD0-Tiw','Household Characteristics'),
              'FCT':('1gcAb2jlGQNrD2zrrTEbjL47vbXoxCHkkjHSYzD0-Tiw','FCT'),
              'Copy of RDI':('1gcAb2jlGQNrD2zrrTEbjL47vbXoxCHkkjHSYzD0-Tiw','Copy of RDI'),}

We then imported packages that are needed. 

In [None]:
import numpy as np
import pandas as pd
from eep153_tools.sheets import read_sheets
import cfe
from cfe.estimation import drop_columns_wo_covariance
from cfe import Regression

In [None]:
def get_clean_sheet(key,sheet=None):

    df = read_sheets(key,sheet=sheet)
    df.columns = [c.strip() for c in df.columns.tolist()]

    df = df.loc[:,~df.columns.duplicated(keep='first')]   

    df = df.drop([col for col in df.columns if col.startswith('Unnamed')], axis=1)

    df = df.loc[~df.index.duplicated(), :]

    return df
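
To make the cleaning steps concrete, here is a minimal sketch that applies the same four operations to a small hand-made frame instead of a Google Sheet. The frame `raw`, its labels, and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical raw sheet: padded names, a duplicated column, a stray
# 'Unnamed' column, and a duplicated index label.
raw = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    columns=[' Rice', 'Rice ', 'Unnamed: 3', 'Beans'],
    index=['a', 'a', 'b'],
)

clean = raw.copy()
# 1. Strip whitespace from column names
clean.columns = [c.strip() for c in clean.columns.tolist()]
# 2. Keep only the first of any duplicated columns
clean = clean.loc[:, ~clean.columns.duplicated(keep='first')]
# 3. Drop leftover 'Unnamed' columns
clean = clean.drop([c for c in clean.columns if c.startswith('Unnamed')], axis=1)
# 4. Keep only the first of any duplicated index labels
clean = clean.loc[~clean.index.duplicated(), :]

print(clean)
```

Only the columns `Rice` and `Beans` and the rows `a` and `b` should survive, each taken from its first occurrence.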

In [None]:
# Get food expenditures 
expenditures = get_clean_sheet(InputFiles['Food Expenditures'][0],
                    sheet=InputFiles['Food Expenditures'][1])

if 'm' not in expenditures.columns:
    expenditures['m'] = 1

expenditures = expenditures.set_index(['i','t','m'])
expenditures.columns.name = 'j'

expenditures = expenditures.apply(lambda x: pd.to_numeric(x,errors='coerce'))
expenditures = expenditures.replace(0,np.nan)
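
The last two lines matter because we take logs later: non-numeric cells become `NaN`, and zeros are recoded as `NaN` so that they count as missing rather than producing `-inf`. A small sketch on a made-up column (the name `toy` and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical expenditure column with a text entry and a zero
toy = pd.DataFrame({'Rice': ['2.5', 'n/a', 0, 4]})

# Coerce everything to numbers; anything unparseable becomes NaN
toy = toy.apply(lambda x: pd.to_numeric(x, errors='coerce'))
# Treat zero expenditures as missing
toy = toy.replace(0, np.nan)

# Logs are now either finite or NaN, never -inf
log_toy = np.log(toy)
print(log_toy)
```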

In [None]:
# Get Household characteristics 
hh_characteristics = get_clean_sheet(InputFiles['Household Characteristics'][0],
                    sheet=InputFiles['Household Characteristics'][1])

if 'm' not in hh_characteristics.columns:
    hh_characteristics['m'] = 1

hh_characteristics = hh_characteristics.set_index(['i','t','m'])
hh_characteristics.columns.name = 'k'
#hh_characteristics.name = 'value'  

hh_characteristics = hh_characteristics.apply(lambda x: pd.to_numeric(x,errors='coerce'))

In [None]:
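# Get food prices and call it p
url = 'https://docs.google.com/spreadsheets/d/1gcAb2jlGQNrD2zrrTEbjL47vbXoxCHkkjHSYzD0-Tiw/edit#gid=2085637103'
p = read_sheets(url,sheet='Food Prices',nheaders=2)

# The two header rows hold the period (t) and the market (m)
p.columns.names = ['t','m']

# Convert to numeric and treat zero prices as missing
p=p.apply(lambda x: pd.to_numeric(x,errors='coerce'))
p=p.replace(0,np.nan)

# Average over repeated rows for the same good j (the result was previously discarded)
p = p.groupby(level='j').mean()

# Use transpose to switch columns and rows: rows become (t,m), columns become goods j
p = p.transpose()

# m is already an index level after the transpose; only add it if it is missing
if 'm' not in p.index.names:
    p['m']=1
    p = p.set_index('m',append=True)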
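
To see what the transpose does, here is a sketch with a made-up price table whose two header levels give the period `t` and market `m`. The goods, markets, and numbers are invented, and the layout of the real sheet is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical price sheet: rows are goods j, columns are (t, m) pairs
cols = pd.MultiIndex.from_tuples([(2003, 'Urban'), (2003, 'Rural')],
                                 names=['t', 'm'])
toy_p = pd.DataFrame([[1.0, 0.8], [0.0, 2.1]],
                     index=pd.Index(['Rice', 'Beans'], name='j'),
                     columns=cols)

# After transposing, each row is one (t, m) pair and each column one good
toy_p = toy_p.transpose()
# Treat zero prices as missing
toy_p = toy_p.replace(0, np.nan)

print(toy_p)
```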

In [None]:
# Get FCT
fct = get_clean_sheet(InputFiles['FCT'][0],
                    sheet=InputFiles['FCT'][1])

fct = fct.set_index('j')
fct.columns.name = 'n'

fct = fct.apply(lambda x: pd.to_numeric(x,errors='coerce'))

In [None]:
# Get RDI
rdi = get_clean_sheet(InputFiles['Copy of RDI'][0],
                    sheet=InputFiles['Copy of RDI'][1])
rdi = rdi.set_index('n')
rdi.columns.name = 'k'

#### Show Organized Data Frame

Show food expenditures dataframe.

In [None]:
expenditures.head()

Show household characteristics dataframe.

In [None]:
hh_characteristics.head()

Show food prices dataframe.

In [None]:
p.head()

Show FCT dataframe.

In [None]:
fct.head()

Show RDI dataframe.

In [None]:
rdi.head()

### (A) Estimate demand system

Let $y_{i}^j$ be the log expenditure of household $i$ in Panama on food item $j$. Our estimating equation takes the following form:
$$
      y^j_{i} = A^j(p) + \gamma_j'd_i + \beta_j w_i + \zeta^j_i.
$$

The equation above models log household expenditure as a function of the following terms: <br>

$A^j(p)$: A price index for food $j$, capturing how prices $p$ affect expenditure on food $j$;
<br>
$\gamma_j'd_i$: Household characteristics $d_i$, capturing how demographics affect expenditure on food $j$; $\gamma_j$ is the coefficient vector.
<br>
$\beta_j w_i$: This term captures how the household's overall wealth $w_i$ affects its expenditure on food $j$; $\beta_j$ is its coefficient.
<br>
$\zeta^j_i$: This term captures other unobserved effects that influence food expenditure.
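
As a rough illustration of how these terms combine, the sketch below simulates log expenditures on a single good from the equation above and recovers the coefficients by ordinary least squares. It assumes that $w_i$ is observed, which is not the case in our data, so this is not the estimator we use below; all parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Made-up "true" parameters for one good j
A_j, gamma_j, beta_j = 1.0, 0.3, 0.6

d = rng.integers(1, 8, size=n).astype(float)  # one characteristic, e.g. household size
w = rng.normal(size=n)                         # wealth index (assumed observed here)
zeta = rng.normal(scale=0.1, size=n)           # unobserved shock

# Log expenditure on good j, following the equation term by term
y = A_j + gamma_j * d + beta_j * w + zeta

# OLS of y on a constant, d, and w
X = np.column_stack([np.ones(n), d, w])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
A_hat, gamma_hat, beta_hat = coef

print(A_hat, gamma_hat, beta_hat)
```

With a large simulated sample the three estimates should land close to the values used to generate the data.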


In [None]:
# log_expenditures represents the logarithm of food expenditures.
log_expenditures = np.log(expenditures)

log_expenditures.head()

In [None]:
use = log_expenditures.index.intersection(hh_characteristics.index)

log_expenditures = log_expenditures.loc[use,:]
hh_characteristics = hh_characteristics.loc[use,:]
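
The intersection keeps only households that appear in both tables. A minimal sketch with made-up household ids shows the effect; the frames `toy_y` and `toy_d` and the column names are hypothetical.

```python
import pandas as pd

names = ['i', 't', 'm']
y_idx = pd.MultiIndex.from_tuples(
    [(1, 2003, 1), (2, 2003, 1), (3, 2003, 1)], names=names)
d_idx = pd.MultiIndex.from_tuples(
    [(2, 2003, 1), (3, 2003, 1), (4, 2003, 1)], names=names)

toy_y = pd.DataFrame({'Rice': [1.0, 2.0, 3.0]}, index=y_idx)
toy_d = pd.DataFrame({'log HSize': [0.7, 1.1, 1.4]}, index=d_idx)

# Households 2 and 3 are in both tables; 1 and 4 are dropped
use = toy_y.index.intersection(toy_d.index)
toy_y = toy_y.loc[use, :]
toy_d = toy_d.loc[use, :]

print(toy_y)
print(toy_d)
```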

In [None]:
log_expenditures = log_expenditures.stack()
hh_characteristics = hh_characteristics.stack()

# Check that indices are in right places!
assert log_expenditures.index.names == ['i','t','m','j']
assert hh_characteristics.index.names == ['i','t','m','k']
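
Stacking turns the wide table (one column per good) into a long series with the column label appended as the last index level, which is why the assertions above expect `j` and `k` at the end. A sketch on a made-up frame:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2003, 1), (2, 2003, 1)],
                                names=['i', 't', 'm'])
wide = pd.DataFrame({'Rice': [0.5, 0.9], 'Beans': [1.2, 0.3]}, index=idx)
wide.columns.name = 'j'

# One row per (household, period, market, good)
long = wide.stack()
print(long)
```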

#### 1. Basic Estimation Using Regression

In [None]:
#set up regression
result = Regression(y = log_expenditures,d = hh_characteristics)

print(log_expenditures.shape)
print(hh_characteristics.shape)

In [None]:
# Get predicted expenditures from the fitted regression
result.predicted_expenditures()

In [None]:
# Use a scatter plot to compare actual log expenditures (y) with predicted log expenditures (yhat)
%matplotlib widget
df = pd.DataFrame({'y':log_expenditures,'yhat':result.get_predicted_log_expenditures()})
df.plot.scatter(x='yhat',y='y')
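
A scatter plot is easier to read alongside a numeric summary of fit. The sketch below computes an R-squared and a correlation for a frame with the same `y` and `yhat` columns; the four data points are made up and are not our results.

```python
import pandas as pd

# Hypothetical actual (y) and predicted (yhat) log expenditures
toy_fit = pd.DataFrame({'y':    [1.0, 2.0, 3.0, 4.0],
                        'yhat': [1.1, 1.9, 3.2, 3.8]})

# Share of the variation in y accounted for by the predictions
resid = toy_fit['y'] - toy_fit['yhat']
total = toy_fit['y'] - toy_fit['y'].mean()
r2 = 1 - (resid**2).sum() / (total**2).sum()

# Correlation between actual and predicted values
corr = toy_fit['y'].corr(toy_fit['yhat'])

print(r2, corr)
```

The same two lines could be applied to `df` from the cell above after dropping rows with missing values.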

#### 2. Analyze Income Elasticity

In [None]:
#get the value of beta - As shown above, beta captures how the household's overall wealth affects its expenditures on food. 
result.get_beta().sort_values()

In [None]:
#graph beta
result.graph_beta()

In [None]:
#get the value of gamma - As shown above, gamma captures how household characteristics affect expenditures on food.
result.gamma

In [None]:
# Save the result to disk, then read it back
result.to_pickle('estimates.pickle')

result = cfe.regression.read_pickle('estimates.pickle')
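
The save-then-reload pattern can be checked on a plain pandas object. This sketch uses `pandas` pickling rather than the `cfe` functions above, writes to a temporary directory, and uses made-up values.

```python
import os
import tempfile
import pandas as pd

# Hypothetical estimates, not our results
toy_beta = pd.Series({'Rice': 0.4, 'Beans': 0.7}, name='beta')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_estimates.pickle')
    toy_beta.to_pickle(path)          # write to disk
    reloaded = pd.read_pickle(path)   # read it back

# The reloaded object should match what was saved
print(reloaded.equals(toy_beta))
```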