<h1>Project 1 Development Notebook: Climate and Housing</h1>
<p>
This notebook contains the development code for Project 1 of ECON 1680, including all code required to replicate the results outlined in the draft. Code segments that are not original carry inline citations. The code first cleans the data, then generates descriptive statistics, and finally applies the methodology discussed in the paper:
we use regressions and dimension reduction techniques to analyze the relationship between climate risk and housing prices, with spatial (cross-county) differences as our source of variation.
</p>
<h2>1. Imports and Functions</h2>

In [1]:
import pandas as pd
import numpy as np
from dateutil.parser import parse



In [2]:
def is_date(string):
    # From https://stackoverflow.com/questions/25341945/check-if-string-has-date-any-format
    # Checks if a string can be interpreted as a date
    try:
        parse(string)
        return True
    except (ValueError, OverflowError):  # parse can also raise OverflowError for out-of-range values
        return False
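
<p>A quick check of how this helper separates date-like column names from ordinary ones may be useful. The sketch below restates the function so that it runs on its own; the column names in <code>example_columns</code> are hypothetical and only mimic the mix of identifier and date columns that the merged data is assumed to contain.</p>

```python
from dateutil.parser import parse


def is_date(string):
    """Return True if dateutil can interpret `string` as a date."""
    try:
        parse(string)
        return True
    except (ValueError, OverflowError):
        return False


# Hypothetical column names: two ordinary labels and two date-like labels
example_columns = ["State Name", "county_fips", "2020-01-31", "2020-02-29"]

# Keep only the names that parse as dates
date_columns = [col for col in example_columns if is_date(col)]
print(date_columns)
```

<p>Because <code>parse</code> is fairly permissive, it is worth inspecting the selected columns once to confirm that no control variable was picked up by accident.</p>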

<h2>2. Data: Cleaning and Saving</h2>

In [3]:
# Path to production directory: note that you must include Data, Figures, and Code subdirectories.
path = "C:\\Users\\garvg\\Downloads\\Project 1\\"

# Read in CSV data
zillow = pd.read_csv(path + "Data\\zillow.csv")
nri_fema = pd.read_csv(path + "Data\\nri_fema.csv")
oi_covars = pd.read_csv(path + "Data\\oi_covars.csv")

In [4]:
# Merge CSV data by State and County FIPS codes and save
clim_hous_data = zillow.merge(nri_fema, on=["state_fips", "county_fips"], validate="one_to_one")
all_data = clim_hous_data.merge(oi_covars, on=["state_fips", "county_fips"], validate="one_to_one")
all_data.to_csv(path + "Data\\all_data.csv", index=False)
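
<p>The merges above use the default inner join, so any county that is missing from one of the sources drops out without a message. A minimal sketch of how this could be checked, using toy tables with made-up values (the column names <code>home_value</code> and <code>risk_score</code> are illustrative and not taken from the real files):</p>

```python
import pandas as pd

# Toy stand-ins for two county-level tables (all values are made up)
left = pd.DataFrame({
    "state_fips": [6, 17, 48],
    "county_fips": [37, 31, 201],
    "home_value": [700000.0, 300000.0, 250000.0],
})
right = pd.DataFrame({
    "state_fips": [6, 17, 4],
    "county_fips": [37, 31, 13],
    "risk_score": [99.0, 97.5, 98.2],
})

keys = ["state_fips", "county_fips"]

# An outer merge with indicator=True adds a "_merge" column that records
# whether each county appears in both tables or in only one of them
check = left.merge(right, on=keys, how="outer", validate="one_to_one", indicator=True)
counts = check["_merge"].value_counts()
print(counts)

# The default inner merge keeps only the counties present in both tables
inner = left.merge(right, on=keys, validate="one_to_one")
print(len(inner))
```

<p>Comparing the <code>left_only</code> and <code>right_only</code> counts against expectations shows how many counties the inner merge discards.</p>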

In [5]:
# Specify columns to keep: hazard scores, controls, and house value
hazard_vars = [
    "Avalanche", "Coastal Flooding", "Cold Wave", "Drought", "Earthquake", 
    "Hail", "Heat Wave", "Hurricane", "Ice Storm", "Landslide", "Lightning",
    "Riverine Flooding", "Strong Wind", "Tornado", "Tsunami", "Volcanic Activity",
    "Wildfire", "Winter Weather"]
hazard_vars = [hazard + " - Hazard Type Risk Index Score" for hazard in hazard_vars]

housing_vars = [col for col in all_data.columns if is_date(col)]

control_vars = [
    "Population (2020)", "Building Value ($)", "Agriculture Value ($)", "Area (sq mi)", "job_density_2013",
    "ann_avg_job_growth_2004_2013", "ln_wage_growth_hs_grad", "emp2000", "foreign_share2010", 
    "mean_commutetime2000", "frac_coll_plus2000", "frac_coll_plus2010", "hhinc_mean2000",
    "med_hhinc1990", "med_hhinc2016", "poor_share1990", "poor_share2000", "poor_share2010",
    "share_white2000", "share_black2000", "share_hisp2000", "share_asian2000",
    "share_white2010", "share_black2010", "share_hisp2010", "share_asian2010"
    ]

columns = [
    "State Name", "County Name", "state_fips", "county_fips"]
columns.extend(hazard_vars)
columns.extend(housing_vars)
columns.extend(control_vars)

all_data_pruned = all_data[columns]

In [6]:
# For hazard data, rename variables and replace missing values with zeroes:
#   Missing values imply that the area has had no instances of a given hazard
#   or has never deemed that hazard a threat to life or property. We can roughly
#   infer that missing values imply perceived invulnerability to a given hazard.

# Map each long column name to its short hazard name, e.g.
# "Avalanche - Hazard Type Risk Index Score" -> "Avalanche".
# The mapping is only built here; the saved CSV keeps the long column names.
hazard_rename_dict = {hazard: hazard[:hazard.index(" - ")] for hazard in hazard_vars}

# Work on an explicit copy so the assignment below does not target a slice
# of all_data (this avoids pandas' SettingWithCopyWarning)
all_data_pruned = all_data_pruned.copy()
all_data_pruned[hazard_vars] = all_data_pruned[hazard_vars].fillna(0)
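
<p>The fill-and-rename step can be illustrated on a toy frame. This is a sketch under stated assumptions: the scores are made up, and applying the rename mapping with <code>DataFrame.rename</code> is one possible way to use the dictionary, not necessarily what the project does downstream.</p>

```python
import numpy as np
import pandas as pd

suffix = " - Hazard Type Risk Index Score"
hazard_cols = ["Avalanche" + suffix, "Tsunami" + suffix]

# Toy frame with made-up scores; NaN marks a hazard with no recorded score
df = pd.DataFrame({
    "County Name": ["A", "B"],
    hazard_cols[0]: [33.6, np.nan],
    hazard_cols[1]: [np.nan, 12.0],
})

# Take an explicit copy of the column selection, then fill missing scores with 0
pruned = df[["County Name"] + hazard_cols].copy()
pruned[hazard_cols] = pruned[hazard_cols].fillna(0)

# Strip the common suffix to get short hazard names and apply the mapping
rename_map = {col: col[:col.index(" - ")] for col in hazard_cols}
renamed = pruned.rename(columns=rename_map)
print(renamed)
```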


In [7]:
# Save data after cleaning
all_data_pruned.to_csv(path + "Data\\all_data_pruned.csv", index=False)

<h2>3. Descriptive Statistics</h2>
<p>Note: Once the code above has been run, later sessions can start from this section, because the output of the pre-processing steps is saved in the all_data_pruned.csv file.</p>

In [8]:
# Load data
working_data = pd.read_csv(path + "Data\\all_data_pruned.csv")
working_data.head()

Unnamed: 0,State Name,County Name,state_fips,county_fips,Avalanche - Hazard Type Risk Index Score,Coastal Flooding - Hazard Type Risk Index Score,Cold Wave - Hazard Type Risk Index Score,Drought - Hazard Type Risk Index Score,Earthquake - Hazard Type Risk Index Score,Hail - Hazard Type Risk Index Score,...,poor_share2000,poor_share2010,share_white2000,share_black2000,share_hisp2000,share_asian2000,share_white2010,share_black2010,share_hisp2010,share_asian2010
0,California,Los Angeles,6,37,33.653846,43.259557,0.0,73.846643,100.0,48.106904,...,0.177636,0.157657,0.317937,0.100307,0.439193,0.105664,0.277873,0.089271,0.47745,0.121261
1,Illinois,Cook,17,31,0.0,44.265594,100.0,19.949093,96.659243,93.923003,...,0.126334,0.152655,0.498122,0.242277,0.196171,0.046331,0.438595,0.249698,0.239623,0.056553
2,Texas,Harris,48,201,0.0,73.843058,99.204582,88.450525,90.74133,94.05027,...,0.135599,0.165664,0.451573,0.17615,0.307376,0.045832,0.329789,0.189457,0.408444,0.051969
3,Arizona,Maricopa,4,13,0.0,0.0,0.0,85.841553,98.218263,99.331849,...,0.109395,0.13975,0.672659,0.037874,0.2433,0.017987,0.586845,0.054242,0.295705,0.030152
4,California,San Diego,6,73,31.25,32.796781,0.0,89.564111,99.745466,22.589882,...,0.119072,0.124306,0.555414,0.060665,0.260778,0.083679,0.484619,0.055997,0.320274,0.098978


In [None]:
# Write means and variances to logfile
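
<p>One possible way to fill in this cell, sketched on toy data: compute the mean and variance of each numeric column and write the table to a plain-text log. The file name <code>descriptive_stats.log</code> and its location in the temporary directory are assumptions made for illustration; the project's actual logfile path is not specified here.</p>

```python
import os
import tempfile

import pandas as pd

# Toy data standing in for the cleaned county-level table (values are made up)
working = pd.DataFrame({
    "Drought": [10.0, 20.0, 30.0],
    "Hail": [1.0, 1.0, 4.0],
    "County Name": ["A", "B", "C"],
})

# Mean and sample variance (ddof=1, the pandas default) of each numeric column,
# transposed so that each variable is one row
summary = working.select_dtypes("number").agg(["mean", "var"]).T

# Hypothetical log location: a file in the system temporary directory
log_path = os.path.join(tempfile.gettempdir(), "descriptive_stats.log")
with open(log_path, "w") as log:
    log.write(summary.to_string())
    log.write("\n")

print(summary)
```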

In [None]:
# Generate map of hazard risk

# Generate map of housing prices


In [None]:
# Generate time trend of housing prices

# Generate time trend of housing price variance per year

# Generate 
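
<p>The time-trend placeholders in the cell above could be approached by reshaping the wide housing columns into long format. The sketch below assumes that each housing column is named by a date and holds one value per county, and it interprets "variance per year" as the variance across all county-month observations within a calendar year; both are assumptions, and the toy values are made up.</p>

```python
import pandas as pd

# Toy wide-format panel: one row per county, one column per month-end date
wide = pd.DataFrame({
    "County Name": ["A", "B"],
    "2000-01-31": [100.0, 200.0],
    "2000-02-29": [110.0, 210.0],
    "2001-01-31": [120.0, 240.0],
})
date_cols = [col for col in wide.columns if col != "County Name"]

# Reshape to long format: one row per (county, date) observation
long_df = wide.melt(
    id_vars="County Name", value_vars=date_cols,
    var_name="date", value_name="price",
)
long_df["date"] = pd.to_datetime(long_df["date"])

# Average price across counties at each date (the time trend)
mean_by_date = long_df.groupby("date")["price"].mean()

# Sample variance of prices across all observations within each calendar year
var_by_year = long_df.groupby(long_df["date"].dt.year)["price"].var()

print(mean_by_date)
print(var_by_year)
```

<p>Either series can then be passed to a plotting call of choice to produce the figures.</p>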