RCA Regressions #11

Open
shoonlee opened this issue Jul 17, 2021 · 23 comments

Comments

@shoonlee
Collaborator

@wbinzhe

  • Define temperature using the past 5 years of observations
  • Define temperature in F
  • Non-parametric estimation
  • Try (1) sales only and (2) sales + refinancing
  • Try without quality index, seller type, and buyer type (i.e., those with missing values)
  • Double check the number of observations, R2, fixed effects, etc
@shoonlee
Collaborator Author

@wbinzhe

For the foot traffic regression:

  • Run with the entire sample (monthly)
  • Run non-parametric regression
  • Run the regression on the subset of zip codes that matches with the RCA zip codes
  • Check whether tmax_90F == 1 in winter months is an error (how frequent it is, where it occurs, etc.)
  • Include tmax_90F and tmin_30F separately with category, brand, year-of-month, and ZCTA3 FE

@shoonlee
Collaborator Author

@wbinzhe

For both RCA and foot traffic, create a graph similar to Figure 2 in the attached paper: temperature bins on the x-axis, and on the y-axis the estimated effect on the logged transaction price (RCA) or the logged number of visitors (SafeGraph). You can find sample code in the sample code thread (#12). The code does not include confidence intervals, but adding them shouldn't be too hard.

Barreca et al. - 2016 - Adapting to Climate Change The Remarkable Decline.pdf
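
As a rough illustration of the coefficient plot described above, here is a minimal base-R sketch. It is not the sample code from #12: the data are simulated, the names `temp_bin`, `ln_y`, and `plot_df` are hypothetical, and plain `lm` without fixed effects stands in for the actual specification. The 60-69 bin is taken as the omitted category and plotted at zero.

```r
# Simulated stand-in data; the real RCA / foot traffic samples are not used here.
set.seed(1)
n <- 2000
bin_levels <- c("30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+")
temp_bin <- factor(sample(bin_levels, n, replace = TRUE), levels = bin_levels)
temp_bin <- relevel(temp_bin, ref = "60-69")    # 60-69 is the omitted category
ln_y <- rnorm(n) - 0.2 * (temp_bin == "90+")    # made-up outcome

# Plain OLS for illustration only; the actual model would add fixed effects.
fit <- lm(ln_y ~ temp_bin)

# Point estimates and 95% confidence intervals for the bin dummies.
est <- coef(fit)[-1]
ci <- confint(fit, level = 0.95)[-1, , drop = FALSE]
plot_df <- data.frame(
  bin = sub("^temp_bin", "", names(est)),
  estimate = unname(est),
  lower = unname(ci[, 1]),
  upper = unname(ci[, 2])
)

# Add the reference bin at zero and restore the natural bin order.
plot_df <- rbind(plot_df,
                 data.frame(bin = "60-69", estimate = 0, lower = 0, upper = 0))
plot_df <- plot_df[match(bin_levels, plot_df$bin), ]

# Temperature bins on the x-axis, estimated effect with CI bars on the y-axis.
x <- seq_along(bin_levels)
plot(x, plot_df$estimate, type = "b", xaxt = "n",
     ylim = range(plot_df$lower, plot_df$upper),
     xlab = "Temperature bin (F)", ylab = "Estimated effect on log outcome")
axis(1, at = x, labels = plot_df$bin)
segments(x, plot_df$lower, x, plot_df$upper)
abline(h = 0, lty = 2)
```

The same data frame layout (one row per bin with `estimate`, `lower`, `upper`) should work for either outcome once the estimates come from the real regressions.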

@shoonlee
Collaborator Author

Comments on the Jul 22 updates

  • When running the RCA regression using a 5-year average temperature, how did you construct the data? For instance, when you construct the number of days above 90F, did you count the number of such days out of the past 1825 (365*5) days?
  • In the summary stats, can you include the remaining temperature variables as well? For now, we have them only for the current-year-based temperature variables.
  • Related to the earlier point, how much change in the number of days above 90F do we see in the data?
  • Can you double check if the numbers are correct? In slide #16, for instance, if we compare columns (3) and (4), they have the same R2 and adjusted R2 although only column (4) has zip code FE. Also, the DF of the two models is the same, which I highly doubt. 
  • Also, when you run the regression using the number of days above 90F, can you exclude the mean temperature variable (for instance in slide #16)? The interpretation of the omitted category is rather tricky with that variable included. 
  • To make the map in the additional material part more meaningful, try creating the same map using the number of days above 90F for summer months for the first 3 years of the sample and the last 3 years of the sample. I suppose it's going to be 2006-2008 and 2017-2019, right?
  • Run monthly foot traffic analysis: in this case, you wouldn't have to use the 5% sample data, right?
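
To make the first question above concrete, one possible construction of the 5-year count is sketched below. This only illustrates the 1825-day window being asked about, not how the data were actually built; the daily series is simulated and `days_above` is a hypothetical helper.

```r
# Simulated daily maximum temperature for one location (hypothetical values).
set.seed(2)
dates <- seq(as.Date("2010-01-01"), as.Date("2019-12-31"), by = "day")
doy <- as.integer(format(dates, "%j"))
daily <- data.frame(
  date = dates,
  tmax = 70 + 25 * sin(2 * pi * (doy - 100) / 365) + rnorm(length(dates), sd = 5)
)

# Count days above a threshold within the 365 * n_years days ending at end_date.
days_above <- function(daily, end_date, n_years = 5, threshold = 90) {
  start_date <- end_date - 365 * n_years          # 1825-day window for 5 years
  in_window <- daily$date > start_date & daily$date <= end_date
  c(n_days = sum(in_window),
    n_above = sum(daily$tmax[in_window] > threshold))
}

res <- days_above(daily, as.Date("2019-12-31"))
res                      # window length and number of days above 90F
res[["n_above"]] / 5     # average number of such days per year
```

Dividing by the number of years gives a per-year figure that is easier to compare with the current-year variable.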

@shoonlee
Collaborator Author

@wbinzhe

Post-meeting summary Jul 23: let me know if anything needs to be clarified.

  • Upload the code and cleaned data (Please don't wait until you finish everything. Upload multiple times as you progress)
  • Extend the RCA analysis to start in 2006
  • Create the map for 2002-2006 vs 2015-2019
  • Non-parametric estimation for RCA and foot traffic data
    • For RCA, create a temperature bin based on the 5-year average temperature and regress using it. For instance, if the 5-year average temperature is 75F, it belongs to the 70-79 bin. Omit the 60-69 category and run the regression.
    • For the foot traffic, repeat the same but at the monthly level: calculate the average of the daily mean temperature for each month, assign it to a bin, and then run the regression.
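
The bin assignment in the last two bullets could look roughly like the following for the monthly case. This is a sketch under assumptions: the daily series is simulated, and `to_bin` is a hypothetical helper that labels each 10-degree bin by its lower edge.

```r
# Simulated daily mean temperature for one location and year (hypothetical values).
set.seed(3)
dates <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
doy <- as.integer(format(dates, "%j"))
daily <- data.frame(
  date = dates,
  tmean = 55 + 25 * sin(2 * pi * (doy - 110) / 365) + rnorm(length(dates), sd = 4)
)

# Average the daily mean temperature within each month.
daily$month <- format(daily$date, "%Y-%m")
monthly <- aggregate(tmean ~ month, data = daily, FUN = mean)

# 10-degree bins labelled by their lower edge: 75F falls in the 70-79 bin.
to_bin <- function(temp) 10 * floor(temp / 10)
monthly$bin <- to_bin(monthly$tmean)
monthly
```

For RCA the same `to_bin` step would apply to the 5-year average instead of the monthly average.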

@wbinzhe
Owner

wbinzhe commented Jul 23, 2021

@shoonlee I don't understand "Omit the 60-69 category and run the regression." This is the code I would use for a specific bin. Is anything wrong here?
lm_bin_i <- felm(formula = hedonic_1_5year, data = rca_retail %>% filter((annual_temperature) == bins[i]))

@shoonlee
Collaborator Author

@wbinzhe

No, that's not what I mean. It's something like this:

felm(ln_price ~ t30 + t40 + t50 + t70 + t80 + t90 | xxxx | xxxx, data = data), where t30 == 1 when the average temperature is between 30 and 40. For foot traffic, the average temperature is the average of the daily mean temperature for a given month; for RCA, it is the 5-year average. Does that make sense?
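
A minimal sketch of this dummy-variable setup, using simulated data and base-R `lm` without any fixed effects (the fixed-effect terms are left as placeholders in the thread, so none are assumed here). The column names follow the t30 ... t90 convention above; everything else is hypothetical.

```r
# Simulated data: one average temperature and one log price per observation.
set.seed(4)
n <- 1000
avg_temp <- runif(n, 30, 99)
dat <- data.frame(ln_price = rnorm(n, mean = 5), avg_temp = avg_temp)

# One 0/1 indicator per 10-degree bin: t30 == 1 when 30 <= avg_temp < 40, etc.
bin_starts <- seq(30, 90, by = 10)
for (lo in bin_starts) {
  dat[[paste0("t", lo)]] <- as.integer(avg_temp >= lo & avg_temp < lo + 10)
}

# Each observation falls in exactly one bin.
stopifnot(all(rowSums(dat[paste0("t", bin_starts)]) == 1))

# t60 is left out of the formula, so every coefficient is measured
# relative to the 60-69 bin.
fit <- lm(ln_price ~ t30 + t40 + t50 + t70 + t80 + t90, data = dat)
coef(fit)
```

This differs from filtering the data to one bin at a time: all observations enter a single regression, and the omitted bin serves as the baseline.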

@shoonlee
Collaborator Author

@wbinzhe

Take a look at the paper I attached to this thread (Barreca et al. - 2016 - Adapting to Climate Change The Remarkable Decline.pdf). It's a very similar specification.

@wbinzhe
Owner

wbinzhe commented Jul 23, 2021

@shoonlee Sure got it!

@wbinzhe
Owner

wbinzhe commented Jul 25, 2021

@shoonlee Please see Slides #31 and #32 for the RCA non-parametric estimations. https://docs.google.com/presentation/d/14_aDxt2O_Le4mCJj4lBfuK-rG9gI6WA8U69lhPJajis/edit?usp=sharing.
For the annual/5-year mean temp, I adjusted the reference level to 52F instead of 60F, based on the patterns in the data. For the 5-year or annual average temp, the range is 40-80F.
For the annual/5-year max temp, the 5-year results are very noisy compared with all other plots.

Also, for the parametric regression using the # of days with t > 90F, adding the observations from 2006-2009 makes the negative effect non-significant (i.e., sample period 2006-2019), but excluding the 2006-2008 observations recovers the negative effect. I did not find any clear heterogeneity between earlier and later years; it looks like a sample size issue (only three states).

If you want to look into these problems this weekend, the code is in RCA_retail_ca_tx_ny.R.

@shoonlee
Collaborator Author

@wbinzhe

Thanks for the update. It sort of makes sense that the yearly data only span a temperature range of 40-80. Unlike the monthly data, where summer and winter months are separate observations, the yearly data average things out, so we wouldn't really have observations above 90F or below 40F.

Can we go back to the number of days above a certain temperature as the definition of the temperature bin here? So basically run something like

felm(ln_price ~ t20 + t30 + t40 + t50 + t70 + t80 + t90 | xxxx | xxxx, data = data)

but with t30 defined as the number of days where the average daily temperature is between 30 and 40F? The interpretation might change slightly from the foot traffic data, but we should try this.

Also, can we try this definition with the foot traffic data as well?

In summary, let's repeat the analysis with a different temperature bin definition (the number of days with daily mean temperature in a certain temperature bin).
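
A sketch of the days-in-bin definition for a single location-year, using simulated daily mean temperatures. Two details are assumptions rather than something stated in the thread: t20 is treated as "below 30F" and t90 as "90F and above", so that the bins cover every day.

```r
# Simulated daily mean temperature for one location in one year (hypothetical values).
set.seed(5)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
doy <- as.integer(format(dates, "%j"))
tmean <- 55 + 25 * sin(2 * pi * (doy - 110) / 365) + rnorm(length(dates), sd = 6)

# Bin edges: (-Inf, 30), [30, 40), ..., [80, 90), [90, Inf).
breaks <- c(-Inf, seq(30, 90, by = 10), Inf)
labels <- paste0("t", seq(20, 90, by = 10))   # t20 and t90 are open-ended (assumption)

# Assign each day to a bin, then count days per bin.
bin <- cut(tmean, breaks = breaks, labels = labels, right = FALSE)
days_in_bin <- table(bin)
days_in_bin
sum(days_in_bin)   # every day is counted exactly once
```

Repeating this per location and year (or per location and month for foot traffic) would produce the t20 ... t90 regressors.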

@wbinzhe
Owner

wbinzhe commented Jul 25, 2021

@shoonlee Let me know if I am understanding this correctly:
In the RCA case, when we use the annual mean temp, each observation is assigned to exactly one bin; all other bins are zero.
If we change the bins to the # of days when the average daily temperature is within some F-range, then we are distributing the 365 days across the bins (so we still need to drop one bin, 60F). The interpretation is that in locations with 1 more day in a specific temp range, property value is xx higher/lower.

@shoonlee
Collaborator Author

@wbinzhe

I think your understanding is correct. Suppose that in 2015 we had 24 days with a daily mean temperature over 90F; then for that year t90 == 24. Also, t20 + t30 + t40 + ... + t90 = 365.

Barreca et al. (2016), the paper we've repeatedly talked about, defined the variable in this way. See Figure 1 and their econometric model section for a clearer variable definition.
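
Because the bin counts add up to 365 for every observation, they are perfectly collinear with the intercept, which is why one bin has to be dropped. The sketch below shows this with simulated counts and base-R `lm`; the data and the names `fit_all` and `fit_drop` are hypothetical.

```r
# Simulated day counts: each row allocates 365 days across eight bins.
set.seed(6)
n <- 200
bins <- paste0("t", seq(20, 90, by = 10))
counts <- t(rmultinom(n, size = 365, prob = rep(1 / 8, 8)))
colnames(counts) <- bins
dat <- data.frame(ln_price = rnorm(n), counts)

# The counts sum to 365 in every row.
stopifnot(all(rowSums(counts) == 365))

# With all eight bins plus an intercept, one coefficient cannot be estimated
# and lm() reports it as NA.
fit_all <- lm(ln_price ~ ., data = dat)
sum(is.na(coef(fit_all)))

# Dropping t60 removes the collinearity. Each remaining coefficient is the
# effect of moving one day from the 60-69 bin into that bin.
fit_drop <- lm(ln_price ~ . - t60, data = dat)
coef(fit_drop)
```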

@wbinzhe
Owner

wbinzhe commented Jul 25, 2021

@shoonlee Thanks, I will double-check the interpretation in the paper!

@wbinzhe
Owner

wbinzhe commented Jul 26, 2021

@shoonlee The maps are under replication_folder/maps. I created multiple versions because the maps of the # of days above 90F are very similar. Unless necessary, do not run "prism_daily_assemble.R" to reproduce the maps; each map took 1-2 hours to plot. I also pasted these maps into the shared Google Slides so you can take a quick look.

@shoonlee
Collaborator Author

shoonlee commented Jul 26, 2021 via email

@shoonlee
Collaborator Author

shoonlee commented Jul 26, 2021

@wbinzhe

  • Histogram of the number of days variable (like figure 1 in Barreca et al 2016)
  • RCA entire sample analysis with the number of days in temperature bin variables
  • RCA centers vs shops
  • RCA current year temperature
  • Foot traffic (entire sample) with the number of days temperature definition
  • Foot traffic for (roughly) centers vs shops

@wbinzhe
Owner

wbinzhe commented Jul 28, 2021

@shoonlee Hi Seunghoon, I am going to move on to SafeGraph. Please double-check that we have everything needed for the RCA analysis.

@shoonlee
Collaborator Author

shoonlee commented Jul 28, 2021 via email

@shoonlee
Collaborator Author

shoonlee commented Jul 28, 2021 via email

@wbinzhe
Owner

wbinzhe commented Jul 28, 2021

@shoonlee I used a width of 2 degrees for each temp bin. For the annual average, we only have 40-80. I also tried 5-degree bins, but the 2-degree bins yield a cleaner shape.

@shoonlee
Collaborator Author

shoonlee commented Jul 28, 2021 via email

@wbinzhe
Owner

wbinzhe commented Jul 28, 2021

@shoonlee Monthly foot traffic part also updated

@wbinzhe
Owner

wbinzhe commented Jul 29, 2021

@shoonlee Slide #18: non-parametric estimates for different store types using the # of days in each bin. Slides #17 and #18 use all observations (store*month) rather than the 10% sample.
