# REAL1-CG.3135: Real Estate Data Analytics

## Class 4: Correlation and Empirical Causality (An Ongoing Debate)

### Data in Commercial Real Estate and Finance: Raise the Rents to Lower Vacancy
* Not experimental, but arises from a complicated interplay of negotiations.
    * Brokers.
    * Lawyers.
    * Macroeconomics.
        * Complicated interactions that are data generating processes (DGPs).
        * Consider the *Torto-Wheaton Rent Index*.


* **Independently and identically distributed** (IID) data: **Not in CRE.**
    * Laplace's *classical view of probability*: independent experiments with the same **data generating process**.
        * In time series, we will see how *Andrei Markov* extended the probabilistic framework to a world in which today influences tomorrow.
    * See above when you think about IID data in CRE.
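
To make the contrast between IID data and data in which today influences tomorrow concrete, here is a minimal sketch using simulated numbers only. The helper names `simulate` and `lag1_autocorr`, the persistence value 0.9, and the seed are illustrative assumptions, not anything drawn from CRE data.

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation: how strongly x[t] tracks x[t-1]."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def simulate(n, phi, seed):
    """Simulate x[t] = phi * x[t-1] + shock[t]; phi = 0 gives IID draws."""
    rng = random.Random(seed)
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        series.append(prev)
    return series

# Hypothetical settings: 0.9 is an assumed persistence, chosen for illustration.
iid_series = simulate(5000, phi=0.0, seed=1)
markov_series = simulate(5000, phi=0.9, seed=1)

rho_iid = lag1_autocorr(iid_series)
rho_markov = lag1_autocorr(markov_series)
print(f"IID series, lag-1 autocorrelation:    {rho_iid:.3f}")
print(f"Markov series, lag-1 autocorrelation: {rho_markov:.3f}")
```

The IID series should show an autocorrelation near zero, while the persistent series should show one near the assumed 0.9. Rents and vacancy rates plausibly look far more like the second series than the first.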
    
    

* I teach you how to ingest data and perform hypothesis testing through the application of algorithms.



* In this discussion, I will address the issue of thinking critically about data generating processes and the conditions under which the results of hypothesis testing have merit.



* There are ways to treat analysis of non-experimental data to obtain results that are quasi-experimental.



* For superb books on these ideas, see:

> [Angrist and Pischke, Mostly Harmless Econometrics](https://www.amazon.com/Mostly-Harmless-Econometrics-Empiricists-Companion/dp/0691120358/ref=sr_1_2?crid=190CZONZT447B&dchild=1&keywords=angrist+mostly+harmless+econometrics&qid=1599136508&sprefix=angrist+%2Caps%2C148&sr=8-2)


> [Judea Pearl, The Book of Why: The New Science of Cause and Effect](https://www.amazon.com/Book-Why-Science-Cause-Effect/dp/1541698967/ref=sr_1_1?dchild=1&keywords=the+book+of+why&qid=1599136487&sr=8-1)

### Experimentation in the Physical Sciences

* In the physical sciences, experiments can be designed to explore theories about the physical world expressed in the language of mathematics.  Other than philosophical debates about existence (are we just a simulation on someone else's computer?), there are no questions regarding causation.
    * Experiments designed to test the implications of Einstein's theory of gravitation.
        - Sir Arthur Eddington's photographs of solar eclipses to examine the bending of light by massive objects.
        - Imperfections in the classical prediction of Mercury's orbit around the Sun.
        - Gravitational lensing: two distinct images of the same object.
        - Gravitational waves: recently discovered.
    * Experiments to prove the accuracy of Quantum Electrodynamics (QED), the most accurate theory in science. 
    * Experiments designed to detect the Higgs field at the Large Hadron Collider.



* In systems that involve humans, however, causation may be difficult to establish.  Certainly, correlations do not establish a causal mechanism.  
    * Umbrellas cause rain.
    * More education yields higher earnings.
    * More police reduce crime.



* It might be possible to establish a framework for human experimentation.
    * Ethics and Institutional Review Boards (IRBs).
    * Generalizability of results.

### A Discussion of Empirical Causality: We Largely Learn Through Failure

* Prior to the 1930s, there were few government-sponsored measures of economic activity.



* As the US and Europe slid into the Great Depression, policy makers lacked basic information.
    * 1929 and today: a lack of data



* In the US, the National Accounts were created in 1934 and greatly expanded during and after WWII.



* At the same time, Alfred Cowles established the *Cowles Commission for Research in Economics*.



* The Cowles approach was an econometric framework for estimating systems of simultaneous equations to model an economy.



* Ultimately, this approach led to large-scale models to examine a host of different economic variables. 
    * GNP = C + G + I + X - M
    * Develop a linear algorithm for each component.
    * As the National Income and Product Accounts (NIPA) were developed, C and I had an increasing number of components: 
        * More equations!
    * Main insight was a demonstrated bias of linear regression estimates derived from such models.
    * Approach was found to be inadequate for policy evaluation (“Goodhart’s law” and “Lucas critique”).
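
The bias of linear regression in a simultaneous system can be seen in a deliberately tiny sketch. It assumes a two-equation toy economy, consumption `C = a + b*Y + u` and the identity `Y = C + I` with investment `I` set outside the system, which is far simpler than the GNP identity above. All parameter values are hypothetical, and the instrument-style estimator is shown only as one standard remedy, not as the Commission's own procedure.

```python
import random

def cov(x, y):
    """Sample covariance between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

rng = random.Random(1934)
n = 10000
a_true, b_true = 10.0, 0.6  # hypothetical structural parameters

# Investment is exogenous; the consumption shock u is not observed.
investment = [rng.gauss(20.0, 2.0) for _ in range(n)]
shock = [rng.gauss(0.0, 2.0) for _ in range(n)]

# Solving C = a + b*Y + u and Y = C + I jointly gives income as a
# function of investment and the shock, so income is correlated with u.
income = [(a_true + i + u) / (1.0 - b_true) for i, u in zip(investment, shock)]
consumption = [a_true + b_true * y + u for y, u in zip(income, shock)]

# Regressing C on Y by least squares ignores that Y depends on u.
b_ols = cov(consumption, income) / cov(income, income)

# Using investment as an instrument isolates variation in Y unrelated to u.
b_iv = cov(consumption, investment) / cov(income, investment)

print(f"True b: {b_true}, OLS estimate: {b_ols:.3f}, IV estimate: {b_iv:.3f}")
```

With these assumed variances, least squares converges to roughly 0.8 rather than 0.6, and more data does not help. The instrument-based estimate should land close to the true value because investment is unrelated to the shock by construction.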
    


### Goodhart's Law
* Goodhart “asserts that any economic relation tends to break down when used for policy purposes.”  (Wickens [2008].)
    * Proposed relationships, economic or otherwise, are not structural in nature.
    * Instead, they are derived from fundamental behavioral relationships, which are structural.
    


### The Lucas Critique
* Lucas (1976) notes that individual decision rules affected by policy are driven by “deep structural parameters.” 
    * Decision rules and, therefore, decisions are contingent on the state of the system as it is.
    * Change the system through policy, change the decision rule.
    * Such changes may not be captured in non-structural models.


#### Both Goodhart and Lucas Are Especially Relevant Today



### Experimental Design: Natural Experiments (Freakonomics and Super Freakonomics)
> A natural experiment is an empirical study in which individuals (or clusters of individuals) exposed to the experimental and control conditions are determined by nature or by other factors outside the control of the investigators, yet the process governing the exposures arguably resembles random assignment.

> Wikipedia



* Examples
    * **Natural crises**
        * 1906 S.F. earthquake to examine the impact of stock changes (vacancy) on rent.  [Friedman and Stigler](https://fee.org/resources/roofs-or-ceilings-the-current-housing-problem/) on assessing rent control.
        * Hurricane Sandy.  [Savage and Vo](https://github.com/thsavage/Causation/blob/master/Poster.pdf) on removing the impact of an intervention to evaluate transportation.
    * **Lotteries** that truly randomize a group of individuals.  
        * Vietnam draft in the U.S. as a means to explore the impacts of education on wages.
            * Delay in schooling reduces wages (even if the same level of schooling is achieved).
        * Randomized eligibility for mandatory military service in Argentina.  
            * Actual conscription increases the likelihood of having a criminal record later in adulthood. 
            * Possible inference: Delayed entry to the labor market has adverse implications in later labor market outcomes.
        * Recent paper on the impact of rent control on quality of housing stock.
            * Supports theoretical predictions of rent control: Locks in current tenants and degrades the quality of stock.
    * **Jurisdictional boundaries** over which policies are different.
        * Different minimum wage laws (New Jersey and Pennsylvania) to examine the impact of minimum wages on employment levels.
        * North v. South Korea.
    * **Same individuals** faced with exogenous changes (though doing the same thing).
        * Tim in the classroom versus Tim online.
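
For the jurisdictional boundary design above, one common estimator is difference-in-differences: compare the before-and-after change where the policy changed with the change where it did not. The sketch below uses simulated numbers only. The group sizes, baselines, shared trend, and the built-in `TRUE_EFFECT` are all hypothetical and do not come from any actual minimum wage study.

```python
import random
from statistics import mean

rng = random.Random(7)
TRUE_EFFECT = 1.5  # hypothetical policy effect built into the simulation

def simulate_group(n, base, trend, effect):
    """Simulate an outcome before and after for n units in one jurisdiction."""
    before = [base + rng.gauss(0.0, 2.0) for _ in range(n)]
    after = [base + trend + effect + rng.gauss(0.0, 2.0) for _ in range(n)]
    return before, after

# Both jurisdictions share the same downward trend (the key assumption);
# only the treated jurisdiction receives the policy effect.
treated_before, treated_after = simulate_group(4000, 20.0, -2.0, TRUE_EFFECT)
control_before, control_after = simulate_group(4000, 23.0, -2.0, 0.0)

# Naive before/after comparison mixes the policy effect with the trend.
naive = mean(treated_after) - mean(treated_before)

# Difference-in-differences subtracts the control group's change.
did = naive - (mean(control_after) - mean(control_before))

print(f"Naive before/after change: {naive:.2f}")
print(f"Difference-in-differences: {did:.2f} (true effect {TRUE_EFFECT})")
```

The naive comparison should come out negative because the shared trend swamps the effect, while the difference-in-differences estimate should sit near the built-in value. The result rests entirely on the two jurisdictions sharing a trend, which is an assumption to argue for, not a fact the data can prove.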

### Judea Pearl's Ladder of Empirical Causation (Weakest to Strongest)

* **Association**: What if I see __________?
    * What does a symptom tell me about a disease?
    * What does a survey tell us about political attitudes?
* Example: Correlation between symptom and disease.
* Others?

* **Intervention**:  What if I do __________?
    * What if I take this medicine, will my disease be cured?
    * What if the government subsidizes education, will wages rise?
* Example: Regression accounting for alternative sources in wage variation, such as Griliches (1979).
* Others?

* **Counterfactual**: What if I acted differently?
    * Was it the medicine that cured my disease?
    * If Hurricane Sandy had not hit NYC, would traffic patterns be the same?
* Example: [If we have time](https://github.com/thsavage/Causation/blob/master/Poster.pdf).

### An Example from Pharmaceuticals: Stage-3 COVID Vaccine Studies

* When full-length articles on RCTs appear in the WSJ, it is time for the industry to pay attention.



* Randomized clinical trials using a double-blind protocol, the technique used for clinical trials for disease treatment.
    * **Control** group.
    * **Treatment** group.
    * Data scientist **does not know** which individuals are in either group until the study is complete.



* The goal is to compare average outcomes across two groups to test for differences.
    * This can be done using either classical or Bayesian inference.
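
A classical version of this comparison can be sketched as follows. The outcomes are simulated, and the group size, the assumed treatment effect of 0.3, and the use of a large-sample normal approximation for the p-value are all illustrative assumptions, not details of any actual vaccine trial.

```python
import random
from statistics import NormalDist, mean, variance

rng = random.Random(2020)
n = 2000  # hypothetical number of participants per group

# Simulated outcomes: the treatment shifts the mean by an assumed 0.3.
control = [rng.gauss(0.0, 1.0) for _ in range(n)]
treatment = [rng.gauss(0.3, 1.0) for _ in range(n)]

# Difference in average outcomes across the two groups.
diff = mean(treatment) - mean(control)

# Standard error of the difference, allowing unequal variances.
se = (variance(treatment) / n + variance(control) / n) ** 0.5

# Test statistic and two-sided p-value under a normal approximation.
z = diff / se
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"Difference in means: {diff:.3f}")
print(f"z statistic: {z:.2f}, two-sided p-value: {p_value:.4f}")
```

Randomization is what lets the difference in means be read as the effect of treatment. The same arithmetic on non-experimental groups would still produce a p-value, but it would not carry the same causal meaning.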



* Issues:
    * Costly (direct and in terms of time).
    * Often only a single study that cannot be repeated or reproduced.

### Tim Is Skeptical but Not Cynical
* These tools are powerful and help us understand the world around us.



* Easier to implement.
    * Availability of data.
    * Availability of open source.  (Wait until we deploy TensorFlow.)



* But GIGO still prevails, as does skepticism.  (And I think that this is still fun!)



* In the meantime, let's move up Pearl's Ladder of Empirical Causation: **bivariate to multivariate regression**.