# REAL1-CG.3135: Real Estate Data Analytics (Accelerated Format)

## Session 1: Demystifying the Unmystical
## Like a hammer and nails, this class is about ideas first and the tools to implement them

## Course Introduction and Overview

* Like every other major industry, CRE will be disrupted by the application of technology empowered by data analytics. 
* It will be called different things, but it is the same thing.
    * Econometrics and forecasting
    * Statistical learning
    * Machine learning
    * Artificial intelligence
* Why?
    * Example: I want to predict what the 10-Year U.S. Treasury is going to be in the future to determine a realistic DCF for my property.
    * Example: I want to predict what the cap rate on this property is, based on its characteristics.
    * Example: I want to predict how likely my current tenant is to churn at the end of her lease term.  (And whether I can intervene in the decision.)
    * Example: I want to predict how labor costs will change with time.

## My Goals
* Develop a common language among us through rapid immersion.
    * Exploring conjectures about the world (without data, this is now called "thought leadership").
    * Time series as storytelling and for forecasting.
    * Examples of spurious correlation and considerations about causation.
* Introduce you to R and R Studio to rapidly expand your tool kit.
    * Provide historical background in probability and statistics.
    * R is a high-level programming language developed by Ross Ihaka and Robert Gentleman (hence R). 
* I will be using Jupyter notebooks (as a substitute for PowerPoint).
    * Occasional use of Python graphics in class.
    * Visualizations and Monte Carlo methods rather than mathematical proofs.
        * You are not responsible for Python.
        * You will use R and R Studio with sample R code to be implemented in class.
            * Why R and R Studio?
                * Stable code base and platform
                * Open source and free
                * Learning curve is not steep, allowing for immediate immersion in data acquisition and analysis
                * Compatible with other statistical learning environments, and this class is focused on ideas (not necessarily tools)
        * These notes will live until we turn out the lights.
* Provide you with many examples from a variety of sources, including real estate and finance.
* Typical session: Lecture and discussion with examples, as well as in-class labs.  
    * I know of no other way to learn data analytics than to do it repeatedly.
* By the end of this class, you will know more about data analysis than the chief economists of either CBRE or JLL. 
* **Never forget: If I can do this, so can you.  And you'll be better RE professionals.**

## The Myth of AI

* For the difficult questions that we face in CRE, there is no artificial intelligence.  
* We have tools, and you must put the human back into machine learning.
* You must think about relationships given your domain knowledge in real estate.
    * *Finance*: Can you predict interest rates or cap rates?
    * *Development*: Can you predict labor costs?

## A Review of the Syllabus

## A Discussion of Your Goals

## An Introduction to R and R Studio

* R v. Python v. Excel
    * Access to a massive collection of algorithms and datasets
    * R has better time-series algorithms
    * R, Python and Excel are merely tools to implement ideas
        * Present Discounted Value: It is a built-in function in Excel, but **why do we discount**?
* Jupyter v. PPT: instantaneous updating
    * I have a standard talk on the current state of the macroeconomy and its implications for CRE that I can update in real time.
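
Returning to the Present Discounted Value bullet above, the following is a minimal Python sketch of why we discount. The cash flows and the 6% rate are made-up illustrative numbers, and `present_value` is a hypothetical helper written for these notes (it is not Excel's built-in function). The point is only that a dollar received later is worth less today, because a dollar in hand could have been invested at the discount rate in the meantime.

```python
def present_value(cash_flows, rate):
    """Discount end-of-period cash flows (year 1, 2, ...) back to today."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Hypothetical property: 100 of net income per year for 5 years.
cash_flows = [100.0] * 5

pv_at_6 = present_value(cash_flows, 0.06)  # discounted at an assumed 6%
pv_at_0 = present_value(cash_flows, 0.0)   # no discounting: just the sum

print(f"PV at 6%: {pv_at_6:.2f}")
print(f"PV at 0%: {pv_at_0:.2f}")
```

With a positive discount rate the present value is below the simple sum of the cash flows, and the gap widens as the rate or the horizon grows.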

### R and R Studio
* Installing and loading libraries
* CRAN, task views, and R vignettes
* Graphics and storytelling
* Linear regression
* Dataframes as spreadsheets
* Application Programming Interfaces (APIs)
* Web scraping
* Feature engineering (data manipulation)
* Practical examples

## Foundational Concepts in Probability and Statistics

### Purpose
* Introduce you to important concepts that allow us to operationalize ideas such as **Data Generating Processes** (or DGPs).
    * **All of this can be done using material you have already seen in RE Finance, RE Capital Markets or RPM.**
    * Ideally these concepts will help 
* Painful but **necessary**.
    * We should pay respect to the 300 years of history that has brought us here.
* **These ideas should complement your understanding of real estate finance.**

### Geltner Ex 9-1: Probability Distribution Functions
* I am an economist, and I have never *seen* a demand curve.  (I have deployed the tools we will cover in this class to estimate their shape.)
* Distribution functions are mathematical abstractions (like demand curves).
* They help us think about important concepts in probability and measures, such as moments.

### Geltner Ex 9-2: Expected Returns (Averages) and Risk (Variance or Standard Deviation)
* Historically called measures of central tendency or "moments".  
    * Another mathematical abstraction is the *moment generating function*.
* An average is a first moment.
* A variance is a second moment.

### What is Probability?
* Classical view arose from gambling with dice and holds that outcomes have equal probability.
* Subjective view uses a model with randomness such as the payoff to a particular gamble.
* Frequentist view holds that probability is based on the history of outcomes from an experiment, such as the probability the stock market goes up tomorrow or P(stock market goes up tomorrow).
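
A small Monte Carlo sketch in Python can make the contrast concrete. It compares the equal-probability answer for a fair die (1/6 for any face) with the view based on the history of outcomes (often called the frequentist view), where probability is the long-run relative frequency in repeated experiments. The function name `relative_frequency`, the seed, and the sample sizes are illustrative choices for this sketch.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def relative_frequency(n_rolls, face=6):
    """Share of n_rolls simulated fair-die throws that land on `face`."""
    hits = sum(1 for _ in range(n_rolls) if random.randint(1, 6) == face)
    return hits / n_rolls

classical = 1 / 6  # equal-probability answer for any single face

# The observed share should settle near 1/6 as the history gets longer.
for n in (100, 10_000, 200_000):
    print(n, round(relative_frequency(n), 4), round(classical, 4))
```

For a die the two views agree in the long run. For the stock market there is no set of equally likely outcomes to count, so only the history of outcomes (or a subjective model) is available.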

### Components of an Experiment

#### Sample space, $S$, which is the set of all possible outcomes.

Examples: 

1. Flipping a US penny, $S=\{Heads,Tails\}$

2. Throwing a die, $S=\{1,2,3,4,5,6\}$

3. Throwing two dice, $S=\{(i,j): i,j=1,2,3,4,5,6\}$

#### Events, $A$, which are any subset of $S$

Examples: 

1. $\text{Heads from } \{Heads,Tails\}$

2. $2 \text{ from } \{1,2,3,4,5,6\}$

#### Probability 

$P: A\rightarrow[0,1]$ or $P(A)$
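
The three components can be sketched in a few lines of Python for the two-dice example. This is an illustration under the classical (equally likely outcomes) assumption. The event "the dice sum to 7" and the helper name `prob` are choices made for this sketch.

```python
from fractions import Fraction
from itertools import product

# Sample space for throwing two dice: all ordered pairs (i, j).
S = set(product(range(1, 7), repeat=2))

# An event is any subset of S; here, "the two dice sum to 7".
A = {(i, j) for (i, j) in S if i + j == 7}

def prob(event, sample_space):
    """Classical probability with equally likely outcomes: |A| / |S|."""
    return Fraction(len(event), len(sample_space))

print(len(S))      # number of outcomes in the sample space
print(prob(A, S))  # probability of the event, as an exact fraction
```

Here `prob` plays the role of $P$: it takes an event and returns a number between 0 and 1.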


#### Properties of Probability

1. The probability of an event occurring lies between 0 and 1: $P(A)\in[0,1]$

2. The probability of the sample space occurring is 1: $P(S)=1$

3. Summation: $P(A \cup B) = P(A) + P(B)$ for mutually exclusive (disjoint) events

4. Conditioning: $P(A|B) = P(A)$ for independent events

5. Complementary: $P(A^c) = 1-P(A)$
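
These properties can be checked by direct counting on a single fair die. The Python sketch below uses exact fractions. The events `A`, `B` and `E` are illustrative choices: `A` and `B` cannot occur together, while `A` and `E` overlap but are independent. Simple addition of probabilities is checked on the pair that cannot occur together, and the overlapping pair needs the general rule $P(A \cup E) = P(A) + P(E) - P(A \cap E)$.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space for one fair die

def P(event):
    """Classical probability of an event (a subset of S)."""
    return Fraction(len(event & S), len(S))

A = {1, 2}     # "roll a 1 or a 2"
B = {5, 6}     # "roll a 5 or a 6"; cannot happen together with A
E = {2, 4, 6}  # "roll an even number"; overlaps A, but is independent of it

checks = {
    "bounds": 0 <= P(A) <= 1,
    "sample space": P(S) == 1,
    "addition (disjoint A, B)": P(A.union(B)) == P(A) + P(B),
    "general addition (A, E)":
        P(A.union(E)) == P(A) + P(E) - P(A.intersection(E)),
    # Conditional probability P(A|E) = P(A and E) / P(E)
    "conditioning (independent A, E)": P(A.intersection(E)) / P(E) == P(A),
    "complement": P(S - A) == 1 - P(A),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```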

### Random Variables

We operationalize this through the use of **Random Variables** (both discrete and continuous) and examine important characteristics of these animals (moments) such as 

* Distribution Functions (Ex. 9-1 of *Geltner et al.*)
* Measures of Central Tendency 
    * Mean 
    * Variance (standard deviation)
    * Covariance or correlation (when we have more than one RV)
* **These are simply methods of statistical accounting**

### Mathematics of Central Tendency

*All data analytics drives toward the summarization of information that we call data*.  On occasion, we call this **dimensionality reduction**.

Let $X$ denote a **Random Variable** and $x$ denote a value it can take (a realized outcome).  We seek to summarize important components through **dimensionality reduction**.

Population Average:

$\mu=E[X]=\sum_x P(X=x) \cdot x$

Population Variance:

$\sigma^2=Var[X]=E[(X-E[X])^2]=\sum_x P(X=x) \cdot (x-\mu)^2$

Population Standard Deviation:

$\sigma=\sqrt{\sigma^2}$

Coefficient of Variation:

$c_v=\frac{\sigma}{\mu}$

Sharpe Ratio (simplified, with the risk-free rate set to zero):

$S_r=\frac{\mu}{\sigma}$
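
As a worked illustration, the Python sketch below applies these formulas to a made-up discrete distribution of annual returns. The three scenarios and their probabilities are hypothetical numbers chosen so the arithmetic is easy to follow. The last line computes the ratio $\mu/\sigma$ exactly as written above, so no risk-free rate is subtracted.

```python
import math

# Hypothetical return distribution: {return x: P(X = x)}.
pmf = {-0.05: 0.25, 0.05: 0.50, 0.15: 0.25}

mu = sum(p * x for x, p in pmf.items())               # population average
var = sum(p * (x - mu) ** 2 for x, p in pmf.items())  # population variance
sigma = math.sqrt(var)                                # standard deviation
c_v = sigma / mu                                      # coefficient of variation
s_r = mu / sigma                                      # ratio of mean to risk

print(f"mean     = {mu:.4f}")
print(f"variance = {var:.4f}")
print(f"std dev  = {sigma:.4f}")
print(f"c_v      = {c_v:.4f}")
print(f"S_r      = {s_r:.4f}")
```

Five numbers now stand in for the whole distribution, which is the sense in which these summaries reduce dimensionality.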

Let $Y$ denote a different **Random Variable** and $y$ denote a value it can take.

$Cov(X, Y) = E[(X-\mu_x)(Y-\mu_y)]$

$Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_x \cdot \sigma_y} \in [-1, 1]$
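
The same bookkeeping extends to two random variables. The Python sketch below uses a hypothetical joint distribution of returns on two assets. The scenario probabilities and returns are made-up numbers. It computes the covariance as the probability-weighted average of the product of deviations from each mean, then scales by the two standard deviations to obtain the correlation.

```python
import math

# Hypothetical joint scenarios: (probability, return on X, return on Y).
scenarios = [
    (0.25, -0.05, -0.02),
    (0.50,  0.05,  0.03),
    (0.25,  0.15,  0.04),
]

mu_x = sum(p * x for p, x, _ in scenarios)
mu_y = sum(p * y for p, _, y in scenarios)

# Covariance: expected product of deviations from the means.
cov = sum(p * (x - mu_x) * (y - mu_y) for p, x, y in scenarios)

sigma_x = math.sqrt(sum(p * (x - mu_x) ** 2 for p, x, _ in scenarios))
sigma_y = math.sqrt(sum(p * (y - mu_y) ** 2 for p, _, y in scenarios))

# Correlation: covariance rescaled to be unit-free.
corr = cov / (sigma_x * sigma_y)

print(f"Cov(X, Y)  = {cov:.6f}")
print(f"Corr(X, Y) = {corr:.4f}")
```

Covariance carries the units of both variables, while correlation is unit-free, which is why correlation is easier to compare across pairs of assets.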

### Measures of Central Tendency (Means and Variances): R Example

### The Normal Distribution: Nature's Distribution

### Returns on Two Assets: The Bivariate Normal

### A Hint: Why We Test Against the Normal