# 1. Data Import and Manipulation

We first import a dataset from the workshop website. It contains data on housing prices and air pollution from [Harrison & Rubinfeld (1978)](https://www.sciencedirect.com/science/article/pii/0095069678900062) and is also used throughout Wooldridge's textbook *Introductory Econometrics: A Modern Approach*. After briefly inspecting the data, we create two new columns (variables), the logs of `price` and `nox`, in preparation for a simple regression analysis.

```r
# load the data from the workshop website
data_url <- "https://tdmdal.github.io/r-workshop-students/data/hprice.csv"
hprice <- read.csv(data_url)
```

```r
# take a look at the structure of the data
str(hprice)
```

See a description of the data columns [here](http://fmwww.bc.edu/ec-p/data/wooldridge/hprice2.des).

```r
# print the first few rows of the dataset
head(hprice)
```
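Beyond `str()` and `head()`, `summary()` and a count of missing values are quick checks worth running before modelling. A minimal sketch on a small illustrative data frame (`toy`, a hypothetical stand-in for `hprice` with made-up values, including one deliberately missing price):

```r
# Hypothetical toy data standing in for hprice (illustrative values only)
toy <- data.frame(price = c(24000, 21600, NA), nox = c(5.38, 4.69, 4.69))

# Per-column minimum, quartiles, mean and maximum, plus an NA count where present
summary(toy)

# Number of missing values in each column
colSums(is.na(toy))
```

By default `lm()` drops rows with missing values (the default `na.action` is `na.omit`), so these counts indicate how many observations a regression will actually use.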

```r
# create log price and log nox as new columns
hprice$lprice <- log(hprice$price)
hprice$lnox <- log(hprice$nox)
```
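`log()` returns `-Inf` for zero and `NaN` (with a warning) for negative values, so it can be worth confirming that both columns are strictly positive before transforming them. A minimal sketch using the `$` accessor on a hypothetical toy data frame with illustrative values, not the real `hprice` data:

```r
# Hypothetical toy data standing in for hprice (illustrative values only)
toy <- data.frame(price = c(24000, 21600, 34700), nox = c(5.38, 4.69, 4.69))

# Guard: log() is only finite for strictly positive inputs
stopifnot(all(toy$price > 0), all(toy$nox > 0))

# $ returns a plain numeric vector, so the new columns are numeric vectors
toy$lprice <- log(toy$price)
toy$lnox <- log(toy$nox)

# Round trip: exp() undoes log()
all.equal(exp(toy$lprice), toy$price)
```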

# 2. Modelling

We will run a simple regression of log housing price on the log of `nox` (nitrogen oxides concentration) to investigate the effect of air pollution on housing prices:

$\log(\text{price}) = \beta_0 + \beta_1 \log(\text{nox}) + u$.
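For a simple regression the OLS estimates have a closed form, and the log-log specification gives $\beta_1$ a direct interpretation as an elasticity. Writing $y_i = \log(\text{price}_i)$ and $x_i = \log(\text{nox}_i)$ for observations $i = 1, \dots, n$, the standard textbook results are:

$$
\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x,
\qquad
\beta_1 = \frac{\partial \log(\text{price})}{\partial \log(\text{nox})}.
$$

So, holding $u$ fixed, a 1% increase in `nox` is associated with an approximately $\beta_1$% change in `price`.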

```r
# fit the regression model by ordinary least squares
lr <- lm(formula = lprice ~ lnox, data = hprice)
```
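To see that `lm()` computes the usual closed-form simple-regression estimates (slope equal to the sample covariance over the sample variance of the regressor), one can compare the two directly. A sketch on simulated data, assuming a log-log model with intercept 11 and slope -0.7; these are made-up parameters, not estimates from `hprice`. `cov()/var()` matches the ratio of centred sums because both functions use the same $n-1$ divisor, which cancels:

```r
# Simulated data (assumption, not the hprice data): true intercept 11, slope -0.7
set.seed(1)
n <- 500
lnox <- log(runif(n, min = 3.8, max = 8.7))
lprice <- 11 - 0.7 * lnox + rnorm(n, sd = 0.3)
sim <- data.frame(lprice = lprice, lnox = lnox)

# Closed-form OLS estimates for a simple regression
b1 <- cov(sim$lnox, sim$lprice) / var(sim$lnox)
b0 <- mean(sim$lprice) - b1 * mean(sim$lnox)

# Compare with lm(); they should agree up to floating-point error
fit <- lm(lprice ~ lnox, data = sim)
print(c(by_hand_b0 = b0, by_hand_b1 = b1))
print(coef(fit))
all.equal(unname(coef(fit)), c(b0, b1))
```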

# 3. Report & Graph

We report the regression results and plot a few graphs: the data with the fitted regression line, and the standard `lm` diagnostic plots.

```r
# report regression results
summary(lr)
```
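The printed summary can also be taken apart programmatically, which helps when reporting specific numbers. A sketch on simulated stand-in data (an assumption: true slope -0.7, not the `hprice` data): `coef(summary(fit))` returns the coefficient matrix and `confint()` the confidence intervals. Because the model is log-log, the slope estimate also implies a percentage change in price for a given percentage change in nox:

```r
# Simulated stand-in for hprice (assumption): true intercept 11, slope -0.7
set.seed(2)
n <- 500
sim <- data.frame(lnox = log(runif(n, 3.8, 8.7)))
sim$lprice <- 11 - 0.7 * sim$lnox + rnorm(n, sd = 0.3)
fit <- lm(lprice ~ lnox, data = sim)

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|)
ctab <- coef(summary(fit))
print(ctab)

# 95% confidence intervals for the intercept and slope
print(confint(fit, level = 0.95))

# Implied % change in price for a 10% increase in nox (error term held fixed),
# computed as 1.10^b1 - 1 rather than the 10 * b1 approximation
b1 <- ctab["lnox", "Estimate"]
pct_change <- 100 * (exp(b1 * log(1.10)) - 1)
print(pct_change)
```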

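```r
# plot the data and the fitted regression line
par(mfrow = c(1, 1))
# a two-column data frame gives a scatterplot, first column on the x-axis
plot(hprice[c("lnox", "lprice")])
# abline() reads a length-2 vector as (intercept, slope)
abline(coef(lr))
```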

```r
# standard lm diagnostic plots in a 2 x 2 grid:
# residuals vs fitted, Q-Q, scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(lr)
```
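The diagnostic panels are built from residuals, standardized residuals and leverages, which can also be inspected numerically. A sketch on simulated stand-in data (an assumption, not `hprice`); it also checks two algebraic properties of OLS with an intercept that hold, up to floating-point error, for any data:

```r
# Simulated stand-in for hprice (assumption): true intercept 11, slope -0.7
set.seed(3)
n <- 500
sim <- data.frame(lnox = log(runif(n, 3.8, 8.7)))
sim$lprice <- 11 - 0.7 * sim$lnox + rnorm(n, sd = 0.3)
fit <- lm(lprice ~ lnox, data = sim)

# Quantities behind the diagnostic panels
res <- residuals(fit)       # raw residuals (Residuals vs Fitted)
std_res <- rstandard(fit)   # standardized residuals (Q-Q, Scale-Location)
lev <- hatvalues(fit)       # leverages (x-axis of Residuals vs Leverage)

# With an intercept, OLS residuals sum to zero and are orthogonal to the regressor
print(c(sum_res = sum(res), cross = sum(res * sim$lnox)))

# Leverages sum to the number of estimated coefficients (here 2)
print(sum(lev))

# Rough rule of thumb, not a formal test: flag |standardized residual| > 3
which(abs(std_res) > 3)
```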