# 1. Data Import and Manipulation

We first import a dataset from the workshop website. It covers married women's labor force participation and was used in [Mroz (1987)](http://unionstats.gsu.edu/9220/Mroz_Econometrica_LaborSupply_1987.pdf). The same dataset is used throughout Wooldridge's textbook, *Introductory Econometrics: A Modern Approach*. After briefly inspecting the data, we create a new column, `lwage`, in preparation for a simple regression.

In [0]:
# load data
data_url <- "https://tdmdal.github.io/r-workshop-researchers/data/mroz_1987.csv"
mroz_1987 <- read.csv(data_url)

In [0]:
# take a look at the structure of the data
str(mroz_1987)

See a description of the data columns [here](https://justinmshea.github.io/wooldridge/reference/mroz.html).

In [0]:
# print the first few rows of the dataset
head(mroz_1987)

In [0]:
# create log wage
mroz_1987$lwage <- log(mroz_1987$wage)
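
Because `log()` returns `-Inf` at zero and `NA` for missing values, it can be worth counting how many rows end up with a usable log wage before modelling. A minimal sketch, using a small made-up data frame `toy` (hypothetical values, not the Mroz data) so that it runs on its own:

```r
# hypothetical toy data standing in for mroz_1987 (values are made up)
toy <- data.frame(wage = c(3.35, NA, 4.59, 0, 2.50))

# same log transformation as in the cell above, applied to the toy data
toy$lwage <- log(toy$wage)

# count rows with a finite log wage; NA and -Inf are not usable in a regression
n_usable <- sum(is.finite(toy$lwage))
print(n_usable)
```

Under R's default `na.action`, `lm()` silently drops rows with `NA`, but it stops with an error on infinite values, so any zeros in `wage` would need to be handled before fitting.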

# 2. Modelling

We run a simple regression to estimate the return to education for married women: $\log(wage) = \beta_0 + \beta_1 educ + u$.
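
In this log-level form the slope has a percentage interpretation. Holding $u$ fixed, a change in education shifts log wage by $\beta_1 \Delta educ$, which gives the usual approximation:

$$\Delta \log(wage) = \beta_1 \, \Delta educ \quad\Rightarrow\quad \%\Delta wage \approx (100 \cdot \beta_1) \, \Delta educ$$

So $100 \cdot \beta_1$ is roughly the percentage change in wage associated with one more year of education. The approximation is close when $\beta_1 \Delta educ$ is small; the exact percentage change is $100 \cdot (e^{\beta_1 \Delta educ} - 1)$.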

In [0]:
# set up and fit the regression model
lr <- lm(formula = lwage ~ educ, data = mroz_1987)
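
With a single regressor, the OLS estimates that `lm()` returns have a closed form: the slope is the sample covariance of the regressor and the outcome divided by the sample variance of the regressor, and the intercept follows from the sample means. The sketch below checks this on a made-up data frame `toy` (hypothetical values, not the Mroz sample); the names `fit`, `b0`, and `b1` are chosen for illustration only.

```r
# hypothetical toy data (made-up values), not the Mroz sample
toy <- data.frame(
  educ  = c(8, 10, 12, 12, 14, 16, 17),
  lwage = c(0.9, 1.0, 1.2, 1.1, 1.4, 1.6, 1.5)
)

# fit the same kind of model as in the cell above
fit <- lm(lwage ~ educ, data = toy)

# closed-form OLS estimates for a single regressor
b1 <- cov(toy$educ, toy$lwage) / var(toy$educ)
b0 <- mean(toy$lwage) - b1 * mean(toy$educ)

# the two approaches should agree up to floating-point error
print(all.equal(unname(coef(fit)), c(b0, b1)))
```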

# 3. Report & Graph

We report the regression results and plot the fitted line along with a few diagnostic graphs.

In [0]:
# report regression result
summary(lr)
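
The printed summary is convenient to read, but individual numbers are often needed for later steps. A minimal sketch of pulling them out of a fitted model, again on a made-up data frame `toy` (hypothetical values) with illustrative variable names:

```r
# hypothetical toy data (made-up values), not the Mroz sample
toy <- data.frame(
  educ  = c(8, 10, 12, 12, 14, 16, 17),
  lwage = c(0.9, 1.0, 1.2, 1.1, 1.4, 1.6, 1.5)
)
fit <- lm(lwage ~ educ, data = toy)
s <- summary(fit)

# coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
slope_est <- coef_table["educ", "Estimate"]
slope_se  <- coef_table["educ", "Std. Error"]

# R-squared is stored as a component of the summary object
r2 <- s$r.squared

# 95% confidence interval for the slope
ci <- confint(fit, "educ", level = 0.95)

print(coef_table)
print(r2)
print(ci)
```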

In [0]:
# plot data and regression line
par(mfrow = c(1, 1))
plot(mroz_1987[c("educ", "lwage")])
abline(coef(lr))

In [0]:
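# standard diagnostic plots for the fitted model, in a 2 x 2 grid:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage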
par(mfrow = c(2, 2))
plot(lr)