# Chapter 4. Multiple Regression Analysis: Inference

This chapter continues our treatment of multiple regression analysis. We now turn to the problem of testing hypotheses about the parameters in the population regression model.

## 4-1 Sampling Distributions of the OLS Estimators

Up to this point, we have formed a set of assumptions under which OLS is unbiased; we have also derived and discussed the bias caused by omitted variables. In Section 3-4, we obtained the variances of the OLS estimators under the Gauss-Markov assumptions. In Section 3-5, we showed that this variance is the smallest among linear unbiased estimators.

Knowing the expected value and variance of the OLS estimators is useful for describing the precision of the OLS estimators. However, in order to perform statistical inference, we need to know more than just the first two moments of $\hat{\beta}_j$: we need to know the full sampling distribution of the $\hat{\beta}_j$. Even under the Gauss-Markov assumptions, the distribution of $\hat{\beta}_j$ can have virtually any shape.

When we condition on the values of the independent variables in our sample, it is clear that the sampling distributions of the OLS estimators depend on the underlying distribution of the errors. To make the sampling distributions of the $\hat{\beta}_j$ tractable, we now assume that the unobserved error is normally distributed in the population. We call this the normality assumption.

### Assumption MLR.6 Normality

The population error u is independent of the explanatory variables $x_1,x_2,\dots,x_k$ and is normally distributed with zero mean and variance $\sigma^2$: $u \sim Normal(0,\sigma^2)$.

For cross-sectional regression applications, Assumptions MLR.1 through MLR.6 are called the classical linear model (CLM) assumptions. Under the CLM assumptions, the OLS estimators have a stronger efficiency property than they would under the Gauss-Markov assumptions. It can be shown that the OLS estimators are the minimum variance unbiased estimators, which means that OLS has the smallest variance among all unbiased estimators, not only among those that are linear in the $y_i$.

### Theorem 4.1 Normal Sampling Distributions

Under the CLM assumptions MLR.1 through MLR.6, conditional on the sample values of the independent variables, 

\begin{equation}
\hat{\beta}_j \sim Normal[\beta_j,Var(\hat{\beta}_j)]
\end{equation}

where $Var(\hat{\beta}_j)$ was given in Chapter 3 [equation (3.51)]. Therefore,

\begin{equation}
(\hat{\beta}_j-\beta_j)/sd(\hat{\beta}_j) \sim Normal(0,1)
\end{equation}

## 4-2 Testing Hypotheses about a Single Population Parameter: The t Test

This section covers the very important topic of testing hypotheses about any single parameter in the population regression function. The population model can be written as

\begin{equation}
y=\beta_0+\beta_1*x_1+\beta_2*x_2+\dots+\beta_k*x_k+u
\tag{4.2}
\end{equation}

and we assume that it satisfies the CLM assumptions. We know already that OLS produces unbiased estimators of the $\beta_j$. We must remember that the $\beta_j$ are unknown features of the population, and we will never know them with certainty. Nevertheless we can hypothesize about the value of $\beta_j$ and then use statistical inference to test our hypothesis.

In order to construct hypothesis tests, we need the following result:

### Theorem 4.2 t Distribution for the Standardized Estimators

Under the CLM Assumptions MLR.1 through MLR.6,

\begin{equation}
(\hat{\beta}_j-\beta_j)/se(\hat{\beta}_j) \sim t_{n-k-1}=t_{df}
\end{equation}

where k+1 is the number of unknown parameters in the population model and n-k-1 is the degrees of freedom (df).

Theorem 4.2 is important in that it allows us to test hypotheses involving the $\beta_j$. In most applications, our primary interest lies in testing the null hypothesis

\begin{equation}
H_0: \beta_j=0
\tag{4.4}
\end{equation}

Since $\beta_j$ measures the partial effect of $x_j$ on the expected value of y, after controlling for all other independent variables, expression (4.4) means that once $x_1,x_2,\dots,x_{j-1},x_{j+1},\dots,x_k$ have been accounted for, $x_j$ has no effect on the expected value of y. Classical testing is suited for testing simple hypotheses like (4.4).

As an example, consider the wage equation

\begin{equation}
log(wage)=\beta_0+\beta_1*educ+\beta_2*exper+\beta_3*tenure+u
\end{equation}

For this equation, the null hypothesis $H_0:\beta_2=0$ means that, once education and tenure have been accounted for, the number of years in the workforce (exper) has no effect on hourly wage. This is an economically interesting hypothesis. If it is true, it implies that a person's work history prior to the current employment does not affect wage. If $\beta_2>0$, then prior work experience contributes to productivity and hence to wage.

The statistic we use to test (4.4) (against any alternative) is called "the" t statistic or "the" t ratio of $\hat{\beta}_j$ and is defined as

\begin{equation}
t_{\hat{\beta}_j}:= \hat{\beta}_j/se(\hat{\beta}_j)
\tag{4.5}
\end{equation}

Since we are testing $H_0:\beta_j=0$, it is only natural to look at our unbiased estimator of $\beta_j$, $\hat{\beta}_j$. The point estimate $\hat{\beta}_j$ will never be exactly zero, whether or not $H_0$ is true. The question is: how far is $\hat{\beta}_j$ from zero? A sample value of $\hat{\beta}_j$ very far from zero provides evidence against $H_0: \beta_j=0$. Since the standard error of $\hat{\beta}_j$ is an estimate of the standard deviation of $\hat{\beta}_j$, $t_{\hat{\beta}_j}$ measures how many estimated standard deviations $\hat{\beta}_j$ is away from zero. This is precisely what we do in testing whether the mean of a population is zero, using the standard t statistic. Values of $t_{\hat{\beta}_j}$ sufficiently far from zero will result in a rejection of $H_0$. The precise rejection rule depends on the alternative hypothesis and the chosen significance level of the test.

Determining a rule for rejecting (4.4) at a given significance level (that is, the probability of rejecting $H_0$ when it is true) requires knowing the sampling distribution of $t_{\hat{\beta}_j}$ when $H_0$ is true. From Theorem 4.2, we know this to be $t_{n-k-1}$. This is the key theoretical result needed for testing (4.4).
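As a minimal sketch of definition (4.5), the following R code computes the t statistics by hand and compares them with the "t value" column that `summary()` reports. The data are simulated purely for illustration: the variable names `x1` and `x2`, the coefficients, the sample size, and the seed are assumptions and do not come from any of the chapter's data sets.

```r
# Simulated data (an assumption for illustration only)
set.seed(123)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0 * x2 + rnorm(n)   # true coefficient on x2 is zero

fit <- lm(y ~ x1 + x2)

# t statistic as in (4.5): estimate divided by its standard error
bhat     <- coef(fit)
se       <- sqrt(diag(vcov(fit)))
t_manual <- bhat / se

# The same numbers appear in the "t value" column of the regression table
t_table <- summary(fit)$coefficients[, "t value"]
print(cbind(t_manual, t_table))

# Degrees of freedom n - k - 1 of the relevant t distribution
print(df.residual(fit))
```

Under $H_0$, each of these ratios is a draw from the $t_{n-k-1}$ distribution, here with 100-2-1=97 degrees of freedom.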

### 4-2a Testing against One-Sided Alternatives

To determine a rule for rejecting $H_0$, we need to decide on the relevant alternative hypothesis. First, consider a one-sided alternative of the form

\begin{equation}
H_1: \beta_j>0
\tag{4.6}
\end{equation}

How should we choose a rejection rule? We must first decide on a significance level, that is, the probability of rejecting $H_0$ when it is in fact true. Suppose we have decided on a 5% significance level; this means we are willing to mistakenly reject $H_0$ when it is true 5% of the time. Under $H_1$, only sufficiently large positive values of $t_{\hat{\beta}_j}$ count as evidence against $H_0$. The definition of "sufficiently large" with a 5% significance level is the 95th percentile in a t distribution with n-k-1 degrees of freedom (denote this by c). In other words, the rejection rule is that $H_0$ is rejected in favor of $H_1$ at the 5% significance level if

\begin{equation}
t_{\hat{\beta}_j}>c
\tag{4.7}
\end{equation}

By our choice of the critical value c, a rejection of $H_0$ will occur for 5% of all random samples when $H_0$ is true.

The rejection rule in (4.7) is an example of a one-tailed test. To obtain c, we only need the significance level and the degrees of freedom. For example, for a 5% level test with n-k-1=28 degrees of freedom, the critical value is c=1.701. If $t_{\hat{\beta}_j}\leq 1.701$, then we fail to reject $H_0$ in favor of $H_1$.

As the degrees of freedom in the t distribution get large, the t distribution approaches the standard normal distribution. For degrees of freedom greater than 120, one can use the standard normal critical values.
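The critical values quoted above can be reproduced with R's quantile functions `qt()` and `qnorm()`. The sketch below is only an illustration; the particular list of degrees of freedom in `dfs` is an arbitrary choice.

```r
# Upper-tail 5% critical value with 28 degrees of freedom
c_28 <- qt(0.95, df = 28)
print(round(c_28, 3))   # 1.701

# The t critical value approaches the standard normal one as df grows
dfs  <- c(10, 28, 60, 120, 522)
crit <- data.frame(df = dfs, t_crit = qt(0.95, df = dfs))
crit$normal_crit <- qnorm(0.95)
print(crit)
```

The column `t_crit` decreases toward `normal_crit` as the degrees of freedom increase, which is why the normal critical values are a reasonable shortcut in large samples.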

### Wooldridge. Example 4.1 Hourly Wage Equation

In Example 3.2, we estimated the wage equation using the data in WAGE1. Now we are interested in testing $H_0:\beta_2=0$ against $H_1:\beta_2>0$. That is, we want to test whether the return to exper, controlling for educ and tenure, is zero in the population against the alternative that it is positive.

Since we have 522 degrees of freedom we can use the standard normal critical values. The 5% critical value is 1.645, and the 1% critical value is 2.326. The t statistic for $\hat{\beta}_{exper}$ is

\begin{equation}
t_{exper}=.0041/.0017=2.41
\end{equation}

and so $\hat{\beta}_{exper}$ or exper is statistically significant even at the 1% level. We also say that $\hat{\beta}_{exper}$ is statistically greater than zero at the 1% significance level.
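The arithmetic of this test can be checked directly. The sketch below uses only the rounded estimate and standard error quoted above, so its results may differ slightly from those based on the full regression output; the variable names are chosen here for illustration.

```r
# Rounded numbers reported in the text for Example 4.1
b_exper  <- 0.0041
se_exper <- 0.0017
df       <- 522

t_exper <- b_exper / se_exper

# One-sided critical values from the t distribution with 522 df
crit_5 <- qt(0.95, df)
crit_1 <- qt(0.99, df)

# One-sided p-value: P(T > t) under H0
p_one_sided <- pt(t_exper, df, lower.tail = FALSE)

cat("t =", round(t_exper, 2), "\n")
cat("reject at 5%:", t_exper > crit_5, " reject at 1%:", t_exper > crit_1, "\n")
cat("one-sided p-value:", signif(p_one_sided, 3), "\n")
```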

In [None]:
library(foreign)
wage1 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/wage1.dta?raw=true")

# Just obtain parameter estimates:
lm(log(wage) ~ educ+exper+tenure, data=wage1)

# Store results under "WAGEres" and display full table:
WAGEres <- lm(log(wage) ~ educ+exper+tenure, data=wage1)
summary(WAGEres)

### Wooldridge. Example 4.2a Student Performance and School Size (Level-Level Model)

There is much interest in the effect of school size on student performance. One claim is that, everything else equal, students at smaller schools fare better than those at larger schools. The file MEAP93 contains data on 408 high schools in Michigan for the year 1993. We can use these data to test the null hypothesis that school size has no effect on standardized test scores against the alternative that size has a negative effect. Performance is measured by the percentage of students receiving a passing score on the Michigan Educational Assessment Program (MEAP) standardized tenth-grade math test (math10). School size is measured by student enrollment (enroll). The null hypothesis is $H_0: \beta_{enroll}=0$ and the alternative is $H_1: \beta_{enroll}<0$. For now, we will control for two other factors: average annual teacher compensation (totcomp) and the number of staff per one thousand students (staff). Teacher compensation is a measure of teacher quality, and staff size is a rough measure of how much attention students receive.

In [None]:
library(foreign)
meap93 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/meap93.dta?raw=true")

# Store results under "sumres" and display full table:

( sumres <- summary( lm(math10 ~ totcomp+staff+enroll, data=meap93) ) )



The coefficient on enroll, -0.00020, is in accordance with the conjecture that larger schools hamper performance: higher enrollment leads to a lower percentage of students with a passing tenth-grade math score. The fact that enroll has an estimated coefficient different from zero could just be due to sampling error; to be convinced of an effect, we need to conduct a t test.

Since n-k-1=408-4=404, we use the standard normal critical value. At the 5% level, the critical value is -1.65, so the t statistic on enroll must be less than -1.65 to reject $H_0$ at the 5% level.

According to the results of the previous model, the t statistic on enroll is -0.918, which is larger than -1.65; therefore, we fail to reject $H_0$ in favor of $H_1$ at the 5% level. We conclude that enroll is not statistically significant at the 5% level.
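For a left-tailed alternative, the critical value and the p-value come from the lower tail of the t distribution. The following sketch reproduces the decision for enroll from the reported t statistic; it is an illustration based on the numbers in the text rather than on the regression object itself.

```r
t_enroll <- -0.918          # t statistic reported in the text
df       <- 408 - 3 - 1     # n - k - 1 = 404

# Lower-tail 5% critical value (negative, because H1: beta_enroll < 0)
crit_5 <- qt(0.05, df)

# One-sided p-value: P(T < t) under H0
p_left <- pt(t_enroll, df)

cat("critical value:", round(crit_5, 2), "\n")
cat("reject H0 at 5%:", t_enroll < crit_5, "\n")
cat("one-sided p-value:", round(p_left, 3), "\n")
```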

On the other hand, the variable totcomp is statistically significant even at the 1% significance level because its t statistic is 4.6. The t statistic for staff is 1.2, so we cannot reject $H_0:\beta_{staff}=0$ against $H_1:\beta_{staff}>0$.

Please note that R flags the significance of each coefficient (totcomp, staff, enroll) with star symbols based on its two-sided p-value. Three stars ($***$) mean that the p-value is below 0.001 (significant at the 0.1% level), two stars ($**$) that it is below 0.01 (the 1% level), one star ($*$) that it is below 0.05 (the 5% level), and a dot (.) that it is below 0.1 (the 10% level).
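To make the mapping from p-values to significance codes concrete, the sketch below defines a small helper, `signif_stars()`. This helper is hypothetical (it is not part of R); it simply applies the cutpoints 0.001, 0.01, 0.05 and 0.1 that appear in the legend of R's regression output. The p-values are two-sided and are computed from the t statistics quoted above.

```r
# Hypothetical helper: map p-values to the codes used in R's regression tables
signif_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", ""),
                   include.lowest = TRUE))
}

# Two-sided p-values for the t statistics reported in Example 4.2a (df = 404)
t_stats <- c(totcomp = 4.6, staff = 1.2, enroll = -0.918)
p_vals  <- 2 * pt(-abs(t_stats), df = 404)

print(data.frame(t = t_stats, p = round(p_vals, 4), code = signif_stars(p_vals)))
```

Because these codes refer to two-sided p-values, they do not translate directly into the one-sided tests used in this example.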

### Wooldridge. Example 4.2b Student Performance and School Size (Level-Log Model)

To illustrate how changing functional form can affect our conclusions, we also estimate the model with all independent variables in logarithmic form (also called a level-log model). This allows, for example, the school size effect to diminish as school size increases. The results are computed as follows:

In [None]:
library(foreign)
meap93 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/meap93.dta?raw=true")

# Store results under "sumres" and display full table:

( sumres <- summary( lm(math10 ~ log(totcomp)+log(staff)+log(enroll), data=meap93) ) )


This time the t statistic on log(enroll) is about -1.87. Since this is below the 5% critical value of -1.65, we reject $H_0:\beta_{log(enroll)}=0$ in favor of $H_1:\beta_{log(enroll)}<0$ at the 5% level.

So, which model do we prefer: the one using the level of enroll or the one using log(enroll)? In the level-level model, enrollment does not have a statistically significant effect, but in the level-log model it does. This translates into a higher R-squared for the level-log model, which means we explain more of the variation in math10 by using enroll in logarithmic form (6.5% versus 5.4%). The level-log model is preferred because it more closely captures the relationship between math10 and enroll.

### 4-2b Two-Sided Alternatives

In applications it is common to test the null hypothesis $H_0:\beta_j=0$ against a two-sided alternative: that is,

\begin{equation}
H_1: \beta_j\neq0
\tag{4.10}
\end{equation}

Under this alternative, $x_j$ has a ceteris paribus effect on y without specifying whether the effect is positive or negative. This is the relevant alternative when the sign of $\beta_j$ is not well determined by theory (or common sense).

When the alternative is two-sided, we are interested in the absolute value of the t statistic. The rejection rule for $H_0:\beta_j=0$ against (4.10) is

\begin{equation}
|t_{\hat{\beta}_j}|>c
\end{equation}

where $|\cdot|$ denotes absolute value and c is an appropriately chosen critical value. To find c, we again specify a significance level, say 5%. For a two-sided test, c is chosen to make the area in each tail of the t distribution equal 2.5%; that is, c is the 97.5th percentile in the t distribution with n-k-1 degrees of freedom. For example, with n-k-1=25 degrees of freedom, the 5% critical value for a two-sided test is c=2.060.
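A quick numerical check of the two-sided critical value, as a sketch: it assumes 25 degrees of freedom, which is the case in which the 97.5th percentile of the t distribution equals 2.060.

```r
# Two-sided 5% test: 2.5% in each tail, so use the 97.5th percentile
c_25 <- qt(0.975, df = 25)
print(round(c_25, 3))

# Equivalent check: the two tails together have probability 0.05
tail_prob <- 2 * pt(-c_25, df = 25)
print(tail_prob)
```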

### Wooldridge. Example 4.3 Determinants of College GPA

We use the data in GPA1 to estimate a model explaining college GPA (colGPA), with the average number of lectures missed per week (skipped) as an additional explanatory variable. The following code estimates the model:

In [None]:
library(foreign)
gpa1 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/gpa1.dta?raw=true")

# Store results under "GPAres" and display full table:
GPAres <- lm(colGPA ~ hsGPA+ACT+skipped, data=gpa1)
summary(GPAres)


The t statistic on hsGPA is 4.38, so hsGPA is statistically significant at any conventional significance level.

The t statistic on ACT is 1.36, which is not statistically significant at the 10% level against the two-sided alternative. The coefficient on ACT is also practically small: a 10-point increase in ACT, which is large, is predicted to increase colGPA by only .15 points. Therefore, the variable ACT is practically, as well as statistically, insignificant.

The coefficient on skipped has a t statistic of -3.19, so skipped is statistically significant at the 1% significance level (3.19>2.58). In practical terms, holding hsGPA and ACT fixed, the predicted difference in colGPA between a student who misses no lectures per week and a student who misses 5 lectures a week is about .42.
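The two-sided conclusions above can be cross-checked from the reported t statistics alone. The sketch below assumes that GPA1 contains n=141 observations (so that n-k-1=137); this sample size is not stated in the text and is an assumption of the example.

```r
t_stats <- c(hsGPA = 4.38, ACT = 1.36, skipped = -3.19)  # values reported in the text
df <- 141 - 3 - 1   # assumes n = 141 observations in GPA1

# Two-sided p-value: probability of a |t| at least as large under H0
p_two_sided <- 2 * pt(-abs(t_stats), df)

# Two-sided critical values at the 10%, 5% and 1% levels
crit <- qt(c(0.95, 0.975, 0.995), df)
names(crit) <- c("10%", "5%", "1%")

print(round(p_two_sided, 4))
print(round(crit, 3))
```

A variable is significant at a given level exactly when its p-value is below that level, or equivalently when the absolute value of its t statistic exceeds the corresponding critical value.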

### 4-2c Testing other Hypotheses about $\beta_j$

Sometimes we want to test whether $\beta_j$ is equal to some other given constant. Two common examples are $\beta_j=1$ and $\beta_j=-1$. Generally, if the null is stated as

\begin{equation}
H_0: \beta_j=a_j
\tag{4.12}
\end{equation}

where $a_j$ is our hypothesized value of $\beta_j$, then the appropriate statistic is 

\begin{equation}
t:= (\hat{\beta}_j-a_j)/se(\hat{\beta}_j)
\end{equation}

## 4-3 Confidence intervals

Under the CLM assumptions, we can easily construct a confidence interval (CI) for the population parameter $\beta_j$. Confidence intervals are also called interval estimates because they provide a range of likely values for the population parameter, and not just a point estimate.

Using the fact that $(\hat{\beta}_j-\beta_j)/se(\hat{\beta}_j)$ has a t distribution with n-k-1 degrees of freedom, a 95% CI for the unknown $\beta_j$ is:

\begin{equation}
\hat{\beta_j}\pm c*se(\hat{\beta_j})
\tag{4.16}
\end{equation}

where the constant c is the 97.5th percentile in a $t_{n-k-1}$ distribution.

At this point, it is useful to review the meaning of a confidence interval. If random samples were obtained over and over again, with the interval computed each time, then the unknown population value $\beta_j$ would lie in the interval $[\hat{\beta}_j\pm c*se(\hat{\beta}_j)]$ for 95% of the samples.
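The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is not taken from the text: the true slope `beta1`, the sample size, the number of replications, and the seed are all assumptions chosen for illustration. It draws many samples from a model that satisfies the CLM assumptions, computes the 95% CI in each one with `confint()`, and records how often the interval contains the true slope.

```r
set.seed(42)
beta1 <- 2          # true slope (assumed for the simulation)
n     <- 30         # observations per sample
reps  <- 1000       # number of simulated samples

covered <- logical(reps)
for (r in seq_len(reps)) {
  x  <- rnorm(n)
  y  <- 1 + beta1 * x + rnorm(n)        # normally distributed errors
  ci <- confint(lm(y ~ x), "x", level = 0.95)
  covered[r] <- ci[1] <= beta1 && beta1 <= ci[2]
}

coverage <- mean(covered)
cat("share of intervals containing the true slope:", coverage, "\n")
```

The share should be close to 0.95. Any single interval either contains $\beta_j$ or does not; the 95% refers to the procedure, not to one computed interval.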

### Wooldridge. Example 4.8 Model of R&D Expenditures

Economists studying industrial organization are interested in the relationship between firm size and spending on research and development (R&D). Typically, a constant elasticity model is used. One might also be interested in the ceteris paribus effect of the profit margin (that is, profits as a percentage of sales) on R&D spending. Using the data in RDCHEM on 32 US firms in the chemical industry, we estimate the following model:

In [None]:
library(foreign)
rdchem<-read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/rdchem.dta?raw=true")

# OLS regression:
myres <- lm(log(rd) ~ log(sales)+profmarg, data=rdchem)

# Regression output:
summary(myres)





The estimated elasticity of R&D spending with respect to firm sales is 1.084, so that, holding profit margin fixed, a 1% increase in sales is associated with a 1.084% increase in R&D spending.

In [None]:
# 95% CI:
confint(myres)

The 95% confidence interval for $\beta_{log(sales)}$ is (0.961, 1.207). That zero is well outside of the interval is hardly surprising: we expect R&D spending to increase with firm size. More interesting is that unity is included in the 95% confidence interval for $\beta_{log(sales)}$, which means that we cannot reject $H_0:\beta_{log(sales)}=1$ against $H_1:\beta_{log(sales)}\neq 1$ at the 5% significance level. In other words, the estimated R&D-sales elasticity is not statistically different from 1 at the 5% level.

The estimated coefficient on profmarg is also positive, and the 95% confidence interval for the population parameter $\beta_{profmarg}$ is (-0.0044, 0.0478). In this case, zero is included in the 95% confidence interval, so we fail to reject $H_0:\beta_{profmarg}=0$ against $H_1:\beta_{profmarg}\neq 0$ at the 5% significance level.
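The link between a confidence interval and a two-sided t test can be made explicit. The sketch below starts from the reported estimate and interval for $\beta_{log(sales)}$, backs out the implied standard error from the half-width of the interval, and then tests $H_0:\beta_{log(sales)}=1$. It relies only on the rounded numbers quoted above, so the implied standard error is approximate.

```r
b_sales <- 1.084               # estimated elasticity reported in the text
ci      <- c(0.961, 1.207)     # reported 95% confidence interval
df      <- 32 - 2 - 1          # n - k - 1 = 29

c_975 <- qt(0.975, df)

# The CI is b +/- c * se, so its half-width divided by c recovers se
se_sales <- (ci[2] - ci[1]) / (2 * c_975)

# t statistic and two-sided p-value for H0: beta = 1
t_unity <- (b_sales - 1) / se_sales
p_unity <- 2 * pt(-abs(t_unity), df)

cat("implied se:", round(se_sales, 3), "\n")
cat("t for H0: beta = 1:", round(t_unity, 2), " p-value:", round(p_unity, 3), "\n")
cat("reject at 5%:", abs(t_unity) > c_975, "\n")
```

The test fails to reject exactly because 1 lies inside the 95% interval: a hypothesized value $a_j$ is rejected at the 5% level if and only if it falls outside the 95% CI.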

## 4-5 Testing Multiple Linear Restrictions: The F Test

So far we have covered hypotheses involving a single restriction. Frequently, we wish to test multiple hypotheses about the underlying parameters $\beta_0,\beta_1,\ldots,\beta_k$. We begin with the leading case of testing whether a set of independent variables has no partial effect on a dependent variable.

### 4-5a Testing Exclusion Restrictions

We already know how to test whether a particular variable has no partial effect on the dependent variable: use the t statistic. Now we want to test whether a group of variables has no effect on the dependent variable. More precisely, the null hypothesis is that a set of variables has no effect on y, once another set of variables has been controlled.

As an illustration of why testing the significance of a group of variables is useful, we consider the following model that explains major league baseball players' salaries:

\begin{equation}
log(salary)=\beta_0+\beta_1*years+\beta_2*gamesyr+\beta_3*bavg+\beta_4*hrunsyr+\beta_5*rbisyr+u
\tag{4.28}
\end{equation}

where salary is the 1993 total salary, years is years in the league, gamesyr is average games played per year, bavg is career batting average, hrunsyr is home runs per year and rbisyr is runs batted in per year. Suppose we want to test the null hypothesis that, once years in the league and games per year have been controlled for, the statistics measuring performance (bavg, hrunsyr and rbisyr) have no effect on salary. Essentially, the null hypothesis states that productivity as measured by baseball statistics has no effect on salary.

In terms of the parameters of the model, the null hypothesis is stated as

\begin{equation}
H_0:\beta_3=0,\beta_4=0,\beta_5=0
\tag{4.29}
\end{equation}

The null (4.29) constitutes three exclusion restrictions: if (4.29) is true, then bavg, hrunsyr and rbisyr have no effect on log(salary) after years and gamesyr have been controlled for, and therefore they should be excluded from the model, leading to:

\begin{equation}
log(salary)=\beta_0+\beta_1*years+\beta_2*gamesyr+u
\tag{4.30}
\end{equation}

A test of multiple restrictions is called a multiple hypotheses test or a joint hypotheses test.

What should be the alternative to (4.29)? If what we have in mind is that "performance statistics matter, even after controlling for years in the league and games per year", then the appropriate alternative is simply

\begin{equation}
H_1: H_0 \text{ is not true}
\tag{4.31}
\end{equation}

Wooldridge (2016) shows that testing (4.29) against (4.31) cannot be performed using the t statistics on the individual variables. We need a way to test the exclusion restrictions jointly; for this we use the F statistic (or F ratio).

In the context of hypothesis testing, equation (4.28) is called the unrestricted model and (4.30) the restricted model. If we estimate both models using the data in MLB1, we obtain

In [None]:
library(foreign)
mlb1 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/mlb1.dta?raw=true")

# Unrestricted OLS regression:
res.ur <- lm(log(salary) ~ years+gamesyr+bavg+hrunsyr+rbisyr, data=mlb1)

# Regression output:
summary(res.ur)

# Sum of squared residuals (SSR), taken from the "Residuals" row of the ANOVA table
cat("SSR:",sum(anova(res.ur)[6,2]))

And for the restricted model:

In [None]:
library(foreign)
mlb1 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/mlb1.dta?raw=true")

# Restricted OLS regression:
res.r <- lm(log(salary) ~ years+gamesyr, data=mlb1)

# Regression output:
summary(res.r)

# Sum of squared residuals (SSR), taken from the "Residuals" row of the ANOVA table
cat("SSR:",sum(anova(res.r)[3,2]))

As expected, the SSR from the unrestricted model (183.18) is smaller than the SSR from the restricted model (198.31), because the SSR can never decrease when variables are dropped from a model. What we need to decide is whether the increase in the SSR in going from the unrestricted model to the restricted model is large enough to warrant rejection of (4.29).

The F statistic (or F ratio) is defined by

\begin{equation}
F:=\frac{(SSR_r-SSR_{ur})/q}{SSR_{ur}/(n-k-1)}
\end{equation}

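where $SSR_r$ is the sum of squared residuals from the restricted model, $SSR_{ur}$ is the sum of squared residuals from the unrestricted model, q is the number of restrictions imposed in moving from the unrestricted to the restricted model (q=3 in our example), and n-k-1 is the degrees of freedom in the unrestricted model.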

To use the F statistic, we must know its sampling distribution under the null in order to choose critical values and rejection rules. It can be shown that, under $H_0$ (and assuming the CLM assumptions hold), F is distributed as an F random variable with (q,n-k-1) degrees of freedom.

\begin{equation}
F \sim F_{q,n-k-1}
\end{equation}

It should be clear from the definition of F that we will reject $H_0$ in favor of $H_1$ when F is sufficiently "large". How large depends on our chosen significance level.
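As a minimal sketch, the F statistic for the baseball example can be computed directly from the two sums of squared residuals reported above. The degrees of freedom n-k-1=347 of the unrestricted model and q=3 restrictions are the values used for this example; because the SSRs are rounded, the result is approximate.

```r
ssr_r  <- 198.31    # SSR of the restricted model
ssr_ur <- 183.18    # SSR of the unrestricted model
q      <- 3         # number of exclusion restrictions
df_ur  <- 347       # n - k - 1 in the unrestricted model

# F statistic in SSR form
F_stat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

crit_1 <- qf(0.99, q, df_ur)                         # 1% critical value
p_val  <- pf(F_stat, q, df_ur, lower.tail = FALSE)   # P(F > F_stat) under H0

cat("F =", round(F_stat, 2), "\n")
cat("1% critical value:", round(crit_1, 2), "\n")
cat("p-value:", signif(p_val, 3), "\n")
```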

If $H_0$ is rejected, we say that $x_{k-q+1},\dots,x_k$ are jointly statistically significant. If the null is not rejected, then the variables are jointly insignificant, which often justifies dropping them from the model.

Returning to our example, in order to test hypothesis (4.29) we manually compute the F statistic as follows:

In [None]:
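# R-squared of the unrestricted and restricted models estimated above:
r2.ur <- summary(res.ur)$r.squared
r2.r  <- summary(res.r)$r.squared

# F statistic (R-squared form, equivalent to the SSR form; q = 3, n-k-1 = 347):
cat("F statistic", ( Fstat <- (r2.ur-r2.r) / (1-r2.ur) * 347/3 ))

# p value = 1-cdf of the appropriate F distribution:
cat("\np value", 1-pf(Fstat, 3, 347))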


Given this large value of the F statistic, well above the 1% critical value in the F distribution with 3 and 347 degrees of freedom, we soundly reject the hypothesis that bavg, hrunsyr and rbisyr have no effect on salary.

We can also test hypothesis (4.29) in a more automatic manner in R as follows:

In [None]:
library(foreign)
mlb1 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/mlb1.dta?raw=true")

# Unrestricted OLS regression:
res.ur <- lm(log(salary) ~ years+gamesyr+bavg+hrunsyr+rbisyr, data=mlb1)

# Load package "car" (which has to be installed on the computer)
library(car)

# F test
myH0 <- c("bavg","hrunsyr","rbisyr")
linearHypothesis(res.ur, myH0)


### 4-5e The F Statistic for Overall Significance of a Regression

A special set of exclusion restrictions is routinely tested by most regression packages. In the model with k independent variables, we can write the null hypothesis as $H_0$: $x_1,x_2,\dots,x_k$ do not help to explain y. This null hypothesis is, in a way, very pessimistic. It states that none of the explanatory variables has an effect on y. In terms of the parameters, the null is

\begin{equation}
H_0:\beta_1=\beta_2=\dots=\beta_k=0
\tag{4.44}
\end{equation}

and the alternative is that at least one of the $\beta_j$ is different from zero. The F statistic for testing (4.44) can be written as

\begin{equation}
F=\frac{R^2/k}{(1-R^2)/(n-k-1)}
\tag{4.46}
\end{equation}

where $R^2$ is just the usual R-squared from the regression of y on $x_1,x_2,\dots,x_k$. Most regression packages report the F statistic in (4.46); this special form is valid only for testing joint exclusion of all independent variables. This is sometimes called determining the overall significance of the regression, and it should not be confused with the procedure described in Section 4-5a.

If we fail to reject (4.44), then there is no evidence that any of the independent variables help to explain y, and we must then look for other variables to explain y.
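The sketch below checks the R-squared form (4.46) against the overall F statistic that `summary()` reports for a fitted model. The data are simulated for illustration only; the variable names, coefficients, sample size, and seed are assumptions rather than anything taken from the chapter's data sets.

```r
# Simulated data (an assumption for illustration only)
set.seed(2024)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 0.3 * x1 - 0.2 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)
k   <- 3
r2  <- summary(fit)$r.squared

# Overall F statistic from the R-squared form (4.46)
F_overall <- (r2 / k) / ((1 - r2) / (n - k - 1))

# The value reported in the last line of the regression output
F_reported <- summary(fit)$fstatistic["value"]

# p-value from the F distribution with (k, n-k-1) degrees of freedom
p_overall <- pf(F_overall, k, n - k - 1, lower.tail = FALSE)

cat("F from R-squared:", F_overall, "\n")
cat("F from summary():", F_reported, "\n")
cat("p-value:", signif(p_overall, 3), "\n")
```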

## 4-6 Reporting regression results

The estimated OLS coefficients should always be reported. For the key variables in an analysis, you should interpret the estimated coefficients. The economic or practical importance of the estimates of the key variables should be discussed.

The standard errors should always be included along with the estimated coefficients. The R-squared from the regression should always be included. Reporting the sum of squared residuals and the standard error of the regression is sometimes a good idea, but is not crucial. The number of observations used in estimating any equation should appear near the estimated equation.

### Wooldridge. Example 4.10 Salary-Pension Tradeoff for Teachers

In R, there is a useful package called "stargazer" that provides most of the functionality needed to report regression results.



Let totcomp denote average total annual compensation for a teacher, including salary and all fringe benefits (pension, health insurance, and so on). Extending the standard wage equation, total compensation should be a function of productivity and perhaps other characteristics. As is standard, we use the logarithmic form: $log(totcomp)=f(\text{productivity characteristics}, \text{other factors})$

where f() is some function (unspecified for now). Write 

\begin{equation}
totcomp=salary+benefits=salary(1+\frac{benefits}{salary})
\end{equation}

This equation shows that total compensation is the product of two terms: salary and $1+b/s$, where $b/s$ is shorthand for the "benefits to salary ratio". Taking the log of this equation gives $log(totcomp)=log(salary)+log(1+b/s)$. For small b/s, $log(1+b/s)\approx b/s$. This leads to the following econometric model

\begin{equation}
log(salary)=\beta_0+\beta_1(b/s)+\text{other factors}
\end{equation}

Testing the salary-benefits tradeoff then is the same as a test of $H_0:\beta_1=-1$ against $H_1:\beta_1\neq -1$. We use the data in MEAP93 to test this hypothesis. These data are averaged at the school level. We will include controls for size of school (enroll), staff per thousand students (staff) and measures such as the school dropout and graduation rates. 

The following code estimates three different models. Standard errors are given in parentheses below the coefficient estimates. The key variable is b/s, the benefits-salary ratio.

In [6]:
install.packages("stargazer");

library(foreign)
meap93<-read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/meap93.dta?raw=true")

# define new variable within data frame
meap93$b_s <- meap93$benefits / meap93$salary

# Estimate three different models
model1<- lm(log(salary) ~ b_s                       , data=meap93)
model2<- lm(log(salary) ~ b_s+log(enroll)+log(staff), data=meap93)
model3<- lm(log(salary) ~ b_s+log(enroll)+log(staff)+droprate+gradrate
                                                    , data=meap93)
# Load package and display table of results
library(stargazer);
stargazer(list(model1,model2,model3),type="text",keep.stat=c("n","rsq"))



Please cite as: 

 Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2. http://CRAN.R-project.org/package=stargazer 




                  Dependent variable:     
             -----------------------------
                      log(salary)         
                (1)       (2)       (3)   
------------------------------------------
b_s          -0.825*** -0.605*** -0.589***
              (0.200)   (0.165)   (0.165) 
                                          
log(enroll)            0.087***  0.088*** 
                        (0.007)   (0.007) 
                                          
log(staff)             -0.222*** -0.218***
                        (0.050)   (0.050) 
                                          
droprate                          -0.0003 
                                  (0.002) 
                                          
gradrate                           0.001  
                                  (0.001) 
                                          
Constant     10.523*** 10.844*** 10.738***
              (0.042)   (0.252)   (0.258) 
                                          
----------

As reported in the previous table, the first column indicates that, without controlling for any other factors, the OLS coefficient for b/s is -0.825. The t statistic for testing the null hypothesis $H_0:\beta_1=-1$ is $t=(-.825+1)/.200=.875$, and so this simple model fails to reject $H_0$. After adding controls for school size and staff size (which roughly captures the number of students taught by each teacher), the estimate of the b/s coefficient becomes -.605. Now the test of $\beta_1=-1$ gives a t statistic of about 2.39; thus $H_0$ is rejected at the 5% level against a two-sided alternative. The variables log(enroll) and log(staff) are very statistically significant.
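The t statistics for $H_0:\beta_1=-1$ follow from the general formula $t=(\hat{\beta}_j-a_j)/se(\hat{\beta}_j)$ with $a_j=-1$. The sketch below recomputes them from the estimates and standard errors in the table; it assumes that all 408 schools in MEAP93 enter each regression, so the degrees of freedom shown are an assumption of the example.

```r
a   <- -1                                   # hypothesized value under H0
est <- c(model1 = -0.825, model2 = -0.605, model3 = -0.589)   # b_s estimates from the table
se  <- c(model1 =  0.200, model2 =  0.165, model3 =  0.165)   # their standard errors
df  <- 408 - c(1, 3, 5) - 1                 # n - k - 1 per model, assuming n = 408

# t statistic for H0: beta_1 = a, and its two-sided p-value
t_stat <- (est - a) / se
p_val  <- 2 * pt(-abs(t_stat), df)

print(round(cbind(t_stat, df, p_val), 3))
```

Note that the stars printed by stargazer refer to the test of $\beta_1=0$, not $\beta_1=-1$, so they cannot be used to read off the result of this test.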