# Goodness-of-Fit measure. Example 1.

### Intro and objectives
#### Review the concept of goodness-of-fit in simple regression models

### In this lab you will learn:
1. Examples of simple regression models.
2. How to fit simple regression models in Python.
3. How to compute the R-squared metric.


## What I hope you'll get out of this lab
* The feeling that you'll "know where to start" when you need to fit a simple regression model.
* Examples of simple regression models
* Examples of R-squared computation
* How to interpret the results obtained

In [12]:
!pip install wooldridge
import wooldridge as woo
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np



# Example 1. CEO Salary and Return on Equity



### For the population of chief executive officers, let y be annual salary (salary) in thousands of dollars.
### Thus, y=856.3 indicates an annual salary of 856,300 dollars, and y=1,452.6 indicates a salary of 1,452,600 dollars.


### Let x be the average return on equity (roe) for the CEO’s firm for the previous three years. (Return on equity is defined in terms of net income as a percentage of common equity.) For example, if roe=10, then average return on equity is 10%.

### To study the relationship between this measure of firm performance and CEO compensation, we postulate the simple model:

$ salary=\beta_0+\beta_1*roe+u $


### The slope parameter $ \beta_1 $ measures the change in annual salary, in thousands of dollars, when return on equity increases by one percentage point.

### The data set CEOSAL1 contains information on 209 CEOs for the year 1990; these data were obtained from Business Week (5/6/91). In this sample, the average annual salary is 1,281,120 dollars with the smallest and largest being 223,000 dollars and 14,822,000 dollars respectively. 
### The average return on equity for the years 1988, 1989, and 1990 is 17.18%, with the smallest and largest values being 0.5% and 56.3%, respectively.

### Using the data in CEOSAL1, we estimate the OLS regression line relating salary to roe as follows:

In [4]:
CeoSalaries = woo.dataWoo('ceosal1')

In [5]:
CeoSalaries.head()

Unnamed: 0,salary,pcsalary,sales,roe,pcroe,ros,indus,finance,consprod,utility,lsalary,lsales
0,1095,20,27595.0,14.1,106.400002,191,1,0,0,0,6.998509,10.225389
1,1001,32,9958.0,10.9,-30.6,13,1,0,0,0,6.908755,9.206132
2,1122,9,6125.899902,23.5,-16.299999,14,1,0,0,0,7.022868,8.720281
3,578,-9,16246.0,5.9,-25.700001,-21,1,0,0,0,6.359574,9.695602
4,1368,7,21783.199219,13.8,-3.0,56,1,0,0,0,7.221105,9.988894


In [6]:
# Specify a simple linear model: salary regressed on roe,
# with CeoSalaries as the empirical dataset

reg = smf.ols(formula='salary ~ roe', data=CeoSalaries)


In [7]:
# We fit the model
results = reg.fit()


In [8]:
b = results.params
print(f'b: \n{b}\n')

b: 
Intercept    963.191336
roe           18.501186
dtype: float64



## Based on the estimates above, the fitted regression line is:

$ \widehat{salary}=963.19+18.50*roe $
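
The fitted line can be used to predict salary at a given return on equity. The sketch below simply hard-codes the two coefficients printed by `results.params` above; the function name `predicted_salary` is a hypothetical helper introduced here for illustration, not part of statsmodels.

```python
# Coefficients copied from the results.params output above
B0 = 963.191336  # intercept: predicted salary (thousands of dollars) when roe = 0
B1 = 18.501186   # slope: change in predicted salary per one-point increase in roe


def predicted_salary(roe):
    """Predicted annual salary, in thousands of dollars, for a given roe (in percent)."""
    return B0 + B1 * roe


for roe in (0, 10, 30):
    print(f"roe = {roe:>2}%  ->  predicted salary = {predicted_salary(roe):,.3f} thousand dollars")
```

Each additional percentage point of roe raises the predicted salary by about 18.5 thousand dollars. Inside the notebook, calling `results.predict` with a DataFrame that has an `roe` column should give the same numbers.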


## Let's compute $R^2$ as a measure of goodness-of-fit

#### If the data points all lie on the same line, OLS provides a perfect fit to the data. In this case, $R^2=1$. A value of $R^2$ that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in the $y_i$ is captured by the variation in the fitted values $\hat{y}_i$.
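
To make the two extremes concrete, here is a minimal sketch that computes $R^2$ by hand as $1 - SSR/SST$, where $SSR$ is the sum of squared residuals and $SST$ is the total sum of squares of $y$ around its mean. The toy data and the helper names `ols_fit` and `r_squared` are assumptions made for this illustration; they are not taken from CEOSAL1 or from statsmodels.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y):
    """R-squared of the simple OLS regression of y on x, computed as 1 - SSR/SST."""
    b0, b1 = ols_fit(x, y)
    y_bar = mean(y)
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation in y
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    return 1 - ssr / sst


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_line = [3.0 + 2.0 * xi for xi in x]  # points lying exactly on a line
y_noisy = [5.0, 1.0, 6.0, 2.0, 7.0]    # only a weak linear pattern

r2_line = r_squared(x, y_line)
r2_noisy = r_squared(x, y_noisy)
print(f"R^2, points on a line: {r2_line:.4f}")
print(f"R^2, noisy points:     {r2_noisy:.4f}")
```

The first dataset gives an $R^2$ of 1 because every residual is zero; the second gives a value close to zero because the fitted line captures little of the variation in $y$.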

In [18]:
results.rsquared

0.01318862408103405

#### $R^2$ is usually reported as a percent: $100*R^2$

In [19]:
100*results.rsquared

1.318862408103405

## The fit is poor: roe explains only about 1.3% of the sample variation in salary.
## Conversely, the current model leaves about 98.7% of the salary variation unexplained!
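
The explained and unexplained shares follow directly from the value returned by `results.rsquared`. The sketch below hard-codes that value and also uses the fact that, in a simple regression with one explanatory variable, $R^2$ equals the squared sample correlation between $x$ and $y$; the variable names are illustrative only.

```python
import math

r2 = 0.01318862408103405  # value printed by results.rsquared above

explained_pct = 100 * r2          # share of salary variation explained by roe
unexplained_pct = 100 * (1 - r2)  # share left in the residuals

# With a single regressor, R^2 is the squared sample correlation.
# The estimated slope on roe is positive, so we take the positive root.
implied_corr = math.sqrt(r2)

print(f"explained:   {explained_pct:.1f}%")
print(f"unexplained: {unexplained_pct:.1f}%")
print(f"implied corr(salary, roe): {implied_corr:.3f}")
```

A correlation of roughly 0.11 between salary and roe is another way of seeing how weak the linear relationship is in this sample.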

## To improve the fit, we would need to include more explanatory factors in the model!