# Estimating the linear regression noise variance

This notebook demonstrates some example methods to estimate the noise variance, $\sigma^2$, for a linear regression model

$$
y = x^T \beta + \sigma \epsilon.
$$

This is quite challenging in high-dimensional settings. For example, the usual estimate based on the residual sum of squares (RSS) of a least squares fit breaks down when there are at least as many features as samples.
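To see why the usual RSS-based estimate is unusable here, the following minimal sketch fits least squares on simulated data with more features than samples. The data-generating setup (Gaussian design, 5 nonzero coefficients, the chosen sizes) is an assumption made purely for illustration and is not the toy distribution used later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, noise_std = 50, 200, 2.0

# Illustrative data (an assumption, not the notebook's toy distribution)
X = rng.standard_normal((n_samples, n_features))
beta = np.zeros(n_features)
beta[:5] = 1.0
y = X @ beta + noise_std * rng.standard_normal(n_samples)

# Minimum-norm least squares solution; with n_features > n_samples it interpolates y
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_ols) ** 2)

print("RSS of the least squares fit:", rss)  # numerically zero
print("n_samples - n_features:", n_samples - n_features)  # negative
```

The classical estimate RSS / (n_samples - n_features) therefore has a vanishing numerator and a negative denominator, so it carries no information about the true noise level.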

The noise variance is used, among other places, in tuning parameter selection methods based on information criteria. If you care about model selection, Zhang et al (2010) suggest using a BIC-like information criterion rather than cross-validation.
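As a rough illustration of where $\sigma^2$ enters, here is a sketch of the textbook BIC for a Gaussian linear model with known noise variance. The function name `gaussian_bic` is hypothetical, and counting the nonzero coefficients as the degrees of freedom is an assumption; this is not necessarily the criterion that Zhang et al (2010) or this package use.

```python
import numpy as np


def gaussian_bic(X, y, coef, intercept, noise_var):
    """Textbook BIC for a Gaussian linear model with known noise variance."""
    n_samples = X.shape[0]
    resid = y - X @ coef - intercept
    rss = resid @ resid

    # Assumption: degrees of freedom = number of nonzero coefficients
    dof = np.count_nonzero(coef)

    # -2 * Gaussian log-likelihood evaluated at (coef, intercept, noise_var)
    neg2_loglik = n_samples * np.log(2 * np.pi * noise_var) + rss / noise_var
    return neg2_loglik + np.log(n_samples) * dof
```

The RSS is divided by the noise variance, so a poor variance estimate changes the balance between goodness of fit and the complexity penalty, and hence which tuning parameter is selected.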



#### Currently supported methods

We currently support the following estimators for $\sigma^2$:

- The natural lasso (Yu and Bien, 2019)

- The RSS based estimate from a tuned Lasso (Reid et al, 2016)

- A ridge regression based estimate (Liu et al, 2020)


The first two of these require tuning a Lasso penalized estimate (e.g. via cross-validation), while the last one fits a single ridge regression estimate.
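The two Lasso-based estimators can be sketched in a few lines once a Lasso coefficient vector is available. The function names below are hypothetical and this is not the package's implementation. The first divides the Lasso RSS by the number of samples minus the number of selected features. The second uses the closed form $(\|y\|^2 - \|X\hat\beta\|^2)/n$, which equals the minimal value of the objective $\frac{1}{n}\|y - X\beta\|^2 + 2\lambda\|\beta\|_1$ when $\hat\beta$ is an exact Lasso solution. Handling the intercept by centering `y` and the columns of `X` is an assumption.

```python
import numpy as np


def var_from_rss_of_selected(X, y, coef, intercept=0.0):
    """RSS / (n - s_hat), where s_hat counts the nonzero Lasso coefficients."""
    n_samples = X.shape[0]
    resid = y - X @ coef - intercept
    n_selected = np.count_nonzero(coef)
    return resid @ resid / (n_samples - n_selected)


def var_natural_lasso(X, y, coef):
    """(||y||^2 - ||X coef||^2) / n, computed on centered data.

    Assumes coef is an exact Lasso solution fit with an unpenalized intercept.
    """
    n_samples = X.shape[0]
    y_c = y - y.mean()
    X_c = X - X.mean(axis=0)
    fitted = X_c @ coef
    return (y_c @ y_c - fitted @ fitted) / n_samples
```

Both functions return a variance, so take the square root to compare with a noise standard deviation.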



### References

Zhang, Y., Li, R. and Tsai, C.L., 2010. Regularization parameter selections via generalized information criterion. Journal of the American Statistical Association, 105(489), pp.312-323.

Reid, S., Tibshirani, R. and Friedman, J., 2016. A study of error variance estimation in lasso regression. Statistica Sinica, pp.35-67.

Hastie, T., Tibshirani, R. and Tibshirani, R.J., 2017. Extended comparisons of best subset selection, forward stepwise selection, and the lasso. arXiv preprint arXiv:1707.08692.

Yu, G. and Bien, J., 2019. Estimating the error variance in a high-dimensional linear model. Biometrika, 106(3), pp.533-546.

Liu, X., Zheng, S. and Feng, X., 2020. Estimation of error variance via ridge regression. Biometrika, 107(2), pp.481-488.

# sample toy data

Sample linear regression data following one of the distributions discussed in Section 3 of (Hastie et al, 2017).

In [1]:
from ya_glm.toy_data import sample_sparse_lin_reg

In [98]:
# setup distribution
n_samples = 100
n_features = 500
n_nonzero = 10
snr = 1
beta_type = 1
corr = 0.3

# sample data
X, y, info = sample_sparse_lin_reg(n_samples=n_samples, n_features=n_features,
                                   n_nonzero=n_nonzero,
                                   beta_type=beta_type, corr=corr,
                                   snr=snr, random_state=0)

true_noise_std = info['noise_std']
print('True noise std: {}'.format(true_noise_std))

True noise std: 3.1622776601683795
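The printed value is $\sqrt{10}$, which is what the setup in Hastie et al (2017) would give: the noise variance is chosen as $\beta^T \Sigma \beta / \text{snr}$ with $\Sigma_{ij} = \text{corr}^{|i-j|}$, and beta-type 1 places `n_nonzero` ones at roughly equally spaced indices. The sketch below recomputes this under those assumptions. The exact index placement is a guess, and we have not checked that `sample_sparse_lin_reg` does precisely this internally.

```python
import numpy as np

n_features, n_nonzero, corr, snr = 500, 10, 0.3, 1.0

# Assumption: ones at (roughly) equally spaced indices, zeros elsewhere
beta = np.zeros(n_features)
support = np.round(np.linspace(0, n_features - 1, n_nonzero)).astype(int)
beta[support] = 1.0

# Covariance with entries corr ** |i - j|
idx = np.arange(n_features)
cov = corr ** np.abs(idx[:, None] - idx[None, :])

# Noise level implied by the target signal-to-noise ratio
signal_var = beta @ cov @ beta
noise_std = np.sqrt(signal_var / snr)
print("Implied noise std:", noise_std)
```

Because the nonzero coefficients sit far apart, their cross-correlations are negligible and the signal variance is essentially `n_nonzero`.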


# try various estimators

In [99]:
import numpy as np
from ya_glm.models.Lasso import LassoCV

from ya_glm.tune.lin_reg_var import (
    lin_reg_var_via_ridge,
    lin_reg_var_natural_lasso,
    lin_reg_var_from_rss_of_sel,
)

In [100]:
ridge_est_var = lin_reg_var_via_ridge(X, y)
print("Ridge estimate: {}".format(np.sqrt(ridge_est_var)))

Ridge estimate: 4.520698074989474


In [101]:
# for the next two estimators we need to first tune a Lasso
est_cv = LassoCV(cv=5).fit(X, y)

est_ceof = est_cv.best_estimator_.coef_
est_intercept = est_cv.best_estimator_.intercept_

In [102]:
# lasso selected RSS
rss_lasso_est_var = lin_reg_var_from_rss_of_sel(X=X, y=y,
                                                coef=est_ceof,
                                                intercept=est_intercept)

print("RSS of tuned Lasso estimate: {}".format(np.sqrt(rss_lasso_est_var)))

RSS of tuned Lasso estimate: 4.515077129871736


In [103]:
# natural lasso
nat_est_var = lin_reg_var_natural_lasso(X=X, y=y,
                                        coef=est_ceof,
                                        intercept=est_intercept)

print("Natural Lasso estimate: {}".format(np.sqrt(nat_est_var)))

Natural Lasso estimate: 4.482135258919333
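To put the three estimates side by side, the small sketch below computes their relative error against the true noise standard deviation. The numbers are copied from the outputs above; they come from a single random seed, so they should not be read as a general ranking of the methods.

```python
# Values copied from the printed outputs above
true_noise_std = 3.1622776601683795
estimates = {
    "ridge": 4.520698074989474,
    "RSS of tuned Lasso": 4.515077129871736,
    "natural Lasso": 4.482135258919333,
}

# Signed relative error of each estimated noise std
rel_error = {
    name: (est - true_noise_std) / true_noise_std
    for name, est in estimates.items()
}

for name, err in rel_error.items():
    print(f"{name}: {100 * err:+.1f}% relative error")
```

On this draw all three estimators overestimate the noise standard deviation by a bit more than 40 percent, with the natural Lasso slightly closer than the other two.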
