In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
from sklearn import linear_model

In [1]:
def noisy_data(x, coef=(0, 1), noise_mean=0, noise_sigma=1, random_seed=0, outlier_fraction=0):
    """
    Generate response variable data using a linear model with coefficients, noise, and outliers specified in the input.

    Parameters
    ----------
    x : array-like
        array of explanatory variables
    coef : array-like
        coefficients of the generating model; coef[i] multiplies x**i,
        so the default gives y = x
    noise_mean : float
        mean of Gaussian noise
    noise_sigma : float
        std of Gaussian noise
    random_seed : int
        seed for numpy random state for generating noise
    outlier_fraction : float
        probability that each data point is made an "outlier" (its noise is
        multiplied by 10), so the realized fraction is only approximately this value

    Returns
    -------
    y : numpy array
        response variables generated by linear model
    
    """
    rng = np.random.RandomState(random_seed)

    noise = rng.normal(noise_mean, noise_sigma, len(x))
    outlier_mask = rng.choice([True, False], size=len(x),
                              p=[outlier_fraction, 1-outlier_fraction])
    noise[outlier_mask] *= 10
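    x = np.asarray(x, dtype=float)  # so x**i also works for lists and tuples
    y = np.zeros(len(x))
    # Accumulate the polynomial term by term: coef[i] * x**i
    for i, c in enumerate(coef):
        y += c * x**i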
    return y + noise
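
Written out, the function above draws each response as follows. The symbols $c_i$ (for `coef[i]`), $\mu$ and $\sigma$ (for `noise_mean` and `noise_sigma`), $f$ (for `outlier_fraction`), and the scale factor $s_j$ are shorthand introduced here, not names from the code:

$$
y_j = \sum_i c_i\, x_j^{\,i} + s_j\,\varepsilon_j,
\qquad \varepsilon_j \sim \mathcal{N}(\mu, \sigma^2),
\qquad s_j = \begin{cases} 10 & \text{with probability } f \\ 1 & \text{with probability } 1 - f \end{cases}
$$

With the default $\mu = 0$, the expected squared noise is $\sigma^2\,(1 - f + 100 f) = \sigma^2\,(1 + 99 f)$, which is a useful reference value when reading the MSE of a fit.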

1.  Generate 100 random x-values from 0 to 10.  Use `noisy_data` to make noisy response data.  Fit the data using Scipy's `linregress`.  What is the r^2 value?  What about the mean squared error (MSE)?

2.  Generate 500 random x-values from 0 to 10, with 5% being outliers.  Fit the data using scikit-learn's `linear_model.LinearRegression()` and `linear_model.RANSACRegressor()`.  What is the MSE for each model?

3.  Increase the percentage of outliers to 20%, then 60%.  In each case, how do vanilla linear regression and RANSAC compare?  How do the results change when you vary the RANSAC parameters `max_trials` or `residual_threshold`?

4.  Try fitting the 60% outlier sample with scikit-learn's `linear_model.Lasso` (L1 penalized linear model), `linear_model.Ridge` (L2), and `linear_model.ElasticNet` (L1 and L2).  Is there any noticeable difference in the resulting coefficients?

5.  Would you expect the results of question 4 to change if there were 20 features instead of 2? How?
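
One possible starting point for the exercises above is sketched below. It is not a reference solution. To keep the snippet runnable on its own, it uses `make_data`, a hypothetical stand-in for `noisy_data` with the default coefficients (true line y = x); in the notebook you would call `noisy_data` directly. The seeds, `alpha=1.0`, and `random_state=0` are arbitrary choices made for this sketch.

```python
import numpy as np
from scipy import stats
from sklearn import linear_model
from sklearn.metrics import mean_squared_error


def make_data(n, outlier_fraction=0.0, seed=0):
    """Hypothetical stand-in for noisy_data with the true line y = x."""
    rng = np.random.RandomState(seed)
    x = rng.uniform(0, 10, n)
    noise = rng.normal(0, 1, n)
    outliers = rng.random_sample(n) < outlier_fraction
    noise[outliers] *= 10  # same 10x inflation as noisy_data
    return x, x + noise


# Exercise 1: ordinary least squares with scipy.
x, y = make_data(100)
fit = stats.linregress(x, y)
r_squared = fit.rvalue ** 2
mse_scipy = np.mean((y - (fit.intercept + fit.slope * x)) ** 2)
print(f"linregress: r^2 = {r_squared:.3f}, MSE = {mse_scipy:.3f}")

# Exercise 2: scikit-learn expects a 2-D feature matrix of shape (n, 1).
x, y = make_data(500, outlier_fraction=0.05)
X = x.reshape(-1, 1)

ols = linear_model.LinearRegression().fit(X, y)
ransac = linear_model.RANSACRegressor(random_state=0).fit(X, y)

for name, model in [("OLS", ols), ("RANSAC", ransac)]:
    mse = mean_squared_error(y, model.predict(X))
    print(f"{name}: MSE over all points = {mse:.3f}")

# RANSAC keeps the estimator refit on the inliers, plus the inlier mask.
print("OLS slope:   ", ols.coef_[0])
print("RANSAC slope:", ransac.estimator_.coef_[0])
print("inliers kept:", ransac.inlier_mask_.sum(), "of", len(y))

# Exercise 4: the penalized models share the same fit/predict interface.
x60, y60 = make_data(500, outlier_fraction=0.6)
X60 = x60.reshape(-1, 1)
penalized = {}
for Model in (linear_model.Lasso, linear_model.Ridge, linear_model.ElasticNet):
    model = Model(alpha=1.0).fit(X60, y60)
    penalized[Model.__name__] = (model.intercept_, model.coef_[0])
    print(Model.__name__, "intercept, slope:", penalized[Model.__name__])
```

Because ordinary least squares minimizes the in-sample squared error, an MSE computed over all points (outliers included) cannot come out lower for RANSAC than for OLS. Comparing the recovered slope and intercept against the true values, or computing the MSE on the inliers only, may be a more informative way to judge the robust fit.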