# Supervised Learning Algorithms: Lasso Regression

*In this template, only the **data input** and the **input/target variables** need to be specified (see the "Data Input and Variables" section for instructions). No other section needs to be changed. The data input example uses a .csv file from an IBM Box web repository.*

## 1. Libraries

*Run to import the required libraries.*

In [1]:
%matplotlib notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

## 2. Data Input and Variables

*Define the data input and the input (X) and target (y) variables, then run the code. Do not rename **['df', 'X', 'y']**, because later sections use these names.*

In [3]:
### Data Input
# df = 

### Defining Variables  
# X = 
# y = 

### Data Input Example 
df = pd.read_csv('https://ibm.box.com/shared/static/q6iiqb1pd7wo8r3q28jvgsrprzezjqk3.csv')

X = df[['horsepower']]
y = df['price']
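
The bracket conventions used for `X` and `y` above can be illustrated on a small toy frame. This is only a sketch: the frame `toy_df` and its column names are hypothetical and are not part of the example dataset. The `toy_` names are used so that `df`, `X` and `y` are left untouched.

```python
import pandas as pd

# Toy frame with hypothetical column names (assumption, not the example dataset).
toy_df = pd.DataFrame({
    'feature_a': [1.0, 2.0, 3.0, 4.0],
    'feature_b': [10.0, 8.0, 6.0, 4.0],
    'target': [2.1, 3.9, 6.2, 8.1],
})

# Double brackets select a list of columns and keep the result two-dimensional
# (a DataFrame), which is the shape scikit-learn estimators expect for inputs.
toy_X = toy_df[['feature_a', 'feature_b']]

# Single brackets return a one-dimensional Series, suitable for the target.
toy_y = toy_df['target']

print(toy_X.shape)  # (4, 2)
print(toy_y.shape)  # (4,)
```

To use several input variables in the template itself, the same pattern would apply: list the column names inside the double brackets when defining `X`.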

## 3. The Model

*Run to build the model.*

In [8]:
from sklearn.linear_model import Lasso

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)
scaler = MinMaxScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

linlasso = Lasso(alpha=2.0, max_iter = 10000).fit(X_train_scaled, y_train)

### Intercept & coefficient, # of non-zero features & weights, R-squared for training & test data set
print('lasso regression linear model intercept: {}'
     .format(linlasso.intercept_))
print('lasso regression linear model coeff:{}'
     .format(linlasso.coef_))
print('\nNon-zero features: {}'
     .format(np.sum(linlasso.coef_ != 0)))
print('\nR-squared score (training): {:.3f}'
     .format(linlasso.score(X_train_scaled, y_train)))
print('R-squared score (test): {:.3f}\n'
     .format(linlasso.score(X_test_scaled, y_test)))
print('Features with non-zero weight (sorted by absolute magnitude):')

for name, coef in sorted(zip(list(X), linlasso.coef_),
                         key=lambda pair: -abs(pair[1])):
    if coef != 0:
        print('\t{}, {:.3f}'.format(name, coef))

lasso regression linear model intercept: 3984.172831038173
lasso regression linear model coeff:[33549.01456544]

Non-zero features: 1

R-squared score (training): 0.623
R-squared score (test): 0.666

Features with non-zero weight (sorted by absolute magnitude):
	horsepower, 33549.015
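
Because the model is fitted on min-max scaled inputs, the coefficient printed above is the change in predicted price across the full horsepower range seen in training, not the change per unit of horsepower. The sketch below shows one way to convert a coefficient back to the original units. It runs on synthetic stand-in data, and all `demo_` names are assumptions made for illustration. It relies on the fact that `MinMaxScaler` applies `X_scaled = X * scale_ + min_`.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): one feature on a horsepower-like scale.
rng = np.random.default_rng(0)
X_demo = rng.uniform(50, 250, size=(100, 1))
y_demo = 150.0 * X_demo[:, 0] + rng.normal(0, 500, size=100)

demo_scaler = MinMaxScaler()
X_demo_scaled = demo_scaler.fit_transform(X_demo)
demo_model = Lasso(alpha=2.0, max_iter=10000).fit(X_demo_scaled, y_demo)

# Substituting X_scaled = X * scale_ + min_ into
# y = intercept + coef * X_scaled gives an equivalent model on raw inputs.
coef_raw = demo_model.coef_ * demo_scaler.scale_
intercept_raw = demo_model.intercept_ + np.sum(demo_model.coef_ * demo_scaler.min_)

print('coefficient per raw unit: {}'.format(coef_raw))
print('intercept on raw scale: {:.3f}'.format(intercept_raw))

# Both forms should give the same predictions up to floating-point error.
pred_scaled = demo_model.predict(X_demo_scaled)
pred_raw = intercept_raw + X_demo @ coef_raw
print('max difference: {:.2e}'.format(np.max(np.abs(pred_scaled - pred_raw))))
```

In the template, the same conversion would use `linlasso` and `scaler` in place of `demo_model` and `demo_scaler`.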


### 3.1. Effect of the regularization parameter alpha on R-squared

*Run to check how alpha affects the model score.*
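
For reference, the objective that scikit-learn's `Lasso` minimizes can be written as follows. Here $n$ is the number of training samples, $w$ the coefficient vector, and $\alpha$ the regularization parameter. The intercept is omitted for brevity.

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

A larger $\alpha$ puts more weight on the L1 penalty, which shrinks coefficients toward zero and can set some of them exactly to zero.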

In [5]:
print('Lasso regression: effect of alpha regularization\n\
parameter on number of features kept in final model\n')

for alpha in [0.5, 1, 2, 3, 5, 10, 20, 50]:
    # Note: this loop reassigns linlasso, replacing the model fitted in section 3.
    linlasso = Lasso(alpha=alpha, max_iter=10000).fit(X_train_scaled, y_train)
    r2_train = linlasso.score(X_train_scaled, y_train)
    r2_test = linlasso.score(X_test_scaled, y_test)
    
    print('Alpha = {:.2f}\nFeatures kept: {}, r-squared training: {:.2f}, \
r-squared test: {:.2f}\n'
         .format(alpha, np.sum(linlasso.coef_ != 0), r2_train, r2_test))

Lasso regression: effect of alpha regularization
parameter on number of features kept in final model

Alpha = 0.50
Features kept: 1, r-squared training: 0.62, r-squared test: 0.67

Alpha = 1.00
Features kept: 1, r-squared training: 0.62, r-squared test: 0.67

Alpha = 2.00
Features kept: 1, r-squared training: 0.62, r-squared test: 0.67

Alpha = 3.00
Features kept: 1, r-squared training: 0.62, r-squared test: 0.67

Alpha = 5.00
Features kept: 1, r-squared training: 0.62, r-squared test: 0.66

Alpha = 10.00
Features kept: 1, r-squared training: 0.62, r-squared test: 0.66

Alpha = 20.00
Features kept: 1, r-squared training: 0.62, r-squared test: 0.66

Alpha = 50.00
Features kept: 1, r-squared training: 0.62, r-squared test: 0.65
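
With a single input variable, as in the example above, the count of features kept cannot vary much: alpha can only shrink the one coefficient. The feature-selection effect is easier to see with several inputs. The sketch below uses synthetic data in which only three of ten features influence the target. The data, the alpha grid, and all variable names are assumptions made for illustration and are unrelated to the example dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (assumption): 10 features, only the first 3 affect the target.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X_syn = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [5.0, -3.0, 2.0]
y_syn = X_syn @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Same split-then-scale pattern as in section 3.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(X_syn, y_syn, random_state=0)
syn_scaler = MinMaxScaler()
Xs_train_scaled = syn_scaler.fit_transform(Xs_train)
Xs_test_scaled = syn_scaler.transform(Xs_test)

# Record how many coefficients remain non-zero for each alpha.
features_kept = {}
for a in [0.001, 0.01, 0.1, 1.0]:
    model = Lasso(alpha=a, max_iter=10000).fit(Xs_train_scaled, ys_train)
    features_kept[a] = int(np.sum(model.coef_ != 0))
    print('Alpha = {:.3f}, features kept: {}, r-squared test: {:.2f}'
          .format(a, features_kept[a], model.score(Xs_test_scaled, ys_test)))
```

On data like this, the number of features kept would be expected to fall as alpha grows, with the uninformative features typically dropped first.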

