# Part 1: Regularization

A) Use the Boston dataset and fit a Ridge regression model with the tuning parameter set to 100 (alpha=100). Find the $R^2$ score and the number of non-zero coefficients.

B) Use Lasso regression instead of Ridge regression, again with the tuning parameter set to 100. Find the $R^2$ score and the number of non-zero coefficients.

C) Change the tuning parameter of the Lasso model to a very low value (alpha=0.001). What is the $R^2$ score?

D) Comment on your results. In this problem, do all features seem important for making predictions?

In [2]:
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell only runs on older versions (< 1.2).
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
import numpy as np

dataset = load_boston()
X = dataset.data
Y = dataset.target
# Default split: 75% train / 25% test, fixed seed for reproducibility
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Unregularized baseline for comparison
linreg = LinearRegression().fit(X_train, Y_train)
R2 = linreg.score(X_test, Y_test)
print(R2)

0.635362078667


In [3]:
# Ridge with the default tuning parameter (alpha=1.0)
RidgeModel = Ridge().fit(X_train, Y_train)
ridge_R2 = RidgeModel.score(X_test, Y_test)
print(ridge_R2)

0.626511622377


In [4]:
# Part A: Ridge with alpha=100
RidgeModel100 = Ridge(alpha=100).fit(X_train, Y_train)
ridge100_R2 = RidgeModel100.score(X_test, Y_test)
print(ridge100_R2)

0.592535803616
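Part A also asks for the number of non-zero coefficients, which the Ridge cells above do not print. A minimal sketch of that count follows. Because `load_boston` is no longer available in recent scikit-learn releases, it assumes scikit-learn's bundled diabetes dataset as a stand-in, so its numbers will differ from the Boston results. With the notebook's own variables the count would be `np.count_nonzero(RidgeModel100.coef_)`.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): the bundled diabetes dataset instead of Boston
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ridge100 = Ridge(alpha=100).fit(X_train, y_train)
ridge_r2 = ridge100.score(X_test, y_test)
# Ridge shrinks coefficients toward zero but does not set them exactly to zero
ridge_nonzero = np.count_nonzero(ridge100.coef_)
print(f"R^2 = {ridge_r2:.3f}, non-zero coefficients: {ridge_nonzero} of {X.shape[1]}")
```

The L2 penalty shrinks every coefficient smoothly, so in practice all of them stay non-zero; exact zeros come from the L1 penalty used by Lasso.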


In [11]:
# Lasso with the default tuning parameter (alpha=1.0)
lassoModel = Lasso().fit(X_train, Y_train)
lasso_1 = lassoModel.score(X_test, Y_test)
print(lasso_1)
# Number of coefficients driven exactly to zero (13 features in total)
coeff = np.sum(lassoModel.coef_ == 0)
print(coeff)


0.551511093619
2


In [12]:
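# Part B: Lasso with alpha=100
lassoModel100 = Lasso(alpha=100).fit(X_train, Y_train)
lasso_100 = lassoModel100.score(X_test, Y_test)
print(lasso_100)
# Number of coefficients driven exactly to zero (non-zero = 13 minus this count)
coeff100 = np.sum(lassoModel100.coef_ == 0)
print(coeff100)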

0.118669161755
11
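The effect of the tuning parameter on Lasso sparsity can be seen by sweeping `alpha` and recording the test $R^2$ and the number of non-zero coefficients at each value. The sketch below again assumes the bundled diabetes dataset as a stand-in for Boston, and the alpha grid is illustrative rather than taken from the assignment.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): the bundled diabetes dataset instead of Boston
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}  # alpha -> (test R^2, number of non-zero coefficients)
for alpha in [0.001, 0.01, 0.1, 1, 10, 100]:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X_train, y_train)
    n_nonzero = np.count_nonzero(model.coef_)
    results[alpha] = (model.score(X_test, y_test), n_nonzero)
    print(f"alpha={alpha:<6} R^2={results[alpha][0]:.3f}  non-zero={n_nonzero}")
```

As alpha grows, more coefficients are set exactly to zero; once alpha is large enough, every coefficient is zero and the model just predicts the training mean.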


In [13]:
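# Part C: the task asks for alpha=0.001, but this cell (and the output below) uses alpha=0.01
lassoModel_0_01 = Lasso(alpha=0.01).fit(X_train, Y_train)
lasso_0_01 = lassoModel_0_01.score(X_test, Y_test)
print(lasso_0_01)
# Number of coefficients driven exactly to zero
coeff_0_01 = np.sum(lassoModel_0_01.coef_ == 0)
print(coeff_0_01)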

0.63159135839
0


In [None]:
## With alpha=100 the tuning parameter is too large for this dataset: Lasso sets 11 of the 13 coefficients to zero, the model underfits, and the test R2 drops to about 0.12.
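Rather than judging by hand whether a tuning parameter is too high, alpha can be chosen by cross-validation. A minimal sketch with `LassoCV`, assuming the bundled diabetes dataset as a stand-in for Boston and an illustrative alpha grid:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): the bundled diabetes dataset instead of Boston
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

alphas = [0.001, 0.01, 0.1, 1, 10, 100]
# 5-fold cross-validation on the training set picks the alpha with the lowest mean squared error
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10_000).fit(X_train, y_train)
print("chosen alpha:", lasso_cv.alpha_)
print("test R^2:", lasso_cv.score(X_test, y_test))
```

The test set is only used once, after alpha has been selected on the training folds, so the reported $R^2$ is not tuned on test data.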

In [None]:
## Ridge with alpha=1 performs slightly worse than plain linear regression on this split (R2 = 0.627 vs 0.635), and alpha=100 lowers R2 further to 0.593.
## A larger alpha puts more weight on shrinking the magnitude of the coefficients than on reducing the fitting error, so increasing the regularization strength increases the error here.
## Lasso: with alpha=0.01 all 13 coefficients stay non-zero and R2 (0.632) is close to linear regression; alpha=1 keeps 11 features (R2 = 0.552); alpha=100 keeps only 2 and R2 falls to 0.119.
## So most features carry useful information, and discarding most of them leads to very poor performance.
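Why does Lasso discard features while Ridge only shrinks them? In the simplified case of orthonormal features ($X^\top X = I$, an assumption made only for illustration), both penalties have closed-form solutions in terms of the least-squares coefficient $\hat\beta$. Minimizing $\|y - X\beta\|^2 + \lambda\|\beta\|_2^2$ gives $\hat\beta/(1+\lambda)$, while minimizing $\tfrac12\|y - X\beta\|^2 + \lambda\|\beta\|_1$ gives the soft-thresholded value $\operatorname{sign}(\hat\beta)\max(|\hat\beta| - \lambda, 0)$. A small sketch of the two rules, with hypothetical coefficient values:

```python
import math


def ridge_shrink(beta_ols: float, lam: float) -> float:
    """Ridge estimate for one coefficient when X^T X = I: proportional shrinkage."""
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols: float, lam: float) -> float:
    """Lasso estimate for one coefficient when X^T X = I: soft thresholding."""
    return math.copysign(max(abs(beta_ols) - lam, 0.0), beta_ols)


# Hypothetical least-squares coefficients: one large, one small
for beta in (3.0, 0.5):
    print(beta, ridge_shrink(beta, 1.0), lasso_shrink(beta, 1.0))
# 3.0 1.5 2.0
# 0.5 0.25 0.0
```

Coefficients whose least-squares magnitude is at most $\lambda$ become exactly zero under the L1 rule, while the L2 rule only scales every coefficient down. This mirrors, qualitatively, the results above, where Lasso with alpha=100 kept only 2 of the 13 Boston features.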

# Part 2: Logistic Regression

In this exercise, you will use logistic regression to classify breast cancer tumors as malignant or benign using the scikit-learn breast cancer dataset. Run the code below to print and read the description of the dataset. Use logistic regression with Lasso regularization (penalty='l1') and the default regularization parameter to build the classifier. What is the accuracy?


In [3]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
import numpy as np

DataCancer=load_breast_cancer()
print(DataCancer.keys())
print(DataCancer.DESCR)

X_features=DataCancer.data
Y_targetClass=DataCancer.target

X_train, X_test, Y_train, Y_test= train_test_split(X_features, Y_targetClass, random_state= 0)



dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])
Breast Cancer Wisconsin (Diagnostic) Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Ra

In [4]:

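# C is the inverse of the regularization strength: a larger C means weaker regularization.
# These cells use LogisticRegression's default penalty (L2), not penalty='l1'.

# C=1
FittedLogRegModel1 = LogisticRegression(C=1).fit(X_train, Y_train)
score1 = FittedLogRegModel1.score(X_test, Y_test)
print(score1)

# C=1000
FittedLogRegModel1000 = LogisticRegression(C=1000).fit(X_train, Y_train)
score1000 = FittedLogRegModel1000.score(X_test, Y_test)
print(score1000)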

0.958041958042
0.965034965035


In [5]:
scaler = preprocessing.MinMaxScaler().fit(X_train)  # learn per-feature min/max from the training data only
X_train_transformed = scaler.transform(X_train)  # apply the scaling to the training set
X_test_transformed = scaler.transform(X_test)  # apply the same training-set scaling to the test set
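Fitting the scaler on the training split only keeps test-set statistics out of the preprocessing. A `Pipeline` bundles the two steps so the same rule holds automatically, including inside cross-validation. A minimal sketch that reproduces the manual steps and checks that both routes give the same accuracy; `max_iter=1000` is an added assumption to avoid convergence warnings in newer scikit-learn versions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Manual version: fit the scaler on the training data, reuse it for the test data
scaler = MinMaxScaler().fit(X_train)
manual = LogisticRegression(C=1, max_iter=1000).fit(scaler.transform(X_train), y_train)
manual_acc = manual.score(scaler.transform(X_test), y_test)

# Pipeline version: fit() scales with training statistics, score() reuses them on the test data
pipe = make_pipeline(MinMaxScaler(), LogisticRegression(C=1, max_iter=1000))
pipe_acc = pipe.fit(X_train, y_train).score(X_test, y_test)
print(manual_acc, pipe_acc)
```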

In [6]:
## 2) Same models, fitted on the Min-Max-scaled features

# C=1
FittedLogRegModel1 = LogisticRegression(C=1).fit(X_train_transformed, Y_train)
score1 = FittedLogRegModel1.score(X_test_transformed, Y_test)
print(score1)

# C=1000
FittedLogRegModel1000 = LogisticRegression(C=1000).fit(X_train_transformed, Y_train)
score1000 = FittedLogRegModel1000.score(X_test_transformed, Y_test)
print(score1000)

0.958041958042
0.951048951049
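The logistic regression cells above use the default L2 penalty, whereas the exercise asks for Lasso-style regularization (penalty='l1'). A sketch of that variant on the Min-Max-scaled features follows. It assumes `solver="liblinear"`, one of the solvers that supports the L1 penalty, and the accuracy it prints is not taken from this notebook's outputs.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# L1 (Lasso) penalty with the default C=1.0; liblinear supports penalty="l1"
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X_train_s, y_train)
l1_acc = l1_model.score(X_test_s, y_test)
# The L1 penalty can zero out weights, so count how many features are actually used
n_used = np.count_nonzero(l1_model.coef_)
print(f"accuracy = {l1_acc:.3f}, features used: {n_used} of {X.shape[1]}")
```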


In [7]:
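# FittedLogRegModel1 now refers to the C=1 model fitted on the scaled features
Probabilities = FittedLogRegModel1.predict_proba(X_test_transformed)
print(Probabilities)

# Column 0 is P(malignant) and column 1 is P(benign) (class order of model.classes_);
# the probabilities in each row sum to 1.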

[[  8.07579955e-01   1.92420045e-01]
 [  1.59728398e-01   8.40271602e-01]
 [  1.01589156e-01   8.98410844e-01]
 [  1.77849057e-01   8.22150943e-01]
 [  4.57880756e-02   9.54211924e-01]
 [  7.90294796e-02   9.20970520e-01]
 [  1.09865111e-01   8.90134889e-01]
 [  5.44324296e-02   9.45567570e-01]
 [  1.01915806e-02   9.89808419e-01]
 [  9.13914410e-03   9.90860856e-01]
 [  3.08242732e-01   6.91757268e-01]
 [  2.14828517e-01   7.85171483e-01]
 [  2.07421603e-02   9.79257840e-01]
 [  3.78332022e-01   6.21667978e-01]
 [  4.13296200e-01   5.86703800e-01]
 [  8.38513672e-01   1.61486328e-01]
 [  7.93702870e-02   9.20629713e-01]
 [  9.73939114e-01   2.60608863e-02]
 [  9.10126230e-01   8.98737697e-02]
 [  9.95144762e-01   4.85523808e-03]
 [  6.90776290e-01   3.09223710e-01]
 [  8.59164043e-01   1.40835957e-01]
 [  2.27405761e-01   7.72594239e-01]
 [  7.54715752e-02   9.24528425e-01]
 [  9.57565248e-01   4.24347520e-02]
 [  6.90475491e-02   9.30952451e-01]
 [  2.70732585e-02   9.72926742e-01]
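Each row of `predict_proba` is ordered by `model.classes_`, and for this dataset class 0 is malignant and class 1 is benign (see `target_names`). A short check of two properties: rows sum to 1, and `predict` returns the class with the larger probability. It refits a C=1 model on Min-Max-scaled features; `max_iter=1000` is an added assumption for newer scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
scaler = MinMaxScaler().fit(X_train)
model = LogisticRegression(C=1, max_iter=1000).fit(scaler.transform(X_train), y_train)

proba = model.predict_proba(scaler.transform(X_test))
print(data.target_names[model.classes_])  # column labels: ['malignant' 'benign']
# Each row is a probability distribution over the two classes
row_sums_ok = np.allclose(proba.sum(axis=1), 1.0)
# predict() picks the class with the higher probability (a 0.5 threshold in the binary case)
predicted = model.predict(scaler.transform(X_test))
argmax_matches = np.array_equal(predicted, model.classes_[proba.argmax(axis=1)])
print(row_sums_ok, argmax_matches)
```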
 

In [8]:

LDAmodelFitted = LinearDiscriminantAnalysis().fit(X_train_transformed, Y_train)
print(LDAmodelFitted.score(X_test_transformed, Y_test))

0.972027972028


In [9]:

QDAmodelFitted = QuadraticDiscriminantAnalysis().fit(X_train_transformed, Y_train)
print(QDAmodelFitted.score(X_test_transformed, Y_test))

0.958041958042
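A single train/test split gives one accuracy per model, and with 143 test samples each misclassified sample changes the score by about 0.007. A sketch that compares the three classifiers with 5-fold cross-validation instead; the fold count and the pipelines are assumptions, not part of the original exercise. QDA may warn that variables are collinear on this dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)

models = {
    "logistic (C=1)": LogisticRegression(C=1, max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
}
cv_means = {}
for name, clf in models.items():
    # The scaler inside the pipeline is re-fitted on each training fold
    scores = cross_val_score(make_pipeline(MinMaxScaler(), clf), X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name:15s} mean={scores.mean():.3f}  std={scores.std():.3f}")
```

Comparing the mean and spread across folds gives a fairer picture than a difference of one or two test samples on a single split.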
