A zoo of common baseline models. These usually do not produce the best predictions, but they are fast, often orders of magnitude faster than more complex models.
In my opinion, they are useful for:
* Getting a quick baseline result for the problem, which serves as a sanity check.
* Blending a small portion of their predictions into the final result, which acts as a final regularization step.
* Helping find a good KFold split for expensive models, as will be explained in my random-idea repository.
* ...
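As a minimal sketch of the blending point, a blend can be a weighted average of predictions on the same rows. The arrays and the 0.1 weight below are hypothetical placeholders, not values from this notebook.

```python
import numpy as np

# Hypothetical predictions from an expensive model and a fast baseline
# (in practice both would come from predict() on the same test rows).
strong_pred = np.array([1.0, 2.0, 3.0, 4.0])
baseline_pred = np.array([2.0, 2.0, 2.0, 2.0])

# Assumed blend weight: only a small share goes to the baseline,
# which pulls extreme predictions slightly toward the simpler model.
baseline_weight = 0.1
blend = (1 - baseline_weight) * strong_pred + baseline_weight * baseline_pred
print(blend)
```

In a real setting the weight would itself be chosen on out-of-fold predictions rather than fixed by hand.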

In [None]:
import warnings
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge, RidgeCV
from sklearn.model_selection import (GridSearchCV, KFold, RepeatedKFold,
                                     StratifiedKFold, cross_val_score)
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVR

warnings.simplefilter('ignore')  # silence all warnings (e.g. Lasso/ElasticNet convergence warnings)

In [None]:
train = pd.read_csv('train.csv', index_col = 0)
cols = list(train.columns)
cols.remove('target')

In [None]:
scaler = StandardScaler()
scaler.fit(train[cols])
train_aft = scaler.transform(train[cols])
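Fitting the scaler on all of `train` before cross-validation lets each validation fold contribute to the mean and standard deviation used to scale it. The effect is usually small for `StandardScaler`, but a `Pipeline` keeps scaling inside each fold. A sketch, assuming synthetic data from `make_regression` as a stand-in for `train.csv` and a hypothetical small `alpha` grid:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for train[cols] and train['target'].
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# The scaler is refit on the training part of every fold, so the
# validation part never contributes to the scaling statistics.
pipe = Pipeline([('scaler', StandardScaler()), ('model', Ridge())])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
params = {'model__alpha': [0.1, 1.0, 10.0]}
kf = KFold(n_splits=5, shuffle=True, random_state=2018)
gs = GridSearchCV(pipe, params, scoring='neg_mean_squared_error', cv=kf)
gs.fit(X, y)
print(gs.best_params_, gs.best_score_)
```

The fitted pipeline can then be applied directly to raw test features, since it scales them with the statistics learned on the full training set.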

In [None]:
kf = KFold(n_splits = 5, shuffle = True, random_state = 2018)
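Because `shuffle=True` is combined with an integer `random_state`, each call to `kf.split` yields the same folds, so every model below (all passed `cv=kf`) is scored on identical splits. A quick check on a dummy array:

```python
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=2018)
X_dummy = np.arange(20).reshape(10, 2)  # 10 rows; only the row count matters

# Two independent passes over the splits.
first = [val_idx.tolist() for _, val_idx in kf.split(X_dummy)]
second = [val_idx.tolist() for _, val_idx in kf.split(X_dummy)]

print(first == second)  # same folds on every call
print(sorted(i for fold in first for i in fold))  # each row is validated exactly once
```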

* Linear regression
  * Technically there is nothing to grid-search here, but scikit-learn's `GridSearchCV` provides a very convenient dict-format output (`cv_results_`), so I use it as a wrapper anyway.
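With an empty grid, `GridSearchCV` evaluates a single candidate, so `best_score_` is simply the mean fold score that `cross_val_score` reports on the same folds. A sketch on synthetic data (assumed, standing in for `train_aft`):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=2018)

# Empty grid: exactly one candidate (the default LinearRegression).
gs = GridSearchCV(LinearRegression(), {}, scoring='neg_mean_squared_error', cv=kf)
gs.fit(X, y)

# Same model, same folds, same scorer, without the grid-search wrapper.
scores = cross_val_score(LinearRegression(), X, y,
                         scoring='neg_mean_squared_error', cv=kf)

print(len(gs.cv_results_['params']))  # number of candidates evaluated
print(gs.best_score_, scores.mean())
```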

In [None]:
lr = LinearRegression()
params = {}
gs = GridSearchCV(lr, params, scoring = 'neg_mean_squared_error', cv = kf, return_train_score = False)
gs.fit(train_aft, train['target'])
pd.DataFrame(gs.cv_results_).to_csv('gridcv.csv')
print(gs.best_score_)

* Ridge regression
  * The larger alpha, the stronger the L2 regularization (the penalty on the squared norm of the weights).
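To see what stronger L2 regularization does in practice, the sketch below fits `Ridge` on synthetic standardized data at a few arbitrary `alpha` values and records the L2 norm of the coefficients, which shrinks toward zero as `alpha` grows:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

norms = []
for alpha in [0.1, 10.0, 1000.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))  # L2 norm of the weight vector
    print(alpha, round(norms[-1], 3))
```

Note that the weights shrink but, unlike with Lasso, they generally do not become exactly zero.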

In [None]:
rd = Ridge(tol = 0.000001)
params = {'alpha':[214]}
gs = GridSearchCV(rd, params, scoring = 'neg_mean_squared_error', cv = kf, return_train_score = False)
gs.fit(train_aft, train['target'])
pd.DataFrame(gs.cv_results_).to_csv('gridcv.csv')
print(gs.best_score_)

* Lasso regression
  * The larger alpha, the stronger the L1 regularization.
  * The L1 penalty drives some weights exactly to zero, i.e. it encourages sparse weights.
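A sketch of the sparsity effect, assuming synthetic data where only 5 of 20 features drive the target; the `alpha` values are arbitrary. The count of nonzero coefficients drops as `alpha` grows, reaching zero once `alpha` is large enough:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# 20 features, of which only 5 actually drive the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

nonzero = {}
for alpha in [0.01, 1.0, 1e6]:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    nonzero[alpha] = int(np.sum(coef != 0))  # count of surviving weights
print(nonzero)
```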

In [None]:
ls = Lasso(tol = 0.000001)
params = {'alpha':[214]}
gs = GridSearchCV(ls, params, scoring = 'neg_mean_squared_error', cv = kf, return_train_score = False)
gs.fit(train_aft, train['target'])
pd.DataFrame(gs.cv_results_).to_csv('gridcv.csv')
print(gs.best_score_)

* ElasticNet
  * The larger alpha, the stronger the overall regularization.
  * l1_ratio, between 0 and 1, sets the share of the penalty that comes from the L1 term.
  * In other words, the L2 penalty makes up the remaining (1 - l1_ratio) share (scikit-learn additionally scales the L2 term by 1/2).
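For reference, scikit-learn's documentation writes the `ElasticNet` objective as follows, with $n$ training rows and $\rho$ = `l1_ratio`; $\rho = 1$ recovers the `Lasso` objective:

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \alpha\,\rho\,\lVert w \rVert_1
\;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2,
\qquad \rho = \texttt{l1\_ratio} \in [0, 1]
```

By contrast, scikit-learn's `Ridge` minimizes $\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$ without the $1/(2n)$ factor, so the same `alpha` value (such as 214 above) does not correspond to the same regularization strength for `Ridge` as for `Lasso` or `ElasticNet`.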

In [None]:
en = ElasticNet(tol = 0.000001)

params = {'alpha':[214], 'l1_ratio':[0.5]}
gs = GridSearchCV(en, params, scoring = 'neg_mean_squared_error', cv = kf, return_train_score = False)
gs.fit(train_aft, train['target'])
pd.DataFrame(gs.cv_results_).to_csv('gridcv.csv')
print(gs.best_score_)

* Support Vector Machine regression
  * Opposite to alpha above, a lower C means more regularization and hence a smoother fitted function.
  * gamma controls the influence radius of the RBF kernel. The lower gamma is, the farther each training point's influence reaches, which leads to a smoother fit.
  * epsilon is a tolerance term: errors smaller than epsilon are ignored completely (the epsilon-insensitive loss).
  * On large datasets `SVR` is slow, because kernel SVR fit time grows faster than quadratically with the number of samples. Consider `LinearSVR`, which does not use the kernel trick but only supports a linear kernel.
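The roles of gamma and epsilon can be sketched numerically. The snippet below uses `sklearn.metrics.pairwise.rbf_kernel`, which computes $\exp(-\gamma \lVert a - b \rVert^2)$, on two points at distance 2, and a hand-written epsilon-insensitive loss (the per-sample loss term only, not the full SVR objective, which also includes C and the weight penalty); the helper name `epsilon_insensitive_loss` is hypothetical:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two points at Euclidean distance 2 from each other.
a = np.array([[0.0, 0.0]])
b = np.array([[2.0, 0.0]])

# A lower gamma keeps the similarity high at larger distances,
# so each support vector influences a wider neighborhood.
sims = {g: rbf_kernel(a, b, gamma=g)[0, 0] for g in [0.08, 1.0]}
print(sims)

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample SVR loss term: errors within epsilon cost nothing."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

loss = epsilon_insensitive_loss(np.array([1.0, 1.0, 1.0]),
                                np.array([1.05, 1.3, 0.5]))
print(loss)
```

The first prediction is off by 0.05, inside the epsilon = 0.1 tube, so it contributes no loss; the other two are charged only for the part of the error beyond epsilon.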

In [None]:
svr = SVR(kernel = 'rbf', tol = 0.000001, cache_size = 2000)
params = {'C':[0.1, 0.2, 0.3], 'gamma':[0.080], 'epsilon':[0.1]}
gs = GridSearchCV(svr, params, scoring = 'neg_mean_squared_error', cv = kf, return_train_score = False, verbose = 1)
gs.fit(train_aft, train['target'])
pd.DataFrame(gs.cv_results_).to_csv('gridcv.csv')
print(gs.best_score_)

In [None]:
gs.best_params_

In [None]:
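et = gs.best_estimator_
# coef_ exists for the linear models above (and SVR with kernel='linear');
# an RBF-kernel SVR has no coef_, so fall back to its dual coefficients.
et.coef_ if hasattr(et, 'coef_') else et.dual_coef_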

In [None]:
test = pd.read_csv('test.csv', index_col = 0)
test_aft = scaler.transform(test[cols])
test['target'] = et.predict(test_aft)
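The predictions above rely on `gs.best_estimator_`. With `GridSearchCV`'s default `refit=True`, that estimator is refit on the full training set with `best_params_` after the search, so no separate final fit is needed. A check on synthetic data, using `Ridge` because its fit is deterministic:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_new, _ = make_regression(n_samples=20, n_features=10, random_state=1)  # stand-in for test_aft

kf = KFold(n_splits=5, shuffle=True, random_state=2018)
gs = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]},
                  scoring='neg_mean_squared_error', cv=kf)  # refit=True by default
gs.fit(X, y)

# Manually refit the winning configuration on all training rows.
manual = Ridge(**gs.best_params_).fit(X, y)
pred_gs = gs.best_estimator_.predict(X_new)
pred_manual = manual.predict(X_new)
print(np.allclose(pred_gs, pred_manual))
```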