# Machine Learning Mastery - Multi-Output Regression

This notebook works through the example from the [Machine Learning Mastery](https://machinelearningmastery.com) [blog post](https://machinelearningmastery.com/multi-output-regression-models-with-python/) on multi-output regression.

It shows which scikit-learn regressors handle multiple outputs natively, and how to use scikit-learn's wrapper classes to get multi-output predictions from models that do not support multiple outputs on their own.


In [13]:
# check scikit-learn version
import sklearn
print(sklearn.__version__)

0.22.2.post1


In [14]:
# imports for the multi-output regression examples
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.multioutput import RegressorChain
from sklearn.metrics import mean_squared_error
import numpy as np
np.set_printoptions(precision=2)
from numpy import absolute
from numpy import mean
from numpy import std
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline


## Create Train Test Dataset

In [15]:
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=42)

In [16]:
X_train.shape

(800, 10)

In [17]:
X_test.shape

(200, 10)

In [18]:
len(y)

1000

In [19]:
X_train[:5]

array([[-0.86,  0.03,  2.11, -0.07, -0.6 , -0.77,  0.75,  0.75,  0.06,
        -0.79],
       [-0.4 ,  0.28, -1.06,  0.25, -1.8 ,  0.14,  0.63, -0.88,  2.89,
         2.48],
       [ 0.83, -0.59,  0.67,  1.59,  1.74, -0.13,  0.55, -0.41, -0.73,
         0.98],
       [-0.11,  2.28, -0.61,  2.25, -0.29, -0.39,  0.49,  0.01, -0.33,
         1.24],
       [ 0.89, -0.5 , -0.13, -0.83, -0.54,  0.43,  0.34,  1.27,  0.86,
         0.46]])

In [20]:
y_test[:1]

array([[-126.63,   35.71]])

## Linear Regression

In [21]:

# define model
model = LinearRegression()
# fit model
model.fit(X_train, y_train)
# make a prediction
print(X_test[0])
print(y_test[0])
data_in = [X_test[0]]
yhat = model.predict(data_in)
# summarize prediction
print(yhat[0])

[ 0.96  0.62  1.19 -0.71  0.28 -0.14 -0.83 -0.8  -0.24  1.16]
[-126.63   35.71]
[-126.63   35.71]


In [22]:
y_preds = model.predict(X_test)

In [23]:
mean_squared_error(y_test, y_preds, multioutput='raw_values',squared=True)

array([1.96e-26, 9.55e-27])
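The near-zero error above is expected rather than suspicious: `make_regression` defaults to `noise=0.0`, so each target is an exact linear function of the features and `LinearRegression` can recover it almost perfectly. A minimal sketch of the effect, assuming a noise standard deviation of `10.0` (a value picked for illustration; it is not part of the original notebook) and a hypothetical helper `linear_mse`:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def linear_mse(noise):
    """Fit LinearRegression on a synthetic dataset and return per-output test MSE."""
    # same dataset settings as the notebook, plus an explicit noise level
    X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                           n_targets=2, noise=noise, random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LinearRegression().fit(X_tr, y_tr)
    return mean_squared_error(y_te, model.predict(X_te), multioutput='raw_values')


mse_clean = linear_mse(noise=0.0)   # noiseless targets, as in the notebook
mse_noisy = linear_mse(noise=10.0)  # assumed noise level, for comparison only
print(mse_clean)
print(mse_noisy)
```

With noise added, the error should settle near the noise variance instead of near zero, which makes the comparison with the other models more meaningful.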

## k-Nearest Neighbors

In [24]:
model = KNeighborsRegressor()
# fit model
model.fit(X_train, y_train)
# make a prediction
y_preds = model.predict(X_test)
# summarize prediction

mean_squared_error(y_test, y_preds, multioutput='raw_values',squared=True)

array([4814.67, 2614.5 ])

## Random Forest

In [25]:
model = RandomForestRegressor()
# fit model
model.fit(X_train, y_train)
# make a prediction
y_preds = model.predict(X_test)
# summarize prediction

mean_squared_error(y_test, y_preds, multioutput='raw_values',squared=True)

array([2203.14, 1632.86])

## Use Cross-Validation and DecisionTreeRegressor

In [26]:

# define model
model = DecisionTreeRegressor()
# evaluate model
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')
# summarize performance
n_scores = absolute(n_scores)
print('Result: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Result: 51.797 (3.123)
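The reported figure is the mean and standard deviation over 30 scores (10 folds repeated 3 times), and the `neg_mean_absolute_error` scorer averages the error uniformly across both outputs. A small sketch that makes the number of scores visible; fixing `random_state=0` on the tree is an assumption added here for repeatability and is not in the original cell:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1)

# 10 splits x 3 repeats = 30 train/test evaluations
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# random_state on the tree is an assumption, added only for repeatability
model = DecisionTreeRegressor(random_state=0)
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv)

# the scorer returns negated errors, so take the absolute value
mae = np.absolute(scores)
print(len(mae))
print('MAE: %.3f (%.3f)' % (mae.mean(), mae.std()))
```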


## Wrap Multi-Output Regression Algorithms

Some regression algorithms do not support multiple outputs natively. For these, scikit-learn provides two wrapper classes:

> * MultiOutputRegressor
> * RegressorChain

### MultiOutputRegressor

In [27]:

# define model
model = LinearSVR()
wrapper = MultiOutputRegressor(model)
# fit model
wrapper.fit(X, y)
# make a prediction
data_in = [[-2.02220122, 0.31563495, 0.82797464, -0.30620401, 0.16003707, -1.44411381, 0.87616892, -0.50446586, 0.23009474, 0.76201118]]
yhat = wrapper.predict(data_in)
# summarize prediction
print(yhat[0])


[-93.15  23.27]


In [28]:
model = LinearSVR()
wrapper = MultiOutputRegressor(model)

# fit model
wrapper.fit(X_train, y_train)
# make a prediction
y_preds = wrapper.predict(X_test)
# summarize prediction

mean_squared_error(y_test, y_preds, multioutput='raw_values',squared=True)

array([9.47e-24, 9.96e-24])
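`MultiOutputRegressor` fits one independent copy of the base estimator per target column. The sketch below illustrates this by comparing the wrapper with a hand-written loop. It uses `Ridge` as a deterministic stand-in for `LinearSVR` (an assumption made so the two results can be compared exactly) and a smaller dataset than the notebook:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=2, random_state=1)

# wrapper: one cloned Ridge model per target column
wrapped = MultiOutputRegressor(Ridge()).fit(X, y)
wrapped_preds = wrapped.predict(X)

# manual equivalent: fit each target separately, then stack the predictions
manual_models = [Ridge().fit(X, y[:, i]) for i in range(y.shape[1])]
manual_preds = np.column_stack([m.predict(X) for m in manual_models])

print(len(wrapped.estimators_))
print(np.allclose(wrapped_preds, manual_preds))
```

Because the per-target models never see each other's outputs, this approach ignores any dependence between the targets.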

### RegressorChain

In [29]:

# define model
model = LinearSVR()
wrapper = RegressorChain(model)
# fit model
wrapper.fit(X, y)
# make a prediction
data_in = [[-2.02220122, 0.31563495, 0.82797464, -0.30620401, 0.16003707, -1.44411381, 0.87616892, -0.50446586, 0.23009474, 0.76201118]]
yhat = wrapper.predict(data_in)
# summarize prediction
print(yhat[0])

[-93.15  23.27]


In [30]:
model = LinearSVR()
wrapper = RegressorChain(model)

# fit model
wrapper.fit(X_train, y_train)
# make a prediction
y_preds = wrapper.predict(X_test)
# summarize prediction

mean_squared_error(y_test, y_preds, multioutput='raw_values',squared=True)

array([7.84e-24, 1.50e-04])
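`RegressorChain` differs from `MultiOutputRegressor` in that each model after the first also receives the earlier targets as extra input features: the true values during fitting (with the default `cv=None`) and the earlier models' predictions at prediction time. A minimal sketch of the two-target case, again assuming `Ridge` as a deterministic stand-in for `LinearSVR` and the default chain order:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.multioutput import RegressorChain

X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=2, random_state=1)
X_new = X[:5]

# wrapper: the second model is trained on X plus the first target
chain = RegressorChain(Ridge()).fit(X, y)
chain_preds = chain.predict(X_new)

# manual equivalent of the chain
m0 = Ridge().fit(X, y[:, 0])                            # target 0 from X only
m1 = Ridge().fit(np.column_stack([X, y[:, 0]]), y[:, 1])  # target 1 from X + true target 0
p0 = m0.predict(X_new)
p1 = m1.predict(np.column_stack([X_new, p0]))           # predicted target 0 feeds model 1
manual_preds = np.column_stack([p0, p1])

print(np.allclose(chain_preds, manual_preds))
```

Since the second model is fitted on a different feature set than the first, its error does not have to match the error of an independently fitted model for the same target.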