# Part 2: Depth Estimation limited to scikit-learn

## Task

"Imagine right now that you are only limited to use scikit-learn for this task. What model will you pick and why? Implement the code that creates this model, train it on the data from TFDS NYU Depth V2 (some part of it that may fit into RAM), and evaluate it."

## Approach

I haven't used scikit-learn much (mostly `train_test_split`), so I'm starting with the User Guide and Examples. Some of the algorithms are familiar from elsewhere (SGD, kNN, decision trees, ...), but there is clearly a lot in this library I don't know yet.

The section I'm looking at right now is called [1.17 Neural network models (supervised)](https://scikit-learn.org/stable/modules/neural_networks_supervised.html) and scikit-learn seems to agree with me that it is not the right library to build a complex deep neural net: There's a warning right at the top cautioning that GPU support is missing:

    Warning: This implementation is not intended for large-scale applications. In particular, scikit-learn offers no GPU support. For much faster, GPU-based implementations, as well as frameworks offering much more flexibility to build deep learning architectures, see Related Projects.


The relevant *related projects* are:

**Deep neural networks etc.**
* skorch: a scikit-learn compatible neural network library that wraps PyTorch.
* scikeras: a wrapper around Keras that interfaces it with scikit-learn. SciKeras is the successor of tf.keras.wrappers.scikit_learn.

The scikit-learn model that comes closest to our task is the [MLPRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor), since its outputs are continuous values. But scikit-learn offers no straightforward way to build a custom architecture such as a U-Net or FPN: I can't see any building blocks (NN layers, freely composable activation functions, groups / blocks for repeating layers, ...). The core problem is that depth estimation needs a dense 2D depth map as output, one value per pixel, rather than a (multilabel) classification or a single continuous value (like a salary). MLPRegressor does accept multi-output targets, so a depth map could be flattened into a vector, but a fully connected network ignores the spatial structure of the image and its weight count grows with the number of pixels.
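To make the flattening idea concrete, here is a minimal sketch. It uses small random arrays as stand-ins for downsampled images and depth maps (this is not NYU Depth V2 data, and the 8x8 resolution, hidden layer size and `max_iter` are arbitrary assumptions chosen to keep it fast). It only shows that the shapes work out, not that the result is any good.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins for downsampled RGB images and depth maps
# (assumption: real data would be resized to a small resolution first).
n_samples, height, width = 64, 8, 8
images = rng.random((n_samples, height, width, 3))
depths = rng.random((n_samples, height, width))

# Flatten so that each image is one row and each pixel value one column.
X = images.reshape(n_samples, -1)  # shape (64, 192)
y = depths.reshape(n_samples, -1)  # shape (64, 64): one target per depth pixel

# Small network and few iterations; a ConvergenceWarning is expected here.
regr = MLPRegressor(hidden_layer_sizes=(32,), max_iter=50, random_state=0)
regr.fit(X, y)

# Predictions come back flat and have to be reshaped into depth maps.
pred_maps = regr.predict(X[:2]).reshape(-1, height, width)
print(pred_maps.shape)  # (2, 8, 8)
```

Every output pixel is just another regression target here, so nothing in the model knows that neighbouring pixels belong together.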

Maybe there's a pre-neural-net technique I'm not aware of that can estimate depth from monocular images? Maybe scikit-learn (or scikit-image?) supports it. Let's do an arXiv search:

([Sample search](https://arxiv.org/search/advanced?advanced=&terms-0-operator=AND&terms-0-term=depth+estimation+single+image&terms-0-field=all&classification-physics_archives=all&classification-include_cross_list=include&date-year=&date-filter_by=date_range&date-from_date=2000&date-to_date=2014&date-date_type=submitted_date&abstracts=show&size=50&order=-announced_date_first), [1](https://arxiv.org/abs/1411.6387), [2](https://arxiv.org/abs/1011.5694), [3](https://arxiv.org/abs/1406.2283)). Only the second result (from 2010) is not based on deep neural networks. Before that, I could only find binocular (stereo) approaches.

**Ending part 2**: I've run out of assessment time, and because I had planned to do this task last, I have to stop at this point.

If I had to continue this task and had to use scikit-learn, I would probably subclass or adapt the [MLPRegressor class](https://github.com/scikit-learn/scikit-learn/blob/9e38cd00d032f777312e639477f1f52f3ea4b3b7/sklearn/neural_network/_multilayer_perceptron.py#L1257). Before doing so I would definitely want to know the (hopefully good) reasons for wanting to use scikit-learn for depth estimation.
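As a rough idea of what "subclass or adapt" could look like, here is a sketch of a thin subclass that accepts image-shaped arrays and returns depth-map-shaped predictions. The class name `DepthMapMLP` and the `rmse` helper are hypothetical names I made up for illustration, the data is random rather than NYU Depth V2, and I have not checked how this interacts with options that call `predict` internally (for example `early_stopping`), so those are left at their defaults.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


class DepthMapMLP(MLPRegressor):
    """Hypothetical MLPRegressor subclass that handles image-shaped arrays."""

    def fit(self, images, depths):
        images, depths = np.asarray(images), np.asarray(depths)
        # Remember the depth-map shape so predict() can restore it.
        self.depth_shape_ = depths.shape[1:]
        X = images.reshape(len(images), -1)
        y = depths.reshape(len(depths), -1)
        return super().fit(X, y)

    def predict(self, images):
        images = np.asarray(images)
        flat = super().predict(images.reshape(len(images), -1))
        return flat.reshape((len(images),) + tuple(self.depth_shape_))


def rmse(pred, target):
    """Root mean squared error over all pixels (hypothetical helper)."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Usage on random stand-in data (not real images or depth maps).
rng = np.random.default_rng(0)
images = rng.random((40, 8, 8, 3))
depths = rng.random((40, 8, 8))

model = DepthMapMLP(hidden_layer_sizes=(32,), max_iter=50, random_state=0)
model.fit(images[:30], depths[:30])  # a ConvergenceWarning is expected
pred = model.predict(images[30:])
print(pred.shape, rmse(pred, depths[30:]))
```

I avoid the inherited `score` method in this sketch, because it expects 2D targets and would need the same flattening treatment.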

## Quick MLPRegressor experiment to understand what it does

Quickly copying the MLPRegressor code from the documentation to see how it works.

In [28]:
from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
X, y = make_regression(n_samples=200, random_state=1)  # toy data, single target
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    random_state=1)
regr = MLPRegressor(random_state=1, max_iter=5000).fit(X_train, y_train)
regr.predict(X_test[:2])  # not shown: the notebook only echoes the last expression
regr.score(X_test, y_test)  # R^2 on the held-out split

0.520915744081988

In [29]:
X.shape, y.shape

((200, 100), (200,))

In [30]:
X[1], y[1]

(array([-1.04054030e+00, -2.19631333e-03,  1.41229617e+00,  1.89476506e+00,
        -9.54135902e-01,  1.57603932e+00,  3.14523396e-02,  9.82683848e-01,
        -1.69967967e-01,  1.39589955e+00,  1.87349238e-01, -1.59635149e-01,
         4.77698004e-01,  4.89547102e-01, -1.29925873e-01, -1.09395008e+00,
         1.44854348e+00, -6.04188418e-01,  3.85565248e-01,  4.40692643e-01,
        -1.03110525e-01,  1.33311252e-01, -2.00934679e+00,  1.08959712e+00,
        -1.04576192e-01,  2.36560147e+00, -1.26858978e+00,  1.89256941e+00,
         4.56092972e-01, -4.99648746e-01, -2.96346890e-01,  1.18391882e+00,
         1.23496869e+00,  3.36368160e-01,  9.43203325e-01,  6.98800372e-01,
         4.00153673e-01,  2.04898474e+00, -1.53165075e+00, -5.51358292e-01,
         5.30446938e-01, -1.57830129e-01,  6.95895745e-01,  6.88356472e-01,
         2.28370262e-01, -1.12131370e+00, -3.68854466e-01, -7.60631755e-01,
         2.49894343e+00, -7.15845967e-01, -7.23002502e-01,  7.69551013e-01,
         2.2

In [31]:
import numpy as np
arr = np.random.rand(10, 100)
regr.predict(arr)

array([105.72547787,  72.38709649,  88.81242861,  94.65223891,
        44.85262731,  71.90318558,  79.40550481,  32.31343773,
       121.14568286,  93.75822279])
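To see a bit more of what the fitted model looks like inside, the same toy setup can be refit and inspected through its public attributes. This is only a sketch for poking around; it relies on the defaults (100 input features from `make_regression` and one hidden layer of 100 units).

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Same toy setup as in the cells above.
X, y = make_regression(n_samples=200, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
regr = MLPRegressor(random_state=1, max_iter=5000).fit(X_train, y_train)

# One weight matrix per layer transition: input -> hidden -> output.
layer_shapes = [w.shape for w in regr.coefs_]
print(layer_shapes)          # [(100, 100), (100, 1)]
print(regr.n_layers_)        # 3 (input, hidden and output layer)
print(regr.out_activation_)  # identity, i.e. unbounded continuous outputs

# With a single target, predict() returns one value per sample.
print(regr.predict(X_test[:2]).shape)  # (2,)
```

So out of the box this is a plain fully connected network with one hidden layer and a linear output.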