## Home assignment 05: Bagging and OOB score

Fill in the missing lines in the code below.
This is a simplified version of `BaggingRegressor` from `sklearn`. Note that the `sklearn` API is **not preserved**.

Your algorithm should train separate instances of the same model class on bootstrapped datasets and report the [OOB score](https://en.wikipedia.org/wiki/Out-of-bag_error) for the training set.
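To make the terms concrete, here is a minimal sketch of one bootstrap draw and its out-of-bag (OOB) samples, using only the standard library. The helper names `bootstrap_indices` and `oob_indices` are hypothetical and are not part of the assignment's API; the sketch assumes each bag draws as many indices as there are training samples.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices uniformly at random, with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def oob_indices(n_samples, in_bag):
    """Return the indices that were never drawn into the bag."""
    drawn = set(in_bag)
    return [i for i in range(n_samples) if i not in drawn]


rng = random.Random(0)  # fixed seed so the run is reproducible
n_samples = 2000

bag = bootstrap_indices(n_samples, rng)
oob = oob_indices(n_samples, bag)
oob_fraction = len(oob) / n_samples

# For large n_samples this fraction is expected to be close to 1/e.
print(f"OOB fraction: {oob_fraction:.3f}")
```

A model trained on `bag` has never seen the rows in `oob`, so its predictions on those rows can serve as a validation signal without a separate hold-out set.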

Pass the model as the class itself (for example, `LinearRegression`, not `LinearRegression()`), with no explicit parameters.

Example:
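```
import numpy as np
from sklearn.linear_model import LinearRegression
from bagging import SimplifiedBaggingRegressor

# Toy data: the target is a linear function of the features
X = np.random.randn(2000, 10)
y = np.mean(X, axis=1)

bagging_regressor = SimplifiedBaggingRegressor(num_bags=10, oob=True)
bagging_regressor.fit(LinearRegression, X, y)  # pass the class itself, not an instance
```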
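Passing the class rather than an instance lets the ensemble build a fresh model for every bag. The sketch below illustrates that pattern under stated assumptions: `MeanRegressor` and `fit_bags` are hypothetical names invented for this illustration, and the toy model only stands in for something like `LinearRegression`. It is not the reference solution.

```python
import random


class MeanRegressor:
    """Hypothetical toy model: always predicts the mean of its training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def fit_bags(model_constructor, X, y, num_bags, seed=0):
    """Train one fresh model per bootstrap sample (illustrative helper)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(num_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        model = model_constructor()  # the class is called here, once per bag
        model.fit([X[i] for i in idx], [y[i] for i in idx])
        models.append(model)
    return models


X = [[float(i)] for i in range(50)]
y = [2.0 * row[0] for row in X]
models = fit_bags(MeanRegressor, X, y, num_bags=5)  # no parentheses after the class
print(len(models), "independent models trained")
```

If an instance were passed instead, every bag would refit the same object and the ensemble would collapse to a single model.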

In [2]:
import numpy as np

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
try:
    del SimplifiedBaggingRegressor
except NameError:  # the class has not been imported yet
    pass
from bagging import SimplifiedBaggingRegressor

### Local tests:

In [5]:
from sklearn.linear_model import LinearRegression
from tqdm.auto import tqdm


#### Simple tests:

In [200]:
for _ in tqdm(range(100)):
    X = np.random.randn(2000, 10)
    y = np.mean(X, axis=1)
    bagging_regressor = SimplifiedBaggingRegressor(num_bags=10, oob=True)
    bagging_regressor.fit(LinearRegression, X, y)
    predictions = bagging_regressor.predict(X)
    assert np.mean((predictions - y) ** 2) < 1e-6, 'Linear dependency should be fitted with almost zero error!'
    assert bagging_regressor.oob, 'OOB feature must be turned on'
    oob_score = bagging_regressor.OOB_score()
    assert oob_score < 1e-6, 'OOB error for linear dependency should be also close to zero!'
    assert abs(
        np.mean(
            list(map(len, bagging_regressor.list_of_predictions_lists))
        ) / bagging_regressor.num_bags - 1 / np.exp(1)
    ) < 0.1, 'Probability of missing a bag should be close to theoretical value!'
print('Simple tests done!')
        

100%|██████████| 100/100 [00:47<00:00,  2.09it/s]

Simple tests done!
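The last assertion compares against $1/e$. A short derivation of that value, assuming each bag draws $n$ indices uniformly with replacement from $n$ training samples, and assuming (inferred from the test, since the implementation is not shown here) that each entry of `list_of_predictions_lists` holds one prediction per bag that excluded the corresponding sample:

```latex
P(i \notin \text{bag}) = \left(1 - \frac{1}{n}\right)^{n}
\xrightarrow[n \to \infty]{} e^{-1} \approx 0.368,
\qquad
\mathbb{E}\big[\#\{\text{bags without } i\}\big]
= \texttt{num\_bags} \cdot \left(1 - \frac{1}{n}\right)^{n}
```

Dividing the average list length by `num_bags` therefore estimates the probability that a sample is left out of a bag, which for $n = 2000$ is already very close to $1/e$.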





#### Medium tests:

In [201]:
for _ in tqdm(range(10)):
    X = np.random.randn(200, 150)
    y = np.random.randn(len(X))
    bagging_regressor = SimplifiedBaggingRegressor(num_bags=20, oob=True)
    bagging_regressor.fit(LinearRegression, X, y)
    predictions = bagging_regressor.predict(X)
    average_train_error = np.mean((predictions - y) ** 2)
    assert bagging_regressor.oob, 'OOB feature must be turned on'
    oob_score = bagging_regressor.OOB_score()
    assert oob_score > average_train_error, 'OOB error must be higher than train error due to overfitting!'
    assert abs(
        np.mean(
            list(map(len, bagging_regressor.list_of_predictions_lists))
        ) / bagging_regressor.num_bags - 1 / np.exp(1)
    ) < 0.1, 'Probability of missing a bag should be close to theoretical value!'
print('Medium test done!')

100%|██████████| 10/10 [00:02<00:00,  4.02it/s]

Medium test done!
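One way to see why overfitting is expected in the medium test: a bootstrap sample of 200 rows usually contains fewer distinct rows than the 150 features, so a least-squares fit can typically reproduce its in-bag targets exactly while carrying no information about the random OOB targets. The sketch below only counts distinct rows in one draw; the seed is arbitrary.

```python
import random

rng = random.Random(0)
n_samples, n_features = 200, 150  # shapes used in the medium test

# One bootstrap draw: n_samples row indices sampled with replacement.
bag = [rng.randrange(n_samples) for _ in range(n_samples)]
n_distinct = len(set(bag))

# On average about 63% of the rows are distinct, i.e. roughly 126 of 200.
print(f"distinct rows in the bag: {n_distinct} (features: {n_features})")
```

With fewer distinct rows than features, each per-bag linear system is underdetermined, which is consistent with the assertion that the OOB error exceeds the training error.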


#### Complex tests:

In [202]:
for _ in tqdm(range(10)):
    X = np.random.randn(2000, 15)
    y = np.random.randn(len(X))
    bagging_regressor = SimplifiedBaggingRegressor(num_bags=100, oob=True)
    bagging_regressor.fit(LinearRegression, X, y)
    predictions = bagging_regressor.predict(X)
    oob_score = bagging_regressor.OOB_score()
    assert abs(
        np.mean(
            list(map(len, bagging_regressor.list_of_predictions_lists))
        ) / bagging_regressor.num_bags - 1 / np.exp(1)
    ) < 1e-2, 'Probability of missing a bag should be close to theoretical value!'

print('Complex tests done!')

100%|██████████| 10/10 [00:44<00:00,  4.42s/it]

Complex tests done!
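For orientation, here is a hedged sketch of how an OOB mean squared error could be computed. It assumes the score averages, per training sample, the predictions of the models that did not see that sample, and it skips samples that were never out of bag. The function `oob_mse` and the toy "predict the in-bag mean" model are hypothetical and are not the assignment's required implementation.

```python
import random


def oob_mse(y, num_bags, seed=0):
    """Illustrative OOB error with a toy model that predicts the in-bag target mean."""
    rng = random.Random(seed)
    n = len(y)
    # One list per training sample; it collects predictions from the models
    # whose bag did not contain that sample.
    predictions_per_sample = [[] for _ in range(n)]
    for _ in range(num_bags):
        bag = [rng.randrange(n) for _ in range(n)]
        in_bag = set(bag)
        bag_mean = sum(y[i] for i in bag) / n  # toy "model" fitted on the bag
        for i in range(n):
            if i not in in_bag:
                predictions_per_sample[i].append(bag_mean)
    # Average the OOB predictions per sample; ignore samples with no OOB prediction.
    squared_errors = [
        (sum(preds) / len(preds) - y[i]) ** 2
        for i, preds in enumerate(predictions_per_sample)
        if preds
    ]
    return sum(squared_errors) / len(squared_errors), predictions_per_sample


y = [1.0] * 100  # constant target, so the toy model is exact
score, predictions_per_sample = oob_mse(y, num_bags=50)
mean_oob_share = sum(map(len, predictions_per_sample)) / len(y) / 50
print(f"OOB MSE: {score:.3g}, mean OOB share per sample: {mean_oob_share:.3f}")
```

The second returned value mirrors the quantity the tests inspect: the average list length divided by the number of bags should be close to $1/e$.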



