## Train a `RandomForestRegressor`

You have been hired by a food delivery startup to optimize the way that jobs are assigned to drivers. They want you to develop a predictive model that estimates how long a restaurant is likely to take to prepare an order.

You have been provided with a dataset from previous orders. Each sample includes the following columns, with values given as of the time that the order is placed:

- `id` number of the order
- `cost` of the order in dollars, excluding delivery fees and tip
- `average_cost` of an order at that restaurant, over the previous week
- `average_time_wk` is the average time in minutes taken to prepare an order at that restaurant, over the previous week
- `average_time_hr` is the average time in minutes taken to prepare an order at that restaurant, over the previous hour
- `has_drive_thru` (1 or 0, indicating whether or not the restaurant has a drive-through order window)
- `unfulfilled_orders` is the number of orders that have been placed at that restaurant, but not yet prepared (excluding this one)
- `time` is the actual time in minutes it took to prepare the order
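Your solution may be run on data other than the provided file. To exercise it on something besides `data.csv`, you can use a small generator like the one below. It builds a frame with exactly the columns listed above. The function name `make_synthetic_orders`, the value ranges, and the formula for `time` are all hypothetical and invented for illustration. They are not taken from the real dataset.

```python
import numpy as np
import pandas as pd


def make_synthetic_orders(n_orders: int = 200, seed: int = 0) -> pd.DataFrame:
    """Return a hypothetical orders table with the columns listed above.

    Values are random and only loosely shaped like the real data; the
    formula for `time` is invented purely so there is something to predict.
    """
    rng = np.random.default_rng(seed)
    cost = rng.uniform(5.0, 60.0, n_orders)
    average_cost = rng.uniform(10.0, 40.0, n_orders)
    average_time_wk = rng.uniform(20.0, 60.0, n_orders)
    # Recent-hour average drifts around the weekly average
    average_time_hr = average_time_wk + rng.normal(10.0, 5.0, n_orders)
    has_drive_thru = rng.integers(0, 2, n_orders)  # 0 or 1
    unfulfilled_orders = rng.poisson(5.0, n_orders)
    # Invented relationship plus noise (not derived from any real data)
    time = (0.6 * average_time_hr + 0.2 * cost + 1.5 * unfulfilled_orders
            - 4.0 * has_drive_thru + rng.normal(0.0, 3.0, n_orders))
    orders = pd.DataFrame({
        'id': np.arange(1, n_orders + 1),
        'cost': cost,
        'average_cost': average_cost,
        'average_time_wk': average_time_wk,
        'average_time_hr': average_time_hr,
        'has_drive_thru': has_drive_thru,
        'unfulfilled_orders': unfulfilled_orders,
        'time': time,
    })
    # Match the notebook, which uses `id` as the index rather than a feature
    return orders.set_index('id')


orders = make_synthetic_orders(200, seed=0)
print(orders.head())
```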

In the attached workspace, you will read this data from a file and split it into training and test sets. Then you will fit a `RandomForestRegressor` on the training set, using the `sklearn` implementation (see its documentation [here](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)). Finally, you will evaluate how well it predicts `time` on the test set.
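Both `train_test_split` and `RandomForestRegressor` use randomness. The split shuffles rows, and the forest bootstraps samples and picks candidate features. Passing a fixed `random_state` makes each step repeatable. The minimal check below uses hypothetical synthetic data, not the order dataset. It uses the value 29 that the assignment specifies and confirms that two runs with the same seed give the same split and the same predictions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

random_state = 29

# Small synthetic regression problem (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Two splits with the same random_state select exactly the same rows
Xtr_a, Xts_a, ytr_a, yts_a = train_test_split(X, y, test_size=0.2, random_state=random_state)
Xtr_b, Xts_b, ytr_b, yts_b = train_test_split(X, y, test_size=0.2, random_state=random_state)
same_split = np.array_equal(Xts_a, Xts_b)

# Two forests with the same random_state, fit on the same rows, predict identically
pred_a = RandomForestRegressor(n_estimators=10, random_state=random_state).fit(Xtr_a, ytr_a).predict(Xts_a)
pred_b = RandomForestRegressor(n_estimators=10, random_state=random_state).fit(Xtr_b, ytr_b).predict(Xts_b)
same_predictions = np.array_equal(pred_a, pred_b)

print(same_split, same_predictions)
```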

You'll need to specify this random state in your notebook:

> random_state = 29

The following items will be graded:

| Name | Type | Description |
| ---- | ---- | ---- |
| `Xtr` | pandas DataFrame | Training data: features. |
| `Xts` | pandas DataFrame | Test data: features. |
| `ytr` | pandas Series OR pandas DataFrame OR 1d numpy array | Training data: target variable. |
| `yts` | pandas Series OR pandas DataFrame OR 1d numpy array | Test data: target variable. |
| `yts_hat` | 1d numpy array | Model prediction for test data. |
| `rsq` | float | R2 of model on test data. |


```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
```

In this question, we will try to predict the time it will take for an order to be prepared.

First, we'll load the dataset:

```python
df = pd.read_csv(
    'data.csv',
    names=['id', 'cost', 'average_cost', 'average_time_wk', 'average_time_hr',
           'has_drive_thru', 'unfulfilled_orders', 'time'],
    header=None,
    index_col='id',
)
```
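The `read_csv` call combines three arguments. `header=None` says the file has no header row, so the first line is treated as data. `names=` supplies the column labels. `index_col='id'` turns the `id` column into the row index, which keeps it out of the features. The sketch below reads two hypothetical header-less rows from an in-memory string in place of `data.csv`. It assumes the real file follows the same column order, as the original call implies.

```python
import io
import pandas as pd

# Two hypothetical header-less rows in the column order the notebook assumes
raw = io.StringIO(
    "1,9.5,12.9,24.8,28.1,1,2,21.9\n"
    "2,58.1,38.4,62.9,104.0,0,4,85.3\n"
)
columns = ['id', 'cost', 'average_cost', 'average_time_wk', 'average_time_hr',
           'has_drive_thru', 'unfulfilled_orders', 'time']

# header=None: no header row in the file; names=...: labels to use instead;
# index_col='id': use the id column as the index rather than as a feature
df = pd.read_csv(raw, names=columns, header=None, index_col='id')

print(df.index.tolist())    # the ids become the index
print(df.columns.tolist())  # the remaining seven columns
```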

You can add some code here to inspect the data, see the names of the features, and check the data types. The cell below will not be graded.

```python
df.head()
```

| id | cost | average_cost | average_time_wk | average_time_hr | has_drive_thru | unfulfilled_orders | time |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| 1375.0 | 9.595529 | 12.929266 | 24.808019 | 28.140299 | 1.0 | 2.581362 | 21.88515 |
| 1376.0 | 58.069832 | 38.395697 | 62.905777 | 103.955612 | 0.0 | 4.882039 | 85.343048 |
| 1377.0 | 16.364309 | 16.678048 | 36.988889 | 54.138861 | 1.0 | 5.846501 | 42.501774 |
| 1378.0 | 10.411111 | 20.756315 | 36.53133 | 59.250801 | 1.0 | 5.673834 | 51.07425 |
| 1379.0 | 6.501663 | 21.79252 | 41.203542 | 56.736735 | 1.0 | 5.142906 | 50.419929 |
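`head()` shows only a few rows. A few other standard pandas calls also help with inspection: `shape` gives the row and column counts, `dtypes` gives the type of each column, `isna().sum()` counts missing values, and `describe()` gives summary statistics. To stay self-contained, this sketch rebuilds a tiny frame from the first three rows of the preview above. In the notebook you would call the same methods on `df` directly.

```python
import pandas as pd

# First three rows of the preview above, rebuilt by hand for a standalone example
df = pd.DataFrame(
    {
        'cost': [9.595529, 58.069832, 16.364309],
        'average_cost': [12.929266, 38.395697, 16.678048],
        'average_time_wk': [24.808019, 62.905777, 36.988889],
        'average_time_hr': [28.140299, 103.955612, 54.138861],
        'has_drive_thru': [1.0, 0.0, 1.0],
        'unfulfilled_orders': [2.581362, 4.882039, 5.846501],
        'time': [21.88515, 85.343048, 42.501774],
    },
    index=pd.Index([1375, 1376, 1377], name='id'),
)

print(df.shape)         # (rows, columns)
print(df.dtypes)        # data type of each column
print(df.isna().sum())  # number of missing values per column
print(df.describe())    # count, mean, std, min, quartiles, max
```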


Note, however, that your code will be evaluated on *different* data organized in a data frame with the same columns. Your solution should therefore not hard-code anything specific to this data.
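One way to avoid depending on the exact feature list is to treat every column except the target as a feature. The helper below is hypothetical. It assumes, as the assignment states, that the evaluation data has the same columns and that `time` is the only non-feature column. Selecting the target with a single label (not a one-element list) also returns a 1d Series.

```python
import pandas as pd

TARGET = 'time'  # target column named in the assignment


def split_features_target(df: pd.DataFrame, target: str = TARGET):
    """Hypothetical helper: separate features from the target by exclusion.

    Assumes every column other than `target` is a feature; adjust this if
    the data ever carries extra non-feature columns.
    """
    X = df.drop(columns=target)
    y = df[target]  # single label -> 1d Series, not a one-column DataFrame
    return X, y


# Hypothetical two-row frame with the documented columns
demo = pd.DataFrame(
    {
        'cost': [10.0, 20.0], 'average_cost': [12.0, 18.0],
        'average_time_wk': [30.0, 40.0], 'average_time_hr': [32.0, 45.0],
        'has_drive_thru': [1, 0], 'unfulfilled_orders': [3, 5],
        'time': [25.0, 50.0],
    },
    index=pd.Index([1, 2], name='id'),
)
X, y = split_features_target(demo)
print(X.columns.tolist(), y.ndim)
```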

Now we will split the data into training and test sets using `train_test_split`.

* Reserve 20% of the data for testing.
* Use the random state specified on the question page.

The following cell should create:

* `Xtr` and `Xts` as pandas data frames containing only the features,
* and `ytr` and `yts` as either pandas Series or 1d numpy arrays containing the target variable.
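With `test_size=0.2`, `train_test_split` reserves 20% of the rows for testing. When given pandas inputs, it returns pandas objects whose feature rows and target rows stay aligned on the same index. The sketch below checks both points on a hypothetical 50-row frame of random values with the documented columns.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 50-row frame with the documented columns (random values)
rng = np.random.default_rng(1)
cols = ['cost', 'average_cost', 'average_time_wk', 'average_time_hr',
        'has_drive_thru', 'unfulfilled_orders', 'time']
df = pd.DataFrame(rng.uniform(0, 100, size=(50, len(cols))), columns=cols,
                  index=pd.Index(np.arange(1, 51), name='id'))

X = df.drop(columns='time')
y = df['time']

# test_size=0.2 holds out 20% of rows; random_state fixes the shuffle
Xtr, Xts, ytr, yts = train_test_split(X, y, test_size=0.2, random_state=29)

print(len(Xtr), len(Xts))         # 40 10
print(Xts.index.equals(yts.index))  # features and target rows stay aligned
```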

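```python
#grade (write your code in this cell and DO NOT DELETE THIS LINE)
features = ['cost', 'average_cost', 'average_time_wk', 'average_time_hr', 'has_drive_thru', 'unfulfilled_orders']
# A single column label (not a list) makes y a 1d Series, which avoids
# sklearn's DataConversionWarning about a column-vector y when fitting
target = 'time'
X = df[features]
y = df[target]
random_state = 29
# Hold out 20% of the rows for testing
Xtr, Xts, ytr, yts = train_test_split(X, y, test_size=0.2, random_state=random_state)
```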

Now we are ready to fit the `RandomForestRegressor`. Using 

* the random state specified in the question page
* and setting the number of trees in the forest to 10
* and the default settings otherwise, 

fit the model on the training data. Then, use it to make predictions for the test samples, and save this prediction in `yts_hat`. Evaluate the R2 score of the model on the test data, and save this in `rsq`.
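The R2 score computed by `r2_score` is the coefficient of determination, R² = 1 - SS_res / SS_tot. Here SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below follows the same fit, predict, and score steps on hypothetical synthetic data, then checks the formula against `r2_score`. The variable names mirror the graded ones, but the data is invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical regression data standing in for the order features and prep times
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(200, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

Xtr, Xts, ytr, yts = train_test_split(X, y, test_size=0.2, random_state=29)

# 10 trees, fixed seed, all other hyperparameters left at their defaults
model = RandomForestRegressor(n_estimators=10, random_state=29)
model.fit(Xtr, ytr)
yts_hat = model.predict(Xts)  # 1d array, one prediction per test row

# R^2 = 1 - SS_res / SS_tot, computed by hand and via sklearn
ss_res = np.sum((yts - yts_hat) ** 2)
ss_tot = np.sum((yts - yts.mean()) ** 2)
rsq_manual = 1.0 - ss_res / ss_tot
rsq = r2_score(yts, yts_hat)
print(rsq, rsq_manual)
```

An R2 of 1 means perfect predictions, and 0 means doing no better than always predicting the test-set mean. It can be negative for a model that does worse than that.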

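```python
#grade (write your code in this cell and DO NOT DELETE THIS LINE)
# 10 trees and the specified random state; all other settings are defaults
model = RandomForestRegressor(n_estimators=10, random_state=random_state)
model.fit(Xtr, ytr)
# Predict prep times for the held-out orders, then score them with R2
yts_hat = model.predict(Xts)
rsq = r2_score(yts, yts_hat)
```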

