# DNNR tutorial with a comparison to KNN and CatBoost
DNNR works by estimating the gradient of the target function at each neighbor. Instead of averaging the labels of the neighbors, as KNN does, it averages the n-th order Taylor approximations of the target function around the neighbors. For the first-order case, the DNNR prediction at a query point $x$ is

$$ \eta_{\text{DNNR}}(x) = \frac{1}{k} \sum_{X_m \in B_{x, \#k}} \left( Y_m + \hat \gamma_m (x - X_m) \right), $$

where $B_{x, \#k}$ is the set of the $k$ nearest neighbors of $x$, $Y_m$ is the label of neighbor $X_m$, and $\hat \gamma_m$ is the gradient estimated at $X_m$. The $Y_m$ term reproduces the averaging of targets that KNN does, while the $\hat \gamma_m (x - X_m)$ term accounts for how the function changes between $X_m$ and $x$.
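The first-order prediction rule above can be sketched in a few lines of NumPy. This is only an illustration, not the `dnnr` package's implementation: it assumes the gradient at each neighbor is estimated by ordinary least squares on the differences to that neighbor's own nearest points, and the names `dnnr_predict_sketch`, `k`, and `k_prime` are hypothetical.

```python
import numpy as np


def dnnr_predict_sketch(X_train, y_train, x, k=3, k_prime=8):
    """First-order DNNR-style prediction for one query point x (illustrative only)."""
    # The k nearest training points to the query.
    dists = np.linalg.norm(X_train - x, axis=1)
    neighbors = np.argsort(dists)[:k]

    estimates = []
    for m in neighbors:
        # The k_prime points closest to X_m (index 0 is X_m itself, so skip it).
        d_m = np.linalg.norm(X_train - X_train[m], axis=1)
        around = np.argsort(d_m)[1:k_prime + 1]
        # Assumption: estimate the gradient at X_m by least squares on differences.
        dX = X_train[around] - X_train[m]
        dy = y_train[around] - y_train[m]
        gamma, *_ = np.linalg.lstsq(dX, dy, rcond=None)
        # First-order Taylor estimate of the target at x, expanded around X_m.
        estimates.append(y_train[m] + gamma @ (x - X_train[m]))
    return float(np.mean(estimates))


# Toy check on an exactly linear target, where the first-order expansion is exact.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(200, 2))
y_demo = 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + 1.0
x_query = np.array([0.5, 0.5])

dnnr_pred = dnnr_predict_sketch(X_demo, y_demo, x_query)
nearest = np.argsort(np.linalg.norm(X_demo - x_query, axis=1))[:3]
knn_pred = float(y_demo[nearest].mean())  # plain KNN: average of neighbor labels

print("true value:", 0.5)
print("DNNR sketch:", dnnr_pred)
print("KNN average:", knn_pred)
```

On a linear target the gradient correction removes the error that plain label averaging leaves behind. On real data the gradient estimates are noisy, so the gain depends on the data and on the neighborhood sizes.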


In [None]:
! python -m pip install dnnr

In [None]:
from sklearn.datasets import fetch_california_housing, make_friedman1
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error


from dnnr import DNNR

## Data fetching and preprocessing


In [None]:
# Friedman1 is a simple synthetic dataset. See: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman1.html
dataset = 'friedman1'
# Uncomment the following line to use the California housing dataset (https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html).
# dataset = 'california_housing'

if dataset == 'friedman1':
    X, y = make_friedman1(n_samples=20000)

elif dataset == 'california_housing':
    # fetch_california_housing was already imported at the top of the notebook.
    cali = fetch_california_housing()
    y = cali.target
    X = cali.data


Scaling the data is critical for good performance, because nearest-neighbor methods rely on distances between points. Here, we use `sklearn.preprocessing.StandardScaler` to give every feature mean 0 and standard deviation 1.
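To see why scaling matters for a neighbor-based method, consider a small hypothetical example (the toy values below are not from the datasets in this tutorial). One feature is measured in units of about 1 and the other in units of about 1000, so the unscaled Euclidean distance is dominated by the second feature. The manual standardization here uses the same formula as `StandardScaler`: subtract the per-feature mean and divide by the per-feature standard deviation.

```python
import numpy as np

# Hypothetical toy data: feature 0 has a small scale, feature 1 a large one.
X_toy = np.array([
    [0.0, 1000.0],   # point 0: identical to the query in feature 0
    [3.0, 1050.0],   # point 1: far away in feature 0, close in feature 1
    [0.1, 1400.0],   # point 2
])
query = np.array([0.0, 1100.0])

# Nearest neighbor on the raw features.
raw_nearest = int(np.argmin(np.linalg.norm(X_toy - query, axis=1)))

# Standardize each feature to mean 0 and standard deviation 1.
mean = X_toy.mean(axis=0)
std = X_toy.std(axis=0)
X_scaled = (X_toy - mean) / std
query_scaled = (query - mean) / std

# Nearest neighbor after scaling.
scaled_nearest = int(np.argmin(np.linalg.norm(X_scaled - query_scaled, axis=1)))

print("nearest neighbor (raw):   ", raw_nearest)
print("nearest neighbor (scaled):", scaled_nearest)
```

On the raw data the large-scale feature decides which point counts as nearest. After scaling, both features contribute comparably and the selected neighbor changes.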

In [None]:
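# Split first so the scaler is fitted on training data only (no test-set leakage).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2022
)

scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)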

## DNNR: fitting and evaluation

In [None]:
model = DNNR(n_derivative_neighbors=32)
model.fit(X_train, y_train)


In [None]:
print("Evaluating DNNR Model")
mse_error = mean_squared_error(y_test, model.predict(X_test))
print("MSE={error}".format(error=mse_error))


## Comparing to a KNN model

In [None]:
from sklearn.neighbors import KNeighborsRegressor

print("Evaluating KNN Regression")
knn_model = KNeighborsRegressor(n_neighbors=5)
knn_model.fit(X_train, y_train)
knn_mse_error = mean_squared_error(y_test, knn_model.predict(X_test))
print("MSE={error}".format(error=knn_mse_error))


## Comparing to [CatBoost](https://catboost.ai/)


In [None]:
try:
    import catboost
except ImportError:
    ! pip install catboost

from catboost import CatBoostRegressor

print("Evaluating CatBoost")

# Use a distinct name so the model does not shadow the imported `catboost` module.
catboost_model = CatBoostRegressor(verbose=False)
catboost_model.fit(X_train, y_train)
catboost_mse_error = mean_squared_error(y_test, catboost_model.predict(X_test))
print("MSE={error}".format(error=catboost_mse_error))

In [None]:
print("Method    |  MSE")
print("DNNR      |  {error}".format(error=mse_error))
print("KNN Reg   |  {error}".format(error=knn_mse_error))
print("CatBoost  |  {error}".format(error=catboost_mse_error))