# Predictive Models: Train / Evaluation Entrypoint

### Plugins: 
1. Load Dataset
2. Transform variables / create features
3. Split dataset into train/test
4. Fit models
5. Evaluate models (on train and test)

### Example usage:
- Wine quality dataset: quality scores range from 3 to 8 (mean 5.6)
- 11 independent variables, all numeric (real-valued)
- Transformations implemented: normalize each variable (subtract the mean, divide by the standard deviation)
- Models trained: linear regression, decision tree regressor, KNN regressor
- Evaluation metric: Mean Squared Error (MSE)
- Conclusion: KNN has the lowest test MSE given these hyperparameters and transformations
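
Since every model in this example is compared on MSE, a minimal pure-Python sketch of the metric may help when reading the scores. The function name `mean_squared_error` and the sample values are illustrative assumptions; this is not the code behind `plugins.evaluate_model`.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up quality scores and predictions, for illustration only.
y_true = [5, 6, 5, 7]
y_pred = [5.5, 5.5, 5.0, 6.0]
print(mean_squared_error(y_true, y_pred))  # 0.375
```

MSE is expressed in squared units of the quality score, so taking the square root gives an error on the original scale. For example, an MSE of 0.3434 corresponds to a root-mean-squared error of about 0.59 quality points.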

### [Set Up] Import modules

In [1]:
from urllib.request import urlretrieve
import predictive_model_plugins.plugins as plugins

### [Set Up] Download example data (Wine quality scores)

In [3]:
wine_data_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
save_path = 'wine_data.csv'
urlretrieve(wine_data_url, save_path)

# Convert semicolons to commas to make CSV
with open(save_path, 'r') as file:
    data = file.read().replace(';', ',')

with open(save_path, 'w') as file:
    file.write(data)
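
The plain string replacement above works as long as no field contains a comma or a semicolon of its own. A more defensive variant, sketched here with the standard `csv` module, parses each row with `;` as the delimiter and writes it back with `,`. The helper name `semicolon_to_comma` and the in-memory sample are assumptions for illustration only.

```python
import csv
import io

def semicolon_to_comma(src, dst):
    """Copy delimited rows from src to dst, changing the delimiter from ';' to ','."""
    reader = csv.reader(src, delimiter=';')
    writer = csv.writer(dst, delimiter=',', lineterminator='\n')
    for row in reader:
        # The writer quotes a field only if it contains a comma, quote, or newline.
        writer.writerow(row)

# Small in-memory sample with a quoted, semicolon-delimited header (values are illustrative).
sample = io.StringIO('"fixed acidity";"pH";"quality"\n7.4;3.51;5\n')
converted = io.StringIO()
semicolon_to_comma(sample, converted)
print(converted.getvalue())
```

The same function accepts real file objects opened with `newline=''`, which is the mode the `csv` module documentation recommends.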

## Dioptra Entrypoint Example Usage

### 1. Set Parameters
Anything in all caps is a parameter or an artifact.

In [42]:
# Entrypoint parameters

DATA_PATH = save_path
FILETYPE = 'csv'

DEP_VAR = 'quality'
INDEP_VARS = ['fixed acidity', 
              'volatile acidity', 
              'citric acid', 
              'residual sugar', 
              'chlorides', 
              'free sulfur dioxide', 
              'total sulfur dioxide',
              'density',
              'pH', 
              'sulphates', 
              'alcohol']

TRANSFORMATIONS = [('normalize', [INDEP_VARS], {'drop_original': True})]  # each entry: (function, positional args, kwargs)

INDEP_VARS_TRANSFORMED = [var + '_norm' for var in INDEP_VARS]

MODEL1_TYPE = 'LinearRegressor'
MODEL1_HYPERPARAMETERS = {}

MODEL2_TYPE = 'DecisionTreeRegressor'
MODEL2_HYPERPARAMETERS = {'criterion': 'squared_error',
                          'min_samples_split': 10,
                          # 'max_depth': 1000,
                          }

MODEL3_TYPE = 'KNNRegressor'
MODEL3_HYPERPARAMETERS = {'n_neighbors':10, 'weights':'distance'}

EVALUATION_METRICS = ['MSE']
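
Each `TRANSFORMATIONS` entry is a `(function, positional args, kwargs)` triple, which is why `INDEP_VARS` is wrapped in an extra list: the whole list of column names is a single positional argument. The sketch below shows one way such a spec could be dispatched. The `REGISTRY` dict, the `apply_transformations` helper, and the dict-of-lists table are hypothetical stand-ins, not the internals of `plugins.create_features`.

```python
from statistics import mean, pstdev

def normalize(table, columns, drop_original=False):
    """Add a '<col>_norm' z-score column for each listed column (hypothetical helper)."""
    out = dict(table)
    for col in columns:
        mu, sigma = mean(table[col]), pstdev(table[col])
        if sigma == 0:
            raise ValueError(f"column {col!r} is constant and cannot be normalized")
        out[col + '_norm'] = [(x - mu) / sigma for x in table[col]]
        if drop_original:
            del out[col]
    return out

# Hypothetical registry mapping transformation names to functions.
REGISTRY = {'normalize': normalize}

def apply_transformations(table, transformations):
    """Apply each (name, positional args, kwargs) spec in order."""
    for name, args, kwargs in transformations:
        table = REGISTRY[name](table, *args, **kwargs)
    return table

table = {'pH': [3.0, 3.2, 3.4], 'quality': [5, 6, 7]}  # made-up rows
spec = [('normalize', [['pH']], {'drop_original': True})]
result = apply_transformations(table, spec)
print(sorted(result))  # ['pH_norm', 'quality']
```

With `drop_original` set, only the `_norm` columns remain, which matches how `INDEP_VARS_TRANSFORMED` is built by appending `'_norm'` to each original name.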


## Run Entrypoint / Task Graph

### 1. Load Dataset

In [43]:
DF = plugins.load_dataset(DATA_PATH, FILETYPE)
#DF.head()

### 2. Transform columns / create features

In [44]:
DF = plugins.create_features(DF, TRANSFORMATIONS)
#DF.head()

### 3. Split dataset into train/test

In [45]:
SPLITS = plugins.make_data_splits(DF)
DF_TRAIN, DF_TEST = SPLITS['train']['df'], SPLITS['test']['df']
#DF_TRAIN.head()
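
The call above returns a nested dict with a `'df'` entry under `'train'` and `'test'`. How `make_data_splits` chooses the rows is not shown here, so the following is only a sketch of a seeded random split that produces the same return shape. The function name `sketch_data_splits`, the 20% test fraction, and the seed are assumptions.

```python
import random

def sketch_data_splits(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return {'train': {'df': ...}, 'test': {'df': ...}}."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # private RNG, so global random state is untouched
    n_test = max(1, round(len(shuffled) * test_fraction))
    return {
        'train': {'df': shuffled[n_test:]},
        'test': {'df': shuffled[:n_test]},
    }

rows = list(range(10))  # stand-in for DataFrame rows
splits = sketch_data_splits(rows)
train, test = splits['train']['df'], splits['test']['df']
print(len(train), len(test))  # 8 2
```

Fixing the seed keeps the train and test sets identical across runs, which matters when comparing several models on the same split.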

### 4. Train predictive models

In [46]:
MODEL_1 = plugins.train_predictive_model(MODEL1_TYPE, DF_TRAIN, DEP_VAR, INDEP_VARS_TRANSFORMED, MODEL1_HYPERPARAMETERS)
MODEL_2 = plugins.train_predictive_model(MODEL2_TYPE, DF_TRAIN, DEP_VAR, INDEP_VARS_TRANSFORMED, MODEL2_HYPERPARAMETERS)
MODEL_3 = plugins.train_predictive_model(MODEL3_TYPE, DF_TRAIN, DEP_VAR, INDEP_VARS_TRANSFORMED, MODEL3_HYPERPARAMETERS)

#MODEL_2.hyperparameters

### 5. Evaluate models (using MSE)

In [47]:
for model in [MODEL_1, MODEL_2, MODEL_3]:
    print('\n', type(model).__name__, ":", '\n', '-'*40, sep='')
    for dataset_name, dataset in {'train':DF_TRAIN, 'test':DF_TEST}.items():
        for metric in EVALUATION_METRICS:
            out = plugins.evaluate_model(model, dataset, metric, DEP_VAR, INDEP_VARS_TRANSFORMED)[metric]
            print(f"{dataset_name} {metric}: {round(out, 4)}")


LinearRegressor:
----------------------------------------
train MSE: 0.4129
test MSE: 0.4325

DTRegressor:
----------------------------------------
train MSE: 0.3846
test MSE: 0.4534

KNNRegressor:
----------------------------------------
train MSE: 0.0
test MSE: 0.3434
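
The KNN train MSE of 0.0 is most likely a side effect of `'weights': 'distance'` rather than a sign of a perfect fit. With inverse-distance weighting, a training point queried against its own training set is at distance zero from itself and takes all the weight, so the model returns that point's own target. The sketch below reproduces the effect in pure Python. It assumes the plugin behaves like scikit-learn's `KNeighborsRegressor` (the hyperparameter names suggest so, but the plugin internals are not shown); `knn_predict`, `mse`, and the data are illustrative.

```python
import math

def knn_predict(train_X, train_y, query, n_neighbors=3):
    """Inverse-distance-weighted KNN regression for a single query point."""
    pairs = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = pairs[:n_neighbors]
    # A zero distance is an exact match; exact matches take all the weight.
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def mse(y_true, y_pred):
    """Mean squared error of paired targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up training data with no duplicate points.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
train_y = [5.0, 6.0, 4.0, 7.0]

train_pred = [knn_predict(train_X, train_y, x) for x in train_X]
print(mse(train_y, train_pred))  # 0.0

# A point that is not in the training set gets a genuine weighted average.
held_out = knn_predict(train_X, train_y, (1.0, 1.0))
print(held_out)
```

For this reason the train score is uninformative for the distance-weighted KNN model, and the comparison between models should rest on the test MSE.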
