In [3]:
import requests
import json
from pprint import pprint
import pandas as pd

# Setup

In [None]:
Baseurl = 'https://kgsa-dev.kochcloud.com/review'
##Baseurl = 'https://kgsa-dev.kochcloud.com'
user = 'badrul'
password = "%IaY0lolDEOeQqsii$w9UO"

# Model Training

### C3 FHR Data

In [None]:
## TODO: Read in the data and show the head
s3DataPath = "s3://prediction_services/data/TrainingInputData_Transformed_Test_Sample_2.csv"
fhrDf = pd.read_csv(s3DataPath, sep=',', header = 0, index_col = 0)
fhrDf.index = pd.to_datetime(fhrDf.index)
fhrDf.head()

### Training Parameters
Dictionary below contains the minimum configuration needed to run a training job.

In [15]:
trainingDict = {
        "training": {
            "train_name": "model-minimum-config-jupyter", 
            "target_data_location": "s3://prediction_services/data/TrainingInputData_Transformed_Test_Sample_2.csv",
            "train_data_end_dtm": "6/1/2020",
            "test_data_end_dtm": "9/1/2020",
            "validation_data_end_dtm": "12/1/2020",
            "model_names": ["ARIMA", "HOLTWINTERS", "PROPHET"] 
        }
    }


### Training Configuration Explanation


1. **train_name** = Used to identify training job and used in model version
2. **target_data_location** = S3 (only S3 is supported) location of the CSV file containing the training data
3. **train_data_end_dtm** - Last inclusive day to be used for training
4. **test_data_end_dtm** - Last inclusive day to be used for testing
5. **validation_data_end_dtm** - Last day or the remaining data
6. **model_names** - List of models to be trained on the data


### Important Default Parameters
1. Default training task is **Model** which trains on the data using single default set of model parameters
2. Code will run in **Sequential** mode, i.e., on a single core
3. Default loss function is **MAPE**
4. Default prediction frequency is **Monthly** 

### Executing Training Job

Training can be divided into two categories:
1. **Model** - Training one or more models based on single parameter set (either default or user provided)
2. **Tuning** - Train one or more models in automated mode (population based training) or configured supported algorithms.  Only the following algorithms are supported:
    1. Grid search
    2. Random search
    3. Bayesian Optimization
    4. Population Based Training
    
To train or tune use the base url + /training
Upon the successful submission of the training to the Prediction Service, training ID and status will be returned.
Training ID is needed to get status and results of training.

In [16]:
response = requests.post(Baseurl+"/training", 
                         auth=(user, password),
                         data = json.dumps(trainingDict))  
                         #json = trainingDict)

### Extracting Training Result

In [17]:
print(response)
result = response.json()
trainingId = result['runId']
print(trainingId)



<Response [200]>
b7086a89-60cf-4e27-8dcb-25cfec5e91eb


### Result of Calling the Training Job

The training is executed asynchronously.  This end point submits the job to the prediction service which gets queued and run as resource become available.  Thus, the end point returns a **runId** which is needed to get status, training results or use the trained model for prediction

In [15]:
trainingStatus = status_response.json()
print(trainingStatus)

{'runId': '0451290a-e15e-4a28-8a6b-6c455f27ce90', 'status': 'Submitted', 'trainStartTs': 'Wed, 30 Dec 2020 14:49:05 GMT', 'updateTs': 'Wed, 30 Dec 2020 14:49:05 GMT'}


### Parameters Returned by Status Call

1. **runId** - This is an UUID that uniquely identifies each training job.  This UUID is **very important** as all subsequent information and result of training is identified using the UUID.  The UUID can also be used during prediction to use the model trained
2. **status**: Current status of the training
3. **trainStartTs**: Timestamp of the training start 

In [16]:
resultResponse = requests.get(Baseurl+"/trainings/"+ trainingId,
                              auth=(user,password))    
print(resultResponse)
trainResults = resultResponse.json()
print(trainResults)

<Response [200]>
{'Status': 'Submitted', 'UpdateTs': 'Wed, 30 Dec 2020 14:49:05 GMT', 'resultLocation': 's3://prediction-services/train/', 'runId': '0451290a-e15e-4a28-8a6b-6c455f27ce90', 'trainEndTs': None, 'trainStartTs': 'Wed, 30 Dec 2020 14:49:05 GMT'}


### Training Result Information

Paramters Explnation:
1. **status** - Status of the training job.
2. **resultLocation** - S3 location where training scores will be stored once the training has completed


In [None]:
##TODO: Read in the forecast results and show 

# Model Tuning and Experimentation

### Hyperparameter Tuning Introduction
**Tuning jobs scale by**:
1. Scaling up or Vertical Scaling - Parallel or
2. Scaling out or Horizontal Scaling - Distributed

Univariate models execute independently.  For each server number of models being tuned simultaneously is limited by number of CPUs on the server

**Deep Learning Models and Parameter Server**
A parameter server typically exists as a remote process or service and interacts with clients through remote procedure calls.

Image(filename=images/param-server-arch.jpg)

![Parameter Server Architecture](images/param-server-arch.jpg)

For hyperparameter tuning a search space of paerameters has to be defined.

#### Search Algorithms
Search Algorithms are wrappers around open-source optimization libraries for efficient hyperparameter selection.

Search Algorithms cannot affect or stop training processes. However, you can use them together to early stop the evaluation of bad trials.

1. **Bayesian Optimization**: This constrained global optimization process builds upon bayesian inference and gaussian processes. It attempts to find the maximum value of an unknown function in as few iterations as possible. This is a good technique for optimization of high cost functions.
2. **BOHB (Bayesian Optimization HyperBand)**: An algorithm that both terminates bad trials and also uses Bayesian Optimization to improve the hyperparameter search. It is backed by the HpBandSter library. BOHB is intended to be paired with a specific scheduler class: HyperBandForBOHB.
3. **HyperOpt**: A Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.
4. **Scikit-Optimize*:
5. **Nevergrad**: HPO without computing gradients.


#### Schedulers
Schedulers are distributed implementations of early-stopping algorithm
Schedulers can early terminate bad trials, pause trials, clone trials, and alter hyperparameters of a running trial.

1. **Median Stopping Rule**: It applies the simple rule that a trial is aborted if the results are trending below the median of the previous trials.
2. **Population Based Training (PBT)** - 
3. **FIFOScheduler* - Simple scheduler that just runs trials in submission order

### Experiment Execution

### Tuning Parameters
Dictionary below contains the minimum configuration needed to run a tuning job.

In [7]:
trainingDict = {
        "training": {
            "train_task_type": "TUNING",
            "train_name": "TUNING-arima-config-jupyter", 
            "target_data_location": "s3://prediction-services/h0500hn_ft-worth.csv",
            "train_data_end_dtm": "2019-12-01",  
            "test_data_end_dtm": "2020-06-01",         
            "validation_data_end_dtm": "2020-11-01",
            "model_names": ["ARIMA"] 
        },
        "models": {
            "ARIMA": {
            "model_name": "ARIMA",
            "model_time_interval": "M",
            "hyperparam_alg": "GRID-SEARCH",
            "model_config": {
                "parameters": {"p": [1, 7], "d": [1, 2], "q": [0, 2]},
                "hyperparameters": {"disp": 0}}
            }
        }
    }
response = requests.post(Baseurl+"/training", 
                         auth=(user, password),
                         data = json.dumps(trainingDict))  
                         #json = trainingDict)


### Tuning Configuration Explanation
#### Required Parameters
1. **train_task_type** = "TUNING"
    * By default it's set to "MODEL"
2. **All the parameters mentioned in the training section**

#### Optional Parameters
1. **hyperparameter_algorithm** -


In [8]:
print(response)
result = response.json()
trainingId = result['runId']
print(trainingId)


<Response [200]>
fee0c8f0-4197-4fc8-ad13-ffb980bcc8af


In [5]:
resultResponse = requests.get(Baseurl+"/trainings/"+ trainingId,
                              auth=(user,password))    
print(resultResponse)
trainResults = resultResponse.json()
print(trainResults)

<Response [200]>
{'Status': 'Running', 'UpdateTs': 'Tue, 12 Jan 2021 02:17:57 GMT', 'resultLocation': 's3://prediction-services/train/', 'runId': '6b320ef3-917f-4ebf-9b7c-d0fc319ebfdb', 'trainEndTs': None, 'trainStartTs': 'Tue, 12 Jan 2021 02:17:57 GMT'}


# Prediction or Scoring

### Scoring Parameters
Dictionary below contains the minimum configuration needed to run a scoring job.

In [9]:
scoringDict = {
        "scoring": {
            "score_name": "score-minimum-config-manager",
            "target_data_location": "s3://prediction-services/data/test_single_target.csv",
            "model_names": ["ARIMA", "HOLTWINTERS"],
            "prediction_steps": 12,
            "prediction_count": 10
            
        }
    }
predResponse = requests.post(Baseurl+ "/predictions", 
                         auth=(user, password),
                         data = json.dumps(scoringDict))  

### Scoring Configuration Explanation
#### Required Parameters
1. **score_name** - 
2. **target_data_location** -
3. **model_names** -

#### Optional Parameters
1. **prediction_steps** - 
2. **prediction_count** - 

In [13]:
print(predResponse)
predResults = predResponse.json()
print(predResults)
predId = predResults['runId']
print(predId)

<Response [200]>
{'Status': 'Created', 'runId': 'a0a987ee-b2f3-47c7-80b9-98df99ff677b'}
a0a987ee-b2f3-47c7-80b9-98df99ff677b


In [14]:
predResponse = requests.get(Baseurl+"/predictions/"+ predId,
                              auth=(user,password))    
print(predResponse)
predStatus = predResponse.json()
print(predStatus)


<Response [200]>
{'Status': 'Running', 'runId': 'a0a987ee-b2f3-47c7-80b9-98df99ff677b'}


## Prediction: Using Previously Trained Model

In [34]:
scoringDict = {
        "scoring": {
            "score_name": "score-minimum-config-manager",
            "target_data_location": "s3://prediction_services/data/TrainingInputData_Transformed_Test_Sample_2.csv",
            "model_names": ["ARIMA"],
            "prediction_steps": 12,
            "prediction_count": 10,
            "train_run_id": "e5ff99ac-e260-49d0-934f-46c46d31f136"
        }
    }
predResponse = requests.post(Baseurl+ "/predictions", 
                         auth=(user, password),
                         data = json.dumps(scoringDict))  

In [None]:
print(predResponse)
predStatus = predResponse.json()
print(predStatus)
predId = predStatus['runId']
print(predId)

### Get Prediction Status

In [35]:
predResponse = requests.get(Baseurl+"/predictions/"+ predId,
                              auth=(user,password))    
print(predResponse)
predStatus = predResponse.json()
print(predStatus)
predId = predStatus['runId']
print(predId)

<Response [200]>
{'Status': 'Created', 'runId': '870212f3-6eee-4dd3-8e16-e0f5bb152418'}
870212f3-6eee-4dd3-8e16-e0f5bb152418


### Fetch Prediction Results

In [36]:
predResponse = requests.get(Baseurl+"/predictions/"+ predId,
                              auth=(user,password))    
print(predResponse)
predResults = predResponse.json()
print(predResults)

<Response [200]>
{'Status': 'Completed', 'UpdateTs': 'Wed, 23 Dec 2020 20:56:37 GMT', 'resultLocation': 's3://prediction-services/score/score-minimum-config-manager_2020-12-23-20.56.37.csv', 'runId': '870212f3-6eee-4dd3-8e16-e0f5bb152418', 'scoreEndTs': 'Wed, 23 Dec 2020 20:56:37 GMT', 'scoreStartTs': 'Wed, 23 Dec 2020 20:56:37 GMT'}
