Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Automated Machine Learning
_**Energy Demand Forecasting**_

## Contents
1. [Introduction](#Introduction)
2. [Setup](#Setup)
3. [Data](#Data)
4. [Train](#Train)
5. [Deploy](#Deploy)

## Introduction
In this example, we show how AutoML can be used for energy demand forecasting.

Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.

In this notebook you will see how to:
1. Create an Experiment in an existing Workspace
2. Instantiate AutoMLConfig with the new "forecasting" task type for time series training, along with time-series-specific settings. This dataset needs only the basic one: "time_column_name"
3. Train the model using local compute
4. Explore the results
5. Test the fitted model

# Scenario
This scenario focuses on energy demand forecasting where the __goal is to predict the future load on an energy grid__. It is a critical business operation for companies in the energy sector as operators need to maintain the fine balance between the energy consumed on a grid and the energy supplied to it. 

Too much power supplied to the grid can result in wasted energy or technical faults. Too little can lead to blackouts, leaving customers without power. Typically, grid operators make short-term decisions to manage the energy supply to the grid and keep the load in balance. An accurate short-term forecast of energy demand is therefore essential for the operator to make these decisions with confidence.

This scenario details the construction of a machine learning energy demand forecasting solution. _The solution is trained on a public dataset from the New York Independent System Operator (NYISO)_, which operates the power grid for New York State.
The dataset includes hourly power demand data for New York City over a period of five years. An additional dataset containing hourly weather conditions in New York City over the same time period was taken from darksky.net. 


## Setup

As part of the setup you have already created a <b>Workspace</b>. For AutoML you also need to create an <b>Experiment</b>. An <b>Experiment</b> is a named object in a <b>Workspace</b> that is used to run experiments.

In [None]:
import azureml.core
import pandas as pd
import numpy as np
import logging
import warnings
# Squash warning messages for cleaner output in the notebook
warnings.showwarning = lambda *args, **kwargs: None


from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
from matplotlib import pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [None]:
ws = Workspace.from_config()

# choose a name for the run history container in the workspace
experiment_name = 'automl-energydemandforecasting'
# project folder
project_folder = './sample_projects/automl-local-energydemandforecasting'

experiment = Experiment(ws, experiment_name)

output = {}
output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Project Directory'] = project_folder
output['Run History Name'] = experiment_name
pd.set_option('display.max_colwidth', -1)
outputDf = pd.DataFrame(data = output, index = [''])
outputDf.T

## Data
Read the energy demand data from a file and preview it.

In [None]:
data = pd.read_csv("nyc_energy.csv", parse_dates=['timeStamp'])
data.head()

### Split the data into train and test sets



In [None]:
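# Copy the slices so that later in-place changes (such as pop) do not act on a view of `data`
train = data[data['timeStamp'] < '2017-02-01'].copy()
test = data[data['timeStamp'] >= '2017-02-01'].copy()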
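
The split above is chronological: every training row precedes every test row, so the model is never evaluated on timestamps it has already seen. A minimal sketch of the same idea as a reusable helper follows. The name `split_by_time` and the synthetic frame are illustrative assumptions, not part of the notebook or the SDK.

```python
import numpy as np
import pandas as pd


def split_by_time(frame, time_col, cutoff):
    """Return (before, on_or_after) copies of `frame`, split at `cutoff`."""
    cutoff = pd.Timestamp(cutoff)
    before = frame[frame[time_col] < cutoff].copy()
    on_or_after = frame[frame[time_col] >= cutoff].copy()
    return before, on_or_after


# Synthetic hourly frame standing in for the real data (values are made up).
demo = pd.DataFrame({
    'timeStamp': pd.Timestamp('2017-01-30') + pd.to_timedelta(np.arange(96), unit='h'),
    'demand': np.arange(96, dtype=float),
})

demo_train, demo_test = split_by_time(demo, 'timeStamp', '2017-02-01')
print(len(demo_train), len(demo_test))
```

Returning copies keeps later in-place operations on either part from touching the original frame.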


### Prepare the test data

We will feed X_test to the fitted model to get predictions.

In [None]:
y_test = test.pop('demand').values
X_test = test

### Split the training data into train and validation sets

Use the final month of the training data as the validation set.


In [None]:
# Copy the slices so that pop() below does not act on a view of `train`
X_train = train[train['timeStamp'] < '2017-01-01'].copy()
X_valid = train[train['timeStamp'] >= '2017-01-01'].copy()
y_train = X_train.pop('demand').values
y_valid = X_valid.pop('demand').values
print(X_train.shape)
print(y_train.shape)
print(X_valid.shape)
print(y_valid.shape)

## Train

Instantiate an AutoMLConfig object. This defines the settings and data used to run the experiment.

|Property|Description|
|-|-|
|**task**|forecasting|
|**primary_metric**|The metric that you want to optimize.<br> Forecasting supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|
|**iterations**|Number of iterations. In each iteration, AutoML trains a specific pipeline on the given data.|
|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|
|**X**|Training features. (sparse) array-like, shape = [n_samples, n_features]|
|**y**|Training target values. (sparse) array-like, shape = [n_samples, ]. For forecasting this is the numeric series to predict.|
|**X_valid**|Features used to evaluate a model in an iteration. (sparse) array-like, shape = [n_samples, n_features]|
|**y_valid**|Target values used to evaluate a model in an iteration. (sparse) array-like, shape = [n_samples, ]|
|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|
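
To make the primary metric used below more concrete, here is a minimal sketch of a normalized root mean squared error. It assumes the normalization divides the RMSE by the range of the true values. The exact computation inside AutoML may differ, and the function name `normalized_rmse` is hypothetical.

```python
import numpy as np


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of y_true (assumed normalization)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    value_range = y_true.max() - y_true.min()
    if value_range == 0:
        raise ValueError('y_true has zero range; the metric is undefined.')
    return rmse / value_range


# Made-up values: every error is +/-10 and the true range is 200.
demo_true = [100.0, 200.0, 300.0]
demo_pred = [110.0, 190.0, 310.0]
demo_nrmse = normalized_rmse(demo_true, demo_pred)
print(demo_nrmse)
```

Dividing by the range makes the score comparable across series with different scales. Lower is better.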

In [None]:
time_column_name = 'timeStamp'
automl_settings = {
    "time_column_name": time_column_name,
}


automl_config = AutoMLConfig(task = 'forecasting',
                             debug_log = 'automl_nyc_energy_errors.log',
                             primary_metric='normalized_root_mean_squared_error',
                             iterations = 10,
                             iteration_timeout_minutes = 10,
                             X = X_train,
                             y = y_train,
                             X_valid = X_valid,
                             y_valid = y_valid,
                             path=project_folder,
                             # model_explainability=True,
                             verbosity = logging.INFO,
                            **automl_settings)

Call the submit method on the experiment object and pass the run configuration. For local runs the execution is synchronous. Depending on the data and the number of iterations, this can run for a while.
The currently running iterations are printed to the console.

In [None]:
local_run = experiment.submit(automl_config, show_output=True)

In [None]:
local_run

### Retrieve the Best Model
Below we select the best pipeline from our iterations. The get_output method on local_run returns the best run and the fitted model for the last fit invocation. Overloads of get_output let you retrieve the best run and fitted model for any logged metric or for a particular iteration.

In [None]:
best_run, fitted_model = local_run.get_output()
fitted_model.steps

### Retrieve explanation for best model
Model explanations show which features the model relies on and how important each one is. The cell below retrieves the explanation for the best model.

In [None]:
from azureml.train.automl.automlexplainer import explain_model

shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \
    explain_model(fitted_model, X_train, X_test, best_run)

### Widget for monitoring runs

In [None]:
from azureml.widgets import RunDetails
RunDetails(local_run).show()

### Test the Best Fitted Model

Predict on the test set. The predictions are compared with the true values in the cells that follow.

In [None]:
y_pred = fitted_model.predict(X_test)
y_pred

### Remove NaN values from y_test and y_pred to avoid errors when calculating metrics

In [None]:
if len(y_test) != len(y_pred):
    raise ValueError(
        'the true values and prediction values do not have equal length.')
elif len(y_test) == 0:
    raise ValueError(
        'y_true and y_pred are empty.')

# if there is any non-numeric element in the y_true or y_pred,
# the ValueError exception will be thrown.
y_test_f = np.array(y_test).astype(float)
y_pred_f = np.array(y_pred).astype(float)

# remove entries both in y_true and y_pred where at least
# one element in y_true or y_pred is missing
y_test = y_test_f[~(np.isnan(y_test_f) | np.isnan(y_pred_f))]
y_pred = y_pred_f[~(np.isnan(y_test_f) | np.isnan(y_pred_f))]

### Calculate metrics for the prediction


In [None]:
print("[Test Data] \nRoot Mean squared error: %.2f" % np.sqrt(mean_squared_error(y_test, y_pred)))
print('mean_absolute_error score: %.2f' % mean_absolute_error(y_test, y_pred))
# R2 score: 1 is perfect prediction
print('R2 score: %.2f' % r2_score(y_test, y_pred))



# Plot outputs
test_pred = plt.scatter(y_test, y_pred, color='b')
test_test = plt.scatter(y_test, y_test, color='g')
plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)
plt.show()
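
Residuals (true value minus prediction) complement the scatter plot above: a mean far from zero suggests systematic over- or under-forecasting. The following is a minimal sketch on made-up numbers. The helper name `residual_summary` is an illustrative assumption.

```python
import numpy as np


def residual_summary(y_true, y_pred):
    """Summarize residuals, defined here as y_true - y_pred."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {
        'mean': float(residuals.mean()),       # bias of the forecast
        'std': float(residuals.std()),         # spread of the errors
        'max_abs': float(np.abs(residuals).max()),  # worst single miss
    }


# Made-up values; with the notebook's arrays this would be
# residual_summary(y_test, y_pred).
demo_summary = residual_summary([10.0, 12.0, 14.0], [9.0, 13.0, 12.0])
print(demo_summary)
```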

## Deploy
Deploy the model into an Azure Container Instance to enable inferencing on new data.

### Register the model
Register the best model with the AML service.

In [None]:
model = local_run.register_model(description = 'automated ml model for energy demand forecasting', tags = {'ml': "Forecasting", 'type': "automl"})
print(local_run.model_id) # This will be written to the script file later in the notebook.

### Create Scoring Script
This script will be used to run the model on new data to produce predictions.

In [None]:
%%writefile score_energy_demand.py
import pickle
import json
import numpy as np
import pandas as pd
import azureml.train.automl
from sklearn.externals import joblib
from azureml.core.model import Model


def init():
    global model
    model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)

def run(timestamp,precip,temp):
    try:
        # Build a one-row DataFrame with the column names used for training.
        # Assumption: the fitted pipeline expects the same columns as X_train.
        data = pd.DataFrame({'timeStamp': [pd.to_datetime(timestamp)],
                             'precip': [precip],
                             'temp': [temp]})
        result = model.predict(data)
    except Exception as e:
        result = str(e)
        return json.dumps({"error": result})
    return json.dumps({"result":result.tolist()})
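
Before building an image, it can help to check the script's response contract locally: a JSON object with a "result" key on success and an "error" key on failure. The sketch below does that with a stand-in model. `ConstantModel` and `score` are hypothetical names used only for illustration, and the real fitted model is not involved.

```python
import json

import numpy as np


class ConstantModel:
    """Hypothetical stand-in for the fitted model; always predicts one value."""

    def __init__(self, value):
        self.value = value

    def predict(self, rows):
        return np.full(len(rows), self.value)


def score(model, rows):
    """Mirror the scoring contract: 'result' on success, 'error' on failure."""
    try:
        result = model.predict(rows)
    except Exception as e:
        return json.dumps({"error": str(e)})
    return json.dumps({"result": result.tolist()})


demo_rows = [['2017-02-01 00:00:00', 0.0, 40.0]]
ok = json.loads(score(ConstantModel(5000.0), demo_rows))
bad = json.loads(score(None, demo_rows))  # None has no predict(), so this fails
print(ok, bad)
```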

### Create a YAML File for the Environment

In [None]:
experiment = Experiment(ws, experiment_name)
ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)

In [None]:
dependencies = ml_run.get_run_sdk_dependencies(iteration = 0)

In [None]:
for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core', 'azureml-webservice-schema','azuremlftk']:
    print('{}\t{}'.format(p, dependencies[p]))

In [None]:
from azureml.core.conda_dependencies import CondaDependencies 

myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=["azureml-webservice-schema", "azuremlftk", "azureml-train-automl"])
print(myenv.serialize_to_string())

conda_env_file_name = 'my_conda_env.yml'
myenv.save_to_file('.', conda_env_file_name)

In [None]:
# Substitute the actual version number in the environment file.
# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.
# However, we include this in case this code is used on an experiment from a previous SDK version.

with open(conda_env_file_name, 'r') as cefr:
    content = cefr.read()

with open(conda_env_file_name, 'w') as cefw:
    cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))

# Substitute the actual model id in the script file.

script_file_name = 'score_energy_demand.py'

with open(script_file_name, 'r') as cefr:
    content = cefr.read()

with open(script_file_name, 'w') as cefw:
    cefw.write(content.replace('<<modelid>>', local_run.model_id))
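
The substitution above silently does nothing if the placeholder is missing, for example when the cell is re-run after the file has already been rewritten. A minimal sketch of a stricter variant follows. The helper name `fill_placeholder` and the temporary demo file are assumptions for illustration.

```python
import os
import tempfile


def fill_placeholder(path, placeholder, value):
    """Replace `placeholder` in the file at `path`, failing if it is absent."""
    with open(path, 'r') as f:
        content = f.read()
    if placeholder not in content:
        raise ValueError('placeholder %r not found in %s' % (placeholder, path))
    with open(path, 'w') as f:
        f.write(content.replace(placeholder, value))


# Demonstrate on a throwaway file rather than the real scoring script.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'score_demo.py')
    with open(demo_path, 'w') as f:
        f.write("model_name = '<<modelid>>'\n")
    fill_placeholder(demo_path, '<<modelid>>', 'demo-model-id')
    with open(demo_path, 'r') as f:
        demo_result = f.read()

print(demo_result)
```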

### Generate schema file
The schema file defines the REST API of the deployed web service, so that it can be consumed by "Swagger enabled" services such as Power BI.

In [None]:
from azureml.webservice_schema.sample_definition import SampleDefinition
from azureml.webservice_schema.data_types import DataTypes
from azureml.webservice_schema.schema_generation import generate_schema

schema_file_name = './schema.json'
def run(timestamp,precip,temp):
    return "OK"

import numpy as np
generate_schema(run, inputs={
    "timestamp" : SampleDefinition(DataTypes.STANDARD, '2012-01-01 00:00:00'),
    "precip" : SampleDefinition(DataTypes.STANDARD, 0.0),
    "temp" : SampleDefinition(DataTypes.STANDARD, 0.0)}, 
    filepath=schema_file_name)

### Create a Docker file to include extra dependencies in the image

In [None]:
%%writefile docker_steps.dockerfile
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y build-essential gcc g++ python-dev unixodbc unixodbc-dev

In [None]:
docker_file_name = "docker_steps.dockerfile"

### Create a Container Image
The container image is based on the model and is used to deploy the container instance.

In [None]:
from azureml.core.image import Image, ContainerImage

image_config = ContainerImage.image_configuration(runtime= "python",
                                 execution_script = script_file_name,
                                 docker_file = docker_file_name,
                                 schema_file = schema_file_name,
                                 conda_file = conda_env_file_name,
                                 tags = {'ml': "Forecasting", 'type': "automl"},
                                 description = "Image for automated ml energy demand forecasting predictions")

image = Image.create(name = "automlenergyforecasting",
                     models = [model],
                     image_config = image_config, 
                     workspace = ws)

image.wait_for_creation(show_output = True)

In [None]:
print(image.image_build_log_uri)

### Deploy the Image as a Web Service on Azure Container Instance

In [None]:
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 1, 
                                               tags = {'ml': "Forecasting", 'type': "automl"}, 
                                               description = 'ACI service for automated ml energy demand forecasting predictions')

In [None]:
from azureml.core.webservice import Webservice

aci_service_name = 'automlenergyforecasting'
print(aci_service_name)
aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,
                                           image = image,
                                           name = aci_service_name,
                                           workspace = ws)
aci_service.wait_for_deployment(True)
print(aci_service.state)
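
Once the service reports a healthy state, it can be called with a JSON payload. The sketch below only builds the payload. It assumes the service accepts a JSON object whose keys match the schema inputs defined earlier (timestamp, precip, temp), and the helper name `build_payload` is hypothetical. The commented-out call is likewise an assumption and is not executed here.

```python
import json


def build_payload(timestamp, precip, temp):
    """Serialize one observation using the input names from the schema."""
    return json.dumps({
        'timestamp': timestamp,
        'precip': float(precip),
        'temp': float(temp),
    })


# Made-up sample values.
payload = build_payload('2017-02-01 00:00:00', 0.0, 40.0)
print(payload)

# Assumed usage against the deployed service (requires a live deployment):
# response = aci_service.run(input_data=payload)
```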