Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Automated Machine Learning
_**Orange Juice Sales Forecasting**_

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Compute](#Compute)
1. [Data](#Data)
1. [Modeling](#Modeling)
1. [Train](#Train)
1. [Forecasting](#Forecasting)
1. [Evaluate](#Evaluate)
1. [Operationalize](#Operationalize)

## Introduction
In this example, we use AutoML to train, select, and operationalize a time-series forecasting model for multiple time-series.

Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.

The following code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area.

## Setup

In [1]:
import azureml.core
import pandas as pd
import numpy as np
import logging

from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig



As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem. 

In [2]:
# ws = Workspace.from_config()
import sys
sys.path.append(r'C:\Users\jp\Documents\GitHub\vault-private')
import credentials
ws = credentials.authenticate_AZR('gmail','WS_demo','RG_wip')

# choose a name for the run history container in the workspace
experiment_name = 'test-automl-ojforecasting'

experiment = Experiment(ws, experiment_name)

output = {}
output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace'] = ws.name
output['SKU'] = ws.sku
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Run History Name'] = experiment_name
# pd.set_option('display.max_colwidth', -1)
outputDf = pd.DataFrame(data = output, index = [''])
outputDf.T


Unnamed: 0,Unnamed: 1
SDK version,1.2.0
Subscription ID,be8e48ab-94b2-4145-a6de-2104dc657912
Workspace,WS_demo
SKU,Enterprise
Resource Group,RG_wip
Location,eastus2
Run History Name,test-automl-ojforecasting


## Compute
You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.
#### Creation of AmlCompute takes approximately 5 minutes. 
If the AmlCompute with that name is already in your workspace this code will skip the creation process.
As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. See the Azure Machine Learning documentation on default limits and how to request more quota.

In [3]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget

# Choose a name for your cluster.
amlcompute_cluster_name = "cpu-cluster"

found = False
# Check if this compute target already exists in the workspace.
cts = ws.compute_targets
if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':
    found = True
    print('Found existing compute target.')
    compute_target = cts[amlcompute_cluster_name]
    
if not found:
    print('Creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = "STANDARD_D2_V2", # for GPU, use "STANDARD_NC6"
                                                                #vm_priority = 'lowpriority', # optional
                                                                max_nodes = 6)

    # Create the cluster.
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)
    
# print('Checking cluster status...')
# # Can poll for a minimum number of nodes and for a specific timeout.
# # If no min_node_count is provided, it will use the scale settings for the cluster.
# compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)
    
# # For a more detailed view of current AmlCompute status, use get_status().

# check compute targets
cts = ws.compute_targets
print(cts)

# attach compute (gpu / cpu / local)
import pyautogui
sys.path.append(r'C:\Users\jp\Documents\GitHub\jp-codes-python\autoML_py36')
import jp_utils
answer = pyautogui.prompt(
    text='Enter compute target (gpu, cpu, or local)',
    title='Compute target',
    default='cpu')
compute_dict = {'gpu':'gpu-cluster', 'cpu':'cpu-cluster', 'local':'gpu-local'}
target_name = jp_utils.generic_switch(compute_dict, answer)
compute_target = cts[target_name]
print(compute_target.name)


Found existing compute target.
{'cpu-cluster': AmlCompute(workspace=Workspace.create(name='WS_demo', subscription_id='be8e48ab-94b2-4145-a6de-2104dc657912', resource_group='RG_wip'), name=cpu-cluster, id=/subscriptions/be8e48ab-94b2-4145-a6de-2104dc657912/resourceGroups/RG_wip/providers/Microsoft.MachineLearningServices/workspaces/WS_demo/computes/cpu-cluster, type=AmlCompute, provisioning_state=Succeeded, location=eastus2, tags=None)}
cpu-cluster


## Data
You are now ready to load the historical orange juice sales data. We will load the CSV file into a plain pandas DataFrame; the time column in the CSV is called _WeekStarting_, so it will be specially parsed into the datetime type.

In [4]:
time_column_name = 'WeekStarting'
data = pd.read_csv("dominicks_OJ.csv", parse_dates=[time_column_name])
data.head()

Unnamed: 0,WeekStarting,Store,Brand,Quantity,logQuantity,Advert,Price,Age60,COLLEGE,INCOME,Hincome150,Large HH,Minorities,WorkingWoman,SSTRDIST,SSTRVOL,CPDIST5,CPWVOL5
0,1990-06-14,2,dominicks,10560,9.26,1,1.59,0.23,0.25,10.55,0.46,0.1,0.11,0.3,2.11,1.14,1.93,0.38
1,1990-06-14,2,minute.maid,4480,8.41,0,3.17,0.23,0.25,10.55,0.46,0.1,0.11,0.3,2.11,1.14,1.93,0.38
2,1990-06-14,2,tropicana,8256,9.02,0,3.87,0.23,0.25,10.55,0.46,0.1,0.11,0.3,2.11,1.14,1.93,0.38
3,1990-06-14,5,dominicks,1792,7.49,1,1.59,0.12,0.32,10.92,0.54,0.1,0.05,0.41,3.8,0.68,1.6,0.74
4,1990-06-14,5,minute.maid,4224,8.35,0,2.99,0.12,0.32,10.92,0.54,0.1,0.05,0.41,3.8,0.68,1.6,0.74


Each row in the DataFrame holds a quantity of weekly sales for an OJ brand at a single store. The data also includes the sales price, a flag indicating whether the OJ brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also includes the logarithm of the sales quantity. The Dominick's grocery data is commonly used to illustrate econometric modeling techniques where logarithms of quantities are generally preferred.

The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset consists of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we define the **grain** - the columns whose values determine the boundaries between time-series:

In [5]:
grain_column_names = ['Store', 'Brand']
nseries = data.groupby(grain_column_names).ngroups
print('Data contains {0} individual time-series.'.format(nseries))

Data contains 249 individual time-series.


For demonstration purposes, we extract sales time-series for just a few of the stores:

In [6]:
use_stores = [2, 5, 8]
data_subset = data[data.Store.isin(use_stores)]
nseries = data_subset.groupby(grain_column_names).ngroups
print('Data subset contains {0} individual time-series.'.format(nseries))

Data subset contains 9 individual time-series.


### Data Splitting
We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the grain columns.

In [7]:
n_test_periods = 20

def split_last_n_by_grain(df, n):
    """Group df by grain and split on last n rows for each group."""
    df_grouped = (df.sort_values(time_column_name) # Sort by ascending time
                  .groupby(grain_column_names, group_keys=False))
    df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])
    df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])
    return df_head, df_tail

train, test = split_last_n_by_grain(data_subset, n_test_periods)

### Upload data to datastore
The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace) is paired with a storage account, which contains the default datastore. We will use it to upload the train and test data and create [tabular datasets](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training and testing. A tabular dataset defines a series of lazily-evaluated, immutable operations to load data from the data source into a tabular representation.

In [8]:
train.to_csv(r'./dominicks_OJ_train.csv', index=None, header=True)
test.to_csv(r'./dominicks_OJ_test.csv', index=None, header=True)

In [9]:
datastore = ws.get_default_datastore()
datastore.upload_files(files=['./dominicks_OJ_train.csv', './dominicks_OJ_test.csv'],
                       target_path='dataset/', overwrite=True, show_progress=True)

Uploading an estimated of 2 files
Uploading ./dominicks_OJ_test.csv
Uploading ./dominicks_OJ_train.csv
Uploaded ./dominicks_OJ_test.csv, 1 files out of an estimated total of 2
Uploaded ./dominicks_OJ_train.csv, 2 files out of an estimated total of 2
Uploaded 2 files


$AZUREML_DATAREFERENCE_60edbd47ccd242b6b562f4e75cc897f1

### Create dataset for training

In [10]:
from azureml.core.dataset import Dataset
train_dataset = Dataset.Tabular.from_delimited_files(path=datastore.path('dataset/dominicks_OJ_train.csv'))

In [11]:
train_dataset.to_pandas_dataframe().tail()

Unnamed: 0,WeekStarting,Store,Brand,Quantity,logQuantity,Advert,Price,Age60,COLLEGE,INCOME,Hincome150,Large HH,Minorities,WorkingWoman,SSTRDIST,SSTRVOL,CPDIST5,CPWVOL5
847,1992-04-09,8,tropicana,16192,9.69,0,2.5,0.25,0.1,10.6,0.05,0.13,0.04,0.28,2.64,1.5,2.91,0.64
848,1992-04-16,8,tropicana,6528,8.78,0,2.89,0.25,0.1,10.6,0.05,0.13,0.04,0.28,2.64,1.5,2.91,0.64
849,1992-04-23,8,tropicana,8320,9.03,0,2.89,0.25,0.1,10.6,0.05,0.13,0.04,0.28,2.64,1.5,2.91,0.64
850,1992-04-30,8,tropicana,30784,10.33,1,2.16,0.25,0.1,10.6,0.05,0.13,0.04,0.28,2.64,1.5,2.91,0.64
851,1992-05-07,8,tropicana,18048,9.8,0,2.89,0.25,0.1,10.6,0.05,0.13,0.04,0.28,2.64,1.5,2.91,0.64


## Modeling

For forecasting tasks, AutoML uses pre-processing and estimation steps that are specific to time-series. AutoML will undertake the following pre-processing steps:
* Detect time-series sample frequency (e.g. hourly, daily, weekly) and create new records for absent time points to make the series regular. A regular time series has a well-defined frequency and has a value at every sample point in a contiguous time span.
* Impute missing values in the target (via forward-fill) and feature columns (using median column values) 
* Create grain-based features to enable fixed effects across different series
* Create time-based features to assist in learning seasonal patterns
* Encode categorical variables to numeric quantities
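
The first two steps above can be illustrated on a toy series. The following is only a minimal sketch using the standard library, not AutoML's actual implementation: the function name `regularize_and_impute`, the toy values, and the fixed weekly step are all assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import median

# Toy weekly series (hypothetical values): the week of 1990-06-28 is absent
# and one price is missing.
observed = {
    date(1990, 6, 14): {"Quantity": 100, "Price": 1.5},
    date(1990, 6, 21): {"Quantity": 120, "Price": None},
    date(1990, 7, 5): {"Quantity": 90, "Price": 2.5},
}


def regularize_and_impute(rows, step=timedelta(weeks=1)):
    """Return a gap-free series as (date, quantity, price) tuples.

    Absent time points are created, the target is forward-filled,
    and missing feature values are replaced by the column median.
    """
    price_median = median(r["Price"] for r in rows.values() if r["Price"] is not None)
    current, end = min(rows), max(rows)
    last_quantity = None
    result = []
    while current <= end:
        row = rows.get(current, {"Quantity": None, "Price": None})
        quantity = row["Quantity"] if row["Quantity"] is not None else last_quantity
        price = row["Price"] if row["Price"] is not None else price_median
        result.append((current, quantity, price))
        last_quantity = quantity
        current += step
    return result


regular = regularize_and_impute(observed)
for week, quantity, price in regular:
    print(week, quantity, price)
```

The missing week receives the previous week's quantity, and both missing prices receive the median of the observed prices.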

In this notebook, AutoML will train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series. If you want to train multiple models for different time-series, please check out the forecasting grouping notebook.

You are almost ready to start an AutoML training job. First, we need to name the target column, which AutoML will separate from the features:

In [12]:
target_column_name = 'Quantity'

## Train

The AutoMLConfig object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the experiment timeout, the training data, and cross-validation parameters.

For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If a grain is not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.
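
The leak can be checked directly on the values shown in the data preview. This is a small sketch using only the standard library; the helper name `is_log_of_target` and the rounding tolerance are assumptions for illustration.

```python
import math

# (Quantity, logQuantity) pairs copied from the first rows of the data preview.
rows = [(10560, 9.26), (4480, 8.41), (8256, 9.02), (1792, 7.49), (4224, 8.35)]


def is_log_of_target(quantity, log_quantity, tolerance=0.005):
    """True if log_quantity matches ln(quantity) up to two-decimal rounding."""
    return abs(math.log(quantity) - log_quantity) <= tolerance


leaky = all(is_log_of_target(q, lq) for q, lq in rows)
print("logQuantity is the natural log of Quantity:", leaky)
```

Since the feature is a deterministic function of the target, a model given `logQuantity` could recover `Quantity` exactly without learning anything useful.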

The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning organization that needs to estimate the next month of sales would set the horizon accordingly. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.
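
To make the horizon concrete, the sketch below lists the dates that a 20-week horizon covers for a series whose last training week is 1992-05-07 (the last row of the training data preview). The helper `forecast_dates` is hypothetical and only illustrates the arithmetic.

```python
from datetime import date, timedelta


def forecast_dates(last_training_date, horizon, step=timedelta(weeks=1)):
    """Dates covered by a forecast of `horizon` periods after the training data."""
    return [last_training_date + step * k for k in range(1, horizon + 1)]


dates = forecast_dates(date(1992, 5, 7), horizon=20)
print(dates[0], dates[-1])
```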

Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *validation_data* parameter of AutoMLConfig.
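
The idea behind rolling origin validation can be sketched for a single time-ordered series as follows. This is an illustration under stated assumptions, not AutoML's internal code: the function name, the choice to advance the origin by one horizon per fold, and the toy sizes are all hypothetical.

```python
def rolling_origin_folds(n_samples, n_folds, horizon):
    """Yield (train_indices, validation_indices) for one time-ordered series.

    Each validation window holds the `horizon` points that immediately
    follow the training data, so no fold trains on the future.
    """
    first_origin = n_samples - n_folds * horizon
    if first_origin <= 0:
        raise ValueError("series too short for the requested folds and horizon")
    for k in range(n_folds):
        origin = first_origin + k * horizon
        yield list(range(origin)), list(range(origin, origin + horizon))


folds = list(rolling_origin_folds(n_samples=10, n_folds=3, horizon=2))
for train_idx, valid_idx in folds:
    print("train:", train_idx, "validate:", valid_idx)
```

Unlike K-Fold, every training window ends before its validation window begins, which matches how the model is used at forecast time.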

Here is a summary of AutoMLConfig parameters used for training the OJ model:

|Property|Description|
|-|-|
|**task**|forecasting|
|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|
|**experiment_timeout_hours**|Experimentation timeout in hours.|
|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|
|**training_data**|Input dataset, containing both features and label column.|
|**label_column_name**|The name of the label column.|
|**compute_target**|The remote compute for training.|
|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection|
|**enable_voting_ensemble**|Allow AutoML to create a Voting ensemble of the best performing models|
|**enable_stack_ensemble**|Allow AutoML to create a Stack ensemble of the best performing models|
|**debug_log**|Log file path for writing debugging information|
|**time_column_name**|Name of the datetime column in the input data|
|**grain_column_names**|Name(s) of the columns defining individual series in the input data|
|**drop_column_names**|Name(s) of columns to drop prior to modeling|
|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|
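
For intuition about the primary metric used below, here is a minimal sketch of a normalized mean absolute error. It assumes the normalization divides the MAE by the range of the actual values; AutoML's own computation may differ in details, and the toy numbers are hypothetical.

```python
def normalized_mean_absolute_error(y_true, y_pred):
    """MAE divided by the range (max - min) of the actual values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("range of actual values is zero")
    return mae / value_range


actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
score = normalized_mean_absolute_error(actual, predicted)
print(round(score, 4))
```

Dividing by the range makes the error comparable across series with very different sales volumes.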

In [13]:
time_series_settings = {
    'time_column_name': time_column_name,
    'grain_column_names': grain_column_names,
    'drop_column_names': ['logQuantity'],  # 'logQuantity' is a leaky feature, so we remove it.
    'max_horizon': n_test_periods
}

automl_config = AutoMLConfig(task='forecasting',
                             debug_log='automl_oj_sales_errors.log',
                             primary_metric='normalized_mean_absolute_error',
                             experiment_timeout_hours=0.25,
                             training_data=train_dataset,
                             label_column_name=target_column_name,
                             compute_target=compute_target,
                             enable_early_stopping=True,
                             n_cross_validations=3,
                             verbosity=logging.INFO,
                             **time_series_settings)

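You can now submit a new training run. Depending on the data and the experiment timeout, this operation may take several minutes.
Because the run is submitted with `show_output=False`, per-iteration information is not printed to the console; the following cell waits for the run to complete.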

In [14]:
remote_run = experiment.submit(automl_config, show_output=False)
remote_run


Experiment,Id,Type,Status,Details Page,Docs Page
test-automl-ojforecasting,AutoML_1534bb12-57ff-4208-827d-20effa815c1d,automl,Starting,Link to Azure Machine Learning studio,Link to Documentation


In [15]:
remote_run.wait_for_completion()

{'runId': 'AutoML_1534bb12-57ff-4208-827d-20effa815c1d',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2020-04-22T11:51:50.796374Z',
 'endTimeUtc': '2020-04-22T12:09:33.311138Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'normalized_mean_absolute_error',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '3',
  'target': 'cpu-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"a3f40f41-b03b-4a18-8049-06b2671b7f55\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"dataset/dominicks_OJ_train.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"RG_wip\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"be8e48ab-94b2-4145-a6de-2104dc657912\\\

### Retrieve the Best Model
Each run within an Experiment stores serialized (i.e. pickled) pipelines from the AutoML iterations. We can now retrieve the pipeline with the best performance on the validation dataset:

In [16]:
best_run, fitted_model = remote_run.get_output()
model_name = best_run.properties['model_name']

In [17]:
print(model_name)
print(best_run.properties)
print(fitted_model)

from helper import get_result_df
summary_df = get_result_df(remote_run)
print(summary_df)

AutoML1534bb1255
{'runTemplate': 'automl_child', 'pipeline_id': '__AutoML_Ensemble__', 'pipeline_spec': '{"pipeline_id":"__AutoML_Ensemble__","objects":[{"module":"azureml.train.automl.ensemble","class_name":"Ensemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'regression\',\'primary_metric\':\'normalized_mean_absolute_error\',\'debug_log\':\'azureml_automl.log\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':True,\'name\':\'test-automl-ojforecasting\',\'compute_target\':\'cpu-cluster\',\'subscription_id\':\'be8e48ab-94b2-4145-a6de-2104dc657912\',\'region\':\'eastus2\',\'time_column_name\':\'WeekStarting\',\'grain_column_names\':[\'Store\',\'Brand\'],\'max_horizon\':20,\'drop_column_names\':[\'logQuantity\'],\'spark_service\':None}","ensemble_run_id":"AutoML_1534bb12-57ff-4208-827d-20effa815c1d_5","experiment_name":"test-automl-ojforecasting","workspace_name":"WS_demo","subscription_id":"be8e48ab-94b2-4145-a6de-2104dc657912

## Forecasting

Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. First, we remove the target values from the test set:

In [None]:
# Work on a copy so that `test` keeps its target column and the cell can be re-run.
X_test = test.copy()
y_test = X_test.pop(target_column_name).values

In [None]:
X_test.head()

To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics, which are approximately constant for each store over the 20-week forecast horizon in the testing data.

In [None]:
# The featurized data, aligned to y, will also be returned.
# This contains the assumptions that were made in the forecast
# and helps align the forecast to the original data
y_predictions, X_trans = fitted_model.forecast(X_test)

If you are used to scikit-learn pipelines, perhaps you expected `predict(X_test)`. However, forecasting requires a more general interface that can also supply the past target `y` values. Please use `forecast(X, y)`, as `predict(X)` is reserved for internal purposes on forecasting models.

The [energy demand forecasting notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) demonstrates the use of the forecast function in more detail in the context of using lags and rolling window features. 

## Evaluate

To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, including the mean absolute percentage error (MAPE).
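
As a reference point for reading the scores, MAPE can be computed by hand as in the sketch below. It is a plain-Python illustration with hypothetical numbers, not the AutoML metrics module used in the next cells.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent; undefined when any actual value is zero."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("inputs must be non-empty")
    if any(t == 0 for t, _ in pairs):
        raise ValueError("MAPE is undefined for zero actual values")
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


mape = mean_absolute_percentage_error([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
print(round(mape, 2))
```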

It is a good practice to always align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows.

In [None]:
from forecasting_helper import align_outputs

df_all = align_outputs(y_predictions, X_trans, X_test, y_test, target_column_name)

In [None]:
from azureml.automl.core._vendor.automl.client.core.common import metrics
from matplotlib import pyplot as plt
from automl.client.core.common import constants

# use automl metrics module
scores = metrics.compute_metrics_regression(
    df_all['predicted'],
    df_all[target_column_name],
    list(constants.Metric.SCALAR_REGRESSION_SET),
    None, None, None)

print("[Test data scores]\n")
for key, value in scores.items():    
    print('{}:   {:.3f}'.format(key, value))
    
# Plot outputs
%matplotlib inline
test_pred = plt.scatter(df_all[target_column_name], df_all['predicted'], color='b')
test_test = plt.scatter(df_all[target_column_name], df_all[target_column_name], color='g')
plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)
plt.show()

## Operationalize

_Operationalization_ means getting the model into the cloud so that others can run it after you close the notebook. We will create a Docker container running on Azure Container Instances with the model.

In [18]:
description = 'AutoML OJ forecaster'
tags = None
model = remote_run.register_model(model_name = model_name, description = description, tags = tags)

In [19]:
print(remote_run.model_id)
print([model.name, ' ', model.run])

AutoML1534bb1255
['AutoML1534bb1255', ' ', Run(Experiment: test-automl-ojforecasting,
Id: AutoML_1534bb12-57ff-4208-827d-20effa815c1d_5,
Type: azureml.scriptrun,
Status: Completed)]


In [20]:
dir(model)

['Framework',
 '_SUPPORTED_FRAMEWORKS_FOR_NO_CODE_DEPLOY',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_auth',
 '_collect_model_artifact_paths',
 '_deploy_no_code',
 '_deploy_with_environment',
 '_deploy_with_environment_image_request',
 '_download_model_files',
 '_expected_payload_keys',
 '_get',
 '_get_asset',
 '_get_dataset',
 '_get_dataset_id',
 '_get_last_path_segment',
 '_get_latest_version',
 '_get_model_path_local',
 '_get_model_path_local_from_root',
 '_get_model_path_remote',
 '_get_sas_to_relative_download_path_map',
 '_get_strip_prefix',
 '_handle_packed_model_file',
 '_initialize',
 '_mms_endpoint',
 '_paths_in_scope',
 '_register_with_asset',
 '_reso

### Develop the scoring script

For the deployment we need a function which will run the forecast on serialized data. It can be obtained from the best_run.

In [None]:
script_file_name = 'score_fcast.py'
best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_file_name)

In [None]:
dir(ws)

### Deploy the model as a Web Service on Azure Container Instance

In [None]:
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import Webservice
from azureml.core.model import Model

inference_config = InferenceConfig(environment = best_run.get_environment(), 
                                   entry_script = script_file_name)

aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 2, 
                                               tags = {'type': "automl-forecasting"},
                                               description = "Automl forecasting sample service")

aci_service_name = 'automl-oj-forecast-01'
print(aci_service_name)
aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)
aci_service.wait_for_deployment(True)
print(aci_service.state)

In [None]:
aci_service.get_logs()

### Call the service

In [None]:
import json
X_query = X_test.copy()
# We have to convert datetime to string, because Timestamps cannot be serialized to JSON.
X_query[time_column_name] = X_query[time_column_name].astype(str)
# The payload is a dictionary that we serialize to a JSON string before sending it to the service.
# The section 'data' contains the data frame in the form of dictionary.
test_sample = json.dumps({'data': X_query.to_dict(orient='records')})
print(test_sample)
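
The need for the string conversion can be seen in isolation with the standard library. The record below is a hypothetical stand-in for one row of `X_query`; note that `isoformat()` produces a slightly different string than pandas' `astype(str)`, so this only illustrates the serialization issue.

```python
import json
from datetime import datetime

# One hypothetical record shaped like a row of the query data.
records = [{"WeekStarting": datetime(1992, 5, 14), "Store": 2,
            "Brand": "tropicana", "Price": 2.89}]

# datetime objects are not JSON serializable, so this attempt raises TypeError.
try:
    json.dumps({"data": records})
    raw_serializable = True
except TypeError:
    raw_serializable = False

# Converting the time column to strings first makes the payload serializable.
serializable = [dict(r, WeekStarting=r["WeekStarting"].isoformat()) for r in records]
payload = json.dumps({"data": serializable})
round_trip = json.loads(payload)
print(raw_serializable, round_trip["data"][0]["WeekStarting"])
```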

In [None]:
response = aci_service.run(input_data = test_sample)
# translate from networkese to datascientese
res_dict = None
try:
    res_dict = json.loads(response)
    y_fcst_all = pd.DataFrame(res_dict['index'])
    y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')
    y_fcst_all['forecast'] = res_dict['forecast']
except Exception:
    # Show the parsed result if available, otherwise the raw response, to help diagnose the failure.
    print(res_dict if res_dict is not None else response)

In [None]:
y_fcst_all.head()

### Delete the web service if desired

In [None]:
serv = Webservice(ws, 'automl-oj-forecast-01')
serv.delete()     # don't do it accidentally