Hyperparameter tuning by using **Azure Machine Learning service** ([AML or AzureML](https://azure.microsoft.com/en-us/services/machine-learning-service/)).  

Specifically, we utilize TensorFlow's higher level Estimator API to build [wide-and-deep model](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html) for a movie recommendation scenario. While doing that, we try to search optimal hyperparameters via [AML hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters).

### Prerequisite

* azureml -- You can skip this if you already know what values of hyperparameters you want to use


For details about how to install and setup AML, see following materials:
- [AML quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
- [Train a TensorFlow model](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-tensorflow)
- [Hyperparameter tuning](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters)

In [1]:
import sys
sys.path.append("../../")

import os
import shutil

import tensorflow as tf

import azureml.core
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.dnn import TensorFlow
from azureml.train.hyperdrive import *
from azureml.widgets import RunDetails

from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_random_split

print("Azure ML SDK Version:", azureml.core.VERSION)
print("Tensorflow Version:", tf.__version__)

Azure ML SDK Version: 1.0.2
Tensorflow Version: 1.12.0


In [2]:
# top k items to recommend
TOP_K = 10

# Select Movielens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

### Model Hyperparameter Tuning via AML

This section assumes you already created a **Azure ML workspace** and have a `./aml_config/config.json` file to load the workspace from this notebook. If not, please follow instructions in the [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) to create a workspace and make a `./aml_config/config.json` file containing:
```
{
    "subscription_id": "your-subscription-id",
    "resource_group": "your-resource-group",
    "workspace_name": "your-workspace-name"
}
```
  
From the following cells, we will
1. Create a remote compute target (gpu-cluster) if it does not exist already,
2. Mount data store and upload the training set, and
3. Run a hyperparameter tuning experiment.

First, let's connect to the workspace.

In [3]:
# Connect to a workspace
ws = Workspace.from_config()
print("Workspace name: ", ws.name)

Found the config file in: C:\Users\jumin\git\Recommenders\notebooks\02_model\aml_config\config.json
Workspace name:  junmin-aml-workspace


Create a remote compute target

In [4]:
CLUSTER_NAME = 'gpu-cluster'

try:
    compute_target = ComputeTarget(workspace=ws, name=CLUSTER_NAME)
    print("Found existing compute target")
except ComputeTargetException:
    print("Creating a new compute target...")
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_NC6',
        vm_priority='lowpriority',
        min_nodes=1,
        max_nodes=4
    )
    # create the cluster
    compute_target = ComputeTarget.create(ws, CLUSTER_NAME, compute_config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

# Use the 'status' property to get a detailed status for the current cluster. 
print(compute_target.status.serialize())

# Check list of aml-computes
compute_targets = ws.compute_targets
for name, ct in compute_targets.items():
    print(name, ct.type, ct.provisioning_state)

Found existing compute target
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2018-12-31T20:55:27.147000+00:00', 'creationTime': '2018-12-29T20:07:54.814212+00:00', 'currentNodeCount': 1, 'errors': None, 'modifiedTime': '2018-12-29T20:08:23.921123+00:00', 'nodeStateCounts': {'idleNodeCount': 1, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 1, 'vmPriority': 'LowPriority', 'vmSize': 'STANDARD_NC6'}
gpu-cluster AmlCompute Succeeded


Prepare dataset
1. Download data and train/test split
2. Upload to storage

Next, upload the training set to the data store. This example uses the workspace's default **blob storage**.
  
We also prepare a training script [wide_deep_training.py](../../reco_utils/aml/wide_deep_training.py) for the hyperparameter tuning, which will log our target metrics (e.g. [RMSE](https://en.wikipedia.org/wiki/Root-mean-square_deviation)) to AML experiment so that we can track the metrics and optimize it via **hyperdrive**.

```
TODO - maybe attach a code snippet here for description
1. logging part

2. wide and deep model

```

In [5]:
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=['UserId','MovieId','Rating','Timestamp'],
    # TODO For now, not using genres YET
    load_genres=False
)
data.head()

train_df, test_df = python_random_split(data)

Unnamed: 0,UserId,MovieId,Rating,Timestamp
0,196,242,3.0,881250949
1,186,302,3.0,891717742
2,22,377,1.0,878887116
3,244,51,2.0,880606923
4,166,346,1.0,886397596


In [6]:
DATA_DIR = "./data"
os.makedirs(DATA_DIR, exist_ok=True)

TRAIN_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_train.pkl"

train_df.to_pickle(os.path.join(DATA_DIR, TRAIN_FILE_NAME))

# Note, all the files under DATA_DIR will be uploaded
ds = ws.get_default_datastore()
ds.upload(
    src_dir=DATA_DIR,
    target_path='data',
    overwrite=True,
    show_progress=True
)

$AZUREML_DATAREFERENCE_595956bc8be442fda5a5d6d5b52b1bb8

Prepare training script. All the script in the folder will be uploaded

In [13]:
SCRIPT_DIR = './aml_scripts'
ENTRY_SCRIPT_NAME = 'wide_deep_training.py'

os.makedirs(SCRIPT_DIR, exist_ok=True)
shutil.copy('../../reco_utils/aml/wide_deep_training.py', SCRIPT_DIR)
shutil.copy('../../reco_utils/common/tf_utils.py', SCRIPT_DIR)

'./aml_scripts\\tf_utils.py'

Now we define a search space for the hyperparameters. All the parameter values will be passed to the training script where they are parsed by `argparse`, e.g.:
```
TODO code snippet for argparse
```
    
AML hyperdrive provides some very useful searching strategies including `RandomParameterSampling`, `GridParameterSampling`, and `BayesianParameterSampling`. Details about each approach are beyond the scope of this notebook and you can find them from [Azure doc](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters). Here, we use the random sampling for simplicity. 

> Note: Currently, this repo accepts either 'rmse' or 'mae' for `METRICS` as implemented in [tf_utils.py](../../reco_utils/common/tf_utils.py), but you can define any custom metrics and utilize it along with AML hyperdrive.

In [14]:
EXP_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_wide_deep"
METRICS = 'rmse'

script_params = {
    '--datastore': ds.as_mount(),
    '--train-set-path': "data/" + TRAIN_FILE_NAME,
    '--user-col': 'UserId',
    '--item-col': 'MovieId',
    '--rating-col': 'Rating',
    '--timestamp-col': 'Timestamp',
#     '--item-feat-col': 'Genres',
#     '--item_feat-num': 11? check this
    # We fixed the batch size and epochs instead of search them
    '--batch-size': 64,
    '--epochs': 50,
    '--eval-metrics': METRICS,
}

hyper_params = {
    '--model-type': choice('wide', 'deep', 'wide-deep'),
    # Wide model hyperparameters
    '--linear-optimizer': choice('Ftrl', 'SGD'),
    '--linear-optimizer-lr': loguniform(-7, -2),
    # Deep model hyperparameters
    '--dnn-optimizer': choice('Adagrad', 'Adam'),
    '--dnn-optimizer-lr': loguniform(-7, -2),
    '--hidden-units': choice(
        [256, 256, 256, 128],
        [256, 128],
        [256, 64, 64, 256],
        [1024, 128, 32]
    ),
    '--user-embedding-dim': choice(4, 8, 16),
    '--item-embedding-dim': choice(8, 32, 128),
    '--dnn-dropout': uniform(0.0, 0.5),
    '--dnn-batch-norm': choice(True, False),
}

ps = RandomParameterSampling(hyper_params)

https://mlworkspace.azure.ai/portal/subscriptions/03909a66-bef8-4d52-8e9a-a346604e0902/resourceGroups/junmin-aml/providers/Microsoft.MachineLearningServices/workspaces/junmin-aml-workspace/experiments/movielens_100k_wide/runs/movielens_100k_wide_1546313676294


We use `azureml.train.dnn.TensorFlow`, a custom AML `Estimator` class which utilizes a preset docker image in the cluster (see more information from [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-tensorflow)).

Once you submit the experiment, you can see the progress from the notebook by using `azureml.widgets.RunDetails`. You can directly check the details from the Azure portal as well. To get the link, run `run.get_portal_url()`.

> Since we will do hyperparameter tuning, we create a `HyperDriveRunConfig` and pass it to the experiment object. If you already know what hyperparameters to use and still want to utilize AML for other purposes (e.g. model management), you can set the hyperparameter values directly to `script_params` and run the experiment, `run = exp.submit(est)`, instead.  

In [None]:
est = TensorFlow(
    source_directory=SCRIPT_DIR,
    entry_script=TRAINING_SCRIPT_NAME,
    script_params=script_params,
    compute_target=compute_target,
    use_gpu=True,
    pip_packages=['pandas']
)

# early termnination policy
policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

hd_config = HyperDriveRunConfig(
    estimator=est, 
    hyperparameter_sampling=ps,
    policy=policy,  
    primary_metric_name=METRICS,
    primary_metric_goal=PrimaryMetricGoal.MINIMIZE, 
    max_total_runs=20,
    max_concurrent_runs=4
)

# Create an experiment to track the runs in the workspace
exp = Experiment(workspace=ws, name=EXP_NAME)
run = exp.submit(config=hd_config)

print(run.get_portal_url())

In [15]:
RunDetails(run).show()
run.wait_for_completion(show_output=True)

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'NOTSE…

RunId: movielens_100k_wide_1546313676294





Execution Summary
RunId: movielens_100k_wide_1546313676294



{'runId': 'movielens_100k_wide_1546313676294',
 'target': 'gpu-cluster',
 'status': 'Completed',
 'endTimeUtc': '2019-01-01T03:55:10.000Z',
 'properties': {'primary_metric_config': '{"name": "rmse", "goal": "minimize"}',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive'},
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://junminamlworks3904856782.blob.core.windows.net/azureml/ExperimentRun/movielens_100k_wide_1546313676294/azureml-logs/hyperdrive.txt?sv=2018-03-28&sr=b&sig=0c2teKLgCV6v94pWHwGvoYSYOCt9gQiJNoFDaw0s54E%3D&st=2019-01-01T04%3A25%3A42Z&se=2019-01-01T12%3A35%3A42Z&sp=r'}}

In [None]:
# To stop, run.cancel()

### Test

To load a registered model in the future,
```
from azureml.core.model import Model

model = Model(ws, 'model_name')
```

In [16]:
MODEL_DIR = './model'

best_run = run.get_best_run_by_primary_metric()
# Check model files uploaded during the run
print(best_run.get_file_names())

# Register the model in the workspace so that can later query, examine, and deploy this model.
# TODO check model path...
model = best_run.register_model(model_name=MODEL_NAME, model_path='./outputs/model')
print(model.name, model.id, model.version)

# Download the model to local. (alternatively, run.download_file(name=f, output_file_path=output_file_path))
os.makedirs(MODEL_DIR, exist_ok=True)
model.download(target_dir=MODEL_DIR)






"""       
        


tf.reset_default_graph()

saver = tf.train.import_meta_graph("./model/mnist-tf.model.meta")
graph = tf.get_default_graph()

for op in graph.get_operations():
    if op.name.startswith('network'):
        print(op.name)

# input tensor. this is an array of 784 elements, each representing the intensity of a pixel in the digit image.
X = tf.get_default_graph().get_tensor_by_name("network/X:0")
# output tensor. this is an array of 10 elements, each representing the probability of predicted value of the digit.
output = tf.get_default_graph().get_tensor_by_name("network/output/MatMul:0")

with tf.Session() as sess:
    saver.restore(sess, './model/mnist-tf.model')
    k = output.eval(feed_dict={X : X_test})
# get the prediction, which is the index of the element that has the largest probability value.
y_hat = np.argmax(k, axis=1)

# print the first 30 labels and predictions
print('labels:  \t', y_test[:30])
print('predictions:\t', y_hat[:30])





# TODO...
model_root = Model.get_model_path('tf-dnn-mnist')
    saver = tf.train.import_meta_graph(os.path.join(model_root, 'mnist-tf.model.meta'))
    X = tf.get_default_graph().get_tensor_by_name("network/X:0")
    output = tf.get_default_graph().get_tensor_by_name("network/output/MatMul:0")
    
    sess = tf.Session()
    saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    out = output.eval(session=sess, feed_dict={X: data})
    y_hat = np.argmax(out, axis=1)
    return y_hat.tolist()

"""


Deprecated, use RunHistoryFacade.assets instead.


['azureml-logs/60_control_log.txt', 'azureml-logs/80_driver_log.txt', 'outputs/model/1546314741/saved_model.pb', 'outputs/model/1546314741/variables/variables.data-00000-of-00002', 'outputs/model/1546314741/variables/variables.data-00001-of-00002', 'outputs/model/1546314741/variables/variables.index', 'driver_log', 'azureml-logs/azureml.log', 'azureml-logs/55_batchai_execution.txt']
movielens_100k_wide movielens_100k_wide:1 1


'       \n        \n\n\ntf.reset_default_graph()\n\nsaver = tf.train.import_meta_graph("./model/mnist-tf.model.meta")\ngraph = tf.get_default_graph()\n\nfor op in graph.get_operations():\n    if op.name.startswith(\'network\'):\n        print(op.name)\n\n# input tensor. this is an array of 784 elements, each representing the intensity of a pixel in the digit image.\nX = tf.get_default_graph().get_tensor_by_name("network/X:0")\n# output tensor. this is an array of 10 elements, each representing the probability of predicted value of the digit.\noutput = tf.get_default_graph().get_tensor_by_name("network/output/MatMul:0")\n\nwith tf.Session() as sess:\n    saver.restore(sess, \'./model/mnist-tf.model\')\n    k = output.eval(feed_dict={X : X_test})\n# get the prediction, which is the index of the element that has the largest probability value.\ny_hat = np.argmax(k, axis=1)\n\n# print the first 30 labels and predictions\nprint(\'labels:  \t\', y_test[:30])\nprint(\'predictions:\t\', y_hat[:

In [33]:
from reco_utils.evaluation.python_evaluation import (
    rmse, mae, rsquared, exp_var,
    map_at_k, ndcg_at_k, precision_at_k, recall_at_k
)

# Prepare test data
X_test = test.copy()
y_test = X_test.pop('Rating')

# test_input_fn = tf.estimator.inputs.pandas_input_fn(
#     x=X_test,
#     num_epochs=1,
#     shuffle=False
# )


In [43]:
model = tf.contrib.predictor.from_saved_model(MODEL_DIR+"/model/1546314741")
# model = tf.contrib.estimator.SavedModelEstimator(MODEL_DIR+"/model/1546314741")

# Convert input data into serialized Example strings.
examples = []
for index, row in X_test.iterrows():
    feature = {}
    for col, value in row.iteritems():
        feature[col] = tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
    example = tf.train.Example(
        features=tf.train.Features(
            feature=feature
        )
    )
    examples.append(example.SerializeToString())

predictions = model({'inputs': examples})

INFO:tensorflow:Restoring parameters from ./model/model/1546314741\variables\variables


InvalidArgumentError: Name: <unknown>, Key: UserId, Index: 0.  Data types don't match. Expected type: string, Actual type: float
	 [[node ParseExample/ParseExample (defined at C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\contrib\predictor\saved_model_predictor.py:153)  = ParseExample[Ndense=0, Nsparse=2, Tdense=[], dense_shapes=[], sparse_types=[DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_input_example_tensor_0_0, ParseExample/ParseExample/names, ParseExample/ParseExample/sparse_keys_0, ParseExample/ParseExample/sparse_keys_1)]]

Caused by op 'ParseExample/ParseExample', defined at:
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
    app.start()
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\ipykernel\kernelapp.py", line 505, in start
    self.io_loop.start()
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tornado\platform\asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\asyncio\base_events.py", line 427, in run_forever
    self._run_once()
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\asyncio\base_events.py", line 1440, in _run_once
    handle._run()
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\asyncio\events.py", line 145, in _run
    self._callback(*self._args)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tornado\ioloop.py", line 758, in _run_callback
    ret = callback()
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tornado\stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tornado\gen.py", line 1233, in inner
    self.run()
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tornado\gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\ipykernel\kernelbase.py", line 357, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tornado\gen.py", line 326, in wrapper
    yielded = next(result)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\ipykernel\kernelbase.py", line 267, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tornado\gen.py", line 326, in wrapper
    yielded = next(result)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\ipykernel\kernelbase.py", line 534, in execute_request
    user_expressions, allow_stdin,
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tornado\gen.py", line 326, in wrapper
    yielded = next(result)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\ipykernel\ipkernel.py", line 294, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\ipykernel\zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\IPython\core\interactiveshell.py", line 2819, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\IPython\core\interactiveshell.py", line 2845, in _run_cell
    return runner(coro)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\IPython\core\async_helpers.py", line 67, in _pseudo_sync_runner
    coro.send(None)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\IPython\core\interactiveshell.py", line 3020, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\IPython\core\interactiveshell.py", line 3185, in run_ast_nodes
    if (yield from self.run_code(code, result)):
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-43-9b8a7e6335f6>", line 1, in <module>
    model = tf.contrib.predictor.from_saved_model(MODEL_DIR+"/model/1546314741")
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\contrib\predictor\predictor_factories.py", line 153, in from_saved_model
    config=config)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\contrib\predictor\saved_model_predictor.py", line 153, in __init__
    loader.load(self._session, tags.split(','), export_dir)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\saved_model\loader_impl.py", line 197, in load
    return loader.load(sess, tags, import_scope, **saver_kwargs)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\saved_model\loader_impl.py", line 350, in load
    **saver_kwargs)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\saved_model\loader_impl.py", line 278, in load_graph
    meta_graph_def, import_scope=import_scope, **saver_kwargs)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\training\saver.py", line 1696, in _import_meta_graph_with_return_elements
    **kwargs))
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\framework\meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\framework\importer.py", line 442, in import_graph_def
    _ProcessNewOps(graph)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\framework\importer.py", line 234, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\framework\ops.py", line 3440, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\framework\ops.py", line 3440, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\framework\ops.py", line 3299, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Name: <unknown>, Key: UserId, Index: 0.  Data types don't match. Expected type: string, Actual type: float
	 [[node ParseExample/ParseExample (defined at C:\Users\jumin\AppData\Local\Continuum\anaconda3\envs\aml\lib\site-packages\tensorflow\contrib\predictor\saved_model_predictor.py:153)  = ParseExample[Ndense=0, Nsparse=2, Tdense=[], dense_shapes=[], sparse_types=[DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_input_example_tensor_0_0, ParseExample/ParseExample/names, ParseExample/ParseExample/sparse_keys_0, ParseExample/ParseExample/sparse_keys_1)]]


In [32]:
import pandas as pd

# def predict_input_fn():
#     example = tf.train.Example()
#     example.features.feature['feature1'].bytes_list.value.extend(['yellow'])
#     example.features.feature['feature2'].float_list.value.extend([1.])
#     return {'inputs':tf.constant([example.SerializeToString()])}

# If all modes were exported, you can immediately evaluate and predict, or
# continue training. Otherwise only predict is available.
# See https://www.tensorflow.org/api_docs/python/tf/contrib/estimator/export_all_saved_models

# eval_results = model.evaluate(input_fn=input_fn, steps=1)
# print(eval_results)
# model.train(input_fn=input_fn, steps=20)



predictions = predict_fn(
    {"x": [[6.4, 3.2, 4.5, 1.5],
           [5.8, 3.1, 5.0, 1.7]]})
print(predictions['scores'])



pred_list = [p['predictions'][0] for p in list(model.predict(predict_input_fn))]
predictions = test.copy()
predictions['prediction']  = pd.Series(pred_list).values
print(predictions.head())

cols = {
    'col_user': 'UserId',
    'col_item': 'MovieId',
    'col_rating': 'Rating',
    'col_prediction': 'prediction'
}

predictions.drop('Rating', axis=1, inplace=True)

eval_rmse = rmse(test, predictions, **cols)
eval_mae = mae(test, predictions, **cols)
eval_rsquared = rsquared(test, predictions, **cols)
eval_exp_var = exp_var(test, predictions, **cols)

print("RMSE:\t\t%f" % eval_rmse,
      "MAE:\t\t%f" % eval_mae,
      "rsquared:\t%f" % eval_rsquared,
      "exp var:\t%f" % eval_exp_var, sep='\n')

# Load the downloaded model and test
# with tf.Session() as sess:
#     tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], MODEL_DIR)

# #     test_input_fn = tf.estimator.inputs.pandas_input_fn(
# #         x=X_test,
# #         y=y_test,
# #         batch_size=BATCH_SIZE,
# #         num_epochs=1,
# #         shuffle=False
# #     )
    
#     input_x_holder =sess.graph.get_operation_by_name("input_example_tensor").outputs[0]
# #check your dnn classifier txt pb to know which operation you should use.
# predictions_holder = sess.graph.get_operation_by_name("dnn/binary_logistic_head/predictions/probabilities").outputs[0]
    
#     predictor = tf.contrib.predictor.from_saved_model(MODEL_DIR)
#         model_input = tf.train.Example(features=tf.train.Features( feature={"words": tf.train.Feature(int64_list=tf.train.Int64List(value=features_test_set)) })) 
#         model_input = model_input.SerializeToString()
#         output_dict = predictor({"predictor_inputs":[model_input]})
#         y_predicted = output_dict["pred_output_classes"][0]
#         output_dict['scores']

#         input_tensor=tf.get_default_graph().get_tensor_by_name("input_tensors:0")
#         model_input=input_tensor.SerializeToString()        
#         output_dict= predictor({"inputs":[model_input]})
        

        

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\jumin\\AppData\\Local\\Temp\\tmptw9dk7nt', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000228576F9EB8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Checking available modes for SavedModelEstimator.
INFO:tensorflow:Available modes for 

TypeError: 'yellow' has type str, but expected one of: bytes

In [None]:
# Clean-up resources
ws.delete(delete_dependent_resources=True)

# optionally, delete the Azure Managed Compute cluster
compute_target.delete()

# Clean-up temporal local-copy of script, model and data files
shutil.rmtree(SCRIPT_DIR)
shutil.rmtree(DATA_DIR)
shutil.rmtree(MODEL_DIR)

### References

* [Fine-tune natural language processing models using Azure Machine Learning service](https://azure.microsoft.com/en-us/blog/fine-tune-natural-language-processing-models-using-azure-machine-learning-service/)
* [Training, hyperparameter tune, and deploy with TensorFlow](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb)
