Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

## Introduction to Azure Machine Learning service: Run experiment

In this example, you'll learn how to use Azure Machine Learning for experimentation. The concepts you'll learn about are workspace, experiment and run.

**Run** is an execution of Python code that does a machine learning task, such as training a model. Within a run, you can log metrics and upload results to the Azure cloud to keep track of your experimentation.
 
In this example, the run is a simple notebook cell, but in subsequent tutorials you can learn how to submit different kinds of runs - hyperparameter tuning, automated machine learning, distributed training - to scalable cloud compute.
 
**Experiment** is a collection of related runs. For example, if you train different models to solve the same problem, you can group the training runs under the same experiment, and later compare their results.
 
**Workspace** is an Azure resource that contains your experiments, models, deployments and cloud compute resources.
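To make the relationship between these three concepts concrete, here is a minimal in-memory sketch. The classes `ToyWorkspace`, `ToyExperiment`, and `ToyRun` are hypothetical stand-ins, not the Azure Machine Learning SDK. Their method names only mirror the ones this notebook calls later (`start_logging`, `upload_file`, `complete`), plus a `log` method for metrics. The sketch shows the containment: a workspace holds experiments, an experiment groups runs, and each run keeps its own metrics and files.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRun:
    """Hypothetical stand-in for a run: one execution that records metrics and files."""
    run_id: int
    metrics: dict = field(default_factory=dict)
    files: list = field(default_factory=list)
    status: str = "Running"

    def log(self, name, value):
        # Keep every logged value so a metric can be tracked over time
        self.metrics.setdefault(name, []).append(value)

    def upload_file(self, name):
        self.files.append(name)

    def complete(self):
        self.status = "Completed"


@dataclass
class ToyExperiment:
    """Hypothetical stand-in for an experiment: a named group of related runs."""
    name: str
    runs: list = field(default_factory=list)

    def start_logging(self):
        run = ToyRun(run_id=len(self.runs) + 1)
        self.runs.append(run)
        return run


@dataclass
class ToyWorkspace:
    """Hypothetical stand-in for a workspace: the container for experiments."""
    experiments: dict = field(default_factory=dict)

    def experiment(self, name):
        # Reuse an existing experiment so runs accumulate under one name
        if name not in self.experiments:
            self.experiments[name] = ToyExperiment(name)
        return self.experiments[name]


toy_ws = ToyWorkspace()
toy_exp = toy_ws.experiment("my-first-experiment")
toy_run = toy_exp.start_logging()
toy_run.log("accuracy", 0.9)
toy_run.upload_file("outputs/random_forest_model.pkl")
toy_run.complete()
```

Asking the workspace for the same experiment name again returns the same experiment, which is why repeated runs can later be compared side by side.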

To illustrate these concepts, we train a random forest classifier that predicts cow pregnancy from data stored in Azure Cosmos DB. You first connect to your workspace, create an experiment that will contain the training runs, and then launch a run, train the model, and upload it to run history.

First, let's import the Python packages and load your workspace. When you run *ws = Workspace.from_config* below, you will be prompted to log in to your Azure subscription. Once you are connected to your workspace in Azure cloud, you can start experimenting.

In [1]:
!pip install pydocumentdb

Collecting pydocumentdb
  Downloading https://files.pythonhosted.org/packages/cf/53/310ef5bd836e54f8a8c3d4da8c9a8c9b21c6bb362665e018eb27c41a1518/pydocumentdb-2.3.3.tar.gz (104kB)
Building wheels for collected packages: pydocumentdb
  Building wheel for pydocumentdb (setup.py) ... done
  Stored in directory: /home/nbuser/.cache/pip/wheels/12/75/87/f3728f9217c548355acf03cf1ad82bb3e439cefd5f23bdea8b
Successfully built pydocumentdb
Installing collected packages: pydocumentdb
Successfully installed pydocumentdb-2.3.3


In [2]:
from azureml.core import Workspace, Experiment, Run
import pickle

# pandas is used below to turn the Cosmos DB documents into a DataFrame
import pandas as pd

from pydocumentdb import document_client

Run the next cell and follow the prompt to use device login to connect to Azure. Ignore any warnings about failing to load or parse files.

In [3]:
ws = Workspace.from_config()

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FZNE9J6UF to authenticate.
Interactive authentication successfully completed.
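`Workspace.from_config()` reads the subscription ID, resource group, and workspace name from a `config.json` file. The sketch below is a simplified, hypothetical illustration of that lookup, not the SDK's implementation: it walks up from a starting directory, checks each directory and its `.azureml` subfolder for `config.json`, and returns the three settings. The function names `find_config` and `load_workspace_settings`, and the exact search order, are assumptions.

```python
import json
from pathlib import Path

# Keys a workspace config.json is expected to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def find_config(start=".", file_name="config.json"):
    """Walk from `start` up to the filesystem root, checking each directory and
    its .azureml subfolder. Simplified sketch of the lookup, not the SDK's code."""
    here = Path(start).resolve()
    for directory in [here, *here.parents]:
        for candidate in (directory / file_name, directory / ".azureml" / file_name):
            if candidate.is_file():
                return candidate
    raise FileNotFoundError(f"No {file_name} found at or above {here}")


def load_workspace_settings(start="."):
    """Return only the three workspace settings, failing loudly if any is missing."""
    config = json.loads(find_config(start).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {missing}")
    return {key: config[key] for key in REQUIRED_KEYS}
```

Searching parent directories lets notebooks in nested folders share one `config.json` placed at the project root.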


Let's create an experiment. The experiment is bound to a workspace, and contains the methods to launch runs.

In [4]:
experiment = Experiment(workspace = ws, name = "my-first-experiment")

Load the training data from Azure Cosmos DB:

In [5]:
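import os

uri = 'https://vivid.documents.azure.com:443/'
# Read the Cosmos DB master key from an environment variable instead of hard-coding
# it in the notebook (the variable name COSMOS_DB_KEY is an assumption)
key = os.environ['COSMOS_DB_KEY']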

client = document_client.DocumentClient(uri, {'masterKey': key})

db_id = 'taskDatabase'
db_query = "select * from r where r.id = '{0}'".format(db_id)
db = list(client.QueryDatabases(db_query))[0]
db_link = db['_self']

coll_id = 'MyCollection'
coll_query = "select * from r where r.id = '{0}'".format(coll_id)
coll = list(client.QueryCollections(db_link, coll_query))[0]
coll_link = coll['_self']

docs = client.ReadDocuments(coll_link)
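The cell above builds its SQL text with `str.format`, which splices the id straight into the query. A parameterized query spec keeps the value separate from the query text. As far as we know, pydocumentdb's `QueryDatabases` and `QueryCollections` accept either a query string or a dict with `"query"` and `"parameters"` keys; the helper `id_query_spec` below is hypothetical and only builds that dict, so treat the commented client calls as a sketch to check against the pydocumentdb documentation.

```python
def id_query_spec(resource_id):
    """Build a parameterized query spec that selects a resource by its id.

    The id travels as a parameter instead of being spliced into the SQL text,
    so quotes inside the value cannot change the query."""
    return {
        "query": "SELECT * FROM r WHERE r.id = @id",
        "parameters": [{"name": "@id", "value": resource_id}],
    }


db_spec = id_query_spec("taskDatabase")
coll_spec = id_query_spec("MyCollection")
# With a live client, the specs could be passed in place of the formatted strings:
#   db = list(client.QueryDatabases(db_spec))[0]
#   coll = list(client.QueryCollections(db['_self'], coll_spec))[0]
```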

In [34]:
# Build a DataFrame with one row per Cosmos DB document
df = pd.DataFrame(list(docs))

In [35]:
df = df[['daily_prod', 'ID', 'label', 'lact_d', 'lact_n', 'lbd_d']].drop_duplicates()
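Every Cosmos DB document carries its own unique `id` plus system properties such as `_rid`, `_self`, `_etag`, and `_ts`, so two documents with identical measurements never look like duplicates until those columns are projected away. That is why the selection above happens before `drop_duplicates()`. The sketch below reproduces the same select-then-deduplicate step with plain dictionaries; the function name `project_and_dedupe` and the sample documents are made up for illustration.

```python
COLUMNS = ["daily_prod", "ID", "label", "lact_d", "lact_n", "lbd_d"]


def project_and_dedupe(docs, columns=COLUMNS):
    """Keep only `columns` from each document and drop repeated rows, keeping the
    first occurrence (the default behaviour of df[columns].drop_duplicates())."""
    seen = set()
    rows = []
    for doc in docs:
        # A missing field becomes None here (pandas would use NaN)
        row = tuple(doc.get(col) for col in columns)
        if row not in seen:
            seen.add(row)
            rows.append(dict(zip(columns, row)))
    return rows


# Made-up sample documents: the first two differ only in Cosmos DB's id/_ts fields
sample_docs = [
    {"id": "a1", "_ts": 1, "daily_prod": 30.5, "ID": 7, "label": 1, "lact_d": 120, "lact_n": 2, "lbd_d": 40},
    {"id": "a2", "_ts": 2, "daily_prod": 30.5, "ID": 7, "label": 1, "lact_d": 120, "lact_n": 2, "lbd_d": 40},
    {"id": "a3", "_ts": 3, "daily_prod": 25.0, "ID": 9, "label": 0, "lact_d": 200, "lact_n": 1, "lbd_d": 90},
]
rows = project_and_dedupe(sample_docs)
```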

Next, let's start a run and train the model. Everything the run records, such as metrics and uploaded files, is kept in run history.

Let's also save the trained model as a file and upload that file into run history, so that it can later be registered as a model and deployed.

In [36]:
df.columns

Index(['daily_prod', 'ID', 'label', 'lact_d', 'lact_n', 'lbd_d'], dtype='object')

### Training the ML Model

In [39]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Start an interactive run; files uploaded to it are stored in run history
run = experiment.start_logging()

# Label to predict (a 1-D Series, as scikit-learn expects) and the feature columns
y = df['label']
x = df[['daily_prod', 'lact_d', 'lact_n', 'lbd_d']]

# Hold out 20% of the rows for testing
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.2)

clf = RandomForestClassifier(n_estimators=100, max_depth=100, random_state=0)
clf.fit(xTrain, yTrain)



RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=100, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None,
            oob_score=False, random_state=0, verbose=0, warm_start=False)
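`train_test_split(x, y, test_size=0.2)` shuffles the rows and holds out a fifth of them for testing. The stdlib sketch below mimics that behaviour so the mechanics are visible; `split_train_test` is a hypothetical helper, and rounding the test size up with `ceil` is our understanding of what scikit-learn does, not something the notebook states. Because the notebook calls `train_test_split` without `random_state`, each execution draws a different split and can report a slightly different score; passing a fixed `random_state` would make runs comparable.

```python
import math
import random


def split_train_test(rows, labels, test_size=0.2, seed=0):
    """Shuffle row positions with a fixed seed, then hold out test_size of them.

    Returns (rows_train, rows_test, labels_train, labels_test), the same order
    that train_test_split uses."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    positions = list(range(len(rows)))
    random.Random(seed).shuffle(positions)
    # Round the test size up so a non-zero fraction never yields an empty test set
    n_test = math.ceil(len(rows) * test_size)
    test_pos, train_pos = positions[:n_test], positions[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(rows, train_pos), pick(rows, test_pos), pick(labels, train_pos), pick(labels, test_pos)


# Made-up data: one feature per row; the label says whether the feature is odd
features = [[i] for i in range(10)]
labels = [i % 2 for i in range(10)]
f_train, f_test, l_train, l_test = split_train_test(features, labels, test_size=0.2, seed=42)
```

Rows and labels are indexed by the same shuffled positions, so each label stays attached to its row.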

### Testing the ML Model

In [40]:
clf.score(xTest, yTest)

0.9076949326053575
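For a classifier, scikit-learn's `score` returns the mean accuracy on the given test data: the fraction of test rows whose predicted label equals the true label. A minimal stdlib version of that calculation, with made-up labels, is sketched below. The closing comment suggests recording the score with `run.log` so it appears in run history; the notebook itself does not do this.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: 3 of the 4 predictions are correct
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # 0.75
# In the notebook, the real score could be recorded before run.complete():
#   run.log("accuracy", clf.score(xTest, yTest))
```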

In [41]:
# Save the trained model to a file and upload it into run history
with open("random_forest_model.pkl","wb") as f:
    pickle.dump(clf,f)
run.upload_file(name = 'outputs/random_forest_model.pkl', path_or_stream = './random_forest_model.pkl')

# Complete tracking and get link to details
run.complete()
print("Training Complete")

Training Complete
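The cell above saves the fitted model with `pickle` before uploading it. The sketch below shows the same round trip in memory with a hypothetical `MajorityClassModel` standing in for the random forest. One design consequence worth knowing: pickle stores a reference to the object's class rather than its code, so whatever loads `random_forest_model.pkl` later (for example, a scoring script) needs the class importable, which for this notebook means a compatible scikit-learn version. Only unpickle files you trust, since loading a pickle can execute code.

```python
import io
import pickle


class MajorityClassModel:
    """Hypothetical stand-in for a fitted classifier: predicts the most common training label."""

    def fit(self, labels):
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, rows):
        return [self.label_ for _ in rows]


model = MajorityClassModel().fit([1, 0, 1, 1])

# Serialize to an in-memory buffer (the notebook writes to a file instead) and restore
buffer = io.BytesIO()
pickle.dump(model, buffer)
buffer.seek(0)
restored = pickle.load(buffer)
```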


In [42]:
from azureml.core.model import Model

model = Model.register(model_path = "random_forest_model.pkl",
                       model_name = "cowPredictor",
                       description = "predicts cow pregnancy",
                       workspace = ws)

Registering model cowPredictor


Once the run has completed, you can view a detailed report of the run in the Azure portal by displaying "run" and following the link. The report also lists the uploaded model file.

In [43]:
run

Experiment,Id,Type,Status,Details Page,Docs Page
my-first-experiment,8269f522-50c7-48f0-aad7-b62fca25745f,,NotStarted,Link to Azure Portal,Link to Documentation


You can also view all runs within an experiment. If you run the training above multiple times, these runs will appear under the experiment view and you can compare them.

In [44]:
experiment

Name,Workspace,Report Page,Docs Page
my-first-experiment,mlWorkspace,Link to Azure Portal,Link to Documentation
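Comparing runs usually means picking the one with the best value of some metric. The stdlib sketch below does that over plain run records shaped as `{"id": ..., "metrics": {...}}`; the record shape, the `best_run` helper, and the metric values are all made up. With the SDK, similar records could probably be assembled from `experiment.get_runs()` and each run's `get_metrics()`, but that assumes the runs logged a metric such as `"accuracy"`, which the notebook above does not yet do.

```python
def best_run(runs, metric, higher_is_better=True):
    """Return the run record with the best value of `metric`.

    Runs that never logged the metric are skipped."""
    scored = [record for record in runs if metric in record["metrics"]]
    if not scored:
        raise ValueError(f"no run logged {metric!r}")

    def value(record):
        return record["metrics"][metric]

    return max(scored, key=value) if higher_is_better else min(scored, key=value)


# Made-up run records for illustration
runs = [
    {"id": "run-1", "metrics": {"accuracy": 0.88}},
    {"id": "run-2", "metrics": {"accuracy": 0.91}},
    {"id": "run-3", "metrics": {}},
]
best = best_run(runs, "accuracy")
```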


Next, learn how to deploy a web service using the following notebook:

[Deploy web service](02.deploy-web-service.ipynb)

For an example that uses scikit-learn and Azure compute to train an image classification model, see:

[tutorials/img-classification-part1-training](./tutorials/img-classification-part1-training.ipynb)