# Exercise 3 - Getting Started with Experiments

So far, you've spent a lot of time setting up your Azure ML workspace, and you may be wondering what benefits it will bring to your day-to-day data science work. There are many, and later exercises explore them in detail. For now, let's see how Azure ML can track metrics from a simple experiment that uses Python code to examine some data.

> **Important**: This exercise assumes you have completed the previous exercises in this lab - specifically, you must have:
>
> - Created an Azure ML Workspace.
>
> If you haven't done that, go back and do it now - we'll wait!

## Task 1: Connect to Your Workspace

The first thing you need to do is to connect to your workspace using the Azure ML SDK. Let's start by ensuring you still have the latest version installed.

In [None]:
!pip install --upgrade azureml-sdk[notebooks]

import azureml.core
print("Ready to use Azure ML", azureml.core.VERSION)
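
If you'd like to confirm which packages are installed before importing them, the following minimal sketch uses only the Python standard library (Python 3.8 or later). The helper name `installed_version` is hypothetical, and it assumes `azureml-core` is the distribution that provides `azureml.core`.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string for a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# "azureml-core" is assumed to be the distribution behind azureml.core
for package in ("azureml-core", "pandas", "matplotlib"):
    print(package, "->", installed_version(package) or "not installed")
```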

Now you're ready to connect to your workspace.

> **Note**: If the authenticated session with your Azure subscription has expired since you completed the previous exercise, you'll be prompted to reauthenticate.

In [None]:
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to work with', ws.name)
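
If loading the workspace fails, it can help to check the saved config file first. The sketch below assumes the file is JSON and contains the keys `subscription_id`, `resource_group`, and `workspace_name`; the helper `missing_config_keys` is hypothetical, and the demonstration uses a throwaway file with placeholder values rather than your real configuration.

```python
import json
import os
import tempfile

# Keys the workspace config file is assumed to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def missing_config_keys(config_path):
    """Return the required keys that are absent or empty in a config file."""
    with open(config_path, encoding="utf-8") as config_file:
        config = json.load(config_file)
    return [key for key in REQUIRED_KEYS if not config.get(key)]

# Demonstrate with a temporary file; the values are placeholders, not real IDs
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "config.json")
    with open(sample_path, "w", encoding="utf-8") as sample_file:
        json.dump({"subscription_id": "<subscription-id>",
                   "resource_group": "<resource-group>"}, sample_file)
    demo_missing = missing_config_keys(sample_path)
    print("Missing keys:", demo_missing)
```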

## Task 2: Run an Experiment

One of the most fundamental tasks that data scientists need to perform is to create and run experiments that process and analyze data. In this exercise, you'll learn how to use an Azure ML *experiment* to run Python code and record values extracted from data. In this case, you'll use a simple dataset that contains details of patients who have been tested for diabetes. You'll run an experiment to explore the data, extracting statistics, visualizations, and data samples. Most of the code you'll use is fairly generic Python, such as you might run in any data exploration process. However, with the addition of a few lines, the code uses an Azure ML *experiment* to log details of the run.

In [None]:
from azureml.core import Experiment, Run
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 

# Create an Azure ML experiment in your workspace
experiment_name = 'diabetes-experiment'
experiment = Experiment(workspace = ws, name = experiment_name)
print("Starting experiment:", experiment.name)

# Start logging data from the experiment
run = experiment.start_logging()

# load the diabetes dataset from a local file
data = pd.read_csv('data/diabetes.csv')

# Count the rows and log the result
row_count = (len(data))
run.log("observations", row_count)

# Create box plots for each feature variable by the "Diabetic" label and log them
num_cols = data.columns[:-1]
for col in num_cols:
    fig = plt.figure(figsize=(9, 6))
    ax = fig.gca()
    data.boxplot(column = col, by = "Diabetic", ax = ax)
    ax.set_title(col + ' by Diabetic')
    ax.set_ylabel(col)
    run.log_image(name = col, plot = fig)
plt.show()

# Create a list of mean diabetes pedigree per age and log it
mean_by_age = data[["Age", "DiabetesPedigree"]].groupby(["Age"]).mean().reset_index()
ages = mean_by_age["Age"].tolist()
pedigrees = mean_by_age["DiabetesPedigree"].tolist()
for index in range(len(ages)):
    run.log_row("Mean Diabetes Pedigree by Age", Age = ages[index], Diabetes_Pedigree = pedigrees[index])

# Save a sample of the data and upload it to the experiment output
data.sample(100).to_csv("sample.csv", index=False, header=True)
run.upload_file(name = 'outputs/sample.csv', path_or_stream = './sample.csv')

# Complete tracking for the run
run.complete()
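
To make the "mean diabetes pedigree per age" step more concrete, here is a dependency-free sketch of the same grouping logic. The rows are made up for illustration only (the real values come from `diabetes.csv`), and each resulting dictionary corresponds to the keyword arguments of one `run.log_row` call.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; they are not taken from diabetes.csv
rows = [
    {"Age": 21, "DiabetesPedigree": 0.25},
    {"Age": 21, "DiabetesPedigree": 0.75},
    {"Age": 35, "DiabetesPedigree": 1.0},
]

# Collect the pedigree values observed for each age
pedigrees_by_age = defaultdict(list)
for row in rows:
    pedigrees_by_age[row["Age"]].append(row["DiabetesPedigree"])

# One entry per age, ordered by age, holding the mean pedigree for that age
mean_rows = [
    {"Age": age, "Diabetes_Pedigree": mean(values)}
    for age, values in sorted(pedigrees_by_age.items())
]
for mean_row in mean_rows:
    print(mean_row)
```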

## Task 2 (continued): View Experiment Results

After the experiment has finished running, you can use the **run** object to get information about the run and its outputs:

In [None]:
import json

# Get run details
details = run.get_details()
print(details)

# Get logged metrics
metrics = run.get_metrics()
print(json.dumps(metrics, indent=2))

# Get output files
files = run.get_file_names()
print(json.dumps(files, indent=2))
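
The list of file names can be long, so you may want to narrow it to a single folder. This is a minimal sketch that works on any list of strings. The helper `files_in_folder` is hypothetical, and apart from `outputs/sample.csv` the example names are invented for illustration.

```python
def files_in_folder(file_names, folder="outputs"):
    """Return only the names that sit under the given folder prefix."""
    prefix = folder.rstrip("/") + "/"
    return [name for name in file_names if name.startswith(prefix)]

# Hypothetical file list; with a real run you would pass run.get_file_names()
example_names = ["logs/example.log", "outputs/sample.csv", "example_plot.png"]
output_files = files_in_folder(example_names)
print(output_files)
```

With a real run, you could then pass each of these names to `run.download_file` to fetch a local copy.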

Experiments are recorded in your workspace, with a record for each run of the experiment. To see all of the experiments and their runs, you can use the following code.

In [None]:
for experiment_id in ws.experiments:
    experiment = ws.experiments[experiment_id]
    print("Experiment:", experiment_id)
    for experiment_run in experiment.get_runs():
        print("\tRun ID:", experiment_run.id)
        print("\tName:", experiment_run.name)
        print("\tType:", experiment_run.type)
        print("\tStatus:",experiment_run.status, "\n")

You can also view details of experiment runs in the [Azure ML Studio web interface](https://ml.azure.com). Go there now and explore the details for this run on the **Experiments** tab. When you view the details for the run, you can see metrics that were logged, the images that were created by plots, and output values that were generated.

## Task 3: Run an Experiment Script

In the previous example, you ran an experiment inline in this notebook. A more flexible solution is to create a separate script for the experiment, and store it in a folder along with any other files it needs, and then use Azure ML to run the experiment based on the script in the folder.

Let's create a folder for the experiment:

In [None]:
import os, shutil

# Name of the experiment (also used as the folder name)
experiment_name = 'diabetes-experiment'

# Create a folder for the experiment files
experiment_folder = './' + experiment_name
os.makedirs(experiment_folder, exist_ok=True)

# Copy the data file into the experiment folder
shutil.copy('data/diabetes.csv', os.path.join(experiment_folder, "diabetes.csv"))
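
If you expect to re-run this setup often, the same steps can be wrapped in a small function. The sketch below is one possible shape, not part of the lab: `prepare_experiment_folder` is a hypothetical helper, and the demonstration runs in a temporary directory with a tiny stand-in CSV so it doesn't touch your real files.

```python
import os
import shutil
import tempfile

def prepare_experiment_folder(folder, source_files):
    """Create the folder if needed and copy each source file into it."""
    os.makedirs(folder, exist_ok=True)
    copied = []
    for source in source_files:
        target = os.path.join(folder, os.path.basename(source))
        shutil.copy(source, target)  # overwrites an existing copy
        copied.append(target)
    return copied

# Demonstrate in a temporary directory with a stand-in data file
with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = os.path.join(tmp_dir, "diabetes.csv")
    with open(data_path, "w", encoding="utf-8") as data_file:
        data_file.write("Age,Diabetic\n21,0\n")
    demo_folder = os.path.join(tmp_dir, "diabetes-experiment")
    first_copy = prepare_experiment_folder(demo_folder, [data_path])
    second_copy = prepare_experiment_folder(demo_folder, [data_path])  # safe to repeat
    demo_listing = sorted(os.listdir(demo_folder))
    print(demo_listing)
```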

Now we'll create a Python script containing the experiment code. Note that this code is the same as the inline code used before, except we use the `Run.get_context()` method to retrieve the experiment run context when the script is run, and we load the diabetes data from the folder where the script is located:

In [None]:
%%writefile $experiment_folder/diabetes_experiment.py
from azureml.core import Run
import pandas as pd
import matplotlib.pyplot as plt

# Get the experiment run context
run = Run.get_context()

# load the diabetes dataset
data = pd.read_csv('./diabetes.csv')

# Count the rows and log the result
row_count = (len(data))
run.log("observations", row_count)

# Create box plots for each feature variable by the "Diabetic" label and log them
num_cols = data.columns[:-1]
for col in num_cols:
    fig = plt.figure(figsize=(9, 6))
    ax = fig.gca()
    data.boxplot(column = col, by = "Diabetic", ax = ax)
    ax.set_title(col + ' by Diabetic')
    ax.set_ylabel(col)
    run.log_image(name = col, plot = fig)
plt.show()

# Create a list of mean diabetes pedigree per age and log it
mean_by_age = data[["Age", "DiabetesPedigree"]].groupby(["Age"]).mean().reset_index()
ages = mean_by_age["Age"].tolist()
pedigrees = mean_by_age["DiabetesPedigree"].tolist()
for index in range(len(ages)):
    run.log_row("Mean Diabetes Pedigree by Age", Age = ages[index], Diabetes_Pedigree = pedigrees[index])

# Save a sample of the data and upload it to the experiment output
data.sample(100).to_csv("sample.csv", index=False, header=True)
run.upload_file(name = 'outputs/sample.csv', path_or_stream = './sample.csv')

# Complete tracking for the run
run.complete()
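
Because the script only talks to Azure ML through the `run` object, you can try out its logging logic without a workspace by passing in a stand-in. The sketch below is purely illustrative: `LocalRunStub` and `log_summary` are hypothetical names, and the stub only mimics the calling pattern of `log` and `log_row`. It makes no claim about how the real service stores or returns metrics.

```python
class LocalRunStub:
    """Hypothetical stand-in that keeps logged values in memory."""

    def __init__(self):
        self.metrics = {}  # metric name -> list of logged values
        self.tables = {}   # table name -> list of row dictionaries

    def log(self, name, value):
        self.metrics.setdefault(name, []).append(value)

    def log_row(self, name, **columns):
        self.tables.setdefault(name, []).append(columns)

def log_summary(run, row_count, ages, pedigrees):
    """Log the row count and one table row per age, as the script does."""
    run.log("observations", row_count)
    for age, pedigree in zip(ages, pedigrees):
        run.log_row("Mean Diabetes Pedigree by Age",
                    Age=age, Diabetes_Pedigree=pedigree)

# Made-up values for illustration only
stub = LocalRunStub()
log_summary(stub, 3, [21, 35], [0.5, 1.0])
print(stub.metrics)
print(stub.tables)
```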

Now you're almost ready to run the experiment. There are just a few configuration issues you need to deal with:

1. Create a *Run Configuration* that defines the Python code execution environment for the script - in this case, you'll create a new Conda environment that includes the required packages.
2. Create a *Script Configuration* that identifies the Python script file to be run in the experiment, and the environment in which to run it.

The following cell sets up these configuration objects, and then submits the experiment.

In [None]:
from azureml.core import Environment, Experiment, RunConfiguration, ScriptRunConfig
from azureml.core.conda_dependencies import CondaDependencies

# create a new RunConfig object
run_config = RunConfiguration()

# Create a Python environment for the experiment
diabetes_env = Environment("diabetes-experiment-env")
diabetes_env.python.user_managed_dependencies = False # We'll let Azure ML manage dependencies
diabetes_env.docker.enabled = False # Don't use a docker container (default is true)

# Create a set of package dependencies (conda or pip as required)
diabetes_conda = CondaDependencies.create(conda_packages=['pandas','ipykernel','matplotlib'],
                                          pip_packages=['azureml-sdk','argparse','pyarrow'])

# Add the dependencies to the environment
diabetes_env.python.conda_dependencies = diabetes_conda

# Use this python environment in the run config
run_config.environment = diabetes_env

# Create a script config
src = ScriptRunConfig(source_directory=experiment_folder, 
                      script='diabetes_experiment.py',
                      run_config=run_config) 

# submit the experiment
experiment = Experiment(workspace = ws, name = experiment_name)
run = experiment.submit(config=src)
run.wait_for_completion(show_output=True)

As before, you can write code to retrieve the metrics and files generated by the run:

In [None]:
import json

# Get logged metrics
metrics = run.get_metrics()
print(json.dumps(metrics, indent=2))

# Get output files
files = run.get_file_names()
print(json.dumps(files, indent=2))

At any time, if you know the name of the experiment, you can get its list of runs (in reverse chronological order, so the most recent run is listed first).

In [None]:
experiment = ws.experiments[experiment_name]
print("Experiment:", experiment.name)
for experiment_run in experiment.get_runs():
    print("\tRun ID:", experiment_run.id)
    print("\tName:", experiment_run.name)
    print("\tType:", experiment_run.type)
    print("\tStatus:",experiment_run.status, "\n")
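
When an experiment has many runs, a summary can be easier to read than the full listing. This minimal sketch counts runs by status and picks the first run in the sequence. It works with any objects that expose a `status` attribute, so it is demonstrated here with made-up `FakeRun` records rather than real runs. Both helper names are hypothetical.

```python
from collections import Counter, namedtuple

def count_by_status(runs):
    """Count how many runs are in each status."""
    return Counter(run.status for run in runs)

def first_run(runs):
    """Return the first run in the sequence, or None if there are no runs."""
    return next(iter(runs), None)

# Made-up records standing in for the objects returned by experiment.get_runs()
FakeRun = namedtuple("FakeRun", ["id", "status"])
sample_runs = [
    FakeRun("run-3", "Completed"),
    FakeRun("run-2", "Failed"),
    FakeRun("run-1", "Completed"),
]

status_counts = count_by_status(sample_runs)
latest = first_run(sample_runs)
print(dict(status_counts))
print("First listed run:", latest.id)
```

With a real experiment, you could pass `experiment.get_runs()` to either helper. Given the ordering described above, the first run listed would be the most recent one.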

Of course, you can also use the [Azure ML Studio web interface](https://ml.azure.com) to view the outputs generated by the latest run of the experiment.

> **More Information**: To find out more about running experiments, see [this topic](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-runs) in the Azure ML documentation. For details of how to log metrics in a run, see [this topic](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-track-experiments).