# Connect to your workspace

In [10]:
from azureml.core import Workspace, Experiment, Run

# Fill in the name, subscription ID, and resource group of your workspace.
ws = Workspace.get(name='',
                   subscription_id='',
                   resource_group='')
print(ws)

For input data, you will use a dataset built into scikit-learn's sklearn.datasets module that contains sample diabetes data. The columns include age, sex, body mass index, average blood pressure, and six blood serum measurements for 442 diabetes patients, as well as a quantitative measure of disease progression. The goal is to build a model that predicts disease progression one year later from these input variables.

In [2]:
# Load the diabetes dataset, a well-known built-in small dataset that comes with scikit-learn.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y = True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

print('Done')

Done
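To make the 80/20 split concrete: with `test_size = 0.2`, scikit-learn rounds the test count up, so the 442 rows become 353 training rows and 89 test rows. The following is a minimal standard-library sketch of a shuffled hold-out split under that sizing rule. `split_indices` is a hypothetical helper, and its shuffle will not reproduce the exact rows that `train_test_split(..., random_state = 0)` selects.

```python
import math
import random


def split_indices(n_samples, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle row indices and hold out a test fraction.

    The test count is rounded up, mirroring how scikit-learn sizes a float
    test_size. The shuffle order does NOT match scikit-learn's random_state.
    """
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # First n_test shuffled indices go to the test set, the rest to training.
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(442)
print(len(train_idx), len(test_idx))
```

Every row lands in exactly one of the two lists, which is the property that keeps the test set unseen during training.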


Now, you'll create an experiment in this workspace by using the following code:

In [3]:
from azureml.core.experiment import Experiment

experiment = Experiment(workspace = ws, name = "my-third-experiment")

print('Done')

Done


# Define the machine learning objective and constraints
The first step is to define the machine learning objective by using AutoMLConfig, as illustrated in the following code:

In [4]:
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
import logging

automl_config = AutoMLConfig(task = 'regression',
                             iteration_timeout_minutes = 10,
                             iterations = 3,
                             primary_metric = 'spearman_correlation',
                             n_cross_validations = 5,
                             debug_log = 'automl.log',
                             verbosity = logging.INFO,
                             X = X_train,
                             y = y_train)

print('Done')



Done


- `task`: The type of model you need, such as classification, regression, or forecasting. After you specify the type, AutoML tries several algorithms suited to that task and reports the one that scores best.
- `primary_metric`: The metric that AutoML optimizes. For regression, the available metrics include normalized_root_mean_squared_error, r2_score, normalized_mean_absolute_error, and spearman_correlation. This example uses the Spearman correlation, which measures how closely the ranking of the predicted values agrees with the ranking of the actual values (1 means a perfectly increasing relationship). For more information about how these metrics work, refer to the documentation.
- `iterations` and `iteration_timeout_minutes`: `iterations` is the number of model pipelines AutoML tries, and `iteration_timeout_minutes` is the time limit for each one.
- `n_cross_validations`: The number of cross-validation splits used to score each pipeline.
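The Spearman correlation is the Pearson correlation computed on ranks instead of raw values, so any strictly increasing relationship scores 1 even when it is not linear. The standard-library sketch below illustrates that definition, with tied values receiving the average of their rank positions. It shows only what the metric means; it is not AutoML's own implementation, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(actual, predicted):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(actual), average_ranks(predicted))


# Predictions that preserve the ordering score 1, even though 1000 is far off.
print(spearman([1, 2, 3, 4], [10, 20, 30, 1000]))
```

Because only the ordering matters, a model optimized for this metric is rewarded for ranking patients by progression correctly rather than for matching exact values.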

After the objective and the constraints are defined, you can start the AutoML job as illustrated in the following code:



In [5]:
local_run = experiment.submit(automl_config, show_output = True)

Running on local machine
Parent Run ID: AutoML_58f4203d-036a-4a63-8b0e-c5013275f9ad
Current status: DatasetCrossValidationSplit. Generating CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************

 ITERATION   PIPELINE                                       DURATION      METRIC      BEST
         0   StandardScalerWrapper RandomForest             0:03:31       0.6987    0.6987
         1   MinMaxScaler RandomForest                      0:01:39       0.6689    0.6987
         2   StandardScalerWrapper ExtremeRandomTrees   
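The log reports that AutoML generated cross-validation splits before model selection. With `n_cross_validations = 5`, each pipeline is trained and scored five times, each time holding out a different fold. The following standard-library sketch shows unshuffled 5-fold splitting over the 353 rows of X_train, assuming the same fold-sizing rule as scikit-learn's KFold. AutoML's internal splitter may shuffle or size folds differently, and `kfold_indices` is a hypothetical helper.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train, validation) index lists for unshuffled k-fold splits.

    The first n_samples % n_splits folds get one extra sample, the same
    sizing rule scikit-learn's KFold uses.
    """
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        # Everything outside the validation fold is used for training.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


for train, validation in kfold_indices(353, n_splits=5):
    print(len(train), len(validation))
```

Each row is used for validation exactly once, so the reported score for a pipeline is an average over five held-out folds rather than a single split.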

Azure Machine Learning provides a widget that displays information about each run. The following code shows the widget, which lets you compare the models produced by different iterations:

In [6]:
from azureml.widgets import RunDetails
RunDetails(local_run).show()


The best model can also be retrieved by running the following code:

In [7]:
best_run, fitted_model = local_run.get_output()
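Conceptually, picking the best model means choosing the iteration with the best primary-metric score, which is what the BEST column in the log tracks. The selection inside `get_output()` happens in the SDK. The sketch below only illustrates the idea using the two iterations that completed in the log above; iteration 2 is cut off there and is left out. `pick_best` is a hypothetical helper.

```python
# Scores copied from the two completed iterations in the run log.
iteration_scores = {
    0: ("StandardScalerWrapper RandomForest", 0.6987),
    1: ("MinMaxScaler RandomForest", 0.6689),
}


def pick_best(scores, higher_is_better=True):
    """Hypothetical helper: return the iteration id with the best metric value."""
    chooser = max if higher_is_better else min
    return chooser(scores, key=lambda it: scores[it][1])


# Spearman correlation is higher-is-better.
best_iteration = pick_best(iteration_scores)
print(best_iteration, iteration_scores[best_iteration][0])
```

The `higher_is_better` flag matters because error metrics such as normalized_root_mean_squared_error are minimized, while correlation-style metrics such as spearman_correlation are maximized.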

After retrieving the best model, you can evaluate it on the test dataset. The following code uses the best model to predict on both the training set and the test set and computes the residuals (actual value minus predicted value) for each:

In [9]:
y_pred_train = fitted_model.predict(X_train)
y_residual_train = y_train - y_pred_train
y_pred_test = fitted_model.predict(X_test)
y_residual_test = y_test - y_pred_test
print(y_residual_test)

[  97.19671669  -15.15172845  -53.93898676  -49.81284229    5.44726459
   29.86618584   84.21555425   18.17496002    1.67195375 -110.10127532
   91.91938217   -1.46857573   20.41700789  -23.73187974   11.36573458
  -54.36394968  -49.50806459   -7.07266142    5.33702158  -62.53630844
  -19.44735156  132.22774529  -54.70782811    3.72244505 -134.92211343
  -56.49646138  -10.6338372     2.22312656   29.38874177   22.14826287
   50.54645787   92.43167524   52.53500509   59.47562794   60.56860968
  105.82221821    5.64666233  103.98551623  -58.52781448  -13.1783865
  -17.66515124   16.49264761  -47.53251754   47.53641858   52.86557638
    4.88610822   86.33228744   46.71475932  -37.65412123   38.75295161
   23.8639521    48.4751136   -51.77917254  -48.28487058  -32.60494388
  -14.06774424  -71.89026679  -31.30712515   87.09829476   58.33390109
 -164.64332027  -23.09591566  -28.64990565   -4.45302602 -107.58111499
  164.43429408  -45.80824169   95.48015989  -68.44223975  -47.24245174
   12.1
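A printed residual array is hard to read directly, so it often helps to reduce it to a few summary numbers. The standard-library sketch below computes the mean residual (a value far from zero suggests systematic over- or under-prediction), the mean absolute error, and the root mean squared error. `residual_summary` is a hypothetical helper, and the input values are toy numbers for illustration, not the residuals printed above.

```python
import math


def residual_summary(residuals):
    """Summarize residuals (actual - predicted) as mean, MAE, and RMSE."""
    n = len(residuals)
    mean = sum(residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return {"mean": mean, "mae": mae, "rmse": rmse}


# Toy residuals for illustration only.
summary = residual_summary([3.0, -4.0, 1.0])
print(summary)
```

In the notebook, passing `y_residual_test.tolist()` (and likewise `y_residual_train.tolist()`) would let you compare training and test error; a test error much larger than the training error is a sign of overfitting.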