# Introduction to Azure ML service
<br>

<img src='https://github.com/retkowsky/images/blob/master/AzureMLservicebanniere.png?raw=true'>


> Documentation: https://docs.microsoft.com/en-us/azure/machine-learning/

## 0. Setup

In [1]:
import sys
sys.version

'3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) \n[GCC 7.3.0]'

In [2]:
import datetime
now = datetime.datetime.now()
print(now)

2019-12-04 14:59:55.668204


In [3]:
import azureml.core
from azureml.core import Experiment, Workspace

# Check core SDK version number
print("Azure ML service version: ", azureml.core.VERSION)

Azure ML service version:  1.0.74


In [4]:
# Display the Azure ML service workspace details
ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Resource group: ' + ws.resource_group, sep='\n')

Workspace name: workshopml
Azure region: northeurope
Resource group: workshopmlRG
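`Workspace.from_config()` reads the workspace coordinates from a `config.json` file, which it typically looks for in the current directory or in a `.azureml` subfolder. As a minimal sketch of what that file contains, the snippet below builds the JSON text. The workspace and resource group names are taken from the output above; the subscription ID is a placeholder to replace with your own.

```python
import json

# Keys read from config.json by Workspace.from_config().
# The subscription ID below is a placeholder, not a real value.
workspace_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "workshopmlRG",
    "workspace_name": "workshopml",
}

# Serialize to the text you would save as config.json
config_text = json.dumps(workspace_config, indent=4)
print(config_text)
```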


## 1. Loading the data

In [9]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

In [10]:
X, y = load_diabetes(return_X_y=True)

columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

In [11]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

data = {
    "train":{"X": X_train, "y": y_train},        
    "test":{"X": X_test, "y": y_test}
}

In [12]:
print("Training =", len(data['train']['X']), 'observations')

Training = 353 observations


In [13]:
print("Test =", len(data['test']['X']), 'observations')

Test = 89 observations
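The 353/89 split follows from the dataset size: the two subsets add up to 442 samples, and with a fractional `test_size` scikit-learn rounds the test count up and gives the remainder to the training set. Here is a quick check of that arithmetic. The rounding rule is an assumption about scikit-learn's internal behavior, stated for illustration.

```python
import math

n_samples = 442      # 353 + 89, the two sizes printed above
test_size = 0.2

n_test = math.ceil(test_size * n_samples)   # 88.4 is rounded up
n_train = n_samples - n_test                # the rest goes to training

print(n_train, n_test)  # 353 89
```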


## 2. Modeling

We will fit a **Ridge** regression model.<br> 
<img src='https://github.com/retkowsky/images/blob/master/ridge.png?raw=true'>
<br>
Ridge is a regularized version of linear regression.
It fits the data while keeping the model's weight coefficients as small as possible.
- If the regularization parameter (`alpha`) is 0, the model reduces to an ordinary linear regression.
- If the parameter is very large, the weight coefficients shrink toward 0, and the fit approaches a horizontal line through the mean of the data.
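The two limiting cases can be checked numerically. The sketch below is not the scikit-learn implementation: it assumes the standard closed-form ridge solution with an unpenalized intercept, applied to synthetic data generated only for this illustration. The helper `ridge_fit` and the `*_demo` variables are hypothetical names.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean            # center features and target
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic data for the demonstration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=50)

# alpha = 0: same coefficients as ordinary least squares
coef_ols, intercept_ols = ridge_fit(X_demo, y_demo, alpha=0.0)

# Very large alpha: coefficients near 0, intercept near the target mean
coef_big, intercept_big = ridge_fit(X_demo, y_demo, alpha=1e8)

print("alpha=0   :", coef_ols, intercept_ols)
print("alpha=1e8 :", coef_big, intercept_big, "| mean(y) =", y_demo.mean())
```

With the huge penalty, every prediction is close to `y_demo.mean()`, which is the "horizontal line through the mean" described above.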



In [14]:
experiment = Experiment(workspace=ws, name="workshop1-IntroductionAMLS")

## Steps
1. Log information
2. Fit the model
3. Log the model results
4. Save the model

In [15]:
def regridge(alpha):
    
    from datetime import datetime
    maintenant = datetime.now()
    print("Now: ", maintenant)

    # 1. Run object
    run = experiment.start_logging()
    print('Alpha = ', alpha)
    # 2. Log values
    run.log('alpha', alpha)
    run.log('date_log', str(maintenant))

    # 3. ML model
    regression_model = Ridge(alpha=alpha)
    regression_model.fit(data['train']['X'], data['train']['y'])
    preds = regression_model.predict(data['test']['X'])

    # 4. Output: compute the test MSE once, then print and log it
    mse = mean_squared_error(data['test']['y'], preds)
    print('MSE (Mean Squared Error) of the model =', mse)
    run.log('mse', mse)

    # 5. Export the model
    joblib.dump(value=regression_model, filename='modele.pkl')

    # 6. End of the run
    run.complete()

In [16]:
regridge(0.1)

Now:  2019-12-04 15:00:34.689522
Alpha =  0.1
MSE (Mean Squared Error) of the model = 3372.649627810032


In [17]:
regridge(0.2)

Now:  2019-12-04 15:00:42.497748
Alpha =  0.2
MSE (Mean Squared Error) of the model = 3325.2946794678764


In [18]:
regridge(0.3)

Now:  2019-12-04 15:00:49.006780
Alpha =  0.3
MSE (Mean Squared Error) of the model = 3302.6736334017255
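To make the logged metric concrete, here is a small hand computation of the mean squared error, the quantity returned by `mean_squared_error`. The three values are made up for the illustration and have nothing to do with the diabetes data.

```python
import numpy as np

# Toy targets and predictions, invented for this example
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.0, 5.0, 4.0])

errors = y_true - y_pred          # [ 1.  0. -2.]
mse = np.mean(errors ** 2)        # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)               # back in the units of the target

print(mse, rmse)
```

By the same logic, an MSE near 3,300 like the ones logged above corresponds to a typical error of roughly 57 in the units of the target.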


In [19]:
# Check that the pickle file was written
%ls -l modele.pkl

-rwxrwxrwx 1 root root 645 Dec  4 15:00 modele.pkl*


## Looping over alpha

In [20]:
%%time
import os

import numpy as np
from tqdm import tqdm

alphas = np.arange(0.0, 1.0, 0.1)

# Make sure the outputs directory exists before writing pickles into it
os.makedirs("outputs", exist_ok=True)

# Try a range of alpha values in a Ridge (regularized linear regression) model
for alpha in tqdm(alphas):
    # Create one run per alpha value; each run trains its own model
    with experiment.start_logging() as run:
        # Use the Ridge algorithm to build a regression model
        regression_model = Ridge(alpha=alpha)
        regression_model.fit(X=data["train"]["X"], y=data["train"]["y"])
        preds = regression_model.predict(X=data["test"]["X"])
        mse = mean_squared_error(y_true=data["test"]["y"], y_pred=preds)

        # Log alpha and the mean squared error in the run history
        run.log(name="alpha", value=alpha)
        run.log(name="mse", value=mse)
        
        # Build the pickle file name for this model
        model_name = "modele_ridge_alpha_" + str(alpha) + ".pkl"
        filename = "outputs/" + model_name
    
        # Save the model to the outputs directory for capture
        joblib.dump(value=regression_model, filename=filename)

100%|██████████| 10/10 [01:13<00:00,  7.32s/it]

CPU times: user 3.67 s, sys: 1.29 s, total: 4.96 s
Wall time: 1min 13s





In [21]:
experiment

| Name | Workspace | Report Page | Docs Page |
|---|---|---|---|
| workshop1-IntroductionAMLS | workshopml | Link to Azure Machine Learning studio | Link to Documentation |


In [22]:
runs = {}
run_metrics = {}

# Create dictionaries containing the runs and the metrics for all runs containing the 'mse' metric
for r in tqdm(experiment.get_runs()):
    metrics = r.get_metrics()
    if 'mse' in metrics.keys():
        runs[r.id] = r
        run_metrics[r.id] = metrics

# Find the run with the best (lowest) mean squared error and display the id and metrics
best_run_id = min(run_metrics, key=lambda k: run_metrics[k]['mse'])
best_run = runs[best_run_id]
print('Best run ID =', best_run_id)
print('Best run metrics = ', run_metrics[best_run_id])

# Tag the best run so it can be identified more quickly
best_run.tag("Meilleur")

71it [00:06, 11.78it/s]

Best run ID = 5dae9a60-bdd5-4555-bfcf-e917bce3ab78
Best run metrics =  {'alpha': 0.4, 'mse': 3295.741064355809}
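The selection logic does not depend on Azure ML and can be tried on plain dictionaries. In this sketch the run IDs (`run-a` to `run-e`) are hypothetical; the alpha/MSE pairs are the values printed earlier in this notebook.

```python
# Hypothetical run IDs mapped to their metrics; the alpha/mse pairs are
# the ones printed earlier in this notebook.
demo_metrics = {
    "run-a": {"alpha": 0.1, "mse": 3372.649627810032},
    "run-b": {"alpha": 0.2, "mse": 3325.2946794678764},
    "run-c": {"alpha": 0.3, "mse": 3302.6736334017255},
    "run-d": {"alpha": 0.4, "mse": 3295.741064355809},
    "run-e": {"date_log": "2019-12-04"},  # no 'mse' logged: must be skipped
}

# Keep only the runs that logged an 'mse' metric
with_mse = {run_id: m for run_id, m in demo_metrics.items() if "mse" in m}

# min() iterates over the dict keys and ranks them by their 'mse' value
best_id = min(with_mse, key=lambda run_id: with_mse[run_id]["mse"])

print(best_id, with_mse[best_id])
```

Filtering first matters: calling `min` on a dictionary that contains a run without `'mse'` would raise a `KeyError` inside the key function.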





### Listing the pickle files in the outputs directory

In [23]:
for f in best_run.get_file_names():
    print(f)

logs/user_log.txt
outputs/modele_ridge_alpha_0.0.pkl
outputs/modele_ridge_alpha_0.1.pkl
outputs/modele_ridge_alpha_0.2.pkl
outputs/modele_ridge_alpha_0.30000000000000004.pkl
outputs/modele_ridge_alpha_0.4.pkl
outputs/modele_ridge_alpha_0.5.pkl
outputs/modele_ridge_alpha_0.6000000000000001.pkl
outputs/modele_ridge_alpha_0.7000000000000001.pkl
outputs/modele_ridge_alpha_0.8.pkl
outputs/modele_ridge_alpha_0.9.pkl
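The long names such as `modele_ridge_alpha_0.30000000000000004.pkl` come from floating-point rounding in `np.arange` combined with `str(alpha)`. One possible way to get stable names, shown here only as a sketch and not as what the notebook does, is to format alpha with a fixed number of decimals:

```python
import numpy as np

alphas = np.arange(0.0, 1.0, 0.1)

# Naming as done in the loop: exposes the floating-point representation
raw_names = ["modele_ridge_alpha_" + str(a) + ".pkl" for a in alphas]

# Alternative: one fixed decimal, so every name is predictable
clean_names = [f"modele_ridge_alpha_{a:.1f}.pkl" for a in alphas]

for raw, clean in zip(raw_names, clean_names):
    print(raw, "->", clean)
```

If you adopt the fixed-decimal naming, the `model_path` passed when registering the model has to use the same format.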


### Registering the best model in the Azure ML service model registry

In [25]:
model = best_run.register_model(model_name='meilleur_modele_ridge', model_path='outputs/modele_ridge_alpha_0.4.pkl')

### Viewing the model and its versions

In [29]:
from azureml.core.model import Model
models = Model.list(ws, name='meilleur_modele_ridge')
for m in models:
    print("Model:", m.name, "- version =", m.version)

Model: meilleur_modele_ridge - version = 4
Model: meilleur_modele_ridge - version = 3
Model: meilleur_modele_ridge - version = 2
Model: meilleur_modele_ridge - version = 1


In [30]:
# Download the model pickle locally
best_run.download_file(name="outputs/modele_ridge_alpha_0.4.pkl")

In [31]:
# Check that the file was downloaded locally (here, the notebook VM)
%ls -l modele_ridge_*.pkl

-rwxrwxrwx 1 root root 658 Dec  4 15:04 modele_ridge_alpha_0.4.pkl*
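Once the pickle is available locally, it can be reloaded with `joblib.load` and used for scoring. The sketch below is self-contained, so it does not read the downloaded file: as a stand-in, it fits and pickles a Ridge model with `alpha=0.4` in a temporary directory, then reloads it. The variable names `fitted`, `loaded_model` and `loaded_preds` are illustrative.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Same data and split as earlier in the notebook
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Stand-in for the downloaded file: fit and pickle a Ridge model locally
fitted = Ridge(alpha=0.4).fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "modele_ridge_alpha_0.4.pkl")
    joblib.dump(value=fitted, filename=pkl_path)

    # Reload the pickle and score the test set
    loaded_model = joblib.load(pkl_path)
    loaded_preds = loaded_model.predict(X_test)

print(loaded_preds[:3])
```

With the real file, only the `joblib.load("modele_ridge_alpha_0.4.pkl")` and `predict` calls are needed. Load pickles only from sources you trust, and preferably with the same scikit-learn version that produced them.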


<img src="https://github.com/retkowsky/images/blob/master/Powered-by-MS-Azure-logo-v2.png?raw=true" height="300" width="300">