## Before you start

You'll need the latest version of the **azure-ai-ml** package to run the code in this notebook. Verify that it is installed before you continue.

> **Note**:
> If the **azure-ai-ml** package is not installed, run `pip install azure-ai-ml` to install it.
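
As a rough sketch of the verification step, the standard-library `importlib.metadata` module can report whether a distribution is installed without importing it. The helper name `installed_version` is made up for this example and is not part of any Azure package.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("azure-ai-ml")
if version is None:
    print("azure-ai-ml is not installed; run: pip install azure-ai-ml")
else:
    print(f"azure-ai-ml {version} is installed")
```

This only checks that some version is present; it does not check that it is the latest one.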

# Introduction

In [None]:
from azure.ai.ml.entities import Workspace

workspace_name = "mlw-example"

ws_basic = Workspace(
    name = workspace_name,  # your workspace name
    location = "eastus",    # Azure region that hosts the workspace
    display_name = "Basic workspace-example",   # name displayed for the workspace
    description = "This example shows how to create a basic workspace"  # description
)

# ml_client is the MLClient created in the "Install the Python SDK" section below
ml_client.workspaces.begin_create(ws_basic)

## Artifacts

Any file generated (and captured) from an experiment's run or job is an artifact. It may represent a model serialized as a Pickle file, the weights of a PyTorch or TensorFlow model, or even a text file containing the coefficients of a linear regression. Other artifacts can have nothing to do with the model itself, but they can contain configuration to run the model, pre-processing information, sample data, etc. As you can see, an artifact can come in any format.
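
To make the text-file case concrete, here is a minimal sketch that writes the coefficients of a linear regression to a plain text file. The coefficient values and the file name are invented for illustration, and the file goes to a temporary folder so the sketch has no side effects.

```python
import tempfile
from pathlib import Path

# Hypothetical coefficients of a fitted linear regression (illustrative values).
coefficients = {"intercept": 1.5, "age": 0.25, "income": -0.75}

# Write one "name=value" pair per line to a text file.
artifact_dir = Path(tempfile.mkdtemp())
artifact_path = artifact_dir / "coefficients.txt"
lines = [f"{name}={value}" for name, value in coefficients.items()]
artifact_path.write_text("\n".join(lines) + "\n", encoding="utf-8")

print(artifact_path.read_text(encoding="utf-8"))
```

Inside a tracked run, the resulting path could be handed to `mlflow.log_artifact`, in the same way as the pickle file in the cell further down.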

In Azure Machine Learning, logging models has the following advantages:

* You can deploy them on real-time or batch endpoints without providing a scoring script or an environment.
* When deployed, model deployments have a Swagger definition generated automatically, and the Test feature can be used in Azure Machine Learning studio.
* Models can be used as pipeline inputs directly.

In [None]:
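import mlflow
import pickle

# `model` is assumed to be a trained model object defined earlier
filename = 'model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(model, f)

# Log the pickle file as an artifact of the active run
mlflow.log_artifact(filename)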

In [None]:
# Log a scikit-learn model (sklearn_estimator is assumed to be a fitted estimator)
import mlflow
mlflow.sklearn.log_model(sklearn_estimator, "classifier")

# Install the Python SDK

In [None]:
!pip install azure-ai-ml

In [None]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# subscription_id, resource_group, and workspace are assumed to be defined earlier
ml_client = MLClient(
    credential = DefaultAzureCredential(),      # credential to use for authentication
    subscription_id = subscription_id,          # your subscription ID
    resource_group_name = resource_group,       # the name of your resource group
    workspace_name = workspace                  # the name of your workspace
)

After defining the authentication, you create an MLClient object to connect to the workspace. You'll use MLClient anytime you want to create or update an asset or resource in the workspace.
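
The cell above takes the subscription, resource group, and workspace as variables. As an alternative, `MLClient.from_config` (used in the datastore cells later in this notebook) reads the same three values from a `config.json` file. The sketch below writes such a file with placeholder values into a temporary folder. The key names follow the workspace configuration file that can be downloaded from Azure Machine Learning studio; the placeholder values are assumptions to replace with your own.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values; replace them with your own identifiers.
workspace_config = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
}

# A temporary folder keeps this sketch free of side effects. In a real project
# the file usually sits in the working directory or one of its parents.
config_path = Path(tempfile.mkdtemp()) / "config.json"
config_path.write_text(json.dumps(workspace_config, indent=4), encoding="utf-8")

print(config_path.read_text(encoding="utf-8"))
```

With such a file in place, `MLClient.from_config(credential=DefaultAzureCredential())` should be able to pick up the workspace details without hard-coding them in the notebook.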

In [None]:
from azure.ai.ml import command

# configure job
job = command(
    code = "./src",
    command = "python train.py",
    environment = "AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute = "aml-cluster",
    experiment_name = "train-model"
)

# connect to workspace and submit job
returned_job = ml_client.create_or_update(job)

# Install the Azure CLI

To install the Azure CLI on your computer, you can use a package manager.
* Installation instructions for each platform are available at https://learn.microsoft.com/en-us/cli/azure/install-azure-cli
* You don't need to install the Azure CLI if you use the Azure Cloud Shell.

In [None]:
# Install the Azure Machine Learning extension (ml)
> az extension add -n ml -y

# create resource group
> az group create --name "rg-dp100-labs" --location "eastus"

# create workspace
> az ml workspace create --name "mlw-dp100-labs" -g "rg-dp100-labs"

# create a compute instance
> az ml compute create --name "ciXXXX" --size STANDARD_DS11_V2 --type ComputeInstance -w mlw-dp100-labs -g rg-dp100-labs


# create a compute cluster (two options)
> az ml compute create --name "aml-cluster" --size STANDARD_DS11_V2 --max-instances 2 --type AmlCompute -w mlw-dp100-labs -g rg-dp100-labs
# or
# create the configuration in a YAML file
# compute.yml
"""
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json 
name: aml-cluster
type: amlcompute
size: STANDARD_DS3_v2
min_instances: 0
max_instances: 5
"""
> az ml compute create --file compute.yml --resource-group my-resource-group --workspace-name my-workspace


# Create datastore

For detailed information, see: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-datastore?view=azureml-api-2&tabs=sdk-identity-based-access%2Csdk-adls-identity-access%2Csdk-azfiles-accountkey%2Csdk-adlsgen1-identity-access%2Csdk-onelake-identity-access

## Azure Blob Storage container

In [None]:
# Authenticate with an account key
from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Reads the workspace details from a config.json file
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

blob_datastore = AzureBlobDatastore(
    name = "blob_example",
    description = "Datastore pointing to a blob container",
    account_name = "mytestblobstore",
    container_name = "data-container",
    credentials = AccountKeyConfiguration(
        account_key="XXXxxxXXXxXXXXxxXXX"
    ),
)
ml_client.create_or_update(blob_datastore)

In [None]:
# Authenticate with a shared access signature (SAS) token
from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import SasTokenConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Reads the workspace details from a config.json file
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

blob_datastore = AzureBlobDatastore(
    name = "blob_example",
    description = "Datastore pointing to a blob container",
    account_name = "mytestblobstore",
    container_name = "data-container",
    credentials = SasTokenConfiguration(
        sas_token="?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX"
    ),
)
ml_client.create_or_update(blob_datastore)

# Create a URI file data asset

A URI file data asset points to a specific file. Azure Machine Learning will only store the path to the file, which means you can point to any type of file. When you use the data asset, you'll specify how you want to read the data, which will depend on the type of data you're connecting to.

***The supported paths you can use when creating a URI file data asset are:***

* Local: ./<path>
* Azure Blob Storage: wasbs://<container_name>@<account_name>.blob.core.windows.net/<folder>/<file>
* Azure Data Lake Storage (Gen 2): abfss://<file_system>@<account_name>.dfs.core.windows.net/<folder>/<file>
* Datastore: azureml://datastores/<datastore_name>/paths/<folder>/<file>
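
As a small illustration of the path formats listed above, the sketch below builds a datastore URI and classifies a path by its URI scheme. Both helper names (`datastore_uri`, `storage_kind`) are hypothetical and exist only for this example; Azure Machine Learning does not require them.

```python
from urllib.parse import urlparse


def datastore_uri(datastore_name, relative_path):
    """Build an azureml:// URI for a path inside a registered datastore."""
    return f"azureml://datastores/{datastore_name}/paths/{relative_path.lstrip('/')}"


def storage_kind(path):
    """Classify a data asset path by its URI scheme."""
    scheme = urlparse(path).scheme
    if scheme == "":
        # No scheme at all: treat it as a local (relative) path.
        return "Local"
    kinds = {
        "wasbs": "Azure Blob Storage",
        "abfss": "Azure Data Lake Storage (Gen 2)",
        "azureml": "Datastore",
    }
    return kinds.get(scheme, "Unknown")


uri = datastore_uri("blob_example", "/folder/file.csv")
print(uri, "->", storage_kind(uri))
print("./data/file.csv", "->", storage_kind("./data/file.csv"))
```

The classification is deliberately naive: a Windows path with a drive letter would be reported as "Unknown", which is acceptable for a sketch but not for production code.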

In [None]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_path = '<supported-path>'

my_data = Data(
    path=my_path,
    type=AssetTypes.URI_FILE,
    description="<description>",
    name="<name>",
    version="<version>"
)

ml_client.data.create_or_update(my_data)

When you pass the URI file data asset as input to an Azure Machine Learning job, you'll first need to read the data before you can work with it.

Imagine you create a Python script you want to run as a job, and you set the value of the input parameter input_data to be the URI file data asset (which points to a CSV file). You can read the data by including the following code in your Python script:

In [None]:
import argparse
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

df = pd.read_csv(args.input_data)
print(df.head(10))

If your URI file data asset points to a different type of file, you'll need to use the appropriate Python code to read the data. For example, if instead of CSV files, you're working with JSON files, you'd use pd.read_json() instead.
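
One way to handle more than one file type in the same script is to choose the reader from the file extension. The sketch below uses only the standard library (`csv` and `json`) so that it runs without pandas; in a job script you would more likely call `pd.read_csv` or `pd.read_json` in the corresponding branches. The function name `read_records` is an assumption made for this example.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_records(path):
    """Read a CSV file, or a JSON file holding a list of objects, into a list of dicts."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    raise ValueError(f"Unsupported file type: {suffix}")


# Small demonstration with throwaway files.
tmp = Path(tempfile.mkdtemp())
(tmp / "sample.csv").write_text("id,label\n1,a\n2,b\n", encoding="utf-8")
(tmp / "sample.json").write_text(json.dumps([{"id": 1, "label": "a"}]), encoding="utf-8")

print(read_records(tmp / "sample.csv"))
print(read_records(tmp / "sample.json"))
```

Note that the CSV reader returns every value as a string, whereas the JSON reader keeps numbers as numbers.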

# Create a URI folder data asset

A URI folder data asset points to a specific folder. It works similarly to a URI file data asset and supports the same paths.

In [None]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_path = '<supported-path>'

my_data = Data(
    path=my_path,
    type=AssetTypes.URI_FOLDER,
    description="<description>",
    name="<name>",
    version='<version>'
)

ml_client.data.create_or_update(my_data)

When you pass the URI folder data asset as input to an Azure Machine Learning job, you'll first need to read the data before you can work with it.

Imagine you create a Python script you want to run as a job, and you set the value of the input parameter input_data to be the URI folder data asset (which points to multiple CSV files). You may want to read all CSV files in the folder and concatenate them, which you can do by including the following code in your Python script:

In [None]:
import argparse
import glob
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

data_path = args.input_data
all_files = glob.glob(data_path + "/*.csv")
df = pd.concat((pd.read_csv(f) for f in all_files), sort=False)

Depending on the type of data you're working with, the code you use to read the files may change.
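
Two details are worth guarding against when reading a folder: `glob.glob` returns files in arbitrary order, and `pd.concat` raises a `ValueError` when it is given nothing to concatenate, so an empty or mistyped folder fails with a message that does not mention the path. The sketch below, which uses a hypothetical helper named `list_csv_files`, sorts the matches and fails early with a clearer error.

```python
import glob
import os
import tempfile


def list_csv_files(folder):
    """Return the CSV files in a folder in sorted order, or raise if there are none."""
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not files:
        raise FileNotFoundError(f"No CSV files found in {folder}")
    return files


# Demonstration with a throwaway folder that also holds a non-CSV file.
folder = tempfile.mkdtemp()
for name in ("b.csv", "a.csv", "notes.txt"):
    with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
        f.write("id\n1\n")

csv_files = list_csv_files(folder)
print([os.path.basename(p) for p in csv_files])
```

The returned list can then be passed to the `pd.concat` expression shown earlier in place of `all_files`.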

# Create a MLTable data asset

A MLTable data asset allows you to point to tabular data. When you create a MLTable data asset, you specify the schema definition to read the data. As the schema is already defined and stored with the data asset, you don't have to specify how to read the data when you use it.

Therefore, you'll want to use a MLTable data asset when the schema of your data is complex or changes frequently. Instead of changing how to read the data in every script that uses the data, you only have to change it in the data asset itself.

When you define the schema when creating a MLTable data asset, you can also choose to only specify a subset of the data.

For certain features in Azure Machine Learning, like Automated Machine Learning, you'll need to use a MLTable data asset, as Azure Machine Learning needs to know how to read the data.

To define the schema, it's recommended to include a MLTable file in the same folder as the data you want to read. The MLTable file is a YAML file that includes the path pointing to the data you want to read and how to read the data.
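
As an illustration of what such a file might contain, the sketch below writes a MLTable file next to a sample CSV file in a temporary folder. The glob pattern, delimiter, encoding, and header setting are assumptions for a folder of comma-separated files that share the same header row; adjust them to your own data.

```python
import tempfile
from pathlib import Path

# Contents of the MLTable file (YAML). The pattern and read options below are
# illustrative assumptions for comma-separated files with identical headers.
MLTABLE_CONTENT = """\
type: mltable

paths:
  - pattern: ./*.csv
transformations:
  - read_delimited:
      delimiter: ','
      encoding: ascii
      header: all_files_same_headers
"""

# The file is named "MLTable" (no extension) and sits next to the data.
data_folder = Path(tempfile.mkdtemp())
(data_folder / "sample.csv").write_text("id,label\n1,a\n", encoding="utf-8")
mltable_path = data_folder / "MLTable"
mltable_path.write_text(MLTABLE_CONTENT, encoding="utf-8")

print(sorted(p.name for p in data_folder.iterdir()))
```

The folder that holds both the data and the MLTable file is what you would then use as the path of the data asset.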

In [None]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_path = '<path-including-mltable-file>'

my_data = Data(
    path=my_path,
    type=AssetTypes.MLTABLE,
    description="<description>",
    name="<name>",
    version='<version>'
)

ml_client.data.create_or_update(my_data)

In [None]:
import argparse
import mltable
import pandas

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

tbl = mltable.load(args.input_data)
df = tbl.to_pandas_dataframe()

print(df.head(10))

A common approach is to convert the tabular data to a Pandas data frame. However, you can also convert the data to a Spark data frame if that suits your workload better.