# Explore Azure Machine Learning workspace resources and assets
As a data scientist, you want to focus on training machine learning models. Ideally, you want to work with a service that gives you access to all the infrastructure you need to train and deploy a model. You also want the service to let you track your work, so that your models are reproducible and robust.

Azure Machine Learning is a platform on Microsoft Azure for data scientists to train, deploy, and manage their machine learning models. It provides a comprehensive set of resources and assets to train and deploy effective machine learning models.

To use these resources and assets, you create an Azure Machine Learning workspace resource in your Azure subscription. In the Azure Machine Learning workspace, you can manage data, compute resources, models, endpoints, and other artifacts related to your machine learning workloads.

## Create an Azure ML Service

![alt text](assets/overview-azure-resources.png)

To create an Azure Machine Learning service, you'll have to:

1. Get access to Azure, for example through the Azure portal.
2. Sign in to get access to an Azure subscription.
3. Create a resource group within your subscription.
4. Create an Azure Machine Learning service to create a workspace.

When a workspace is provisioned, Azure automatically creates other Azure resources within the same resource group to support the workspace:

- Azure Storage Account: To store files and notebooks used in the workspace, and to store metadata of jobs and models.
- Azure Key Vault: To securely manage secrets such as authentication keys and credentials used by the workspace.
- Application Insights: To monitor predictive services in the workspace.
- Azure Container Registry: Created when needed to store images for Azure Machine Learning environments.

## Create a workspace
You can create an Azure Machine Learning workspace in any of the following ways:
- Use the user interface in the Azure portal to create an Azure Machine Learning service.
- Create an Azure Resource Manager (ARM) template. [Learn how to use an ARM template to create a workspace.](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-workspace-template?tabs=azcli%3Fazure-portal%3Dtrue)
- Use the Azure Command Line Interface (CLI) with the Azure Machine Learning CLI extension. [Learn how to create the workspace with the CLI v2.](https://learn.microsoft.com/en-us/training/modules/create-azure-machine-learning-resources-cli-v2/)
- Use the [Azure Machine Learning Python SDK](https://learn.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py). (`pip install azure-ai-ml`)

For example, the following code uses the Python SDK to create a workspace named `mlw-example`:

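```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace
from azure.identity import DefaultAzureCredential

# Placeholders: replace with your own subscription ID and resource group.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
)

workspace_name = "mlw-example"

ws_basic = Workspace(
    name=workspace_name,
    location="eastus",
    display_name="Basic workspace-example",
    description="This example shows how to create a basic workspace",
)
# begin_create starts a long-running operation and returns a poller.
ml_client.workspaces.begin_create(ws_basic)
```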

## Explore the workspace in the Azure portal
Creating an Azure Machine Learning workspace typically takes between 5 and 10 minutes. When your workspace is created, you can select the workspace to view its details.
![alt text](assets/workspace-portal.png)

From the Overview page of the Azure Machine Learning workspace in the Azure portal, you can launch the Azure Machine Learning studio. The Azure Machine Learning studio is a web portal and provides an easy-to-use interface to create, manage, and use resources and assets in the workspace.

From the Azure portal, you can also give others access to the Azure Machine Learning workspace, using the Access control.

## Give access to the Azure Machine Learning workspace
You can give individual users or teams access to the Azure Machine Learning workspace. Access is granted in Azure using role-based access control (RBAC), which you can configure in the Access control tab of the resource or resource group.

In the Access control tab, you can manage permissions to restrict what actions certain users or teams can perform. For example, you could create a policy that only allows users in the Azure administrators group to create compute targets and datastores, while users in the data scientists group can create and run jobs to train models, and register models.

There are three general built-in roles that you can use across resources and resource groups to assign permissions to other users:

- Owner: Gets full access to all resources, and can grant access to others using access control.
- Contributor: Gets full access to all resources, but can't grant access to others.
- Reader: Can only view the resource, but isn't allowed to make any changes.

Additionally, Azure Machine Learning has specific built-in roles you can use:

- AzureML Data Scientist: Can perform all actions within the workspace, except for creating or deleting compute resources, or editing the workspace settings.
- AzureML Compute Operator: Is allowed to create, change, and manage access to the compute resources within a workspace.

Finally, if the built-in roles aren't meeting your needs, you can create a custom role to assign permissions to other users.

> Learn more about [how to manage access to an Azure ML workspace](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-assign-roles).
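
As a rough sketch of what a custom role could look like, the following snippet builds a role definition as a Python dictionary and serializes it to JSON. The role name, the `<...>` placeholders, and the exact entries in `NotActions` are illustrative assumptions; check the linked documentation for the action strings that match your needs.

```python
import json

# Hypothetical scope; the <...> placeholders must be replaced with real values.
workspace_scope = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>"
)

# Illustrative custom role: allow all actions, then carve out exceptions in
# NotActions (deleting workspace resources, editing the workspace, creating
# or deleting compute, and changing role assignments).
custom_role = {
    "Name": "Data Scientist Custom",  # hypothetical role name
    "IsCustom": True,
    "Description": "Can run jobs but can't create or delete compute.",
    "Actions": ["*"],
    "NotActions": [
        "Microsoft.MachineLearningServices/workspaces/*/delete",
        "Microsoft.MachineLearningServices/workspaces/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/delete",
        "Microsoft.Authorization/*/write",
    ],
    "AssignableScopes": [workspace_scope],
}

role_json = json.dumps(custom_role, indent=2)
print(role_json)
```

Saved to a file, a definition like this could be passed to `az role definition create --role-definition <file>` and then assigned in the Access control tab.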

## Organize your workspaces
Initially, you might only work with one workspace. However, when working on large-scale projects, you might choose to use multiple workspaces.

You can use workspaces to group machine learning assets based on projects, deployment environments (for example, test and production), teams, or some other organizing principle.



## Identify Azure Machine Learning resources
Resources in Azure Machine Learning refer to the infrastructure you need to run a machine learning workflow. Ideally, you want someone like an administrator to create and manage the resources.

The resources in Azure Machine Learning include:
- The workspace
- Compute resources
- Datastores

## Create and manage the workspace
The workspace is the top-level resource for Azure Machine Learning. Data scientists need access to the workspace to train and track models, and to deploy the models to endpoints.

However, be careful about who gets full access to the workspace. In addition to references to compute resources and datastores, the workspace contains all logs, metrics, outputs, models, and snapshots of your code.

## Create and manage compute resources
One of the most important resources you need when training or deploying a model is compute. There are five types of compute in the Azure Machine Learning workspace:

- Compute instances: Similar to a virtual machine in the cloud, managed by the workspace. Ideal to use as a development environment to run (Jupyter) notebooks.
- Compute clusters: On-demand clusters of CPU or GPU compute nodes in the cloud, managed by the workspace. Ideal to use for production workloads as they automatically scale to your needs.
- Kubernetes clusters: Allows you to create or attach an Azure Kubernetes Service (AKS) cluster. Ideal to deploy trained machine learning models in production scenarios.
- Attached computes: Allows you to attach other Azure compute resources to the workspace, like Azure Databricks or Synapse Spark pools.
- Serverless compute: A fully managed, on-demand compute you can use for training jobs.

> Because Azure Machine Learning creates and manages serverless compute for you, it's not listed on the compute page in the studio. Learn more about how to [use serverless compute for model training](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-serverless-compute).

Though compute is the most important resource when working with machine learning workloads, it can also be the most cost-intensive. Therefore, a best practice is to only allow administrators to create and manage compute resources. Data scientists shouldn't be allowed to edit compute, but only use the available compute to run their workloads.
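
To make the cost trade-off concrete, here is a minimal sketch of how an administrator might define a compute cluster with the Python SDK. The cluster name, VM size, and scaling limits are assumptions chosen for illustration, and `ml_client` is assumed to be an authenticated `MLClient` connected to the workspace.

```python
# Illustrative settings; the cluster name and VM size are assumptions.
cluster_settings = {
    "name": "cpu-cluster",
    "size": "STANDARD_DS3_V2",
    "min_instances": 0,  # scale down to zero nodes when no jobs are running
    "max_instances": 4,  # upper bound on nodes, and therefore on cost
    "idle_time_before_scale_down": 120,  # seconds a node may sit idle
}


def create_cluster(ml_client, settings=None):
    """Create or update a compute cluster in the workspace of ml_client."""
    # Imported here so the settings can be inspected without the SDK installed.
    from azure.ai.ml.entities import AmlCompute

    cluster = AmlCompute(**(settings or cluster_settings))
    # Returns a poller; .result() blocks until provisioning finishes.
    return ml_client.compute.begin_create_or_update(cluster).result()
```

Setting `min_instances` to 0 lets the cluster scale down completely between jobs, which is one way to keep costs under control.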


## Create and manage datastores
The workspace doesn't store any data itself. Instead, all data is stored in datastores, which are references to Azure data services. The connection information for the data service that a datastore represents is stored in the Azure Key Vault.

When a workspace is created, an Azure Storage account is created and automatically connected to the workspace. As a result, you have four datastores already added to your workspace:

- `workspaceartifactstore`: Connects to the `azureml` container of the Azure Storage account created with the workspace. Used to store compute and experiment logs when running jobs.
- `workspaceworkingdirectory`: Connects to the file share of the Azure Storage account created with the workspace. This file share is used by the Notebooks section of the studio. Whenever you upload files or folders to access from a compute instance, they're uploaded to this file share.
- `workspaceblobstore`: Connects to the Blob Storage of the Azure Storage account created with the workspace. Specifically the `azureml-blobstore-...` container. Set as the default datastore, which means that whenever you create a data asset and upload data, it's stored in this container.
- `workspacefilestore`: Connects to the file share of the Azure Storage account created with the workspace. Specifically the `azureml-filestore-...` file share.

Additionally, you can create datastores to connect to other Azure data services. Most commonly, your datastores will connect to an Azure Storage Account or Azure Data Lake Storage (Gen2) as those data services are most often used in data science projects.
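
The following sketch shows one way to refer to a path inside a datastore by using an `azureml://` URI, and how you might list the datastores in a workspace. The file path is hypothetical, and `ml_client` is assumed to be an authenticated `MLClient`.

```python
def datastore_uri(datastore_name: str, relative_path: str) -> str:
    """Build an azureml:// URI that points at a path inside a datastore."""
    return f"azureml://datastores/{datastore_name}/paths/{relative_path.lstrip('/')}"


def list_datastore_names(ml_client):
    """Return the names of all datastores in the workspace of ml_client."""
    return [store.name for store in ml_client.datastores.list()]


# Hypothetical file path inside the default datastore.
example_uri = datastore_uri("workspaceblobstore", "data/diabetes.csv")
print(example_uri)
```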


# Identify Azure Machine Learning assets

As a data scientist, you'll mostly work with assets in the Azure Machine Learning workspace. Assets are created and used at various stages of a project and include:
- Models
- Environments
- Data
- Components

## Create and manage models
The end product of training a model is the model itself. You can train machine learning models with various frameworks, like Scikit-learn or PyTorch. A common way to store such models is to package the model as a Python pickle file (.pkl extension).

Alternatively, you can use the open-source platform MLflow to store your model in the MLModel format.

> Learn more about [logging workflow artifacts as models using MLflow and the MLModel format](https://learn.microsoft.com/en-us/azure/machine-learning/concept-mlflow-models).

Whatever format you choose, binary file(s) will represent the model and any corresponding metadata. To persist those files, you can create or register a model in the workspace.

When you create a model in the workspace, you'll specify the name and version. Versioning is especially useful when you deploy the registered model, because it lets you track the specific model you want to use.
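
As a minimal sketch, the snippet below saves a stand-in "model" (a dictionary of learned parameters rather than a real estimator) as a pickle file, loads it back, and outlines how the file might be registered. The parameter values, file name, and version are assumptions, and `register_model` expects an authenticated `MLClient`.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model: the learned parameters of a linear model.
trained_model = {"weights": [0.5, -1.25], "bias": 2.0}

model_path = Path(tempfile.mkdtemp()) / "model.pkl"

# Serialize the model to a .pkl file ...
with open(model_path, "wb") as f:
    pickle.dump(trained_model, f)

# ... and load it back, for example inside a scoring script.
with open(model_path, "rb") as f:
    restored_model = pickle.load(f)


def register_model(ml_client, path, name, version="1"):
    """Register a local model file in the workspace (needs azure-ai-ml)."""
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import Model

    model = Model(path=str(path), name=name, version=version,
                  type=AssetTypes.CUSTOM_MODEL)
    return ml_client.models.create_or_update(model)
```

Only load pickle files from sources you trust, because unpickling can execute arbitrary code.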

## Create and manage environments
When you work with cloud compute, it's important to ensure that your code runs on any compute that is available to you. Whether you want to run a script on a compute instance, or a compute cluster, the code should execute successfully.

Imagine working in Python or R, using open-source frameworks to train a model, on your local device. If you want to use a library such as Scikit-learn or PyTorch, you'll have to install it on your device.

Similarly, when you write code that uses any frameworks or libraries, you'll need to ensure the necessary components are installed on the compute that will execute the code. To list all necessary requirements, you can create environments. When you create an environment, you have to specify the name and version.

Environments specify software packages, environment variables, and software settings to run scripts. The first time an environment is used, it's stored as an image in the Azure Container Registry that belongs to the workspace.

Whenever you want to run a script, you can specify the environment that needs to be used by the compute target. The environment will install all necessary requirements on the compute before executing the script, making your code robust and reusable across compute targets.
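
The sketch below shows one possible way to define an environment from a conda specification with the Python SDK. The package list, base image, environment name, and version are all assumptions for illustration, and `ml_client` is assumed to be an authenticated `MLClient`.

```python
from pathlib import Path

# Illustrative conda specification; the package choices are assumptions.
CONDA_YAML = """\
name: sklearn-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - scikit-learn
      - mlflow
"""


def write_conda_file(folder):
    """Write the conda specification to <folder>/conda.yml and return its path."""
    path = Path(folder) / "conda.yml"
    path.write_text(CONDA_YAML)
    return path


def create_environment(ml_client, conda_path, name="sklearn-env", version="1"):
    """Create the environment in the workspace (needs azure-ai-ml)."""
    from azure.ai.ml.entities import Environment

    env = Environment(
        name=name,
        version=version,
        # Assumed base image; pick one that fits your workload.
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
        conda_file=str(conda_path),
    )
    return ml_client.environments.create_or_update(env)
```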

## Create and manage data
Whereas datastores contain the connection information to Azure data storage services, data assets refer to a specific file or folder.

You can use data assets to easily access data, without having to provide authentication every time you want to access it.

When you create a data asset in the workspace, you'll specify the path to point to the file or folder, and the name and version.
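
A minimal sketch of creating a data asset that points to a single file could look like this. The asset name, version, and path are hypothetical, and `ml_client` is assumed to be an authenticated `MLClient`.

```python
# Hypothetical values: a CSV file stored in the default datastore.
data_asset_settings = {
    "name": "diabetes-data",
    "version": "1",
    "path": "azureml://datastores/workspaceblobstore/paths/data/diabetes.csv",
}


def asset_reference(name: str, version: str) -> str:
    """Build the azureml:<name>:<version> string used to refer to a registered asset."""
    return f"azureml:{name}:{version}"


def create_data_asset(ml_client, settings=None):
    """Create a data asset that points to a single file (needs azure-ai-ml)."""
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import Data

    asset = Data(type=AssetTypes.URI_FILE, **(settings or data_asset_settings))
    return ml_client.data.create_or_update(asset)


print(asset_reference(data_asset_settings["name"], data_asset_settings["version"]))
```

For a folder instead of a file, `AssetTypes.URI_FOLDER` would be the corresponding type.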

## Create and manage components
To train machine learning models, you'll write code. Across projects, there may be code you can reuse. Instead of writing code from scratch, you'll want to reuse snippets of code from other projects.

To make it easier to share code, you can create a component in a workspace. To create a component, you have to specify the name, version, code, and environment needed to run the code.

You can use components when creating pipelines. A component therefore often represents a step in a pipeline, for example to normalize data, to train a regression model, or to test the trained model on a validation dataset.
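
As an illustration, a component for a data normalization step might be defined as follows. The script name, folder, environment reference, and input and output names are all hypothetical, and `ml_client` is assumed to be an authenticated `MLClient`.

```python
# The four things a component needs: name, version, code, and environment,
# plus the command that runs the code. All values here are hypothetical.
component_spec = {
    "name": "normalize_data",
    "version": "1",
    "code": "./src",  # folder that contains normalize.py
    "environment": "azureml:sklearn-env:1",
    "command": (
        "python normalize.py "
        "--input ${{inputs.raw_data}} --output ${{outputs.normalized_data}}"
    ),
}


def create_component(ml_client, spec=None):
    """Register the component in the workspace (needs azure-ai-ml)."""
    from azure.ai.ml import Input, Output
    from azure.ai.ml.entities import CommandComponent

    component = CommandComponent(
        inputs={"raw_data": Input(type="uri_file")},
        outputs={"normalized_data": Output(type="uri_file")},
        **(spec or component_spec),
    )
    return ml_client.components.create_or_update(component)
```

The `${{inputs...}}` and `${{outputs...}}` placeholders are resolved to actual paths when the component runs as a step in a pipeline.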

# Train models in the workspace
To train models with the Azure Machine Learning workspace, you have several options:
- Use Automated Machine Learning.
- Run a Jupyter notebook.
- Run a script as a job.

## Explore algorithms and hyperparameter values with Automated Machine Learning
When you have a training dataset and you're tasked with finding the best performing model, you might want to experiment with various algorithms and hyperparameter values.

Manually experimenting with different configurations to train a model can take a long time. Alternatively, you can use Automated Machine Learning to speed up the process.

Automated Machine Learning iterates through algorithms paired with feature selections to find the best performing model for your data.

![alt text](assets/automated-machine-learning.png)
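
To illustrate the idea of iterating through algorithm and feature combinations, here is a toy sketch in plain Python. It is not how Automated Machine Learning is implemented; the dataset, the two "algorithms", and the scoring metric are made up for the example.

```python
from itertools import product
from statistics import mean

# Tiny made-up dataset: each row is (x1, x2, y), where y = 2 * x1 + 1.
train_rows = [(1.0, 5.0, 3.0), (2.0, 1.0, 5.0), (3.0, 4.0, 7.0), (4.0, 2.0, 9.0)]
valid_rows = [(5.0, 3.0, 11.0), (6.0, 6.0, 13.0)]
feature_index = {"x1": 0, "x2": 1}


def fit_mean(xs, ys):
    """Baseline 'algorithm': always predict the mean training label."""
    average = mean(ys)
    return lambda x: average


def fit_linear(xs, ys):
    """Least-squares line through a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return lambda x: slope * x + (y_bar - slope * x_bar)


algorithms = {"mean": fit_mean, "linear": fit_linear}


def validation_mse(algorithm, feature):
    """Train on one feature, then compute the mean squared error on validation rows."""
    col = feature_index[feature]
    model = algorithms[algorithm](
        [row[col] for row in train_rows], [row[2] for row in train_rows]
    )
    return mean((model(row[col]) - row[2]) ** 2 for row in valid_rows)


# Try every (algorithm, feature) pair and keep the one with the lowest error.
scores = {pair: validation_mse(*pair) for pair in product(algorithms, feature_index)}
best_pair = min(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

In this toy setup, the linear model on `x1` wins because the label depends only on `x1`. Automated Machine Learning applies the same search idea at a much larger scale, running each candidate as a trial in the workspace.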

## Run a notebook
When you prefer to develop by running code in notebooks, you can use the built-in notebook feature in the workspace.
The Notebooks page in the studio allows you to edit and run Jupyter notebooks.

![alt text](assets/notebooks.png)

All files you clone or create in the notebooks section are stored in the file share of the Azure Storage account created with the workspace.

To run notebooks, you'll use a compute instance, as it's ideal for development and works similarly to a virtual machine.

You can also choose to edit and run notebooks in Visual Studio Code, while still using a compute instance to run the notebooks.

## Run a script as a job
When you want to prepare your code to be production ready, it's better to use scripts. You can easily automate the execution of a script to automate any machine learning workload.

You can run a script as a job in Azure Machine Learning. When you submit a job to the workspace, all inputs and outputs will be stored in the workspace.

![alt text](assets/job-overview.png)

There are different types of jobs depending on how you want to execute a workload:

- Command: Execute a single script.
- Sweep: Perform hyperparameter tuning when executing a single script.
- Pipeline: Run a pipeline consisting of multiple scripts or components.

> When you submit a pipeline you created with the designer, it runs as a pipeline job. When you submit an Automated Machine Learning experiment, it also runs as a job.
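
As a minimal sketch, a command job that runs a single training script could be configured and submitted like this. The script name, folder, environment reference, and compute name are hypothetical, and `ml_client` is assumed to be an authenticated `MLClient`.

```python
# Hypothetical job settings: which script to run, where, and in what environment.
job_settings = {
    "code": "./src",  # folder that contains train.py
    "command": "python train.py",
    "environment": "azureml:sklearn-env:1",
    "compute": "cpu-cluster",
    "display_name": "train-model",
    "experiment_name": "train-example",
}


def submit_command_job(ml_client, settings=None):
    """Submit a command job and return the URL to monitor it in the studio."""
    # Imported here so the settings can be inspected without the SDK installed.
    from azure.ai.ml import command

    job = command(**(settings or job_settings))
    submitted_job = ml_client.jobs.create_or_update(job)
    return submitted_job.studio_url
```

After submission, the job's inputs, outputs, logs, and metrics are available in the workspace under the chosen experiment name.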
