# Work with Datastores

Data is the foundation on which machine learning models are built. Managing data centrally in the cloud, and making it accessible to teams of data scientists who run experiments and train models on multiple workstations and compute targets, is an important part of any professional data science solution.

In this notebook, you'll explore two Azure Machine Learning objects for working with data: *datastores* and *datasets*.

## Install the Azure Machine Learning SDK

The Azure Machine Learning SDK is updated frequently. Run the following cell to upgrade to the latest release, along with the additional package to support notebook widgets.

```python
!pip install --upgrade azureml-sdk azureml-widgets
```

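Before connecting, you may want to confirm which package versions actually ended up in the notebook's Python environment. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_versions` is hypothetical, and the package list is just a few of the distributions involved in the upgrade. Keep in mind that a module already imported into a running kernel keeps its old version until the kernel is restarted.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if absent}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this Python environment
            versions[name] = None
    return versions


for pkg, ver in installed_versions(["azureml-sdk", "azureml-core", "azureml-widgets"]).items():
    print(pkg, "->", ver or "not installed")
```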

## Connect to your workspace

With the latest version of the SDK installed, now you're ready to connect to your workspace.

> **Note**: If you haven't already established an authenticated session with your Azure subscription, you'll be prompted to authenticate by clicking a link, entering an authentication code, and signing into Azure.

```python
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))
```

```
Ready to use Azure ML 1.34.0 to work with workspaceveranika
```
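`Workspace.from_config()` reads the workspace's connection details from a `config.json` file, which you can download from the workspace's page in the Azure portal or create with `ws.write_config()`. The file holds the `subscription_id`, `resource_group`, and `workspace_name` keys. As a rough, hypothetical sketch (not the SDK's implementation), the helper `find_workspace_config` below searches the starting directory and each of its parents, including a `.azureml` subfolder at each level, which approximates the locations `from_config` checks, and validates those keys:

```python
import json
from pathlib import Path

# Keys found in the config.json downloaded for an Azure ML workspace
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def find_workspace_config(start=".", filename="config.json"):
    """Search `start` and its parents for a workspace config file.

    At each level, checks <dir>/config.json and <dir>/.azureml/config.json
    (an approximation of from_config's search, not its actual code).
    Returns the parsed dict; raises FileNotFoundError or ValueError.
    """
    start_dir = Path(start).resolve()
    for directory in [start_dir, *start_dir.parents]:
        for candidate in (directory / filename, directory / ".azureml" / filename):
            if candidate.is_file():
                config = json.loads(candidate.read_text())
                missing = [key for key in REQUIRED_KEYS if key not in config]
                if missing:
                    raise ValueError(f"{candidate} is missing keys: {missing}")
                return config
    raise FileNotFoundError(f"No {filename} found in {start_dir} or its parents")
```

If this helper raises `FileNotFoundError` in your notebook folder, `Workspace.from_config()` is likely to fail for the same reason.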


## Work with datastores

In Azure ML, *datastores* are references to storage locations, such as Azure Storage blob containers. Every workspace has a default datastore, usually the Azure Storage blob container that was created with the workspace. If you need to work with data that is stored in different locations, you can add custom datastores to your workspace and set any of them as the default.
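To make the idea of a datastore as a named reference concrete, here is a toy, in-memory model of a workspace's datastore registry. `ToyWorkspace` and `DatastoreRef` are hypothetical classes for illustration only; in the SDK you would register a blob container with `Datastore.register_azure_blob_container(...)` and change the default with `ws.set_default_datastore(name)`. The sketch shows that a datastore records *where* data lives rather than the data itself, and that the default is simply one registered name singled out among several.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatastoreRef:
    """A named pointer to a storage location; it holds no data itself."""
    name: str
    container: str
    account: str


@dataclass
class ToyWorkspace:
    """Hypothetical stand-in for a workspace's datastore registry (not the SDK)."""
    datastores: dict = field(default_factory=dict)
    default_name: str = ""

    def register(self, ref):
        self.datastores[ref.name] = ref
        if not self.default_name:
            # Toy rule: the first registered datastore becomes the default
            self.default_name = ref.name
        return ref

    def set_default(self, name):
        if name not in self.datastores:
            raise KeyError(f"Unknown datastore: {name}")
        self.default_name = name

    def get_default(self):
        return self.datastores[self.default_name]
```

Unlike this toy, a real workspace is created with its own default blob datastore, so you never need to register one just to get a default.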

### View datastores

Run the following code to list the datastores in your workspace and see which one is the default:

```python
# Get the default datastore
default_ds = ws.get_default_datastore()

# Enumerate all datastores, indicating which is the default
for ds_name in ws.datastores:
    print(ds_name, "- Default =", ds_name == default_ds.name)
```

```
aml_data - Default = False
workspaceblobstore - Default = True
workspaceartifactstore - Default = False
workspaceworkingdirectory - Default = False
workspacefilestore - Default = False
```
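Because `ws.datastores` is a dictionary keyed by datastore name, you can also fetch a specific datastore directly, for example `ws.datastores['aml_data']`. The loop above prints datastores in whatever order the dictionary yields them; the small pure-Python variation below (the function name `datastore_summary` is hypothetical) lists the default first and sorts the rest, using the names from the output above as sample input:

```python
def datastore_summary(names, default_name):
    """Return one line per datastore: the default first, the rest alphabetically."""
    ordered = sorted(names, key=lambda name: (name != default_name, name))
    return [name + (" (default)" if name == default_name else "") for name in ordered]


# Sample names copied from the output above; with a live workspace you could
# pass ws.datastores.keys() and ws.get_default_datastore().name instead.
for line in datastore_summary(
    ["aml_data", "workspaceblobstore", "workspaceartifactstore",
     "workspaceworkingdirectory", "workspacefilestore"],
    "workspaceblobstore",
):
    print(line)
```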


You can also view and manage datastores in your workspace on the **Datastores** page for your workspace in [Azure Machine Learning studio](https://ml.azure.com).

### Upload data to a datastore

Now that you have identified the available datastores, you can upload files from your local file system to a datastore so that they are accessible to experiments running in the workspace, regardless of where the experiment script actually runs.

```python
# Upload the diabetes CSV files from the local ./data folder
default_ds.upload_files(files=['./data/diabetes.csv', './data/diabetes2.csv'],
                        target_path='diabetes-data/',  # Put them in a folder path in the datastore
                        overwrite=True,                # Replace existing files of the same name
                        show_progress=True)
```

```
Uploading an estimated of 2 files
Uploading ./data/diabetes.csv
Uploaded ./data/diabetes.csv, 1 files out of an estimated total of 2
Uploading ./data/diabetes2.csv
Uploaded ./data/diabetes2.csv, 2 files out of an estimated total of 2
Uploaded 2 files

$AZUREML_DATAREFERENCE_9ec303d1d88e466989d7b201081667bd
```
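The `$AZUREML_DATAREFERENCE_...` value at the end of the output is the string form of the `DataReference` object that `upload_files` returns for the target path. To predict where files land inside the datastore, the sketch below approximates the path logic, assuming `relative_root` is left at its default (which, per the SDK documentation, uses the common path of the uploaded files), so each file keeps its path relative to that root, prefixed by `target_path`. `predicted_blob_paths` is a hypothetical helper, not part of the SDK:

```python
import os
import posixpath


def predicted_blob_paths(files, target_path=""):
    """Approximate where upload_files places each file in the datastore.

    Assumption: files keep their path relative to the common parent
    directory of all files, prefixed by target_path. This mimics the
    default relative_root behaviour; it is not the SDK's implementation.
    """
    parents = [os.path.dirname(os.path.normpath(f)) for f in files]
    root = os.path.commonpath(parents) if parents else ""
    paths = []
    for f in files:
        rel = os.path.relpath(os.path.normpath(f), root or ".")
        # Blob paths always use forward slashes, whatever the local OS
        rel = rel.replace(os.sep, "/")
        paths.append(posixpath.join(target_path, rel))
    return paths
```

For the upload above, this predicts `diabetes-data/diabetes.csv` and `diabetes-data/diabetes2.csv`.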