In [7]:
%reload_kedro
%reload_azureml_ws

2020-03-29 21:43:53,430 - root - INFO - ** Kedro project AzureDataScience
2020-03-29 21:43:53,431 - root - INFO - Defined global variable `context` and `catalog`
Traceback (most recent call last):
  File "/home/yuvraj/anaconda3/envs/AzureDS/lib/python3.6/site-packages/kedro/cli/cli.py", line 594, in load_entry_points
    entry_point_commands.append(entry_point.load())
  File "/home/yuvraj/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2442, in load
    self.require(*args, **kwargs)
  File "/home/yuvraj/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2465, in require
    items = working_set.resolve(reqs, env, installer, extras=self.extras)
  File "/home/yuvraj/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (pandas 0.23.4 (/home/yuvraj/anaconda3/envs/AzureDS/lib/python3.6/site-packages), Requirement.pa

# Overview

This tutorial will:

1. Explain Azure Machine Learning datastores and their different types.
2. Describe the recommended workflow for working with datastores.
3. Show how to add a datastore to an Azure Machine Learning workspace.
4. Show how to manage datastores.

# 1. Datastores

In Azure Machine Learning, datastores are abstractions (connectors) for cloud data sources. A datastore holds all the information required to connect to its data source. Datastores can be used to:

* Ingest data into an experiment
* Write outputs from an experiment

### 1.1 Types of Datastores

Azure Machine Learning supports the creation of <b>datastores</b> for several kinds of Azure data sources, including:

* Azure Storage (blob and file containers)
* Azure Data Lake stores
* Azure SQL Database
* Azure Databricks file system (DBFS)

### 1.2 Recommended Azure Machine Learning Data Workflow

This workflow assumes you have an Azure storage account and data in a cloud-based storage service in Azure.

1. Create an Azure Machine Learning datastore to store connection information to your Azure storage.

2. From that datastore, create an Azure Machine Learning dataset that points to a specific file or set of files in your underlying storage.

3. To use that dataset in your machine learning experiment, you can either

   * a. Mount it to your experiment's compute target for model training.
    
    OR

   * b. Consume it directly in Azure Machine Learning solutions such as automated machine learning (automated ML) experiment runs, machine learning pipelines, or the Azure Machine Learning designer.

4. Create dataset monitors for your model output dataset to detect data drift.

5. If data drift is detected, update your input dataset and retrain your model accordingly.

The following diagram provides a visual demonstration of this recommended workflow.
<img src="https://docs.microsoft.com/en-gb/azure/machine-learning/media/concept-data/data-concept-diagram.svg">
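
Steps 1 to 3a of this workflow can be sketched with the SDK as follows. This is a minimal sketch rather than a definitive implementation: `file_dataset_from_datastore` and `as_mounted_input` are illustrative helper names (not part of the SDK), and it assumes the datastore has already been registered in the workspace.

```python
def file_dataset_from_datastore(ws, datastore_name, relative_path):
    """Steps 1-2: look up a registered datastore and point a FileDataset at files in it."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from azureml.core import Dataset, Datastore

    datastore = Datastore.get(ws, datastore_name)
    # A (datastore, relative_path) tuple identifies files inside the datastore;
    # glob patterns such as 'raw/*.csv' are accepted.
    return Dataset.File.from_files(path=(datastore, relative_path))


def as_mounted_input(dataset, input_name):
    """Step 3a: describe the dataset as a named input that is mounted on the compute target."""
    return dataset.as_named_input(input_name).as_mount()
```

Given a workspace object `ws`, a call such as `file_dataset_from_datastore(ws, 'blob_data', 'raw/*.csv')` (placeholder name and path) would return a dataset, and `as_mounted_input(dataset, 'training_files')` would turn it into an input description for a training run.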


In [8]:
# Get the default datastore
default_ds = ws.get_default_datastore()

# Enumerate all datastores, indicating which is the default
for ds_name in ws.datastores:
    print(ds_name, "- Default =", ds_name == default_ds.name)

2020-03-29 21:43:55,663 - azureml.data.datastore_client - INFO - <azureml.core.authentication.InteractiveLoginAuthentication object at 0x7f8ed73825f8>
2020-03-29 21:43:58,499 - azureml.data.datastore_client - INFO - <azureml.core.authentication.InteractiveLoginAuthentication object at 0x7f8ed73825f8>
azuremlprimary - Default = True
workspaceblobstore - Default = False
workspacefilestore - Default = False


# 2. Adding Data Stores to a Workspace

Every workspace has two built-in datastores that Azure Machine Learning uses as system storage:

* an Azure Storage blob container
* an Azure Storage file container

You can also store a limited amount of your own data in these built-in datastores for experiments, model training, and so on.

However, in most machine learning projects, you will likely need to work with data sources of your own - either because you need to store larger volumes of data than the built-in datastores support, or because you need to integrate your machine learning solution with data from existing applications.

## 2.1 Registering a Datastore

To add a datastore to your workspace, you can register it using the graphical interface in Azure Machine Learning Studio, or you can use the Azure Machine Learning SDK. For example, the following code registers an Azure Storage blob container as a datastore named <b>blob_data</b>.

```python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Register a new datastore
blob_ds = Datastore.register_azure_blob_container(workspace=ws, 
                                                  datastore_name='blob_data', 
                                                  container_name='data_container',
                                                  account_name='az_store_acct',
                                                  account_key='123456abcde789…')
```

The code sample above shows how to register a blob container as a datastore. The Datastore class offers several other registration methods, including:

* register_azure_data_lake
* register_azure_data_lake_gen2
* register_azure_sql_database
* register_azure_postgre_sql
* register_azure_my_sql

Sample code using these methods can be found [here](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb).
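
One way to choose among these registration methods at run time is a small lookup table. The sketch below is illustrative only: the short labels (`"blob"`, `"mysql"`, and so on) and the `register_datastore` helper are assumptions made for this example, while the method names are the `Datastore` methods listed above plus `register_azure_blob_container` and `register_azure_file_share`. Each method expects its own connection arguments, which the caller passes through as keyword arguments.

```python
# Hypothetical short labels mapped to Datastore registration method names.
REGISTER_METHODS = {
    "blob": "register_azure_blob_container",
    "file_share": "register_azure_file_share",
    "data_lake": "register_azure_data_lake",
    "data_lake_gen2": "register_azure_data_lake_gen2",
    "sql_database": "register_azure_sql_database",
    "postgresql": "register_azure_postgre_sql",
    "mysql": "register_azure_my_sql",
}


def register_datastore(ws, kind, **settings):
    """Register a datastore of the given kind, forwarding settings to the SDK method."""
    try:
        method_name = REGISTER_METHODS[kind]
    except KeyError:
        raise ValueError("Unknown datastore kind: %r" % kind) from None

    # Imported lazily so the lookup table can be used without the SDK installed.
    from azureml.core import Datastore

    register = getattr(Datastore, method_name)
    return register(workspace=ws, **settings)
```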

In [10]:
from msrest.exceptions import HttpOperationError
from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference

blob_datastore_name = conf_catalog['azuremlprimary']['storage_name']   # Datastore name (here, the same as the storage account name)
account_name        = conf_catalog['azuremlprimary']['storage_name']   # Storage account name
container_name      = conf_catalog['azuremlprimary']['container_name'] # Name of Azure blob container
account_key         = conf_catalog['azuremlprimary']['key']            # Storage account key

# Reuse the datastore if it is already registered; otherwise register it
try:
    blob_datastore = Datastore.get(ws, blob_datastore_name)
    print("Found Blob Datastore with name: %s" % blob_datastore_name)
except HttpOperationError:
    blob_datastore = Datastore.register_azure_blob_container(workspace=ws,
                                                             datastore_name=blob_datastore_name,
                                                             container_name=container_name,
                                                             account_name=account_name,
                                                             account_key=account_key)


2020-03-29 21:44:30,561 - azureml.data.datastore_client - INFO - <azureml.core.authentication.InteractiveLoginAuthentication object at 0x7f8ed73825f8>
Found Blob Datastore with name: azuremlprimary
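
The cell above reads its connection settings from `conf_catalog`, so the account key never appears in the notebook. Where no such catalog is available, a comparable approach is to read the same four values from environment variables. The sketch below assumes hypothetical variable names (`AML_DATASTORE_NAME` and so on) that are not defined by Azure or by this project; the returned keys match the keyword arguments used in the registration call above.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
ENV_TO_ARGUMENT = {
    "AML_DATASTORE_NAME": "datastore_name",
    "AML_CONTAINER_NAME": "container_name",
    "AML_STORAGE_ACCOUNT": "account_name",
    "AML_STORAGE_KEY": "account_key",
}


def blob_settings_from_env(env=None):
    """Build keyword arguments for Datastore.register_azure_blob_container."""
    env = os.environ if env is None else env
    # Report every missing variable at once instead of failing on the first.
    missing = sorted(name for name in ENV_TO_ARGUMENT if name not in env)
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {argument: env[name] for name, argument in ENV_TO_ARGUMENT.items()}
```

The result can then be unpacked into the registration call, for example `Datastore.register_azure_blob_container(workspace=ws, **blob_settings_from_env())`.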


## 2.2 Managing Datastores

You can view and manage datastores in Azure Machine Learning Studio, or you can use the Azure Machine Learning SDK. For example, the following code lists the name of each datastore in the workspace.

```python
for ds_name in ws.datastores:
    print(ds_name)
```

You can get a reference to any datastore by using the Datastore.get() method, as shown here:

```python
blob_store = Datastore.get(ws, datastore_name='blob_data')
```

The workspace always includes a default datastore (initially, this is the built-in workspaceblobstore datastore), which you can retrieve by using the get_default_datastore() method of a Workspace object, like this:

```python
default_store = ws.get_default_datastore()
```

To change the default datastore, use the set_default_datastore() method:

```python
ws.set_default_datastore('blob_data')
```
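
The calls above can be combined into a single summary of the workspace's datastores. The helper below is a minimal sketch: `describe_datastores` is an illustrative name, not an SDK function, and it relies only on `ws.datastores` (a mapping of names to datastore objects), the `datastore_type` attribute, and `get_default_datastore()`.

```python
def describe_datastores(ws):
    """Return (name, datastore_type, is_default) for each datastore in the workspace."""
    default_name = ws.get_default_datastore().name
    return [
        (name, datastore.datastore_type, name == default_name)
        for name, datastore in ws.datastores.items()
    ]


# Usage, given a Workspace object `ws`:
# for name, kind, is_default in describe_datastores(ws):
#     print(name, kind, "(default)" if is_default else "")
```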

In [4]:
# List the names of all datastores in the workspace
for ds_name in ws.datastores:
    print(ds_name)
    
# Get datastore
from azureml.core import Datastore

blob_store = Datastore.get(ws, datastore_name='azuremlprimary')
print('Default Properties:' + blob_store.name,":", blob_store.datastore_type + " (" + blob_store.account_name + ")")

# Set the default datastore
ws.set_default_datastore('azuremlprimary')
default_ds = ws.get_default_datastore()
print('Default Datastore: '+default_ds.name)


2020-03-29 17:02:37,597 - azureml.data.datastore_client - INFO - <azureml.core.authentication.InteractiveLoginAuthentication object at 0x7f8ed7a16ba8>
azuremlprimary
workspaceblobstore
workspacefilestore
2020-03-29 17:02:39,487 - azureml.data.datastore_client - INFO - <azureml.core.authentication.InteractiveLoginAuthentication object at 0x7f8ed7a16ba8>
Default Properties:azuremlprimary : AzureBlob (azuremlprimary)
2020-03-29 17:02:40,045 - azureml.data.datastore_client - INFO - <azureml.core.authentication.InteractiveLoginAuthentication object at 0x7f8ed7a16ba8>
2020-03-29 17:02:40,541 - azureml.data.datastore_client - INFO - <azureml.core.authentication.InteractiveLoginAuthentication object at 0x7f8ed7a16ba8>
Default Datastore: azuremlprimary
