# Exercise 02 : Prepare Datastore

Here we prepare a ```Datastore``` for storing and sharing our dataset.

[Optional] If you plan to run the optional exercise at the end of this notebook, **prepare an Azure storage account and a container as follows**:

1. Create a storage account in the [Azure Portal](https://portal.azure.com/).
2. Create a container in its blob storage.
3. Copy the storage account name, access key, and container name.
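
Before pasting the copied names into the notebook, a quick local sanity check can catch typos. The sketch below encodes the naming rules documented for Azure Storage (account: 3 to 24 lowercase letters and digits; container: 3 to 63 lowercase letters, digits, and single hyphens). The helper names are hypothetical and not part of the Azure ML SDK, and special containers such as `$root` are deliberately not handled.

```python
import re

# Naming rules as documented for Azure Storage:
# - storage account: 3-24 characters, lowercase letters and digits only
# - blob container: 3-63 characters, lowercase letters, digits and hyphens,
#   starting and ending with a letter or digit, with no consecutive hyphens
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
_CONTAINER_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])+")


def is_valid_account_name(name: str) -> bool:
    """Check a storage account name (hypothetical helper)."""
    return _ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    """Check a blob container name (hypothetical helper, ignores $root etc.)."""
    return 3 <= len(name) <= 63 and _CONTAINER_RE.fullmatch(name) is not None
```

Passing this check only means the name is well-formed; it does not confirm that the account or container actually exists.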

*back to [index](https://github.com/tsmatz/azureml-tutorial-tensorflow-v1/)*

## Get config setting

Read your config settings. See "[Exercise01 : Prepare Config Settings](https://github.com/tsmatz/azureml-tutorial-tensorflow-v1/blob/master/notebooks/exercise01_prepare_config.ipynb)".

In [1]:
from azureml.core import Workspace

# Load the workspace using the config settings prepared in Exercise 01
ws = Workspace.from_config()

## Use default datastore

A default datastore is already attached to your AML workspace. (You can see that a storage account was generated in the same resource group.)    
The data is stored in an Azure File Share in the storage account named *{your workspace name}{arbitrary numbers}*.

In [2]:
# Get AML default datastore
ds = ws.get_default_datastore()

# Upload local "data" folder (incl. files) as "tfdata" folder
ds.upload(
    src_dir='./data',
    target_path='tfdata',
    overwrite=True)

Uploading an estimated of 2 files
Uploading ./data/test.tfrecords
Uploaded ./data/test.tfrecords, 1 files out of an estimated total of 2
Uploading ./data/train.tfrecords
Uploaded ./data/train.tfrecords, 2 files out of an estimated total of 2
Uploaded 2 files


$AZUREML_DATAREFERENCE_343ace96022948a0950762e7f6fba7f9
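
To see what an upload is about to send, it can help to list the local files first. This is a minimal sketch using only the standard library. `list_upload_candidates` is a hypothetical helper, and it assumes the upload walks the source folder recursively.

```python
from pathlib import Path


def list_upload_candidates(src_dir):
    """Return (relative path, size in bytes) for every file under src_dir."""
    root = Path(src_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"source folder not found: {root}")
    # Sort so the listing is stable regardless of filesystem ordering
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )
```

In this notebook, `list_upload_candidates('./data')` would be the matching call. The number of entries it returns can be compared against the file count reported in the upload log.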

## Create and Register a Dataset

Now we create an AML dataset and register it in the workspace.<br>
Registering a dataset is not mandatory, but doing so lets you track versions and trace which data was used with which models or experiments.

In this exercise, we register all files in a specific folder, but you can also register only some of the files (for example, files with a specific extension) as a dataset.

In [3]:
from azureml.core import Dataset

datastore_paths = [(ds, 'tfdata')]
mnist_dataset = Dataset.File.from_files(path=datastore_paths)
mnist_dataset = mnist_dataset.register(
    workspace=ws,
    name='mnist_tfrecords_dataset',
    description='training and test dataset',
    create_new_version=True)
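
To register only part of the files, one option is to put a glob pattern in the relative path passed to `Dataset.File.from_files`. The sketch below assumes the same `tfdata` folder as above. The helper names and the dataset name `mnist_tfrecords_only` are hypothetical, chosen only for illustration.

```python
def tfrecord_paths(datastore, folder="tfdata", pattern="*.tfrecords"):
    """Build the `path` argument for Dataset.File.from_files with a glob pattern."""
    return [(datastore, f"{folder.rstrip('/')}/{pattern}")]


def register_tfrecords(ws, datastore, name="mnist_tfrecords_only"):
    """Register only the files matching the pattern (name is a placeholder)."""
    # Imported lazily so the path helper above can be used without the SDK installed
    from azureml.core import Dataset

    dataset = Dataset.File.from_files(path=tfrecord_paths(datastore))
    return dataset.register(
        workspace=ws,
        name=name,
        description="only *.tfrecords files under tfdata",
        create_new_version=True)
```

With the objects from the cells above, the call would be `register_tfrecords(ws, ds)`.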

## [Optional] Use datastore with your own blob storage

You can also use your own blob storage.    
Set the storage account name, access key, and container name that you copied earlier (see above) in the following script and run it.

(Running this optional section is not required for the following exercises.)

In [4]:
from azureml.core import Datastore

ds = Datastore.register_azure_blob_container(
    ws,
    datastore_name='myblob01',
    account_name='{STORAGE ACCOUNT NAME}',
    account_key='{ACCESS KEY}',
    container_name='{CONTAINER NAME}',
    overwrite=True)

# Upload local "data" folder (incl. files) as "tfdata" folder
ds.upload(
    src_dir='./data',
    target_path='tfdata',
    overwrite=True)

Uploading an estimated of 2 files
Uploading ./data/test.tfrecords
Uploading ./data/train.tfrecords
Uploaded ./data/test.tfrecords, 1 files out of an estimated total of 2
Uploaded ./data/train.tfrecords, 2 files out of an estimated total of 2
Uploaded 2 files


$AZUREML_DATAREFERENCE_f65a74a6fa4445e6816ea604e9b55e32
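
If you would rather not paste the access key into the notebook, one possible variation is to read the three values from environment variables. This is a sketch, not the tutorial's method: the variable names (`AML_STORAGE_ACCOUNT`, `AML_STORAGE_KEY`, `AML_STORAGE_CONTAINER`) and both helper functions are assumptions made for illustration.

```python
import os

# Hypothetical variable names; pick whatever fits your environment
ENV_KEYS = {
    "account_name": "AML_STORAGE_ACCOUNT",
    "account_key": "AML_STORAGE_KEY",
    "container_name": "AML_STORAGE_CONTAINER",
}


def blob_settings_from_env(environ=os.environ):
    """Collect the blob settings from environment variables, failing early if any is missing."""
    missing = [var for var in ENV_KEYS.values() if not environ.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {arg: environ[var] for arg, var in ENV_KEYS.items()}


def register_blob_datastore(ws, datastore_name="myblob01"):
    """Register the blob container using settings taken from the environment."""
    # Imported lazily so the helper above works without the SDK installed
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        ws,
        datastore_name=datastore_name,
        overwrite=True,
        **blob_settings_from_env())
```

Keeping the key out of the cell also keeps it out of the saved notebook file, which matters if the notebook is shared or committed to a repository.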

Once a datastore is registered, you can retrieve it by its datastore name.    
Here we upload the data again.

In [5]:
# Get your own registered datastore
ds = Datastore.get(ws, datastore_name='myblob01')

# Upload local "data" folder (incl. files) as "tfdata" folder
ds.upload(
    src_dir='./data',
    target_path='tfdata',
    overwrite=True)

$AZUREML_DATAREFERENCE_b6854bcc152f44229c4e478b6dde1e29