##### Copyright 2020 The TensorFlow IO Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Using Azure Blob Storage with TensorFlow

<table class="tfo-notebook-buttons" align="left">
  <td>     <a target="_blank" href="https://tensorflow.google.cn/io/tutorials/azure"><img src="https://tensorflow.google.cn/images/tf_logo_32px.png">View on TensorFlow.org</a>   </td>
  <td><a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/io/tutorials/azure.ipynb"><img src="https://tensorflow.google.cn/images/colab_logo_32px.png">Run in Google Colab</a></td>
  <td>     <a target="_blank" href="https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/io/tutorials/azure.ipynb"><img src="https://tensorflow.google.cn/images/GitHub-Mark-32px.png">View source on GitHub</a>   </td>
      <td>     <a href="https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/io/tutorials/azure.ipynb"><img src="https://tensorflow.google.cn/images/download_logo_32px.png">Download notebook</a>   </td>
</table>

Caution: In addition to Python packages, this notebook uses `npm install --user` to install packages. Be careful when running it locally.


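## Overview

This tutorial shows how to read and write files on [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) with TensorFlow, through TensorFlow IO's Azure file system integration.

An Azure storage account is needed to read and write files on Azure Blob Storage. The Azure storage key should be provided through an environment variable: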

```
os.environ['TF_AZURE_STORAGE_KEY'] = '<key>'
```

The storage account name and container name are part of the filename URI:

```
azfs://<storage-account-name>/<container-name>/<path>
```
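
As a small illustration of the URI layout above, the following sketch assembles and splits such URIs. The helper names `build_azure_uri` and `split_azure_uri` are hypothetical and are not part of TensorFlow or `tensorflow-io`. The scheme is left as a parameter, defaulting to the `azfs` shown above, because the example cells later in this tutorial use `az://`.

```python
from urllib.parse import urlsplit


def build_azure_uri(account, container, path="", scheme="azfs"):
    """Join account, container, and blob path into a filename URI.

    Hypothetical helper for illustration only.
    """
    if not account or not container:
        raise ValueError("account and container must be non-empty")
    uri = "{}://{}/{}".format(scheme, account, container)
    path = path.strip("/")
    return uri + "/" + path if path else uri


def split_azure_uri(uri):
    """Split a filename URI into (scheme, account, container, path)."""
    parts = urlsplit(uri)
    # The first path segment is the container; the rest is the blob path.
    container, _, path = parts.path.lstrip("/").partition("/")
    return parts.scheme, parts.netloc, container, path


uri = build_azure_uri("devstoreaccount1", "aztest", "hello.txt")
print(uri)
print(split_azure_uri(uri))
```

Keeping the scheme as an argument makes it easy to match whichever scheme the installed `tensorflow-io` version registers.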

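In this tutorial, for demonstration purposes, you can optionally set up [Azurite](https://github.com/Azure/Azurite), an Azure Storage emulator. With the Azurite emulator, you can read and write files through the Azure Blob Storage interface with TensorFlow.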

## Setup and usage

### Install required packages, and restart runtime

In [2]:
try:
  %tensorflow_version 2.x 
except Exception:
  pass

!pip install tensorflow-io

TensorFlow 2.x selected.
Collecting tensorflow-io
  Downloading https://files.pythonhosted.org/packages/c0/d0/c5d7adce72c6a6d7c9a59c062150f60b5404c706578a0922f7dc2835713c/tensorflow_io-0.12.0-cp36-cp36m-manylinux2010_x86_64.whl (20.1MB)
     |████████████████████████████████| 20.1MB 42.7MB/s 
Installing collected packages: tensorflow-io
Successfully installed tensorflow-io-0.12.0


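### Install and set up Azurite (optional)

If no Azure storage account is available, run the following commands to install and set up Azurite, which emulates the Azure Storage interface: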

In [3]:
!npm install azurite@2.7.0

[K[?25h[37;40mnpm[0m [0m[30;43mWARN[0m [0m[35mdeprecated[0m request@2.87.0: request has been deprecated, see https://github.com/request/request/issues/3142
[K[?25h[37;40mnpm[0m [0m[30;43mWARN[0m [0m[35msaveError[0m ENOENT: no such file or directory, open '/content/package.json'
[0m[37;40mnpm[0m [0m[34;40mnotice[0m[35m[0m created a lockfile as package-lock.json. You should commit this file.
[0m[37;40mnpm[0m [0m[30;43mWARN[0m [0m[35menoent[0m ENOENT: no such file or directory, open '/content/package.json'
[0m[37;40mnpm[0m [0m[30;43mWARN[0m[35m[0m content No description
[0m[37;40mnpm[0m [0m[30;43mWARN[0m[35m[0m content No repository field.
[0m[37;40mnpm[0m [0m[30;43mWARN[0m[35m[0m content No README data
[0m[37;40mnpm[0m [0m[30;43mWARN[0m[35m[0m content No license field.
[0m
+ azurite@2.7.0
added 116 packages from 141 contributors in 6.591s


In [4]:
# The path for npm might not be exposed in PATH env,
# you can find it out through 'npm bin' command
npm_bin_path = get_ipython().getoutput('npm bin')[0]
print('npm bin path: ', npm_bin_path)

# Run `azurite-blob -s` as a background process. 
# IPython doesn't recognize `&` in inline bash cells.
get_ipython().system_raw(npm_bin_path + '/' + 'azurite-blob -s &')

npm bin path:  /content/node_modules/.bin
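
Because `azurite-blob` is started as a background process, it may not be accepting connections yet when the next cell runs. The sketch below polls a TCP port until it accepts a connection. It assumes Azurite listens on `127.0.0.1:10000`, its default blob service port; adjust the values if your setup differs. `wait_for_port` is a hypothetical helper, not part of Azurite or `tensorflow-io`.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=10000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before `timeout` seconds elapse.
    The default port 10000 is an assumption (Azurite's default blob port).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` right after the `system_raw` line and checking its return value gives an early, explicit failure if the emulator did not start.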


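### Read and write files on Azure Storage with TensorFlow

The following is an example of reading and writing files on Azure Storage with TensorFlow's API.

Once the `tensorflow-io` package is imported, Azure Storage behaves the same way as other file systems in TensorFlow (e.g., POSIX or GCS), because `tensorflow-io` automatically registers the `azfs` scheme for use.

The Azure storage key should be provided through the `TF_AZURE_STORAGE_KEY` environment variable. Otherwise, set `TF_AZURE_USE_DEV_STORAGE` to `1` to use the Azurite emulator instead: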


In [None]:
import os
import tensorflow as tf
import tensorflow_io as tfio

# Switch to False to use Azure Storage instead:
use_emulator = True

if use_emulator:
  os.environ['TF_AZURE_USE_DEV_STORAGE'] = '1'
  account_name = 'devstoreaccount1'
else:
  # Replace <key> with Azure Storage Key, and <account> with Azure Storage Account
  os.environ['TF_AZURE_STORAGE_KEY'] = '<key>'
  account_name = '<account>'

  # Alternatively, authenticate with a shared access signature (SAS) instead
  # of the key. Uncomment the line below and remove the key assignment above:
  # os.environ['TF_AZURE_STORAGE_SAS'] = '<your sas>'

In [6]:
pathname = 'az://{}/aztest'.format(account_name)
tf.io.gfile.mkdir(pathname)

filename = pathname + '/hello.txt'
with tf.io.gfile.GFile(filename, mode='w') as w:
  w.write("Hello, world!")

with tf.io.gfile.GFile(filename, mode='r') as r:
  print(r.read())

Hello, world!


## Configurations

In TensorFlow, Azure Blob Storage is always configured through environment variables. The available options are listed below:

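- `TF_AZURE_USE_DEV_STORAGE`: Set to 1 to use the local development storage emulator for connections like 'az://devstoreaccount1/container/file.txt'. This takes precedence over all other settings, so leave it unset to use any other connection.
- `TF_AZURE_STORAGE_KEY`: The account key of the storage account in use.
- `TF_AZURE_STORAGE_USE_HTTP`: Set to any value if you do not want to use https transfer. Leave it unset to use the default, https.
- `TF_AZURE_STORAGE_BLOB_ENDPOINT`: Set to the endpoint of the blob storage. The default is `.core.windows.net`.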
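
The variables above interact: the emulator flag overrides everything else, and the HTTP flag is active whenever it is set, whatever its value. The following sketch shows one way to apply them consistently by clearing stale values first. `configure_azure_env` and `AZURE_VARS` are hypothetical names, not part of `tensorflow-io`, and the sketch covers only the four variables listed above.

```python
import os

# The four variables described in the list above.
AZURE_VARS = (
    "TF_AZURE_USE_DEV_STORAGE",
    "TF_AZURE_STORAGE_KEY",
    "TF_AZURE_STORAGE_USE_HTTP",
    "TF_AZURE_STORAGE_BLOB_ENDPOINT",
)


def configure_azure_env(use_emulator=False, key=None, use_http=False,
                        blob_endpoint=None, env=os.environ):
    """Set the Azure variables in `env`, removing any that do not apply.

    Hypothetical helper for illustration; `env` can be any mutable mapping.
    """
    # Start from a clean slate so a leftover emulator or HTTP flag
    # cannot silently override the new configuration.
    for name in AZURE_VARS:
        env.pop(name, None)
    if use_emulator:
        env["TF_AZURE_USE_DEV_STORAGE"] = "1"
        return env
    if key is not None:
        env["TF_AZURE_STORAGE_KEY"] = key
    if use_http:
        env["TF_AZURE_STORAGE_USE_HTTP"] = "1"
    if blob_endpoint:
        env["TF_AZURE_STORAGE_BLOB_ENDPOINT"] = blob_endpoint
    return env


# Demonstrate on a plain dict so the real process environment is untouched.
stale = {"TF_AZURE_USE_DEV_STORAGE": "1"}
print(configure_azure_env(key="<key>", env=stale))
```

Passing a plain dictionary as `env` keeps the example side-effect free; omitting it applies the settings to `os.environ`.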
