forked from Azure/azureml-examples
delta example and up-version mltable (Azure#2205)
Showing 10 changed files with 381 additions and 8 deletions.
75 changes: 75 additions & 0 deletions
.github/workflows/sdk-using-mltable-delta-lake-example-delta-lake-example.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
# This code is autogenerated. | ||
# Code is generated by running custom script: python3 readme.py | ||
# Any manual changes to this file may cause incorrect behavior. | ||
# Any manual changes will be overwritten if the code is regenerated. | ||
|
||
name: sdk-using-mltable-delta-lake-example-delta-lake-example | ||
# This file is created by sdk/python/readme.py. | ||
# Please do not edit directly. | ||
on: | ||
workflow_dispatch: | ||
schedule: | ||
- cron: "23 8/12 * * *" | ||
pull_request: | ||
branches: | ||
- main | ||
paths: | ||
- sdk/python/using-mltable/delta-lake-example/** | ||
- .github/workflows/sdk-using-mltable-delta-lake-example-delta-lake-example.yml | ||
- sdk/python/dev-requirements.txt | ||
- infra/** | ||
- sdk/python/setup.sh | ||
concurrency: | ||
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} | ||
cancel-in-progress: true | ||
jobs: | ||
build: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: check out repo | ||
uses: actions/checkout@v2 | ||
- name: setup python | ||
uses: actions/setup-python@v2 | ||
with: | ||
python-version: "3.8" | ||
- name: pip install notebook reqs | ||
run: pip install -r sdk/python/dev-requirements.txt | ||
- name: azure login | ||
uses: azure/login@v1 | ||
with: | ||
creds: ${{secrets.AZUREML_CREDENTIALS}} | ||
- name: bootstrap resources | ||
run: | | ||
echo '${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}'; | ||
bash bootstrap.sh | ||
working-directory: infra | ||
continue-on-error: false | ||
- name: setup SDK | ||
run: | | ||
source "${{ github.workspace }}/infra/sdk_helpers.sh"; | ||
source "${{ github.workspace }}/infra/init_environment.sh"; | ||
bash setup.sh | ||
working-directory: sdk/python | ||
continue-on-error: true | ||
- name: setup-cli | ||
run: | | ||
source "${{ github.workspace }}/infra/sdk_helpers.sh"; | ||
source "${{ github.workspace }}/infra/init_environment.sh"; | ||
bash setup.sh | ||
working-directory: cli | ||
continue-on-error: true | ||
- name: run using-mltable/delta-lake-example/delta-lake-example.ipynb | ||
run: | | ||
source "${{ github.workspace }}/infra/sdk_helpers.sh"; | ||
source "${{ github.workspace }}/infra/init_environment.sh"; | ||
bash "${{ github.workspace }}/infra/sdk_helpers.sh" generate_workspace_config "../../.azureml/config.json"; | ||
bash "${{ github.workspace }}/infra/sdk_helpers.sh" replace_template_values "delta-lake-example.ipynb"; | ||
[ -f "../../.azureml/config" ] && cat "../../.azureml/config"; | ||
papermill -k python delta-lake-example.ipynb delta-lake-example.output.ipynb | ||
working-directory: sdk/python/using-mltable/delta-lake-example | ||
- name: upload notebook's working folder as an artifact | ||
if: ${{ always() }} | ||
uses: actions/upload-artifact@v2 | ||
with: | ||
name: delta-lake-example | ||
path: sdk/python/using-mltable/delta-lake-example |
290 changes: 290 additions & 0 deletions
sdk/python/using-mltable/delta-lake-example/delta-lake-example.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,290 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Create a Table from Delta Lake\n", | ||
"\n", | ||
"In this example notebook you will create an AzureML Table from a Delta Table." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## 📦 Install dependencies\n", | ||
"\n", | ||
"Ensure you have the latest MLTable library and dependencies." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install -r ../mltable-requirements.txt" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## 🐍 Create an MLTable using the Python SDK\n", | ||
"\n", | ||
"Here you build your data loading steps using the `mltable` Python SDK. The `show()` method allows you to see the effect of the data loading transformation." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import mltable\n", | ||
"\n", | ||
"# URI of the Delta table folder in blob storage\n", | ||
"delta_table_uri = \"wasbs://data@azuremlexampledata.blob.core.windows.net/COVID-19_NYT\"\n", | ||
"\n", | ||
"# create an MLTable from the Delta table as it existed at the given timestamp\n", | ||
"tbl = mltable.from_delta_lake(delta_table_uri, timestamp_as_of=\"2022-10-01T00:00:00Z\")\n", | ||
"\n", | ||
"# show the first 5 records\n", | ||
"tbl.show(5)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### 🐼 Load into a Pandas data frame\n", | ||
"\n", | ||
"You can load your Azure ML Table into Pandas using:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"df = tbl.to_pandas_dataframe()\n", | ||
"df.head(5)" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### 💾 Save data loading steps \n", | ||
"Next, you'll save all your data loading steps into an `MLTable` file. This allows you to *reproduce* your Pandas data frame at a later point in time without having to redefine the data loading steps in your code." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# save the data loading steps in an MLTable file\n", | ||
"tbl.save(\"./covid\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### 🔍 View the saved file\n", | ||
"\n", | ||
"In the next code cell, we show you the `MLTable` file so you can understand how the data loading steps are serialized into a file." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"with open(\"./covid/MLTable\", \"r\") as f:\n", | ||
" print(f.read())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## ♻️ Reproduce data loading steps\n", | ||
"\n", | ||
"Now that the data loading steps have been serialized into a file, you can reproduce them at any point in time using the `load()` method. This means you do not need to redefine your data loading steps in code and makes it easier to share with others." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import mltable\n", | ||
"\n", | ||
"# load the previously saved MLTable file\n", | ||
"tbl = mltable.load(\"./covid/\")\n", | ||
"df = tbl.to_pandas_dataframe()\n", | ||
"df.head(5)" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### 🤝 Create a data asset to aid sharing and reproducibility\n", | ||
"\n", | ||
"You'll now create a data asset, which will automatically upload the `MLTable` to cloud storage (the default AzureML datastore) so that others can use it easily." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"subscription_id = \"<SUBSCRIPTION_ID>\"\n", | ||
"resource_group = \"<RESOURCE_GROUP>\"\n", | ||
"workspace = \"<AML_WORKSPACE_NAME>\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import time\n", | ||
"from azure.ai.ml import MLClient\n", | ||
"from azure.ai.ml.entities import Data\n", | ||
"from azure.ai.ml.constants import AssetTypes\n", | ||
"from azure.identity import DefaultAzureCredential\n", | ||
"\n", | ||
"# set the version number of the data asset to the current UTC time\n", | ||
"VERSION = time.strftime(\"%Y.%m.%d.%H%M%S\", time.gmtime())\n", | ||
"\n", | ||
"# connect to the AzureML workspace\n", | ||
"ml_client = MLClient(\n", | ||
" DefaultAzureCredential(), subscription_id, resource_group, workspace\n", | ||
")\n", | ||
"\n", | ||
"my_data = Data(\n", | ||
" path=\"./covid\",\n", | ||
" type=AssetTypes.MLTABLE,\n", | ||
" description=\"COVID-19 dataset.\",\n", | ||
" name=\"covid-delta-example\",\n", | ||
" version=VERSION,\n", | ||
")\n", | ||
"\n", | ||
"ml_client.data.create_or_update(my_data)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### 📖 Read the data asset in an interactive session\n", | ||
"\n", | ||
"Now that your MLTable is stored in the cloud, you and your team members can access it using a friendly name in an interactive session (for example, a notebook)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import mltable\n", | ||
"from azure.ai.ml import MLClient\n", | ||
"from azure.identity import DefaultAzureCredential\n", | ||
"\n", | ||
"# connect to the AzureML workspace\n", | ||
"ml_client = MLClient(\n", | ||
" DefaultAzureCredential(), subscription_id, resource_group, workspace\n", | ||
")\n", | ||
"\n", | ||
"# get the data asset by name and version\n", | ||
"# Note: VERSION was set in the previous code cell.\n", | ||
"data_asset = ml_client.data.get(name=\"covid-delta-example\", version=VERSION)\n", | ||
"\n", | ||
"# create a table\n", | ||
"tbl = mltable.load(f\"azureml:/{data_asset.id}\")\n", | ||
"\n", | ||
"# load into pandas\n", | ||
"df = tbl.to_pandas_dataframe()\n", | ||
"df.head(5)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### 📖 Read the data asset in a job\n", | ||
"\n", | ||
"You can also access your Table in a job, using:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from azure.ai.ml import MLClient, command, Input\n", | ||
"from azure.ai.ml.entities import Environment\n", | ||
"from azure.identity import DefaultAzureCredential\n", | ||
"\n", | ||
"# connect to the AzureML workspace\n", | ||
"ml_client = MLClient(\n", | ||
" DefaultAzureCredential(), subscription_id, resource_group, workspace\n", | ||
")\n", | ||
"\n", | ||
"# get the data asset by name and version\n", | ||
"# Note: VERSION was set in a previous cell.\n", | ||
"data_asset = ml_client.data.get(name=\"covid-delta-example\", version=VERSION)\n", | ||
"\n", | ||
"job = command(\n", | ||
" command=\"python train.py --input ${{inputs.covid}}\",\n", | ||
" inputs={\"covid\": Input(type=\"mltable\", path=data_asset.id)},\n", | ||
" compute=\"cpu-cluster\",\n", | ||
" environment=Environment(\n", | ||
" image=\"mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04\",\n", | ||
" conda_file=\"./job-env/conda_dependencies.yml\",\n", | ||
" ),\n", | ||
" code=\"./src\",\n", | ||
")\n", | ||
"\n", | ||
"ml_client.jobs.create_or_update(job)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3.10 - SDK V2", | ||
"language": "python", | ||
"name": "python310-sdkv2" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.9" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
6 changes: 6 additions & 0 deletions
sdk/python/using-mltable/delta-lake-example/job-env/conda_dependencies.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
dependencies: | ||
- python=3.10 | ||
- pip=21.2.4 | ||
- pip: | ||
- mltable==1.3.0 | ||
- azureml-dataprep[pandas]==4.10.6 |
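
The job in the notebook runs `python train.py --input ...` inside this environment, but the contents of `train.py` are not shown in this view of the commit. The following is a hypothetical sketch of what such a script could look like, assuming it only needs to read the `--input` path and load the table with `mltable.load()` and `to_pandas_dataframe()`, the same calls the notebook uses. The function names `parse_args`, `load_dataframe`, and `main`, and the example path, are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the --input argument that the job command passes in."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py sketch")
    parser.add_argument(
        "--input", type=str, required=True, help="Path to the MLTable input"
    )
    return parser.parse_args(argv)


def load_dataframe(path):
    """Load the MLTable at `path` into a pandas DataFrame."""
    # Imported lazily so that argument parsing works without mltable installed.
    import mltable

    tbl = mltable.load(path)
    return tbl.to_pandas_dataframe()


def main(argv=None):
    args = parse_args(argv)
    df = load_dataframe(args.input)
    print(df.head(5))


# Argument parsing can be tried locally with an explicit argument list.
# The path below is a placeholder, not a real location.
args = parse_args(["--input", "/tmp/example-mltable"])
print(args.input)
```

In a real `train.py` you would call `main()` under an `if __name__ == "__main__":` guard so that the job's command-line arguments are used.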
File renamed without changes.