Skip to content

Commit

Permalink
[AutoML Images] Add mltable asset links for sdk (Azure#1970)
Browse files Browse the repository at this point in the history
* AutoML Images: Add mltable asset links; update docs

* renaming classes back to labels

* Reverting metadata update

* Fixing typo

* Fixing typo

* Refactoring mltable to MLTable
  • Loading branch information
rjaincc committed Dec 14, 2022
1 parent 3bc167a commit 2834cc8
Show file tree
Hide file tree
Showing 9 changed files with 226 additions and 202 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -112,11 +112,9 @@
"\n",
"In order to generate models for computer vision tasks with automated machine learning, you need to bring labeled image data as input for model training in the form of an MLTable. You can create an MLTable from labeled training data in JSONL format. If your labeled training data is in a different format (like, pascal VOC or COCO), you can use a conversion script to first convert it to JSONL, and then create an MLTable. Alternatively, you can use Azure Machine Learning's [data labeling tool](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-image-labeling-projects) to manually label images, and export the labeled data to use for training your AutoML model.\n",
"\n",
"In this notebook, we use a toy dataset called Fridge Objects, which consists of 134 images of 4 classes of beverage container {can, carton, milk bottle, water bottle} photos taken on different backgrounds.\n",
"In this notebook, we use a toy dataset called Fridge Objects, which consists of 134 images of 4 classes of beverage container {`can`, `carton`, `milk bottle`, `water bottle`} photos taken on different backgrounds.\n",
"\n",
"All images in this notebook are hosted in [this repository](https://github.com/microsoft/computervision-recipes) and are made available under the [MIT license](https://github.com/microsoft/computervision-recipes/blob/master/LICENSE).\n",
"\n",
"Please make use of the MLTable files present within the data folder at the same location (in the repo) as this notebook."
"All images in this notebook are hosted in [this repository](https://github.com/microsoft/computervision-recipes) and are made available under the [MIT license](https://github.com/microsoft/computervision-recipes/blob/master/LICENSE)."
]
},
{
Expand Down Expand Up @@ -239,10 +237,11 @@
"- /carton\n",
"- /can\n",
"\n",
"This is the most common data format for multiclass image classification. Each folder title corresponds to the image label for the images contained inside. In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. Please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).\n",
"This is the most common data format for multiclass image classification. Each folder title corresponds to the image label for the images contained inside. \n",
"\n",
"For documentation on preparing the datasets beyond this notebook, please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).\n",
"\n",
"The following script is creating two .jsonl files (one for training and one for validation) in the corresponding MLTable folder. The train / validation ratio corresponds to 20% of the data going into the validation file."
"In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. The following script is creating two `.jsonl` files (one for training and one for validation) in the corresponding MLTable folder. The train / validation ratio corresponds to 20% of the data going into the validation file. For further details on jsonl file used for image classification task in automated ml, please refer to the [data schema documentation for multi-class image classification task](https://learn.microsoft.com/en-us/azure/machine-learning/reference-automl-images-schema#image-classification-binarymulti-class)."
]
},
{
Expand Down Expand Up @@ -308,7 +307,11 @@
"source": [
"## 2.4. Create MLTable data input\n",
"\n",
"Create MLTable data input using the jsonl files created above."
"Create MLTable data input using the jsonl files created above.\n",
"\n",
"For documentation on creating your own MLTable assets for jobs beyond this notebook, please refer to below resources\n",
"- [MLTable YAML Schema](https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-mltable) - covers how to write MLTable YAML, which is required for each MLTable asset.\n",
"- [Create MLTable data asset](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-data-assets?tabs=Python-SDK#create-a-mltable-data-asset) - covers how to create MLTable data asset. "
]
},
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -109,11 +109,10 @@
"\n",
"In order to generate models for computer vision tasks with automated machine learning, you need to bring labeled image data as input for model training in the form of an MLTable. You can create an MLTable from labeled training data in JSONL format. If your labeled training data is in a different format (like, pascal VOC or COCO), you can use a conversion script to first convert it to JSONL, and then create an MLTable. Alternatively, you can use Azure Machine Learning's [data labeling tool](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-image-labeling-projects) to manually label images, and export the labeled data to use for training your AutoML model.\n",
"\n",
"In this notebook, we use a toy dataset called Fridge Objects, which consists of 128 images of 4 labels of beverage container {can, carton, milk bottle, water bottle} photos taken on different backgrounds. It also includes a labels file in .csv format. This is one of the most common data formats for Image Classification Multi-Label: one csv file that contains the mapping of labels to a folder of images.\n",
"In this notebook, we use a toy dataset called Fridge Objects, which consists of 128 images of 4 labels of beverage container {`can`, `carton`, `milk bottle`, `water bottle`} photos taken on different backgrounds. It also includes a labels file in `.csv` format. This is one of the most common data formats for Image Classification Multi-Label: one csv file that contains the mapping of labels to a folder of images.\n",
"\n",
"All images in this notebook are hosted in [this repository](https://github.com/microsoft/computervision-recipes) and are made available under the [MIT license](https://github.com/microsoft/computervision-recipes/blob/master/LICENSE).\n",
"\n",
"Please make use of the MLTable files present within the data folder at the same location (in the repo) as this notebook."
"All images in this notebook are hosted in [this repository](https://github.com/microsoft/computervision-recipes) and are made available under the [MIT license](https://github.com/microsoft/computervision-recipes/blob/master/LICENSE)."
]
},
{
Expand Down Expand Up @@ -227,10 +226,12 @@
"source": [
"## 2.3. Convert the downloaded data to JSONL\n",
"\n",
"In this example, the fridge object dataset is annotated in the CSV file, where each image corresponds to a line. It defines a mapping of the filename to the labels. Since this is a multi-label classification problem, each image can be associated to multiple labels. In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. Please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).\n",
"In this example, the fridge object dataset is annotated in the CSV file, where each image corresponds to a line. It defines a mapping of the filename to the labels. Since this is a multi-label classification problem, each image can be associated to multiple labels. \n",
"\n",
"For documentation on preparing the datasets beyond this notebook, please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).\n",
"\n",
"The following script is creating two .jsonl files (one for training and one for validation) in the corresponding MLTable folder. The train / validation ratio corresponds to 20% of the data going into the validation file."
"\n",
"In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. The following script is creating two `.jsonl` files (one for training and one for validation) in the corresponding MLTable folder. The train / validation ratio corresponds to 20% of the data going into the validation file. For further details on jsonl file used for image classification task in automated ml, please refer to the [data schema documentation for multi-label image classification task](https://learn.microsoft.com/en-us/azure/machine-learning/reference-automl-images-schema#image-classification-multi-label)."
]
},
{
Expand Down Expand Up @@ -297,7 +298,11 @@
"source": [
"## 2.4. Create MLTable data input\n",
"\n",
"Create MLTable data input using the jsonl files created above."
"Create MLTable data input using the jsonl files created above.\n",
"\n",
"For documentation on creating your own MLTable assets for jobs beyond this notebook, please refer to below resources\n",
"- [MLTable YAML Schema](https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-mltable) - covers how to write MLTable YAML, which is required for each MLTable asset.\n",
"- [Create MLTable data asset](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-data-assets?tabs=Python-SDK#create-a-mltable-data-asset) - covers how to create MLTable data asset. "
]
},
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -109,11 +109,9 @@
"\n",
"In order to generate models for computer vision tasks with automated machine learning, you need to bring labeled image data as input for model training in the form of an MLTable. You can create an MLTable from labeled training data in JSONL format. If your labeled training data is in a different format (like, pascal VOC or COCO), you can use a conversion script to first convert it to JSONL, and then create an MLTable. Alternatively, you can use Azure Machine Learning's [data labeling tool](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-image-labeling-projects) to manually label images, and export the labeled data to use for training your AutoML model.\n",
"\n",
"In this notebook, we use a toy dataset called Fridge Objects, which consists of 128 images of 4 labels of beverage container {can, carton, milk bottle, water bottle} photos taken on different backgrounds.\n",
"In this notebook, we use a toy dataset called Fridge Objects, which consists of 128 images of 4 labels of beverage container {`can`, `carton`, `milk bottle`, `water bottle`} photos taken on different backgrounds.\n",
"\n",
"All images in this notebook are hosted in [this repository](https://github.com/microsoft/computervision-recipes) and are made available under the [MIT license](https://github.com/microsoft/computervision-recipes/blob/master/LICENSE).\n",
"\n",
"Please make use of the MLTable files present within the data folder at the same location (in the repo) as this notebook."
"All images in this notebook are hosted in [this repository](https://github.com/microsoft/computervision-recipes) and are made available under the [MIT license](https://github.com/microsoft/computervision-recipes/blob/master/LICENSE)."
]
},
{
Expand Down Expand Up @@ -191,7 +189,7 @@
"\n",
"In order to use the data for training in Azure ML, we upload it to our default Azure Blob Storage of our Azure ML Workspace.\n",
"\n",
"Reference to URI FOLDER data asset example for further details: https://github.com/Azure/azureml-examples/blob/samuel100/data-samples/sdk/assets/data/data.ipynb"
"[Check this notebook for AML data asset example](../../../assets/data/data.ipynb)"
]
},
{
Expand Down Expand Up @@ -226,10 +224,11 @@
"metadata": {},
"source": [
"## 2.3. Convert the downloaded data to JSONL\n",
"In this example, the fridge object dataset is annotated in Pascal VOC format, where each image corresponds to an xml file. Each xml file contains information on where its corresponding image file is located and also contains information about the bounding boxes and the object labels. In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. Please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).\n",
"In this example, the fridge object dataset is annotated in Pascal VOC format, where each image corresponds to an xml file. Each xml file contains information on where its corresponding image file is located and also contains information about the bounding boxes and the object labels. \n",
"\n",
"For documentation on preparing the datasets beyond this notebook, please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).\n",
"\n",
"The following script is creating two .jsonl files (one for training and one for validation) in the corresponding MLTable folder. The train / validation ratio corresponds to 20% of the data going into the validation file."
"In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. The following script is creating two `.jsonl` files (one for training and one for validation) in the corresponding MLTable folder. The train / validation ratio corresponds to 20% of the data going into the validation file. For further details on jsonl file used for image classification task in automated ml, please refer to the [data schema documentation for image instance segmentation task](https://learn.microsoft.com/en-us/azure/machine-learning/reference-automl-images-schema#instance-segmentation)."
]
},
{
Expand Down Expand Up @@ -260,7 +259,11 @@
"source": [
"## 2.4. Create MLTable data input\n",
"\n",
"Create MLTable data input using the jsonl files created above."
"Create MLTable data input using the jsonl files created above.\n",
"\n",
"For documentation on creating your own MLTable assets for jobs beyond this notebook, please refer to below resources\n",
"- [MLTable YAML Schema](https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-mltable) - covers how to write MLTable YAML, which is required for each MLTable asset.\n",
"- [Create MLTable data asset](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-data-assets?tabs=Python-SDK#create-a-mltable-data-asset) - covers how to create MLTable data asset. "
]
},
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -93,12 +93,15 @@
"\n",
"In order to generate models for computer vision tasks with automated machine learning, you need to bring labeled image data as input for model training in the form of an MLTable. You can create an MLTable from labeled training data in JSONL format. If your labeled training data is in a different format (like, pascal VOC or COCO), you can use a conversion script to first convert it to JSONL, and then create an MLTable. Alternatively, you can use Azure Machine Learning's [data labeling tool](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-image-labeling-projects) to manually label images, and export the labeled data to use for training your AutoML model.\n",
"\n",
"In this notebook, we use a toy dataset called Fridge Objects, which consists of 128 images of 4 labels of beverage container {can, carton, milk bottle, water bottle} photos taken on different backgrounds.\n",
"\n",
"All images in this notebook are hosted in [this repository](https://github.com/microsoft/computervision-recipes) and are made available under the [MIT license](https://github.com/microsoft/computervision-recipes/blob/master/LICENSE).\n",
"\n",
"Please make use of the MLTable files present within the data folder at the same location (in the repo) as this notebook.\n",
"In this notebook, we use a toy dataset called Fridge Objects, which consists of 128 images of 4 labels of beverage container {`can`, `carton`, `milk bottle`, `water bottle`} photos taken on different backgrounds.\n",
"\n",
"All images in this notebook are hosted in [this repository](https://github.com/microsoft/computervision-recipes) and are made available under the [MIT license](https://github.com/microsoft/computervision-recipes/blob/master/LICENSE)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.1. Download the Data\n",
"We first download and unzip the data locally. By default, the data would be downloaded in `./data` folder in current directory. \n",
"If you prefer to download the data at a different location, update it in `dataset_parent_dir = ...` in the next cell."
Expand Down Expand Up @@ -206,10 +209,12 @@
"source": [
"## 2.3. Convert the downloaded data to JSONL\n",
"\n",
"In this example, the fridge object dataset is annotated in Pascal VOC format, where each image corresponds to an xml file. Each xml file contains information on where its corresponding image file is located and also contains information about the bounding boxes and the object labels. In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. Please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).\n",
"In this example, the fridge object dataset is annotated in Pascal VOC format, where each image corresponds to an xml file. Each xml file contains information on where its corresponding image file is located and also contains information about the bounding boxes and the object labels. \n",
"\n",
"For documentation on preparing the datasets beyond this notebook, please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).\n",
"\n",
"\n",
"The following script is creating two .jsonl files (one for training and one for validation) in the corresponding MLTable folder. The train / validation ratio corresponds to 20% of the data going into the validation file."
"In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. The following script is creating two `.jsonl` files (one for training and one for validation) in the corresponding MLTable folder. The train / validation ratio corresponds to 20% of the data going into the validation file. For further details on jsonl file used for image classification task in automated ml, please refer to the [data schema documentation for image object-detection task](https://learn.microsoft.com/en-us/azure/machine-learning/reference-automl-images-schema#object-detection)."
]
},
{
Expand Down Expand Up @@ -336,7 +341,12 @@
"metadata": {},
"source": [
"## 2.5. Create MLTable data input\n",
"Create MLTable data input using the jsonl files created above."
"\n",
"Create MLTable data input using the jsonl files created above.\n",
"\n",
"For documentation on creating your own MLTable assets for jobs beyond this notebook, please refer to below resources\n",
"- [MLTable YAML Schema](https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-mltable) - covers how to write MLTable YAML, which is required for each MLTable asset.\n",
"- [Create MLTable data asset](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-data-assets?tabs=Python-SDK#create-a-mltable-data-asset) - covers how to create MLTable data asset. "
]
},
{
Expand Down
Loading

0 comments on commit 2834cc8

Please sign in to comment.