In [None]:
# !pip install tfx

# Data Pipeline Components for Production ML

In this last graded programming exercise of the course, you will put together all the lessons we've covered so far to handle the first three steps of a production machine learning project - Data ingestion, Data Validation, and Data Transformation.

Specifically, you will build the production data pipeline by:

*   Performing feature selection
*   Ingesting the dataset
*   Generating the statistics of the dataset
*   Creating a schema as per the domain knowledge
*   Creating schema environments
*   Visualizing the dataset anomalies
*   Preprocessing, transforming and engineering your features
*   Tracking the provenance of your data pipeline using ML Metadata

Most of these will look familiar already so try your best to do the exercises by recall or browsing the documentation. If you get stuck however, you can review the lessons in class and the ungraded labs. 

Let's begin!

## Table of Contents

- [1 - Imports](#1)
- [2 - Load the Dataset](#2)
- [3 - Feature Selection](#4)
  - [Exercise 1 - Feature Selection](#ex-1)
- [4 - Data Pipeline](#4)
  - [4.1 - Setup the Interactive Context](#4-1)
  - [4.2 - Generating Examples](#4-2)
    - [Exercise 2 - ExampleGen](#ex-2)
  - [4.3 - Computing Statistics](#4-3)
    - [Exercise 3 - StatisticsGen](#ex-3)
  - [4.4 - Inferring the Schema](#4-4)
    - [Exercise 4 - SchemaGen](#ex-4)
  - [4.5 - Curating the Schema](#4-5)
    - [Exercise 5 - Curating the Schema](#ex-5)
  - [4.6 - Schema Environments](#4-6)
    - [Exercise 6 - Define the serving environment](#ex-6)
  - [4.7 - Generate new statistics using the updated schema](#4-7)
      - [Exercise 7 - ImporterNode](#ex-7)
      - [Exercise 8 - StatisticsGen with the new schema](#ex-8)
  - [4.8 - Check anomalies](#4-8)
      - [Exercise 9 - ExampleValidator](#ex-9)
  - [4.9 - Feature Engineering](#4-9)
      - [Exercise 10 - preprocessing function](#ex-10)
      - [Exercise 11 - Transform](#ex-11)
- [5 - ML Metadata](#5)
  - [5.1 - Accessing stored artifacts](#5-1)
  - [5.2 - Tracking artifacts](#5-2)
    - [Exercise 12 - Get parent artifacts](#ex-12)

<a name='1'></a>
## 1 - Imports

In [2]:
import tensorflow as tf
import tfx

# TFX components
from tfx.components import CsvExampleGen
from tfx.components import ExampleValidator
from tfx.components import SchemaGen
from tfx.components import StatisticsGen
from tfx.components import Transform
from tfx.components import ImporterNode

# TFX libraries
import tensorflow_data_validation as tfdv
import tensorflow_transform as tft
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext


# Utilities
from tensorflow.python.lib.io import file_io
from tensorflow_metadata.proto.v0 import schema_pb2
from google.protobuf.json_format import MessageToDict
from  tfx.proto import example_gen_pb2
from tfx.types import standard_artifacts
import os
import pprint
import tempfile
import pandas as pd
import numpy as np

# To ignore warnings from TF
tf.get_logger().setLevel('ERROR')

# For formatting print statements
pp = pprint.PrettyPrinter()

# Display versions of TF and TFX related packages
print('TensorFlow version: {}'.format(tf.__version__))
print('TFX version: {}'.format(tfx.__version__))
print('TensorFlow Data Validation version: {}'.format(tfdv.__version__))
print('TensorFlow Transform version: {}'.format(tft.__version__))

TensorFlow version: 2.6.0
TFX version: 1.4.0
TensorFlow Data Validation version: 1.4.0
TensorFlow Transform version: 1.4.0


<a name='2'></a>
## 2 - Load the dataset


In [None]:
# # OPTIONAL: Just in case you want to restart the lab workspace *from scratch*, you
# # can uncomment and run this block to delete previously created files and
# # directories. 

# !rm -rf pipeline
# !rm -rf data

<a name='4'></a>
## 4 - Data Pipeline

With the selected subset of features prepared, you can now start building the data pipeline. This involves ingesting, validating, and transforming your data. You will be using the TFX components you've already encountered in the ungraded labs and you can look them up here in the [official documentation](https://www.tensorflow.org/tfx/api_docs/python/tfx/components).

<a name='4-1'></a>
### 4.1 - Setup the Interactive Context

As usual, you will first setup the Interactive Context so you can manually execute the pipeline components from the notebook. You will save the sqlite database in a pre-defined directory in your workspace. Please do not modify this path because you will need this in a later exercise involving ML Metadata.

In [3]:
# Location of the pipeline metadata store
PIPELINE_DIR = './pipeline'

# Declare the InteractiveContext and use a local sqlite file as the metadata store.
context = InteractiveContext(pipeline_root=PIPELINE_DIR)



<a name='4-2'></a>
### 4.2 - Generating Examples

The first step in the pipeline is to ingest the data. Using [ExampleGen](https://www.tensorflow.org/tfx/guide/examplegen), you can convert raw data to TFRecords for faster computation in the later stages of the pipeline.

<a name='ex-2'></a>
#### Exercise 2: ExampleGen

Use `ExampleGen` to ingest the dataset we loaded earlier. Some things to note:

* The input is in CSV format so you will need to use the appropriate type of `ExampleGen` to handle it. 
* This function accepts a *directory* path to the training data and not the CSV file path itself. 

This will take a couple of minutes to run.

In [None]:
# # NOTE: Uncomment and run this if you get an error saying there are different 
# # headers in the dataset. This is usually because of the notebook checkpoints saved in 
# # that folder.
# !rm -rf {TRAINING_DIR}/.ipynb_checkpoints
# !rm -rf {TRAINING_DIR_FSELECT}/.ipynb_checkpoints
# !rm -rf {SERVING_DIR}/.ipynb_checkpoints

In [4]:
import sys
import csv

csv.field_size_limit(sys.maxsize)

131072

In [7]:
input_config_real = example_gen_pb2.Input(splits=[
                example_gen_pb2.Input.Split(name='sample_train', pattern='/root/Applied_AI_Lab_WiSe2021_Passau/sample_train/real/*'),
                # example_gen_pb2.Input.Split(name='train', pattern='/root/Applied_AI_Lab_WiSe2021_Passau/train/real/*'),
                # example_gen_pb2.Input.Split(name='eval', pattern='/root/Applied_AI_Lab_WiSe2021_Passau/eval/real/*'),
                # example_gen_pb2.Input.Split(name='testA', pattern='/root/Applied_AI_Lab_WiSe2021_Passau/test/testA/real/*'),
                # example_gen_pb2.Input.Split(name='testB', pattern='/root/Applied_AI_Lab_WiSe2021_Passau/test/testB/real/*'),
            ])

In [8]:
### START CODE HERE

# Instantiate ExampleGen with the input CSV dataset
data_example_gen = CsvExampleGen(input_base='/root/Applied_AI_Lab_WiSe2021_Passau/', input_config=input_config_real)

# Run the component using the InteractiveContext instance
context.run(data_example_gen)

### END CODE HERE



0,1
.execution_id,43
.component,"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } CsvExampleGen at 0x7f3f7430d860.inputs{}.outputs['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0.exec_properties['input_base']/root/Applied_AI_Lab_WiSe2021_Passau/['input_config']{  ""splits"": [  {  ""name"": ""sample_train"",  ""pattern"": ""/root/Applied_AI_Lab_WiSe2021_Passau/sample_train/real/*""  }  ] }['output_config']{  ""split_config"": {  ""splits"": [  {  ""hash_buckets"": 2,  ""name"": ""train""  },  {  ""hash_buckets"": 1,  ""name"": ""eval""  }  ]  } }['output_data_format']6['output_file_format']5['custom_config']None['range_config']None['span']0['version']None['input_fingerprint']split:sample_train,num_files:1,total_bytes:376623873,xor_checksum:1639241382,sum_checksum:1639241382"
.component.inputs,{}
.component.outputs,"['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.inputs,{}
.outputs,"['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"
.exec_properties,"['input_base']/root/Applied_AI_Lab_WiSe2021_Passau/['input_config']{  ""splits"": [  {  ""name"": ""sample_train"",  ""pattern"": ""/root/Applied_AI_Lab_WiSe2021_Passau/sample_train/real/*""  }  ] }['output_config']{  ""split_config"": {  ""splits"": [  {  ""hash_buckets"": 2,  ""name"": ""train""  },  {  ""hash_buckets"": 1,  ""name"": ""eval""  }  ]  } }['output_data_format']6['output_file_format']5['custom_config']None['range_config']None['span']0['version']None['input_fingerprint']split:sample_train,num_files:1,total_bytes:376623873,xor_checksum:1639241382,sum_checksum:1639241382"

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,./pipeline/CsvExampleGen/examples/43
.span,0
.split_names,"[""train"", ""eval""]"
.version,0

0,1
['input_base'],/root/Applied_AI_Lab_WiSe2021_Passau/
['input_config'],"{  ""splits"": [  {  ""name"": ""sample_train"",  ""pattern"": ""/root/Applied_AI_Lab_WiSe2021_Passau/sample_train/real/*""  }  ] }"
['output_config'],"{  ""split_config"": {  ""splits"": [  {  ""hash_buckets"": 2,  ""name"": ""train""  },  {  ""hash_buckets"": 1,  ""name"": ""eval""  }  ]  } }"
['output_data_format'],6
['output_file_format'],5
['custom_config'],
['range_config'],
['span'],0
['version'],
['input_fingerprint'],"split:sample_train,num_files:1,total_bytes:376623873,xor_checksum:1639241382,sum_checksum:1639241382"

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,./pipeline/CsvExampleGen/examples/43
.span,0
.split_names,"[""train"", ""eval""]"
.version,0


<a name='4-3'></a>
### 4.3 - Computing Statistics

Next, you will compute the statistics of your data. This will allow you to observe and analyze characteristics of your data through visualizations provided by the integrated [FACETS](https://pair-code.github.io/facets/) library.

<a name='ex-3'></a>
#### Exercise 3: StatisticsGen

Use [StatisticsGen](https://www.tensorflow.org/tfx/guide/statsgen) to compute the statistics of the output examples of `ExampleGen`. 

In [9]:
### START CODE HERE
stats_options = tfdv.StatsOptions(enable_semantic_domain_stats= True,
                                feature_allowlist=['image_w', 'query', 'image_h', 'num_boxes']
                                )
# Instantiate StatisticsGen with the ExampleGen ingested dataset
meta_statistics_gen = StatisticsGen(examples=data_example_gen.outputs['examples'], stats_options=stats_options) 

# Run the component
context.run(meta_statistics_gen)
### END CODE HERE



0,1
.execution_id,44
.component,"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } StatisticsGen at 0x7f3f74333550.inputs['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0.outputs['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7419d780.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/44) at 0x7f3f6feeecc0.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/44.span0.split_names[""train"", ""eval""].exec_properties['stats_options_json']{""_generators"": null, ""_feature_allowlist"": [""image_w"", ""query"", ""image_h"", ""num_boxes""], ""_schema"": null, ""label_feature"": null, ""weight_feature"": null, ""_slice_functions"": null, ""_sample_rate"": null, ""num_top_values"": 20, ""frequency_threshold"": 1, ""weighted_frequency_threshold"": 1.0, ""num_rank_histogram_buckets"": 1000, ""_num_values_histogram_buckets"": 10, ""_num_histogram_buckets"": 10, ""_num_quantiles_histogram_buckets"": 10, ""epsilon"": 0.01, ""infer_type_from_schema"": false, ""_desired_batch_size"": null, ""enable_semantic_domain_stats"": true, ""_semantic_domain_stats_sample_rate"": null, ""_per_feature_weight_override"": null, ""_vocab_paths"": null, ""_add_default_generators"": true, ""_use_sketch_based_topk_uniques"": false, ""_slice_sqls"": null}['exclude_splits'][]"
.component.inputs,"['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"
.component.outputs,"['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7419d780.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/44) at 0x7f3f6feeecc0.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/44.span0.split_names[""train"", ""eval""]"

0,1
.inputs,"['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"
.outputs,"['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7419d780.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/44) at 0x7f3f6feeecc0.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/44.span0.split_names[""train"", ""eval""]"
.exec_properties,"['stats_options_json']{""_generators"": null, ""_feature_allowlist"": [""image_w"", ""query"", ""image_h"", ""num_boxes""], ""_schema"": null, ""label_feature"": null, ""weight_feature"": null, ""_slice_functions"": null, ""_sample_rate"": null, ""num_top_values"": 20, ""frequency_threshold"": 1, ""weighted_frequency_threshold"": 1.0, ""num_rank_histogram_buckets"": 1000, ""_num_values_histogram_buckets"": 10, ""_num_histogram_buckets"": 10, ""_num_quantiles_histogram_buckets"": 10, ""epsilon"": 0.01, ""infer_type_from_schema"": false, ""_desired_batch_size"": null, ""enable_semantic_domain_stats"": true, ""_semantic_domain_stats_sample_rate"": null, ""_per_feature_weight_override"": null, ""_vocab_paths"": null, ""_add_default_generators"": true, ""_use_sketch_based_topk_uniques"": false, ""_slice_sqls"": null}['exclude_splits'][]"

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,./pipeline/CsvExampleGen/examples/43
.span,0
.split_names,"[""train"", ""eval""]"
.version,0

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7419d780.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/44) at 0x7f3f6feeecc0.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/44.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/44) at 0x7f3f6feeecc0.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/44.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/44) at 0x7f3f6feeecc0.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/44.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,./pipeline/StatisticsGen/statistics/44
.span,0
.split_names,"[""train"", ""eval""]"

0,1
['stats_options_json'],"{""_generators"": null, ""_feature_allowlist"": [""image_w"", ""query"", ""image_h"", ""num_boxes""], ""_schema"": null, ""label_feature"": null, ""weight_feature"": null, ""_slice_functions"": null, ""_sample_rate"": null, ""num_top_values"": 20, ""frequency_threshold"": 1, ""weighted_frequency_threshold"": 1.0, ""num_rank_histogram_buckets"": 1000, ""_num_values_histogram_buckets"": 10, ""_num_histogram_buckets"": 10, ""_num_quantiles_histogram_buckets"": 10, ""epsilon"": 0.01, ""infer_type_from_schema"": false, ""_desired_batch_size"": null, ""enable_semantic_domain_stats"": true, ""_semantic_domain_stats_sample_rate"": null, ""_per_feature_weight_override"": null, ""_vocab_paths"": null, ""_add_default_generators"": true, ""_use_sketch_based_topk_uniques"": false, ""_slice_sqls"": null}"
['exclude_splits'],[]

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,./pipeline/CsvExampleGen/examples/43
.span,0
.split_names,"[""train"", ""eval""]"
.version,0

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7419d780.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/44) at 0x7f3f6feeecc0.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/44.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/44) at 0x7f3f6feeecc0.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/44.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/44) at 0x7f3f6feeecc0.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/44.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,./pipeline/StatisticsGen/statistics/44
.span,0
.split_names,"[""train"", ""eval""]"


In [10]:
# Display the results
context.show(meta_statistics_gen.outputs['statistics'])

<a name='4-4'></a>
### 4.4 - Inferring the Schema

You will need to create a schema to validate incoming datasets during training and serving. Fortunately, TFX allows you to infer a first draft of this schema with the [SchemaGen](https://www.tensorflow.org/tfx/guide/schemagen) component.

<a name='ex-4'></a>
#### Exercise 4: SchemaGen

Use `SchemaGen` to infer a schema based on the computed statistics of `StatisticsGen`.

In [11]:
### START CODE HERE

# Instantiate StatisticsGen with the ExampleGen ingested dataset
data_statistics_gen = StatisticsGen(examples=data_example_gen.outputs['examples']) 

# Run the component
context.run(data_statistics_gen)
### END CODE HERE



0,1
.execution_id,45
.component,"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } StatisticsGen at 0x7f3f742040b8.inputs['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0.outputs['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7418cd30.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""].exec_properties['stats_options_json']None['exclude_splits'][]"
.component.inputs,"['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"
.component.outputs,"['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7418cd30.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
.inputs,"['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"
.outputs,"['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7418cd30.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"
.exec_properties,['stats_options_json']None['exclude_splits'][]

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,./pipeline/CsvExampleGen/examples/43
.span,0
.split_names,"[""train"", ""eval""]"
.version,0

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7418cd30.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,./pipeline/StatisticsGen/statistics/45
.span,0
.split_names,"[""train"", ""eval""]"

0,1
['stats_options_json'],
['exclude_splits'],[]

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,./pipeline/CsvExampleGen/examples/43
.span,0
.split_names,"[""train"", ""eval""]"
.version,0

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7418cd30.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,./pipeline/StatisticsGen/statistics/45
.span,0
.split_names,"[""train"", ""eval""]"


In [12]:
### START CODE HERE
# Instantiate SchemaGen with the output statistics from the StatisticsGen
data_schema_gen = SchemaGen(
    statistics=data_statistics_gen.outputs['statistics'],
    )
    
# Run the component
context.run(data_schema_gen)
### END CODE HERE

0,1
.execution_id,46
.component,"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } SchemaGen at 0x7f3f6fe48a90.inputs['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7418cd30.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""].outputs['schema'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f3f6fe48080.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/SchemaGen/schema/46) at 0x7f3f6fe48898.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/SchemaGen/schema/46.exec_properties['infer_feature_shape']1['exclude_splits'][]"
.component.inputs,"['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7418cd30.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"
.component.outputs,['schema'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f3f6fe48080.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/SchemaGen/schema/46) at 0x7f3f6fe48898.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/SchemaGen/schema/46

0,1
.inputs,"['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7418cd30.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"
.outputs,['schema'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f3f6fe48080.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/SchemaGen/schema/46) at 0x7f3f6fe48898.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/SchemaGen/schema/46
.exec_properties,['infer_feature_shape']1['exclude_splits'][]

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7418cd30.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,./pipeline/StatisticsGen/statistics/45
.span,0
.split_names,"[""train"", ""eval""]"

0,1
['schema'],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f3f6fe48080.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/SchemaGen/schema/46) at 0x7f3f6fe48898.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/SchemaGen/schema/46

0,1
.type_name,Schema
._artifacts,[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/SchemaGen/schema/46) at 0x7f3f6fe48898.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/SchemaGen/schema/46

0,1
[0],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/SchemaGen/schema/46) at 0x7f3f6fe48898.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/SchemaGen/schema/46

0,1
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,./pipeline/SchemaGen/schema/46

0,1
['infer_feature_shape'],1
['exclude_splits'],[]

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f7418cd30.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/45) at 0x7f3f7419dcf8.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/45.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,./pipeline/StatisticsGen/statistics/45
.span,0
.split_names,"[""train"", ""eval""]"

0,1
['schema'],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f3f6fe48080.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/SchemaGen/schema/46) at 0x7f3f6fe48898.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/SchemaGen/schema/46

0,1
.type_name,Schema
._artifacts,[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/SchemaGen/schema/46) at 0x7f3f6fe48898.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/SchemaGen/schema/46

0,1
[0],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/SchemaGen/schema/46) at 0x7f3f6fe48898.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/SchemaGen/schema/46

0,1
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,./pipeline/SchemaGen/schema/46


In [13]:
# Visualize the output
context.show(data_schema_gen.outputs['schema'])

Unnamed: 0_level_0,Type,Presence,Valency,Domain
Feature name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
'boxes',BYTES,required,,-
'class_labels',BYTES,required,,-
'features',BYTES,required,,-
'image_h',INT,required,,-
'image_w',INT,required,,-
'num_boxes',INT,required,,-
'product_id',INT,required,,-
'query',BYTES,required,,-
'query_id',INT,required,,-


<a name='4-5'></a>
### 4.5 - Curating the schema

You can see that the inferred schema is able to capture the data types correctly and also able to show the expected values for the qualitative (i.e. string) data. You can still fine-tune this however. For instance, we have features where we expect a certain range.

You want to update your schema to take note of these so the pipeline can detect if invalid values are being fed to the model.

<a name='ex-5'></a>
#### Exercise 5: Curating the Schema

Use [TFDV](https://www.tensorflow.org/tfx/data_validation/get_started) to update the inferred schema to restrict a range of values to the features mentioned above.

In [14]:
try:
    # Get the schema uri
    data_schema_uri = data_schema_gen.outputs['schema']._artifacts[0].uri
    
# for grading since context.run() does not work outside the notebook
except IndexError:
    print("context.run() was no-op")
    schema_path = './pipeline/SchemaGen/data_schema'
    dir_id = os.listdir(schema_path)[0]
    data_schema_uri = f'{schema_path}/{dir_id}'

In [15]:
# Get the schema pbtxt file from the SchemaGen output
data_schema = tfdv.load_schema_text(os.path.join(data_schema_uri, 'schema.pbtxt'))

In [16]:
tfdv.display_schema(schema=data_schema)

Unnamed: 0_level_0,Type,Presence,Valency,Domain
Feature name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
'boxes',BYTES,required,,-
'class_labels',BYTES,required,,-
'features',BYTES,required,,-
'image_h',INT,required,,-
'image_w',INT,required,,-
'num_boxes',INT,required,,-
'product_id',INT,required,,-
'query',BYTES,required,,-
'query_id',INT,required,,-


<a name='4-6'></a>
### 4.6 - Curating the schema

#### We can set the rang of our features vbased on the observed ranges from above statistics ^^

In [17]:
### START CODE HERE ###

tfdv.set_domain(data_schema, 'query', schema_pb2.StringDomain())

### END CODE HERE ###

tfdv.display_schema(schema=data_schema)

Unnamed: 0_level_0,Type,Presence,Valency,Domain
Feature name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
'boxes',BYTES,required,,-
'class_labels',BYTES,required,,-
'features',BYTES,required,,-
'image_h',INT,required,,-
'image_w',INT,required,,-
'num_boxes',INT,required,,-
'product_id',INT,required,,-
'query',STRING,required,,'query_domain'
'query_id',INT,required,,-


Unnamed: 0_level_0,Values
Domain,Unnamed: 1_level_1
'query_domain',


#### Generate statistics from curated schema

In [18]:
# Declare StatsOptions to use the curated schema
stats_options = tfdv.StatsOptions(schema=data_schema, infer_type_from_schema=True,
feature_allowlist=['image_w', 'query', 'image_h', 'num_boxes'],
num_rank_histogram_buckets=20,
enable_semantic_domain_stats=True)

### START CODE HERE

# Instantiate StatisticsGen with the ExampleGen ingested dataset
data_statistics_gen_curated = StatisticsGen(examples=data_example_gen.outputs['examples'], stats_options=stats_options)

# Run the component
context.run(data_statistics_gen_curated)
### END CODE HERE



0,1
.execution_id,47
.component,"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } StatisticsGen at 0x7f3f6ffc81d0.inputs['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0.outputs['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f6fe347f0.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/47) at 0x7f3f6ffc8d68.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/47.span0.split_names[""train"", ""eval""].exec_properties['stats_options_json']{""_generators"": null, ""_feature_allowlist"": [""image_w"", ""query"", ""image_h"", ""num_boxes""], ""label_feature"": null, ""weight_feature"": null, ""_slice_functions"": null, ""_sample_rate"": null, ""num_top_values"": 20, ""frequency_threshold"": 1, ""weighted_frequency_threshold"": 1.0, ""num_rank_histogram_buckets"": 20, ""_num_values_histogram_buckets"": 10, ""_num_histogram_buckets"": 10, ""_num_quantiles_histogram_buckets"": 10, ""epsilon"": 0.01, ""infer_type_from_schema"": true, ""_desired_batch_size"": null, ""enable_semantic_domain_stats"": true, ""_semantic_domain_stats_sample_rate"": null, ""_per_feature_weight_override"": null, ""_vocab_paths"": null, ""_add_default_generators"": true, ""_use_sketch_based_topk_uniques"": false, ""_slice_sqls"": null, ""schema_json"": ""{\n \""feature\"": [\n {\n \""name\"": \""boxes\"",\n \""type\"": \""BYTES\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""class_labels\"",\n \""type\"": \""BYTES\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""features\"",\n \""type\"": \""BYTES\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""image_h\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""image_w\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""num_boxes\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""product_id\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""query\"",\n \""type\"": \""BYTES\"",\n \""stringDomain\"": {},\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""query_id\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n }\n ]\n}""}['exclude_splits'][]"
.component.inputs,"['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"
.component.outputs,"['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f6fe347f0.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/47) at 0x7f3f6ffc8d68.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/47.span0.split_names[""train"", ""eval""]"

0,1
.inputs,"['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"
.outputs,"['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f6fe347f0.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/47) at 0x7f3f6ffc8d68.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/47.span0.split_names[""train"", ""eval""]"
.exec_properties,"['stats_options_json']{""_generators"": null, ""_feature_allowlist"": [""image_w"", ""query"", ""image_h"", ""num_boxes""], ""label_feature"": null, ""weight_feature"": null, ""_slice_functions"": null, ""_sample_rate"": null, ""num_top_values"": 20, ""frequency_threshold"": 1, ""weighted_frequency_threshold"": 1.0, ""num_rank_histogram_buckets"": 20, ""_num_values_histogram_buckets"": 10, ""_num_histogram_buckets"": 10, ""_num_quantiles_histogram_buckets"": 10, ""epsilon"": 0.01, ""infer_type_from_schema"": true, ""_desired_batch_size"": null, ""enable_semantic_domain_stats"": true, ""_semantic_domain_stats_sample_rate"": null, ""_per_feature_weight_override"": null, ""_vocab_paths"": null, ""_add_default_generators"": true, ""_use_sketch_based_topk_uniques"": false, ""_slice_sqls"": null, ""schema_json"": ""{\n \""feature\"": [\n {\n \""name\"": \""boxes\"",\n \""type\"": \""BYTES\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""class_labels\"",\n \""type\"": \""BYTES\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""features\"",\n \""type\"": \""BYTES\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""image_h\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""image_w\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""num_boxes\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""product_id\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""query\"",\n \""type\"": \""BYTES\"",\n \""stringDomain\"": {},\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""query_id\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n }\n ]\n}""}['exclude_splits'][]"

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,./pipeline/CsvExampleGen/examples/43
.span,0
.split_names,"[""train"", ""eval""]"
.version,0

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f6fe347f0.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/47) at 0x7f3f6ffc8d68.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/47.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/47) at 0x7f3f6ffc8d68.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/47.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/47) at 0x7f3f6ffc8d68.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/47.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,./pipeline/StatisticsGen/statistics/47
.span,0
.split_names,"[""train"", ""eval""]"

0,1
['stats_options_json'],"{""_generators"": null, ""_feature_allowlist"": [""image_w"", ""query"", ""image_h"", ""num_boxes""], ""label_feature"": null, ""weight_feature"": null, ""_slice_functions"": null, ""_sample_rate"": null, ""num_top_values"": 20, ""frequency_threshold"": 1, ""weighted_frequency_threshold"": 1.0, ""num_rank_histogram_buckets"": 20, ""_num_values_histogram_buckets"": 10, ""_num_histogram_buckets"": 10, ""_num_quantiles_histogram_buckets"": 10, ""epsilon"": 0.01, ""infer_type_from_schema"": true, ""_desired_batch_size"": null, ""enable_semantic_domain_stats"": true, ""_semantic_domain_stats_sample_rate"": null, ""_per_feature_weight_override"": null, ""_vocab_paths"": null, ""_add_default_generators"": true, ""_use_sketch_based_topk_uniques"": false, ""_slice_sqls"": null, ""schema_json"": ""{\n \""feature\"": [\n {\n \""name\"": \""boxes\"",\n \""type\"": \""BYTES\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""class_labels\"",\n \""type\"": \""BYTES\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""features\"",\n \""type\"": \""BYTES\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""image_h\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""image_w\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""num_boxes\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""product_id\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""query\"",\n \""type\"": \""BYTES\"",\n \""stringDomain\"": {},\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n },\n {\n \""name\"": \""query_id\"",\n \""type\"": \""INT\"",\n \""presence\"": {\n \""minFraction\"": 1.0,\n \""minCount\"": \""1\""\n },\n \""shape\"": {\n \""dim\"": [\n {\n \""size\"": \""1\""\n }\n ]\n }\n }\n ]\n}""}"
['exclude_splits'],[]

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,./pipeline/CsvExampleGen/examples/43
.span,0
.split_names,"[""train"", ""eval""]"
.version,0

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f6fe347f0.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/47) at 0x7f3f6ffc8d68.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/47.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/47) at 0x7f3f6ffc8d68.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/47.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/47) at 0x7f3f6ffc8d68.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/47.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,./pipeline/StatisticsGen/statistics/47
.span,0
.split_names,"[""train"", ""eval""]"


In [19]:
# Display the results
context.show(data_statistics_gen_curated.outputs['statistics'])

We can now save this curated schema in a local directory so we can import it to our TFX pipeline.

In [20]:
# Declare the path to the updated schema directory
UPDATED_SCHEMA_DIR = f'{PIPELINE_DIR}/updated_schema'

# Create the said directory
!mkdir -p {UPDATED_SCHEMA_DIR}

# Declare the path to the schema file
schema_file = os.path.join(UPDATED_SCHEMA_DIR, 'schema.pbtxt')

# Save the curated schema to the said file
tfdv.write_schema_text(data_schema, schema_file)

<a name='4-7'></a>
### 4.7 - Generate new statistics using the updated schema

You will now compute the statistics using the schema you just curated. Remember though that TFX components interact with each other by getting artifact information from the metadata store. So you first have to import the curated schema file into ML Metadata. You will do that by using an [ImporterNode](https://www.tensorflow.org/tfx/guide/statsgen#using_the_statsgen_component_with_a_schema) to create an artifact representing the curated schema.

<a name='ex-7'></a>
#### Exercise 7: ImporterNode

Complete the code below to create a `Schema` artifact that points to the curated schema directory. Pass in an `instance_name` as well and name it `import_user_schema`.

In [21]:
### START CODE HERE ###

# Use an ImporterNode to put the curated schema to ML Metadata
user_schema_importer = ImporterNode(
    source_uri=UPDATED_SCHEMA_DIR,
    artifact_type=standard_artifacts.Schema
)

# Run the component
context.run(user_schema_importer, enable_cache=False)

### END CODE HERE ###

context.show(user_schema_importer.outputs['result'])

Unnamed: 0_level_0,Type,Presence,Valency,Domain
Feature name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
'boxes',BYTES,required,,-
'class_labels',BYTES,required,,-
'features',BYTES,required,,-
'image_h',INT,required,,-
'image_w',INT,required,,-
'num_boxes',INT,required,,-
'product_id',INT,required,,-
'query',STRING,required,,'query_domain'
'query_id',INT,required,,-


Unnamed: 0_level_0,Values
Domain,Unnamed: 1_level_1
'query_domain',


With the artifact successfully created, you can now use `StatisticsGen` and pass in a `schema` parameter to use the curated schema.

<a name='ex-8'></a>
#### Exercise 8: Statistics with the new schema

Use `StatisticsGen` to compute the statistics with the schema you updated in the previous section. Remember to use the `stats_options` paremeter too to tell `StatisticsGen` to infer the data types from this new schema.

In [22]:
### START CODE HERE ###
# Use StatisticsGen to compute the statistics using the curated schema
stats_options = tfdv.StatsOptions(num_rank_histogram_buckets=20,
                                    enable_semantic_domain_stats=True)

statistics_gen_updated = StatisticsGen(examples=data_example_gen.outputs['examples'], 
                                       stats_options=stats_options,
                                      schema=user_schema_importer.outputs['result'])
    

# Run the component
context.run(statistics_gen_updated)
### END CODE HERE ###



0,1
.execution_id,49
.component,"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } StatisticsGen at 0x7f3f6d421048.inputs['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0['schema'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f3f6d4216a0.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/updated_schema) at 0x7f3f6d421240.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/updated_schema.outputs['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f75f50e10.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/49) at 0x7f3f6d415208.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/49.span0.split_names[""train"", ""eval""].exec_properties['stats_options_json']{""_generators"": null, ""_feature_allowlist"": null, ""_schema"": null, ""label_feature"": null, ""weight_feature"": null, ""_slice_functions"": null, ""_sample_rate"": null, ""num_top_values"": 20, ""frequency_threshold"": 1, ""weighted_frequency_threshold"": 1.0, ""num_rank_histogram_buckets"": 20, ""_num_values_histogram_buckets"": 10, ""_num_histogram_buckets"": 10, ""_num_quantiles_histogram_buckets"": 10, ""epsilon"": 0.01, ""infer_type_from_schema"": false, ""_desired_batch_size"": null, ""enable_semantic_domain_stats"": true, ""_semantic_domain_stats_sample_rate"": null, ""_per_feature_weight_override"": null, ""_vocab_paths"": null, ""_add_default_generators"": true, ""_use_sketch_based_topk_uniques"": false, ""_slice_sqls"": null}['exclude_splits'][]"
.component.inputs,"['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0['schema'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f3f6d4216a0.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/updated_schema) at 0x7f3f6d421240.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/updated_schema"
.component.outputs,"['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f75f50e10.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/49) at 0x7f3f6d415208.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/49.span0.split_names[""train"", ""eval""]"

0,1
.inputs,"['examples'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0['schema'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f3f6d4216a0.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/updated_schema) at 0x7f3f6d421240.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/updated_schema"
.outputs,"['statistics'] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f75f50e10.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/49) at 0x7f3f6d415208.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/49.span0.split_names[""train"", ""eval""]"
.exec_properties,"['stats_options_json']{""_generators"": null, ""_feature_allowlist"": null, ""_schema"": null, ""label_feature"": null, ""weight_feature"": null, ""_slice_functions"": null, ""_sample_rate"": null, ""num_top_values"": 20, ""frequency_threshold"": 1, ""weighted_frequency_threshold"": 1.0, ""num_rank_histogram_buckets"": 20, ""_num_values_histogram_buckets"": 10, ""_num_histogram_buckets"": 10, ""_num_quantiles_histogram_buckets"": 10, ""epsilon"": 0.01, ""infer_type_from_schema"": false, ""_desired_batch_size"": null, ""enable_semantic_domain_stats"": true, ""_semantic_domain_stats_sample_rate"": null, ""_per_feature_weight_override"": null, ""_vocab_paths"": null, ""_add_default_generators"": true, ""_use_sketch_based_topk_uniques"": false, ""_slice_sqls"": null}['exclude_splits'][]"

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"
['schema'],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f3f6d4216a0.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/updated_schema) at 0x7f3f6d421240.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/updated_schema

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,./pipeline/CsvExampleGen/examples/43
.span,0
.split_names,"[""train"", ""eval""]"
.version,0

0,1
.type_name,Schema
._artifacts,[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/updated_schema) at 0x7f3f6d421240.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/updated_schema

0,1
[0],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/updated_schema) at 0x7f3f6d421240.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/updated_schema

0,1
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,./pipeline/updated_schema

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f75f50e10.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/49) at 0x7f3f6d415208.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/49.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/49) at 0x7f3f6d415208.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/49.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/49) at 0x7f3f6d415208.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/49.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,./pipeline/StatisticsGen/statistics/49
.span,0
.split_names,"[""train"", ""eval""]"

0,1
['stats_options_json'],"{""_generators"": null, ""_feature_allowlist"": null, ""_schema"": null, ""label_feature"": null, ""weight_feature"": null, ""_slice_functions"": null, ""_sample_rate"": null, ""num_top_values"": 20, ""frequency_threshold"": 1, ""weighted_frequency_threshold"": 1.0, ""num_rank_histogram_buckets"": 20, ""_num_values_histogram_buckets"": 10, ""_num_histogram_buckets"": 10, ""_num_quantiles_histogram_buckets"": 10, ""epsilon"": 0.01, ""infer_type_from_schema"": false, ""_desired_batch_size"": null, ""enable_semantic_domain_stats"": true, ""_semantic_domain_stats_sample_rate"": null, ""_per_feature_weight_override"": null, ""_vocab_paths"": null, ""_add_default_generators"": true, ""_use_sketch_based_topk_uniques"": false, ""_slice_sqls"": null}"
['exclude_splits'],[]

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f3f7430d6d8.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"
['schema'],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f3f6d4216a0.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/updated_schema) at 0x7f3f6d421240.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/updated_schema

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: ./pipeline/CsvExampleGen/examples/43) at 0x7f41a5841a90.type<class 'tfx.types.standard_artifacts.Examples'>.uri./pipeline/CsvExampleGen/examples/43.span0.split_names[""train"", ""eval""].version0"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,./pipeline/CsvExampleGen/examples/43
.span,0
.split_names,"[""train"", ""eval""]"
.version,0

0,1
.type_name,Schema
._artifacts,[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/updated_schema) at 0x7f3f6d421240.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/updated_schema

0,1
[0],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: ./pipeline/updated_schema) at 0x7f3f6d421240.type<class 'tfx.types.standard_artifacts.Schema'>.uri./pipeline/updated_schema

0,1
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,./pipeline/updated_schema

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f3f75f50e10.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/49) at 0x7f3f6d415208.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/49.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/49) at 0x7f3f6d415208.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/49.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: ./pipeline/StatisticsGen/statistics/49) at 0x7f3f6d415208.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri./pipeline/StatisticsGen/statistics/49.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,./pipeline/StatisticsGen/statistics/49
.span,0
.split_names,"[""train"", ""eval""]"


In [23]:
context.show(statistics_gen_updated.outputs['statistics'])

<a name='4-10'></a>
### 4.10 - Feature engineering

You will now proceed to transforming your features to a form suitable for training a model. This can include several methods such as scaling and converting strings to vocabulary indices. It is important for these transformations to be consistent across your training data, and also for the serving data when the model is deployed for inference. TFX ensures this by generating a graph that will process incoming data both during training and inference.

Let's first declare the constants and utility function you will use for the exercise.

In [24]:
# Set the constants module filename
_multimodal_constants_module_file = 'multimodal_constants.py'

In [25]:
%%writefile {_multimodal_constants_module_file}

IMAGE = [
        "image_h",
        "image_w",
        "num_boxes",
        "boxes",
        "features",
        "class_labels"
    ]

QUERY = [
        "query",
        "query_id"
    ]

PRODUCT = [
        "product_id"
    ]

LABEL_KEY = "relevancy"

# Utility function for renaming the feature
def transformed_name(key):
    return key + '_xf'

Overwriting multimodal_constants.py


Next you will define the `preprocessing_fn` to apply transformations to the features. 

<a name='ex-10'></a>
#### Exercise 10: Preprocessing function

Complete the module to transform your features. Refer to the code comments to get hints on what operations to perform.

Here are some links to the docs of the functions you will need to complete this function:

- [`tft.scale_by_min_max`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/scale_by_min_max)
- [`tft.scale_to_0_1`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/scale_to_0_1)
- [`tft.scale_to_z_score`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/scale_to_z_score)
- [`tft.compute_and_apply_vocabulary`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/compute_and_apply_vocabulary)
- [`tft.hash_strings`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/hash_strings)

In [26]:
# Set the transform module filename
_multimodal_transform_module_file = 'multimodal_transform.py'

In [62]:
%%writefile {_multimodal_transform_module_file}

import tensorflow as tf
import tensorflow_transform as tft

import numpy as np

import base64

import multimodal_constants

_IMAGE = multimodal_constants.IMAGE
_QUERY = multimodal_constants.QUERY
_PRODUCT = multimodal_constants.PRODUCT
_LABEL_KEY = multimodal_constants.LABEL_KEY
_transformed_name = multimodal_constants.transformed_name

def preprocessing_fn(inputs):

    features_dict = {}

    ### START CODE HERE ###

    ### Decode boxes BYTES to tensor
    # print(inputs[_IMAGE[3]])
    # boxes = tf.io.decode_base64(inputs[_IMAGE[3]])
    # print(boxes)
    # boxes = tf.frombuffer(tf.io.decode_raw(boxes, tf.float32))
    # print(boxes)
    y = tf.io.decode_base64(inputs[_IMAGE[3]], pad=True)
    # z = tf.compat.bytes_or_text_types(inputs[_IMAGE[3]])
    print(y)
    # x = np.frombuffer(base64.b64decode(), dtype=np.float32).reshape(inputs[_IMAGE[2]], 4)
    # print(x)
    # features_dict[_transformed_name(_IMAGE[3])] = 
    # print(features_dict[_transformed_name(_IMAGE[3])])

    
    ### END CODE HERE ###  

    # No change in the label
    # features_dict[_LABEL_KEY] = inputs[_LABEL_KEY]

    return features_dict


Overwriting multimodal_transform.py


<a name='ex-11'></a>
#### Exercise 11: Transform

Use the [TFX Transform component](https://www.tensorflow.org/tfx/api_docs/python/tfx/components/Transform) to perform the transformations and generate the transformation graph. You will need to pass in the dataset examples, *curated* schema, and the module that contains the preprocessing function.

In [63]:
from tfx.v1.proto import SplitsConfig


### START CODE HERE ###
# Instantiate the Transform component

splits = SplitsConfig(analyze='sample_train', transform='sample_train')

transform = Transform(
    examples=data_example_gen.outputs['examples'],
    schema=user_schema_importer.outputs['result'],
    module_file=os.path.abspath(_multimodal_transform_module_file)
    )
    
    
    
### END CODE HERE ###

# Run the component
context.run(transform, enable_cache=False)


TypeError: decode_base64() got an unexpected keyword argument 'pad'

Let's inspect a few examples of the transformed dataset to see if the transformations are done correctly.

In [None]:
try:
    transform_uri = transform.outputs['transformed_examples'].get()[0].uri

# for grading since context.run() does not work outside the notebook
except IndexError:
    print("context.run() was no-op")
    examples_path = './pipeline/Transform/transformed_examples'
    dir_id = os.listdir(examples_path)[0]
    transform_uri = f'{examples_path}/{dir_id}'

In [None]:
# Get the URI of the output artifact representing the transformed examples
train_uri = os.path.join(transform_uri, 'train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

# Create a `TFRecordDataset` to read these files
transformed_dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

In [None]:
# import helper function to get examples from the dataset
from util import get_records

# Get 3 records from the dataset
sample_records_xf = get_records(transformed_dataset, 3)

# Print the output
pp.pprint(sample_records_xf)

<a name='5'></a>
## 5 - ML Metadata

TFX uses [ML Metadata](https://www.tensorflow.org/tfx/guide/mlmd) under the hood to keep records of artifacts that each component uses. This makes it easier to track how the pipeline is run so you can troubleshoot if needed or want to reproduce results.

In this final section of the assignment, you will demonstrate going through this metadata store to retrieve related artifacts. This skill is useful for when you want to recall which inputs are fed to a particular stage of the pipeline. For example, you can know where to locate the schema used to perform feature transformation, or you can determine which set of examples were used to train a model.

You will start by importing the relevant modules and setting up the connection to the metadata store. We have also provided some helper functions for displaying artifact information and you can review its code in the external `util.py` module in your lab workspace.

In [None]:
# Import mlmd and utilities
import ml_metadata as mlmd
from ml_metadata.proto import metadata_store_pb2
from util import display_types, display_artifacts, display_properties

# Get the connection config to connect to the metadata store
connection_config = context.metadata_connection_config

# Instantiate a MetadataStore instance with the connection config
store = mlmd.MetadataStore(connection_config)

# Declare the base directory where All TFX artifacts are stored
base_dir = connection_config.sqlite.filename_uri.split('metadata.sqlite')[0]

<a name='5-1'></a>
#### 5.1 -  Accessing stored artifacts

With the connection setup, you can now interact with the metadata store. For instance, you can retrieve all artifact types stored with the `get_artifact_types()` function. For reference, the API is documented [here](https://www.tensorflow.org/tfx/ml_metadata/api_docs/python/mlmd/MetadataStore).

In [None]:
# Get the artifact types
types = store.get_artifact_types()

# Display the results
display_types(types)

You can also get a list of artifacts for a particular type to see if there are variations used in the pipeline. For example, you curated a schema in an earlier part of the assignment so this should appear in the records. Running the cell below should show at least two rows: one for the inferred schema, and another for the updated schema. If you ran this notebook before, then you might see more rows because of the different schema artifacts saved under the `./SchemaGen/schema` directory.

In [None]:
# Retrieve the transform graph list
schema_list = store.get_artifacts_by_type('Schema')

# Display artifact properties from the results
display_artifacts(store, schema_list, base_dir)


Moreover, you can also get the properties of a particular artifact. TFX declares some properties automatically for each of its components. You will most likely see `name`, `state` and `producer_component` for each artifact type. Additional properties are added where appropriate. For example, a `split_names` property is added in `ExampleStatistics` artifacts to indicate which splits the statistics are generated for.

In [None]:
# Get the latest TransformGraph artifact
statistics_artifact = store.get_artifacts_by_type('ExampleStatistics')[-1]

# Display the properties of the retrieved artifact
display_properties(store, statistics_artifact)

<a name='5-2'></a>
#### 5.2 - Tracking artifacts

For this final exercise, you will build a function to return the parent artifacts of a given one. For example, this should be able to list the artifacts that were used to generate a particular `TransformGraph` instance. 

<a name='ex-12'></a>
##### Exercise 12: Get parent artifacts

Complete the code below to track the inputs of a particular artifact.

Tips:

* You may find [get_events_by_artifact_ids()](https://www.tensorflow.org/tfx/ml_metadata/api_docs/python/mlmd/MetadataStore#get_events_by_artifact_ids) and [get_events_by_execution_ids()](https://www.tensorflow.org/tfx/ml_metadata/api_docs/python/mlmd/MetadataStore#get_executions_by_id) useful here. 

* Some of the methods of the MetadataStore class (such as the two given above) only accepts iterables so remember to convert to a list (or set) if you only have an int (e.g. pass `[x]` instead of `x`).



In [None]:
def get_parent_artifacts(store, artifact):

    ### START CODE HERE ###
    
    # Get the artifact id of the input artifact
    artifact_id = artifact.id
    
    # Get events associated with the artifact id
    artifact_id_events = store.get_events_by_artifact_ids([artifact_id])
    
    # From the `artifact_id_events`, get the execution ids of OUTPUT events.
    # Cast to a set to remove duplicates if any.
    execution_id = set( 
        event.execution_id
        for event in artifact_id_events # @REPLACE
        if event.type == metadata_store_pb2.Event.OUTPUT # @REPLACE
    )
    
    # Get the events associated with the execution_id
    execution_id_events = store.get_events_by_execution_ids(execution_id)

    # From execution_id_events, get the artifact ids of INPUT events.
    # Cast to a set to remove duplicates if any.
    parent_artifact_ids = set( 
        event.artifact_id
        for event in execution_id_events
        if event.type == metadata_store_pb2.Event.INPUT
    )
    
    # Get the list of artifacts associated with the parent_artifact_ids
    parent_artifact_list = [artifact for artifact in store.get_artifacts_by_id(parent_artifact_ids)]

    ### END CODE HERE ###
    
    return parent_artifact_list

In [None]:
# Get an artifact instance from the metadata store
artifact_instance = store.get_artifacts_by_type('TransformGraph')[0]

# Retrieve the parent artifacts of the instance
parent_artifacts = get_parent_artifacts(store, artifact_instance)

# Display the results
display_artifacts(store, parent_artifacts, base_dir)

**Expected Output:**

*Note: The ID numbers may differ.*

| artifact id | type | uri |
| ----------- | ---- | --- |
| 1	| Examples | ./CsvExampleGen/examples/1 |
| 4	| Schema | ./updated_schema |

**Congratulations!** You have now completed the assignment for this week. You've demonstrated your skills in selecting features, performing a data pipeline, and retrieving information from the metadata store. Having the ability to put these all together will be critical when working with production grade machine learning projects. For next week, you will work on more data types and see how these can be prepared in an ML pipeline. **Keep it up!**