Feature/pred2bq bulk update #230
`Dockerfile` (new file):
```dockerfile
# Copyright 2023 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the 'License');
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an 'AS IS' BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
ARG PLATFORM=cpu

FROM gcr.io/tfx-oss-public/tfx:latest

WORKDIR /tfx-addons
RUN mkdir -p /tfx-addons/tfx_addons
ADD __init__.py /tfx-addons/tfx_addons
COPY ./ ./tfx_addons/predictions_to_bigquery

ENV PYTHONPATH="/tfx-addons:${PYTHONPATH}"
```
`README.md` (new file):
# Prediction results to BigQuery component

[![TensorFlow](https://img.shields.io/badge/TFX-orange)](https://www.tensorflow.org/tfx)
## Project Description

This component exports prediction results from BulkInferrer to a BigQuery
table. The BigQuery table schema can be generated from one of the following
sources:

1. From the SchemaGen component output
2. From the Transform component output
3. From the BulkInferrer component output (i.e. the prediction results)

If both SchemaGen and Transform outputs are passed to the component, the
SchemaGen output takes priority. Using SchemaGen to generate the BigQuery
schema is the recommended option.

If the Transform output channel is passed to the component without the
SchemaGen output, the BigQuery schema is derived from the pre-transform
metadata schema generated by Transform. Note that the metadata schema may
include a label key that is not present in the BulkInferrer prediction
results, so this option may not work for unlabeled data.

If neither the SchemaGen nor Transform outputs are passed to the component,
the BigQuery schema is parsed from the BulkInferrer prediction results
themselves, which contain tf.Example protos.
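
The schema-from-predictions idea boils down to mapping each tf.Example feature kind to a BigQuery column type. The sketch below illustrates that mapping with plain Python; `features_to_bq_schema` and its input format are hypothetical stand-ins, not the component's actual code.

```python
# Sketch: derive a BigQuery schema from tf.Example feature kinds.
# The three `kind` oneof names in tf.train.Feature map naturally to
# BigQuery column types. Illustration only, not the component's code.
_KIND_TO_BQ_TYPE = {
    'bytes_list': 'STRING',
    'float_list': 'FLOAT',
    'int64_list': 'INTEGER',
}

def features_to_bq_schema(feature_kinds):
    """Build a BigQuery schema (list of field dicts) from feature kinds.

    `feature_kinds` maps a feature name to its tf.train.Feature `kind`
    oneof name, e.g. {'score': 'float_list', 'category': 'bytes_list'}.
    """
    return [
        {'name': name, 'type': _KIND_TO_BQ_TYPE[kind], 'mode': 'NULLABLE'}
        for name, kind in sorted(feature_kinds.items())
    ]

schema = features_to_bq_schema({'score': 'float_list', 'category': 'bytes_list'})
```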

Prediction string labels for the BulkInferrer output may be derived by
passing the `vocab_label_file` execution parameter to the component. This
only works if the Transform component output is passed and the
`vocab_label_file` is present in it.
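
A vocabulary label file is, at heart, an index-to-label mapping: one label per line, with the line number serving as the class index. A minimal sketch of that lookup, with hypothetical helper names (not the component's internals):

```python
import os
import tempfile

def load_vocab(path):
    """Read a vocabulary file: one label per line, line number = class index."""
    with open(path) as f:
        return [line.rstrip('\n') for line in f]

def index_to_label(vocab, class_index):
    """Map a predicted class index back to its string label."""
    return vocab[class_index]

# Demo with a temporary vocab file standing in for a Transform-produced one.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('cat\ndog\nbird\n')
    vocab_path = f.name

vocab = load_vocab(vocab_path)
label = index_to_label(vocab, 1)
os.unlink(vocab_path)  # clean up the demo file
```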

## Project Use-Case(s)

The main use case for this component is to enable export of model prediction
results to a BigQuery table for further data analysis. The exported table
contains the model predictions and their corresponding inputs. If the input
data is labeled, this allows users to compare labels with their corresponding
predictions.

## Project Implementation

The PredictionsToBigQuery component uses Beam to process the prediction
results from BulkInferrer and export them to a BigQuery table.
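
As a rough illustration of the per-record conversion such a pipeline performs before writing to BigQuery (a simplified sketch with a hypothetical helper name, not the component's internals), each parsed prediction record can be flattened into a BigQuery row dict:

```python
def prediction_to_bq_row(parsed_features):
    """Flatten a parsed tf.Example-like dict into a BigQuery row dict.

    `parsed_features` maps feature names to lists of values, as produced by
    parsing a tf.Example; single-element lists become scalar column values,
    and multi-element lists stay as repeated values.
    """
    row = {}
    for name, values in parsed_features.items():
        row[name] = values[0] if len(values) == 1 else list(values)
    return row

row = prediction_to_bq_row({'score': [0.87], 'input_text': ['hello']})
```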

The BigQuery table name is passed in as a user-provided parameter; the user
can also choose to have the component append a timestamp to the end of the
table name.
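
A timestamp suffix of this kind is typically generated along the following lines. The `YYYYMMDDhhmmss` format and the helper name are assumptions for illustration; the component's actual format may differ.

```python
import datetime

def add_timestamp_suffix(table_name):
    """Append a UTC timestamp suffix (YYYYMMDDhhmmss) to a table name."""
    suffix = datetime.datetime.now(datetime.timezone.utc).strftime('%Y%m%d%H%M%S')
    return f'{table_name}_{suffix}'

name = add_timestamp_suffix('my_bigquery_table')
```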

The component's output is the fully qualified BigQuery table name where the
inference results are stored; it can be accessed through the
`bigquery_export` output key. The same table name is also stored as a custom
property of the `bigquery_export` artifact.

### Usage example
```python
from tfx import v1 as tfx
import tfx_addons as tfxa

...

predictions_to_bigquery = tfxa.predictions_to_bigquery.PredictionsToBigQuery(
    inference_results=bulk_inferrer.outputs['inference_result'],
    schema=schema_gen.outputs['schema'],
    transform_graph=transform.outputs['transform_graph'],
    bq_table_name='my_bigquery_table',
    gcs_temp_dir='gs://bucket/temp-dir',
    vocab_label_file='Label',
)
```

> Review comment on `gcs_temp_dir`: Should this just use the temp_dir from
> `beam_pipeline_args` instead?
> Reply: Yes, I just realized that the …

Refer to `integration_test.py` for tests that demonstrate how to use the
component.

For a description of the component's inputs and execution parameters, refer
to `component.py`.

## Project Dependencies

See `version.py` in the top-level repo directory for component dependencies.

## Testing

Each Python module has a corresponding unit test file ending in `_test.py`.

An integration test is also available and requires use of a Google Cloud
project. Additional instructions for running the integration test can be
found in `integration_test.py`.

Some tests use Abseil's `absltest` module. Install the package using pip:

```bash
pip install absl-py
```

### Test coverage

Test coverage can be generated using the `coverage` package:

```bash
pip install coverage
```

To get test code coverage on the component code, run the following from the
top directory of the tfx-addons repository (the glob is quoted so the shell
does not expand it before `unittest` sees the pattern):

```bash
coverage run -m unittest discover -s tfx_addons/predictions_to_bigquery -p '*_test.py'
```

Generate a summary report in the terminal:

```bash
coverage report -m
```
Generate an HTML report that also details missed lines:

```bash
coverage html -d /tmp/htmlcov
```

If working on a remote machine, the HTML coverage report can be viewed by
launching a web server:

```bash
pushd /tmp/htmlcov
python -m http.server 8000  # or another unused port number
```

## Project team

- Hannes Hapke (@hanneshapke, Digits Financial Inc.)
- Carlos Ezequiel (@cfezequiel, Google)
- Michael Sherman (@michaelwsherman, Google)
- Robert Crowe (@rcrowe-google, Google)
- Gerard Casas Saez (@casassg, Cash App)

> Review comment on the `Dockerfile`: Remove this file.
> Reply: @michaelwsherman why do we need to remove the Dockerfile? It's
> currently used to define the tfx-addons container that's needed by the
> integration test.