Chicago Taxi Example

The Chicago Taxi example demonstrates the end-to-end workflow and steps of how to analyze, validate and transform data, train a model, analyze and serve it. It uses the following TFX components:

  • ExampleGen ingests and splits the input dataset.
  • StatisticsGen calculates statistics for the dataset.
  • SchemaGen examines the statistics and creates a data schema.
  • ExampleValidator looks for anomalies and missing values in the dataset.
  • Transform performs feature engineering on the dataset.
  • Trainer trains the model using TensorFlow Estimators.
  • Evaluator performs deep analysis of the training results.
  • ModelValidator ensures that the model is "good enough" to be pushed to production.
  • Pusher deploys the model to a serving infrastructure.
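
To make the ordering of these components concrete, the sketch below models them as a dependency graph and derives one valid execution order. The edges are an assumption inferred from the descriptions above (for example, that Trainer consumes Transform's output); they are not read from the pipeline definition, and the helper name `execution_order` is illustrative.

```python
from collections import deque

# Assumed upstream dependencies, inferred from the component descriptions.
# The real pipeline definition remains the source of truth.
UPSTREAM = {
    "ExampleGen": [],
    "StatisticsGen": ["ExampleGen"],
    "SchemaGen": ["StatisticsGen"],
    "ExampleValidator": ["StatisticsGen", "SchemaGen"],
    "Transform": ["ExampleGen", "SchemaGen"],
    "Trainer": ["Transform", "SchemaGen"],
    "Evaluator": ["ExampleGen", "Trainer"],
    "ModelValidator": ["ExampleGen", "Trainer"],
    "Pusher": ["Trainer", "ModelValidator"],
}


def execution_order(upstream):
    """Return one topological order of the components (Kahn's algorithm)."""
    # Number of unfinished upstream components for each component.
    remaining = {name: len(deps) for name, deps in upstream.items()}
    # Reverse edges: which components wait on a given component.
    downstream = {name: [] for name in upstream}
    for name, deps in upstream.items():
        for dep in deps:
            downstream[dep].append(name)

    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in downstream[current]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(upstream):
        raise ValueError("dependency cycle detected")
    return order


order = execution_order(UPSTREAM)
print(" -> ".join(order))
```

Any order that respects the edges is acceptable; an orchestrator is free to run independent components, such as Evaluator and ModelValidator in this sketch, in parallel.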

Inference in the example is powered by TensorFlow Serving.

The dataset

This example uses the Taxi Trips dataset released by the City of Chicago.

Note: This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one’s own risk.

You can read more about the dataset in Google BigQuery. Explore the full dataset in the BigQuery UI.

Local prerequisites

Install dependencies

Development for this example will be isolated in a Python virtual environment. This allows us to experiment with different versions of dependencies.

There are many ways to install virtualenv; see the TensorFlow install guides for the different platforms. Here are a couple:

  • For Linux:
sudo apt-get install python-pip python-virtualenv python-dev build-essential
  • For Mac:
sudo easy_install pip
pip install --upgrade virtualenv

Create a Python 3.6 virtual environment for this example and activate the virtualenv:

virtualenv -p python3.6 taxi_pipeline
source ./taxi_pipeline/bin/activate

Configure common paths:

export AIRFLOW_HOME=~/airflow
export TAXI_DIR=~/taxi
export TFX_DIR=~/tfx

Next, install the dependencies required by the Chicago Taxi example:

pip install tensorflow==1.14.0
pip install apache-airflow==1.10.5
pip install tfx==0.14.0

Next, initialize Airflow:

airflow initdb

Copy the pipeline definition to Airflow's DAG directory

The benefit of the local example is that you can edit any part of the pipeline and experiment quickly with its components. First, download the data for the example:

mkdir -p $TAXI_DIR/data/simple
wget -O "$TAXI_DIR/data/simple/data.csv" "https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/data/simple/data.csv?raw=true"
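
A quick sanity check of the downloaded file can catch a bad download, for instance an HTML page saved in place of the CSV. The following is a minimal sketch using only the standard library; `summarize_csv` is a hypothetical helper name, and it assumes nothing about the column names in the dataset.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            raise ValueError("file is empty: " + path)
        # Count the data rows that follow the header.
        row_count = sum(1 for _ in reader)
    return header, row_count
```

For example, `summarize_csv(os.path.expandvars("$TAXI_DIR/data/simple/data.csv"))` (after `import os`) returns the header and the number of data rows. A header that starts with something like `<!DOCTYPE html>` suggests the raw file was not fetched.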

Next, copy the TFX pipeline definition to Airflow's DAGs directory ($AIRFLOW_HOME/dags) so it can run the pipeline. To find the location of your TFX installation, use this command:

pip show tfx

Use the location shown when setting the TFX_EXAMPLES path below.

export TFX_EXAMPLES=~/taxi_pipeline/lib/python3.6/site-packages/tfx/examples/chicago_taxi_pipeline
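
The location reported by `pip show tfx` can also be derived programmatically from inside the virtualenv. The sketch below uses the standard `importlib.util.find_spec` call; the helper names `package_dir` and `taxi_examples_dir` are illustrative, and the `examples/chicago_taxi_pipeline` suffix is taken from the path used in this README rather than verified against any particular install.

```python
import importlib.util
import os


def package_dir(package_name):
    """Return the install directory of a top-level package, or None if absent."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def taxi_examples_dir(package_name="tfx"):
    """Return the expected Chicago Taxi examples directory, or None."""
    base = package_dir(package_name)
    if base is None:
        return None
    # Layout assumed from the TFX_EXAMPLES path shown in this README.
    return os.path.join(base, "examples", "chicago_taxi_pipeline")


if __name__ == "__main__":
    print(taxi_examples_dir())
```

Run inside the activated virtualenv, this prints the directory to use for TFX_EXAMPLES, or `None` when tfx is not installed in that environment.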

Copy the Chicago Taxi example pipeline into the Airflow DAG folder.

mkdir -p $AIRFLOW_HOME/dags/
cp $TFX_EXAMPLES/taxi_pipeline_simple.py $AIRFLOW_HOME/dags/

The module file taxi_utils.py used by the Trainer and Transform components will reside in $TAXI_DIR. Copy it there.

cp $TFX_EXAMPLES/taxi_utils.py $TAXI_DIR
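
The steps so far produce a small directory layout that the rest of this README relies on. The sketch below reconstructs that layout from the three environment variables and reports which inputs are still missing. It only mirrors the paths named in this README; it does not claim that the pipeline script itself reads these variables, and the helper names `taxi_paths` and `missing_inputs` are hypothetical.

```python
import os


def taxi_paths(env=None):
    """Build the paths used in this README from environment variables."""
    env = os.environ if env is None else env
    home = env.get("HOME", os.path.expanduser("~"))
    # Defaults match the exports shown earlier (~/taxi, ~/tfx, ~/airflow).
    taxi_dir = env.get("TAXI_DIR", os.path.join(home, "taxi"))
    tfx_dir = env.get("TFX_DIR", os.path.join(home, "tfx"))
    airflow_home = env.get("AIRFLOW_HOME", os.path.join(home, "airflow"))
    return {
        "data_root": os.path.join(taxi_dir, "data", "simple"),
        "module_file": os.path.join(taxi_dir, "taxi_utils.py"),
        "dag_file": os.path.join(airflow_home, "dags", "taxi_pipeline_simple.py"),
        "serving_model_dir": os.path.join(taxi_dir, "serving_model", "taxi_simple"),
        "pipeline_root": os.path.join(tfx_dir, "pipelines", "chicago_taxi_simple"),
    }


def missing_inputs(paths):
    """Return the names of required inputs that do not exist yet."""
    required = ("data_root", "module_file", "dag_file")
    return [name for name in required if not os.path.exists(paths[name])]


if __name__ == "__main__":
    print(missing_inputs(taxi_paths()))
```

An empty list means the data, the module file, and the DAG file are all in place; the serving model and pipeline root directories are only created once the pipeline has run.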

Run the local example

Start Airflow

Start the Airflow webserver (in the taxi_pipeline virtualenv):

airflow webserver

Open a new terminal window:

source ./taxi_pipeline/bin/activate

and start the Airflow scheduler:

airflow scheduler

Open a browser to 127.0.0.1:8080 and click on the chicago_taxi_simple example. It should look like the image below if you click the Graph View option.

(Figure: pipeline graph view)

Run the example

If you were looking at the graph above, click on the DAGs button to get back to the DAGs view.

Enable the chicago_taxi_simple pipeline in Airflow by toggling the DAG to On. Now that it is schedulable, click on the Trigger DAG button (triangle inside a circle) to start the run. You can view status by clicking on the started job, found in the Last run column. This process will take several minutes.

Serve the TensorFlow model

Once the pipeline completes, the model will be copied by the Pusher to the directory configured in the example code:

ls $TAXI_DIR/serving_model/taxi_simple

To serve the model with TensorFlow Serving, follow the serving instructions and set the following environment variables:

For start_model_server_local.sh:

LOCAL_MODEL_DIR=$TAXI_DIR/serving_model/taxi_simple \
start_model_server_local.sh

This picks up the latest model under the path above.

For classify_local.sh:

EXAMPLES_FILE=~/taxi/data/simple/data.csv \
SCHEMA_FILE=~/tfx/pipelines/chicago_taxi_simple/SchemaGen/output/CHANGE_TO_LATEST_DIR/schema.pbtxt \
classify_local.sh
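
Both the serving model directory and the SchemaGen output directory contain per-run subdirectories, so the CHANGE_TO_LATEST_DIR placeholder has to be filled in by hand. The following sketch finds the newest one, under the assumption that the subdirectories have purely numeric names (version timestamps or run IDs) and that the largest number is the latest. `latest_numbered_subdir` is a hypothetical helper, not part of TFX.

```python
import os


def latest_numbered_subdir(parent):
    """Return the subdirectory of `parent` with the highest integer name.

    Returns None when `parent` has no numerically named subdirectories.
    """
    candidates = []
    for entry in os.listdir(parent):
        full = os.path.join(parent, entry)
        # Ignore files and anything that is not a plain integer name.
        if entry.isdigit() and os.path.isdir(full):
            candidates.append((int(entry), full))
    if not candidates:
        return None
    # Compare numerically so that "10" sorts after "9".
    return max(candidates)[1]
```

For instance, `latest_numbered_subdir(os.path.expanduser("~/tfx/pipelines/chicago_taxi_simple/SchemaGen/output"))` returns the directory to substitute for CHANGE_TO_LATEST_DIR, provided the naming assumption holds for your run.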

Chicago Taxi Flink Example (Python 2.7, 3.5, 3.6, 3.7)

Start a local Flink cluster and the Beam job server:

git clone https://github.com/tensorflow/tfx ~/tfx-source && pushd ~/tfx-source
sh tfx/examples/chicago_taxi/setup_beam_on_flink.sh

Then follow the Chicago Taxi Example instructions above, replacing 'taxi_pipeline_simple' with 'taxi_pipeline_portable_beam'. The Flink Cluster Dashboard is available at http://localhost:8081.

Chicago Taxi Spark Example (Python 2.7, 3.5, 3.6, 3.7)

Start a local Spark cluster and the Beam job server:

git clone https://github.com/tensorflow/tfx ~/tfx-source && pushd ~/tfx-source
sh tfx/examples/chicago_taxi/setup_beam_on_spark.sh

Then follow the Chicago Taxi Example instructions above, replacing 'taxi_pipeline_simple' with 'taxi_pipeline_portable_beam'. The Spark Cluster Dashboard is available at http://localhost:8081.

Learn more

Please see the TFX User Guide to learn more.
