Chicago Taxi Example

The Chicago Taxi example demonstrates the end-to-end TFX workflow: analyzing, validating, and transforming data, training a model, analyzing the trained model, and serving it. It uses the following TFX components (a sketch of how they are chained together follows the list):

  • ExampleGen ingests and splits the input dataset.
  • StatisticsGen calculates statistics for the dataset.
  • SchemaGen examines the statistics and creates a data schema.
  • ExampleValidator looks for anomalies and missing values in the dataset.
  • Transform performs feature engineering on the dataset.
  • Trainer trains the model using TensorFlow Estimators.
  • Evaluator performs deep analysis of the training results.
  • ModelValidator ensures that the model is "good enough" to be pushed to production.
  • Pusher deploys the model to a serving infrastructure.
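
Below is a minimal sketch of how taxi_pipeline_simple.py chains these components into a pipeline. The constructors and argument names shown here are assumptions for illustration only and vary between TFX releases; treat taxi_pipeline_simple.py in this directory as the authoritative definition.

# Illustrative sketch only -- constructor and argument names are assumptions
# and differ across TFX releases; see taxi_pipeline_simple.py for the real code.
from tfx.components import (CsvExampleGen, Evaluator, ExampleValidator,
                            ModelValidator, Pusher, SchemaGen, StatisticsGen,
                            Trainer, Transform)
from tfx.orchestration import pipeline

def create_pipeline(data_root, module_file, serving_dir, pipeline_root):
  example_gen = CsvExampleGen(input_base=data_root)      # ingest and split the CSVs
  statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
  schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])
  example_validator = ExampleValidator(
      statistics=statistics_gen.outputs['statistics'],
      schema=schema_gen.outputs['schema'])
  transform = Transform(                                 # feature engineering via taxi_utils.py
      examples=example_gen.outputs['examples'],
      schema=schema_gen.outputs['schema'],
      module_file=module_file)
  trainer = Trainer(                                     # trains an Estimator via taxi_utils.py
      module_file=module_file,
      examples=transform.outputs['transformed_examples'],
      schema=schema_gen.outputs['schema'],
      transform_graph=transform.outputs['transform_graph'])
  evaluator = Evaluator(
      examples=example_gen.outputs['examples'],
      model=trainer.outputs['model'])
  model_validator = ModelValidator(                      # blesses the model if "good enough"
      examples=example_gen.outputs['examples'],
      model=trainer.outputs['model'])
  pusher = Pusher(                                       # copies blessed models to the serving dir
      model=trainer.outputs['model'],
      model_blessing=model_validator.outputs['blessing'],
      push_destination=serving_dir)  # in the real code this is a PushDestination proto
  return pipeline.Pipeline(
      pipeline_name='chicago_taxi_simple',
      pipeline_root=pipeline_root,
      components=[example_gen, statistics_gen, schema_gen, example_validator,
                  transform, trainer, evaluator, model_validator, pusher])

In the example, this pipeline is handed to TFX's Airflow integration, which is why copying the pipeline file into $AIRFLOW_HOME/dags (as described below) is enough for Airflow to pick it up as a DAG.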

Inference in the example is powered by TensorFlow Serving.

The dataset

This example uses the Taxi Trips dataset released by the City of Chicago.

Note: This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one’s own risk.

You can read more about the dataset in Google BigQuery. Explore the full dataset in the BigQuery UI.

Local prerequisites

Install dependencies

Development for this example will be isolated in a Python virtual environment. This allows us to experiment with different versions of dependencies.

There are many ways to install virtualenv; see the TensorFlow install guides for the details on different platforms. Here are a couple:

  • For Linux:
sudo apt-get install python-pip python-virtualenv python-dev build-essential
  • For Mac:
sudo easy_install pip
pip install --upgrade virtualenv

Create a Python 2.7 virtual environment for this example and activate the virtualenv:

virtualenv -p python2.7 taxi_pipeline
source ./taxi_pipeline/bin/activate

Configure common paths:

export AIRFLOW_HOME=~/airflow
export TAXI_DIR=~/taxi
export TFX_DIR=~/tfx

Next, install the dependencies required by the Chicago Taxi example:

pip install tensorflow==1.12
pip install docker
export SLUGIFY_USES_TEXT_UNIDECODE=yes
pip install apache-airflow
pip install tfx==0.12.0

Next, initialize Airflow:

airflow initdb

Copy the pipeline definition to Airflow's DAG directory

The benefit of the local example is that you can edit any part of the pipeline and experiment very quickly with various components. The example comes with a small subset of the Taxi Trips dataset as CSV files.

First, let's clone the TFX source so we can run the example:

git clone https://github.com/tensorflow/tfx
cd tfx/examples/chicago_taxi_pipeline

Let's copy the dataset CSV to the directory where TFX ExampleGen will ingest it from:

mkdir -p $TAXI_DIR/data/simple
cp data/simple/data.csv $TAXI_DIR/data/simple

Let's copy the TFX pipeline definition to Airflow's DAGs directory ($AIRFLOW_HOME/dags) so Airflow can run the pipeline.

mkdir -p $AIRFLOW_HOME/dags/taxi
cp taxi_pipeline_simple.py $AIRFLOW_HOME/dags/taxi

The module file taxi_utils.py, used by the Trainer and Transform components, will reside in $TAXI_DIR; let's copy it there:

cp taxi_utils.py $TAXI_DIR
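
Transform and Trainer load this module by file path and call functions defined inside it. The sketch below shows the rough shape of those entry points; the names, signatures, and feature columns are assumptions that vary by TFX release, so refer to taxi_utils.py itself for the real code.

# Rough sketch of taxi_utils.py's entry points -- names, signatures, and the
# feature columns are assumptions; see the actual taxi_utils.py for details.
import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
  """Called by the Transform component: maps raw features to transformed ones."""
  outputs = {}
  # Example transformation: z-score scale a numeric column (column name illustrative).
  outputs['trip_miles_scaled'] = tft.scale_to_z_score(inputs['trip_miles'])
  return outputs

def trainer_fn(hparams, schema):
  """Called by the Trainer component: builds the Estimator used for training."""
  estimator = tf.estimator.DNNClassifier(
      hidden_units=[16, 8],
      feature_columns=[tf.feature_column.numeric_column('trip_miles_scaled')])
  # The real function also returns train/eval specs and an eval input receiver;
  # omitted here for brevity.
  return {'estimator': estimator}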

Run the local example

Start Airflow

Start the Airflow webserver (in the 'taxi_pipeline' virtualenv):

airflow webserver

Open a new terminal window:

source ./taxi_pipeline/bin/activate

and start the Airflow scheduler:

airflow scheduler

Open a browser to http://127.0.0.1:8080 and click on the chicago_taxi_simple example. If you click the Graph View option, it should look like the image below.

Pipeline view

Run the example

If you were looking at the graph above, click on the DAGs button to get back to the DAGs view.

Enable the chicago_taxi_simple pipeline in Airflow by toggling the DAG to On. Now that it is schedulable, click on the Trigger DAG button (triangle inside a circle) to start the run. You can view status by clicking on the started job, found in the Last run column. This process will take several minutes.

Serve the TensorFlow model

Once the pipeline completes, the model will be copied by the Pusher to the directory configured in the example code:

ls $TAXI_DIR/serving_model/taxi_simple
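
That serving directory is set by the Pusher's push destination in taxi_pipeline_simple.py. As a hedged sketch (the proto and field names are assumptions; check the pipeline file for the exact configuration):

# Sketch of where the serving directory is configured -- proto and field names
# are assumptions; see the Pusher setup in taxi_pipeline_simple.py.
import os
from tfx.proto import pusher_pb2

serving_model_dir = os.path.join(
    os.environ['TAXI_DIR'], 'serving_model', 'taxi_simple')
push_destination = pusher_pb2.PushDestination(
    filesystem=pusher_pb2.PushDestination.Filesystem(
        base_directory=serving_model_dir))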

To serve the model with TensorFlow Serving, please follow the instructions here, making the following path changes before running the scripts:

In start_model_server_local.sh, change:

LOCAL_MODEL_DIR=$TAXI_DIR/serving_model/taxi_simple

This will pick up the latest model under the above path.

In classify_local.sh (which must be run under examples/chicago_taxi/), change:

--examples_file ~/taxi/data/simple/data.csv \
--schema_file ~/tfx/pipelines/chicago_taxi_simple/SchemaGen/output/CHANGE_TO_LATEST_DIR/schema.pbtxt \

Learn more

Please see the TFX User Guide to learn more.
