# Serving Image-Based Deep Learning Models with TensorFlow-Serving's RESTful API
By Tyler LaBonte, July 2018

TensorFlow-Serving is an incredibly handy tool, but because it is recent and serves a rather niche use case, few online tutorials cover it. This Notebook provides an end-to-end implementation of TensorFlow-Serving on an image-based model, demonstrating everything from converting images to Base64 to integrating TensorFlow Model Server with a deep neural network (both of which are glossed over in Google's documentation).

In this tutorial, I will strive to provide a minimum working example; this implementation can easily be extended to include Docker containers, Bazel builds, batched inferences, and model decoupling. The main focus here is to understand the general requirements for working with TensorFlow-Serving, independent of any optional bells and whistles. I will be using the RESTful version of TensorFlow-Serving as opposed to the gRPC version, and will implement the predict function, though classify and regress can also be used.

# Overview

At its most basic level, TensorFlow-Serving lets developers connect client requests and data to deep learning models that are served independently of client systems. This has two main benefits: clients can make inferences on data without installing TensorFlow or having any contact with the actual model, and one instance of a model can serve multiple clients.

Our pipeline will look like this:

<img src="images/pipeline_diagram.png">

Note especially that the image must pass from the client to the server as a Base64 encoded string. This is because JSON cannot carry raw binary data; the only alternative is an array representation of a tensor, and that gets out of hand very quickly. The image must also pass from the ProtoBuf to the Generator as a tensor. This can be modified, but it is best to keep any pre- and post-processing decoupled from the model itself.
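
To make the size argument concrete, here is a minimal sketch that compares the two JSON representations for a synthetic 64x64 RGB image. The pixel values are made up for illustration, and the variable names are my own; a real PNG file is also compressed, so the actual numbers will differ.

```python
import base64
import json

# Hypothetical stand-in for a 64x64 RGB image: 12,288 raw pixel bytes
height, width, channels = 64, 64, 3
pixels = bytes((i * 7) % 256 for i in range(height * width * channels))

# Option 1: send the pixel bytes as a single Base64 string
b64_body = json.dumps({"b64": base64.b64encode(pixels).decode("utf-8")})

# Option 2: send the pixels as a nested [height][width][channels] list
nested = [
    [list(pixels[(r * width + c) * channels:(r * width + c + 1) * channels])
     for c in range(width)]
    for r in range(height)
]
array_body = json.dumps(nested)

# Compare the size in characters of the two JSON bodies
print(len(b64_body), len(array_body))
```

Even for this tiny image, the nested-list body comes out several times longer than the Base64 body, and the gap grows with image size.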

# Server

Exporting a TensorFlow model for serving is probably the most confusing part of the process, since there are a few steps involved.
1. Export the graph to ProtoBuf format. This saves the GraphDef and variables, and represents the trained model. To export an image-based model, we have to inject bitstring conversion layers at the beginning and end of the graph, since our inference function deals only in tensors.
2. Wrap the ProtoBuf in a SavedModel. This step is necessary because TensorFlow-Serving loads models in the SavedModel format, which we create with a SavedModelBuilder. We'll import our GraphDef, then extract the TensorInfo of the input and output to define our PREDICT signature definition.

We'll use CycleGAN (https://github.com/tmlabonte/CycleGAN-TensorFlow) as a usage example. First, import some useful libraries:

In [1]:
import tensorflow as tf
import argparse
import sys
sys.path.insert(0, "../CycleGAN-TensorFlow")
import model

Here, we instantiate a CycleGAN and inject our first layer.

In [2]:
graph = tf.Graph()

with graph.as_default():
    # Instantiate a CycleGAN
    cycle_gan = model.CycleGAN(ngf=64, norm="instance", image_size=64)

    # Create placeholder for image bitstring
    # This is the injection of the input bitstring layer
    input_bytes = tf.placeholder(tf.string, shape=[], name="input_bytes")

Next, we preprocess the bitstring to a float tensor batch so it can be used in the model.

In [3]:
with graph.as_default(): 
    input_bytes = tf.reshape(input_bytes, [])
    
    # Transform bitstring to uint8 tensor
    input_tensor = tf.image.decode_png(input_bytes, channels=3)
    
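    # Convert to float32 tensor
    # Note: convert_image_dtype already rescales uint8 values to [0, 1], so
    # the division below yields values just above -1.0. Whatever scaling is
    # used here must match the preprocessing the model was trained with.
    input_tensor = tf.image.convert_image_dtype(input_tensor, dtype=tf.float32)
    input_tensor = input_tensor / 127.5 - 1.0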
    
    # Ensure tensor has correct shape
    input_tensor = tf.reshape(input_tensor, [64, 64, 3])
    
    # CycleGAN's inference function accepts a batch of images
    # So expand the single tensor into a batch of 1
    input_tensor = tf.expand_dims(input_tensor, 0)

Then, we feed the tensor to the model and save its output.

In [4]:
with graph.as_default():
    # Get style transferred tensor
    output_tensor = cycle_gan.G.sample(input_tensor)

Post-inference, we convert the float tensor back to a bitstring. This is the injection of the output layer:

In [5]:
with graph.as_default():    
    # Convert to uint8 tensor
    output_tensor = (output_tensor + 1.0) / 2.0
    output_tensor = tf.image.convert_image_dtype(output_tensor, tf.uint8)
    
    # Remove the batch dimension
    output_tensor = tf.squeeze(output_tensor, [0])
    
    # Transform uint8 tensor to bitstring
    output_bytes = tf.image.encode_png(output_tensor)
    output_bytes = tf.identity(output_bytes, name="output_bytes")
    
    # Instantiate a Saver
    saver = tf.train.Saver()

Now that we have injected the bitstring layers into our model, we will load our training checkpoint and save the graph as a ProtoBuf. Prior to coding this server, I trained CycleGAN for 10,000 steps and saved the checkpoint file on my local machine, which I access in this session.

In [6]:
# Start a TensorFlow session
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())

    # Access variables and weights from last checkpoint
    latest_ckpt = tf.train.latest_checkpoint("../CycleGAN-TensorFlow/checkpoints/20180628-1208")
    saver.restore(sess, latest_ckpt)

    # Export graph to ProtoBuf
    output_graph_def = tf.graph_util.convert_variables_to_constants(sess, graph.as_graph_def(), [output_bytes.op.name])
    tf.train.write_graph(output_graph_def, "../CycleGAN-TensorFlow/protobufs", "model_v1", as_text=False)

INFO:tensorflow:Restoring parameters from ../CycleGAN-TensorFlow/checkpoints/20180628-1208\model.ckpt-10055
INFO:tensorflow:Froze 52 variables.
Converted 52 variables to const ops.


With that, we've completed step one! In step two, we will wrap the ProtoBuf in a SavedModel to use the RESTful API.

In [7]:
# Instantiate a SavedModelBuilder
# Note that the serve directory is REQUIRED to have a model version subdirectory
builder = tf.saved_model.builder.SavedModelBuilder("serve/1")

# Read in ProtoBuf file
with tf.gfile.GFile("../CycleGAN-TensorFlow/protobufs/model_v1", "rb") as protobuf_file:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(protobuf_file.read())

# Get input and output tensors from GraphDef
# These are our injected bitstring layers
[inp, out] = tf.import_graph_def(graph_def, name="", return_elements=["input_bytes:0", "output_bytes:0"])

Next, we define our signature definition, which expects the TensorInfo of the input and output to the model. When we save the model, we'll get a "No assets" message, but that's okay: assets are auxiliary files such as vocabularies, which this model does not use, and our weights were already frozen into the ProtoBuf as constants.

In [8]:
# Start a TensorFlow session with our saved graph
with tf.Session(graph=out.graph) as sess:
    # Signature_definition expects a batch
    # So we'll turn the output bitstring into a batch of 1 element
    out = tf.expand_dims(out, 0)

    # Build prototypes of input and output bitstrings
    input_bytes = tf.saved_model.utils.build_tensor_info(inp)
    output_bytes = tf.saved_model.utils.build_tensor_info(out)

    # Create signature for prediction
    signature_definition = tf.saved_model.signature_def_utils.build_signature_def(
        inputs={"input_bytes": input_bytes},
        outputs={"output_bytes": output_bytes},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

    # Add meta-information
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                signature_definition
        })

# Create the SavedModel
builder.save()

INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'serve/1\\saved_model.pb'


b'serve/1\\saved_model.pb'

And that's it! We can run this TensorFlow Model Server from bash with the command:

tensorflow_model_server --rest_api_port=8501 --model_name=saved_model --model_base_path=${path}

Here, ${path} is a shell variable holding the path to the serve directory. In my case, it is /mnt/c/Users/Tyler/Desktop/tendies/minimum_working_example/serve.
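
Since the serve directory must contain a model version subdirectory, it can be worth checking the layout before launching the server. The sketch below is a small sanity check of my own, not part of TensorFlow-Serving; `find_model_versions` is a hypothetical helper, and the demonstration runs on a throwaway directory rather than a real SavedModel.

```python
import os
import tempfile


def find_model_versions(base_path):
    """Return the numeric version subdirectories that contain a saved_model.pb."""
    versions = []
    for entry in os.listdir(base_path):
        candidate = os.path.join(base_path, entry, "saved_model.pb")
        if entry.isdecimal() and os.path.isfile(candidate):
            versions.append(int(entry))
    return sorted(versions)


# Demonstration on a throwaway directory that mimics the serve/1 layout
with tempfile.TemporaryDirectory() as base_path:
    os.makedirs(os.path.join(base_path, "1"))
    with open(os.path.join(base_path, "1", "saved_model.pb"), "wb") as f:
        f.write(b"")  # empty placeholder, not a real SavedModel
    os.makedirs(os.path.join(base_path, "notes"))  # ignored: not a version number
    found_versions = find_model_versions(base_path)
    print(found_versions)
```

An empty result for your own serve directory would suggest that the SavedModel was written to the wrong place or that the version subdirectory is missing.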

# Client

The client's job is to accept an image as input, convert it to Base64, pass it to the server using JSON, and decode the response. First, import some useful libraries:

In [9]:
import base64
import requests
import json
import argparse

We will be performing style transfer from an image of Gaussian noise to an image of sinusoidal noise. Here's the Gaussian image:
<img src="images/gaussian1.png" style="width: 256px;"/>

First, we'll open the image and convert it to base64.

In [10]:
# Open and read image as bitstring
with open("gaussian1.png", "rb") as image_file:
    input_image = image_file.read()
print("Raw bitstring: " + str(input_image[:10]) + " ... " + str(input_image[-10:]))

# Encode image in b64
encoded_input_string = base64.b64encode(input_image)
input_string = encoded_input_string.decode("utf-8")
print("Base64 encoded string: " + input_string[:10] + " ... " + input_string[-10:])

Raw bitstring: b'\x89PNG\r\n\x1a\n\x00\x00' ... b'\x00\x00IEND\xaeB`\x82'
Base64 encoded string: iVBORw0KGg ... 5ErkJggg==
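
The "iVBORw0KGg" prefix above is no coincidence: it is simply the Base64 encoding of the eight-byte signature that begins every PNG file. The sketch below illustrates this, along with the fact that Base64 is fully reversible. The payload is a made-up byte string, not a real image.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file

# Encoding the signature alone yields the familiar prefix
encoded_signature = base64.b64encode(PNG_SIGNATURE).decode("utf-8")
print(encoded_signature)

# Hypothetical payload standing in for a full image file
payload = PNG_SIGNATURE + bytes(range(256))
encoded = base64.b64encode(payload).decode("utf-8")
decoded = base64.b64decode(encoded)

# Decoding recovers the original bytes exactly
print(decoded == payload)

# Base64 emits four characters for every three input bytes
print(len(payload), len(encoded))
```

This makes for a quick sanity check on the client: a Base64 string that does not start with "iVBORw0KGg" is not a PNG.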


JSON data sent to our TensorFlow Model Server has to be structured in a very particular way. The format differs slightly for classification and regression (see https://www.tensorflow.org/serving/api_rest). For image prediction calls, our JSON body must look like this:

In [11]:
{
  "instances": [
                  {"b64": "iVBORw"},
                  {"b64": "pT4rmN"},
                  {"b64": "w0KGg2"}
                 ]
}

{'instances': [{'b64': 'iVBORw'}, {'b64': 'pT4rmN'}, {'b64': 'w0KGg2'}]}

Since we're only sending one image to the server, it's pretty simple. We can create our JSON data like so:

In [12]:
# Wrap bitstring in JSON
instance = [{"b64": input_string}]
data = json.dumps({"instances": instance})
print(data[:30] + " ... " + data[-10:])

{"instances": [{"b64": "iVBORw ... Jggg=="}]}
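
The same structure generalizes to several images. Here is a minimal sketch of a helper that builds the request body from any number of raw byte strings; `build_predict_body` is a hypothetical name of my own. Note that the graph exported in this tutorial reshapes its input to a single scalar string, so it would need to be extended before it could accept more than one instance per request.

```python
import base64
import json


def build_predict_body(images):
    """Build a predict request body from an iterable of raw image bytes."""
    instances = [
        {"b64": base64.b64encode(image).decode("utf-8")} for image in images
    ]
    return json.dumps({"instances": instances})


# Hypothetical byte strings standing in for real PNG files
body = build_predict_body([b"first image", b"second image"])
parsed = json.loads(body)
print(len(parsed["instances"]))
```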


This is all we need to send our POST request to the TensorFlow Model Server. This is a synchronous call, so the client will pause until it receives a response from the server (useful to know when you're wondering why your code has stopped after POSTing a very large image).

In [13]:
json_response = requests.post("http://localhost:8501/v1/models/saved_model:predict", data=data)
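
The URL above follows a fixed pattern: the model name given to the server via --model_name, an optional /versions/ segment, and the :predict verb. As a small sketch, a helper like the following (`predict_url` is a hypothetical name of my own) keeps those pieces in one place:

```python
def predict_url(host, port, model_name, version=None):
    """Assemble a TensorFlow-Serving REST predict URL."""
    path = "/v1/models/" + model_name
    if version is not None:
        # Pin the request to a specific model version subdirectory
        path += "/versions/" + str(version)
    return "http://{}:{}{}:predict".format(host, port, path)


print(predict_url("localhost", 8501, "saved_model"))
print(predict_url("localhost", 8501, "saved_model", version=1))
```

Because the call is synchronous, it may also be worth passing a `timeout` argument to `requests.post` so that the client does not wait indefinitely on a stalled server.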

To interpret the response, we do the above steps in the reverse order. To grab our base64-encoded image from the JSON response, we have to access:
1. The value corresponding to "predictions" in the response dictionary.
2. The first entry in the resultant array.
3. The value corresponding to "b64" in the resultant dictionary.

Then, we'll decode that value to a raw bitstring.

In [14]:
# Raise an exception if the server returned an HTTP error status
json_response.raise_for_status()

# Extract text from JSON
response = json.loads(json_response.text)

# Interpret bitstring output
response_string = response["predictions"][0]["b64"]
print("Base64 encoded string: " + response_string[:10] + " ... " + response_string[-10:])

# Decode bitstring
encoded_response_string = response_string.encode("utf-8")
response_image = base64.b64decode(encoded_response_string)
print("Raw bitstring: " + str(response_image[:10]) + " ... " + str(response_image[-10:]))

# Save inferred image
with open("sinusoidal1.png", "wb") as output_file:
    output_file.write(response_image)

Base64 encoded string: iVBORw0KGg ... 5ErkJggg==
Raw bitstring: b'\x89PNG\r\n\x1a\n\x00\x00' ... b'\x00\x00IEND\xaeB`\x82'
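
For anything beyond a quick experiment, it helps to make the decoding step more defensive. The sketch below assumes that a failed request comes back as a JSON object with an "error" key, and that each prediction should decode to a PNG file; `extract_png_images` is a hypothetical helper of my own, and the response used here is fabricated for illustration.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def extract_png_images(response):
    """Decode every Base64 prediction in a parsed predict response."""
    if "error" in response:
        raise RuntimeError("Server returned an error: " + str(response["error"]))
    images = []
    for prediction in response["predictions"]:
        image = base64.b64decode(prediction["b64"])
        if not image.startswith(PNG_SIGNATURE):
            raise ValueError("Prediction does not look like a PNG file")
        images.append(image)
    return images


# Hypothetical response body, shaped like the one in this tutorial
fake_png = PNG_SIGNATURE + b"not a real image"
ok_response = {"predictions": [{"b64": base64.b64encode(fake_png).decode("utf-8")}]}
print(extract_png_images(ok_response) == [fake_png])
```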


Success! Here's the resultant image of sinusoidal noise.
<img src="images/sinusoidal1.png" style="width: 256px;"/>

# Conclusion

Thanks for following along with this tutorial; I hope it helped you out! This Notebook was built on the minimum working example of my TensorFlow Distributed Image Serving library, which you can download here: https://github.com/tmlabonte/tendies. For more blog posts and information about me, visit https://tmlabonte.github.io.