In [1]:
from concurrent import futures

from google.cloud import pubsub_v1

import tensorflow as tf

## Create Pub/Sub topic

In [2]:
project_id = "kubeflow-1-0-2"
topic_id = "inference_images"
subscription_id = "my_subscription"

In [3]:
!gcloud pubsub topics create $topic_id

Created topic [projects/kubeflow-1-0-2/topics/inference_images].


In [4]:
batch_settings = pubsub_v1.types.BatchSettings(
    max_messages=3,  # default 100
    max_bytes=1024 * 1024,  # default 1 MiB
    max_latency=2,  # seconds; default 10 ms
)
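To make the batching thresholds concrete, here is a minimal, library-free sketch of how count and size triggers group messages. The function `simulate_batches` and the `example-bucket` paths are hypothetical, and the latency trigger is simplified to a final flush of whatever is left over; the real client may group messages differently.

```python
from typing import List


def simulate_batches(
    payloads: List[bytes], max_messages: int, max_bytes: int
) -> List[List[bytes]]:
    """Group payloads the way a count/size-triggered batcher might.

    Simplification: the latency trigger is modeled as one final flush of the
    leftover messages; a real client flushes on a timer instead.
    """
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_bytes = 0
    for payload in payloads:
        # Flush first if adding this payload would exceed the byte budget.
        if current and current_bytes + len(payload) > max_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(payload)
        current_bytes += len(payload)
        # Flush as soon as the message-count threshold is reached.
        if len(current) >= max_messages:
            batches.append(current)
            current, current_bytes = [], 0
    if current:
        # Stand-in for the latency-triggered flush.
        batches.append(current)
    return batches


# Hypothetical payloads; the bucket name is made up for illustration.
payloads = [f"gs://example-bucket/img{i}.jpg".encode("utf-8") for i in range(5)]
batch_sizes = [
    len(batch)
    for batch in simulate_batches(payloads, max_messages=3, max_bytes=1024 * 1024)
]
print(batch_sizes)
```

With `max_messages=3`, five small messages fall into one full batch plus a leftover batch that, in a real client, would be sent once `max_latency` elapses.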

In [5]:
publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)
topic_path = publisher.topic_path(project_id, topic_id)

# publish_futures = []

In [6]:
def pub_callback(future: pubsub_v1.publisher.futures.Future) -> None:
    message_id = future.result()
    print(message_id)
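The callback above calls `future.result()`, which re-raises any publish error. A hedged sketch of a more defensive variant follows; it uses plain `concurrent.futures.Future` objects as stand-ins, on the assumption that the Pub/Sub publish future behaves like a standard future. The names `safe_callback` and `results` are hypothetical.

```python
from concurrent import futures

# Collected outcomes, so failures are visible instead of being lost.
results = []


def safe_callback(future: futures.Future) -> None:
    """Record the message ID, or the error if publishing failed."""
    try:
        results.append(("ok", future.result()))
    except Exception as exc:  # result() re-raises the stored failure
        results.append(("error", str(exc)))


# Stand-in for a successful publish.
ok_future = futures.Future()
ok_future.add_done_callback(safe_callback)
ok_future.set_result("12345")

# Stand-in for a failed publish.
bad_future = futures.Future()
bad_future.add_done_callback(safe_callback)
bad_future.set_exception(RuntimeError("publish failed"))

print(results)
```

Catching the exception inside the callback keeps a single failed message from going unnoticed while the remaining messages are still being published.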

**Note**: Before running the following cells, switch to the other notebook and create a subscription to this topic. A subscription only receives messages published after it was created, so it must exist before any messages are published from here on.

## Publish image file paths (as messages) to a Pub/Sub topic (for buffering and load balancing)

Get the list of images that need inference

In [7]:
filenames = tf.io.gfile.glob("gs://fire_detection_anurag/test_images/*")

filenames

['gs://fire_detection_anurag/test_images/fire1.jpg',
 'gs://fire_detection_anurag/test_images/fire2.jpg',
 'gs://fire_detection_anurag/test_images/fire3.jpg',
 'gs://fire_detection_anurag/test_images/fire4.jpg',
 'gs://fire_detection_anurag/test_images/fire5.jpg',
 'gs://fire_detection_anurag/test_images/no_fire1.jpg',
 'gs://fire_detection_anurag/test_images/no_fire2.jpg',
 'gs://fire_detection_anurag/test_images/no_fire3.jpg',
 'gs://fire_detection_anurag/test_images/no_fire4.jpg',
 'gs://fire_detection_anurag/test_images/no_fire5.jpg']

In [8]:
print("IDs of messages published...")

for file in filenames[:5]:
    # Data must be a bytestring
    data = file.encode("utf-8")
    publish_future = publisher.publish(topic_path, data)
    # Non-blocking. Allow the publisher client to batch multiple messages.
    publish_future.add_done_callback(pub_callback)
#     publish_futures.append(publish_future)

IDs of messages published...
3472774071426909
3472774071426910
3472778815427077
3472778815427078
3472774478471355
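The commented-out `publish_futures` list hints at an alternative to callbacks: collect the futures and block until all of them finish. Below is a minimal sketch of that pattern using only the standard library. `fake_publish` is a hypothetical stand-in for a network publish, and the paths are made up.

```python
from concurrent import futures
import time


def fake_publish(data: bytes) -> str:
    """Hypothetical stand-in for a network publish; returns a fake message ID."""
    time.sleep(0.01)
    return f"id-{len(data)}"


# Made-up paths for illustration only.
paths = ["gs://example-bucket/a.jpg", "gs://example-bucket/bb.jpg"]

with futures.ThreadPoolExecutor(max_workers=2) as pool:
    publish_futures = [pool.submit(fake_publish, p.encode("utf-8")) for p in paths]
    # Block until every future has completed.
    done, not_done = futures.wait(
        publish_futures, return_when=futures.ALL_COMPLETED
    )

# Results are read in submission order, not completion order.
message_ids = [f.result() for f in publish_futures]
print(message_ids)
```

Waiting on the collected futures is useful in a script that would otherwise exit before the batched messages are sent; in a long-lived notebook kernel the callbacks alone are usually enough.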


Now run the other notebook (and/or Python script) to 1. receive messages through asynchronous pull and 2. run inference with Apache Beam (for auto-scaling).
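Because each message carries a UTF-8 encoded `gs://` path, the receiving side has to decode the payload before it can load the image. A small sketch of that step follows; `parse_gcs_uri` and the `example-bucket` name are hypothetical, and the other notebook may handle this differently.

```python
from typing import Tuple


def parse_gcs_uri(payload: bytes) -> Tuple[str, str]:
    """Decode a message payload and split it into (bucket, object_name)."""
    uri = payload.decode("utf-8")
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the object name.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    return bucket, object_name


# Made-up payload in the same shape as the published messages.
payload = "gs://example-bucket/test_images/fire1.jpg".encode("utf-8")
bucket, object_name = parse_gcs_uri(payload)
print(bucket, object_name)
```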

In [9]:
print("IDs of 2nd lot of messages published...")

for file in filenames[5:]:
    # Data must be a bytestring
    data = file.encode("utf-8")
    publish_future = publisher.publish(topic_path, data)
    # Non-blocking. Allow the publisher client to batch multiple messages.
    publish_future.add_done_callback(pub_callback)
#     publish_futures.append(publish_future)

IDs of 2nd lot of messages published...
3472737478205979
3472737478205980
3472785780480443
3472785780480444
3472794187459134


**Note**: Because the subscriber (the other notebook) listens continuously, it keeps waiting for new messages (images) to stream in, and Apache Beam handles scaling up or down during inference accordingly.

## Clean up

In [10]:
!gcloud pubsub topics delete $topic_id

Deleted topic [projects/kubeflow-1-0-2/topics/inference_images].
