##### Copyright 2020 The TensorFlow IO Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Resilient inference on streaming data using Kafka and TensorFlow-IO

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/io/tutorials/kafka"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/kafka.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/io/blob/master/docs/tutorials/kafka.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
      <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/io/docs/tutorials/kafka.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

Caution: In addition to Python packages, this notebook uses `sudo apt-get install` to install third-party packages.

## Overview

This tutorial focuses on streaming data from a [Kafka](https://docs.confluent.io/current/getting-started.html) cluster into a `tf.data.Dataset`, which is then used with `tf.keras` for training and inference.

Kafka is a distributed event-streaming platform that provides scalable, fault-tolerant delivery of streaming data across data pipelines. Many large enterprises rely on it where mission-critical data delivery is a primary requirement.

**NOTE:** A basic understanding of the [Kafka components](https://docs.confluent.io/current/kafka/introduction.html) will make the tutorial easier to follow.

## Setup and usage

### Install the required tensorflow-io and kafka packages

In [1]:
import os

In [3]:
try:
  %tensorflow_version 2.x
except Exception:
  pass

In [4]:
!pip install tensorflow-io

Collecting tensorflow-io
  Downloading https://files.pythonhosted.org/packages/5f/5a/7a11179b0376df1fbac5f0af46819c0f990324fa3ee90eeb3110f683b129/tensorflow_io-0.15.0-cp36-cp36m-manylinux2010_x86_64.whl (22.3MB)
Installing collected packages: tensorflow-io
Successfully installed tensorflow-io-0.15.0


In [21]:
!pip install kafka-python

Collecting kafka-python
  Downloading https://files.pythonhosted.org/packages/aa/34/12f219f7f9e68e79a54874d26fbe974db1ab4efac4e6dae665b421df48f9/kafka_python-2.0.1-py2.py3-none-any.whl (232kB)

In [22]:
from datetime import datetime

import kafka
import tensorflow as tf
import tensorflow_io as tfio

In [23]:
tfio.__version__

'0.15.0'
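
Checking `tfio.__version__` confirms one package. A minimal sketch of checking several installed packages at once, using only the standard library, is shown below. The helper names `installed_version` and `report_versions` are hypothetical and not part of the tutorial; the distribution names are the ones passed to `pip install` above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(package_names):
    """Map each distribution name to its installed version (None when absent)."""
    return {name: installed_version(name) for name in package_names}


# Distribution names as used in the `pip install` cells above.
for name, version in report_versions(["tensorflow-io", "kafka-python"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Returning `None` instead of raising keeps the check usable at the top of a notebook, where a missing package should produce a readable message rather than a traceback.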

### Download and set up Kafka and Zookeeper instances

For demo purposes, the following instances are set up locally:

- Kafka (Brokers: 127.0.0.1:9092)
- Zookeeper (Node: 127.0.0.1:2181)



In [7]:
!curl -sSOL http://packages.confluent.io/archive/5.4/confluent-community-5.4.1-2.12.tar.gz
!tar -xzf confluent-community-5.4.1-2.12.tar.gz

We spin up these instances with the default configurations provided by the Confluent package.
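
To see which defaults are in effect, the `.properties` files under `confluent-5.4.1/etc/kafka/` can be read as plain `key=value` text. The sketch below is a simplified parser, assuming one `key=value` pair per line with `#` comments; it does not cover the full Java properties syntax (line continuations, `:` separators). The helper names and the sample text are hypothetical, and the sample values are illustrative rather than copied from the package.

```python
def parse_properties(text):
    """Parse simple `key=value` lines, skipping blank lines and `#` comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def load_properties(path):
    """Read a properties file from disk and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_properties(handle.read())


# Illustrative sample only; the port mirrors the Zookeeper node listed above.
SAMPLE = """\
# sample settings (assumed, for illustration)
clientPort=2181
dataDir=/tmp/zookeeper
"""
print(parse_properties(SAMPLE))
```

After the archive is extracted, `load_properties("confluent-5.4.1/etc/kafka/zookeeper.properties")` would return the settings the Zookeeper daemon starts with.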

In [17]:

!cd confluent-5.4.1 && bin/zookeeper-server-start -daemon etc/kafka/zookeeper.properties
!echo "Waiting for 10 secs until zookeeper is up and running"
!sleep 10

!cd confluent-5.4.1 && bin/kafka-server-start -daemon etc/kafka/server.properties
!echo "Waiting for 10 secs until kafka is up and running"
!sleep 10


Waiting for 10 secs until zookeeper is up and running
Waiting for 10 secs until kafka is up and running
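
The fixed 10-second sleeps above are a simple heuristic. An alternative, sketched here under the assumption that an accepted TCP connection is a good enough readiness signal, is to poll the listening ports until they respond. The function names `port_is_open` and `wait_for_port` are hypothetical; the host and ports in the usage comment are the ones listed earlier for Zookeeper and Kafka.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def wait_for_port(host, port, max_wait=30.0, interval=0.5):
    """Poll until the port accepts connections or max_wait seconds elapse."""
    deadline = time.monotonic() + max_wait
    while True:
        if port_is_open(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example usage (requires the daemons started above):
#   wait_for_port("127.0.0.1", 2181)  # Zookeeper
#   wait_for_port("127.0.0.1", 9092)  # Kafka broker
```

An open port only shows that the process is listening; a broker may still need a moment before it serves requests, so a short sleep after a successful check can remain useful.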


Once the instances are running as daemon processes, we can grep for `kafka` in the process list. The two `java` processes correspond to the Zookeeper and Kafka instances.

In [18]:
!ps -ef | grep kafka

root         548       1  4 19:18 ?        00:00:01 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Xlog:gc*:file=/content/confluent-5.4.1/bin/../logs/zookeeper-gc.log:time,tags:filecount=10,filesize=102400 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/content/confluent-5.4.1/bin/../logs -Dlog4j.configuration=file:bin/../etc/kafka/log4j.properties -cp /content/confluent-5.4.1/bin/../share/java/kafka/*:/content/confluent-5.4.1/bin/../support-metrics-client/build/dependant-libs-2.12.10/*:/content/confluent-5.4.1/bin/../support-metrics-client/build/libs/*:/usr/share/java/support-metrics-client/* org.apache.zookeeper.server.quorum.QuorumPeerMain etc/kafka/zookeeper.properties
root         606       1 26 19:19 ?        00:00:08 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMi