Building a demo which combines
- the simplicity of data science tools (Python, Jupyter notebooks, NumPy, etc.)
- powerful Machine Learning / Deep Learning frameworks (TensorFlow, Keras, etc.)
- reliable, scalable event-based streaming technology for production deployments (Apache Kafka, KSQL)
- Python (tested with 3.6)
- Java 8+ (tested with Java 8)
- Confluent Platform 5.0+ using Kafka + KSQL (tested with 5.0)
- ksql-python (tested with the GitHub 5.x release from 2018-10-12)
If you have problems installing ksql-python in your environment via 'pip install ksql', use the commands described in the GitHub project instead.
After installation, the 'from ksql import KSQLAPI' statement did not work with Python 2.7.x in my Jupyter notebook (it did work in the Mac terminal), so I used Python 3.6, which also worked in Jupyter.
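As a quick smoke test, here is a minimal sketch of connecting to KSQL with ksql-python and running a statement. It assumes the KSQL REST endpoint runs on localhost:8088; the stream name in the query is a hypothetical placeholder.

```python
from ksql import KSQLAPI

# Connect to the KSQL server's REST endpoint (adjust host/port to your setup)
client = KSQLAPI('http://localhost:8088')

# Run a KSQL statement, e.g. list existing streams
print(client.ksql('SHOW STREAMS;'))

# Stream query results row by row (returns a generator); stream name is a placeholder
for row in client.query('SELECT * FROM creditcardfraud_source LIMIT 5;'):
    print(row)
```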
We will do the following:
- Data Integration (Kafka Connect): Integrate a stream of data from a CSV file or a continuous data stream (in the real world you can connect directly to an existing Kafka topic from the Jupyter notebook)
- Data Preprocessing (KSQL): Preprocess the data, e.g. filter, anonymize, aggregate / concatenate (see the sketch after this list)
- ML-specific preprocessing (NumPy, Scikit-learn): Normalize, split train / test data
- Train model (TensorFlow + Keras)
- Deploy model (KSQL + TensorFlow)
- Monitor model inference (KSQL)
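To give a feel for the KSQL preprocessing step, here is a hedged sketch using ksql-python to derive a filtered and anonymized stream. The stream names, column names, and masking function arguments are illustrative assumptions and will differ from the actual notebook.

```python
from ksql import KSQLAPI

client = KSQLAPI('http://localhost:8088')

# Derive a preprocessed stream from the raw source stream:
# filter out incomplete records, mask a sensitive column, and keep only the
# fields needed for model training. Names below are placeholders.
client.ksql("""
    CREATE STREAM creditcardfraud_preprocessed AS
        SELECT MASK_LEFT(user_id, 4) AS user_id_masked,
               amount,
               time
        FROM creditcardfraud_source
        WHERE amount IS NOT NULL;
""")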
While all of this can be done in a Jupyter notebook for interactive analysis, we can then deploy the same pipeline to production at scale. For instance, you can re-use the KSQL preprocessing statements and run them in your production infrastructure to do model inference with KSQL and the TensorFlow model at scale.
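In production the inference typically runs inside KSQL (via a UDF) or a Kafka Streams application. Purely for illustration, here is a hedged Python sketch of scoring events from a Kafka topic with the trained autoencoder; the topic name, message format, model path, and threshold are assumptions, not the demo's actual values.

```python
import json
import numpy as np
from confluent_kafka import Consumer
from tensorflow import keras

# Load the trained autoencoder (path is an assumption)
model = keras.models.load_model('models/autoencoder_fraud.h5')

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'fraud-scoring',
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['creditcardfraud_preprocessed'])  # topic name is a placeholder

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # Assumes each message is a JSON object with a 'features' array
    features = np.array([json.loads(msg.value())['features']])
    reconstruction = model.predict(features)
    # High reconstruction error indicates a potential anomaly / fraud
    error = float(np.mean(np.square(features - reconstruction)))
    if error > 0.05:  # threshold is a placeholder, tune on validation data
        print('Potential fraud detected, reconstruction error:', error)
```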
Check out this document to start the backend and notebook. The main demo then runs in the Jupyter notebook and walks through all of the above steps.
An autoencoder is an unsupervised neural network which encodes (i.e. compresses) the input and then decodes (i.e. decompresses) it again:
The goal is to lose as little information as possible. This way an autoencoder can be used to detect anomalies: if the decoder cannot reconstruct an input well, that input is an outlier (here: potential fraud).
We use KSQL for preprocessing, NumPy and scikit-learn for ML-specific tasks like reshaping arrays or splitting training and test data, TensorFlow + Keras for model training, and Kafka Streams or KSQL for model deployment.
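To make these steps concrete, here is a minimal sketch of the ML-specific preprocessing and a simple Keras autoencoder. The layer sizes, hyperparameters, and placeholder data are illustrative assumptions, not the exact values used in the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras

# Placeholder for the preprocessed records returned by the KSQL query
features = np.random.rand(1000, 30)

# ML-specific preprocessing: normalize features and split into train / test sets
scaler = MinMaxScaler()
X = scaler.fit_transform(features)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# Simple autoencoder: compress the input to a small bottleneck and reconstruct it
input_dim = X_train.shape[1]
autoencoder = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(input_dim,)),
    keras.layers.Dense(8, activation='relu'),              # bottleneck (encoding)
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(input_dim, activation='sigmoid'),   # reconstruction (decoding)
])
autoencoder.compile(optimizer='adam', loss='mse')

# Train the autoencoder to reconstruct its own input
autoencoder.fit(X_train, X_train, epochs=20, batch_size=32,
                validation_data=(X_test, X_test))

# Reconstruction error per record: large errors indicate potential fraud
reconstructions = autoencoder.predict(X_test)
mse = np.mean(np.square(X_test - reconstructions), axis=1)
```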
Here is a TensorBoard screenshot of the Autoencoder:
Interactive analysis and data-preprocessing with Python and KSQL:
If you want to deploy the model in a TensorFlow infrastructure like Google ML Engine, it is best to train the model with GCP's tools as described in this [Google ML Getting Started](https://cloud.google.com/ml-engine/docs/tensorflow/getting-started-training-prediction) guide.
Otherwise you need to convert the Keras H5 file to a TensorFlow Protocol Buffers file and perform a few additional steps, e.g. as described in this blog post.
The Python tool Keras to TensorFlow is a good and simple solution:
python keras_to_tensorflow.py --input_model="/Users/kai.waehner/git-projects/python-jupyter-apache-kafka-ksql-tensorflow-keras/models/autoencoder_fraud.h5" --output_model="/Users/kai.waehner/git-projects/python-jupyter-apache-kafka-ksql-tensorflow-keras/models/autoencoder_fraud.pb"
The tool freezes the nodes (converts all TF variables to TF constants) and saves the inference graph and weights into a binary protobuf (.pb) file.
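For reference, here is a rough sketch of what such a conversion does under the hood with the TensorFlow 1.x APIs available at the time. Paths and the way output node names are derived are assumptions; the keras_to_tensorflow tool handles these details for you.

```python
import tensorflow as tf
from tensorflow.python.framework import graph_util

# Load the trained Keras model (HDF5 file); path is an assumption
model = tf.keras.models.load_model('models/autoencoder_fraud.h5')

# Grab the session holding the model's variables (TensorFlow 1.x style)
sess = tf.keras.backend.get_session()

# Freeze the graph: convert all variables to constants so the graph is self-contained
output_node_names = [out.op.name for out in model.outputs]
frozen_graph = graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), output_node_names)

# Write the frozen inference graph as a binary protobuf (.pb) file
tf.train.write_graph(frozen_graph, 'models', 'autoencoder_fraud.pb', as_text=False)
```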
TODO Use keras.estimator.model_to_estimator (included in tf.keras)? Example: https://www.kaggle.com/yufengg/emnist-gpu-keras-to-tf


