Distributed Tensorflow, Keras, PyTorch and Ray on Apache Spark

zhugesdu/analytics-zoo

 
 


A unified Data Analytics and AI platform for distributed TensorFlow, Keras, PyTorch, Apache Spark/Flink and Ray


What is Analytics Zoo?

Analytics Zoo is a unified data analytics and AI platform that combines TensorFlow, Keras, PyTorch, Spark, Flink and Ray programs into a single integrated pipeline. The pipeline scales transparently from a laptop to large clusters to process production big data.


  • Integrated Analytics and AI Pipelines for easily prototyping and deploying end-to-end AI applications.

    • Write TensorFlow or PyTorch inline with Spark code for distributed training and inference.
    • Native deep learning (TensorFlow/Keras/PyTorch/BigDL) support in Spark ML Pipelines.
    • Directly run Ray programs on big data clusters through RayOnSpark.
    • Plain Java/Python APIs for TensorFlow, PyTorch, BigDL and OpenVINO model inference.
  • High-Level ML Workflow that automates the process of building large-scale machine learning applications.

    • Automatically distributed Cluster Serving (for TensorFlow/PyTorch/Caffe/BigDL/OpenVINO models) with a simple pub/sub API.
    • Scalable AutoML for time series prediction that automatically generates features, selects models and tunes hyperparameters.
  • Built-in Algorithms and Models for Recommendation, Time Series, Computer Vision and NLP applications.
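
To illustrate the pub/sub style of model serving mentioned in the list above, here is a minimal sketch using only the Python standard library. It is not the Analytics Zoo Cluster Serving API: `InMemoryBroker`, `toy_model`, `serving_worker` and `run_demo` are hypothetical names invented for this example, and an in-process queue stands in for whatever message broker a real deployment would use. The idea it shows is that clients publish inputs under a request ID, a pool of workers consumes them and runs the model, and clients later fetch results by the same ID.

```python
import queue
import threading


class InMemoryBroker:
    """Hypothetical in-process stand-in for a pub/sub message broker."""

    def __init__(self):
        self.requests = queue.Queue()   # inputs published by clients
        self._results = {}              # request_id -> prediction
        self._cond = threading.Condition()

    def publish(self, request_id, payload):
        """Client side: enqueue an input under a request ID."""
        self.requests.put((request_id, payload))

    def put_result(self, request_id, value):
        """Worker side: store a prediction and wake up waiting clients."""
        with self._cond:
            self._results[request_id] = value
            self._cond.notify_all()

    def get_result(self, request_id, timeout=5.0):
        """Client side: block until the prediction for request_id arrives."""
        with self._cond:
            ready = self._cond.wait_for(
                lambda: request_id in self._results, timeout=timeout
            )
            if not ready:
                raise TimeoutError(f"no result for {request_id!r}")
            return self._results.pop(request_id)


def toy_model(values):
    """Placeholder 'model' (an assumption): returns the mean of its input."""
    return sum(values) / len(values)


def serving_worker(broker, model):
    """Consume requests until a None sentinel is received."""
    while True:
        item = broker.requests.get()
        if item is None:
            break
        request_id, payload = item
        broker.put_result(request_id, model(payload))


def run_demo():
    broker = InMemoryBroker()
    workers = [
        threading.Thread(target=serving_worker, args=(broker, toy_model))
        for _ in range(2)
    ]
    for worker in workers:
        worker.start()

    inputs = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0]}
    for request_id, payload in inputs.items():
        broker.publish(request_id, payload)
    outputs = {request_id: broker.get_result(request_id) for request_id in inputs}

    # One sentinel per worker shuts the pool down cleanly.
    for _ in workers:
        broker.requests.put(None)
    for worker in workers:
        worker.join()
    return outputs


if __name__ == "__main__":
    print(run_demo())
```

Because clients only interact with the queue, the number of workers can change without touching client code, which is the property that makes this pattern suitable for distributing inference across a cluster.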


Why use Analytics Zoo?

You may want to develop your AI solutions using Analytics Zoo if:

  • You want to easily prototype the entire end-to-end pipeline that applies AI models (e.g., TensorFlow, Keras, PyTorch, BigDL, OpenVINO, etc.) to production big data.
  • You want to transparently scale your AI applications from a laptop to large clusters with "zero" code changes.
  • You want to deploy your AI pipelines to existing YARN or K8S clusters WITHOUT any modifications to the clusters.
  • You want to automate the process of applying machine learning (such as feature engineering, hyperparameter tuning, model selection and distributed inference).

How to use Analytics Zoo?

