- Tokyo, Japan
Stars
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Design documents and code for the pandas 2.0 effort.
It'll detect your anomalies! Part of the Kale stack.
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Apache Superset is a Data Visualization and Data Exploration Platform
Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow
Quickly and accurately render even the largest data.
pandas, scikit-learn, xgboost and seaborn integration
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Dask, Flink and DataFlow
Airspeed Velocity: A simple Python benchmarking tool with web-based reporting
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning al…
A flexible framework of neural networks for deep learning
A Theano framework for building and training neural networks
Recipes for using Python's pandas library
NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.
IPython kernel for Torch with visualization and plotting
Define fortify and autoplot functions to allow ggplot2 to handle some popular R packages.