📊 📋 Dashboards using YAML or JSON files
Updated Feb 18, 2019
Quilt versions and deploys data
Updated Feb 21, 2019
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Updated Feb 11, 2019
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing workflows on va…
📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in…
Updated Feb 12, 2019
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Updated Feb 20, 2019
Elasticsearch plugin for approximate K-nearest-neighbors on floating-point vectors. Repo includes a demo for image si…
Updated Dec 9, 2018
Example project implementing best practices for PySpark ETL jobs and applications.
Updated Feb 19, 2019
Open Metadata and Governance
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Updated Aug 30, 2018
The Unnotebook: A Production-Ready Data Environment Built for DataOps Workflows
Updated Feb 15, 2019
Ansible playbook to deploy distributed technologies
Updated Nov 20, 2017
Schedule for talks, workshops, etc. w/ links to past talk slides and videos.
Updated Nov 9, 2017
Dummy variable generation with fit/transform capabilities
Updated Aug 7, 2018
A Pachyderm deep learning tutorial for conference workshops
Updated Aug 2, 2017
Summary of my projects on Kaggle
Updated Nov 14, 2018
Scott Logic: DataHelix
Updated Feb 21, 2019
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Updated Dec 17, 2018
A store abstraction and analytics system for real-time event data.
Updated Feb 4, 2019
Introduction to Apache Spark for Big Data processing
Updated Nov 13, 2018
Notes, slides, and contents for the O'Reilly videos using Kotlin for Data Science
Updated Nov 1, 2017
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoo…
Updated Jan 2, 2019
Akka HTTP service for serving Spark machine learning models
Updated Aug 11, 2017
Accelerate data science
Updated Feb 10, 2019
Automate building ML classification pipelines in .NET
Updated Apr 23, 2018
Blog post on ETL pipelines with Airflow
Updated Aug 30, 2017
Framework for data processing
Updated Feb 14, 2019
Visualize Apache logs in Minecraft using Docker, Streamsets Data Collector, Spigot and Kafka.
Updated Mar 5, 2018
Data Engineering: Chapter 5, the AWS chapter of Pragmatic AI. Creates a "real world" Data Engineering API using Flask, Cl…
Updated May 30, 2018