Starred repositories
Clean Business Chart: a Python package for IBCS-inspired business charts
Kepler.gl is a powerful open source geospatial analysis tool for large-scale data sets.
Positron, a next-generation data science IDE
BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)
A terminal workspace with batteries included
New file format for storage of large columnar datasets.
Implements a gateway that speaks the SparkConnect protocol and drives a backend using Substrait (over ADBC Flight SQL).
Turning PySpark Into a Universal DataFrame API
Install and Run Python Applications in Isolated Environments using UV
DuckDB Power Query Custom Connector by MotherDuck
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
A query engine for any combination of data sources. Query your files and APIs as if they were databases!
Apache DataFusion Comet Spark Accelerator
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Practice your pandas skills!
Distributed stream processing engine in Rust
Polars extension for general data science use cases
Delta reader for the Ray open-source toolkit for building ML applications
Custom Contoso database generator and ready-to-use Contoso sample databases for SQL Server
Data visualization templates for Deneb, a custom visual for Power BI. The templates are examples of Vega (not Vega-Lite) data visualizations that can be used in Deneb as is or as a starting point f…
An extremely fast Python package and project manager, written in Rust.