pyarrow

Here are 69 public repositories matching this topic...

ibis-project / ibis

the portable Python dataframe library

mysql python bigquery sql database clickhouse sqlite impala postgresql snowflake pandas pyspark mssql trino pyarrow datafusion duckdb polars

Updated Feb 18, 2025
Python

vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

visualization python data-science machine-learning bigdata tabular-data hdf5 machinelearning dataframe memory-mapped-file pyarrow

Updated Oct 8, 2024
Python

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

machine-learning deep-learning tensorflow pytorch pyspark parquet parquet-files sysml pyarrow

Updated Dec 2, 2023
Python

narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

pandas pyspark dask ibis vaex pyarrow modin cudf duckdb polars

Updated Feb 18, 2025
Python

zen-xu / pyarrow-stubs

Type annotations for pyarrow

typing pyarrow

Updated Feb 10, 2025
Python

wheretrue / biobear

Work with bioinformatic files using Arrow, Polars, and/or DuckDB

python bioinformatics biology arrow biopython samtools pyarrow rust-bio duckdb polars

Updated Feb 17, 2025
Rust

jaysnm / dremio-arrow

Dremio Arrow Flight Client

python r pandas dataframe dremio pyarrow dremio-arrow

Updated Mar 20, 2024
Python

vipinc007 / ParquetViewer

A web application for viewing Apache Parquet files. This is a Python + Flask application.

pandas python3 flask-application parquet-files parquet-viewer pyarrow

Updated Apr 17, 2018
HTML

dacort / faker-cli

Command-line interface to quickly generate fake CSV and JSON data

aws json csv parquet faker-provider pyarrow deltalake

Updated Jul 11, 2024
Python

RandomFractals / chicago-crimes

Exploring Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy and new Panel/PyScript data and dashboard tools.

julia parquet jupyter-notebooks chicago pyarrow crimes duckdb polars large-csv malloy malloydata

Updated Jan 29, 2023
Jupyter Notebook

icaropires / pdf2dataset

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features