zschumacher/pinot-connect

pinot-connect

License: MIT

Installation

pip install pinot-connect
# or
poetry add pinot-connect
# or
uv add pinot-connect

Overview

pinot_connect is a DB-API 2.0 compliant and statically typed driver for querying Apache Pinot with Python. It supports both synchronous and asynchronous execution, making it flexible for a variety of applications.

Powered by:

  • orjson for high-performance JSON deserialization
  • httpx for async support and connection pooling

pinot_connect outperforms pinotdb in benchmarks: for queries that return 100 or more rows, expect roughly 15-30% faster execution on average.


Documentation

The full documentation can be found here.


Quickstart

Running a quick start Pinot cluster

To start an Apache Pinot cluster with example data, run:

docker run -d --name pinot-quickstart -p 9000:9000 \
  -p 8099:8000 \
  --health-cmd="curl -f http://localhost:9000/health || exit 1" \
  --health-interval=10s \
  --health-timeout=5s \
  --health-retries=5 \
  --health-start-period=10s \
  apachepinot/pinot:latest QuickStart -type batch

This command launches a Pinot instance with preloaded batch data, making it easy to start querying right away.
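The container can take a little while to report healthy. As a rough sketch, the same check the Docker health command performs can be polled from Python before running queries. The helper names `is_healthy` and `wait_until_healthy` are hypothetical and not part of pinot_connect; the URL, retry count, interval, and timeout simply mirror the flags in the command above.

```python
import time
import urllib.request


def is_healthy(url, timeout=5.0):
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: connection refused,
        # timeouts, and non-2xx responses all count as "not healthy yet".
        return False


def wait_until_healthy(
    url="http://localhost:9000/health",  # controller port published by -p 9000:9000
    retries=5,                           # mirrors --health-retries=5
    interval=10.0,                       # mirrors --health-interval=10s
    probe=is_healthy,                    # injectable so the loop can be exercised offline
    sleep=time.sleep,
):
    """Poll the health endpoint; return True as soon as one probe succeeds."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

Calling `wait_until_healthy()` with its defaults blocks for at most about 40 seconds of sleep plus probe time, and returns False if the cluster never reports healthy.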

Querying with pinot_connect

Once your cluster is up and running, you can query it using pinot_connect. Below are examples for both synchronous and asynchronous usage.

import pinot_connect
from pinot_connect.rows import dict_row

with pinot_connect.connect(host="localhost") as conn:
    with conn.cursor(row_factory=dict_row) as cursor:
        cursor.execute("select * from airlineStats limit 100")
        for row in cursor:
            print(row)

import pinot_connect
from pinot_connect.rows import dict_row
import asyncio

async def main():
    async with pinot_connect.AsyncConnection.connect(host="localhost") as conn:
        async with conn.cursor(row_factory=dict_row) as cursor:
            await cursor.execute("select * from airlineStats limit 100")
            async for row in cursor:
                print(row)

asyncio.run(main())

What's Happening Here?

  • Standard DB-API 2.0 Interface
    pinot_connect provides a familiar connection and cursor interface, similar to popular Python database clients such as sqlite3 or psycopg.

  • Row Factories
    The row_factory parameter lets you customize how rows are returned. In this example, dict_row returns results as dictionaries. You can choose from built-in factories or define your own. See the row factories documentation for details.

  • Type Mapping
    pinot_connect automatically converts Pinot data types to their Python equivalents. More details are available in the type conversion documentation.

  • Cursor Iteration & Fetch Methods
    You can iterate over results directly or use fetchone(), fetchmany(), fetchall(), and scroll(), following the DB-API spec. See the usage docs or reference docs for more details.
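To illustrate the fetch methods and the idea of a row factory without a running cluster, here is a minimal sketch that uses the standard library's sqlite3 module as a stand-in. This is not pinot_connect code: the `flights` table and its data are made up, and `dict_row_factory` uses sqlite3's own `(cursor, row)` factory signature, which may differ from the signature pinot_connect expects. The fetch semantics shown are the ones defined by DB-API 2.0.

```python
import sqlite3


def dict_row_factory(cursor, row):
    """Build a dict per row using column names from cursor.description.

    This is sqlite3's row factory signature, used here only as a stand-in.
    """
    columns = [col[0] for col in cursor.description]
    return dict(zip(columns, row))


# In-memory stand-in table; names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.row_factory = dict_row_factory
conn.execute("CREATE TABLE flights (carrier TEXT, delay INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("AA", 5), ("DL", 0), ("UA", 12), ("WN", 3), ("B6", 7)],
)

cursor = conn.cursor()
cursor.execute("SELECT carrier, delay FROM flights ORDER BY rowid")

first = cursor.fetchone()       # a single row, or None once results are exhausted
next_two = cursor.fetchmany(2)  # up to two further rows, as a list
rest = cursor.fetchall()        # every remaining row, as a list
exhausted = cursor.fetchone()   # nothing left, so this is None

conn.close()
```

Each fetch call continues from where the previous one stopped, so the three calls together consume the result set exactly once. Note that `scroll()` is an optional DB-API extension that sqlite3 does not implement, so it is not shown here.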

About

A statically typed and fast DB-API 2.0 implementation for Apache Pinot
