
Cannot create iterator for large query #67

Closed
mthaak opened this issue Mar 25, 2021 · 5 comments
@mthaak

mthaak commented Mar 25, 2021

I'm using pg8000 in combination with SQLAlchemy.

I have a large table called Product (more than 6 million rows, some containing very large blobs).

When I do iter(session.query(Product)), it never returns. Something like iter(session.query(Product).yield_per(50)) didn't help either; I waited 30 minutes, to no avail.

I know the issue is with pg8000, because with psycopg2 iter(session.query(Product).yield_per(50)) does work. It returns in 1 or 2 seconds.

Also session.query(Product).limit(50).all() works, but that's not what I want, because I cannot load all the data in memory at once.

pg8000 version: 1.16.5
SQLAlchemy version: 1.3.23
OS: Windows 10
Database: PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by Debian clang version 10.0.1, 64-bit

@tlocke
Owner

tlocke commented Mar 25, 2021

Hi @mthaak, pg8000 always retrieves and buffers all rows returned by a query. Looking at the docs (https://docs.sqlalchemy.org/en/14/orm/queryguide.html#orm-queryguide-yield-per), yield_per requires streaming, which pg8000 does not support (but psycopg2 does).

If you want to use pg8000, perhaps successive queries that return ranges of results might help?
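For illustration, the "successive queries returning ranges" idea could be sketched like this. The `iter_in_pages` helper and the page-fetching callable are hypothetical names of my own, not part of pg8000 or SQLAlchemy; with SQLAlchemy, `fetch_page` might be something like `lambda off, n: session.query(Product).offset(off).limit(n).all()`:

```python
def iter_in_pages(fetch_page, page_size=50):
    """Yield rows one page at a time, so the full result set
    is never held in memory at once."""
    offset = 0
    while True:
        rows = fetch_page(offset, page_size)
        if not rows:
            break
        yield from rows
        if len(rows) < page_size:
            break  # last (short) page
        offset += page_size

# In-memory stand-in for the database, just to show the paging behaviour:
data = list(range(7))
pages = list(iter_in_pages(lambda off, n: data[off:off + n], page_size=3))
```

Note that OFFSET-based paging rescans skipped rows on each query, so for very large tables keyset pagination (filtering on the last seen primary key) would scale better.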

@mthaak
Author

mthaak commented Mar 25, 2021

@tlocke thanks for the quick reply!
Yes, using successive queries would be a good alternative. But for simplicity, I have decided to stick with yield_per and use psycopg2.
Thank you for maintaining pg8000 though, it has served us well until now.

@tlocke
Owner

tlocke commented Mar 31, 2021

Actually, I think we should probably support streaming from the server anyway. I see that the relstorage project needs streaming of BLOBs.

@tlocke
Owner

tlocke commented Apr 3, 2021

I've done a bit of research on this and it's possible to use server-side cursors with pg8000 currently, so I've now included an example in the docs:

>>> con = pg8000.native.Connection(username, password="cpsnow")
>>> con.run("START TRANSACTION")
[]
>>> con.run("DECLARE c SCROLL CURSOR FOR SELECT * FROM generate_series(1, 100)")
[]
>>> con.run("FETCH FORWARD 5 FROM c");
[[1], [2], [3], [4], [5]]
>>> con.run("MOVE FORWARD 50 FROM c");
[]
>>> con.run("FETCH BACKWARD 10 FROM c");
[[54], [53], [52], [51], [50], [49], [48], [47], [46], [45]]
>>> con.run("CLOSE c");
[]
>>> con.run("ROLLBACK")
[]

So SQLAlchemy could use this for all PostgreSQL yield_per queries, and indeed for all server-side cursor queries. BLOB support is a different question; I think we need an example for handling BLOBs too.
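The DECLARE/FETCH loop above could be wrapped in a generator. A minimal sketch, with the caveats that `stream_query` and the fake connection below are my own illustration (only `Connection.run` is the real pg8000 API), and that interpolating `query` into the SQL string is only safe for trusted query text:

```python
def stream_query(run, query, batch_size=500, cursor="c"):
    """Stream rows from a server-side cursor, batch_size rows at a time.

    `run` is a pg8000.native Connection.run-style callable; here it is
    exercised against an in-memory fake rather than a live server."""
    run("START TRANSACTION")
    try:
        run(f"DECLARE {cursor} NO SCROLL CURSOR FOR {query}")
        while True:
            rows = run(f"FETCH FORWARD {batch_size} FROM {cursor}")
            if not rows:
                break
            yield from rows
        run(f"CLOSE {cursor}")
    finally:
        run("ROLLBACK")  # also runs if the consumer abandons the generator

# Fake stand-in for pg8000.native.Connection so the sketch is self-contained:
class FakeConn:
    def __init__(self, data):
        self.data, self.pos = data, 0
    def run(self, sql):
        if sql.startswith("FETCH FORWARD"):
            n = int(sql.split()[2])
            rows = self.data[self.pos:self.pos + n]
            self.pos += n
            return rows
        return []  # DECLARE, CLOSE, transaction commands return no rows

con = FakeConn([[i] for i in range(1, 8)])
rows = list(stream_query(con.run, "SELECT * FROM generate_series(1, 7)",
                         batch_size=3))
```

Because rows are yielded as they arrive, peak memory stays around `batch_size` rows regardless of the total result size.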

@tlocke
Owner

tlocke commented Apr 3, 2021

I've added an example for BLOBs to the docs. I'll close the issue now, but feel free to re-open it if needed.
