
Cannot create iterator for large query #67

Closed
mthaak opened this issue Mar 25, 2021 · 5 comments
@mthaak

mthaak commented Mar 25, 2021

I'm using pg8000 in combination with SQLAlchemy.

I have a large table called Product (more than 6 million rows, some containing very large blobs).

When I do iter(session.query(Product)), it never returns. Something like iter(session.query(Product).yield_per(50)) didn't help either; I waited 30 minutes, to no avail.

I know the issue is with pg8000, because with psycopg2 iter(session.query(Product).yield_per(50)) does work. It returns in 1 or 2 seconds.

Also session.query(Product).limit(50).all() works, but that's not what I want, because I cannot load all the data in memory at once.

pg8000 version: 1.16.5
SQLAlchemy version: 1.3.23
OS: Windows 10
Database: PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by Debian clang version 10.0.1, 64-bit

@tlocke
Owner

tlocke commented Mar 25, 2021

Hi @mthaak, pg8000 always retrieves and buffers all rows returned by a query. Looking at the docs (https://docs.sqlalchemy.org/en/14/orm/queryguide.html#orm-queryguide-yield-per), yield_per requires streaming, which pg8000 does not support (but psycopg2 does).

If you want to use pg8000, perhaps successive queries that return ranges of results might help?
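For illustration, the "successive queries returning ranges" idea could be sketched like this. The `iter_in_pages` helper and the page-fetching callable are hypothetical names of my own, not part of pg8000 or SQLAlchemy; with SQLAlchemy, `fetch_page` might be something like `lambda off, n: session.query(Product).offset(off).limit(n).all()`:

```python
def iter_in_pages(fetch_page, page_size=50):
    """Yield rows one page at a time, so the full result set
    is never held in memory at once."""
    offset = 0
    while True:
        rows = fetch_page(offset, page_size)
        if not rows:
            break
        yield from rows
        if len(rows) < page_size:
            break  # last (short) page
        offset += page_size

# In-memory stand-in for the database, just to show the paging behaviour:
data = list(range(7))
pages = list(iter_in_pages(lambda off, n: data[off:off + n], page_size=3))
```

Note that OFFSET-based paging rescans skipped rows on each query, so for very large tables keyset pagination (filtering on the last seen primary key) would scale better.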

@mthaak
Author

mthaak commented Mar 25, 2021

@tlocke thanks for the quick reply!
Yes, using successive queries would be a good alternative. But for simplicity, I have decided to stick with yield_per and use psycopg2.
Thank you for maintaining pg8000 though, it has served us well until now.

@tlocke
Owner

tlocke commented Mar 31, 2021

Actually, I think we should probably support streaming from the server anyway. I see that the relstorage project needs streaming of BLOBs.

@tlocke
Owner

tlocke commented Apr 3, 2021

I've done a bit of research on this and it's possible to use server-side cursors with pg8000 currently, so I've now included an example in the docs:

>>> con = pg8000.native.Connection(username, password="cpsnow")
>>> con.run("START TRANSACTION")
[]
>>> con.run("DECLARE c SCROLL CURSOR FOR SELECT * FROM generate_series(1, 100)")
[]
>>> con.run("FETCH FORWARD 5 FROM c");
[[1], [2], [3], [4], [5]]
>>> con.run("MOVE FORWARD 50 FROM c");
[]
>>> con.run("FETCH BACKWARD 10 FROM c");
[[54], [53], [52], [51], [50], [49], [48], [47], [46], [45]]
>>> con.run("CLOSE c");
[]
>>> con.run("ROLLBACK")
[]

So SQLAlchemy could use this for all PostgreSQL yield_per queries, and indeed for all server-side cursor queries. BLOB support is a different question; I think we need an example for handling BLOBs too.
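The DECLARE/FETCH loop above could be wrapped in a generator. A minimal sketch, with the caveats that `stream_query` and the fake connection below are my own illustration (only `Connection.run` is the real pg8000 API), and that interpolating `query` into the SQL string is only safe for trusted query text:

```python
def stream_query(run, query, batch_size=500, cursor="c"):
    """Stream rows from a server-side cursor, batch_size rows at a time.

    `run` is a pg8000.native Connection.run-style callable; here it is
    exercised against an in-memory fake rather than a live server."""
    run("START TRANSACTION")
    try:
        run(f"DECLARE {cursor} NO SCROLL CURSOR FOR {query}")
        while True:
            rows = run(f"FETCH FORWARD {batch_size} FROM {cursor}")
            if not rows:
                break
            yield from rows
        run(f"CLOSE {cursor}")
    finally:
        run("ROLLBACK")  # also runs if the consumer abandons the generator

# Fake stand-in for pg8000.native.Connection so the sketch is self-contained:
class FakeConn:
    def __init__(self, data):
        self.data, self.pos = data, 0
    def run(self, sql):
        if sql.startswith("FETCH FORWARD"):
            n = int(sql.split()[2])
            rows = self.data[self.pos:self.pos + n]
            self.pos += n
            return rows
        return []  # DECLARE, CLOSE, transaction commands return no rows

con = FakeConn([[i] for i in range(1, 8)])
rows = list(stream_query(con.run, "SELECT * FROM generate_series(1, 7)",
                         batch_size=3))
```

Because rows are yielded as they arrive, peak memory stays around `batch_size` rows regardless of the total result size.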

@tlocke
Owner

tlocke commented Apr 3, 2021

I've added an example for BLOBs to the docs. I'll close the issue now, but feel free to re-open it if needed.
