Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returing N rows with no relations generates N superfluous selects #65

Closed
tomaswangen opened this issue May 7, 2020 · 8 comments
Closed

Comments

@tomaswangen
Copy link

tomaswangen commented May 7, 2020

Hi,

I stumbled upon this issue recently while returning larger datasets.

When doing a trivial select on tables with no relations, after the correctly formatted select returning N rows, N more selects with seemingly no cause are executed.

The trivial example below demonstrates this

Executing this query returning 10 rows
/items/?fields%5BItem%5D=foo%2Cbar&page%5Boffset%5D=0&filter%5Bbar%5D=group_0

results in 10 instances of:

SELECT items.id AS items_id, items.foo AS items_foo, items.bar AS items_bar
FROM items
WHERE items.id = ?
 LIMIT ? OFFSET ?
INFO sqlalchemy.engine.base.Engine ('', 1, 0)

Code for reproduction:

import sys
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from safrs import SAFRSBase, SAFRSAPI
from flask_sqlalchemy import SQLAlchemy
import sqlite3
import logging

db = SQLAlchemy()
logging.basicConfig(level=logging.DEBUG)

class Item(SAFRSBase, db.Model):
    __tablename__ = "items"
    id = db.Column(db.String(32), primary_key=True)
    foo = db.Column(db.String(32))
    bar = db.Column(db.String(32))

def start_app(app):

    api = SAFRSAPI(app, host=HOST)
    api.expose_object(Item)
    
    for i in range(20):
        item =Item(foo="item_" + str(i), bar="group_" + str((int)(i / 10)))
    
    print("Starting API: http://{}:{}/api".format(HOST, PORT))
    app.run(host=HOST, port=PORT)

app = Flask("demo_app")
app.config.update(
    SQLALCHEMY_DATABASE_URI="sqlite://",
    SQLALCHEMY_TRACK_MODIFICATIONS=False,
    SECRET_KEY=b"sdqfjqsdfqizroqnxwc",
    DEBUG=True,
    SQLALCHEMY_ECHO = True
)

HOST = sys.argv[1] if len(sys.argv) > 1 else "0.0.0.0"
PORT = 5000
db.init_app(app)

with app.app_context():
    db.create_all()
    start_app(app)
@thomaxxl
Copy link
Owner

thomaxxl commented May 7, 2020

Hi,

These individual selects are executed by sqlalchemy .all() (in jsonapi.py paginate() : instances = res_query.all()).
I think this is just how sqlalchemy works: it fetches all items individually when mapping entities.

Do you have performance problems because of this?
How many items do you return ( in the page[limit] url query arg)

When working with really large datasets, doing a count() may become expensive, this can be circumvented by overriding SAFRSBase._s_count, but I don't know if that is the problem you are having.

@tomaswangen
Copy link
Author

tomaswangen commented May 7, 2020

That seems odd, but I will try to reproduce it with sqlalchemy. The strange thing is that the requested data is all returned in the initial query while the repeated SELECT items.id = '' LIMIT 1 OFFSET 0 does not return anything and seemingly serves no purpose (it scales badly on hundreds of rows).

@thomaxxl
Copy link
Owner

thomaxxl commented May 7, 2020

I'll investigate some more when I find some time, I'm not that familiar with sqla internals and debuggging and this is quite complicated.

@tomaswangen
Copy link
Author

Thanks, I will also have a look and see if I can isolate the underlying issue

@tomaswangen
Copy link
Author

You are absolutely right that this is due to the sqlalchemy internals. the .all() triggers the selects regardless of paging. Closing this and rewriting the usage to get around performance issues for larger resultsets.

@thomaxxl
Copy link
Owner

thomaxxl commented May 7, 2020

If you have perf problems on the front, maybe you can perform simultaneous requests with a different offset and smaller limit.

@tomaswangen
Copy link
Author

I decided to go for jsonapi_rpc and implement using handmade sql in this case.

@thomaxxl
Copy link
Owner

It's poorly documented, but it is possible to have jsonapi_rpc or filter= return a jsonapi formatted response object if you wish.
for ex: https://github.com/thomaxxl/safrs/wiki/Custom-Filtering

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants