Make use of pypika objects being immutable, and pre-build as much of the query as possible. #64
…smaller fetch operations
15-35% speedup for small selects.
Pull Request Test Coverage Report for Build 281
💛 - Coveralls |
Pull Request Test Coverage Report for Build 304
💛 - Coveralls |
@Zeliboba5 Please review.
Also, regarding the README, I think we should add info about MySQL and PostgreSQL driver availability, because right now that info isn't in the README.
Also, pypika resolved the issue with boolean casting: kayak/pypika#171
Added __slots__ to fields to reduce memory usage a little. Renamed `generate_filters` for clarity.
I'm in two minds about removing |
May be remove them but leave a comment on first model class occurrence saying that |
So, @Zeliboba5, the new pypika doesn't fix it: https://travis-ci.com/tortoise/tortoise-orm/jobs/157473493 However, on my local system the tests passed, both before and after the pypika upgrade.
Sorry if I was misleading: by the new version of PyPika I meant not just the version bump, but also using SQLiteQuery from this commit: kayak/pypika@d14965c
Unfortunately the implementation is not complete:

    >>> from pypika import Table
    >>> from pypika.dialects import SQLLiteQuery
    >>> table_abc = Table('abc')
    >>> SQLLiteQuery.from_('abc').select('a').where(table_abc.a == True)
    SELECT "a" FROM "abc" WHERE "a"=true

But we have our own |
@Zeliboba5 I fixed #62 by using the overridable `to_db` functions in filtering as well, instead of only on insert. Please review.
    self._joined_tables = []  # type: List[Table]
    self.model = model
    self.query = model._meta.basequery  # type: Query
    self._db = db if db else model._meta.db  # type: BaseDBAsyncClient
As far as I remember, I specifically removed the `db` field from instantiation in `__init__`, because it causes trouble: if you define your query outside of a transaction and then run it inside a transaction, the query is executed outside of the transaction, because it fetched its connection before the transaction even started.
So that's why the connection should be fetched only after the query is `await`ed.
However, as I see further down, I didn't remove it in `DeleteQuery` and some other queries, which seems wrong to me now. Is there some case that I am missing?
I think that was valid before the transaction rewrite, because now the connection is actually stored in a context variable?
Also, I was toying with separating the DB instance from the actual "connection", I was trying it for #56 but then realised the test runner was causing chaos, hence I deferred that until after that was done. (Since we are an async-only ORM, why can't we always manage connections as a pool?) (Currently all that PR does is remove all the scattered connection logic, just to simplify things, so we can try and re-add it, and lazy evaluation makes debugging that super complicated)
And then there is the case of allowing a single Model that would be available on different databases (of potentially different types). I considered that, and the whole way we manage QuerySet assumes it is a 1:1 relationship, and it will require a large rewrite to make it, so decided that it is an OK limitation for now, as I never needed to do that, and it will complicate everything too.
> I think that was valid before the transaction rewrite, because now the connection is actually stored in a context variable?

Yeah, but then, after refactoring, I encountered the case I described above and fixed it. But you are actually right that it looks scattered for now. How about making `db` a property of `AwaitableQuery`, so the code still looks clean, but the actual value is evaluated only when it is accessed?
Regarding using a single model on different databases: it seems to me that an implementation that allows it would bring much more complexity than we want to handle at the moment, while bringing no benefit to most users, because I really think that in most cases you would want to just create two separate models to avoid confusion.
Yes, too much complexity, in both interface and implementation. I like the suggestion of two separate models in that case. Allows you to manage migrations separately.
I'll look at moving some of that logic to `AwaitableQuery` tomorrow. I'm just concerned about the `to_db` overrides.
Just for clarity, the issue was specific to pooled connections?
I added the following tests, expecting them to fail:

    async def test_await_across_transaction_fail(self):
        tournament = Tournament(name='Test')
        query = tournament.save()

        try:
            async with in_transaction() as connection:
                result = await query
                raise Exception('moo')
        except Exception:
            pass

        self.assertEqual(await Tournament.all(), [])

    async def test_await_across_transaction_success(self):
        tournament = Tournament(name='Test')
        query = tournament.save()

        async with in_transaction() as connection:
            result = await query

        self.assertEqual(await Tournament.all(), [tournament])

But they pass. To me that means that if I prepare the query OUTSIDE the transaction but await it INSIDE a transaction, transaction isolation works the same as if everything were inside the transaction.
If that is the case, then the issue you were experiencing is no longer valid?
Makes sense to me, because `QuerySet._execute()` only runs when you await it, so it inherits the state of wherever it gets awaited.
Which seems correct to me, so can we leave it as is?
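The "only runs when you await it" behaviour can be shown with plain `asyncio`. This is a sketch under assumptions: `save` and `current_connection` are hypothetical stand-ins for a model's `.save()` and for the connection state, not the real Tortoise ORM implementation.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the context variable holding the active connection.
current_connection = contextvars.ContextVar(
    "current_connection", default="default-connection"
)


async def save():
    # Stand-in for Model.save(): the connection is read when the body runs.
    return current_connection.get()


async def main():
    pending = save()  # only creates a coroutine object; nothing has run yet

    # Simulate entering a transaction after the coroutine was created.
    token = current_connection.set("transaction-connection")
    try:
        used = await pending  # the body executes here, inside the "transaction"
    finally:
        current_connection.reset(token)
    return used


used_connection = asyncio.run(main())
print(used_connection)  # transaction-connection
```

Because the coroutine body does not start until it is awaited, it sees whatever state is current at the await site, which matches the passing tests.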
The issue I described doesn't apply to `.save()`, because when you call `.save()` without awaiting it nothing happens; it is only executed when it's awaited.
The problem should occur when you build a queryset and await it later, like this:

    async def test_await_across_transaction(self):
        tournament_query = Tournament.filter(name='Test')
        async with in_transaction() as connection:
            await Tournament.create(name='Test')
            result = await tournament_query
        self.assertEqual(len(result), 1)

But after actually writing that test I understood that you were right: the problem occurs only with a connection pool. When we deal with a single connection it doesn't matter, because everything after the transaction starts falls into that transaction.
Sorry for the confusion. If we are cutting out pools for now, we can leave `_db` as it is.
Ok, let me also do tests for `.update()` and then we'll see.
Only `.save()` and `.update()` modify contents, so those should be the only queries applicable to transactions?
Well, transactions are not only about writing; they're also about transaction isolation.
In the test that I wrote in my previous message, if there were a connection pool and `tournament_query` were executed outside of the transaction, it wouldn't see the changes made by `.create()`, which leads to errors.
And if we talk about modification, `.delete()` also modifies content.
So I think all operations are equally important in terms of their interaction with transactions.
Ok, I added tests for insert/update/delete/select and they all work as I expected: https://github.com/tortoise/tortoise-orm/pull/64/files#diff-12a2be3f481afdc9a647b690a7a299f9
So, for now I think this is OK.
It becomes a problem to solve when connection pooling is implemented.
So far this PR seems complete to me.
Make use of pypika objects being immutable, and pre-build as much of the query as possible.
Also pre-build DB-specific filter override list. (avoids a transaction lookup which is quite expensive)
Also optimised queryset cloning.
This gives us a further 21-48% speed up for small queries. 😁
This actually slowed down tests marginally, as in the tests we re-build all the query caches for each test. 🤷♂️
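The pre-building idea described above can be sketched with a small immutable builder. `SelectQuery` and `BASE_QUERY` are hypothetical stand-ins written only with the standard library; they are not pypika's API, and the real PR stores its pre-built query on the model metadata (`model._meta.basequery` in the diff above). The sketch assumes only that builder methods return a new object instead of mutating the original.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class SelectQuery:
    """Hypothetical immutable query builder (a stand-in, not pypika's API)."""

    table: str
    columns: Tuple[str, ...] = ()
    conditions: Tuple[str, ...] = ()

    def select(self, *columns: str) -> "SelectQuery":
        # Returns a new object; the original is left untouched.
        return replace(self, columns=self.columns + columns)

    def where(self, condition: str) -> "SelectQuery":
        return replace(self, conditions=self.conditions + (condition,))

    def get_sql(self) -> str:
        cols = ", ".join(f'"{c}"' for c in self.columns)
        sql = f'SELECT {cols} FROM "{self.table}"'
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql


# Built once, for example when the model class is set up.
BASE_QUERY = SelectQuery("tournament").select("id", "name")

# Each fetch only adds what differs; BASE_QUERY itself is never modified.
by_name = BASE_QUERY.where("\"name\"='Test'")
by_id = BASE_QUERY.where('"id"=1')

print(BASE_QUERY.get_sql())
print(by_name.get_sql())
print(by_id.get_sql())
```

Since every builder call returns a fresh object, the shared base query can be reused across fetches without copying it defensively, which is where the saving for small selects would come from in this sketch.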