
New pool (WIP) #69

Closed
wants to merge 3 commits into from

Conversation

@grigi
Member

grigi commented Nov 26, 2018

Now that the code base is simpler and the test runner should be more sane, this is the third attempt at adding connection pooling.

The plan is to change to connection pooling ONLY, as we currently implement persistent connections, but only a single persistent connection.
A connection pool should:

  • add robustness (if a connection dies, reconnect)
  • allow multiple DB clients to operate at the same time (up to maxsize)
  • allow more conflicts to occur, so we need to handle rollbacks/retries explicitly

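For illustration only, a minimal sketch of what MySQL pooling via aiomysql could look like (host, credentials and maxsize are placeholder values, not this PR's actual configuration):

```python
import asyncio
import aiomysql

async def main():
    # Placeholder connection settings; up to `maxsize` connections can be
    # checked out concurrently.
    pool = await aiomysql.create_pool(
        host="127.0.0.1", port=3306,
        user="root", password="", db="test_tortoise",
        minsize=1, maxsize=5, autocommit=True,
    )
    try:
        async with pool.acquire() as conn:       # borrow a connection from the pool
            async with conn.cursor() as cur:
                await cur.execute("SELECT 1")
                print(await cur.fetchone())
    finally:
        pool.close()
        await pool.wait_closed()

asyncio.get_event_loop().run_until_complete(main())
```
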
Things done:

  • Change to a connection pooling system for MySQL.
  • Add tests for concurrency
  • Add tests for robustness (hackery allowed)
  • Add tests for handling conflicts.

Concerns:

  • Can SQLite be concurrent at all? If it is difficult, should we limit it?
  • We need to add concurrency to the benchmarks so we can track the performance impact.

@coveralls

Pull Request Test Coverage Report for Build 349

  • 23 of 69 (33.33%) changed or added relevant lines in 7 files are covered.
  • 241 unchanged lines in 10 files lost coverage.
  • Overall coverage decreased (-6.3%) to 88.032%

Changes Missing Coverage (Covered Lines / Changed+Added Lines / %):

  • tortoise/models.py: 2 / 3 / 66.67%
  • tortoise/backends/asyncpg/client.py: 0 / 7 / 0.0%
  • tortoise/backends/mysql/client.py: 0 / 38 / 0.0%

Files with Coverage Reduction (New Missed Lines / %):

  • tortoise/backends/asyncpg/__init__.py: 2 / 100.0%
  • tortoise/__init__.py: 2 / 99.32%
  • tortoise/backends/mysql/__init__.py: 2 / 0.0%
  • tortoise/models.py: 3 / 93.17%
  • tortoise/backends/asyncpg/executor.py: 6 / 100.0%
  • tortoise/backends/asyncpg/schema_generator.py: 8 / 100.0%
  • tortoise/backends/mysql/schema_generator.py: 12 / 0.0%
  • tortoise/backends/mysql/executor.py: 21 / 0.0%
  • tortoise/backends/mysql/client.py: 89 / 0.0%
  • tortoise/backends/asyncpg/client.py: 96 / 100.0%

Totals:

  • Change from base Build 347: -6.3%
  • Covered Lines: 1896
  • Relevant Lines: 2145

💛 - Coveralls

@grigi
Member Author

grigi commented Dec 22, 2018

There are several useful changes in this PR that I'm going to pull out into their own PR:

  • contextvars fixes
  • autocommit for MySQL
  • Less confusing logs
  • Isolation fixes in test runner
  • db_url improvements
  • minor test enhancements
  • dependency updates

@grigi
Member Author

grigi commented Dec 22, 2018

@abondar Also whilst debugging the idiotic autocommit issue, I found that we don't have a clear expectation of how transactions should operate.

I'm proposing this:

  • Root level: auto-commit
  • First transaction: full isolation
  • Nested transaction: no-op. Nested transactions are handled differently depending on the DB version, the connector used, etc., and with the current implementation they may just create a separate transaction on a different connection, causing even more confusion.
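
Roughly, the proposed behaviour (a sketch only; `Tournament` is a placeholder model and `in_transaction()` is tortoise's transaction context manager):

```python
from tortoise import fields
from tortoise.models import Model
from tortoise.transactions import in_transaction

class Tournament(Model):  # placeholder model; Tortoise.init()/schema setup omitted
    id = fields.IntField(pk=True)
    name = fields.TextField()

async def proposed_semantics():
    # Root level: each statement auto-commits on its own (pooled) connection.
    await Tournament.create(name="root level")

    # First transaction: fully isolated; commits or rolls back as a unit.
    async with in_transaction():
        await Tournament.create(name="inside the transaction")

        # Nested transaction: proposed to be a no-op, i.e. it simply reuses the
        # outer transaction's connection rather than starting anything new.
        async with in_transaction():
            await Tournament.create(name="nested, still the same transaction")
```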

grigi added a commit that referenced this pull request Dec 24, 2018
This takes the useful parts from #69:
* contextvars update & testrunner fixes
* db_url improvements
* minor test enhancements
* dependency updates
* autocommit for MySQL
* re-ordered execute-SQL log to be inside connection context manager for less confusing logs.
@grigi grigi mentioned this pull request Dec 24, 2018
grigi added a commit that referenced this pull request Dec 24, 2018
This takes the useful parts from #69:
* contextvars update & testrunner fixes
* db_url improvements
* minor test enhancements
* dependency updates
* autocommit for MySQL
* re-ordered execute-SQL log to be inside connection context manager for less confusing logs.
grigi added a commit that referenced this pull request Dec 25, 2018
This takes the useful parts from #69:
* contextvars update & testrunner fixes
* db_url improvements
* minor test enhancements
* dependency updates
* autocommit for MySQL
* re-ordered execute-SQL log to be inside connection context manager for less confusing logs.
@grigi
Member Author

grigi commented Dec 25, 2018

@abondar I rebased and simplified this connection pooling PR.
Right now only the *await_across_transaction* tests are failing. You are welcome to have a go at it yourself.

@abondar
Member

abondar commented Dec 27, 2018

@grigi For some reason the python3.7/mysql build failed badly, but the other builds are now fine. I'll look into the failed build later if you don't manage to get to it before me.

@grigi
Member Author

grigi commented Dec 27, 2018

My first thought was "Probably contextvars", as the backport aiocontextvars isn't exactly the same as the py3.7 version.
Looking at the test results, it looks like test isolation completely failed. I get more errors with fewer processes and fewer errors with more processes, and if I run a test by itself it always passes...

@grigi
Member Author

grigi commented Dec 28, 2018

It doesn't use the transaction's connection, but a different one:

2018-12-28 08:43:32     DEBUG Acquired connection for transaction <aiomysql.connection.Connection object at 0x7fcbf47fbb00>
2018-12-28 08:43:32     DEBUG Acquired connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG INSERT INTO `tournament` (`name`,`created`) VALUES (%s,%s): ['Test', datetime.datetime(2018, 12, 28, 6, 43, 32, 189123)]
2018-12-28 08:43:32     DEBUG Released connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG Acquired connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG UPDATE `tournament` SET `name`='Updated name' WHERE `id`=1
2018-12-28 08:43:32     DEBUG Released connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG Acquired connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG SELECT `created`,`name`,`id` FROM `tournament` WHERE `name`='Updated name' LIMIT 1
2018-12-28 08:43:32     DEBUG Released connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG Acquired connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG INSERT INTO `tournament` (`name`,`created`) VALUES (%s,%s): ['Test 2', datetime.datetime(2018, 12, 28, 6, 43, 32, 195122)]
2018-12-28 08:43:32     DEBUG Released connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG Acquired connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG SELECT `id` `0` FROM `tournament`
2018-12-28 08:43:32     DEBUG Released connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG Acquired connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG SELECT `id` `id`,`name` `name` FROM `tournament`
2018-12-28 08:43:32     DEBUG Released connection <aiomysql.connection.Connection object at 0x7fcbf4b24ac8>
2018-12-28 08:43:32     DEBUG Released connection for rolled back transaction <aiomysql.connection.Connection object at 0x7fcbf47fbb00>

It creates a transaction on connection 0x7fcbf47fbb00, but then runs all the SQL on connection 0x7fcbf4b24ac8.
To me that confirms a contextvars-related issue, as that is where the transaction's connection should be stored.
I'm sure I fixed this before, but maybe I broke something in the last rebase?

This should be unrelated to your queryset changes.
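
For illustration, a standalone sketch (not tortoise code, assuming the transaction connection is stored in a module-level ContextVar): on py3.7 a value set inside a separate Task lives in a copy of the context, so the caller never sees it, which is exactly this kind of "transaction on one connection, SQL on another" mismatch:

```python
import asyncio
import contextvars

# Hypothetical stand-in for the "current transaction connection" variable.
current_conn = contextvars.ContextVar("current_conn", default="fresh pool connection")

async def start_transaction():
    # Scheduled as its own Task below, so it runs in a *copy* of the caller's
    # context; this set() is invisible to the caller afterwards.
    current_conn.set("transaction connection 0x7fcbf47fbb00")

async def main():
    await asyncio.ensure_future(start_transaction())
    # Still the default: subsequent SQL would acquire a pool connection
    # instead of using the transaction's connection.
    print(current_conn.get())  # -> "fresh pool connection"

asyncio.get_event_loop().run_until_complete(main())
```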

@grigi
Member Author

grigi commented Dec 28, 2018

Yup, it was already broken by my rebase; it was working before then.
Meh, so much changed :-(

@grigi
Member Author

grigi commented Jan 1, 2019

Ok, I did some more digging and found that I was mistaken: py37+mysql (pooling) never isolated correctly.
And if I use contextvars with reset() I always get an error about the context having changed.
Meaning the context somehow changes, so the value we set is not seen where we think it should be.
So we may have to manage the context manually? Or I may be missing something.

There is some documentation on the differences here: https://pypi.org/project/aiocontextvars/
I feel that if we can make the "different Context" complaint from reset() go away, we will probably have contextvars working properly on 3.7.
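
That reset() error can be reproduced in isolation (a minimal sketch, no tortoise involved): a Token can only be used in the Context where set() created it:

```python
import contextvars

conn = contextvars.ContextVar("conn", default=None)

def enter_transaction():
    # set() returns a Token bound to whatever Context is current right now.
    return conn.set("transaction connection")

# Run the set() inside a copied Context, then try to reset from the outer one.
token = contextvars.copy_context().run(enter_transaction)

try:
    conn.reset(token)
except ValueError as exc:
    print(exc)  # <Token ...> was created in a different Context
```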

@grigi grigi mentioned this pull request Feb 1, 2019
@grigi
Member Author

grigi commented Mar 5, 2019

Currently the client looks up a ContextVar that exists globally, outside of its own scope. I think the common issues with globals may actually be relevant here. So I'm going to consider removing the global state tracking and try to simplify the transaction handling in the Client, to make it easier to reason about.

It is easy to trigger the current issue; it appears to be a straightforward race condition. Unfortunately the code isn't that easy to reason about, hence my attempt at simplifying it some more.

@abondar
Member

abondar commented Mar 6, 2019

Currently the client looks up a ContextVar that exists globally, outside of its own scope. I think the common issues with globals may actually be relevant here.

I don't really understand why there is a problem with contextvars. Shouldn't it be okay for them to be global? That is what the examples in the docs show: https://docs.python.org/3/library/contextvars.html
And what is the alternative way to store those variables?

@grigi
Member Author

grigi commented Mar 6, 2019

Yes, the docs talk of globals, but the way I see it being used in Gino is as part of an instance variable.
There is a behaviour difference between the backport and what ships with 3.7, and I don't really know why 3.7 fails for us.

Also, the way transactions sort-of replace the class just doesn't feel clean, hence my considering a refactor.
e.g. the Client contains the pool, and the entirety of transaction management could be a ContextVar inside the class.
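
A very rough sketch of that shape (all names are hypothetical, not the actual tortoise client; the pool is assumed to be aiomysql-style):

```python
import contextvars
from typing import Any

class PooledClient:
    """Hypothetical client: it owns the pool, and the 'current transaction
    connection' belongs to the client rather than being a module-level global."""

    def __init__(self, pool: Any) -> None:
        self.pool = pool
        # Note: the contextvars docs recommend creating ContextVars at module
        # level; a per-client variable is shown here only to match the idea above.
        self._txn_conn = contextvars.ContextVar("txn_conn", default=None)

    async def execute_query(self, sql: str) -> Any:
        conn = self._txn_conn.get()
        if conn is not None:
            # Inside a transaction: reuse the pinned connection.
            async with conn.cursor() as cur:
                return await cur.execute(sql)
        # Outside a transaction: borrow a connection from the pool.
        async with self.pool.acquire() as pooled:
            async with pooled.cursor() as cur:
                return await cur.execute(sql)
```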

@zhoufeng1989

Hi guys. I am using Tortoise ORM (PostgreSQL backend) in a project, and I am now considering adding database pooling (an asyncpg pool) to tortoise.

It seems that in the 'new_pool' branch you have already implemented pooling for MySQL.
When I run the tests in the 'new_pool' branch, four of the test cases fail:

  • tortoise.tests.test_capabilities.TestCapabilities.test_actually_runs
  • tortoise.tests.test_capabilities.TestCapabilities.test_connection_name
  • tortoise.tests.test_tester.TestTesterASync.test_fail
  • tortoise.tests.test_tester.TestTesterASync.TestTesterSync.test_fail

But these failures seem unrelated to database pooling. I would like to know the progress of this feature. Are the problems discussed in this issue still unsolved? If I add pooling for PostgreSQL, is there anything tortoise-specific I need to know about? Can I start from this branch?

Thanks for your excellent work on this library!

@grigi @abondar

@grigi
Member Author

grigi commented Aug 22, 2019

Hi @zhoufeng1989
Thank you for your interest! We have done a bit more work on the problem, and understand the issues a bit better.

So, for a short history as to why this has been sitting around so long:
At first we wanted to push for this to get gather() operations working without issues, somewhat naïvely, since pooling is how one solves concurrency in a threaded environment. But we ran into these odd, hard-to-debug issues. (There is another PR where I essentially pulled out all the bits that I found helped, and this is essentially the second attempt.)

We then had a few concurrency-related bug reports, and I managed to fix all of them quite easily except for one: putting a mutex around a transaction. The failure was identical to this one.
I shelved that here: 51b8d20. Now that I had a small diff, I could try to understand why I could get this to work on Py3.5 & Py3.6 but it would fail on Py3.7.

The obvious difference is that async contextvars is built-in on Py3.7, and a monkey-patch on 3.5/3.6.

It turns out that the context gets applied at a higher resolution on py3.7, so it is tied to stack level instead of to co-routine scheduling. We need to control the stack a bit better so that we can enable this reliably. I suppose the "working way" on 3.5/3.6 was probably not entirely reliable either.

Then life interfered (kid got very sick), but that is gradually returning to normal. For a while most of the dev work was done by contributors, and only recently (the last 2 months or so) have I done real work on this again.

If you read #141, Andrey re-attempted it but ultimately ran into the same issue, and after discussing the core issue as I see it (and I may be wrong), he agreed that we may have to do a large refactor to fix the stacking issue.

So, yes, if you want to contribute in any way to help us reach this milestone reliably, I would be super grateful 😀

I would start with the commit I linked, and see how restructuring the code could enable the stack-level resolution of the built-in contextvars to work.
We now have a relaxed requirement in that we just deprecated py3.5, so any work on a large feature can happily break py3.5 compatibility.
That means we can use cleaner, simpler async generators (https://docs.python.org/3.7/whatsnew/3.6.html#whatsnew36-pep525), which I think could really help simplify the flow of co-routines and might allow us to fix this issue without a massive refactor that may or may not fix it (a rough sketch follows at the end of this comment).

Wow, that was a lot longer than I intended, but I hope it gives enough context.

I want that issue resolved, because once we get the locking around transactions done, we will have perfect isolation.
Then adding pools, or different behaviour for in-or-out-of-transaction work (like for bulk operations), would be trivial.
And that would get us very close to where we consider ourselves fully production-ready. YAY!
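
The async-generator idea mentioned above could look roughly like this (a sketch under assumptions: `current_conn` is a hypothetical module-level ContextVar, `pool` is an aiomysql-style pool, and `contextlib.asynccontextmanager` requires py3.7):

```python
import contextvars
from contextlib import asynccontextmanager  # Python 3.7+

# Hypothetical module-level variable for the current transaction connection.
current_conn = contextvars.ContextVar("current_conn", default=None)

@asynccontextmanager
async def connection_in_context(pool):
    # PEP 525 async generator: acquire, set(), yield, reset() and release all
    # happen in one frame, so as long as the block is entered and exited in the
    # same task, the Token is reset in the same Context that created it.
    async with pool.acquire() as conn:
        token = current_conn.set(conn)
        try:
            yield conn
        finally:
            current_conn.reset(token)

# Usage sketch:
#   async with connection_in_context(pool) as conn:
#       ...  # queries in this block can look up current_conn.get() and find `conn`
```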

@zhoufeng1989

Hi @grigi, thanks for your informative reply, but I am still somewhat confused about the problem that fails on Py3.7 but works on Py3.5 & Py3.6.

I am trying to reproduce the problem, but ran into some other issues. All my work is based on the latest release, 0.13.0.

  • Since I only have PostgreSQL available right now, can I run the unit tests against PostgreSQL only? I tried setting the environment variable TORTOISE_TEST_DB to something like postgres://postgres:@127.0.0.1:5432/test_tortoise and also reading db_url in conftest.py from this environment variable, but it seems that some test cases still use sqlite (while debugging, the current_transaction_map is something like {'default': <ContextVar name='default' default=<tortoise.backends.sqlite.client.SqliteClient object at 0x10b60e9b0> at 10b60e978>}).

  • Would you please point me to the test cases that fail on Py3.7 but pass on Py3.5 & 3.6? Maybe I can understand the problem better with those test cases.

Thanks!

@grigi
Member Author

grigi commented Aug 27, 2019

Yes, some tests manually create DB contexts, e.g. for testing db-specific DDL generation. (I think those should not actually create a DB instance, just a client object.) It should be limited to test_connection_params.py and test_generate_schema.py.

I can't remember exactly which ones were failing on py3.7, but the most important tests for pooling/concurrency are test_concurrency.py and test_transactions.py.
Also, if you can get this one working fine: 51b8d20#diff-1ac94b73e8bd2ab0fdcd9d23877a3665R30-R38
then we are probably all good to go.

@grigi grigi changed the base branch from master to develop September 21, 2019 17:31
@abondar abondar closed this Sep 23, 2019
@hqsz hqsz deleted the new_pool branch May 14, 2020 08:16