Many small performance tweaks #241

Merged
merged 6 commits into from
Nov 20, 2019

Conversation

grigi
Member

@grigi grigi commented Nov 19, 2019

After extending the benchmarks to also test PostgreSQL/MySQL and a profiling+optimizing run I got the following changes in the benchmark suite:

  • SQLite is about 6% faster
  • PostgreSQL is about 20-30% faster
  • MySQL is about 5-15% faster

After all these tweaks, I also compared our mean performance to the best results for each test:

  • SQLite is about 71-82% of best
  • PostgreSQL is about 84-90% of best
  • MySQL is about 50-54% of best

SQLite perf is basically due to known overhead with query creation & SQLAlchemy/Pony doing something smart for updates/deletes.
PostgreSQL perf is surprisingly good, and profiling indicates that the asyncpg driver is really fast.
I can't get much more out of MySQL as it is spending the majority of the time in PyMySQL escaping parameters to SQL strings!!! (why???)
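For anyone wanting to reproduce this kind of comparison, a minimal best-of-N timing harness can be sketched as below. This is a toy stand-in for the actual benchmark suite, and `bench` is a hypothetical helper, not code from this PR:

```python
import time

def bench(fn, *, rounds=5, number=1000):
    # Minimal repeatable micro-benchmark: run the callable `number` times
    # per round and keep the best wall-clock round, which reduces noise
    # from the OS scheduler and other background work.
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(number):
            fn()
        best = min(best, time.perf_counter() - start)
    return best / number  # seconds per call

per_call = bench(lambda: sum(range(100)))
print(f"{per_call * 1e6:.2f} µs/call")
```

Taking the best-of-N rather than the mean is a common choice for micro-benchmarks, since interference only ever makes a run slower.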

@coveralls

coveralls commented Nov 19, 2019

Pull Request Test Coverage Report for Build 1014

  • 69 of 111 (62.16%) changed or added relevant lines in 10 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.6%) to 95.995%

Changes Missing Coverage:

  File                                  Covered Lines  Changed/Added Lines  %
  tortoise/backends/sqlite/client.py    6              13                   46.15%
  tortoise/backends/mysql/client.py     18             35                   51.43%
  tortoise/backends/asyncpg/client.py   10             28                   35.71%

Files with Coverage Reduction:

  File                                  New Missed Lines  %
  tortoise/backends/asyncpg/client.py   2                 83.82%
  tortoise/backends/sqlite/client.py    2                 91.73%

Totals:

  Change from base Build 992: -0.6%
  Covered Lines: 6134
  Relevant Lines: 6336

💛 - Coveralls

@grigi grigi changed the title from "Many small performance tweaks reducing overhead by about ~6%" to "Many small performance tweaks" on Nov 19, 2019
CHANGELOG.rst Outdated
@@ -10,6 +10,9 @@ New features:
- Enabled by default for databases that support it (mysql and postgres) with a minimum pool size of 1, and a maximum of 5
- Not supported by sqlite
- Can be changed by passing the ``minsize`` and ``maxsize`` connection parameters
- Many small performance tweaks reducing filter query generation overhead by ~5%
- Bulk inserts are ensured to be wrapped in a transaction for >50% speedup
- PostgreSQL prepared queries now use an LRU cache for a significant >2x speedup on inserts/updates/deletes
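The prepared-query cache mentioned in the changelog can be sketched as a small LRU keyed on the SQL text. This is an illustrative stand-in (the `LRUCache` class here is hypothetical, not Tortoise's actual implementation); in practice the cached value would be the statement handle returned by the driver's prepare call:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache sketching how prepared statements can be reused.

    Hypothetical stand-in for the PR's per-connection prepared-query
    cache: the key would be the SQL text, the value the prepared
    statement handle (e.g. the result of asyncpg's prepare()).
    """

    def __init__(self, maxsize=100):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(maxsize=2)
cache.put("SELECT 1", "stmt1")
cache.put("SELECT 2", "stmt2")
cache.get("SELECT 1")            # touch: "SELECT 1" is now most recent
cache.put("SELECT 3", "stmt3")   # evicts "SELECT 2"
print(cache.get("SELECT 2"))     # None
```

The win comes from skipping the server round-trip that parses and plans the statement on every repeated insert/update/delete.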
Collaborator

is this a significant speedup over mysql?

Member Author

I tried to do a similar thing with MySQL, but the PyMySQL driver doesn't actually support parametrized queries natively: it re-escapes everything and submits raw SQL for each query...

At least it encodes everything as binary, so it should be injection-immune, but it still seems non-ideal.

Currently PyMySQL is the only driver supporting asyncio (via aiomysql).
For now I'm going to keep this as is, and after we do version 1.0, look at whether we can speed up the driver or write an asyncio wrapper around mysqlclient, which is apparently notably faster.
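To illustrate the difference being discussed: a driver like asyncpg ships the SQL and the parameters separately for server-side binding, while PyMySQL escapes each value and splices it into the SQL string client-side on every call. The sketch below mimics the client-side approach with a toy `escape_value`; it is not PyMySQL's real escaping code:

```python
# Simplified illustration of client-side parameter interpolation.
# escape_value is a toy stand-in for a driver's escaping routine.

def escape_value(value):
    if isinstance(value, str):
        # double single quotes so the value cannot break out of the literal
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def client_side_query(sql, params):
    # Every value is escaped and spliced into the string on every call;
    # this per-call string work is the kind of overhead the profiling
    # above attributes to PyMySQL.
    return sql % tuple(escape_value(p) for p in params)

print(client_side_query(
    "SELECT * FROM user WHERE name = %s AND age = %s",
    ("O'Brien", 42),
))
# SELECT * FROM user WHERE name = 'O''Brien' AND age = 42
```

With server-side binding none of this string work happens in Python, which is one reason the asyncpg path profiles so much faster.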

Collaborator

that "should be injection immune" doesn't sound very reassuring, I'll let loose some injection patterns on it later, just in case, to make sure

it's already pretty fast, but any speedups to MySQL would be much appreciated, as that's what I personally use the most :P let me know if I can help with that bit

Member Author

That would be appreciated.

@@ -716,6 +716,9 @@ def resolve_to_python_value(self, model: "Type[Model]", field: str) -> Callable:
# return as is to get whole model objects
return lambda x: x

if field in [x[1] for x in model._meta.db_native_fields]:
Collaborator

extremely minor and probably won't make much impact, but isn't a generator technically faster here, since we only iterate once and thus stop generating values once the field has been found?

Member Author

This function returns a mapping of callables and typically runs once per model, so the perf improvement was returning an identity function for fields that are natively supported by the DB.
So the perf here is not a big deal.
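A hedged sketch of the optimization described above, with hypothetical field names and converters rather than Tortoise's actual internals: fields whose DB value is already the right Python type get an identity function, so no per-row conversion work happens for them:

```python
# Illustrative stand-ins; these names are made up for the example.
DB_NATIVE_FIELDS = {"id", "age"}          # columns the DB returns natively

def to_bool(value):
    return bool(value)                    # example non-native converter

CONVERTERS = {"active": to_bool}

def resolve_to_python_value(field):
    # For db-native fields return an identity function: the resolved
    # callable is built once per model, then applied to every row, so
    # the per-row cost is what matters.
    if field in DB_NATIVE_FIELDS:
        return lambda x: x
    return CONVERTERS[field]

print(resolve_to_python_value("id")(7))       # 7
print(resolve_to_python_value("active")(1))   # True
```

The point grigi makes is that the resolution itself (the `in` check) runs rarely, so its own speed is irrelevant; the returned callable is the hot path.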

Collaborator

ah, my bad, I figured it was a performance hot spot; still getting to know the code, sorry

Member Author

No worries, I have learnt not to guess hot-spots any more, as I have a history of being wildly wrong.
Definitely use a profiler to see what is going on.

I actually did a talk about this a few years ago.
Step 1: get some kind of repeatable benchmark going (to verify that what you are doing moves the needle in the right direction).
Step 2: use a profiler, as non-obvious things appear, like "why do we prepare a SQL statement for every call?", or that the cursors.DictCursor class was spending a lot of time building dicts.
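Step 2 can be as simple as wrapping the workload in `cProfile` from the standard library; the workload function below is a made-up stand-in for the benchmark suite:

```python
import cProfile
import io
import pstats

def build_rows(n):
    # made-up workload; in practice you would profile the benchmark suite
    return [{"id": i, "name": f"row-{i}"} for i in range(n)]

profiler = cProfile.Profile()
profiler.enable()
build_rows(10_000)
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats(5)  # show only the top 5 entries by cumulative time
print(out.getvalue()[:200])
```

Sorting by cumulative time surfaces the "non-obvious things" grigi mentions, because it attributes time to callers rather than only to leaf functions.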

Collaborator

yeah, my guess was mostly based on the fact that it changed here, not on a profiler :P

@grigi
Member Author

grigi commented Nov 19, 2019

@AEnterprise I updated the PR summary with essentially all my findings.

Currently I'm getting significantly better performance with both SQLite and PostgreSQL than with MySQL (e.g. MySQL is ~½ of the PostgreSQL performance).
The main issue there is that the MySQL driver seems to be implemented inefficiently.

Another part of that could also be how my local MariaDB 10.2 is set up (to use fewer resources, as it is a dev server). The max geometric mean value is only about 20% slower for MySQL than for PostgreSQL, so the deficit is probably smaller in a properly tuned environment.
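For reference, the geometric mean mentioned here is the usual way to aggregate benchmark ratios across tests, since it does not depend on which database is taken as the baseline. A minimal sketch with made-up per-test ratios:

```python
from statistics import geometric_mean  # Python 3.8+

# Hypothetical per-test throughput ratios (MySQL relative to PostgreSQL);
# these numbers are illustrative, not the suite's actual results.
ratios = [0.8, 0.85, 0.75, 0.9]
print(round(geometric_mean(ratios), 3))  # 0.823
```

Unlike the arithmetic mean, inverting every ratio (i.e. swapping the baseline) exactly inverts the geometric mean, which is why it is preferred for this kind of cross-test summary.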

@AEnterprise
Collaborator

@grigi thanks a lot

as I'm working on a 2.0 of my bot, I'm taking a deep look into future-proofing its performance, so it's really interesting to me to see the differences (I probably also need to figure out at some point how to tune my live database for best performance)

but I guess if MySQL is wasting a lot of time, we can improve this over time and shave that off to bring it more in line? more performance would be really nice (at least for me), but we're still talking about millisecond timescales per operation, so that sounds more than acceptable for now, until something better can be worked out

@grigi
Member Author

grigi commented Nov 19, 2019

Depending on your use case, async can buy you lots of scalability (or none), but you would always benefit from being able to scale wide: put a load-balancer in front of it and ensure that you can go wider.

e.g. I know Starlette/Sanic make multi-process asyncio relatively easy. But using e.g. containers and scaling those out is also a decent strategy.

That is, until either your DB or load-balancer stops scaling... Then you gotta get even more creative!

@AEnterprise
Collaborator

horizontal scaling is not really an issue in my project, luckily: I can shard my bot onto more threads, and MySQL databases allow for clusters (though once I need to start running clusters of shards, I'll need to build a central authority to coordinate things across servers). Luckily, at current growth, that is all quite a long way away.

I just prefer to avoid needing to scale too horizontally when it's possible to optimize the libraries used, so as not to "waste" time and processing power when it's not needed :P

@grigi
Member Author

grigi commented Nov 19, 2019

Agreed! I did a lot more testing on this PR, and I think it is both stable and has pretty much covered the easy optimizations for now.
Although I think we need to mark Py3.6 as deprecated, as much of the concurrency work we did doesn't play perfectly with Py3.6's contextvars backport.
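The behaviour at stake can be demonstrated on Py3.7+, where each asyncio task runs in a copy of the context that created it, so a value set in one task never leaks into a sibling task. The example below is illustrative, not code from the PR:

```python
import asyncio
import contextvars

# On Python 3.7+ each Task gets a copy of the current context,
# so ContextVar values are isolated per task.
request_id = contextvars.ContextVar("request_id", default=None)

async def handler(name, results):
    request_id.set(name)      # visible only inside this task's context
    await asyncio.sleep(0)    # yield so the tasks interleave
    results[name] = request_id.get()

async def main():
    results = {}
    await asyncio.gather(handler("a", results), handler("b", results))
    return results

print(asyncio.run(main()))  # {'a': 'a', 'b': 'b'}
```

If the two handlers clobbered each other's value (each reading the other's name after the `sleep`), that would be the kind of context leak being discussed for the 3.6 backport.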

@AEnterprise
Collaborator

in a way it's sad to see only the most recent versions supported; on the other hand, it's probably best to drop it, to avoid people using it and running into all sorts of weird issues

@grigi
Member Author

grigi commented Nov 20, 2019

I would prefer to keep Py3.6 around, so if we can fix the context leak that would be better, but right now I'd rather warn users that there is an issue and that they should consider upgrading to 3.7, or even 3.8.

@grigi grigi merged commit 1d53abe into develop Nov 20, 2019
@grigi grigi deleted the perf_tweaks branch March 11, 2020 09:40