sqlalchemy queue pool limit lockup/timeout #104
Comments
Running the test below I was able to reproduce the failed responses for a majority of the requests on this cookiecutter project. I know this isn't the most scientific test, but I just wanted to prove out this repro before switching to something like it.

```python
import aiohttp
import asyncio
from datetime import datetime

headers = {
    "Authorization": "Bearer xxxxxxxxxxxxxxxxxxx",
    "Content-Type": "application/json; charset=utf-8",
}

timeout = aiohttp.ClientTimeout(total=60)

async def get(url):
    async with aiohttp.ClientSession(timeout=timeout) as session:
        s = datetime.now()
        async with session.get(url, headers=headers) as resp:
            return resp.status, datetime.now() - s

loop = asyncio.get_event_loop()
coroutines = [get("http://localhost/api/v1/users/me") for i in range(100)]
results = loop.run_until_complete(asyncio.gather(*coroutines))

# tally status codes and count responses slower than 2 seconds
codes = {}
count = 0
for status, elapsed in results:
    codes[status] = codes.get(status, 0) + 1
    if elapsed.seconds > 2:
        count += 1
print(codes, count)
```
I think this is related to #56.

Yeah, I guess people just use this for convenience and don't have serious spikes of traffic with fastapi + sqlalchemy? Maybe the people who actually are using it in production are using one of the other async db clients.
For information, I also ran into the 100-connection limit, which was actually due to the default configuration of the postgres DB. Launching the DB with more connections (500 in my case) solved the issue. You just have to modify the docker-compose by adding the following line in the postgres service description:
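(The exact line isn't shown above; a minimal sketch of the standard override for the official postgres image, assuming the compose service is named `db` and the 500 connections mentioned above:)

```yaml
services:
  db:
    image: postgres  # whatever tag the project pins
    # raise the server-side limit from the default of 100 connections
    command: postgres -c max_connections=500
```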
@ebreton What was the default connection limit on that postgres container, and did you also increase the pool size in sqlalchemy? I understand that would fix this issue, but the main thread shouldn't get blocked and time out after 30 seconds under any condition with only 100 concurrent requests. Running all 100 requests sequentially works perfectly fine and finishes in no time. I don't know fastapi/starlette and sqlalchemy that well, so I'm a little confused: is the main thread grabbing a session but blocking when the QueuePool is full? If so, is it possible we could see better results by just changing the sqlalchemy configuration?
Hi @jklaw90. The default in the official postgres container is 100 connections; there is an example of overriding this value on the Docker Hub repo. I upgraded it to 500 for the sake of doing it only once. 😄 I agree that 100 concurrent requests should work fine. In my case, I have actually just realized that I made an entirely different mistake, which is to serve (a lot of) pictures directly from the app. I am going to replace this with an Nginx, and I will probably be able to come back to the default SQL values in the app... Thanks for the question 🎉
Ahh, thank you for the resources and insight! I am still a little confused why, even with the default 100 connections + pool size of 5, we would get timeouts rather than just getting responses slowly. If there is some simple fix I'd love to know; I'm really looking to switch from falcon to fastapi.
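(For context on why it times out instead of just queueing: SQLAlchemy's QueuePool waits at most `pool_timeout` seconds for a free connection and then raises. A minimal sketch with the documented defaults; the DSN is hypothetical:)

```python
from sqlalchemy import create_engine

# Once pool_size + max_overflow (5 + 10 = 15) connections are checked
# out, further checkouts block for up to pool_timeout seconds and then
# raise sqlalchemy.exc.TimeoutError: the error quoted in this issue.
engine = create_engine(
    "postgresql://user:pass@localhost/app",  # hypothetical DSN
    pool_size=5,      # connections kept open (QueuePool default: 5)
    max_overflow=10,  # extra burst connections (default: 10)
    pool_timeout=30,  # seconds to wait for a free connection (default: 30)
)
```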
That's interesting! I have a slightly different setup for my app and I couldn't reproduce the error you have, but I ran into another one. I don't have the middleware and don't use `Depends` to get a DB session; I refactored my app to use a context manager instead. So, I guess the problem might be in how `Depends` handles the session:

```python
from contextlib import contextmanager
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(...)

@contextmanager
def SessionManager():
    session = Session()
    try:
        yield session
    finally:
        session.close()
```

And then you can use it in your endpoint like that:

```python
@router.get("/")
def hello_world():
    with SessionManager() as db_session:
        ...
```

So, with a setup like that I also tried to fire 100 concurrent requests and got an error. However, if I set `WEB_CONCURRENCY=1` the error went away. I think I got this error because gunicorn spawns several workers (you can see how the number of workers is calculated here). Each worker creates its own pool, so each worker thinks it has the full pool size available to it, when the server's connections are actually shared between all the workers. So as a fix I created the engine like this:

```python
import os

from sqlalchemy import create_engine

DB_POOL_SIZE = int(os.getenv("DB_POOL_SIZE", "100"))
WEB_CONCURRENCY = int(os.getenv("WEB_CONCURRENCY", "2"))
# split the DB connection budget across the gunicorn workers,
# keeping at least the default pool size of 5 per worker
POOL_SIZE = max(DB_POOL_SIZE // WEB_CONCURRENCY, 5)

engine = create_engine(config.DATABASE_DSN, pool_size=POOL_SIZE, max_overflow=0)
```

Note: in this project, WEB_CONCURRENCY is calculated in a slightly more complicated way. With the values above, 2 workers × 50 connections each = 100, which matches the Postgres default limit. The problem is gone and everything seems to work fine.
For me it seems to work (no more timeouts, and I do not exceed the default of 100 open connections). But I have the impression that using the SessionManager slows things down. Does this make sense? I am running some load-testing scenarios with locust and I will report back.
@stratosgear you won't get timeouts if you don't exceed the number of available connections. That's right, but when it does exceed them it can be slow; it shouldn't be timing out. Increasing the timeout also doesn't help.
If I can explain my issue correctly, it was like this:

Unfortunately, now I see that my response times are larger than 1000 ms and I'm wondering what causes this. I still have to clean things up and try the locust scenarios again.
Sweet! One other configuration change I have is when I instantiate the SQLAlchemy engine:
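(The snippet isn't shown above; a hypothetical sizing that would let two workers together use all 100 default Postgres connections:)

```python
from sqlalchemy import create_engine

# hypothetical values: 2 workers x (pool_size + max_overflow) = 100,
# the default Postgres connection limit
engine = create_engine(
    "postgresql://user:pass@localhost/app",  # hypothetical DSN
    pool_size=30,
    max_overflow=20,
)
```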
That ought to be able to utilize all 100 default Postgres connections, but my load tests rarely seem to utilize more than 25-30 connections anyway (now with the SessionManager). 🍺 to @unmade (or 🍰, 🍵 or 🥃 or whatever rocks his boat...)
The problem is there is a lockup/timeout in sqlalchemy. The session maker should block and wait, so slow responses would be acceptable, as opposed to no responses at all.
@jklaw90 it is not just about updating the pool size. The problem seems to be somewhere in how `Depends` handles the session.
I think what you've done is just move the session creation to the handler thread itself rather than the main one.
of course, you don't want ...
Sorry, I haven't had time to test it out. I actually stopped looking into FastAPI for now. It's for work, so I just can't afford to throw it up to find out if it works, haha. I'll probably check it out again in the future, but most likely without sqlalchemy.
@tiangolo we are also hitting the Depends issue in production. Is there a long-term production-ready fix here?
@sandys this setting seems to help for me. I am using a higher number for max_overflow and a smaller number for the pool. Only `pool_size` connections are maintained; the other connections are released as soon as possible. This does seem to help to mitigate the problem:
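(The setting isn't shown above; a minimal sketch of that shape, with hypothetical numbers:)

```python
from sqlalchemy import create_engine

# small steady-state pool, generous overflow; overflow connections
# are closed as soon as they are returned instead of being kept open
engine = create_engine(
    "postgresql://user:pass@localhost/app",  # hypothetical DSN
    pool_size=5,
    max_overflow=45,
)
```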
Sorry I cannot be more helpful, but I tried all combinations of `pool_size` and `max_overflow`. I will insist that the SessionManager approach is what made the difference for me.
@stratosgear thanks for replying. Could you copy-paste what you did with SessionManager? The comment here seems to give the impression that SessionManager does not work and that we need to set WEB_CONCURRENCY and PYTHONASYNCIODEBUG to get it to work. If you are deploying with gunicorn, could you also mention your config please? This will be super helpful.
Sure.... So I got rid of the `Depends(get_db)` dependency, and my SessionManager, pulled straight out of my repo, is externally defined as:
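(The code block isn't shown above; presumably the same shape as @unmade's snippet earlier in the thread. The import path is hypothetical:)

```python
from contextlib import contextmanager

from app.db.session import Session  # hypothetical: the project's sessionmaker

@contextmanager
def SessionManager():
    # hand out a session and guarantee it is closed afterwards
    session = Session()
    try:
        yield session
    finally:
        session.close()
```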
Embarrassingly enough, I do not know the exact gunicorn settings. I am using the standard setup from this cookiecutter project.
@unmade @stratosgear I have a theory: what if you keep Depends... but put it under a contextmanager? For example:
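(A minimal sketch of that idea, reusing the SessionManager defined earlier in the thread; the endpoint is hypothetical:)

```python
from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session

router = APIRouter()

def get_db():
    # the dependency itself enters the context manager, so the session
    # is closed when FastAPI finalizes the generator
    with SessionManager() as session:
        yield session

@router.get("/")
def read_root(db: Session = Depends(get_db)):
    ...  # use db here
```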
Maybe the whole issue is happening because a contextmanager is needed to close the connection before the async coroutine exits.
I'll just add another "success story" of switching from using Depends(get_db) to using a context manager inside of API functions. I was seeing the exact same issue as everyone else in this thread, with the connection pool becoming quickly exhausted. Every alternative I tried using Depends() failed, but as soon as I moved to using a context manager, I was able to throw hundreds of connections at my service and see it queue them up properly.
@DBendit mind sharing your new context manager function and endpoint code? I ran into some obscure error from the context manager (probably my fault) and was too busy (read: lazy) at the time to figure it out.
@abrichr, to your questions:
Hey guys, I made a benchmark using SQLAlchemy Sync and Async. The code and the results are in the link below: https://github.com/andersonrocha0/fastApiAndSqlAlchemySyncVsAsync |
hi Anderson, this looks pretty cool. Did you try this: https://gist.github.com/sandys/671b8b86ba913e6436d4cb22d04b135f ? The one big difference is that I use the new sqlalchemy dataclasses and avoid the double declaration of sqlalchemy vs pydantic models. I also have included various complex model features like indexes, foreign keys, mixins, etc.

regards
sandeep
@sandys Thanks for sharing man. My intention was to demonstrate that SQLAlchemy does not work well the sync way. Do you agree with my tests and my conclusion? If possible, can you check whether my tests and conclusion are right? Thanks a lot.
hi anderson, it's not a sync issue. You are having a pool overflow issue; the pool sizes and tuning on async and sync are different: https://docs.sqlalchemy.org/en/14/core/pooling.html#connection-pool-configuration

In any case, the recommended way to use pooling is through pgbouncer or the like. However, asyncpg (the underlying library of sqlalchemy async) boasts that its connection pool is as good as pgbouncer: https://magicstack.github.io/asyncpg/current/usage.html#connection-pools

Also, magicstack has recommended a bunch of tunings to make the pool perform better: https://magicstack.github.io/asyncpg/current/faq.html#why-am-i-getting-prepared-statement-errors
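(A minimal sketch of the asyncpg built-in pool those docs describe; the DSN and sizes are hypothetical:)

```python
import asyncio

import asyncpg

async def main():
    # asyncpg manages its own pool, no external pooler required
    pool = await asyncpg.create_pool(
        "postgresql://user:pass@localhost/app",  # hypothetical DSN
        min_size=5,
        max_size=20,
    )
    async with pool.acquire() as conn:
        print(await conn.fetchval("SELECT 1"))
    await pool.close()

asyncio.run(main())
```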
@sandys If you try the same SQLAlchemy pool technique with flask, you do not have problems. So, I understand that the problem is the combination of SQLAlchemy and FastAPI. However, as I said, using SQLAlchemy the async way fixes the problem, and you don't need tools like PgBouncer, etc. Thanks man.
hi, I'm not seeing the same results on my code. I think the way you have set up the database connections, etc. might be causing issues. I'm running my sync fastapi using:

```bash
poetry run gunicorn sync_p:app -p 8080 --preload --reload --reload-engine inotify -k uvicorn.workers.UvicornWorker
```

and wrk using:

```bash
docker run --rm --net="host" skandyla/wrk -t12 -c48 -d10s http://localhost:8000/
```

I don't see any drops.
@sandys Did this work? Just by adding ...? What's your suggestion to use this in production? Should we move to ...?
I have a fully working, sqlalchemy and pydantic compatible gist here with both syncpg and asyncpg. https://gist.github.com/sandys/671b8b86ba913e6436d4cb22d04b135f You can try it and let me know. I have already benchmarked it. So should work fine.
@sandys I'm relatively new to Python and these tools, so excuse my ignorance, but don't you run into the issue outlined in tiangolo/fastapi#3620 if you commit your session transaction after the yield in the get_db function? I tried a similar but slightly different approach to what you have above and I was seeing some weird behavior. I fixed it, but I'm not sure I completely understand all of the different methods of using a db session as a fastapi route dependency yet.
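(For reference, a minimal sketch, not taken from the gist, of the commit-after-yield dependency shape the linked issue is about; all names are hypothetical:)

```python
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/app")  # hypothetical DSN
async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)

async def get_db():
    session = async_session()
    try:
        yield session
        # everything after the yield runs once the endpoint has returned,
        # which is exactly the spot tiangolo/fastapi#3620 is about
        await session.commit()
    except Exception:
        await session.rollback()
        raise
    finally:
        await session.close()
```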
So the method that I have used was inspired by someone else, and it was specifically meant to address a bunch of these situations. This is benchmarked with a PostgreSQL database. I really recommend you try it and let me know if it doesn't work or breaks.
@thebleucheese did you try my code? Would love any feedback if you have any.
@andersonrocha0, @sandys, I can confirm it works with async SQLAlchemy. I had `pool_size=5` and `pool_overflow=5` and 50 concurrent users. Works like a charm.
Which code are you referring to? My gist or something else? Sorry, didn't quite understand.
@sandys, sorry, I was not specific about that. I'm not referring to any code here; I rebuilt my app to async SQLA several months ago, just wanted to give it a try, and it worked.
@sandys Looked at your gist; I still seem to get errors. Does the lru_cache guarantee reuse of the same async_engine?
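(For what it's worth, the mechanism the question refers to: `functools.lru_cache` on a zero-argument factory memoizes the first result, so every caller gets the same engine. A minimal sketch; the names are assumptions, not taken from the gist:)

```python
from functools import lru_cache

from sqlalchemy.ext.asyncio import AsyncEngine, create_async_engine

@lru_cache(maxsize=None)
def get_engine() -> AsyncEngine:
    # called with no arguments, so there is exactly one cache entry:
    # the first call creates the engine, later calls return the same one
    return create_async_engine("postgresql+asyncpg://user:pass@localhost/app")

assert get_engine() is get_engine()  # same instance every time
```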
Yes, sorry it took me a while to get back to this; I had to move on to other issues I had with my project on the front end. I've encountered an issue with this approach when using multiple dependencies that rely on the DB session. The following works fine in isolation:

```python
@app.get("/", response_model=List[UserPyd])
async def foo(context_session: AsyncSession = Depends(get_db)):
    async with context_session as db:
        ...  # code that uses db (AsyncSession)
```

If I add a dependency on another dependency that uses get_db, I get an error:

```python
async def bar(db_session: AsyncSession = Depends(get_db)):
    async with db_session as db:
        ...  # use db to check user permissions or some other action

@app.get("/", response_model=List[UserPyd])
async def foo(current_user: User = Depends(bar), context_session: AsyncSession = Depends(get_db)):
    async with context_session as db:
        ...  # code that uses db (AsyncSession)
```

I'm not running the exact gist, but the code is similar enough. When I switch to a simplified setup without the contextmanager, I don't see this error:

```python
engine = create_async_engine(
    DATABASE_URL,
    pool_pre_ping=True,
    echo=True,
)

_async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)

async def get_db() -> AsyncSession:
    db: AsyncSession = _async_session()
    try:
        yield db
        await db.commit()
    except Exception:
        await db.rollback()
        raise
    finally:
        await db.close()
```
Hi @thebleucheese, thanks for your feedback on @sandys' code. The documentation discourages decorating dependencies with @contextmanager/@asynccontextmanager here, in the very last tip. It looks like this is because it breaks dependency resolution, based on your experience. Do you have any recommendations on how to set up an asyncio SQLAlchemy Session in production?
Thanks for the discussion everyone! This was most probably solved in tiangolo/fastapi#5122, released as part of FastAPI 0.82.0 🎉 If this solves it for you, you could close the issue. 🤓
Hi, so, can we use the dependency injection itself (Depends(get_db)) if the version is updated to 0.82.0 or above?
While doing some testing to see how this fastapi with sqlalchemy setup would hold up, my server seemed to lock up when running 100 concurrent requests. If I ran the requests sequentially it was totally fine.

```
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached,
```

Is it possible I've made some error while trying to mimic the structure of the code base?
Or is it possible the main thread can lock up in the middleware with the session implementation of sqlalchemy?
Has anyone tested the performance of this cookiecutter project?