createSqlPool's idle timeout seems much too short #775

Open
ocharles opened this issue Jan 26, 2018 · 17 comments

@ocharles
Contributor

ocharles commented Jan 26, 2018

https://hackage.haskell.org/package/persistent-2.7.3.1/docs/src/Database-Persist-Sql-Run.html#createSqlPool calls createPool with 1 stripe and a 20s idle timeout on resources. I would say that

a) This is much too small

and, to contradict myself somewhat

b) There is no one-size-fits-all.

I would at least bump this up to 5 minutes. For anything but a heavily loaded site, a 20s timeout essentially negates any benefit of using a pool, because connections are constantly being re-established. I've instrumented the use of a 20s timeout in a pool at work: even though we are constantly serving requests, in the last hour we've opened 160 new database connections.

Methodology

I simply added a Prometheus.incCounter call to the function passed to create in resource-pool.
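
A minimal sketch of that kind of instrumentation, assuming the prometheus-client and postgresql-simple packages (the metric name and connection details are illustrative):

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Pool as Pool
import qualified Database.PostgreSQL.Simple as PG
import qualified Prometheus

-- Counter bumped every time resource-pool opens a brand-new connection.
newConnections :: Prometheus.Counter
newConnections = Prometheus.unsafeRegister $ Prometheus.counter $
    Prometheus.Info "new_postgres_connection_total" "New database connections opened"

-- Wrap the "create" action handed to createPool so every fresh connection
-- increments the counter; connections reused from the pool do not.
mkInstrumentedPool :: PG.ConnectInfo -> IO (Pool.Pool PG.Connection)
mkInstrumentedPool connectInfo =
    Pool.createPool
        (Prometheus.incCounter newConnections >> PG.connect connectInfo)
        PG.close
        1   -- stripes (persistent's default)
        20  -- idle timeout in seconds (the default under discussion)
        10  -- connections per stripe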

Here is our traffic (average requests/s with a 1h look-back): [traffic graph]

Here is a counter of new database connections (average new connections per second): [connections graph]

And here is a graph showing every time the new connection counter increases: [counter graph]

@ocharles
Contributor Author

The defaults in https://pgbouncer.github.io/config.html might be a good starting point for persistent, though they are somewhat PostgreSQL specific.

server_idle_timeout
If a server connection has been idle more than this many seconds it will be dropped. If 0 then timeout is disabled. [seconds]
Default: 600.0

@MaxGabriel
Member

My instinct is that the idle timeout should be increased and that it should be made configurable. Personally I can only think of really weird edge case scenarios where the increased timeout could cause a problem; can anyone think of anything serious?

Are you also suggesting that the stripe count should be increased and/or be configurable? I haven't used anything but 1 before, but if we're making the idle timeout configurable then may as well make the stripe count configurable as well.

Hope things are going well at CircuitHub btw!

@ocharles
Contributor Author

My instinct is that the idle timeout should be increased and that it should be made configurable.

Agreed and agreed.

Personally I can only think of really weird edge case scenarios where the increased timeout could cause a problem; can anyone think of anything serious?

On both the client and the database side, your resource usage is going to be higher. However, you should be anticipating using every connection in the pool anyway, so if you're unable to operate within those constraints then you already had a problem!

However, more connections being held open under normal operating conditions could result in a performance degradation: if you've got an idle connection open with some buffer allocated, another active connection might not be able to allocate as much buffer space as it needs, and may have to start paging data to disk while processing. This is all very database specific though, and only suggests that this should be configurable and is something people need to know about. That's no different from almost any other library that does connection pooling.

Are you also suggesting that the stripe count should be increased and/or be configurable?

No, and the resource-pool documentation seems to suggest that such a change is rarely needed. However...

if we're making the idle timeout configurable then may as well make the stripe count configurable as well.

I agree with this.

Hope things are going well at CircuitHub btw!

After changing 20 seconds to 10 minutes, they are going even better 😉


@MaxGabriel
Member

Ok cool. Is this something you’re interested in writing?

@ocharles
Contributor Author

ocharles commented Jan 27, 2018 via email

@MaxGabriel
Member

I haven't thought through which functions need to be changed or added. The only thing I was thinking of is combining these options into a new record, since we now have pool size, stripe count, and idle timeout fields, and they're all of the same type. Either the lens approach used in the SQLite module or the one described at https://www.snoyman.com/blog/2016/11/designing-apis-for-extensibility is fine (if you use lens, make sure to add examples to the Haddocks).
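
Something along these lines, for instance (the names and defaults here are just illustrative, not a settled API):

import Data.Time (NominalDiffTime)

-- Hypothetical record bundling the pool-related knobs in one place.
data ConnectionPoolConfig = ConnectionPoolConfig
    { connectionPoolConfigStripes     :: Int             -- ^ Number of stripes in the pool.
    , connectionPoolConfigIdleTimeout :: NominalDiffTime -- ^ How long an unused connection is kept open.
    , connectionPoolConfigSize        :: Int             -- ^ Maximum number of connections kept open.
    }

-- Possible defaults: 1 stripe, 10 minute idle timeout, 10 connections.
defaultConnectionPoolConfig :: ConnectionPoolConfig
defaultConnectionPoolConfig = ConnectionPoolConfig 1 600 10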

@MaxGabriel
Member

Ok, well we're now running into this problem ourselves!


@MaxGabriel
Member

@parsonsmatt Do you have any opinions on changing the API to allow configuring this?

I was thinking:

  1. Add new fields for the idle timeout and stripe count to PostgresConf, probably as 2 new fields. Would this be considered a breaking change? Possibly create some defaultPostgresConf function? (A rough sketch of this is below, after this list.)
  2. Modify the FromJSON instance for PostgresConf to handle setting those fields (apply defaults here?).
  3. Add a function in Database.Persist.Postgresql that takes these new fields. Currently there is:

     createPostgresqlPoolModifiedWithVersion
         :: (MonadUnliftIO m, MonadLogger m)
         => (PG.Connection -> IO (Maybe Double)) -- ^ Action to perform to get the server version.
         -> (PG.Connection -> IO ()) -- ^ Action to perform after connection is created.
         -> ConnectionString -- ^ Connection string to the database.
         -> Int -- ^ Number of connections to be kept open in the pool.
         -> m (Pool SqlBackend)

     Imo, that's getting a bit out of control, maybe create one new function that takes a PostgresConf + getServerFunction + callback directly, and another that just takes a PostgresConf?

  4. Add a version of withSqlPool and createSqlPool that take these values. I guess as functions that just take 2 additional parameters?

  5. Imo, increasing the default idle timeout would be pretty reasonable based on @ocharles + our experience. I'm not familiar with pgbouncer but possibly its default of 600 seconds should point us away from 20 seconds as well
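
A rough sketch of points 1 and 2 (the new field names and JSON keys here are illustrative only, not a final API):

import Data.Time (NominalDiffTime)
import Database.Persist.Postgresql (ConnectionString)

-- Sketch only: the existing pgConnStr/pgPoolSize fields stay, two new fields are added.
data PostgresConf = PostgresConf
    { pgConnStr         :: ConnectionString
    , pgPoolSize        :: Int
    , pgPoolStripes     :: Int             -- ^ new: number of stripes in the pool
    , pgPoolIdleTimeout :: NominalDiffTime -- ^ new: how long an unused connection may stay open
    }

-- For point 2, the FromJSON instance could treat the new keys as optional and
-- fall back to defaults, e.g. using (.:?) and (.!=):
--
--     <*> o .:? "stripes"     .!= 1
--     <*> o .:? "idleTimeout" .!= 600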

@psibi
Member

psibi commented Jul 3, 2020

Imo, that's getting a bit out of control, maybe create one new function that takes a PostgresConf + getServerFunction + callback directly, and another that just takes a PostgresConf?

I like this one!

Imo, increasing the default idle timeout would be pretty reasonable based on @ocharles + our experience. I'm not familiar with pgbouncer but possibly its default of 600 seconds should point us away from 20 seconds as well

Has increasing the idle timeout improved the situation for you?

@psibi
Member

psibi commented Jul 3, 2020

Just as another data point, HikariCP has a default idle timeout of 10 minutes: https://github.com/brettwooldridge/HikariCP#configuration-knobs-baby

@MaxGabriel
Member

Has increasing the idle timeout improved the situation for you?

Good question, and yes! These two graphs are different visualizations of the same data. In the first graph, you can see connections created per hour, starting from when I started recording this. The big drop off happens when I changed our backend to use a fork of Persistent with a 10 minute idle timeout. From the graph, you can tell we churn through far fewer connections with the 10 minute idle timeout.

[graph: rate of new connections created, before and after the idle timeout change]

Prometheus query:

sum by (job) (rate(new_postgres_connection_total[10m])) * 60

This second image is the raw data. Each server increments a counter on each new connection, which resets to zero on deploys. The first deploy (the drop-off to zero) is when I had our backend use the fork of Persistent. From the slope of the graph at the beginning vs. the following segments, you can tell we were incrementing the counter far faster before we changed the idle timeout.

[graph: raw per-server connection counter]

@parsonsmatt
Collaborator

I think we can do a patch release that merely increases the timeout to 10 minutes. Should be a transparent performance boon to everyone.

Making it configurable is probably a good idea, too. If we can do so in a non-breaking way, then we can include that in the patch. Otherwise that'll have to be a component of a minor bump and would get bundled with the next set of changes.

@MaxGabriel
Member

Ok, I changed the hardcoded limit here: #1097

@MaxGabriel
Member

I started work on the configurable values here: https://github.com/yesodweb/persistent/compare/configurableIdleTimeout?expand=1

There is a slightly breaking change in that PostgresConf would need new values when manually constructed, though

@psibi
Member

psibi commented Jul 7, 2020

@MaxGabriel Out of curiosity, what is the pool size you have? We faced a similar issue recently in one of our services and increasing the timeout seemed to have slightly improved the situation. IIRC, HikariCP has a default pool size of 10.

@MaxGabriel
Member

@psibi We have a pool size of 10 per server, and 2 servers, so 20 connections total. I have no idea if it should be bumped; that would be good to investigate. We have something like 1.5–3 requests per second worth of traffic.

@MaxGabriel
Member

Made a draft PR to make these values configurable #1098
