createSqlPool's idle timeout seems much too short #775

Open
ocharles opened this issue Jan 26, 2018 · 17 comments

@ocharles
Contributor

ocharles commented Jan 26, 2018

https://hackage.haskell.org/package/persistent-2.7.3.1/docs/src/Database-Persist-Sql-Run.html#createSqlPool calls createPool with 1 stripe and a 20s idle timeout on resources. I would say that

a) This is much too small

and, to contradict myself somewhat

b) There is no one-size-fits-all.

I would at least bump this up to 5 minutes. For anything but a heavily loaded site, a 20s timeout essentially negates any benefit of using a pool, because connections are constantly being re-established. I've instrumented the use of a 20s timeout in a pool at work: even though we are constantly serving requests, in the last hour we've opened 160 new database connections.

Methodology

I simply added a Prometheus.incCounter call to the function passed to create in resource-pool.
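
A minimal sketch of that kind of instrumentation, assuming the prometheus-client and postgresql-simple packages (the metric name and connection details are illustrative):

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Pool as Pool
import qualified Database.PostgreSQL.Simple as PG
import qualified Prometheus

-- Counter bumped every time resource-pool opens a brand-new connection.
newConnections :: Prometheus.Counter
newConnections = Prometheus.unsafeRegister $ Prometheus.counter $
    Prometheus.Info "new_postgres_connection_total" "New database connections opened"

-- Wrap the "create" action handed to createPool so every fresh connection
-- increments the counter; connections reused from the pool do not.
mkInstrumentedPool :: PG.ConnectInfo -> IO (Pool.Pool PG.Connection)
mkInstrumentedPool connectInfo =
    Pool.createPool
        (Prometheus.incCounter newConnections >> PG.connect connectInfo)
        PG.close
        1   -- stripes (persistent's default)
        20  -- idle timeout in seconds (the default under discussion)
        10  -- connections per stripe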

Here is our traffic (average requests/s with a 1h look-back): [traffic graph]

Here is a counter of new database connections (average new connections per second): [connections graph]

And here is a graph showing every time the new connection counter increases: [counter graph]

@ocharles
Contributor Author

The defaults in https://pgbouncer.github.io/config.html might be a good starting point for persistent, though they are somewhat PostgreSQL specific.

server_idle_timeout
If a server connection has been idle more than this many seconds it will be dropped. If 0 then timeout is disabled. [seconds]
Default: 600.0

@MaxGabriel
Member

My instinct is that the idle timeout should be increased and that it should be made configurable. Personally I can only think of really weird edge case scenarios where the increased timeout could cause a problem; can anyone think of anything serious?

Are you also suggesting that the stripe count should be increased and/or be configurable? I haven't used anything but 1 before, but if we're making the idle timeout configurable then may as well make the stripe count configurable as well.

Hope things are going well at CircuitHub btw!

@ocharles
Contributor Author

My instinct is that the idle timeout should be increased and that it should be made configurable.

Agreed and agreed.

Personally I can only think of really weird edge case scenarios where the increased timeout could cause a problem; can anyone think of anything serious?

On both the client and the database side, your resource usage is going to be higher. However, you should be anticipating using every connection in the pool anyway, so if you're unable to operate within those constraints then you already had a problem!

However, more connections being held open under normal operating conditions could result in a performance degradation: if you've got an idle connection open with some buffer allocated, another active connection might not be able to allocate as much buffer space as it needs, and may have to start paging data to disk while processing. This is all very database specific though, and only suggests that this should be configurable and is something people need to know about. That's no different from almost any other library that does connection pooling.

Are you also suggesting that the stripe count should be increased and/or be configurable?

No, and the resource-pool documentation seems to suggest that such a change is rarely needed. However...

if we're making the idle timeout configurable then may as well make the stripe count configurable as well.

I agree with this.

Hope things are going well at CircuitHub btw!

After changing 20 seconds to 10 minutes, they are going even better 😉


@MaxGabriel
Member

Ok cool. Is this something you’re interested in writing?

@ocharles
Contributor Author

ocharles commented Jan 27, 2018 via email

@MaxGabriel
Member

I haven't thought through which functions need to be changed or added. The only thing I was thinking of is combining these options into a new record, since we now have pool size, stripe count, and idle timeout fields, and they're all of the same type. Either the lens approach used in the SQLite module or the one described at https://www.snoyman.com/blog/2016/11/designing-apis-for-extensibility is fine (if you use lens, make sure to add examples to the Haddocks).
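
Something along these lines, for instance (the names and defaults here are just illustrative, not a settled API):

import Data.Time (NominalDiffTime)

-- Hypothetical record bundling the pool-related knobs in one place.
data ConnectionPoolConfig = ConnectionPoolConfig
    { connectionPoolConfigStripes     :: Int             -- ^ Number of stripes in the pool.
    , connectionPoolConfigIdleTimeout :: NominalDiffTime -- ^ How long an unused connection is kept open.
    , connectionPoolConfigSize        :: Int             -- ^ Maximum number of connections kept open.
    }

-- Possible defaults: 1 stripe, 10 minute idle timeout, 10 connections.
defaultConnectionPoolConfig :: ConnectionPoolConfig
defaultConnectionPoolConfig = ConnectionPoolConfig 1 600 10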

@MaxGabriel
Member

Ok, well we're now running into this problem ourselves!


@MaxGabriel
Member

@parsonsmatt Do you have any opinions on changing the API to allow configuring this?

I was thinking:

  1. Add new fields for the idle timeout and stripe count to PostgresConf, probably as 2 new fields. Would this be considered a breaking change? Possibly create some defaultPostgresConf function? (A rough sketch of this is below, after this list.)
  2. Modify the FromJSON instance for PostgresConf to handle setting those fields (apply defaults here?).
  3. Add a function in Database.Persist.Postgresql that takes these new fields. Currently there is:

     createPostgresqlPoolModifiedWithVersion
         :: (MonadUnliftIO m, MonadLogger m)
         => (PG.Connection -> IO (Maybe Double)) -- ^ Action to perform to get the server version.
         -> (PG.Connection -> IO ()) -- ^ Action to perform after connection is created.
         -> ConnectionString -- ^ Connection string to the database.
         -> Int -- ^ Number of connections to be kept open in the pool.
         -> m (Pool SqlBackend)

     Imo, that's getting a bit out of control, maybe create one new function that takes a PostgresConf + getServerFunction + callback directly, and another that just takes a PostgresConf?

  4. Add a version of withSqlPool and createSqlPool that take these values. I guess as functions that just take 2 additional parameters?

  5. Imo, increasing the default idle timeout would be pretty reasonable based on @ocharles + our experience. I'm not familiar with pgbouncer but possibly its default of 600 seconds should point us away from 20 seconds as well
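
A rough sketch of points 1 and 2 (the new field names and JSON keys here are illustrative only, not a final API):

import Data.Time (NominalDiffTime)
import Database.Persist.Postgresql (ConnectionString)

-- Sketch only: the existing pgConnStr/pgPoolSize fields stay, two new fields are added.
data PostgresConf = PostgresConf
    { pgConnStr         :: ConnectionString
    , pgPoolSize        :: Int
    , pgPoolStripes     :: Int             -- ^ new: number of stripes in the pool
    , pgPoolIdleTimeout :: NominalDiffTime -- ^ new: how long an unused connection may stay open
    }

-- For point 2, the FromJSON instance could treat the new keys as optional and
-- fall back to defaults, e.g. using (.:?) and (.!=):
--
--     <*> o .:? "stripes"     .!= 1
--     <*> o .:? "idleTimeout" .!= 600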

@psibi
Member

psibi commented Jul 3, 2020

Imo, that's getting a bit out of control, maybe create one new function that takes a PostgresConf + getServerFunction + callback directly, and another that just takes a PostgresConf?

I like this one!

Imo, increasing the default idle timeout would be pretty reasonable based on @ocharles + our experience. I'm not familiar with pgbouncer but possibly its default of 600 seconds should point us away from 20 seconds as well

Has increasing the idle timeout improved the situation for you?

@psibi
Member

psibi commented Jul 3, 2020

Just as another data point, HikariCP has a default idle timeout of 10 minutes: https://github.com/brettwooldridge/HikariCP#configuration-knobs-baby

@MaxGabriel
Member

Has increasing the idle timeout improved the situation for you?

Good question, and yes! These two graphs are different visualizations of the same data. In the first graph, you can see connections created per hour, starting from when I started recording this. The big drop off happens when I changed our backend to use a fork of Persistent with a 10 minute idle timeout. From the graph, you can tell we churn through far fewer connections with the 10 minute idle timeout.

[graph: rate of new connections created, before and after the idle timeout change]

Prometheus query:

sum by (job) (rate(new_postgres_connection_total[10m])) * 60

This second image is the raw data. Each server increments a counter on each new connection, which resets to zero on deploys. The first deploy (the drop-off to zero) is when I had our backend use the fork of Persistent. From the slope of the graph at the beginning vs. the following segments, you can tell we were incrementing the counter far faster before we changed the idle timeout.

[graph: raw per-server connection counter]

@parsonsmatt
Collaborator

I think we can do a patch release that merely increases the timeout to 10 minutes. Should be a transparent performance boon to everyone.

Making it configurable is probably a good idea, too. If we can do so in a non-breaking way, then we can include that in the patch. Otherwise that'll have to be a component of a minor bump and would get bundled with the next set of changes.

@MaxGabriel
Member

Ok, I changed the hardcoded limit here: #1097

@MaxGabriel
Member

I started work on the configurable values here: https://github.com/yesodweb/persistent/compare/configurableIdleTimeout?expand=1

There is a slightly breaking change in that PostgresConf would need new values when manually constructed, though

@psibi
Member

psibi commented Jul 7, 2020

@MaxGabriel Out of curiosity, what is the pool size you have? We faced a similar issue recently in one of our services and increasing the timeout seemed to have slightly improved the situation. IIRC, HikariCP has a default pool size of 10.

@MaxGabriel
Member

@psibi We have a pool size of 10 per server, and 2 servers, so 20 connections total. I have no idea if it should be bumped; that would be good to investigate. We have something like 1.5–3 requests per second worth of traffic.

@MaxGabriel
Member

Made a draft PR to make these values configurable #1098
