
Support for query pipelining? #155

Open
vlazar opened this issue Nov 7, 2018 · 13 comments · May be fixed by #253

Comments

@vlazar

vlazar commented Nov 7, 2018

Recently released Round 17 of the TechEmpower Framework Benchmarks has an interesting change:

https://www.techempower.com/blog/2018/10/30/framework-benchmarks-round-17/

As you review Round 17 results, you'll notice that Postgres database tests are stratified—there are groups of test implementations that seem implausibly faster than other test implementations.

The underlying cause of this is use of a Postgres protocol feature we have characterized as "query pipelining" because it is conceptually similar to HTTP pipelining. We call it pipelining, but you could also call it multiplexing. It's a feature of the "Extended Query" protocol in Postgres. Query pipelining allows database clients to send multiple queries without needing to wait for each response before sending the next. It's similar to batching but provided invisibly in the driver.
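For illustration, the difference amounts to something like this; the client and its methods below are hypothetical, not this shard's API:

```crystal
# Hypothetical sketch: none of these method names come from this driver.

# Without pipelining, each query blocks until its reply arrives,
# so three queries cost roughly three network round trips.
a = client.query("SELECT 1")
b = client.query("SELECT 2")
c = client.query("SELECT 3")

# With pipelining, all three queries are written to the socket first
# and the replies are read afterwards, in order, for roughly one
# round trip in total.
client.send_query("SELECT 1")
client.send_query("SELECT 2")
client.send_query("SELECT 3")
a = client.read_result
b = client.read_result
c = client.read_result
```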

@will
Owner

will commented Nov 7, 2018 via email

@rishavs

rishavs commented Nov 22, 2018

This would be fantastic. Right now, the DB connection is the only thing in my Crystal web app that feels a bit slow.

@eliasjpr

Would love to work with someone on this.

While looking for documentation I found the following explanation and implementation:

@eliasjpr

Also, has anyone considered binding to LibPQ instead? https://www.postgresql.org/docs/9.5/libpq.html

It seems to have solved all of this already. We could bind to the library and create a Crystal-DB interface to it. Thoughts?

@will
Owner

will commented Sep 30, 2019 via email

@eliasjpr
Copy link

I think the community could benefit from a native Crystal implementation, too.

@jgaskins jgaskins linked a pull request Jul 2, 2022 that will close this issue
@jgaskins
Contributor

jgaskins commented Jul 2, 2022

Just posted #253 to support query pipelining, if y'all could please have a look. I've been evaluating TimescaleDB for a side project and being able to pipeline INSERT queries for datapoints would be fantastic.

@straight-shoota
Contributor

@jgaskins Looks nice. Regarding your questions about the API design, perhaps it makes sense to focus on a very low-level API at first. This enables using the feature, but you need to know what you're doing. Later, a higher-level API could be built on top of that.

For a high-level API, I think it could be useful to combine the query and result interpretation into a promise object. I don't think this even needs to be wrapped in a pipeline block; it could even be interleaved with direct queries on the same connection.

query1 = connection.pipeline.query_one "SELECT 42", as: Int32
query2 = connection.pipeline.query_all "SELECT * FROM posts LIMIT $1", 10, as: Post

answer = query1.result
posts = query2.result

Behind the scenes, pipe_query creates a pipeline promise containing the query statement and result information. As soon as the first result method is called, it triggers a pipeline commit: all queries waiting in the pipeline are sent, and the results are then parsed and placed into their promise objects. The second result call returns immediately because its value has already been retrieved.
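A minimal sketch of that promise idea, with all names invented here and assuming a hypothetical Pipeline class that exposes a commit method:

```crystal
# Hypothetical sketch of the promise object (names invented here).
class PipelinePromise(T)
  @value : T?

  def initialize(@pipeline : Pipeline, @sql : String)
  end

  # The first call to #result flushes every query waiting in the
  # pipeline and parses the responses into their promises; later
  # calls just return the cached value.
  def result : T
    @pipeline.commit if @value.nil?
    @value.not_nil!
  end

  # Called by the pipeline while it reads responses off the wire.
  protected def fulfill(@value : T)
  end
end
```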

@jgaskins
Contributor

jgaskins commented Jul 2, 2022

@straight-shoota That was the first thing I tried. I wanted the pipeline to use the DB::QueryMethods API. For multiple reasons, it doesn’t work with the Crystal type system.

@straight-shoota
Contributor

How so? I would think this should be perfectly representable in Crystal's type system.

@jgaskins
Contributor

jgaskins commented Jul 3, 2022

I thought it would be, too, until I tried it and realized that returning objects of a specific type is easy but storing them in memory for processing later is not.
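A sketch of where this breaks down, using invented Promise and Pipeline classes rather than anything from this driver:

```crystal
# Illustrative sketch (not this driver's code) of the type-system problem.
class Promise(T)
  def initialize(@sql : String)
  end
end

class Pipeline
  # Returning a promise of a specific type is straightforward:
  def query_one(sql : String, as type : T.class) : Promise(T) forall T
    Promise(T).new(sql)
  end

  # But the pipeline also needs to keep every pending promise so it can
  # fill them in when results arrive, and promises of different T share
  # no common, open-ended element type:
  #
  #   @pending = [] of Promise(Post) | Promise(Comment) | ...
  #
  # The union has to be spelled out in advance, so a generic
  # DB::QueryMethods-style pipeline can't store its own promises.
end
```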

Feel free to try it out if you're curious, but it's a moot point anyway. That's not how query pipelines are used in practice, and interleaving pipelined and immediate queries as you suggested would increase cognitive load and be prone to bugs. For example, consider this:

posts = pg.pipeline.query_one "...", as: Post
comments = pg.pipeline.query_all "...", as: Comment
author = pg.query_one "...", as: User
reactions = pg.pipeline.query_all "...", as: Reaction

posts = posts.result
comments = comments.result
reactions = reactions.result

If you send the first two queries before the third, this code breaks. If you don't send the first two queries before the third, it'd be very confusing when debugging as to why the queries are appearing out of order. The pipeline block in #253 communicates a clear start and end point of the pipelined queries to anyone reading the code.

Even if the queries are not being interleaved like that, the structure provided by the block is still better than without it:

posts = pg.pipeline.query "..."
comments = pg.pipeline.query "..."
author = pg.pipeline.query "..."
reactions = pg.pipeline.query "..."

posts = posts.read_one(Post)
comments = comments.read_all(Comment)
author = author.read_one(User)
reactions = reactions.read_all(Reaction)

# vs

rs = pg.pipeline do |pipe|
  pipe.query "..."
  pipe.query "..."
  pipe.query "..."
  pipe.query "..."
end

posts = rs.read_one(Post)
comments = rs.read_all(Comment)
author = rs.read_one(User)
reactions = rs.read_all(Reaction)

The only difference between these two is promise vs. block, and the second is still significantly easier to grok.

@straight-shoota
Contributor

My concern about a structure where the query and the reading of its result are completely separate is that you have to make sure to use the same order in both. But this probably doesn't matter too much. At the very least, it's already great to have a functioning lower-level API. For such an advanced feature, this might actually be good enough; you just have to make sure to use it properly.

@jgaskins
Contributor

jgaskins commented Jul 4, 2022

My concern about a structure where query and reading the result are completely separate is that you have to make sure to use the same order in both.

I hear you on this, but a query pipeline is effectively a channel, so this makes sense. Thinking about the problem in a different way is not a bad thing, but while a bespoke async-I/O implementation with promises can be built on top of a pipeline, it is not itself a pipeline.
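The channel comparison can be made concrete with Crystal's own Channel, which preserves FIFO order between send and receive just as a pipeline preserves query/result order:

```crystal
# Rough analogy only: replies come back in the order queries were sent,
# like values read from a buffered Channel in FIFO order.
channel = Channel(String).new(3)
["SELECT 1", "SELECT 2", "SELECT 3"].each { |q| channel.send(q) }
3.times { puts channel.receive } # received 1:1, in send order
```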

For such an advanced feature this might actually be good enough

Agreed that it's pretty advanced and will likely be used rarely. I imagine most folks use ORMs for most DB interactions and will only dip into an API like this when they really need the performance.
