
Support for query pipelining? #155

Open
vlazar opened this issue Nov 7, 2018 · 13 comments · May be fixed by #253

Comments

@vlazar

vlazar commented Nov 7, 2018

Recently released Round 17 of the TechEmpower Framework Benchmarks has an interesting change:

https://www.techempower.com/blog/2018/10/30/framework-benchmarks-round-17/

As you review Round 17 results, you'll notice that Postgres database tests are stratified—there are groups of test implementations that seem implausibly faster than other test implementations.

The underlying cause of this is use of a Postgres protocol feature we have characterized as "query pipelining" because it is conceptually similar to HTTP pipelining. We call it pipelining, but you could also call it multiplexing. It's a feature of the "Extended Query" protocol in Postgres. Query pipelining allows database clients to send multiple queries without needing to wait for each response before sending the next. It's similar to batching but provided invisibly in the driver.
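For illustration, the difference amounts to something like this; the client and its methods below are hypothetical, not this shard's API:

```crystal
# Hypothetical sketch: none of these method names come from this driver.

# Without pipelining, each query blocks until its reply arrives,
# so three queries cost roughly three network round trips.
a = client.query("SELECT 1")
b = client.query("SELECT 2")
c = client.query("SELECT 3")

# With pipelining, all three queries are written to the socket first
# and the replies are read afterwards, in order, for roughly one
# round trip in total.
client.send_query("SELECT 1")
client.send_query("SELECT 2")
client.send_query("SELECT 3")
a = client.read_result
b = client.read_result
c = client.read_result
```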

@will
Owner

will commented Nov 7, 2018 via email

@rishavs

rishavs commented Nov 22, 2018

This would be fantastic. Right now, the DB connection is the only thing in my Crystal web app that feels a bit slow.

@eliasjpr

Would love to work with someone on this.

While looking for documentation I found the following explanation and implementation:

@eliasjpr

Also, has anyone considered binding to LibPQ instead? https://www.postgresql.org/docs/9.5/libpq.html

It seems to have solved all of this already. We could bind to the library and create a Crystal-DB interface to it. Thoughts?

@will
Owner

will commented Sep 30, 2019 via email

@eliasjpr
Copy link

I think the community could benefit from a native Crystal implementation, too.

@jgaskins jgaskins linked a pull request Jul 2, 2022 that will close this issue
@jgaskins
Contributor

jgaskins commented Jul 2, 2022

Just posted #253 to support query pipelining, if y'all could please have a look. I've been evaluating TimescaleDB for a side project and being able to pipeline INSERT queries for datapoints would be fantastic.

@straight-shoota
Contributor

@jgaskins Looks nice. Regarding your questions about the API design, perhaps it makes sense to focus on a very low-level API at first. This enables using the feature, but you need to know what you're doing. Later, a higher-level API could be built on top of that.

For a high-level API, I think it could be useful to combine the query and result interpretation into a promise object. I don't think this even needs to be wrapped in a pipeline block; it could even be interleaved with direct queries on the same connection.

query1 = connection.pipeline.query_one "SELECT 42", as: Int32
query2 = connection.pipeline.query_all "SELECT * FROM posts LIMIT $1", 10, as: Post

answer = query1.result
posts = query2.result

Behind the scenes, pipe_query creates a pipeline promise containing the query statement and result information. As soon as the first result method is called, it triggers a pipeline commit: all queries waiting in the pipeline are sent, and the results are then parsed and placed into their promise objects. The second result call returns immediately because its value has already been retrieved.
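A minimal sketch of that promise idea, with all names invented here and assuming a hypothetical Pipeline class that exposes a commit method:

```crystal
# Hypothetical sketch of the promise object (names invented here).
class PipelinePromise(T)
  @value : T?

  def initialize(@pipeline : Pipeline, @sql : String)
  end

  # The first call to #result flushes every query waiting in the
  # pipeline and parses the responses into their promises; later
  # calls just return the cached value.
  def result : T
    @pipeline.commit if @value.nil?
    @value.not_nil!
  end

  # Called by the pipeline while it reads responses off the wire.
  protected def fulfill(@value : T)
  end
end
```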

@jgaskins
Contributor

jgaskins commented Jul 2, 2022

@straight-shoota That was the first thing I tried. I wanted the pipeline to use the DB::QueryMethods API. For multiple reasons, it doesn’t work with the Crystal type system.

@straight-shoota
Contributor

How so? I would think this should be perfectly representable in Crystal's type system.

@jgaskins
Contributor

jgaskins commented Jul 3, 2022

I thought it would be, too, until I tried it and realized that returning objects of a specific type is easy but storing them in memory for processing later is not.
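A sketch of where this breaks down, using invented Promise and Pipeline classes rather than anything from this driver:

```crystal
# Illustrative sketch (not this driver's code) of the type-system problem.
class Promise(T)
  def initialize(@sql : String)
  end
end

class Pipeline
  # Returning a promise of a specific type is straightforward:
  def query_one(sql : String, as type : T.class) : Promise(T) forall T
    Promise(T).new(sql)
  end

  # But the pipeline also needs to keep every pending promise so it can
  # fill them in when results arrive, and promises of different T share
  # no common, open-ended element type:
  #
  #   @pending = [] of Promise(Post) | Promise(Comment) | ...
  #
  # The union has to be spelled out in advance, so a generic
  # DB::QueryMethods-style pipeline can't store its own promises.
end
```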

Feel free to try it out if you're curious, but it's a moot point anyway. That's not how query pipelines are used in practice, and interleaving pipelined and immediate queries as you suggested would increase cognitive load and be prone to bugs. For example, consider this:

posts = pg.pipeline.query_one "...", as: Post
comments = pg.pipeline.query_all "...", as: Comment
author = pg.query_one "...", as: User
reactions = pg.pipeline.query_all "...", as: Reaction

posts = posts.result
comments = comments.result
reactions = reactions.result

If you send the first two queries before the third, this code breaks. If you don't send the first two queries before the third, it'd be very confusing when debugging as to why the queries are appearing out of order. The pipeline block in #253 communicates a clear start and end point of the pipelined queries to anyone reading the code.

Even if the queries are not being interleaved like that, the structure provided by the block is still better than without it:

posts = pg.pipeline.query "..."
comments = pg.pipeline.query "..."
author = pg.pipeline.query "..."
reactions = pg.pipeline.query "..."

posts = posts.read_one(Post)
comments = comments.read_all(Comment)
author = author.read_one(User)
reactions = reactions.read_all(Reaction)

# vs

rs = pg.pipeline do |pipe|
  pipe.query "..."
  pipe.query "..."
  pipe.query "..."
  pipe.query "..."
end

posts = rs.read_one(Post)
comments = rs.read_all(Comment)
author = rs.read_one(User)
reactions = rs.read_all(Reaction)

The only difference between these two is promise vs. block, and the second is still significantly easier to grok.

@straight-shoota
Contributor

My concern about a structure where the query and the reading of its result are completely separate is that you have to make sure to use the same order in both. But this probably doesn't matter too much. At the very least, it's already great to have a functioning lower-level API. For such an advanced feature, this might actually be good enough; you just have to make sure to use it properly.

@jgaskins
Contributor

jgaskins commented Jul 4, 2022

My concern about a structure where query and reading the result are completely separate is that you have to make sure to use the same order in both.

I hear you on this, but a query pipeline is effectively a channel, so this makes sense. Thinking about the problem in a different way is not a bad thing, but while a bespoke async-I/O implementation with promises can be built on top of a pipeline, it is not itself a pipeline.
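The channel comparison can be made concrete with Crystal's own Channel, which preserves FIFO order between send and receive just as a pipeline preserves query/result order:

```crystal
# Rough analogy only: replies come back in the order queries were sent,
# like values read from a buffered Channel in FIFO order.
channel = Channel(String).new(3)
["SELECT 1", "SELECT 2", "SELECT 3"].each { |q| channel.send(q) }
3.times { puts channel.receive } # received 1:1, in send order
```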

For such an advanced feature this might actually be good enough

Agreed that it's pretty advanced and will likely be used rarely. I imagine most folks use ORMs for most DB interactions and will only dip into an API like this when they really need the performance.
