Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boost performance by batching Rows #153

Closed
wants to merge 1 commit into from

Conversation

fabianfett
Copy link
Collaborator

@fabianfett fabianfett commented May 1, 2021

In #135 a connection state machine was introduced. To keep this simple at first, the state machine operates on single PostgresBackendMessages. This is very inefficient for queries that result in a lot of rows. This PR improves query performance by badging Postgres data row messages at the Decoder and operating on those batches from there on.

Open ends:

  • Two open warnings

Comment on lines 94 to 97
case forwardRow([PSQLData], to: EventLoopPromise<StateMachineStreamNextResult>)
case forwardCommandComplete(CircularBuffer<[PSQLData]>, commandTag: String, to: EventLoopPromise<StateMachineStreamNextResult>)
case forwardStreamError(PSQLError, to: EventLoopPromise<StateMachineStreamNextResult>, cleanupContext: CleanUpContext?)
// actions if query has not asked for next row but are pushing the final bytes to it
case forwardStreamErrorToCurrentQuery(PSQLError, read: Bool, cleanupContext: CleanUpContext?)
case forwardStreamCompletedToCurrentQuery(CircularBuffer<[PSQLData]>, commandTag: String, read: Bool)
case forwardRows(CircularBuffer<PSQLBackendMessage.DataRow>)
case forwardStreamComplete(CircularBuffer<PSQLBackendMessage.DataRow>, commandTag: String, read: Bool)
case forwardStreamError(PSQLError, read: Bool, cleanupContext: CleanUpContext?)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduction in Actions 🥳

let promise = self.eventLoop.makePromise(of: Row?.self)
self.downstreamState = .waitingForNext(promise)
self.upstreamState = .streaming(buffer: buffer, dataSource: dataSource)
dataSource.request()
Copy link
Collaborator Author

@fabianfett fabianfett May 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi This is the invocation, where the PSQLRowStream asks for more rows. We could change this to:

context.channel.triggerUserOutboundEvent(PSQLRowStreamEvent.requestMoreRows, promise: nil)

@fabianfett
Copy link
Collaborator Author

fabianfett commented May 2, 2021

Adding some performance numbers... (Swift main on focal in docker)

Running a query SELECT * FROM "large_table" that returns 1million rows with three number each:

1.4.4 release (before state machine):

root@3bcc272414db:/vapor/postgres-perf# time .build/release/PostgresPerf
2021-05-02T13:29:45+0000 info psql : PostgresQueryMetadata(command: "SELECT", oid: nil, rows: Optional(1000000))
2021-05-02T13:29:45+0000 info psql : took: 3.9132070541381836 seconds

real	0m4.455s
user	0m1.970s
sys	0m0.376s

main branch with state machine (perf regression):

root@3bcc272414db:/vapor/postgres-perf# time .build/release/PostgresPerf
2021-05-02T13:12:26+0000 info psql : PostgresQueryMetadata(command: "SELECT", oid: nil, rows: Optional(1000000))
2021-05-02T13:12:26+0000 info psql : query took: 5.985878944396973 seconds

real	0m6.256s
user	0m5.955s
sys	0m0.135s

this branch:

root@3bcc272414db:/vapor/postgres-perf# time .build/release/PostgresPerf
2021-05-02T13:13:23+0000 info psql : PostgresQueryMetadata(command: "SELECT", oid: nil, rows: Optional(1000000))
2021-05-02T13:13:23+0000 info psql : query took: 3.5723689794540405 seconds

real	0m4.003s
user	0m1.110s
sys	0m0.435s

While the speed bump is nice. Even more important: The query doesn't use 100% cpu anymore.

@jdmcd
Copy link
Member

jdmcd commented May 5, 2021

This is really enticing, any chance we can port this easily to MySQL as well? 👀

@fabianfett
Copy link
Collaborator Author

@jdmcd Since this pr builds on top of #135, I'm afraid some groundwork would need to be done to have this land.

@jdmcd
Copy link
Member

jdmcd commented May 5, 2021

Ah, I see

@fabianfett fabianfett marked this pull request as draft May 6, 2021 12:24
@fabianfett fabianfett force-pushed the ff-batch-performance branch 3 times, most recently from 2fe77c2 to 65200ab Compare May 10, 2021 19:03
@gwynne
Copy link
Member

gwynne commented Aug 16, 2021

This is really enticing, any chance we can port this easily to MySQL as well? 👀

@jdmcd FWIW this is something I've already got on my list to look into for MySQL 🙂

@fabianfett
Copy link
Collaborator Author

Just applied another fix. Will push shortly. On Swift 5.5 nightly:

Read 1m rows into an array
main        : 5.354 sec
this branch : 1.790 sec

Consume 1m rows on el
main        : 5.176 sec
this branch : 1.475 sec

@fabianfett fabianfett force-pushed the ff-batch-performance branch 5 times, most recently from f399384 to a693d81 Compare August 20, 2021 21:52
@fabianfett fabianfett marked this pull request as ready for review August 23, 2021 08:05
Co-authored-by: Gwynne Raskind <gwynne@darkrainfall.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants