
fix(pg): fix binary format buffer handling #3496


Open · wants to merge 2 commits into master

Conversation

faulpeltz

@faulpeltz faulpeltz commented Jun 20, 2025

This tries to fix #3495

Functionally, this seems to fix the issue and the tests pass (edit: the stream tests are currently not working), but performance is not ideal: in binary mode a buffer slice is now created first, and then a string from the slice when a string is needed, whereas previously the string was created directly from the buffer. Maybe it would be better to hint the field type to the parser before parsing instead?

  • change Parser.parseDataRowMessage() to produce a buffer slice instead of a string
  • change Result.parseRow() and _parseRowAsArray() to create the string from the buffer slice if not in binary mode

@hjr3
Collaborator

hjr3 commented Jun 20, 2025

Did binary parsing for int32 work prior to #3494? If so, I wonder if we could special-case the array parsing to fix the regression. That would give us time to explore a larger, more correct change.

> change Parser.parseDataRowMessage() to produce a buffer slice instead of a string

I like the idea of passing around a buffer of bytes and only converting to a string at the last possible moment.

@faulpeltz
Author

No, I don't think it worked, but I never got to that point because the bind argument protocol message was wrong for binary mode.
I was just about to try to fix that after digging around with Wireshark when the new version was released 😅

Please feel free to make any necessary changes; I've never worked on the internals of node-postgres before, but I have written similar binary message parsers in JS.

@hjr3
Collaborator

hjr3 commented Jun 20, 2025

I will dive into this more later tonight and this weekend. I also plan on adding more data-type parsing integration tests.

@hjr3
Collaborator

hjr3 commented Jun 21, 2025

I am not finding the performance regression.

Before:

  1. parseDataRowMessage converts the buffer into a string using this.reader.string(len), which uses buffer.toString. This allocates.
  2. parseRow() and _parseRowAsArray() pass the string to the parser. This does not allocate.

After:

  1. parseDataRowMessage creates a buffer slice using this.reader.bytes(len), which uses buffer.slice. This does NOT allocate.
  2. parseRow() and _parseRowAsArray() convert the buffer into a string. This allocates.

In both cases, we now have a single allocation. (Actually, my recent fix introduced another allocation by adding the additional Buffer.from(rawValue), but you already fixed that.)

I did add tests for every type I could think of.
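The allocation comparison above can be demonstrated directly in Node: subarray (which Buffer's slice aliases) returns a new Buffer object over the same memory without copying bytes, while toString materializes a new string. A small sketch:

```javascript
// subarray() returns a new Buffer *object* that shares the parent's
// memory; no bytes are copied. toString() allocates a fresh JS string.
const buf = Buffer.from('hello world')

const view = buf.subarray(0, 5) // view over the same bytes
view[0] = 0x48 // write 'H' through the view...
console.log(buf.toString()) // ...and the parent sees it: "Hello world"

const copy = buf.toString('utf8', 0, 5) // "Hello" as a newly allocated string
```

So the cost that moved in this PR is the small Buffer view object itself, not a copy of the payload bytes.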

@charmander
Collaborator

I think the allocation they’re referring to is the additional Buffer view object itself on the string path.

@faulpeltz
Author

> I think the allocation they’re referring to is the additional Buffer view object itself on the string path.

Yes, I was referring to the creation of the additional Buffer view when parsing, but since no allocation/copy of the underlying bytes is involved, it's probably not relevant.

I was also just wondering about performance in general: when comparing text vs. binary mode in my app, I cannot see any improvement when fetching large numbers of rows.

@hjr3
Collaborator

hjr3 commented Jun 21, 2025

Ah, the new object from buffer.slice that is now created in the string path.

> but because there is no allocation/copy involved it's probably not relevant.

I would think this as well! I wrote a quick benchmark to be sure:

```javascript
import { run, bench } from 'mitata'
import { Client } from 'pg'

const client = new Client()
await client.connect()

bench('text format', async () => {
  const rowCount = 50000
  return client.query({
    text: `SELECT
      generate_series(1, $1) as id,
      (random() * 10000)::int as int_val,
      'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.' as text_val,
      ARRAY[random(), random(), random()]::real[] as array_val`,
    values: [rowCount],
    binary: false,
  })
})

await run({
  format: 'mitata',
  colors: true,
  throw: true,
})

await client.end()
```

This was run on my local laptop, so it's not scientific. However, I am seeing this difference consistently:

Before: [benchmark screenshot]

After: [benchmark screenshot]

If this is true, then maybe we should look at hinting the format to parseDataRowMessage and making DataRowMessage generic over String and Buffer. I think that may be a fair amount of work to accomplish, though.
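A hypothetical sketch of the format-hinting idea (none of these names or shapes are the actual node-postgres API): if the parser knows each field's wire format up front, it can materialize every field exactly once, as a string for text-format fields and a Buffer for binary ones.

```javascript
// Hypothetical: the connection passes per-field format hints (from the
// RowDescription / result format codes) into the data-row parser, so no
// intermediate representation is ever produced.
function parseDataRowMessage(buffer, fieldFormats) {
  const fieldCount = buffer.readUInt16BE(0)
  const fields = new Array(fieldCount)
  let offset = 2
  for (let i = 0; i < fieldCount; i++) {
    const len = buffer.readInt32BE(offset)
    offset += 4
    if (len === -1) {
      fields[i] = null // SQL NULL
      continue
    }
    fields[i] =
      fieldFormats[i] === 'binary'
        ? buffer.subarray(offset, offset + len) // keep raw bytes
        : buffer.toString('utf8', offset, offset + len) // decode once
    offset += len
  }
  return fields
}
```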

I will keep looking as I get time.

@hjr3
Collaborator

hjr3 commented Jun 21, 2025

> I was also just wondering about performance in general - when comparing text vs. binary mode in my app I cannot see any improvements when fetching large amounts of rows

I am seeing a noticeable difference over large results. Same benchmark as above, but text vs. binary:

Text: [benchmark screenshot]

Binary: [benchmark screenshot]

Are you not seeing any difference? Can you share your benchmark?

@faulpeltz
Author

Regarding your benchmark: in binary mode, the query result for the float array is only string garbage; it seems type 1021 (element type 700) is not supported by pg-types in binary mode, which might skew the benchmark result.
Also, the noParser always creates a string from the Buffer in binary mode, which I think is not a good default.
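As a small sketch of that last point (the names here are illustrative, not the actual node-postgres/pg-types API): stringifying unknown binary values discards the bytes' meaning, whereas a pass-through default would let the caller still decode them.

```javascript
// Hypothetical fallback parsers for types with no registered parser.
const noParseText = (value) => value // text mode: value is already a string
const noParseBinaryCurrent = (value) => value.toString('utf8') // lossy for non-text payloads
const noParseBinaryProposed = (value) => value // keep the raw Buffer for the caller
```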

> Are you not seeing any difference? Can you share your benchmark?

In the case of my app (which I cannot share), I wasn't able to measure any meaningful difference.
The part I was measuring basically just dumps large-ish tables to an ND-JSON stream. From profiling, the pg row parsing is still the largest chunk, but under 30%, so it's hard to say what's going on.

Development

Successfully merging this pull request may close these issues.

pg 8.16.2 in binary mode produces incorrect row values
3 participants