Release 2.27.0.0-b89: [#22245] YSQL: Increase ysql_output_buffer_size to 1MiB · yugabyte/yugabyte-db

2.27.0.0-b89
b3eeacf
Choose a tag to compare

Filter

View all tags

2.27.0.0-b89: [#22245] YSQL: Increase ysql_output_buffer_size to 1MiB

2.27.0.0-b89
b3eeacf
Choose a tag to compare

Filter

View all tags

pao214 tagged this 10 May 02:48

Summary:
### Motivation

YugabyteDB currently uses a 256KiB YSQL output buffer, compared to PostgreSQL’s default of 8KiB. A larger buffer allows YSQL to retry queries internally in the event of serialization failures. This is crucial because once YSQL sends partial results to the client, it cannot safely retry the query—doing so risks emitting duplicate results such as:

```
 id
----
(0 rows)

 id
----
  1
(1 row)
```

In YSQL, while retries are best-effort in REPEATABLE READ, they are essential in READ COMMITTED to ensure serialization errors are not thrown to the user. Also, PostgreSQL is not subject to read restart errors because it is a single node system. In contrast, YSQL relies on retries to avoid throwing read restart errors.

However, the current 256KiB buffer is often insufficient. Large SELECT queries commonly exceed this threshold. These same queries are also more likely to encounter restart errors due to read/write timestamp conflicts. As a result, increasing the output buffer size is a frequent operational change.

Raise the default buffer size to 1MiB, a common recommendation, to reduce friction and improve out-of-the-box reliability.

### Impact Analysis

**Q.** Do small queries incur increased memory usage?

No. Although each backend allocates a 1MiB buffer space, the OS does not actually reserve this memory unless a large query requires it. This behavior can be observed using the following script to track proportional set size (PSS):

```
#!/bin/bash

peak_pss=0

while true; do
  total_pss=0
  for pid in $(ps -eo pid,comm | grep '[p]ostgres' | awk '{print $1}'); do
    pss=$(awk '/Pss:/ {total += $2} END {print total}' /proc/$pid/smaps 2>/dev/null)
    total_pss=$((total_pss + pss))
  done

  if (( total_pss > peak_pss )); then
    peak_pss=$total_pss
  fi

  echo "Current PSS: ${total_pss} KB, Peak PSS: ${peak_pss} KB"
  sleep 1
done
```

Test Setup:

```
CREATE TABLE kv(k INT PRIMARY KEY, v INT);
INSERT INTO kv SELECT i, i FROM GENERATE_SERIES(1, 100000) i;
```

`SELECT * FROM kv LIMIT 1000` → ~131 MiB PSS

`SELECT * FROM kv` → ~132 MiB PSS

This provides evidence that the memory usage is incremental and the cost of 1 MiB buffer size is not payed unless there is a query with a large output.

**Q:** What about large queries?

* With 256KiB buffer: PSS increase ~3MiB
* With 1MiB buffer: PSS increase ~4MiB

The incremental cost is acceptable.

**Q:** How does this affect real-world workloads?

Ran TPC-H (analytical workload) via BenchBase against a replication factor 1 cluster:

* Idle PSS: ~120MiB
* Peak PSS (with and without buffer change): ~220MiB

The buffer size change had minimal impact on peak memory usage; other query-related allocations dominate.

### Caveats

1. Once allocated, buffer memory is not released until the connection closes.
2. First-row latency of large SELECT queries may increase due to buffering. This is an intentional tradeoff to reduce serialization failures.

Jira: DB-11163

Test Plan:
Jenkins

Close: #22245
Backport-through: 2024.2

Reviewers: pjain, smishra, #db-approvers

Reviewed By: pjain

Subscribers: svc_phabricator, yql

Differential Revision: https://phorge.dev.yugabyte.com/D43805

Assets 2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Uh oh!