Summary:
### Motivation
YugabyteDB currently uses a 256KiB YSQL output buffer, compared to PostgreSQL’s default of 8KiB. A larger buffer allows YSQL to retry queries internally in the event of serialization failures. This is crucial because once YSQL sends partial results to the client, it cannot safely retry the query—doing so risks emitting duplicate results such as:
```
id
----
(0 rows)
id
----
1
(1 row)
```
In YSQL, while retries are best-effort in REPEATABLE READ, they are essential in READ COMMITTED to ensure serialization errors are not thrown to the user. Also, PostgreSQL is not subject to read restart errors because it is a single node system. In contrast, YSQL relies on retries to avoid throwing read restart errors.
However, the current 256KiB buffer is often insufficient. Large SELECT queries commonly exceed this threshold. These same queries are also more likely to encounter restart errors due to read/write timestamp conflicts. As a result, increasing the output buffer size is a frequent operational change.
Raise the default buffer size to 1MiB, a common recommendation, to reduce friction and improve out-of-the-box reliability.
### Impact Analysis
**Q.** Do small queries incur increased memory usage?
No. Although each backend allocates a 1MiB buffer space, the OS does not actually reserve this memory unless a large query requires it. This behavior can be observed using the following script to track proportional set size (PSS):
```
#!/bin/bash
peak_pss=0
while true; do
total_pss=0
for pid in $(ps -eo pid,comm | grep '[p]ostgres' | awk '{print $1}'); do
pss=$(awk '/Pss:/ {total += $2} END {print total}' /proc/$pid/smaps 2>/dev/null)
total_pss=$((total_pss + pss))
done
if (( total_pss > peak_pss )); then
peak_pss=$total_pss
fi
echo "Current PSS: ${total_pss} KB, Peak PSS: ${peak_pss} KB"
sleep 1
done
```
Test Setup:
```
CREATE TABLE kv(k INT PRIMARY KEY, v INT);
INSERT INTO kv SELECT i, i FROM GENERATE_SERIES(1, 100000) i;
```
`SELECT * FROM kv LIMIT 1000` → ~131 MiB PSS
`SELECT * FROM kv` → ~132 MiB PSS
This provides evidence that the memory usage is incremental and the cost of 1 MiB buffer size is not payed unless there is a query with a large output.
**Q:** What about large queries?
* With 256KiB buffer: PSS increase ~3MiB
* With 1MiB buffer: PSS increase ~4MiB
The incremental cost is acceptable.
**Q:** How does this affect real-world workloads?
Ran TPC-H (analytical workload) via BenchBase against a replication factor 1 cluster:
* Idle PSS: ~120MiB
* Peak PSS (with and without buffer change): ~220MiB
The buffer size change had minimal impact on peak memory usage; other query-related allocations dominate.
### Caveats
1. Once allocated, buffer memory is not released until the connection closes.
2. First-row latency of large SELECT queries may increase due to buffering. This is an intentional tradeoff to reduce serialization failures.
Jira: DB-11163
Test Plan:
Jenkins
Close: #22245
Backport-through: 2024.2
Reviewers: pjain, smishra, #db-approvers
Reviewed By: pjain
Subscribers: svc_phabricator, yql
Differential Revision: https://phorge.dev.yugabyte.com/D43805