Summary:
=== Background ===
IN expressions are ScalarArrayOpExprs (SAOPs). Upstream PG commits 50e17ad281b8d1c1b410c9833955bc80fbad4078 and 29f45e299e7ffa1df0db44b8452228625479487f changed the serialization of SAOPs, adding two fields: `hashfuncid` and `negfuncid`. If these fields are set, the executor should use the given hash function to build a hash set of the values in the array, so that comparing a value to the array is O(1) instead of O(n). This optimization is helpful for large lists, e.g. `IN (1, 2, 3, ..., 100)`, but it is not essential: it has no impact on correctness if the executor just ignores it.
This change in serialization means that if a PG11 backend pushes down a SAOP to a PG15 tserver, the PG15 tserver expects to read two additional fields that aren't there. And if a PG15 backend pushes down a SAOP to a PG11 tserver, the PG11 tserver gets confused by the two extra fields. Either mismatch results in a parsing error: the statement fails with `did not find '}' at end of input node`.
=== Fix ===
To remedy this, a special case is added to the serialization and deserialization of SAOPs. This is controlled by the GUC `yb_mixed_mode_saop_expression_pushdown`, which defaults to false. Once all special cases like this are handled, an auto-flag will be created to control each of the new mixed-mode pushdown special-case GUCs.
==== Serialization ====
Serialization is done by a PG backend when sending a request. The backend has `yb_major_version_upgrade_compatibility = 11`, so it's straightforward to know whether to include the new fields in serialization or not. However, we don't want to impact any other serialization use-cases. To limit this change to expression pushdown only, create a function `ybSerializeNode` that sets and resets a global variable `yb_serialize_expression_version`.
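The set-and-reset pattern described above can be sketched as follows. This is a minimal illustration, not the code in this change: the struct, the `sketch_*` functions, the version constants, and the abbreviated output format are all hypothetical stand-ins, and only the variable name `yb_serialize_expression_version` comes from the description.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical version constants; the real encoding may differ. */
#define SKETCH_EXPR_VERSION_UNSET 0
#define SKETCH_EXPR_VERSION_PG11  11
#define SKETCH_EXPR_VERSION_PG15  15

/* Global consulted by the SAOP output routine; unset outside pushdown. */
static int yb_serialize_expression_version = SKETCH_EXPR_VERSION_UNSET;

/* Simplified SAOP holding only the fields relevant to this sketch. */
typedef struct SketchSaop
{
	unsigned int opno;
	unsigned int hashfuncid;
	unsigned int negfuncid;
} SketchSaop;

/* Writes the node into buf, omitting the new fields for a PG11 target. */
void
sketch_out_saop(char *buf, size_t len, const SketchSaop *node)
{
	if (yb_serialize_expression_version == SKETCH_EXPR_VERSION_PG11)
		snprintf(buf, len, "{SCALARARRAYOPEXPR :opno %u}", node->opno);
	else
		snprintf(buf, len,
				 "{SCALARARRAYOPEXPR :opno %u :hashfuncid %u :negfuncid %u}",
				 node->opno, node->hashfuncid, node->negfuncid);
}

/* Pushdown-only entry point: set the global, serialize, then restore it. */
void
sketch_serialize_node(char *buf, size_t len, const SketchSaop *node,
					  int version)
{
	int			saved = yb_serialize_expression_version;

	yb_serialize_expression_version = version;
	sketch_out_saop(buf, len, node);
	yb_serialize_expression_version = saved;
}
```

Because the global is restored before returning, callers that serialize nodes for other purposes still go through the unmodified path and see the full PG15 output.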
==== Deserialization ====
Deserialization is done by the tserver when it receives a request. Generally, the tserver would also have `yb_major_version_upgrade_compatibility = 11`, but we cannot rely on that: since the flag cannot be atomically set across all nodes, some backends might be sending requests that are PG15 compatible (i.e. containing `hashfuncid` and `negfuncid`) and some backends might be sending requests that are PG11 compatible (i.e. not containing the two new fields).
D42054 / 5275733aebd4162214513632c92fc112c7b43433 added the expression version to each request, so the tserver is aware of the sender's version (and whether or not SAOPs would contain the two extra fields). We use the expression version of the request to determine if we should expect the new fields or not.
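As a rough illustration of reading a SAOP according to the sender's version, the sketch below parses a heavily abbreviated node string. The `ParsedSaop` type, the `sketch_read_saop` function, the shortened field list, and the assumption that the expression version is a plain integer where 15 or higher means "new fields present" are all hypothetical; the real reader lives in `readfuncs.c` and handles many more fields.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified SAOP holding only the fields relevant to this sketch. */
typedef struct ParsedSaop
{
	unsigned int opno;
	unsigned int hashfuncid;
	unsigned int negfuncid;
} ParsedSaop;

/*
 * Parses an abbreviated SAOP string. expr_version is the version carried
 * by the request (assumed encoding: >= 15 means the new fields are sent).
 * Returns false when the input does not match the expected layout, which
 * is the situation that surfaces as a parsing error in mixed mode.
 */
bool
sketch_read_saop(const char *input, int expr_version, ParsedSaop *out)
{
	int			consumed = 0;

	out->hashfuncid = 0;
	out->negfuncid = 0;

	if (expr_version >= 15)
	{
		if (sscanf(input,
				   "{SCALARARRAYOPEXPR :opno %u :hashfuncid %u :negfuncid %u}%n",
				   &out->opno, &out->hashfuncid, &out->negfuncid,
				   &consumed) != 3)
			return false;
	}
	else
	{
		if (sscanf(input, "{SCALARARRAYOPEXPR :opno %u}%n",
				   &out->opno, &consumed) != 1)
			return false;
	}

	/* %n is only stored if the closing brace matched. */
	return consumed > 0 && input[consumed] == '\0';
}
```

Reading a PG15-style string with version 11, or a PG11-style string with version 15, fails at the closing brace, while matching the reader to the request's version succeeds in both directions.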
Another problem is that while the backend is single-threaded and can refer to a global variable to determine whether or not to serialize the fields, the tserver is multi-threaded. It may be handling some requests from PG11 backends while handling other requests from PG15 backends. We need to be sure to use the right expression version for each request, so we can't use a global variable. The options are:
1. pass the expression version through to each `_read*` function in `readfuncs.c`. There are ~400 function signatures to change, which would complicate the next PG merge.
2. use a thread-local variable [preferred approach].
3. only allow SAOPs at the top level - so we only need to pass expression version to the top level deserialization function, not to any of its recursive calls. This is a simple code change, but unintuitive and not generalizable.
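A minimal sketch of option 2, assuming C11 `_Thread_local` storage. The variable name `sketch_deserialize_expression_version`, both functions, and the integer version encoding are hypothetical and chosen only for illustration. The example runs on a single thread; the point is that each thread gets its own copy of the variable, so concurrent requests from PG11 and PG15 backends would not see each other's value.

```c
#include <stdbool.h>

/* Hypothetical name; every thread gets an independent copy. */
static _Thread_local int sketch_deserialize_expression_version = 0;

/*
 * Stands in for a per-node reader deep in the recursion. It consults the
 * thread-local value, so no extra parameter has to be threaded through
 * the existing function signatures.
 */
bool
sketch_saop_has_hash_fields(void)
{
	/* Assumed encoding: 15 or higher means the new fields are present. */
	return sketch_deserialize_expression_version >= 15;
}

/*
 * Top-level entry point: install the request's expression version for the
 * duration of the parse, then restore the previous value.
 */
bool
sketch_deserialize_request(int request_expr_version)
{
	int			saved = sketch_deserialize_expression_version;
	bool		result;

	sketch_deserialize_expression_version = request_expr_version;
	result = sketch_saop_has_hash_fields();	/* stands in for the recursive read */
	sketch_deserialize_expression_version = saved;

	return result;
}
```

Compared with option 1, only the top-level entry point needs to know about the version, which keeps the reader signatures identical to upstream.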
**Upgrade / Downgrade Safety**
This feature is disabled by default, and will only be enabled once an auto-flag guards this and any other pushdown special cases.
Jira: DB-15312
Test Plan:
```
./yb_build.sh --cxx-test integration-tests_ysql_major_upgrade_expression_pushdown-test --gtest_filter YsqlMajorUpgradeExpressionPushdownTest.TestScalarArrayOpExprs
```
Reviewers: amartsinchyk, hsunder, fizaa, #db-approvers
Reviewed By: amartsinchyk
Subscribers: svc_phabricator, yql
Differential Revision: https://phorge.dev.yugabyte.com/D42454