Summary:
To make future PG merges easier, separate YB elements from upstream
PG elements in grammar and lexer files.
- Force ordinary keywords added by YB (into kwlist.h) to have a "_YB_"
prefix and a "_P" suffix. The leading "_" makes YB tokens sort last
when they are placed in the various grammar rules in gram.y ("_" sorts
after the uppercase letters in ASCII), which matters because
check_keywords.pl enforces sorting. To support this change, update
check_keywords.pl to strip the "_YB_" prefix. check_keywords.pl also
enforces that "_YB_" is accompanied by "_P" and throws an error during
compilation otherwise. A new lint rule, yb_keyword_missing_yb_prefix,
enforces that YB keywords have the "_YB_" prefix.
I considered separating upstream PG's alphabetical sorting check from
YB's by creating new subrules holding YB tokens (e.g.
yb_bare_label_keyword) and modifying check_keywords.pl, but the chosen
approach is much simpler. I also considered creating a separate
yb_kwlist.h so that YB keywords are not intermingled with upstream PG
keywords (which causes merge conflicts), but given the serious-sounding
comment in check_keywords.pl, "Check that the list is in alphabetical
order (critical!)", I refrained from making such a change. (Compare
with the less-serious-sounding comment "Check that each keyword list is
in alphabetical order (just for neatnik-ism)" in a different area; both
were introduced in the same upstream PG commit
55c1687a97c3c2b6cbf7c1b45830b49f03641908.)
- Group all gram.y YB types to the end.
- Reorder gram.y stmt rule's items in the same order as upstream PG, and
move YB items to the end. Note that upstream PG's ordering is not
necessarily alphabetical.
- Sort and comment YB tokens in repl_gram.y.
- Sort and comment YB tokens in repl_scanner.l. Note that the tab
before the comment is syntactically necessary.
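The sort-last property that the first item relies on can be checked
with a short sketch. It assumes the sorting check compares token names
byte-wise; sort_tokens is a hypothetical helper, and every token other
than _YB_ACCOUNT_P is only an illustrative name.

```bash
# Hypothetical helper: sort token names byte-wise (C locale), the way a
# plain string comparison would order them. Not part of this change.
sort_tokens() {
  printf '%s\n' "$@" | LC_ALL=C sort
}

# "_" (0x5F) comes after "Z" (0x5A) in ASCII, so the "_YB_" token ends
# up after every all-uppercase upstream token.
sort_tokens ZONE _YB_ACCOUNT_P ABORT_P YEAR_P
```

Under those assumptions the tokens print in the order ABORT_P, YEAR_P,
ZONE, _YB_ACCOUNT_P.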
Beyond the one rule added here, the other changes could also be
enforced by lint rules. That would be nice to have in the future, but
for now it is more effort than it is worth.
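For illustration, the naming convention from the first item (a "_YB_"
prefix always paired with a "_P" suffix, and the stripped name matching
the keyword string) can be sketched as below. This is a minimal sketch,
not the code in this change; check_yb_token and its messages are
hypothetical.

```bash
# Hypothetical helper: validate one token name against its keyword
# string, following the convention described in the summary.
check_yb_token() {
  token=$1
  kwstring=$2
  bare=$token
  case $token in
    _YB_*)
      # A "_YB_" prefix must be accompanied by a "_P" suffix.
      case $token in
        *_P) ;;
        *)
          echo "'$token' has the _YB_ prefix but no _P suffix" >&2
          return 1
          ;;
      esac
      bare=${token#_YB_}
      ;;
  esac
  # Drop the "_P" suffix, if any, before comparing.
  bare=${bare%_P}
  upper=$(printf '%s' "$kwstring" | tr '[:lower:]' '[:upper:]')
  if [ "$bare" != "$upper" ]; then
    echo "token '$token' does not match keyword string '$kwstring'" >&2
    return 1
  fi
  return 0
}

check_yb_token _YB_ACCOUNT_P account && echo "ok"
```

The helper accepts upstream-style names such as ABORT_P for "abort" as
well, because the prefix is stripped only when it is present.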
Jira: DB-16267
Test Plan:
On AlmaLinux 8, run:
#!/usr/bin/env bash
set -euxo pipefail
KWLIST=src/postgres/src/include/parser/kwlist.h
cleanup() {
  sed -i 's/, ACCOUNT_P,/, _YB_ACCOUNT_P,/' "$KWLIST"
}
trap cleanup EXIT
trap cleanup INT
# Expect the new lint rule to be fully passing.
arc lint "$KWLIST"
# Expect failure for bad keywords.
sed -i 's/, _YB_ACCOUNT_P,/, ACCOUNT_P,/' "$KWLIST"
{ arc lint "$KWLIST" && exit 1; } || true
Close: #26858
Reviewers: kfranz
Reviewed By: kfranz
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D43268