2.25.2.0-b197

timothy-e tagged this 20 Mar 12:23
Summary:
=== Background ===
PG Backends are initialized with LC_COLLATE = C and everything else UTF8 (see `setlocales` in `initdb.c`).

DocDB is not initialized with these same settings, so its LC_CTYPE defaults to C, not UTF-8. This causes any operation that depends on the locale to evaluate differently in DocDB vs Postgres -- the results differ depending on whether a filter is evaluated locally (in Postgres) or remotely (in DocDB).

=== Fix 1: Change the DocDB default ===
Since each database defaults to UTF8, it's reasonable to set DocDB to default to UTF8 as well. This addresses the common case, and fixes correctness in examples like:
```lang=sql
create table t (c text);
insert into t values ('A'), ('a'), ('Ë'), ('ë');
select * from t where upper(c) = 'Ë';
 c
---
 Ë
(1 row)

set yb_enable_expression_pushdown = false;
select * from t where upper(c) = 'Ë';
 c
---
 ë
 Ë
(2 rows)
```
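
To illustrate why the pushed-down filter returns only one row, here is a minimal C sketch of ASCII-only uppercasing, which is roughly what a C LC_CTYPE amounts to. The helper names `ascii_only_upper` and `matches_after_ascii_upper` are hypothetical and not part of YugabyteDB or Postgres, and the strings are assumed to be UTF-8 encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not YugabyteDB code): uppercase a byte string in
 * place, touching only the ASCII letters a-z, as a C LC_CTYPE would.
 */
static void ascii_only_upper(char *s)
{
	assert(s != NULL);
	for (; *s != '\0'; s++)
	{
		if (*s >= 'a' && *s <= 'z')
			*s = (char) (*s - 'a' + 'A');
	}
}

/*
 * Hypothetical helper: returns 1 when the ASCII-only uppercase form of
 * value equals target byte for byte, else 0. Inputs must be shorter
 * than 64 bytes.
 */
static int matches_after_ascii_upper(const char *value, const char *target)
{
	char buf[64];

	assert(value != NULL && target != NULL);
	assert(strlen(value) < sizeof(buf));
	strcpy(buf, value);
	ascii_only_upper(buf);
	return strcmp(buf, target) == 0;
}
```

Under these assumptions, `matches_after_ascii_upper("\xC3\xAB", "\xC3\x8B")` (ë against Ë) returns 0, because both UTF-8 bytes of ë fall outside a-z and are left unchanged. That matches the single-row result when the filter is pushed down.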

=== Fix 1 part 2: PG and DocDB collations should not differ for pushed down expressions ===
The above fix does not address cases where the database is created with a non-UTF8 LC_CTYPE. To handle these cases, we need to disable pushdown whenever the database locale differs from the DocDB locale, i.e. whenever the database locale is not UTF8.
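
The check described above could be sketched roughly as follows. This is not the actual implementation: the function names are hypothetical, and it assumes the locale is given as a name like `en_US.UTF-8` whose codeset follows the first `.`, with no `@modifier` suffix.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive equality for ASCII strings. */
static int ascii_ieq(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0')
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return 0;
		a++;
		b++;
	}
	return *a == '\0' && *b == '\0';
}

/*
 * Hypothetical helper: returns 1 when the codeset part of an LC_CTYPE
 * name (the text after the first '.') is UTF-8, else 0. Names without
 * a codeset, such as "C" or "POSIX", are treated as non-UTF8.
 */
static int ctype_codeset_is_utf8(const char *lc_ctype)
{
	const char *dot;

	assert(lc_ctype != NULL);
	dot = strchr(lc_ctype, '.');
	if (dot == NULL)
		return 0;
	return ascii_ieq(dot + 1, "UTF-8") || ascii_ieq(dot + 1, "UTF8");
}

/*
 * Hypothetical gate: allow pushdown only when the database LC_CTYPE
 * matches the UTF8 default assumed for DocDB.
 */
static int locale_allows_pushdown(const char *db_lc_ctype)
{
	return ctype_codeset_is_utf8(db_lc_ctype);
}
```

With this sketch, `locale_allows_pushdown("en_US.UTF-8")` returns 1, while `locale_allows_pushdown("C")` and `locale_allows_pushdown("en_US.ISO-8859-1")` return 0, so expressions in those databases would be evaluated in Postgres instead.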

=== Fix 2: catalog cache lookup is not allowed in multithread mode ===
While writing tests for these cases, another issue was found:
```lang=sql
CREATE TABLE tab2(id text COLLATE "ucs_basic");
INSERT INTO tab2 VALUES ('aaa'), ('äää'), ('ZZZ');
SELECT COUNT(*) FROM tab2 WHERE upper(id) = 'ÄÄÄ';
ERROR:  catalog cache lookup is not allowed in multithread mode
```

`ybCanPushdownExpr` determines that this expression is pushable, because the collation `ucs_basic` is a C collation:
```lang=sql
select collcollate, collctype from pg_collation where collname = 'ucs_basic';
 collcollate | collctype
-------------+-----------
 C           | C
(1 row)
```

However, when DocDB evaluates this expression, it calls a case-conversion routine such as `str_tolower`, which runs:
```lang=c
if (lc_ctype_is_c(collid))
{
	result = asc_tolower(buff, nbytes);
}
else
...
```

`lc_ctype_is_c` / `lc_collate_is_c` require a catalog lookup to get the `ctype_is_c` / `collate_is_c` fields, which is not allowed in DocDB. To fix this, we avoid pushing down any function where the collation would require DocDB to read from the catalog cache.
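
A minimal sketch of such a guard, assuming that only the default, C, and POSIX collations can be classified without consulting the catalog cache. The `SKETCH_*` constants and both function names are hypothetical; the OID values are hardcoded here for illustration and should be treated as assumptions rather than read from the real headers.

```c
typedef unsigned int Oid;

/* Illustrative OID values, assumed for this sketch only. */
#define SKETCH_INVALID_OID            0u
#define SKETCH_DEFAULT_COLLATION_OID  100u
#define SKETCH_C_COLLATION_OID        950u
#define SKETCH_POSIX_COLLATION_OID    951u

/*
 * Hypothetical helper: returns 1 when classifying the collation as
 * C or non-C would need a catalog cache lookup, else 0.
 */
static int collation_needs_catalog_lookup(Oid collid)
{
	if (collid == SKETCH_INVALID_OID ||
		collid == SKETCH_DEFAULT_COLLATION_OID ||
		collid == SKETCH_C_COLLATION_OID ||
		collid == SKETCH_POSIX_COLLATION_OID)
		return 0;

	/* Any other collation, e.g. ucs_basic, lives only in the catalog. */
	return 1;
}

/*
 * Hypothetical gate: a collation-sensitive function is pushable only
 * when DocDB can evaluate it without touching the catalog cache.
 */
static int collation_allows_pushdown(Oid collid)
{
	return !collation_needs_catalog_lookup(collid);
}
```

In this sketch, a collation such as `ucs_basic` is rejected even though its `collctype` is C, because confirming that at evaluation time would require the catalog lookup that DocDB cannot perform.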
Jira: DB-15745

Test Plan:
```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressTypesString'
```

Reviewers: myang

Reviewed By: myang

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D42617