No xmin column in Citus columnar storage. Is there a way around? #185

Open
tuttle opened this issue Jul 6, 2022 · 2 comments
Labels
help wanted Extra attention is needed

Comments


tuttle commented Jul 6, 2022

It appears the tap requires the xmin column as the replication bookmark.

The column certainly exists for tables using the standard PostgreSQL heap access method; however, Citus columnar storage does not provide any of the system columns (https://www.postgresql.org/docs/14/ddl-system-columns.html).

Is there any way around it so this tap can be used to fully replicate the columnar table, please?
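For reference, the affected tables can be identified from the catalog by their access method. This is a minimal sketch, assuming PostgreSQL 12 or later (where `pg_class.relam` is set for tables) and assuming the Citus columnar access method is registered under the name `columnar`. The constant and helper names are made up for illustration and are not part of the tap.

```python
# Sketch only: ACCESS_METHOD_SQL and tables_without_xmin are illustrative
# names, not part of the tap.

# Lists ordinary tables together with the name of their table access method.
ACCESS_METHOD_SQL = """
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       am.amname AS access_method
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  JOIN pg_am am ON am.oid = c.relam
 WHERE c.relkind = 'r'
"""


def tables_without_xmin(rows, columnar_am="columnar"):
    """Return (schema, table) pairs whose access method matches columnar_am.

    `rows` is an iterable of (schema_name, table_name, access_method) tuples,
    for example the result of running ACCESS_METHOD_SQL through a DB cursor.
    The access method name "columnar" is an assumption.
    """
    return [(schema, table) for schema, table, am in rows if am == columnar_am]
```

With rows such as `("public", "events", "columnar")` and `("public", "users", "heap")`, only the first table would be reported.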

@tuttle tuttle added the help wanted Extra attention is needed label Jul 6, 2022

tuttle commented Jul 6, 2022

Looking at the code, I was considering two options to solve this:

  1. add a disable_xmin_resuming bool option to skip requesting the xmin column from the database in sync_table,
    or
  2. extend the SELECT ... FROM pg_attribute, pg_class... in produce_table_info() to treat a columnar table as a view, as these tables are not updatable anyway.
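To make option 1 concrete, here is a rough sketch of how a query builder could honour such a flag. It is not the tap's actual code: `build_select_sql` and its parameters are hypothetical, and only the option name `disable_xmin_resuming` comes from the proposal above.

```python
# Sketch only: build_select_sql is a hypothetical helper, not a function
# that exists in the tap.

def quote_ident(ident):
    """Double-quote a PostgreSQL identifier, escaping embedded quotes."""
    return '"' + ident.replace('"', '""') + '"'


def build_select_sql(schema, table, columns, use_xmin=True):
    """Build a full-table SELECT, optionally selecting and ordering by xmin."""
    select_list = [quote_ident(c) for c in columns]
    if use_xmin:
        # Cast xmin via text to bigint so it can be stored as a bookmark.
        select_list.append("xmin::text::bigint")
    sql = "SELECT {} FROM {}.{}".format(
        ", ".join(select_list), quote_ident(schema), quote_ident(table)
    )
    if use_xmin:
        sql += " ORDER BY xmin::text::bigint ASC"
    return sql


# Assumed wiring: the proposed config option simply turns xmin off.
config = {"disable_xmin_resuming": True}
sql = build_select_sql(
    "public", "events", ["id", "payload"],
    use_xmin=not config.get("disable_xmin_resuming", False),
)
```

The trade-off is that without xmin there is no bookmark, so an interrupted full sync of such a table would have to start over from the beginning.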

I was also comparing the sync_table() and sync_view() functions. Here's the diff: https://gist.github.com/tuttle/e8595eebbf492dbe60ee9ca18dc92af6
While sync_view does not appear to use xmin, which is what I need, there are a few more differences. Some of them may exist only because sync_view was not updated when sync_table was extended. I'm not sure.

Would either of the two solutions mentioned be accepted as a PR, please? Are there any other solutions that come to mind?

halilduygulu commented

xmin is a problem even where it exists in PostgreSQL: ORDER BY xmin is very, very slow for tables with 100+ million rows.
For this reason I had to use PipelineWise FastSync, which runs COPY ... TO STDOUT on the tables and is at least 3x faster for a full copy of a table.
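The COPY-based approach can be sketched roughly as follows. This is not how FastSync itself is implemented. It only illustrates the idea, assuming a psycopg2-style cursor that offers `copy_expert(sql, file)`. The function name `copy_table_to_csv` is hypothetical, and whether this performs well on a Citus columnar table has not been verified here.

```python
# Sketch only: copy_table_to_csv is a hypothetical helper. It expects a
# psycopg2-style cursor exposing copy_expert(sql, file).

def quote_ident(ident):
    """Double-quote a PostgreSQL identifier, escaping embedded quotes."""
    return '"' + ident.replace('"', '""') + '"'


def copy_table_to_csv(cursor, schema, table, out_file):
    """Stream a whole table as CSV into out_file using COPY ... TO STDOUT.

    The COPY (SELECT ...) form is used so that no system column such as
    xmin is requested and no ORDER BY is needed.
    """
    sql = "COPY (SELECT * FROM {}.{}) TO STDOUT WITH (FORMAT CSV, HEADER)".format(
        quote_ident(schema), quote_ident(table)
    )
    cursor.copy_expert(sql, out_file)
    return sql
```

With a real connection this would be called as `copy_table_to_csv(conn.cursor(), "public", "events", open("events.csv", "w"))`, where the schema, table, and file names are placeholders.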
