Change Arrow extension type and metadata prefixes #3208

Merged: dominiklohmann merged 2 commits into `main` from `topic/rename-arrow-ext-type-and-metadata` on Jun 9, 2023.


dominiklohmann (Member)

We now register the Arrow extension types as `tenzir.ip`, `tenzir.subnet`, and `tenzir.enumeration` instead of `vast.address`, `vast.subnet`, and `vast.enumeration`, respectively. Type metadata now uses a `Tenzir:` prefix instead of a `VAST:` prefix.
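In other words, the rename is a fixed one-to-one mapping between old and new extension type names. As a hedged illustration only, the Python sketch below shows how a reader might accept both spellings; `LEGACY_EXTENSION_NAMES` and `canonical_extension_name` are hypothetical names and not part of Tenzir's code base or its Python bindings.

```python
# Hypothetical sketch, not Tenzir's implementation: map the legacy VAST
# extension type names to their Tenzir replacements, as listed in this PR.
LEGACY_EXTENSION_NAMES = {
    "vast.address": "tenzir.ip",
    "vast.subnet": "tenzir.subnet",
    "vast.enumeration": "tenzir.enumeration",
}

# The new names are exactly the values of the mapping above.
CURRENT_EXTENSION_NAMES = frozenset(LEGACY_EXTENSION_NAMES.values())


def canonical_extension_name(name: str) -> str:
    """Return the Tenzir extension type name for `name`.

    Accepts both the new `tenzir.*` names and the legacy `vast.*` names;
    raises KeyError for any other name.
    """
    if name in CURRENT_EXTENSION_NAMES:
        return name
    return LEGACY_EXTENSION_NAMES[name]


print(canonical_extension_name("vast.address"))   # tenzir.ip
print(canonical_extension_name("tenzir.subnet"))  # tenzir.subnet
```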

The change is mostly backwards compatible: the old type names and prefixes are still supported. However, reading old Apache Feather V2 and Apache Parquet files from the database directory with the Python bindings will not work until `tenzir-ctl rebuild --all` has been run.
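To illustrate what supporting both metadata prefixes can mean on the reading side, here is a minimal Python sketch. It assumes the metadata is already available as a plain `str`-to-`str` dict (pyarrow actually exposes schema and field metadata with `bytes` keys and values), and the function name `read_type_metadata` and the key suffix `name` are hypothetical, not taken from Tenzir.

```python
# Hypothetical sketch, not Tenzir's implementation: collect type metadata
# written with either the new "Tenzir:" prefix or the legacy "VAST:" prefix.
from __future__ import annotations

LEGACY_PREFIX = "VAST:"
CURRENT_PREFIX = "Tenzir:"


def read_type_metadata(metadata: dict[str, str]) -> dict[str, str]:
    """Strip the Tenzir/VAST prefix from matching keys and ignore all others.

    Legacy keys are processed first, so a "Tenzir:" key wins over a
    "VAST:" key with the same suffix.
    """
    result: dict[str, str] = {}
    for prefix in (LEGACY_PREFIX, CURRENT_PREFIX):
        for key, value in metadata.items():
            if key.startswith(prefix):
                result[key[len(prefix):]] = value
    return result


metadata = {
    "VAST:name": "old",
    "Tenzir:name": "new",
    "ARROW:extension:name": "tenzir.ip",  # Arrow's own key, left alone
}
print(read_type_metadata(metadata))  # {'name': 'new'}
```

Reading legacy keys first lets a `Tenzir:` key override a `VAST:` key with the same suffix. Keys outside both prefixes, such as Arrow's reserved `ARROW:extension:name`, are ignored.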

@dominiklohmann dominiklohmann requested a review from tobim June 8, 2023 16:07
@dominiklohmann dominiklohmann force-pushed the topic/rename-arrow-ext-type-and-metadata branch 2 times, most recently from 465f8d0 to 1bf8515, on June 8, 2023 16:19
@dominiklohmann dominiklohmann force-pushed the topic/rename-arrow-ext-type-and-metadata branch from 1bf8515 to 9c1d4ad on June 9, 2023 06:37
@mavam (Member) left a comment


The implementation looks good.

I talked with @tobim about the need for a full rebuild. Is this really needed?

Review thread on `libvast/src/format/arrow.cpp` (outdated, resolved)
@dominiklohmann (Member, Author)

> I talked with @tobim about the need for a full rebuild. Is this really needed?

If you want to read the files using our Python Arrow extension types, then yes; otherwise, no.

@mavam (Member) commented Jun 9, 2023

> If you want to read the files using our Python Arrow extension types, then yes; otherwise, no.

That seems like a reasonable tradeoff.

This seems to be the norm; for example, Arrow's own metadata keys use the `ARROW:` prefix.
@dominiklohmann dominiklohmann merged commit f286883 into main Jun 9, 2023
33 checks passed
@dominiklohmann dominiklohmann deleted the topic/rename-arrow-ext-type-and-metadata branch June 9, 2023 10:03