Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set VarZeroVec metadata byte to 4 #2278

Closed
wants to merge 10 commits into from
Closed

Conversation

sffc
Copy link
Member

@sffc sffc commented Jul 29, 2022

This fixes #1410 for the purposes of 1.0 data stability, by setting the metadata byte to 4. This enables us in post-1.0 to make the "flex" version of VarZeroVec a configurable feature for both library and datagen (i.e., a non-default feature zerovec/flex_vzv). Services can always provide metadata byte 4, but data packs for size-constrained devices will be able to use smaller widths.

@sffc sffc requested a review from Manishearth as a code owner July 29, 2022 01:59
@sffc
Copy link
Member Author

sffc commented Jul 29, 2022

Bench results (no change except 0.7% in the parsing function):

``` Benchmarking vzv_precompute/get/precomputed: Collecting 100 samples in estimated 5.000 vzv_precompute/get/precomputed time: [1.7117 ns 1.7154 ns 1.7204 ns] change: [-0.1460% +0.0893% +0.3296%] (p = 0.47 > 0.05) No change in performance detected. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe

Benchmarking vzv_precompute/get/slice: Collecting 100 samples in estimated 5.0000 s (2 vzv_precompute/get/slice
time: [1.8349 ns 1.8374 ns 1.8403 ns]
change: [-0.5059% -0.2485% +0.0018%] (p = 0.06 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe

 Running benches/zerovec.rs (/usr/local/google/home/sffc/projects/icu4x/target/release/deps/zerovec-2a682270ceaad1e3)

WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

 Running benches/zerovec_iai.rs (/usr/local/google/home/sffc/projects/icu4x/target/release/deps/zerovec_iai-d36a11db6e05c490)

sum_slice
Instructions: 81 (No change)
L1 Accesses: 86 (No change)
L2 Accesses: 4 (No change)
RAM Accesses: 6 (No change)
Estimated Cycles: 316 (No change)

sum_zerovec
Instructions: 89 (No change)
L1 Accesses: 93 (No change)
L2 Accesses: 4 (No change)
RAM Accesses: 7 (No change)
Estimated Cycles: 358 (No change)

binarysearch_slice
Instructions: 61 (No change)
L1 Accesses: 66 (No change)
L2 Accesses: 4 (No change)
RAM Accesses: 4 (No change)
Estimated Cycles: 226 (No change)

binarysearch_zerovec
Instructions: 68 (No change)
L1 Accesses: 73 (No change)
L2 Accesses: 4 (No change)
RAM Accesses: 4 (No change)
Estimated Cycles: 233 (No change)

varzeroslice_parse_get
Instructions: 410 (-0.726392%)
L1 Accesses: 513 (+0.588235%)
L2 Accesses: 4 (No change)
RAM Accesses: 14 (No change)
Estimated Cycles: 1023 (+0.294118%)

varzeroslice_get
Instructions: 28 (No change)
L1 Accesses: 33 (No change)
L2 Accesses: 4 (No change)
RAM Accesses: 4 (No change)
Estimated Cycles: 193 (No change)

varzeroslice_get_unchecked
Instructions: 29 (No change)
L1 Accesses: 34 (No change)
L2 Accesses: 4 (No change)
RAM Accesses: 4 (No change)
Estimated Cycles: 194 (No change)

</details>

@sffc sffc requested a review from a team as a code owner July 29, 2022 02:00
@Manishearth
Copy link
Member

Won't we lose the equality invariant with such a change in the future?

Manishearth
Manishearth previously approved these changes Jul 29, 2022
@sffc
Copy link
Member Author

sffc commented Jul 31, 2022

As discussed:

  • The equality invariant is only when comparing a VarZeroVec to another VarZeroVec
  • If the VarZeroVec comes from the same data file, all is okay, and it's most likely that both sides of an equality operation will originate in a data file, because constructing a VarZeroVec at runtime is costly, and we recommend get_by for such cases
  • It could be triggered by using a VarZeroVec as the key of a ZeroMap, but we currently only have string and fixed-sized keys
  • It could also be triggered by using a #[derive(VarULE)] on a type having more than one variable-width field, and then using equality operations on that type
  • If it's truly necessary, we can bump all the data struct version numbers when the change occurs, but it's probably fine so long as we make sure ICU4X code doesn't break
  • I will continue to work on this, but we are out of time for 1.0, and this is a quick way to give us flexibility; I would rather merge this PR and risk an equality invariant breakage than drop the FlexZeroVec change indefinitely

Manishearth
Manishearth previously approved these changes Jul 31, 2022
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you document the caveat on the equality invariant?

We don't maintain binary stability between unstable zerovec versions anyway, however this would be a care of it seeming like binary stability is maintained but that not actually happening; it's worth having a warning about that on ZeroVec and VarZeroVec now so there are no surprises.

@sffc
Copy link
Member Author

sffc commented Aug 1, 2022

I wrote some additional docs on both VarZeroVec and the equality section of VarULE.

The process of writing the docs made me think that another way to handle this would be for the DataRequest to request a VarZeroVec serialization version, in the DataRequestMetadata. The current version could be the "fast" version, and we could introduce a "small" version later that is a bit slower. The validate_byte_slice impl can enforce that the correct form is provided.

@Manishearth
Copy link
Member

test failures

@sffc
Copy link
Member Author

sffc commented Aug 2, 2022

This is superseded by #2306. However, there are a few bits and pieces of this PR (like documentation) that I'd like to restore, so I will keep the PR open as draft until I have time to do that.

@sffc sffc marked this pull request as draft August 2, 2022 21:56
@sffc
Copy link
Member Author

sffc commented Nov 10, 2022

This is superseded by #2312. See #1410

@sffc sffc closed this Nov 10, 2022
@sffc sffc deleted the vzv-fzv-july branch November 10, 2022 18:52
@sffc sffc restored the vzv-fzv-july branch November 10, 2022 18:52
@sffc
Copy link
Member Author

sffc commented Nov 10, 2022

Re-opening because I said above that I wanted to restore some docs.

@sffc sffc reopened this Nov 10, 2022
@sffc sffc self-assigned this Nov 10, 2022
@sffc sffc added this to the ICU4X 1.1 milestone Nov 10, 2022
sffc added a commit to sffc/omnicu that referenced this pull request Jan 12, 2023
@sffc sffc mentioned this pull request Jan 12, 2023
@sffc
Copy link
Member Author

sffc commented Jan 12, 2023

New PR: #2979

@sffc sffc closed this Jan 12, 2023
@sffc sffc deleted the vzv-fzv-july branch January 12, 2023 00:23
@sffc
Copy link
Member Author

sffc commented Jan 12, 2023

Archived in a branch named archive/vzv-fzv-july

robertbastian pushed a commit that referenced this pull request Jan 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VarZeroVec capacity and error handling
2 participants