Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ComptimeStringMap: return a regular struct and optimize #19719

Merged
merged 1 commit into from
Apr 22, 2024

Conversation

travisstaloch
Copy link
Contributor

@travisstaloch travisstaloch commented Apr 21, 2024

this patch renames ComptimeStringMap to StaticStringMap, makes it
accept only a single type parameter, and return a known struct type
instead of an anonymous struct. initial motivation for these changes
was to reduce the 'very long type names' issue described here
#19682.

this breaks the previous API. users will now need to write:
const map = std.StaticStringMap(T).initComptime(kvs_list);

  • move kvs_list param from type param to an initComptime() param
  • new public methods
    • keys(), values() helpers
    • init(allocator), deinit(allocator) for runtime data
    • getLongestPrefix(str), getLongestPrefixIndex(str) - i'm not sure
      these belong but have left in for now incase they are deemed useful
  • performance notes:
    • i posted some benchmarking results here:
      travisstaloch/comptime-string-map-revised#1
    • i noticed a speedup reducing the size of the struct from 48 to 32
      bytes and thus use u32s instead of usize for all length fields
    • i noticed speedup storing KVs as a struct of arrays
    • latest benchmark shows these wall_time improvements for
      debug/safe/small/fast builds: -6.6% / -10.2% / -19.1% / -8.9%. full
      output in link above.

@Vexu
Copy link
Member

Vexu commented Apr 21, 2024

Could the name be changed while we're breaking the API? Instead of ComptimeStringMap with initRuntime have something like StaticStringMap with initComptime?

@travisstaloch
Copy link
Contributor Author

travisstaloch commented Apr 21, 2024

sure no problem. will work on that now.

@travisstaloch
Copy link
Contributor Author

should I leave in initRuntime()? if so, should I rename it to init() now?

@travisstaloch travisstaloch force-pushed the comptime-str-map-revised branch 2 times, most recently from 3237536 to 7ae12a5 Compare April 21, 2024 08:11
@travisstaloch
Copy link
Contributor Author

Could the name be changed while we're breaking the API? Instead of ComptimeStringMap with initRuntime have something like StaticStringMap with initComptime?

This is done. We'll see how many typos i made 😄

@travisstaloch
Copy link
Contributor Author

travisstaloch commented Apr 21, 2024

CI doesn't like that I've renamed lib/std/comptime_string_map.zig to lib/std/static_string_map.zig

ninja: error: '../lib/std/comptime_string_map.zig', needed by 'zig2.c', missing and no known rule to make it

@travisstaloch
Copy link
Contributor Author

CI doesn't like that I've renamed lib/std/comptime_string_map.zig to lib/std/static_string_map.zig

For now, I've moved back to the original name so we can make progress.

@BratishkaErik
Copy link
Contributor

CI doesn't like that I've renamed lib/std/comptime_string_map.zig to lib/std/static_string_map.zig

ninja: error: '../lib/std/comptime_string_map.zig', needed by 'zig2.c', missing and no known rule to make it

I think you need to also change this line (in samencommit), so that CMake knows this "dependency" is renamed:

"${CMAKE_SOURCE_DIR}/lib/std/comptime_string_map.zig"

@travisstaloch
Copy link
Contributor Author

I think you need to also change this line (in samencommit), so that CMake knows this "dependency" is renamed:

Thanks! Just pushed this change.

@travisstaloch travisstaloch force-pushed the comptime-str-map-revised branch 2 times, most recently from cebaa3f to b7f3025 Compare April 21, 2024 10:51
@Vexu
Copy link
Member

Vexu commented Apr 21, 2024

if so, should I rename it to init() now?

That's what I was going for.

@travisstaloch
Copy link
Contributor Author

Sounds good! Thats an easy change.

Also, any thoughts about whether I should remove getPartial()? I found it very useful for tokenizing or parsing to have a way to check if a string starts with any key from the map and thats why I've left it in. If it should stay, any idea of a better name than getPartial(str)? Something like getLongestKVAtStartOf(str)? I'm struggling with the name.

@erikarvstedt
Copy link
Contributor

erikarvstedt commented Apr 21, 2024

getMaxPrefix

/// Returns the key-value pair for the longest key that is a prefix of str.

@travisstaloch
Copy link
Contributor Author

getMaxPrefix

Not bad. I think I'd prefer getLongestPrefix() 🤔

@travisstaloch
Copy link
Contributor Author

That's what I was going for.

@Vexu done - renamed initRuntime() to init(). Also renamed getPartial() to getLongestPrefix() and cleaned up tests and documentation a little more. I removed most of the duplicated testing of runtime init(), leaving only a couple. So the tests look a lot more like they did before this PR.

@travisstaloch
Copy link
Contributor Author

Perf point: building my simdjzon project with around 7000 loc zig files in src folder.

Note that I'm not super confident in my process so here are some notes:

  • zig below is a release build i built yesterday (../build-release/stage3/bin/zig) and is available in $PATH.
  • ../zig/stage4/bin/zig was built with ../build-release/stage3/bin/zig build -p stage4 -Denable-llvm -Dno-lib -Doptimize=ReleaseFast.
.../zig/simdjzon $ zig version
0.13.0-dev.8+c352845e8
.../zig/simdjzon $ ../zig/stage4/bin/zig version
0.13.0-dev.9+21a3a5710
.../zig/simdjzon $ poop 'zig build-exe -fno-emit-bin -ODebug --dep simdjzon -Mroot=src/main.zig --dep build_options -Msimdjzon=src/simdjzon.zig -Mbuild_options=zig-cache/c/8caf24ab22f7fa50f13c4c948a6b8cca/options.zig --cache-dir zig-cache --global-cache-dir /home/travis/.cache/zig --name simdjzon' '../zig/stage4/bin/zig build-exe -fno-emit-bin -ODebug --dep simdjzon -Mroot=src/main.zig --dep build_options -Msimdjzon=src/simdjzon.zig -Mbuild_options=zig-cache/c/8caf24ab22f7fa50f13c4c948a6b8cca/options.zig --cache-dir zig-cache --global-cache-dir /home/travis/.cache/zig --name simdjzon'
Benchmark 1 (36 runs): zig build-exe -fno-emit-bin -ODebug --dep simdjzon -Mroot=src/main.zig --dep build_options -Msimdjzon=src/simdjzon.zig -Mbuild_options=zig-cache/c/8caf24ab22f7fa50f13c4c948a6b8cca/options.zig --cache-dir zig-cache --global-cache-dir /home/travis/.cache/zig --name simdjzon
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           139ms ± 1.58ms     136ms …  144ms          1 ( 3%)        0%
  peak_rss            111MB ±  285KB     111MB …  112MB          1 ( 3%)        0%
  cpu_cycles          619M  ± 3.32M      612M  …  626M           4 (11%)        0%
  instructions        986M  ± 4.88K      986M  …  986M           2 ( 6%)        0%
  cache_references   58.8M  ±  356K     58.1M  … 59.5M           0 ( 0%)        0%
  cache_misses       8.02M  ±  106K     7.85M  … 8.24M           0 ( 0%)        0%
  branch_misses      4.60M  ± 28.0K     4.54M  … 4.68M           1 ( 3%)        0%
Benchmark 2 (41 runs): ../zig/stage4/bin/zig build-exe -fno-emit-bin -ODebug --dep simdjzon -Mroot=src/main.zig --dep build_options -Msimdjzon=src/simdjzon.zig -Mbuild_options=zig-cache/c/8caf24ab22f7fa50f13c4c948a6b8cca/options.zig --cache-dir zig-cache --global-cache-dir /home/travis/.cache/zig --name simdjzon
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           123ms ± 1.24ms     121ms …  128ms          1 ( 2%)        ⚡- 11.5% ±  0.5%
  peak_rss            106MB ±  178KB     106MB …  106MB          2 ( 5%)        ⚡-  4.5% ±  0.1%
  cpu_cycles          465M  ± 3.15M      461M  …  479M           2 ( 5%)        ⚡- 24.8% ±  0.2%
  instructions        718M  ± 4.03K      718M  …  718M           0 ( 0%)        ⚡- 27.1% ±  0.0%
  cache_references   52.7M  ±  342K     52.1M  … 53.5M           0 ( 0%)        ⚡- 10.4% ±  0.3%
  cache_misses       7.56M  ±  133K     7.30M  … 7.96M           2 ( 5%)        ⚡-  5.8% ±  0.7%
  branch_misses      3.13M  ± 19.2K     3.08M  … 3.18M           3 ( 7%)        ⚡- 31.9% ±  0.2%

This seems too good to be true. Can anyone spot any mistakes or maybe try reproducing with this branch and building a different project?

@travisstaloch
Copy link
Contributor Author

travisstaloch commented Apr 22, 2024

Perf point: building my simdjzon project with around 7000 loc zig files in src folder.

I got some feedback on discord from @jacobly0 @nektro and @Sinon (sorry forget your github handle).

if you want to benchmark code speed it's better to do it at runtime in a loop in an executable and benchmark the executable

The consensus I think is that this benchmark isn't reliable since there are too many moving parts such as caching. So I believe my previous benchmarks are a better measurement.

lib/std/static_string_map.zig Outdated Show resolved Hide resolved
this patch renames ComptimeStringMap to StaticStringMap, makes it
accept only a single type parameter, and return a known struct type
instead of an anonymous struct.  initial motivation for these changes
was to reduce the 'very long type names' issue described here
ziglang#19682.

this breaks the previous API.  users will now need to write:
`const map = std.StaticStringMap(T).initComptime(kvs_list);`

* move `kvs_list` param from type param to an `initComptime()` param
* new public methods
  * `keys()`, `values()` helpers
  * `init(allocator)`, `deinit(allocator)` for runtime data
  * `getLongestPrefix(str)`, `getLongestPrefixIndex(str)` - i'm not sure
     these belong but have left in for now incase they are deemed useful
* performance notes:
  * i posted some benchmarking results here:
    https://github.com/travisstaloch/comptime-string-map-revised/issues/1
  * i noticed a speedup reducing the size of the struct from 48 to 32
    bytes and thus use u32s instead of usize for all length fields
  * i noticed speedup storing KVs as a struct of arrays
  * latest benchmark shows these wall_time improvements for
    debug/safe/small/fast builds: -6.6% / -10.2% / -19.1% / -8.9%. full
    output in link above.
Comment on lines +71 to +73
pub inline fn initComptime(comptime kvs_list: anytype) Self {
comptime {
@setEvalBranchQuota(30 * kvs_list.len);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vexu done. This seems to be just big enough. I've added another test at bottom with a kvs_list of length 1000.

Comment on lines +215 to +216
/// Returns the longest key, value pair where key is a prefix of `str`
/// else null.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This makes it sound as if the value contributes to the length calculation.

Suggested change
/// Returns the longest key, value pair where key is a prefix of `str`
/// else null.
/// Returns the key-value pair where key is the longest prefix of `str`
/// else null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. 👍 . I'll make sure to change this next review if getLongestPrefix() is approved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah merged already. Will have to be a follow up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added these changes in #19823 along with a note about time complexity

@andrewrk
Copy link
Member

andrewrk commented Apr 22, 2024

EDIT: note to readers - this commit message is old Please read the current, updated one.

Can you edit your PR description to be correct so that I can use it for release notes in the future?

nice work, btw.

@andrewrk andrewrk added standard library This issue involves writing Zig code for the standard library. release notes This PR should be mentioned in the release notes. labels Apr 22, 2024
@andrewrk andrewrk merged commit 8af59d1 into ziglang:master Apr 22, 2024
10 checks passed
@travisstaloch
Copy link
Contributor Author

Can you edit your PR description to be correct so that I can use it for release notes in the future?

Thanks! Done. And will do.

@travisstaloch travisstaloch deleted the comptime-str-map-revised branch April 23, 2024 03:03
elasticdog added a commit to EarthmanMuons/clefcraft that referenced this pull request Apr 26, 2024
Not only did the name change, but the initialization procedure and
types also needed to be tweaked.

This made me realize that with my other memory management changes, the
attempts to lookup shorthands for chromatic and whole-tone patterns was
failing. That was due to the Pitch.asText() return value going out of
scope, so it was always blank. Instead of trying to use a static array
and be tricky, we'll pass in an allocator like I probably should have to
begin with.

See:
- ziglang/zig#19719
MFAshby added a commit to MFAshby/zig-lsp-kit that referenced this pull request May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes This PR should be mentioned in the release notes. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants