mem: add splitBackwards #11908

motiejus · 2022-06-22T11:58:43Z

Over the last couple of weeks I needed to iterate over a
collection backwards at least twice. Do we want to have this in stdlib?
If yes, click "Merge" and start using today! Free shipping and returns
(before 1.0).

Why is this useful?

I need this for building an error wrapper: errors are added to the
wrapper from the "lowest" level to the "highest" level, and then printed in
reverse order. Imagine an `UpdateUsers` call that needs to return
`error.InvalidInput` together with wrappable error context. In Go we would add
context to the error when returning it:

// if update_user fails, add context on which user we are operating
if err := update_user(user); err != nil {
    return fmt.Errorf("user id=%d: %w", user.id, err)
}

Since Zig cannot pass anything other than a u16 error code with an error (#2647), I
will pass an `err_ctx: *Err` down to the callees, so that besides
returning an error they can augment it with auxiliary data. `Err` is a
preallocated array to which zero-byte-separated strings can be appended. For a
concrete example, imagine this call graph:

update_user(*Err, User) error{InvalidInput}!<...>
  validate_user(*Err, User) error{InvalidInput}!<...>

Here `validate_user` would like to signal the invalid field in addition to
returning the error, and `update_user` would like to signal the offending
user id.
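`Err` is only described in prose here. A minimal sketch of such a buffer, assuming a fixed 256-byte capacity and that a message which does not fit is silently dropped (both hypothetical choices, as are the field names and the `bytes` helper), could look like this. `print` returns `void` so call sites can omit `try`, as the snippets in this description do:

```zig
const std = @import("std");

/// Hypothetical error-context buffer: a preallocated byte array to which
/// callees append formatted messages, separated by a zero byte.
pub const Err = struct {
    buf: [256]u8 = undefined,
    len: usize = 0,

    /// Appends one formatted message. Returns void so callers need no `try`;
    /// in this sketch a message that does not fit is silently dropped.
    pub fn print(self: *Err, comptime fmt: []const u8, args: anytype) void {
        var pos = self.len;
        if (pos > 0) {
            // Separate this message from the previous one with a zero byte.
            if (pos >= self.buf.len) return;
            self.buf[pos] = 0;
            pos += 1;
        }
        // On error.NoSpaceLeft, len is left unchanged, so nothing is committed.
        const written = std.fmt.bufPrint(self.buf[pos..], fmt, args) catch return;
        self.len = pos + written.len;
    }

    /// All messages appended so far, first appended (lowest level) first.
    pub fn bytes(self: *const Err) []const u8 {
        return self.buf[0..self.len];
    }
};
```

Because each callee appends after whatever is already there, the lowest-level message ends up first in the buffer, which is why the top level has to read it backwards.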

We also don't want the low-level functions to need to know the context in
which they are operating in order to construct a meaningful error message: if
validation fails, each one appends only its own "context" to the buffer.
Translating and extending the Go example above:

pub fn validate_user(err_ctx: *Err, user: User) error{InvalidInput}!void {
    const name = user.name;
    // ascii.isAlpha checks a single byte, so check every byte of the name
    for (name) |c| {
        if (!ascii.isAlpha(c)) {
            err_ctx.print("name '{s}' must be ascii-letters only", .{name});
            return error.InvalidInput;
        }
    }
    <...>
}

// update_user validates each user and does something with it.
pub fn update_user(err_ctx: *Err, user: User) error{InvalidInput}!void {
    // validate the user before updating it; on failure, add the user id as context
    validate_user(err_ctx, user) catch {
        err_ctx.print("user id={d}", .{user.id});
        return error.InvalidInput;
    };
    <...>
}

Then the top-level function (in my case, the CLI) reads the buffer
backwards, splitting on `"\x00"`, joins the pieces with ": " and prints:

user id=123: name 'Žvangalas' must be ascii-letters only
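A sketch of that top-level step, assuming the buffer holds the messages oldest (lowest level) first as described above. The helper name `formatChain` is hypothetical, and it performs the backwards split by hand with `std.mem.lastIndexOfScalar` rather than with `mem.splitBackwards`:

```zig
const std = @import("std");

/// Hypothetical helper. `chain` holds zero-byte-separated messages, lowest
/// level first; they are written into `out` newest first, joined by ": ".
fn formatChain(out: []u8, chain: []const u8) ![]u8 {
    var len: usize = 0;
    var end: usize = chain.len;
    var first = true;
    while (true) {
        // Start of the last message not yet written: just past the previous
        // zero byte, or the beginning of the buffer if there is none.
        const start = if (std.mem.lastIndexOfScalar(u8, chain[0..end], 0)) |i| i + 1 else 0;
        const sep: []const u8 = if (first) "" else ": ";
        first = false;
        const written = try std.fmt.bufPrint(out[len..], "{s}{s}", .{ sep, chain[start..end] });
        len += written.len;
        if (start == 0) break;
        end = start - 1; // step over the zero byte
    }
    return out[0..len];
}
```

Writing into a caller-provided slice keeps the sketch allocation-free, in the same spirit as the preallocated `Err` buffer.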

To read that buffer backwards, dear readers of this commit message, I
need mem.splitBackwards.
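For readers unfamiliar with the idea, here is a minimal, hypothetical re-implementation of a backwards split iterator; it is not the code merged in this PR, and the names only mirror the shape of the forward `split` API. It keeps an `index` marking the end of the not-yet-returned prefix and uses `std.mem.lastIndexOf` to find the previous delimiter. Like the forward `split`, a buffer that does not contain the delimiter yields the whole buffer, then `null`:

```zig
const std = @import("std");

/// Hypothetical iterator: yields the pieces of `buffer` between occurrences
/// of `delimiter`, starting from the end.
fn SplitBackwardsIterator(comptime T: type) type {
    return struct {
        buffer: []const T,
        /// End (exclusive) of the prefix not yet returned; null once exhausted.
        index: ?usize,
        delimiter: []const T,

        const Self = @This();

        /// Returns the next piece, scanning from the back, or null when done.
        pub fn next(self: *Self) ?[]const T {
            const end = self.index orelse return null;
            if (std.mem.lastIndexOf(T, self.buffer[0..end], self.delimiter)) |delim_start| {
                self.index = delim_start;
                return self.buffer[delim_start + self.delimiter.len .. end];
            }
            // No delimiter left: the remaining prefix is the final piece.
            self.index = null;
            return self.buffer[0..end];
        }

        /// Returns everything not yet returned by `next`, without consuming it.
        pub fn rest(self: Self) []const T {
            const end = self.index orelse 0;
            return self.buffer[0..end];
        }
    };
}

/// Hypothetical constructor; the delimiter must not be empty.
fn splitBackwards(comptime T: type, buffer: []const T, delimiter: []const T) SplitBackwardsIterator(T) {
    std.debug.assert(delimiter.len != 0);
    return .{ .buffer = buffer, .index = buffer.len, .delimiter = delimiter };
}
```

In this direction `rest()` returns the unconsumed prefix rather than a suffix; for example, after the first `next()` on `"a,b,c"` it returns `"a,b"`.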

This PR has 2 commits:

- mem: refactor tests of split(): use expectEqualSlices() and add a few cases for .rest().
- mem: add splitBackwards, with tests similar to those for split().

InKryption · 2022-06-22T12:15:02Z

lib/std/mem.zig

+/// the iterator will return `buffer`, null, in that order.
+/// The delimiter length must not be zero.
+/// See also the related function `tokenize`.
+pub fn split_rev(comptime T: type, buffer: []const T, delimiter: []const T) SplitIteratorRev(T) {


Function names are expected to be camelCase: https://ziglang.org/documentation/master/#Style-Guide

motiejus: That's what happens when you need to switch programming languages more than once a day.

motiejus · 2022-06-22T12:22:35Z

Looks like GitHub is having trouble with its cache: I renamed split_rev to splitRev in both the title and the description, but the change is still not reflected in the UI.

Luukdegram · 2022-06-22T12:24:14Z

I don't want to get into bikeshedding territory but I wonder if it should be named splitReversed fully written out, similarly to how we have copyBackwards.

motiejus · 2022-06-22T12:38:24Z

> I don't want to get into bikeshedding territory but I wonder if it should be named splitReversed fully written out, similarly to how we have copyBackwards.

I like the color of your bike shed better; updating to splitReversed.

motiejus · 2022-06-23T03:06:02Z

> I don't want to get into bikeshedding territory but I wonder if it should be named splitReversed fully written out, similarly to how we have copyBackwards.

Repainted again, so splitBackwards is now consistent with copyBackwards.

motiejus force-pushed the split_rev branch 5 times, most recently from 0cff075 to 4bab160, June 22, 2022 12:05

InKryption reviewed Jun 22, 2022

motiejus force-pushed the split_rev branch from 4bab160 to 7ef58b7, June 22, 2022 12:17

Luukdegram requested changes Jun 22, 2022

motiejus changed the title from "mem: add split_rev" to "mem: add splitRev" Jun 22, 2022

motiejus force-pushed the split_rev branch from 7ef58b7 to c3a6888, June 22, 2022 12:18

motiejus requested a review from Luukdegram June 22, 2022 12:22

Luukdegram approved these changes Jun 22, 2022

motiejus force-pushed the split_rev branch from c3a6888 to 4bd7406, June 22, 2022 12:38

motiejus changed the title from "mem: add splitRev" to "mem: add splitReversed" Jun 22, 2022

motiejus force-pushed the split_rev branch from 4bd7406 to 91e95d0, June 23, 2022 03:05

motiejus changed the title from "mem: add splitReversed" to "mem: add splitBackwards" Jun 23, 2022

mem: refactor tests of split() (3c8e391)

- add a few cases for .rest()
- use expectEqualSlices()

motiejus force-pushed the split_rev branch from 91e95d0 to fc7cf60, June 26, 2022 04:11

motiejus force-pushed the split_rev branch from fc7cf60 to a75b436, June 26, 2022 18:10

jedisct1 merged commit 4a6b70f into ziglang:master Jun 29, 2022

motiejus deleted the split_rev branch November 1, 2022 20:41
