Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mem: add splitBackwards #11908

Merged
merged 2 commits into from
Jun 29, 2022
Merged

mem: add splitBackwards #11908

merged 2 commits into from
Jun 29, 2022

Conversation

motiejus
Copy link
Contributor

@motiejus motiejus commented Jun 22, 2022

Over the last couple of weeks weeks I needed to iterate over a
collection backwards at least twice. Do we want to have this in stdlib?
If yes, click "Merge" and start using today! Free shipping and returns
(before 1.0).

Why is this useful?

I need this for building an error wrapper: errors are added in the
wrapper from "lowest" level to "highest" level, and then printed in
reverse order. Imagine UpdateUsers call, which needs to return
error.InvalidInput and a wrappable error context. In Go we would add a
context to the error when returning it:

// if update_user fails, add context on which user we are operating
if err := update_user(user); err != nil {
    return fmt.Errorf("user id=%d: %w", user.id, err)
}

Since Zig cannot pass anything else than u16 with an error (#2647), I
will pass a err_ctx: *Err, to the callers, where they can, besides
returning an error, augment it with auxiliary data. Err is a
preallocated array that can add zero-byte-separated strings. For a
concrete example, imagine such a call graph:

update_user(User, *Err) error{InvalidInput}!<...>
  validate_user([]const u8, *Err) error{InvalidInput}!<...>

Where validate_user would like, besides only the error, signal the
invalid field. And update_user, besides the error, would signal the
offending user id.

We also don't want the low-level functions to know in which context they
are operating to construct a meaningful error message: if validation
fails, they append their "context" to the buffer. To translate/augment
the Go example above:

pub fn validate_user(err_ctx: *Err, user: User) error{InvalidInput}!void {
    const name = user.name;
    if (!ascii.isAlpha(user.name)) {
        err_ctx.print("name '{s}' must be ascii-letters only", .{user.name});
        return error.InvalidInput;
    }
    <...>
}

// update_user validates each user and does something with it.
pub fn update_user(err_ctx: *Err, user: User) error{InvalidInput}!void {
    // validate the user before updating it
    validate_user(user) catch {
        err_ctx.print("user id={d}", .{user.id});
        return error.InvalidInput;
    };
    <...>
}

Then the top-level function (in my case, CLI) will read the buffer
backwards (splitting on "\x00") and print:

user id=123: name 'Žvangalas' must be ascii-letters only

To read that buffer backwards, dear readers of this commit message, I
need mem.splitBackwards.

This PR has 2 commits:

  1. refactor split tests to use assertEqualSlices.
  2. refactor split tests to use rest() occasionally.
  3. add splitBackwards with similar tests to split.

@motiejus motiejus force-pushed the split_rev branch 5 times, most recently from 0cff075 to 4bab160 Compare June 22, 2022 12:05
lib/std/mem.zig Outdated
/// the iterator will return `buffer`, null, in that order.
/// The delimiter length must not be zero.
/// See also the related function `tokenize`.
pub fn split_rev(comptime T: type, buffer: []const T, delimiter: []const T) SplitIteratorRev(T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function names are expected to be camelCase: https://ziglang.org/documentation/master/#Style-Guide

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what happens when you need to change the programming language more than once a day.

lib/std/mem.zig Outdated Show resolved Hide resolved
lib/std/mem.zig Outdated Show resolved Hide resolved
@motiejus motiejus changed the title mem: add split_rev mem: add splitRev Jun 22, 2022
@motiejus
Copy link
Contributor Author

Looks like github is having trouble with its cache: I renamed split_rev to splitRev in both title and description, and it is still not reflected in the UI.

@motiejus motiejus requested a review from Luukdegram June 22, 2022 12:22
@Luukdegram
Copy link
Sponsor Member

Luukdegram commented Jun 22, 2022

I don't want to get into bikeshedding territory but I wonder if it should be named splitReversed fully written out, similarly to how we have copyBackwards.

@motiejus
Copy link
Contributor Author

motiejus commented Jun 22, 2022

I don't want to get into bikeshedding territory but I wonder if it should be named splitReversed fully written out, similarly to how we have copyBackwards.

I like the color of your bike shed better; updating to splitReversed.

@motiejus motiejus changed the title mem: add splitRev mem: add splitReversed Jun 22, 2022
@motiejus motiejus changed the title mem: add splitReversed mem: add splitBackwards Jun 23, 2022
@motiejus
Copy link
Contributor Author

motiejus commented Jun 23, 2022

I don't want to get into bikeshedding territory but I wonder if it should be named splitReversed fully written out, similarly to how we have copyBackwards.

Repainted again, so splitBackwards is now consistent with copyBackwards.

- add a few cases for .rest()
- use expectEqualSlices()
Over the last couple of weeks weeks I needed to iterate over a
collection backwards at least twice. Do we want to have this in stdlib?
If yes, click "Merge" and start using today! Free shipping and returns
(before 1.0).

Why is this useful?
-------------------

I need this for building an error wrapper: errors are added in the
wrapper from "lowest" level to "highest" level, and then printed in
reverse order. Imagine `UpdateUsers` call, which needs to return
`error.InvalidInput` and a wrappable error context. In Go we would add a
context to the error when returning it:

    // if update_user fails, add context on which user we are operating
    if err := update_user(user); err != nil {
        return fmt.Errorf("user id=%d: %w", user.id, err)
    }

Since Zig cannot pass anything else than u16 with an error (ziglang#2647), I
will pass a `err_ctx: *Err`, to the callers, where they can, besides
returning an error, augment it with auxiliary data. `Err` is a
preallocated array that can add zero-byte-separated strings. For a
concrete example, imagine such a call graph:

    update_user(User, *Err) error{InvalidInput}!<...>
      validate_user([]const u8, *Err) error{InvalidInput}!<...>

Where `validate_user` would like, besides only the error, signal the
invalid field. And `update_user`, besides the error, would signal the
offending user id.

We also don't want the low-level functions to know in which context they
are operating to construct a meaningful error message: if validation
fails, they append their "context" to the buffer. To translate/augment
the Go example above:

    pub fn validate_user(err_ctx: *Err, user: User) error{InvalidInput}!void {
        const name = user.name;
        if (!ascii.isAlpha(name)) {
            err_ctx.print("name '{s}' must be ascii-letters only", .{name});
            return error.InvalidInput;
        }
        <...>
    }

    // update_user validates each user and does something with it.
    pub fn update_user(err_ctx: *Err, user: User) error{InvalidInput}!void {
        // validate the user before updating it
        validate_user(user) catch {
            err_ctx.print("user id={d}", .{user.id});
            return error.InvalidInput;
        };
        <...>
    }

Then the top-level function (in my case, CLI) will read the buffer
backwards (splitting on `"\x00"`) and print:

    user id=123: name 'Žvangalas' must be ascii-letters only

To read that buffer backwards, dear readers of this commit message, I
need `mem.splitBackwards`.
@jedisct1 jedisct1 merged commit 4a6b70f into ziglang:master Jun 29, 2022
andrewrk pushed a commit that referenced this pull request Jul 19, 2022
* mem: refactor tests of split()

- add a few cases for .rest()
- use expectEqualSlices()

* mem: add splitBackwards

Over the last couple of weeks weeks I needed to iterate over a
collection backwards at least twice. Do we want to have this in stdlib?
If yes, click "Merge" and start using today! Free shipping and returns
(before 1.0).

Why is this useful?
-------------------

I need this for building an error wrapper: errors are added in the
wrapper from "lowest" level to "highest" level, and then printed in
reverse order. Imagine `UpdateUsers` call, which needs to return
`error.InvalidInput` and a wrappable error context. In Go we would add a
context to the error when returning it:

    // if update_user fails, add context on which user we are operating
    if err := update_user(user); err != nil {
        return fmt.Errorf("user id=%d: %w", user.id, err)
    }

Since Zig cannot pass anything else than u16 with an error (#2647), I
will pass a `err_ctx: *Err`, to the callers, where they can, besides
returning an error, augment it with auxiliary data. `Err` is a
preallocated array that can add zero-byte-separated strings. For a
concrete example, imagine such a call graph:

    update_user(User, *Err) error{InvalidInput}!<...>
      validate_user([]const u8, *Err) error{InvalidInput}!<...>

Where `validate_user` would like, besides only the error, signal the
invalid field. And `update_user`, besides the error, would signal the
offending user id.

We also don't want the low-level functions to know in which context they
are operating to construct a meaningful error message: if validation
fails, they append their "context" to the buffer. To translate/augment
the Go example above:

    pub fn validate_user(err_ctx: *Err, user: User) error{InvalidInput}!void {
        const name = user.name;
        if (!ascii.isAlpha(name)) {
            err_ctx.print("name '{s}' must be ascii-letters only", .{name});
            return error.InvalidInput;
        }
        <...>
    }

    // update_user validates each user and does something with it.
    pub fn update_user(err_ctx: *Err, user: User) error{InvalidInput}!void {
        // validate the user before updating it
        validate_user(user) catch {
            err_ctx.print("user id={d}", .{user.id});
            return error.InvalidInput;
        };
        <...>
    }

Then the top-level function (in my case, CLI) will read the buffer
backwards (splitting on `"\x00"`) and print:

    user id=123: name 'Žvangalas' must be ascii-letters only

To read that buffer backwards, dear readers of this commit message, I
need `mem.splitBackwards`.
wooster0 pushed a commit to wooster0/zig that referenced this pull request Jul 24, 2022
* mem: refactor tests of split()

- add a few cases for .rest()
- use expectEqualSlices()

* mem: add splitBackwards

Over the last couple of weeks weeks I needed to iterate over a
collection backwards at least twice. Do we want to have this in stdlib?
If yes, click "Merge" and start using today! Free shipping and returns
(before 1.0).

Why is this useful?
-------------------

I need this for building an error wrapper: errors are added in the
wrapper from "lowest" level to "highest" level, and then printed in
reverse order. Imagine `UpdateUsers` call, which needs to return
`error.InvalidInput` and a wrappable error context. In Go we would add a
context to the error when returning it:

    // if update_user fails, add context on which user we are operating
    if err := update_user(user); err != nil {
        return fmt.Errorf("user id=%d: %w", user.id, err)
    }

Since Zig cannot pass anything else than u16 with an error (ziglang#2647), I
will pass a `err_ctx: *Err`, to the callers, where they can, besides
returning an error, augment it with auxiliary data. `Err` is a
preallocated array that can add zero-byte-separated strings. For a
concrete example, imagine such a call graph:

    update_user(User, *Err) error{InvalidInput}!<...>
      validate_user([]const u8, *Err) error{InvalidInput}!<...>

Where `validate_user` would like, besides only the error, signal the
invalid field. And `update_user`, besides the error, would signal the
offending user id.

We also don't want the low-level functions to know in which context they
are operating to construct a meaningful error message: if validation
fails, they append their "context" to the buffer. To translate/augment
the Go example above:

    pub fn validate_user(err_ctx: *Err, user: User) error{InvalidInput}!void {
        const name = user.name;
        if (!ascii.isAlpha(name)) {
            err_ctx.print("name '{s}' must be ascii-letters only", .{name});
            return error.InvalidInput;
        }
        <...>
    }

    // update_user validates each user and does something with it.
    pub fn update_user(err_ctx: *Err, user: User) error{InvalidInput}!void {
        // validate the user before updating it
        validate_user(user) catch {
            err_ctx.print("user id={d}", .{user.id});
            return error.InvalidInput;
        };
        <...>
    }

Then the top-level function (in my case, CLI) will read the buffer
backwards (splitting on `"\x00"`) and print:

    user id=123: name 'Žvangalas' must be ascii-letters only

To read that buffer backwards, dear readers of this commit message, I
need `mem.splitBackwards`.
@motiejus motiejus deleted the split_rev branch November 1, 2022 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants