@andrewrk andrewrk commented Sep 24, 2025

  1. Introduce cut functions: cut, cutPrefix, cutSuffix, cutScalar, cutLast, cutLastScalar

  2. Moving towards our function naming convention of having one word per concept and constructing function names out of concatenated concepts.

In std.mem the concepts are:
* "find" - return index of substring
* "pos" - starting index parameter
* "last" - search from the end
* "linear" - simple for loop rather than fancy algo
* "scalar" - substring is a single element
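
To make the concepts concrete, here is a minimal sketch that exercises each one. It deliberately uses the pre-rename `std.mem` names (`indexOf`, `indexOfScalar`, and so on), since those exist in recent Zig releases; how each call maps onto a concept-based "find" name is an assumption based on the list above, not a confirmed signature.

```zig
const std = @import("std");
const mem = std.mem;
const expectEqual = std.testing.expectEqual;

test "std.mem search concepts" {
    const haystack = "a=b=c";

    // "find": index of the first occurrence of a substring.
    try expectEqual(@as(?usize, 1), mem.indexOf(u8, haystack, "="));

    // "scalar": the needle is a single element rather than a slice.
    try expectEqual(@as(?usize, 1), mem.indexOfScalar(u8, haystack, '='));

    // "pos": begin the search at the given starting index (here 2).
    try expectEqual(@as(?usize, 3), mem.indexOfPos(u8, haystack, 2, "="));

    // "last": search from the end of the haystack.
    try expectEqual(@as(?usize, 3), mem.lastIndexOf(u8, haystack, "="));

    // "linear": a simple loop rather than a more elaborate algorithm.
    try expectEqual(@as(?usize, 1), mem.indexOfLinear(u8, haystack, "="));
}
```

Each concept changes exactly one aspect of the call (needle type, start position, direction, or algorithm), which is what makes concatenating them into a name workable.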

@andrewrk andrewrk enabled auto-merge September 25, 2025 00:54
matklad commented Sep 25, 2025

We have similar cut, cut_prefix, and cut_suffix functions in TigerBeetle, though not generic.

const std = @import("std");

/// Splits the `haystack` around the first occurrence of `needle`, returning parts before and after.
///
/// This is a Zig version of Go's `strings.Cut` / Rust's `str::split_once`. Cut turns out to be a
/// surprisingly versatile primitive for ad-hoc string processing. Often `std.mem.indexOf` and
/// `std.mem.split` can be replaced with shorter and clearer code using `cut`.
pub fn cut(haystack: []const u8, needle: []const u8) ?struct { []const u8, []const u8 } {
    const index = std.mem.indexOf(u8, haystack, needle) orelse return null;

    return .{ haystack[0..index], haystack[index + needle.len ..] };
}

pub fn cut_prefix(haystack: []const u8, needle: []const u8) ?[]const u8 {
    if (std.mem.startsWith(u8, haystack, needle)) {
        return haystack[needle.len..];
    }
    return null;
}

pub fn cut_suffix(haystack: []const u8, needle: []const u8) ?[]const u8 {
    if (std.mem.endsWith(u8, haystack, needle)) {
        return haystack[0 .. haystack.len - needle.len];
    }
    return null;
}

I have a strong opinion that cut in particular is just awesome, it's a swiss-army knife of ad-hoc string processing. golang/go#46336 has a bunch of examples demonstrating Cut's utility. For reference, rust-lang/rust#74707 added an equivalent function to Rust (I now regret not coming up with the "cut" name at that time).
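
As a rough illustration of that versatility, the sketch below uses a local copy of the `cut` helper quoted above to split a header-style line exactly once. The input string and the `error.MissingSeparator` name are made up for the example.

```zig
const std = @import("std");

// Local copy of the `cut` helper quoted above, so the example is self-contained.
fn cut(haystack: []const u8, needle: []const u8) ?struct { []const u8, []const u8 } {
    const index = std.mem.indexOf(u8, haystack, needle) orelse return null;
    return .{ haystack[0..index], haystack[index + needle.len ..] };
}

test "cut splits a line around the first separator only" {
    // Hypothetical input: the separator also occurs inside the value.
    const line = "Content-Type: text/plain: really";

    const name, const value = cut(line, ": ") orelse return error.MissingSeparator;
    try std.testing.expectEqualStrings("Content-Type", name);
    try std.testing.expectEqualStrings("text/plain: really", value);

    // No separator means no split at all, which the caller must handle.
    try std.testing.expect(cut("no separator here", ": ") == null);
}
```

Compared with `indexOf` plus manual slicing, the caller never touches an index, and the missing-separator case is forced into the open by the optional return type.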

Specific suggestions:

  • Consider adding a "chomp the middle away" variant, in addition to prefix/suffix
  • Consider using the cut name, I think Go got it right!

src/main.zig Outdated
Comment on lines 1080 to 1082
var it = mem.splitScalar(u8, rest, '=');
const mod_name = it.first();
const root_src_orig = if (it.peek() != null) it.rest() else null;
Contributor

const mod_name, const root_src_orig = mem.cut(u8, rest, "=") orelse .{ rest, null };

would be one example of cut being useful.

Member Author

it would need to be cutScalar in order to match performance characteristics
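
A minimal sketch of what a scalar variant could look like for the `name=path` case discussed here. The `cutScalar` below is a local stand-in built on `std.mem.indexOfScalar`, not the function added by this PR, and its signature is an assumption; `ModArg` and `parseModArg` are hypothetical names for illustration. The missing-separator fallback is written out as an explicit branch so that the second field can be optional.

```zig
const std = @import("std");

// Local sketch: the needle is a single element, so the search can use
// `indexOfScalar` instead of the general substring search.
fn cutScalar(comptime T: type, haystack: []const T, needle: T) ?struct { []const T, []const T } {
    const index = std.mem.indexOfScalar(T, haystack, needle) orelse return null;
    return .{ haystack[0..index], haystack[index + 1 ..] };
}

// Hypothetical result type for a `name[=root_src]` argument.
const ModArg = struct { name: []const u8, root_src: ?[]const u8 };

fn parseModArg(rest: []const u8) ModArg {
    if (cutScalar(u8, rest, '=')) |parts| {
        return .{ .name = parts[0], .root_src = parts[1] };
    }
    // No '=' present: the whole argument is the module name.
    return .{ .name = rest, .root_src = null };
}

test "parseModArg" {
    const with_src = parseModArg("foo=src/foo.zig");
    try std.testing.expectEqualStrings("foo", with_src.name);
    try std.testing.expectEqualStrings("src/foo.zig", with_src.root_src.?);

    const without_src = parseModArg("foo");
    try std.testing.expectEqualStrings("foo", without_src.name);
    try std.testing.expect(without_src.root_src == null);
}
```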

fixes a bug in how -fstructured-cfg and -fno-structured-cfg are handled.
@andrewrk
Member Author

  • switched to "cut" naming
  • renamed all "index of" names to "find"
  • added "cut" along with "scalar" and "last" variants
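
For a sense of what combining "cut" with "last" means, here is a small sketch that splits around the final occurrence of the needle instead of the first. The `cutLast` below is a local illustration built on `std.mem.lastIndexOf`; the signature of the function actually added by this PR may differ.

```zig
const std = @import("std");

// Local sketch of "cut" + "last": split around the last occurrence of `needle`.
fn cutLast(comptime T: type, haystack: []const T, needle: []const T) ?struct { []const T, []const T } {
    const index = std.mem.lastIndexOf(T, haystack, needle) orelse return null;
    return .{ haystack[0..index], haystack[index + needle.len ..] };
}

test "cutLast splits around the final separator" {
    // Hypothetical file name with two dots; only the last one is cut.
    const stem, const ext = cutLast(u8, "archive.tar.gz", ".") orelse return error.NoExtension;
    try std.testing.expectEqualStrings("archive.tar", stem);
    try std.testing.expectEqualStrings("gz", ext);

    try std.testing.expect(cutLast(u8, "README", ".") == null);
}
```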

@andrewrk andrewrk added the "standard library" and "release notes" labels Sep 25, 2025
@andrewrk andrewrk changed the title from "std.mem: introduce chompPrefix and chompSuffix" to "std.mem: introduce cut functions; rename "index of" to "find"" Sep 25, 2025
@andrewrk
Member Author

> I now regret not coming up with the "cut" name at that time

tagging 1.0 will be a bittersweet moment, indeed

@andrewrk andrewrk merged commit 3b365a1 into master Sep 26, 2025
9 checks passed
@andrewrk andrewrk deleted the chomp branch September 26, 2025 08:45
@linux-user36

What about the order of these "concepts" in the identifier? If I saw correctly, there's now

  • findLastAny and findLastNone ("last" comes first), but
  • findScalarLast ("last" comes last).

Since "any", "none", and "scalar" all describe the same thing, namely the kind of needle the function takes, I would expect them all to occupy the same position relative to the other "concepts". This would aid discoverability as well as ease of memorizing and editing (i.e., if I notice I actually need findLastAny instead of findScalarLast, I only have to change one term, not the order as well).

I can imagine both orders as sensible, namely:

  • The kind of needle could come first, because it's mandatory in comparison to the other "concepts" and changes the actual argument types, not just the behavior. (That could be a simple convention.)
  • last could come first, to aid greppability. When I look for findLast, I don't care whether the string format I'm implementing only uses one separator ("scalar") or multiple ("any").
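
To illustrate the editing scenario described above, the sketch below performs the same from-the-end search with a "scalar" needle and then with an "any" needle. It uses the pre-rename functions `lastIndexOfScalar` and `lastIndexOfAny`; the new names in the comments are the ones quoted in this comment, and the path string is made up.

```zig
const std = @import("std");
const mem = std.mem;

test "searching from the end with different needle kinds" {
    // Hypothetical path mixing two separator styles.
    const path = "dir/sub\\file.txt";

    // One separator ("scalar"); `findScalarLast` per the comment above.
    try std.testing.expectEqual(@as(?usize, 3), mem.lastIndexOfScalar(u8, path, '/'));

    // Any of several separators ("any"); `findLastAny` per the comment above.
    try std.testing.expectEqual(@as(?usize, 7), mem.lastIndexOfAny(u8, path, "/\\"));
}
```

Switching between the two calls changes only the needle argument, yet under the naming quoted above the position of "last" in the identifier changes too.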
