std.ascii: general improvements #11629

Conversation
I really wish we could use u7 here, but I assume it would be slower.
Why would it be slower? I think through the type system we can make this not be slower:

pub fn lowerString(output: []u7, ascii_string: []const u7) []u7 {
    std.debug.assert(output.len >= ascii_string.len);
    for (ascii_string) |c, i| {
        output[i] = toLower(c);
    }
    return output[0..ascii_string.len];
}

pub fn main() void {
    var buf: [1024]u7 = undefined;
    _ = lowerString(&buf, "HELLO WORLD"); // The string literal is coerced because all its characters are < 128.
}

Any opinions? One advantage of using u7 is that the type system guarantees every element is a valid ASCII character.
The main problem with using u7 is that a runtime []const u8 does not coerce to []const u7; the coercion in the example above only works because the string literal is comptime-known. It is an open question whether/how we will change these semantics for the zig language spec, possibly enabling usage of non-byte-sized integer types like this. Note that there's a similar issue regarding the standard library's usage of u21 for unicode codepoints: it works fine as long as one is only concerned with a single codepoint, but if UTF-32 text is required to be handled then a slice of u32 must be used instead.
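A minimal sketch of that slice-type limitation (illustrative, not from the PR; the function name is made up):

fn firstChar(s: []const u7) u7 {
    return s[0];
}

test "u8 slices are distinct from u7 slices" {
    var buf = [_]u8{ 'H', 'I' };
    const bytes: []const u8 = &buf;
    // _ = firstChar(bytes); // compile error: expected []const u7, found []const u8
    _ = bytes;
    _ = firstChar(&[_]u7{ 'H', 'I' }); // fine: the element type is already u7
}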
I see, yes, then it is probably better to leave it as-is for now. Maybe it can be changed at some point.
I think you should split this into separate pull requests. One for API changes and one for internal lookup table improvements. While I am in favor of expanding abbreviated names, I think removing functions deserves a more focused pull request. The existing ones are related to C's character classification, and removing them should not be taken lightly.
Would it be reasonable to do …?
I would like to get a word from a team member here if that's a requirement. I personally don't agree as this PR isn't necessarily what the final API will look like anyway. People can always follow up with PRs and issues to discuss and improve it further.
If you want you can follow up with a PR and do that. Generally …
It's true the API is not set in stone yet, but C character class support was a deliberate addition, and I think more consideration should be given before starting to undo that. Meanwhile, the …
Breaking up changes into smaller patches makes it far easier to review and merge them. In particular, keeping implementation-detail improvements separate from API bikeshedding keeps the former from being blocked by disagreement over the API or hesitation about the breakage.
Ideally, we should always shoot for an API that we would be OK with maintaining indefinitely. Why would we merge these breaking changes now if we'll need to break everything again down the line?
I already spent a lot of time on this PR. How about I reorganize the commits and split the table changes, the API changes, and everything else into separate commits? I agree that commit 05d8422 isn't very organized. Would that be fine?
Or should I instead open an issue about the API changes first? Then we can discuss everything there, and once we've agreed on everything we can move forward with this.
Apologies for bikeshedding, but this one feels important to me. I much prefer …
I don't have a big opinion on that, but I think I might also prefer … Anyway, this is out of my hands now because apparently this PR was unreviewable, so I removed all the API changes. Could you take a look again?
I think people could easily be misled by the outdated description and the "breaking" label on this PR.
Title changed from "std.ascii: overhaul and simplifications" to "std.ascii: general improvements"
How are you determining that it's faster? Counting the instructions generated by the compiler? The version using a table is branchless while the other version uses a branch. I would expect the table to be faster, or at least more consistent, in benchmarks.
const std = @import("std");
const debug = std.debug;
const time = std.time;
const ascii = @import("ascii.zig");

pub fn main() !void {
    // Time the branch/range-check version.
    var before = time.nanoTimestamp();
    const file = try std.fs.openFileAbsolute("path_to_file_with_byte", .{});
    const c = try file.reader().readByte();
    var index: usize = 0;
    while (index < 10000) : (index += 1)
        std.mem.doNotOptimizeAway(ascii.isDigit(c));
    debug.print("{}\n", .{time.nanoTimestamp() - before});

    // Time the experimental table-based variant.
    before = time.nanoTimestamp();
    try file.seekTo(0);
    const c2 = try file.reader().readByte();
    index = 0;
    while (index < 10000) : (index += 1)
        std.mem.doNotOptimizeAway(ascii.isDigitTable(c2));
    debug.print("{}\n", .{time.nanoTimestamp() - before});
}
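(Aside: for a microbenchmark like this to mean much it would presumably need to be built with optimizations enabled, e.g. zig build-exe -O ReleaseFast bench.zig, where bench.zig is an assumed filename for the snippet above; a debug build's safety checks would likely drown out the difference between the two variants.)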
Wow, I think you're right! Good catch.
@ifreund Now while trying to fix this I encountered this issue:

const Index = enum(u3) {
    alphabetic,
    hexadecimal,
    space,
    digit,
    lower,
    upper,
    punct,
    control,
    alphanumeric,
};

We have too many variants: an enum(u3) can only hold 8, and alphanumeric makes 9.
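For context, here is a hedged sketch of the packed-table scheme being discussed (layout and names are assumptions, not the PR's actual code): all 256 table entries are u8 bitsets with one bit per character class, which is exactly why a ninth class doesn't fit.

// Eight classes fit in a u8 bitset; a ninth (e.g. alphanumeric) would not.
const Class = enum(u3) { alphabetic, hexadecimal, space, digit, lower, upper, punct, control };

fn mask(class: Class) u8 {
    return @as(u8, 1) << @enumToInt(class);
}

// Build the packed table at comptime: set each class's bit per character.
const table = blk: {
    @setEvalBranchQuota(3000);
    var t = [_]u8{0} ** 256;
    var c: usize = 0;
    while (c < 256) : (c += 1) {
        if (c >= '0' and c <= '9') t[c] |= mask(.digit);
        if (c >= 'a' and c <= 'z') t[c] |= mask(.lower);
        if (c >= 'A' and c <= 'Z') t[c] |= mask(.upper);
        // ... remaining classes elided ...
    }
    break :blk t;
};

pub fn isDigit(c: u8) bool {
    return table[c] & mask(.digit) != 0;
}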
We might even want to allow the user to choose what to include in the table, so we would provide some kind of customization for their specific usage.
I may be mistaken, but having separate tables per function might have that desired effect. Unused functions and tables would be eliminated from the final executable, with no need for explicit configuration. Example:

const fooTable = [_]u1{ ... };
pub fn isFoo(c: u8) bool { ... }

const barTable = [_]u1{ ... };
pub fn isBar(c: u8) bool { ... }

const bazTable = [_]u1{ ... };
pub fn isBaz(c: u8) bool { ... }

If user code only calls isFoo, only fooTable should end up in the binary. This is moving back towards what is already merged into the master branch, but the tables could still be built at comptime from the "naive" functions. That still seems like a clear improvement over the current hard-coded tables in master.
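A minimal sketch of that comptime generation idea (the function and table names are illustrative):

fn naiveIsDigit(c: u8) bool {
    return c >= '0' and c <= '9';
}

// Generate the lookup table from the naive predicate at compile time.
const digitTable = blk: {
    @setEvalBranchQuota(2000);
    var t: [256]bool = undefined;
    for (t) |*entry, i| {
        entry.* = naiveIsDigit(@intCast(u8, i));
    }
    break :blk t;
};

pub fn isDigit(c: u8) bool {
    return digitTable[c];
}

Dead-code elimination should then drop both naiveIsDigit and digitTable whenever isDigit is never referenced.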
You're right. Maybe we can just do something like that; then we don't actually have to tightly pack it like this. It would allow any function to use a fast lookup table. I would like to see how well that works, but I won't do it in this PR. @ifreund does it look mergeable?
One downside of …
I just remembered #8419 and that we wanted to remove …
Small drive-by correction: both versions are actually branchless. The table-free version generates a short branchless compare sequence rather than an actual branch. Given that the lookup table is dependent on cache performance, I think the table-free version is preferable.
After some discussion, @topolarity and I came to the conclusion that it's probably better to remove the LUT (look-up table) altogether: relying on the cache like that is not great, some of the benchmarks above were definitely wrong, and the naive version is actually faster. For now I started with #12448 to supersede this PR and get the renamings and deprecations in first; removing the LUT can come later.
This makes it so that we no longer use a LUT (look-up table):
* The code is much simpler and easier to understand now.
* Using a LUT means we rely on a warm cache. Relying on cache performance results in inconsistent performance, and in many cases codegen will be worse. Also, as @topolarity once pointed out, in some cases code that looks like it may branch actually doesn't: ziglang#11629 (comment)
* Other languages' standard libraries don't do this either. JFF I wanted to see how other languages' codegen compares to ours now: https://rust.godbolt.org/z/Te4ax9Edf, https://zig.godbolt.org/z/nTbYedWKv. We are pretty much on par with or better than other languages now.
This PR:
* Makes use of Zig's comptime feature by generating all the tables at compile time! This is much easier to read and maintain. I mean, the tables looked cool, but I had no idea what they did or how they worked because there wasn't even any documentation. Now anyone should be able to add new things with a lot less effort. There are now naive functions which do what the exposed public equivalents do; they're used to generate those tables, and they're used directly if the table isn't available.
* Uses those naive functions in place of the table implementation in ReleaseSmall mode. I thought: why not? It saves some 256 bytes.
* Lets isAlNum have its own dedicated table entry and removes isGraph's table entry. I found this to be faster by one or a few instructions.
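A hedged sketch of how that ReleaseSmall fallback could be wired up (assumed structure using the hypothetical helpers from the sketches above, not the PR's exact code); since builtin.mode is comptime-known, the unused path and its table should be compiled out entirely:

const builtin = @import("builtin");

fn naiveIsDigit(c: u8) bool {
    return c >= '0' and c <= '9';
}

// Hypothetical comptime-generated table, as in the earlier sketch.
const digitTable = blk: {
    @setEvalBranchQuota(2000);
    var t: [256]bool = undefined;
    for (t) |*entry, i| {
        entry.* = naiveIsDigit(@intCast(u8, i));
    }
    break :blk t;
};

pub fn isDigit(c: u8) bool {
    // ReleaseSmall: skip the 256-byte table and use the naive check.
    if (builtin.mode == .ReleaseSmall) return naiveIsDigit(c);
    return digitTable[c];
}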