Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Streamline loops, and enhance iteration #3110

Closed
Tetralux opened this issue Aug 21, 2019 · 18 comments
Closed

proposal: Streamline loops, and enhance iteration #3110

Tetralux opened this issue Aug 21, 2019 · 18 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@Tetralux
Copy link
Contributor

Tetralux commented Aug 21, 2019

There's probably more I could flesh out about this proposal but I'd like to get it out there, so here goes:

Synopsis

for loops are only really useful for slices, and even for them, it's better to use a while because you may not be iterating a slice in the future, as your program matures.

However, while loops have some shortcomings over for loops, being more verbose, and no easier to understand.

For example, a std.ArrayList called list:

// this is what the iterator of std.ArrayList does:
var i: usize = 0;
while (i < list.len) : (i += 1) {
    const elem = list.at(i);
    // do something with each element
}

// ----- versus -----

// this doesn't work currently.
for (list) |elem| {
    // do something with elem
}

Currently, in order to make a struct iterable, convention is that you have an .iterator() method that returns an instance of a iterator struct for that type, and you then call .next() on that iterator until it returns null.

Again, for a std.ArrayList called list:

var itr = list.iterator();
while (itr.next()) |elem| {
    // do something with elem
}

This approach is arguably a tad better than the while example above; more terse, pretty clear.

However it is a little awkward because, it's different from how you iterate slices with for--the only type for can iterate--and the author of the custom struct (here std.ArrayList) ends up writing dozens of lines of code just to make the iterator struct work as it should... when all you wanted to do was write the code necessary for iterating this collection and reuse that code whenever you want to iterate one of them.

Here is that code for std.ArrayList:

pub const Iterator = struct {
    list: *const Self,
    // how many items have we returned
    count: usize,

    pub fn next(it: *Iterator) ?T {
        if (it.count >= it.list.len) return null;
        const val = it.list.at(it.count);
        it.count += 1;
        return val;
    }

    pub fn reset(it: *Iterator) void {
        it.count = 0;
    }
};

pub fn iterator(self: *const Self) Iterator {
    return Iterator{
        .list = self,
        .count = 0,
    };
}

For std.ArrayList, it's not that bad; a total of 23 lines.
However, this code looks very different from the code you actually need to write in order to iterate this list - it's from the first example - here it is again:

var i: usize = 0;
while (i < list.len) : (i += 1) {
    const elem = list.at(i);
    // do something with each element
}

That's pretty simple, right? It's only 5 lines!

I'd argue that having to use a custom struct just to iterate your actually-important custom struct is something of a waste of your time - it shouldn't be that hard.
It should not require that many lines more code to write it, ideally.
And again, this is just an ArrayList - the custom iterator code for a BucketArray is worse, and harder to follow than this is.

This proposal fixes both of these problems in one swoop.

Basics

It makes it so that you can just for (iterable) |elem| { ... } to iterate over any custom struct that declares how you iterate it.

Iterating would then look like this:

// array_list.zig ----

pub fn iterate(self: *Self, f: fn(e: ElementType) void) void {
    var i: usize = 0;
    while (i < self.len) : (i += 1) {
        @inlineCall(f, self.at(i));
    }
}

// main.zig ----

const std = @import("std");

pub fn main() !void {
    var a = std.debug.global_allocator;
    var list = std.ArrayList(i32).init(a);
    try list.append(1);
    try list.append(2);
    try list.append(4);
    var x: i32 = 0;
    for (list) |e| {
        x += e;
        std.debug.warn("{}\n", e); // prints each element
    }
    std.debug.warn("x is {}\n", x); // prints 7 (1+2+4)
}

Behavior-wise, this setup pastes the code from the body of iterate into main, replacing the for loop, and so, local variables in main are available in the loop.
The f function parameter to iterate is how you represent the body of the users' for loop.

No function calls, either to iterate or f, actually happens.

This is actually the same amount of magic as currently happens with for loops,
the only difference is that you are able to specify what to do for an abitrary custom structure,
rather than it only working for slices.

Also, it's as if you just wrote this:

pub fn main() !void {
    var a = std.debug.global_allocator;
    var list = std.ArrayList(i32).init(a);
    try list.append(1);
    try list.append(2);
    try list.append(4);
    var x: i32 = 0;
    {
        var i: usize = 0;
        while (i < list.len) : (i += 1) {
            const e = list.at(i);
            x += e;
            std.debug.warn("{}\n", e); // prints each element
        }
    }   
    std.debug.warn("x is {}\n", x); // prints 7 (1+2+4)
}

Notice how close the loop is to the body of iterate.

Failure

Futher, sometimes, though rarely, the iteration can fail.
For example, command line arguments allocate each arg as you ask for them:

zig/std/process.zig

Lines 276 to 278 in ec7d7a5

fn internalNext(self: *ArgIteratorWindows, allocator: *Allocator) NextError![]u8 {
var buf = try Buffer.initSize(allocator, 0);
defer buf.deinit();

zig/std/process.zig

Lines 223 to 234 in ec7d7a5

pub fn next(self: *ArgIteratorWindows, allocator: *Allocator) ?(NextError![]u8) {
// march forward over whitespace
while (true) : (self.index += 1) {
const byte = self.cmd_line[self.index];
switch (byte) {
0 => return null,
' ', '\t' => continue,
else => break,
}
}
return self.internalNext(allocator);

Currently, you can use while on an error union and have an else branch to handle the error.
This code will repeatedly evaluate the expression until it's error.
It will then run the else branch:

while (getAnswerOrErrorIfAtTheEnd()) |answer| {
    // do something with the answer
} else |err| {
    // do something with the error.
}

The same applies here.

If the iteration results in an error, you'd be forced by the compiler to provide an else branch, and that would look like this:

var args = std.process.args(allocator);
for (args) |arg| {
    // do something with the argument
} else |err| {
    // do something with the error
}

Notice any similaraties? 😝

To allow for this case, the implementation of iterate for the custom struct could return either void if it always suceeds, or !void if it can fail.

Element removal

There is an open question with this approach.

How would you remove an item from the iterable, while you are iterating over it.
If you don't think you'd need to, see std.ArrayList.swapRemove. Trust me, it's useful. 😁

A little while ago, I made a PR which added .swapRemove and .orderedRemove to the custom iterator type of std.ArrayList.

If you skip needing to have an iterator, how would you access that functionality in a simple way without having to concern yourself over how you actually do that for the data structure in question, or figuring out how to even do it at all, as applicable.

Jai solves this problem by having a remove keyword which you provide the value to.
The Zig translation of that is this:

for (list) |elem| {
    if (elem < 2) continue;
    remove elem;
}

I'm not sure of the best way to do this, though this is one way I thought of:

pub fn iterate(self: *Self, f: fn(e: ElementType) void) void {
    var i: usize = 0;
    while (i < self.len) : (i += 1) {
        @inlineCall(f, self.at(i));
        // what types of removal you do is comptime-known and 
        // you can emit a compile error if someone tried to orderedRemove
        // from a structure that cannot do that, for instance.
        switch (@removalType()) {
            .Fast => {
                _ = self.swapRemove(i);
                i -= 1;
                // incremented at end of loop; so decrement it here.
                // the next item appears in the old items position after the remove,
                // so use the same index again.
            },
            .KeepOrder => {
                 _ = self.orderedRemove(i);
                 i -= 1;
            },
            .None => continue,
        }
    }
}

const std = @import("std");
const warn = std.debug.warn;

pub fn main() !void {
    var a = std.debug.global_allocator;
    var list = std.ArrayList(i32).init(a);
    try list.append(1);
    try list.append(2);
    try list.append(4);
    for (list) |e| {
        if (e == 2) {
            remove e;
            // remove_ordered e;
        }
        std.debug.warn("{}\n", e); // prints each element
    }
    assert(list.len == 2);
    assert(list.at(0) == 1);
    assert(list.at(1) == 4); // because swapRemove.
}
@CurtisFenner
Copy link

I like this proposal a lot -- sometimes I feel like it's easier to allocate a slice and copy into it to for over than set up an iterator :)

This strikes me as very similar to how Ruby approaches iteration, with blocks. Ruby's blocks are slightly more general than what you propose here, since in addition to the iterating function being able to send values to the block, the block can send values back to the iterating function. That is perhaps another possible solution to the removal problem (though, since under this proposal each container type can only have a single iterator, you would most likely need to introduce some "default return" so that loops which don't ever remove elements, which is almost all of them, wouldn't have noisy return/yield statements saying so)

This seems to depend on #229, since the fn argument is necessarily a closure. That also means the argument doesn't seem to be a regular value (in fact, it seems like the only thing it makes sense to do with it is @inlineCall it, right?). Perhaps this is not truly a function-as-a-parameter, but a distinct "block" concept, like Ruby (I don't actually know too much about Ruby, so maybe someone else can correct me or explain why learning from Ruby in this specific place is a bad approach for Zig).

@andersfr
Copy link
Contributor

The use of function pointers and compiler-inlining-voodoo is awkward and reduces the applicability of iterators. They have combinatorial properties, e.g. iterator zipping, and they may be passed as arguments to other functions. The underlying struct is there to provide the necessary value-semantics for the rest to work.

The pattern of using iterators can be improved simply by allowing a for to take a struct as argument and having the compiler emit the .next() call by convention.

// iterator by convention (struct that supports fn next(self: *Self) ?Element)
for(some_struct.iterator()) |e| {}

var zip = ZipIterator(struct1.Iterator, struct2.Iterator){
    struct1.iterator(),
    struct2.iterator(),
};
for(zip) |dual| {
    // parallel iteration
}

var it = some_struct.iterator();
find_first_of(&it, condition);
while(it.next()) |e| {
     // do something with rest
}

@kyle-github
Copy link

This is an area where some sort of standard trait/interface would be sufficient. That is how Rust does this and it works quite well. If you had traits, and a standard Iterator trait then any type that implemented that trait could be iterated in a for loop. for would just be sugar.

@rohlem
Copy link
Contributor

rohlem commented Aug 22, 2019

I'm not yet convinced by the line savings this proposal seems to imply. Also rewriting between for- and while-loops doesn't sound like that much of an inconvenience to me, although of course large codebases can always benefit from standardized interfaces.
To actually add something meaningful to the discussion, here's a neat alternative using an async method, no Iterator struct required, status-quo:

// add a variation on the it's-only-5-lines-iteration
// as a container method (here for std.ArrayList)
        pub fn iterate(self: *Self, e_ptr: *?*T) void {
            //comparable to current Iterator.next, async frame used as iterator state
            var i: usize = 0;
            while(i < self.len) : (i += 1) {
                e_ptr.* = &self.items[i];
                suspend;
            }
            e_ptr.* = null;
            return;
        }
//... usage in code (omitting construction, filling with values etc.):
{
    var e_ptr: ?*T = null; //"client-side" iteration state, because we have no multi-return functions aka generators (yet)
    var iteration = async list.iterate(&e_ptr);
    while(e_ptr) |ep| : (resume iteration) {
        // process elements, f.e.: @import("std").debug.warn("{}\n", ep.*);
    }
}

Of course this doesn't handle removal of elements, or other container-specific more sophisticated use cases.
Also I'm not aware of the current situation on optimization/inlining of async function calls/resumes, although I assume an aggressive enough optimizer should have no problem with that.

@Dubhead
Copy link
Contributor

Dubhead commented Aug 22, 2019

std.ArrayList may not be a good example for this discussion, since it has toSlice().

for (list.toSlice()) |elem| {
    // do something with elem
}

@kyle-github
Copy link

kyle-github commented Aug 23, 2019

@rohlem ooh, nice! All you need to make any data structure iterable (is that a word?) is to write a generator for it.

I do not see the lack of delete-while-iterating as a show stopper. That would probably be data structure-dependent anyway.

@emekoi
Copy link
Contributor

emekoi commented Aug 23, 2019

what about this?

@raulgrell
Copy link
Contributor

I'm not sure where this was previously discussed, but here are some of the points that were raised:

Although the proposal suggests a way to compile away the function calls, it still counts as hidden control flow. It might not have performance issues, but the reader of the code still has to jump around to figure out what the iterator does.

One of the nice things about the status quo for is that I know it operates in linear, contiguous memory. It's a simple operation with much fewer reasons to fail than anything involving an iterator - the performance benefits are secondary. Anything more complicated than traversing a slice is done explicitly with a while.

Status quo for signals to the reader that what is in the parentheses happens once. With while, it happens several times. I think this is an important cue.

As well as the function call, the stack space needed which should probably be explicit. It also ties the language to the standard library - not necessarily bad, lots of languages do it. But something to keep in mind.

My best attempt at reconciling these issues isn't really much better than using while and would also mean that for could accept any struct instance if we provided storage for it. It also means a data structure could expose many different kinds of iterators, though I don't know if that's useful (maybe const_iterators?)

pub fn test(list:  *std.ArrayList(i32)) !void {
    var it = for (list.iterator()) : (it.next()) |e| {
        std.debug.warn("{}\n", e);                      // Iterates until it.next() == null
    }
}

@rohlem that's pretty cool! It prompted me to consider the following, similar to the above:

pub fn test(list:  *std.ArrayList(i32)) !void {
    var frame = for (async list.iterate()) |e| : (resume frame) {
        std.debug.warn("{}\n", e); // prints each element
    }
}

The reason I'd make the order different is that above the e comes from the it.next() as opposed to something returned from suspending list.iterate(). I'm not really sure which is more consistent with the rest of the language. In the second case, it means we can restrict what for can take to slices, arrays and call frames instead of arbitrary structs. Have there been more thoughts about generators since the async rewrite?

That said, I'm very happy with status quo.

@andersfr
Copy link
Contributor

Async is not a facility for implementing generators. That is both convoluted and inefficient. There are proposals for supporting generators but they're not part of the language yet.

A for loop is just syntactic sugar for a while loop over a slice. Establishing a good convention for iterators and ranges could allow extending the sugar to these concepts without altering syntax.

The concept of while-loops over optionals is good for unbounded or unknown-bound iterators. When bounds are known, e.g. for slices, it is better to loop without optionals as the result is trivially known from the bound. That is basically what is achieved by the sugar without exploiting the property in a more general sense.

Implementing removals during iteration requires guarantees on iterator stability which is not generally feasible to provide.

@Tetralux
Copy link
Contributor Author

@CurtisFenner
Not a closure.
Just a local scope with some obvious hygiene rules.

@Tetralux
Copy link
Contributor Author

Tetralux commented Aug 25, 2019

I actually find the generator idea relatively interesting... provided it is as fast as not writing a generator..

One of the things I always hated about Python was how I wanted to use generators, but couldn't because they were so much slower than not using them.

If you did it that way:

for (list.each()) |e| {}
for (list.eachReverse()) |e| {}

But again, how do you signal you want it removed?
The best I've got is remove e in the for, and then checking for it in the generator after the yield returns; if the generator does not support that, then you @compileError.
.. and if the generator doesn't check for it, and you use remove, also compile error.

@andrewrk andrewrk added this to the 0.6.0 milestone Aug 27, 2019
@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Aug 27, 2019
@raulgrell
Copy link
Contributor

raulgrell commented Aug 29, 2019

Generators and async function's aren't fundamentally different. A generator is basically a resumable function that can return multiple times. @rohlem did that with an output parameter. Though I agree with @andersfr that we shouldn't rely on async for this.

@Tetralux what you propose is necessarily a closure unless we disallow the use of variables declared outside the body of the for. As soon as the for block becomes expressible as a function, the user is able to use it as a function, including saving it and calling it somewhere later. Which means it also has to capture part of its enclosing scope.

If we have closures (and/or anonymous functions), another option is to just pass it as a parameter to an iteration function. This also means that you can also do something like a removeIf(fn(e: T)) !bool for removals:

list.forEach(fn |e, i|{ print(i); });

// Will remove every third element - needs closures
var n = 3;
list.removeIf(fn |e|{ return e % n == 0; });

// Will remove every third element - no closures
list.removeIf(fn |e| { return e % 3 == 0; });

// Different removal strategies
list.swapRemoveIf(fn |e| { return e % 3 == 0; });
list.removeAfterCheckingAll(fn |e| { return e % 3 == 0; });

I particularly dislike the concept of a remove keyword.

@RUSshy Zig's syntax is pretty consistent throughout The |names, here| syntax is fairly established as the place where local names are created without explicit var/const/fn declarations.

Part of the fun of reinventing simple things is inventing things that are simple =)

@DutchGhost
Copy link

I dont think a remove keyword is needed. Removal of an item out of a collection should be handled by the collection trough an appropriate function the collection offers.

What if you have collections where you can't remove elements from? now you have a remove keyword that is wrong to use for a collection, how do you define what happens when it's used anyway?

@Tetralux
Copy link
Contributor Author

Tetralux commented Sep 1, 2019

@DutchGhost I thought I remarked about that already: if the collection does not support removal, then iterate can include a @compileError if you used remove.
The idea about having the collection provide functions for this; yes, it should. But using them during removal requires the iterator be aware you removed it in order to continuing iterate correctly.

@raulgrell The idea about it being a closure; no.
Perhaps to make this clearer, you could annotate the fn as inline:

f: fn(e: ElType) void = undefined;
fn iterate(self: *Self, f: inline fn(e: ElType) void) {
    self.f = f; // compile error: function 'f' must be inlinable, and therefore has no storage
}

@andrewrk
Copy link
Member

I realize this is controversial but I'm a big fan of status quo iteration. What I like about it is that there is no hidden control flow. You don't have to know the type of anything to understand how iteration works. If anything, I'm tempted to delete for loops from the language.

mguaypaq added a commit to mguaypaq/aoc2020 that referenced this issue Dec 3, 2020
Today I learned:

* There's a nice list of projects in Zig at [Awesome Zig], which may
  include useful examples of code and useful libraries.

* `for` loops are only for slices/arrays, not custom iterators, but
  `while` loops are [quite featureful]! In fact it seems like the
  language creator has been tempted to [delete for loops] from the
  language and only have while loops.

* The language reference says that "Zig programmers must always be
  able to answer the question: [Where are the bytes]?" and this is
  why you have to choose an allocator, but even after reading some
  of the source code I really don't know where the 4KB buffer for
  bufferedReader lives.

* I can read a file after all! Thanks to chapter 2 of ziglearn.org.

[Awesome Zig]: <https://github.com/nrdmn/awesome-zig>
[quite featureful]: <https://ziglang.org/documentation/0.7.0/#while>
[delete for loops]: <ziglang/zig#3110 (comment)>
[Where are the bytes]: <https://ziglang.org/documentation/0.7.0/#Memory>

For the solution, I got frustrated with ArrayList and the helper
functions in std.io.Reader and std.fmt, so I decided to just do a
finite state machine which reads one character at a time. I think
the result is not so bad, but I have to admit that I'm surprised it
gave the correct answer on the first run, because it would have been
extremely easy to forget updating the parsing_state or some other
trivial error in the state machine.

$ zig build run
643

On to part 2, and let's hope it doesn't need much more state.
@akvadrako
Copy link
Contributor

If #1717 is implemented, another way to implement a for-loop would be using function expressions:

list.each(.{
    fn (self: anytype, i: i64) void {
        std.log.info("arg={}, item={}", .{ self.@"1", i });
    },
    123,
});

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

12 participants