Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recyclable Iterators #45

Closed
pjulien opened this issue Jun 17, 2016 · 20 comments
Closed

Recyclable Iterators #45

pjulien opened this issue Jun 17, 2016 · 20 comments

Comments

@pjulien
Copy link

pjulien commented Jun 17, 2016

I would like to introduce the idea that fastutil should support recyclable iterators. That is, when calling the iterator() method, the same iterator object should be returned but reset to the start.

When trying to avoid allocations, it's clear that escape analysis can remove these allocations, but that means nothing when trying to profile the application, all I can see is these iterators because escape analysis is turned off by the profiler so I just fallback to using elements() where I can.

As long as the collections are not being paired with a lock, it seems like this is pretty straightforward. For the synchronized collections, this might be complicated. Although a lock must be held for the duration of the iteration, the iterator itself might be retained beyond the scope of the lock, and then be used again when the lock is re-acquired

@mdekstrand
Copy link

Interesting idea.

I think this should not be the default behavior of iterator(); if desired, it should be on a separate method, like fastIterator() is; possibly recycledIterator().

Even in single-threaded applications, multiple iterators can be active over the same object, e.g. in nested loops processing pairs of items in a list or set. Quietly breaking such code by violating the iterator() contract would be a very bad thing.

@pjulien
Copy link
Author

pjulien commented Jun 17, 2016

I think this should not be the default behavior of iterator()

If you make another method, the standard for loop won't be able to call it. It might warrant another set of generated classes

@botismarius
Copy link

botismarius commented Jun 18, 2016 via email

@jonatino
Copy link

This feature would be awesome. But @mdekstrand is correct. It should not be the default behavior of iterator() due to the unforseen amount of multithreading errors

@guidomedina
Copy link

guidomedina commented Feb 15, 2017

You could add a new constructor which will either use the default iterator or recycled iterator by default with the warning that once such collection is created one behavior or the other applies for all iterators of such instance.

Edit: Memory wise is just one boolen per collection instance which will default to the default iterator behavior.

@guidomedina
Copy link

guidomedina commented Feb 15, 2017

Or another option is to generate recycled iterator variants of each collection if adding a boolean is so bad.

@vigna
Copy link
Owner

vigna commented Feb 15, 2017

I think a better option would be to have a reset() (or something) method that resets an iterator. Returning always the same object is too big a break from the current practice (and from the specification).

@guidomedina
Copy link

@vigna yes that's a much better option, basically you reuse the iterator rather than breaking the JCF specs.

@kno10
Copy link
Contributor

kno10 commented Jun 23, 2017

I have had this on my wishlist for some time, but I don't think this should be handled by the collection.

Instead, iterators should simply have a reset() function. I'd assume this is fairly easy to implement for all iterators on collections.

Then you could simply get the iterator once, and reset() it before each use. With multiple threads, you can get a separate iterator for each thread, or even handle an "iterator pool" of iterators if necessary (similar to how people have been doing database connection pools).

P.S. thanks for the upgrades to Java 8. I appreciate this, and I hope there isn't a performance regression because of this (the use of lambdas sometimes seems to cause unexpected performance regressions, but the hotspot VM also improves). I would appreciate a modularized fastutil, with a fastutil-mini that only has the most important parts, fastutil-extra that has the less common primitives, and fastutil-big that has all the big collections. A meta package "fastutil" depending on all three could provide backwards compatibility. In many cases, the -mini should be enough and reduce the footprint.

@vigna
Copy link
Owner

vigna commented Jun 24, 2017 via email

@kno10
Copy link
Contributor

kno10 commented Jun 24, 2017

Since fastutil already adds skip(int), is it really necessary to add a RewindableIterator interface? Why not just add rewind() to drv/Iterator.drv, with a default implementation throwing an exception like remove()?

@vigna
Copy link
Owner

vigna commented Jun 25, 2017

Well, an interface has the advantage that people can easily check for the feature. What you propose implies that to check one has to try and check the exception with a very complicated logic...

@kno10
Copy link
Contributor

kno10 commented Jun 25, 2017

To make it usable, we need the return value to be a RewindableIterator, and at the same time a primitive-specific iterator. So this would mean introducing RewindableFloatIterator etc.; and as far as I can tell we will probably not have a non-rewindable float iterator anywhere in fastutil.

@jheusser
Copy link

jheusser commented Jun 26, 2017

Just as an example of how it could look like, Agrona always returns the same iterator instance on its collections and calls reset() when getting the iterator (example here). This allows garbage-free iteration.

@vigna
Copy link
Owner

vigna commented Jun 26, 2017

That violates the Map contract.

@kno10
Copy link
Contributor

kno10 commented Jun 26, 2017

@jheusser I think that is a really bad idea.

What if I need a nested loop, to compute pairwise similarities? I need to be able to get two different iterators...

I like the idea of rewind(), but iterator recycling needs to be done by the user (you could try to use finalizer/reference magic, but that will likely be more expensive than new iterators!).

Having the collection rewind() an iterator that may still be in use will cause horrible side effects!

With the rewind() discussed above, one could do

IntIterator i1 = col.iterator(), i2 = col.iterator();
while(i1.hasNext()) {
  int first = i1.nextInt();
  i2.rewind();
  while(i2.hasNext()) {
    int second = i2.nextInt();
    // Do something on (frist, second)
  }
}

Note that in this example, I need the iterator to have both rewind() and nextInt().

@jheusser
Copy link

@kno10 @vigna just highlighting how another high-performance library does it. Clearly, Agrona is not meant as drop-in replacement for standard collections but for specific use cases.

I do like the explicit rewind() approach as well.

@vigna vigna closed this as completed Jan 14, 2021
@kno10
Copy link
Contributor

kno10 commented Jan 15, 2021

Why was this closed?
Is there an option to reuse/rewind an iterator now?

@vigna
Copy link
Owner

vigna commented Jan 15, 2021

Nope, but this was from 5 years ago and no one proposed a PR. I have to clean up issues once in a while or I lost track of what's urgent...

@mdproctor
Copy link

mdproctor commented Jan 25, 2023

+1 for explicit rewind() / reset()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants