Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External iterators: for language bindings #33

Open
JervenBolleman opened this issue Oct 29, 2019 · 12 comments
Open

External iterators: for language bindings #33

JervenBolleman opened this issue Oct 29, 2019 · 12 comments

Comments

@JervenBolleman
Copy link

The current handlegraph api, uses the internal iterator approach (e.g. for_each_edge etc.) This is very inconvenient when trying to use this code from python. The only way to move data from inside the iteratee to the outside is by running the iteration in a different thread than the major python code and pass out the required information using a shared blocking queue. As you can imagine this is a pain to program.

The python approach, like modern java is to use generators/streams instead. This works well in these languages because memory ownership is well defined. For C++ I imagine this approach was taken to make it easier to keep track of who owns what memory. Unfortunately, the push approach inherent in this solution is a pain to use from languages which like to use a pull model.

Could we have standard C++ iterators for each iteratee method?

@subwaystation
Copy link
Member

subwaystation commented Oct 29, 2019

+1

@JervenBolleman
Copy link
Author

e.g. in SpOdgi we do the following

def handles(self):
        nodeId = self.odgi.min_node_id()
        
        maxNodeId = self.odgi.max_node_id()
        while (nodeId <= maxNodeId):
            if(self.odgi.has_node(nodeId)):
                nodeId=nodeId+1 
                yield self.odgi.get_handle(nodeId-1)

Which is the only way to iterate over all nodes in lazy way. But less than ideal performance wise.

@ekg
Copy link
Member

ekg commented Nov 21, 2019 via email

@JervenBolleman
Copy link
Author

JervenBolleman commented Nov 21, 2019

@ekg no because you can't jump yield from within the lamda generator style, at least not in python.

So you end up having to either collect all the nodes so you can iterate over them python style at humongous memory costs. Or one has a silly iterator implementation like above.

@JervenBolleman
Copy link
Author

JervenBolleman commented Jun 23, 2022

So I came by today to open this same issue again :) it would be really nice to have std::iterator for edges,handles and paths.
I think these iterators can be basic forward_only types

@jeizenga
Copy link
Contributor

Would it be sufficient if we implemented something like the scan_path method for edges?

https://github.com/vgteam/libhandlegraph/blob/master/src/include/handlegraph/path_handle_graph.hpp#L161-L164

@JervenBolleman
Copy link
Author

JervenBolleman commented Jun 23, 2022 via email

@jeizenga
Copy link
Contributor

It's a tricky thing, because we haven't exposed an object that has knowledge about where it is in the container the way an iterator does (which the exception of step_handle_t). One easy option would be to copy the handles into a vector and then iterate over that. I think the biggest downside is that it would be pretty memory inefficient for any for-each loop that looped over a sizeable fraction of the graph (e.g. for_each_handle(), which does all nodes).

Thoughts?

@JervenBolleman
Copy link
Author

I know. Copying and then iterating is not an option beyond test cases. I really need something new here, sorry. I have one more horrid alternative and that is to spin up a thread and use a small shared queue to load items in a buffer and iterate over that. Still we can't use raii there and will need to free the copied objects.

@glennhickey
Copy link
Contributor

In the case of handles (and other stuff, probably), xg and PackedGraph store them in arrays. So a get_ith_handle() type method would be possible and efficienit (as would, by extension, supporting iterators that count i internally). I don't think you could get random access from HashGraph without some re-engineering, but I guess it could in theory expose its STL iterator (or some kind of wrapper for it).

@JervenBolleman
Copy link
Author

I just need a STL style iterator, no need for fast random access.

@jeizenga
Copy link
Contributor

I made a quick prototype of what a hack-y vector-based implementation could look like. Again, I have some concerns about efficiency for iterating over the entire graph, but this would probably work fine for traversing edges:

https://github.com/vgteam/libhandlegraph/blob/for-each/src/include/handlegraph/handle_graph.hpp#L256-L322

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants