Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using select_streams argument traverses file twice #60

Open
chkothe opened this issue Apr 8, 2020 · 3 comments
Open

Using select_streams argument traverses file twice #60

chkothe opened this issue Apr 8, 2020 · 3 comments

Comments

@chkothe
Copy link
Contributor

chkothe commented Apr 8, 2020

Currently, when one uses the select_streams argument, it will first traverse the file's chunks to find/parse the headers, and sets some flags used subsequently for a check during the load_xdf main loop, which traverses and parses the file again*. It does that using an alternative code path that mirrors the chunk traversal and header parsing of the main loop, but currently skips, among others, some error checking in case of file corruptions etc.

Side note: I do like the idea of breaking load_xdf up into smaller subroutines, and I like the generator approach of _read_chunks, i.e., there may be a possibility to refactor load_xdf in this overall style, especially if the logic were to get more complex over the years. However, as it stands, the current main loop is still pretty simple and easy to follow (esp when one reads it alongside the spec), and could continue to serve as an approachable reference implementation for future language ports for a couple more years. So I'm not ready to pick a side at this point, also considering the effort that a full & clean refactor would come down to.

For now, maybe a way to reconcile the code duplication (which I hope is temporary) and double-traversal of the file could be to add a self.skip (as in, skip processing chunks of this stream) bool in the StreamData constructor, and we could move the matching logic there or into a helper method/function. This way, it would run the first time a header is encountered, and then whenever one sees a chunk of that stream, one can, roughly, do an if streams[StreamId].skip: continue near the place where it currently does that check. We could then earmark the remainder of that alternative code path (parse_chunks, _read_chunks, ...) for future consideration when or if we take on a refactor of load_xdf in this general style (maybe with a git tag).

I'd be willing to implement the suggested change (using StreamData.skip) this week if there's no objection. I think this may also get us a closer to a future simple and fast load-only-headers option for load_xdf (some time soon I'm hoping to have a separate discussion on that).

(*): The double-traversal may not sound like much, it'd be relatively more costly on a network file system.

@cbrnr
Copy link
Contributor

cbrnr commented Apr 9, 2020

I agree that duplicating chunk reading is not ideal, but at that time it was the quickest solution, because I could use functions from an alternative XDF reader I'd written. Also, these functions were mainly intended for querying the XDF file (resolve_streams, match_streaminfos), which I think would be nice to keep.

If you want to put in the effort to de-duplicate chunk reading code paths this would be great, provided that there are no regressions in functionality (i.e. keeping the two functions I've mentioned before).

@cboulay
Copy link
Contributor

cboulay commented Dec 29, 2020

@cbrnr - I'm doing a little cleanup of pyxdf, deduplicating code, eliminating the double-traversal when using select_streams, adding some other shortcuts, etc.

Am I right in assuming that resolve_streams only needs stream headers? Indeed the functions it calls (parse_xdf, parse_chunks, _read_chunks) only operate on stream headers so the answer seems like an obvious "yes", but those functions' doc strings don't state this explicitly so I was just wondering if you were planning on building these out in the future?

In my deduplication effort, I'm reusing your _read_chunks method, but this means adding a lot of other functionality to it, only to disable that functionality with stream_headers_only=True when being called via parse_xdf (via resolve_streams). Is this compatible with your use-case?

@cbrnr
Copy link
Contributor

cbrnr commented Dec 30, 2020

I'll answer in #74.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants