Collect stream statistics and offer new hook to check in on them #1767
Pushed a new version that introduces a property |
We implement this by storing the current input stream inside the internal parse struct as a new field `__stream`. Using `stream()` then turns into an access to that field (i.e., `*(self.__stream)`). If that field never gets accessed, the optimizer will remove it, so there's no overhead if nobody ever uses `stream()` with a unit.
Ready for review now, now with |
I checked that `self.stream()` does not do something crazy for `&parse-from` (e.g., `__stream` becoming invalid in a subunit); since the code is so simple, I'm not sure we necessarily need dedicated tests for that (but they might still be nice).
This returns a struct of the following type, reflecting the input seen so far:

```
type StreamStatistics = struct {
    num_data_bytes: uint64;     ## number of data bytes processed
    num_data_chunks: uint64;    ## number of data chunks processed, excluding empty chunks
    num_gap_bytes: uint64;      ## number of gap bytes processed
    num_gap_chunks: uint64;     ## number of gap chunks processed, excluding empty chunks
};
```
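As a rough illustration of how a unit might read these counters, here is a minimal sketch. It assumes the statistics are exposed through a `statistics()` method on the stream value returned by `self.stream()`; that method name is an assumption, since it is not spelled out in this message. The module and unit names are hypothetical.

```spicy
module Example;

public type Message = unit {
    # Consume all remaining input so there is something to count.
    payload: bytes &eod;

    on %done {
        # Assumption: `statistics()` is the stream method that returns
        # the `StreamStatistics` struct described above.
        local stats = self.stream().statistics();

        print "data bytes: %d in %d chunks" % (stats.num_data_bytes, stats.num_data_chunks);
        print "gap bytes: %d in %d chunks" % (stats.num_gap_bytes, stats.num_gap_chunks);
    }
};
```

Because the counters reflect the input seen so far, the values printed in `%done` describe everything the stream has received up to that point, not just the bytes this unit consumed.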
This adds support for a new unit hook:

```
on %sync_advance(offset: uint64) { ... }
```

This hook is called regularly (see below) during error recovery when synchronization skips over data or gaps while searching for a valid synchronization point. It can be used to check in on the synchronization to, e.g., abort further processing if it just keeps failing. `offset` is the current position inside the input stream that synchronization just skipped to.

By default, "called regularly" means that it's called every 4KB of input skipped over while searching for a synchronization point. That value can be changed by setting a unit property `%sync-advance-block-size = <number of bytes>`.

As an additional minor tweak, this also renames what used to be the `__gap__` profiler to `__sync_advance`, because it's profiling the time spent in skipping data, not just gaps.

Implementation notes:

1. I considered reusing `on %gap()` instead of adding a new hook, but it turns out the semantics are different from what I think we want here: (1) it doesn't trigger for non-gap data (by definition), so it couldn't check in on long intervals of searching through payload data; and (2) it's called when a gap is added to the input, not when it's encountered.

2. The new `ASTInfo` inside the Spicy code generator might seem overkill for just the one field it currently contains. However, I'm planning to use this for other information in the future as well (specifically, tracking the need for `&on-heap`).

Closes #3779.
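To make the intended use of the hook more concrete, the following is a minimal sketch of a unit that gives up if resynchronization runs too far into the input. The unit layout, the `REC` marker, and the 1 MB limit are hypothetical and chosen only for illustration; the hook signature and the `%sync-advance-block-size` property are as described above.

```spicy
module Example;

type Header = unit {
    version: uint8;
};

type Body = unit {
    # A leading literal gives synchronization something to search for.
    magic: b"REC";
    length: uint8;
    data: bytes &size=self.length;
};

public type Message = unit {
    # Call %sync_advance every 8KB skipped instead of the default 4KB.
    %sync-advance-block-size = 8192;

    header: Header;
    body: Body &synchronize;

    # Accept the synchronization point once one has been found.
    on %synced { confirm; }

    on %sync_advance(offset: uint64) {
        # Hypothetical policy: stop once the search has moved past the
        # first 1 MB of the input stream without finding a sync point.
        if ( offset > 1048576 )
            throw "giving up on resynchronization";
    }
};
```

Note that `offset` is a position in the input stream rather than a count of skipped bytes, so a check like the one here bounds how far into the stream the search may go, not how long it has been searching.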
This extends the `stream` type to track statistics about its data, and it adds a new `%sync_advance` hook that can be used to check statistics during synchronization. For more information, see the individual commits.
Closes #1768.