Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Piecewiserefactor #12

Merged
merged 11 commits into from
Jan 11, 2024
Merged

Piecewiserefactor #12

merged 11 commits into from
Jan 11, 2024

Conversation

tsostarics
Copy link
Owner

This PR refactors the piecewise_interpolate_pulses function and its related functions (average_pitchtracks, interpolate_equal_pulses, etc.). See discussion in #10 for details, but the memory footprint and speed is drastically improved. In the process, a few major changes have been made:

  • Parallelization capability is deprecated, if you want to parallelize the interpolation/averaging you should batch your dataset into some desired number of chunks. The parameter is included to avoid breaking previous analyses that used this.
  • Index column now inherits the values from section_by if set to NULL
  • Guessing behavior for indices has been removed, which eliminates some stubborn notes from Rcpp related to the methods for that function.
  • Documentation has been updated for the index column
  • .sort now defaults to FALSE, since F0 datasets are typically already sorted.

Under the hood behaviors that have changed:

  • Copy isn't made with data.table anymore
  • A larger focus on base R where possible
  • Less flexibility with grouping structures; instead of trying to juggle user-set groupings and groupings from .grouping and aggregate_by, we just return an ungrouped dataframe. This avoids having to keep regrouping things, which adds up quickly.
  • Instances where the dataframe was split into a list of dataframes has been removed, as this had a huge memory footprint and could result in the computer running out of memory if the dataset was too large and too many parallel calculations were being run
  • There's a dependency on reframe now rather than working with summarize, may revisit this later if I find out it breaks something of mine

this closes #10

…copies and aggregations, runs 5.2 times faster with a 2.5 times smaller memory usage
…annoying scoping rules with furrr, so I'm deprecating that functionality altogether. Parameter is still there but not used. Updated documentation accordingly, though on my computer the overhead is actually more costly than just processing the whole dataset at once
…based on quickdf example in advanced R ch 24.4.2
@tsostarics tsostarics merged commit 75f2493 into main Jan 11, 2024
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor piecewise_interpolate_pulse
1 participant