Skip to content

Commit

Permalink
Fix performance issue with subset iterators
Browse files Browse the repository at this point in the history
Previously we were creating a new iterator each time we returned a
value, now we store the previous index in the iterator and update the
full iterator with the difference when needed.

The performance issue was occurring due to transient object creation of
the temporary iterators and moving them to the right location.

Fixes #378
  • Loading branch information
jimhester committed Nov 9, 2021
1 parent f5b59d5 commit 8ef5c86
Show file tree
Hide file tree
Showing 2 changed files with 11 additions and 3 deletions.
1 change: 1 addition & 0 deletions NEWS.md
Expand Up @@ -5,6 +5,7 @@
* `vroom(n_max=)` now correctly handles cases when reading from a connection and the file does _not_ end with a newline (https://github.com/tidyverse/readr/issues/1321)

* `vroom()` no longer issues a spurious warning when the parsing needs to be restarted due to the presence of embedded newlines (https://github.com/tidyverse/readr/issues/1313)
* Fix performance issue when materializing subsetted vectors (#378)

* `vroom_format()` now uses the same internal multi-threaded code as `vroom_write()`, improving its performance in most cases (#377)

Expand Down
13 changes: 10 additions & 3 deletions src/index.h
Expand Up @@ -12,13 +12,14 @@ class index {
public:
class subset_iterator : public base_iterator {
size_t i_;
iterator it_;
mutable size_t prev_;
mutable iterator it_;
std::shared_ptr<std::vector<size_t>> indexes_;

public:
subset_iterator(
const iterator& it, const std::shared_ptr<std::vector<size_t>>& indexes)
: i_(0), it_(it), indexes_(indexes) {}
: i_(0), prev_(0), it_(it), indexes_(indexes) {}
void next() override { ++i_; }
void prev() override { --i_; }
void advance(ptrdiff_t n) override { i_ += n; }
Expand All @@ -30,7 +31,13 @@ class index {
auto that_ = static_cast<const subset_iterator*>(&that);
return that_->i_ - i_;
};
string value() const override { return *(it_ + (*indexes_)[i_]); };
string value() const override {
size_t cur = (*indexes_)[i_];
ptrdiff_t diff = cur - prev_;
it_ += diff;
prev_ = cur;
return *it_;
};
subset_iterator* clone() const override {
auto copy = new subset_iterator(*this);
return copy;
Expand Down

0 comments on commit 8ef5c86

Please sign in to comment.