
PyTorch: improve memory-efficiency in batched non-shuffle buffer #762

Merged
merged 1 commit into from Jul 26, 2022

Conversation

@chongxiaoc (Collaborator) commented Jul 12, 2022

Avoid creating many copies in _make_batch().

Fix #763
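The idea behind the change can be illustrated with a minimal sketch (a hypothetical simplification: the class and method names below are assumptions, and the real buffer in petastorm/reader_impl/pytorch_shuffling_buffer.py operates on torch tensors). Instead of splitting each incoming rowgroup into per-batch copies inside _make_batch(), the buffer keeps a reference to the whole rowgroup and serves batches as slices of it, tracking how many rows have already been handed out:

```python
class NoopBatchBuffer:
    """Sketch: serve fixed-size batches from stored rowgroups without per-batch copies."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.store = []                # whole rowgroups, appended as-is
        self.first_unconsumed_row = 0  # position inside store[0]

    def add_many(self, rowgroup):
        # Keep a reference to the entire rowgroup instead of eagerly
        # splitting it into per-batch copies.
        self.store.append(rowgroup)

    def retrieve(self):
        rowgroup = self.store[0]
        start = self.first_unconsumed_row
        # On a torch tensor this slice is a view, so no row data is copied;
        # plain Python lists are used here only to keep the sketch self-contained.
        batch = rowgroup[start:start + self.batch_size]
        self.first_unconsumed_row = start + self.batch_size
        if self.first_unconsumed_row >= len(rowgroup):
            self.store.pop(0)          # rowgroup fully consumed
            self.first_unconsumed_row = 0
        return batch


buf = NoopBatchBuffer(batch_size=3)
buf.add_many(list(range(8)))
print(buf.retrieve())  # [0, 1, 2]
print(buf.retrieve())  # [3, 4, 5]
print(buf.retrieve())  # [6, 7]  (last, partial batch)
```

Because each batch is just an index range into an already-stored rowgroup, memory stays bounded by the rowgroups in flight rather than growing with the number of batch copies.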

@chongxiaoc requested a review from selitvin on July 12, 2022 23:26
@chongxiaoc force-pushed the non-shuffle branch 2 times, most recently from bef1fc1 to 3a0911e on July 12, 2022 23:30
codecov bot commented Jul 12, 2022

Codecov Report

Merging #762 (15c5b38) into master (fa8a881) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #762      +/-   ##
==========================================
- Coverage   86.27%   86.26%   -0.01%     
==========================================
  Files          85       85              
  Lines        5084     5081       -3     
  Branches      785      783       -2     
==========================================
- Hits         4386     4383       -3     
  Misses        559      559              
  Partials      139      139              
Impacted Files Coverage Δ
petastorm/reader_impl/pytorch_shuffling_buffer.py 96.58% <100.00%> (-0.09%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@chongxiaoc force-pushed the non-shuffle branch 2 times, most recently from b221591 to 6884b3a on July 13, 2022 01:06
@chongxiaoc (Collaborator, Author) commented:

@selitvin Can you take a look?

@chongxiaoc (Collaborator, Author) commented Jul 21, 2022

Updated PR:

  • Memory usage is even lower after removing the intermediate deque().

[Screenshot: memory usage comparison, Jul 20, 2022]

  • Performance comparison iterating 13798 batches of a production dataset at Uber:
    Master branch: 145.28 seconds
    Updated PR: 43.83 seconds
    (~3.3× faster)

@selitvin

@chongxiaoc force-pushed the non-shuffle branch 7 times, most recently from 904ad1c to e5620b6 on July 25, 2022 20:30
@selitvin (Collaborator) left a comment:

Looks good! Please add a line to release notes. Perhaps we can improve the _batch_i variable name?


def test_batched_noop_shuffling_buffer():
"""Check intermediate status of batched non-shuffling buffer"""
import torch
@selitvin (Collaborator): Can we move the import to the top?

@chongxiaoc (Collaborator, Author): This is an outdated test from a previous commit.

self.store.append(batch)
self._buffer = []
self._done_adding = False
self._batch_i = 0
@selitvin (Collaborator): Perhaps a more self-explanatory name instead of _batch_i? Something like first_unconsumed_row, or something along those lines?

@chongxiaoc (Collaborator, Author): Updated with the new name and comments.
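The outcome of this exchange can be sketched as follows (a hypothetical reconstruction: only the field names come from the excerpt above; the class name, docstring, and comment wording are assumptions):

```python
class BatchedNoopShufflingBuffer:
    """State sketch mirroring the reviewed excerpt after the rename."""

    def __init__(self):
        self.store = []            # completed rowgroups awaiting consumption
        self._buffer = []
        self._done_adding = False
        # Renamed from `_batch_i` per the review: index of the first row of
        # the current rowgroup that has not yet been returned in a batch.
        self.first_unconsumed_row = 0
```

The rename makes the pointer's role self-documenting: it marks the boundary between rows already served as batches and rows still waiting in the current rowgroup.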

Avoid creating many copies in _make_batch().
Reuse same rowgroup to generate all available batches.
Development

Successfully merging this pull request may close these issues.

PyTorch Batched Non-shuffle Buffer Large Memory Consumption
2 participants