Skip to content

cache offsets / lengths on updates #3135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

murphymatt
Copy link

@murphymatt murphymatt commented Jun 24, 2025

Summary

Currently, in the UVM KJT pools, on every lookup, we recompute offset values for all lengths of the lookup ids. This can be extremely expensive if the pool size is very large and if the id lookup size is large. Furthermore, given that these offsets should only change upon updates to the pool, it should be unnecessary to recompute these values on every read path. Instead, we can compute these offsets on the pool's write path, and reference the offsets at the correct lookup indices, to significantly cut down on lookup latency.

Testing

torchx run -s local_cwd utils.python --script torchrec/modules/tests/test_kjt_pool_lookup.py
torchx 2025-06-24 16:02:22 INFO     Tracker configurations: {}
torchx 2025-06-24 16:02:22 INFO     Log directory not set in scheduler cfg. Creating a temporary log dir that will be deleted on exit. To preserve log directory set the `log_dir` cfg option
torchx 2025-06-24 16:02:22 INFO     Log directory is: /tmp/torchx_9keqq27r
local_cwd://torchx/torchx_utils_python-q6bc5grqt7hv3c
torchx 2025-06-24 16:02:22 INFO     Waiting for the app to finish...
python/0 torch.cuda.is_available()=True
python/0 ..
python/0 ----------------------------------------------------------------------
python/0 Ran 2 tests in 0.458s
python/0
python/0 OK
torchx 2025-06-24 16:02:31 INFO     Job finished: SUCCEEDED

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 24, 2025
Copy link
Author

@murphymatt murphymatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opening for review

@murphymatt murphymatt marked this pull request as ready for review June 24, 2025 17:10
@TroyGarden
Copy link
Contributor

@murphymatt thanks for the PR. could you please address the test failures

=========================== short test summary info ============================
FAILED torchrec/modules/tests/test_keyed_jagged_tensor_pool.py::KeyedJaggedTensorPoolTest::test_conflict - AttributeError: 'TensorJaggedIndexSelectLookup' object has no attribute '_tbe_state'
FAILED torchrec/modules/tests/test_keyed_jagged_tensor_pool.py::KeyedJaggedTensorPoolTest::test_empty_lookup - AttributeError: 'TensorJaggedIndexSelectLookup' object has no attribute '_tbe_state'
FAILED torchrec/modules/tests/test_keyed_jagged_tensor_pool.py::KeyedJaggedTensorPoolTest::test_input_permute - AttributeError: 'TensorJaggedIndexSelectLookup' object has no attribute '_tbe_state'
FAILED torchrec/modules/tests/test_keyed_jagged_tensor_pool.py::KeyedJaggedTensorPoolTest::test_update_lookup - AttributeError: 'TensorJaggedIndexSelectLookup' object has no attribute '_tbe_state'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants