compiler: Cache Dependence instances #2634


Open: wants to merge 58 commits into main from cached-dependences

Conversation


@enwask enwask commented Jun 16, 2025

Adds a @_memoized_instances decorator for caching instances returned by __new__. This decorator is applied to Relation and Dependence for a somewhat substantial speedup in build times.
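For illustration, here's a minimal sketch of the general technique: a class decorator that intercepts __new__ and serves instances out of a WeakValueDictionary keyed on the constructor arguments. This is a hypothetical sketch, not the PR's actual code, and it assumes the constructor arguments are hashable and fully determine instance identity:

```python
# Hypothetical sketch of instance memoization via __new__; not the PR's
# actual implementation. Assumes hashable constructor arguments.
from weakref import WeakValueDictionary

def _memoized_instances(cls):
    cache = WeakValueDictionary()  # entries vanish once instances die

    def cached_new(klass, *args, **kwargs):
        key = (klass, args, tuple(sorted(kwargs.items())))
        try:
            return cache[key]
        except KeyError:
            obj = object.__new__(klass)
            cache[key] = obj
            return obj

    # Caveat: __init__ still runs on cache hits; real code must ensure
    # re-initialisation with equal arguments is harmless.
    cls.__new__ = cached_new
    return cls
```

Under these assumptions, two constructions with equal arguments yield the same object for as long as either is alive, and because the dictionary holds only weak references, the cache itself never keeps an instance in memory.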

Tested locally (with Python 3.10 on dewback), this yields a pretty consistent performance improvement on an old profiling script with some modifications (I removed the inline calls to cProfile and ran the script up to the first Operator construction). This gist highlights the performance of a couple of hotspots, as well as the top consumers from cProfile; overall, the operator in that script builds roughly 15% faster in most of my runs with cached Dependence instances. I'm seeing a similar improvement in most of the examples as well.
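As a rough sketch of that kind of measurement (with `build_operator` as a hypothetical stand-in for the gist's script up to the first Operator construction):

```python
# Sketch of the profiling methodology described above; `build_operator`
# is a hypothetical stand-in for the script being profiled.
import cProfile
import pstats

with cProfile.Profile() as pr:
    build_operator()  # run up to the first Operator construction

# Print the top consumers by cumulative time
pstats.Stats(pr).sort_stats('cumulative').print_stats(20)
```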

Waiting on #2631 to be merged, as there are 3.10 type hints in a couple of places.


@FabioLuporini FabioLuporini left a comment


Isn't this an unbounded cache?

Even though it's using weak references, the size of this cache could grow arbitrarily, thus causing an increase in memory consumption over long runs.

Don't we need a mechanism to purge it from time to time or at least evict the entries with value None, or make it an LRU bounded cache?

FWIW, this function is called automatically by the devito runtime, "from time to time"
https://github.com/devitocodes/devito/blob/main/devito/types/caching.py#L165
(in essence, before allocating new data and based on heuristics)


enwask commented Jun 17, 2025

Isn't this an unbounded cache?

@FabioLuporini Technically unbounded, yes, but entries are evicted by WeakValueDictionary when the referenced object stops existing, so cache entries are only stored while that instance would remain in memory anyway (after rereading, I think this is what you were getting at with evicting entries that are None).

I considered using a bounded LRU cache of strong references, but then we'd be hardcoding a cache limit for something that should scale with the problem size.
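(This is standard weakref semantics rather than anything PR-specific; a quick stdlib demonstration:)

```python
# Demonstrates the eviction behaviour described above using only the
# standard library; not Devito-specific code.
import gc
from weakref import WeakValueDictionary

class Thing:
    pass

cache = WeakValueDictionary()
obj = Thing()
cache['key'] = obj
assert 'key' in cache      # entry lives while a strong reference exists

del obj
gc.collect()               # CPython frees immediately; collect() for safety
assert 'key' not in cache  # entry evicted once the instance dies
```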

@FabioLuporini

so cache entries are only stored while that instance would remain in memory anyway

ah, I see!

How much memory growth have you observed with a big Operator (ideally something high order)?

You also need to rebase the branch on latest main or we won't be able to spin CI


enwask commented Jun 17, 2025

How much memory growth have you observed with a big Operator (ideally something high order)?

On that demo script of yours that I've been using, memory usage was within the noise. I haven't tried anything huge (I don't really have any real-world operators to test with that take up a lot of memory during building), but at least theoretically it should be basically nothing. Will try some more contrived higher-order cases later.

You also need to rebase the branch on latest main or we won't be able to spin CI

Yeah, will rebase shortly. I also want to make a small change to make it obvious that memoized_constructor is private. But I'm also waiting on Python 3.9 to be dropped, as the 3.9 workflows will fail right away on some unsupported type hints.

@enwask enwask requested a review from FabioLuporini June 17, 2025 12:33
@FabioLuporini FabioLuporini requested a review from Copilot June 17, 2025 12:43
Copilot

This comment was marked as outdated.

@enwask enwask requested a review from Copilot June 17, 2025 13:55

@Copilot Copilot AI left a comment


Pull Request Overview

Adds a weak-value cache and a class-level memoization decorator to speed up construction of Dependence instances, updates IR support to use immutable mappings, and expands tests to cover the new cache.

  • Introduce WeakValueCache in devito.tools.abc and corresponding tests in tests/test_tools.py
  • Add _memoized_instances decorator in devito/tools/memoization.py and apply it to cache Relation instances
  • Update TimedAccess hashing and signature in devito/ir/support/basic.py; switch IR tests to use frozendict in tests/test_ir.py

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:
  • tests/test_tools.py: Add WeakValueCache tests, including concurrent use cases
  • tests/test_ir.py: Replace mutable dicts with frozendict for _directions
  • devito/tools/memoization.py: Implement the _memoized_instances decorator for class caching
  • devito/tools/abc.py: Implement WeakValueCache
  • devito/ir/support/basic.py: Refine the TimedAccess signature, override __hash__, cache Relation
Comments suppressed due to low confidence (3)

tests/test_tools.py:250

  • The test uses time.sleep but time is not imported. Add import time at the top of the file.
            time.sleep(creation_time)

devito/tools/abc.py:309

  • The annotation uses Future[ReferenceType[ValueType]] but Future is imported from concurrent.futures and is not subscriptable. Import Future from typing or remove the subscription to avoid a runtime TypeError.
        self._futures: dict[KeyType, Future[ReferenceType[ValueType]]] = {}

devito/tools/abc.py:347

  • The parameter annotation Future[ref[ValueType]] will error at runtime since Future is not a generic class. Adjust the annotation (e.g., use typing.Future) or remove it.
                def on_obj_destroyed(k: KeyType = key, f: Future[ref[ValueType]] = future):

@enwask enwask force-pushed the cached-dependences branch from 4b3e65d to 2c7d6f7 Compare June 20, 2025 09:57
dependabot bot and others added 16 commits June 20, 2025 12:02
Updates the requirements on [fsspec](https://github.com/fsspec/filesystem_spec) to permit the latest version.
- [Commits](fsspec/filesystem_spec@0.0.1...2025.5.1)

---
updated-dependencies:
- dependency-name: fsspec
  dependency-version: 2025.5.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Updates the requirements on [pytest-cov](https://github.com/pytest-dev/pytest-cov) to permit the latest version.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](pytest-dev/pytest-cov@v1.0...v6.2.1)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 6.2.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@enwask enwask force-pushed the cached-dependences branch from 14c3e72 to 4317211 Compare June 20, 2025 11:04

codecov bot commented Jun 20, 2025

Codecov Report

Attention: Patch coverage is 98.40637% with 4 lines in your changes missing coverage. Please review.

Project coverage is 87.66%. Comparing base (bca0b10) to head (0d8a927).

Files with missing lines Patch % Lines
devito/tools/abc.py 93.75% 1 Missing and 2 partials ⚠️
devito/tools/memoization.py 97.05% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2634      +/-   ##
==========================================
- Coverage   91.30%   87.66%   -3.65%     
==========================================
  Files         245      245              
  Lines       48752    48994     +242     
  Branches     4298     4307       +9     
==========================================
- Hits        44515    42952    -1563     
- Misses       3529     5311    +1782     
- Partials      708      731      +23     
Flag Coverage Δ
pytest-gpu-aomp-amdgpuX ?

Flags with carried forward coverage won't be shown.


@enwask enwask requested review from FabioLuporini and mloubout June 20, 2025 13:17

@FabioLuporini FabioLuporini left a comment


Does it make sense to add a test that ensures we're not leaking memory and/or that memory consumption is bounded? This is mostly me being paranoid.

FWIW, you can get the running process's memory usage in MiB via psutil's Process(pid).memory_info().rss / (1024*1024)
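A minimal sketch of such a check, assuming psutil is available; `build_big_operator` is a hypothetical helper and the bound is an arbitrary placeholder:

```python
# Sketch of the suggested memory check; assumes psutil is installed and
# `build_big_operator` is a hypothetical high-order Operator builder.
import os
from psutil import Process

def rss_mib():
    # Resident set size of the current process, in MiB
    return Process(os.getpid()).memory_info().rss / (1024 * 1024)

before = rss_mib()
for _ in range(10):
    build_big_operator()  # repeated builds should not accumulate cache
growth = rss_mib() - before
assert growth < 100  # placeholder bound in MiB; tune per test case
```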


@JDBetteridge JDBetteridge left a comment


I'd like to see a few more tests too, since this is such a crucial part of the code. Maybe some that exercise the deadlock safety by deliberately desynchronising threads. But this can be a discussion too, in case anybody else has ideas.

Comment on lines 6 to 7
from weakref import ReferenceType, ref
import weakref

Strange that you do both of these!


To avoid squatting on finalize, I suspect.


ReferenceType is the idiomatic type hint for weak refs


@enwask enwask Jun 30, 2025


Oh, you mean that I also import the namespace, I guess. Yeah, I could also import finalize itself, but it felt better to be explicit (weakref.finalize does a better job of describing what that call is doing). In fact, on second thought, I should probably be using weakref.ref explicitly as well, instead of importing it by name.

That said, I still want ReferenceType imported by name: it's still explicit, but much cleaner in the type hints than weakref.ReferenceType, imo.

self._futures: dict[KeyType, Future[ReferenceType[ValueType]]] = {}
self._lock = RLock()

def get_or_create(self, key: KeyType,

What's the rationale behind not subclassing defaultdict?


@enwask enwask Jun 30, 2025


It serves a different purpose. We don't want the actual storage machinery of defaultdict, and implementing the rest of the mapping interface to mirror defaultdict's would just add unnecessary bloat.

Gets the value for `key`, or creates it with `supplier` if it doesn't exist.
If another thread is already creating the value, blocks until it's available.
"""
while True:

I'm not 100% sure about the logic in this method. Someone might overrule what I'm saying, but this is getting close to spaghetti: a while loop that is only exited by returning (with multiple returns) or raising an exception. I feel like there must be a nicer way of implementing the same logic.


@enwask enwask Jun 30, 2025


As far as I'm aware, while loops are the canonical approach to these kinds of thread clash-retry constructions. The alternatives are duplicating everything that happens within the lock, or using recursion (or otherwise adding unnecessary complexity). Though on second thought, a recursive solution might be a little cleaner, so I'll take a look at reimplementing it that way.


@enwask enwask Jun 30, 2025


Changed my mind actually, I was right the first time lol. Best practice is definitely the while loop; the recursive version is just growing the stack for no reason.

Basically my argument is:

  1. The while loop clearly communicates what we're doing (trying some threaded access until we get to a certain state) and
  2. Any other solution is less idiomatic, since we otherwise can't use the cache lock as a context manager without blocking for the duration of construction or
  3. Barring (2), an idiomatic solution would require unnecessary code duplication.

I did rewrite this method to factor a bit of logic out of the loop, so maybe you'll be happier with the current implementation; but either way, this is just the canonical approach.
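For context, here is a generic sketch of the lock/future clash-retry pattern being debated; it is illustrative only, not the PR's actual get_or_create:

```python
# Generic sketch of the lock/future clash-retry pattern; names and
# structure are illustrative, not the PR's actual code.
from concurrent.futures import Future
from threading import RLock
from weakref import ref

class WeakValueCacheSketch:
    def __init__(self):
        self._futures = {}   # key -> Future resolving to a weak reference
        self._lock = RLock()

    def get_or_create(self, key, supplier):
        # `supplier` must return a weak-referenceable object.
        while True:
            with self._lock:
                future = self._futures.get(key)
                if future is None:
                    # We won the race: publish a pending future, then
                    # construct outside the lock so waiters don't block it.
                    future = Future()
                    self._futures[key] = future
                    break
            # Another thread is (or was) constructing: wait on its result.
            obj = future.result()()  # call the weak reference to deref it
            if obj is not None:
                return obj
            # The instance died between caching and lookup; drop the stale
            # future (if still current) and retry from the top.
            with self._lock:
                if self._futures.get(key) is future:
                    del self._futures[key]
        obj = supplier()  # (real code would also handle supplier() raising)
        future.set_result(ref(obj))
        return obj
```

The point of this shape is that construction happens outside the lock, so waiters block on the future rather than on the lock, and the loop only repeats in the rare case where a cached instance dies between publication and lookup.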

obj = supplier()

# Listener for when the weak reference expires
def on_obj_destroyed(k: KeyType = key,

Do futures not support some sort of callback already?
Or could the _futures object itself be a weak dict?


@enwask enwask Jun 30, 2025


You might be conflating the futures with weak references here. Not sure whether futures "support a callback" but if so that would be something that runs when the future is resolved; that's what we use future.result() for on non-construction threads (to block until that result is ready).

Using a weak dict for the futures would be self-defeating; we want to weakly reference the cached instances, not the futures. If we stored weak references to futures, then as soon as all threads consuming a future were done with it, the entry would be evicted, even if the object that the future resolves to hadn't been garbage collected. So we'd fail to cache something that might still be persisting in memory, and would then reconstruct and cache it again for no reason.


enwask commented Jun 30, 2025

Does it make sense to add a test that ensures we're not leaking memory and/or that memory consumption is bounded? This is mostly me being paranoid.

@FabioLuporini The memory the cache takes up is bounded by the number of instances of the object that exist. There is a test which checks that cache entries are evicted when their values go out of scope, so this is guaranteed. I can try to add a test specifically for memory consumption if you'd like, but imo that's just going to be a more heuristic test (is the cache using "too much" memory? how much is too much?) of something that is already tested concretely (i.e. that the cache memory is bounded by the actual instance memory, which in turn ensures we're not leaking memory).
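A stdlib-only sketch of that kind of eviction check, with hypothetical names (the PR's actual tests live in tests/test_tools.py):

```python
# Stdlib-only sketch of an eviction check; hypothetical names, not the
# PR's actual test code.
import gc
from weakref import WeakValueDictionary

def test_cache_bounded_by_live_instances():
    class Value:
        pass

    cache = WeakValueDictionary()
    values = [Value() for _ in range(100)]
    for i, v in enumerate(values):
        cache[i] = v
    assert len(cache) == 100   # one entry per live instance

    del values
    gc.collect()
    assert len(cache) == 0     # no live instances, no entries
```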
