-
Notifications
You must be signed in to change notification settings - Fork 13
rf: Factor TemplateFlow into Cache and Client classes #149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4ab1bfa to
d5f3685
Compare
mgxd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TemplateFlow depends a lot on import-time initialization, which can slow down any tool that imports it. It also makes the tests extremely contorted as they have to monkey patch global variables and then either reload modules or rerun module import-time behavior.
Agreed, this is a welcome change IMO 👍
How do you feel about this API and separation of concerns?
On board with TemplateFlowCache and TemplateFlowClient, but not sure CacheConfig needs to be public - especially with the suggestions in 2.
What about the TemplateFlowClient.init method?
Yes please, though WDYT about TemplateFlowClient.__init__(self, root: str | None, *, config=None, **config_kwargs) - inspired by BIDSLayout. Then users will only concern themselves with a single import / API. Which parameters should take precedence here? I would assume, from first to last:
- config is loaded
- kwargs overwrite
- root overwrite
Should we have a module-level setattr to update templateflow.cache.TF_* constants? I don't think that was a well-supported feature, since many were imported into templateflow.api, so changing them in one place wouldn't change them in the other.
I don't think so - in fact, isn't this what CacheConfig is essentially addressing?
850632c to
8e4eb33
Compare
mgxd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, left a suggestion for the pytest error, though not sure about the docs...
oesteban
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outstanding work, Chris, thanks for this. Thanks also for your effort to keep things all caught up with best practices, new tools and the general landscape.
I think this work is very much needed, thanks!
It feels more natural. I wonder whether we should just break backwards compatibility and recommend people to try to create the client with the new API and then fall back to the old
I think it could be useful for people trying to do weird things (e.g., multiple atlases, playing with versions of the Archive, etc.). However, I would leave that for a future PR if people request it. |
Co-authored-by: Oscar Esteban <code@oscaresteban.es>
25.1.0 (October 21, 2025) New feature release in the 25.1 series. This release introduces a new ``TemplateFlowClient`` class that provides the functionality previously exposed in ``templateflow.api``. ``templateflow.api`` is now a thin wrapper around a global instance of ``TemplateFlowClient``, so existing code using ``templateflow.api`` should continue to work as before. These changes allow multiple independent clients to coexist in the same Python process, as well as defer loading of data from the filesystem until it is requested, significantly improving import time. * RF: Factor TemplateFlow into Cache and Client classes (#149) * FIX: Error on missing S3 files, do not write error data to disk (#148)
TemplateFlow depends a lot on import-time initialization, which can slow down any tool that imports it. It also makes the tests extremely contorted as they have to monkey patch global variables and then either reload modules or rerun module import-time behavior.
This PR creates the following classes:
templateflow.conf.cache.CacheConfig: a dataclass that will auto-populate in the way theTF_*global config variables did, if no alternatives are passed.templateflow.conf.cache.TemplateFlowCache: A wrapper around a specific cache location that implementsensure(install if missing),updateandwipemethods and holds a persistentLayoutthat is built on first access and deleted after an update or wipe so the next access triggers a rebuild.templateflow.client.TemplateFlowClient: This provides thetemplateflow.apifunctions. It simply holds aTemplateFlowCacheand uses its layout instead of a globalTF_LAYOUT.templateflow.cache.__init__now creates a defaultTemplateFlowCacheandtemplateflow.apinow wraps that in aTemplateFlowClient, and provides its old methods through a module__getattr__. I've left the tests in place to demonstrate that the previously expected behavior remains.Following @mgxd's review, I've adjusted the
TemplateFlowClient.__init__to be the only interface we advertise. There is a cache and a config object, which can be passed as an expert option, but there's no good reason for people to use it.Open questions:
TemplateFlowClient.__init__method? I feel like__init__(self, *, cache=None, **config_kwargs)might be a neater way of doing it, so nobody needs to directly construct aCacheConfiginstead ofTemplateFlowClient(root='/tmp/templateflow', timeout=2).__setattr__to updatetemplateflow.cache.TF_*constants? I don't think that was a well-supported feature, since many were imported intotemplateflow.api, so changing them in one place wouldn't change them in the other.Any other thoughts would be welcome. I want copy the
test_apitest battery to test the client directly (using only the new classes), and add docstrings before finalizing this.