As I understand it, the current design/implementation has the following:

- a single swappable storage factory, configured by `FILES_REST_STORAGE_FACTORY`, the provided implementation being `pyfs_storage_factory`. As far as I can tell, this implementation isn't particularly/actually PyFS-specific.
- invenio-s3 has its own storage factory, which implies that it's not currently possible to have multiple storage backends in use at the same time without creating a new storage factory that hands off to the right storage class.
- potentially many storage class implementations (with the default configured by `FILES_REST_DEFAULT_STORAGE_CLASS`)
- the storage class can apparently be specified/overridden as `Bucket.default_storage_class` or `FileInstance.storage_class` (both a `db.Column(db.String(1))`, which implies it's never been used? see source)
- according to the entity relationship diagram, a bucket has a default location, which is used to construct the `FileInstance.uri`. The location is not used after that, so file URIs for filesystem-stored files are absolute paths on disk.
- the storage class is passed into the factory by the models API user, e.g. `invenio_files_rest.tasks:merge_multipartobject` calls `mp.merge_parts()`, which has a `**kwargs` that gets passed all the way through to `FileInstance.storage()`, which defaults `storage_class` to the one specified by `FILES_REST_DEFAULT_STORAGE_CLASS`. In this scenario, the `storage_class` on the `FileInstance` gets ignored.
This seems a bit complicated, and doesn't necessarily meet the objectives of a tidy interface and support for multiple storage backends.
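To illustrate the current situation: mixing backends would apparently require each instance to supply its own factory that inspects the storage class and hands off to the right implementation. A minimal sketch of such a dispatching factory follows; the class names, the single-letter keys, and the factory signature are all hypothetical, not Invenio's actual API.

```python
# Hypothetical sketch (not Invenio's actual API): a replacement storage
# factory that dispatches on FileInstance.storage_class so that two
# backends can coexist under the single FILES_REST_STORAGE_FACTORY hook.

class PyFSStorage:
    """Stand-in for a PyFS-backed storage implementation."""
    def __init__(self, uri):
        self.uri = uri

class S3Storage:
    """Stand-in for an S3-backed storage implementation."""
    def __init__(self, uri):
        self.uri = uri

# storage_class is a db.String(1) column, so the keys are single letters
# (the letters here are made up for the example).
STORAGE_CLASSES = {"S": PyFSStorage, "A": S3Storage}

def dispatching_storage_factory(fileinstance, default_storage_class="S", **kwargs):
    """Pick a backend from the file's storage_class, else the default."""
    key = getattr(fileinstance, "storage_class", None) or default_storage_class
    return STORAGE_CLASSES[key](fileinstance.uri)
```

The point is that this dispatch logic has to be written by every instance that wants more than one backend, rather than being part of the data model.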
What I would have expected:

- A `Location` defines a storage class and a base path, e.g. (abstractly) `(pyfs, /some/path)`, `(s3, /bucket-name/some-path)` or `(s3bucket, /some/path)`
- A `Bucket` provides a default location for new files, defaulting to the global default location
- A `FileInstance` has a `Location` and a path. The location defaults to the bucket's at creation time. The location's base path and the file instance's path are combined and used by the storage backend defined on the location
- Storage backends are declared using entry points, with the entry point name being used as the key for the storage backend on the `Location` model. This key→class mapping can be overridden in config
- `FileInstance.storage()` takes no `**kwargs`, and creates the storage class instance based on the above. Maybe it's a (cached?) property. Its behaviour is contained in a function that replaces the storage class factory and can be overridden, but generally an implementer would leave this alone unless they want to do something special.
This GitHub issue came about because I was trying to work out how one would support hooking up to multiple S3 buckets; my assumption was that there would be a location for each bucket, but it wasn't immediately clear how to integrate that into the existing framework.
Could we improve on how this all works before we have a public release?