StorageFormat
All file contents are stored in GridFS, but the file (and folder) metadata is stored in a separate collection, where it can also be versioned.
Every file and folder is represented by a MongoDB document in the v7files collection. It has the following fields:
- _id: an id which uniquely identifies the file, even if it moves around in the filesystem or changes its name. This is a randomly assigned ObjectId, except for the "root" folders, which are identified by Strings chosen by the user and mapped to URL endpoints.
- _version: an integer, starting at 1 and incrementing with every update to the file
- parent: the _id of the parent file
- acl: a nested object containing access control lists
- filename: the name of the file. This becomes a URL component for WebDAV.
- length: the length of the file in bytes. Missing in the case of a folder.
- sha: a byte array with the SHA-1 hash of the file's contents. This is used to link the file to its contents, which are stored in GridFS. Missing in the case of a folder.
- contentType: the content type of the file
- created_at: the creation date of the file
- updated_at: the creation date of the current revision of the file, missing for the first version
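To make the field list concrete, here is a minimal sketch of what a first-version metadata document for a file might look like, written as a plain Python dict. The helper name `make_file_metadata`, the string ids, and the empty `acl` object are placeholders for illustration only; the real `_id` would be an ObjectId and the real `acl` layout is not described here.

```python
import hashlib
from datetime import datetime, timezone


def make_file_metadata(file_id, parent_id, filename, content, content_type):
    """Build a first-version metadata document with the fields listed above.

    Hypothetical helper: ids are plain strings here, and the acl structure
    is left empty because its layout is an unknown.
    """
    return {
        "_id": file_id,                             # stand-in for a random ObjectId
        "_version": 1,                              # versions start at 1
        "parent": parent_id,                        # _id of the parent
        "acl": {},                                  # placeholder, structure assumed
        "filename": filename,
        "length": len(content),                     # size in bytes
        "sha": hashlib.sha1(content).digest(),      # 20-byte SHA-1 of the contents
        "contentType": content_type,
        "created_at": datetime.now(timezone.utc),
        # no "updated_at": it is missing for the first version
    }


doc = make_file_metadata("file-1", "my-root", "hello.txt", b"hello", "text/plain")
```

The `sha` value is the only link between this document and the stored contents, which is why it is kept as raw bytes rather than a hex string in this sketch.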
When a file is modified, the _version field in the v7files collection is incremented by one, and the previous revision is moved to a shadow collection that tracks version history, called v7files.vermongo. Deleted files are also stored there. The "main" collection only contains the current versions of all files.
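The update and delete behaviour described above can be sketched with two in-memory dicts standing in for the main collection and its shadow collection. This is not the project's code: the function names are hypothetical, and keying shadow entries by an `(_id, _version)` pair is an assumption made so that several revisions of one file can coexist.

```python
import copy
from datetime import datetime, timezone


def archive(shadow, doc):
    """Copy a revision into the shadow store (key layout is an assumption)."""
    shadow[(doc["_id"], doc["_version"])] = copy.deepcopy(doc)


def update_file(current, shadow, file_id, changes):
    """Archive the previous revision, then store the new one with _version + 1."""
    old = current[file_id]
    archive(shadow, old)
    new = dict(old)
    new.update(changes)
    new["_version"] = old["_version"] + 1
    new["updated_at"] = datetime.now(timezone.utc)  # creation date of this revision
    current[file_id] = new
    return new


def delete_file(current, shadow, file_id):
    """Remove a file from the main store; its last revision moves to the shadow."""
    archive(shadow, current.pop(file_id))


main_store = {"f1": {"_id": "f1", "_version": 1, "filename": "a.txt"}}
shadow_store = {}
update_file(main_store, shadow_store, "f1", {"filename": "b.txt"})
delete_file(main_store, shadow_store, "f1")
```

After these two calls the main store is empty and the shadow store holds both revisions, which mirrors the statement that the main collection only ever contains current versions.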
The v7files collection (and its shadow collection) store only the file metadata. File contents are stored in GridFS, keyed by the SHA-1 hash of the data. This is a regular GridFS bucket called v7.fs (so there will be the collections v7.fs.files and v7.fs.chunks). The _id for these GridFS files is the binary SHA-1 hash (a byte array).
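Keying contents by their SHA-1 hash is a form of content-addressed storage. The sketch below illustrates the idea with an in-memory dict in place of the GridFS bucket; the `ContentStore` class and its method names are hypothetical and only meant to show how identical contents collapse into a single stored entry.

```python
import hashlib


class ContentStore:
    """In-memory stand-in for a bucket whose keys are binary SHA-1 digests."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> bytes:
        """Store data under its SHA-1 digest and return that digest."""
        digest = hashlib.sha1(data).digest()
        # If the digest is already present, the existing entry is reused.
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: bytes) -> bytes:
        """Look up contents by the digest recorded in a metadata document."""
        return self.blobs[digest]


store = ContentStore()
first = store.put(b"report contents")
second = store.put(b"report contents")  # e.g. a copy of the same file
```

Both calls return the same 20-byte digest, so two metadata documents can point at one stored entry.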
Because of this arrangement, renaming or duplicating a file (without changing its contents) will not take up additional storage.
On the other hand, if you are not interested in retaining the complete file change history, you will need to eventually "garbage-collect" content that is no longer referenced.
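One way to think about such a garbage collection is as a sweep over the stored contents that removes every entry whose hash no metadata document still references. The following is only a sketch over in-memory data; the function name and the `keep_history` flag are assumptions, and a real implementation would also have to cope with writes happening during the sweep.

```python
def collect_garbage(current_docs, shadow_docs, blobs, keep_history=True):
    """Delete entries from blobs whose key is not referenced by any document.

    current_docs / shadow_docs: iterables of metadata dicts (folders lack "sha").
    blobs: dict mapping SHA-1 digest (bytes) to contents.
    Returns the list of digests that were removed.
    """
    referenced = {d["sha"] for d in current_docs if "sha" in d}
    if keep_history:
        # Old revisions and deleted files still count as references.
        referenced |= {d["sha"] for d in shadow_docs if "sha" in d}

    unreferenced = [digest for digest in blobs if digest not in referenced]
    for digest in unreferenced:
        del blobs[digest]
    return unreferenced
```

With `keep_history=True` nothing referenced by an archived revision is removed; passing `False` corresponds to the case where the change history is not being retained.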