This repository has been archived by the owner on Oct 15, 2022. It is now read-only.

StorageFormat

thiloplanz edited this page Dec 9, 2011 · 8 revisions

All file contents are stored in GridFS, but the file (and folder) metadata is stored in a separate collection, where it can also be versioned.

File and folder metadata

Every file and folder is represented by a MongoDB document in the v7files collection. It has the following fields:

  • _id: an id that uniquely identifies the file, even if it moves around in the filesystem or changes its name. This is a randomly assigned ObjectId, except for the "root" folders, which are identified by Strings chosen by the user and mapped to URL endpoints.
  • _version: an integer, starting at 1 and incrementing with every update to the file
  • parent: the _id of the parent (the folder that contains this file or folder)
  • acl: a nested object containing access control lists
  • filename: the name of the file. This becomes a URL component for WebDAV.
  • length: the length of the file in bytes. Missing in the case of a folder.
  • sha: a byte array with the SHA-1 hash of the file's contents. This is used to link the file to its contents, which are stored in GridFS. Missing in the case of a folder.
  • contentType: the content type of the file
  • created_at: the creation date of the file
  • updated_at: the creation date of the current revision of the file, missing for the first version
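
To make the field list concrete, here is a rough sketch of what a file document and a folder document might look like, written as plain Python dictionaries. All values are made up for illustration: the `_id` strings stand in for real ObjectIds, `"projects"` is a hypothetical root folder name, and the `acl` object is left empty because its layout is not described on this page.

```python
import hashlib
from datetime import datetime, timezone

contents = b"hello world\n"  # example file contents (12 bytes)

# Metadata for the first version of a file.
file_doc = {
    "_id": "4ee1c0a2e4b0f1a2b3c4d5e6",  # stand-in for a random ObjectId
    "_version": 1,                       # first version
    "parent": "projects",                # hypothetical root folder id (user-chosen string)
    "acl": {},                           # ACL layout is not specified here
    "filename": "hello.txt",
    "length": len(contents),
    "sha": hashlib.sha1(contents).digest(),  # 20-byte binary SHA-1
    "contentType": "text/plain",
    "created_at": datetime.now(timezone.utc),
    # no updated_at: it is missing for the first version
}

# Metadata for a folder: same shape, but without length and sha.
folder_doc = {
    "_id": "4ee1c0a2e4b0f1a2b3c4d5e7",  # stand-in for a random ObjectId
    "_version": 1,
    "parent": "projects",
    "acl": {},
    "filename": "docs",
    "created_at": datetime.now(timezone.utc),
}
```

The `sha` value is what ties `file_doc` to its contents; the folder has no contents and therefore carries neither `sha` nor `length`.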

Version history

When a file is modified, the _version field in the v7files collection is incremented by one, and the previous revision is moved to a shadow collection that tracks version history, called v7files.vermongo. Deleted files are also stored there. The "main" collection only contains the current versions of all files.
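
The update flow described above can be sketched with two in-memory dictionaries standing in for v7files and v7files.vermongo. This is only an illustration of the idea, not the actual implementation: the function names are hypothetical, and keying the shadow entries by an (_id, _version) pair is an assumption made for this sketch.

```python
import copy

def update_file(main, shadow, file_id, changes):
    """Archive the current revision, then store the changed one with _version + 1."""
    current = main[file_id]
    # Assumption: shadow entries are keyed by (_id, _version) in this sketch.
    shadow[(file_id, current["_version"])] = copy.deepcopy(current)
    updated = {**current, **changes, "_version": current["_version"] + 1}
    main[file_id] = updated
    return updated

def delete_file(main, shadow, file_id):
    """Move the current revision to the shadow store and drop it from main."""
    current = main.pop(file_id)
    shadow[(file_id, current["_version"])] = current

main = {"f1": {"_id": "f1", "_version": 1, "filename": "a.txt"}}
shadow = {}

update_file(main, shadow, "f1", {"filename": "b.txt"})
# main now holds version 2 only; version 1 lives in the shadow store.
```

After the update, `main` contains a single current document per file, while `shadow` accumulates every superseded (or deleted) revision.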

File contents in GridFS

The v7files collection (and its shadow collection) stores only the file metadata. File contents are stored in GridFS, keyed by the SHA-1 hash of that data. This is a regular GridFS bucket called v7.fs (so the underlying collections are v7.fs.files and v7.fs.chunks). The _id for these GridFS files is the binary SHA-1 hash (a byte array).

Because of this arrangement, renaming or duplicating a file (without changing its contents) will not take up additional storage.
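
A minimal sketch of why this works, using an in-memory dictionary in place of the v7.fs bucket. The `ContentStore` class and its methods are hypothetical names for illustration only; the point is that content is keyed by its SHA-1 digest, so identical bytes are stored once no matter how many metadata documents point at them.

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: blobs are keyed by their binary SHA-1 digest."""

    def __init__(self):
        self.blobs = {}  # digest (bytes) -> contents (bytes)

    def put(self, data: bytes) -> bytes:
        digest = hashlib.sha1(data).digest()
        # Storing the same contents again is a no-op.
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: bytes) -> bytes:
        return self.blobs[digest]

store = ContentStore()
data = b"same contents"

# Two metadata documents (an original and a renamed copy) share one blob.
original = {"filename": "report.txt", "sha": store.put(data)}
duplicate = {"filename": "report-copy.txt", "sha": store.put(data)}
```

Both documents carry the same `sha`, and the store holds exactly one blob, so the copy costs only the size of its metadata document.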

On the other hand, if you are not interested in retaining the complete file change history, you will need to eventually "garbage-collect" content that is no longer referenced.
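
One possible shape for such a garbage-collection pass, again sketched over in-memory stand-ins rather than real collections. The function name and signature are hypothetical. Whether revisions in the shadow collection count as references is a policy choice: pass them in to keep old contents alive, or leave them out if the history itself is being discarded.

```python
def collect_garbage(blobs, main_docs, shadow_docs=()):
    """Delete blobs whose digest no metadata document references.

    blobs: dict mapping SHA-1 digest -> contents
    main_docs / shadow_docs: iterables of metadata dicts (folders have no "sha")
    Returns the list of digests that were removed.
    """
    referenced = {
        doc["sha"] for doc in [*main_docs, *shadow_docs] if "sha" in doc
    }
    unreferenced = [digest for digest in blobs if digest not in referenced]
    for digest in unreferenced:
        del blobs[digest]
    return unreferenced

blobs = {b"digest-a": b"current", b"digest-b": b"old revision", b"digest-c": b"orphan"}
main_docs = [{"filename": "a.txt", "sha": b"digest-a"}, {"filename": "folder"}]
shadow_docs = [{"filename": "a.txt", "sha": b"digest-b"}]

removed = collect_garbage(blobs, main_docs, shadow_docs)
```

With the shadow documents included, only the blob that nothing references is removed; calling the function without `shadow_docs` would also reclaim contents that only old revisions point to.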
