This repository has been archived by the owner on Oct 15, 2022. It is now read-only.
Thilo Planz edited this page Apr 2, 2012 · 1 revision

Comparison with GridFS

Separation of files and contents

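In GridFS, files and their contents are stored together; v7files keeps them separate.

With GridFS, there is therefore no content deduplication, and copying a file is an expensive operation because its contents have to be duplicated. In v7files, identical contents are stored only once, so a copy only needs new file metadata.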
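Keeping files and contents separate is easiest to see in a small model. The Python sketch below is not v7files code: it assumes contents are keyed by their SHA-1 digest and that each file entry is just a name pointing at such a key. `ContentStore` and `FileTable` are hypothetical names used only for illustration.

```python
import hashlib


class ContentStore:
    """Hypothetical content store keyed by the SHA-1 digest of each content."""

    def __init__(self):
        self.contents = {}  # SHA-1 hex digest -> content bytes

    def put(self, data: bytes) -> str:
        sha = hashlib.sha1(data).hexdigest()
        # Identical contents produce the same key, so they are stored only once.
        self.contents.setdefault(sha, data)
        return sha

    def get(self, sha: str) -> bytes:
        return self.contents[sha]


class FileTable:
    """Hypothetical file metadata: each file name points at a content key."""

    def __init__(self, store: ContentStore):
        self.store = store
        self.files = {}  # file name -> SHA-1 of its content

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = self.store.put(data)

    def copy(self, src: str, dst: str) -> None:
        # Copying duplicates only the small metadata entry, never the content.
        self.files[dst] = self.files[src]

    def read(self, name: str) -> bytes:
        return self.store.get(self.files[name])


store = ContentStore()
fs = FileTable(store)
fs.write("a.txt", b"hello world")
fs.write("b.txt", b"hello world")  # same bytes, deduplicated
fs.copy("a.txt", "c.txt")          # cheap copy
print(len(store.contents))         # 1
print(fs.read("c.txt"))            # b'hello world'
```

In the GridFS model, by contrast, `b.txt` and `c.txt` would each carry their own copy of the bytes.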

Content compression

v7files offers transparent (i.e. invisible to application code) content compression.
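"Transparent" means the storage layer compresses and decompresses on its own, so callers always read back the original bytes. The sketch below shows that idea. The choice of zlib/deflate and the rule of compressing only when it actually saves space are assumptions for illustration; the page does not say which algorithm or policy v7files uses.

```python
import zlib


def encode(data: bytes):
    """Compress if that saves space; return (encoding name, stored bytes)."""
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return "deflate", compressed
    # Tiny or incompressible data is stored as-is.
    return "raw", data


def decode(encoding: str, stored: bytes) -> bytes:
    """Undo the storage encoding so callers never see compressed bytes."""
    if encoding == "deflate":
        return zlib.decompress(stored)
    return stored


text = b"abc" * 1000
enc, stored = encode(text)
print(enc, len(stored) < len(text))  # deflate True
print(decode(enc, stored) == text)   # True
```

Recording the encoding next to the stored bytes is what lets the read path stay invisible to application code.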

Folders

The virtual file system provided by v7files has folders, whereas GridFS only offers a flat namespace. You could use path separators in file names to emulate a hierarchy, but then renaming a folder means renaming every file inside it, which is very expensive and not atomic.
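The cost difference between the two approaches can be shown with plain dictionaries. This is a hypothetical sketch, not either system's schema: the flat version encodes the hierarchy in file names (GridFS style), and the tree version stores each entry's own name plus a parent id. In MongoDB, an update to a single document is atomic, so renaming a folder stored as one record is atomic, while rewriting many file names is not.

```python
# Flat namespace: the hierarchy exists only inside the file names.
flat = {"docs/a.txt": "id1", "docs/b.txt": "id2",
        "docs/sub/c.txt": "id3", "other.txt": "id4"}


def rename_flat(names, old, new):
    """Rewrite every name under `old`; returns how many records changed."""
    changed = 0
    for name in list(names):  # iterate over a snapshot while mutating the dict
        if name.startswith(old + "/"):
            names[new + name[len(old):]] = names.pop(name)
            changed += 1
    return changed


# Folder tree: each record stores only its own name and its parent's id.
tree = {
    1: {"name": "docs", "parent": None},
    2: {"name": "a.txt", "parent": 1},
    3: {"name": "b.txt", "parent": 1},
    4: {"name": "sub", "parent": 1},
    5: {"name": "c.txt", "parent": 4},
    6: {"name": "other.txt", "parent": None},
}


def path_of(node_id):
    """Build the full path by walking up the parent links."""
    node = tree[node_id]
    if node["parent"] is None:
        return node["name"]
    return path_of(node["parent"]) + "/" + node["name"]


def rename_tree(node_id, new_name):
    """Renaming a folder touches a single record."""
    tree[node_id]["name"] = new_name
    return 1


print(rename_flat(flat, "docs", "papers"))  # 3
print(rename_tree(1, "papers"))             # 1
print(path_of(5))                           # papers/sub/c.txt
```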

Access control

GridFS does not specify a way to manage file access permissions; v7files does.

Chunked storage of large contents

Both v7files and GridFS split data that is too large to fit into a single MongoDB document into multiple chunks. In v7files, the same chunk can belong to multiple files; in GridFS, each chunk is an integral part of exactly one file.
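A chunk can belong to several files when chunks are identified by their own content rather than by the file they came from. The sketch below assumes fixed-size chunks keyed by SHA-1; the chunk size of 4 bytes is deliberately tiny for demonstration, and the page does not say how v7files actually chooses chunk boundaries or keys. `put_file` and `read_file` are hypothetical helpers.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for demonstration; real systems use much larger chunks

chunks = {}  # SHA-1 hex digest -> chunk bytes, shared by all files


def put_file(data: bytes) -> list:
    """Split data into fixed-size chunks and return the list of chunk keys."""
    ids = []
    for i in range(0, len(data), CHUNK_SIZE):
        piece = data[i:i + CHUNK_SIZE]
        sha = hashlib.sha1(piece).hexdigest()
        chunks.setdefault(sha, piece)  # a chunk that already exists is reused
        ids.append(sha)
    return ids


def read_file(ids: list) -> bytes:
    """Reassemble a file from its ordered chunk keys."""
    return b"".join(chunks[sha] for sha in ids)


file_a = put_file(b"AAAABBBBCCCC")
file_b = put_file(b"AAAABBBBDDDD")
print(len(chunks))        # 4 distinct chunks instead of 6
print(read_file(file_b))  # b'AAAABBBBDDDD'
```

In GridFS, each chunk document instead records the id of its one owning file and its position within that file, so no sharing is possible.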

Updates and version history

When a GridFS file is updated, the new contents replace (overwrite) the old content.

v7files never updates any contents in place. Instead, it changes the file metadata to point at the new content. This is very convenient if you want to keep a version history for the file (v7files can be configured to also preserve the necessary file metadata). If you do not, the old contents have to be garbage-collected.
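The update model above can be sketched as "write new content, then repoint the metadata". In this hypothetical Python model, a file's metadata holds a list of content keys, oldest first; the `keep_history` flag stands in for the configuration option mentioned above and is not a real v7files setting.

```python
import hashlib

contents = {}  # SHA-1 hex digest -> bytes; never modified once written
file_meta = {"name": "report.txt", "versions": []}  # content keys, oldest first


def put(data: bytes) -> str:
    sha = hashlib.sha1(data).hexdigest()
    contents.setdefault(sha, data)
    return sha


def update(meta: dict, data: bytes, keep_history: bool = True) -> None:
    """Point the file at new content instead of overwriting the old bytes."""
    sha = put(data)
    if keep_history:
        meta["versions"].append(sha)
    else:
        # The previous content stays in `contents`, now unreferenced (garbage).
        meta["versions"] = [sha]


def current(meta: dict) -> bytes:
    return contents[meta["versions"][-1]]


update(file_meta, b"draft")
update(file_meta, b"final")
print(current(file_meta))                            # b'final'
print([contents[s] for s in file_meta["versions"]])  # [b'draft', b'final']
```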

Garbage collection

There is no garbage in GridFS, because updates happen in-place and contents are deleted when their files are deleted.

With v7files, updates and deletions leave the old content in the system. Unless you are interested in a complete version history, you will need to run garbage collection from time to time. This is facilitated by reference tracking.
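One simple way to collect this garbage is a mark-and-sweep pass: gather every content key still referenced by file metadata, then delete everything else. This is a swapped-in technique for illustration; the page does not describe how v7files' reference tracking records references, and `collect_garbage` is a hypothetical helper.

```python
def collect_garbage(contents: dict, files: dict) -> set:
    """Delete every content not referenced by any file version; return removed keys."""
    # Mark: every content key still referenced by some file's version list.
    live = {sha for versions in files.values() for sha in versions}
    # Sweep: anything not marked is garbage.
    dead = set(contents) - live
    for sha in dead:
        del contents[sha]
    return dead


contents = {"h1": b"old", "h2": b"new", "h3": b"deleted file"}
# h1 was an overwritten version without history; h3 belonged to a deleted file.
files = {"report.txt": ["h2"]}
print(sorted(collect_garbage(contents, files)))  # ['h1', 'h3']
print(sorted(contents))                          # ['h2']
```

If a complete version history is kept, every old version stays in `files`, and the pass removes only the contents of deleted files.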

Library versus Server

GridFS is a protocol aimed at libraries and programs that directly talk to a MongoDB database.

v7files also provides a server process and various HTTP-based interfaces (such as WebDAV) to ease integration into existing applications.

Why GridFS is no longer used internally

In the beginning of v7files development, GridFS was used internally: v7files implemented a layer on top of GridFS, managing only the metadata itself but delegating content storage to GridFS.

This is no longer the case. Now v7files implements its own storage format directly, in a way that is very similar to, but not compatible with, GridFS.

We stopped using GridFS for the following reasons:

  • The original motivation to build on top of GridFS was two-fold:

    • don't reinvent the wheel: reuse existing, proven, and well-maintained code
    • compatibility with existing tools, in particular for server operations (such as backup, data migration, troubleshooting)

    As it turned out, however, existing GridFS tools could no longer be used in a meaningful fashion on the internal GridFS buckets without also interpreting the associated v7files metadata. And the GridFS codebase and protocol were simple enough that re-implementing the necessary parts was not a big effort (and well worth it for the ability to leave out or modify the unnecessary or even counterproductive parts, see below).

  • Some parts of GridFS were made redundant by the additional v7files metadata, for example MD5 hashing and file name storage.

  • The main feature GridFS provides is content chunking. As part of the deduplication processing, v7files already implements an equivalent mechanism.

  • There were concerns about the lack of atomic operations in GridFS. If a file contains multiple chunks, they cannot be stored, deleted, or updated in a single MongoDB transaction. In v7files, every content chunk can stand on its own and a partially completed operation does not leave the database in an inconsistent state.
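Self-contained chunks make crash safety a matter of write order. The sketch below assumes, for illustration only, that all chunks are written first and the file entry is written last in a single step. A failure partway through then leaves only unreferenced chunks, which garbage collection can remove, and never a half-visible file. The page does not spell out v7files' exact write order, and `save` and `CrashError` are hypothetical.

```python
import hashlib


class CrashError(Exception):
    """Stands in for a process or network failure during a write."""


chunks = {}  # each chunk is a self-contained record keyed by its hash
files = {}   # file name -> ordered list of chunk keys


def save(name, data, chunk_size=4, crash_after=None):
    """Write all chunks first, then the file entry in one final step."""
    ids = []
    for n, i in enumerate(range(0, len(data), chunk_size)):
        if crash_after is not None and n == crash_after:
            raise CrashError("simulated failure mid-write")
        piece = data[i:i + chunk_size]
        sha = hashlib.sha1(piece).hexdigest()
        chunks.setdefault(sha, piece)
        ids.append(sha)
    # A single metadata write: the file becomes visible only once all chunks exist.
    files[name] = ids


try:
    save("big.bin", b"AAAABBBBCCCC", crash_after=2)
except CrashError:
    pass

print("big.bin" in files)  # False: no half-written file is visible
print(len(chunks))         # 2 orphaned chunks, left for garbage collection
```

Retrying the same `save` after the failure simply reuses the two chunks that were already written.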