mongo-three-monkeys

Log anonymisation tool for MongoDB logfiles.

This tool was written for Skunkworks, MongoDB's quarterly hackathon.

This is ALPHA software and a work in progress. Please check the output after running it.

It currently writes the output to STDOUT.

What It Removes

The tool currently performs the following:

  1. Replace any strings in double quotes with a SHA1 digest of the contents. The digest is not currently salted (salting is on the TODO list).
  2. Remove any field names (field_name:) or MongoDB namespaces (database.collection), and replace them with another word chosen from a dictionary, based on an FNV hash of the word.
  3. Remove any occurrences of <database_name>.$cmd.
  4. Remove any words contained in a blacklist file, and replace them with XXXX.
  5. Anonymise any IP addresses, using the Crypto-PAn algorithm. Note that this currently uses a hard-coded key; we will add functionality to supply your own key in the future.
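
Steps 1 and 2 can be sketched roughly as follows. This is an illustration only, not the code in m3m.go: the function names hashQuoted and pickWord, the regular expression, the sample dictionary, and the choice of the 32-bit FNV-1a variant are all assumptions made for the example.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"hash/fnv"
	"regexp"
)

// quoted matches a double-quoted string that contains no nested
// quotes or newlines (hypothetical pattern, not taken from m3m.go).
var quoted = regexp.MustCompile(`"([^"\n]*)"`)

// hashQuoted replaces the contents of each double-quoted string with
// the hex-encoded, unsalted SHA-1 digest of those contents.
func hashQuoted(line string) string {
	return quoted.ReplaceAllStringFunc(line, func(m string) string {
		sum := sha1.Sum([]byte(m[1 : len(m)-1])) // strip the surrounding quotes
		return `"` + hex.EncodeToString(sum[:]) + `"`
	})
}

// pickWord maps a word to a dictionary entry using a 32-bit FNV-1a
// hash, so the same input always yields the same replacement.
// The FNV variant is an assumption; dict must not be empty.
func pickWord(word string, dict []string) string {
	h := fnv.New32a()
	h.Write([]byte(word)) // hash.Hash writes never return an error
	return dict[h.Sum32()%uint32(len(dict))]
}

func main() {
	words := []string{"apple", "banana", "cherry", "damson"} // sample dictionary
	fmt.Println(hashQuoted(`query: { name: "alice" }`))
	fmt.Println(pickWord("customers", words))
}
```

Because both mappings are deterministic, the same field name or string value is replaced by the same token everywhere in the log, which keeps the anonymised output usable for diagnosis. The lack of a salt also means that short or guessable strings could be recovered by hashing candidate values.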

Please note that, unlike IP addresses, hostnames are not explicitly removed. If your hostnames are sensitive, add them to the blacklist. (See also https://github.com/victorhooi/mongo-three-monkeys/issues/1)

Usage

To run it:

./m3m <MONGODB_LOGFILE> <BLACKLIST>

Both arguments are optional - if you do not supply a <MONGODB_LOGFILE>, it will default to mongod.log.

The blacklist should be a list of words, one per line, that you want completely redacted from the output. Any occurrences of these words will be replaced with XXXX (i.e. four X characters). The blacklist file is optional.
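
As a rough sketch of how a blacklist in this format might be applied (not the implementation in m3m.go), the example below reads one word per line and replaces every occurrence with XXXX. The names loadBlacklist and redact are hypothetical, and plain substring matching is an assumption; the tool itself may match differently.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// loadBlacklist reads one word per line, skipping blank lines.
func loadBlacklist(r io.Reader) ([]string, error) {
	var words []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if w := strings.TrimSpace(sc.Text()); w != "" {
			words = append(words, w)
		}
	}
	return words, sc.Err()
}

// redact replaces every occurrence of each blacklisted word with XXXX.
// Matching is by substring, which is an assumption of this sketch.
func redact(line string, words []string) string {
	for _, w := range words {
		line = strings.ReplaceAll(line, w, "XXXX")
	}
	return line
}

func main() {
	// An in-memory blacklist stands in for the <BLACKLIST> file.
	words, err := loadBlacklist(strings.NewReader("acme\nprod-db-01\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(redact("connection accepted from prod-db-01.acme.example", words))
}
```

With substring matching, the order of entries matters when one blacklisted word contains another, so listing longer words first avoids partially redacted output.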

Known Issues

  • It removes various things it's not supposed to, due to the use of regexes (e.g. things that look like namespaces, but aren't). However, it was important that we not let things leak through. Ultimately, the goal is to port the regex approach to a proper parsing approach.
  • We pretend that : is an invalid character for collection names - however, it is a valid character.
  • We assume that $comment is a string type - however, $comment can be any valid BSON type.
  • Strings containing nested quotes or newlines are not handled correctly.
  • We assume that text followed by a colon is a field-name, and that words delimited by periods are namespaces.
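
To illustrate why these assumptions over-match, the sketch below uses two deliberately naive patterns. They are hypothetical and are not the regexes used in m3m.go; they only show how "text followed by a colon" and "words delimited by periods" also describe parts of timestamps and hostnames.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for illustration only.
var (
	fieldName = regexp.MustCompile(`\b\w+:`)      // text followed by a colon
	namespace = regexp.MustCompile(`\b\w+\.\w+\b`) // words delimited by a period
)

func main() {
	samples := []string{
		"12:34:56.789",    // a time of day, not a field name or namespace
		"db1.example.com", // a hostname, not a namespace
		"shop.orders",     // a real namespace
		"qty: 5",          // a real field name
	}
	for _, s := range samples {
		fmt.Printf("%-18q fields=%q namespaces=%q\n",
			s, fieldName.FindAllString(s, -1), namespace.FindAllString(s, -1))
	}
}
```

A parser that understands the log line structure would not need to guess in this way, which is the motivation for moving away from regexes.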

Questions

For any questions, please file an issue.
