
System Overview

Luke Lovett edited this page Feb 2, 2015 · 2 revisions

This page explains some of the mongo-connector internals from the perspective of running mongo-connector for the first time.

  1. The main Connector thread determines what type of MongoDB node it has been pointed at by issuing an isdbgrid command to the address it was given. The response reveals whether the node is a mongod or a mongos, and thus whether the user intends to replicate from a replica set or a sharded cluster, respectively. Based on this information, the Connector thread creates an OplogThread for the primary of the replica set, or one for the primary node of each shard in the sharded cluster administered by the mongos. It also initializes one or more DocManagers for each replication endpoint and provides these to the OplogThreads.

  2. An OplogThread creates a tailable cursor on the oplog.rs collection of the mongod, which lives in the local database. This collection is a running record of every operation that modifies data on that node.

  3. The OplogThread initiates a "collection dump," by which it upserts every document in the namespaces we're interested in through the specified DocManagers. These DocManagers pass on the documents to their respective target systems. The "collection dump" happens only the first time mongo-connector is started, and it does not happen again as long as mongo-connector can find the timestamp of the last oplog record it processed (more on this in the next step).

  4. The OplogThread goes into a loop, reading new entries from the oplog through the tailable cursor as they arrive. Each entry is a document that corresponds to one operation and records the time of the operation, its namespace, what operation was performed, and which documents were affected. Based on the operation recorded in the oplog entry, the OplogThread calls the appropriate method of each DocManager.

    • If the operation is an insert, then the OplogThread retrieves the document from MongoDB and passes the document along to the DocManager using the upsert method.
    • For an update operation, the DocManager retrieves the version of the document it has on the remote system and applies the update operation given in the oplog entry to it, re-saving it in the remote system.
    • When we see a delete operation, the DocManager simply removes the document with the given id from the remote system.
    • A DocManager may also choose to handle database commands from MongoDB, such as dropCollection or dropDatabase. These are passed to the handle_command method, if one is defined on the DocManager.

  5. For each entry it reads from the oplog (in the same loop as above), the OplogThread records that entry's timestamp as its "checkpoint". The checkpoint acts like a bookmark: it is periodically written out to the oplog progress file ('oplog.timestamp' by default), and it lets mongo-connector fast-forward to the proper place in the oplog after it has been shut down and restarted.
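The dispatch in step 4 and the checkpoint in step 5 can be sketched roughly as follows. This is an illustration, not mongo-connector's actual code: InMemoryDocManager, process_entry and fetch_document are hypothetical names, the update and remove signatures are simplified assumptions (only upsert and handle_command are named above), and plain integers stand in for oplog timestamps. The entry fields used here (op, ns, o, o2, ts) follow the layout of MongoDB oplog documents.

```python
class InMemoryDocManager:
    """Hypothetical DocManager whose 'remote system' is just a dict."""

    def __init__(self):
        self.docs = {}       # _id -> document held by the target system
        self.commands = []   # database commands this manager was handed

    def upsert(self, doc):
        self.docs[doc["_id"]] = dict(doc)

    def update(self, doc_id, update_spec):
        # Fetch the target system's copy, apply the update, and re-save it.
        doc = self.docs.get(doc_id, {"_id": doc_id})
        if "$set" in update_spec or "$unset" in update_spec:
            # Only $set and $unset are handled in this sketch.
            doc.update(update_spec.get("$set", {}))
            for field in update_spec.get("$unset", {}):
                doc.pop(field, None)
        else:
            # No operators: treat the spec as a full replacement document.
            doc = dict(update_spec, _id=doc_id)
        self.docs[doc_id] = doc

    def remove(self, doc_id):
        self.docs.pop(doc_id, None)

    def handle_command(self, command):
        self.commands.append(command)


def process_entry(entry, doc_managers, fetch_document):
    """Route one oplog entry to every DocManager; return its timestamp."""
    op = entry["op"]
    for dm in doc_managers:
        if op == "i":
            # Insert: read the document back from MongoDB, then upsert it.
            doc = fetch_document(entry["ns"], entry["o"]["_id"])
            if doc is not None:
                dm.upsert(doc)
        elif op == "u":
            # Update: "o2" identifies the document, "o" holds the change.
            dm.update(entry["o2"]["_id"], entry["o"])
        elif op == "d":
            dm.remove(entry["o"]["_id"])
        elif op == "c" and hasattr(dm, "handle_command"):
            # Commands are optional: only forwarded if the method exists.
            dm.handle_command(entry["o"])
    return entry["ts"]


# Stand-in for the source MongoDB collection, keyed by (namespace, _id).
source = {("test.users", 1): {"_id": 1, "name": "Ada"}}

def fetch_document(namespace, doc_id):
    return source.get((namespace, doc_id))

oplog = [
    {"ts": 1, "op": "i", "ns": "test.users", "o": {"_id": 1, "name": "Ada"}},
    {"ts": 2, "op": "u", "ns": "test.users", "o2": {"_id": 1},
     "o": {"$set": {"lang": "en"}}},
    {"ts": 3, "op": "c", "ns": "test.$cmd", "o": {"drop": "users"}},
]

manager = InMemoryDocManager()
checkpoint = None
for entry in oplog:
    # The checkpoint always trails the last entry that was fully processed.
    checkpoint = process_entry(entry, [manager], fetch_document)

print(manager.docs, manager.commands, checkpoint)
```

Returning the timestamp from process_entry keeps the bookkeeping of step 5 in the same loop as the dispatch of step 4, which mirrors the description above.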

The last step runs until mongo-connector is killed. Some events, such as temporarily losing the connection to MongoDB, a replica set rollback, or falling very far behind in the oplog, can cause the OplogThread to take other actions. These actions are covered on another page.
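What happens when mongo-connector is killed and later restarted follows from steps 3 and 5: if a checkpoint can be found, the OplogThread resumes from it; otherwise it performs a collection dump. A minimal sketch of that decision is shown below. The function names and the JSON layout of the file are assumptions made for illustration and do not reproduce the real format of the oplog progress file; integers again stand in for oplog timestamps.

```python
import json
import os
import tempfile


def save_checkpoint(path, timestamp):
    """Write the checkpoint to a temporary file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump({"ts": timestamp}, handle)
    # os.replace is atomic, so a crash never leaves a half-written file.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved timestamp, or None if it cannot be found."""
    try:
        with open(path) as handle:
            return json.load(handle)["ts"]
    except (OSError, ValueError, KeyError, TypeError):
        # Missing, unreadable, or malformed file: treat as "no checkpoint".
        return None


def plan_startup(path):
    """Decide between a full collection dump and resuming from a checkpoint."""
    checkpoint = load_checkpoint(path)
    if checkpoint is None:
        return ("collection_dump", None)
    return ("resume", checkpoint)


with tempfile.TemporaryDirectory() as workdir:
    progress_file = os.path.join(workdir, "oplog.timestamp")
    first_run = plan_startup(progress_file)    # no file yet
    save_checkpoint(progress_file, 42)         # written during step 5
    second_run = plan_startup(progress_file)   # after a restart
    print(first_run, second_run)
```

Writing to a temporary file and renaming it is a design choice of this sketch: it ensures that an interrupted write cannot corrupt the previous checkpoint and force an unnecessary collection dump.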
