rockeet edited this page Nov 3, 2023 · 4 revisions

SidePlugin

For the design motivation of ToplingDB SidePlugin, please refer to Motivation To Solution.

Compile-And-Install

To migrate existing RocksDB code to ToplingDB, please refer to Using ToplingDB from Scratch.

1. Overview

The ToplingDB SidePlugin configuration system defines configuration items in json/yaml format, and includes all meta-objects of ToplingDB/RocksDB into this configuration system. Overall, the ToplingDB configuration system achieves the following goals:

  1. Cover all configuration requirements of ToplingDB/RocksDB
  2. Provide a dynamic, open and decoupled plugin solution
    • User code can use third-party modules (such as ToplingZipTable) without modification
    • New plugins can be written without introducing irrelevant dependencies (avoiding silly code such as: if (is BlockBasedTable) ... else if (is PlainTable) ...)
  3. Visualization: display the internal state of the engine through a Web Service (off-site documentation)
  4. Modify configuration online using a REST API through the Web Service
  5. Simplify multi-language bindings (only the conf object needs to be bound)

2. Detailed introduction

The root configuration objects of ToplingDB/RocksDB are DBOptions and ColumnFamilyOptions (CFOptions for short). In addition, the Options object is a combination of DBOptions and CFOptions (it inherits from both).

DBOptions and CFOptions contain secondary configuration objects, and some secondary objects in turn contain tertiary configuration objects. In json, each of these objects is defined as a sub-object of a first-level json object named after its base class. In addition, json contains several special first-level objects (http, setenv, permissions, databases, open). A json object can refer to other json objects, and these references are converted into reference relationships between the corresponding C++ objects.

DBOptions and CFOptions also support template, which specifies another DBOptions/CFOptions object as a template: the new object is first copied from the template and then modified by its own members.
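The copy-then-modify behavior of template can be sketched in a few lines of Python. This is only an illustration of the idea, not ToplingDB's implementation; the object name big_buffer and the helper apply_template are hypothetical.

```python
import copy

def apply_template(objects, name):
    """Return the effective options for `name`: copy its template, then apply overrides."""
    spec = dict(objects[name])
    template_ref = spec.pop("template", None)
    if template_ref is None:
        return spec
    # Accept "${name}", "$name" or "name" as the template reference.
    template_name = template_ref.strip("${}")
    merged = copy.deepcopy(apply_template(objects, template_name))
    merged.update(spec)  # members defined on the object win over the template
    return merged

# Hypothetical CFOptions namespace: "big_buffer" is derived from "default".
cf_options = {
    "default": {"max_write_buffer_number": 4, "write_buffer_size": "128M", "ttl": 0},
    "big_buffer": {"template": "$default", "write_buffer_size": "256M"},
}
effective = apply_template(cf_options, "big_buffer")
```

Here effective keeps max_write_buffer_number and ttl from the template and only replaces write_buffer_size.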

2.1 json configuration

{
  "http": {
    "document_root": "/path/to/dbname",
    "listening_ports": "8081"
  },
  "setenv": {
    "DictZipBlobStore_zipThreads": 8,
    "StrSimpleEnvNameNotOverwrite": "StringValue",
    "IntSimpleEnvNameNotOverwrite": 16384,
    "OverwriteThisEnv": { "overwrite": true,
      "value": "overwrite is default to false, can be manually set to true"
    }
  },
  "permissions": { "web_compact": true },
  "Cache": {
    "lru_cache": {
      "class": "LRUCache",
      "params": {
        "capacity": "4G", "num_shard_bits": -1, "high_pri_pool_ratio": 0.5,
        "strict_capacity_limit": false, "use_adaptive_mutex": false,
        "metadata_charge_policy": "kFullChargeCacheMetadata"
      }
    }
  },
  "WriteBufferManager" : {
    "wbm": {
      "class": "Default",
      "params": {
        "//comment": "share mem budget with cache object ${lru_cache}",
        "buffer_size": "512M", "cache": "${lru_cache}"
      }
    }
  },
  "Statistics": { "stat": "default" },
  "TableFactory": {
    "bb": {
      "class": "BlockBasedTable",
      "params": { "block_cache": "${lru_cache}" }
    },
    "fast": {
      "class": "SingleFastTable",
      "params": { "indexType": "MainPatricia" }
    },
    "zip": {
      "class": "ToplingZipTable",
      "params": {
        "localTempDir": "/dev/shm/tmp",
        "sampleRatio": 0.01, "entropyAlgo": "kNoEntropy"
      }
    },
    "dispatch" : {
      "class": "DispatcherTable",
      "params": {
        "default": "fast",
        "readers": { "SingleFastTable": "fast", "ToplingZipTable": "zip", "BlockBasedTable": "bb" },
        "level_writers": ["fast", "fast", "fast", "zip", "zip", "zip", "zip"]
      }
    }
  },
  "CFOptions": {
    "default": {
        "max_write_buffer_number": 4, "write_buffer_size": "128M",
        "target_file_size_base": "16M", "target_file_size_multiplier": 2,
        "table_factory": "dispatch", "ttl": 0
    }
  },
  "databases": {
    "db1": {
      "method": "DB::Open",
      "params": {
        "options": {
          "write_buffer_manager": "${wbm}",
          "create_if_missing": true, "table_factory": "dispatch"
        }
      }
    },
    "db_mcf": {
      "method": "DB::Open",
      "params": {
        "db_options": {
          "create_if_missing": true,
          "create_missing_column_families": true,
          "write_buffer_manager": "${wbm}",
          "allow_mmap_reads": true
        },
        "column_families": {
          "default": "$default",
          "custom_cf" : {
            "max_write_buffer_number": 4,
            "target_file_size_base": "16M",
            "target_file_size_multiplier": 2,
            "table_factory": "dispatch", "ttl": 0
          }
        },
        "path": "'dbname' passed to Open. If not defined, use 'db_mcf' here"
      }
    }
  },
  "open": "db_mcf"
}

2.2 special objects

2.2.1 http

In this example, the first json sub-object is:

  "http": {
    "document_root": "/path/to/dbname", "listening_ports": "8081"
  }

This http object configures the HTTP web server used for web display. For the complete list of http parameters, please refer to the CivetWeb UserManual.

2.2.2 setenv

  "setenv": {
    "DictZipBlobStore_zipThreads" : 8
  }

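Each sub-object of setenv defines an environment variable. As the names in the full example suggest, a plain string or number value does not overwrite a variable that is already set; to overwrite, use an object with "overwrite": true and a "value" ("overwrite" defaults to false).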
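The sketch below shows one way to read these setenv rules, using a plain dict in place of the real process environment. The helper apply_setenv is hypothetical, and the "do not overwrite unless asked" behavior is an assumption inferred from the example's variable names and its "overwrite is default to false" comment.

```python
def apply_setenv(setenv, environ):
    """Apply a "setenv" json object to `environ` (a dict standing in for the environment)."""
    for name, spec in setenv.items():
        if isinstance(spec, dict):
            # Full form: {"overwrite": bool, "value": ...}
            overwrite = bool(spec.get("overwrite", False))
            value = spec["value"]
        else:
            # Simple form: a bare string or number, never overwrites.
            overwrite = False
            value = spec
        if overwrite or name not in environ:
            environ[name] = str(value)
    return environ

env = {"IntSimpleEnvNameNotOverwrite": "1", "OverwriteThisEnv": "old"}
apply_setenv(
    {
        "DictZipBlobStore_zipThreads": 8,
        "IntSimpleEnvNameNotOverwrite": 16384,
        "OverwriteThisEnv": {"overwrite": True, "value": "new"},
    },
    env,
)
```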

2.2.3 permissions

Each sub-object of permissions defines a permission.

2.2.4 databases

Multiple database objects can be defined under databases, and database objects are divided into two categories:

  1. DB containing only the default ColumnFamily
  2. DB with multiple ColumnFamily (DB_MultiCF)

These two types of databases are distinguished by whether they contain the child object column_families. A database that actually has only one ColumnFamily is still a DB_MultiCF if it defines that ColumnFamily in the sub-object column_families.
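The rule above amounts to a single key check. A minimal sketch follows, assuming column_families sits under "params" as in the example; the function name database_kind is hypothetical.

```python
def database_kind(db_spec):
    """Classify a database json object as "DB" or "DB_MultiCF"."""
    params = db_spec.get("params", {})
    return "DB_MultiCF" if "column_families" in params else "DB"

db1 = {"method": "DB::Open", "params": {"options": {"create_if_missing": True}}}
db_mcf = {
    "method": "DB::Open",
    "params": {"db_options": {}, "column_families": {"default": "$default"}},
}
kinds = (database_kind(db1), database_kind(db_mcf))
```

Note that db_mcf is classified as DB_MultiCF even though it lists only one ColumnFamily.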

A database object is opened by the function named in its method. Just as the function is overloaded in the C++ code, the method in json is also overloaded: the same method name has one overload for DB and one for DB_MultiCF.

2.2.5 open

Although multiple databases can be defined in json, in many cases only one of them is opened. When the OpenDB API is called without a database name, the open object specifies which database to open. When OpenDB is called with a database name, the open object is ignored.
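The selection logic can be sketched as follows. This models only the lookup described above, not the real OpenDB API; select_database is a hypothetical helper.

```python
def select_database(config, dbname=None):
    """Return (name, spec) of the database to open; "open" is used only without a name."""
    if dbname is None:
        dbname = config["open"]
    return dbname, config["databases"][dbname]

config = {
    "databases": {
        "db1": {"method": "DB::Open", "params": {}},
        "db_mcf": {"method": "DB::Open", "params": {"column_families": {}}},
    },
    "open": "db_mcf",
}
implicit_name, _ = select_database(config)         # falls back to "open"
explicit_name, _ = select_database(config, "db1")  # "open" is ignored
```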

2.3 General objects

Among the first-level objects of json, all except the special objects above are general objects. The name of each first-level general object is the class name of the base class of such objects in ToplingDB/RocksDB, for example "Cache", "Statistics" and "TableFactory" in the example. Each of these first-level objects acts as a container, and each of its sub-objects defines a real C++ object. Each such "container" is also equivalent to a namespace, so objects in different namespaces can share the same name.

The json object corresponding to a C++ object contains the class name and parameters, expressed by "class" and "params" respectively. Careful readers will notice that the json object named "stat" is just the string "default"; this is a simplification. A class without parameters can be defined directly by the string of its class name (here "default" is the registered class name of stat, and the corresponding C++ class is StatisticsImpl). Such an object can of course also be defined by a complete, regular json object containing "class" and "params".
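As an illustration, the string shorthand can be expanded into the regular form like this. It is a sketch of the idea only; normalize_object is a hypothetical helper, and it ignores the case where a string is an object reference (covered in 2.4).

```python
def normalize_object(spec):
    """Expand the class-name shorthand into the regular {"class", "params"} form."""
    if isinstance(spec, str):
        # Shorthand: a class without parameters is written as its class name.
        return {"class": spec, "params": {}}
    return {"class": spec["class"], "params": spec.get("params", {})}

stat = normalize_object("default")
bb = normalize_object({"class": "BlockBasedTable", "params": {"block_cache": "${lru_cache}"}})
```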

DBOptions and CFOptions are special general objects: because their "class" is fixed, "class" and "params" are omitted, and the members of "params" are promoted directly to the outer layer.

2.4 Object references

In C++, one object refers to another through a pointer; in json, references are expressed through object names. The formal, complete form of an object reference is "${varname}"; the simplified forms are "$varname" and "varname". The bare "varname" form can be ambiguous, because a json string may also express a "class_name". The processing strategy is: first check whether the string names a defined object; if it does, treat it as "varname", otherwise treat it as "class_name".
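The lookup strategy can be sketched as below. The function resolve and its return shape are hypothetical, and raising an error for an explicit "$" reference to an undefined object is an assumption (the text does not say what happens in that case).

```python
import re

# Matches "${varname}" or "$varname"; bare names are handled separately.
_REF = re.compile(r"^\$\{(\w+)\}$|^\$(\w+)$")

def resolve(value, namespace):
    """Return ("object", name) for a reference, or ("class", name) for a class name."""
    match = _REF.match(value)
    if match:
        name = match.group(1) or match.group(2)
        if name not in namespace:
            # Assumption: an explicit reference to an undefined object is an error.
            raise KeyError(f"undefined object: {name}")
        return ("object", name)
    # Bare string: a defined object wins, otherwise it is a class name.
    if value in namespace:
        return ("object", value)
    return ("class", value)

table_factories = {"bb": {}, "fast": {}, "zip": {}, "dispatch": {}}
results = [
    resolve("${dispatch}", table_factories),
    resolve("$fast", table_factories),
    resolve("zip", table_factories),
    resolve("BlockBasedTable", table_factories),
]
```

The last call falls through to the class-name branch because no object named BlockBasedTable is defined in the namespace.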

2.4.1 inline objects

In addition to defining named objects and then referencing them by name, we can also define nested objects, as in the example:

  "custom_cf" : {
    "max_write_buffer_number": 4,
    "target_file_size_base": "16M",
    "target_file_size_multiplier": 2,
    "table_factory": "dispatch", "ttl": 0
  }

"custom_cf" could be defined as a reference to a CFOptions object, but here it is more convenient and concise to define it as an inline object.

2.4.2 CFOptions::ttl

There is no ttl member in CFOptions, but we define ttl for it in json, because the "method" of database can be specified as many other functions besides "DB::Open":

  "DB::OpenForReadOnly" // Equivalent to defining "read_only": true in params
  "DBWithTTL::Open" // Need CFOptions::ttl
  "TransactionDB::Open"
  "OptimisticTransactionDB::Open"
  "BlobDB::Open"

Users can also extend and define their own Open, for example: MyCustomDB::Open.
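Conceptually, "method" is a name looked up in a table of open functions, which is why user-defined names can be added. The toy registry below illustrates that idea only; it does not use or model the real ToplingDB registration API, and every function in it is hypothetical.

```python
OPENERS = {}

def register_open(method):
    """Register a function under a json "method" name."""
    def decorator(fn):
        OPENERS[method] = fn
        return fn
    return decorator

@register_open("DB::Open")
def open_db(params):
    return {"kind": "DB", "read_only": bool(params.get("read_only", False))}

@register_open("DB::OpenForReadOnly")
def open_read_only(params):
    # Equivalent to "DB::Open" with "read_only": true in params.
    return open_db({**params, "read_only": True})

@register_open("DBWithTTL::Open")
def open_with_ttl(params):
    # This method is the consumer of the extra "ttl" member defined in json.
    return {"kind": "DBWithTTL", "ttl": params["options"]["ttl"]}

@register_open("MyCustomDB::Open")
def open_custom(params):
    return {"kind": "MyCustomDB"}

def open_database(spec):
    """Dispatch on spec["method"] and pass spec["params"] to the chosen function."""
    return OPENERS[spec["method"]](spec.get("params", {}))
```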

2.5 DispatcherTable

  "dispatch" : {
    "class": "DispatcherTable",
    "params": {
      "default": "fast",
      "readers": {"SingleFastTable": "fast", "ToplingZipTable": "zip"},
      "level_writers": ["fast", "fast", "fast", "zip", "zip", "zip", "zip"]
    }
  }

As the name implies, DispatcherTable is used for actual Table (SST) dispatching and scheduling. For users, the most critical thing is level_writers: use the corresponding Table at the corresponding level.

default is used as a fallback when level < 0 (level is a member of TableBuilderOptions), or when level_writer fails to create a builder.

readers defines the mapping from class_name to varname. Internally, loading a Table goes through DispatcherTable::NewTableReader. As a dispatcher, it naturally has to know which kind of Table is being loaded, and this is distinguished by the TableMagicNumber. The magic number is determined statically at compile time, but TableFactory objects are created at runtime, and each concrete TableFactory class can have multiple objects (with different params), so readers must specify which TableFactory object is used to load tables of each TableFactory class.

In this DispatcherTable definition, L0~L2 use fast, and L3~L6 use zip.
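The selection rules can be sketched as follows. This models only the json semantics described above, not the C++ implementation; pick_writer and pick_reader are hypothetical names, and falling back to default for a level beyond the end of level_writers is an assumption.

```python
def pick_writer(params, level):
    """Choose the TableFactory object name used to build an SST at `level`."""
    writers = params["level_writers"]
    if 0 <= level < len(writers):
        return writers[level]
    # level < 0 falls back to "default"; out-of-range levels do too (assumption).
    return params["default"]

def pick_reader(params, table_class):
    """Map the class name identified from the table's magic number to an object name."""
    return params["readers"][table_class]

dispatch_params = {
    "default": "fast",
    "readers": {"SingleFastTable": "fast", "ToplingZipTable": "zip"},
    "level_writers": ["fast", "fast", "fast", "zip", "zip", "zip", "zip"],
}
writers_by_level = [pick_writer(dispatch_params, level) for level in range(7)]
```

The sketch does not model the second fallback case, in which the level writer fails to create a builder and default is used instead.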

3. Yaml file

Users who are familiar with Kubernetes may prefer Yaml. As a configuration file, Yaml is more readable, and the ToplingDB configuration system also supports Yaml.

4. Several actual configuration files

The Enterprise Edition includes ToplingZipTable, an SST based on searchable in-memory compression algorithms. Using a distributed compaction cluster shared by multiple instances can reduce costs and increase efficiency through economies of scale.

The Community Edition does not include ToplingZipTable; otherwise, it is identical to the Enterprise Edition.

| Json | Yaml | Explanation |
|------|------|-------------|
| etcd_dcompaction.json | yaml | with Distributed Compaction |
| lcompact_community.json | yaml | without Distributed Compaction |
| db_bench_community.yaml | yaml | db_bench testing, without json |
| db_bench_enterprise.yaml | yaml | db_bench testing, without json |
| todis-community.json | yaml | todis Community Edition |
| todis-enterprise.json | yaml | todis Enterprise Edition |
| mytopling.json | no yaml yet | MyTopling Enterprise Edition |
| mytopling-2nd.json | no yaml yet | MyTopling Enterprise Edition, shared secondary node |
| mytopling-community.json | no yaml yet | MyTopling Community Edition |
| kvtopling-community.json | no yaml yet | kvrocks ToplingDB Community Edition |
| kvtopling-community-2nd.json | no yaml yet | kvrocks ToplingDB Community Edition, shared secondary node |