Motivation To Solution

ZengJingtao edited this page Jan 30, 2023 · 2 revisions

Background

RocksDB has many classes named XXXFactory. TableFactory, for example, makes the SST file format pluggable: to switch to another SST format, users only need to configure the corresponding TableFactory, which is quite flexible.

Problems

This simple Factory mechanism has two problems:

  1. To replace a Factory, the user must modify code, which is cumbersome but not fatal.
  2. The fatal problem: to switch to a third-party Factory, the user's code must take a dependency on that third-party Factory.
    • If RocksDB is used from another language (such as Java), a binding must also be implemented specifically for the third-party dependency.

Back in TerarkDB, to integrate TerarkZipTable seamlessly and spare users from modifying their code, I used a very hacky solution: intercept the configuration in DB::Open and enable TerarkZipTable if the relevant environment variables are defined. Users could then use TerarkZipTable just by defining environment variables, without modifying any code.

This approach achieved what TerarkDB needed at the time, but it was only a crude patch!

A complete, systematic solution is the plugin mechanism we want for ToplingDB. Still taking TableFactory as an example, users should be able to define a TableFactory like this:

  std::string table_factory_class = ReadFromSomeWhere(...);
  std::string table_factory_options = ReadFromSomeWhere(...);
  Options opt;
  opt.table_factory = NewTableFactory(table_factory_class, table_factory_options);

Traditional plugin solution

To replace a Factory by changing only the configuration, not the code, we need to map a configuration item (such as the class name) to the creation function of the Factory (base class) object, so we need a global map that stores this mapping. Still taking RocksDB's TableFactory as an example, the existing code is roughly like this:

  class TableFactory {
  public:
    virtual Status NewTableReader(...) const = 0;
    virtual TableBuilder* NewTableBuilder(...) const = 0;
    // more ...
  };
  TableFactory* NewBlockBasedTableFactory(const BlockBasedTableOptions&);
  TableFactory* NewCuckooTableFactory(const CuckooTableOptions&);
  TableFactory* NewPlainTableFactory(const PlainTableOptions&);

We add a global map from class names to the NewXXX functions, but immediately hit a problem: these functions have different prototypes. To unify them, we serialize the XXXOptions into strings:

  TableFactory* NewBlockBasedTableFactoryFromString(const std::string&);
  TableFactory* NewCuckooTableFactoryFromString(const std::string&);
  TableFactory* NewPlainTableFactoryFromString(const std::string&);
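How such a FromString adapter might look is sketched below. Everything here is hypothetical and for illustration only: DemoTableOptions, DemoTableFactory, and the "key=value;key=value" string format are assumptions, not RocksDB's real types or its real option syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-ins; these are not RocksDB's real types.
struct DemoTableOptions {
  std::size_t block_size = 4096;
  bool use_checksum = true;
};

struct DemoTableFactory {
  explicit DemoTableFactory(const DemoTableOptions& o) : options(o) {}
  DemoTableOptions options;
};

// Original-style creation function: takes a typed options struct.
DemoTableFactory* NewDemoTableFactory(const DemoTableOptions& o) {
  return new DemoTableFactory(o);
}

// Splits "k1=v1;k2=v2" into a key/value map. Fields without '=' are skipped.
std::map<std::string, std::string> ParseKeyValues(const std::string& text) {
  std::map<std::string, std::string> kv;
  std::istringstream in(text);
  std::string field;
  while (std::getline(in, field, ';')) {
    const auto eq = field.find('=');
    if (eq == std::string::npos) continue;
    kv[field.substr(0, eq)] = field.substr(eq + 1);
  }
  return kv;
}

// Uniform-prototype adapter: string in, factory out. Keys that are absent
// keep their defaults. std::stoul throws on a non-numeric block_size.
DemoTableFactory* NewDemoTableFactoryFromString(const std::string& text) {
  DemoTableOptions o;
  const auto kv = ParseKeyValues(text);
  auto it = kv.find("block_size");
  if (it != kv.end()) o.block_size = std::stoul(it->second);
  it = kv.find("use_checksum");
  if (it != kv.end()) o.use_checksum = (it->second == "true");
  return NewDemoTableFactory(o);
}
```

Because every adapter shares the prototype `Factory*(const std::string&)`, all of them can be stored in one map regardless of how different their typed options structs are.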

Now we can take the next step: define a global map and register these three factories:

  // Assignment statements are not allowed at namespace scope, so the
  // entries are registered through the map's initializer list.
  std::map<std::string, TableFactory*(*)(const std::string&)> table_factory_map = {
    {"BlockBasedTable", &NewBlockBasedTableFactoryFromString},
    {"CuckooTable", &NewCuckooTableFactoryFromString},
    {"PlainTable", &NewPlainTableFactoryFromString},
  };

That is the general framework; with the details filled in, it looks roughly like this:

  class TableFactory {
  public: // omit irrelevant code ...
    using Map = std::map<std::string, TableFactory*(*)(const std::string&)>;
    static Map& get_reg_map() { static Map m; return m; }
    static TableFactory*
    NewTableFactory(const std::string& clazz, const std::string& options) {
      return get_reg_map()[clazz](options); // omit error checking
    }
    struct AutoReg {
     AutoReg(const std::string& clazz, TableFactory*(*fn)(const std::string&))
        { get_reg_map()[clazz] = fn; }
    };
  };
  #define REGISTER_TABLE_FACTORY(clazz, fn) \
     static TableFactory::AutoReg gs_##fn(clazz, &fn)

At global scope in a .cc file (the following three registrations may instead be spread across each Table's own .cc file):

  REGISTER_TABLE_FACTORY("BlockBasedTable", NewBlockBasedTableFactoryFromString);
  REGISTER_TABLE_FACTORY("CuckooTable", NewCuckooTableFactoryFromString);
  REGISTER_TABLE_FACTORY("PlainTable", NewPlainTableFactoryFromString);

The call site in the earlier user code then becomes:

  TableFactory::NewTableFactory(table_factory_class, table_factory_options);

This is essentially the plugin mechanism used by many mature systems. AutoReg is nested inside TableFactory as an inner class to avoid polluting the enclosing namespace. REGISTER_TABLE_FACTORY defines an AutoReg object at global scope, which is initialized before main starts executing. The macro exists mainly for convenience, uniformity, and readability; in principle, AutoReg objects can also be written out by hand without REGISTER_TABLE_FACTORY.

The next problem is that RocksDB has many such XXXFactory classes. Writing this code for each of them is a lot of work, tedious, and error-prone, so we abstract it into a Factoryable class template:
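The demo code above omits error checking. As a minimal sketch of what that check could look like, the version below returns nullptr for an unknown class name instead of calling through a null function pointer. The TableFactory here is a hypothetical stand-in with a single Name() method, and DemoTableFactory is an invented plugin; neither reflects RocksDB's real interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical minimal base class standing in for RocksDB's TableFactory.
class TableFactory {
 public:
  using CreateFn = TableFactory* (*)(const std::string&);
  using Map = std::map<std::string, CreateFn>;

  virtual ~TableFactory() = default;
  virtual const char* Name() const = 0;

  // Function-local static: constructed on first use, so the map exists
  // even when an AutoReg object in another .cc file is initialized first.
  static Map& get_reg_map() { static Map m; return m; }

  // Uses find() rather than operator[], which would insert a null entry
  // for an unknown name and then call through it.
  static TableFactory* NewTableFactory(const std::string& clazz,
                                       const std::string& options) {
    const Map& m = get_reg_map();
    auto it = m.find(clazz);
    return it == m.end() ? nullptr : it->second(options);
  }

  struct AutoReg {
    AutoReg(const std::string& clazz, CreateFn fn) {
      get_reg_map()[clazz] = fn;
    }
  };
};

#define REGISTER_TABLE_FACTORY(clazz, fn) \
  static TableFactory::AutoReg gs_##fn(clazz, &fn)

// A hypothetical plugin, normally defined in its own .cc file.
class DemoTableFactory : public TableFactory {
 public:
  const char* Name() const override { return "DemoTable"; }
};
TableFactory* NewDemoTableFactoryFromString(const std::string&) {
  return new DemoTableFactory;
}
REGISTER_TABLE_FACTORY("DemoTable", NewDemoTableFactoryFromString);
```

Keeping the map in a function-local static rather than a plain global is what makes registration safe across translation units, since the order in which globals in different .cc files are initialized is unspecified.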

  template<class Product>
  class Factoryable { // Factoryable is located in a public header file such as factoryable.h
  public: // class members are private by default; callers and the macro need access
    using Map = std::map<std::string, Product*(*)(const std::string&)>;
    static Map& get_reg_map() { static Map m; return m; }
    static Product*
    NewProduct(const std::string& clazz, const std::string& params) {
      return get_reg_map()[clazz](params); // omit error checking
    }
    struct AutoReg {
        AutoReg(const std::string& clazz, Product*(*fn)(const std::string&))
        { get_reg_map()[clazz] = fn; }
    };
  };
  class TableFactory : public Factoryable<TableFactory> {
  public:
   // The original RocksDB code here is unchanged
  };
  // Needs <type_traits>; the Product type is deduced from fn's return type (Product*)
  #define REGISTER_FACTORY_PRODUCT(clazz, fn) \
     static std::remove_pointer_t<decltype(fn(std::string()))>::AutoReg gs_##fn(clazz, &fn)

Correspondingly, the call site in user code becomes:

  TableFactory::NewProduct(table_factory_class, table_factory_options);

So far, a small modification to the original RocksDB solves both problems, and everything seems fine. However, RocksDB has many such XXXFactory classes, and many other classes need the same Factory mechanism even though they are not named XXXFactory, such as Comparator and EventListener.

Side plugin solution

For us (ToplingDB), RocksDB is upstream code. If upstream could accept our modifications in a timely manner, the traditional plugin solution would be good enough. With only one or two such modifications, we could try to persuade upstream to accept them, but we need to modify a large number of RocksDB classes, and upstream is unlikely to accept that.

So, can we solve these two problems without making any changes to the original RocksDB?

In fact, we only need to shift the framing from "making the class itself provide the Factory plugin capability" to "attaching the Factory plugin capability to the class from outside". The Factoryable code above needs no change at all; only the REGISTER_FACTORY_PRODUCT macro changes:

  #define REGISTER_FACTORY_PRODUCT(clazz, fn) \
     static Factoryable<std::remove_pointer_t<decltype(fn(std::string()))>>::AutoReg gs_##fn(clazz, &fn)

For clearer semantics, we rename Factoryable to PluginFactory and add a global function template:

  template<class Product>
  Product* NewPluginProduct(const std::string& clazz, const std::string& params) {
    return PluginFactory<Product>::NewProduct(clazz, params);
  }

The corresponding user code is:

  NewPluginProduct<TableFactory>(table_factory_class, table_factory_options);
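Putting the pieces together, the following is a self-contained sketch of the side plugin idea applied to a class that is never modified. The Comparator here is a simplified stand-in (RocksDB's real Comparator has a different interface), ReverseComparator and its creation function are invented for the example, and the nullptr-on-miss lookup is an added assumption rather than part of the demo code above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// Stand-in for an upstream class that must not be modified (hypothetical).
class Comparator {
 public:
  virtual ~Comparator() = default;
  virtual int Compare(const std::string& a, const std::string& b) const = 0;
};

// The plugin machinery lives entirely outside the product class.
template <class Product>
class PluginFactory {
 public:
  using CreateFn = Product* (*)(const std::string&);
  using Map = std::map<std::string, CreateFn>;
  static Map& get_reg_map() { static Map m; return m; }
  // Returns nullptr when no creator is registered under clazz.
  static Product* NewProduct(const std::string& clazz,
                             const std::string& params) {
    auto it = get_reg_map().find(clazz);
    return it == get_reg_map().end() ? nullptr : it->second(params);
  }
  struct AutoReg {
    AutoReg(const std::string& clazz, CreateFn fn) {
      get_reg_map()[clazz] = fn;
    }
  };
};

template <class Product>
Product* NewPluginProduct(const std::string& clazz,
                          const std::string& params) {
  return PluginFactory<Product>::NewProduct(clazz, params);
}

// The Product type is deduced from fn's return type (Product*).
#define REGISTER_FACTORY_PRODUCT(clazz, fn)     \
  static PluginFactory<std::remove_pointer_t<   \
      decltype(fn(std::string()))>>::AutoReg gs_##fn(clazz, &fn)

// A hypothetical plugin: orders strings in reverse.
class ReverseComparator : public Comparator {
 public:
  int Compare(const std::string& a, const std::string& b) const override {
    return b.compare(a);
  }
};
Comparator* NewReverseComparator(const std::string&) {
  return new ReverseComparator;
}
REGISTER_FACTORY_PRODUCT("ReverseComparator", NewReverseComparator);
```

Calling `NewPluginProduct<Comparator>("ReverseComparator", "")` then yields a ReverseComparator, while Comparator itself contains no registration code. Each distinct Product type gets its own registry, because `PluginFactory<Product>::get_reg_map()` is a separate static per template instantiation.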

Application

In ToplingDB, we use this side plugin design pattern; the actual implementation is, of course, much more complex than the demo code here. Going a step further, ToplingDB also supports:

  • Bypass serialization of objects
  • A REST API and web-based visualization for displaying and modifying objects

These two features fully reuse PluginFactory and define two additional class templates, SerDeFunc:

  template<class Object> struct SerDeFunc {
    virtual ~SerDeFunc() {}
    virtual Status Serialize(const Object&, string* output) const = 0;
    virtual Status DeSerialize(Object*, const Slice& input) const = 0;
  };
  template<class Object>
  using SerDeFactory = PluginFactory<std::shared_ptr<SerDeFunc<Object> > >;

and PluginManipFunc:

  template<class Object> struct PluginManipFunc {
    virtual ~PluginManipFunc() {} // Repo refers to ConfigRepository
    virtual void Update(Object*, const json&, const Repo&) const = 0;
    virtual string ToString(const Object&, const json&, const Repo&) const = 0;
  };
  template<class Object>
  using PluginManip = PluginFactory<const PluginManipFunc<Object>*>;