LFX: Coprocessor Plugin #9747

@andizimmerer

Description

TiKV is the storage component in the TiDB ecosystem. However, a principle of distributed computing holds that computation should run as close to the data source as possible. Therefore, TiKV embeds a subset of the TiDB executor framework so that some computation tasks can be pushed down when applicable.

But TiKV's capabilities should go far beyond that, as many distributed components can be built on top of TiKV, such as caches, full-text search engines, graph databases, and NoSQL databases. Like TiDB, these products will also want to push down specific computation to TiKV, which requires the coprocessor to be customizable, that is, pluggable.

For instance, a full-text search engine persists the original documents and an n-gram index on TiKV. Reading a document back to a client only to update the index from there wastes resources. By contrast, a coprocessor plugin can generate the index from the original document and update it in place. What's more, the coprocessor plugin can perform an index scan directly on TiKV.
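To make the full-text example concrete, here is a minimal sketch of the kind of computation such a plugin might run next to the data: splitting a document into character n-grams and recording which documents contain each one. The function names `ngrams` and `index_document`, the in-memory `BTreeMap` index, and the `u64` document IDs are all assumptions for illustration; a real plugin would write the index entries back through whatever storage API the coprocessor exposes.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Split `text` into overlapping character n-grams of length `n`.
/// Hypothetical helper; returns an empty list when `n` is 0 or
/// when the text is shorter than `n` characters.
pub fn ngrams(text: &str, n: usize) -> Vec<String> {
    if n == 0 {
        return Vec::new();
    }
    let chars: Vec<char> = text.chars().collect();
    chars
        .windows(n)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Add every n-gram of `text` to an inverted index that maps
/// n-gram -> set of document IDs. The in-memory map stands in for
/// index entries that a plugin would persist on TiKV.
pub fn index_document(
    index: &mut BTreeMap<String, BTreeSet<u64>>,
    doc_id: u64,
    text: &str,
    n: usize,
) {
    for gram in ngrams(text, n) {
        index.entry(gram).or_default().insert(doc_id);
    }
}
```

Running this inside the storage node means only the request and a small response cross the network, rather than the whole document and every index entry.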

Document Collection

LFX Program information

  • Mentor of this issue: @andylokandy @skyzh
  • Recommended skills: Rust
  • Estimated workload: 3 man-months

Milestones and action items

  • Add a new config option in TiKvConfig for the new coprocessor. The option should specify the (possibly multiple) plugins that are loaded at startup. Plugins are loaded from the local file system. Also add the new option to config-template.toml, and make sure to document it here: https://docs.pingcap.com/tidb/stable/tikv-configuration-file
  • Add a new coprocessor server endpoint in tikv::server::service::kv::Service.
    The old coprocessor's module is located in coprocessor. The module of the new coprocessor will be called coprocessor_v2.
    • Add a new coprocessor protobuf message in tikvpb.proto like specified in the RFC. We won’t touch the old coprocessor proto messages. Coprocessor plugins will need to handle the type of protobuf message on their own; they will just receive the raw bytes.
    • Dispatch coprocessor requests to the proper plugin based on the name given in the copr_name field of the proto request.
    • Implement a timeout mechanism for coprocessor requests.
      Note: Now that we don't use a transaction manager anymore for the first milestone, a timeout mechanism doesn't make sense because transactions can't deadlock.
  • Load coprocessor plugins at startup in endpoint (proof-of-concept).
    • Create plugin_api crate for coprocessor (should not have any dependencies).
      • Provide a CoprocessorPlugin trait that a plugin has to implement like in the POC. In the POC, this is called Endpoint, but I'll use Plugin in this document to avoid confusion with the new coprocessor endpoint.
      • Provide a plugin_api::RawStorage trait that follows a subset of functions from tikv::storage::Storage. The API design is provided in the RFC.
      • Implement plugin_api::RawStorage for tikv::storage::Storage somewhere in the new coprocessor endpoint.
    • Implement hot reloading for plugins. There is one plugin directory, and plugins will be loaded or unloaded when files in it change.
    • Load plugin as dylib with the libloading crate.
    • Reject plugins on version mismatch between plugin_api and plugin.
    • Reject client requests where copr_version_constraint is violated.
  • Provide a Plugin SDK: a standalone Rust library that helps set up the build process for a plugin. The SDK might look like a template repository that you can clone to start working on a new coprocessor plugin.
    • Make sure the plugin is compiled correctly as a dylib.
    • Provide some easy-to-use testing mechanisms.
    • Provide documentation on how to implement a custom coprocessor plugin. Make sure to correctly communicate risks of raw data access in coprocessor plugins.
    • Should we provide a possibility for benchmarking in the SDK?
  • Provide an example plugin using the new coprocessor plugin system.
  • Optional: implement a plugin that exposes an Arrow Flight service.
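The dispatch and version-check items above can be sketched roughly as follows. This is not the API from the RFC: the trait methods (`name`, `version`, `on_raw_request`), the synchronous `RawStorage` subset, the `PluginRegistry` and `MemStorage` types, and the "major version must match" rule standing in for copr_version_constraint are all assumptions made for illustration. Dynamic loading with libloading and hot reloading are left out so that the sketch stays self-contained.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Mutex;

/// Hypothetical, synchronous subset of a raw key-value storage API.
pub trait RawStorage {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&self, key: Vec<u8>, value: Vec<u8>);
}

/// In-memory storage for testing plugins without a running TiKV.
#[derive(Default)]
pub struct MemStorage(Mutex<BTreeMap<Vec<u8>, Vec<u8>>>);

impl RawStorage for MemStorage {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.lock().unwrap().get(key).cloned()
    }
    fn put(&self, key: Vec<u8>, value: Vec<u8>) {
        self.0.lock().unwrap().insert(key, value);
    }
}

/// Simplified stand-in for the plugin trait; plugins only see raw bytes.
pub trait CoprocessorPlugin: Send + Sync {
    /// Name that clients put into the copr_name field.
    fn name(&self) -> &str;
    /// Plugin version as (major, minor, patch).
    fn version(&self) -> (u32, u32, u32);
    fn on_raw_request(&self, request: &[u8], storage: &dyn RawStorage)
        -> Result<Vec<u8>, String>;
}

#[derive(Debug, PartialEq)]
pub enum DispatchError {
    UnknownPlugin(String),
    VersionMismatch { required_major: u32, found: (u32, u32, u32) },
    Plugin(String),
}

#[derive(Default)]
pub struct PluginRegistry {
    plugins: HashMap<String, Box<dyn CoprocessorPlugin>>,
}

impl PluginRegistry {
    pub fn register(&mut self, plugin: Box<dyn CoprocessorPlugin>) {
        self.plugins.insert(plugin.name().to_owned(), plugin);
    }

    /// Look up the plugin by name, reject the request when the assumed
    /// version constraint (same major version) is violated, then forward
    /// the raw bytes to the plugin.
    pub fn dispatch(
        &self,
        copr_name: &str,
        required_major: u32,
        request: &[u8],
        storage: &dyn RawStorage,
    ) -> Result<Vec<u8>, DispatchError> {
        let plugin = self
            .plugins
            .get(copr_name)
            .ok_or_else(|| DispatchError::UnknownPlugin(copr_name.to_owned()))?;
        let found = plugin.version();
        if found.0 != required_major {
            return Err(DispatchError::VersionMismatch { required_major, found });
        }
        plugin.on_raw_request(request, storage).map_err(DispatchError::Plugin)
    }
}

/// Example plugin: stores the request under a fixed key and replies
/// with the request length as decimal text.
pub struct StorePlugin;

impl CoprocessorPlugin for StorePlugin {
    fn name(&self) -> &str {
        "store"
    }
    fn version(&self) -> (u32, u32, u32) {
        (1, 0, 0)
    }
    fn on_raw_request(&self, request: &[u8], storage: &dyn RawStorage)
        -> Result<Vec<u8>, String> {
        storage.put(b"last_request".to_vec(), request.to_vec());
        Ok(request.len().to_string().into_bytes())
    }
}
```

Keeping the storage behind a trait object is what would let an SDK ship something like `MemStorage` as a testing aid: plugin authors could exercise their request handling without starting a TiKV node.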
