Release v2.0.0-rc2

@SimFG SimFG released this 12 Aug 11:31

Note: If you are using the latest version of the CDC, we recommend Milvus 2.4.7 or above, because these Milvus versions have been adapted for the CDC functionality (the newer the Milvus version, the better it works with the CDC).

Feature

  1. Supports synchronization of a single collection. Currently, the collection_infos parameter in the create request accepts only one collection. For the complete create parameters, see the documentation: https://github.com/zilliztech/milvus-cdc/blob/main/doc/cdc-usage.md#create-request
POST http://localhost:8444/cdc
Content-Type: application/json

{
  "request_type":"create",
  "request_data":{
    ...
    "collection_infos":[
      {
        "name":"hello_milvus"
      }
    ],
    ...
  }
}
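As a rough illustration, the request body above could be assembled in Go as follows. This is a minimal sketch, not code from the CDC project: the type names and `BuildCreateBody` are hypothetical, and only the fields shown above are modeled, so the elided `...` parameters of `request_data` still have to be added from the linked documentation.

```go
package main

import "encoding/json"

// CollectionInfo mirrors one entry of "collection_infos" in the create request.
type CollectionInfo struct {
	Name string `json:"name"`
}

// createRequestData models only the field shown in the release note.
// The elided "..." parameters are intentionally left out of this sketch.
type createRequestData struct {
	CollectionInfos []CollectionInfo `json:"collection_infos"`
}

// createRequest is the top-level envelope sent to the /cdc endpoint.
type createRequest struct {
	RequestType string            `json:"request_type"`
	RequestData createRequestData `json:"request_data"`
}

// BuildCreateBody returns the JSON body for a task that syncs one collection.
// Exactly one collection is passed because the release currently supports only one.
func BuildCreateBody(collection string) ([]byte, error) {
	req := createRequest{
		RequestType: "create",
		RequestData: createRequestData{
			CollectionInfos: []CollectionInfo{{Name: collection}},
		},
	}
	return json.Marshal(req)
}
```

Calling `BuildCreateBody("hello_milvus")` yields a body with the same `collection_infos` structure as the request above; it would be sent with a POST to the `/cdc` endpoint and a `Content-Type: application/json` header.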
  2. Supports collection checkpoints, which can be obtained from the flush method. The returned value must be base64-encoded before use (see the sample code below).
    Note: This feature can be combined with the latest backup tool for data migration or for creating primary-backup instances. The basic steps are:

    • Back up the current Milvus instance and obtain the collection checkpoint.
    • Restore the data to another Milvus instance.
    • Start the CDC and create a synchronization task for the collection, passing the previously obtained checkpoint.

    If this is a data migration, you can then stop Milvus, wait for the CDC to migrate the incremental data, reclaim the other Milvus services, and finally stop the CDC.
    If you use this feature, some familiarity with Milvus is recommended, because it involves internal Milvus concepts. We will write a case study document based on this information and continue to optimize the usage method.

POST http://localhost:8444/cdc
Content-Type: application/json

{
  "request_type":"create",
  "request_data":{
    ...
    "collection_infos":[
      {
        "name":"hello_milvus",
        "positions": {
            "by-dev-rootcoord-dml_0_450541344162316901v0": "AAAAAAAAAAAAAAAAAAAAAA==",
            "by-dev-rootcoord-dml_1_450541344162316901v1": "AAAAAAAAAAAAAAAAAAAAAA=="
        }
      }
    ],
    ...
  }
}

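// Base64MsgPosition serializes a message position with protobuf and returns
// it as a standard base64 string, the format expected in "positions".
// It returns an empty string if serialization fails.
func Base64MsgPosition(position *msgpb.MsgPosition) string {
	positionByte, err := proto.Marshal(position)
	if err != nil {
		return ""
	}
	return base64.StdEncoding.EncodeToString(positionByte)
}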
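The function above handles one position. A minimal sketch of assembling the whole `positions` map for a `collection_infos` entry might look like the following. It assumes each channel's position has already been serialized to bytes (for example with `proto.Marshal`, as above); the names `EncodePositions`, `BuildCheckpointEntry`, and `checkpointCollection` are hypothetical and not part of the CDC code base.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
)

// checkpointCollection is one "collection_infos" entry that carries checkpoints.
type checkpointCollection struct {
	Name      string            `json:"name"`
	Positions map[string]string `json:"positions"`
}

// EncodePositions base64-encodes already-serialized position bytes, keyed by
// channel name. Serialization is assumed to have happened before this call.
func EncodePositions(raw map[string][]byte) map[string]string {
	out := make(map[string]string, len(raw))
	for channel, b := range raw {
		out[channel] = base64.StdEncoding.EncodeToString(b)
	}
	return out
}

// BuildCheckpointEntry returns the JSON for a single collection entry,
// including the encoded checkpoint for every channel.
func BuildCheckpointEntry(name string, raw map[string][]byte) ([]byte, error) {
	entry := checkpointCollection{Name: name, Positions: EncodePositions(raw)}
	return json.Marshal(entry)
}
```

For reference, the sample value `AAAAAAAAAAAAAAAAAAAAAA==` in the request above is what standard base64 produces for 16 zero bytes, so a real checkpoint obtained from a flush will look different.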
  3. The source and target Milvus support TLS connections with custom certificates.

    1. Source Milvus reference: https://github.com/zilliztech/milvus-cdc/blob/main/doc/cdc-usage.md#configuration
    2. Target Milvus reference: https://github.com/zilliztech/milvus-cdc/blob/main/doc/cdc-usage.md#create-request-with-tls-one-way-authentication
  4. etcd supports TLS connections with custom certificates.

sourceConfig:
  etcd:
    address:
      - http://127.0.0.1:2379
    rootPath: by-dev
    metaSubPath: meta
    enableAuth: false
    username: root
    password: root123456
    enableTLS: false
    tlsCertPath: deployment/cert/client.pem # path to your cert file
    tlsKeyPath: deployment/cert/client.key # path to your key file
    tlsCACertPath: deployment/cert/ca.pem # path to your CACert file
    tlsMinVersion: 1.3
...

Improvement

  1. Improved stability - This is the biggest highlight of this update. Whether the source Milvus, the CDC, or the target Milvus is interrupted, the synchronization task can resume from its state before the interruption once the system is repaired.
  2. Optimized internal logic - Extensive internal testing, including data consistency verification between the source and target Milvus instances, shows that data consistency is well guaranteed even under high concurrency.
  3. Reduced frequent log output to minimize its impact on synchronization efficiency, and added support for dynamic log level adjustment.
POST http://localhost:8444/cdc
Content-Type: application/json

{
  "request_type":"maintenance",
  "request_data": {
    "operation": "set_log_level",
    "params": {
      "log_level": "debug"
    }
  }
}

Other

Comprehensive performance testing is currently under way internally, and we will continue to optimize based on the results.

NOTE: Before using the tool, please read the documentation at cdc usage carefully, because many configuration details are easy to get wrong. If you encounter any problems, feel free to open an issue, and I will do my best to help resolve it.