vectordbbench run fail with Error: cannot pickle 'generator' object #706

@cydrain

Description
  1. Ubuntu 22.04, using VectorDBBench v1.0.18
  2. Run this CLI command:
(vdb)  ~/work/zilliz/VectorDBBench/ [tags/v1.0.18*] vectordbbench doris \
    --host 10.228.141.13 \
    --port 9030 \
    --case-type Performance1024D1M \
    --db-name test \
    --search-concurrent \
    --num-concurrency 10 \
    --stream-load-rows-per-batch 10000 \
    --drop-old \
    --m 16 \
    --ef-construction 128 \
    --session-var hnsw_ef_search=100
  3. Observe this error:
2026-01-28 18:34:38,446 | INFO: task submitted: id=d3c558038548489f93c90d3364a3f34a, d3c558038548489f93c90d3364a3f34a, case number: 1 (interface.py:251) (187408)
2026-01-28 18:34:39,283 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'name': 'Search Performance Test (1M Dataset, 1024 Dim)', 'dataset': {'data': {'name': 'Bioasq', 'size': 1000000, 'dim': 1024, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Doris-2026-01-28T18:34:38.282706'}, drop_old=True (interface.py:181) (187470)
2026-01-28 18:34:39,284 | INFO: Starting run (task_runner.py:143) (187470)
2026-01-28 18:34:39,363 | INFO: Index options prepared: applied_props={'metric_type': 'inner_product', 'index_type': 'hnsw', 'max_degree': '16', 'ef_construction': '128'} not_applied_props={} (doris.py:183) (187470)
2026-01-28 18:34:39,364 | INFO: Creating table performance1024d1m with index {'metric_type': 'inner_product', 'index_type': 'hnsw', 'max_degree': '16', 'ef_construction': '128'} (doris.py:189) (187470)
2026-01-28 18:34:39,377 | INFO: Successfully created table performance1024d1m (doris.py:161) (187470)
2026-01-28 18:34:39,619 | INFO: Read the entire file into memory: test.parquet (dataset.py:392) (187470)
2026-01-28 18:34:39,905 | INFO: Read the entire file into memory: neighbors.parquet (dataset.py:392) (187470)
2026-01-28 18:34:40,243 | INFO: Start performance case (task_runner.py:188) (187470)
2026-01-28 18:34:40,243 | INFO: cosine dataset need normalize. (doris.py:239) (187470)
2026-01-28 18:34:40,456 | WARNING: VectorDB load dataset error: cannot pickle 'generator' object (serial_runner.py:168) (187470)
2026-01-28 18:34:41,208 | WARNING: Failed to run performance case, reason = cannot pickle 'generator' object (task_runner.py:222) (187470)
Traceback (most recent call last):
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/task_runner.py", line 193, in _run_perf_case
    _, load_dur = self._load_train_data()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/utils.py", line 43, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/task_runner.py", line 255, in _load_train_data
    raise e from None
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/task_runner.py", line 253, in _load_train_data
    runner.run()
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 209, in run
    count, _ = self._insert_all_batches()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/utils.py", line 43, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 169, in _insert_all_batches
    raise e from e
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 160, in _insert_all_batches
    count = future.result(timeout=self.timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'generator' object
2026-01-28 18:34:41,212 | WARNING: [1/1] case {'label': <CaseLabel.Performance: 2>, 'name': 'Search Performance Test (1M Dataset, 1024 Dim)', 'dataset': {'data': {'name': 'Bioasq', 'size': 1000000, 'dim': 1024, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Doris-2026-01-28T18:34:38.282706'} failed to run, reason=cannot pickle 'generator' object (interface.py:203) (187470)
Traceback (most recent call last):
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/interface.py", line 182, in _async_task_v2
    case_res.metrics = runner.run(drop_old)
                       ^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/task_runner.py", line 150, in run
    return self._run_perf_case(drop_old)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/task_runner.py", line 224, in _run_perf_case
    raise e from None
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/task_runner.py", line 193, in _run_perf_case
    _, load_dur = self._load_train_data()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/utils.py", line 43, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/task_runner.py", line 255, in _load_train_data
    raise e from None
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/task_runner.py", line 253, in _load_train_data
    runner.run()
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 209, in run
    count, _ = self._insert_all_batches()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/utils.py", line 43, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 169, in _insert_all_batches
    raise e from e
  File "/home/caiyd/work/zilliz/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 160, in _insert_all_batches
    count = future.result(timeout=self.timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/caiyd/miniconda3/envs/vdb/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'generator' object
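For context, the failure is a generic CPython limitation rather than anything Doris-specific: objects sent to a worker process via `multiprocessing` must be picklable, and generators never are. Below is a minimal sketch (hypothetical names, not the actual VectorDBBench code) that reproduces the error and shows one possible workaround: materialize each batch into a plain list before it crosses the process boundary.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor


def batches():
    """Hypothetical stand-in for the dataset batch iterator."""
    for i in range(3):
        yield [float(i)] * 4


def insert_batch(batch):
    """Hypothetical stand-in for a per-batch insert; returns rows inserted."""
    return len(batch)


if __name__ == "__main__":
    # 1) Reproduce the failure: a generator cannot be pickled, which is the
    #    same TypeError raised in the traceback above when the queue feeder
    #    thread tries to serialize the object for a child process.
    try:
        pickle.dumps(batches())
    except TypeError as e:
        print(e)  # cannot pickle 'generator' object

    # 2) Workaround sketch: convert each batch to a list before submitting
    #    it to the process pool; plain lists pickle without issue.
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(insert_batch, list(b)) for b in batches()]
        total = sum(f.result() for f in futures)
    print(total)  # 12 rows across 3 batches of 4
```

The actual fix in VectorDBBench would likely mean ensuring the Doris client's data-loading path hands the worker process concrete sequences (lists/arrays) rather than a lazy generator, or consumes the generator inside the worker instead of the parent.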
