
LMDBStore using multiprocessing with vindex fails #737

Open
sanjaysrikakulam opened this issue May 10, 2021 · 0 comments
sanjaysrikakulam commented May 10, 2021

Minimal, reproducible code sample, a copy-pastable example if possible

import os
import zarr
from numcodecs import Blosc
import numpy as np
from multiprocessing import Process

# Zarr store related #
store_file = "data/bloomfilters.zarr"
store = zarr.LMDBStore(store_file)
sync_dir = os.path.splitext(store_file)[0] + ".sync"
synchronizer = zarr.ProcessSynchronizer(sync_dir)
Blosc.use_threads = False
compressor = Blosc(cname="zstd", clevel=9, shuffle=Blosc.BITSHUFFLE)
root_group = zarr.group(store=store, synchronizer=synchronizer)
bf_group = root_group.create_group(name="BloomFilterIndex")

# Write function #
def write_bf_to_zarr(bf_group_handle, compressor, synchronizer, bloomfilter_size, samples_list):
    for sample_info in samples_list:
        col_idx = sample_info[0]
        file_paths = sample_info[1]
        bloomfilter = bf_group_handle.zeros(
            f"sample_{col_idx}",
            shape=(bloomfilter_size,),
            chunks=(bloomfilter_size // 4,),
            dtype="u4",
            compressor=compressor,
            synchronizer=synchronizer,
        )
        # Note: key and count are the return values of
        # np.unique(list_object, return_counts=True) inside bloomfilter_cython
        key, count = bloomfilter_cython(file_paths, bloomfilter_size)
        bloomfilter.vindex[[key]] = count

# Parallelize bloom filter writes to zarr #
# samples_list holds 205 (column index, file paths) entries
bloomfilter_size = 600000000
n_cores = 64
processes = []
for samp_list in np.array_split(samples_list, n_cores):
    proc = Process(
        target=write_bf_to_zarr,
        args=(bf_group, compressor, synchronizer, bloomfilter_size, samp_list),
    )
    proc.start()
    processes.append(proc)

for proc in processes:
    proc.join()

Problem description

Error: 
lmdb.BadRslotError: mdb_txn_begin: MDB_BAD_RSLOT: Invalid reuse of reader locktable slot


The error occurs only when using .vindex. When the entire array is written at once, as below, I do not see the error.

# some_array is of same shape and dtype as bloomfilter
bloomfilter[:] = some_array
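
For context, MDB_BAD_RSLOT is the error LMDB raises when a reader locktable slot is invalidly reused, which can happen when an LMDB environment opened before fork() is used from the child processes. A possible workaround (an untested sketch, not a confirmed fix; it assumes the bf_group/store handle created in the parent is the trigger) would be to open a fresh LMDBStore inside each worker instead of passing bf_group:

# Untested sketch: each worker opens its own LMDB environment, on the
# assumption that the parent's store handle must not be reused after fork().
def write_bf_to_zarr(store_file, compressor, synchronizer, bloomfilter_size, samples_list):
    store = zarr.LMDBStore(store_file)  # fresh environment per process
    root_group = zarr.group(store=store, synchronizer=synchronizer)
    bf_group_handle = root_group["BloomFilterIndex"]
    for sample_info in samples_list:
        col_idx, file_paths = sample_info
        bloomfilter = bf_group_handle.zeros(
            f"sample_{col_idx}",
            shape=(bloomfilter_size,),
            chunks=(bloomfilter_size // 4,),
            dtype="u4",
            compressor=compressor,
            synchronizer=synchronizer,
        )
        key, count = bloomfilter_cython(file_paths, bloomfilter_size)
        bloomfilter.vindex[[key]] = count
    store.close()

The Process call would then pass store_file instead of bf_group in args.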

Version and installation information

  • Zarr: 2.6.1
  • Numcodecs: 0.7.3
  • Python: 3.7.7
  • OS: Linux (CentOS 7)
  • Installed via pip into a conda environment