
LMDBStore using multiprocessing with vindex fails #737

Open
sanjaysrikakulam opened this issue May 10, 2021 · 0 comments
sanjaysrikakulam commented May 10, 2021

Minimal, reproducible code sample, a copy-pastable example if possible

import os
import zarr
from numcodecs import Blosc
import numpy as np
from multiprocessing import Process

# Zarr store related #
store_file = "data/bloomfilters.zarr"
store = zarr.LMDBStore(store_file)
sync_dir = os.path.splitext(store_file)[0] + ".sync"
synchronizer = zarr.ProcessSynchronizer(sync_dir)
Blosc.use_threads = False
compressor = Blosc(cname="zstd", clevel=9, shuffle=Blosc.BITSHUFFLE)
root_group = zarr.group(store=store, synchronizer=synchronizer)
bf_group = root_group.create_group(name="BloomFilterIndex")

# Write function #
def write_bf_to_zarr(bf_group_handle, compressor, synchronizer, bloomfilter_size, samples_list):
    for sample_info in samples_list:
        col_idx = sample_info[0]
        file_paths = sample_info[1]
        bloomfilter = bf_group_handle.zeros(
            f"sample_{col_idx}",
            shape=(bloomfilter_size,),
            chunks=(bloomfilter_size // 4,),
            dtype="u4",
            compressor=compressor,
            synchronizer=synchronizer,
        )
        # Note: key and count are the return values of
        # np.unique(list_object, return_counts=True) inside bloomfilter_cython
        key, count = bloomfilter_cython(file_paths, bloomfilter_size)
        bloomfilter.vindex[[key]] = count

# Parallelize bloom filter writes to zarr #
# samples_list holds 205 (column index, file paths) entries
bloomfilter_size = 600000000
n_cores = 64
processes = []
for samp_list in np.array_split(samples_list, n_cores):
    proc = Process(
        target=write_bf_to_zarr,
        args=(bf_group, compressor, synchronizer, bloomfilter_size, samp_list),
    )
    proc.start()
    processes.append(proc)

for proc in processes:
    proc.join()

Problem description

Error: 
lmdb.BadRslotError: mdb_txn_begin: MDB_BAD_RSLOT: Invalid reuse of reader locktable slot


The error occurs only when using .vindex. When the entire array is written at once, as below, I do not see the error.

# some_array is of same shape and dtype as bloomfilter
bloomfilter[:] = some_array
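
For context, MDB_BAD_RSLOT is the error LMDB raises when a reader locktable slot is invalidly reused, which can happen when an LMDB environment opened before fork() is used from the child processes. A possible workaround (an untested sketch, not a confirmed fix; it assumes the bf_group/store handle created in the parent is the trigger) would be to open a fresh LMDBStore inside each worker instead of passing bf_group:

# Untested sketch: each worker opens its own LMDB environment, on the
# assumption that the parent's store handle must not be reused after fork().
def write_bf_to_zarr(store_file, compressor, synchronizer, bloomfilter_size, samples_list):
    store = zarr.LMDBStore(store_file)  # fresh environment per process
    root_group = zarr.group(store=store, synchronizer=synchronizer)
    bf_group_handle = root_group["BloomFilterIndex"]
    for sample_info in samples_list:
        col_idx, file_paths = sample_info
        bloomfilter = bf_group_handle.zeros(
            f"sample_{col_idx}",
            shape=(bloomfilter_size,),
            chunks=(bloomfilter_size // 4,),
            dtype="u4",
            compressor=compressor,
            synchronizer=synchronizer,
        )
        key, count = bloomfilter_cython(file_paths, bloomfilter_size)
        bloomfilter.vindex[[key]] = count
    store.close()

The Process call would then pass store_file instead of bf_group in args.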

Version and installation information

  • Zarr: 2.6.1
  • Numcodecs: 0.7.3
  • Python: 3.7.7
  • OS: Linux (CentOS 7)
  • Installed via pip into a conda environment