
ObjectStore.list_dir corrupts directory names due to lstrip vs removeprefix #3753

@TomNicholas

Description

_transform_list_dir in zarr/storage/_obstore.py uses str.lstrip(prefix) on line 263 to strip the path prefix from each entry in common_prefixes. But lstrip does not remove a prefix string: it treats its argument as a set of characters and strips the longest leading run made only of those characters. This corrupts a directory name whenever its first characters also appear somewhere in the prefix.

Line 264 already correctly uses removeprefix for objects. The fix is to do the same for prefixes on line 263.
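A minimal standard-library illustration of the difference. The prefix and entry strings are assumptions modeled on the reproducer below, not values taken from _transform_list_dir itself. (lstrip gives the same corrupted result whether or not the prefix ends in "/", because "/" already occurs inside "subdir/data.zarr".)

```python
# Assumed values, modeled on the reproducer: a listing prefix and one child entry.
prefix = "subdir/data.zarr/"
entry = "subdir/data.zarr/temp"

# str.lstrip(chars) treats `chars` as a set of characters and removes the longest
# leading run made only of those characters. Here that run also swallows the 't'
# of 'temp', because 't' appears in the prefix ("data").
buggy = entry.lstrip(prefix)

# str.removeprefix(prefix) (Python 3.9+) removes the exact leading substring, once.
fixed = entry.removeprefix(prefix)

# Whether a name is corrupted depends on its first characters:
unaffected = "subdir/data.zarr/humidity".lstrip(prefix)  # 'h' is not in the prefix
erased = "subdir/data.zarr/star".lstrip(prefix)          # every letter of 'star' is in the prefix

print(repr(buggy), repr(fixed), repr(unaffected), repr(erased))  # 'emp' 'temp' 'humidity' ''
```

So the bug is data-dependent: some variable names survive untouched, others lose leading letters, and some can disappear entirely.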

Reproducer:

```python
# /// script
# requires-python = ">=3.12"
# dependencies = ["zarr>=3", "obstore", "xarray", "numpy"]
# ///
import asyncio, tempfile
import numpy as np, xarray as xr
from obstore.store import LocalStore
from zarr.storage import ObjectStore

tmpdir = tempfile.mkdtemp()
parent_dir = f"{tmpdir}/bucket_root"
filepath = f"{parent_dir}/subdir/data.zarr"

xr.Dataset({"temp": (("x", "y"), np.arange(12, dtype="float32").reshape(3, 4))}).to_zarr(
    filepath, consolidated=False, zarr_format=2
)

store = LocalStore(prefix=parent_dir)
zarr_store = ObjectStore(store=store)

async def main():
    results = [item async for item in zarr_store.list_dir("subdir/data.zarr")]
    print(results)  # ['emp', '.zattrs', '.zgroup']: 'temp' is corrupted to 'emp'

asyncio.run(main())
```

Fix:

```diff
- prefixes = [obj.lstrip(prefix).lstrip("/") for obj in list_result["common_prefixes"]]
+ prefixes = [obj.removeprefix(prefix).lstrip("/") for obj in list_result["common_prefixes"]]
```
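A regression-style check of the patched expression, written as small standalone functions so it runs without zarr or obstore installed. The function names and the sample keys are hypothetical; only the comprehension bodies mirror the old and new versions of line 263.

```python
def strip_listing_prefix(prefix: str, keys: list[str]) -> list[str]:
    # Hypothetical helper: same expression as the patched line, applied to any key list.
    return [key.removeprefix(prefix).lstrip("/") for key in keys]


def strip_listing_prefix_old(prefix: str, keys: list[str]) -> list[str]:
    # Hypothetical helper reproducing the buggy expression, for comparison.
    return [key.lstrip(prefix).lstrip("/") for key in keys]


# Assumed common_prefixes for a listing under "subdir/data.zarr".
common_prefixes = [
    "subdir/data.zarr/temp",
    "subdir/data.zarr/star",
    "subdir/data.zarr/humidity",
]

# With the fix, a trailing "/" on the prefix does not matter: any leftover
# leading "/" is removed by the .lstrip("/") that follows removeprefix.
for prefix in ("subdir/data.zarr", "subdir/data.zarr/"):
    print(strip_listing_prefix(prefix, common_prefixes))      # ['temp', 'star', 'humidity']
    print(strip_listing_prefix_old(prefix, common_prefixes))  # ['emp', '', 'humidity']
```

removeprefix also returns a key unchanged when it does not start with the prefix, so an unexpected key passes through intact instead of being silently truncated.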
