Error when reading from HDFS backend #1852

Closed
AwalkZY opened this issue Nov 1, 2023 · 3 comments

AwalkZY commented Nov 1, 2023

First of all, thanks for your excellent work in this field!

I have run into a problem reading from the HDFS backend. Writing to HDFS works normally, but when I try to read the data back from the same location, I get an error: tiledb.cc.TileDBError: [TileDB::Array] Error: Caught std::exception: [TileDB::HDFS] Error: Cannot list files in hdfs://path/to/__fragment_meta.

Oddly, the error always occurs on the first attempt to read from HDFS within an execution and never on the second attempt, i.e.:

import tiledb
array_uri = "hdfs://path/to/dataset"
with tiledb.DenseArray(array_uri, mode='r', ctx=ctx) as A:
    read_data = A[:]["a"]
# Always raises a TileDBError

with tiledb.DenseArray(array_uri, mode='r', ctx=ctx) as A:
    read_data = A[:]["a"]
# Never raises an error, and the data is what I expect.
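Since the second attempt reportedly succeeds, a stopgap while the root cause is unknown could be to retry the read once. The following is only a minimal sketch of that idea, not a fix for the underlying HDFS listing error; `read_with_retry` and its parameters are hypothetical names introduced here for illustration, and the helper uses only the Python standard library.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def read_with_retry(
    read_fn: Callable[[], T],
    retries: int = 1,
    delay_s: float = 0.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call read_fn, retrying up to `retries` more times if it raises.

    Hypothetical helper: it masks the first-read failure described above
    and does not address whatever causes it.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(retries + 1):
        try:
            return read_fn()
        except exceptions as err:
            last_error = err
            if attempt < retries:
                time.sleep(delay_s)  # optional pause before the next attempt
    assert last_error is not None
    raise last_error


# Assumed usage with the read from this report (needs tiledb and a reachable HDFS):
#
# def read_a():
#     with tiledb.DenseArray(array_uri, mode='r', ctx=ctx) as A:
#         return A[:]["a"]
#
# read_data = read_with_retry(read_a, exceptions=(tiledb.TileDBError,))
```

Restricting `exceptions` to `tiledb.TileDBError` keeps unrelated failures from being silently retried.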

The code used to write the data is:

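import numpy as np
import tiledb

# Default configuration; array_uri is the same HDFS URI used for reading.
config = tiledb.Config()
ctx = tiledb.Ctx(config)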

dom = tiledb.Domain(tiledb.Dim(name="rows", domain=(0, 999), tile=1000, dtype=np.int32))
schema = tiledb.ArraySchema(domain=dom, sparse=False,
                            attrs=[tiledb.Attr(name="a", dtype=np.float32)])
tiledb.Array.create(array_uri, schema, ctx=ctx)

data = np.random.rand(1000).astype(np.float32)
with tiledb.DenseArray(array_uri, mode='w', ctx=ctx) as A:
    A[:] = data

The problem does not occur when I do the same thing on the local file system, so I suspect it lies in the interaction between the HDFS backend and TileDB. I would appreciate any help. Thanks again.

@ihnorton ihnorton self-assigned this Dec 14, 2023
@ihnorton
Member

Thanks for opening this issue. We will try to reproduce, and look for a fix as soon as possible.

@teo-tsirpanis
Member

I am working on this issue and have not been able to reproduce it. @AwalkZY, can you provide us with more information about your Hadoop installation, and check Hadoop's logs for any errors?
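One possible way to gather part of that information is a small script that prints the versions in use and the environment variables a Hadoop client typically depends on. This is only a sketch: the function name `collect_hdfs_diagnostics` is hypothetical, the list of variables is an assumption about what is relevant, and it does not read Hadoop's logs.

```python
import os
import shutil
import sys

# Environment variables commonly involved in locating Hadoop and the JVM
# (an assumed, non-exhaustive list).
HADOOP_ENV_VARS = ("HADOOP_HOME", "JAVA_HOME", "CLASSPATH", "LD_LIBRARY_PATH")


def collect_hdfs_diagnostics() -> dict:
    """Return a dict of version and environment details for a bug report."""
    info = {
        "python": sys.version.split()[0],
        "hadoop_cli": shutil.which("hadoop"),  # None if not on PATH
        "tiledb_py": None,
        "libtiledb": None,
    }
    for name in HADOOP_ENV_VARS:
        info[name] = os.environ.get(name)
    try:
        import tiledb

        info["tiledb_py"] = getattr(tiledb, "__version__", None)
        info["libtiledb"] = tiledb.libtiledb.version()
    except (ImportError, AttributeError):
        pass  # tiledb missing or laid out differently; leave as None
    return info


if __name__ == "__main__":
    for key, value in collect_hdfs_diagnostics().items():
        print(f"{key}: {value}")
```

Pasting the printed output into the issue, together with any errors from the Hadoop logs, would cover the version and environment side of the request.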

@ihnorton
Member

Please ping us and we can reopen if more information becomes available.
