Unable to process .arrow file in the datasets #545

ramses-lee · 2024-03-06T03:40:42Z

A general demonstration is outlined in this Google Colab notebook: https://colab.research.google.com/drive/1oKhivD5T9Yi1gMl0_7dUwqVFqiNfD43k?usp=sharing

The 'flights-200k.arrow' file produces an error every time I try to read it with the pandas package.

domoritz · 2024-03-06T04:14:21Z

Can you try reading it both as a file and as a stream? Maybe try pyarrow directly.

ramses-lee · 2024-03-06T05:34:13Z

I'm not exactly sure what you meant, but I tested both the Parquet read_table() function and the pyarrow memory_map() function, and both gave me an error.

domoritz · 2024-03-06T16:26:02Z

Ahh, I fixed it. The file wasn't closed properly.

This works now.

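import pyarrow as pa

# Read the whole file into memory; the handle is closed when the block exits.
with open('data/flights-200k.arrow', 'rb') as f:
    buf = f.read()

# Open the bytes as an Arrow IPC file and convert it to a pandas DataFrame.
with pa.ipc.open_file(buf) as reader:
    df = reader.read_pandas()

print(df)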
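A quick way to spot a file that was not closed properly, without installing pyarrow, is to check its magic bytes. In the Arrow IPC file format, a complete file both begins and ends with the ASCII marker `ARROW1`. The trailing marker is written with the footer when the writer is closed. The sketch below uses only the standard library. The helper name `looks_like_complete_arrow_file` is hypothetical, and it is only an assumption that this file's problem was a missing footer.

```python
ARROW_MAGIC = b"ARROW1"


def looks_like_complete_arrow_file(path):
    """Return True if the file starts and ends with the Arrow IPC file magic.

    This is a cheap sanity check, not a full validation: it does not parse
    the footer or any record batches.
    """
    with open(path, "rb") as f:
        head = f.read(len(ARROW_MAGIC))
        f.seek(0, 2)  # jump to the end to find the file size
        size = f.tell()
        if size < 2 * len(ARROW_MAGIC):
            return False  # too small to hold both markers
        f.seek(-len(ARROW_MAGIC), 2)  # last six bytes of the file
        tail = f.read()
    return head == ARROW_MAGIC and tail == ARROW_MAGIC
```

A file that passes the leading check but fails the trailing one was probably truncated or written without closing the writer. Such a file cannot be opened with the file reader.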

domoritz closed this as completed in 1d17d58 Mar 6, 2024
