Error while creating a dataset (ArrowIOError: Invalid parquet file. Corrupt footer) #321
Comments
Your code looks good. Couple of questions:
Up until now we were testing with pyarrow 0.11. I now see some failures with pyarrow 0.12, but they seem to be unrelated to your scenario.
and I do not have any empty values either. The count for all columns is 501. For reference, I went ahead and ran the HelloWorld example. It gave me the same error, so I guess it is something in my setup.
@fralik, did you figure out whether this is an issue in your local setup or in petastorm?
I only use Spark on Databricks, so I don't have many options to test it further. I'll close the issue.
@fralik I am also having the same issue with Spark on Databricks. Did you find a solution or workaround?
I am trying to make a dataset from existing data, but the process fails with the error message `ArrowIOError: Invalid parquet file. Corrupt footer`.

Here is my setup. I am using Spark on Databricks and have loaded a dataframe into variable `a` with the following schema:

And my code:

I get the same error if I try to use `make_batch_reader`. I am using Python 3.6, petastorm version is 0.6.0, pyarrow version is 0.12.1.

Does anyone know how to make it work?
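For anyone hitting the same message, one way to narrow it down is to check whether the files the reader opens are framed as Parquet at all. An unencrypted Parquet file starts with the 4-byte magic `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1` again. The sketch below is only a diagnostic under that assumption, not something taken from petastorm or pyarrow; the function name `check_parquet_footer` and the path in the usage note are hypothetical.

```python
import os

# Unencrypted Parquet files begin and end with these 4 bytes.
PARQUET_MAGIC = b"PAR1"


def check_parquet_footer(path):
    """Return a short diagnosis of a file's Parquet framing (hypothetical helper)."""
    size = os.path.getsize(path)
    # Smallest possible layout: header magic + footer length + footer magic.
    if size < 12:
        return "too small to be a Parquet file"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        return "missing PAR1 magic bytes"
    # The 4 bytes before the trailing magic hold the footer (metadata) length.
    footer_len = int.from_bytes(tail[:4], "little")
    if footer_len + 12 > size:
        return "footer length exceeds file size"
    return "ok"
```

Calling `check_parquet_footer("/path/to/part-00000.parquet")` on a locally readable copy of each file in the dataset directory shows whether the bytes on disk look like Parquet. It does not validate the footer contents, so a result of `"ok"` only rules out the most basic kind of corruption.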