Root cause:
Commit 20170110210127 succeeded with all files having valid content, and was archived and cleaned.
Komondor gets the RDD[WriteStatus] and calls count() on this RDD to update the number of bad records, etc.
Hoodie persists the RDD to avoid recomputation
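The interaction between the count() action and persistence can be sketched in plain Python (a simulation of Spark's lazy-evaluation behavior, not actual Hudi/Spark code): each action on an unpersisted lazy pipeline re-runs the upstream tasks, including their side effects.

```python
# Simulation (plain Python, hypothetical names) of why count() on an
# unpersisted lazy pipeline re-executes the upsert tasks, and why
# materializing ("persisting") the results once avoids that.

writes_executed = 0

def upsert_partition(partition_id):
    """Stand-in for an upsert task; has a real side effect (file writes)."""
    global writes_executed
    writes_executed += 1
    return {"partition": partition_id, "bad_records": 0}

def write_statuses():
    """Lazy 'RDD' of WriteStatus: evaluating it re-runs every task."""
    return (upsert_partition(p) for p in range(4))

# Without persisting: the count() action executes all four tasks.
count = sum(1 for _ in write_statuses())
assert count == 4 and writes_executed == 4

# With persisting: materialize once, then further actions re-run nothing.
cached = list(write_statuses())   # tasks run once more, here only
count_again = len(cached)         # no re-execution
assert count_again == 4
assert writes_executed == 8       # only the one materialization pass ran
```

The failure below arises when this cached copy is lost and Spark silently falls back to re-running the side-effecting tasks.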
Because of DataNode restarts, some of the persisted data is no longer available, and Spark re-executes the upsert DAG for the missing partitions.
This kicks off the DAG again with the same commit time; Spark retried the re-computation 3 times and failed each time (again, most likely because of DataNodes restarting).
To account for partial failures in the update task, we delete the data file if its path already exists, so a bunch of data files were deleted and recreated.
The files that were open when the task failed were all 4-byte files (just the parquet header/magic block).
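The delete-then-rewrite failure mode can be reproduced in miniature (a hedged Python sketch with hypothetical paths, not Hudi's actual write path): a retried task first deletes the existing file, then dies after writing only the header, leaving a 4-byte stub where valid data used to be.

```python
# Sketch of the failure: retry deletes the previously good file, then
# crashes after writing only the magic bytes, leaving a 4-byte stub.

import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # the 4-byte parquet magic block

def write_data_file(path, rows, crash_after_header=False):
    if os.path.exists(path):      # delete-if-exists to handle partial failures
        os.remove(path)
    with open(path, "wb") as f:
        f.write(PARQUET_MAGIC)    # header/magic block written first
        if crash_after_header:
            raise RuntimeError("task killed (DataNode restart)")
        f.write(rows)

d = tempfile.mkdtemp()
path = os.path.join(d, "part-0001.parquet")

write_data_file(path, b"row-data" * 100)   # first attempt succeeds
assert os.path.getsize(path) > 4

try:
    # Retry with the same commit time deletes the good file first...
    write_data_file(path, b"row-data" * 100, crash_after_header=True)
except RuntimeError:
    pass
assert os.path.getsize(path) == 4          # ...and leaves only the 4-byte stub
```

Because the delete happens before the new file is complete, a crash at any point in between destroys previously committed data.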
Resolution:
Should not auto-commit by default. Commit should be called only after all the processing completes, and should publish the data files atomically.
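The proposed fix can be sketched as follows (a minimal Python illustration with a hypothetical file layout, not Hudi's actual timeline format): stage all data files first, and only then publish them with a single atomic commit-marker rename, so readers never observe partially written output.

```python
# Sketch of atomic publish: data files are staged under a commit time,
# and become visible only once a .commit marker exists. The marker is
# created via rename, which is atomic on POSIX filesystems.

import os
import tempfile

def stage(table, commit_time, name, data):
    """Write a data file tagged with its commit time (not yet visible)."""
    with open(os.path.join(table, f"{name}_{commit_time}.parquet"), "wb") as f:
        f.write(data)

def commit(table, commit_time):
    """Publish all files of this commit with one atomic rename."""
    inflight = os.path.join(table, f"{commit_time}.inflight")
    done = os.path.join(table, f"{commit_time}.commit")
    open(inflight, "wb").close()
    os.rename(inflight, done)   # all-or-nothing visibility switch

def visible_files(table):
    """Readers only see data files belonging to completed commits."""
    commits = {f.split(".")[0] for f in os.listdir(table) if f.endswith(".commit")}
    return sorted(f for f in os.listdir(table)
                  if f.endswith(".parquet")
                  and f.rsplit("_", 1)[1].split(".")[0] in commits)

table = tempfile.mkdtemp()
stage(table, "20170110210127", "part-0001", b"PAR1rows")
assert visible_files(table) == []   # staged but not yet published
commit(table, "20170110210127")
assert visible_files(table) == ["part-0001_20170110210127.parquet"]
```

With this scheme, a crash before commit() leaves only invisible staged files to clean up, and no previously published file is ever deleted mid-write.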