Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude non-existing objects from absolute addresses list #4367

Merged
merged 3 commits into from
Oct 17, 2022

Conversation

talSofer
Copy link
Contributor

Fixes #4366

How was this tested

  • Tested manually in aws and azure on databricks: 1. created an addresses file that includes paths to non-existing objects 2. run backup and restore utility.

@talSofer talSofer added the exclude-changelog PR description should not be included in next release changelog label Oct 13, 2022
val hcValues =
spark.sparkContext.broadcast(HadoopUtils.getHadoopConfigurationValues(hc, "fs."))
val configMapper = new ConfigMapper(hcValues)
absolutePathsDF.filter(x => isPathExist(x, configMapper))
Copy link
Contributor

@johnnyaug johnnyaug Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use fs.exists?

Suggested change
absolutePathsDF.filter(x => isPathExist(x, configMapper))
absolutePathsDF.filter(x => {
val p = new Path(x)
p.getFileSystem(configMapper.configuration).exists(p)
})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are absolutely right :) I forgot about it, thanks!
done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that to unify our usage we should get the FS from the path, and also exists saves a few lines of code, but you should catch IOException, and it might be still implemented under the isPathExist method.

Copy link
Contributor

@Jonathan-Rosenberg Jonathan-Rosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (only @johnnyaug's comment)

@talSofer talSofer merged commit 7ce8a17 into master Oct 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-changelog PR description should not be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GC backup and restore fails when list of expired addresses contains path to non-existing object
3 participants