Added the ability to disable binary validation. #62
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##           master      #62      +/-   ##
==========================================
+ Coverage   96.88%   96.91%   +0.02%
==========================================
  Files           2        2
  Lines         257      259       +2
==========================================
+ Hits          249      251       +2
  Misses          8        8
```

Continue to review full report at Codecov.
Hi @pelson! Long time no see :) Thanks for using my "awesome" tool :) It's nice that the community finds it useful. This turned into a much longer response than I had intended, so I'll put a tl;dr at the end.

**long version**

Thanks for the PR. I've got a few comments. Also, I have not looked at this code base in many months, so there's a possibility that I am wrong about the things I say here :)

If we skip validation entirely, then conda-mirror will not be able to catch packages that have been updated upstream but have the same filename. FWICT this happens when Anaconda has updated the metadata of the conda package but did not want to "break" backwards compatibility by bumping the package build number (whether or not they should be doing that is a whole other matter...). And this happens with distressing regularity.

**tl;dr**

From a practical sense, I would suggest having the code do something like:

```python
if not dry_run:
    if not validate:
        # Removing the md5 hashes from all packages will disable binary
        # validation by implicitly skipping the md5 check in ``_validate``,
        # which takes a very, very long time. We still want to respect the
        # blacklist if the user has one set, and we do want to make sure we
        # are capturing packages that are removed upstream or have their
        # sizes changed.
        for pkg_name, pkg_info in desired_repodata.items():
            del pkg_info['md5']
    # Only validate if we're not doing a dry-run (and we want validation)
    validation_results = _validate_packages(desired_repodata, local_directory, num_threads)
    summary['validating-existing'].update(validation_results)
```

(I'm like 95% sure that the above code will work. I have not tested it locally.)

**Finally!**

This is an open-source tool. I know there have been a few people asking for the ability to skip package validation (@Tigeraus and @pp-mo). If what the community wants is the ability to do none of the validation that is done in
Thanks for the detailed response @ericdill. Hope all is well with fatherhood 😄👶

I started by implementing md5-only validation, but it turned out that we are paying a surprisingly high cost in simply getting the index.json out of the

I completely agree with wanting to validate on first download. I'll take a look at doing that. I wouldn't want to lose the md5 sum from the channel index, though. My implicit assumption is that conda will check that when it downloads the package from my mirrored channel.

Hold fire, I'll take a look at doing this today.
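For context on that cost: reading `info/index.json` out of a conda package means opening and decompressing the `.tar.bz2` stream, and doing that once per package adds up quickly across a large channel. A self-contained sketch (the function name here is mine, not part of conda-mirror):

```python
import json
import tarfile


def read_index_json(pkg_path):
    """Read the metadata file from a conda package tarball."""
    # Opening with 'r:bz2' decompresses the archive; even pulling out a
    # single small member forces reading through the compressed stream,
    # which is why per-package metadata reads are surprisingly slow.
    with tarfile.open(pkg_path, 'r:bz2') as tar:
        member = tar.extractfile('info/index.json')
        return json.load(member)
```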
I thought they had stopped doing this many moons ago because of the overwhelmingly negative feedback they got from us about it. I share the distress if they are still doing it - it makes life a whole lot harder if you can't actually rely on the build number... Are you able to find out when you last saw this by any chance?
Fatherhood is amazing. Also exhausting. But mostly amazing 😁 . Thanks for asking!
To keep validating on the first download, just back out the diff from 698-702.
The local channel index is not affected by removing the md5's from the
Noted. Ping me again when you want more thoughts :)
I guess it has been a while. The last time I had to blacklist a package because it was continually failing its size check on download was 2017-05-24. So for all the packages we are mirroring, I have not seen this problem crop up again. My mistake for fear-mongering 🙀.
@pelson did you want me to step in and push this across the finish line?
Honestly, yes please! 😄 To be completely transparent though, our release engineering folks recently turned on conda channel proxying in Artifactory, and so far it has been a pretty good choice. Basically it means we only end up copying the binaries we use, not the whole channel.
I'm going to close this PR with the honest statement that I don't have the time to help revive it and complete it. If someone would like to get it working with the latest master with appropriate tests, I'll have a look and merge it. FWIW,
Thanks for the awesome tool!
I'm doing some work in line with #60, and I really am not that fussed about validating the binaries that are mirrored. I'm happy to let users of the mirrored channel identify issues if the binaries aren't as expected (conda will inform them).
The result is a huge speedup when re-running a sync (now almost instantaneous, even for large repos).
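To illustrate where the speedup comes from, here is a schematic sketch (names are illustrative, not conda-mirror's actual API): once validation of already-mirrored packages is skipped, a re-sync reduces to a set difference on filenames, which is effectively free compared to re-hashing every package on disk.

```python
import os


def plan_sync(upstream_repodata, local_directory, validate=True):
    """Return (packages to download, packages to re-validate).

    ``upstream_repodata`` maps package filenames to their metadata,
    mirroring the shape of a channel's repodata 'packages' section.
    """
    local = set(os.listdir(local_directory))
    to_download = sorted(name for name in upstream_repodata
                         if name not in local)
    # With validation disabled, nothing already on disk is re-hashed, so
    # an up-to-date mirror finishes almost instantly even for large repos.
    to_validate = sorted(local & set(upstream_repodata)) if validate else []
    return to_download, to_validate
```

With `validate=True`, every existing package would also be re-checked (size and md5), which is the pass that dominates runtime on a large channel.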