sync only new packages #63

Open
gpcimino opened this issue Feb 5, 2018 · 8 comments

Comments

gpcimino commented Feb 5, 2018

All,

From a quick look at the code, it looks like conda-mirror copies all the repository (aka channel) files every time it is launched.
Is this correct?

It would be useful to download only missing/new packages in order to save bandwidth.

Thanks
GP

gpcimino changed the title from "sync only new packges" to "sync only new packages" on Feb 5, 2018
Contributor

ericdill commented Feb 5, 2018

> Is this correct?

No, this is not correct. conda-mirror computes the packages that it is missing from the upstream channel, based on the defined package whitelist/blacklist configuration, and then downloads only those.

> It would be useful to download only missing/new packages in order to save bandwidth.

This is indeed what is happening.
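
As a rough illustration of "download only what is missing", here is a minimal sketch. It is not conda-mirror's actual code: the function `packages_to_download`, its arguments, and the example file names are all made up for illustration, and treating the whitelist/blacklist as plain sets of package names (with the whitelist winning) is a simplifying assumption.

```python
def packages_to_download(upstream, local, blacklist=(), whitelist=()):
    """Return the upstream file names that still need to be fetched.

    upstream: dict mapping file name -> package name (as read from a channel index)
    local: file names already present in the mirror
    blacklist / whitelist: package names; the whitelist wins over the blacklist
    (an assumption made for this sketch).
    """
    blocked = set(blacklist) - set(whitelist)
    wanted = {fn for fn, name in upstream.items() if name not in blocked}
    # Anything wanted that is not already on disk is what gets downloaded.
    return sorted(wanted - set(local))


# Hypothetical channel contents and mirror state.
upstream = {
    "numpy-1.14.0-py36_0.tar.bz2": "numpy",
    "numpy-1.14.1-py36_0.tar.bz2": "numpy",
    "scipy-1.0.0-py36_0.tar.bz2": "scipy",
}
local = {"numpy-1.14.0-py36_0.tar.bz2"}

missing = packages_to_download(upstream, local, blacklist=["scipy"])
print(missing)  # ['numpy-1.14.1-py36_0.tar.bz2']
```

On a second run with the same upstream index, the file fetched in the first run is part of `local`, so nothing is downloaded again.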

Author

gpcimino commented Feb 5, 2018

OK, this is good news!

I am having some issues with the download, though. I am experiencing some network instability during the download, so the conda-mirror process crashes.
What I noticed is that the packages downloaded so far into the tmp dir are deleted, so on the next run my download has to restart from the beginning. It would be very nice to save the work already done.
Is this currently possible?

To make the download more resilient I modified the code in _download() as follows:

(NOTE: code not tested; I am testing it right now!)

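import logging
import os
import time

import requests

logger = logging.getLogger(__name__)  # stand-in for the module's existing logger


def _download(url, target_directory):
    pause_seconds = [3, 15, 30, 60, 120, 300, 900]  # pause after each failed attempt
    chunk_size = 1024  # 1KB chunks
    target_filename = url.split('/')[-1]
    download_filename = os.path.join(target_directory, target_filename)
    for secs in pause_seconds:
        try:
            logger.info("download_url=%s", url)
            logger.debug('downloading to %s', download_filename)
            # 'w+b' truncates the file, so a retry overwrites any partial download
            with open(download_filename, 'w+b') as tf:
                ret = requests.get(url, stream=True)
                ret.raise_for_status()  # treat HTTP error responses as failures too
                for data in ret.iter_content(chunk_size):
                    tf.write(data)
            logger.info('File %s successfully downloaded', download_filename)
            break
        except Exception:
            logger.exception("Failure in network connection")
            logger.info("Retry in %s seconds", secs)
            time.sleep(secs)
            logger.info("Try again to download")
    else:
        # every attempt failed: do not return as if the download had worked
        raise RuntimeError("giving up on {} after {} attempts".format(url, len(pause_seconds)))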

If the community is interested I can improve this code (e.g. take pause_seconds as a command line parameter, catch more specific exceptions) and submit a PR.
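
A rough sketch of what those two improvements could look like, kept separate from the download itself so it can be tried without a network. Everything here is an assumption for illustration: `retry`, `parse_pauses`, and the `--pause-seconds` flag are hypothetical names, not existing conda-mirror options. Retrying only on `OSError` is one possible choice; `requests.exceptions.RequestException` is a subclass of it, so network failures raised by requests would be retried while programming errors would not.

```python
import argparse
import time


def retry(func, pause_seconds, retry_on=(OSError,), sleep=time.sleep):
    """Call func() until it succeeds, pausing between failed attempts.

    Only exceptions listed in retry_on trigger a retry; once the pauses are
    used up, the last failure is re-raised to the caller.
    """
    attempts = len(pause_seconds) + 1
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries
            sleep(pause_seconds[attempt])


def parse_pauses(argv=None):
    parser = argparse.ArgumentParser()
    # --pause-seconds is a hypothetical flag used only in this sketch.
    parser.add_argument("--pause-seconds", type=int, nargs="+",
                        default=[3, 15, 30, 60, 120, 300, 900])
    return parser.parse_args(argv).pause_seconds


# Simulate a download that fails twice and then succeeds.
calls = []

def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("simulated network failure")
    return "ok"

slept = []  # record the pauses instead of really sleeping
result = retry(flaky, parse_pauses(["--pause-seconds", "1", "2"]), sleep=slept.append)
```

Passing `sleep` as a parameter is only there to make the behaviour easy to check; real use would leave the default `time.sleep`.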

Thanks
GP

Contributor

ericdill commented Feb 5, 2018

What is the stack trace that you're seeing from conda-mirror when it crashes?

Generally speaking, PRs are welcome :)

Author

gpcimino commented Feb 5, 2018

Eric,

Unfortunately I lost the stack trace with the error from my shell :-(
I will run conda-mirror overnight, and if I experience the error again I will report it to you.

BTW, is my assumption correct that the tmp dir files are not copied to the destination dir in case of a system crash?

I am definitely happy to contribute a PR, but I want to test the code a bit more first.

Thanks for your help

GP

Contributor

ericdill commented Feb 5, 2018

> BTW, is my assumption correct that the tmp dir files are not copied to the destination dir in case of a system crash?

Correct. They are not automatically being copied to the destination dir in case of a crash.

@101glover

I am working through issues similar to gpcimino's, except that I cannot even complete a first run.
When running:
conda-mirror --upstream-channel conda-forge --target-directory local_mirror --platform linux-64 -vvv

I can see packages being downloaded to /tmp but ultimately the process blows up with an error stating:
Remote end closed connection without response

Full stack trace is attached
stacktrace.txt

Not sure if Anaconda.org is misbehaving. Any ideas?

Contributor

magnuhho commented Oct 25, 2018

101glover, change #71 altered the code so that the packages already downloaded will still be processed if the download fails.
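
To make the idea concrete, here is a minimal sketch of "keep what was already downloaded when a later download fails". It is not the code from that change: `mirror`, `fake_download`, and the `example.invalid` URLs are hypothetical and exist only for this illustration.

```python
import os
import shutil
import tempfile


def mirror(urls, download, target_directory):
    """Download urls into a temp dir, then move whatever completed to target_directory.

    download(url, directory) is any callable that writes the file and raises on failure.
    Returns the file names that reached target_directory.
    """
    os.makedirs(target_directory, exist_ok=True)
    moved = []
    with tempfile.TemporaryDirectory() as tmp:
        done = []
        try:
            for url in urls:
                download(url, tmp)
                done.append(url.split("/")[-1])
        except Exception as exc:
            print("download stopped early:", exc)
        # This runs even after a failure, so finished files are not thrown away.
        for name in done:
            shutil.move(os.path.join(tmp, name), os.path.join(target_directory, name))
            moved.append(name)
    return moved


def fake_download(url, directory):
    # Stand-in for a real download: the second package simulates a dropped connection.
    if url.endswith("b.tar.bz2"):
        raise ConnectionError("Remote end closed connection without response")
    with open(os.path.join(directory, url.split("/")[-1]), "wb") as f:
        f.write(b"data")


target = tempfile.mkdtemp()
urls = ["https://example.invalid/a.tar.bz2",
        "https://example.invalid/b.tar.bz2",
        "https://example.invalid/c.tar.bz2"]
moved = mirror(urls, fake_download, target)
```

In this sketch the first package ends up in the target directory even though the second download fails, so a later run only has to fetch the packages that are still missing.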

@101glover

Hey magnuhho, thanks for the notification. That solved my problems!
