Possible to persist checkout across multiple jobs? #19
Comments
Each job runs in a fresh version of the virtual environment, so you'd need to check out for each job. You can have multiple steps for each job, and the environment will remain across steps, but you aren't able to persist the environment entirely across separate jobs.

If you need to persist data across different jobs, you may want to have a look at using the artifact upload and download actions. These allow you to pass files/folders between different jobs: https://github.com/actions/upload-artifact

Though for your specific use case, I'd recommend looking at the workflow syntax for `on`: https://help.github.com/en/articles/workflow-syntax-for-github-actions#on

This repository is intended for issues specific to the checkout action.
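For readers landing here, a minimal sketch of the artifact-based hand-off described above. The job names, paths, and scripts are illustrative, not taken from the comment:

```yaml
name: Pass build output between jobs
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./build.sh                 # hypothetical build script producing ./dist
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/

  test:
    needs: build                        # runs on a fresh VM, so nothing from 'build' is on disk
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/
      - run: ./run-tests.sh dist/       # hypothetical test entry point
```

Note that artifacts move build outputs between jobs; they don't move the git checkout itself, so any job that needs the repository contents still has to run actions/checkout.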
Thanks for the question, and I think that @ThomasShaped addressed it. So I'm going to close this issue - if you have additional feedback, please contact us.
Just checking in, is this still the same? Checkouts will not persist across different jobs?
Sadly it seems so
Just in case someone else stumbles upon this: if you expect the job environment to be retained, that's because you assume that jobs execute within the same runner (read: "machine"), which is not necessarily the case.
Every job must do its own environment setup, including checkout. actions/checkout#19 (comment)
It seems there needs to be a ton of setup duplication between jobs with GitHub Actions, which is very unfortunate. A lot of CPU time is wasted having to do common setup work in each job.
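One way to at least reduce the duplication in the workflow files (each job still has to execute the steps, but they are only defined once) is a composite action. A minimal sketch, assuming a hypothetical local action at `.github/actions/setup` and a Node project; the tool choices are placeholders:

```yaml
# .github/actions/setup/action.yml  (hypothetical path)
name: Common setup
description: Shared setup steps used by every job
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: 20
    - run: npm ci          # illustrative dependency install
      shell: bash          # 'shell' is required for run steps in composite actions
```

Then each job references it after its own checkout (the checkout itself still has to happen per job, because a local action can only be used once the repo is on disk):

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4        # checkout still happens per job
      - uses: ./.github/actions/setup    # but the rest of the setup is defined once
      - run: npm run lint
```

This cuts down on YAML duplication, not on compute: every job still pays for the setup work.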
I ran into this again recently; my search engine led right here, so I expect more people will end up here too. We'd been using our self-hosted runners, which of course give us a lot more control over the lifetime of objects on disk. I thought that two jobs running sequentially were supposed to use the same runner for efficiency, but the checkout action's post-job cleanup removes what was checked out, so this doesn't work on GitHub-hosted runners.

So, if you have some VMs available, one option is to self-host and do the checkout with plain git commands somewhere else, if you really need to share files between jobs and for some reason don't want the standard methods. You can only run one job at a time per runner, though. With a new action I found I don't need to branch as much or run on different VMs, so I'm trying to pack everything into one job. I had some jobs running on different servers, and some branching in most of our self-hosted stuff, and had forgotten about the "everything you did to set up is now gone" thing. Doh.
Sorry, forgot the point - if this were possible as a flag or something in a .yml file, it would be really handy and would cut down on duplication, downloads, etc.
… to future jobs See: actions/checkout#19
[actions/checkout#19](actions/checkout#19) says we can't be clever and check out code once for all jobs
* make each PR build step its own job
* remove checkout-code job: [actions/checkout#19](actions/checkout#19) says we can't be clever and check out code once for all jobs
* install dependencies in every step

Co-authored-by: evan ugarte <evanuxd@gmail.com>
I really think you should reconsider this. Especially when you have to spin up services, it gets a bit ridiculous that you have to duplicate that for each job, for each run.
It would be great if this was supported
Can we reopen this issue? For any project of considerable size this is super painful - you either duplicate a lot of code (thus wasting runner time, i.e. money) or cram everything into one job, which means you can't really use these flashy new strategies such as the `matrix` strategy.
Still no solution for running one checkout job and then parallel jobs on top of it?
👍 to @konstantinmv's comment. So much code gets duplicated due to this.
🙁 It's disheartening to see that this issue has been unresolved for nearly four years, causing unnecessary waste of compute and storage resources. The problem is not limited to compute and storage; it also has a broader impact on network usage and memory consumption. For organisations relying on GitHub Actions to scale their development workflows, the consequences go beyond inefficient resource allocation: it directly impacts our ability to streamline processes, optimise costs, and maintain a sustainable approach to software development. This issue acts as a bottleneck for organisations aiming to maximise the potential of GitHub Actions, hindering the full realisation of the platform's capabilities. We must prioritise this for the benefit of all users, minimise wastage, and contribute to a more environmentally responsible software development ecosystem. 💔💻🌍
I think people are missing this point: the code needs to get onto the virtual environment somehow. GitHub already caches code that is being checked out, but the "downloading" of the code onto each virtual environment has to happen no matter what. If you want to do this step only once, add a step to an existing job and reuse the already-downloaded code in that environment.
The simple and efficient solution would be to set up one virtual environment and run multiple jobs in parallel in that environment, with each job spinning up on a separate (virtual) processor core. You should be able to assign how many cores you want in your environment, and the billed action time could be multiplied by that amount (or, even better, active time could be counted across the different jobs).
Facing the same issue. Lots of duplication across actions since there is no way to reuse a checkout across jobs. :( Went through the workarounds suggested above as well, but it would be really nice to have something nicer (maybe a flag which says to start the new container with already checked-out code, so one does not have to check out again in the job).
Hate the duplication and the lack of separation of jobs due to this feature request being closed. This issue should really be considered again.
Due to this issue, we are using a single job with multiple steps instead of nicely separating build, clean/lint and run into separate jobs. Please consider reopening this ticket.
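To make the single-job workaround concrete, a minimal sketch; the job name and commands are illustrative, not from the comment above:

```yaml
jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # one checkout, shared by every step below
      - run: npm ci                 # illustrative commands; substitute your own
      - run: npm run lint
      - run: npm run build
      - run: npm test
```

Steps within one job share the same runner and filesystem, so the checkout only happens once; the trade-off is that the steps run sequentially and appear as a single job in the UI.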
Related comment: actions/checkout#19

Workflow examples:
- https://github.com/actions/starter-workflows/blob/main/ci/npm-publish.yml
- https://github.com/withastro/astro/blob/main/.github/workflows/ci.yml
- https://github.com/withastro/starlight/blob/main/.github/workflows/ci.yml
- https://github.com/vitejs/vite/blob/main/.github/workflows/ci.yml
We run GitHub Actions on custom runners on Kubernetes, and as a consequence have better control over what the runners do. As a workaround for GitHub Actions not sharing the filesystem between different jobs of the same workflow, we have a volume that's mounted into all runners and do filesystem work on that volume. It's not without issues, and it introduces some potential problems; however, for us at least, in complex workflows it saves more trouble than it causes.

In theory, you could use the default caching mechanism of GitHub Actions. Only, that mechanism isn't any better with regard to moving tons of bytes over the network - which is what got us to mount the volume in the runners in the first place.
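For anyone curious what such a setup can look like, here is a generic Kubernetes-style sketch of mounting one shared volume into self-hosted runner pods. The names, image, PVC, and mount path are hypothetical and not taken from the comment above, and where exactly this fragment goes depends on how your runners are deployed:

```yaml
# Hypothetical pod template fragment for self-hosted runner pods.
spec:
  containers:
    - name: runner
      image: my-registry/actions-runner:latest   # placeholder image
      volumeMounts:
        - name: shared-workspace
          mountPath: /mnt/shared                 # jobs read/write shared files here
  volumes:
    - name: shared-workspace
      persistentVolumeClaim:
        claimName: actions-shared-workspace      # must support ReadWriteMany if pods run on different nodes
```

As the comment says, this introduces its own problems, chiefly coordinating concurrent jobs that write to the same paths.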
I was just looking at how Actions Runner Controller does this. Our plan is to put bare mirrors of our very large repos on a shared PV mount, hoping to get very fast checkouts into the otherwise still ephemeral working containers.
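A rough sketch of what a fast checkout from such a mirror could look like inside a job. The mirror path, repository name, and labels are hypothetical; this uses plain git instead of actions/checkout, so you lose that action's credential handling and cleanup, and private repositories would still need auth configured separately:

```yaml
jobs:
  build:
    runs-on: [self-hosted, linux]     # assumes runners that have the shared mount
    steps:
      - name: Fast clone via local mirror
        run: |
          # Borrow objects from the bare mirror on the shared PV instead of
          # fetching everything over the network again.
          git clone --reference /mnt/mirrors/my-org/my-repo.git \
            https://github.com/my-org/my-repo.git .
          git checkout "$GITHUB_SHA"
```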
I will have to check that out. Haven't looked into what the controller does in detail.
The one thing, from my experience so far with this, is to take care of potential conflicts when different jobs using the same repo(s) run concurrently - for example, if two jobs try to write to the same shared path at the same time.
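GitHub Actions' built-in `concurrency` setting is one way to serialise jobs that touch a shared location. A minimal sketch; the group name, labels, and script are arbitrary placeholders:

```yaml
jobs:
  sync-mirror:
    runs-on: [self-hosted, linux]
    concurrency:
      group: shared-volume-my-repo   # only one job in this group runs at a time
      cancel-in-progress: false      # queue later runs instead of cancelling them
    steps:
      - run: ./update-mirror.sh      # illustrative script that writes to the shared mount
```

This only coordinates jobs within GitHub Actions; anything else writing to the volume still needs its own locking.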
It seems that GitHub is ignoring this issue, since fixing it could cut into the portion of their profits that comes from forcing users into an inefficient process, wasting more Actions minutes/compute/money and consequently harming the environment. I would recommend that newcomers to this discussion (from which GitHub is willingly absent) not waste time hoping for an official solution, but rather look into something like this:
@ethomson How is this a closed issue? This is fundamental functionality for any build system.
Hi @alan-czajkowski! I don't work at GitHub anymore and haven't in several years, so I was surprised by this notification! In any case, I'm not sure what you have in mind exactly, but there are a couple of approaches that could solve it. Obviously you can't just "persist" data across two independent virtual machines, but you could use a network storage device or use …
I am new to GitHub Actions and not sure if this is currently supported.
We would like to set up multiple conditional jobs on a single repo.
Is it possible to persist the checked-out files across multiple jobs, i.e. …