
Possible to persist checkout across multiple jobs? #19

Closed
clmcgrath opened this issue Aug 21, 2019 · 25 comments

Comments

@clmcgrath

clmcgrath commented Aug 21, 2019

I am new to GitHub Actions and not sure if this is currently supported.

We would like to set up multiple conditional jobs on a single repo.
Is it possible to persist the checked-out files across multiple jobs?
i.e.

    task: checkout code

    task: if has JS changes
        run npm install
        run eslint
        build & test JS

    task: if has .NET changes
        nuget restore
        build & test .NET
@otter-computer

Each job runs in a fresh version of the virtual environment, so you'd need to checkout for each job. You can have multiple steps for each job, and the environment will remain across steps, but you aren't able to persist the environment entirely across separate jobs.

If you need to persist data across different jobs, you may want to have a look at the artifact upload and download actions. They allow you to pass files/folders between different jobs:

https://github.com/actions/upload-artifact
https://github.com/actions/download-artifact
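
For example, a minimal sketch of that hand-off (job names, the `dist/` path, and the v4 action versions are illustrative, not something from this thread):

```yaml
# Sketch only: one job uploads its build output, a later job downloads it.
name: artifact-handoff
on: push

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: dist/

  test:
    needs: build          # wait for the build job so the artifact exists
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: dist/
      - run: ls dist/
```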

Though for your specific use case I'd recommend looking at the syntax for the `on` statement, as it will allow you to trigger a workflow with a filter on a specific path or type of file:

https://help.github.com/en/articles/workflow-syntax-for-github-actions#on
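
For instance, a JavaScript-only workflow could be triggered with a path filter roughly like this (the paths themselves are just examples):

```yaml
# Sketch only: run this workflow when JavaScript-related files change.
on:
  push:
    paths:
      - '**/*.js'
      - 'package.json'
  pull_request:
    paths:
      - '**/*.js'
      - 'package.json'
```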

This repository is intended for issues specific to the actions/checkout action and not for help on using it in the greater scope of GitHub Actions. For questions like this I'd recommend writing in to GitHub Support in future! 🙏

@ethomson
Contributor

Thanks for the question, and I think that @ThomasShaped addressed it. So I'm going to close this issue - if you have additional feedback, please contact us.

@sushma-4

Just checking in, is this still the same? Checkouts will not persist across different jobs?

@MartinX3

Sadly, it seems so.

@keeshux

keeshux commented Oct 11, 2021

Just in case someone else stumbles upon this. If you expect the job environment to be retained, that's because you assume that jobs execute within the same runner (read "machine"), which is not necessarily the case.

@joeythomaschaske

It seems there needs to be a ton of setup duplication between jobs with GitHub Actions, which is very unfortunate. A lot of CPU time is wasted doing common setup work in each job.

@jgwinner

I ran into this again recently; a search engine sent me right here, so I expect more people will end up here.

We'd been using our self-hosted runners, which of course give us a lot more control over the lifetime of on-disk objects.

I THOUGHT that two jobs that run sequentially were supposed to reuse the same runner for efficiency, but part of the checkout action's post-job step is to remove what was checked out, so this doesn't work on GitHub-hosted runners.

So, if you have some VMs available, one option is to self-host and do the checkout with git commands somewhere else, if you really need to share files between jobs and for some reason don't want the standard methods. You can only run one job at a time per runner, though.

With a new action, I found I don't need to branch as much or run on different VMs, so I'm trying to pack everything into one job. I had some jobs running on different servers and some branching in most of our self-hosted stuff, and had forgotten about the "everything you did to set up is now gone" thing. Doh.

@jgwinner

Sorry, I forgot the point: if this were possible via a flag or something in the .yml file, it would be really handy and would cut down on duplication, downloads, etc.

@pelmered

pelmered commented Jul 5, 2023

I really think you should reconsider this.
Setting up the environment once in one job and being able to run multiple jobs in parallel on top of that setup would be very useful. It would make the jobs run a lot faster and save a lot of server resources.

Especially when you have to spin up services, it gets a bit ridiculous that you have to duplicate that for every job in every run.

@vipinvkmenon

It would be great if this were supported.

@konstantinmv

Can we reopen this issue? For any project of considerable size, this is super painful: you either duplicate a lot of code (thus wasting runner time == money) or cram everything into one job, which means you can't really use these flashy new strategies such as the matrix.
Is this something that we can get back to?

@reloxx13

Still no way to run one checkout job and then run parallel jobs on top of it?

@CoryWritesCode

👍 to @konstantinmv's comment. So much code gets duplicated because of this.

@azizur

azizur commented Oct 15, 2023

🙁 It's disheartening to see that this issue has been unresolved for nearly four years, causing unnecessary waste of compute and storage resources. This problem not only affects compute and storage but has a broader impact on network usage and memory consumption.

For organisations relying on GitHub Actions to scale their development workflows, the consequences are more profound than just inefficient resource allocation. It directly impacts our ability to streamline processes, optimize costs, and maintain a sustainable approach to software development.

This issue acts as a bottleneck for organisations aiming to maximise the potential of GitHub Actions, hindering the full realisation of the platform's capabilities. We must prioritise this for the benefit of all users, minimise wastage, and contribute to a more environmentally responsible software development ecosystem. 💔💻🌍

@michaelrios

> Each job runs in a fresh version of the virtual environment

I think people are missing this point. The code needs to get onto the virtual environment somehow. GitHub already caches the code being checked out, but the "downloading" of the code onto each virtual environment has to happen no matter what. If you want to do this step only once, add a step to an existing job and reuse the already-downloaded code in that environment.
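
For instance, a rough sketch of keeping everything in one job so the checkout is reused by every step (the commands are illustrative):

```yaml
# Sketch only: check out once, then run every step in the same job;
# the workspace persists between steps of a job.
on: push
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx eslint .
      - run: npm test
```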

@pelmered

pelmered commented Nov 30, 2023

> Each job runs in a fresh version of the virtual environment
>
> I think people are missing this point. The code needs to get onto the virtual environment somehow. GitHub already caches the code being checked out, but the "downloading" of the code onto each virtual environment has to happen no matter what. If you want to do this step only once, add a step to an existing job and reuse the already-downloaded code in that environment.

The simple and efficient solution would be to just set up one virtual environment and run multiple jobs in parallel in that environment. Each job spins up on a separate (virtual) processor core. You should be able to assign how many cores you want in your environment, and the billed action time could be multiplied by that amount (or, even better, count the active time across the different jobs).
The point is that there has to be a much better way to solve this very common use case.

@gauravingalkarext

gauravingalkarext commented Jan 31, 2024

Facing the same issue. Lots of duplication across actions since there is no way to reuse a checkout across jobs. :(

I went through the workarounds suggested above as well, but it would be really nice to have something better (maybe a flag that starts the new container with the code already checked out, so one does not have to check out again in each job).

@avaldez-gr

I hate the duplication and the lack of separation between jobs that result from this feature request being closed. This issue should really be considered again.

@shadyelgewily-slimstock

Due to this issue, we are using a single job with multiple steps, instead of nicely separating build, clean/lint and run in separate jobs. Please reconsider reopening this ticket.

@Anonymous-Coward

We run github actions on custom runners on Kubernetes, and as a consequence have better control over what the runners do.

What we do, as a workaround for github actions not sharing the filesystem between different jobs of the same workflow, is to have a volume that's mounted into all runners, and do filesystem stuff on that volume.

It's not without issues, and it introduces some potential problems. However, for us, at least, in complex workflows, it saves more trouble than it causes.

In theory, you could use the default caching mechanism of GitHub Actions. But that mechanism isn't any better with regard to moving tons of bytes over the network, which is what got us to mount the volume in the runners in the first place.
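
A heavily simplified sketch of that setup (the claim name, mount path, image, and the bare Pod form are all illustrative; actual runner pod templates, e.g. with Actions Runner Controller, will differ):

```yaml
# Illustrative only: mount one shared PersistentVolumeClaim into every
# runner pod and do filesystem work under the shared mount path.
apiVersion: v1
kind: Pod
metadata:
  name: actions-runner
spec:
  containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest   # illustrative image
      volumeMounts:
        - name: shared-workspace
          mountPath: /mnt/shared
  volumes:
    - name: shared-workspace
      persistentVolumeClaim:
        claimName: runner-shared-pvc   # pre-provisioned ReadWriteMany claim
```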

@kaidokert

kaidokert commented Aug 11, 2024

> What we do, as a workaround for github actions not sharing the filesystem between different jobs of the same workflow, is to have a volume that's mounted into all runners, and do filesystem stuff on that volume.

I was just looking at how Actions Runner Controller does this, and it seems like the runner tool cache (aka /opt/hostedtoolcache or C:\hostedtoolcache) provides a decent template for how to configure a shared writable mount.

Our plan is to put bare mirrors of our very large repos on a shared PV mount, hoping to get very fast checkouts into the still otherwise ephemeral working containers.
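
A rough sketch of that idea, assuming a Linux self-hosted runner with the shared PV mounted at /mnt/shared and a mirror created beforehand with `git clone --mirror` (all paths, labels, and names are made up):

```yaml
# Sketch only: clone from a bare mirror on the shared PV instead of
# pulling everything from GitHub on every job.
on: push
jobs:
  build:
    runs-on: [self-hosted, linux]
    steps:
      - name: Checkout from local bare mirror
        run: |
          MIRROR=/mnt/shared/mirrors/myrepo.git
          # keep the mirror current; serialize this if concurrent jobs can race
          git -C "$MIRROR" fetch --prune origin
          # clone from the local mirror into the (empty) job workspace
          git clone "$MIRROR" .
          git checkout "$GITHUB_SHA"
```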

@Anonymous-Coward

> I was just looking at how Actions Runner Controller does this, and it seems like the runner tool cache (aka /opt/hostedtoolcache or C:\hostedtoolcache) provides a decent template for how to configure a shared writable mount.

I will have to check that out. I haven't looked into what the controller does in detail.

> Our plan is to put bare mirrors of our very large repos on a shared PV mount, hoping to get very fast checkouts into the still otherwise ephemeral working containers.

The one thing to watch out for, from my experience so far, is potential conflicts when different jobs using the same repo(s) run concurrently, like two jobs trying git clone or git pull at the same time, or one doing mvn package while another does mvn clean. As I've recently discovered, the concurrency features of GitHub Actions are not quite enough to handle such issues nicely in all situations if you rely exclusively on GitHub Actions.
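
One way to serialize access to the shared mount without relying only on GitHub's `concurrency:` groups is an advisory file lock around the critical section; a sketch assuming Linux runners with `flock` available (lock path, mirror path, and labels are made up):

```yaml
# Sketch only: take an exclusive advisory lock on a file on the shared
# volume before touching the shared mirror, so concurrent jobs don't
# step on each other.
on: push
jobs:
  refresh-mirror:
    runs-on: [self-hosted, linux]
    steps:
      - name: Update shared mirror under a lock
        run: |
          mkdir -p /mnt/shared/locks
          exec 9>/mnt/shared/locks/myrepo.lock
          flock 9    # blocks until the exclusive lock is acquired
          git -C /mnt/shared/mirrors/myrepo.git fetch --prune origin
          # the lock is released when this shell exits and fd 9 is closed
```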

@kgrozdanovski

It seems that GitHub is ignoring this issue, since solving it could cut into the portion of their profits that comes from forcing users into an inefficient process, wasting more Actions minutes/compute/money and consequently harming the environment.

I would recommend that newcomers to this discussion (from which GitHub is willingly absent) not waste time hoping for an official solution, but rather look into something like this:

> We run github actions on custom runners on Kubernetes, and as a consequence have better control over what the runners do.
>
> What we do, as a workaround for github actions not sharing the filesystem between different jobs of the same workflow, is to have a volume that's mounted into all runners, and do filesystem stuff on that volume.
>
> It's not without issues, and it introduces some potential problems. However, for us, at least, in complex workflows, it saves more trouble than it causes.
>
> In theory, you could use the default caching mechanism of GitHub Actions. But that mechanism isn't any better with regard to moving tons of bytes over the network, which is what got us to mount the volume in the runners in the first place.

@alan-czajkowski

alan-czajkowski commented Sep 20, 2024

@ethomson How is this a closed issue? This is fundamental functionality of any build system.

@ethomson
Contributor

Hi @alan-czajkowski! I don't work at GitHub anymore and haven't in several years, so I was surprised by this notification!

In any case, I'm not sure what you have in mind exactly, but there are a couple of options that could solve it. Obviously you can't just "persist" data across two independent virtual machines, but you could use a network storage device, or use actions/cache to identify some state to save and restore. Good luck!
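
For completeness, a minimal sketch of the actions/cache route (paths, the cache key, and the commands are illustrative): the first job populates the cache and a later job restores it by the same key.

```yaml
# Sketch only: share dependency state between jobs via actions/cache.
on: push
jobs:
  install:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: node_modules
          key: deps-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
      - run: npm ci

  lint:
    needs: install
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: node_modules
          key: deps-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
      - run: npx eslint .
```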
