
Make aarch64 tests complete consistently #129

Closed
workingjubilee opened this issue Dec 2, 2022 · 4 comments
Labels
ci Changes for continuous integration

Comments

@workingjubilee
Contributor

Seriously, it's annoying that the aarch64 jobs constantly have to be restarted, because it means the test run doesn't complete. GitHub then only allows kicking it off again after all jobs have finished. This was okay at first, but at some point it started happening far more often than is bearable.

Options, I think, are:

  • Improve test runtimes significantly
  • Stop using spot instances
  • Find a way to have the instance hibernate, then rerun it?
  • Find a way to have GitHub rerun the test if it was interrupted (a rough sketch of that idea follows below)
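
For that last bullet, here is a minimal sketch of what automated re-runs could look like, assuming a small out-of-band script that polls GitHub's REST API and hits the "re-run failed jobs" endpoint. The owner/repo values, the workflow-name filter, and the token handling are placeholders, not the actual setup:

```python
# Hypothetical sketch: re-run recent failed aarch64 workflow runs through GitHub's REST API.
# Assumes a token with "actions: write" permission in GITHUB_TOKEN; owner/repo are placeholders.
import os

import requests

OWNER = "tcdi"    # placeholder org
REPO = "plrust"   # placeholder repo
API = "https://api.github.com"
HEADERS = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
}


def rerun_failed_aarch64_runs() -> None:
    # List recent workflow runs that ended in failure (interrupted spot runs
    # typically end up here as failed runs).
    runs = requests.get(
        f"{API}/repos/{OWNER}/{REPO}/actions/runs",
        headers=HEADERS,
        params={"status": "failure", "per_page": 20},
        timeout=30,
    ).json().get("workflow_runs", [])

    for run in runs:
        # Only touch the aarch64 workflow; the name filter is a guess.
        if "aarch64" not in (run.get("name") or "").lower():
            continue
        resp = requests.post(
            f"{API}/repos/{OWNER}/{REPO}/actions/runs/{run['id']}/rerun-failed-jobs",
            headers=HEADERS,
            timeout=30,
        )
        print(f"re-ran run {run['id']}: HTTP {resp.status_code}")


if __name__ == "__main__":
    rerun_failed_aarch64_runs()
```

Something like that could run on a schedule (cron, or a separate hosted-runner workflow) so interrupted runs get retried without waiting for a human to press the button.
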
@BradyBonnette
Contributor

There is possibly a fifth way:

Figure out whether plrust can be cross-compiled in an x86 environment, then run it in Docker with QEMU or something similar, hosted in GitHub's environment instead of something we set up in The Cloud™️.

There are probably a lot of things that could be done to the aarch64 CI infrastructure to make it better, but I just assembled the quickest thing possible that could work, while being mindful of costs and whatnot.

@workingjubilee
Contributor Author

ime QEMU takes even more time to execute.

@BradyBonnette
Contributor

> ime QEMU takes even more time to execute.

Yeah, that's the fear.

Trying to match GitHub 1:1 on infrastructure implementation details is a bit tricky. I could use normal on-demand instances (as opposed to spot instances) that spring to life when needed, but then I'd probably have to incorporate something with AWS Lambda to get that to work properly (i.e., more machinery). I could also consider going the route of using something like ECS to handle the workload if this keeps being an issue, or if we have so many CI runs that everything sits in a queue forever.

Maybe the short-term remedy for now is to increase the number of spot instances available? Right now it's set to 2 at any given time (simply because commits happened so infrequently), but there's ramp-up and ramp-down time when spot instances are terminated and new ones are instantiated to take their place.
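
If the short-term fix is just raising that ceiling, and assuming the runners are backed by an EC2 Spot Fleet (which may not match the actual setup; it could just as well be an Auto Scaling group or individual spot requests), the capacity bump could be as small as a boto3 call like this sketch:

```python
# Hypothetical sketch: raise the target capacity of a Spot Fleet backing the aarch64 runners.
# The fleet ID, region, and capacity value are all placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

ec2.modify_spot_fleet_request(
    SpotFleetRequestId="sfr-00000000-0000-0000-0000-000000000000",  # placeholder fleet ID
    TargetCapacity=4,  # up from the current 2 runners
)
```

That wouldn't remove the ramp-up/ramp-down gap, but more headroom would mean fewer runs waiting on replacement instances.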

@workingjubilee added the ci label on Dec 10, 2022
@eeeebbbbrrrr
Contributor

aarch64 tests have been running just fine for as long as I can recall now. CI is super slow, but that's a different problem.
