If a git clone fails, the build should not be marked as failing #395

Closed
charliesome opened this Issue Jan 24, 2012 · 20 comments

I noticed that one of my builds failed because git timed out while cloning: http://travis-ci.org/#!/charliesome/twostroke/jobs/566176

If a git clone fails, it should retry rather than failing on the spot.

Owner

joshk commented Jan 24, 2012

I agree with you. The problem is that if we retry straight away, the failure might happen again and again.

We need to put it in a queue to be retried after 10 min, failing on the third retry.

If you want to help out with this feature, which will involve changes in multiple places in the code, let us know :)
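
The delayed-retry idea above could be sketched roughly as follows. This is only an illustration in Python, not travis code: `RetryQueue`, `QueuedJob`, and `handle_clone_failure` are hypothetical names, and an in-memory heap stands in for whatever message queue would actually be used. The 10 minute delay and the three-retry cap come from the comment above.

```python
import heapq
from dataclasses import dataclass, field

RETRY_DELAY_SECONDS = 10 * 60  # "retried after 10 min"
MAX_RETRIES = 3                # "failing on the third retry"


@dataclass(order=True)
class QueuedJob:
    not_before: float                          # earliest time the job may run
    job_id: int = field(compare=False)
    retries: int = field(default=0, compare=False)


class RetryQueue:
    """Hypothetical in-memory stand-in for a real job queue."""

    def __init__(self):
        self._heap = []

    def push(self, job):
        heapq.heappush(self._heap, job)

    def pop_ready(self, now):
        """Return the next job whose delay has elapsed, or None."""
        if self._heap and self._heap[0].not_before <= now:
            return heapq.heappop(self._heap)
        return None


def handle_clone_failure(queue, job, now):
    """Requeue a job whose clone failed; give up once the third retry has failed."""
    if job.retries >= MAX_RETRIES:
        return "failed"
    queue.push(QueuedJob(now + RETRY_DELAY_SECONDS, job.job_id, job.retries + 1))
    return "requeued"
```

Carrying the retry count on the queued job itself is one way to let whoever picks the job up next know how many attempts have already been made.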

I might try and give this a crack tomorrow

Contributor

michaelklishin commented Jan 24, 2012

I am curious if git clone exits with different codes for different failures. For example, if the repo does not exist, retrying makes no sense.

Owner

joshk commented Jan 24, 2012

True, maybe we can be smart and read the output?


Contributor

michaelklishin commented Jan 24, 2012

I am afraid it will never work very well but if we can separate clone failure detection from actual builders and make it possible to test it in isolation by passing in exit code and output, maybe we can try. Output streaming the way it is done today does not make that easy, though, and I am not sure I want to add special cases to the streaming code path.
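
A failure detector that can be tested in isolation, as described above, might look something like this. It is a sketch in Python rather than travis code: the function name, the category names, and the message patterns are all assumptions for illustration, and real git or SSH error text varies by version and transport. It leans on output rather than exit code because the exit code alone may not distinguish the failure modes.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive or verified list).
PERMANENT_PATTERNS = [
    re.compile(r"repository\b.*\bnot found", re.IGNORECASE),
    re.compile(r"does not appear to be a git repository", re.IGNORECASE),
]
TRANSIENT_PATTERNS = [
    re.compile(r"timed out", re.IGNORECASE),
    re.compile(r"could not resolve host", re.IGNORECASE),
    re.compile(r"connection (refused|reset)", re.IGNORECASE),
]


def classify_clone_result(exit_code, output):
    """Classify a finished clone as 'ok', 'permanent', 'transient', or 'unknown'.

    Takes only the exit code and captured output, so it can be unit tested
    without running git or opening an SSH session.
    """
    if exit_code == 0:
        return "ok"
    # Check permanent failures first: retrying a missing repository is pointless.
    if any(p.search(output) for p in PERMANENT_PATTERNS):
        return "permanent"
    if any(p.search(output) for p in TRANSIENT_PATTERNS):
        return "transient"
    return "unknown"
```

Only a "transient" result would be a candidate for a retry; treating "unknown" as non-retryable is the conservative choice.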

While I haven't looked at the code, it seems like there's one shell script that is run, and the output is streamed back to the client (is this right?)

Why not break it up a little bit so the git clone is done and checked before running the rest?

Contributor

michaelklishin commented Jan 24, 2012

@charliesome that is incorrect. Every operation in the build lifecycle is executed separately in a stateful SSH session and we stream output to our log collector as described in the Technical Overview.

So some problems are

  • There are too many ways in which the network can fail.
  • git clone may or may not exit with different exit codes for them, and we cannot retry unconditionally.
  • We can delay retries in the worker code but it does not really have all the information it needs for that. It can requeue the messages, that is all. There is no way to count retries in this case.
  • Streaming is done via a callback that Net::SSH provides and we stream everything unconditionally (well, with a limit of 2 MB). To handle git clone output separately we will have to add more state to the worker.
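
To make the last point concrete, here is a rough sketch of the kind of extra state a streaming callback would need in order to keep the clone output apart from everything else. It is written in Python as a stand-in and does not use the Net::SSH API; `StageAwareStreamer`, `start_stage`, and `on_data` are hypothetical names. Only the 2 MB cap is taken from the list above.

```python
LOG_LIMIT_BYTES = 2 * 1024 * 1024  # the 2 MB streaming limit mentioned above


class StageAwareStreamer:
    """Forwards output chunks to a sink and also keeps the clone stage's output."""

    def __init__(self, sink, limit=LOG_LIMIT_BYTES):
        self.sink = sink                  # callable that receives a bytes chunk
        self.limit = limit
        self.sent = 0                     # bytes forwarded so far
        self.stage = None                 # extra state: which lifecycle step is running
        self.clone_output = bytearray()   # extra state: output seen during the clone

    def start_stage(self, name):
        self.stage = name

    def on_data(self, chunk):
        """Callback invoked for every chunk of command output."""
        if self.stage == "clone":
            self.clone_output.extend(chunk)
        remaining = self.limit - self.sent
        if remaining <= 0:
            return                        # cap reached: drop further output
        piece = chunk[:remaining]
        self.sent += len(piece)
        self.sink(piece)
```

The stage marker and the second buffer are exactly the special-casing in the streaming path that the comment is wary of.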

Overall, I find this issue not worth the effort at this point. There are plenty of other things we can put our time into. Solving even 80% of the cases is challenging: it requires changes to 2 applications and likely new message types.

Owner

joshk commented Jan 24, 2012

So although I am not at a comp, and find it hard to type on a mobile phone, I will be brief.

I think this is a good feature request and very possible. I don't have time to work on it but am happy to assist someone on this feature. I do think it will be tricky but it has some great benefits.

@charliesome, I am happy to nut this out with you and give you advice and direction, but I think it will take some time to really get right. Feel free to ping me on irc later today or this week.


Contributor

michaelklishin commented Jan 24, 2012

The easiest solution I see is to add a new final state to all builds (something like "technical issues") and, instead of marking the build as failed, mark it as having technical issues. This way, even though there will be minor changes to all 3 apps and one new message type, we can easily make builds with technical issues not affect the build status image.

Retries need to be designed first.
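
As a rough illustration of the "technical issues" state, the sketch below shows one way the status image could skip such builds. It is Python, not travis code, and `BuildState` and `badge_state` are hypothetical names chosen for the example.

```python
from enum import Enum


class BuildState(Enum):
    PASSED = "passed"
    FAILED = "failed"
    TECHNICAL_ISSUES = "technical issues"  # hypothetical new final state


def badge_state(history):
    """Pick the state shown on the status image.

    `history` lists final build states, oldest first. Builds that ended with
    technical issues are skipped, so a clone timeout does not turn the badge red.
    Returns None if no build has produced a real result yet.
    """
    for state in reversed(history):
        if state is not BuildState.TECHNICAL_ISSUES:
            return state
    return None
```

With this rule a project whose last real result was a pass keeps showing a pass after an infrastructure hiccup.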

@michaelklishin That seems like a good solution. Most projects move fast enough that retrying a build isn't really worth it, but it's annoying that it shows up as a broken build if something like this happens.

What about allowing a user to manually have travis retry the pull after some hard time limit (i.e. don't let the user re-pull the repository to travis within 30 seconds)?

Contributor

michaelklishin commented Feb 29, 2012

@sigmavirus24 you can trigger new builds using the Test Hook button on GitHub (for master)

Contributor

michaelklishin commented Feb 29, 2012

Plus, VM snapshotting and rollbacks complicate what @sigmavirus24 suggests a great deal. So we are back to square one with retrying builds and how to detect when not to retry it. This will take some time to figure out and won't be trivial to implement.

@michaelklishin The problem for me is that I'm not testing on master; I'm testing on a development branch. Thanks for the reply though.

Contributor

michaelklishin commented Feb 29, 2012

@sigmavirus24 I am positive about extending our API to allow retries. We are partially moving in this direction with pre-tested pull requests.

Contributor

henrikhodne commented Nov 16, 2012

Is there any progress on this?

Owner

joshk commented Nov 22, 2012

I have this on the road map for travis-worker, but we also need to add different job results to Job::Test in travis-core, and also the UI.

jlee-r7 commented Jan 25, 2013

Related to #851

Contributor

henrikhodne commented Jan 25, 2013

@joshk I believe this is implemented in travis-worker:sf-compile-sh?

Owner

joshk commented Jan 26, 2013

confirm! :)
