travis lint fails with intermittent "invalid access token" errors #732
Hi! I have an issue with the travis lint command. I run it as a build step on Travis CI, and it sometimes fails with an "invalid access token" error. Most of the time, if I restart the build step/job, it succeeds with no errors.

So I think we actually have two issues:

1. travis lint shouldn't need any token.
2. travis lint intermittently fails with this error.

Can you assist? Thanks!

Example build: https://travis-ci.com/github/ferrarimarco/kubernetes-playground/jobs/316452382
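For context, a minimal sketch of that kind of build step, assuming the travis gem is installed in the job (the exact configuration of the failing builds is not shown in this thread; --exit-code is the travis.rb flag that turns lint warnings into a non-zero exit status, and the success output is illustrative):

$ gem install travis
$ travis lint .travis.yml --exit-code
Hooray, .travis.yml looks valid :)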
If your command is talking to the .org API endpoint, you should not need auth, so the "auth" error message could be a red herring. Here, I remove any existing access token first, and linting still works. The .com API endpoint, on the other hand, requires auth.
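A rough reconstruction of that check (the original commands were lost in this copy; travis logout, travis lint, travis whatsup, and the --org/--pro endpoint flags are real travis.rb options, but the output shown here is illustrative):

$ travis logout --org
Successfully logged out!
$ travis lint .travis.yml --org
Hooray, .travis.yml looks valid :)
$ travis logout --pro
$ travis whatsup --pro
not logged in, please run travis login --pro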
The intermittent lint failures are a problem, and this is not the only report I've seen. Unfortunately, I have never seen one firsthand, and at the moment it is not clear to me why it might fail.
Wouldn't jobs in .com already run with a token available?
Hi! The travis validate command shouldn't need any token, right?
@ljharb It does not do that at the moment. Whether it should is a valid (but separate) UX question. To be honest, I'd never thought it should, until you raised it. @ferrarimarco I assume you mean travis lint, not travis validate.
Yes! I meant lint, not validate. Thanks! I was expecting the linting to work in an airgapped environment, like many other linting tools that have the logic built in. But this is not the point, I guess :)
I was pointed to this issue late yesterday from a support issue. However, my case is slightly different, so hopefully the difference helps eliminate a few things, if they haven't been already.

Rather than using the travis gem, I'm using the node module travis-lint in most of my JS projects, since it calls the same API as the gem and doesn't require devs on my team to have the same gem version installed across all of their machines.

When I reached out to support, I initially assumed that, with the migration of projects from .org to .com, I would be told there was a deprecation involved to encourage moving these calls from the .org API to a .com API, or something along those lines. It clearly wasn't a case of the old API being turned off, since the failures are so inconsistent, but I assumed brown-outs were probably being used to get attention and nudge the desired migration a bit. This seemed even more likely to me since I don't remember running into this error in my local builds (which I run often). However, after two months, I don't get the impression that is really the case, since the support team at least doesn't seem to be aware of anything along those lines.
Also, this seemed to start happening at the same time as another issue I was having (linked to in the ticket that I mentioned above) related to encrypted secrets in shareable configs. From an outsider perspective, it seems like there was likely a large feature release around that time that changed enough things to allow a few regressions like this to slip through. Hopefully some of this detail is helpful in narrowing down the issue.
I still cannot reason about why this would fail. Can you examine the response body of these 403 requests? It's supposed to be only 10 bytes, which seems too short for most (if not all) of the error messages the API should be generating. Thanks.
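A hypothetical way to capture that outside the node module, using only standard curl options (the -w fields report the status code and downloaded body size, which would confirm or refute the 10-byte observation; the output shown is what a successful request would look like):

$ curl -s -X POST -d "language: ruby" \
    -o response-body.txt \
    -w 'HTTP %{http_code}, %{size_download} bytes\n' \
    https://api.travis-ci.org/lint
HTTP 200, 24 bytes
$ cat response-body.txt
{"lint":{"warnings":[]}}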
Unfortunately, the debug mode of the node module doesn't capture the response body. Is it possible that this is being introduced by a load balancer or something else in the network stack? Like I mentioned above, I don't remember ever seeing this error locally. Even though it is random, I run things locally often enough that it seems very unlikely that I wouldn't have hit it yet, as often as I see it when running on travis-ci.com. Also, when it does happen, there seems to be a cluster of other projects failing for the same reason around the same time. I obviously don't know any details of your infrastructure layout, but it seems a bit like traffic from .com to the .org API is behaving differently than truly external traffic to the same API.
I think this is confirmed by https://github.com/pwmckenna/node-travis-ci/blob/75204638777ae5f0fba6ba889f4b085277ae21b9/lib/travis-http.js#L45-L46, where the error surfaced to the caller is taken straight from the response body.
Maybe there is something else at play. In the successful build, the response looks completely different: https://travis-ci.com/github/form8ion/ruby-scaffolder/builds/168660399#L291
I am unsure if this is related, but thought it was worth at least capturing: I finally saw a failure locally, but the output is different than I've seen in the build logs when run on CI, so I think this failure is at least somewhat different. Running again immediately after this failure passed as expected.
Is there any chance the endpoint is getting rate limited, based on how many times everyone is trying to run travis lint?
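One hypothetical way to probe that (illustrative only; it hammers the public lint endpoint, so keep the count small): fire a burst of unauthenticated lint requests and tally the status codes to see whether 403s appear in clusters:

$ for i in $(seq 1 50); do \
    curl -s -o /dev/null -w '%{http_code}\n' \
      -X POST -d "language: ruby" https://api.travis-ci.org/lint; \
  done | sort | uniq -c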
Is this still being investigated? I seem to be having this happen significantly more today than previously. I could normally at least restart a build and, while still very annoying and unnecessary, usually have it succeed. Today, I'm not having success, even after several restart attempts.
@BanzaiMan Can you give any update on this issue at all? This is still a big problem, and it is not limited to the Ruby gem.
@travi Could you point to what you are observing? A certain kind of input to the command reliably leads to errors (on the API side, which is an issue in its own right), but I can't reproduce the "intermittent" errors. None of these requests are authenticated.
@BanzaiMan I'm not sure how to provide the information you are suggesting beyond what I've already provided above and in the support issue that I mentioned. If there is specific information that you think I can send that I haven't already, I need help understanding what that is. If reproducing/observing the intermittent nature of this is the problem, is it possible to just follow my account or my orgs?

As mentioned above, it seems most likely that the failure response comes from somewhere in front of the actual lint service rather than the service itself, possibly the result of something like rate limiting.

Here is a sample of failures from just today. It is worth noting that I had several other PRs for the same dependency updates where both builds passed. I will leave the failures alone for now, but would like to correct them sometime soon. Unfortunately, restarting the builds does not leave the old result behind in a linkable way, which complicates reporting this behavior in a way that is accessible over time. I would appreciate it if you could investigate these builds relatively quickly so that I can fix them by restarting them as a follow-up:

Please let me know if this information is helpful beyond what has already been provided.
@travi Could you record the payload and the actual return value from https://github.com/pwmckenna/node-travis-lint/blob/master/lib/travis-lint.js#L22?

In my testing, when the input is usable (even when it's empty), we should be returning a data structure that includes a lint key:

$ curl -s -X POST https://api.travis-ci.com/lint
{"lint":{"warnings":[]}}
$ curl -s -X POST -d "language: ruby" https://api.travis-ci.com/lint
{"lint":{"warnings":[]}}

Only when the input is egregiously wrong do we return something catastrophic (which should be fixed):

$ curl -s -X POST -d "foo" https://api.travis-ci.com/lint
Sorry, we experienced an error.
request_id:6e314394aa334c75e3bf98a1eaae5ed5
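An illustrative extension of those probes (not from the thread): the same request can be sent to both the .org endpoint, which the node module talks to, and the .com endpoint shown above, to check whether the two behave differently for unauthenticated lint calls:

$ for host in api.travis-ci.org api.travis-ci.com; do \
    printf '%s => ' "$host"; \
    curl -s -X POST -d "language: ruby" "https://$host/lint"; \
    echo; \
  done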
Quite honestly, no. As you can see, I'm observing these errors distributed across many projects in ways that are very unpredictable. It would be a significant investment for me to find a way to record additional information like this across enough of my projects to gather useful data. I've already traced through most of the code path for the processing of this request and provided additional debugging output, both linked above.

Again, this appears to be behavior unrelated to the lint service itself, along the lines of rate limiting in the network stack before the request reaches the actual service. You being this puzzled by how the service could return this type of response suggests this even further in my mind. Has there been any investigation into this issue outside of the actual lint service?

I could provide lists like I have been almost every day. These accounts have plenty of builds running each day, so observing them without me in the loop should provide enough examples to identify issues. The tooling you must have internally should be much better at tracing network calls than anything I could possibly add on my end.
since it doesn't look like travis-ci/travis.rb#732 is going to be fixed or even investigated further

BREAKING CHANGE: the script that lints the travis config will be disabled in new projects. the prefix can be removed to re-enable, but will be unstable until travis-ci/travis.rb#732 is fixed

…roject since the instability resulting from travis-ci/travis.rb#732 not being fixed outweighs the benefit of catching config errors early