Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

travis lint fails with intermittent "invalid access token" errors #732

Open
ferrarimarco opened this issue Apr 8, 2020 · 22 comments
Open
Labels

Comments

@ferrarimarco
Copy link

ferrarimarco commented Apr 8, 2020

Hi! I've an issue with the travis lint command.

I run it as a build step on Travis CI, and it sometimes fail with the following error:

$ echo "n" | travis lint -x
Shell completion not installed. Would you like to install it now? |y| invalid access token - try running travis login
The command "echo "n" | travis lint -x" exited with 1.

Most of the times, if I restart the build step/job, it succeeds, with no errors.

So, I think we actually have two issues:

  1. travis lint shouldn't need any token.
  2. travis lint intermittently fails with this error.

Can you assist? Thanks!

Example build: https://travis-ci.com/github/ferrarimarco/kubernetes-playground/jobs/316452382

Gem installation output:

7.91s$ gem install travis --no-document
Fetching: multipart-post-2.1.1.gem (100%)
Successfully installed multipart-post-2.1.1
Fetching: faraday-0.17.3.gem (100%)
Successfully installed faraday-0.17.3
Fetching: faraday_middleware-0.14.0.gem (100%)
Successfully installed faraday_middleware-0.14.0
Fetching: highline-1.7.10.gem (100%)
Successfully installed highline-1.7.10
Fetching: backports-3.17.0.gem (100%)
Successfully installed backports-3.17.0
Fetching: concurrent-ruby-1.1.6.gem (100%)
Successfully installed concurrent-ruby-1.1.6
Fetching: i18n-1.8.2.gem (100%)
HEADS UP! i18n 1.1 changed fallbacks to exclude default locale.
But that may break your application.
If you are upgrading your Rails application from an older version of Rails:
Please check your Rails app for 'config.i18n.fallbacks = true'.
If you're using I18n (>= 1.1.0) and Rails (< 5.2.2), this should be
'config.i18n.fallbacks = [I18n.default_locale]'.
If not, fallbacks will be broken in your app by I18n 1.1.x.
If you are starting a NEW Rails application, you can ignore this notice.
For more info see:
https://github.com/svenfuchs/i18n/releases/tag/v1.1.0
Successfully installed i18n-1.8.2
Fetching: thread_safe-0.3.6.gem (100%)
Successfully installed thread_safe-0.3.6
Fetching: tzinfo-1.2.7.gem (100%)
Successfully installed tzinfo-1.2.7
Fetching: activesupport-5.2.4.2.gem (100%)
Successfully installed activesupport-5.2.4.2
Fetching: multi_json-1.14.1.gem (100%)
Successfully installed multi_json-1.14.1
Fetching: public_suffix-4.0.4.gem (100%)
Successfully installed public_suffix-4.0.4
Fetching: addressable-2.7.0.gem (100%)
Successfully installed addressable-2.7.0
Fetching: net-http-persistent-2.9.4.gem (100%)
Successfully installed net-http-persistent-2.9.4
Fetching: net-http-pipeline-1.0.1.gem (100%)
Successfully installed net-http-pipeline-1.0.1
Fetching: gh-0.16.0.gem (100%)
Successfully installed gh-0.16.0
Fetching: launchy-2.5.0.gem (100%)
Successfully installed launchy-2.5.0
Fetching: ffi-1.12.2.gem (100%)
Building native extensions. This could take a while...
Successfully installed ffi-1.12.2
Fetching: ethon-0.12.0.gem (100%)
Successfully installed ethon-0.12.0
Fetching: typhoeus-0.8.0.gem (100%)
Successfully installed typhoeus-0.8.0
Fetching: websocket-1.2.8.gem (100%)
Successfully installed websocket-1.2.8
Fetching: pusher-client-0.6.2.gem (100%)
Successfully installed pusher-client-0.6.2
Fetching: travis-1.8.13.gem (100%)
Successfully installed travis-1.8.13
23 gems installed
@ferrarimarco ferrarimarco changed the title travis validate fails with "invalid access token" intermittent errors travis lint fails with intermittent "invalid access token" errors Apr 8, 2020
@BanzaiMan BanzaiMan added the bug label May 27, 2020
@BanzaiMan
Copy link
Contributor

BanzaiMan commented May 27, 2020

If your command is talking to .org API endpoint, you should not need auth. The "auth" error message could be a red herring.

Here, I remove any existing travis.rb configuration and invoke the lint command:

$ rm -rf ~/.travis && travis lint --debug --debug-http --skip-completion-check -x travis_production_test/.travis.yml
** reading travis_production_test/.travis.yml
** POST "lint" {"content"=>"language: ruby\ndist: bionic\n\nrvm: 2.5.4\n\ninstall: skip\nscript: skip\n\njobs:\n  include:\n    - stage: test\n    - stage: deVeloP\n    - stage: Test\n    - stage: develop\n\n# matrix:\n#   include:\n#     - language: generic\n#     - language: java\n    # - sudo: false\n    #   dist: precise\n    #   env: DIST=precise STACK=EC2\n    # - sudo: required\n    #   dist: precise\n    #   env: DIST=precise STACK=GCE\n    # - sudo: false\n    #   dist: trusty\n    #   env: DIST=trusty STACK=EC2\n    # - sudo: required\n    #   dist: trusty\n    #   env: DIST=trusty STACK=GCE\n    # - sudo: false\n    #   dist: trusty\n    #   group: edge\n    #   env: DIST=trusty STACK=EC2 GROUP=edge\n    # - sudo: required\n    #   dist: trusty\n    #   group: edge\n    #   env: DIST=trusty STACK=GCE GROUP=edge\n#     - os: osx\n\nnotifications:\n  email: false\n\n  #  slack:\n  #  rooms:\n  #    - secure: \"KZTfjxhkOdDDZYD2ThBILt2ShYtdEee3PudFZlQBfpxG9N470GfxnKVfOl7Odu4lN6cIupwC4S4BnehubEywSR99l60VmfhePUCTqDLYy+iR9+guTvaECdXdaau3jTKhFdu5u60ITsiw5Yb+hwAaGFO0HmQhqR9XVr/U8IGQlok=\"\n  #  template:\n  #    - \"Build {build_url}|#%{build_number}> (<%{compare_url}|%{commit}>) of %{repository}@%{branch} by %{author} %{result} in %{duration} (elapsed time: %{elapsed_time}) subject: %{commit_subject} message: %{commit_message} foobar\"\n  #  on_success: always\n"}
**   took 0.11 seconds
Hooray, travis_production_test/.travis.yml looks valid :)
** Storing "/home/travis/.travis/config.yml"

.com API endpoint requires auth:

$ rm -rf ~/.travis && travis lint --debug --debug-http --skip-completion-check -x travis_production_test/.travis.yml --pro
not logged in, please run travis login --pro

The intermittent lint failures are a problem, and this is not the only report I've seen. Unfortunately, I have never seen one firsthand, and at the moment it is not clear to me why it might fail.

@ljharb
Copy link

ljharb commented May 27, 2020

Wouldn't jobs in .com already run travis login --pro implicitly then?

@ferrarimarco
Copy link
Author

Hi!

The travis validate command shouldn't be talking to any API to do its job, right? Or it's not the case?

@BanzaiMan
Copy link
Contributor

@ljharb It does not do that at the moment. Whether it should is a valid (but separate) UX question. To be honest, I'd never thought it should, until you raised it.

@ferrarimarco I assume you mean travis lint, not travis validate. travis lint works by talking to the API end point POST /lint with required payload. As we have seen, this behavior seems to be a little unreliable. I'll dig a little deeper.

@ferrarimarco
Copy link
Author

Yes! I meant lint, not validate. Thanks!

I was expecting the linting to work in an airgapped environment, like many other linting tools, that have the logic builtin.

@ferrarimarco
Copy link
Author

but this is not the point I guess :)

@travi
Copy link

travi commented May 29, 2020

i was pointed to this issue late yesterday from a support issue (#16611 for those that have internal access) and am glad to hear that i'm not the only one that has been frustrated by this. i opened the support issue on April 4th after seeing the failures for a bit before that.

however, my case is slightly different, so hopefully the difference helps eliminate a few things, if they haven't been already. rather than using the travis gem, i'm using the node module travis-lint in most of my js projects since it calls the same api as the gem and doesn't require devs on my team to have the same gem version installed across all of their machines.

when i reached out to support, i initially assumed that, with the migration of projects from .org to .com, i would be told there was a deprecation involved to encourage migration of these calls from the .org api to a .com api, or something along those lines. it clearly wasnt an issue of the old api being turned off since the failures are so inconsistent, but i assumed brown-outs were probably being used to get attention and nudge the desired migration a bit. this seemed even more likely to me since I don't remember running into this error in my local builds (which i run often). however, after two months, i don't get the impression that is really the case since the support team at least doesn't seem to be aware of something along those lines.

since travis-lint doesn't handle errors, i didnt initially have more information, hoping that the support request would give me some quick insight. the travis-lint module hasnt changed in over 4 years, so i wasn't wanting to dig too deep into the module before i had some insight into whether the module had any impact on the problems (seemed unlikely since it hadnt changed in so long). i did end up forking the module in a few of my projects to at least print the error message, which has been "Forbidden" in all of the cases that I have caught since. i've most recently included NODE_DEBUG=request in a few of my projects to get more detail that was requested. here are a few cases that I have seen since:

also, this seemed to start happening at the same time as another issue I was having around that time (linked to in the ticket that i mentioned above) related to encrypted secrets in shareable configs. from an outsider perspective, it seems like there was likely a large feature release around that time that changed enough things that allowed a few regressions like this to slip through.

hopefully some of this detail is helpful in narrowing down the issue.

@BanzaiMan
Copy link
Contributor

I still cannot reason why /lint endpoint should enforce auth (and respond with 403):

https://github.com/travis-ci/travis-api/blob/3fb913cdcdcc940466112c21e11a546a643b56cc/lib/travis/api/app/endpoint/lint.rb#L8-L9

Can you examine the response body of these 403 requests? It's supposed to be only 10 bytes, which seems too short for most (if not all) of the error message api should be generating.

Thanks.

@travi
Copy link

travi commented May 29, 2020

unfortunately the debug mode of request isn't configurable, so it will take a bit of digging to get that. without getting that far yet, it seems likely that the content is simply "Forbidden", maybe with a newline.

is it possible that this is being introduced by a load balancer or something else in the network stack? like i mentioned above, i dont remember ever seeing this error locally. even though it is random, i run things locally often enough that it seems very unlikely that i wouldn't have hit this yet, as often as i see it when running on travis-ci.com. also, when it does happen, there seems to be a cluster of other projects failing for the same reason around the same time.

i obviously dont know any details of your infrastructure layout, but it seems a bit like traffic from .com to the .org api is behaving differently than truly external traffic to the same api.

@travi
Copy link

travi commented May 29, 2020

it seems likely that the content is simply "Forbidden", maybe with a newline

i think this is confirmed by https://github.com/pwmckenna/node-travis-ci/blob/75204638777ae5f0fba6ba889f4b085277ae21b9/lib/travis-http.js#L45-L46 where the err is passed to the callback from https://github.com/pwmckenna/node-travis-lint/blob/05423d6a447c8a2cb9fabadd3386e966e6269b76/lib/travis-lint.js#L17 as res.body

@BanzaiMan
Copy link
Contributor

Maybe there is something else at play.

In the successful build, the response looks completely different. Different Content-Type, different set of headers, etc.

https://travis-ci.com/github/form8ion/ruby-scaffolder/builds/168660399#L291

@travi
Copy link

travi commented Jun 4, 2020

i am unsure if this is related, but thought it was worth at least capturing. i finally saw a failure locally, but the output is different than i've seen in the build logs when run on ci, so i think this failure is at least somewhat different. running again immediately after this failure, passed as expected.

[lint:travis   ] undefined:1
[lint:travis   ] Invalid JSON in request body
[lint:travis   ] ^
[lint:travis   ] 
[lint:travis   ] SyntaxError: Unexpected token I in JSON at position 0
[lint:travis   ]     at JSON.parse (<anonymous>)
[lint:travis   ]     at Request._callback (/Users/travi/development/form8ion/mocha-scaffolder/node_modules/travis-ci/lib/travis-http.js:42:29)
[lint:travis   ]     at Request.self.callback (/Users/travi/development/form8ion/mocha-scaffolder/node_modules/request/request.js:185:22)
[lint:travis   ]     at Request.emit (events.js:315:20)
[lint:travis   ]     at Request.<anonymous> (/Users/travi/development/form8ion/mocha-scaffolder/node_modules/request/request.js:1154:10)
[lint:travis   ]     at Request.emit (events.js:315:20)
[lint:travis   ]     at IncomingMessage.<anonymous> (/Users/travi/development/form8ion/mocha-scaffolder/node_modules/request/request.js:1076:12)
[lint:travis   ]     at Object.onceWrapper (events.js:421:28)
[lint:travis   ]     at IncomingMessage.emit (events.js:327:22)
[lint:travis   ]     at endReadableNT (_stream_readable.js:1221:12)

@ljharb
Copy link

ljharb commented Jun 4, 2020

Is there any chance the endpoint is getting rate limited, based on how many times everyone is trying to run travis lint at one time?

@travi
Copy link

travi commented Jul 7, 2020

is this still being investigated? i seem to be having this happen significantly more today than previously. i normally could at least restart a build and, while still very annoying and should be unnecessary, usually have it succeed. today, i'm not having success, even after several restart attempts.

@travi
Copy link

travi commented Aug 4, 2020

@BanzaiMan can you give any update on this issue at all? This is still a big problem and is not limited to the Ruby gem

@BanzaiMan
Copy link
Contributor

BanzaiMan commented Aug 5, 2020

@travi Could you point to what you are observing? Certain kind of input to the command reliably leads to errors (on API, which is an issue in its own right), but I can't reproduce the "intermittent" errors.

travis@0ce1faa8192d:~$ while true; do echo "n" | travis lint -x; done
server error (500: "Sorry, we experienced an error.\n\nrequest_id:2df1e207e2c0d6640f96e0f43c9c7442\n")
server error (500: "Sorry, we experienced an error.\n\nrequest_id:0a1bf3320255c364855a3db71cba376e\n")
server error (500: "Sorry, we experienced an error.\n\nrequest_id:2aa6a23e05566020b00b62da05ac3c8d\n")
server error (500: "Sorry, we experienced an error.\n\nrequest_id:1562cc59d6a993a7214d3a03d6e6af5e\n")
⋮
travis@0ce1faa8192d:~$ while true; do echo "{}" | travis lint -x; done
Hooray, STDIN looks valid :)
Hooray, STDIN looks valid :)
Hooray, STDIN looks valid :)
Hooray, STDIN looks valid :)
Hooray, STDIN looks valid :)
Hooray, STDIN looks valid :)
Hooray, STDIN looks valid :)
⋮
travis@0ce1faa8192d:~$ while true; do echo "a: b" | travis lint -x; done
Warnings for STDIN:
[x] [warn] on root: unknown key a (b)
Warnings for STDIN:
[x] [warn] on root: unknown key a (b)
Warnings for STDIN:
[x] [warn] on root: unknown key a (b)
Warnings for STDIN:
[x] [warn] on root: unknown key a (b)
⋮
travis@0ce1faa8192d:~$ while true; do echo "language: ruby" | travis lint -x; done
Hooray, STDIN looks valid :)
Hooray, STDIN looks valid :)
Hooray, STDIN looks valid :)
⋮

None of these are authenticated.

travis@0ce1faa8192d:~$ cat ~/.travis/config.yml
---
last_check:
  etag: '"b2d547ae1df5207852df42a8c407008a"'
  version: 1.9.1
  at: 1596658398
checked_completion: true
completion_version: 1.9.1
endpoints:
  https://api.travis-ci.org/:
    access_token:

@travi
Copy link

travi commented Aug 7, 2020

@BanzaiMan i'm not sure how to provide the information you are suggesting beyond what i've already provided above and in the support issue that i mentioned. if there is specific information that you think i can send that i haven't already, i think i need help understanding what that is.

if reproducing/observing the intermittent nature of this is the problem, is it possible to just follow my account or my form8ion org internally for a few days somehow? i have at least a handful of these failures almost every day often github PRs result in one of the two builds passing and the other failing. these PRs make no changes to the .travis.yml file, so there should be no reason for a different result between the two builds.

as mentioned above, it seems most likely that the failure response is somewhere in front of the actual lint service rather than the service itself, possibly the result of something like rate limiting.

here is a sample of failures from just today. it is worth noting that i had several other PRs for the same dependency updates that had both builds pass. i will leave the failures alone for now, but would like to correct them sometime soon. unfortunately, restarting the builds does not leave the old result behind in a linkable way, so that complicates reporting this behavior in a way that is accessible over time. i would appreciate it if you could investigate these builds relatively quickly so that i can fix them by restarting them as a follow up:

please let me know if this information is helpful beyond what has already been provided.

@ljharb
Copy link

ljharb commented Aug 7, 2020

@travi
Copy link

travi commented Aug 14, 2020

@BanzaiMan

@BanzaiMan
Copy link
Contributor

@travi Can you record the payload and the actual return value from POST https://api.travis-ci.com/lint? I am quite puzzled how res can be undefined at https://travis-ci.com/github/form8ion/remark-update-legacy-badge-markers/builds/179642627#L287 when it's handed down from

https://github.com/pwmckenna/node-travis-lint/blob/master/lib/travis-lint.js#L22

In my testing, when the input is usable (even when it's empty), we should be returning a data structure that includes lint.warnings:

$ curl -s -X POST  https://api.travis-ci.com/lint
{"lint":{"warnings":[]}}
$ curl -s -X POST -d "language: ruby" https://api.travis-ci.com/lint
{"lint":{"warnings":[]}}

Only when it's egregiously wrong, do we return something catastrophic (which should be fixed)

$ curl -s -X POST -d "foo" https://api.travis-ci.com/lint
Sorry, we experienced an error.

request_id:6e314394aa334c75e3bf98a1eaae5ed5

@travi
Copy link

travi commented Aug 14, 2020

Can you record the payload and the actual return value from POST https://api.travis-ci.com/lint?

quite honestly, no. as you can see, i'm observing these errors distributed across many projects in ways that are very unpredictable. it would be a significant investment for me to find a way to record additional information like this across enough of my projects to gather useful data. I've already traced through most of the code stack for the processing of this request above and provided additional debugging output that i've also linked above.

Only when it's egregiously wrong, do we return something catastrophic (which should be fixed)

again, this appears to be behavior unrelated to the lint service itself, along the lines of rate limiting in the network stack before the request reaches the actual service. you being this puzzled by how the service could return this type of response suggests this even further in my mind. has there been investigation into this issue outside of the actual lint service?

is it possible to just follow my account or my form8ion org internally for a few days somehow?

i could provide lists like i have been recently almost every day. these accounts have plenty of builds running each day that observing them without me in the loop should provide enough examples to identify issues. the tooling you must have internally should be much better at tracing network calls than anything i could possibly add on my end.

travi added a commit to travi/travis-scaffolder-javascript that referenced this issue Aug 30, 2020
since it doesnt look like travis-ci/travis.rb#732 isnt going to be fixed or even investigated further
travi added a commit to travi/travis-scaffolder-javascript that referenced this issue Aug 30, 2020
since it doesnt look like travis-ci/travis.rb#732 isnt going to be fixed or even investigated further

BREAKING CHANGE: the script that lints the travis config will be
disabled in new projects. the prefix can be removed to re-enable, but
will be unstable until travis-ci/travis.rb#732 is fixed
travi added a commit to travi/travis-scaffolder-javascript that referenced this issue Aug 30, 2020
…roject

since the instability resulting from travis-ci/travis.rb#732 not being fixed outweighs the benefit of catching config errors early
travi added a commit to travi/travis-scaffolder-javascript that referenced this issue Aug 30, 2020
since it doesnt look like travis-ci/travis.rb#732 isnt going to be fixed or even investigated further

BREAKING CHANGE: the script that lints the travis config will be
disabled in new projects. the prefix can be removed to re-enable, but
will be unstable until travis-ci/travis.rb#732 is fixed
travi added a commit to travi/travis-scaffolder-javascript that referenced this issue Aug 30, 2020
…roject

since the instability resulting from travis-ci/travis.rb#732 not being fixed outweighs the benefit of catching config errors early
travi added a commit to form8ion/utils-cli that referenced this issue Sep 4, 2020
travi added a commit to form8ion/project that referenced this issue Sep 15, 2020
travi added a commit to form8ion/javascript-scaffolder that referenced this issue Oct 10, 2020
travi added a commit to form8ion/cucumber-scaffolder that referenced this issue Oct 11, 2020
travi added a commit to form8ion/github-actions-node-ci that referenced this issue Oct 20, 2020
travi added a commit to form8ion/javascript-core that referenced this issue Oct 20, 2020
travi added a commit to form8ion/lift-javascript that referenced this issue Oct 21, 2020
travi added a commit to travi/netlify-scaffolder that referenced this issue Oct 25, 2020
travi added a commit to travi/github-scaffolder that referenced this issue Oct 26, 2020
travi added a commit to travi/ci-vs-cd-slides that referenced this issue Oct 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants