Large writes to stdout sometimes fail with "Resource temporarily unavailable". #4704

Closed
mvollmer opened this Issue Aug 24, 2015 · 41 comments

mvollmer commented Aug 24, 2015

We are happily using Travis, but now we seem to be crossing some limit somewhere that causes make to fail with a mysterious "write error".

This is an example: https://travis-ci.org/cockpit-project/cockpit/jobs/76976494

The write error seems to happen when make writes a long command to stdout in large pieces, and the write syscall fails with EAGAIN. In one case, it made two syscalls with about 6000 bytes followed by about 500 bytes. I am not super sure about the details as debugging this from my end is a bit hard.

I think stdout/stderr is connected to a pty, so maybe that has something to do with it.

I don't have a simple reproducer yet but I can try to make one if that helps.

As a workaround, we are piping the build output through this program: https://github.com/mvollmer/cockpit/blob/69ae5e0c6ae82b0c5824aa088edd608ba9633fe2/tools/careful-cat.c
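
The linked C program is not reproduced here. As a rough illustration of the general idea, and assuming the failures really are EAGAIN from a nonblocking stdout, a filter of this kind sits between make and the terminal and waits and retries instead of giving up. The names `careful_write` and `careful_cat` are hypothetical, and this is a sketch rather than a port of careful-cat.c.

```python
import os
import select


def careful_write(fd, data):
    """Write all of `data` to `fd`, waiting and retrying when a write would block."""
    view = memoryview(data)
    while view:
        try:
            written = os.write(fd, view)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: block in select() until fd is writable again.
            select.select([], [fd], [])
            continue
        # os.write may accept only part of the buffer; keep the remainder.
        view = view[written:]


def careful_cat(in_fd=0, out_fd=1, chunk_size=4096):
    """Copy in_fd to out_fd until EOF (assumes in_fd itself is in blocking mode)."""
    while True:
        chunk = os.read(in_fd, chunk_size)
        if not chunk:
            break
        careful_write(out_fd, chunk)
```

One way to use it would be to end the script with a `careful_cat()` call and pipe the build through it, for example `make 2>&1 | python careful_cat.py` (the script name is made up).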

mvollmer commented Aug 25, 2015

As an attempt at making this more easily reproducible, I made this pull request:

cockpit-project/cockpit#2616

It rapidly writes large buffers to stdout. Sometimes this succeeds

https://travis-ci.org/cockpit-project/cockpit/jobs/77134346

and sometimes the output is truncated:

https://travis-ci.org/cockpit-project/cockpit/jobs/77134347

However, the build is counted as successful, so I guess the write syscall did not actually fail.

whitequark commented Oct 9, 2016

This bug makes the OS X worker unusable for solvespace/solvespace, without a hacky workaround that restarts the build a few times.

Member

MariadeAnton commented Jan 16, 2017

Possibly related fix: travis-ci/osx-image-bootstrap#15

It has been deployed to the Xcode8.2 image.

Member

MariadeAnton commented Jan 16, 2017

Can you try routing your builds to the xcode8.2 image using:

osx_image: xcode8.2

and let us know how it goes?

mvollmer commented Jan 16, 2017

I am sorry to report that we are not using Travis anymore, so it is quite difficult for me to verify this fix.

Thanks a lot in any case, I really appreciate it that you didn't lose track of this, as I would have.

whitequark commented Jan 16, 2017

I had the same issue, which broke builds about 60% of the time even with automatic restart, and I've just had over five successful builds in a row. I am quite certain this is fixed.

Member

MariadeAnton commented Jan 16, 2017

Thank you both for following up on this - we'll make sure to post more updates here if/when this fix is extended to other Xcode images :)

Drahflow commented Mar 20, 2017

I'm seeing the same error on linux based builds (non-sudo), when running cat on a largish file (test outputs). On travis.com for a NDAing client, build number: 42124525 (If that does not globally identify the build for you, please contact me directly).

prydonius commented Jun 28, 2017

I've just started seeing this make: write error on the sudo: required linux trusty environment (which was updated last Wednesday): https://travis-ci.org/helm/monocular#L2925. Was anyone able to solve this issue?

markandrus commented Aug 10, 2017

@prydonius I have not solved this, but I see tar: write error often (using travis-multirunner).

markandrus commented Aug 10, 2017

I've tried @prydonius's workaround and it works like a charm:

group: deprecated-2017Q2

Definitely something has changed—I'm not sure what, though.

prydonius commented Aug 11, 2017

@markandrus in particular the thing that worked for me was setting filter_secrets: false (something Travis support helped me with) https://github.com/kubernetes-helm/monocular/blob/master/.travis.yml#L3

vbraun commented Dec 1, 2017

The make: write error is almost certainly EAGAIN from stdout. Pretty much every commandline tool expects stdout to be in blocking mode, and does not properly retry when in nonblocking mode. I just spent a fun day figuring out that something inside "npm install ..." was switching to nonblocking mode, causing random failures further down in my build script.

Turn off O_NONBLOCK:

python -c 'import os,sys,fcntl; flags = fcntl.fcntl(sys.stdout, fcntl.F_GETFL); fcntl.fcntl(sys.stdout, fcntl.F_SETFL, flags&~os.O_NONBLOCK);'

Check whether O_NONBLOCK is set (should print "0"):

python -c 'import os,sys,fcntl; flags = fcntl.fcntl(sys.stdout, fcntl.F_GETFL); print(flags&os.O_NONBLOCK);'
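
For build scripts that would rather call a helper than paste one-liners, the same two operations could be wrapped roughly like this. It is only a sketch: `is_nonblocking` and `force_blocking` are hypothetical names, and checking descriptors 0, 1 and 2 by default is an assumption about which ones matter.

```python
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set on the file descriptor."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def force_blocking(fds=(0, 1, 2)):
    """Clear O_NONBLOCK on each descriptor; return the ones that had it set."""
    changed = []
    for fd in fds:
        try:
            flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        except OSError:
            continue  # descriptor is not open; nothing to fix
        if flags & os.O_NONBLOCK:
            fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
            changed.append(fd)
    return changed
```

Calling `force_blocking()` right after a suspect step (such as the npm install above) and logging the returned list would also show which step flipped the flag.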

felker commented Dec 5, 2017

@vbraun's post explains the errors I have been experiencing in #8757. In our case, compiling C++ code with MPI somehow caused a switch to nonblocking mode, which made the subsequent compilation fail with make: write error.

I am now able to reproduce the bug outside of Travis CI on my MacBook. I had not observed it locally before because the nonblocking switch requires the two compilations in our test suite to be wrapped by the same process, e.g. the travis_run_script command.
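
A likely reason the shared wrapper matters (an assumption about this case, not something confirmed in the thread): O_NONBLOCK is a file status flag stored on the open file description, so every process that inherited the same stdout sees the change, and it outlives the process that made it. The sketch below uses a pipe in place of the real terminal; `CHILD` and `nonblocking_after_child` are hypothetical names.

```python
import fcntl
import os
import subprocess
import sys

# Child program: set O_NONBLOCK on its own stdout (fd 1), then exit.
CHILD = (
    "import fcntl, os; "
    "flags = fcntl.fcntl(1, fcntl.F_GETFL); "
    "fcntl.fcntl(1, fcntl.F_SETFL, flags | os.O_NONBLOCK)"
)


def nonblocking_after_child():
    """Return (before, after): whether O_NONBLOCK is set on the parent's
    descriptor before and after a child that shares it has run."""
    read_end, write_end = os.pipe()
    try:
        before = bool(fcntl.fcntl(write_end, fcntl.F_GETFL) & os.O_NONBLOCK)
        # The child's stdout is a duplicate of write_end, so both refer to
        # the same open file description.
        subprocess.run([sys.executable, "-c", CHILD], stdout=write_end, check=True)
        after = bool(fcntl.fcntl(write_end, fcntl.F_GETFL) & os.O_NONBLOCK)
    finally:
        os.close(read_end)
        os.close(write_end)
    return before, after
```

If `after` comes back true, any later command started with that same stdout would begin in nonblocking mode without having asked for it.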

ssanderson added a commit to quantopian/zipline that referenced this issue Dec 15, 2017

triggering this to test if we're getting bit by travis-ci/travis-ci#4704 (#2055)

* BLD: Ensure stdout is in blocking mode for conda create.

paulcwarren commented Dec 16, 2017

In case it is helpful, I believe I am running into this. As you can see, my builds have been working fine for months and all of a sudden I hit this write error. It started with this build in the last 24 hours:

https://travis-ci.org/paulcwarren/spring-content-examples/builds/316709024

No matter how often I restart either of the failing builds, they fail.

I am unable to ssh into the container but perhaps you can.

j00bar commented Dec 18, 2017

Is this related to #8934?

aniemetz added a commit to CVC4/CVC4 that referenced this issue Dec 19, 2017

Fix travis write errors. (#1445)
For reasons unknown, after the latest update of the Trusty environment on Travis,
we encountered write errors for the three Clang builds. As suggested here
travis-ci/travis-ci#4704 (comment),
adding filter_secrets: false to the .travis.yml fixes the problem.

Note: switching back to the deprecated builds did not fix the problem.

rainwoodman commented Dec 19, 2017

We are seeing this error too. Very long output lines trigger this.

Reverting to group: deprecated-2017Q2 didn't seem to help. I had to use tee and awk to convert each line of the output to a single '.' to work around this.
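
The exact tee and awk invocation is not shown here. A rough Python equivalent of the idea, keeping the full output in a log while printing one '.' per line, might look like the following; `dot_per_line` and the `dots_per_row` wrapping are hypothetical additions.

```python
def dot_per_line(lines, log, out, dots_per_row=80):
    """Copy every line to `log`, but write only one '.' per line to `out`.

    Wrapping after `dots_per_row` dots keeps the visible output lines short.
    Returns the number of lines processed.
    """
    count = 0
    for line in lines:
        log.write(line)
        out.write(".")
        count += 1
        if count % dots_per_row == 0:
            out.write("\n")
    if count % dots_per_row:
        out.write("\n")  # finish the last, partially filled row
    out.flush()
    return count
```

In a build it could be driven with `dot_per_line(sys.stdin, open("build.log", "w"), sys.stdout)` behind a pipe, where the log file name is only an example.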

rhavermans added a commit to bolcom/pgjdbc that referenced this issue Jul 13, 2018

moylop260 added a commit to vauxoo-dev/runbot-addons that referenced this issue Aug 28, 2018

[REF] .travis.yml: Fix stodout error
Random message when run 'docker pull IMAGE' like
 - [Errno 11] write could not complete without blocking
 - plete without blocking

Seems as a travis issue:
 - travis-ci/travis-ci#8982 (comment)
 - travis-ci/travis-ci#4704 (comment)

bfirsh added a commit to arxiv-vanity/engrafo that referenced this issue Sep 7, 2018

Hide console output when tests pass
Implementation is due to facebook/jest#4156

This is to get around travis-ci/travis-ci#4704
but is also much neater.

hbrunn added a commit to OCA/runbot-addons that referenced this issue Sep 14, 2018

[REF] .travis.yml: Fix stodout error
Random message when run 'docker pull IMAGE' like
 - [Errno 11] write could not complete without blocking
 - plete without blocking

Seems as a travis issue:
 - travis-ci/travis-ci#8982 (comment)
 - travis-ci/travis-ci#4704 (comment)

@xelaadryth xelaadryth referenced this issue Sep 14, 2018

Merged

Asyncio #360

libre-man added a commit to CodeGra-de/CodeGra.de that referenced this issue Sep 30, 2018

Disable non blocking IO for travis
This fixes the build issues on master. This is due to
travis-ci/travis-ci#4704

olmokramer added a commit to CodeGra-de/CodeGra.de that referenced this issue Sep 30, 2018

Disable non blocking IO for travis (#547)
This fixes the build issues on master. This is due to
travis-ci/travis-ci#4704

gsanchietti added a commit to NethServer/nethserver-mail that referenced this issue Oct 1, 2018

travis: fix stdout error
Reported error:
write /dev/stdout: resource temporarily unavailable

Seems as a travis issue:
 - travis-ci/travis-ci#8982 (comment)
 - travis-ci/travis-ci#4704 (comment)


stale bot commented Oct 10, 2018

Thanks for contributing to this issue. As it has been 90 days since the last activity, we are automatically closing the issue in 7 days. This is often because the request was already solved in some way and it just wasn't updated or it's no longer applicable. If that's not the case, please respond before the issue is closed, or open a new one after. We'll gladly take a look again! You can read more here: https://blog.travis-ci.com/2018-03-09-closing-old-issues

tnguyen14 commented Oct 23, 2018

I recently encountered this while running the command to install google-cloud-sdk. Can this issue be reopened?

mikermcneil commented Oct 25, 2018

Got here from google? Read all the things but still confused? @vbraun's answer is probably a good hint as to what's happening-- at least it was for us. Had this happen twice now on our team this year, and every time it was a node script doing funny things w/ stdout, but the impact not being seen until later in the build script (e.g. when doing git commit)

samvv commented Dec 14, 2018

@vbraun I know these issues shouldn't be used to thank people, but really you saved my day. I would never have found the issue if it weren't for you 👍
