[Casher] Distinct caches for each job in a build matrix #4393

Closed
BanzaiMan opened this issue Jul 15, 2015 · 16 comments

@BanzaiMan
Contributor

Originally opened by @theuni as https://github.com/travis-ci/casher/issues/6 with the title "c/c++ workers share a cache when environment variables are used to build the matrix"


As discussed on IRC with @joshk

I believe this is actually a problem in core and not here, but logging a ticket here for discussion.

For C/C++, it seems that the compiler is the only differentiating factor when creating the slug.

Take this example:

language: c
cache:
  directories:
  - mydir
env:
  - FOO=foo BAR=bar
  - FOO=bar BAR=foo

The default compiler will be gcc.
For both builds, the slug will be: cache--compiler-gcc

This means that the builds will stomp on each other's caches. To prevent this, the slug would need some unique id appended for each builder. The most reasonable approaches I can come up with would be:

  • A hash of the env
  • A user-specified unique identifier
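
For illustration, here is a minimal Ruby sketch of the first option. The `cache_slug` helper and the `--env-` suffix are hypothetical, modelled only on the `cache--compiler-gcc` name quoted above; this is not how Travis actually builds its slugs.

```ruby
require 'digest'

# Hypothetical helper: builds a cache slug from the compiler and,
# optionally, a short digest of the job's env line.
def cache_slug(compiler:, env: nil)
  slug = "cache--compiler-#{compiler}"
  return slug if env.nil?

  # Assumption: the env line is hashed exactly as written in .travis.yml.
  "#{slug}--env-#{Digest::SHA256.hexdigest(env)[0, 12]}"
end

jobs = ['FOO=foo BAR=bar', 'FOO=bar BAR=foo']

# Compiler only: both jobs map to the same slug, so they share one cache.
shared = jobs.map { |_env| cache_slug(compiler: 'gcc') }.uniq

# Compiler plus env digest: each job gets its own slug.
distinct = jobs.map { |env| cache_slug(compiler: 'gcc', env: env) }.uniq
```

With the compiler alone, `shared` collapses to a single slug; with the env digest appended, `distinct` holds one slug per job.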

As an incredibly hackish work-around, I've used a phony compiler (one that can run "compiler --version" without failing) here: https://travis-ci.org/coryfields/bitcoin/builds/33134451

The "true X" compilers will each result in different slugs, so they get independent caches:

On branch master:
cache--compiler-true1  last modified: 2014-08-20 23:03:03  size: 27.95 MiB
cache--compiler-true2  last modified: 2014-08-20 23:00:08  size: 23.57 MiB

It would be nice to be able to do something like this:

matrix:
  include:
    - cache-id: foobuild
      env: FOO=foo BAR=bar
    - cache-id: barbuild
      env: FOO=bar BAR=foo
@BanzaiMan
Contributor Author

There was a discussion as to how to implement this; one idea is to use language versions and env vars to create a cache_id that acts as an identifier.

This effectively supersedes #3745, which raises a real possibility of cache corruption when multiple jobs upload the caches simultaneously.

@weitjong

Why not just use the last component of TRAVIS_JOB_NUMBER (the job's index after the dot)?

In ruby

job_number = ENV['TRAVIS_JOB_NUMBER'].split('.').last

In bash

job_number=${TRAVIS_JOB_NUMBER##*.}

I have successfully used this approach to roll my own cache store backed by GitHub for my CI jobs on the standard build infrastructure. For the container-based build infrastructure, where Travis's built-in cache store is available, I use a fake compiler approach similar to the one described above to get around the cache thrashing problem. See my analysis in #3699. My point is that the same cache should never be used by more than one job, so using the job number as the cache identifier is a no-brainer to me. All my CI jobs have had a good cache hit ratio since I ensured this.

@BanzaiMan
Contributor Author

You can see the past comments here: https://gist.github.com/BanzaiMan/c6f00138e7354b37fd66

@theuni

theuni commented Jul 16, 2015

Yes, using the job number makes sense.

@weitjong

@BanzaiMan I am very happy to learn that there is a plan, or at least a discussion, to fix the "cache thrashing" issue. I think we all agree that we need a unique cache id for each job. The question is just how that cache id is best computed. IMHO, a complex hash function may not be necessary. My proposed approach is dead stupid but effective. Of course, it has its own corner case: the stored caches are invalidated when a user shuffles the job matrix around. I reckon a normal user would not do this frequently, though, and the invalidated caches will correct themselves in the next build. Another pro is that when a new topic branch is created, all the caches can easily be "inherited" initially from the master branch based on their job number.
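
To make the trade-off concrete, here is a rough Ruby sketch. The `cache_key_for` and `keys_by_env` helpers and the `cache--job-N` key format are hypothetical, and the sketch assumes that job numbers simply follow the order of the matrix entries.

```ruby
# Hypothetical helper: maps a TRAVIS_JOB_NUMBER such as "100.2" to a cache key.
def cache_key_for(job_number)
  "cache--job-#{job_number.split('.').last}"
end

# Hypothetical helper: assigns keys to matrix entries by position (1-based).
# Assumption: the N-th matrix entry always runs as job "<build>.N".
def keys_by_env(matrix)
  matrix.each_with_index.map { |env, i| [env, cache_key_for("1.#{i + 1}")] }.to_h
end

# The same matrix position in two different builds yields the same key,
# so the second build can reuse the cache written by the first.
key_build_100 = cache_key_for('100.2')
key_build_101 = cache_key_for('101.2')

# Corner case: swapping two matrix entries swaps their keys, so each
# job starts from the other's cache until the next build refreshes it.
before = keys_by_env(['FOO=foo BAR=bar', 'FOO=bar BAR=foo'])
after  = keys_by_env(['FOO=bar BAR=foo', 'FOO=foo BAR=bar'])
```

Here `key_build_100` equals `key_build_101`, while the key for `FOO=foo BAR=bar` differs between `before` and `after`.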

@AbdealiLoKo

This is something I need for my project too - without it, builds take way too long

@joshk
Contributor

joshk commented Jul 23, 2015

We will look into this feature more in the coming weeks, but there is no quick fix, sorry.

hvr added a commit to haskell-CI/haskell-ci that referenced this issue Aug 13, 2015
This uses a hack to force TravisCI to use separate caches per
build-configuration until travis-ci/travis-ci#4393 is fixed.

Addresses #28
@arthurmensch

Hi,

It would be great to have this feature!

pohly added a commit to intel/meta-intel-iot-security that referenced this issue Nov 17, 2015
Test configurations varying by env variables currently share the same
cache. This is a known issue
(travis-ci/travis-ci#4393) with a workaround
used by the bitcoin project
(https://github.com/bitcoin/bitcoin/blob/b5cbd396ca7214f4f944163ed314456038fdd818/.travis.yml).

We use the same workaround here.

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
pohly added a commit to intel/meta-intel-iot-security that referenced this issue Nov 20, 2015
@localheinz

👍

pohly added a commit to intel/meta-intel-iot-security that referenced this issue Nov 26, 2015
@cvrebert

+1 from Bootstrap. Since #4090 isn't forthcoming (and is understandably complicated), this would be the next best thing.

@discordier

+1 also from various PHP developers.
This corrupts all Composer-based builds, because the caches get corrupted when the build matrix combines PHP versions with different dependency versions which are not updated very rarely.

The only solutions so far seem to be:

  • disable the cache (ugly, as it requires loads of bandwidth)
  • symlink the cached directories to some other location in before_install and limit concurrent builds to 1 (which at least keeps the caches updated but takes even more time, as the builds are now sequential).

Therefore appending the build matrix job number to the cache slug would be highly appreciated.

discordier added a commit to contao-community-alliance/composer-plugin that referenced this issue Feb 5, 2016
ghost pushed a commit to facebook/flow that referenced this issue Mar 31, 2016
Summary: currently our travis caches step on each other. each job installs dependencies into a different folder so they're all disjoint, but it currently takes 4 builds (and is super racy) to get all 4 jobs' dependencies cached. plus, each job has to download the builds for all the other jobs, which is currently ~400MB and takes a non-trivial amount of time (about 90 seconds).

this diff attempts to use a trick suggested in travis-ci/travis-ci#4393 whereby we give Travis a fake "compiler" setting, which gets used in the cache "slug" (the tarball's filename) but which we don't rely on in our build; we still just use gcc.

this should generate a separate cache for each job.

Reviewed By: samwgoldman

Differential Revision: D3123506

fb-gh-sync-id: a896b06c0cbcf07ee32292c24499214f80e479e3
fbshipit-source-id: a896b06c0cbcf07ee32292c24499214f80e479e3
@BanzaiMan
Contributor Author

Just a heads up: I deployed a fix to production about 10 minutes ago. Please observe how your caches are behaving, and report issues if you see any.

@BanzaiMan
Contributor Author

Allowing custom cache names is an interesting proposition, but it is not implemented in the fix mentioned above.

@BanzaiMan
Contributor Author

This has been deployed and documented.

@weitjong

Thanks! With this I think I could decommission our custom cache store and switch back to Travis CI's internal one. The only drawback I can think of with the current implementation is that there is no way to exclude certain environment variables from the hash key computation (we would not have to worry about that if the job number were the sole cache key), but I guess we can live with that.
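
As a rough Ruby sketch of what such an exclusion could look like: the `env_cache_key` helper and its `exclude:` option are purely hypothetical and are not part of the deployed fix. The sketch also assumes env values contain no quoted whitespace.

```ruby
require 'digest'

# Hypothetical helper: digest of an env line, ignoring the variables
# named in `exclude`. Assumes simple NAME=value pairs separated by spaces.
def env_cache_key(env_line, exclude: [])
  pairs = env_line.split.map { |pair| pair.split('=', 2) }
  kept = pairs.reject { |name, _value| exclude.include?(name) }

  # Sort by name so the key does not depend on the order of the variables.
  canonical = kept.sort_by(&:first).map { |name, value| "#{name}=#{value}" }.join(' ')
  Digest::SHA256.hexdigest(canonical)[0, 12]
end

# VERBOSE is excluded, so both jobs would share one cache key.
quiet_key   = env_cache_key('FOO=foo VERBOSE=0', exclude: ['VERBOSE'])
verbose_key = env_cache_key('FOO=foo VERBOSE=1', exclude: ['VERBOSE'])

# Without the exclusion, the same two env lines give different keys.
strict_quiet   = env_cache_key('FOO=foo VERBOSE=0')
strict_verbose = env_cache_key('FOO=foo VERBOSE=1')
```

Sorting by name is a design choice of this sketch only; it makes `FOO=foo BAR=bar` and `BAR=bar FOO=foo` share a key, which may or may not be desirable.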

@theuni

theuni commented May 18, 2016

Thanks for this!

theuni added a commit to theuni/bitcoin that referenced this issue May 21, 2016
Now that caches are distinct (travis-ci/travis-ci#4393),
we can use the Travis minimal image.
The minimal image should take less time to set up and lead to quicker builds.

Also addressed while I'm in here:
- No need to delete the broken google-chrome repo in the minimal image
- Set the hostname to work around an openjdk bug
- Remove the non-functional apt-cache option
- Remove useless message at completion
- Install jre where the java tests are run
yacinehmito added a commit to yacinehmito/Idris-dev that referenced this issue Jun 26, 2016
travis-ci/travis-ci#4393 seems to be solved
Removed the trick to separate caches.
enolan added a commit to enolan/Idris-dev that referenced this issue Oct 13, 2016
The compiler: lines had the effect of getting us separate caches per GHC
version, but since travis-ci/travis-ci#4393 has been fixed we get separate
caches per matrix entry anyway.
enolan added a commit to enolan/Idris-dev that referenced this issue Oct 18, 2016
The compiler: lines had the effect of getting us separate caches per GHC
version, but since travis-ci/travis-ci#4393 has been fixed we get separate
caches per matrix entry anyway. I left the lines in since they make the build
status pages more readable.
sickpig pushed a commit to sickpig/BitcoinUnlimited that referenced this issue Jan 4, 2017
hroest added a commit to OpenMS/OpenMS that referenced this issue Aug 1, 2017
timosachsenberg pushed a commit to OpenMS/OpenMS that referenced this issue Aug 2, 2017
* enable?

* nop

* nop

* something different

* [BUILD] add directory cache

* merge env

* nop

* nop

* [BUILD] check for contrib

* test

* nop

* remove contrib cache?

* ups

* actually use clang?

* nop

* [BUILD] try again

should work, see travis-ci/travis-ci#4393

* try on hot cache

* getting tired of this

* getting tired of this

* forget it

* debug build for clang

* gcc6 ?

* final cleanup

* final cleanup