Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persistent cache relative paths #10400

Closed
vankop opened this issue Feb 17, 2020 · 29 comments
Closed

persistent cache relative paths #10400

vankop opened this issue Feb 17, 2020 · 29 comments

Comments

@vankop
Copy link
Member

vankop commented Feb 17, 2020

Feature request

What is the expected behavior?

right now cache writes absolute paths

What is motivation or use case for adding/changing the behavior?

if all paths will be relative to cache folder it will allow to share cache

How should this be implemented in your opinion?

Did not investigate this, yet

Are you willing to work on this yourself?

Yes

@alexander-akait
Copy link
Member

I think it is good idea

@sokra
Copy link
Member

sokra commented Feb 18, 2020

Yes, great.

Here are some ideas for that:

  • I could image a writeAbsolute (and readAbsolute) function available for serialization and all things use that to serialize absolute paths.
    • It may also accept Sets of paths.
  • contextify is a bit expensive so this should probably an opt-in behavior (similar to the portableRecords) => cache.portable: true
  • We already have util functions available for that contextify and absolutify with memory caching.
  • Cache identifiers also need to be contextifed

@vankop
Copy link
Member Author

vankop commented Feb 18, 2020

I could image a writeAbsolute (and readAbsolute) function available for serialization and all things use that to serialize absolute paths.

  • It may also accept Sets of paths.

I also thought about this. I will try to implement this using this approach

@sokra
Copy link
Member

sokra commented Feb 18, 2020

For debugging you can use node tooling/print-cache-file.js to convert the cache file to readable text.

@vankop
Copy link
Member Author

vankop commented Feb 24, 2020

I have started implementing this feature and got some problems/questions.

First of all what I already did:

  • Create cache.contextify options with default true
  • Move PackFileCacheStrategy constructor interface to separate type and put contextify property there
  • Provide 2nd argument to createFileSerializer with properties write/readAbsolutePath (using absoluteToRequest and requestToAbsolute for now, maybe use contextify and absolutify? or create other properties for them?), write/readAbsolutePaths(accept set for now). This properties already inherit logic about contexifying
  • Extend object middleware context with read/write absolute path(s) methods

Problems:

It may also accept Sets of paths.

  • There are also maps with absolute paths as key, e.g. snapshot timestamps in buildInfo. Do I need to create another class from LazySet to serialize paths in LazySet ? Maybe marking them for Set/Map serializer with decorator:
const absolutePaths = Symbol();

function absolutePathsSet(set) {set[absolutePaths] = true;}

...

serialize({write, absolutePathsSet}) {
     absolutePathsSet(this.buildInfo.missingDependencies)
      write(this.buildInfo)
}
  • Also right now it is simple to provide such object like buildInfo to ObjectMiddleware. Do I need manually (de)serialize all properties, like:
serialize({write, writeAbsolutePaths}) {
      write(this.buildInfo.flag1)
      write(this.buildInfo.flag2)
      writeAbsolutePaths(this.buildInfo.missingDependencies)
      ...
}

or better to create classes for that aka AbsolutePathsSet ?

Cache identifiers also need to be contextifed

Do I need to craft regexp to replace all absolute paths in one place (in store/restore methods)? Actually don't know better way for now..

@vankop
Copy link
Member Author

vankop commented Feb 24, 2020

Maybe marking them for Set/Map serializer with decorator:

I don't know will v8 create another hidden class using such implementation. However, we can instead of marking storing such Set/Map in WeakSet

@smelukov
Copy link
Member

I think that for the first step you just should add two additional methods to the cache context:

writeAbsolute(context, value) {
	if (value instanceof Set) {
		const newSet = new Set();
		for (const item of value) {
			newSet.add(contextify(context, item));
		}
		return this.write(newSet);
	}

	if (value instanceof Array) {
		return this.write(value.map(item => contextify(context, item)));
	}

	this.write(contextify(context, value));
},

and

readAbsolute(context) {
	const value = this.read();

	if (value instanceof Set) {
		const newSet = new Set();
		for (const item of value) {
			newSet.add(absolutify(context, item));
		}

		return newSet;
	}

	if (value instanceof Array) {
		return value.map(item => absolutify(context, item)); // or just modify existing array
	}

	return absolutify(context, value);
},

And then replace all the place that writes a value with absolute path/paths
This will be a first step

@vankop
Copy link
Member Author

vankop commented Feb 24, 2020

And then replace all the place that writes a value with absolute path/paths

Problem is that there are objects with absolute paths as property or property of a property.. e.g. https://github.com/webpack/webpack/blob/master/lib/Module.js#L823

So question was do I need to extract all paths:

serialize({write, writeAbsolutePaths}) {
      write(this.buildInfo.flag1)
      write(this.buildInfo.flag2)
      writeAbsolutePaths(this.buildInfo.missingDependencies)
      ...
}

or provide ability to serialize paths correctly for Set/Map serializers

@sokra
Copy link
Member

sokra commented Feb 24, 2020

  • Maybe marking them for Set/Map serializer with decorator

That looks weird to me. I would better go with the class PathSet or class PathMap approach. A subclass with alternative serialization behavior.

This can also be used for nested values e. g. inside of buildInfo class Path can wrap a string to be serialized as absolute path.

serialize paths correctly for Set/Map serializers

I don't think it makes sense to alter the default Set/Map serializers for absolute path.

@otakustay
Copy link

otakustay commented Apr 14, 2020

Beside simple relative path usage in cache, there is another problem that some build configuration can use require.resolve, a good example is create-react-app uses it to locate loaders

This can also break cache since webpack request contains all loader urls, we should have a chance to fix this as well.

@vankop
Copy link
Member Author

vankop commented Apr 14, 2020

This can also break cache since webpack request contains all loader urls, we should have a chance to fix this as well.

This PR should fix it =)

@webpack-bot
Copy link
Contributor

This issue had no activity for at least three months.

It's subject to automatic issue closing if there is no activity in the next 15 days.

@sokra
Copy link
Member

sokra commented Oct 9, 2020

Still valid

@alexander-akait
Copy link
Member

bump

@worlddai
Copy link

worlddai commented Feb 1, 2021

I've been paying attention to this issue. When we can use file caching in our CI process, it will give us a qualitative increase in build speed. I'm really looking forward to the experts here solving this problem. Thank you very much

@sokra
Copy link
Member

sokra commented Feb 1, 2021

You can use file caching in CI. Just make sure that the absolute paths are consistent between CI builds.

@worlddai
Copy link

worlddai commented Mar 1, 2021

You can use file caching in CI. Just make sure that the absolute paths are consistent between CI builds.

Perfect! I use __dirname to get the current compiled directory, and then generate the same cache file to compile the directory, and it works! Thank you very much. One word woke me up !

@Idletalk
Copy link

Idletalk commented Mar 2, 2021

You can use file caching in CI. Just make sure that the absolute paths are consistent between CI builds.

Perfect! I use __dirname to get the current compiled directory, and then generate the same cache file to compile the directory, and it works! Thank you very much. One word woke me up !

hi,Can you describe the process,about how we "generate the same cache file to compile the directory"

@yoannmoinet
Copy link

Just bumping the issue here.

You can use file caching in CI. Just make sure that the absolute paths are consistent between CI builds.

We allocate different runners and have a specific logic to build our (large) project in randomly named directories to avoid overlaps.

It's quite difficult in this context to run every builds in the same absolute path for every pipelines we have.
I don't think we are the only one in that situation in the CI context, especially with very large projects, where the cache would be so impactful.

So this would be amazing to be able to have the cache without doing pirouettes 🤸‍♂️ with the directories in the CI.

@0rvar
Copy link

0rvar commented Sep 16, 2021

I was quite surprised when my prebaked webpack cache was not used in Github Codespaces. I've tracked it down to the cache using absolute paths. I would love the cache stay valid when moving the project directory or downloading a prebuilt cache folder, it would make things easier.

@markjm
Copy link
Contributor

markjm commented Feb 9, 2022

I have started to look into this as well -- I wonder if the best route is to track down and contextify everywhere using absolute paths, or if there may be a lower level solution within the serializer/middleware itself. I suspect starting on the contextifying will eventually make evident if there is a better way to approach the problem.

To start, I was thinking of tackling the FileSystemInfo cache - I saw @vankop did some similar work a bit over a year ago. Seems like that may be a good place to pick back up. On that note, I wonder if making the lib/util/identifier.js caches two-way (cache input=>result and result=>input) in case we are going to need to be switching back and forth more

@vankop
Copy link
Member Author

vankop commented Feb 9, 2022

this will not help much if you want to put cache in git. Cache is not deterministic, so your git history will grow with random changes in cache files.. for ci there is solution already

You can use file caching in CI. Just make sure that the absolute paths are consistent between CI builds.

@gajus
Copy link
Member

gajus commented Mar 9, 2022

You can use file caching in CI. Just make sure that the absolute paths are consistent between CI builds.

How do you make sure that the paths are consistent if CI is running tests is uniquely generated directories?

@gajus
Copy link
Member

gajus commented Mar 9, 2022

Is there a library I could parse the cache file to rewrite the paths?

In our case, only one fragment of the path changes. It would be a matter of a simple replace operation.

@vankop
Copy link
Member Author

vankop commented Mar 9, 2022

Is there a library I could parse the cache file to rewrite the paths?

no..

@gajus
Copy link
Member

gajus commented Mar 9, 2022

Turns out I can just do gsed -i -e "s/contra-web-app-2/contra-web-app-1/g" 0.pack and it works.

You could also probably figure out how to manipulate the file using webpack/lib/util/serialization, but I was confused by the API.

@vankop
Copy link
Member Author

vankop commented Apr 8, 2022

To make cache portable from VCS we need to implement "relative path" + "deterministic" cache features. Since we don't want to implement deterministic cache ( this feature will reduce cache performance a lot ) we decided to not implement this either.
Also this feature will impact cache performance as well.


We will improve docs with some guides to most popular CIs how to setup cache there ( e.g. gitlab ci cache )
webpack/webpack.js.org#6088

@vankop vankop closed this as completed Apr 8, 2022
@a474516631
Copy link

When gitlab CI has parallel build requirements, although the cache obtained is correct, the build path will be different, because CI builds will add job-Id by default. To solve this problem, it should also be necessary to fix the parallel task packaging path.
There is a configuration in gitlab ci to set this path.
variables: GIT_CLONE_PATH

@SuceV587
Copy link

Turns out I can just do gsed -i -e "s/contra-web-app-2/contra-web-app-1/g" 0.pack and it works.

You could also probably figure out how to manipulate the file using webpack/lib/util/serialization, but I was confused by the API.

'sed' not work for me,could you tell me how to reslove it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests