Fixing Yeoman's file system #658
Shipping the composability feature in yeoman-generator 0.17 really uncovered a huge design mistake in our file system helpers. Writes are done in a synchronous way, but really they are asynchronous.
This issue means multiple generators cannot easily edit the same file's content. It also means multiple writes to the same file will trigger multiple conflict checks on content that was added by different generators.
Legacy file system
The legacy file system is a pretty rough implementation of file system helpers.
They do the job. Behind the scenes, they're aware of the source and destination directories.
The newest system is file-utils - which is basically a fork of Grunt-file.
The things I take away from our current state are:
Where to go from there?
So I've been thinking about this issue for a while, and I think the way forward is to abstract our file system handling into an in-memory representation of the file system that generators can read from and write to. Then, at the end of the generation process, we commit the in-memory file system to disk, passing every file through a series of write filters and then to our conflicter.
Some questions still remain:
Should we hide the source/destination logic from the end user?
Are we better off with an API that hides the logic away, like the legacy system, or should we make it very obvious:
(That's an example, we would refine the API)
What do we do with large FS needs?
Right now we have
Would it make sense/help performance to use streams/RxJS?
I'm not sure about that, because really the time is spent on the conflicter waiting for user input.
I'm raising this question because it was a major point of the yeoman.next document.
discussion is open
How can they be confusing?
YES! There's an old issue somewhere where I propose something similar.
I think we should adopt the vinyl format for representing in-memory files.
Yes, they shouldn't have to care about where the source and destination are. But it should be easy to get that information when needed.
Maybe some kind of filter for what files to process. That way we could have only one method for copying.
For copying without processing we should use streams as it's much faster. For processing we could still use streams, but buffer it up in memory.
If anyone has any gripes about this system. Now is the time to bring it up. This especially pertains to generator authors.
I'm not sure I see any advantage for us in using vinyl streams. I feel like most actions a generator takes are single-file based; we don't do batch processing like a build system would.
Any example of how this would apply in the context of a generator system?
(I'm all in for vinyl format, my concern is about vinyl-streams)
Ok, so I've been pondering on this idea. And here's how I see the API:
In Memory FS
This will be one module. An instance will be globally shared on the environment object so multiple generators can access it.
It will basically give access to three pieces of functionality:
```js
// Basic operations
memFs.read('file.js'); // returns a vinyl file
memFs.write(vinylFile);

// When ready to write to disk
memFs.commit();
```
The read action will either return a vinyl file it loaded previously, or create a new one: loaded from disk if it exists, otherwise an empty vinyl file.
The commit action will take every file in memory and apply its state to disk. Each vinyl file can have one of three states: "idle", "write", "delete". Idle files are ignored, write files are written, and delete files are removed from disk.
Once a file action is committed to disk, it is removed from the memory FS. The way it does it is IMO an afterthought and could be anything (I'm alright with using streams here).
One thing I'm unsure about is how we're going to handle conflicts at the commit phase. I was thinking of maybe allowing a way to fork the current memory FS so we can determine which one can be written and which one needs user validation. I'm still thinking about this one. (And we might just end up removing the commit phase from this module.)
This module will be used by the system internally. Generator authors won't need to understand or touch it.
These will be utilities to transform the in-memory FS. They're going to be generator specific (so that the versioning can change more easily). These are going to be called by the generator author, so they are the public API.
I guess we know what usual interface we want on this end.
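As one possible shape for that public layer, here is a hypothetical editor sitting on top of a shared store. The method names (`write`, `read`, `copyTpl`) and the EJS-style template syntax are illustrative only:

```javascript
const store = new Map(); // stand-in for the shared in-memory FS

const editor = {
  write(filePath, contents) {
    store.set(filePath, { path: filePath, contents: Buffer.from(contents), state: 'write' });
  },
  read(filePath) {
    const file = store.get(filePath);
    return file ? file.contents.toString() : null;
  },
  // Template copy: read, interpolate, write — all in memory until commit.
  copyTpl(from, to, context) {
    const rendered = this.read(from).replace(/<%= (\w+) %>/g, (m, key) => context[key]);
    this.write(to, rendered);
  }
};

editor.write('templates/index.html', '<title><%= title %></title>');
editor.copyTpl('templates/index.html', 'app/index.html', { title: 'Demo' });
console.log(editor.read('app/index.html')); // <title>Demo</title>
```

Because every helper only mutates the store, two generators composing each other see each other's pending edits before any conflict check runs.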
That's your turn :)
After reconsideration, the store won't know how to commit itself to disk. It'll only provide iteration helpers.
The neat thing about passing it through a stream is that we could register and reuse any existing gulp plugin (like a file beautifier, style standardization, or pre-compilation) to process the project to the end user's taste. The only "bad" point is that streams are not super flexible and concurrent compared to Rx.js, but I think the leverage we get from reusing gulp plugins is worth it.
An issue I see we'll have using gulp plugins is that most of them expect a specific type to be passed in. They usually don't check the type and decide whether or not to pass it through.
In the case of yeoman, we'd just stream all the files and expect plugins to either pass through or apply their effect if they support the file type.
Do you think we could just fix gulp plugins to work this way, or is there really an incompatible mindset?