Fixing Yeoman's file system #658

Closed
SBoudrias opened this Issue Sep 21, 2014 · 14 comments

@SBoudrias
Member

SBoudrias commented Sep 21, 2014

Fixing Yeoman's file system

Bringing the composability feature to yeoman-generator 0.17 uncovered a huge design mistake in our file system helpers: writes look synchronous, but they are really asynchronous.

This means multiple generators cannot easily edit the same file's content. It also means multiple writes to the same file trigger multiple conflict checks on content that was added by several generators.

Current state

Legacy file system

The legacy file system is a pretty rough implementation of file system helpers (this.write(), this.copy()).

The good

They do the job. Behind the scenes they're aware of the source and destination directories.

The bad

  • The source/destination logic is hidden away.
  • They're tightly coupled to the generator system (not external dependencies)
  • Writes are async, and there's no way to handle them as such.

File-utils

The newest system is file-utils, which is basically a fork of Grunt's file utilities.

The good

  • Source/destination aware (and that logic is exposed)
  • Allows write filters to normalize files, convert types, etc.

The bad

  • The source/destination logic can be confusing
  • Writes are async because of the conflicter

Takeaways

The things I take away from our current state are:

  • We need a context-aware system (source/destination)
  • The conflicter is a core part and needs to stay (users need to know when we're going to overwrite their files)
  • Multiple generators must be able to read and write the same file
  • Write filters are important to our end users, and it's a feature we've been asked for a lot

Where to go from there?

So I've been thinking about this issue for a while, and I think the way forward is to abstract our file-system handling into an in-memory representation of the file system that generators can read from and write to. Then, at the end of the generation process, we commit the in-memory file system to disk, passing every file through a series of write filters and then through our conflicter.

Some questions still remain:

Should we hide the source/destination logic from the end user?

Are we better off with an API that hides the logic away, like the legacy system, or should we make it very obvious:

fs.copy(this.src('_package.json'), this.dest('package.json'))

(That's an example, we would refine the API)

What do we do with large FS needs?

Right now we have this.bulkCopy(). Does it make sense to keep it as part of the core system? If not, what is the way forward when someone wants to copy over the 10K files from WordPress?

Would it make sense/help performance to use streams/RxJS?

I'm not sure about that, because the time is really spent on the conflicter waiting for user input.

I'm raising this question because it was a major point of the yeoman.next document.

Etc, etc

Discussion is open.

@sindresorhus

Member

sindresorhus commented Sep 23, 2014

> The source/destination logic can be confusing

How can it be confusing?

> So I've been thinking about this issue for a while, and I think the way forward is to abstract our file-system handling into an in-memory representation of the file system that generators can read from and write to. Then, at the end of the generation process, we commit the in-memory file system to disk, passing every file through a series of write filters and then through our conflicter.

YES! There's an old issue somewhere where I proposed something similar.

I think we should adopt the vinyl format for representing in-memory files.

> Should we hide the source/destination logic from the end user?

Yes, they shouldn't have to care about where the source and destination are. But it should be easy to get that information when needed.

> What do we do with large FS needs?

Maybe some kind of filter for which files to process. That way we could have only one method for copying.

> Would it make sense/help performance to use streams/RxJS?

For copying without processing we should use streams, as it's much faster. For processing we could still use streams, but buffer everything up in memory.

If anyone has any gripes about this system, now is the time to bring it up. This especially pertains to generator authors.

@kevva

Member

kevva commented Sep 23, 2014

I think we should adopt the vinyl format for representing in-memory files.

+1. I'd just use vinyl-fs and write some methods around that to cover our needs.

@sindresorhus

Member

sindresorhus commented Sep 23, 2014

@yeoman/yeoman-core

@phated

phated commented Sep 23, 2014

👍 for vinyl, it is a great concept and the module ecosystem is pretty good.

@contra

contra commented Sep 23, 2014

Also 👍 on using vinyl. Not sure how the "conflicter" works in Yeoman (haven't used it before), but this might be of interest for only replacing non-existing or newer files: https://github.com/Marak/vinyl-diff

@SBoudrias

Member

SBoudrias commented Sep 23, 2014

I'm not sure I see any advantage for us in using vinyl streams. I feel like most actions a generator takes are single-file based; we don't do batch processing like a build system would.

Any example of how this would apply in the context of a generator system?

(I'm all in for the vinyl format; my concern is about vinyl streams.)

@SBoudrias

Member

SBoudrias commented Oct 12, 2014

OK, so I've been pondering this idea, and here's how I see the API:

In Memory FS

This will be one module. An instance will be shared globally on the environment object so multiple generators can access it.

It will basically give access to three pieces of functionality: read, write, commit.

// Basic operations
memFs.read('file.js'); // returns a vinyl file
memFs.write(vinylFile);

// When ready to write to disk
memFs.commit();

The read action will either return a vinyl file it loaded previously or create a new one, either from disk or as an empty vinyl file.

The commit action will take every file in memory and apply its state to the disk. Each vinyl file can be in one of three states: "idle", "write", "delete". Idle files are ignored, write files are written, and delete files are removed from disk.

Once a file action is committed to disk, the file is removed from the memory FS. How that happens is IMO an implementation detail and could be anything (I'm alright with using streams here).

One thing I'm unsure about is how we're going to handle conflicts at the commit phase. I was thinking of maybe allowing a way to fork the current memory FS so we can determine which files can be written and which need user validation. I'm still thinking about this one. (And we might just end up removing the commit phase from this module.)

This module will be used by the system internally. Generator authors won't need to understand or touch it.

Edition commands

These will be utilities to transform the in-memory FS. They're going to be generator-specific (so the versioning can change more easily). They will be called by the generator author (so they are the public API).

I guess we know what kind of interface we want on this end (write, copy, etc.). We'll also be able to retrofit most of our current file helpers to use the memory FS system, so even old generators will benefit from this update!

Feedback

That's your turn :)

@SBoudrias

Member

SBoudrias commented Oct 16, 2014

After reconsideration, the store won't know how to commit itself to disk. It will only provide iteration helpers (maybe a fs.each()), and a completely separate module will be in charge of committing it to disk (probably via a stream or a reactive interface).

The neat thing about passing it through a stream is that we could register and reuse any existing gulp plugins (file beautifiers, style standardization, pre-compilation) to process the project to the end user's taste. The only "bad" point is that streams are not as flexible and concurrent as Rx.js, but I think the leverage we get from reusing gulp plugins is worth it.
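The filter chain can be sketched without the stream plumbing (plain functions standing in for gulp plugins; a real commit would pipe vinyl files through a stream instead, and `commit` here is a made-up name):

```javascript
// Each filter takes a file record and returns a transformed copy.
const filters = [
  (file) => ({ ...file, contents: file.contents.replace(/\t/g, '  ') }), // beautifier-style filter
  (file) => ({ ...file, contents: file.contents.trimEnd() + '\n' }),     // normalize trailing newline
];

// Commit: run every store entry through the filter chain, then hand it
// to whatever writes to disk (or to a conflicter first).
function commit(store, writeToDisk) {
  for (const file of store) {
    const processed = filters.reduce((f, filter) => filter(f), file);
    writeToDisk(processed);
  }
}

const written = [];
commit(
  [{ path: 'index.js', contents: 'const a = 1;\t\t\n\n\n' }],
  (file) => written.push(file)
);
console.log(JSON.stringify(written[0].contents)); // tabs expanded, trailing blank lines collapsed
```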

@SBoudrias

Member

SBoudrias commented Oct 17, 2014

An issue I see with using gulp plugins is that most of them expect a specific file type to be passed in. They usually don't check the type and decide whether or not to pass the file through.

In the case of Yeoman, we'd just stream all the files and expect each plugin to either pass a file through or apply its effect if it supports the file type.

Do you think we could just fix gulp plugins to work this way, or is there really an incompatible mindset?

(cc @contra)

@contra

contra commented Oct 17, 2014

@SBoudrias Incompatible mindset. It's up to the user to decide which files to pass where in gulp. If you pipe a folder with 10 types of files into a pipeline with 10 plugins in it, you need to use gulp-if to only pass *.coffee files to the CoffeeScript plugin, and so on.

@SBoudrias

Member

SBoudrias commented Nov 14, 2014

Now part of 0.18.0!

@SBoudrias SBoudrias closed this Nov 14, 2014

@hemanth

Member

hemanth commented Nov 14, 2014

Uber kool! @SBoudrias++
