Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] `yarn workspaces foreach` features #62

Open
ProLoser opened this Issue Apr 9, 2019 · 12 comments

Comments

Projects
None yet
3 participants
@ProLoser
Copy link

commented Apr 9, 2019

  • I'd be willing to implement this feature

Background

Several feature requests repeatedly pop up for using yarn with workspaces. Most solutions involve using external packages or weird script hacks, voiding some of the merit for yarn workspaces. It would be nice if yarn workspaces commands had slightly more flexibility to remove external dependencies.

See relevant code

Desired Features
List of yarn workspaces foreach (previously yarn workspaces) desired features (with potential API):

  • --parallel Run scripts in parallel instead of sequentially
  • --skip-missing Skip packages that don't have the specified script
  • --prefix Prefix / colorize script output per package
  • Execution ordering (serial only?)

Drawbacks
yarn workspaces foreach is a dead-simple command that simply forwards an arbitrary yarn command to each package. It has no knowledge of what sort of command is being executed (npm run, etc) so detecting missing scripts from here feels messy.

Additionally, I'm not sure how to do await in parallel. Thinking about Promise.all() assuming the Promise API is available within Yarn(?).

We'll need a unique way to isolate missing script errors from regular script errors.

Describe alternatives you've considered

  • Promise.all() seems promising if the Promise API is available in yarn
  • yarn run [script] --skip-missing might be a better place for the flag. While useless when used directly, it allows us to simply do: yarn workspaces foreach [workspace] run [script] --skip-missing
    Note that this approach means you must ALWAYS pass --skip-missing when calling the yarn workspaces foreach command. This could potentially be stored in a config but is that really a good idea?

Additional context
Trying to adopt Yarn Workspaces for my new project, lots of redundant issues for these features exist on the v1 repo.

I'd be willing to implement a solution if I could get some suggestions/consensus on how to tackle the problem. I'm not sure the best way to go about doing it right now.

@ProLoser ProLoser added the enhancement label Apr 9, 2019

@arcanis

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

Hey, thanks for opening an issue about this!

Overall I see the point of those features. Now that we've changed the CLI framework it's also much clearer how those options would interact with "proxied" commands (they would simply have to be put before the final command name component), so I don't have as many objections.

One thing is that instead of --colors I believe it should be --prefix (and the prefixed being colored or not should then be based on the value of the enableColors configuration settings). One potential extra option I can see would be --interlaced which would guarantee that the output from each running process wouldn't be conflated with the other ones on the same terminal row - but that can be done later.

Could you implement this command in a plugin? I think we would eventually want to bundle it by default anyway, but I think it would be worthwhile to have a plugin whose only goal is to contain the commands specific to workspaces, if only to make them more easily discoverable by the other contributors. The plugin documentation is currently here, and feel free to ask me any question on Discord.

yarn workspaces foreach is a dead-simple command that simply forwards an arbitrary yarn command to each package. It has no knowledge of what sort of command is being executed (npm run, etc) so detecting missing scripts from here feels messy.

This actually has a pretty simple solution - now that the CLI uses clipanion, we can simply override yarn workspaces foreach to declare a special command named yarn workspaces foreach run (and we can hide it from the help). The added benefit being that --skip-missing wouldn't be a valid option for the generic yarn workspaces foreach command (which makes sense it wouldn't have any effect).

Additionally, I'm not sure how to do await in parallel. Thinking about Promise.all() assuming the Promise API is available within Yarn(?).

Yep, await Promise.all is fair game (the v2 targets Node 8+ so we even have native async/await).

This could potentially be stored in a config but is that really a good idea?

I'd prefer not - this type of settings is dangerous since it creates vastly different environments and might break users expectations when moving from a project to another.

I'd be willing to implement a solution if I could get some suggestions/consensus on how to tackle the problem. I'm not sure the best way to go about doing it right now.

My suggestion:

  • Implement a plugin named @berry/plugin-workspaces-tools
  • yarn workspaces foreach <name> [... rest] [--with-prefix] [--parallel]
  • yarn workspaces foreach run <name> [... rest] [--with-prefix] [--parallel] [--skip-missing]
  • Since those two functions have similar options, I'd suggest to write a streamUtils.ts file inside your new plugin (similar to the other util libraries in the repo) that you could use to share the same logic across both commands.

How does that sound?

@ProLoser

This comment has been minimized.

Copy link
Author

commented Apr 10, 2019

I incorporated your feedback regarding --prefix as I had the same thoughts. --interlaced is a complex problem that's only relevant in the --parallel scenario and to be quite honest, I don't want to tackle it right away because it raises questions around deferring the log output until each task is completed and then dumping it all at once. Not sure if that's a good approach or not.

Also, do I have to worry about how many processes get spun up like if someone has 50 packages? Maybe we need --parallel=5 or some sort of value that represents the amount of concurrency, but then I'd have to do batching, etc. Maybe I just won't worry about it for now, just thinking out loud.

The documentation links to example plugins are broken. Can you point me to some examples I can reference for what streamUtils.ts should look like and how much effort it is to create a whole new plugin?

@ProLoser

This comment has been minimized.

Copy link
Author

commented Apr 10, 2019

I know you wanted to do it as a separate command but I'm curious what would need to be changed for this PoC to work:

import {Configuration, PluginConfiguration, Project} from '@berry/core';

// eslint-disable-next-line arca/no-default-export
export default (clipanion: any, pluginConfiguration: PluginConfiguration) => clipanion

  .command(`workspaces foreach [... args]`)
  .flags({proxyArguments: true, parallel: false, missingScripts: false})

  .categorize(`Workspace commands`)
  .describe(`run a command on all workspaces`)

  .action(async ({cwd, args, ... env}: {cwd: string, args: Array<string>}) => {
    const configuration = await Configuration.find(cwd, pluginConfiguration);
    const {project} = await Project.find(configuration, cwd);
	let workspaces = project.workspacesByCwd.keys();

	// This feels really hacky
	if (args[0].toLowerCase() === 'run' && !args.missingScripts) {
		workspaces = workspaces.filter( cwd => {
			let pkg = require(`${cwd}/package.json`);
			return pkg.scripts && pkg.scripts[args[1]]
		});
	}

	if (args.parallel) {
		let runs = workspaces.map( cwd => clipanion.run(null, args, {... env, cwd}) );

		return Promise.all(runs);

	} else {
	    for (const cwd of project.workspacesByCwd.keys())
	      await clipanion.run(null, args, {... env, cwd});
	
	    return 0;
	}
  });

Is Promise global? I think people are just going to file more bugs around the various ways they do npm run [script] so I'm not sure how to tackle this effectively.

@arcanis

This comment has been minimized.

Copy link
Member

commented Apr 10, 2019

--interlaced is a complex problem that's only relevant in the --parallel scenario and to be quite honest, I don't want to tackle it right away because it raises questions around deferring the log output until each task is completed and then dumping it all at once.

No, you'd instead:

  • Instantiate a new PassThrough instance for each workspace
  • Run the subprocess on the given workspace, and use the new stream as the subprocess stdout
  • Listen on the workspace' passthrough stdout and use string_decoder to decode it as it is received
  • Each time a full line has been decoded you print it along with the prefix

This way you wouldn't have to wait for the whole command to finish. To change the stdout you just need to pass it to the clipanion env when forwarding the request:

const stdout = new PassThrough();
await clipanion.run(null, args, {... env, cwd, stdout});

Maybe we need --parallel=5 or some sort of value that represents the amount of concurrency, but then I'd have to do batching, etc

Use p-limit to enforce a maximal amount of concurrent promises (we already use it in the core; use the same version range), and put the limit to 5 by default. It'll be easy enough to expand that later if that's worth the while (I suspect 5 will be a reasonable default).

// This feels really hacky
if (args[0].toLowerCase() === 'run' && !args.missingScripts) {
	workspaces = workspaces.filter( cwd => {
		let pkg = require(`${cwd}/package.json`);
		return pkg.scripts && pkg.scripts[args[1]]
	});
}

Instead of manually checking whether the first arg is run, just create another command with the right name. Clipanion should be smart enough to use the first argument by itself in order to figure out which command should the call be routed to.

The documentation links to example plugins are broken. Can you point me to some examples I can reference for what streamUtils.ts should look like and how much effort it is to create a whole new plugin?

All the Yarn commands are implemented through plugins - check the packages directory to see them all. One very simple plugin is plugin-init, which only contains the implementation for yarn init command.

I've got no idea on a good way to check if a script exists or not

Check how yarn run is doing it! It's just a matter of calling scriptUtils.hasPackageScript with the right arguments 🙂

https://github.com/yarnpkg/berry/blob/master/packages/plugin-essentials/sources/commands/run.ts#L35-L46

@bgotink

This comment has been minimized.

Copy link
Member

commented Apr 20, 2019

I've got two questions upon reading this thread:

First and foremost: if the foreach command is run like yarn workspaces foreach [run] <name> [... rest] [--with-prefix] [--parallel], how would it handle flags passed to the <name> script? To clarify what I mean: what happens if I want to run my tests serially but my test command has a --parallel flag?
I would expect the command to have syntax yarn workspaces foreach [--with-prefix] [--parallel] [run] <name> [... rest]. That implies yarn workspaces foreach test --parallel runs yarn test --parallel serially in all workspaces while yarn workspaces foreach --parallel test runs test in parallel in all workspaces.
An alternative solution would be to add -- if you need to pass --with-prefix or --parallel to the package script rather than the workspace-foreach command, but the fact that yarn doesn't have -- is imo a great feature. The amount of time coworkers have lost due to npm swallowing options they pass to run-scripts is too damn high.

Second question: in what order will the foreach command run through the workspaces?Alphabetically by workspace path, alphabetically by package name, order in which they're found in the workspaces entry in the root manifest…
Personally the order I value most is a toplogical sort, running through dependencies before dependants. If package foo depends on bar, I don't think it makes sense for yarn workspace foreach test to run the tests in foo first. If the tests in bar fail, chances are the tests in foo will also fail, but it's the failure in bar I want to know about. The same goes for building an entire workspace, e.g. using yarn workspace foreach tsc: dependants need to go first, otherwise the build is broken because foo can't find package bar.

With respect to --interlaced: lerna calls this option --stream, which I personally prefer. The option switches between logging every line immediately vs logging the entire output at completion of a command, the fact that output becomes interlaced if you enable the flag is a side effect of the flag rather than its goal. "Stream" is also easier to understand if you're not a native English speaker and you've never wondered what the i in the 1080i of a torrent's filename stands for.

@arcanis

This comment has been minimized.

Copy link
Member

commented Apr 20, 2019

First and foremost: if the foreach command is run like yarn workspaces foreach [run] <name> [... rest] [--with-prefix] [--parallel], how would it handle flags passed to the <name> script? To clarify what I mean: what happens if I want to run my tests serially but my test command has a --parallel flag?

The way I made it work in clipanion is that options for the foreach command must be passed before the very last required positional argument. So the following would send --parallel to foreach:

$> yarn workspaces foreach run --parallel hello

While the following would send --parallel to hello:

$> yarn workspaces foreach run hello --parallel

There might be a bug or two around it (there's an extra subtlety in that run would be an override of another command), but I'm pretty sure that can be fixed in clipanion without too much trouble.

An alternative solution would be to add -- if you need to pass --with-prefix or --parallel to the package script rather than the workspace-foreach command, but the fact that yarn doesn't have -- is imo a great feature. The amount of time coworkers have lost due to npm swallowing options they pass to run-scripts is too damn high.

Yep I agree - proxy commands are a large part of why I made clipanion in the first place!

Second question: in what order will the foreach command run through the workspaces?Alphabetically by workspace path, alphabetically by package name, order in which they're found in the workspaces entry in the root manifest…

That's a great question - I guess we should standardize it and add a test since it would be easy to break it when doing a refactoring. Topological first degree then alphabetical second degree sort would make the most sense, like you said. The project.workspaces field might actually be already sorted in an appropriate order (while that's not covered by tests; probably should).

"Stream" is also easier to understand if you're not a native English speaker and you've never wondered what the i in the 1080i of a torrent's filename stands for.

I learned about interlaced via photoshop and png files! 😄 I don't have a strong opinion either way - I tend to prefer the word interlaced (less ambiguous imo than stream), but the semantic argument makes sense. The implementer gets to pick!

@ProLoser

This comment has been minimized.

Copy link
Author

commented Apr 22, 2019

I am going to add ordering as a feature to look into because I agree to it's relevance, however I'm not sure the best way to tackle it.
If we run in parallel, does order matter? Feels non deterministic unless we introduce deferring or some sequencing, which will make things more complex.
If we don't care about parallel order, is the dependency map really good enough? As soon as someone wants an order other than what's declared in the dependency map, this feature hits a wall.

@bgotink

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

Order matters, also in parallel. We always run our lerna scripts with lerna run --stream --concurrency 4 prepublish, and we need dependencies to be finished before dependants run.

Could we expose the ordering via plugins? That would allow us to keep the foreach command simple (we can default to alphabetically or in the order found in the workspace) and it allows for more complicated orderings (e.g. topological sort) to be implemented as separate plugins.

EDIT: replaced --parallel with --concurrency, I got the flags mixed up.

@arcanis

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

Hm, I feel like there's a finite enough number of use case to not make it a plugin. We just have to figure out the right design 🤔

I also didn't realize that you considered that --parallel w/ topological order would require to block on the inter-dependencies. That makes sense in retrospect, but imo that should be an option as I don't believe it's the most frequent case (for example you might want to run babel on all workspaces, in which case you don't care whether they depend on each other or not).

Given the discussion, here's what I suggest:

  • by default (sequential), execution is sorted in topological order (there's virtually no drawback doing so, since the execution would take the same time otherwise)

  • [-p,--parallel] will execute everything all at once (with a reasonable default throttle, let's say Math.max(1, os.cpus().length / 2); we can make it a configuration settings later if needed). The output will be buffered and printed in the sequential order regardless of which process ends first.

  • [-p,--parallel] [--with-dependencies] will do the same thing but it will block a workspace execution until all its dependents have been successfully executed

  • [-p,--parallel] [-I,--interlaced] will change the output mode - the output will still be buffered but on a line basis. Each time a new line is complete, it's flushed.

  • [-P,--prefixed] will change the output mode. Each printed line will be prefixed by a colored prefix component similar to [packages/berry-cli]: Running "babel".

  • [--required] will emit an error if a workspace doesn't declare the specified script. On a second thought I agree this is not a desirable behavior by default, hence why I reverted --skip-missing.

Do you see a design flaw in those options?

@ProLoser

This comment has been minimized.

Copy link
Author

commented Apr 23, 2019

I have very mixed feelings about --with-dependencies opening a can of worms for more bugs and maintenence. Honestly if you care about order, you can nest calls or serial and parallel. I think hard-coding dependency based resolution is only one possible way you may wish to do ordering, and is so non-versatile that even within the same project I expect people to want to almost immediately modify the default dependency order.

Instead I'd rather suggest people bundle calls of serial and parallel into script tasks. If we have defaults and flags, then customization to ordering requires more flags, more options, it just overcomplicates everything. If you care about any particular ordering you should compose the ordering explicitly rather than relying on Yarn (which has much bigger problems to focus on) to just figure out what it thinks you want it to do.

Otherwise, I agree with the rest of the API changes. If people want more complex task execution logic, they should resort to gulp or something more powerful.

@bgotink to respond to your specific use-case: lerna run --stream --parallel 4 prepublish
I'm going to break this down into what I'm assuming you're trying to do:
root/prepublish, package-a/prepublish, package-b/prepublish,
root/publish, package-a/publish, package-b/publish,
root/postpublish, package-a/postpublish, package-b/postpublish

Assuming this even remotely resembles what you're trying to do, this to me is far more complex than a dependency resolution order. Do you want each package to go pre, pub, post or do you want to go in phases pre, pre, pre, pub, pub, pub, post, post, post which I would imagine to be more likely and something you could do by serially making smaller parallel calls.

@bgotink

This comment has been minimized.

Copy link
Member

commented Apr 24, 2019

That's not what I'm trying to do. The command

lerna run --stream --concurrency 4 prepublish

runs preprepublish, prepublish and postprepublish in all packages. Note that we don't actually have preprepublish and postprepublish scripts, I just included those for reference.

Supposing a workspace consisting of packages

demo/
packages/
  dependency-a
  dependency-b
  dependant-x
  dependant-y

where the dependency graph is

demo
   ├── packages/dependant-y ──┐
   │                          ├── packages/dependency-a
   └── packages/dependant-x ──┘
                          └────── packages/dependency-b

Running lerna run --stream --concurrency 4 myscript could end up like so:

premyscript  in packages/dependency-a
premyscript  in packages/dependency-b
myscript     in packages/dependency-a
myscript     in packages/dependency-b
postmyscript in packages/dependency-a
postmyscript in packages/dependency-b
premyscript  in packages/dependant-x
premyscript  in packages/dependant-y
myscript     in packages/dependant-x
myscript     in packages/dependant-y
postmyscript in packages/dependant-x
postmyscript in packages/dependant-y
premyscript  in demo
myscript     in demo
postmyscript in demo

if the scripts take an identical time to run in all packages. The order is not deterministic because the timings affect the order, but we do know that the premyscript, myscript and postmyscript of a dependant will only get executed after the script has run in all of its dependencies and dev dependencies. (Lerna doesn't take peer dependencies into account, as they expect peer dependencies to always be listed in dev dependencies.)

The pre and post script are run immediately before and after the script in every package, not as a whole. Lerna actually uses yarn behind the screens to execute the run-scripts.


Honestly if you care about order, you can nest calls or serial and parallel.

That would require an extra option in the foreach command: a way to filter what workspaces the script is run in. It also delegates a bunch of work the yarn user. It would also require quite a bit of code on the user's side to sort the packages properly.

Instead I'd rather suggest people bundle calls of serial and parallel into script tasks

Isn't the purpose of this command to prevent people from writing the same script over and over again? Without topological sort this command boils down to a simple BASH script:

  • Parallel:

    parallel cd '{}' \; yarn run "$@" :::: <(yarn workspaces list)
    # add -j <number> to limit number of parallel jobs to start
    # to interlace output, add --line-buffer
    # use --tag to prefix every line with the package name when interlaced

    assuming GNU parallel is installed.

  • Serial:

    for dir in $(yarn workspaces list); do
      pushd dir >/dev/null 2>&1
      yarn run "$@"
      popd >/dev/null 2>&1
    done

To be very explicit: I'm not refuting the usefulness of the foreach command. I want to prevent people from having to write the scripts above over and over again. However, providing a foreach command without providing topological sort feels like a mistake to me. It's like selling a ford mustang with a 15 liter tank: cool tool, but you won't get very far.

[...] even within the same project I expect people to want to almost immediately modify the default dependency order.

Could you clarify this a bit?

Lerna provides only two orders: topological sort (default) and "I don't care about order" (--no-sort). They have not added any other orders in the years the project has been active. However, their project is an extra command to install, so maybe people in need of other orders simply don't use lerna…

@ProLoser

This comment has been minimized.

Copy link
Author

commented Apr 25, 2019

I completely disagree with the order you surfaced:

premyscript  in packages/dependency-a
premyscript  in packages/dependency-b
myscript     in packages/dependency-a
myscript     in packages/dependency-b
postmyscript in packages/dependency-a
postmyscript in packages/dependency-b
premyscript  in packages/dependant-x
premyscript  in packages/dependant-y
myscript     in packages/dependant-x
myscript     in packages/dependant-y
postmyscript in packages/dependant-x
postmyscript in packages/dependant-y
premyscript  in demo
myscript     in demo
postmyscript in demo

vs

premyscript  in demo // outer pre
premyscript  in packages/dependency-a // first package's first dependency
premyscript  in packages/dependant-y // first package's dependencies are predone
premyscript  in packages/dependency-b // second package's unfinished dependency
premyscript  in packages/dependant-x // second package's dependencies are predone

myscript     in demo // outer main
myscript     in packages/dependency-a
myscript     in packages/dependant-y
myscript     in packages/dependency-b
myscript     in packages/dependant-x

postmyscript in packages/dependency-a
postmyscript in packages/dependant-y
postmyscript in packages/dependency-b
postmyscript in packages/dependant-x

postmyscript in demo // outer post

vs

premyscript  in packages/dependency-a
myscript     in packages/dependency-a
postmyscript in packages/dependency-a

premyscript  in packages/dependency-b
myscript     in packages/dependency-b
postmyscript in packages/dependency-b

premyscript  in packages/dependant-x
myscript     in packages/dependant-x
postmyscript in packages/dependant-x

premyscript  in packages/dependant-y
myscript     in packages/dependant-y
postmyscript in packages/dependant-y

premyscript  in demo
myscript     in demo
postmyscript in demo

vs

premyscript  in demo // outer
premyscript  in packages/dependency-a
premyscript  in packages/dependency-b
premyscript  in packages/dependant-x
premyscript  in packages/dependant-y

myscript     in demo // outer
myscript     in packages/dependency-a
myscript     in packages/dependency-b
myscript     in packages/dependant-x
myscript     in packages/dependant-y

postmyscript in packages/dependency-a
postmyscript in packages/dependency-b
postmyscript in packages/dependant-x
postmyscript in packages/dependant-y
postmyscript in demo // outer, but position seems arbitrary

The order you demonstrated has a lot less intuitable consistency because you're mashing up pre,main,post ordering across packages. I can see problems with almost all of these variations ^, I don't believe this is something that can be tackled in one straight-forward way. And I believe 2 different commands may not want the same order, so yet you'd have to do some sort of lifting yourself (or with another library) one way or another.

There are too many arbitrary decisions around the root hooks firing first, last, or interspersed. It's an arbitrary decision to have ALL the dependencies complete their life-cycles before moving onto their dependents.

In a task system like gulp, every single kind of task can list explicit dependencies, and different tasks (such as lint, build, install, etc) can have completely different dependencies and ordering from one another, despite the dependency heirarchy of the packages themselves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.