Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for including Web Workers #46

Closed
siku2 opened this issue Sep 16, 2020 · 15 comments · Fixed by #285
Closed

Support for including Web Workers #46

siku2 opened this issue Sep 16, 2020 · 15 comments · Fixed by #285
Labels
assets Build pipelines for specific asset types cli Trunk CLI application discussion This item needs some discussion needs design This item needs design work
Projects

Comments

@siku2
Copy link
Contributor

siku2 commented Sep 16, 2020

My reference for this issue is Yew's multi_thread example which requires two WASM binaries. One for the actual application and one for the Web Worker.
Up until now building applications with web workers has been a real pain but with trunk we have the chance to turn this around.

Requirements

The following is an objective list of problems that need to be solved to support Web Workers.

1. Multiple binary targets

Trunk should handle crates with multiple binary targets gracefully.
Currently, invoking trunk serve on the aforementioned example results in this error:

could not read file `$TARGET/wasm32-unknown-unknown/debug/multi_thread.wasm`
📡 server running at http://127.0.0.1:8080/
  ❌ build finished with errors

The multi_thread.wasm could not be found because there are two WASM binaries called app.wasm and worker.wasm.
While it isn't strictly related to this issue, I think this behaviour could be improved.

2. Letting the app know where to find the Web Worker

Because of how Web Worker are created[link] we need to know the URL of the worker's Javascript file while compiling the app.

3. Building Web Workers with the required Javascript code to start them

Trunk currently runs wasm-bindgen with --target=web. The generated js code exports an initializer function which needs to be called to start the program. This doesn't work for Web Workers as they need to be self-executing.

Proposal

This is just one potential way of solving the issues mentioned above to get the discussion started.

Since Trunk already has a config file, I think it makes perfect sense to use it for this.
We add a new section to [build], let's call it "targets", which is used to configure binary targets.
To take care of 3., we can simply build the WASM with --target=no-modules.
Since we still want to have normal binaries, let's give each target a type key with the following two values:

  • "script" - The default value, this represents the current strategy.
    Builds the binary with --target=web and includes it in index.html.
  • "worker" - The new strategy which builds the value with --target=no-modules.
    The output is moved to dist/ but not added to index.html

For 2. we need to allow targets to side-step hashing so we can assign static URLs to them.
In the future this could use a similar approach to #9 so we don't miss out on caching, but for now we can just add a key "output" to the target which manually sets the output path.

This is what it would look like when applied to the multi_thread example:

[build.targets.app]
type = "script"

[build.targets.worker]
type = "worker"
output = "worker.js"

Trunk should reject projects with multiple binaries (without appropriate configuration) outright and point the user to documentation on the subject.
It might be tempting to guess which binary is which based on certain keywords (like "app" and "worker") but this probably does more harm than good in the long run.
This satisfies 1..


This is quite a high priority feature for Yew and I would love to help out with this.

@thedodd thedodd added assets Build pipelines for specific asset types cli Trunk CLI application discussion This item needs some discussion needs design This item needs design work labels Sep 16, 2020
@thedodd
Copy link
Member

thedodd commented Sep 16, 2020

@siku2 this is really excellent. Thanks for opening the issue. Everything you've proposed above makes sense, and seems quite doable. I would like to propose a slight modification.

One thing that I've been trying to stick to with the overall design of Trunk is to keep the required amount of config as minimal as possible, and wherever we can, keep assets & build instructions in the source HTML. I will admit that this pattern is obviously not best for everything, but it may actually work quite nicely with this web workers system.

drive from index.html

I would propose that we update the source HTML requirements to allow for the inclusion of the following pattern:

<html>
  <head>
    <link rel="trunk-app" data-trunk-bin="app" href="path/to/Cargo.toml"/>
    <link rel="trunk-worker" data-trunk-bin="worker" href="path/to/Cargo.toml"/>
  </head>
</html>

The implications of adopting this pattern is that we would no longer really need or want the --manifest-path or -p/--package options (which we were discussing elsewhere). This might be nice, as the only thing which users would need to be configured would be the path to the Cargo.toml and we are already using the cargo metadata crate which gives us the paths to the target dir and everything else we need, so that might be nice.

This would be a breaking change. However, now is the time for changes like that. I actually considered using this pattern originally, but I wasn't 100% sure about it. This new requirement for web workers definitely makes me lean more in that direction.

Now, to be sure, we will still have the Trunk.toml and such. This will just simplify things a bit so that users only need to have this info declared in their index.html file, and then they're off to the races.

questions / feedback

  • @siku2 (and anyone else): I'm interested to know what you think about rel="trunk-app" & data-trunk-bin="app" patterns.
    • does this sort of pattern make sense? Is it intuitive?
    • do you think we should adopt a different nomenclature? I like using the rel & data* attributes as they are standard HTML attributes, and and perfect for things of this nature. Doesn't diverge for normal HTML patterns.

@thedodd
Copy link
Member

thedodd commented Sep 16, 2020

@MartinKavik any thoughts on the above items as well?

@thedodd
Copy link
Member

thedodd commented Sep 16, 2020

@siku2 this also makes me realize that we should be leveraging the info from cargo-metadata to determine the name bin for locating the WASM output. So far, I've just been a normalized form of the crate's name, however that will obviously not work in every case. I'll open a bug for this bit.

@siku2
Copy link
Contributor Author

siku2 commented Sep 16, 2020

does this sort of pattern make sense? Is it intuitive?

I love the idea of using index.html as the main configuration source for "how to put the app together" and Trunk.toml mainly for command line flags.

There still needs to be an attribute to specify the output path to satisfy the second requirement.

The only nitpick I have is that "href" should be optional and if specified, may point to the parent directory of Cargo.toml instead so as to minimise verbosity.
The same idea could be used for the path to index.html so that trunk -- some/path is the same as trunk -- some/path/index.html if some/path is a directory.
This would remove the need for a --package flag entirely (assuming that the dist dir is relative to index.html).

@thedodd
Copy link
Member

thedodd commented Sep 16, 2020

There still needs to be an attribute to specify the output path to satisfy the second requirement.

Do you mind expounding a bit more on what our requirements are for this? Mainly:

  • outside of the context of Trunk, how were workers loaded previously in the Yew ecosystem?
  • isn't it still required that there be a JS loader which loads & executes the WASM Worker?
  • if the above is true, then wouldn't it be fine to continue to compile the worker with --target=web (as that would handle the importing and initialization)?
  • it seems that such is not the case based on your statements above, so what am I overlooking?

As far as the optional href bits, I can agree. I am tempted to say that the entire <link .../> should be optional, and it should just default to the cargo package in the CWD, and use the default bin for that package.

  • for multiple bins, the user will be directed to include a link in the HTML with the appropriate rel="trunk-app" data-trunk-bin="binfoo" attrs.
  • then we can have a nice sane default (package in CWD), drop the --manifest-path/-p/--package opts, and the only thing the user will normally need to point to is the index.html if it is not in the CWD.

Agreed on supporting the pattern of pointing to the dir of the index.html. Same for Cargo.toml.

This would remove the need for a --package flag entirely (assuming that the dist dir is relative to index.html).

Agreed that it would remove the need for the --package flag; however the dist dir currently defaults to the CWD. I'm tempted to say that the most logical default would be the parent dir of the Cargo.toml given that the location of the index.html can change from project to project. Having it default to the parent dir of the Cargo.toml would potentially cut down on surprises even more. The two locations will be the same a majority of the time as well, I would wager.

@siku2
Copy link
Contributor Author

siku2 commented Sep 16, 2020

outside of the context of Trunk, how were workers loaded previously in the Yew ecosystem?

The worker binary is generated using:

wasm-bindgen --target no-modules --out-name worker  path/to/wasm

The output is then included in the server directory.
When we want to start the Web Worker we call new Worker("/worker.js") where "/worker.js" is the URL of the js file generated by the previous command.
The app needs to know the exact location of the worker script in order to start it. The easiest way to achieve this for now is through 2..

isn't it still required that there be a JS loader which loads & executes the WASM Worker?

Yes, but the app already handles this internally. This isn't something I think Trunk should do because Web Workers are often created dynamically and different apps have different requirements.

if the above is true, then wouldn't it be fine to continue to compile the worker with --target=web (as that would handle the importing and initialization)?

No, we're talking about the worker itself here. The worker script needs to set up everything by itself, we don't have the ability to call the "init" function. --target=no-modules uses an IIFE to create a self-executing script.
The biggest problem with no-modules is that it doesn't support JS snippets. The only way I can think of to properly solve this is to use a JS bundler instead.


As far as the optional href bits, I can agree. I am tempted to say that the entire <link .../> should be optional, and it should just default to the cargo package in the CWD, and use the default bin for that package.

It should definitely be optional! No need to introduce the complexity for simple apps.


I'm tempted to say that the most logical default would be the parent dir of the Cargo.toml given that the location of the index.html can change from project to project. Having it default to the parent dir of the Cargo.toml would potentially cut down on surprises even more. The two locations will be the same a majority of the time as well, I would wager.

The issue there is that you might want to support something like this in the future:

my_app/
  src/
  Cargo.toml

my_worker/
  src/
  Cargo.toml

index.html

where index.html contains:

<link rel="trunk-app" href="my_app" />
<link rel="trunk-worker" href="my_worker" />

Where do you place the dist directory now?

I realize my suggestion of placing it relative to the index file isn't ideal for a setup like this:

static/
  index.html

src/

Cargo.toml

which would result in static/dist...
For now I guess placing it relative to Cargo.toml is the better option.

@MartinKavik
Copy link
Contributor

@MartinKavik any thoughts on the above items as well?

  1. I would definitely use index.html instead of commands / config files to make the configuration more declarative and to keep it in one place.

    • However HTML is pretty dangerous because it's untyped so we will need good linters and error messages in the future and make it as simple as possible. (That's why I was planning to use Rust instead of HTML in Seeder btw.)
  2. According to MDN docs - "WebAssembly is not yet integrated with <script type='module'> or ES2015 import statements" and this library, it looks like the preferred tag should be script instead of link to make it future-proof. Also I can imagine someone will ask in the future how he should use something like <link rel="prefetch" as="wasm" ..>.

  3. I agree that src/href should also accept the parent folder of Cargo.toml and we should have a reasonable default settings so you don't have to write script/link for wasm at all if possible.

  4. If I understand previous comments correctly, then trunk-worker generates a simple WASM file and trunk doesn't run any worker-specific logic. In this case, I think we shouldn't name it "worker" and "app" - imagine we've split our app into WASM parts to improve performance by lazy loading - what is "app" and "worker" in this example? I would declare them the same way, only allow to optionaly "mark" one of them as the main/entry/start/mount point. See example below.

So according to the points above I would suggest for setup like:

my_app/
  src/
  Cargo.toml

my_worker/
  src/
  Cargo.toml

index.html

index.html with:

<script type="wasm" src="my_app" data-trunk-main>
<script type="wasm" src="my_worker">

Notes:

@thedodd
Copy link
Member

thedodd commented Sep 17, 2020

For now I guess placing it relative to Cargo.toml is the better option.

@siku2 your example of having two different crates — one for the app and another for the worker — is definitely a good one, and I had considered that, and I drew the same conclusion you did, so sounds like that may indeed be best.

The biggest problem with no-modules is that it doesn't support JS snippets.

@siku2 thanks for all of the info on the worker setup and all. That makes sense. I'll do some experimentation to see if there may be a simple way to solve these issues. Hopefully we can cross this bridge without a JS bundler ... we'll see.


@MartinKavik great feedback. I agree on most points.

it looks like the preferred tag should be script instead of link to make it future-proof

The main reason I chose to use links here is because ultimately the output HTML (which is what the browser will load) doesn't even include the links at all. Trunk uses the links as "directives" but then removes them from the DOM and then writes the script tag for the JS loader, indirectly loading the WASM. So I think in this case, we might already be doing what you suggested. The links are ultimately transformed into script tags by Trunk.


Ok, a distillation of action items so far coming in my next comment: Scratch that, actually one more subject I would like to discuss while we are on this:

@thedodd
Copy link
Member

thedodd commented Sep 17, 2020

Ok, one more item of discussion here. Related to some general ideas in #3, some from #28, I'm thinking that what we might want to do for the HTML portion of these updates is to stick with <link ...> as much as possible, which will typically then be transformed into some other HTML in the end.

Right now, Trunk only uses the special <link rel="trunk-dist" href=".."/> when linking to a resource which should just be copied into the dist dir, and is then removed from the HTML. We definitely want a general pattern which will survive growth and evolution. As such, perhaps we just keep this as simple as possible and go with rel="rust". This link type could support the following data-* attrs:

  • data-bin="{binName}": an optional attr which declares the specific bin to be included. This would be superfluous for single-bin crates. It can be omitted and will just default to the one bin of the crate. If the crate has 0 or more than 1 bins, Trunk will emit an error with some help info on what to do.
  • data-worker: an optional attr which takes no value. In combination with data-bin, it denotes that the specified bin should be included as a web worker. If this is not specified, then the Rust bin will be treated as the app entrypoint. Trunk will emit an error if more than 1 entrypoints exist.

With the above proposal, the HTML links would now look like this:

<html>
  <head>
    <link rel="rust" data-bin="app" href="path/to/Cargo.toml"/>
    <link rel="rust" data-bin="worker" data-worker href="path/to/Cargo.toml"/>
  </head>
</html>

I know that specifying an href= can seem tedious ... but as I've analyzed all of the different possibilities here, I feel like it is a small price to pay to just require the href= attr. What we gain is that we sidestep the many misconfigurations and confusions over defaults, what is actually being referred to &c. The href will always be relative to the index.html itself. This will also make it intuitive for being able to include the app from one crate, and then potentially multiple workers from other crates.

This pattern will then set the precedent for future link types. Potentially including WASM modules from other languages. Transpiled JS/TS ... whatever. Simple links with simple rel attrs, and then data-* attrs which can customize their behavior.

Thoughts?

@thedodd
Copy link
Member

thedodd commented Sep 17, 2020

A fucking killer combo on the above for JS/TS would be:

<link rel="js" data-build-command="build" data-watch="js/src,ts/src" href="path/to/package.json"/>

With this, Trunk would be able to build JS & TS apps (or any other JS offshoot language), and include their artifacts in the dist dir, and also properly reference the web-ready JS (hashed and all) in the output HTML.

  • rel="js" would cover standard JS, TS and pretty much any other JS offshoot language, as long as it includes a package.json and has a script (in the package.json) which can be reliably invoked to produce an artifact.
  • data-build-script="{script}" would be an optional package.json script, defaulting to build. So npm run build practically speaking.
  • data-watch={path,path,path} would be a set of comma-separated paths for Trunk to watch which will trigger builds specifically for the JS app when changes are detected under the specified paths.

This is the first JS/TS integration option which I've felt ok about. Hype!

@siku2
Copy link
Contributor Author

siku2 commented Sep 17, 2020

I'll do some experimentation to see if there may be a simple way to solve these issues. Hopefully we can cross this bridge without a JS bundler ... we'll see.

I think using no-modules for now is perfectly fine as that's what we've been using so far.
Web Workers can't use ES6 modules yet, so it makes sense that they don't support js snippets without using a bundler.


Right now, Trunk only uses the special <link rel="trunk-dist" href=".."/> when linking to a resource which should just be copied into the dist dir, and is then removed from the HTML. We definitely want a general pattern which will survive growth and evolution. As such, perhaps we just keep this as simple as possible and go with rel="rust". This link type could support the following data-* attrs:

I really like this approach and the possibilities it entails!

Perhaps, to avoid confusion, we could use a custom element like <trunk-link rel="dist" href=".." />. This is still perfectly valid HTML and allows us to use custom attributes without the data- prefix which makes it more flexible.
The main benefit here is that this creates a visible contrast between the parts that will be included in the final HTML and the parts that are stripped by Trunk.
Additionally, when a user tries to use the index.html file outside of trunk, the resulting error should hopefully become apparent more easily.
Speaking of errors, this should help with the creation of a trunk check type of lint because we have a distinction between "normal" HTML and special syntax for trunk.

It's not necessary to use the custom element for everything. For instance, it makes sense for trunk to handle Sass in <link> tags implicitly.


know that specifying an href= can seem tedious ... but as I've analyzed all of the different possibilities here, I feel like it is a small price to pay to just require the href= attr. What we gain is that we sidestep the many misconfigurations and confusions over defaults, what is actually being referred to &c. The href will always be relative to the index.html itself. This will also make it intuitive for being able to include the app from one crate, and then potentially multiple workers from other crates.

I don't see how there's a chance for misconfiguration by making href optional. In most cases href is just "./Cargo.toml" or even worse, just ".", if we allow specifying the parent directory.
The only case where specifying Cargo.toml explicitly should be necessary, is when we're using crates that aren't part of the same workspace.

If we look at the example I used before:

my_app/
  src/
  Cargo.toml

my_worker/
  src/
  Cargo.toml

Cargo.toml  <- added this file to define the workspace
index.html

Thanks to the workspace, running cargo build --bin my_app works even in the root directory because cargo considers the binaries of all members of the workspace.

So the following index.html should work:

<html>
  <head>
    <!-- Using the custom element proposal from above to get a feel for how it looks -->
    <trunk-link type="rust" bin="my_app" />
    <trunk-link type="rust" bin="my_worker" data-worker />
  </head>
</html>

I think this is much cleaner than this:

<html>
  <head>
    <link rel="rust" bin="my_app" href="./my_app/Cargo.toml" />
    <link rel="rust" bin="my_worker" data-worker href="./my_worker/Cargo.toml" />
  </head>
</html>

It's even worse when we're dealing with a single crate with multiple binary targets because then href just lists the same path twice (or more if we have multiple workers).


I feel like we haven't really tackled the second issue yet.
Some alternative approaches I've considered:

  1. Having trunk set an environment variable with the name of the output and using the env!() to read it at compile-time has the problem that Rust doesn't automatically detect changes to env vars unless explicitly forcing it using a build script.
  2. Writing an element into the HTML file which is then read at runtime causes unnecessary overhead and end up being very verbose.

The way I see it, until we have the trunk library which can take care of this, using a pre-determined output name is the best solution.


And finally:

imagine we've split our app into WASM parts to improve performance by lazy loading - what is "app" and "worker" in this example? I would declare them the same way, only allow to optionally "mark" one of them as the main/entry/start/mount point

@MartinKavik makes a good point here. The reason I singled out Web Workers for this issue is because of their special requirements.

Other types of binaries don't necessarily have to:

  1. be self-executing
  2. avoid using ES6 modules because many browsers don't yet support it

Having a "preset" for workers seems very reasonable because of this.

This isn't to say that we shouldn't support other types, but depending on what they do they have entirely different requirements.
One potential use case I can think of is using a separate WASM binary as a library which is dynamically loaded at runtime. In this case we would still want it to be compiled with --target=web so that we can use it as an ES6 module.

With the current design it should be fairly easy to add such functionality in the future, so I don't think we need to focus on this too much yet.

ranile added a commit to ranile/material-yew that referenced this issue Oct 6, 2020
Syntax highlighting is being done on main thread. This is kinda slow operation and blocks the main thread. It should be done in a web worker but that is unsupported by trunk and is a hassle to get working.
See: trunk-rs/trunk#46
@thedodd
Copy link
Member

thedodd commented Oct 6, 2020

Ok, once #73 lands, @siku2 this feature will be ready to implement. I've already got the src/pipelines/rust_worker.rs module in place as a placeholder for when we are ready to hack on this.

@rfwatson
Copy link

Just to mention that an implementation which is compatible with audio worklets (which are somewhat similar to web workers but may have some specific scope restrictions) would be very cool.

@thedodd
Copy link
Member

thedodd commented Nov 10, 2020

@rfwatson do you mind sharing some info on what you would expect Trunk to need to do in order to support audio worklets? If you want to open a new issue for it as well, that would be fine. Once I have some additional cycles I'll of course do some research on this :). Thanks in advance.

@joshyrobot
Copy link

Is there any ongoing work on this? If not, could I give it a shot?

@thedodd thedodd added this to To do in main Aug 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assets Build pipelines for specific asset types cli Trunk CLI application discussion This item needs some discussion needs design This item needs design work
Projects
main
Done
Development

Successfully merging a pull request may close this issue.

5 participants