rfc: collapse Tokio sub crates into single `tokio` crate #1318

Open
carllerche opened this issue Jul 16, 2019 · 27 comments

Comments

@carllerche
Member

commented Jul 16, 2019

There has been frustration among Tokio users regarding the number of crates pulled in when depending on Tokio. This RFC is an opportunity to discuss an alternative strategy, and also a place for users who are happy with the current situation to say so.

Summary

Stop maintaining the tokio-* sub crates. Instead, all Tokio code will live in a single tokio crate, and components will be enabled or disabled using feature flags.

For example, depending on only the timer functionality could be done with:

tokio = { version = "0.2.0", default-features = false, features = [ "timer" ] }

By default, tokio would have the same components enabled as it does today.
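As a rough illustration of the Summary, here is a minimal sketch of how one crate can gate its components behind Cargo features with `#[cfg(feature = "...")]`. The `timer` and `tcp` modules, `component_name`, `enabled_components`, and the `[features]` table shown in the comment are hypothetical placeholders, not Tokio's actual layout.

```rust
// Hypothetical single crate: each component lives in a module that only
// exists when the matching Cargo feature is enabled. The crate's
// Cargo.toml is assumed to contain something like:
//
//   [features]
//   default = ["timer", "tcp"]
//   timer = []
//   tcp = []

#[cfg(feature = "timer")]
pub mod timer {
    /// Placeholder for the timer component.
    pub fn component_name() -> &'static str {
        "timer"
    }
}

#[cfg(feature = "tcp")]
pub mod tcp {
    /// Placeholder for the TCP component.
    pub fn component_name() -> &'static str {
        "tcp"
    }
}

/// Reports which components were compiled into this build.
#[allow(unused_mut)] // `names` is never pushed to when no feature is enabled
pub fn enabled_components() -> Vec<&'static str> {
    let mut names = Vec::new();
    #[cfg(feature = "timer")]
    names.push(timer::component_name());
    #[cfg(feature = "tcp")]
    names.push(tcp::component_name());
    names
}
```

If the crate's Cargo.toml declared those features, depending on it with `default-features = false, features = ["timer"]` would compile only the `timer` module, and `enabled_components()` would return `["timer"]`. Compiled with no features at all, it returns an empty list.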

Motivation

Maintaining a large number of crates comes with an increased maintenance burden. Maintaining correct dependencies between crates is complex. Users feel that a large number of dependencies == bloat. Additional rationale can be found here.

Details

Tokio must maintain semver stability of its core APIs. This includes traits as well as some types, such as TcpStream. Tokio would like to be able to release breaking changes to less fundamental APIs without having to break the entire Tokio ecosystem.

Currently, Tokio achieves this goal by breaking up all the various components into individual crates. Doing this allows less stable components to release breaking changes without touching stable components. However, this strategy has drawbacks (see Motivation section).

In this proposal, all Tokio components would be moved into a single crate. Each component would have an associated feature flag, similar to how Tokio does it today.

Not much would change for application developers: they would still depend on tokio and enable or disable feature flags as needed. Library developers would no longer depend on sub crates; instead, they would depend on tokio and pull in only the features they need.

Type stability

Core types can maintain stability between breaking semver releases. For example, if the TcpStream type does not change between Tokio version 0.2 and Tokio version 0.3, then the following steps would be taken to release 0.3:

  • Release tokio 0.3.
  • Update tokio 0.2 to depend on tokio 0.3.
  • Replace the implementation of TcpStream in 0.2 by re-exporting the implementation from 0.3.
  • Release a new patch version of 0.2 that includes the re-exported TcpStream type from 0.3.

By doing this, TcpStream from 0.2 and 0.3 are the same type.
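The re-export step can be sketched in a single file, with modules standing in for the two published versions (in a real release, tokio 0.2 would presumably depend on tokio 0.3 through a renamed Cargo dependency). `tokio_0_2`, `tokio_0_3`, the `peer` field, `peer_of`, and `demo` are hypothetical names for illustration only.

```rust
// Modules stand in for two published versions of the same crate.
// Assumption: TcpStream did not change between 0.2 and 0.3.

pub mod tokio_0_3 {
    /// The single, canonical definition of the type.
    #[derive(Debug)]
    pub struct TcpStream {
        pub peer: String,
    }
}

pub mod tokio_0_2 {
    // 0.2.x patch release: instead of defining its own TcpStream,
    // it re-exports the definition from 0.3.
    pub use crate::tokio_0_3::TcpStream;
}

/// Library code written against 0.3.
pub fn peer_of(stream: &tokio_0_3::TcpStream) -> &str {
    &stream.peer
}

/// Application code that still builds its stream through the 0.2 path.
pub fn demo() -> String {
    let old = tokio_0_2::TcpStream {
        peer: "127.0.0.1:8080".to_string(),
    };
    // Compiles only because both paths name the same type.
    peer_of(&old).to_string()
}
```

Because `tokio_0_2::TcpStream` is a re-export rather than a new definition, a value built through the old path can be passed straight to code written against the new one.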

Drawbacks

  • The breaking change release process becomes more complicated as all untouched types must be re-exported in the old version.
  • If a user does not update to the 0.2 patch release in the above scenario, they can end up with both 0.2 and 0.3 in their build, each defining its own, incompatible TcpStream type.
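The second drawback follows from how Rust treats type identity: two separate definitions are two different types, even when their contents match. A minimal sketch, assuming a stale 0.2.0 that still defines its own TcpStream alongside a patched 0.2.x that re-exports the 0.3 one; the module names and `shares_type_with_0_3` are hypothetical.

```rust
use std::any::TypeId;

pub mod tokio_0_3 {
    pub struct TcpStream;
}

// Stale 0.2.0, published before the re-export patch: it keeps its own
// definition, so the compiler treats it as a distinct type.
pub mod tokio_0_2_0 {
    pub struct TcpStream;
}

// Patched 0.2.x that re-exports the 0.3 definition.
pub mod tokio_0_2_patched {
    pub use crate::tokio_0_3::TcpStream;
}

/// Returns (stale 0.2.0 shares the 0.3 type?, patched 0.2.x shares it?).
pub fn shares_type_with_0_3() -> (bool, bool) {
    let new = TypeId::of::<tokio_0_3::TcpStream>();
    (
        TypeId::of::<tokio_0_2_0::TcpStream>() == new,
        TypeId::of::<tokio_0_2_patched::TcpStream>() == new,
    )
}
```

Only the patched release yields a type that 0.3-based code accepts; a stream from the stale release would be rejected at compile time with a type mismatch.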

Alternatives

Continue to release new crates for each component.

@carllerche carllerche added the rfc label Jul 16, 2019

@carllerche carllerche added this to the v0.2 milestone Jul 16, 2019

@saethlin

commented Jul 17, 2019

In the Reddit thread on actix-web that probably prompted this, you said:

Tokio itself is split into many crates specifically to allow libs to pick and choose :) Any lib can depend on exactly the components they need and no more.

Now this sure sounds like a good thing to me, but is it known if any projects only pull in a small fraction of the tokio-* crates? It's always been my impression from looking at dependency graphs that they don't. I'm hoping someone has a good idea on how to collect data for this apart from manually auditing dependency graphs, which is the best I know how to do.

@sfackler

Contributor

commented Jul 17, 2019

This seems generally reasonable, but I think we do need to figure out what to do with tokio-io. In 0.1, it's kind of "independent" of tokio in that it doesn't have any hard dependencies on the reactor, etc. Libraries like hyper and tokio-postgres use it as an interface for non-tokio runtimes.

In the new world, I think we will want some independent, small crate that defines AsyncRead/AsyncWrite/etc. Maybe that's just futures-io (there is another issue open for this)?
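The split sfackler describes, where libraries depend only on the I/O traits and not on a reactor, can be sketched as follows. This assumes a local trait whose `poll_read` method is modelled on futures-io's `AsyncRead`; the trait here, `poll_read_some`, `MemReader`, and `noop_waker` are hypothetical stand-ins, not items from the real crates.

```rust
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Stand-in for a small interface crate: only the trait, no reactor.
pub trait AsyncRead {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>>;
}

/// Runtime-agnostic "library" code: generic over any AsyncRead.
pub fn poll_read_some<R: AsyncRead + Unpin>(
    reader: &mut R,
    cx: &mut Context<'_>,
    buf: &mut [u8],
) -> Poll<io::Result<usize>> {
    Pin::new(reader).poll_read(cx, buf)
}

/// One possible implementor supplied by a "runtime": an in-memory
/// source that is always ready.
pub struct MemReader {
    data: Vec<u8>,
    pos: usize,
}

impl MemReader {
    pub fn new(data: &[u8]) -> Self {
        MemReader { data: data.to_vec(), pos: 0 }
    }
}

impl AsyncRead for MemReader {
    fn poll_read(
        mut self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>> {
        let remaining = &self.data[self.pos..];
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.pos += n;
        Poll::Ready(Ok(n))
    }
}

/// A waker that does nothing; enough to poll an always-ready reader.
pub fn noop_waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // SAFETY: none of the vtable functions touch the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}
```

Any runtime could provide its own implementors (sockets, files) while library code such as `poll_read_some` never learns which runtime is driving it.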

@ipetkov

Member

commented Jul 17, 2019

Overall I think merging the crates will lead to a better experience for consuming crates.

Breaking changes will require a lot more effort to re-export things than bumping individual crates, but in my experience, there are very few "leaf" tokio crates that can be easily bumped without having to bump other crates as well. So, net-net, I don't think we'll lose much flexibility in practice.

Also feature gating everything could make it easy to accidentally include too much (à la default features), but the remedy is a quick Cargo.toml edit rather than a code refactor, which is a better experience.

@kpcyrd

commented Jul 17, 2019

A single tokio crate would make it a lot easier to maintain tokio in distros that package individual crates, like debian and fedora.

I think using a single crate is also a good idea if rust is ever going to do dynamic linking per crate.

@lambda

commented Jul 17, 2019

I think that merging all of the crates into a single crate may be mostly just sweeping the perceived bloat under the rug.

A quick perusal of cargo tree --no-dev-dependencies in reqwest turns up dependencies that I would expect not to be needed, such as a full tokio runtime, including a threadpool. reqwest provides a synchronous API for simplicity (it also provides an async API, but I feel like that should be an optional feature since many users are only using it for the simple synchronous API).

I think that hyper and/or reqwest include a threadpool to be able to do asynchronous name resolution when using getaddrinfo, but if you're just providing a simple blocking interface anyhow, it seems reasonable to also block on getaddrinfo. I feel like a lot could be done to reduce bloat if there were some optional simpler single-threaded runtime for use cases like this that didn't have to pull in crossbeam, parking_lot, num_cpus, and a bunch of other things that make sense for a full-featured asynchronous runtime with a threadpool for CPU-bound or otherwise blocking operations, but don't make sense when just trying to use a synchronous API for a network protocol.

Another source of extra crates seems to be rand, which is depended on in a few places. For some inexplicable reason, it has non-optional dependencies on a number of different random number generation algorithms, despite most users not caring and just wanting an easy and dependable way to get random numbers.

reqwest also has a few mandatory dependencies that should probably be made optional; they could be turned on by default for convenience, but I think that if you want to avoid bloat you should be able to turn them off. For instance, it uses serde for convenience methods for parsing json from requests and building queries from URL pairs, but serde is a somewhat big dependency and these couple of convenience features could be made optional.

There are a number of dependencies in reqwest's dependency tree where I can nod along and say "yeah, I can see why a convenient library for sending HTTP requests and parsing replies would need that": URL parsing, which then depends on IDNA, which depends on various unicode crates; cookie store; some encoding libraries; http, hyper, h2; and then a collection of basic ecosystem libraries like log, bytes, and so on (though there are also a couple of other basic ecosystem concerns that there isn't a settled answer for, such as error handling, leading to both error-chain and failure showing up in the dependency tree). But there are also a lot that seem extra, and even when you take into account that if you have a general-purpose async HTTP implementation you will need some kind of runtime to run it while providing a synchronous API, this particular runtime seems to be way bigger and pull in more dependencies than is really necessary for the purpose.

I think that there are also some real issues of reqwest indirectly depending on many more things than it really should, and merging back into one crate may make that a bit less visible, while as it is, it can be a little easier to find and address those issues one at a time, if anyone is motivated to do so.

There are some places where merging might make sense, such as the crates in the rt-all feature of Tokio which seem to be very frequently used together and I'm unclear if they can really be meaningfully used independently, but I think that merging everything into one crate would just obscure some of the sources of pulling in too many dependencies.

Also, while the original comment in this thread indicates that the tokio crate is just intended to be used by applications, and sub-crates by libraries, it looks like both hyper and reqwest are libraries depending on tokio, so it looks like either that intent hasn't been communicated or there's some issue with using the sub-crates independently; they also happen to depend on a number of sub-crates, so I don't know why without further investigation they also have to depend on tokio itself.

Finally, I think one of the bigger wins may be some better way of counting or visualizing dependencies, which takes into account sets of dependencies that all come from the same source. If some piece of code is coming from a feature or a sub-crate developed in the same repository, the only main difference is that it can be more easily used and semantically versioned independently if it's a separate crate; but it will show up quite differently in a dependency graph, leading to the feeling of bloat; and some of what I think feels like the perceived bloat is also the perceived number of sources that need to be audited or people and organizations that need to be trusted.

If the notion of multiple sub-crates actually all coming from the same parent project/workspace/repository were more prominent in some of these tools like cargo tree or other tools used to count dependencies, it might not over-count in places like this where the separate crates are really just a more convenient way of organizing code within a single logical project, rather than separate projects which need to be each evaluated independently.

Since this was long, in summary:

  • There is definitely value in preserving the independent crates, though there might be a few where that value is unclear
  • The current tooling might lead to perception of more dependency bloat than there really is
  • There really are some places where this dependency tree could be trimmed down, and with the current tooling, separate crates make those places easier to track down than one monolithic crate with features would (though tooling could be written for that as well, if it doesn't already exist)

@oconnor663

Contributor

commented Jul 17, 2019

With feature flags there might be a risk that you don't include the ones you need, because other dependencies of yours happen to include them, leading to possible breakage down the road when your dependencies change the flags they require. Maybe that's not a big problem since it's easy enough to fix after it happens? But it would be more of a problem for newer coders who might not understand why they're getting errors.

@vi

commented Jul 17, 2019

Another concern is forking and [patch]/[replace]-ing Tokio components.

With a single crate, the entire Tokio crate needs to be replaced instead of, e.g., just tokio-tcp.

@magnet

commented Jul 17, 2019

I understand the maintainability concerns but I don't get the issue with the amount of dependencies. Currently someone can depend on tokio, which re-exports common crates, so one does not have to manually deal with many dependencies. In general I appreciate the modularity of the ecosystem and I feel using features to achieve the same end is less elegant and practical/discoverable for users.

Now, the development burden is a good reason to go for a single crate, but as a Tokio user I feel the current solution is practical. It lets libraries depend only on relevant crates and binaries can easily pull tokio.

Regarding @kpcyrd's comment on packaging tokio in Linux distros, I don't really see how this helps until Rust has a fixed ABI and it seems current decisions shouldn't be based on such long term prospects. I also don't see Linux distros packaging Rust libraries like they package C/C++ lib headers or Python modules, because Cargo handles that much better.

@kornelski

commented Jul 17, 2019

I'm not sure if Cargo features are robust enough for this.

For example, in practice default-features = false is almost impossible to use, because any dependency anywhere in the dependency tree that just includes tokio = "1" will silently bring all the default features back.

Rust gives poor error messages when user forgets to add a feature flag. It just prints that the thing doesn't exist, but the docs say it exists! Super confusing.

@kornelski

commented Jul 17, 2019

number of crates pulled in when depending on Tokio

In what way? I can see two sides:

  1. User has to add multiple dependencies, so it's a chore to add multiple entries to Cargo.toml, open multiple docs pages, etc.

  2. Compilation lists lots of stuff, so it feels "bloated"

The first case could be fixed by still having separate crates, but also offering a top-level crate that groups and re-exports all of them. Users would add tokio-kitchen-sink to their projects and use all components from there.
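The first option can be sketched with plain re-exports. Modules stand in for crates here; in a real workspace, tokio-kitchen-sink would presumably list each component crate as a dependency and `pub use` it. The `timeout` and `lines` functions are hypothetical placeholders for component APIs.

```rust
// Stand-ins for separately published component crates.
pub mod tokio_timer {
    use std::time::Duration;

    /// Hypothetical timer API.
    pub fn timeout(ms: u64) -> Duration {
        Duration::from_millis(ms)
    }
}

pub mod tokio_codec {
    /// Hypothetical line-splitting API.
    pub fn lines(input: &str) -> Vec<&str> {
        input.lines().collect()
    }
}

// The facade contains no code of its own: it only groups and re-exports
// the components, so users add one dependency and one docs page.
pub mod tokio_kitchen_sink {
    pub use crate::tokio_codec as codec;
    pub use crate::tokio_timer as timer;
}
```

Libraries could keep depending on individual components, while applications write `tokio_kitchen_sink::timer::timeout(..)` and never list the sub crates themselves.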

The second is not a real problem IMHO, but merely a perception of a problem. The amount of code compiled will be similar either way (or even worse, given default-features=false unusability and limited parallelism in rustc).

I've got a feeling that there's a group of new Rust users who come from languages with either huge stdlib (so nobody needs to use dependencies), or languages where dependencies are a pain (so everybody avoids using dependencies), so they're shocked how nonchalantly Rust/Cargo uses deps. But for Rust that's fine, so the real problem is communicating to users that they shouldn't be worried when the compilation step prints many lines of "Compiling X".

@saethlin

commented Jul 17, 2019

Compilation lists lots of stuff, so it feels "bloated"

I think users (including me) are frustrated by how fast compile time grows as we add dependencies, and the size of the dependency graph is an easy target for complaint. The presence of all the tokio crates and multiple versions of all the rand crates in an ostensibly single-threaded program quickly add credibility to blaming the compilation time on the size of a dependency tree. It would be interesting to assess how end-user compilation time is altered by the suggested changes.

@ehiggs

commented Jul 17, 2019

I think this is an XY problem. The perceived bloat is solved by binary packages in cargo/crates.io and caching it close to the CI instance.

@saethlin

commented Jul 17, 2019

I'm not talking about CI.

@Mathspy

commented Jul 17, 2019

I am personally in the happy with the current situation boat. The arguments against many dependencies usually boil down to three metrics:

  • Too many maintainers: This doesn't change by merging the Tokio crates because they are already maintained by the same people
  • Too much code, takes too long to compile: This is strictly worsened by monolithic dependencies, because the advantages of parallel compilation evaporate. Not to mention this complaint is sometimes naive in not seeing the complexity of a piece of functionality and thinking it could be trivially reimplemented inside the binary itself (which would result in gigantic inline dependencies that just move compilation time to the binary instead of libraries)
  • "Too many crates"?: This one is always much more vague and in my opinion the least meaningful metric, as compiling 20 crates of approximately the same total size as 1 crate will take around the same time (if not less). And the feeling of bloat is often nothing more than a false feeling

If maintaining all the smaller tokio-* crates has proved itself to be a challenge and merging them brings a benefit to maintainers and maintenance, I'd be highly in favour, but I personally disagree with the other reasons.

Note: if managing separate versions of the crates and which depends on which is indeed the main motive for this change, there are automation tools that alleviate or minimize that burden while keeping all the benefits of the current approach.

@ehiggs

commented Jul 17, 2019

@saethlin, my point was more general to the issue being addressed by the RFC: users perceive bloat in tokio and the RFC here is attempting to mitigate it by making an uber crate that wraps everything up. Aside from creating a new project, CI for a project depending on tokio, and actually working on tokio itself, when do you need to build tokio?

@kpcyrd

commented Jul 17, 2019

Regarding @kpcyrd's comment on packaging tokio in Linux distros, I don't really see how this helps until Rust has a fixed ABI and it seems current decisions shouldn't be based on such long term prospects. I also don't see Linux distros packaging Rust libraries like they package C/C++ lib headers or Python modules, because Cargo handles that much better.

Those are two separate issues. If you use micro libraries dynamic linking would imply loading >100 .so's into the process which is a non-zero-cost abstraction. With "C sized" crates the unused code would be LTO'd anyway. This is unrelated to distros.

The very real problem with distros is the review process for new packages. If rand decides it's going to need 5 more rand-* crates we need to get them all reviewed and approved. To upload the new crates we need to update rand-core first which breaks the existing dependency tree. Updating rand is generally a non-trivial effort that takes multiple weeks (up to months).

@AZon8

commented Jul 17, 2019

Now this sure sounds like a good thing to me, but is it known if any projects only pull in a small fraction of the tokio-* crates? It's always been my impression from looking at dependency graphs that they don't. I'm hoping someone has a good idea on how to collect data for this apart from manually auditing dependency graphs, which is the best I know how to do.

Starting at https://crates.io/crates/tokio/reverse_dependencies, I've listed the number of reverse dependencies on crates.io for each crate.
Hope this helps.

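Crate                  Dependents on crates.io
tokio                  608
tokio-core             381
tokio-buf              6
tokio-codec            114
tokio-current-thread   22
tokio-executor         51
tokio-fs               23
tokio-io               318
tokio-reactor          43
tokio-signal           22
tokio-sync             16
tokio-tcp              59
tokio-threadpool       36
tokio-timer            112
tokio-tls              61
tokio-udp              12
tokio-uds              42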

Some additional random thoughts.

  • I prefer seeing feature flags over dependencies when editing Cargo.toml (e.g. serde-derive).
  • Ease of contributing should be a high priority. What @carllerche describes sounds tedious, but it also depends on the rate of API changes.

@ipetkov

Member

commented Jul 17, 2019

One thing I forgot to consider earlier: can cargo handle different feature flags across dependencies and dev-dependencies? (at least it wasn't able to in the past...)

A common pattern I've seen is libraries depending on the minimal tokio functionality they need, while pulling in all bells and whistles during testing. If cargo cannot support enabling extra dependency features during testing, crates may end up depending on the entirety of tokio.

@rpjohnst

commented Jul 17, 2019

  • Too much code, takes too long to compile: This is strictly worsened by monolithic dependencies due to advantages of parallel compilations being evaporated.

@Mathspy This is not necessarily true- rustc itself does parallelize compilation within a single crate, while cargo doesn't (yet) compile dependency chains in parallel. So depending on the crate graph things could go either way- it would have to be benchmarked, and it will change over time as the tools improve.

But this is still a relevant point- one of the reasons people complain about the number of dependencies is that it's a proxy for "compilation is slow," and simply merging them will not really change things there. The only way to fix that is to compile less, and simpler, code.

@kpp

Contributor

commented Jul 17, 2019

And we will get a bloat of dependencies if you use tokio for codecs only in the library and tcp/udp in tests, because cargo will combine two features together:

[dependencies.tokio]
version = "0.3"
default-features = false
features = ["codec"]

[dev-dependencies.tokio]
version = "0.3"
default-features = false
features = ["codec", "tcp", "rt-full"]

It will compile tcp and runtime even for non-test builds for the end users of a library.

@Nemo157

commented Jul 17, 2019

It will compile tcp and runtime even for non-test builds for the end users of a library.

Not if they are pulling this library from crates.io, dev-dependencies features are only merged in when a crate is inside the current workspace or a path dependency.

@kpp

Contributor

commented Jul 17, 2019

dev-dependencies features are only merged in when a crate is inside the current workspace or a path dependency.

Good to know. Thanks

@kornelski

commented Jul 18, 2019

cargo-crev reviews and cargo-audit security advisories are per crate, and don't take features into account. If tokio keeps crates separate, and there's an issue with one of the less often used components, these will affect fewer users.

@Mathspy

commented Jul 18, 2019

@rpjohnst Oh! I see, thank you for the clarification!

@kpcyrd

commented Jul 18, 2019

cargo-crev reviews and cargo-audit security advisories are per crate, and don't take features into account. If tokio keeps crates separate, and there's an issue with one of the less often used components, these will affect fewer users.

That's a tradeoff with runtime overhead for a tooling problem though. It doesn't improve security, only binaries that actively run the vulnerable code are affected in both cases.

@qm3ster

commented Jul 19, 2019

To me, the most important factors are:

  • Ease of contribution (for first time contributors and for everyone)
  • Build time
  • Build size

It seems that this change makes the first one harder, and makes it easier for the latter two to grow through negligence than under the current arrangement.

Why should monolith v0.2 depend on monolith v0.3 instead of them both depending on unchanged-type v0.1?
The superficial issue of "bloat" in a number of tiny packages from the same reputable organization is replaced by the more real issue of "churn" of unchanged types and code being republished over and over again.

@theduke

commented Jul 24, 2019

There are valid arguments for and downsides to both approaches.

Since many of the comments have been pro multi-crate, I'll add some for the single crate approach (which is my preferred solution).

Reviews

I work with multiple companies that require each dependency to be reviewed. Each version must be signed off by an employee for production use. Splitting libraries like tokio into multiple crates makes this a lot more work. The impact is smaller on the initial review because you need to look at all code anyway, but jumping around between different crates still makes this harder than with a single crate. You need more understanding of the architecture and boundaries between the crates, and just need to keep more things in your head.
Constant version bumps across many different crates also increase the workload here and just generally drive up the complexity of the process, leading to annoyed maintainers.

I also know companies that have policies like "All dependency releases must be screened for bug fixes and fixed vulnerabilities within 1 working day." Doing this across many crates with constant releases is definitely more taxing and time consuming. (Reading 1 changelog vs 7 ...)

Maintenance burden

Even without mandatory review requirements as above, more crates invariably lead to a higher velocity of change: more releases, more CHANGELOGs to read, more version bumps, more chances for the inevitable bugs to cause a problem, and more friction in the entire ecosystem.

Breaking changes are especially bad in this context.

While this also makes it easier to get changes out instead of consolidating them into a single-crate release, I'm wondering if this kind of velocity is actually desirable for such an important low-level building block, assuming a certain stability and maturity of the codebase.

It also increases the chance of multiple versions of a crate to sneak in to your build, which is always suboptimal (build time, inconsistencies, ...) and leads to the often annoying process of finding out why and fixing it - usually with a PR for another dependency.

Contributing

@qm3ster mentioned that a single crate would make contributing harder. I'm curious why that is.
Personally, I've always found it much easier to understand and contribute to a single crate, rather than something that is split across many crates.
It leads to jumping around multiple crates and following re-exports to gain a full understanding.
It also leads to potentially having to change multiple crates and having to be more aware of the public API boundaries of each crate + extra effort to avoid a breaking change.

A single crate is much better for understanding the code and first time contributing IMO.

Build Performance

There have been multiple claims for better build performance with multiple crates, due to parallel compilation. This claim really needs some substantiation with measurements.
I'd imagine that the benefit is limited to the first check/incremental build run. Multiple crates might very well create more work for the final build steps and LLVM vs a single crate. The first build is important for CI and things like cargo install, but I'd argue that the final build is much more important for daily dev workflow.

Subjectively, I remember build times being better before the split up in tokio and futures. This of course might just be because the stack has grown in complexity and gained more features.

But the point is: including performance in a decision would need some validating benchmarks.

Most Common Use Case

@AZon8 posted some numbers for how tokio is used on crates.io.

Most applications using tokio are private and not on crates.io, so crates.io data actually make dependencies on sub-crates look much more common than they are in total real-world use, due to the emphasis on libraries and building blocks vs full applications.

I think it makes sense to optimize for the most common use case, assuming that more selective usage is still possible.
