Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider making regex an optional feature on top of env-filter #1436

Closed
matklad opened this issue Jun 14, 2021 · 10 comments
Closed

Consider making regex an optional feature on top of env-filter #1436

matklad opened this issue Jun 14, 2021 · 10 comments

Comments

@matklad
Copy link
Contributor

matklad commented Jun 14, 2021

regex is a pretty heavy dependency, compile-time wise. In rust-analyzer's crate graph, regex-syntax takes 8 seconds to compile with optimizations, making it the second heaviest dependency (fun fact: the heaviest one is lsp-types, which is just a (big) bunch of derive(Serialize) structs). tracing subscriber with env-filter feature is the only reason for why we pull in regex. It would be nice if env-filter feature (which we do use) worked, in a restricted form, without regex support. A good example here is env-logger -- they do similar filtering, but regex is an add-on on top of that:

https://github.com/env-logger-rs/env_logger/blob/13cafce572362582f57964eae6cb4a41f52fd04a/Cargo.toml#L25

Note that env_logger also enables the perf feature of the regex -- I wonder if it makes sense to do the same, if the regex is opt-in?

EDIT: seems that perf regex feature in env_logger is not intentional, so don't read to much into it: rust-cli/env_logger#154

@davidbarsky
Copy link
Member

I'm personally not opposed to it! That is, if we can separate regex from EnvFilter, which might be a bit tricky.

On an aside, I'm working on an a "new and improved" EnvFilter that tries to address some of the gotchas people hit while also introducing a more expressive filter language. Do you have any what parsing libraries (if any!) you recommend for this?

@matklad
Copy link
Contributor Author

matklad commented Jun 14, 2021

I guess, in the "minimal" configuration, no external parsing libraries should be used. For parsing the grammar of the filter itself, I'd recommend a hand-written parser (sever's one is a good case study: https://github.com/dtolnay/semver/blob/master/src/parse.rs).

As to what the "query language" should be, I bet you want to do filtering using O(1) space, and that means that you end up with regular expressions anyway: constant space is equivalent to regularity of the grammar. But it doesn't mean that you need to start with regexes -- there might be more convenient surface syntaxes. For example, it might be useful to have an explicit swith between full match, prefix match, substring match and regex match. No existing prior art comes to mind though :(

@hawkw
Copy link
Member

hawkw commented Jun 14, 2021

I guess, in the "minimal" configuration, no external parsing libraries should be used. For parsing the grammar of the filter itself, I'd recommend a hand-written parser (sever's one is a good case study: https://github.com/dtolnay/semver/blob/master/src/parse.rs).

We used to have a handwritten parser, FWIW, but it was replaced with a regex-based parser when span value filtering was added (see PRs #22 and #25). Adding syntax for filtering on span field values pushed the hand-written parser's complexity over the line of what was easily maintainable.

One potential option would be to feature flag filtering on span field values, and only require regex in that case. That way, simpler filter strings can be parsed without a parsing dependency, and regex is only pulled in when users want to parse my complex filters.

Another option is to implement a builder-style API for programmatically constructing more complex filters. This has been proposed in the past (see #404), and I'd really like to see it, but I haven't had the time to implement it myself. But, if someone was willing to write an implementation of that, it would have the side benefit of allowing us to have separate feature flags for the EnvFilter type and for parsing EnvFilters from strings --- and we would only need regex for the second case.

@matklad
Copy link
Contributor Author

matklad commented Jun 14, 2021

Hm, if the syntax is complex enough to cause maintenance concerns it might also be prudent to actually write down the BNF grammar for it, irrespective of a specific parsing technique. Judging by the history of he semver crate, parsing approaches are prone to churn,

@hawkw
Copy link
Member

hawkw commented Sep 11, 2021

@matklad you may appreciate PR #1550, which adds a filter that implements only the static, env_logger-like subset of EnvFilter (e.g. no filtering on fields and spans). Since the required grammar is significantly smaller, this can use a very small handwritten parser that doesn't require regex.

@hawkw
Copy link
Member

hawkw commented Sep 13, 2021

I believe #1550 fixes this, since it's now possible to get simple target-based filtering without depending on regex. We probably still want to consider rewriting the EnvFilter parser in the future, but it's now possible to do most of the filtering people want without the regex dep.

@hawkw hawkw closed this as completed Sep 13, 2021
@malobre
Copy link

malobre commented Sep 13, 2021

FYI, here's how I use the new Targets filter:

use tracing_subscriber::{filter::Targets, prelude::*};

tracing_subscriber::registry()
    .with(std::env::var("RUST_LOG").map_or_else(
        |_| Targets::new(),
        |env_var| {
            use std::str::FromStr;
            Targets::from_str(&env_var).unwrap()
        },
    ))
    .with(tracing_subscriber::fmt::layer())
    .init();

@hawkw
Copy link
Member

hawkw commented Sep 13, 2021

we may also want to add a convenience function like Targets::from_env or similar, as well...

@malobre
Copy link

malobre commented Sep 13, 2021

A SubscriberBuilder::with_targets_filter, similar to SubscriberBuilder::with_env_filter, might also be a good addition.

@detly
Copy link
Contributor

detly commented May 10, 2023

It's not just compile times, binary size is affected as well.

I'm trying to use tracing in an embedded Linux application that currently uses env_logger. Because of #2585 I naively tried to move over to using an EnvFilter-based subscriber approach, and did:

tracing_subscriber::fmt()
    .with_env_filter(
        EnvFilter::builder()
            .with_default_directive(LevelFilter::TRACE.into())
            .from_env_lossy(),
    )
    .with_span_events(tracing_subscriber::fmt::format::FmtSpan::ACTIVE)
    .init();

My binary size jumped from 935K to 1.9M as a result of pulling in regex, making it undeployable.

The Targets filter seems like it might be useful, but I cannot figure out how to simply add it to a SubscriberBuilder as I would with with_env_filter().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants