New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable multi-threaded use #20

Closed
robinst opened this Issue Oct 7, 2016 · 29 comments

Comments

Projects
None yet
6 participants
@robinst
Copy link
Collaborator

robinst commented Oct 7, 2016

Background: I'm trying to write a service that receives source code via HTTP, parses it using syntect and then returns some information about it. I'm using Iron for it, but this issue would also occur with other uses. In Iron, the Handler trait which handles requests is required to be Send and Sync, meaning it can be reused between threads.

So, the simplest implementation is to just call SyntaxSet::load_defaults_nonewlines() on every request, then figure out which SyntaxDefinition to use, and then parse the text. Unfortunately, this means that the syntax definitions have to be loaded for each request, which is quite a bit of work that could be avoided.

You can see where this is going. I'd like to have a way to only load SyntaxSet once at startup and then share it between all the request handlers. Or if that's not possible, at least share the SyntaxDefinition between threads (and have a mutex protecting SyntaxSet).

If we try to write that with the current syntect, it would look something like this:

#[test]
fn threading_with_shared_syntax_set() {
    let syntax_set = SyntaxSet::load_defaults_nonewlines();
    let syntax = syntax_set.find_syntax_by_name("rust").unwrap();

    thread::spawn(|| {
        let parse_state = ParseState::new(&syntax);
        let mut f = File::open("testdata/parser.rs").unwrap();
        let mut reader = BufReader::new(f);

        for line in reader.lines() {
            let line = line.unwrap();
            parse_state.parse_line(&line);
        }
    });
}

And it currently fails to compile like this:

[101] % cargo test threading
   Compiling syntect v1.0.0 (file:///Users/rstocker/Projects/rust/syntect)
error[E0277]: the trait bound `std::rc::Rc<std::cell::RefCell<syntect::parsing::syntax_definition::Context>>: std::marker::Sync` is not satisfied
  --> tests/threading.rs:14:5
   |
14 |     thread::spawn(|| {
   |     ^^^^^^^^^^^^^
   |
   = note: `std::rc::Rc<std::cell::RefCell<syntect::parsing::syntax_definition::Context>>` cannot be shared between threads safely
   = note: required because it appears within the type `std::option::Option<std::rc::Rc<std::cell::RefCell<syntect::parsing::syntax_definition::Context>>>`
   = note: required because it appears within the type `syntect::parsing::SyntaxDefinition`
   = note: required because it appears within the type `&syntect::parsing::SyntaxDefinition`
   = note: required because of the requirements on the impl of `std::marker::Send` for `&&syntect::parsing::SyntaxDefinition`
   = note: required because it appears within the type `[closure@tests/threading.rs:14:19: 23:6 syntax:&&syntect::parsing::SyntaxDefinition]`
   = note: required by `std::thread::spawn`

I tried to work around this with some unsafe code, but it panics (as expected) when multiple threads use the same SyntaxDefinition.

As I understand the code currently, it is structured the way it is to avoid having to compile all the regexes eagerly. I wonder if there's a simple way to preserve that while still keeping that nice property. Alternatively, it would be cool if there was a way to precompile a SyntaxDefinition and get a new type that is immutable and thus Send and Sync. By the way, apart from this the library works beautifully :).

@robinst

This comment has been minimized.

Copy link
Collaborator Author

robinst commented Oct 7, 2016

Actually, I now see that SyntaxDefinition is Clone. I thought I tried that but maybe I made a mistake and actually cloned the reference? (I'm still pretty new to all this, sorry for the noise if this issue is invalid :).)

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Oct 7, 2016

It's not simple to just make things thread-safe while maintaining speed because there's a bunch of lazy regex-compiling and possibly some other hidden mutation going on that would race.

The correct way to do what you are trying to do is to load the SyntaxSet once and put it behind a Mutex (and possibly also an Arc) then when handling a request lock the mutex and use the state.
If you want to do things in parallel you can create 4 or so SyntaxSets in a pool so that each one is only being used by one thread at a time but multiple threads can highlight in parallel.

I really need to document how things work more, because I use Weak, which means that some things that the type system says are fine will actually crash at run time. SyntaxDefinitions have references to other syntax definitions, like ERB has a reference to Ruby so that it can highlight embedded tags. So cloning SyntaxDefinitions may crash when you try and highlight embedded code.

@robinst

This comment has been minimized.

Copy link
Collaborator Author

robinst commented Oct 12, 2016

For longer running server processes, paying the price for compiling the regexes on startup is well worth it. I wonder if the current approach with laziness could be made thread-safe by blocking on the regex compilation. If that would be possible without imposing too much of an overhead for the single-threaded case, that would be awesome. I'll add it to my list of interesting projects :).

In the meantime, I've come up with two solutions for my use case (documenting them here in case anyone else runs into the same problem). The first one is to use thread-local storage:

thread_local!(static SYNTAX_SET: SyntaxSet = SyntaxSet::load_defaults_newlines());

// and later:
SYNTAX_SET.with(|syntax_set| ... );

The problem with that is that Iron uses quite a few threads (64 on my machine), which means they'll all get the same syntax definitions loaded. And from testing it takes at least 64 requests for it to have used all threads once.

Then I did the thread pool where each thread has its own SyntaxSet. Here's the result: https://gist.github.com/robinst/faff19b13915373c1ac7ed2a2946639d

Do you have an idea how it could be simplified? I tried some other things but always ended up needing unsafe.

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Oct 12, 2016

Hmmm yah that is ugly.

This all comes back to Rust's poor support for cyclic data structures, and how I seem to have chosen the wrong method of dealing with it. I should have used an arena but instead I used Rc<RefCell<>> and Weak.

The way I thought multithreading should work is that instead of creating worker threads you create a crossbeam::MsQueue and push as many SyntaxSets as you have cores and when you want to highlight you do:

let ss = queue.pop() // blocking
ss.dowork()
queue.push(ss)

But that doesn't work since SyntaxSet uses Rc which isn't Send. Even though in this case it is perfectly safe since only one thread is playing with the reference counts for a given object at a time. The Rust type system doesn't know this. An arena-based approach would handle this fine though.

Also just fixing the regex initialization isn't enough since the stuff still has to be mutable because it has to be mutably linked up during creation of the SyntaxSet. Immutable data structures can't be cyclic, which this needs to be. Unless you use an arena in which case you can just switch from a mutable to an immutable borrow after creation.

@robinst

This comment has been minimized.

Copy link
Collaborator Author

robinst commented Oct 12, 2016

I realized that I can also use thread_local combined with the threadpool crate to have a simpler version of my second solution. Still need to actually code it, but I think it would work.

I've read about arenas with early Rust, but apparently they're not in the standard library nowadays. Do you think you could keep the current structure but use Arc and RwLock and co instead of non-threadsafe Rc?

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Oct 13, 2016

I could but its a tradeoff and I'd prefer to have absolute maximum performance for text editors like https://github.com/google/xi-editor that use syntect. Especially given that there is a way (albeit one that's hard and not at all nice) to use syntect from multiple threads right now, but if I switch to Arc there will be no way short of forking the library for other users to get that performance back.

Refactoring to use an arena is the best of both worlds, but I personally don't have the time and motivation to do so right now.

@robinst

This comment has been minimized.

Copy link
Collaborator Author

robinst commented Dec 12, 2016

Thanks for your time, closing this now. For others running into this, I've gone with thread_local and the threadpool crate now. Basically I have this:

thread_local!(static SYNTAX_SET: SyntaxSet = load_syntax_set());

And then I use it in a thread pool like this (thread_pool is a Mutex<ThreadPool>):

let (tx, rx) = mpsc::channel();
{
    // Only lock pool for submitting the task. Release lock before we block on the result.
    let pool = self.thread_pool.lock().expect("Unable to lock ThreadPool");
    pool.execute(move || {
        let result = SYNTAX_SET.with(|syntax_set| use_syntax_set(&syntax_set));
        tx.send(result).expect("Unable to send result");
    });
}
let result = rx.recv().expect("Unable to receive result");

@robinst robinst closed this Dec 12, 2016

@robinst

This comment has been minimized.

Copy link
Collaborator Author

robinst commented Dec 12, 2016

Oh, I've also seen this thread which was interesting: https://users.rust-lang.org/t/help-me-reduce-overhead-of-regex-matching/5220/28?u=robinst

The excellent regex crate had the same problem, wanting to stay fast for the single-threaded case but also allow easy multi-threaded usage. Maybe it would be worth to have a look at how it's implemented nowadays.

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Dec 31, 2016

I'm reopening this for visibility since it is still an issue that it is so difficult to use from many threads and this issue contains valuable info that people should see.

See #29 where this thread would have been useful.

@trishume trishume reopened this Dec 31, 2016

@ojensen5115

This comment has been minimized.

Copy link

ojensen5115 commented Jan 1, 2017

@robinst What are the advantages of using the threadpool crate in the manner that you're using it? I'm using Iron as well, and what I currently have is:

thread_local! {
    static SYNTAX_SET: SyntaxSet = SyntaxSet::load_defaults_nonewlines();
}

Then later in the threaded context (i.e. a function called by a Handler):

fn highlight(....) {
    SYNTAX_SET.with(|ss| {
        let syntax = ss.find_syntax_by_extension(lang).unwrap_or_else(|| ss.find_syntax_plain_text());
        [...]
    })
}

This appears to work without any issues, although I'm relatively new at this and may of course be missing something important.

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Jan 1, 2017

@ojensen5115 I think that works fine if Iron itself uses a thread pool. If Iron does what some other web servers (in other languages) do which is create a new thread for every request, then you'll pay the cost of loading the SyntaxSet a lot.

For web servers that already use a thread pool for you I think that looks like a great solution. But, I would check if Iron uses a thread pool, it probably does but I'm not certain.

@ojensen5115

This comment has been minimized.

Copy link

ojensen5115 commented Jan 1, 2017

According to Iron docs here, Iron does indeed use a threadpool (defaulting to 8 * num_cpus). Thank you both for all your help!

@robinst

This comment has been minimized.

Copy link
Collaborator Author

robinst commented Jan 4, 2017

@ojensen5115 Yeah, in case of Iron that happens to work.

@Keats

This comment has been minimized.

Copy link
Contributor

Keats commented Mar 9, 2017

Just ran into this (at runtime :() parallelizing some markdown rendering with Rayon.

@trishume Do you have time to detail what you have in mind for the refactoring to Arena?
It could also be mentioned on https://users.rust-lang.org/t/twir-call-for-participation/4821/24
This way someone (hell I might even have a look myself but I should probably read about what an Arena is first...) could pick it up, it would be amazing to have it thread-safe by default!

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Mar 9, 2017

@Keats sure:

Basically it comes down to syntect using Rc<RefCell<Context>> in a bunch of places. The basic way things work right now is it loads a whole bunch of syntaxes from files or a dump, then it does a "linking" step which connects references in the syntaxes together with Weak<RefCell<Context>> references. These references can in theory be cyclical, so I can't do it with normal borrows, which necessitates the Rc and Weak. They also have to be mutable after they are created, for the linking phase, and for lazy regex compilation, which is the reason for the RefCell.

One problem with this is that it is easy to accidentally drop other SyntaxDefinitions without realizing that will cause the Weak references to disconnect and embedded definitions to stop working. Say you use SyntaxSet::load_defaults_nonewlines() and then you only need to highlight HTML so you extract the HTML &SyntaxDefinition, clone it, and then drop the SyntaxSet. You'll be left with an HTML syntax definition that appears to work but fails upon encountering a <script> tag since the Weak references to the Javascript syntax broke.

The other problem is that all the RefCell manipulation makes the code ugly. I don't think the counters affect performance directly much but they way I have to structure the code around them does make things a bit slower I think.

The alternative is to use a mutable arena of Contexts. This could be as simple as a Vec<Context>. Instead of Weak<RefCell<Context>> and Rc<RefCell<Context>> the syntaxes would link to contexts by indexes in the arena. This refactoring would require rejiggering all the references, and then passing an extra &mut ContextArena parameter around everywhere. I'm not entirely sure how difficult it would be. A lot of code would change, but it would probably be pretty mechanical, I can't guarantee there aren't tricky cases I haven't thought of though. In theory you could use some arena crate or petgraph (only the indices, not the edges parts) for the arena, but I don't see them adding much over just a Vec.

The problem is this still doesn't fully solve multithreading, only makes it tractable. The real problem after switching to an arena is lazy regex compilation. Lazy regex compilation is necessary for reasonable startup times because regex compilation is slow. Most of the time a process only uses a few of the many syntaxes and often only a few of the regexes within those. The problem with lazy regex compilation for multithreading is that highlighting actually mutates the syntax structure. One way to solve this is to separate the regex cache out into a per-thread data structure, but that's basically equivalent in difficulty to the current solution of just loading the SyntaxSet once per thread, so isn't much of a gain. Note that I'm not sure the way that the onig compiles Oniguruma with thread safety and even then I don't know if using Oniguruma with thread safety enabled reduces performance a lot.

One possibility is doing something similar to what the regex crate does and having a SyntaxSet manage its own pool of regex caches. If you keep calling it from the same thread it will re-use the same regex cache with low overhead, but if you call from multiple threads it will create multiple regex caches and dole them out automatically with some overhead. But the decision of which regex cache to use would be at higher level calls so it wouldn't incur synchronization overhead at too high of a frequency. The regex cache would be something like a HashMap<*const MatchPattern, Regex, BuildHasherDefault<FnvHasher>>. The parser already does some lookups in similar hash tables so I don't think the overhead of looking the regex up in one would affect performance much.

Basically because of regex issues, I'm not sure there is a solution substantially better than just loading multiple SyntaxSets as discussed above. An ergonomic solution like the one in the regex crate is possible, but would require a major refactoring, and then further work.

@trishume

This comment has been minimized.

@Keats

This comment has been minimized.

Copy link
Contributor

Keats commented Mar 10, 2017

Would it make sense to have a PoolSyntaxSet in syntect that does the pooling directly, instead of having users of the crate to write their own?

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Mar 10, 2017

@Keats that definitely sounds like a reasonable compromise. Good idea.

@Keats

This comment has been minimized.

Copy link
Contributor

Keats commented Mar 13, 2017

@trishume any preference on the implementation?
I don't think I will have time this week but maybe next week, so if anyone want to pick it up please do!

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Mar 13, 2017

@Keats I think something like a ParallelSyntaxSet that uses the thread_local crate would work. It would just provide wrappers around the find_syntax_by_* functions that get a new SyntaxSet for the current thread.

The trick is initializing new syntax sets. Currently there's no way to copy a syntax set across threads, so it would have to be loaded from scratch on each thread. This isn't so bad when loading from a default dump, but isn't great if loading in another way. But I think most people that want multithreading (services/websites/tools) are different than the people who want customizable loading (text editors), so I think only allowing ParallelSyntaxSet to be limited to loading one of the two default sets is fine.

@Keats

This comment has been minimized.

Copy link
Contributor

Keats commented Mar 21, 2017

So something like:

#[derive(Debug)]
pub struct ParallelSyntaxSet {
    init_fn: Fn() -> SyntaxSet,
}

impl ParallelSyntaxSet {
    pub fn new(init_fn: Fn() -> SyntaxSet) -> ParallelSyntaxSet {
        ParallelSyntaxSet {
            init_fn: init_fn,
        }
    }

    pub fn find_syntax_by_token(&self, s: &str) -> Option<SyntaxDefinition> {
        self.init_fn().find_syntax_by_token(s).cloned()
    }

    pub fn find_syntax_plain_text(&self) -> SyntaxDefinition {
        self.init_fn().find_syntax_plain_text().clone()
    }
}

?
I'm not sure where thread_local fits in, it requires Send which can't be impl for SyntaxSet.

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Mar 21, 2017

@Keats no that wouldn't work because clone on a SyntaxDefinition doesn't produce something usable from multiple threads because it still has Rc pointers to the same other syntaxes.

That's weird that the thread_local crate seems to require send. I think my understanding of what it does must be flawed because I thought the whole purpose was to wrap types that aren't Send into a pool that is Send. Not sure what I'm getting wrong there.

I was thinking something like @ojensen5115's thread_local solution earlier in this thread, and the thread_local crate was just a way to have one per object rather than a global static thread_local variable. But one global parallel syntax set pool would probably be fine as well, since they appear to be lazy and won't take resources loading unless you use them.

@Connicpu

This comment has been minimized.

Copy link

Connicpu commented Mar 24, 2017

Have you considered instead of having all of the syntaxes refer to each other directly behind a refcounted pointer, you could just have a Vec of all of your syntaxes, and they can refer to each other by their index in the Vec?

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Mar 24, 2017

@Connorcpu I described that I think that would be a better option in a long comment earlier in this thread. It's just a bunch of work and I don't have time.

@Connicpu

This comment has been minimized.

Copy link

Connicpu commented Mar 24, 2017

Ah, sorry, I missed that!

@williamyaoh

This comment has been minimized.

Copy link
Contributor

williamyaoh commented Jun 6, 2017

@trishume Is the interior mutability on Rc<RefCell<Context>> used only when compiling the regexes for a given syntax for the first time, or is it also used at highlighting time to cache match results or something similar? If the former, it seems like it would be possible to replace ContextPtr et. al. with traits exposing necessary methods for manipulating the inner Contexts; one impl would use Rc<RefCell<Context>> and another Arc<RwLock<Context>>, and SyntaxSet could take a type parameter for which one to use, and then SyntaxSet could be Send + Sync and be easily parallelizable if a library user so chooses without compromising single-threaded performance.

williamyaoh added a commit to williamyaoh/syntect that referenced this issue Jun 7, 2017

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Jun 13, 2017

@williamyaoh it's used at runtime for lazy regex compilation.

That trait idea might work, but I'm guessing it will have substantially worse performance than creating a SyntaxSet pool because of Arc synchronization and locking overhead. I don't expect I'd accept a PR implementing it unless it included benchmarks showing to my surprise that it wasn't dramatically slower. I'd rather either implement a ParallelSyntaxSet like above in syntect or just continue to recommend people use their own pool or thread local implementation.

williamyaoh added a commit to williamyaoh/syntect that referenced this issue Jun 17, 2017

@williamyaoh

This comment has been minimized.

Copy link
Contributor

williamyaoh commented Jun 17, 2017

Okay, I've run into some roadblocks trying to get the library to work with RwLock anyways, so I'll move in that direction.

I added a new type SyntaxSetPool which, unsurprisingly, does thread pooling for SyntaxSets. It didn't seem correct to name it ParallelSyntaxSet since the interface had to be changed to prevent references from leaking across thread boundaries. Will create a pull request.

@trishume

This comment has been minimized.

Copy link
Owner

trishume commented Jun 27, 2017

Closing because of #82, and also see discussion in #78. See #83 for new location for discussing moving to an arena, which seems to be the only thing we can safely do other than recommending rayon and thread_local!. This thread is still linked to from lots of places since it has lots of valuable discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment