Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split out "slow" language server features to seperate evaluator #181

Closed
DavyLandman opened this issue Sep 13, 2022 · 10 comments
Closed

Split out "slow" language server features to seperate evaluator #181

DavyLandman opened this issue Sep 13, 2022 · 10 comments

Comments

@DavyLandman
Copy link
Member

Is your feature request related to a problem? Please describe.
As we write bigger LSP DSLs, the slowest calls (most often the typechecker) lock down all the other small fast calls.

Describe the solution you'd like
While rascal still has a GIL, I would like to have a way to split of some of the work to a dedicated evaluator. The rascal-lsp side already does this, it has 3 evaluators, one for outline, one for making summaries, and one for running the rascal-core checker.

I foresee the following options:

Option A: Always start second evaluator for summary
Automatically start a second evaluator, and run the main twice, the second evaluator only keeps the summary closure.

pros:

  • This has the benefit that the end user doesn't have to think about it.
  • when we are in compiled land we might be able to drop it without anyone noticing

cons:

  • a memory penalty over the whole runtime
  • performance penalty at start time (load two evaluators in parallel).
  • It might confuse users that the summary call is not running in the same environment as the rest.

Option B: Extend LanguageServer ADT, let user choose
Add a specific constructor to the LanguageServer adt (or common keyword parameter that is always false) that allows users to configure certain callbacks to be moved to a second evaluator. only when that one is set do we actually start a second evaluator.

pros:

  • Users have to think about it, no surprises.
  • performance penalty is less severe, only when needed, and in that case, not in parallel but sequential

cons:

  • takes longer to get the summary up and running
  • the original load still imports all the modules that are only needed for second evaluator (so more memory pressure)
  • It might confuse users that the summary call is not running in the same environment as the rest.

Option C: Extend registerLanguage, add concept of second/slow module
Add extra (optional/overloaded) params to registerLanguage that points to a secondary module. Both modules are loaded into a different evaluator, and the LSP tasks are split up based on the services provided in each module.

pros:

  • We can split functionality based on imports, allowing for faster imports & less memory pressure
  • More clear separation of what happens in which environment

cons:

  • requires more up front work of the developer to make a clean split
  • harder to unify/undo when we remove the GIL.

Discussion

I think option C is the cleanest, but might also "lock" us into a pattern that we don't need for the compiled version of rascal.

@jurgenvinju
Copy link
Member

jurgenvinju commented Sep 14, 2022

I understand the need for this, and we have to find some solution. None of the above are acceptable in the long run.

The problem is that we break the semantics of Rascal in ways that users can not see for options B and C, in particular the semantics of closures and memoization, and the values of globals. Probably some other stuff that we take for granted but will break when we decide to split things over multiple evaluators that have been written together.

The only thing that seems to make sense is the second/slow module option C; because this implies loading that module in its own state with all of its own semantics, independent of other modules. But this one is, indeed, ugly and unnecessary for the future.

How about option D, where we allow different language implementations to register for the same language. So in that sense we keep the registerLanguage silo like option C, for which it is implied that it runs in its own evaluator. We add to the LSP multiplexer that all registered implementations have to react for every LSP call, and we merge the effects in a reasonable way where necessary or we complain if two languages provide the same service in an incompatible way.

Option D is a useful feature to have in general and for the future, because it makes LSP implementation more modular. Other providers can add features that an original provider did not have without having to break into the existing code. And as a side-effect, it fixes our current interpreter-lock problem.

@DavyLandman
Copy link
Member Author

I like the extension opportunity, it would make the code the multi-plexer even more annoying, but that could work. Option D would make it hard to override existing registrations. So if you are developing a new language, you might want to update the registration, but now it doesn't overwrite it anymore, it just adds it to the list. I like

If we make the second/slow module an overload (for option C) we can deprecate that in the future? So that the original function doesn't have to change?

@DavyLandman
Copy link
Member Author

For option D: we would have to think about the parsing aspect, will they share the same tree or not?

@jurgenvinju
Copy link
Member

I would say not; otherwise you'd have to start synchronizing between the servers, which is what we don't want. They can of course easily share a grammar and create their own parser from this.

@jurgenvinju
Copy link
Member

parsing is fast so I'd say the re-work is a non-issue.

@jurgenvinju
Copy link
Member

Option D would make it hard to override existing registrations.

yes, we need something for that. But if we use the module name as a key, then it would be as simple as that.

@jurgenvinju
Copy link
Member

If we make the second/slow module an overload (for option C) we can deprecate that in the future? So that the original function doesn't have to change?

I'd say we change the semantics of registerLanguage to store not only on extension but also module name. Nothing has to change in the interface, but if the user calls registerLanguage with the same extension and a different module, then the language contributions are added to a new LSP server instance rather than overwriting the existing one.

@DavyLandman
Copy link
Member Author

yes, we need something for that. But if we use the module name as a key, then it would be as simple as that.

true, module name + function name would be good enough. It still won't clear out old registrations. Maybe you've renamed a module? I think we'll need a unregisterLanguage or something just to allow a user to explicitly clear all state.

parsing is fast so I'd say the re-work is a non-issue.

Well, we are parsing a lot, but okay, that's maybe a sacrifice we can accept.

@jurgenvinju
Copy link
Member

unregisterLanguage... we had that in Eclipse but we forgot about it here. Definitely something to add. Let's do void unregisterLanguage(str extension, str moduleName="") and let it unregister all modules if moduleName is not provided?

DavyLandman added a commit that referenced this issue Sep 15, 2022
Option D discussed in #181 with exception that the parser are shared
@DavyLandman
Copy link
Member Author

This design is finished now. Closing this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants