Skip to content
This repository has been archived by the owner on Jan 11, 2020. It is now read-only.

Split Compositor and Awesome compatibility up #556

Closed
Timidger opened this issue Jun 11, 2018 · 17 comments
Closed

Split Compositor and Awesome compatibility up #556

Timidger opened this issue Jun 11, 2018 · 17 comments
Labels

Comments

@Timidger
Copy link
Member

In an effort to be compatible with Awesome, Way Cooler removed its threading and message passing that it used in the past to communicate between Lua code and Wayland. In Awesome there's only one thread, and it mostly works out ok.

However, Wayland is such a different architecture that there have been issues we are running into that make this approach flawed:

  • Lua code can lock up the entire compositor, so nothing is rendered and no input can be processed. This is the biggest flaw by far, and it's something Gnome is also struggling with. To solve this you either need a complicated timeout mechanism or threads/processes.
  • Awesome restarts itself by simply execing itself again. This is fine in X because you are simply another client to the X server and the other clients don't care if you're not there. However, in Wayland the compositor is responsible for the clients and so if it goes down they go down as well.
    • As a side note, Way Cooler historically restarted itself by simply tearing down and reconstructing the Lua interpreter. This is no longer possible due to how Awesome libraries are designed around using glib. You need to tear down the glib loop when you tear down Lua, and you currently you can't do that without tearing down Wayland as well. This can be made to work, but is still fragile because glib (and the bindings to it) seem poorly designed around threading and shared resources anyways.
  • Wayland was designed to be more secure and by allowing arbitrary execution of code within the same context as handles to low level primitives like input devices and DRM handles it makes it even less secure than Awesome in X because you're given a false sense of security. The other clients can't do anything, but Awesome code is essentially a privledged client that is left to run unchecked.
    • To adequately solve this you need to put limitations and changes into the Lua libraries and define a more strict security policy when it comes to what code can be executed. However a necessary first step is separating the trusted "core" code with the untrusted non-core code. This is similar to how modern browsers sandbox plugins by making them escrow their requests via IPC to the browser kernel.
  • Awesome has an amazing test suite. They are able to accomplish this because Awesome itself is just a small client to a more complicated and trusted core. This is much more difficult to test if the compositor and the Lua code are connected. It makes it much more difficult to have a mock compositor to test the Lua code, and mock lua code to test the compositor if they are burned together into the same binary.
    • You could work around this as well, it's not untestable. But if you split them up into two processes you can test them by making a mock of the other and seeing how they communicate back and forth.

Currently we have just one thread, and that makes working on the Awesome side really easy. We know no compositor code is being ran so we can just use the various objects directly without having to send requests to an escrow. Same thing in the compositor code: we can grab the data we need directly from Lua (e.g. grabbing Cairo surfaces and rendering them). The ease this puts on development is really high.

However, the limitations it imposes on performance, extensibility, and testing makes it not tenable in the long term.

Currently Way Cooler already is showing issues not even related to the above:

  1. Startup has a slight lag as the Lua code starts executing
  2. There is a very bad memory leak from somewhere. Like 3 gigabytes in 3 minutes bad.
  3. wlroots-rs and glib weren't designed for thread safety so the original design of having a Lua thread is not very tenable and forced us into this decision of using one thread.
  4. A compositor is very complicated (10k of lines roughly, even with wlroots + Rust). Supporting the Awesome libraries is very complicated as well. These two complex things working in tandem is hard to get right with shared resources. Way Cooler is 4k lines already and doesn't actually do anything.
  5. Working in awesome/ and working in compositor/ already feels like you are working on separate programs so the benifit of having work in awesome/ able to reach across to compositor/ is lost because you need to know the underlying details of both in order to work effectively instead of abstracting it out via message passing. They do very separate, very complicated things, to the point that they should be separate programs.

I don't forsee this change disrupting the goal of being 1-1 with Awesome, but is something that must be done if Way Cooler can be an acceptable alternative to Awesome. Having your entire computer locked up because you accidentally did one while true end is not fun or secure.

This will take much more time to develop though. I forsee using a custom wayland protocol for the two halves to communicated and we'd need to have a nice way to serialized the data that's sent back and forth. That will take time away from actually implementing the compositor or awesome interface.

CCing @psychon and @acrisci since you both have contributed a lot to the two pieces of Way Cooler.

@acrisci
Copy link
Contributor

acrisci commented Jun 11, 2018

Have you considered trying to put events into the wayland event loop from the glib thread?

@Timidger
Copy link
Member Author

@acrisci That would solve the immediant problem of restarting, but none of the performance issues related from putting this all in one process. But that is a good point and is a viable solution if we don't want to go with the two process appoarch.

The main downside I see with the two process approach is that there is more orchestration between the two systems which can get complicated. As an example, when the compositor is notified of a client closing then it must inform the Awesome process that the client is dead. Destruction events are a no-no in wayland protocols that also have requests, so we would need to have IDs or something else shared across the two. This would be hidden from the library user, but crucial for the application to work.

Do you have any thoughts on the design itself? In my opinion this new design is much closer in spirit to Awesome, since currently Awesome is very much a client to the X server. This would be the same thing, we just also control and write the compositing server since this is Wayland.

As already mentioned, I also feel the ability to do integration tests using the IPC and the separation of concerns allowing a tighter amount of sandboxing Lua / the compositor make this new design the correct solution.

@acrisci
Copy link
Contributor

acrisci commented Jun 11, 2018

In my opinion this new design is much closer in spirit to Awesome, since currently Awesome is very much a client to the X server.

Yeah, this is sort of what I'm afraid of. It's like your putting a new display server protocol on top of wayland. That's a big effort and I'm not sure it's something we can ever do well. The biggest challenge with this approach is you have to recreate the memory of the compositor within the lua process through IPC. I think you'd likely need to recreate all of the abstractions of the views and outputs to accommodate that.

And at the end of the day, you actually need to work synchronously with the other process whenever common state changes. For instance, when a window closes, you need to inform lua that the window has closed and wait until it runs all of its handlers to continue or else you are going to have race conditions.

@psychon
Copy link
Contributor

psychon commented Jun 11, 2018

Awesome has an amazing test suite.

I disagree ;-) but that's not important here.

There is a very bad memory leak from somewhere. Like 3 gigabytes in 3 minutes bad.

Huh? I haven't noticed this yet, but I am not running way-cooler much anyway... I'd ask valgrind what is going on, but do not have much time currently, sorry.

Having your entire computer locked up because you accidentally did one while true end is not fun or secure.

Basically the same happens with awesome already. You can still interact with the currently focused window (as long as e.g. no window resize is currently in progress), but if that is your calendar, this does not really help much.

forsee using a custom wayland protocol for the two halves to communicated

Hm. This sounds a bit like "let's reimplement X11 in wayland". The important difference is that this protocol can only be used by a single client/process at the same time?

What does Lua do? Lua mainly decides which windows are visible and where. How are input events handled? If you do a round-trip through Lua, then a deadlocked Lua would still mean that no input is processed, so nothing is won. So this needs to be similar to X11 and Lua can register interest in certain kinds of input with the compositor. Note that in awesome, left-mouse-clicks are grabbed by the default config and is used for changing focus (even when you click an already focused client, it is refocused by Lua). So mouse input would still be a problem. Oh and the while true-loop would also mean that you cannot focus other clients anyway...

Perhaps this needs some goals first so that one can decide "no solution achieves these goals". So far, latency is the only thing that sounds doable to me, but even then: Latency of what exactly? By the above, all left-button-clicks need to go through Lua (and in awesome they at least go through the awesome process before they are sent to the X11 client), so latency of this would not be improved.

Oh and at this point, you need X11's freezing input so that the relative order between input events is preserved, right?

@Timidger
Copy link
Member Author

@acrisci @psychon

Awesome has an amazing test suite.

I disagree ;-) but that's not important here.

Ha, fair enough. It has tests, which is pretty good for most projects I've come across.

There is a very bad memory leak from somewhere. Like 3 gigabytes in 3 minutes bad.

Huh? I haven't noticed this yet, but I am not running way-cooler much anyway... I'd ask valgrind what is going on, but do not have much time currently, sorry.

No need to apologize, I can debug that. I just saw that as a symptom of these joined systems, but I need to actually dig in and find where the problem is. If it's actually a bug in rlua that's easy to fix, if it's a bug in glib that's also easy to fix. If it's a problem with having these two systems joined together that's harder to fix. I'll investigate it.

And at the end of the day, you actually need to work synchronously with the other process whenever common state changes. For instance, when a window closes, you need to inform lua that the window has closed and wait until it runs all of its handlers to continue or else you are going to have race conditions

That's a good point, I didn't consider e.g. server side decorations needing to be removed at the same time the client is. If you keep that async you get a race condition and could have some funky things happening. But synchronous just adds runtime overhead. In my mind everything was async because Lua just manipulates the windows, but because it draws stuff related to them it makes it so that's not possible.

Hm. This sounds a bit like "let's reimplement X11 in wayland". The important difference is that this protocol can only be used by a single client/process at the same time?

Yeah, this is sort of what I'm afraid of. It's like your putting a new display server protocol on top of wayland. That's a big effort and I'm not sure it's something we can ever do well.

If you both think we can't do a good job doing that, then that's enough for me to hold off on this.

Perhaps this needs some goals first so that one can decide "no solution achieves these goals".

That probably is a better place to start 😉 .

Here's the problem I forsee: Way Cooler controls a lot of things. To the point that if it locks up you can't even switch TTYs since that's something Way Cooler needs to handle. In current day Awesome if you lock it up you may lose pointer input but you can (usually) still escape using TTY switching.

Maybe this isn't a problem, and more of a symptom of how I edit my Awesome config, but I usually iterate in a running instance. If it locks up I can switch TTYs and kill it. However, if most people are using xephyr to run it locally and test the changes then thats a better workflow that will avoid this problem. In fact it will be easier to do that with Way Cooler because you don't need xephyr it should "just work".

This may seem like a big change for such a small problem, but it also avoids the gnome performance problem that people bring up often (i.e. the overhead of running a scripting language on the same thread as the rendering one). Here it's a little different since the user is supposed to own their config though (and thus opt-in to the overhead), so maybe that is also unfounded. I'm just worried we will build this and have obvious performance problems because we chose this single threaded approach. It would be easier to make that change now.

However, you've both convinced me enough to at least hold off until Way Cooler is usable enough to actually have performance problems. I'm probably overthinking the issue and simply pre-optimizing.

@acrisci
Copy link
Contributor

acrisci commented Jun 11, 2018

Yeah I think I'll have better suggestions when I start the tiling layer, which should be soon, because that has a lot of interactions with lua.

@psychon
Copy link
Contributor

psychon commented Jun 12, 2018

Since this is now closed, let's go a bit offtopic. What does Lua do?

First, since Gnome worries you: What exactly are their latency problems? Is Javascript involved in "the actual drawing" of things, i.e. does the video player lag? Or is this "resizing windows is laggy"? Or is this really about endless loops? (I do not remember a single report of someone sending awesome into an endless loop; only some X11-related deadlocks)

Lua does:

  • decide on the position and stacking order of windows (more or less). This should not be much of a problem, I think...? (perhaps except for live resizing lagging?)
  • draws the "bar at the top of the screen" and titlebars. Right now in awesome, this is delayed as much as possible. People most likely would not notice this lagging behind anyway. But this is against the spirit of wayland. ;-)
    • One problem might be: What happens when a new window appears? Is the window already drawn without decorations until Lua supplies something (to minimise latency)?
    • I do not actually see a problem with the window decorations when closing a window; the compositor knows which decorations belong to the window and can just remove that together with the window. More of a problem might be "windows are re-tiled to make use of the new space"
  • Does some input-related stuff
    • When a grab is active, only Lua gets the event anyway, so not much of a lag is possible
    • Without a grab, the compositor can already know the keys that Lua is interested. Since Lua cannot "block" input, the compositor can even forward the input event to clients before giving it to Lua, I guess.
  • Misc. stuff like setting a wallpaper, implementing some DBus service, .... This shouldn't be a problem (well, the wallpaper is a problem doing startup, but whatever :-))

Did I forget anything? Would you agree that "input" is the only really latency-critical thing in here? Or am I not "every frame is perfect"-enough? If so, my excuse is "so far I only did X11" ;-)

@Timidger
Copy link
Member Author

Here is an example from a year ago about the lag in gnome.

The problem is, as pointed out, that we do too much on the main thread in the compositor.

That is what I'm worried about. Limiting what is possible for the user to do because it will cause problems for the user.

Here's some anecdotal evidence: a friend of mine thought about writing a scheduler for his Lua status bar because it was doing lots of things that weren't exactly pixel perfect. He was (probably?) pre optimizing a bit, but any problems like that will only get worse in Wayland because there's so much more Way Cooler must do.

The deadlocks from an infinite loop are too trivial to worry about you're right. Let's drop that since that is just the user shooting themselves in the foot.

Ultimately I don't know if this will be a problem because in the past Way Cooler used a separate thread for Lua (and another for dbus). It also did much less work. Mostly just setup code.

I worry because I see the startup lag (basic config takes an extra second to be ready compared to eg rootston and we aren't even doing everything yet) and the problems gnome is running into because they didn't design their architecture well. So in trying to think of alternatives that could avoid this. However the one I presented isn't good enough.

Time will tell if this is an issue. It's possible I am underestimating just how fast our code will be.

(As well since this is Rust, we have much better concurrency primitives to work with. However it is still harder to build a concurrent system like the one I proposed.)

@psychon
Copy link
Contributor

psychon commented Jun 12, 2018

Hm, yeah, awesome is kind of multi-threaded with the compositor (= X11 server)...
So, if mouse movements is "the important thing":

  • I know that the X11 server does lots of magic to make the mouse appear fast. Once upon a time, this involved a signal handler (SIGIO) and these days that was replaced with an input thread. AFAIK the thread just reads input and puts it in a queue to process, but I am not entirely sure. Also, I am not sure which problem this solves (dropped input, because events were not read from the kernel fast enough? That sounds unlikely to me...)
  • Lua is "not really" involved in mouse handling. Things like enter notifications can easily happen asynchronously. However, I do not see how to easily make cursor movements asynchronous and everything else synchronous.
  • Does moving the cursor even require a repaint? Isn't there some kind of sprite that the hardware adds ontop of the resulting image and that is used for cursor movements? If so, optimising for cursor movements should be possible (in some thread-safe part of wlroots?).

@Timidger
Copy link
Member Author

So, if mouse movements is "the important thing":

I wouldn't say mouse movement is the important thing, at least in my mind.

The user should always have a way to send input and, in particular, terminate the compositor if it misbehaves. In previous iterations of Way Cooler it is an explicit feature that, even if you don't define a keyboard binding for it, there will always be a way to exit assuming input can be still be consumed (e.g. if there isn't a bug in the Rust code. The Lua code, remember, was running in a separate thread. It could lock up, but then you could always hit the restart or quit keybindings and that will still work because that doesn't go through Lua).

This was fundamentally a built in feature for Way Cooler, but this might just be philosophical differences between the two projects. Awesome more provides a set of tools/APIs to really make a window manager, whereas previously Way Cooler had a lot of it built-in with the Lua code being more window dressing.

If anti-foot-gun-prevention-techniques are not within the scope of Awesome then they are not within the scope of Way Cooler (at least for the run-up to 1.0).

However, I do not see how to easily make cursor movements asynchronous and everything else synchronous.

Yes I'm not sure how to make that work either. You could probably hack it to work, but I don't think there's any literature/implementations on the subject since most compositors are single threaded.

Does moving the cursor even require a repaint? Isn't there some kind of sprite that the hardware adds ontop of the resulting image and that is used for cursor movements? If so, optimising for cursor movements should be possible (in some thread-safe part of wlroots?).

Hardware cursors are an opt-in feature of wlroots. But again, it's not so much optimizing for "mouse input" as it is optimizing for not running into "oh now you have to restart your computer because there's a bug in some Lua code somewhere".

@acrisci
Copy link
Contributor

acrisci commented Jun 14, 2018

Ok, I'm in the Lua code now (doing the "client") and I need to access the Server to get a list of the views and turn them into lua objects for client.get() and running into this issue of sharing state between the threads now.

I'm getting loads of errors for the wlroots types not being thread safe because they don't implement Sync.

So what's the alternative to this approach for solving this issue? Do we want to make wlroots-rs thread safe?

@Timidger
Copy link
Member Author

@acrisci There are no threads in Way Cooler any more btw. Rust is complaining because user data you share to rlua must implement Send (see this rlua thread).

So I ran into this problem to when implementing the Screen. My solution was...not the best.

This is technically ok, but *Handle types in wlroots-rs cannot be Send because then they could be sent across threads. This is technically ok because there's only one thread and there's no way they can be desynced here (as that handle is only used when there's no way for the Rc underneath it to be modified).

To make wlroots-rs thread safe we'd need to turn all of the Rcs into Arcs. That will impose overhead that I don't really want to do especially since

  1. Most people don't actually want to run multiple threads and pass handles around anyways
  2. For our use case it can continue to be Rc and it'll be fine.

@Hi-Angel
Copy link
Contributor

Hi-Angel commented Dec 7, 2018

So, as I'm reading the conclusion was to not break the processes up. How then Awesome and Way-Cooler ended up being separate processes?

@Timidger
Copy link
Member Author

Timidger commented Dec 8, 2018

@Hi-Angel

To clarify: there are two processes. One is the way-cooler binary which is the Wayland compositor. It acts as a server to the awesome binary which functions similar to the existing X11 window manager except it talks to Way Cooler instead of the XOrg server.

@Hi-Angel
Copy link
Contributor

Hi-Angel commented Dec 8, 2018

Yeah, that's what I'm wondering about. If I'm reading correctly, when this issue started Awesome and Way-Cooler were in a single process. And the issue was closed with conclusion that splitting them is not worth the effort.

So I wonder, have this discussion been continued somewhere else, which ended up in decision to split processes up?

@Timidger
Copy link
Member Author

Timidger commented Dec 8, 2018

@Hi-Angel

The discussion was partly in idc (I don't have logs, sorry) and detailed in this blog post.

Edit: also this issue: #561

@Hi-Angel
Copy link
Contributor

Hi-Angel commented Dec 8, 2018

Well, I wouldn't say it's really detailed in there, because from what's written in the blog-post a simple multithreading would do the job. I've read in some other issue though that glib+Lua issues, which, like, would be hard to fix with multithreading though. I'll look at #561.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants