
Towards a text editor construction kit #1187

Open
cmyr opened this issue May 5, 2019 · 24 comments

@cmyr
Member

cmyr commented May 5, 2019

Towards a text editor construction kit

One of my goals in writing the rust playground for macOS was to see how much work would be involved in reusing components from the xi-editor core library (xi-core).

Basically: xi-core contains all the logic for handling user actions (such as editing the buffer or modifying the selection) but currently the architecture of xi-core is sort of all-or-nothing; it assumes that if you're using it you're using it in a certain way.

As an example of this: currently, all of the logic for various editing operations is attached to the Editor struct. This means that if you want to use xi-rope as the backing store for some text editor, you can't currently use xi's implementations of complex operations like backspace without also using xi's View struct, Editor struct, and plumbing.

This is less than ideal. We've put a lot of effort over the past few years into writing good, high-performance implementations of a bunch of text-editing primitives, and it would be nice if they were easy to use.

Improving code reuse and modularity

I believe that the xi-editor project should provide a set of components and functions for various text editing tasks, and these components and functions should be mix-and-match; you can use whatever works for you, and fill in the blanks however you wish.

This approach has obvious practical advantages, but it has some secondary advantages as well; most importantly, it encourages us to write code that is easier to test and profile.

Below, I will outline what this means and some specific things I think could be improved.

Good primitive text editing data structures

The most basic thing we would like to provide is a set of well-tested and robust data structures for text editing that work well together.

In general, we don't have a ton of work to do here; this is all (mostly) already in place, and working.

  • A text buffer: xi-rope is in quite good shape.
  • Selections: the current implementation works; we might consider replacing it with an immutable version, but I'd want that project to be backed up with good profiling.
  • Line breaks (mostly fine)
  • Style spans (okay for now)
  • Annotations (in progress)

Move selection modification out of View

Functions that set and transform selections should be callable directly. For these, we need the ability to navigate between lines (which may be either visual (soft breaks / word wrap) or logical (newline characters)). Currently these functions (in movement.rs) take a View struct, which exposes functions like offset_of_line and line_of_offset. To make these work without needing a View, we'll need some sort of abstraction (like a trait) over this behaviour.

Ideally, then, all movement becomes a function with a signature like,

// LineMovement is a trait for things that implement fns like `line_of_offset`.
fn move_left(text: &Rope, selection: &Selection, view: &dyn LineMovement) -> Selection;

(it may be worth making the HorizPos a field on SelRegion to simplify some of this logic.)
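For concreteness, a minimal sketch of what such a trait might look like (the method names mirror the existing View methods, but this is illustrative, not a settled API):

use xi_rope::Rope;

/// Minimal abstraction over line bookkeeping, so that movement functions
/// don't need a full `View`. Lines may be visual or logical, depending on
/// the implementor.
trait LineMovement {
    /// The line containing the given byte offset.
    fn line_of_offset(&self, text: &Rope, offset: usize) -> usize;
    /// The byte offset at which the given line starts.
    fn offset_of_line(&self, text: &Rope, line: usize) -> usize;
}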

Move editing functions out of Editor

Editing operations need a few pieces of state: the current buffer, the current selections, the 'view' (or whatever is tracking line state), and likely certain config settings; for instance, the insert_tab edit operation needs to know whether to insert a tab character or spaces.

Edit operations produce deltas. In the general case, the selection state following an edit can be inferred (any selection regions are collapsed, as their contents are replaced/deleted; carets move based on the inserted or deleted text) but in certain cases an edit can also produce explicit selections, such as in transpose, where the selection moves along with the swapped regions.

Edit operations should be moved out of Editor, and be callable directly.

fn backspace(
    text: &Rope,
    selection: &Selection,
    view: &dyn LineMovement,
    config: &Config)
-> (Delta, Option<Selection>);

fn indent(
    text: &Rope,
    selection: &Selection,
    view: &dyn LineMovement,
    config: &Config)
-> (Delta, Option<Selection>);
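For context, a caller might drive one of these functions roughly as follows; this is a sketch that assumes the delta can be applied to the rope (as xi-rope's Delta::apply allows), and the selection-mapping helper is invented purely for illustration:

fn perform_backspace(
    text: &mut Rope,
    selection: &mut Selection,
    view: &dyn LineMovement,
    config: &Config,
) {
    let (delta, explicit_sel) = backspace(text, selection, view, config);
    // Apply the delta to produce the new buffer contents.
    let new_text = delta.apply(text);
    *text = new_text;
    *selection = match explicit_sel {
        // Some edits (e.g. transpose) hand back an explicit selection.
        Some(sel) => sel,
        // Otherwise map the old carets through the delta (invented helper;
        // the real API for this may look different).
        None => map_selection_through_delta(selection, &delta),
    };
}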

Alternatively, we could use some sort of context for this state, and hang edit functions off of that:

struct EditContext<'a> {
    text: &'a Rope,
    selection: &'a Selection,
    view: &'a dyn LineMovement,
    config: &'a Config,
}

impl<'a> EditContext<'a> {
    fn backspace(&self) -> (Delta, Option<Selection>) { .. }
    fn insert_text<T: Into<Rope>>(&self, text: T) -> (Delta, Option<Selection>) { .. }
}

It's unclear that this offers much advantage.

Find & Replace

Find and replace should be moved out of View. There's room for some design thinking here; for instance, I could imagine wanting to use a FindContext struct that keeps track of find state, and implements methods for moving between results.
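As a starting point for that design thinking, a FindContext might look something like this; purely illustrative, not an existing xi API:

use std::ops::Range;
use xi_rope::Rope;

struct FindContext {
    query: String,
    case_sensitive: bool,
    is_regex: bool,
    /// Byte ranges of known matches, in document order.
    matches: Vec<Range<usize>>,
    /// Index into `matches` of the currently highlighted result, if any.
    current: Option<usize>,
}

impl FindContext {
    /// Recompute (or incrementally refresh) `matches` against the text.
    fn update(&mut self, _text: &Rope) {
        // ...search logic elided...
    }

    /// Advance to the next match, wrapping around at the end.
    fn next_match(&mut self) -> Option<&Range<usize>> {
        if self.matches.is_empty() {
            return None;
        }
        let next = match self.current {
            Some(i) => (i + 1) % self.matches.len(),
            None => 0,
        };
        self.current = Some(next);
        self.matches.get(next)
    }
}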

Other editor features

We should provide good standalone implementations of things like syntax highlighting, line breaking, auto-indent, and diffing, which can be computed incrementally where possible.

In addition, it would be nice if we could provide a basic non-CRDT undo implementation.
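To sketch what that could look like (deliberately simplistic: a real implementation would store inverse deltas and group edits, rather than keeping whole-rope snapshots):

use xi_rope::Rope;

/// A toy non-CRDT undo stack: each entry is the buffer state before an edit.
struct UndoStack {
    undos: Vec<Rope>,
    redos: Vec<Rope>,
}

impl UndoStack {
    /// Record the buffer state just before an edit is applied.
    fn record(&mut self, before: Rope) {
        self.undos.push(before);
        self.redos.clear(); // a fresh edit invalidates the redo chain
    }

    fn undo(&mut self, text: &mut Rope) {
        if let Some(prev) = self.undos.pop() {
            self.redos.push(std::mem::replace(text, prev));
        }
    }

    fn redo(&mut self, text: &mut Rope) {
        if let Some(next) = self.redos.pop() {
            self.undos.push(std::mem::replace(text, next));
        }
    }
}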

Other misc stuff: encoding / decoding files? a general purpose file system watcher? More generic support for configuration?

Conclusion

This is intended as a discussion issue; if there is general agreement with this approach, I'll open a tracking issue for specific tasks. If anyone has any thoughts about this, feel free to comment.

@cmyr cmyr added the discussion label May 5, 2019
@nangtrongvuon
Member

nangtrongvuon commented May 6, 2019

The main goal here seems to be reducing state needed to perform common editing operations - I tend to mostly agree with what you have written, though I do have some questions:

  1. Do we want to move these operations into a plugin of its own? I assume there'd be something like a xi-core-plugin that runs, which for xi would be the main way of interacting with the rope backend.
  2. What should all of this accomplish? Are we doing a xi-core crate that acts as a text editing framework for prospective editors? How should xi-editor differ from those if so? Just curious and wanted to get your thoughts on this.

On that note, I think it would be beneficial to make use of GitHub's project management feature, though it might be a lot of work that you might not want to take on. Perhaps let's probe around and see who's willing to take on some of this stuff? Personally I'm open to being assigned anything here, or I could even help with some of the management work as well (If so, let's talk about this on Zulip 😄)

@raphlinus
Member

I've given this some thought; my notes are provided here for discussion.

  1. I very much agree with the idea of unbundling basic editing ops. In many cases, they represent very careful attention to get things right, and of course the data structures support scaling to large documents, which I think is an appealing feature.

  2. Having immutable functions return a delta is a good signature, agree.

  3. My personal feeling is that there are parts of xi that have not turned out well - the async plugin system and CRDTs, in particular. I'm fine with abandoning both. I think a non-CRDT undo is a good thing, and wouldn't be too hard to do.

  4. Many editing operations need access to measurement, for example up and down arrow, to find the closest "horiz" to the previous cursor position. I think a simple way to unbundle this is to have a measurement trait implemented by the client. This could be super simple for console-flavored implementations (monospace font etc), more complex as text gets richer.

  5. I'm thinking about how this stuff might be useful for implementing, eg, rich text, without adding a lot of complexity to the core. One way to do that is to have the client responsible for maintaining those extra data structures (with of course the option of using the interval tree from xi-rope).

  6. A lot of the potential value of such a kit is getting highlighting working correctly, using syntect. The plugin-based implementation has some very good engineering around reducing memory use and latency, but also extra unneeded complexity about keeping the "cache" in sync. With a synchronous, in-memory implementation it could just reference the rope as needed. Also, the current implementation does OT on the style spans, which is unnecessary and adds complexity (in the case of a conflict which is currently resolved by OT, it's fine to just throw away the style spans and recalc, because they're a stateless function of the original).

  7. If a very simple, cut down editing library existed, I would probably use it for the text control in druid. It might be useful to focus on that use case first as a practical focus, rather than dreaming about potential future applications.

  8. There are other things we can do, especially short term, to make infrastructure more useful. For example, the serde serialization in xi-rope should be optional, because many of the use cases for it aren't going to be sending (eg) deltas over json-rpc.

Lastly, regarding my personal directions on some of this stuff. Text and code editing are going to be on the back burner for the next few months, as I have my plate more than full of other things. That said, a great deal of my current work on UI infrastructure has been strongly motivated by the text editor case. In particular, the new 2D renderer I've been prototyping is a direct evolution of the current xi-mac text plane, and I can see it replacing it, with the added advantage of being cross-platform. I think it's reasonable to expect that druid will either prove itself as a viable UI toolkit in the next few months, or we can look at other alternatives. I can see building out the original dream of an extremely performant cross-platform editor on that stack. But I think that work should depend on druid being more mature and stable. (And if druid falters for whatever reason, it would be reasonable to build an editor on whichever Rust toolkit turns out the best)

Finally, people reading this issue should check out makepad, which looks like a viable approach to building an editor in Rust. The current makepad codebase makes a number of assumptions that limit it to small files. So one thing to explore is whether a xi-based editing toolkit could easily be retrofitted to makepad to improve it in the areas where we've invested.

@scholtzan
Member

Good to see this set in motion! I like the idea of moving editing operations out and making it (hopefully) easier to test in the process.

I totally agree that the current state of find and replace has a lot of room for improvement and should be moved out of View. I have the feeling that things could get tricky here very quickly, especially if we want to continue supporting incremental find, and maybe extend find to search across files, etc. But with a bit of planning this shouldn't be an unsolvable problem.

Also, one thing that should probably be considered is making line wrapping less painful. There are some thoughts written down in #1135 but not sure if there are already some more specific ideas for improvements.

Another part I'm pretty interested in hearing more about (though it's only indirectly related to the whole text editor kit thing) is about improving the plugin system. I saw some cool ideas buzzing around, but there doesn't seem to be a clear path yet. Anyway, that's probably something that should rather be discussed in #845

@chriskrycho

@raphlinus, can you elaborate on this, especially the CRDT part? (I'm doing research in a similar space right now and evaluating various tradeoffs and options, and Xi's choice of CRDT has been an important data point there.)

My personal feeling is that there are parts of xi that have not turned out well - the async plugin system and CRDTs, in particular. I'm fine with abandoning both. I think a non-CRDT undo is a good thing, and wouldn't be too hard to do.

@cmyr
Member Author

cmyr commented May 9, 2019

@nangtrongvuon

  1. Do we want to move these operations into a plugin of its own? I assume there'd be something like a xi-core-plugin that runs, which for xi would be the main way of interacting with the rope backend.

This is really independent of plugins. Nothing in terms of how xi-editor currently works would change; it's just that inside Editor we would call out to functions declared elsewhere, instead of functions that are in an Editor impl block.

  2. What should all of this accomplish? Are we doing a xi-core crate that acts as a text editing framework for prospective editors? How should xi-editor differ from those if so? Just curious and wanted to get your thoughts on this.

This just makes it easier for other projects to reuse xi-core code. xi-editor doesn't really change, it's just structured slightly differently.

On that note, I think it would be beneficial to make use of GitHub's project management feature, though it might be a lot of work that you might not want to take on. Perhaps let's probe around and see who's willing to take on some of this stuff? Personally I'm open to being assigned anything here, or I could even help with some of the management work as well (If so, let's talk about this on Zulip 😄)

Heh, yes this might be a good idea. Project management is none of our strong suits. :)

@raphlinus:

We're broadly in alignment here, so I won't address every point.

  3. My personal feeling is that there are parts of xi that have not turned out well - the async plugin system and CRDTs, in particular. I'm fine with abandoning both. I think a non-CRDT undo is a good thing, and wouldn't be too hard to do.

This is a tough call. The CRDT has definitely proven fragile, but I think there are still some ideas here that could be worth experimenting with. I'm not currently advocating getting rid of it wholesale, but I think it would be nice to provide an implementation of undo that worked independently.

  4. Many editing operations need access to measurement, for example up and down arrow, to find the closest "horiz" to the previous cursor position. I think a simple way to unbundle this is to have a measurement trait implemented by the client. This could be super simple for console-flavored implementations (monospace font etc), more complex as text gets richer.

This is a good point; I hadn't thought about how vertical movement cares not just about line breaks but also about actual measurement. Line breaking and width measurement are tightly coupled, and it might be worth treating these as a bundle; if you have line breaks, you have to have width measurement.
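A rough sketch of what that bundle could look like; the trait and method names here are invented for illustration:

use xi_rope::Rope;

/// Width measurement, implemented by the client (trivial for a monospace console).
trait WidthMeasure {
    /// Width of a piece of text, in whatever unit the client uses.
    fn width(&self, text: &str) -> f64;
}

/// Line-break information, which in turn depends on width measurement.
trait LineBreaks: WidthMeasure {
    /// Visual line containing the given byte offset (soft breaks included).
    fn visual_line_of_offset(&self, text: &Rope, offset: usize) -> usize;
    /// Offset on `line` closest to horizontal position `horiz`, e.g. for
    /// keeping a caret's column during up/down movement.
    fn offset_for_horiz(&self, text: &Rope, line: usize, horiz: f64) -> usize;
}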

  5. I'm thinking about how this stuff might be useful for implementing, eg, rich text, without adding a lot of complexity to the core. One way to do that is to have the client responsible for maintaining those extra data structures (with of course the option of using the interval tree from xi-rope).

I've been thinking about this as well, and in particular I've been thinking about the difference in Cocoa between NSString and NSAttributedString. I think I could imagine having a similar concept in xi; representing these interfaces as traits, you'd get something like,

/// Implemented by `Rope`
trait TextStore {
    fn edit(&mut self, iv: Interval, new: &str);
    fn slice(&self, iv: Interval) -> Cow<str>;
}

enum Attribute {
    Font(FontObject),
    ForegroundColor(Color),
    BackgroundColor(Color),
    // etc
}

type Attributes = Spans<Attribute>;

/// Implemented by something that wraps both `Rope` and `Spans`
trait RichTextStore: TextStore {
    fn add_attributes(&mut self, interval: Interval, attrs: Attributes);
    fn set_attributes(&mut self, interval: Interval, attrs: Attributes);
    fn get_attributes(&self, interval: Interval) -> Attributes;
}

This would provide a fairly simple, coherent API for managing rich text, without the consumer having to worry about things like updating spans in response to edits; we could just do that ourselves behind the scenes.
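For example, a consumer might use it roughly like this (a sketch only: the single_span helper and the way an Attributes value is built are invented, and the interval convention follows xi-rope's (start, end) byte offsets):

/// Hypothetical usage: insert some text and color the first word.
fn style_hello<T: RichTextStore>(doc: &mut T, color: Color) {
    doc.edit(Interval::new(0, 0), "hello world");
    // `single_span` is an invented helper that builds an `Attributes` span
    // set covering one interval.
    let attrs = single_span(Interval::new(0, 5), Attribute::ForegroundColor(color));
    doc.set_attributes(Interval::new(0, 5), attrs);
    // Subsequent edits would shift or trim these spans behind the scenes,
    // per the paragraph above.
}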

  6. A lot of the potential value of such a kit is getting highlighting working correctly, using syntect. [...]

I agree this is something we should do, and maybe it even fits into the above pattern; you could have the concept of a StyleProvider that was attached to a given buffer, and which would be called on each edit. Syntect could be one such provider.
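One possible shape for that, sketched against the Attributes alias from the snippet above (the trait is invented here, and the syntect wiring is only described, not shown):

use xi_rope::{Interval, Rope};

/// Something that can (re)compute style spans for a region of the buffer.
/// A syntect-backed provider would re-highlight the lines touched by `dirty`.
trait StyleProvider {
    /// Called after each edit with the region whose styles may have changed;
    /// returns recomputed spans for that region.
    fn styles_for(&mut self, text: &Rope, dirty: Interval) -> Attributes;
}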

  8. There are other things we can do, especially short term, to make infrastructure more useful. For example, the serde serialization in xi-rope should be optional, because many of the use cases for it aren't going to be sending (eg) deltas over json-rpc.

Yep, this is a good idea as well.

@scholtzan

I totally agree that the current state of find and replace has a lot of room for improvement and should be moved out of View. I have the feeling that things could get tricky here very quickly, especially if we want to continue supporting incremental find, and maybe extend find to search across files, etc. But with a bit of planning this shouldn't be an unsolvable problem.

How this will all work incrementally is a really good question. I do think that the incremental approach remains important generally, but there might be some way of combining the incremental and immediate versions; for instance, if we think of incremental tasks as working like futures, where you have some 'TaskContext' that you poll and which performs units of work, the immediate version could just be the incremental version being polled until it's complete.
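To make that shape concrete, a minimal version might look like this (names invented):

enum TaskStatus {
    Pending,
    Complete,
}

/// An incremental task performs one bounded unit of work per call to `poll`.
trait IncrementalTask {
    fn poll(&mut self) -> TaskStatus;
}

/// The "immediate" variant is just the incremental variant driven to completion.
fn run_to_completion(task: &mut dyn IncrementalTask) {
    while let TaskStatus::Pending = task.poll() {}
}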

Also, one thing that should probably be considered is making line wrapping less painful. There are some thoughts written down in #1135 but not sure if there are already some more specific ideas for improvements.

Agree with this. I think that if we stop expecting width measurement to happen over RPC, this task just becomes much simpler.

Another part I'm pretty interested in hearing more about (though it's only indirectly related to the whole text editor kit thing) is about improving the plugin system. I saw some cool ideas buzzing around, but there doesn't seem to be a clear path yet. Anyway, that's probably something that should rather be discussed in #845

One thing that this proposal is really not talking about is plugins. Maybe it should be. My thinking right now is that this work isn't really about xi-editor as it exists today; it's more just about making the code that we have more reusable in other projects.

@raphlinus
Member

This reply is going to be scoped to the CRDT only. I have been meaning to write a larger retrospective of xi-editor; consider this the CRDT portion of that.

I took on investigation of OT/CRDT with the hope that there was a "right answer" to collaborative editing, only needing to be discovered, and that one approach could unify a number of asynchrony-related issues, including IME being in a different process (actually the earliest motivation, as Android has serious asynchrony correctness problems around this), syntax highlighting being potentially slower than typing, and the effect where annotations from Language Server-style modules are applied to incorrect regions in the face of concurrent editing, not to mention collaborative editing itself.

I also hoped that a collaborative editing engine might be implemented, possibly with a tricky implementation, but with the complexity encapsulated so that clients of the engine wouldn't have to worry about it.

In hindsight, I believe these were reasonable assumptions, but they turned out to be wrong. I'll go into them in detail.

There is a "right answer" to collaborative editing

Here I was especially hopeful that CRDT would turn out to be that "right answer," because it is on a sound mathematical footing (monotonic semi-lattices). Those mathematical structures might sound exotic to many, but are comforting and familiar to me (I had read and loved Dijkstra's Predicate Calculus and Program Semantics when it came out, so it would have been especially rewarding to see a new application of lattices).

Indeed, the literature of CRDT does specify a mathematically correct answer (see Specification and Complexity of Collaborative Text Editing for a careful exposition of the specification as well as a result that it requires some form of tombstone to implement correctly). But this does not always line up with what humans would find the most faithful rendering of intent. Take for example, a document initially "A B C", with one user deciding to change "B" to "D", and the other user deciding that sentence needs rewriting, with "E F G" as the result. Clearly either "A D C" or "E F G" is a reasonable result, but a CRDT essentially demands that the result be either "DE F G" or "E F GD", the tie resolved through timestamps or some similar mechanism.

Different use cases for asynchronous editing have different intent:

  • In actual collaborative editing, you probably want to preserve the inserts, so the CRDT result is not unreasonable; at least you can likely get to a good state just by deleting stuff.

  • IME is a complex beast in and of itself (as @lord can attest from our work on this in Fuchsia). Suffice it to say that there is an impedance mismatch between platform IME protocols and a CRDT, and the fact that xi still has a somewhat hacky approach to IME makes my point.

  • For syntax highlighting, any form of OT or CRDT is overkill; the highlighting is a stateless function of the document, so if there's a conflict, you can just toss the highlighting state and start again.

  • For annotations from Language Server, you might not want to toss all annotations in the face of concurrent editing (as this might add to latency between a result being available and presented to the user), but a simple approach can work well, no need to solve the TP2 puzzle. In cases that get really hairy, it is fine to discard and re-analyze, as it's unlikely that a heavily OT'ed result will be perfectly accurate in any case.

  • One of the trickiest cases is insertion of whitespace and bracket balancing. The problem here is that, unlike the previous two cases, it's stateful. The plugin edits are direct responses to user-initiated edits. Another complicating factor is that these are not just string edits, the cursor position is affected (it's common for some whitespace to be inserted before the cursor, and say a newline after). This motivated the entire "priority" system, which I tried to explain in Rope science part 10.

My conclusion now is that these different cases are actually different, and a "one size fits all" approach does a disservice.

Encapsulating complexity

Successful software project management is mostly about taming complexity. A happy case is when you can define a module that might be complex internally, but that complexity is hidden from the rest of the system. An extreme example of this is a SAT solver - these are complex beasts, on which many academic careers have been made, but you can write a brute force one in an afternoon. Within text editing, an example I'd hold up is xi-rope. It has a lot of cool functionality (you can iterate by Unicode grapheme cluster!) and excellent performance characteristics (especially worst-case, which is where ropes shine), but it isn't significantly harder to use for editing tasks than strings.

Sadly, contrary to my hopes, CRDTs do not have this property. By now we have lots of examples where trying to design features around the structure imposed by the CRDT turned out to be a lot more complicated than it would be in a more synchronous world - we saw the auto-indent stuff above, difficulty getting the selection right in transpose (which required us to invent a new "drift" mechanic), interactions between selection and undo, and, my biggest personal regret, soft spans, where we had a promising prototype, but it ultimately was abandoned because of impedance mismatch with the CRDT model.

In all of these cases, I believe the complexity exported by the CRDT is responsible, and we'd have solutions in hand if it weren't for that.

Actual collaborative editing is a lot harder

For a while, xi had difficulty making up its mind whether it was actually a collaborative editor or not. Tristan's intern project got us to a tech demo of collaborative editing on Fuchsia (it was very heartening to see!), but we always knew that building a real collaborative editor was a major increase in scope. At the back of my mind, I hoped that a solid CRDT engine would evolve into something that would relatively easily support collaborative editing.

In retrospect, this was a pretty bad case of YAGNI. It would have been better to either commit to doing collaborative editing, and marshal the resources to make that happen, or to have it out of scope and reduce the complexity.

CRDT is a tradeoff

I still believe you could build a collaborative editor on the basis of the original xi-editor roadmap, CRDT and all, and that would have some very desirable properties. This promise is one reason it's taken me so long to get to my current thinking.

Aside from the mathematical foundation, the main thing CRDT buys you is the ability to propagate edits through a mesh-like network, exploiting local connections, as opposed to requiring a central server. I think that's appealing if you're actually editing in such an environment. For large scale organizations (I'm familiar with those and have had great conversations more recently with @dsp), I think it does make sense to treat the set of analysis and review tools as a form of collaborative editing, but those are also cases where a centralized server is completely viable.

As a side note, I've heard an interesting theory about why CRDT-type solutions are relatively popular in the cloud. To do OT well, you need to elect a centralized server, which is responsible for all edits to a document. I believe the word for this is "server affinity," and Google implements it very well. They need to, for Jupiter-style OT (Google Docs) to work. But when you're getting cloud resources from platforms like AWS, server affinity is much more difficult to attain, so in those cases you favor an eventually-consistent, federated network, where any server can accept any edit in any order, and it'll all work out.

The flip side of the tradeoff is that you have to express your application logic in CRDT-compatible form. How do you represent the merge rules for soft spans as a monotonic semi-lattice? I bet it's possible (I actually love those kinds of puzzles), but it's probably a Masters level research topic, while doing it synchronously is something any of us could do. If we had a hard requirement to do collaborative editing over a mesh network, and resources to make that happen, I bet we could solve that and all the other problems. Indeed, this is pretty much what I hoped to be doing a couple years ago.

Conclusion

So I come to the conclusion that the CRDT is not pulling its (considerable) weight. When I think about a future evolution of xi-editor, I see a much brighter future with a simpler, largely synchronous model, that still of course has enough revision tracking to get good results with asynchronous peers like the language server. The basic editing commands, including indentation and paren matching, are more simply done as synchronous operations, and I think avoiding that complexity is key.

I hope this view into my thinking is useful. I might expand this into a blog post or at least link to it when I do get around to writing a retrospective.

@chriskrycho

For my part at least, that's extremely helpful. Thank you for the thorough and considered reply!

@gritzko

gritzko commented May 11, 2019

Actually, soft spans may work well in the CRDT framework.
They have to be part of the data type, though.
That may even resolve some CRDT-specific glitches, e.g. concurrent typo correction.

It is easier to express in Causal Tree terms than in RGA terms... basically, you may have "soft" symbol subtrees that blend together if their values match.
Consider

H<-E<-L<-O

corrected concurrently to

H<-E<-L<-(L)<-O
      ^--(L)

where two soft (L) atoms have different ids, but nevertheless get displayed as a single character.

@espadrine

Take for example, a document initially "A B C", with one user deciding to change "B" to "D", and the other user deciding that sentence needs rewriting, with "E F G" as the result. Clearly either "A D C" or "E F G" is a reasonable result, but a CRDT essentially demands that the result be either "DE F G" or "E F GD", the tie resolved through timestamps or some similar mechanism.

An essential property for intent preservation is that the end state must have a possible linearizable history of expressed UI operations. Here, neither "DE F G" nor "E F GD" has one. CRDTs, especially CmRDTs, can do it when operations are not character-based but span-based, like UI operations:

  1. Writer 1: Delete characters between id(A) and id(C).
  2. Writer 2: Delete characters between id(start) and id(end).
  3. Writer 1: Insert " D " between id(A) and id(C).
  4. Writer 2: Insert "E F G" between id(start) and id(end).

In the CRDT literature, I worry that the emphasis is often on minimizing worst-case algorithmic complexity, instead of maximizing intent preservation. Pragmatically, it is fine to have O(n log n) when n is usually small. In this example, a workable design would be to have a history with immutable, totally ordered timestamps, and upon reception of outside operations, undo to the oldest parent and redo with the operations inserted.
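In code form, span-based operations of that kind might look roughly like this (the Id type and the variant names are invented for illustration):

/// A stable identifier attached to a character (or to the document start/end).
type Id = u64;

/// Operations address spans between stable ids rather than raw offsets, so
/// the four operations in the example above keep their intent when merged.
enum SpanOp {
    /// Delete everything strictly between the two anchors.
    DeleteBetween { after: Id, before: Id },
    /// Insert text between the two anchors.
    InsertBetween { after: Id, before: Id, text: String },
}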

@lord
Member

lord commented May 11, 2019

To add on to Raph's points about Fuchsia's input method APIs — I tried for a while to make the CRDT model work on Fuchsia without dramatically increasing complexity for input method authors. In the process of trying to hide the CRDT complexity, some other folks and I realized we were basically implementing a retry-based system, which we eventually just settled on instead of the CRDTs. The text field has a revision number that increments any time there is a change; the input method submits a batch of several changes along with the last revision number it has seen, and the edit only gets accepted if the revision numbers match. There's definitely the possibility of more roundtrips (or, theoretically, an infinite number of roundtrips) per edit, but considering the low latency of IPC calls (vs. the web, where CRDTs I guess were traditionally used?), we considered it an acceptable cost to pay. In the future, we may also reduce the likelihood of conflicts without changing the protocol by accepting an edit, even if the revision numbers don't match, if we know that the input method has only read sections of the text field that remain unchanged.
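A compact sketch of that protocol (all types invented; the real Fuchsia API differs):

struct TextEdit; // placeholder for whatever the edit representation is

struct EditBatch {
    /// The field revision the input method last observed.
    base_revision: u64,
    edits: Vec<TextEdit>,
}

struct TextField {
    revision: u64,
    // ...contents elided...
}

impl TextField {
    /// Accept the batch only if it was based on the current revision;
    /// otherwise return the current revision so the caller can re-read and retry.
    fn try_apply(&mut self, batch: EditBatch) -> Result<(), u64> {
        if batch.base_revision != self.revision {
            return Err(self.revision);
        }
        // ...apply batch.edits to the contents...
        self.revision += 1;
        Ok(())
    }
}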

And as Raph says, there were significant complexities to getting CRDTs to interop with existing platforms' input method protocols: as one example, look at the web, where the content of a text field is frequently completely overwritten by some Javascript, making it impossible to get "canonical" diff commands for a CRDT protocol.

@ncm

ncm commented May 11, 2019

I have a stronger conclusion: any attempt to automate resolving simultaneous editing conflicts that, e.g., git merge could not resolve, will fail in a way that fatally confuses users.

The only viable solution is the one that git merge chooses: split that line or region of the document and include both, with markup to indicate the split. After that, party A can edit only in A's branch of that region, B only in B's branch. Then A or B can nominate discarding or adopting one or the other change; or even both, or neither.

It's a heavy-weight solution in terms of what happens in the UI but lightweight in its demand on comprehension of users.

Users need to know when they are trying conflicting edits, and have the opportunity to back off without causing confusing damage to the document that might be hard to right. At the same time, they need not to be interrupted before they finish expressing their respective thoughts. The software can't mediate correctly; users will need to discuss the correct outcome. The software's role is to collect their expression and present it comprehensibly with sensible operations available.

@josephg

josephg commented May 11, 2019

Hi! I've got some experience on these systems (wave, sharejs, etc).

Some comments:

As a side note, I've heard an interesting theory about why CRDT-type solutions are relatively popular in the cloud. To do OT well, you need to elect a centralized server, which is responsible for all edits to a document. I believe the word for this is "server affinity," and Google implements it very well. They need to, for Jupiter-style OT (Google Docs) to work.

You don't need to do this. (Although I'm not sure if we knew that on the wave team). As @lord mentioned, you can implement an OT system on top of any database that has a transactional write model. The approach is to enter a retry loop where you first try to apply the operation (but in a way that will reject the operation if the expected version numbers don't match). If an error happens, fetch the concurrent edits, transform and retry. Firepad implemented this retry loop from the client, and it worked much better than I expected. Here is a POC of a collaborative editor on top of statecraft. The only OT code on the server is this middleware function. The code is essentially:

function apply(op, intendedVersion):
  loop:
    match upstream.apply(op, intendedVersion):
      Ok => return
      Conflict =>
        conflictingChanges = upstream.getOpsFrom(intendedVersion)
        op = op.transform(conflictingChanges)
        intendedVersion = conflictingChanges.lastVersion
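The same loop, recast in Rust-flavored types for this thread; everything here is invented for illustration and only shows the shape of the transform-and-retry pattern:

struct Op;
#[derive(Clone, Copy)]
struct Version(u64);

enum ApplyResult {
    Ok,
    /// Rejected: the concurrent ops since the intended version, plus the new head.
    Conflict(Vec<Op>, Version),
}

trait Upstream {
    fn apply(&mut self, op: &Op, intended: Version) -> ApplyResult;
}

impl Op {
    fn transform(self, _concurrent: &[Op]) -> Op {
        self // the real OT transform goes here
    }
}

fn apply_with_retry(upstream: &mut impl Upstream, mut op: Op, mut intended: Version) {
    loop {
        match upstream.apply(&op, intended) {
            ApplyResult::Ok => return,
            ApplyResult::Conflict(concurrent, head) => {
                op = op.transform(&concurrent);
                intended = head;
            }
        }
    }
}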

I think the reason why semi- or fully-centralized systems are popular in products like google docs is that they're easier to implement. Access control in a decentralized system like git is harder. Gossip networks don't perform as well as straight offset-based event logs (kafka and friends). And if you have a canonical incoming stream of edits, it's easier to reason about.


I have a stronger conclusion: any attempt to automate resolving simultaneous editing conflicts that, e.g., git merge could not resolve, will fail in a way that fatally confuses users.

I think you have to act with intent about what you want to happen when two users edit the same text at the same time. There are basically 2 approaches:

  1. Resolve to some sort of best-effort outcome. (Eg "DE F G" or "E F GD")
  2. Generate an error of some sort (eg via conflict markers) and let the user explicitly resolve the conflict

As much as it pains me to say, for code I think the most correct answer is to use approach (1) when the code is being edited live and (2) when the code is being edited offline / asynchronously. When we can see each other's changes in realtime, humans handle this sort of thing pretty well. We'll back off if someone is actively editing a sentence; we'll see them typing and let them finish their thought. If anything goes wrong we'll just correct it (together) before moving on. The problem happens when we're not online, and we edit the same piece of code independently, "blind" as it were. And in those cases, I think version control systems have the right approach - because the automated merge is often wrong.


I guess there's three main architectures you could use for this sort of thing:

  1. Make a monolith, where a big application just bundles in syntax highlighting, an editor, etc. Everything is wired together explicitly. (Eg classic IDEs).
  2. Make a swarm of systems that collaboratively edit a document model like you've been trying to do
  3. Use an event sourcing design. Data flows one way (and the order is explicit), but you use a well specified data model.

This last one is what I've been working on lately. I'm using a central source of truth (at least architecturally - it could itself wrap something more exotic). Then that source of truth provides a linear operation history and monotonically increasing version numbers. Different parts of the system are arranged in a DAG, not an arbitrary graph. Communication is either writes (going "upstream") or reads (which flow downstream).

In this model syntax highlighting is a computed view - so the system which generates those annotations subscribes upstream, then only tells downstream consumers about any changes to the highlighting markup.

I'm hoping to eventually implement a lot of the features you want for xi. For a collaborative IDE, you could arrange your system in a DAG like this:

                            .─────────────.
                    ┌─────▶(Build artifacts)
                    │       `─────────────'
 .─────.      ┌──────────┐
(       )  ┌─▶│ Compiler │───────┐
│`─────'│  │  └──────────┘  Annotations  ┌──────────┐     ┌──────────┐
│       │──┼─────────────────────┴──────▶│  Merge   │──┬─▶│  Editor  │
│ Root  │  │  ┌───────────┐              └──────────┘  │  └──────────┘
└───────┘  └─▶│File system│                            │  ┌──────────┐
              └───────────┘                            ├─▶│  Editor  │
                                                       │  └──────────┘
                                                       │
                                                       └─▶    ...

And this means you don't have to worry about priority between the systems or anything like that, because the system converges in a deterministic way. I think this would still work fine if behind the root "source of truth" was actually a gossip network passing CRDT operations. You could replicate that whole thing across multiple computers and have a shared root source of truth (decentralized or otherwise), and all editors would end up working collaboratively none the wiser.

I hope that doesn't derail this discussion too much. Feel free to reach out if you want to talk more about this stuff offline.

@raphlinus
Member

@josephg I for one welcome your thoughts on this discussion.

I know I overstated the case for OT requiring a centralized server. The OT literature has a bunch of papers on getting some form of strong serialization using distributed-system techniques like vector clocks. Your suggestion is probably even a lot more practical than that.

I gave some thought to solving the indentation / bracket matching problem asynchronously, and if I absolutely had to maintain typing responsiveness even when those algorithms were slow (for example, if you needed to scale to very large documents), then I think a retry-based system is the way to go, as opposed to having the merge logic baked into the CRDT. In fact, it might be worth doing this for completion results from the language server, so you get consistent results not dependent on the speed of the response from the LS.

But after thinking about it some more, for just doing indentation and brackets in code, I think even the complexity of a retry scheme is not worth it. Just make your parser fast enough to do it for any reasonable-size document at interactive speeds - and making the parser fast has other advantages for the system.

@josephg

josephg commented May 12, 2019

Yeah I agree. Maybe I'm imagining your system as having 2 levels of architectural decoupling / distance:

  1. Stuff that happens synchronously, inside the same process / called directly from your onEdit event handler.
  2. Stuff triggered from witnessing a CRDT edit operation. This could be in another process, on another computer.

Traditional IDEs do everything in (1), and so they become monoliths which struggle to have clean architectural separation. In response, you've been doing lots of things in (2) - which makes decoupling easier, and collaborative editing easier, but maybe it's not the right fit for something like bracket matching. I acknowledge that CRDTs are probably the right approach if you want to do P2P editing. But having a swarm of small pieces communicating in loosely defined ways seems like a huge source of unexpected / unwanted complexity. (I think this is also the reason why I think dataflow style programming > OO style programming.)

So with that in mind I've been approaching this problem space with another architectural layer - basically considering realtime edits moving through the system as an event sourcing problem. Unfortunately this still doesn't solve your bracket matching issue - you might be right there; maybe the right answer is to just do it synchronously in the editor. But for things like compilers, and for generating syntax highlighting information, I think the DDD / event sourcing / FP / etc. philosophy that "data only flows one way" is a really solid grounding. It makes a lot of things easy to reason about, and because it's async by nature you can still do magic tricks like pull any parts of your system out into separate threads or processes, or run them on different computers.

I could imagine doing the whole thing with just synchronous changes and an event sourcing system, but that would struggle with p2p collaborative editing. And it sounds like synchronous changes + CRDTs struggle with complexity issues around ordering and priority. As you say, without the CRDT model some problems become quite trivial. I'm not sure if it makes sense to use CRDTs + event sourcing. There's overlap there, so you're paying a complexity cost. But maybe it's worth it.

Mind you, maybe we're both running around with our respective hammers and looking for nails. :)

In any case, I've been keeping an eye on Xi for years - I love your work!

@amark

amark commented May 12, 2019

Slightly tangential, for our editor we wrote a canonizer. It restructures the rich text documents into a deterministic structure.

Imagine: [i][b]hello[/b][/i] and [b][i]hello[/i][/b] produce visually identical output but have different structures (HTML is particularly bad at this; other systems don't use trees and just apply formatting ranges - I'm not sure which Xi uses?).

The canonizer (whether tree or ranges) would always apply formatting in a certain order, like bold first, then italics. This means that rendering, local editing, and (to some degree) structure could be independent from sourcing & collaboration - because regardless of whether another peer sent dirty edits (diffs, chunks, operations, etc.), the canonizer would clean them up first.

In that sense, the canonizer actually forms its own type of CRDT! It can be composed on top of, or with, the other editing CRDTs (probably RGA-like).

Again, this is useful because now the syntax highlighter etc. can be independent, and it will let us use different editors even if they produce dirty/invalid output.
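A toy version of that idea in Rust terms (the format enum and its ordering are invented; the real canonizer linked below works on HTML):

/// Formatting kinds, in their canonical nesting order: bold before italic.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Format {
    Bold,
    Italic,
}

/// Normalize the formats applied to a run of text, so [i][b]hello[/b][/i]
/// and [b][i]hello[/i][/b] come out structurally identical.
fn canonicalize(mut formats: Vec<Format>) -> Vec<Format> {
    formats.sort();
    formats.dedup();
    formats
}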

Source (for normalizing HTML in P2P environments) https://github.com/amark/gun/blob/master/lib/normalize.js
Example use: https://github.com/amark/gun/blob/master/test/normalize/normalize.html#L20-L29

@goldenMetteyya

Hello, I am jumping in here; I was looking for a Rust-based editor to use as the base for a project I wanted to play around with, for an editor with a specialized use case.

There is a project called Pijul which is based on the paper https://arxiv.org/pdf/1311.3903.pdf. It's a sound version control design based on patches, and it addresses diffing and ordering. Maybe something based on this could be built into the core of an editor: version control is always needed, and even if the end user doesn't need it, it can be used under the hood; if needed, it would be native to the editor.

@cmyr
Member Author

cmyr commented May 14, 2019

Pijul is a very interesting project!

That said, the intended architecture of xi is to let things like version control be handled by external plugins.

@goldenMetteyya

Pijul is a very interesting project!

That said, the intended architecture of xi is to let things like version control be handled by external plugins.

hi @cmyr

Yes, but in the end, using a CRDT as a structure to track concurrent changes is in some sense version control, because you're merging changes. So the internals could use something based on Pijul and not expose it to the outside unless needed; then git works too. Just an idea I was thinking about that might make a distributed editor core more robust in some regard.

@P-E-Meunier

As one of the authors of Pijul, I confirm that it could form the basis of a text editor, but would probably need some work. Now that all our algorithms are implemented (finally!), our next step is to make the backend storage efficient, and that could lead to a refactoring of the backend to make it (more) usable for a text editor.

@josephg

josephg commented May 21, 2019

I'm not sure if it would be more robust. Less total code though, probably.

My view on this is that there are two similar but slightly different tasks involving merging text edits:

  1. Users pair program, in realtime, on a single set of changes
  2. Users work independently on different patches and merge them together later (eg via PRs)

These two tasks look the same algorithmically, but in (2) we generally want our environment to mark merge conflicts, and humans manually review and merge them. In (1) we don't want conflict markers and the changes are merged automatically.

There's a third option where two users collaboratively work in realtime on the same codebase & same patch set, but nobody does that because it wouldn't work. The reason is that neither programmer would be able to test compile / run their changes - since while I'm typing you can't compile and run your code, and while you're typing, I can't compile and run my code.

So we have two use cases. We can solve (1) with CRDTs/OT, and we currently solve (2) with git, which is like a crappy CRDT built on top of diff/match/patch - and which, unlike all the CRDT & OT algorithms that I know about, generates conflict markers.

I can imagine those two use cases sharing an algorithm and data structure. While I'm typing, my environment tracks my edits. If anyone else wants to collaborate on my change set with me, they click my name, connect to my computer, and we share CRDT changes in realtime. Meanwhile, the entire set of our text operations is composed together to form a patch set. (Note this is slightly different than if we just diff the resulting files.) Then the composed patch can go through a regular PR / code review workflow and be merged. This would have some nice properties, like preserving per-character authorship information, and you could make a timeline to replay how each patch was written if you wanted. It would also avoid some of the problems you can run into doing a 3-way merge in git, which does not produce correct output in lots of complex merging situations.

This is a beautiful idea. But practically speaking git + ecosystem works surprisingly well and it would be really hard to displace. Meanwhile pair programming isn't addressed at all by most modern tooling. So I think focussing on the collaborative editing angle and leaving git to do its thing is the pragmatic choice.

If I wanted to do something like this I'd probably look into embedding the composed CRDT changes for a patch as a git patch extended object. Then you could make an alternate merge function for git where if both patches have CRDT information, it would use that data to make the merge patch. That gives you your transition ramp to a new system like this. You might need to spend some time trimming patch size though - people make a lot of single character edits and they can add up if you're storing tree information & metadata for each one in your repo, forever.

@goldenMetteyya

hi,

For me the robustness comes from the Categorical Theory of Patches being the theory behind it. The two use cases you have presented do show both ends of the spectrum: one is interactive, synchronous merging, while the other is non-interactive and asynchronous, as I interpret it. Adding metadata about authorship and grouping patches to form a PR is the main idea that would be native. If you take Google Docs as an example, you get a cursor for each user and the changes are reflected through their proxy. However, if two programmers change the same function or line of code and have conflicting ideas about the implementation, that's an out-of-band conflict and not the editor's job. So the way I see it, each programmer has the entire code base locally and can stream changes to their version with the incoming patches. As you suggested, you could subscribe to a user's changes. When they want to compile, they can generate a hash of their version and compile and run the code. This can be shared with the other, as they will have all the patches, so meta info can say up to which patch set has been merged into this build. A lot of useful meta can be added based on the patches and

Git integration, as you have suggested, would be the way to go as a bridge, but a native-first approach with Pijul would be the ideal case; I don't mind switching new projects to have a system like Pijul track my codebases if the data won't get corrupted. I'm sure there are others looking for something new, and having a fast editor that supports it only helps. Also, I think that at some point change is always good, but a smooth transition with Git is for sure needed for adoption. But if the editor is fast and addresses a new use case, it could be a jump start.

Having a delta CRDT as the data structure to track the patches, and an extensible meta field, would be nice. There could be a granularity setting for the patches where the user can choose how much to store over time, or maybe batch certain parts of the history into larger chunks rather than keeping so much detail. This would offer a tradeoff for storage.

The conflict markers might be handled with a switch that is set and added in as extra meta.

@nin-jin

nin-jin commented Mar 22, 2021

Take for example, a document initially "A B C", with one user deciding to change "B" to "D", and the other user deciding that sentence needs rewriting, with "E F G" as the result. Clearly either "A D C" or "E F G" is a reasonable result, but a CRDT essentially demands that the result be either "DE F G" or "E F GD", the tie resolved through timestamps or some similar mechanism.

Hi, I'm working on CROWD. It's like a CRDT, but different. You can try this case in our sandbox, and you can make sure that it works as you expect (E F G). However, please note that if you don't replace, but delete and then insert, the result will be "E F GD", because the deletion and insertion are unrelated and we are afraid to delete user content without explicit intention.

What do you think about this approach? Could it be suitable for your case?

@josephg

josephg commented Mar 22, 2021

CRDTs aren’t an algorithm in themselves but they make up a class of algorithm - with some useful properties. Is your system not part of this class of algorithm? Why?

@nin-jin

nin-jin commented Mar 22, 2021

@josephg In CROWD, changes from the same peer are always ordered and can't be reordered; deltas from the same peer aren't commutative. But the merge result is independent of the merge order of deltas from different peers. CROWD slightly weakens the CRDT guarantees, which gives a more compact data representation without garbage collection and compaction. And CROWD is a class of algorithms too.

Comparison of Approaches
