Highlighting Scintilla widget with Syntect #242

Closed
brupelo opened this issue Mar 13, 2019 · 65 comments

Comments

@brupelo

brupelo commented Mar 13, 2019

Here's the thing: in the past I'd been playing around with lark-parser to highlight the syntax of a Scintilla widget, with mixed success. Here you can see a very small example of how lark & QScintilla are used together.

Thing is, syntax highlighting with Scintilla is really painful (as well as creating new lexers)... So I was considering this idea of being able to use textmate syntax&themes on a Scintilla widget by using syntect.

The idea would be to create some sort of generic Scintilla Lexer that would allow me to do so... At this point I've created Python bindings for most of the public syntect API, but I'm not sure if this idea of mine is a good one or not.

If you don't know too much about how QScintilla syntax highlighting works, here's a very good explanation about it. So, before trying anything and wasting any time I'd like to hear your thoughts about it, whether this is a good idea or not... and if it is, what'd be the best way to execute it and how long it could take me.

Thanks.

Ps. I'm asking you before trying anything by myself cos I'm considering all your experience in this area & the fact you'll probably know all the pros & cons of using syntect highlighting with other text editors... It's not like I'm asking explicitly which function to use... at this point I've already become familiar with the whole public API exposed by syntect, so... ;)

@keith-hall
Collaborator

it's impossible for us to estimate how long it would take you to do this work, but from a quick glance the lark parser api seems somewhat similar to syntect, so if you already have the Python bindings to syntect sorted, it perhaps shouldn't be too much work to adapt that lark example. If you also plan to adapt scintilla to use TextMate color schemes, that will involve setting up a mapping of styles I believe but doesn't look too complicated.

@brupelo
Author

brupelo commented Mar 14, 2019

Ok, from your comments I assume you think it's feasible then, good... the main reason why I decided to ask here was basically to avoid spending, let's say, 2 or 3 hours on this experiment and, once it was done, hitting "unsolvable" issues like performance or just not being able to mix QScintilla+syntect because of the way syntax highlighting works with QScintilla.

Anyway, maybe tonight when I'm back home I'll give it a fast shot. Btw, I've decided to upload my little experimental python bindings of syntect, I call it pysyntect... It's my first rust project and I'm aware I'm not using the best practices, so don't take it very seriously :) . Plus... one thing I'm not happy about it is how I've decided to convert the rust iterators into python... maybe I should have used https://github.com/PyO3/pyo3/blob/master/guide/src/class.md#iterator-types.

@trishume
Owner

Looking at the Lark example, I see one significant potential hitch which is that I don't know whether the styleText method is called on the whole file. If it's only called on pieces of the file then I don't see how to easily store parsing state, since syntect needs to parse the whole file from top to bottom to highlight correctly (otherwise it might for example miss that the whole file is wrapped in a string literal). Then editors that care about performance need to implement a parse state caching scheme to be fast at editing large files.

@trishume
Owner

Also I suggest that instead of writing scope selectors for each Scintilla token type, you make a Textmate/Sublime theme file that just uses for example red values corresponding to each token type, then just map the red value of the highlighting to the token type. This should be both easier and faster, because the theme file matcher has some optimizations for matching lots of selectors and doing precedence correctly.
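
A minimal sketch of that red-channel trick, in plain Python with no syntect calls. The fake theme gives each token type a foreground colour whose red value is the Scintilla style number, so the colour coming back from the highlighter can be turned straight into a style id. The style names, numbers and helper names below are made up for illustration.

```python
# Hypothetical Scintilla style numbers; use whatever your lexer defines.
STYLES = {"default": 0, "comment": 1, "string": 2, "keyword": 3}

# Scope selector -> token type, as you would list them in the fake theme.
THEME_RULES = {"comment": "comment", "string": "string", "keyword": "keyword"}


def fake_theme_colors():
    """Return {scope selector: '#RRGGBB'} with the style id stored in the red channel."""
    return {
        selector: "#{:02X}0000".format(STYLES[token])
        for selector, token in THEME_RULES.items()
    }


def style_from_color(hex_color):
    """Recover the Scintilla style id from a colour produced by the fake theme."""
    return int(hex_color[1:3], 16)


colors = fake_theme_colors()
print(colors["string"], "->", style_from_color(colors["string"]))
```

The colours in `fake_theme_colors()` are what would go into the foreground entries of the tmTheme file; the theme matcher then does the selector precedence work and the editor only decodes the red byte.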

@brupelo
Author

brupelo commented Mar 17, 2019

I've managed to use syntect to provide syntax highlighting on a Scintilla widget and the performance is good enough without having to rely on any clever trick... On the other hand, I can see there are quite a lot of issues/bugs... let's see first what Sublime Text looks like:

showcase

The behaviour is rock solid, as expected from Sublime... now let's see how my little Scintilla experiment will look like:

showcase

Issues:

  1. Bad highlighting on initial state, is this maybe because I'm using a broken monokai theme? I've downloaded this from some random place on the internet as syntect doesn't support official sublime's ones
  2. Non deterministic behaviour, you can see that in my above demo
  3. State becomes screwed up at certain point and you can't recover from it
  4. I'm parsing the whole text over and over, shouldn't syntect provide coherent nicer results this way?

You can find attached here a little reproducer.

In case you're interested to give it a shot, you'll need windows+py3.6.x+pyqt5

Thanks in advance.

@trishume
Owner

trishume commented Mar 18, 2019

It's getting screwed up because you're not understanding how parsing state works. The parser needs to keep state between lines. The easy module stores this in the HighlightLines struct, which you only create once and then just add lines to. You're effectively treating each additional updated highlight range as new lines of the same file, leading to the screwed up highlighting.

In order to do it correctly you have a few different options, which I've mentioned multiple times before:

  • Create a new HighlightState each time you go through the file to highlight it. This effectively re-highlights the entire file on every key press, which you're already doing except also appending it to the previous files. Syntect is fairly fast but you'll start taking longer than one frame to respond to key presses on files larger than maybe 200 lines.
  • Cache the highlighting states every N lines, maybe every line if you don't care much about memory usage. Then only re-highlight from the line before the modified range to the end of the screen before painting. This is a substantial amount of work but is the only way to get instant response on large files. Syntect exposes everything necessary to do this but doesn't provide an implementation, since each serious text editor will want to do it differently. My recommendation for you is that you use the code from Xi's implementation of this.
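
A minimal sketch of the second option, using a toy parser instead of syntect so it runs on its own. The "state" here is only whether we are inside a triple-quoted string; with syntect it would be the parse and highlight state. `CheckpointCache`, `highlight` and the `every` parameter are illustrative names, not part of syntect.

```python
def parse_line(in_string, line):
    """Toy parser: the state is just 'are we inside a triple-quoted string?'."""
    return (not in_string) if line.count('"""') % 2 else in_string


class CheckpointCache:
    """Remembers the parser state *before* every `every`-th line."""

    def __init__(self, every=2):
        self.every = every
        self.states = {0: False}  # line 0 always starts outside a string

    def nearest(self, line_no):
        """Closest checkpoint at or before line_no, as (line, state)."""
        start = max(n for n in self.states if n <= line_no)
        return start, self.states[start]

    def drop_after(self, line_no):
        """Forget checkpoints that an edit on line_no may have invalidated."""
        self.states = {n: s for n, s in self.states.items() if n <= line_no}


def highlight(cache, lines, first_changed, last_visible):
    """Re-parse from the nearest checkpoint up to last_visible (inclusive)."""
    cache.drop_after(first_changed)
    start, state = cache.nearest(first_changed)
    result = {}
    for n in range(start, min(last_visible + 1, len(lines))):
        result[n] = "string" if state else "code"  # state at the start of line n
        state = parse_line(state, lines[n])
        if (n + 1) % cache.every == 0:
            cache.states[n + 1] = state  # refresh the checkpoint on the way
    return result
```

Calling `highlight(cache, lines, 3, 4)` after editing line 3 only re-parses from the checkpoint at line 2, not from the top of the file. A larger `every` saves memory at the cost of re-parsing more lines per edit.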

@brupelo
Author

brupelo commented Mar 18, 2019

Awesome, thank you very much for the explanation! That was really helpful ;) .

It's getting screwed up because you're not understanding how parsing state works.

Actually, it's not just parsing state: I also don't understand how many other elements in the library work internally (even if I expose them). But that's alright (at least for now); right now I'm just interested in getting familiar and proficient with the public API without knowing too much about the implementation details. Said otherwise, I'm just interested in learning how to use syntect the right way and checking if it will do the right job for this particular use-case, that's pretty much it.

That said, when I'm back home tonight I'll give those 2 options a shot; first I'll just check the slower and easier one... my motto: first make sure everything is correct and then, and only then (if there are issues), optimize it. Probably, if the quality I'm getting is good enough for me and it feels good, I'll end up using a faster way like xi's... in some of my real projects I'd want the editing to be as smooth as Sublime's... Some of my projects use really intensive 3d real-time processing, so I would like the text editor to be as thin/lightweight/cheap as possible.

One last note, you said:

In order to do it correctly you have a few different options, which I've mentioned multiple times before:

Mmm, are you sure you've mentioned this? Cos I wasn't even aware of the existence of the HighlightState struct... in fact, some public structs are not exposed in the python module yet, just check the content and you'll see (ie: dir(syntect)).

Anyway, thanks for your explanation sir, as I said, that was a really helpful explanation, and if it wasn't for it I'm sure it would have taken me a while to figure out by myself :)

One last question, do you guys know where to get an official 1:1 monokai.tmTheme similar to Sublime's? If you do, please let me know... otherwise I'll ask about it in the Sublime forums. Related. As I said there, I'd like to get the same syntax highlighting as the one I use in Sublime to code ;)

@keith-hall
Collaborator

You could always, y'know, read the readme, where the parsing and caching logic is nicely explained ;)

@brupelo
Author

brupelo commented Mar 18, 2019

@keith-hall I'd missed that part in the readme lol :) , thx to point it out, btw, about my question of having a nice "monokai.tmTheme", do you know any I could use over here? Before asking about it in the ST forums...

Just for the record, I've made a quick test as suggested by trishume and yeah, creating a new HighlightLines on each keystroke gives more correct results than my previous attempts, but as already pointed out, with this brute-force approach the parsing times increase gradually as the amount of text grows, ie:

0.015600204467773438
0.031200170516967773
0.04679989814758301
0.031200170516967773
0.031199932098388672
0.04680013656616211
0.04680013656616211
0.031199932098388672
0.04679989814758301
0.04680013656616211
0.04679989814758301
0.04680013656616211
0.015599966049194336
0.04680013656616211
0.04680013656616211
0.031199932098388672
0.04860210418701172
0.031200170516967773
0.031199932098388672
0.031199932098388672
0.04680013656616211
0.04680013656616211
0.062400102615356445
0.062399864196777344
0.09360027313232422
0.10920047760009766
0.12480020523071289
0.14040017127990723
0.12480020523071289
0.14040017127990723
0.171600341796875
0.218400239944458
0.218400239944458
0.22040081024169922
0.23400020599365234
0.24960017204284668
0.28080058097839355
0.2812013626098633
0.3120005130767822
0.32760047912597656
0.32960057258605957
0.35880064964294434
0.37440061569213867
0.392000675201416
0.40560078620910645
0.4232006072998047
0.4368007183074951
0.45240068435668945
0.46900105476379395
0.49920082092285156
0.49920082092285156
0.5324010848999023
0.5460009574890137
0.5636007785797119
0.5928010940551758
0.6124012470245361
0.6240010261535645
0.6396012306213379
0.655602216720581
0.6708009243011475
0.6728012561798096
0.7040014266967773
0.7196013927459717
0.7332015037536621
0.7492022514343262
0.7662031650543213
0.7820014953613281
0.7956013679504395
0.8288013935089111
0.8288018703460693
0.858001708984375
0.8756017684936523
0.8912014961242676
0.90580153465271
0.9224019050598145
0.9516017436981201
0.9546017646789551
0.9692015647888184
0.9848017692565918
1.0004019737243652
1.0140018463134766
1.0472018718719482

Those are the times of using the sequence ctrl+a, ctrl+c, ctrl+v, ctrl+v, ...., ctrl+v. So I guess it's time to take it from here to optimize it and keep the parsing times at a constant low-latency.

@keith-hall
Collaborator

There may not be an equivalent tmTheme that exactly matches Sublime's sublime-color-scheme version as SublimeHQ likely have some tweaks not replicated elsewhere. Your best bet would be to take the last build of ST which shipped with it as a tmTheme, extract it from the sublime-package file and try that...

@brupelo
Author

brupelo commented Mar 18, 2019

I've downloaded ST3 build 3200_x64, extracted all content from the packages, then searched for monokai in the whole ST folder and 2 files appeared {Monokai Bright.tmTheme, Monokai.sublime-color-scheme}. For the shot below I've used Monokai Bright.tmTheme, which lives in Color Scheme - Legacy.sublime-package, and here are the results:

showcase

Left(sublime) vs Right(qscintilla), you can see there are quite a lot of subtle differences that will make using it a really disturbing experience :(

@keith-hall
Collaborator

it could be helpful to isolate whether the discrepancy is in your code or syntect itself. What happens if you use the synhtml example on the same file, ensuring you use the same version of the Python syntax definition and tmTheme file in both ST and synhtml? Also I meant try ST build ~3149 or so.

@brupelo
Author

brupelo commented Mar 18, 2019

Good idea, I'll check it out tonight when back at home, now I need to leave... here's some links I've got pending to read/review/test though:

Also I meant try ST build ~3149 or so.... I'm a happy user of ST build 3176, so I'll use synhtml with the tmThemes provided by it... we'll see, ty. Btw, any other suggestions, ideas, things to try... don't hesitate to post them here and I'll test all of them.

Thanks in advance.

@brupelo
Author

brupelo commented Mar 18, 2019

I've read few times this:

Cache the highlighting states every N lines, maybe every line if you don't care much about memory usage. Then only re-highlight from the line before the modified range to the end of the screen before painting. This is a substantial amount of work but is the only way to get instant response on large files...

As well as the caching notes in the readme, and I'm still confused about how to implement fast highlighting. Also... might the start and end styleText parameters help with this? I guess start will... But what about end?

Anyway, hopefully you'll be able to clarify a bit with some pseudocode or notes, so when I'm home I'll implement it straightaway instead of thinking how to do it hehe 😄

@trishume
Owner

Whoops sorry I knew I explained this already in an issue recently (as well as the Readme and docs), but it turns out it wasn't one of yours: #238

Second Keith's suggestion of looking for an older Sublime build, their Monokai is fancier. Also I bet it wouldn't be that hard to manually translate their Monokai into the older theme format. It was probably even auto-converted so may not use any of the newer constructs.

@brupelo
Author

brupelo commented Mar 18, 2019

Thanks for pointing me to #238! It contains a few links that could help indeed:

That said and referring to my previous question, do you foresee any advantage by having provided both the start/end positions given by Scintilla each time a change is made? Not sure If you'd read the [document] I'd posted a while ago but it says:

Did you notice the start and end parameters in the styleText() method? Everytime QScintilla calls the styleText() method, it passes a start- and stop point for the syntax highlighting. In fact, QScintilla says: “Hey, I think you should restyle the text between the character at position start up to the character at position end“. You are completely free to ignore this suggestion. It’s only there to help. How can you use those two numbers to your advantage?

In general you want to start highlighting from the suggested start position. So you want to initialize the internal counter to that point:

self.startStyling(start)

You also need to access the text between the suggested start and end positions:

text = self.parent().text()[start:end]

The parent of your lexer-object is the editor. Get the entire text in the editor, and slice a piece out.

Talking about that... I need to check how styleText is called when using multiple cursors :O , hopefully that won't be too tricky a situation to handle either

@trishume
Owner

trishume commented Mar 18, 2019

Yah that's where you get the information on what to highlight. You load the closest cached parse state before the start position and highlight until the end of the screen before re-rendering.

The end position is only useful if you're using some kind of local highlighting model where a change on one line can't influence arbitrarily many future lines.
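
For what it's worth, turning the start position into a line number (so the closest cached state can be looked up) could look roughly like this. It assumes the positions are byte offsets into UTF-8 text, which is an assumption about how the widget is configured; `line_starts` and `line_of` are made-up helper names.

```python
from bisect import bisect_right


def line_starts(text):
    """Byte offset at which each line of `text` begins (UTF-8 assumed)."""
    starts = [0]
    for line in text.splitlines(True):
        starts.append(starts[-1] + len(line.encode("utf-8")))
    # The last entry is the end of the text, not the start of a line.
    return starts[:-1] if len(starts) > 1 else starts


def line_of(starts, pos):
    """Index of the line containing byte offset `pos`."""
    return bisect_right(starts, pos) - 1


starts = line_starts("ab\ncd\nef")
print(line_of(starts, 4))  # line holding the byte at offset 4
```

The offsets only need recomputing when the text changes, and the lookup itself is a binary search, so it stays cheap on large files.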

@brupelo
Author

brupelo commented Mar 18, 2019

Mmm, yeah, makes sense, I'll just consider the start position then. In any case, one thing that's bothering me is this typical ST use-case:

showcase

The usage of alt+f3 is one of the actions I use the most when coding stuff... So imagine the worst case scenario, the visible screen is at the end of the document, I press alt+f3 to change some occurrences and boom, many selections are at the very beginning of the document... would that mean I'd need to syntax highlight the whole document? :/ ... Anyway, I'll read the docs first before speculating about worst case scenarios.

Btw, one thing always amazed me is how ST minimap becomes syntax highlighted immediately, such an awesome text editor, it's a flawless piece of software :)

@trishume
Owner

Yah, unfortunately you would, if you want to be fully correct on your first re-render. You could do a thing where you first re-render just the current viewport from the parse state just before it and then kick off a background job to re-highlight the whole file from the first modification. Which, if you added a string quote at the beginning, may initially show the file not being in a string, but later it would pop to the correct highlighting.

@brupelo
Author

brupelo commented Mar 18, 2019

Tricky... anyway, I'll stick to one of my favourite rules here and don't start predicting situations, I'll start dealing with the simpler cases and have a basic algorithm in place.

I've always thought coding a text editor was one of the easiest projects in computer science... well, after these 3 weeks researching it I must say I couldn't have been more wrong, it's an extremely complex topic per se.

Offtopic, probably you already know about this hashed_syntax_highlighting but it can always give ideas. I'll eventually review Jon's blog... maybe he talks somewhere about ST's syntax highlighting algorithms... highly unlikely though :) , who knows :P

@brupelo
Author

brupelo commented Mar 19, 2019

:O/

showcase

It's nice when solving an issue takes 5min :) , yeah... I'll be using both Sublime Text 3149 themes & sublime_syntaxes from now on. Glad I won't need to write a converter for the new sublime-color-scheme format... On the other hand, it doesn't look like a very hard task, and the new color-scheme format is quite nice btw

Ps. 76 .sublime-syntax / 25 .tmTheme - ST_3149

@brupelo
Author

brupelo commented Mar 20, 2019

@brupelo
Author

brupelo commented Mar 20, 2019

@trishume Any recommendation/strategy to tackle this one https://github.com/brupelo/pyblime/issues/8 ?

@brupelo
Author

brupelo commented Mar 23, 2019

@trishume Sorry for going offtopic here but... would you mind to add a link to pyblime to the list of projects using syntect? Your project is already highly popular so it'd help out to receive some traffic from syntect landing page... This is my first open-source project and I'm not sure what's the best way to get contributors faster or If I'm doing things very badly :)

Ty!

@brupelo
Author

brupelo commented Mar 23, 2019

Comments from @keith-hall in this issue have convinced me that uploading the python syntect bindings could be a good idea, so I've released them here https://github.com/brupelo/pysyntect

Notice though that it's just my first Rust project, so there will probably be a load of bad Rust practices; after all, the main reason I created that thing was to learn Rust and get familiar with syntect :/

@keith-hall
Collaborator

The reason why a lot of C++ code gets an orange color with syntect and Monokai from ST build 3149 is:

		<dict>
			<key>name</key>
			<string>Function argument</string>
			<key>scope</key>
			<string>variable.parameter - (source.c | source.c++ | source.objc | source.objc++)</string>
			<key>settings</key>
			<dict>
				<key>fontStyle</key>
				<string>italic</string>
				<key>foreground</key>
				<string>#FD971F</string>
			</dict>
		</dict>

this scope selector is unsupported by syntect at this time, see #36. Commenting that part of Monokai.tmTheme out "fixes" the problem

@brupelo
Author

brupelo commented Mar 24, 2019

@keith-hall Nice catch! Thanks... my bad, I'd uploaded data coming from build 3149 cos I'd forgotten syntect didn't fully support SublimeText syntax scope grammar, I see few options here:

  1. We fix Support advanced scope selectors #36 by extending syntect so it will fully provide support to the whole EBNF scope grammar rules (I wouldn't know how to do it easily as I see syntect as a black box atm)
  2. Remove the syntect-incompatible data... although I recall the reason why I'd decided to use data coming from 3149 was mainly that Monokai wasn't giving me results for Python similar to Sublime's
  3. Replace syntect dep with a slower but more compatible library... you can check a little test I'd made few weeks ago, there you will see how all the tests provided from http://www.sublimetext.com/docs/3/selectors.html are passing

The ideal solution to me would be of course 1), as for me the most important thing is for the editor to be as fast as possible to overcome the Python overhead, but I also see #36 was opened on 2 Mar 2017 and nobody has dealt with it, so I don't have high hopes it will be fixed anytime soon... :(

Dunno, thoughts? Guess for the time being I'll just go with @keith-hall "fixing" strategy as it's the faster one.

One question guys, in your experience, are there many guys from syntect team being familiar with the code apart from Tristan of course? When I say "familiar" I mean... having the syntect code nicely living in their heads so these type of features are obvious to them... :)

@keith-hall
Collaborator

one other solution than "commenting out the unsupported scope selector" in the tmTheme would be to just change it to a semantically identical supported one - variable.parameter - source.c - source.c++ - source.objc - source.objc++ (at least for now as a temporary solution).
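
A quick sanity check of why the two spellings should match, treating each selector purely as a yes/no test on the scope stack and ignoring any scoring (that simplification is an assumption of this sketch, and only two of the four languages are shown to keep it short):

```python
from itertools import product


def grouped(param, c, cpp):
    # variable.parameter - (source.c | source.c++)
    return param and not (c or cpp)


def chained(param, c, cpp):
    # variable.parameter - source.c - source.c++
    return param and not c and not cpp


# Try every combination of "does this scope match?" for the three selectors.
for values in product([False, True], repeat=3):
    assert grouped(*values) == chained(*values)
```

This is just De Morgan's law: excluding "C or C++" is the same as excluding C and then excluding C++.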

I don't know Rust too well, but I do understand most of how syntect works. I have even made my own parser in C# to better grok how embed ... escape constructs work to help me think about how to implement it in this project.

@brupelo
Author

brupelo commented Mar 24, 2019

Sounds good to me and better than the options given above, yeah, let's go with that strategy from now on to keep development moving... I consider this one important, as if a potential contributor sees C++ code all in orange I'm sure he won't even open an issue and he'll just discard the project immediately. I've realized attracting new contributors to a GitHub project isn't an easy task so far :)

I don't know Rust too well, but I do understand most of how syntect works. I have even made my own parser in C# to better grok how embed ... escape constructs work to help me think about how to implement it in this project.

Nice, so basically you and Tristan are the syntect hardcore devs so far... gotcha! :)

@brupelo
Author

brupelo commented Apr 9, 2019

@keith-hall Sorry to ping but maybe you already know about this question, thing is, I'd like to cache as much information as possible about lines using syntect (memory is not an issue) but I'm not sure how to achieve that. Right now I was making some experiments with this:

import operator
import os
from itertools import groupby
from pathlib import Path

from pysyntect import *


CACHE = {}
NOOP = ScopeStackOp.noop()


def ScopeRegionIterator(ops, line):
    i, i2 = 0, 0

    while True:
        if i > len(ops):
            break

        i1 = len(line) if i == len(ops) else ops[i][0]
        substr = line[i2:i1]
        i2 = i1
        op = NOOP if i == 0 else ops[i - 1][1]
        i += 1
        yield (substr, op)


def text_scopes(text, syntax):
    global CACHE
    state = ParseState(syntax)
    stack = ScopeStack()
    out_lines = []
    i = 0
    for num_line, line in enumerate(text.splitlines(True)):
        if line in CACHE:
            ops = CACHE[line]
        else:
            ops = state.parse_line(line, ss)
            CACHE[line] = ops

        for (s, op) in ScopeRegionIterator(ops, line):
            stack.apply(op)
            if not s:
                continue
            for j in range(len(s)):
                out_lines.append(
                    "{:<20}{:<80}{}".format(repr(text[i]), repr(op), stack)
                )
                i += 1

    return out_lines


if __name__ == '__main__':
    text = (Path(__file__).parent / "x.py").read_text()
    ss = SyntaxSet.load_defaults_newlines()
    syntax = ss.find_syntax_by_extension("py")
    text_scopes(text, syntax)
    print('-' * 80)
    scopes = text_scopes(text, syntax)
    print("\n".join(scopes))

But it's unclear to me if this is correct or not ... For instance, assume I call text_scopes with a bunch of different random chunks of text, chunks sharing sometimes a lot of common lines:

text_scopes(text1)
text_scopes(text2)
...
text_scopes(textn)

Would my snippet produce the right output? Reason I'm asking this is mainly because I didn't understand this part of the docs https://docs.rs/syntect/3.2.0/syntect/parsing/struct.ParseState.html#caching

@brupelo
Author

brupelo commented Apr 9, 2019

@trishume Do you see anything fishy in this piece of code?

#![allow(unused_imports)]

use pyo3::class::*;
use pyo3::exceptions;
use pyo3::prelude::*;
use pyo3::PyResult;

use syntect::parsing::ParseState as _ParseState;

use crate::scope::ScopeStackOp;
use crate::syntax_set::SyntaxReference;
use crate::syntax_set::SyntaxSet;

// -------- ParseState --------
#[pyclass]
pub struct ParseState {
    pub wrap: _ParseState,
}

#[pymethods]
impl ParseState {
    #[new]
    fn new(obj: &PyRawObject, syntax: &SyntaxReference) {
        obj.init(ParseState {
            wrap: _ParseState::new(&syntax.wrap),
        });
    }

    fn parse_line(&mut self, line: &str, syntax_set: &SyntaxSet) -> Vec<(usize, ScopeStackOp)> {
        let mut res = Vec::new();
        for v in self.wrap.parse_line(line, &syntax_set.wrap) {
            res.push((v.0, ScopeStackOp { wrap: v.1 }));
        }
        return res;
    }
}

#[pyproto]
impl PyObjectProtocol for ParseState {
    fn __richcmp__(&self, other: &ParseState, op: CompareOp) -> PyResult<bool> {
        println!("{:?}", op);

        match op {
            CompareOp::Lt => Err(exceptions::ValueError::py_err("< not implemented")),
            CompareOp::Le => Err(exceptions::ValueError::py_err("<= not implemented")),
            CompareOp::Eq => Ok(self.wrap == other.wrap),
            CompareOp::Ne => Ok(self.wrap != other.wrap),
            CompareOp::Gt => Err(exceptions::ValueError::py_err("> not implemented")),
            CompareOp::Ge => Err(exceptions::ValueError::py_err(">= not implemented")),
        }
    }

    fn __repr__(&self) -> PyResult<String> {
        Ok(format!("{:?}", self.wrap))
    }

    fn __str__(&self) -> PyResult<String> {
        Ok(format!("{:?}", self.wrap))
    }
}

pub fn initialize(m: &PyModule) {
    m.add_class::<ParseState>();
}

Asking cos using that will tell me:

from pysyntect import *


if __name__ == '__main__':
    ss = SyntaxSet.load_defaults_newlines()
    syntax = ss.find_syntax_by_extension("py")

    s1 = ParseState(syntax)
    s2 = ParseState(syntax)
    print(s1 == s2)

    s1 = ParseState(syntax)
    s1.parse_line("# I'm comment1", ss)
    s2 = ParseState(syntax)
    s2.parse_line("# I'm a different comment", ss)
    print(s1 == s2)

    s1 = ParseState(syntax)
    s1.parse_line("a = 10", ss)
    s2 = ParseState(syntax)
    s2.parse_line("b = 20", ss)
    print(s1 == s2)

    s1 = ParseState(syntax)
    s1.parse_line("a = 10", ss)
    s2 = ParseState(syntax)
    s2.parse_line("if a==b: c=30", ss)
    print(s1 == s2)

all those comparisons are true... and that's just wrong, isn't it? :(

@brupelo
Author

brupelo commented Apr 9, 2019

Ok, let me ask differently my previous question... consider this little rust test:

#[test]
fn test_equality_parse_states() {
    let ss = SyntaxSet::load_defaults_newlines();
    let syntax = ss.find_syntax_by_extension("py").unwrap();

    let mut s1 = ParseState::new(syntax);
    let mut s2 = ParseState::new(syntax);
    assert!(s1==s2, "test1");

    let mut s1 = ParseState::new(syntax);
    s1.parse_line("# I'm comment1", &ss);
    let mut s2 = ParseState::new(syntax);
    s2.parse_line("# I'm a different comment", &ss);
    assert!(s1 == s2, "test2");

    let mut s1 = ParseState::new(syntax);
    s1.parse_line("a = 10", &ss);
    let mut s2 = ParseState::new(syntax);
    s2.parse_line("b = 20", &ss);
    assert!(s1 == s2, "test3");

    let mut s1 = ParseState::new(syntax);
    s1.parse_line("a = 10", &ss);
    let mut s2 = ParseState::new(syntax);
    s2.parse_line("if a==b: c=30", &ss);
    assert!(s1 == s2, "test4");
}

All those assertions are true, why is that?

@trishume
Owner

Yup, so in theory and for correctness purposes the state depends on all previous lines, but often the state will end up being the same thing. One very simplified example of a possible state is "am I in a string literal", most of the time you'll be in the same state at the end of every line and changes won't change that, but theoretically a change could add a quote at any previous line and change the state way further down.

So the best way to do caching is to restart from the last cache state before your first modified line, and continue re-parsing until you get to a line that has the same parse state as the one you'd previously cached. Then you don't need to continue re-parsing the rest of the file since you know it's the same from there on.
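
A minimal sketch of that restart-and-stop-early idea, with a toy parser standing in for syntect: the state is only "inside a triple-quoted string or not", and the sketch assumes an edit changes one line in place without adding or removing lines. `parse_line`, `full_parse` and `reparse_from` are illustrative names, not syntect or pysyntect functions; with syntect the comparison would be the ParseState `==`.

```python
def parse_line(in_string, line):
    """Toy parser: the state is just 'are we inside a triple-quoted string?'."""
    return (not in_string) if line.count('"""') % 2 else in_string


def full_parse(lines):
    """states[n] is the state before line n; states[len(lines)] is the final state."""
    states = [False]
    for line in lines:
        states.append(parse_line(states[-1], line))
    return states


def reparse_from(lines, states, first_changed):
    """Update `states` in place after lines[first_changed] was edited.

    Returns how many lines had to be re-parsed.
    """
    state = states[first_changed]  # cached state just before the edited line
    parsed = 0
    for n in range(first_changed, len(lines)):
        state = parse_line(state, lines[n])
        parsed += 1
        if states[n + 1] == state:
            break  # same state as before the edit: the rest cannot change
        states[n + 1] = state
    return parsed


lines = ["a = 1", "b = 2", "c = 3", "d = 4"]
states = full_parse(lines)
lines[1] = "b = 22"
print(reparse_from(lines, states, 1))  # number of lines re-parsed for this edit
```

An edit that leaves the state unchanged stops after a single line, while opening a string on the first line forces a re-parse to the end of the file, which is the worst case described above.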

@brupelo
Author

brupelo commented Apr 10, 2019

Mmmm, I'd lie if I told you I'm not a bit confused right now hehe :) ... let's forget for a moment about how I'll cache the syntect objects in the editor to syntax_highlighting or storing all the scope names.

Right now I'd just like to understand what's the real meaning of the ParseState == operator. Let me ask it in a very simple way then.

Let s1 and s2 be two parse states where s1==s2 is True. Given this, can we guarantee that both s1 and s2 will produce the same result on any given line? ie: s1.parse_line(line, ss)==s2.parse_line(line, ss)

Thanks in advance for your help&patience :)

@trishume
Owner

Yup that's correct. If parse states are equal using them will do the same thing, if two equal parse states do different things that's a bug, I don't think it should be possible.

@brupelo
Author

brupelo commented Apr 10, 2019

Ok, that's good to know and a good definition to stick with. Let me ask you then, in which cases would you end up with different parse states? For instance, this:

    let mut s1 = ParseState::new(syntax);
    s1.parse_line("# I'm comment1", &ss);
    s1.parse_line("a = 10", &ss);
    s1.parse_line("b = 10+a", &ss);
    s1.parse_line("print(f'{b = 10+a}')", &ss);
    let mut s2 = ParseState::new(syntax);
    s2.parse_line("# I'm a different comment", &ss);
    assert!(s1 == s2, "test2");

It's giving me True, can you think of a simple example where s1!=s2?

@keith-hall
Collaborator

if you know how syntax definitions work, it could help you to know that "parse state" is essentially an alias of "context stack"

@brupelo
Author

brupelo commented Apr 10, 2019

@keith-hall I see... well, I've understood this explanation, even though... would you be able to propose a simple example where the parse state s1 & s2 end up being different? I'd like to create some unit tests to check == operator python side

@keith-hall
Collaborator

presumably just an unclosed doc-comment/string vs virtually anything else would suffice?
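
A sketch of that suggestion on a toy model, not on syntect itself. The `parse_line` function and its `/* ... */` rules are invented for illustration; the only idea carried over from the thread is that a state is a context stack, so an unclosed construct leaves an extra context on it.

```python
from typing import Tuple

State = Tuple[str, ...]  # toy context stack, stands in for a parse state


def parse_line(state: State, line: str) -> State:
    """Toy rules: '/*' pushes a 'comment' context, '*/' pops it."""
    i = 0
    while i < len(line) - 1:
        pair = line[i:i + 2]
        in_comment = state[-1] == "comment"
        if pair == "/*" and not in_comment:
            state = state + ("comment",)
            i += 2
        elif pair == "*/" and in_comment:
            state = state[:-1]
            i += 2
        else:
            i += 1
    return state


start: State = ("main",)
s_closed = parse_line(start, "x = 1  /* closed */")
s_plain = parse_line(start, "y = 2")
s_open = parse_line(start, "/* still open")

# Different text, same resulting stack: these two compare equal.
print(s_closed == s_plain)
# The unclosed comment leaves an extra context, so this one differs.
print(s_closed == s_open)
```

A unit test against the real bindings could follow the same shape: feed one state a line that opens a multi-line construct without closing it, feed the other an ordinary line, and assert that the two states are not equal.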

@brupelo
Author

brupelo commented Apr 10, 2019

Let me check... Btw, when you say:

it could help you to know that "parse state" is essentially an alias of "context stack"

Does that mean the syntect struct StateLevel is a synonym of Context then? ie:

pub struct ParseState {
    stack: Vec<StateLevel>,
    first_line: bool,
    proto_starts: Vec<usize>,
}

struct StateLevel {
    context: ContextId,
    prototypes: Vec<ContextId>,
    captures: Option<(Region, String)>,
}

@brupelo
Author

brupelo commented Apr 10, 2019

Ok, at this point everything has become really clear to me, time to implement caching, hopefully at the end of the day I'll be able to work efficiently with large files on the editor :)

@brupelo
Author

brupelo commented Apr 11, 2019

Hey guys, unfortunately I haven't been able to implement the cache thing in the editor yet and I've become a little bit stuck hehe... yeah, I know I know... I'm too slow lol :D

So I was considering:

a) Uploading pysyntect to pypi... although I don't see any benefit in uploading that repo to github/pypi (to be honest I'm more interested in all the potential users/devs/traffic being driven just to this repo/upstream)
b) That would allow me to create a little mcve on Stackoverflow showing the smart SO users how basic syntax highlighting is achieved using pysyntect+QScintilla, and then just ask how they'd implement proper caching. If nobody gave a proper answer in a few days, once the bounty period opened I wouldn't mind offering a 500 bounty for a proper answer, as I'm extremely interested in getting this solved asap: right now I'm already handling files around ~20kb, my naive syntax highlighter isn't good enough anymore, and these little delays don't feel good.

or... if you know exactly the whole algorithm and want to help just post it here and I'll try to implement it myself :(

Thanks in advance! :)

Ps. Lol... at this point the bounties I've offered over the years on Stackoverflow are almost equal to my global rep xD

@trishume
Owner

I already pointed you to some resources describing the algorithm as well as an implementation in Xi. Search for "Thanks to point me out to #238 ! it contains few links that could help indeed:" on this page. There's also a whole bunch of different ways you could do it, some of which are simpler but less memory or time efficient than the one Raph describes, for example caching the parse state on every line.

@brupelo
Author

brupelo commented Apr 11, 2019

@trishume @keith-hall So far I've come up with this:

easy

use pyo3::prelude::*;
use pyo3::PyResult;

use syntect::easy::HighlightLines as _HighlightLines;

use crate::highlighter::Highlighter;
use crate::highlighter::HighlightState;
use crate::parser::ParseState;
use crate::style::Style;
use crate::syntax_set::SyntaxReference;
use crate::syntax_set::SyntaxSet;
use crate::theme::Theme;


// -------- HighlightLines --------
#[pyclass]
pub struct HighlightLines {
    pub wrap: _HighlightLines<'static>,
}

#[pymethods]
impl HighlightLines {
    #[new]
    fn new(obj: &PyRawObject, syntax: &SyntaxReference, theme: &'static Theme) {
        obj.init(HighlightLines {
            wrap: _HighlightLines::new(&syntax.wrap, &theme.wrap),
        });
    }

    #[getter]
    fn parse_state(&self) -> PyResult<ParseState> {
        Ok(ParseState{wrap: self.wrap.parse_state.clone()})
    }

    fn set_parse_state(&mut self, value: &ParseState) {
        self.wrap.parse_state = value.wrap.clone();
    }

    #[getter]
    fn highlight_state(&self) -> PyResult<HighlightState> {
        Ok(HighlightState{wrap: self.wrap.highlight_state.clone()})
    }

    fn set_highlight_state(&mut self, value: &HighlightState) {
        self.wrap.highlight_state = value.wrap.clone();
    }

    #[getter]
    fn highlighter(&self) -> PyResult<Highlighter> {
        Ok(Highlighter{wrap: self.wrap.highlighter.clone()})
    }

    fn set_highlighter(&mut self, value: &Highlighter) {
        self.wrap.highlighter = value.wrap.clone();
    }

    fn highlight(&mut self, line: &str, syntax_set: &SyntaxSet) -> Vec<(Style, String)> {
        let mut res = Vec::new();
        let x = self.wrap.highlight(line, &syntax_set.wrap);
        for &tpl in &x {
            res.push((Style { wrap: tpl.0 }, tpl.1.to_string()));
        }
        return res;
    }
}

pub fn initialize(m: &PyModule) {
    m.add_class::<HighlightLines>();
}

highlighter

def highlight_cache(self, start, end):
    view = self.parent()
    num_line = view.lineIndexFromPosition(start)[0]
    hl = HighlightLines(self.syntax_reference, self.theme)
    for line in view.text().splitlines(True):
        need_rehighlight = True
        hit = self.cache[num_line]
        if hit:
            old_ps, old_hs, old_spans = hit
            if hl.parse_state == old_ps and hl.highlight_state == old_hs:
                spans = old_spans
                need_rehighlight = False

        if need_rehighlight:
            ps = hl.parse_state
            hs = hl.highlight_state
            spans = hl.highlight(line, self.syntax_set)
            self.cache[num_line] = (ps, hs, spans)

        self._render_spans(spans)
        num_line += 1

Where num_line = view.lineIndexFromPosition(start)[0] is the QScintilla suggestion where rehighlighting should start and view.text() is the whole editor text buffer. self.cache is initialized to self.cache = [None] * 10000

This algorithm is totally broken and it's not working at all. If you've got a moment, could you please review it and advise?

Ty in advance!

Ps. In case you're curious and wanted to give it a shot just let me know so I'll upload a proper mcve

@trishume
Owner

I didn't read the whole thing or think much about the logic and I've already noted:

  • You're iterating through all the lines in the file every time, but num_line starts out as the first line in the dirty range, which doesn't match up at all.
  • I have no idea what the logic is trying to do, it doesn't look close to anything correct, so I don't know how to fix it other than using entirely different logic.
  • You're using Python's == on parse states, which I hope you fixed to actually use the Rust PartialEq implementation but maybe you didn't.

Please try to only ask for help understanding syntect and not for me to debug your Python loop logic.

Also I notice you must have forked syntect to make some private fields public in HighlightLines. I suggest you don't do that and instead just use the public APIs that HighlightLines uses internally. The whole mutating the parse state inside HighlightLines thing leads to a weird API that's easier to screw up.

I notice you also ran into highlighting states, you're right that you should cache those too along with parse states. There's a more advanced version of caching where you only cache the scope stack and not the full state to save memory, but you don't need that.
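
On the first bullet above, one way to keep the line index and the line text in step is to slice the buffer at the first dirty line before iterating. A small sketch, where `first_dirty` plays the role of `view.lineIndexFromPosition(start)[0]` from the posted loop and `dirty_lines` is a hypothetical helper name:

```python
from typing import List, Tuple


def dirty_lines(text: str, first_dirty: int) -> List[Tuple[int, str]]:
    """Return (line_number, line_text) pairs starting at the first dirty line.

    Line endings are kept (splitlines(True)), as in the posted loop, so the
    text handed to the highlighter matches the buffer exactly.
    """
    lines = text.splitlines(True)
    return list(enumerate(lines[first_dirty:], start=first_dirty))


buffer_text = "a = 1\nb = 2\nc = 3\n"
for num_line, line in dirty_lines(buffer_text, 1):
    # num_line now indexes the same line that `line` holds
    print(num_line, repr(line))
```

This only addresses the index mismatch; deciding where to stop and which cached state to start from is a separate question.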

@brupelo
Author

brupelo commented Apr 11, 2019

You're iterating through all the lines in the file every time, but num_line starts out as the first line in the dirty range, which doesn't match up at all.

Yeah, I noticed that silly bug right after posting my previous comment, sorry about that :/

I have no idea what the logic is trying to do, it doesn't look close to anything correct, so I don't know how to fix it other than using entirely different logic.

Yeah, the logic is totally incorrect, mainly because the algorithm isn't clear to me either (so that version was basically just guessing), that's why I asked for help in the first place

You're using Python's == on parse states, which I hope you fixed to actually use the Rust PartialEq implementation but maybe you didn't.

I think this hint was alright. I'll review the whole binding in any case, but I think most of the structs deriving Clone, PartialEq, Eq are overriding the right operators... but again, I'll need to review the whole thing just in case.
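
For reference, the Python side of that concern can be shown without any bindings. A class that does not define `__eq__` compares by identity, so two wrappers around equal values are still unequal. The class names below are hypothetical and the wrapped value is a plain tuple, not a syntect object:

```python
class OpaqueState:
    """Wrapper without __eq__: '==' falls back to object identity."""

    def __init__(self, stack):
        self.stack = tuple(stack)


class ComparableState(OpaqueState):
    """Wrapper that forwards '==' to the wrapped value."""

    def __eq__(self, other):
        if not isinstance(other, ComparableState):
            return NotImplemented
        return self.stack == other.stack

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore a matching one.
        return hash(self.stack)


a, b = OpaqueState(["main"]), OpaqueState(["main"])
print(a == b)   # identity comparison: two distinct objects

c, d = ComparableState(["main"]), ComparableState(["main"])
print(c == d)   # value comparison through __eq__
```

A binding would need the equivalent hook on the Rust side so that Python's `==` reaches the derived `PartialEq`; how that is spelled depends on the pyo3 version, so check its documentation rather than relying on the derive alone.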

Please try to only ask for help understanding syntect and not for me to debug your Python loop logic.

I understand... but don't feel obligated to answer :) ... although your help could serve to have this stuff implemented in no time.

Just answer in case you may be interested... Although I suggest you think about these newbie questions from a different perspective. If this was my library and some users asked about something totally obvious to me as the library creator, I'd ask myself first whether the questions come from users being too lazy to put effort into understanding all the implementation details, or whether the public API is incomplete. Remember, users/questions... everything helps to cover a wider spectrum of interesting use-cases. Of course, it'd be a different matter if the library is already closed to modification and is just intended to cover existing deps like xi et al; in that case, sure, focusing on only 1 "client" is totally fair.

Also I notice you must have forked syntect to make some private fields public in HighlightLines. I suggest you don't do that and instead just use the public APIs that HighlightLines uses internally. The whole mutating the parse state inside HighlightLines thing leads to a weird API that's easier to screw up.

I'll be honest with you, using my own fork as a dependency for pysyntect is something I've really disliked since the first day; ideally I'd prefer using syntect/upstream directly... but the only time (a few weeks ago) I created a PR with a trivial modification (changing some private field to public so I could have some unit-tests in place) you rejected it straightaway, and I realized then that changing the public API upstream, or even adding new derives to some structs, would be something you probably wouldn't be very happy about. Which I totally understand, as it's your library and you should have the final word when it comes to deciding which type of public interface you want to expose.

That said, if I could use syntect/upstream directly to have a fast pysyntect+QScintilla editor that'd be of course ideal.

I notice you also ran into highlighting states, you're right that you should cache those too along with parse states.

I've got lucky here :)

There's a more advanced version of caching where you only cache the scope stack and not the full state to save memory, but you don't need that.

First I'd like to sort out the simplest caching version of the highlighter... once that's correct I'll test it and check in which cases that version won't be good enough.

@trishume
Owner

changing the public API in the upstream or even adding new derives

You're right that I'm not keen on changing the public API if that's a breaking change, I try to do those infrequently to not break existing clients. Adding new derives is different, I'm totally happy to accept PRs adding new derives.

I also indeed wouldn't accept PRs to make private fields public, since unless I screwed up somewhere (plausible) you should never need access to a private field to do something useful with syntect. The PR you submitted falls under not useful, since exposing those things wouldn't actually allow you to do anything new, just expose an API that is never the correct thing to use and doesn't allow you to accomplish anything extra. The fields I mention fall under being able to do it another way that is cleaner and less error-prone with other public APIs.

Re your thoughts on improving syntect. Basically I consider myself mostly done with making large improvements to syntect, I still maintain it and respond to issues, review pull requests and fix bugs because I want to be helpful and keep syntect useful to people.

Based on your experience and some other questions I've gotten, I agree it would be useful for syntect to provide a basic implementation of caching. I initially didn't do one because I knew any serious text editor (like Xi) would need to do their own implementation specialized to their approach and text data structures anyhow. I didn't really consider the non-serious small document text editor market. I'm also not really interested in spending my limited free time writing an implementation now.

So in general my attitude to issues is I'll try and help you use whatever's there now, I'll fix things that are broken, I'll review and often accept PRs, but I won't dedicate my free time to adding new functionality.

@brupelo
Author

brupelo commented Apr 13, 2019

You're right that I'm not keen on changing the public API if that's a breaking change, I try to do those infrequently to not break existing clients.

Sure, I understand, in fact, that's a good safe strategy when a product has reached a certain level of maturity and you want to play safe towards existing clients. If the project was still in the early stages (let's assume a small project like this, <1year development) I'm a big fan of breaking stuff all the time and refactoring mercilessly till the product covers the whole spectrum of use-cases. The most important thing to me at the early stages is the API becoming stable & user-friendly; I also strictly follow OCP (trying to expose a minimal/complete/extensible API).

In fact, even more, I'm not a huge fan of deprecating/compromising the quality of the API because of existing deps... Of course, we're assuming here you've got the luxury of having total freedom and you're not attached to customers or $$$ :) . Deadlines, money, customers and all those external forces are the worst enemies of software quality, even if you know how to create proper software :D

Adding new derives is different, I'm totally happy to accept PRs adding new derives.

Noted!

I also indeed wouldn't accept PRs to make private fields public, since unless I screwed up somewhere (plausible) you should never need access to a private field to do something useful with syntect. The PR you submitted falls under not useful, since exposing those things wouldn't actually allow you to do anything new, just expose an API that is never the correct thing to use and doesn't allow you to accomplish anything extra. The fields I mention fall under being able to do it another way that is cleaner and less error-prone with other public APIs.

You're right here; in hindsight maybe it wasn't a good approach cloning certain unit-tests that used private stuff into pysyntect... at the end of the day, pysyntect is just another client and therefore should only be testing the exposed public syntect stuff. And in case there was a real need (that PR wasn't a real need) I should have just opened a ticket first explaining the case/need, so we could've discussed whether a new PR was required at all.

I still maintain it and respond to issues, review pull requests and fix bugs because I want to be helpful and keep syntect useful to people.

And that's a good attitude, believe me, don't consider helping people a waste of time but a chance for new "opportunities". I like to consider questions, tickets, discussions, conversations, PRs as potential chances that give me new ideas for new/existing projects, creativity, learning...

I initially didn't do one because I knew any serious text editor (like Xi) would need to do their own implementation specialized to their approach and text data structures anyhow. I didn't really consider the non-serious small document text editor market. I'm also not really interested in spending my limited free time writing an implementation now.

And while this would be of course a great use-case for clients your take on this is totally understandable, so nothing to add here.

So in general my attitude to issues is I'll try and help you use whatever's there now, I'll fix things that are broken, I'll review and often accept PRs, but I won't dedicate my free time to adding new functionality.

Again, I totally understand, finding motivation to spend your free time on open source where you've pretty much solved all the hard problems is difficult... no matter how popular your software has become, at that point finding motivation that keeps you going is difficult. We coders are quite predictable creatures, we like puzzle solving after all and once the puzzle is solved, well, you just need to find new ones... :)

Anyway, thanks for letting me know, this really helps to have more context about the status/future of the project, what to ask or not, PRs, ...

\o

@brupelo
Author

brupelo commented Apr 18, 2019

Btw, i've created an extremely little repo here I'll be using to play around with different caching algos... Today I've been talking with @raphlinus and he's been extremely nice offering to help/mentor about this interesting (but hard) subject :)

According to what we've talked about, we'll be using a google document where he'll be writing/explaining this subject in detail and I'll be testing everything he writes, so I'll also be able to add data/conclusions to the doc. He's already got quite some experience with this subject, but the idea is that at a certain point, once I also master this stuff, I'll be able to help him improve the current algorithm, as if I've understood correctly the method used by Xi isn't perfect either :)

@raphlinus
Contributor

@brupelo You've mischaracterized what I've said. Xi's implementation of syntect is about as good as any - it is incremental (avoiding duplicate work when states converge), is asynchronous (updating the screen quickly even before highlighting is calculated on the whole document), and uses very sophisticated caching techniques to minimize the memory use on large documents. I don't know any way to improve it while staying within the syntect framework (with textmate compatible definitions). Of course, it would be possible to get better speed and accuracy both by using different parsing techniques, but that is well out of scope for syntect.

I'm actually unsure whether there are gaps in the resources linked above or whether there's simply a communication problem. I am considering writing a reasonably comprehensive blog post that would be a good introduction for people implementing syntect in their editors, but have to balance that against the other work on my plate. If people want to see that blog post, feel free to indicate interest either by responding or using the rocket emoji on this comment.

@brupelo
Author

brupelo commented Apr 19, 2019

@brupelo You've mischaracterized what I've said. Xi's implementation of syntect is about as good as any...

Raph, my bad, I probably didn't interpret some bits of our conversation on irc correctly. After our whole conversation I felt like there was room for optimization in xi, but it's good to know it's already using syntect in the best possible way.

I'm actually unsure whether there are gaps in the resources linked above or whether there's simply a communication problem.

Well, after reading the linked resources I still don't know how to properly apply syntect in the general case; specifically I'm thinking of 2 possible scenarios here:

  • To syntax highlight a QScintilla widget, which is the case I'd like to dispatch as soon as possible and the solution I'm interested to use in the next coming months for my projects
  • To syntax highlight a QPlainTextEdit widget using a generic QSyntaxHighlighter, eventually I'd like to implement multiselection there so I'd be able to get rid of Scintilla dependency and having a standalone text editor widget working in both pyqt & pyside2 (QScintilla only works on pyqt)

In any case, maybe it's just my fault for being clumsy trying to understand those resources, maybe I should read them more carefully again... or maybe the fact they describe the algorithms at such a high level of abstraction makes it difficult to apply them to my particular concrete cases.

Also, few days ago Tristan had also said:

My recommendation for you is that you use the code from Xi's implementation of this.

If I didn't follow his advice at that time, it was mainly because a) my rust knowledge is at a beginner level b) it felt like digging deep into a very specific implementation like Xi's maybe wasn't the best way to go for QScintilla

but have to balance that against the other work on my plate. If people want to see that blog post, feel free to indicate interest either by responding or using the rocket emoji on this comment.

Anyway, as already talked to you on irc I'll be monitoring the google doc to see what you come up with... although I'll also try to implement this stuff asap, right now it's a blocking issue for my projects as the widget is unusable when using a brute-force approach.

Btw, if you find 5min... could you please give me some feedback about the playground toy? I've tried to create a mcve as simple&small as possible so that could help us with the future document... even if you're not familiar with qt bindings on python it should be quite easy to understand what's doing.

Thanks in advance for your help ;)

@brupelo
Author

brupelo commented Apr 19, 2019

@raphlinus Today I've upgraded the playground repo a bit, installing and running the thing should take no more than 39s on both windows/linux (assuming you've got some python3.6 version installed). Anyway, now you can see there are 2 versions of naive highlighters, take a look here. The main difference of highlighting1 with respect to highlighting0 is basically that I'm not relying anymore on syntect's HighlightLines black box and now I can create my own ParseState & HighlightState objects, where both properly wrap Clone/PartialEq/Eq.

So... my question is, from highlighting1... what'd be the next logical step in order to optimize the algorithm? I guess with highlighting0 optimization was difficult, but with highlighting1 exposing some internal details it should be more feasible, shouldn't it?

Thanks :)

@brupelo
Author

brupelo commented Apr 20, 2019

Today I've decided to change direction towards my goal of ending up with a nice python editor to use in my existing 3d tools.

I've decided to use pygments on a qscintilla widget instead of syntect. The pygments API is a really nice, clean and intuitive piece of code and it's battle-tested technology. It supports a couple of hundred languages as well as a few dozen themes, it has a large community, it's well maintained, has good docs and it's pure python so I won't have to waste more time recompiling syntect bindings (right now each compilation was taking me ~15s).

I guess the pygments performance won't be as good as syntect but even though my intuition tells me it should be doable to have a responsive editor using it. Today i've wasted 20min creating a little prototype and I've just opened a question here

In any case, it's sad as my syntect python bindings were already becoming quite usable, but the reason is I've just wasted too much time thinking about how to do caching with no success/results... I definitely can't waste more time being stuck with this stuff :)

That said, thanks for all the previous advice... it'd have been quite cool to see syntect working properly on Scintilla though ;) . If somebody ever wants to give it a shot I'll be leaving the git repo alive with the latest wheels of syntect in the release section

Have a nice day.
Peace!

@trishume
Owner

Good luck, although I suspect you'll run into exactly the same problem you have with syntect, although I'm not sure pygments gives you the tools to solve it.

Both syntect and pygments highlight at approximately 10k lines/second. Syntect having higher quality I suspect. You run into exactly the same problem of needing caching on larger files to get responsive editing. Pygments might give you the tools to implement caching, but I wouldn't be surprised if it doesn't since as far as I know it's not designed to be used for text editors. If it does you're probably better off using it though because you won't need to deal with additional Rust binding issues.

Really I think you should just use whatever highlighting comes with QScintilla, surely there is some, and it likely has caching support built in. I think I agree that you shouldn't try to implement caching, it seems like in order to succeed you'd need to be coached through each piece of code you'd need to write step by step, and I don't think you'll find anyone willing to donate the time to do that.

@brupelo
Author

brupelo commented Apr 20, 2019

Syntect having higher quality I suspect

Mmm, this statement of yours is bothering me... why do you say so?

but I wouldn't be surprised if it doesn't since as far as I know it's not designed to be used for text editors

Yeah, I'd been aware of pygments for years before considering its usage today, I'd even used it on some html site a few years ago. Similarly to you, my assumption was that pygments was intended to be used only as a static syntax highlighter and not in any text editor... but today I've seen Eric is using it (didn't test though), check here, although I don't see it using anything fancy like caching or so :/ . In any case, first I need to figure out how to make brute-force parsing work and once I'm there I'll think about caching.

you won't need to deal with additional Rust binding issues.

Yeah, at a certain point wrapping basic rust code using pyo3 wasn't actually that bad if you're just using opaque pointers (I'd actually written a little tool to generate automatic bindings out of rust code although it was pretty basic) but in some cases (also because I'm a rust beginner) some parts were quite tricky to deal with (ie: iterators or generics) :/

Really I think you should just use whatever highlighting comes with QScintilla, surely there is some, and it likely has caching support built in.

Well, let me tell you some of my old 3d tools are already using builtin QScintilla lexers + custom QScintilla lexers, but the reason why 2 months ago I started considering alternatives to QScintilla lexers such as syntect et al. was because I thought I could do better than these builtins and end up with a more uniform layer that would allow me to use a lot of data out of the box (ie: hundred languages + dozen themes)

it seems like in order to succeed you'd need to be coached through each piece of code you'd need to write step by step, and I don't think you'll find anyone willing to donate the time to do that.

It's frustrating btw... when it comes down to 3d & maths problems (which is my area of expertise) I rarely get stuck, but this problem space of parsers/editors/highlighters still feels like quite hard territory to me :) . Anyway, I understand what you're saying about finding people willing to donate their time to this hard little niche... I already knew that and of course, that's fair. My hope was that people would get interested/engaged in the subject as well

Ps. Pygments 10k lines/s? Wow, i expected syntect would be much faster than pygments... good to know :)

@trishume
Owner

That example you link to indeed doesn't do any caching, although it only highlights until the specified "end" point, which will make it faster near the beginning of a large file but might lead to incorrect results or might not depending on how Scintilla decides to call that method (not sure, not going to check myself).

In my testing Pygments highlights 9k lines of jquery.js in 1s on my computer where syntect takes 0.65s. So syntect is faster, but same order of magnitude. I'm not sure this is a fully fair comparison though since generally the Sublime syntaxes are quite fancy and have special highlighting for example all sorts of ES6 features, and support nested languages and things, I suspect pygments is lower quality but it doesn't really matter that much. The thing is I'm not sure the Pygments API makes it possible to implement caching, which is how you get instant response in a text editor with syntect.
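
To put those two timings on the same scale, here is the arithmetic as a short script. The 9,000 figure is the "9k lines" above taken as a round number, and the 2,000-line file is a hypothetical size chosen only to show what an uncached full pass would cost per edit.

```python
def lines_per_second(lines: int, seconds: float) -> float:
    """Throughput of a highlighter that processes `lines` in `seconds`."""
    return lines / seconds


def full_rehighlight_ms(file_lines: int, rate: float) -> float:
    """Milliseconds to re-highlight a whole file on every edit (no caching)."""
    return 1000.0 * file_lines / rate


JQUERY_LINES = 9_000  # "9k lines" from the measurement above, rounded

pygments_rate = lines_per_second(JQUERY_LINES, 1.0)
syntect_rate = lines_per_second(JQUERY_LINES, 0.65)

print(f"pygments: {pygments_rate:.0f} lines/s")
print(f"syntect:  {syntect_rate:.0f} lines/s")
print(f"speedup:  {syntect_rate / pygments_rate:.2f}x")

# Hypothetical 2,000-line file, re-highlighted from the top on each keystroke.
for name, rate in (("pygments", pygments_rate), ("syntect", syntect_rate)):
    print(f"{name}: {full_rehighlight_ms(2_000, rate):.0f} ms per edit")
```

Both rates land within a factor of two of each other, so a full pass on every keystroke is costly with either library on a file of that size, which is the reason the discussion keeps returning to caching.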

@brupelo
Author

brupelo commented Apr 20, 2019

Thanks for sharing those results, that helps.

The thing is I'm not sure the Pygments API makes it possible to implement caching, which is how you get instant response in a text editor with syntect.

Right now I've solved the issue I was having with the highlighting getting screwed up in the SO question, so the only remaining thing is to start reading pygments code to analyze possible strategies to make it real-time... probably you're right and pygments won't expose any caching mechanisms to be used in text editors, but the fact python is 2nd nature to me will make this task much easier than trying to figure out how to use pysyntect and asking/bothering people all the time, as in this case I'll be able to tweak pygments internally or even replace routines if necessary. After all this is just optimizing vanilla python code and that's not hard at all for me ;)

@brupelo
Author

brupelo commented Apr 20, 2019

showcase

It seems the "window" size when using pygments on my laptop will be around <~=16kb... beyond that it becomes unusable and you need some sophisticated method like the ones described throughout this thread
