Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster Scanner (part 2) #54

Merged
merged 85 commits into from Jan 1, 2022
Merged

Faster Scanner (part 2) #54

merged 85 commits into from Jan 1, 2022

Conversation

414owen
Copy link
Contributor

@414owen 414owen commented Dec 31, 2021

See part 1 for more info.


old new speedup
effects 1.07s 0.034s 31.5x
postgrest stackoverflow 0.083s ∞x
ivory 10.16s 0.225s 45.16x
polysemy 3.71s 0.089s 41.7x
semantic 50.62s 1.089s 46.5x
haskell-language-server 24.28s 0.504s 48.2x

Overall speedup: around 45x.

@414owen
Copy link
Contributor Author

414owen commented Dec 31, 2021

$ ./script/parse-example effects |& head -n 100
Successfully parsed 100.00% of 'effects' in 0.09s (38 of 38 files)

Original time: 1.09s

1.09 / 0.09 = 12.11

@maxbrunsfeld
Copy link
Contributor

Nice work on this.

Having removed the use of closures, would it be straightforward to get this compiling without C++14 features? That would make this parser a bit easier to consume in downstream applications, and develop on a Mac.

@tek
Copy link
Contributor

tek commented Jan 1, 2022

awesome!!

@414owen
Copy link
Contributor Author

414owen commented Jan 1, 2022

Having removed the use of closures, would it be straightforward to get this compiling without C++14 features?

Yes I plan to remove the need for c++14, and maybe even convert the scanner to pure C :)

@maxbrunsfeld
Copy link
Contributor

Oh, that’s interesting. I’d think that C++ would be useful just for std::vector. are you thinking that a dynamically sized array isn’t needed in the scanner?

@414owen
Copy link
Contributor Author

414owen commented Jan 1, 2022

are you thinking that a dynamically sized array isn’t needed in the scanner?

I think we will need one, but I'm happy to implement my own if it means we can compile this as C.

While we have you here, would we be able to get your opinion on #41 (comment)? We think the scanner's probably running over the whole file (or a lot of it) every edit, but we're not sure if it should.

@414owen
Copy link
Contributor Author

414owen commented Jan 1, 2022

@wenkokke I've removed all the closures. It now compiles with -std=c++11 but not -std=c++03.

I'll see what I can do.

@414owen
Copy link
Contributor Author

414owen commented Jan 1, 2022

$ ./script/parse-example effects |& head -n 100
Successfully parsed 100.00% of 'effects' in 0.0297s (38 of 38 files)

Original time: 1.09s

1.09 / 0.0297 = 36.7

There might be one or two micro-optimizations left (PEEKing before taking some code paths?)
As well as maybe decreasing the necessary re-computations (see #41 (comment))

But this will in any case let us edit 36x larger files than before 🎉

@414owen
Copy link
Contributor Author

414owen commented Jan 1, 2022

@wenkokke I've added c++03 support here: 414owen#1

@wenkokke
Copy link
Contributor

wenkokke commented Jan 1, 2022

With your current faster-scanner branch, you've got the patch to tree-sitter down to:

--- a/lib/binding_web/exports.json
+++ b/lib/binding_web/exports.json
@@ -3,6 +3,9 @@
   "_free",
   "_malloc",

+  "__ZNSt3__25ctypeIcE2idE",
+  "__ZNSt3__212basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE25__init_copy_ctor_externalEPKcm",
+
   "__ZNKSt3__212basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE4copyEPcmm",
   "__ZNKSt3__220__vector_base_commonILb1EE20__throw_length_errorEv",
   "__ZNSt3__212basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcm",

See https://github.com/wenkokke/tree-sitter-haskell/tree/tree-sitter-haskell-wasm-faster-scanner.

@414owen
Copy link
Contributor Author

414owen commented Jan 1, 2022

@tek This is ready for review, when you have a chance.

All tests pass. Any further changes will probably be cosmetic (moving things around, rather than changing what they do).

@wenkokke
Copy link
Contributor

wenkokke commented Jan 1, 2022

@tek I have an updated version of #55 for when this pull request is merged

@tek tek merged commit d4c67bb into tree-sitter:master Jan 1, 2022
@tek
Copy link
Contributor

tek commented Jan 1, 2022

awesome!

src/scanner.cc Show resolved Hide resolved
peek('i')(in)
)(state);
// TODO Convert to switch
if (cond::symbolic(PEEK) || PEEK == '`') {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 peeks happen here, store in a variable? Or use a switch(PEEK) like the TODO mentions :)

src/scanner.cc Outdated Show resolved Hide resolved
src/scanner.cc Show resolved Hide resolved
string mark_target = "consume_until " + target;
util::mark(mark_target, state);
while (PEEK != 0 && !seq_v2(target, state)) {
while (PEEK != 0 && PEEK != first) S_ADVANCE;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 PEEKs, store in a var to only PEEK once?

src/scanner.cc Show resolved Hide resolved
src/scanner.cc Show resolved Hide resolved
src/scanner.cc Show resolved Hide resolved
src/scanner.cc Show resolved Hide resolved
src/scanner.cc Show resolved Hide resolved
wenkokke pushed a commit to wenkokke/fork4pr-tree-sitter-haskell that referenced this pull request Jan 1, 2022
@414owen
Copy link
Contributor Author

414owen commented Jan 1, 2022

@luc-tielen I appreciate the comments, but I think a lot of them are out of scope, either because I haven't changed the code from what it was doing before, or because they're microoptimizations that the compiler may or may not already be doing.

I'd be interested to see if there's a performance win by implementing your constexpr hints. Would you be willing to open a PR for them? Same goes with two peeks here comments. Not sure if they're already optimized out, but I'd love to see some numbers :)

@luc-tielen
Copy link

@414owen That's ok. Your PR was fine, so I could only point out the nitpicky stuff.

@tek
Copy link
Contributor

tek commented Jan 1, 2022

@414owen it appears to get stuck in an infinite loop when parsing HLS for me, can you confirm?

@414owen
Copy link
Contributor Author

414owen commented Jan 1, 2022

@414owen it appears to get stuck in an infinite loop when parsing HLS for me, can you confirm?

I'm building helix atm. I'll run it in gdb and see where :)

@414owen
Copy link
Contributor Author

414owen commented Jan 1, 2022

Okay I think I need to check for an eof in inline-comment. I'll PR in a few mins.

@414owen
Copy link
Contributor Author

414owen commented Jan 1, 2022

old new speedup
effects 1.07s 0.034s 31.5x
postgrest stackoverflow 0.083s ∞x
ivory 10.16s 0.225s 45.16x
polysemy 3.71s 0.089s 41.7x
semantic 50.62s 1.089s 46.5x
haskell-language-server 24.28s 0.504s 48.2x

@tek
Copy link
Contributor

tek commented Jan 1, 2022

crazy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants