Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added syntest example to run ST syntax tests #44

Merged
merged 10 commits into from
Mar 17, 2017

Conversation

keith-hall
Copy link
Collaborator

This PR adds a new example called "syntest", which will parse and execute ST's "syntax_test_" files.

Currently, there is no way to reference a syntax definition in a SyntaxSet from it's original file path, which is how tests specify which syntax definition to use, so it just lets syntect choose based on the extension of the test file at the moment.

I used the regular "regex" crate to achieve the test line parsing - feel free to replace this with "onig" if you don't want the extra reference - I don't quite have enough experience yet to know how to use it effectively.

Also, if the syntax test files use Windows line endings, it has to replace "\r" with nothing, because otherwise the regular expressions don't match expressions like "$\n?" properly, but perhaps it would be better to update the main parsing module to fix this?

I've checked and it shows success on the XML, HTML and JSON syntax tests. I also tried modifying them to cause a test to fail, and syntest correctly picked that up.

However, it seems to detect a failure in the Haskell syntax test file, and it looks like some meta scopes are being applied multiple times - so I'd appreciate if you could take a look and see if it's a flaw in my syntest logic somewhere or a bug in the parser. Thanks!

cargo run --example syntest testdata/Packages/Haskell/
    Finished debug [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/examples/syntest testdata/Packages/Haskell/`
Testing file testdata/Packages/Haskell/syntax_test_haskell.hs
The test file references syntax definition file: Packages/Haskell/Haskell.sublime-syntax
Assertion selector "- comment.line.double-dash.haskell" from line 6 failed on line 5, column range 0-1 (with text ['2']) has scope [<source.haskell>, <comment.line.double-dash.haskell>, <comment.line.double-dash.haskell>, <comment.line.double-dash.haskell>, <comment.line.double-dash.haskell>, <constant.numeric.haskell>]
Ok(FailedAssertions(1, 422))

@keith-hall
Copy link
Collaborator Author

I vote for something fishy in easy:ScopeRegionIterator, because if I do println!("{:?}", &ops); after state.parse_line, I get what I would expect, but when I println!("{:?}", op); inside the loop, I am missing the last pop:

[(0, Push(<source.haskell>)), (0, Push(<comment.line.double-dash.haskell>)), (0, Push(<punctuation.definition.comment.haskell>)), (2, Pop(1)), (57, Pop(1))]
Noop
Push(<source.haskell>)
Push(<comment.line.double-dash.haskell>)
Push(<punctuation.definition.comment.haskell>)
Pop(1)
[]

@keith-hall
Copy link
Collaborator Author

The following change seems to help:

--- syntect-master/src/easy.rs
+++ syntect-syntest/src/easy.rs
@@ -148,10 +148,9 @@
 impl<'a> Iterator for ScopeRegionIterator<'a> {
     type Item = (&'a str, &'a ScopeStackOp);
     fn next(&mut self) -> Option<Self::Item> {
-        let next_str_i = if self.index >= self.ops.len() {
-            if self.last_str_index >= self.line.len() {
-                return None;
-            }
+        let next_str_i = if self.index > self.ops.len() {
+            return None;
+        } else if self.index == self.ops.len() {
             self.line.len()
         } else {
             self.ops[self.index].0

with this change, the "Batch File" and "Haskell" syntax tests start working, and the number of failures in "PHP" are reduced significantly.

Copy link
Owner

@trishume trishume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a preliminary review. I have to go right now but I can do a complete review soon.

In general this looks awesome. Thanks so much for writing this. It seems to be finding a bunch of bugs, but that's a good thing since after we get these to pass I'll feel a lot more confident that things work.

}

fn main() {
let ss = SyntaxSet::load_defaults_newlines(); // note we load the version with newlines
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case it might be good to use SyntaxSet::load_from_folder(argv[0]) so that you can use it for syntax development without recompiling every time. Once we get syntect to a point where you can actually rely on it to do things correctly that is...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually since this is using the version with newlines included and load_from_folder defaults to without newlines. You can just do what load_from_folder does yourself:

        let mut ss = SyntaxSet::new();
        ss.load_syntaxes(args[0], true).unwrap();
        ss.link_syntaxes();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great idea! however, in my testing using cargo run --example syntest testdata/Packages/HTML/, args[0] refers to the example in the target/debug folder, and it would have to traverse up a few folder levels to find the testdata/Packages folder where the syntaxes are, so we will need to come up with a suitable workaround I think.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I meant args[1] and then you would pass testdata/Packages as the first argument and it would run tests for all languages, although it's kinda nice to be able to run tests for only one language I guess. Maybe have the first command line argument be the same as now, and then if there's a second command line argument load from that folder, otherwise just load the defaults.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking something along those lines too :) thanks, I'll give it a go

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :)

println!("Testing file {}", entry.path().display());
let result = test_file(&ss, entry.path(), true);
println!("{:?}", result);
if let Err(_) = result {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if there were a flag/argument to make it not die on the first failing file. It would be easier to get a sense of how much was broken then.

Copy link
Collaborator Author

@keith-hall keith-hall Mar 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it currently doesn't die on the first failing file (unless there is a panic), it just ensures that when it has run through all the files, it will exit with an appropriate exit code :) maybe we could add a flag though so that it optionally can fail fast, to save time (not that these syntax tests take long at all!) when all the caller wants to know is whether there were any failures

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh woops. I just skimmed over the code briefly and missed that this didn't actually stop anything.

I think the behaviour you describe is perfect for now. I don't see any pressing need to add a fail-fast flag.

Cargo.toml Outdated
@@ -24,6 +24,7 @@ rustc-serialize = "^0.3"
bincode = "0.6"
flate2 = "^0.2"
fnv = "^1.0"
regex = "0.2.1"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now the use of regex is fine. I may port it to onig before or after it is merged.

However, it should be under dev-dependencies since it is only used in an example. See rust-lang/cargo#523

@keith-hall
Copy link
Collaborator Author

It seems like the test failures are due to scopes from meta_content_scope being missing in the scope stack.
For example, Java:

Testing file testdata/Packages/Java/syntax_test_java.java
The test file references syntax definition file: Packages/Java/Java.sublime-syntax
Assertion selector "meta.class" from line 20 failed on line 19, column range 7-12 (with text ['c', 'l', 'a', 's', 's']) has scope [<source.java>, <storage.type.java>]

@trishume
Copy link
Owner

@keith-hall interesting. Thanks for the diagnosing. I'm going to see if I can review this PR later today and maybe fix some of the bugs it found. If not today I'll try and do it tomorrow.

@keith-hall
Copy link
Collaborator Author

looking at the Java example a bit more closely, it might be related to how set interacts with meta scopes - in ST, meta_content_scope and meta_scope apply to the match that sets a different context

@keith-hall keith-hall force-pushed the syntest branch 3 times, most recently from aafe342 to b03fe13 Compare March 12, 2017 15:00
@trishume
Copy link
Owner

trishume commented Mar 12, 2017

@keith-hall I'm working on reviewing this. Just a heads up I pushed a commit to the branch refactoring the looking up of syntaxes by path (thanks for enabling maintainer commits). You may want to git pull and take a look at the commit.

@keith-hall
Copy link
Collaborator Author

@trishume nice one, that's a much better solution than what I had, thanks! :)

@trishume
Copy link
Owner

@keith-hall reviewed the fix to ScopeRegionIterator and committed the cleanup I did while doing so. It looks good, thanks for the debugging. You may want to pull.

Now all that's left for me to review is syntest.rs.

@trishume
Copy link
Owner

@keith-hall I just reviewed the syntest.rs file and it looks good. I pushed a commit to rearrange some error handling code and fix a few small things. I recommend you review the whitespace-insensitive diff to see what actually changed and what was just me changing the indentation of a large segment: 3e95235?w=1

I've now reviewed everything in this PR and am pretty happy with it. There's still the bugs that keep the tests from actually passing, but I'm thinking I/we can fix those in separate PRs. I'm ready to merge this now if you are. If you're ready to merge I think I'll do an interactive rebase to remove the dump changes to the first commit that changes them, just to avoid increasing the repo size unnecessarily.

After merging we can create issues for all the bugs we find. The one that makes the C# test suite crash at least already has an issue: #37

I'm currently looking at why the ASP test has so many failures. I think walkdir might actually be giving us the tests in a different order (possibly because of different OSs, I'm on OSX). Mine runs ASP (with many failures), then Batch (success), then C# (success), then another C# test (crash #37). The ASP failures may or may not be the same underlying bug as you are encountering with Java.

@trishume
Copy link
Owner

Also I notice that Travis is failing, but only the nightly one is failing, the stable one passes. And nightly fails on cargo --version so I think it's just Rust nightly that's broken on Travis.

@keith-hall
Copy link
Collaborator Author

@trishume yep looks good, I'm ready to merge - I agree that it makes sense to have separate PRs for the remaining issues :) thanks for the cleanup and associated commentary - makes it super easy to follow along :) it's a pleasure working with you!

do you want me to create the issue for the Java failures? I was thinking that it might make sense to tackle that first, then everything related to the same underlying issue will go away and make it easier to investigate what (if anything) else is wrong...

I was curious about those Travis failures - what's the benefit of building against nightly, if I may ask, when it's so unstable? :)

@robinst
Copy link
Collaborator

robinst commented Mar 13, 2017

Wow, nice change! After all the bugs are fixed, are you planning to run this as part of regular tests?

@trishume
Copy link
Owner

@robinst yah I'm thinking maybe move it to a module and then use it from both the example and cargo test.

@keith-hall
Copy link
Collaborator Author

@trishume FYI, I've fixed the Java failures over at https://github.com/forkeith/syntect/tree/set, and it reduces the number of ASP test failures quite drastically too. I'm not sure if I will get it to a state where it is Pull Request-able or not, but wanted you to see my progress so we can collaborate together rather than work separately to achieve the same thing.

@trishume
Copy link
Owner

@keith-hall awesome. I'll take a look at it.

Don't worry about duplicating work in the next few days. I'll be busy with school work until this weekend so I won't be doing any work on syntect (other than Githubbing).

keith-hall and others added 8 commits March 16, 2017 12:02
commit ammended later to remove changes to packdump files, to reduce
repo size.
This change makes it so that it can't crash on non-UTF8 paths,
makes lookup by file path more efficient, and avoids allocating
a hash map data structure that is never used as a map.

It also adds a test of the functionality

Signed-off-by: Tristan Hume <tris.hume@gmail.com>
i.e. the ASP syntax tests make use of this behavior, to test whether line continuation punctuation works properly
I did this to make it clearer what was going on, so that I could understand it
again and review that it was doing the right thing, since I clearly got it
wrong the first time.
trishume and others added 2 commits March 16, 2017 12:05
The error handling in test_file was rearranged to use
the try! macro and ok_or combinator. This decreases
indentation and makes the code easier to read.

It is recommended to view this diff without whitespace
to make it clearer what actually changed.
i.e. allow text after the assertion to not interfere with the assertion. This is how ST works when it executes the syntax tests.
@trishume trishume merged commit f5e01f1 into trishume:master Mar 17, 2017
@keith-hall keith-hall deleted the syntest branch March 17, 2017 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants