Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed rewriting not perfect #222

Open
keith-hall opened this issue Oct 30, 2018 · 8 comments
Open

Embed rewriting not perfect #222

keith-hall opened this issue Oct 30, 2018 · 8 comments
Labels

Comments

@keith-hall
Copy link
Collaborator

keith-hall commented Oct 30, 2018

After the latest fixes, I tried updating the submodules, and can see we still have some syntax test failures, this time relating to embed...escape:

FAILED testdata/Packages\Java\syntax_test_java.java: 817
FAILED testdata/Packages\Markdown\syntax_test_markdown.md: 61

We get the same failures in Sublime Text, when rewriting the embed...escape rules to with_prototype like syntect's yaml_load is doing, so the good news is that syntects parser is working correctly:

JavaDoc.sublime-syntax:

--- Shipped Packages/Java/JavaDoc.sublime-syntax    2018-10-11 19:11:54
+++ Packages/Java/JavaDoc.sublime-syntax    2018-10-30 11:28:20
@@ -20,11 +20,12 @@
     - meta_include_prototype: false
     - match: /\*\*
       scope: comment.block.documentation.javadoc punctuation.definition.comment.begin.javadoc
-      embed: contents
-      embed_scope: comment.block.documentation.javadoc text.html.javadoc
-      escape: \*/
-      escape_captures:
-        0: comment.block.documentation.javadoc punctuation.definition.comment.end.javadoc
+      push:
+        - [{ meta_include_prototype: false }, { meta_content_scope: 'comment.block.documentation.javadoc text.html.javadoc' }, { match: '\*/', scope: comment.block.documentation.javadoc punctuation.definition.comment.end.javadoc, pop: true }]
+        - contents
+      with_prototype:
+        - match: (?=\*/)
+          pop: true

Markdown.sublime-syntax:

--- Shipped Packages/Markdown/Markdown.sublime-syntax   2018-10-11 19:11:54
+++ Packages/Markdown/Markdown.sublime-syntax   2018-10-30 09:44:46
@@ -1106,12 +1106,12 @@ contexts:
         0: meta.code-fence.definition.begin.html-php.markdown-gfm
         2: punctuation.definition.raw.code-fence.begin.markdown
         5: constant.other.language-name.markdown
-      embed: scope:embedding.php
-      embed_scope: markup.raw.code-fence.html-php.markdown-gfm
-      escape: '{{code_fence_escape}}'
-      escape_captures:
-        0: meta.code-fence.definition.end.html-php.markdown-gfm
-        1: punctuation.definition.raw.code-fence.end.markdown
+      push:
+        - [{ meta_include_prototype: false }, { meta_content_scope: 'markup.raw.code-fence.html-php.markdown-gfm' }, { match: '{{code_fence_escape}}', captures: { 0: meta.code-fence.definition.end.html-php.markdown-gfm, 1: punctuation.definition.raw.code-fence.end.markdown }, pop: true }]
+        - scope:embedding.php
+      with_prototype:
+        - match: (?={{code_fence_escape}})
+          pop: true
     - match: |-
          (?x)
           {{fenced_code_block_start}}

For Java(Doc), the workaround suggested at #160 (comment) works.

For Markdown, the following syntax test is enough to see the problem:

| SYNTAX TEST "Packages/Markdown/Markdown.sublime-syntax"
```html+php
<div></div>
|^^^ entity.name.tag.block.any.html
<?php
|^^^^ punctuation.section.embedded.begin.php
var_dump(expression);
| ^^^^^^ support.function.var.php
```
|^^ punctuation.definition.raw.code-fence.end.markdown

Everything except the last line's assertions pass. Adding ?> before the closing backticks/code fence allows the backticks to be correctly recognized as the closing code fence instead of PHP interpolated string punctuation, but we will need a solution so that the way syntect handles embed actions will work 100% correctly in all situations.

Altering the PHP Source syntax definition can help to work around it, which (although obviously not a solution,) suggests that, for some reason, the with_prototype (rewritten from the embed in the Markdown syntax definition) is not being considered at this point (in both Sublime and syntect):

--- Shipped Packages/PHP/PHP Source.sublime-syntax  2018-10-11 19:11:54
+++ Packages/PHP/PHP Source.sublime-syntax  2018-10-30 13:36:58
@@ -109,6 +109,8 @@
         - include: statements
 
   expressions:
+    - match: (?=```)
+      pop: true
     - include: comments
     - match: (?i)\b((?:require|include)(?:_once)?)\b\s*
       captures:
@keith-hall keith-hall added the bug label Oct 30, 2018
@keith-hall
Copy link
Collaborator Author

I guess the fundamental flaw with rewriting embed actions to the "equivalent" with_prototype construct is that embed is guaranteed to escape at the escape pattern, but with with_prototype, a match pattern could consume the characters first.

Simple example embed_vs_with_prototype.sublime-syntax:

%YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
scope: source.embed-vs-with_prototype
contexts:
  main:
    - match: embed
      scope: keyword.begin
      embed: contents
      escape: escape
      escape_captures:
        0: keyword.end
    - match: with_prototype
      scope: keyword.begin
      push: 
        - [{ meta_include_prototype: false }, { match: escape, scope: keyword.end, pop: true }]
        - contents
      with_prototype:
        - match: (?=escape)
          pop: true
  contents:
    - match: \w+
      scope: string

Test text:

embedtesting123escape
with_prototypetesting123escape

image

So unfortunately, I believe we will need to include native support for embed/escape in the parser.

@robinst
Copy link
Collaborator

robinst commented Oct 31, 2018

Makes sense! So escape needs to have precedence over other matches.

@keith-hall
Copy link
Collaborator Author

Just a note for when we work on this, there is an open question for SublimeHQ to answer regarding whether the prototype context should be ignored in contexts embedded from the prototype. Currently (as at build 3180) Sublime Text's behavior is to apply the prototype.

@robinst
Copy link
Collaborator

robinst commented Mar 28, 2019

@keith-hall So let me get this straight, in this example:

embedtesting123escape

embed matches the embed, that's easy. But then the next match would be that \w+ matches all of testing123escape with scope string. But the embed escape also matches escape.

So in that case, the escape match takes precedence and the match for string is cut short.

I'm wondering how that's implemented (and how we would implement it). Does something like this make sense?:

  1. If we are currently in an embed, try matching the escape first. If that matches:
    • if the match is at our current position, pop out of the embed
    • else continue matching other contexts but limit it to text that comes before the match (basically truncate our input).
  2. Advance position to the next match and repeat

What happens if there are multiple active embeds that has matching escapes, which one wins?

@keith-hall
Copy link
Collaborator Author

yes, that makes sense, and sounds like the way forward.
For multiple embeds, I believe the same ordering as with_prototype is used - the embedded contexts are prioritized by the order in which were pushed onto the stack, an earlier embed's escape wins even if a later escape would match at an earlier index in the string, if that makes sense. It may be better illustrated with an example, I will try to come up with one when I get a chance :)

@robinst
Copy link
Collaborator

robinst commented Mar 29, 2019

I tried this example:

%YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
scope: source.nested-embed
contexts:
  main:
    - match: \(
      scope: paren.begin
      embed: a
      escape: \)
      escape_captures:
        0: paren.end
  a:
    - match: \w+
      scope: string.a
    - match: \<
      scope: bracket.begin
      embed: b
      escape: \>
      escape_captures:
        0: bracket.end
  b:
    - match: \w+
      scope: string.b

With the input ( < > ), the > is bracket.end and ) is paren.end.

With ( < ) >, ) is paren.end and ) doesn't have a scope. So yeah, the earlier embed's escape wins and pops out all the way. (I also did an example where I used the same pattern for both escapes, and it popped out all the way as well.)

@keith-hall
Copy link
Collaborator Author

keith-hall commented Mar 29, 2019

nice 👍 one other thing I wanted to check was how escapes are handled if one escape pattern matches inside another one... say, escape: me and escape: test-me with text test-me - I imagine the first embed should be popped as it should win with me. Sorry for not doing the due diligence myself or laying out the yaml for a proper test case, I'm on mobile atm. EDIT: actually, this is probably covered by your using the same pattern for both escapes.

@robinst
Copy link
Collaborator

robinst commented Mar 29, 2019

Just tested, me wins (if it's the earlier embed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants