Support for raw string literals #39

Closed
u2606 opened this issue Jul 25, 2019 · 11 comments

@u2606

u2606 commented Jul 25, 2019

Support for raw string literals

SE-0200, implemented in Swift 5, added support for raw string literals: string literals that add # characters to their delimiters and that treat backslash sequences such as \n and \\ as literal characters unless the backslash is followed by the same number of # characters. The Swift Evolution proposal goes into greater detail about their design, but a quick overview follows.

Overview of raw string literals

Traditional string literals interpret character sequences beginning with \ as escape sequences, but raw string literals interpret these as literal characters:

"\n" // newline
#"\n"# // backslash, n

"\\n" // backslash, n
#"\\n"# // backslash, backslash, n

"\u{2603}" // ☃
#"\u{2603}"# // backslash, u, opening brace, 2, 6, 0, 3, closing brace

"\\u{2603}" // backslash, u, opening brace, 2, 6, 0, 3, closing brace
#"\\u{2603}"# // backslash, backslash, u, opening brace, 2, 6, 0, 3, closing brace

let num = 42
"\(num)" // 42
#"\(num)"# // backslash, opening parenthesis, n, u, m, closing parenthesis
"\\(num)" // backslash, opening parenthesis, n, u, m, closing parenthesis
#"\\(num)"# // backslash, backslash, opening parenthesis, n, u, m, closing parenthesis
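
The equivalences in the overview can be checked directly in Swift 5 or later. This is only a small sketch: each raw literal is compared against a traditional literal in which the backslash has been escaped by hand.

```swift
let num = 42

// A raw literal keeps the backslash as an ordinary character.
assert(#"\n"# == "\\n")
assert(#"\n"#.count == 2)            // backslash, n
assert("\n".count == 1)              // a single newline

// Doubling the backslash in a raw literal yields two backslashes.
assert(#"\\n"# == "\\\\n")

// Unicode escapes and interpolation are not processed either.
assert(#"\u{2603}"# == "\\u{2603}")
assert(#"\(num)"# == "\\(num)")
assert("\(num)" == "42")
```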

You can use any number of # characters as the delimiters of a raw string literal:

// All of the following strings are equivalent:
"good morning"
#"good morning"#
##"good morning"##
###"good morning"###
// et cetera

To use an escape sequence in a raw string literal, write the same number of # characters as the delimiters immediately after the backslash:

// All of the following strings are equivalent:
"\n" // newline
#"\#n"#
##"\##n"##

// All of the following strings are equivalent:
"\u{2603}" // ☃
#"\#u{2603}"#
##"\##u{2603}"##

// All of the following strings are equivalent:
"\(num)" // 42
#"\#(num)"#
##"\##(num)"##

Note, the reason you can use any number of # characters is to allow literal \# sequences in a raw string literal without escaping:

// All of the following strings are equivalent:
"\\#n" // backslash, hash, n (\\ is needed to produce a literal \)
#"\#\#n"# // (\#\ is needed to produce a literal \ before #)
##"\#n"## // (no escaping needed)
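
The delimiter-matched escape rule, and the reason for allowing more than one #, can be verified the same way. A minimal sketch, assuming Swift 5 or later:

```swift
let num = 42

// An escape is recognised only when the backslash is followed by
// as many # characters as the delimiter uses.
assert(#"\#n"# == "\n")
assert(##"\##n"## == "\n")
assert(#"\#u{2603}"# == "\u{2603}")
assert(##"\##(num)"## == "42")

// With two-# delimiters, \# is no longer an escape, so a literal
// backslash followed by # needs no escaping.
assert(##"\#n"## == "\\#n")
assert(#"\#\#n"# == "\\#n")
```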

Swift also allows combining raw string literals with multiline string literals:

// All of the following strings are equivalent
let one = "good\\n\nmorning" // g, o, o, d, backslash, n, newline, m, o, r, n, i, n, g
let two = """
    good\\n
    morning
    """
let three = #"""
    good\n
    morning
    """#

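The delimiter-matched escapes also apply inside multiline raw string literals. In the sketch below, `name` and `greeting` are illustrative names only: the `\n` stays literal, while `\#(name)` is interpolated.

```swift
let name = "morning"

// \n is kept as backslash + n; \#(name) is a real interpolation
// because it carries the same single # as the delimiters.
let greeting = #"""
    good\n
    \#(name)
    """#

assert(greeting == "good\\n\nmorning")
assert(greeting.count == 14)
```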
The issue: swift.tmbundle doesn’t support raw string literals

As seen in some of the examples above, swift.tmbundle highlights raw string literals as if they were traditional string literals. That means that some valid sequences involving \ characters are marked as invalid, and some invalid sequences involving \ characters are only partially marked as invalid:

"\d" // invalid escape sequence \d (correctly marked)
#"\d"# // backslash, d (incorrectly marked as invalid escape sequence \d)
#"\#d"# // invalid escape sequence \#d (incorrectly marked as shorter invalid escape sequence \#)
##"\#d"## // backslash, hash, d (incorrectly marked as invalid escape sequence \#)
##"\##d"## // invalid escape sequence \##d (incorrectly marked as shorter invalid escape sequence \#)
@jtbandes
Collaborator

Thanks for the report. Are you interested in making a PR to add this? Here's a similar example from a Rust bundle:

https://github.com/carols10cents/rust.tmbundle/blob/d788eebb847c7673360e5e2d85ab9a1dc877c871/Syntaxes/Rust.tmLanguage#L773-L788

@u2606
Author

u2606 commented Jul 25, 2019

Yes, I’m interested in making a PR. I see that literal_raw_string uses a name of string.quoted.double.raw.rust in the example you linked. I can’t seem to figure out where that’s defined.

@jtbandes
Collaborator

All scope names are just by convention, but the basic ones are defined here: https://macromates.com/manual/en/language_grammars So you could use string.quoted.double.raw.swift for example.

@jtbandes
Collaborator

You might also want to refer to #31

@u2606
Author

u2606 commented Jul 25, 2019

Thanks, those resources should help.

@jtbandes
Collaborator

Looking at this now – @infininight @sorbits is there a way to use begin capture groups inside patterns? The Swift raw string literals allow the same delimiter to be used on escapes inside the string, for instance ###"new \###n line"### but \1 doesn't seem to work the same way it does inside end...

@sorbits
Member

sorbits commented Aug 19, 2019

Looking at this now – @infininight @sorbits is there a way to use begin capture groups inside patterns?

There is not, no. So currently we cannot support the raw string escaping mechanism.

I do have an open issue about supporting ${variables} in patterns, which I have updated to include captures from the parent’s begin rule. Doesn’t solve the problem here and now, but on an infinite timescale… :)

@jtbandes
Collaborator

OK, thanks. Maybe what I'll do is create a full set of rules including escapes for n=1, i.e. #"this \#n case"#, and then a general set of (#+)".."\1 delimiters that doesn't support escapes. It's important to at least try to get the string start/end boundaries right, but it sounds impossible to make it perfect (since an escaped subexpression could contain more raw strings: \#( ##"more string here"## ))
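
The general (#+)".."\1 idea can be illustrated outside the grammar with a regular expression that uses a backreference for the closing delimiter. This is only a sketch of the boundary-matching technique in Swift with Foundation's NSRegularExpression, not the rule that went into the bundle; the function name `rawStringLiterals(in:)` is hypothetical. Like the general rule described above, it never looks at escapes inside the string, so an interpolation such as \#( ##"..."## ) that contains another raw string can still confuse it.

```swift
import Foundation

// Hypothetical helper, not part of swift.tmbundle: returns every span that
// starts with a run of # plus a quote and ends with a quote plus the same run.
func rawStringLiterals(in source: String) -> [String] {
    // (#+)"   opening delimiter; the run of # is captured as group 1
    // .*?     contents, matched lazily and never inspected for escapes
    // "\1     closing quote followed by exactly the captured run of #
    let pattern = #"(#+)".*?"\1"#
    let regex = try! NSRegularExpression(pattern: pattern,
                                         options: [.dotMatchesLineSeparators])
    let whole = NSRange(source.startIndex..., in: source)
    return regex.matches(in: source, options: [], range: whole).compactMap { match in
        Range(match.range, in: source).map { String(source[$0]) }
    }
}

// A "# inside a ##-delimited literal does not end it.
let sample = "let a = #\"one\"#; let b = ##\"two \"# still two\"##"
let found = rawStringLiterals(in: sample)
assert(found == ["#\"one\"#", "##\"two \"# still two\"##"])
```

Because the closing delimiter is tied to the captured opening run, the boundaries come out right for any number of # characters, which is the part that matters most for highlighting the rest of the file correctly.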

@sorbits
Member

sorbits commented Aug 21, 2019

Maybe what I'll do is create a full set of rules including escapes for n=1

That’s a great idea. I was about to suggest you could also do a rule for n=2 (before the general rule that just disables escape sequences), but come to think of it, I find it hard to believe that people would pick a string type where their newlines etc. are represented as \##n, it seems counter to the purpose of raw strings to have such weird escape sequences.

@u2606
Author

u2606 commented Aug 21, 2019

It seems like raw string literals with more than one # are only useful when the string itself contains #", "#, or \#, which aren’t incredibly common sequences of characters. The only use case I can think of for n=2 escapes is code that generates other Swift code. Even that seems rare and unlikely.

@jtbandes
Collaborator

Implemented in #40

jtbandes added a commit that referenced this issue Aug 27, 2019
As discussed in #39, adds support for raw strings (Swift 5, SE-0200), supporting all escapes in a single #"...\#n..."# string, and limited support for raw strings with more #s due to grammar limitations.
@jtbandes jtbandes closed this as completed Mar 3, 2020