Is it good to add short escaping alias \s for space (\u0020)? #622

Closed
LongTengDao opened this issue May 17, 2019 · 22 comments

Comments

@LongTengDao
Contributor

To keep trailing spaces at the end of lines in multi-line strings from being accidentally auto-trimmed, and to make writing \u0020 simpler.

@eksortso
Contributor

I'm a little fond of this idea. It makes spaces explicit. But it's definitely new syntax.

And mentally, the notion conflicts with the regular-expression character class "\s", which is used to match any sort of whitespace, and not just \u0020. Different use cases for these, of course.

Is there a formulation of this bit of syntax that doesn't conflict with past usages anywhere?

@LongTengDao
Contributor Author

LongTengDao commented May 24, 2019

Or define an end-of-line marker character that produces no output?

sample = """
There are\s
two ways \#
to write space \# even comment
at line end."""

Output in JSON:

{ "sample": "There are \ntwo ways \nto write space \nat line end." }
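A minimal Python sketch of how the two proposed escapes would behave. Neither \s nor \# exists in TOML; the function `decode_proposed` is hypothetical, handles only these two escapes plus escaped backslash and \n, and assumes the newline right after the opening """ has already been trimmed.

```python
# Hypothetical: models the proposed \s and \# escapes only; not real TOML.
def decode_proposed(body: str) -> str:
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if nxt == "s":          # proposed: \s -> U+0020
            out.append(" ")
            i += 2
        elif nxt == "#":        # proposed: drop the rest of the line, keep the newline
            end = body.find("\n", i)
            i = len(body) if end == -1 else end
        elif nxt == "n":        # existing TOML escape
            out.append("\n")
            i += 2
        elif nxt == "\\":       # existing TOML escape: \\ -> one backslash
            out.append("\\")
            i += 2
        else:
            raise ValueError(f"unsupported escape at index {i}")
    return "".join(out)


sample = "There are\\s\ntwo ways \\#\nto write space \\# even comment\nat line end."
print(repr(decode_proposed(sample)))
```

Under these assumptions the sample decodes to the JSON value shown above, and "\\s" would still decode to a literal backslash followed by s.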

@pradyunsg
Member

Marking as post-1.0 since I don't really have the bandwidth to think about the edge cases here and this isn't super important to get to 1.0.

@LongTengDao
Contributor Author

LongTengDao commented May 26, 2019

Marking as post-1.0 since I don't really have the bandwidth to think about the edge cases here and this isn't super important to get to 1.0.

Sure, focus on getting 1.0 things done first, we are all looking forward to it, best wishes!

@ChristianSi
Contributor

Even if this is post-1.0, it was supposedly meant to stay open until then?

@eksortso
Contributor

@LongTengDao, since this issue is already tagged post-1.0, could you please reopen it? We can effectively ignore it until after 1.0 is released, as I think the issue still bears some examination. You could even argue to remove that tag if you think it's merited.

@pradyunsg pradyunsg reopened this Aug 29, 2019
@pradyunsg
Member

I can reopen it now but if it's closed again by OP, I won't be able to reopen.

@mmakaay

mmakaay commented Aug 29, 2019

It would be quite confusing when I use regexps in my TOML docs (which I do) and have to escape \s as \\s. For other escaped characters it's natural to do so, since they are generally well-known escapes. Using \s to represent a space would be rather unique to TOML (as far as I know), which might cause people to trip over it.

my.regexp = "something\\sfeels\sfishy"

@ChristianSi
Contributor

@mmakaay: For regexes, literal strings might, in any case, be a better fit.

@LongTengDao
Contributor Author

@ChristianSi @eksortso @pradyunsg Sorry, I see.

@mmakaay "\s" is also invalid in TOML right now; you always need to write "\\s" or '\s'.

But the different meaning of \s in regExp is a problem indeed.

@pradyunsg
Member

Let's defer further discussion until after TOML 1.0?


Noting a summary for self-reference later.

This is a request for an additional compact escape sequence: \s -> U+0020 (space), for easier trailing spaces in multiline strings when trailing spaces are auto-trimmed. It is already possible to denote this as \x0020.

The question to answer here is: is there enough benefit to adding a compact notation for \x0020 for the specified use case? I'm intuitively leaning "no" but haven't decided on anything yet, since some time should be spent thinking about all the details and implications.

@eksortso
Contributor

\x0020

You mean \u0020?

@yyny

yyny commented Feb 17, 2020

Why not just \ followed by a space (literally escaping a space character)? It is less confusing.
By the way, I also like Lua's \z escape, which does something similar (it skips all following whitespace characters, including newlines).

@eksortso
Contributor

A backslash followed by a space is not very visible. I think the point of having the code was to make the character stand out.

We could make \x a valid prefix for two-character codes, for bytes and ASCII (and ASCII code points in UTF-8 for that matter). A space would be \x20, for instance.

Granted that's not as small as the proposed two-character \s, but it'd be more useful. After all, I think the README uses "\x" in places, doesn't it?
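A minimal Python sketch of hex-escape decoding with a two-digit \xHH form alongside the existing \uXXXX and \UXXXXXXXX forms. The \xHH escape is only the proposal under discussion, `decode_hex_escapes` is a hypothetical name, and other escapes such as \n are deliberately left untouched, so this is not a full TOML string decoder.

```python
import re

# \uXXXX and \UXXXXXXXX exist in TOML 1.0; \xHH is the proposed addition.
# An escaped backslash (\\) is matched first so it cannot start a hex escape.
_ESCAPE = re.compile(r"\\(\\|x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})")


def decode_hex_escapes(text: str) -> str:
    def repl(m: re.Match) -> str:
        tok = m.group(1)
        if tok == "\\":
            return "\\"
        cp = int(tok[1:], 16)
        # TOML escapes must name Unicode scalar values (no surrogates).
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"not a Unicode scalar value: {tok}")
        return chr(cp)

    return _ESCAPE.sub(repl, text)


print(repr(decode_hex_escapes(r"Alea\x20")))
```

In this sketch \x20 and \u0020 decode to the same space; the shorter form simply covers the U+0000 to U+00FF range.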

@abelbraaksma
Contributor

abelbraaksma commented Feb 18, 2020

I'm not so fond of \s, because in a lot of languages, and specifically in regular expressions, this already has a special meaning. Currently, using it raises an error. Once that's gone, it'll be confusing to people.

I like the \xXX idea, though. Or even just \uXX, shouldn't be hard to parse, and helps the generic case of escaping an ISO-8859-1 character.

@LongTengDao
Contributor Author

LongTengDao commented Feb 18, 2020

Why not just \ followed by a space (literally escaping a space character)? It is less confusing.
By the way, I also like Lua's \z escape, which does something similar (it skips all following whitespace characters, including newlines).

The purpose of \s (or any other solution; I don't think \s is perfect either) is to prevent end-of-line whitespace from being accidentally removed by a semi-intelligent text editor; in other words, to preserve whitespace, not to remove it. An escaped space can't do this, because that trailing space may still be accidentally deleted. BTW: there is already a design about \ in TOML for removing subsequent whitespace as you wish.

I think \#, which marks the end of a line after trailing spaces, is better than any kind of whitespace escape, because escaped spaces no longer look like whitespace, and that's not good.

@marzer
Contributor

marzer commented Feb 19, 2020

@eksortso I second the \xXX syntax. It's already used in other languages that also allow \uXXXX and \UXXXXXXXX, so the consistency is welcome.

@abelbraaksma
Contributor

abelbraaksma commented Feb 19, 2020

BTW: there is already a design about \ in TOML for removing subsequent whitespace as you wish.

That's true. And doesn't that already solve the original issue? If your editor is overzealous in removing trailing whitespace, you can do this:

key = """
   Alea \
   iacta \
   est""" 

@eksortso
Contributor

eksortso commented Feb 20, 2020

That's true. And doesn't that already solve the original issue?

Not really. The end-of-line backslash strips away all whitespace and all newlines following it, up to the next non-space, non-linebreak character. That means the line break and the indentation at the beginning of the following line get stripped by the \ too. We're looking for an alternative that doesn't do that.

Let's take your example:

key = """
   Alea \
   iacta \
   est"""

This is equivalent to the one-liner key = "   Alea iacta est". But what we want is for key to be set to this, with the three-space indentation on all three lines, and the one extra space at the end of the first two lines:

   Alea 
   iacta 
   est

(Of course, you'll need to highlight that or look at the source to notice the end-of-line spaces.)

One way to add a space prominently is with \u0020.

key = """
   Alea\u0020
   iacta\u0020
   est"""

If I can get a PR together later today, then \x20 will work in some post-v1.0.0 version of TOML.

(Edit: added in missing spaces to the one-liner.)

@abelbraaksma
Contributor

abelbraaksma commented Feb 20, 2020

@eksortso, thanks, I didn't think of that. I typically would want to get rid of extraneous whitespace, but I guess there are use cases (like code sections), where that isn't desirable. Still, wanting a space right before a newline seems pretty fringe to me.

Nonetheless, short escape codes have a wider use than just spaces, I'd definitely love having that ;)

@pradyunsg
Member

Let's add a \xHH -- I don't think a \s is particularly valuable once that's added in.

@pradyunsg
Member

Closing this since the answer to the original question: No. :)

#796 can stand on its own, or we can file a dedicated issue for it if it needs extended discussion.


8 participants