Added a section about soft tabs and hard tabs #84
@@ -60,6 +60,15 @@ In EditorConfig:
settings based on the key-value pairs.
- "Editors" permit editing files, and use plugins to update settings for
files being edited.
- The words "tab" and "hard tab" are assumed to be interchangable and to represent the
character defined by the Unicode HT/TAB symbol (U+0009).
- The word "space" is the Unicode character defined by the Unicode Space/SP symbol (U+0020).
What about other encodings?
That is a good concern. However, we support only the following encodings (and we say so in the spec):
- UTF-8, including the variant with a byte order mark
- UTF-16, both endiannesses
- Latin1
All of those map onto Unicode code points, even UTF-16, which is not ASCII-compatible. I think we can agree that practically any encoding you can imagine that has appeared in the last 30-35 years maps onto Unicode.
The alternative is that we will not be able to reliably specify what exactly we mean by the words space and tab.
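As a rough illustration of the point above, the following Python sketch encodes U+0009 and U+0020 under the listed encodings and checks that each byte sequence decodes back to the same code point. The mapping from charset names to Python codec names, and the helper names `encoded_forms` and `decodes_to_same_code_point`, are assumptions made for this example and are not part of the spec. The byte order mark variant is left out because a BOM only affects the start of a file, not individual characters.

```python
# Illustrative only: this mapping from charset names to Python codec names
# is an assumption made for the sketch, not something the spec defines.
CHARSET_TO_CODEC = {
    "latin1": "latin-1",
    "utf-8": "utf-8",
    "utf-16be": "utf-16-be",
    "utf-16le": "utf-16-le",
}

TAB = "\u0009"    # HT, the "tab" / "hard tab" character
SPACE = "\u0020"  # SP, the "space" character


def encoded_forms(char):
    """Return the bytes that encode `char` under each charset above."""
    return {name: char.encode(codec) for name, codec in CHARSET_TO_CODEC.items()}


def decodes_to_same_code_point(char):
    """Check that every encoded form decodes back to the same code point."""
    return all(
        data.decode(CHARSET_TO_CODEC[name]) == char
        for name, data in encoded_forms(char).items()
    )


for label, char in (("tab", TAB), ("space", SPACE)):
    print(label, encoded_forms(char), decodes_to_same_code_point(char))
```

The bytes differ between the single-byte encodings and UTF-16, but the decoded character is the same in every case, which is what lets a definition by code point work across these encodings.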
How about:
- The word "space" is the Unicode character defined by the Unicode Space/SP symbol (U+0020).
- The word "space" refers to the character corresponding to the Unicode Space/SP symbol (U+0020) in any encoding.
... we support the following (and we say that in the spec) encodings: ...
A user can choose to not specify an encoding, in which case EditorConfig disregards any encoding settings.
Another way is to not define it and leave it to mean what it ordinarily means. One example is from Python spec, which does not require a particular source code encoding: https://docs.python.org/3/reference/lexical_analysis.html#blank-lines
A logical line that contains only spaces, tabs, formfeeds and possibly a comment, is ignored...
The whole text doesn't define space.
This approach may be better here because our "space" is meant to fit in the general context of text, free from any encoding requirement.
I think the general point is that we define some words because we want them to have a specific meaning in our context. But we don't need to define every common word if its meaning is no different from that in other technical contexts and it introduces no ambiguity that hinders implementation. A similar general principle comes from law, and I believe it is also a good principle for specs to follow:
Statutory construction begins with looking at the plain language of the statute to determine its original intent. To determine a statute's original intent, courts first look to the words of the statute and apply their usual and ordinary meanings. https://www.law.cornell.edu/wex/statutory_construction
If we replace "statute" with "spec", and "court" with "implementation", that's exactly how I read specs 😉
Another way is to not define it and leave it to mean what it ordinarily means. One example is from Python spec, which does not require a particular source code encoding: https://docs.python.org/3/reference/lexical_analysis.html#blank-lines
That is not entirely true, actually.
The documentation section you mentioned does not define what a space is because of this:
Python reads program text as Unicode code points; the encoding of a source file can be given by an encoding declaration and defaults to UTF-8, see PEP 3120 for details. If the source file cannot be decoded, a SyntaxError is raised.
The Python interpreter expects UTF-8 as the default encoding if none is specified, so I disagree.
The point is Python does not require UTF-8. If the source is not UTF-8, what does space mean in its spec? The meaning of space doesn't suddenly become ambiguous simply because Python doesn't enumerate what space means in various encodings.
The most important point is that people understand what space and tab mean in context, and I can hardly imagine any ambiguity. If you really feel the need to define space, I believe the edit I suggested above is more appropriate (the original text confuses readers because it seems to suggest that only UTF-8 is supported).
I do not think this should be merged as is. I believe this needs further discussion on the main issue tracker. I agree that we should not explicitly require a character set with ASCII whitespace values. Someone could happily use EditorConfig to edit EBCDIC files, even if doing so is uncommon :)
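To make the EBCDIC remark concrete, here is a small Python sketch comparing the byte values of tab and space in ASCII and in one EBCDIC code page. Using Python's `cp037` codec as the EBCDIC representative is an assumption for illustration (other EBCDIC code pages exist), and `byte_value` is a hypothetical helper.

```python
# Assumption: cp037 stands in for "EBCDIC" here; it is only one of several
# EBCDIC code pages, picked because Python ships a codec for it.
EBCDIC_CODEC = "cp037"


def byte_value(char, codec):
    """Return the single byte that encodes `char` in a single-byte codec."""
    return char.encode(codec)[0]


for label, char in (("tab", "\t"), ("space", " ")):
    ascii_byte = byte_value(char, "ascii")
    ebcdic_byte = byte_value(char, EBCDIC_CODEC)
    print(f"{label}: ascii=0x{ascii_byte:02x} ebcdic=0x{ebcdic_byte:02x}")
```

The byte values differ from the ASCII ones, yet decoding them still yields the characters identified as U+0009 and U+0020, so a definition tied to the character rather than to a byte value would still cover such files.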
In our specification, we make heavy use of both tab/hard tab and space/soft tab. I think we need to make clear in the spec what we mean by those terms.
📚 Documentation preview 📚: https://editorconfig-specification--84.org.readthedocs.build/