Added a section about soft tabs and hard tabs #84

Open: mipo256 wants to merge 1 commit into master

@mipo256 mipo256 commented May 3, 2025

Our specification makes heavy use of the terms tab/hard tab and space/soft tab. I think the spec needs to state clearly what we mean by them.


📚 Documentation preview 📚: https://editorconfig-specification--84.org.readthedocs.build/

@mipo256 mipo256 requested review from xuhdev and cxw42 May 3, 2025 12:27
@mipo256 mipo256 force-pushed the spec-polish branch 2 times, most recently from d3f1cf6 to c3fbcc0 Compare May 3, 2025 12:35
@mipo256 mipo256 force-pushed the spec-polish branch 2 times, most recently from 98579f4 to 94dde67 Compare May 3, 2025 16:14
@@ -60,6 +60,15 @@ In EditorConfig:
settings based on the key-value pairs.
- "Editors" permit editing files, and use plugins to update settings for
files being edited.
- The words "tab" and "hard tab" are assumed to be interchangable and to represent the
character defined by the Unicode HT/TAB symbol (U+0009).
- The word "space" is the Unicode character defined by the Unicode Space/SP symbol (U+0020).
Member

What about other encodings?

Member Author

That is a good concern. However, we support the following encodings (and we say so in the spec):

  • UTF-8, including the variant with a byte order mark
  • UTF-16, both endiannesses
  • Latin-1

All of these represent Unicode code points. Even UTF-16, which is not ASCII-compatible, still encodes Unicode code points. I think we can agree that practically any encoding you can imagine that appeared in the last 30-35 years implements Unicode.

The alternative is that we will not be able to reliably specify what exactly we mean by the words space and tab.
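
To make the code point versus byte distinction concrete, here is a minimal Python sketch. It assumes a mapping from charset names to Python codec names (`CHARSET_TO_CODEC` and `whitespace_bytes` are hypothetical names made up for illustration, not part of any EditorConfig library). The BOM variant of UTF-8 is left out because it differs only by a prefix at the start of the file.

```python
# Hypothetical mapping from charset names to Python codec names (an assumption
# for illustration; the spec itself does not mention Python).
CHARSET_TO_CODEC = {
    "latin1": "latin-1",
    "utf-8": "utf-8",
    "utf-16be": "utf-16-be",
    "utf-16le": "utf-16-le",
}


def whitespace_bytes(codec: str) -> dict[str, bytes]:
    """Return the byte sequences that encode U+0009 and U+0020 in `codec`."""
    return {"tab": "\u0009".encode(codec), "space": "\u0020".encode(codec)}


for charset, codec in CHARSET_TO_CODEC.items():
    encoded = whitespace_bytes(codec)
    # The byte layout differs between codecs, but decoding always yields
    # the same two code points.
    assert encoded["tab"].decode(codec) == "\t"
    assert encoded["space"].decode(codec) == " "
    print(f"{charset:9} tab={encoded['tab'].hex()} space={encoded['space'].hex()}")
```

Under these codecs the bytes differ (one byte in Latin-1 and UTF-8, two bytes in UTF-16, in either order), while the decoded characters are identical.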

Member

@xuhdev xuhdev May 5, 2025

How about:

Suggested change
- The word "space" is the Unicode character defined by the Unicode Space/SP symbol (U+0020).
- The word "space" refers to the character corresponding to the Unicode Space/SP symbol (U+0020) in any encoding.

... we support the following (and we say that in the spec) encodings: ...

A user can choose to not specify an encoding, in which case EditorConfig disregards any encoding settings.

Member

Another way is to not define it and leave it to mean what it ordinarily means. One example is from the Python spec, which does not require a particular source code encoding: https://docs.python.org/3/reference/lexical_analysis.html#blank-lines

A logical line that contains only spaces, tabs, formfeeds and possibly a comment, is ignored...

The whole text doesn't define space.

This approach may be better here because our "space" is meant to fit in the general context of text, free from any encoding requirement.

Member

I think the general point is that we define certain words because we want them to have a specific meaning in our context. But we don't need to define every common word when its meaning is no different from that in other technical contexts and it doesn't introduce ambiguity that hinders implementation. A similar general principle comes from law, and I believe it is also a good one for specs to follow:

Statutory construction begins with looking at the plain language of the statute to determine its original intent. To determine a statute's original intent, courts first look to the words of the statute and apply their usual and ordinary meanings. https://www.law.cornell.edu/wex/statutory_construction

If we replace "statute" with "spec", and "court" with "implementation", that's exactly how I read specs 😉

Member Author

Another way is to not define it and leave it to mean what it ordinarily means. One example is from the Python spec, which does not require a particular source code encoding: https://docs.python.org/3/reference/lexical_analysis.html#blank-lines

That is not entirely true, actually.

The documentation section you mentioned does not spell out what a space is because of this passage:

Python reads program text as Unicode code points; the encoding of a source file can be given by an encoding declaration and defaults to UTF-8, see PEP 3120 for details. If the source file cannot be decoded, a SyntaxError is raised.

The Python interpreter assumes UTF-8 as the default encoding when none is specified, so I disagree.

Member

The point is Python does not require UTF-8. If the source is not UTF-8, what does space mean in its spec? The meaning of space doesn't suddenly become ambiguous simply because Python doesn't enumerate what space means in various encodings.

The most important point is that people understand what space and tab mean in the context, and I can hardly imagine any ambiguities. If you really feel the need to define space, I believe my edit suggested above is more appropriate (the original text confuses readers because it seems to suggest that only UTF-8 is supported).

@mipo256 mipo256 requested a review from xuhdev May 4, 2025 11:49
@cxw42
Member

cxw42 commented May 11, 2025

I do not think this should be merged as is. I believe this needs further discussion on the main issue tracker.

I agree that we should not explicitly require a character set with ASCII whitespace values. Someone could happily use EditorConfig to edit EBCDIC files, even if doing so is uncommon :) .
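
As a rough illustration of the EBCDIC case, the sketch below uses Python's built-in `cp037` codec (one EBCDIC code page, picked here as an assumption; other EBCDIC variants exist). The helper `leading_whitespace` is a hypothetical name, not something from an EditorConfig implementation. It shows that tab and space have different byte values than in ASCII, yet indentation can still be recognized once the bytes are decoded.

```python
def leading_whitespace(raw: bytes, codec: str) -> str:
    """Decode `raw` with `codec` and return its leading run of tabs and spaces."""
    text = raw.decode(codec)
    stripped = text.lstrip("\t ")
    return text[: len(text) - len(stripped)]


# Byte values of tab and space differ between ASCII and EBCDIC (cp037).
print("ascii:", "\t".encode("ascii").hex(), " ".encode("ascii").hex())  # ascii: 09 20
print("cp037:", "\t".encode("cp037").hex(), " ".encode("cp037").hex())  # cp037: 05 40

# A line indented with one tab and two spaces, stored as EBCDIC bytes.
ebcdic_line = "\t  key = value".encode("cp037")
indent = leading_whitespace(ebcdic_line, "cp037")
print(repr(indent))  # '\t  '
```

The point of the sketch is only that "tab" and "space" can be identified as characters after decoding, without tying them to particular byte values.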

3 participants