Added a section about soft tabs and hard tabs #84

mipo256 · 2025-05-03T12:27:01Z

In our specification, we make heavy use of both tab/hard tab and space/soft tab. I think the spec needs to make clear what we mean by those terms.

📚 Documentation preview 📚: https://editorconfig-specification--84.org.readthedocs.build/

xuhdev · 2025-05-04T08:01:53Z

index.rst

@@ -60,6 +60,15 @@ In EditorConfig:
  settings based on the key-value pairs.
 - "Editors" permit editing files, and use plugins to update settings for
  files being edited.
+- The words "tab" and "hard tab" are assumed to be interchangeable and to represent the
+  character defined by the Unicode HT/TAB symbol (U+0009).
+- The word "space" is the Unicode character defined by the Unicode Space/SP symbol (U+0020).


What about other encodings?

That is a good concern. However, we support the following (and we say that in the spec) encodings:

UTF-8, including the variant with a byte order mark

UTF-16, both endiannesses

Latin-1

All of those map onto Unicode code points, even UTF-16, which is not ASCII-compatible. I think we can agree that practically any encoding you can imagine that appeared in the last 30-35 years is based on Unicode.

The alternative is that we will not be able to reliably specify what exactly we mean by the words space and tab.
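To make the encoding concern concrete, here is a minimal Python sketch that encodes U+0009 and U+0020 in each of the encodings listed above. The codec names are Python's own, and mapping them onto the spec's charset values is an assumption; `encoded_forms` is a hypothetical helper, not part of any EditorConfig implementation.

```python
# Python codec names assumed to correspond to the encodings listed above.
CODECS = ["utf-8", "utf-8-sig", "utf-16-le", "utf-16-be", "latin-1"]


def encoded_forms(char: str) -> dict[str, bytes]:
    """Return the raw bytes for a single character in each codec."""
    return {codec: char.encode(codec) for codec in CODECS}


if __name__ == "__main__":
    for name, char in (("tab", "\t"), ("space", " ")):
        for codec, raw in encoded_forms(char).items():
            # Print the hex form so the byte-level differences are visible.
            print(f"{name:5} {codec:10} {raw.hex()}")
```

The code point is the same in every case; only the byte sequence differs, which is why the definition is phrased in terms of the code point rather than bytes.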

How about:

Suggested change

- The word "space" is the Unicode character defined by the Unicode Space/SP symbol (U+0020).

- The word "space" refers to the character corresponding to the Unicode Space/SP symbol (U+0020) in any encoding.

... we support the following (and we say that in the spec) encodings: ...

A user can choose to not specify an encoding, in which case EditorConfig disregards any encoding settings.

Another way is to not define it and leave it to mean what it ordinarily means. One example is from Python spec, which does not require a particular source code encoding: https://docs.python.org/3/reference/lexical_analysis.html#blank-lines

A logical line that contains only spaces, tabs, formfeeds and possibly a comment, is ignored...

The whole text doesn't define space.

This approach may be better here because our "space" is meant to fit in the general context of text, free from any encoding requirement.

I think the general point is that we want to define some words because we want them to have a specific meaning in our context. But we don't need to define every common word if it means the same here as in other technical contexts and introduces no ambiguity that hinders implementation. A similar general principle comes from law, and I believe it is also a good principle for specs to follow:

Statutory construction begins with looking at the plain language of the statute to determine its original intent. To determine a statute's original intent, courts first look to the words of the statute and apply their usual and ordinary meanings. https://www.law.cornell.edu/wex/statutory_construction

If we replace "statute" with "spec", and "court" with "implementation", that's exactly how I read specs 😉

Another way is to not define it and leave it to mean what it ordinarily means. One example is from Python spec, which does not require a particular source code encoding: https://docs.python.org/3/reference/lexical_analysis.html#blank-lines

That is not entirely true, actually.

The documentation section you mentioned does not define what a space is because of the following:

Python reads program text as Unicode code points; the encoding of a source file can be given by an encoding declaration and defaults to UTF-8, see PEP 3120 for details. If the source file cannot be decoded, a SyntaxError is raised.

The Python interpreter assumes UTF-8 as the default encoding when none is specified, so I disagree.

The point is Python does not require UTF-8. If the source is not UTF-8, what does space mean in its spec? The meaning of space doesn't suddenly become ambiguous simply because Python doesn't enumerate what space means in various encodings.

The most important point is that people understand what space and tab mean in context, and I can hardly imagine any ambiguity. If you really feel the need to define space, I believe the edit I suggested above is more appropriate (the original text confuses readers because it seems to suggest that only UTF-8 is supported).
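One way to read "the character corresponding to U+0020 in any encoding" is that an implementation decodes the file first and then compares characters, never raw bytes. A minimal sketch under that assumption (`indent_kind` is a hypothetical name, not an EditorConfig API):

```python
def indent_kind(raw_line: bytes, encoding: str) -> str:
    """Classify the first character of a line as 'tab', 'space', or 'none'."""
    # Decode first, so the comparison is on code points, not on bytes.
    text = raw_line.decode(encoding)
    if text.startswith("\t"):  # U+0009
        return "tab"
    if text.startswith(" "):  # U+0020
        return "space"
    return "none"
```

With this approach the same check works whether the line arrived as UTF-8, UTF-16, or Latin-1, as long as the caller knows which encoding to decode with.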

cxw42 · 2025-05-11T19:33:37Z

I do not think this should be merged as is. I believe this needs further discussion on the main issue tracker.

I agree that we should not explicitly require a character set with ASCII whitespace values. Someone could happily use EditorConfig to edit EBCDIC files, even if doing so is uncommon :) .
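As an illustration of the EBCDIC point, here is a small Python sketch using the `cp037` codec. Picking `cp037` as the representative EBCDIC code page is an assumption made for the example; other EBCDIC code pages exist.

```python
# cp037 is one EBCDIC code page shipped with Python's standard codecs.
EBCDIC = "cp037"

# Encode the two characters under discussion into EBCDIC bytes.
space_bytes = " ".encode(EBCDIC)
tab_bytes = "\t".encode(EBCDIC)

if __name__ == "__main__":
    print("space:", space_bytes.hex())
    print("tab:  ", tab_bytes.hex())
```

Neither byte value matches its ASCII counterpart, so a definition tied to ASCII byte values would not cover such files, while one tied to the character itself still would.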

mipo256 requested review from xuhdev and cxw42 May 3, 2025 12:27

mipo256 force-pushed the spec-polish branch 2 times, most recently from d3f1cf6 to c3fbcc0 Compare May 3, 2025 12:35

mipo256 mentioned this pull request May 3, 2025

The definition of the indent_size is very confusing #85

Open

mipo256 force-pushed the spec-polish branch 2 times, most recently from 98579f4 to 94dde67 Compare May 3, 2025 16:14

xuhdev reviewed May 4, 2025

Added a section about soft tabs and hard tabs

59e00dd

mipo256 force-pushed the spec-polish branch from 94dde67 to 59e00dd Compare May 4, 2025 11:49

mipo256 requested a review from xuhdev May 4, 2025 11:49

