The Hard Tabs Issue #544

Closed
basmith opened this issue Oct 18, 2017 · 75 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@basmith

basmith commented Oct 18, 2017

Hi,

(This report is based on the v0.1.1 Win64 binary artifact from the Zig website.)

I noticed that if I create a Zig source file in Windows with a native editor (eg Notepad), the compiler complains about line endings:

$ zig build-exe hello.zig
':\code\zig\first\hello.zig:1:30: error: invalid character: '
const io = @import("std").io;
                             ^

If I manually kill the newlines (resulting in the code being all on one line) it compiles.

I tried using Vim in a Cygwin shell and the file it wrote also compiled without complaint (presumably Unix-style newlines, as Notepad renders that file on one line while Vim looks correct).

@thejoshwolfe
Sponsor Contributor

You need to configure your editor to use unix line endings to write Zig code. Additionally, you need to configure your editor to use spaces instead of hard tabs for indentation.

notepad.exe has neither of these features, and it can't even comprehend unix line endings. This has been a long standing bug/missing feature in the windows default plain text editor. Notepad is in fact so deficient as a text editor, that literally every single other text editor in popular use today can comprehend unix line endings. Notepad is the worst text editor in popular use, and has been for decades. Zig will not bend to accommodate Microsoft's gross incompetence or nefarious stunts in their inability or unwillingness to provide a decent default text editor to their paying consumer base. Notepad is the problem here, not Zig. (It's not just me. Here's other angry people complaining about Notepad.)

The rationale for only supporting unix line endings and no hard tabs is part of the "only one obvious way to do things" philosophy. From a practical perspective, never having windows line endings makes it easier to write tools that read zig source files. For example, a tool that searches for "\n\n// TODO" and replaces it with something else that includes newlines: it's much easier to do this without worrying about newline style. Furthermore, git and svn have strange features that convert newline styles at odd times, and now all that's irrelevant for Zig.
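To make the search-and-replace point concrete, here is a minimal Python sketch (not part of any Zig tooling; the function name and sample strings are made up for illustration). It shows that a literal search for "\n\n// TODO" matches a file with unix line endings but silently misses the same text saved with windows line endings.

```python
def replace_todo_markers(source: str, replacement: str) -> str:
    """Replace every blank line followed by the start of a '// TODO' comment."""
    return source.replace("\n\n// TODO", replacement)


# The same hypothetical snippet, once with LF and once with CRLF line endings.
lf_source = "const a = 1;\n\n// TODO remove\n"
crlf_source = lf_source.replace("\n", "\r\n")

lf_result = replace_todo_markers(lf_source, "\n\n// FIXME")
crlf_result = replace_todo_markers(crlf_source, "\n\n// FIXME")

# With CRLF the bytes are "\r\n\r\n// TODO", so "\n\n" never occurs and nothing is replaced.
print(repr(lf_result))
print(repr(crlf_result))
```

A tool that has to accept both styles would need a pattern such as `(\r?\n){2}// TODO` and would then have to decide which style to emit in the replacement.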

Variable newline style and variable indentation style are features that Zig does not support.

Is this documented anywhere, or are users just expected to run into cryptic errors like this? Like a CR character doesn't even print properly in a terminal.

@hasenj

hasenj commented Oct 18, 2017

I'm not sure if Zig does this on purpose or it was overlooked, but as far as I'm concerned this is a good feature. Different styles of indentation and line endings cause endless headaches when working on a collaborative project, for example using source control software such as git, etc.

I know notepad is the default text editor in windows, but nearly all developers use something else to write code, such as notepad++ or visual studio.

VS Code is also a good option. It's developed by Microsoft, and it's free.

@PavelVozenilek

Nim does it right:

Any of the standard platform line termination sequences can be used - the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform.

Multiline string should insert LF newlines, as in C. If someone wants CR he could add it via \r.

Correct newlines are not just a problem of stupid Notepad: if one copy pastes example from webpage he gets CR/LF too. Imagine someone failing with Hello World.

@andrewrk andrewrk changed the title from "v0.1.1 compiler unhappy with source line endings in windows" to "more clear error message when tabs or windows line endings encountered" Oct 18, 2017
@andrewrk andrewrk added the enhancement label ("Solving this issue will likely involve adding new logic or components to the codebase.") Oct 18, 2017
@andrewrk andrewrk added this to the 0.2.0 milestone Oct 18, 2017
@thejoshwolfe
Sponsor Contributor

if one copy pastes example from webpage he gets CR/LF too.

Pasting into what editor does that?

@PavelVozenilek

@thejoshwolfe: Notepad, Sublime Text 2 and probably anything else. I do not know of a Windows editor which by default uses Unix line endings and converts to this style automatically.

@thejoshwolfe
Sponsor Contributor

thejoshwolfe commented Oct 18, 2017

@PavelVozenilek Are you saying that when an editor is configured to use unix line endings or is editing a file that already has consistent unix line endings, then pasting from a webbrowser inserts the wrong kind of line ending? Or are you just saying that windows line endings are typically the default line ending style before you configure it in your editor?

@PavelVozenilek

@thejoshwolfe: the editors I know (VC++, Sublime Text 2, Notepad ...) do not have a configuration option to force Unix endings everywhere from now on. At best, one switches it manually file by file.

Programmable editors like vim, probably, but I tend to avoid tools smarter than me.

I do not understand why this is even a problem. Line ending chaos is real, it won't go away, pragmatic solution (accept all) is easy and then the mess disappears from the view of ordinary user.

@kyle-github

kyle-github commented Oct 18, 2017

@thejoshwolfe: I think it is the browser that does this, not the editor. HTML is defined as using \r\n not \n. Most browsers let you get away with it on input, but when you copy and paste I think it recreates "correct" HTML from the DOM. Not sure about this, but I have run into the problem consistently.

I think @PavelVozenilek has a point. Every useful editor can manage to translate the line endings just fine but few allow you to do it at a project level and change everything automatically. However, the two main platforms, Windows and Mac, do not use the line ending convention that Zig uses. I happen to use Linux, but that is a minority platform.

I also tend to like the use of tools like go-fmt simply because it completely eliminates an entire class of bike-shedding. I've wasted too much time fighting about formats over the years. It is not a winnable war unless you create something like go-fmt.

@thejoshwolfe
Sponsor Contributor

I just did some experimentation on my Windows machine. Here's what I found:

Eclipse and Notepad++ normalize line endings when you paste text into a file. Each file is determined to be in a particular style, and anything you put into the file through typing or pasting gets normalized to that style.

In Visual Studio, when you press enter, it uses the newline style of the lines around your cursor. When you paste code with CRLF line endings into any file, you get CRLF line endings for the text that you're pasting without affecting the surrounding text. If you save the file, it saves with mixed line endings without warning you. (You can convert line endings while saving through the "Advanced Save Options".) Visual Studio has no option to automatically normalize line endings on paste or on save. If you want normalized line endings, you gotta do it every time you save.

When you open a file in Visual Studio that has mixed line endings, you get a dialog that prompts you to normalize all the line endings to one style or another.

This is not a bug thread for Visual Studio, but that is a Visual Studio bug. Why would they let you create mixed line endings without warning, but then warn you when you open a file that has mixed line endings? This leads to a "best practice" where you should close and reopen all your files before making a commit to make sure you're not committing files that will produce warnings, which is just silly. This is a bug/missing feature in Visual Studio.

I don't know about Sublime Text; it's not free.

Meanwhile in Linux, copying text out of a web browser always seems to result in unix line endings, not windows line endings. I don't know where you're getting the idea that HTML uses windows line endings; I don't see it in the spec. Maybe you mean HTTP headers? There are parts of the HTML spec that talk about normalizing to CRLF, but I can't figure out how to observe that as an end user. I tried copy-paste and drag-drop text from a Google search page and from the textarea editor I'm typing this in right now, but I always got unix line endings (tested in Chrome).

The so-called "Mac"-style line ending actually refers to "classic" Mac OS (Mac OS 9 and earlier), which Mac OS X replaced in 2001. Modern Mac uses Unix line endings, just like Linux.

@thejoshwolfe
Sponsor Contributor

I do not understand why this is even a problem. Line ending chaos is real, it won't go away, pragmatic solution (accept all) is easy and then the mess disappears from the view of ordinary user.

This kind of reasoning leads to JavaScript's automatic semicolon insertion. This kind of reasoning has proven to be very successful at getting widespread adoption. This kind of reasoning is also contrary to the Zen of Zig. In Zig, the code author is required to do more work so that code readers are required to do less work.

I also tend to like the use of tools like go-fmt simply because it completely eliminates an entire class of bike-shedding. I've wasted too much time fighting about formats over the years. It is not a winnable war unless you create something like go-fmt.

Some kind of zig-fmt tool is definitely within the scope of what we want to create. Some plans so far are to make the Zig compiler outright reject any source files that would be modified by zig-fmt. This not only establishes a clear precedent, but forces everyone to use it, or else your code won't compile, not even in debug mode. Whole classes of bikeshedding are gone with this strategy, and all working Zig code has a consistent style. This is already partly the case as we're discussing here, although there's no zig-fmt to fix these problems for you yet. But since the only formatting that's currently rejected is '\r' and '\t' characters, you can pretty trivially clean these up. (You could use this tool, for example.)
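As a rough illustration of how small that cleanup is, here is a minimal Python sketch. It is not the tool linked above and not zig-fmt; the function name and the 4-space tab width are assumptions made for this example.

```python
def normalize_whitespace(source: str, tab_width: int = 4) -> str:
    """Return source with LF-only line endings and hard tabs expanded to spaces."""
    # Convert windows (CRLF) endings first, then any remaining lone CR, to LF.
    text = source.replace("\r\n", "\n").replace("\r", "\n")
    # str.expandtabs pads each tab to the next tab stop and resets the column at every newline.
    return text.expandtabs(tab_width)


sample = "fn main() {\r\n\treturn;\r\n}\r\n"
cleaned = normalize_whitespace(sample)
print(repr(cleaned))
```

Note that this blunt approach also rewrites tab characters inside string literals, so a real formatter would need to tokenize the source first.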

@kyle-github

@thejoshwolfe, thanks for running the experiments on cut-and-paste from browsers. Interesting that you did not get the CRLF combos. It has been a while since I cared to check and I tend to set all my editor tools to use LF only on save. As you note, Visual Studio is perhaps not the example of what to do :-)

Not sure how I feel about the idea of having the compiler reject code that is not in the One True Format(tm). While I like the idea that all Zig code would be formatted the same, that might be a little too draconian. For Python this almost works because indentation matters and if someone enters code using both tabs and spaces the meaning is ambiguous.

I think your example of JavaScript's semicolon insertion is taking this a bit far. The semicolon insertion (IMO) is an abomination because it can be wrong and change the intended meaning of the code. I do not see the same thing with handling CRLF, LF or CR as white space.

If format is so important that you would want to make it enforced by the compiler, then perhaps the syntax should be closer to Python? I mean this in all seriousness. I think Guido van Rossum did something really interesting when he decided to make the visual layout elements of Python have meaning at the language level. Python code is not formatted all the same, but even without a python-fmt tool, the code from different projects has more formatting similarities than code in most other languages. I think van Rossum made a mistake in allowing tabs.

@thejoshwolfe
Sponsor Contributor

If format is so important that you would want to make it enforced by the compiler, then perhaps the syntax should be closer to Python? I mean this in all seriousness. I think Guido van Rossum did something really interesting when he decided to make the visual layout elements of Python have meaning at the language level.

Yes. My idea is to have both C-like curly braces and Pythonic indentation, and they must agree. Curly braces are arguably easier for tools to understand, and indentation is absolutely easier for humans to understand, so I want both. Curly braces enable things that you can't do with just indentation. And as for the compiler enforcing indentation rules, come on, you should always get indentation right; no excuse for wrong indentation; it's not that hard, and it makes a huge difference for readability.

A neat advantage of having strict indentation rules and curly brace block scopes is that you can have better compile errors for unbalanced curly braces, which is something that is especially chaotic in C and Java.

fn SomeClassThing(comptime T: type) -> type {
    struct {
        const Self = this;
        field: T,
        fn method(self: &const Self) {
            {var i = u32(0); while (i < self.field.len) {
                self.field.something(i);
            }
        } // ERROR: missing '}', or wrong indentation
        // At this point, the compiler can trust the indentation
        // rather than the curly braces for parsing the rest of the file.
    }
}

In practice, indentation tends to be more correct than curly brace balance. This is especially relevant for IDEs where the tooling is trying to follow along with you as you type. Unbalanced parentheses, quotes, curly braces, etc. are very common while you're in the midst of typing code. By contrast, wrong indentation is much less common. Usually the indentation is wrong if you paste/move a bunch of code at once, and in that case, you can have an IDE hotkey to trust the curly braces and fix the indentation; then everything's back in agreement.

Generally there are two facets to code formatting: readable for tools and readable for humans. C leans toward readable for tools (curly braces, etc.); Python leans toward readable for humans (indentation, etc.); Zig wants to have it both ways, and so has two sets of formatting rules that must be in agreement for your code to compile. (As a reminder, this is an informal plan for a future version of Zig, not status quo.)

Related is #114.

I think van Rossum made a mistake in allowing tabs.

Absolutely agree. It's horrifying how ugly you can make "correct" indentation in Python by mixing spaces and tabs, even in the same line. What a mess.

Not sure how I feel about the idea of having the compiler reject code that is not in the One True Format(tm). While I like the idea that all Zig code would be formatted the same, that might be a little too draconian.

I have high hopes for this strategy. We've already seen some people scared away by Zig's decision to not support hard tabs, which is a shame. But on the plus side, all Zig code will be consistent with this kind of design philosophy.

@kyle-github

@thejoshwolfe, doesn't the use of both curly braces and indentation violate the DRY principle? If one of them is wrong, which one? I think this will add to the cognitive load of the programmer before he or she even thinks about the logic of the code itself.

One of the things I like about Python is that it showed you can have both human friendly and machine friendly syntax at the same time. Parsing Python is not markedly harder than parsing a brace-heavy language. Tooling has become intelligent enough that pleasing the human far outweighs pleasing the machine.

If Zig is to become a useful replacement for C, and I think it has many parts that are very positive, putting too many barriers in the way of adoption could be a problem. The balance that the Go creators did with go-fmt ended up being a pretty good one. Use of go-fmt is not actually required, but your code is going to be heavily criticized and not reused if it isn't used.

I think use of an enforced indentation scheme and providing a tool like zig-fmt would go a very long way to stopping the bike-shedding and help a lot in making all code heavily reusable.

For instance you could simply mandate that all indentation is three spaces per indent level. Fine, 99% of all editors can handle that right now. Mandating that you must have curly braces and that the indentation of the code must also match is not something existing editors are going to help with.

That said, using indentation as a hint that the programmer missed a curly brace? That would be a good thing. I think some editors may do that now. We catch the misaligned indentation by eye easier than the missing curly braces.

Obviously this is all IMO!

@thejoshwolfe
Sponsor Contributor

doesn't the use of both curly braces and indentation violate the DRY principle?

Yes, and I think this is a good time to violate that principle. DRY taken to the extreme leads to Haskell's complete type inference, which is very hard to read. Information duplication is only a problem because it's more work to do, which Zig is ok with forcing on authors, and because it can create conflicting information:

If one of them is wrong, which one?

When you're trying to compile your code, probably the indentation is right (still a compile error though). When you're trying to autoformat your code, probably the curly braces are right.

I think this will add to the cognitive load of the programmer before he or she even thinks about the logic of the code itself.

It doesn't seem like much to ask of a programmer to get their indentation right before trying to compile their code. I'm always careful to keep my indentation correct, even when it's not a compile error, because it makes the code easier to read. An error for incorrect indentation would add 0 cognitive load for me, but if you're not used to being careful to keep your indentation correct, perhaps have your zig compile command preceded by a zig-fmt command. This would be similar to Eclipse JDT's option to run the autoformatter while saving Java source files.

Parsing Python is not markedly harder than parsing a brace-heavy language.

Maybe I'm just bad at it, but I find writing indentation-scoped parsers to be much harder than start/end token-scoped parsers.

Tooling has become intelligent enough that pleasing the human far outweighs pleasing the machine.

I still want to consider people creating new tools. There are lots of cases where you'll want to make a machine that reads Zig code, e.g. custom linters, syntax highlighters, even a one-off sed command to do some refactoring. The more constrained the syntax is, the easier it is to write these tools.

Mandating that you must have curly braces and that the indentation of the code must also match is not something existing editors are going to help with.

Vim can already do this. The = command indents your code based on curly-brace matching even without any installed zig syntax highlighting. It doesn't behave quite correctly in all cases, but it helps.

I don't think curly-braces-to-indentation is an outrageous feature to expect editors to have. And again, I don't think indentation is very difficult to get right manually in the first place.

@thejoshwolfe
Sponsor Contributor

thejoshwolfe commented Oct 20, 2017

As an example of how easy Zig is to comprehend with tools, here's a perl one-liner that deletes the content of all the free-form text you can find in Zig code (// comments, "strings", \\ strings, 'characters'). After doing this substitution, every { character is part of the structural syntax.

perl -p -e 's/(["'"'"'])([^\\]|\\.)*?\1/$1$1/g; s/(\/\/|\\\\).*/$1/g'

Even if you don't understand that mess, do notice how short it is. You can't get anything near that simple for C/C++/Java/JavaScript/C# (due to multiline comments), Python (due to multiline string literals), JavaScript/Ruby (due to template strings), PHP/Perl (I don't even want to know), etc. This tokenization simplicity is one reason why Zig does not support /* */-style comments. The tokenizer state is always reset on a newline.

And by newline, I mean '\n', not /\r\n?|\n/. Bringing this rant back to the original topic, Zig source code is meant to be easy to understand by tools, because it's all in a consistent format. The more formatting variability that's allowed, the harder it is to write tools to read it. There should only be one way to do newlines in Zig source code, so that tools don't need to worry about that variability.
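For readers who find the perl hard to follow, here is a hedged Python rendering of the same two substitutions, applied one line at a time as `perl -p` does. The function and constant names are invented for this sketch, and it assumes the input has already been split on '\n'.

```python
import re

# A quoted run: an opening " or ', then lazily any non-backslash character or
# backslash-escaped character, then the same quote character again.
QUOTED = re.compile(r"""(["'])(?:[^\\]|\\.)*?\1""")
# A // comment or a \\ multiline-string line: everything after the marker is free-form text.
LINE_TAIL = re.compile(r"(//|\\\\).*")


def strip_free_text(line: str) -> str:
    """Empty out string/char literals, then drop the text after // or \\\\ markers."""
    line = QUOTED.sub(r"\1\1", line)
    return LINE_TAIL.sub(r"\1", line)


example = 'const s = "a { b"; // note {'
print(strip_free_text(example))
```

Because each line is processed independently, the function never needs to remember state from a previous line, which is the property the comment above attributes to Zig's tokenizer.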

EDIT: Just for fun, here's some code in Chrome's debugger console that tries to understand JavaScript source code using simple regex. JavaScript is way too complex for that to work, and you can observe lots of misbehavior in that area if you poke at it long enough. This serves as just one counterexample to the "Tooling has become intelligent enough" idea, fwiw.

@PavelVozenilek

What is the use case for tools massaging source code? Qt does it because C++ is lacking usable metaprogramming, but it is hated and very clumsy to use within IDE.

If one-true-newline-rules-them-all is really that important feature then I suggest to switch to CR/LF everywhere. Number of Windows programmers dwarfs the others, and they are not used to accommodate to other platforms.

@hasenj

hasenj commented Oct 20, 2017

Even if you don't understand that mess, do notice how short it is.

That's not a feature by Zig's standards though :)

@andrewrk
Member

The biggest reason to enforce an indentation and line endings is that it eliminates energy spent on debating what the standard should be, since the standard is enforced by the compiler. It's unfortunate that a set of people will have to configure their editors beyond the defaults, but that is necessary for one or the other standard to be selected.

It's not my intention to shut down any discussion, but I would posit the thought in everybody's heads who is involved in this thread: is this how you want to spend your time, discussing whitespace? Or do you want to challenge yourself, and switch over to figuring out some of the more fundamental engineering problems that this project is trying to tackle?

@thejoshwolfe
Sponsor Contributor

thejoshwolfe commented Oct 20, 2017

Pros for CRLF: Notepad support. Visual Studio users can be sloppy. You usually don't need to change your native-Windows editor's configuration from the default.

Pros for LF: Easier to write tools that scan for LF than tools that scan for CRLF. Easier to write tools that produce '\n' instead of "\r\n". sed -i support (always outputs LF regardless of input style). diff looks cleaner, including git diff (metadata always in LF even if the + and - lines end with CRLF).

This is just a start, but the pattern is that LF is more friendly to programmers, and CRLF is more friendly to windows users who don't know any better. In other words, LF is better for advanced users, and CRLF is better for adoption. As an advanced user, I vote for favoring advanced users.

Number of Windows programmers dwarfs the others, and they are not used to accommodate to other platforms.

The number of bad programmers dwarfs the others too, and I'm not sure I want to cater to bad programmers. Sure it's better for adoption, but compromising to increase adoption is not in line with Zig's goals.

@Wulfklaue

The issue with errors like this is that when a new user like me downloads Zig, starts coding in Visual Studio Code (Windows), and gets this error, the result is confusing. I spent the first 10 minutes trying out other examples, only to run into the same issue. I still did not figure out it was a file issue. My first idea? Must be a bug in Zig...

In simple terms, the error message is inadequate and needs to be much more clear.

@tiehuis
Member

tiehuis commented Oct 26, 2017

Made some improvements (in the above pull request) to these error messages that should handle the most obvious cases and hopefully help a user diagnose exactly where the problem is a little better.

Open to any wording changes or extra special cases if they are considered noteworthy. Regardless of the stance on line endings, hopefully this helps.

[Three screenshots of the improved error messages: 2017-10-26-193828_527x46_scrot, 2017-10-26-193929_294x49_scrot, 2017-10-26-194055_302x48_scrot]

@thejoshwolfe
Sponsor Contributor

That's a reasonable stance. zig fmt removes lots of freedom from the programmer, and that has its downsides. The tradeoff is that you get consistency between programmers. If everyone is forced to conform to one standard, not everyone will be happy with it, but at least we can try to make it pleasant for as many people as possible. If you have proposals for how everyone's zig code should be formatted, please open issues to argue for them. If you want every zig programmer to be able to format their own code differently, then you're arguing against the purpose of zig fmt (and go fmt).

@nyovaya

nyovaya commented Jul 4, 2019

@andrewrk Will tabs be allowed at some point?

@andrewrk andrewrk mentioned this issue Jul 4, 2019
@Calandiel

Any updates on this? It's extremely annoying to have the compiler enforce a coding style on you. Even more so than dealing with Rust's borrow checker.

@andrewrk
Member

zig fmt fixes whitespace now. I suggest configuring your editor to run that on save. https://github.com/ziglang/zig/wiki/FAQ#why-does-zig-force-me-to-use-spaces-instead-of-tabs

@Calandiel

I'd have my two cents to add regarding this topic but if zig fmt can now handle it correctly, it should be far less invasive. Thank you for the quick response.

@codebrainz

The biggest reason to enforce an indentation and line endings is that it eliminates energy spent on debating what the standard should be, since the standard is enforced by the compiler.

Just a casual observation, but making this preference a syntax error is a sure fire way to guarantee the bikeshedding about it never ends, at best.

@marler8997
Contributor

The biggest reason to enforce an indentation and line endings is that it eliminates energy spent on debating what the standard should be, since the standard is enforced by the compiler.

Just a casual observation, but making this preference a syntax error is a sure fire way to guarantee the bikeshedding about it never ends, at best.

I don't think you can ever end the bike-shedding. The difference is that since Zig is choosing to only support one format, developers no longer have a decision to make or debate on a project-by-project basis. The bike-shedding is now centralized :)

@Calandiel

They do have a decision to not use the language tho x)

@251

251 commented Nov 26, 2019

Spaces-only is a real usability issue. Spaces are highly programmer-unfriendly and only work in some way with fancy editor configurations. Let's try to compare in an objective way:

Tabs

advantages

  • user-friendly: tab-width can be configured and adapted to screen width, media type (LaTeX documentation, mobile phone), and personal preferences with a single config setting
  • works in every (even the simplest) editor
  • fewer key presses
  • simple conversion from tabs to spaces possible if needed

disadvantages

  • might not work in some browser input fields
  • maximum line width is tab width dependent

Spaces

advantages

  • maximum line length enforceable

disadvantages

  • accessibility nightmare
  • larger files
  • almost always the wrong indentation width, users prefer 1,2,3,4,8 spaces
  • needs agreement or style guide
    • really annoying when you switch between projects with the same editor
  • has to be reformatted for different media
  • usually leads to mixtures of tabs and spaces as tabs are re-configured to soft tabs etc. in broken editors
  • diffs are harder to read (debatable, see below)
  • conversion from spaces to tabs much harder and usually requires tooling (think tabs for indentation spaces for alignment)

To be honest, I never saw a convincing argument for spaces. It just makes no sense to not use a key that was designed exactly for that and to mimic tabs with soft tabs and alternatives. "Looks everywhere the same" is exactly what you don't want and what brought us the indentation mess.

I'd highly appreciate it if that decision would be reconsidered.

@codebrainz

To be honest, I never saw a convincing argument for spaces.

To play devil's advocate (I think tabs should be supported), there are two IMO somewhat convincing arguments against them, and so for spaces:

  1. Inside the lexer/parser/compiler, it's impossible to accurately identify the exact location of a token/node/error. You have to either arbitrarily pick a tab width to use or assume it's one character, and report errors at the offsets using the inaccurate location. For example if there's a syntax error at the front of a line indented with a tab, is the error at column 2 because it's the second char, or 4 or 8 or other guessed tab width? The only way to fix it is to add another option to the tool like editors have to say how many characters a tab is considered to be.
  2. Even if you follow the better practice of using tabs for indentation and spaces for alignment, there are certain cases where the alignment can still get messed up, like if you have aligned with spaces trailing comments at the end of some lines with different indentation levels with tabs, it will be wrong unless viewed with the same tab-width setting as it was aligned with. For example this code was aligned with an editor configured for 2-char tab width:
void foo() {      // this function is silly
	if (1)          // as is this condition
		printf("hi"); // but at least it's friendly
}
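Both points can be checked with a small Python sketch. The helper name is hypothetical, and the three sample lines are a restatement of the snippet above with explicit `\t` escapes, on the assumption that it was aligned for a 2-column tab width as stated.

```python
def visual_column(line: str, char_index: int, tab_width: int) -> int:
    """1-based display column of line[char_index] when tabs advance to the next tab stop."""
    return len(line[:char_index].expandtabs(tab_width)) + 1


# Point 1: the first token after a leading tab is the 2nd character,
# but its display column depends on the viewer's tab width.
indented = "\tif (1)"
columns_by_width = {width: visual_column(indented, 1, width) for width in (2, 4, 8)}
print(columns_by_width)

# Point 2: trailing comments aligned with spaces after tab indentation.
lines = [
    "void foo() {" + " " * 6 + "// this function is silly",
    "\tif (1)" + " " * 10 + "// as is this condition",
    '\t\tprintf("hi");' + " " + "// but at least it's friendly",
]


def comment_columns(tab_width: int) -> list:
    """Display column where the trailing // comment starts on each line."""
    return [visual_column(line, line.index("//"), tab_width) for line in lines]


print(comment_columns(2))  # all equal: the comments line up
print(comment_columns(8))  # all different: the alignment is lost
```

The character offset is stable regardless of settings, which is why reporting "column 2" is one defensible convention; the display column is what varies.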

A middle ground would be to add a warning flag like -Wtabs, either enabled or disabled by default, so that each project could choose their own preference/convention. IMO, that's overkill though and a better approach would be to just put the switch-case for tab (back?) in the lexer and add a FAQ answering "don't use tabs" for the question "why are my diagnostic message locations inaccurate?".

@251

251 commented Nov 27, 2019

is the error at column 2 because it's the second char, or 4 or 8 or other guessed tab width

2 of course (as other compilers do and IDEs expect as error location). Even for fancy arrows in the error message it's simple: copy the affected line, cut at error and replace all non-white-space code points with space.
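A minimal Python sketch of that caret-line idea follows; the function name is made up, and it assumes a monospaced display where every non-tab code point is one column wide. Whitespace in the prefix, including tabs, is copied through unchanged, so the caret lines up under the error at any tab width.

```python
def caret_line(line: str, error_index: int) -> str:
    """Build the marker line printed under `line`, pointing at line[error_index]."""
    prefix = line[:error_index]
    # Keep whitespace (including tabs) as-is; blank out everything else.
    padding = "".join(ch if ch.isspace() else " " for ch in prefix)
    return padding + "^"


source_line = "\tconst x = ;"
marker = caret_line(source_line, source_line.index(";"))
print(source_line)
print(marker)
```

The reported location can then stay a plain character offset, with no tab-width option needed in the compiler.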

if you have aligned with spaces trailing comments

This only happens when block comments span different indentation levels. This is a code smell and breaks with every refactoring. (You don't want to check if comments are aligned in every location when you rename a variable, right?) If you're creating ASCII art or quines - fine, but don't use it in real-world code. go fmt for instance would break those too.

And again, this happens with spaces too: try to integrate such sections into documents with different indentation requirements...

@thejoshwolfe
Sponsor Contributor

thejoshwolfe commented Nov 27, 2019

[tabs] work in every (even the simplest) editor

nope. textarea inputs in web browsers don't support tabs by default, including the github comment editor where i'm typing this right now. It's actually spaces that are supported in every text editor.

[spaces make] diffs are harder to read

nope. it's tabs that are rendered strangely when you prefix every line with a + or -.

...key that was designed exactly for that...

nope. the tab key and character were originally designed to align the cells of a table, not indent structured code. The original purpose of the tab character was to appear in the middle of a line, which is today considered bad practice (at least before the rise of elastic tabstops).

@251

251 commented Nov 27, 2019

inputs in web browsers

Because it's a browser and not an editor. A proper in-browser code editor supports tabs (and monospace) - see GitHub's online editor. There are many plugins, key combinations etc. to solve this if one really wants to write code in a browser?! I'd consider an editor that is not able to even work with the ASCII character set (minus weird control characters) as broken.

it's tabs that are rendered strangely when you prefix every line with a + or -

I see what you mean, although it's not rendered "strangely" - it stops exactly at the same level. I thought more about small indentations (1-2 spaces) where you can't make out the indentation level. With tabs you can just pipe it through less -x16 and it gets much clearer.

align the cells of a table

That's exactly aligning at fixed indentation levels... (tabs on typewriters were also used to indent paragraphs or lists, not just tables).

@ziglang ziglang deleted a comment from prez Dec 27, 2019
@andrewrk andrewrk added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. and removed enhancement Solving this issue will likely involve adding new logic or components to the codebase. labels Dec 27, 2019
@andrewrk
Member

I think people are not aware of zig's stance on hard tabs. I updated the wiki page to make it more clear:

Why does Zig force me to use spaces instead of tabs?

see also Why does zig fmt have no configuration options?

@pixelherodev
Contributor

pixelherodev commented Dec 29, 2019

less key presses
simple conversion from tabs to spaces possible if needed

Neither of those is strictly true. First off, grammatically speaking the word less does not fit there, and should be replaced with fewer; secondly and more practically, you can use the tab key to insert spaces even in most plain text editors that I've used.

Conversion from tabs to spaces is possible, but the inverse is true as well: the text editor I use literally has the options to go back and forth right next to each other. Also, it's trivial with any good find+replace system to go in either direction.

works in every (even the simplest) editor

Yeah, and? That's not an advantage. To qualify as an advantage, it can't be in the center of a venn diagram, it has to be on a specific side. Both tabs and spaces work in every editor.

larger files

3 bytes per indentation level is not nearly large enough to be a serious concern. You might as well complain that comments require two characters, or that Zig has no multiline comments and therefore a ten-line comment requires twenty characters instead of e.g. four. It's just not a real issue regardless.

usually leads to mixtures of tabs and spaces as tabs are re-configured to soft tabs etc. in broken editors

That's not a disadvantage of spaces; again, that argument could easily be made in either direction. Furthermore, zig uses spaces right now, and you'd be hard pressed to claim that this is an issue with any zig code at the moment.

conversion from spaces to tabs much harder and usually requires tooling (think tabs for indentation spaces for alignment)

Firstly, this is only an issue if you care about supporting both spaces and tabs. Secondly, it's not even true. I've literally done it dozens of times in the past.

needs agreement or style guide

Which exists, that's literally the point of having it be compiler enforced.

There's also many more advantages and disadvantages missing for both sides of the argument in that post.

@andrewrk
Member

grammatically

No grammar policing please. Plenty of people around here have English as a second language and teaching English is off topic. Just try to understand intent.

@251

251 commented Jan 5, 2020

grammatically speaking the word less does not fit there, and should be replaced with fewer

Thanks, fixed.

secondly and more practically, you can use the tab key to insert spaces even in most plain text editors that I've used.

This misuse led to mixes of tabs/spaces in many code bases.

Conversion from tabs to spaces is possible, but the inverse is true as well: the text editor I use literally has the options to go back and forth right next to each other. Also, it's trivial with any good find+replace system to go in either direction.

That's wrong. You can't go from spaces to tabs without a syntax-aware formatter. Find/replace is simply not capable of distinguishing between indentation and alignment.
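A small sketch of the distinction being argued, assuming a 4-space indent (`INDENT`, `naive_replace` and `leading_only` are names made up for the illustration). A plain find/replace also rewrites alignment spaces in the middle of a line, and even a version restricted to leading whitespace cannot tell that a continuation line is one indent level plus alignment.

```python
import re

INDENT = 4  # assumed indent width, purely for illustration


def naive_replace(text: str) -> str:
    """Plain find/replace: every run of INDENT spaces becomes a tab."""
    return text.replace(" " * INDENT, "\t")


def leading_only(text: str) -> str:
    """Convert only leading spaces: each full INDENT group becomes a tab."""
    def convert(match: re.Match) -> str:
        spaces = len(match.group(0))
        return "\t" * (spaces // INDENT) + " " * (spaces % INDENT)

    return re.sub(r"^ +", convert, text, flags=re.MULTILINE)


code = (
    "    total = compute(first,\n"
    + " " * 20 + "second)    # aligned comment\n"
)

print(repr(naive_replace(code)))
print(repr(leading_only(code)))
```

In both outputs the second line starts with five tabs, although only one of them is indentation; the other sixteen columns were alignment under the opening parenthesis. Getting that right needs knowledge of the syntax, which is the point being made above.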

Yeah, and? That's not an advantage. To qualify as an advantage, it can't be in the center of a venn diagram, it has to be on a specific side. Both tabs and spaces work in every editor.

I wouldn't consider it "working" from a usability perspective when I have to press space 24 times to reach the 3rd indentation level (or try to hit it with auto-repeat).

3 bytes per indentation level is not nearly large enough to be a serious concern

I prefer a tab to be 8 spaces wide on most screens...

That's not a disadvantage of spaces; again, that argument could easily be made in either direction.

No. Press a space/tab - get a space/tab. Everything else is just a misconfiguration to cope with usability issues of spaces and leads to mixed up indentations (see above).

Furthermore, zig uses spaces right now, and you'd be hard pressed to claim that this is an issue with any zig code at the moment.

It is a usability and more importantly an accessibility issue (think of the limited space on a braille display or the inflexibility to change indentation width).

needs agreement or style guide

Which exists, that's literally the point of having it be compiler enforced.

The compiler doesn't enforce anything. You can indent your code in any way as long as you use spaces. You have to agree on 1/2/3/4/8 spaces for indentation per project. With tabs that's not a problem at all.

There's also many more advantages and disadvantages missing for both sides of the argument in that post.

Please list the most important ones.

@exoticus

@pixelherodev the tabs vs spaces thing isn't about aesthetics; it's very important for accessibility. Tabs are better for accessibility because some users need huge font sizes to be able to see, and in that case they need to adjust their tab width, because at larger font sizes it becomes harder to even see the spaces.
Just because browsers and some tools don't render tabs correctly doesn't mean we should say FU to people with eyesight problems. I think you might regret making that argument some years down the hill when that computer screen finishes its job :D

Check out this post, which goes into the issue in a bit more detail.

@paulstelian97

paulstelian97 commented Apr 14, 2020

I wanted to experiment with this language, but it imposes no tabs, no Windows newlines and, in general, other coding style issues, which will cause me to add 2 more days for a medium size project just to fix things that shouldn't need fixing, as they don't affect the reliability and correctness of the compiled code at all (I am fine with Rust's borrow checker). So I'm not going for this anymore. (I would make a "fork" of this language without these issues, which I shall call "political" even if they're not related to actual politics.)

I probably will work with 4 spaces per soft tab/indent level. It's fine. But I want my editor (usually an IDE) to be able to give me the proper spaces, automatically convert all preexisting tabs to spaces at a 4-wide alignment (I agree that mixed tabs and spaces are not a good idea), and to be able to delete an indent level with a single backspace character instead of 4 (assuming my style). The only context where I use a different style is Linux kernel, which has its own coding style, imposed things AND the statement that you may break the rules where it makes sense.

And this is the most important factor in imposing the rules at compile time (with an error; warnings may be fine as long as you can locally override them) -- you will be unable to break the rules where it makes sense to do so.

@Sobeston
Contributor

add 2 more days for a medium size project

I find this hard to believe.

But I want my editor (usually an IDE) to be able to give me the proper spaces, automatically convert all preexisting tabs to spaces at a 4-wide alignment (I agree that mixed tabs and spaces are not a good idea), and to be able to delete an indent level with a single backspace character instead of 4 (assuming my style).

I use vscode, and it does this. (bottom right - Spaces: 4, UTF-8, LF)

you will be unable to break the rules where it makes sense to do so.

Zig fmt is optional, and can be turned off for top-level declarations with // zig fmt: off. The only hard rules are:

  • No Tabs
  • No Windows newlines
  • UTF-8 (or something that works as a subset of it, like ASCII)

and I'm not sure where you'd want to break these rules. Zig fmt is also fairly lenient.

@paulstelian97

paulstelian97 commented Apr 14, 2020

On the UTF-8 one I fully agree, it's non-controversial. I fully agree with the premise. Skipping the BOM might be a good feature though, which can be done before giving the characters to the tokenizer (and also, the BOM is a valid UTF-8 character with the value 0xFEFF, which can be conditionally skipped if it's the first one). You can even deny overlong forms of characters (ASCII characters should always be 1 byte), that too makes sense. I won't insist on this.
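As a sketch of what this pre-tokenizer step could look like (not how the Zig compiler does it; `strip_bom` and `decode_source` are hypothetical helpers), the BOM is dropped only when it is the very first thing in the file, and the rest is decoded strictly so that overlong encodings are rejected.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Drop a UTF-8 byte order mark if, and only if, it starts the input."""
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf", i.e. U+FEFF
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_source(data: bytes) -> str:
    """Strip a leading BOM, then decode strictly as UTF-8.

    Python's strict decoder raises UnicodeDecodeError on overlong forms
    such as b"\xc0\xaf".
    """
    return strip_bom(data).decode("utf-8")


print(repr(decode_source(b"\xef\xbb\xbfconst x = 1;")))
```

A U+FEFF that appears anywhere other than the start is left in place, so a tokenizer downstream could still reject it.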

On Windows newlines, I mostly agree, though again simply skipping the character before the tokenizer (and a stray \r that isn't followed by \n would therefore not be considered a newline) -- so it isn't even part of strings unless escaped in the \r form -- might be an easy solution. Most tools can skip \r on their own as well and, if not, you could run dos2unix on said file anyway. Again, you could run dos2unix on .zig files before compiling or as an added build step so I won't insist either.
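The same idea for line endings, again only as an illustrative sketch with made-up helper names: a `\r` is dropped when it is directly followed by `\n`, and any `\r` that survives is reported so the caller can treat it as an error instead of a newline.

```python
def strip_crlf(text: str) -> str:
    """Remove each carriage return that is immediately followed by a newline."""
    return text.replace("\r\n", "\n")


def find_stray_cr(text: str) -> list[int]:
    """Return the indices of carriage returns left after normalization."""
    return [i for i, ch in enumerate(strip_crlf(text)) if ch == "\r"]


sample = "const a = 1;\r\nconst b\r = 2;\r\n"
print(repr(strip_crlf(sample)))
print(find_stray_cr(sample))
```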

On the no tabs one it's a bit more complicated. I'd argue that you should default to no tabs BUT allow support for them in larger projects (not single-file projects) by having some sort of configuration parameter or command line switch to allow tabs (and their width), though only at the beginning of the line (tabs following non-tab characters on the same line can be forbidden just fine). For example build.zig could get by with no tabs at all, and it could have one configuration option that tabs are x spaces wide (which "zig fmt" would also obey). Also preferring 3 spaces per indent level, that's a bit odd (you're the first that I've seen with such a preference, being used to 4 spaces in most projects and 8-wide tabs on the Linux kernel specifically). I'm not sure there are tools that could do this preprocessing either so that we can still fit within our own coding style specifications.
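The "tabs only at the beginning of the line" rule proposed here is easy to check mechanically. A minimal sketch follows; `misplaced_tabs` is a hypothetical helper, not an existing Zig or `zig fmt` option.

```python
def misplaced_tabs(text: str) -> list[tuple[int, int]]:
    """Return 1-based (line, column) pairs for tabs after the leading tab run.

    Columns count characters, not display cells.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        leading = len(line) - len(line.lstrip("\t"))
        for offset, ch in enumerate(line[leading:]):
            if ch == "\t":
                problems.append((line_no, leading + offset + 1))
    return problems


sample = "\t\tx = 1\n\ty\t= 2\n  \tz\n"
print(misplaced_tabs(sample))
```

Under this rule a tab that follows a space, as on the third sample line, also counts as misplaced, since it is no longer part of the leading tab run.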

@pixelherodev
Contributor

@exoticus Thanks for the link.

Accessibility sways me instantly. Tabs win IMO.

@filipencopav

I don't think you can ever end the bike-shedding. The difference is that since Zig is choosing to only support one format, developers no longer have a decision to make or debate on a project-by-project basis. The bike-shedding is now centralized :)

The difference is that developers just won't bother to use Zig)))
I understand about returns, they're not a visible change, but tabs and 4 spaces:
I use an editor which has the tab character be a vertical box draw line, showing the "body" of the function. I can't do that with 4 spaces. Only solution for me: accommodate to the language and feel pain using 4 tabs or just not use the language since i didn't really learn it.
And the irony was that Zig was supposed to make C painless, by making me suffer from 4 spaces.
Ok, i'm sorry for the rudeness, i'm still wildly interested in Zig, but forcing you to use certain kinds of tabs and a specific type of returns? In my opinion, a language like Zig shouldn't care about whitespace at all!

@ziglang ziglang locked as resolved and limited conversation to collaborators Apr 22, 2020
@andrewrk
Member

andrewrk commented Apr 22, 2020

This thread has nothing useful left to offer. Here's the FAQ entry pasted:

Why does zig fmt use spaces instead of tabs?

Because no human and no contemporary code editor is capable of handling tabs correctly. Humans tend to mix tabs and spaces by accident, and editors don't have a way to "indent with tabs, align with spaces" without pressing the space bar many times, leading programmers to use tabs for alignment as well as indentation.

Tabs would be better than spaces for indentation because they take up fewer bytes. But in practice, what ends up happening is incorrectly mixed tabs and spaces. In order to simplify everything, tabs are not allowed. Spaces are necessary; we can't ban spaces. But tabs are not strictly needed, so the null hypothesis is to not have them.

Maybe someday, we'll switch to tabs for indentation, spaces for alignment and make it a compile error if they are incorrectly mixed. But if we did that today, writing Zig code would be too hard. For now your options are to configure your editor to insert spaces when you press the tab key, or configure your editor to run zig fmt on save (recommended).

What will make it into the final language specification? It isn't decided yet and it doesn't really matter. Just run zig fmt on save.
