Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

require parentheses sometimes to disambiguate confusing operator precedence #114

Open
PavelVozenilek opened this Issue Feb 9, 2016 · 19 comments

Comments

Projects
None yet
7 participants
@PavelVozenilek
Copy link

PavelVozenilek commented Feb 9, 2016

Look at the cryptic table:

x() x[] x.y
!x -x ~x *x &x ?x %x %%x
x{}
* / %
+ - ++
<< >>
&
^
|
== != < > <= >=
&&
||
?? %%
= *= /= %= += -= <<= >>= &= ^= |= &&= ||=

Why it is all needed? Who will find the time to learn it? I had not internalized C precedence after two decades.

Language Pony does it differently:

Pony takes a different approach and outlaws infix precedence. Any expression where more than one infix operator is used must use parentheses to remove the ambiguity. If you fail to do this the compiler will complain.

@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented Feb 9, 2016

Interesting idea. This is on the table since it adheres to the principle that being readable is more important than being writable.

Arguments against it are that it would be strange and cumbersome for people coming from a C background. People are used to 3 * 4 + 1 meaning that the multiplication comes first.

@emanuelpalm

This comment has been minimized.

Copy link

emanuelpalm commented Feb 9, 2016

👍

@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented Feb 9, 2016

Some precedence is necessary, such as assignment, function call invocation, array index. Too much parens and the language turns into lisp.

@PavelVozenilek

This comment has been minimized.

Copy link
Author

PavelVozenilek commented Feb 9, 2016

Assignment, function call, array index are not infix operators. Could there be any ambiguity with them?

I can imagine having default precedence for basic arithmetic (+-/*), this is what little kids get drilled in schools, but IMO it is not worth the effort. One never writes 3 * 4 + 1 in the code, one does some_variable * another_variable + something_completely_else, and all advantages of short infix syntax get lost there anyway. Brackets would help to separate it into visual chunks.

@PavelVozenilek

This comment has been minimized.

Copy link
Author

PavelVozenilek commented Feb 9, 2016

Also, this feature would make adding user defined operators into the language easier.

@thejoshwolfe

This comment has been minimized.

Copy link
Member

thejoshwolfe commented Feb 9, 2016

I think this would be a cool idea for some categories of operators, specifically the ones i can never remember the precedence for. e.g. a && b || c would be a compile error, or a & b | c. i don't think we need to go so far as to outlaw a + b * c though.

@andrewrk andrewrk added the enhancement label May 7, 2016

@andrewrk andrewrk added this to the 0.1.0 milestone Mar 26, 2017

@thejoshwolfe

This comment has been minimized.

Copy link
Member

thejoshwolfe commented Apr 24, 2017

see also #29

@phase

This comment has been minimized.

Copy link

phase commented Apr 25, 2017

My favorite quote from the Pony page is

Most people will remember to do multiplication before addition, but what about left bit shifting versus bitwise and? Sometimes people misremember (or guess wrong) and that leads to bugs. Worse, those bugs are often very hard to spot.

I love the idea for specific parens. It makes the execution crystal clear.

@thejoshwolfe thejoshwolfe changed the title precedence table require parentheses sometimes to disambiguate confusing operator precedence Apr 25, 2017

@thejoshwolfe

This comment has been minimized.

Copy link
Member

thejoshwolfe commented Apr 25, 2017

The current operator precedence table is inspired by C. But I think Zig can do better than that. For example, & > ^ > | seems pretty silly. Even if you think & > | makes sense (because of the analogy to * and +), it's still pretty confusing that ^ is in the middle of them.

I wanted to write up a proposal here to outline a new precedence table for operators in Zig, but Zig's grammar is a bit more complicated than the current table in doc/langref.md shows, and it's not as simple as rearranging and combining some rows.

At the very least, I'd like to see required parentheses for: &, ^, |, <<, >> intermixed with each other, and for and and or intermixed with each other. There's more to the proposal, but I need to do a more careful analysis of Zig's grammar before posting something comprehensive.

@andrewrk andrewrk added the breaking label May 2, 2017

@thejoshwolfe

This comment has been minimized.

Copy link
Member

thejoshwolfe commented May 9, 2017

Here's the real story on Zig's expression grammar (this does not include grammars such as statements, case declarations, top level declarations, etc.). I'll write a follow up comment documenting what i think the expression grammar should be changed to.

There are three types of operator:

  • infix operators: always start with x and end with y, where x and y are some expressions
  • prefix operators: always end with x, where x is some expression
  • suffix operators: always start with x, where x is some expression
  • primary expressions: not really operators, because they are either a single token, or they start and end with unambiguous grouping tokens.

A chainable operator is one that you can use like a + b + c, !!x, or a(b)(c), and has a well-defined associativity. All infix chainable operators are left-to-right associative (e.g. a - b - c == (a - b) - c). Chainable prefix and suffix operators have an obvious associativity (right-to-left and left-to-right respectively). A not-chainable operator cannot be used repeatedly, e.g. inline inline a, a{}{}, and a = b = c are all not allowed.

The following is a list of operator groups. The groups are listed here in order of highest-to-lowest precedence. Each item in a group has equal precedence. If a group is chainable, the operators in the group can be intermixed while chaining (e.g. a.b[c]).

primary expressions:

  • 123 - and variations, where 123 is a number literal
  • "..." - where ... is the characters of the string literal
  • '...' - where ... is the characters of the char literal
  • true, false, null, continue, undefined, error, this, unreachable
  • id - where id is an identifier
  • @id(...) - where id is an identifier and ... is a parameter list
  • goto label - where label is a label
  • break, return - without an expression
  • (x)
  • {...} - where ... is a statement list
  • asm ("...") - and variations, where ... is inline assembly
  • struct { ... } - and variations, where ... is a container member list
  • if (a) {...} - and variations, where ... is a statement list[1]
  • if (a) b else {...} - and variations, where ... is a statement list[1]
  • while (a) {...} - and variations, where ... is a statement list[1]
  • while (a) b else {...} - and variations, where ... is a statement list[1]
  • for (a) {...} - and variations, where ... is a statement list[1]
  • for (a) b else {...} - and variations, where ... is a statement list
  • switch (a) {...} - where ... is a case list
  • comptime {...} - where ... is a statement list[1]

prefix, not chainable:

  • inline x

suffix, chainable:

  • x(...) - where ... is a parameter list
  • x[a] - and variations
  • x.id - where id is an identifier

prefix, chainable:

  • !x
  • -x, -%x
  • ~x
  • *x
  • &x - and variations
  • ??x
  • %%x
  • ?x
  • %x
  • []x - and variations
  • extern fn() -> x - and variations

suffix, not chainable:

  • x{...} - where ... is a container init body

infix, chainable:

  • x * y, x *% y
  • x / y
  • x % y
  • x ** y

infix, chainable:

  • x + y, x +% y
  • x - y, x -% y
  • x ++ y

infix, chainable:

  • x << y, x <<% y
  • x >> y

infix, chainable:

  • x & y

infix, chainable:

  • x ^ y

infix, chainable:

  • x | y

infix, not chainable:

  • x == y - and variations
  • x < y - and variations

infix, chainable:

  • x and y

infix, chainable:

  • x or y

infix, chainable:

  • x ?? y
  • x %% y - and variations

infix, not chainable:

  • x = y - and variations

prefix, chainable:

  • return x
  • break x
  • if (a) x - and variations[1]
  • if (a) b else x - and variations[1]
  • while (a) x - and variations[1]
  • while (a) b else x - and variations[1]
  • for (a) x - and variations[1]
  • for (a) b else x - and variations[1]
  • comptime x[1]

[1]: for the statement-like expressions if, for, while, and comptime, a special exception is made that if a body expression starts with a block (starts with a {), then the body expression ends at the end of that block (at the }). This disallows, for example, if (a) {b} + 1 else {c}.

@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented May 10, 2017

For what it's worth I just removed the inline prefix operator. (See #306)

@thejoshwolfe

This comment has been minimized.

Copy link
Member

thejoshwolfe commented May 17, 2017

Here's my proposal for the "operator precedence table". It's a lot more complicated than a table.

Before people automatically reject this grammar for being overly complicated, the goal here is to make Zig source code less complicated. My goal for grouping everything the way I did was so that any time a user would ask "Which of these two operators has precedence over the other.", the answer is that they're incompatible, and you need to use parentheses.

Now clearly it's difficult to predict when users will ask that, and so most of the proposal is what I think is a good idea to me, and it's wildly up for debate and discussion. For example, why did I make modulo not heterogeneously chainable with multiplication? Because it seemed like you wouldn't really need that, and it would be more clear with parentheses I guess. It's pretty subjective, so debate on all points of the proposal is welcome and requested, even if you're just giving your opinion.

What follows is a proposal (that has a lot in common with the existing situation), then a summary of changes from the existing grammar, and then some examples.

Proposal

There are three types of operator:

  • infix operators: always start with x and end with y.
  • prefix operators: always end with x.
  • suffix operators: always start with x.
  • primary expressions: not really operators, because they are either a single token, or they start and end with unambiguous grouping tokens.

The tree below has ordered lists and unordered lists. The ordered lists specify a precedence relation in order of highest to lowest (1 is the highest precedence.). An unordered list specifies a group of items that cannot be intermixed with each other, unless the header for the list specifies otherwise.

Chainability comes in three forms:

  • not chainable. There is no implicit associativity, so you must use parentheses to combine 3 or more operands. e.g. you can't do a == b == c.
  • chaininable somehow. This means for infix operators associativity is implicitly left-to-right when combining 3 or more operands with the operator in between them. e.g. a + b + c is implicitly equivalent to (a + b) + c. When a group (specified with an unordered list) of 2 or more items is chainiable, it will be one of these two subtypes:
    • heterogeneous chainable. This means all members of the group can be intermixed while chaining. e.g. a + b - c.
    • homogeneous chainable. Each member of the group can be chained only with itself. e.g. a & b & c is allowed, but a & b | c is not.
  1. primary expressions, unambiguous:
    • 123 - and variations, where 123 is a number literal
    • "..." - where ... is the characters of the string literal
    • '...' - where ... is the characters of the char literal
    • true, false, null, continue, undefined, error, this, unreachable
    • id - where id is an identifier
    • @id(...) - where id is an identifier and ... is a parameter list
    • goto label - where label is a label
    • break, return - without an expression
    • (x)
    • {...} - where ... is a statement list
    • asm ("...") - and variations, where ... is inline assembly
    • struct { ... } - and variations, where ... is a container member list
    • if (a) {...} - and variations, where ... is a statement list[1]
    • if (a) b else {...} - and variations, where ... is a statement list[1]
    • while (a) {...} - and variations, where ... is a statement list[1]
    • while (a) b else {...} - and variations, where ... is a statement list[1]
    • for (a) {...} - and variations, where ... is a statement list[1]
    • for (a) b else {...} - and variations, where ... is a statement list
    • switch (a) {...} - where ... is a case list
    • comptime {...} - where ... is a statement list[1]
  2. suffix access and call operators, heterogeneous chainable:
    • x(...) - where ... is a parameter list
    • x[a] - and variations
    • x.id - where id is an identifier
  3. prefix high-precedent operators, heterogeneous chainable:
    • !x
    • -x, -%x
    • ~x
    • *x
    • &x - and variations
    • ??x
    • %%x
    • ?x
    • %x
    • []x - and variations
    • extern fn() -> x - and variations
  4. suffix container initialization operator, not chainable:
    • x{...} - where ... is a container init body
  5. infix value manipulation operators:
    • math:
      1. multiplicative:
        • scalar multiplicative, heterogeneous chainable:
          • x * y
          • x *% y
          • x / y
        • modulo, not chainable:
          • x % y
        • repetition, not chainable:
          • x ** y
      2. additive:
        • scalar additive, heterogeneous chainable:
          • x + y
          • x +% y
          • x - y
          • x -% y
        • concatenation, chainable:
          • x ++ y
    • bit shifting, not chainable:
      • x << y
      • x <<% y
      • x >> y
    • bitwise algebra and monad coercion, homogeneous chainable:
      • x & y
      • x ^ y
      • x | y
      • x ?? y
      • x %% y - and variations
  6. infix comparison operators, not chainable:
    • x == y
    • x != y
    • x < y
    • x <= y
    • x > y
    • x >= y
  7. infix boolean algebra operators, homogeneous chainable:
    • x and y
    • x or y
  8. prefix control operators, heterogeneous chainable:
    • return x
    • break x
    • if (a) x - and variations[1]
    • if (a) b else x - and variations[1]
    • while (a) x - and variations[1]
    • while (a) b else x - and variations[1]
    • for (a) x - and variations[1]
    • for (a) b else x - and variations[1]
    • comptime x[1]

[1]: for the statement-like expressions if, for, while, and comptime, a special exception is made that if a body expression starts with a block (starts with a {), then the body expression ends at the end of that block (at the }). This disallows, for example, if (a) {b} + 1 else {c}.

Proposed Changes

  • Most of the "infix value manipulation operators" group is new.
    • Bitshifting cannot be chained.
    • ++ cannot be mixed with + and the like.
    • ** cannot be mixed with * and the like, and cannot be chained.
    • % cannot be mixed with other multiplicative operators, and is not chainable.
    • None of the bit manipulation operators can be intermixed with the math operators. Not even a << b + c.
    • ?? and %% infix operators cannot be intermixed.
  • and and or cannot be intermixed.
  • assignment is no longer an expression operator, but a statement construct. That's why it's not in the above tree.
  • inline is gone; see #306.

Examples

  • valid: return a and b == c + d * e;
  • error: x << n + 1. fixed: x << (n + 1).
  • error: x << 1 << n. fixed: x << (1 << n) (or whatever).
  • error: " " ** spaces_per_indent ** indent_level. fixed: " " ** (spaces_per_indent * indent_level).
  • error: foo() ?? bar() %% 0. fixed: foo() ?? (bar() %% 0).

Here's some code that already follows this proposal from std/rand.zig:

  • prev_value = int(i) +% f *% (prev_value ^ (prev_value >> (int.bit_count - 2)));
  • mt.array[i] = mt.array[i + m - n] ^ (x >> 1) ^ mag01[x & 0x1];

Here's some code from std/base64.zig that's fine:

  • dest[out_index] = alphabet[((source[i] & 0x3) <<% 4) | ((source[i + 1] & 0xf0) >> 4)];

But this code from std/base64.zig needs parentheses:

  • dest[dest_index] = ascii6[source[src_index + 2]] <<% 6 | ascii6[source[src_index + 3]];

Here's a curious case from std/rand.zig, where there are too many parentheses:

  • return start + (rand_val % range);

Perhaps that last case is a clue that we should not consider % to be a multiplicative math operator, but rather be a value manipulation operator like & that doesn't mix with any other value manipulation operator.

I want ?? to be chainable because of this case:

  • valid: foo() ?? bar() ?? 0. equivalent to foo() ?? (bar() ?? 0) and (foo() ?? bar()) ?? 0.

But now that I think about that some more, it might not actually be correct. Consider if foo() returns a ??u32. This means (foo() ?? bar()) ?? 0 would unwrap both maybes, but foo() ?? (bar() ?? 0) would not. Hm. So either we need to define ?? as right-to-left associative, or make it not chainable. Similarly with %%.

@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented May 17, 2017

I agree with this proposal. Regarding the question at the end, let's just make it not chainable.

Regarding %, I think it's related enough to / that it should have the same chain properties as /.

@thejoshwolfe

This comment has been minimized.

Copy link
Member

thejoshwolfe commented May 17, 2017

can you give a usecase for chaining a % b % c? it seems like you'd only ever want at most one % in a chained expression​.

@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented May 18, 2017

I meant that you would be able to do return start + rand_val % range

@andrewrk

This comment has been minimized.

Copy link
Member

andrewrk commented Jun 17, 2017

I'll take a look at this before 0.2.0. If anyone wants to see this in 0.1.0 feel free to make a pull request.

@andrewrk andrewrk modified the milestones: 0.2.0, 0.1.0 Jun 17, 2017

@andrewrk andrewrk added the accepted label Dec 3, 2017

@andrewrk andrewrk modified the milestones: 0.2.0, 0.3.0 Jan 3, 2018

@andrewrk andrewrk removed this from the 0.3.0 milestone Feb 28, 2018

@andrewrk andrewrk added this to the 0.4.0 milestone Feb 28, 2018

@skyfex

This comment has been minimized.

Copy link

skyfex commented Jul 24, 2018

I think requiring more parenthesis is wise. Even though I know the precedence in C pretty well, I add parenthesis to make the code easier to read. The proposals here all look pretty good.

I think precedence should be kept for "algebras" though (within them, not between them). That is, a * b + c and a and b or c and a & b | c. If it's kept for algebra on numbers, but not on booleans, then it's more or less arbitrary and it'd probably confuse as many or more people than it would help.

I know that the precedence of logical/bitwise boolean operators isn't high school knowlegde. But I think it should be considered basic programming knowledge that or/and is similar to add/multiply. I mean, if we can't assume a tiny bit of knowledge of boolan algebra from a programmer, especially the kind that is likely to use Zig, what can we assume?

@ghost

This comment has been minimized.

Copy link

ghost commented Jul 24, 2018

I made a proposal about the same issue here crystal-lang/crystal#5074

based on this blog post https://foonathan.net/blog/2017/07/24/operator-precedence.html

with examples crystal-lang/crystal#5074 (comment)

It was generally well accepted but there was no one with sufficient expertise to make it happen

@lerno

This comment has been minimized.

Copy link

lerno commented Jul 25, 2018

I would actually argue (see #1287) that bitshift operations should have higher precedence so that a >> b + c == (a >> b) + c and also obviously allowing b & 1 << 4 == b & (1 << 4) both are useful when working with bits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.