Bikeshed issue: combinator spelling (|, &) #179

Closed
ljharb opened this issue Apr 21, 2021 · 42 comments
Labels
champion group discussion help wanted syntax discussion Bikeshedding about syntax, not semantics.

Comments

@ljharb
Member

ljharb commented Apr 21, 2021

For the pattern combinators, the champion group chose spelling that we believe will be widely intuitive to the most developers. TypeScript type notation, as well as pattern matching/case selection in a number of languages, use the pipe (|) and ampersand (&) for "or" and "and" semantics, respectively.

There is a potential confusion here with JS' bitwise OR (|) and bitwise AND (&), but our belief/hope is that users will understand that "pattern mode" is distinctly different (many things that work fine in other parts of the language will be syntax errors where patterns are expected), and will immediately learn that 3 | 4 means "3 or 4", even if they expect it will mean "7" (which, since most users avoid and do not understand bitwise operators, we expect most users will not expect).
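For illustration only, here is a minimal TypeScript sketch of the overlap described above: the same tokens are bitwise operators in expression position and "or"/"and" in TypeScript's type position. The helper `isThreeOrFour` and the type names are hypothetical, hand-written stand-ins; they are not part of the proposal.

```typescript
// Expression context: | and & are bitwise operators on 32-bit integers.
const bitwiseOr: number = 3 | 4;  // 0b011 | 0b100 = 0b111 = 7
const bitwiseAnd: number = 3 & 4; // 0b011 & 0b100 = 0b000 = 0

// Type context (TypeScript): the same tokens read as "or" and "and".
type ThreeOrFour = 3 | 4;
type Named = { name: string };
type Aged = { age: number };
type Person = Named & Aged; // must satisfy both shapes

const person: Person = { name: "Ada", age: 36 };

// Hypothetical stand-in for what a pattern spelled `3 | 4` would test:
// "the value is 3 or the value is 4", not "the value is 7".
function isThreeOrFour(value: unknown): value is ThreeOrFour {
  return value === 3 || value === 4;
}
```

Under this reading, `isThreeOrFour(4)` is true and `isThreeOrFour(7)` is false, even though the expression `3 | 4` evaluates to 7.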

Do you have any alternative suggestions? Please provide compelling arguments for them if possible, and I'll edit the OP to compile a list of the viable ones!

Options:

  1. | and &
    • pros: intuitive to TS users and folks using pattern matching in other languages; matches with regex (/(foo|bar)/)
    • cons: conceptual overlap with bitwise ops
  2. || and &&
    • pros: intuitive to JS users
    • cons: conceptual overlap with value selection ops
  3. or and and
    • pros: intuitive to Ruby/Python/C++/C# users and English speakers, no overlap with JS
    • cons: longer to type than other options
@ljharb ljharb added the syntax discussion Bikeshedding about syntax, not semantics. label Apr 21, 2021
@ljharb ljharb changed the title BIkeshed issue: combinator spelling (I, &) BIkeshed issue: combinator spelling (|, &) Apr 21, 2021
@Haroenv

Haroenv commented Apr 21, 2021

Intuitively I thought of || and &&, although I'm not sure if that's possible syntax-wise. It would avoid the confusion with bitwise operators (although they're very uncommon) and with TypeScript types.

@ljharb
Member Author

ljharb commented Apr 21, 2021

It's certainly possible, since "pattern space" is new, so we can do basically whatever we want.

The overlap with TypeScript types is actually good imo, because it has the exact same semantics. Using ||/&& would not conflict with bitwise operators, but it would conflict (in the same way) with the value selection operators (and would likely make folks expect ?? to work, which I can't see how it would).
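As a rough sketch of the "value selection" concern (TypeScript used for illustration; all function names here are hypothetical): in expression position, `||`, `&&`, and `??` each return one of their operands, so an expression such as `200 || 201` collapses to a single value rather than describing "200 or 201".

```typescript
// || returns the first truthy operand, otherwise the last operand.
function firstTruthy(a: number, b: number): number {
  return a || b;
}

// && returns the first falsy operand, otherwise the last operand.
function lastIfTruthy(a: number, b: number): number {
  return a && b;
}

// ?? returns the left operand unless it is null or undefined.
function firstNonNullish(a: number | null, b: number): number {
  return a ?? b;
}

// A naive "match" written with expression semantics only ever
// compares against the single value that || selected (here, 200).
function naiveMatch(status: number): boolean {
  return status === firstTruthy(200, 201);
}
```

Here `naiveMatch(200)` is true but `naiveMatch(201)` is false, which is the kind of expectation mismatch a `||` pattern combinator might invite.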

I'll add the suggestion to the list!

@j-f1

j-f1 commented Apr 21, 2021

A third option: Python-style 1 or 2 or 3 / { success: true } and { userName }

@ljharb
Member Author

ljharb commented Apr 21, 2021

Words like "or" and "and" are also Ruby-style; that's not a bad suggestion either!

@haltcase

@ljharb C# also uses or / and, which is an encouraging precedent for JS: those keywords did not exist in C# before pattern matching, and they apply only within patterns.

var number = 3;

if (number is 3 or 6) {
  Console.WriteLine("number is 3 or 6");
}

var result = number switch {
  3 or 6 => "number is 3 or 6",
  _ => "number is neither 3 nor 6"
};

I'd vote first for or / and, but | / & is also fine due to the TypeScript (et al.) parallels.

@mpcsh
Member

mpcsh commented Apr 21, 2021

I'm fairly opposed to || and &&; I share Jordan's concern that they will be readily confused with the value selection operators.

@Jack-Works
Member

Either spelling, or / and or | / &, looks good to me.

@rkirsling
Member

Echoing the above: I like | / &, would be totally fine with or / and, would oppose || / &&.

@bathos

bathos commented Apr 22, 2021

I would find “or” and “and” to be a big cognitive relief — it seems too easy to misread these as expressions if they repurpose tokens that are normally operators.

@treybrisbane

I'm undecided as to which I prefer out of |/& and or/and (both seem like decent choices for different reasons).
I feel like ||/&& is the most risky option in terms of user confusion or being a foot gun, so I'm definitely not as keen on that.

@treybrisbane

I feel like someone's gotta ask the question... Is it worth considering Unicode characters? E.g. the mathematical symbols for union and intersection, ∪ and ∩? 😅

@Jack-Works
Member

I feel like someone's gotta ask the question... Is it worth considering Unicode characters? E.g. the mathematical symbols for union and intersection, ∪ and ∩? 😅

LOL, you should use Logic Or “∨” (U+2228) and Logic And "∧" (U+2227) instead of union and intersection. And future JS devs will have a cheat sheet file on their desktop so they don't need to copy-paste that from MDN again and again.

@treybrisbane

Yeah I know the DX is far from ideal. It just might be worth explicitly stating that unicode characters are out of scope if that's the case. 😀

@bathos

bathos commented Apr 22, 2021

⍝ If TC39 needs to raise money for a big pizza party, I bet an Official ECMAScript Keyboard would fly off the shelves.

[Image: a special keyboard from decades ago, made to support the APL programming language's unique symbol characters]

@ljharb
Member Author

ljharb commented Apr 22, 2021

@treybrisbane explicitly, they are always out of scope.

@t7yang

t7yang commented Apr 27, 2021

| and || is OK for me. I always prefer symbol over "word".

@topaxi

topaxi commented Apr 27, 2021

As for the con of and or or being "longer" to type, I'd argue it's faster to type due to no modifiers being involved 😅

@Alhadis

Alhadis commented Apr 27, 2021

Charging |/& with new semantics is a mistake, IMHO:

  • It's inconsistent with the rest of the language.
  • It makes bitwise operations impossible.
  • It hinders one's ability to copy+paste expressions from other contexts.
  • Having two different "or" operators is potentially confusing for JS/coding novices.

The tradeoff to these problems is quite minimal: one less keystroke, and a postulated convenience for TypeScript programmers.

since most users avoid and do not understand bitwise operators

I'm sorry, what.

@ljharb
Member Author

ljharb commented Apr 27, 2021

I'm sorry, what.

@Alhadis there's a whole eslint rule, no-bitwise, to ensure they are not used, and the airbnb eslint config/styleguide forbids them. (let's please not debate the philosophy here; suffice to say, it's a common thing to avoid the operators, and there's also certainly a group of people that use them frequently)

@silicakes

It's certainly possible, since "pattern space" is new, so we can do basically whatever we want.

Building on that, if we can do something like /(a|b)/ within "regex space" without it being confused with bitwise operations, we have the precedent we need for this as well.

+1 for using | &

@mpcsh mpcsh changed the title BIkeshed issue: combinator spelling (|, &) Bikeshed issue: combinator spelling (|, &) Apr 27, 2021
@mpcsh
Member

mpcsh commented Apr 27, 2021

@silicakes thanks for that comment, that's a great point - regexes are probably the closest intuition that we can hope to draw on here. I've updated the OP to add that to the "pros" column.

@tabatkins
Collaborator

It makes bitwise operations impossible.

Note: it does not. Arbitrary operations, including bitwise ops but also every other operator, are already impossible in the general pattern syntax. (You can't write when (foo + bar) {...}, for example.)

If you use the pin operator, you break out into general expression syntax, and you can use any operator you want, including the bitwise ops. (You can write when ^(foo | bar) {...} to match when the matchable value equals foo | bar; this could be useful when working with bitflags, for example.)
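The distinction can be sketched in plain TypeScript by modelling patterns as predicate functions. This is only an illustration under stated assumptions: `Pattern`, `literal`, `or`, `and`, `pin`, and the flag constants are hypothetical names, and strict equality (`===`) is assumed for value comparison rather than taken from the proposal.

```typescript
// A pattern is modelled as a predicate over the matchable value.
type Pattern = (value: unknown) => boolean;

// Leaf pattern: matches a single literal value (=== is an assumption).
const literal = (expected: unknown): Pattern => (value) => value === expected;

// Combinators: "or" matches if any sub-pattern matches, "and" if all do.
const or = (...patterns: Pattern[]): Pattern =>
  (value) => patterns.some((p) => p(value));
const and = (...patterns: Pattern[]): Pattern =>
  (value) => patterns.every((p) => p(value));

// Pin: the argument is an ordinary expression, evaluated first,
// so any operator (including bitwise ones) may appear inside it.
const pin = (computed: unknown): Pattern => (value) => value === computed;

const READ = 0b01;
const WRITE = 0b10;

// Pattern spelled `READ | WRITE`: matches 1 or 2.
const readOrWrite = or(literal(READ), literal(WRITE));

// Pattern spelled `^(READ | WRITE)`: bitwise OR is computed, matches 3.
const readAndWriteFlags = pin(READ | WRITE);

// Combining a type check with another pattern via "and".
const isNumber: Pattern = (value) => typeof value === "number";
const numericReadOrWrite = and(isNumber, readOrWrite);
```

With these definitions, `readOrWrite(3)` is false while `readAndWriteFlags(3)` is true, which mirrors the bitflags use case mentioned above.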

@Alhadis

Alhadis commented Apr 28, 2021

Note: it does not. Arbitrary operations, including bitwise ops but also every other operator, are already impossible in the general pattern syntax

My bad. I overlooked the part on leaf patterns, so naïvely assumed "pattern" meant "any valid ECMAScript expression".

The other points still stand though.

@tabatkins
Collaborator

Sure, just wanted to make that particular correction, because if it were correct it would be a significant downside on its own.

@ljharb
Member Author

ljharb commented May 10, 2021

The champion group largely prefers (but not unanimously) using and/or, although there's some preference for &/| due to their familiarity from TypeScript, and regex grammar (|).

We'll run this by plenary during our next presentation.

@t7yang

t7yang commented Jun 4, 2021

This list may not be complete, but most of the languages on it use |/& rather than and/or.

Another disadvantage: if and/or became language keywords, that could break existing code.

@ljharb
Member Author

ljharb commented Jun 4, 2021

@t7yang how could it possibly break existing code? they'd only be keywords inside a pattern context, which no existing code is.

@t7yang

t7yang commented Jun 4, 2021

@ljharb OK, my mistake, we should use ^ for LHS.

@ljharb
Member Author

ljharb commented Jun 4, 2021

@t7yang the current proposal uses ^ to mean "expression". absent that, it's a pattern, and that's where and and or would apply.

@Alhadis

Alhadis commented Jun 4, 2021

@ljharb Just curious, why on earth was ^ chosen to perform this role? It already has 3 unrelated functions that depend on context:

  1. Outside of a regex: Bitwise XOR
  2. Inside a regex, but outside a character class: Match beginning of line/input
  3. Inside a regex, and inside a character class: Set complement (or to us math plebs: "negating a character range")

Given that regular expressions are often referred to as "patterns", ^ would be the very last character I'd choose to mean "not a pattern".
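For reference, the three existing meanings listed above can be checked with a few lines of TypeScript (a small illustrative sketch; the variable names are arbitrary):

```typescript
// 1. Outside a regex: bitwise XOR.
const xor = 5 ^ 3; // 0b101 ^ 0b011 = 0b110 = 6

// 2. Inside a regex, outside a character class: start-of-input anchor.
const anchoredHit = /^a/.test("abc");  // "a" is at the start
const anchoredMiss = /^a/.test("bac"); // "a" is not at the start

// 3. Inside a character class: negation ("anything except these").
const hasNonDigit = /[^0-9]/.test("12a"); // "a" is not a digit
const allDigits = !/[^0-9]/.test("123");  // no non-digit found
```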

@bathos

bathos commented Jun 4, 2021

The RegExp uses (for ^ but also | or anything else) seem pretty different to me. RegExp literals usually look very distinctive, not super confusable w/ other constructs (to me anyway; maybe not to everyone. certainly not to lexers :).

Most RegExp "syntax character" + operator homop - homoglyph(?) - pairs have unrelated meanings. "Brackets surround things, creating groups" is pretty much as far as it goes, right? Given the RegExp grammar is a narrow DSL for describing other regular(-ish) grammars, this seems pretty natural. Superficial alignments might even be a downside if they could imply a stronger analogy than really exists.

(Not for or against ^. These are just my thoughts on how RegExp syntax may not be a great place to seek consistency or precedent.)

@Alhadis

Alhadis commented Jun 4, 2021

@bathos That wasn't the point I was making. I brought up ^'s three existing uses because they have nothing to do with a proposed fourth. Since we're charging | and & with new meanings based on set theory, why choose a character easily mistaken for a complement/exclude[1] operation? Especially when it's an escape character. Nowhere are carets used as escape characters except in Windows batch-scripts. And nobody in their right minds would use cmd.exe as a source of engineering inspiration (well... except nightmares, perhaps).

Personally, I'd use a bare backslash to suppress the usual interpretation of a pattern: \foo or \(foo). 🤷‍♂️


[[1]]: Or whatever the correct term is. @Jack-Works, send some of [that](https://github.com//issues/179#issuecomment-824699435) juicy mathematical enlightenment this way. 👍

EDIT: I'm leaving that abortion of markdown parsing intact, because holy hell, what the actual fuck just happened?

@ljharb
Member Author

ljharb commented Jun 4, 2021

@Alhadis ^ was chosen because that's what elixir uses. Feel free to suggest a backslash on #178.

@bathos

bathos commented Jun 4, 2021

@Alhadis Your positions make sense to me; my comment may have seemed more specific/counterargumenty than I intended. Other folks had mentioned RegExp syntax previously and seeing it again tripped a general “should RegExp really matter here?” question that was stewing somewhere in my head. While I’d answer that “probably not much,” it may still be just as true that "^" isn’t a great choice, and you have listed other reasons.

@m-rutter

m-rutter commented Jun 20, 2021

I think one thing to consider is how long chains of pattern combinators would likely be formatted. For example, in TypeScript union types and in Rust pattern matching you have things like this, respectively:

type Value =
  | Foo
  | Bar
  | Baz
  | FooBar
  | FooBaz
  | FooFoo
  | BarBaz
  | BarFoo
  | BarBar;

match foo {
    | Foo::Bar(value)
    | Foo::Baz(_ignore_this, value)
    | Foo::Quux(_ignore_this, _and_this, value) => {
        println!("value = {}", value);
        Whatever::One
    }
    Foo::Spam(value) | Foo::Eggs(_, value) => Whatever::Two(value),
}

In both cases the | is allowed as a leading character to indicate the start of a chain of patterns that is broken onto multiple lines.

I think this is desirable for readability, but it isn't really an option for and/or: it loses its readability (arguably), and more importantly, from an English speaker's perspective it's nonsensical to start with and or or.

match (status) {
  when (
    | 200
    | 201
    | 202
    | 203
  ) {
    // ...
  }
}

match (status) {
  when (
    or 200
    or 201
    or 202
    or 203
  ) {
    // ...
  }
}

@ljharb
Member Author

ljharb commented Jun 20, 2021

You wouldn’t be able to start with a combinator regardless, just like how && and || work.

@m-rutter

You wouldn’t be able to start with a combinator regardless, just like how && and || work.

There is no reason why it couldn't. Patterns are all open design space because they don't exist yet. The spec could allow a leading character.

Whether it should can be open to debate.

@ljharb
Member Author

ljharb commented Jun 20, 2021

We could certainly discuss it, but i don’t like it myself, and i suspect many delegates would push back on it for a number of reasons.

Either way, it’s a weak argument in favor of | and & as combinators.

@theScottyJam
Contributor

Allowing a leading "|" would make things awkward if we ever decide to add an "&" combinator, or any other kind of combinator. I don't think it makes sense to allow an operator-like symbol in a leading or trailing position of something.

@ljharb
Member Author

ljharb commented Jun 20, 2021

The current proposal includes both | and & (or “and” and “or”). There are no plans to add one without the other.

@mpcsh
Member

mpcsh commented Dec 6, 2021

At this juncture, the champions are split between &/| and and/or. The majority preference is for the symbols, but we expect committee pushback.

Our conclusion is that the feature is dramatically more important than the spelling. We plan to present this to committee with both options available, and fall back on the word spellings if we get pushback on the symbol spellings. I'll be updating the README to reflect this conclusion.
