
Fixes for operator dictionary #87

Closed
NSoiffer opened this issue May 3, 2019 · 20 comments
Labels
MathML 4 Issues affecting the MathML 4 specification need specification update Issues requiring specification changes need tests Issues related to writing WPT tests

Comments

@NSoiffer
Contributor

NSoiffer commented May 3, 2019

I know the MathML WG did a pass over the operator dictionary for either MathML 2 or MathML 3 (or both). However, I'm at a loss to explain why these entries have a relative precedence that matches several relational operators [Copy paste into a table didn't work well, so here's the image of the whole range along with some context]:
[Image: the operator dictionary range in question, with surrounding context]

The entries between the red lines seem suspicious. In particular, why are ⁄ (U+2044, fraction slash), ∕ (U+2215, division slash), and ∣ (U+2223, divides) here? Their priorities differ from the ASCII character / and from ÷, which are more appropriately at 660. ∣ might not belong at 660, but it certainly differs from the relational operators. The same goes for many others, such as "ring operator" and "ratio".

These should be rethought.

@NSoiffer NSoiffer added the MathML 4 Issues affecting the MathML 4 specification label May 3, 2019
@fred-wang fred-wang added MathML Core Issues affecting the MathML Core specification need implementation update need resolution Issues needing resolution at MathML Refresh CG meeting need specification update Issues requiring specification changes need tests Issues related to writing WPT tests labels May 16, 2019
@fred-wang

I started to add tests for the operator dictionary in web-platform-tests/wpt#19123

@NSoiffer Is this only for priority values? Or do you have other changes in mind?

In general the fence, separator and priority values don't affect the layout described in MathML Core so they are not going to be tested by WPT for now and can probably just be discussed/decided in a MathML Full meeting.

In any case, I'm adding this to the agenda for the next meeting, as there is #6 too.

@NSoiffer
Contributor Author

The precedences tend to be related to the left/right spacing amounts, so inappropriate precedences could also mean the spacing is not right. The form could also be wrong. Just off the top of my head, I'm suspicious that 'end of proof' (U+220E) is listed as an infix operator (it seems like it should be postfix, since it marks the "end" of something).

My best guess is that someone (me?) ran out of energy and didn't think about these characters in detail.

@NSoiffer
Contributor Author

Some notes from looking at the operator dictionary:

Remove from Operator Dictionary:
Various "empty set" variant symbols x29B0 - x29B4

Change angle symbols x29A0 - x29AF from infix to prefix? Some are "measured angle", so maybe they are different. That block is at 265. Others: x299B - x299F are at 270, x2220 - x2222 are prefix at 670 (with very different spacing), and 221F and 22BF are at 265/infix.

Three different division symbols: "/", x2215 ("division slash"), and x2044 ("fraction slash"); "/" has a different priority/spacing from the other two. Note: xF7 ("division sign") has the same priority as "/", but spacing 4 vs 1.

\ (reverse solidus) has 0 spacing, but the similar x2216 (set minus) has the same priority and spacing 4.
Big solidus/reverse solidus (x29F8/x29F9) have spacing 3 vs 4 for the other solidus. Sensible?

Circled times (x2297) is at 390; other infix times operators are at 410.
x22C7 "division times" is at 265.
x2224 "does not divide" is at 260.
x2223 "divides" is at 265/spacing 5.
x2a33 "does not divide with reversed negation slash" is at 265/5.

x2295 circled plus, x2296 circled minus, and x2298 circled division slash are all 300/4;
also circled times is 410/4 and circled dot operator is 710/0 (x22c5 "dot operator" is at 390/4).
These differ from squared plus, etc.

@davidcarlisle
Collaborator

> Remove from Operator Dictionary:
> Various "empty set" variant symbols x29B0 - x29B4

yes

> Change angle symbols x29A0 - x29AF from infix to prefix? Some are "measured angle", so maybe they are different. That block is at 265. Others: x299B - x299F are at 270, x2220 - x2222 are prefix at 670 (with very different spacing), and 221F and 22BF are at 265/infix.

TR25 and LaTeX's unicode-math have these all as mathord, so no additional space.
E.g., U+29A6 (dwangle) is mathclass N in the Unicode data and \mathord in unicode-math, but the op dict has it with lspace=rspace=3.

> \ (reverse solidus) has 0 spacing, but the similar x2216 (set minus) has the same priority and spacing 4.

I think that's OK:
set minus is basically like minus, but \ (as used for cosets, at least) is used with tight spacing, K\G/H, as in https://en.wikipedia.org/wiki/Coset#Notation

> Big solidus/reverse solidus (x29F8/x29F9) have spacing 3 vs 4 for the other solidus. Sensible?
>
> Circled times (x2297) is at 390; other infix times operators are at 410.
> x22C7 "division times" is at 265.
> x2224 "does not divide" is at 260.
> x2223 "divides" is at 265/spacing 5.
> x2a33 "does not divide with reversed negation slash" is at 265/5.
>
> x2295 circled plus, x2296 circled minus, and x2298 circled division slash are all 300/4;
> also circled times is 410/4 and circled dot operator is 710/0 (x22c5 "dot operator" is at 390/4).
> These differ from squared plus, etc.

I think all "embellished" infix arithmetic operators should be treated alike by default, with the same entries as + and \times.

@NSoiffer
Contributor Author

> TR25 and LaTeX's unicode-math have these all as mathord, so no additional space.
> E.g., U+29A6 (dwangle) is mathclass N in the Unicode data and \mathord in unicode-math, but the op dict has it with lspace=rspace=3.

I don't agree with the mathord classification. As I remarked, I think these should be prefix with lspace=rspace=0, not the current infix values. The reasoning is that U+2220 is often used as in ∠ABC, so it acts as a prefix. Also, people write something like m∠ABC + m∠CDE = 90°. Sometimes people are sloppy and leave off the 'm' (measure of angle...). Hence, the angles do interact with other operators.

I've never seen U+29A6 (⦦), or many of the other Unicode angle symbols, in use, but my guess is that they would be used in the same manner, so all should be prefix with 0,0 spacing.

@NSoiffer
Contributor Author

Here's a small potential mistake:

  Char  Name                             Form    Priority  lspace  rspace
  ∂     partial differential             prefix  740       2       1
  ⅅ     double-struck italic capital d   prefix  845       2       1
  ⅆ     double-struck italic small d     prefix  845       2       0

The latter two are "differential d"s. Does anyone know why they shouldn't all be at the same precedence level? Also, maybe they should all have rspace=0, although I could see '1' for the capital letter because it may lean a bit more. We should probably try it out in a couple of fonts...

@NSoiffer
Contributor Author

Some more mistakes: the prefix versions of +, -, ±, and ∓ should all have high priorities, higher than times and divide. Currently they have the same priority as the infix versions, which is pretty low. Note that -x-y could be read as either -(x-y) or (-x)-y, which are not equal.

When I was at Wolfram, we spent a while working on the precedence of operators, and the Mathematica reference guide has these below "dot" but above division. In many cases it doesn't make much difference: -x/y = (-x)/y = -(x/y).
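To make the -x-y point concrete, here is a minimal sketch of a precedence-climbing parser, assuming that a prefix operator's operand absorbs every infix operator whose priority is at least the prefix operator's own priority. The parser, the `parse`/`evaluate` names, and the priority numbers are hypothetical illustrations, not values taken from the operator dictionary; only their relative order matters.

```python
import re

# Hypothetical, illustrative priorities (higher binds tighter); these are
# not the operator dictionary's actual values.
INFIX_PRIORITY = {"+": 275, "-": 275, "*": 400, "/": 660}


def parse(text, prefix_minus_priority):
    """Parse integers, + - * / and parentheses into a nested-tuple tree."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def primary():
        tok = take()
        if tok == "-":
            # Prefix minus: its operand swallows every infix operator whose
            # priority is >= the prefix operator's own priority.
            return ("neg", expr(prefix_minus_priority))
        if tok == "(":
            inner = expr(0)
            take()  # consume ")"
            return inner
        return int(tok)

    def expr(min_priority):
        left = primary()
        while peek() in INFIX_PRIORITY and INFIX_PRIORITY[peek()] >= min_priority:
            op = take()
            # Left-associative: the right operand only takes tighter operators.
            right = expr(INFIX_PRIORITY[op] + 1)
            left = (op, left, right)
        return left

    return expr(0)


def evaluate(tree):
    """Evaluate a tree produced by parse()."""
    if isinstance(tree, int):
        return tree
    if tree[0] == "neg":
        return -evaluate(tree[1])
    op, a, b = tree[0], evaluate(tree[1]), evaluate(tree[2])
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b


# Prefix minus at the same (low) priority as infix minus: -1-2 = -(1-2)
print(evaluate(parse("-1-2", prefix_minus_priority=275)))  # 1
# Prefix minus above division: -1-2 = (-1)-2
print(evaluate(parse("-1-2", prefix_minus_priority=800)))  # -3
```

With either setting, "-6/3" evaluates to -2.0, which matches the observation that for division the choice usually makes no numerical difference.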

@NSoiffer
Contributor Author

I think we should pull the ellipses out of the operator dictionary, as I believe they are not operators. E.g., in 1, 3, 5, …, the ellipsis is acting as a number(ish) quantity, especially to a parser. Likewise for something like x_1, …, x_n, where in this case it is perhaps more like an identifier.

There are five ellipses: vertical, up slope, down slope, lower horizontal, and center horizontal. I believe my argument applies to all five.

@NSoiffer
Contributor Author

In working on division signs, I see that one is missing from the operator dictionary:
HEAVY DIVISION SIGN (U+2797) -- ➗ (vs. the regular ÷).
It's in the dingbats block, so maybe not important?

Also the long division symbol (U+27CC): ⟌ -- this is in a math symbols block and seems like it should be in the table.

@davidcarlisle
Collaborator

@NSoiffer yes drop ellipses, no objection to adding U+27CC

@davidcarlisle
Collaborator

Two views of the current operator dictionary are now available at

https://mathml-refresh.github.io/xml-entities/opdict.html

@davidcarlisle
Collaborator

The values look more reasonable now, but it seems to me we still have too many priority values.
You want prefix operators to bind tightly and multiplication-like operators to have higher precedence than addition-like ones, but beyond that, assigning individual priorities seems to make things harder to understand than simply assuming a left-to-right reading of equal-precedence operators.

To take a specific example, in the main "infix lspace=rspace=5" block we assign 20 different priorities.

Why is "element of" (U+2208) priority 240 but "small element of" (U+220A) priority 265?

Similarly, why is "greater than" at a different priority from "succeeds" (243 vs. 260)?

Can't we make all of "form:infix lspace:5 rspace:5" priority 270?
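One way to see how many distinct priorities share a given form and spacing, and to try the proposed collapse, is sketched below. The `Entry` record and the `priorities_by_shape`/`collapse` helpers are hypothetical; the sample entries use only values quoted in this thread (the two "element of" characters from the lspace=rspace=5 block and the differential d's listed earlier).

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical record for one operator dictionary entry."""
    char: str
    form: str
    priority: int
    lspace: int
    rspace: int


def priorities_by_shape(entries):
    """Map (form, lspace, rspace) -> sorted list of distinct priorities."""
    groups = defaultdict(set)
    for e in entries:
        groups[(e.form, e.lspace, e.rspace)].add(e.priority)
    return {shape: sorted(ps) for shape, ps in groups.items()}


def collapse(entries, form, lspace, rspace, new_priority):
    """Give every entry with the given form and spacing one shared priority."""
    shape = (form, lspace, rspace)
    return [
        e._replace(priority=new_priority) if (e.form, e.lspace, e.rspace) == shape else e
        for e in entries
    ]


SAMPLE = [
    Entry("\u2208", "infix", 240, 5, 5),   # element of
    Entry("\u220A", "infix", 265, 5, 5),   # small element of
    Entry("\u2202", "prefix", 740, 2, 1),  # partial differential
    Entry("\u2145", "prefix", 845, 2, 1),  # double-struck italic capital d
    Entry("\u2146", "prefix", 845, 2, 0),  # double-struck italic small d
]

print(priorities_by_shape(SAMPLE)[("infix", 5, 5)])  # [240, 265]
collapsed = collapse(SAMPLE, "infix", 5, 5, 270)
print(priorities_by_shape(collapsed)[("infix", 5, 5)])  # [270]
```

Running `priorities_by_shape` over the full dictionary would show, for each spacing class, how many priorities would disappear under such a collapse.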

@davidcarlisle
Collaborator

Relating to the previous comment, here is a list of the priority values after the last round of edits. If a priority has 2 or fewer characters, they are listed; otherwise only the total number of characters is given.

It's not clear why wreath product or circled times have unique priorities, for example. Also, should <decimal separator key symbol> be in the dictionary at all?

Since the numeric values don't matter, only their order, if we reduce the number of distinct values a bit more from the current 45, the remaining values could be spread out more uniformly, or at the very least all be made multiples of 10.

 45 distinct priority values

Priority, (count)
  010, (4)
  020, (58)
  030, (1) <semicolon>
  040, (2) <comma> <invisible separator>
  070, (2) <therefore> <because>
  090, (5)
  100, (3)
  170, (9)
  190, (1) <logical or>
  200, (2) <multiple character operator: &&> <logical and>
  230, (6)
  240, (86)
  260, (232)
  265, (204)
  270, (555)
  275, (10)
  290, (3)
  300, (5)
  310, (26)
  320, (3)
  330, (12)
  340, (1) <wreath product>
  350, (4)
  390, (13)
  400, (1) <middle dot>
  410, (1) <circled times>
  640, (1) <percent sign>
  650, (2) <reverse solidus> <set minus>
  670, (27)
  680, (12)
  690, (7)
  700, (1) <vector or cross product>
  720, (1) <multiple character operator: **>
  730, (1) <circled dot operator>
  740, (4)
  780, (2) <multiple character operator: <>> <circumflex accent>
  800, (4)
  810, (2) <exclamation mark> <multiple character operator: !!>
  820, (1) <multiple character operator: //>
  825, (1) <commercial at>
  835, (1) <question mark>
  845, (3)
  850, (1) <function application>
  880, (58)
  900, (2) <low line> <decimal separator key symbol>
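The listing above, and the suggested renumbering, can be sketched as follows. This assumes the dictionary has already been reduced to (name, priority) pairs; `summarize` and `spread` are hypothetical helpers, and `spread` keeps only the relative order of the priorities, reassigning them as evenly spaced multiples of 10.

```python
from collections import defaultdict


def summarize(pairs):
    """Reproduce the 'Priority, (count)' listing from (name, priority) pairs.

    Names are shown only when a priority has 2 or fewer characters.
    """
    by_priority = defaultdict(list)
    for name, priority in pairs:
        by_priority[priority].append(name)
    lines = [f"{len(by_priority)} distinct priority values"]
    for priority in sorted(by_priority):
        names = by_priority[priority]
        detail = " ".join(f"<{n}>" for n in names) if len(names) <= 2 else ""
        lines.append(f"{priority:03d}, ({len(names)}) {detail}".rstrip())
    return lines


def spread(priorities, step=10):
    """Map each distinct priority to an evenly spaced value, preserving order."""
    return {p: (rank + 1) * step for rank, p in enumerate(sorted(set(priorities)))}


sample = [("semicolon", 30), ("comma", 40), ("invisible separator", 40),
          ("wreath product", 340), ("circled times", 410)]
for line in summarize(sample):
    print(line)
print(spread([p for _, p in sample]))  # {30: 10, 40: 20, 340: 30, 410: 40}
```

Because a parser only compares priorities, applying `spread` changes nothing about how expressions group; it only makes the gaps uniform.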

@NSoiffer
Contributor Author

NSoiffer commented Mar 3, 2020 via email

@NSoiffer
Contributor Author

':' is a tough character to decide on for priority and spacing. Part of the problem is that a lot of typesetting comes from TeX, and few people bother to tweak the input to fit convention or know about \colon. As @davidcarlisle points out (thanks!) in #176 (comment), TeX defaults to symmetric spacing="5", but also has the asymmetric \colon (lspace=0, rspace=3). The amsmath package adds even more space to \colon (lspace=2, rspace=6).

MathType uses the symmetric ":"; Word (Murray's editor) uses the asymmetric ":".

Here are the use cases I've found:

Infix and symmetric spacing:

  • ratio -- 1 : 3
  • Trilinear coordinates -- similar to ratio -- x : y : z = 2x : 2y : 2z
  • Field extension -- [K : F] -- in three books published before TeX, one used extra spacing (as in a relation) and two had tight spacing.
  • Tensors -- colon product

Infix and asymmetric spacing:

  • function definition/mapping -- f: x -> y

Unclear

This Wikipedia page on math symbols has both symmetric and asymmetric spacing for "such that" usage (see "there exists" for an asymmetric example). One comes from TeX and the other is hand typed???

What to do?

This stackexchange article is one that I agree with: ":" has two uses -- one as a punctuation symbol that separates the left and right sides (e.g., function definition) and one as a relation symbol between what is on the left and right (e.g., in ratios). I agree with the author that "such that" is a separation, so it should be asymmetric.

I lean towards making the default symmetric because that is what TeX uses, but would love to have some statistics to know which case is more common. I suspect that "function map" + "such that" > "ratio" + "other symmetric cases". Regardless, somewhere we should have a note that the other use case should set lspace/rspace.

For both cases, the priority is pretty low, although I think the asymmetric case probably has a very low priority.
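For the note about overriding the default, here is a sketch that turns the spacing pairs mentioned above into explicit `lspace`/`rspace` attributes on `<mo>`. It assumes the dictionary's spacing numbers are in units of 1/18 em (TeX's math unit, mu); the style names and the `colon_mo`/`mu_to_em` helpers are hypothetical.

```python
# (lspace, rspace) pairs in 1/18 em, as described in the comment above.
# The style names are hypothetical labels.
COLON_SPACING = {
    "tex-symmetric": (5, 5),   # TeX's default ':'
    "tex-colon": (0, 3),       # TeX's \colon
    "amsmath-colon": (2, 6),   # amsmath's \colon
}


def mu_to_em(n):
    """Convert n/18 em into a compact em length string such as '0.1667em'."""
    text = f"{n / 18:.4f}".rstrip("0").rstrip(".")
    return f"{text}em"


def colon_mo(style):
    """Build an <mo> for ':' with explicit spacing for the given style."""
    lspace, rspace = COLON_SPACING[style]
    return f'<mo lspace="{mu_to_em(lspace)}" rspace="{mu_to_em(rspace)}">:</mo>'


print(colon_mo("tex-colon"))  # <mo lspace="0em" rspace="0.1667em">:</mo>
```

Whichever spacing becomes the dictionary default, the other use case would then be written with explicit attributes like these.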

A missing op dict entry

It also seems to be part of a multi-char sequence in logic: :⇔ -- "logically equivalent" as in A XOR B :⇔ (A ∨ B) ∧ ¬(A ∧ B). Should we add that to the operator dictionary?
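Whether `:⇔` is read as one operator depends on the dictionary containing that exact sequence, as it already does for multi-character entries such as `&&` and `**`. Below is a minimal sketch of one plausible way to recognize such sequences, greedy longest-match lookup; the `tokenize` helper and the operator set are hypothetical, and any non-operator character is treated as its own token for simplicity.

```python
def tokenize(text, operators):
    """Split text into tokens, preferring the longest operator that matches."""
    longest = max(map(len, operators), default=1)
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest possible operator first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in operators:
                tokens.append(candidate)
                i += size
                break
        else:
            # Not an operator: treat the character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


ops = {":", "\u21D4", "\u2228", "\u2227", "\u00AC"}  # : ⇔ ∨ ∧ ¬
print(tokenize("A :\u21D4 B", ops))                # ['A', ':', '⇔', 'B']
print(tokenize("A :\u21D4 B", ops | {":\u21D4"}))  # ['A', ':⇔', 'B']
```

Without a `:⇔` entry, the two characters fall back to separate `:` and `⇔` operators, each with its own spacing.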

@NSoiffer
Contributor Author

At the April 13 call, Murray pointed out that there actually is another ":" in Unicode -- it is U+2236 and is called "ratio". So ":" should be asymmetric, opposite of TeX.

NSoiffer added a commit to mathml-refresh/xml-entities that referenced this issue Apr 13, 2020
See w3c/mathml#87 (comment) for the reason for choosing the prefix version of ":". See also the previous comment (w3c/mathml#87 (comment)) for usages of ":".
The priority change was to make sure this usage falls below the symbols used for mapping.
@fred-wang
Copy link

Can this issue be closed?

Or at least separate issues / PR that are more specific and less vague than "fixes for operator dictionary" could be opened for discussion.

@NSoiffer
Contributor Author

I'm still working through the ones with priority 265 (except for those with spacing 4/4) which are glyphs that I haven't analyzed yet. Just eyeballing what's there, I suspect that around half of them will get removed as they aren't really operators. But I need to understand why they were added and what they may be used for before I can make an informed decision.

@fred-wang

Removing "MathML core" label since priority is for full.

@fred-wang fred-wang removed the MathML Core Issues affecting the MathML Core specification label May 22, 2020
@fred-wang fred-wang removed the need resolution Issues needing resolution at MathML Refresh CG meeting label Aug 12, 2020
@davidcarlisle
Collaborator

@NSoiffer there are always things we could change but I think most of the issues above are addressed in the current version and in view of w3c/mathml-core#104 I suggest we close this. New issues can be raised if errors are discovered later?


3 participants