Ellipsis #483

Open
NSoiffer opened this issue Jan 4, 2024 · 10 comments
Labels
intent Issues involving the proposed "intent" attr

Comments

@NSoiffer
Contributor

NSoiffer commented Jan 4, 2024

At the meeting today, we agreed to add some other ellipses (vertical and diagonal).
What we haven't discussed (I think) is whether the two horizontal ellipses

  • baseline (x, …, y)
  • midline (x, ⋯ , y)

should share the same intent name. Currently only baseline ellipsis is listed in the core concepts.

My feeling is that they should share "ellipsis" as a common name.

dginev added the "intent" label (Issues involving the proposed "intent" attr) Jan 4, 2024
@dginev
Contributor

dginev commented Jan 4, 2024

If these characters are left as unmarked Unicode, they have names "Horizontal Ellipsis" and "Midline Horizontal Ellipsis".

Have we decided on the relationship between the Core list and the Unicode names? That relationship may automatically answer the question, unless there are alternative notations beyond the dots.

I am getting a little concerned about answering too many questions on an individual case-by-case basis, because it makes Core increasingly unpredictable. Having an answer that uniformly answers this kind of question for all characters that have subtle variations would be helpful.

"ellipsis" is quite a general name and I would be able to stomach it used in all of the elided cases (including diagonal, vertical, ...) if one wanted to convey less information than the typographical use. If one wanted to convey the full amount of information, the Unicode character name does that. Do we need a subtle in-between design that is halfway in one corner, and halfway in another? Unsure.

I think a lot of my confusion comes from not knowing the exact technical relationship between the Unicode character names and the Core list.

Edit: the diagonal-ellipsis is another case of this "halfway" naming. It loses the information of whether it was the up-right-diagonal-ellipsis or the down-right-diagonal-ellipsis.

@NSoiffer
Contributor Author

NSoiffer commented Jan 4, 2024

My understanding of the relationship between names in core (listed under "Core Concept Default Fixity properties") and Unicode is that, like every other named intent in core, these are suggestions that AT is free to change. They also serve as a list of names AT should have translations for, which is the main difference between a Unicode character that is listed in core and one that isn't (e.g., ⩫). I'm not certain that others have the same understanding though.

FYI: MathCAT handles about 5,000 Unicode characters, most of which use names pulled from the Unicode standard because they don't really have a meaningful name. Most Unicode characters in MathCAT describe the character ("TILDE OPERATOR WITH RISING DOTS" for the char shown above). Most translators don't bother touching the automatic Google translations because it is too much work, even though the translations may be really poor. Having a core set of names focuses the efforts of translators onto those important characters. I would guess that most of the 5,000 characters will never be spoken in the lifetime of the AT, so a translator's decision to not spend any time on them is likely a wise use of their time. Note that "never spoken by AT" is not the same as never being used by authors.
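
A minimal sketch of the fallback described above, assuming a hypothetical `CORE_NAMES` table and a hypothetical `spoken_name` helper (neither is MathCAT's actual code or the real core list): a character with a curated core name uses it, and any other character falls back to its Unicode descriptive name.

```python
import unicodedata

# Hypothetical excerpt of a curated core list (illustrative entries only).
CORE_NAMES = {
    "\u2026": "ellipsis",  # HORIZONTAL ELLIPSIS
    "\u22ef": "ellipsis",  # MIDLINE HORIZONTAL ELLIPSIS
    "=": "equals",         # EQUALS SIGN
}


def spoken_name(char: str) -> str:
    """Return the curated core name if one exists, else the Unicode name."""
    if char in CORE_NAMES:
        return CORE_NAMES[char]
    # Fallback: describe the character with its Unicode name, lowercased.
    return unicodedata.name(char).lower()


if __name__ == "__main__":
    for ch in ("\u2026", "\u22ef", "\u2a6b"):
        print(f"U+{ord(ch):04X} -> {spoken_name(ch)}")
```

Only the entries in the curated table would need hand-made translations; everything reached through the fallback is what would be left to automatic translation.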

@dginev
Contributor

dginev commented Jan 5, 2024

Sorry, that is a bit incomplete and not the focus I am trying to flesh out.

I want to systematize the naming scheme, to avoid the names getting increasingly ad-hoc (a lot of the creative curation I see tends to run contrary to my tastes...)

Take the "core" of the Unicode operators that overlap with concepts we'd want in Intent's Core. Examples such as:

| Intent Core name | Unicode name | char | note on names |
|---|---|---|---|
| ellipsis | midline-horizontal-ellipsis | ⋯ | this issue, rewording |
| equals | equals-sign | = | rewording |
| approximately | approximately-equal-to | ≅ | rewording |
| perpendicular | perpendicular | ⟂ | exact match |
| defined-as | colon-equals | ≔ | complete divergence |

How are these names related? Is it predictable? Do we need a table that lists the non-intent/presentational names of each character (such as "colon-equals")? We talked about various defaulting schemes in the past, and it would be helpful to know if the non-intent/presentational name for = is "equals-sign" or something else.


If we can get a bit more organizational clarity, I suspect the ellipsis question can just use a standard answer that applies to many other cases. Something along the lines of:

  • the full typographic name comes from the Unicode name. Character names relevant to Intent Core are listed in hypothetical-Table-A.
  • the Intent Core concept name commonly associated with a character avoids the typographic details. Core concept names are listed in hypothetical-Table-B.
    • e.g. Unicode equals-sign simplifies to Intent equals, Unicode midline-horizontal-ellipsis simplifies to Intent ellipsis.
  • in some cases there is complete overlap (e.g. perpendicular)
  • in some cases there is complete divergence (e.g. colon-equals is not a Core concept)

The key question I myself want answered is how the name change is made (and when). Is it really "avoids the typographic details", or is it something else? Ideally the answer allows us to be consistent and make the choices quickly.

@davidcarlisle
Collaborator

The name of the concept is at most only slightly related to the Unicode symbol used in a particular notation, or to the Unicode name for that symbol. As you note, the Unicode names are not at all consistent and are not usually suitable as a default pronunciation, so it doesn't really matter if the concept name and the Unicode name of one possible symbol denoting that concept differ.

For example, we may have a concept defined-equal that is denoted by = (equals sign), ≔ (colon equals), ≝ (equal to by definition), or <mover><mo>=<mi>def (equals sign, latin small letter d, ...). This is by design and not an issue. It may be that we show typical notations to help systems that want to infer intent from MathML that has no intent markup, but that is not part of the normative behaviour. The fact that none of the listed Unicode names matches the concept name or its default reading is to be expected.
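
As a rough illustration of the point above, assuming a hypothetical, non-normative `TYPICAL_NOTATIONS` table (the entries and the `candidate_concepts` helper are made up for this sketch), one concept can list several symbols, and a system inferring intent from unmarked MathML can only recover candidate concepts from a symbol:

```python
# Hypothetical, non-normative table: concept name -> typical symbols.
TYPICAL_NOTATIONS = {
    "equals": ["="],
    "defined-equal": ["=", "\u2254", "\u225d"],  # =, COLON EQUALS, EQUAL TO BY DEFINITION
}


def candidate_concepts(symbol: str) -> list[str]:
    """Return every concept whose typical notations include the symbol."""
    return sorted(
        concept
        for concept, symbols in TYPICAL_NOTATIONS.items()
        if symbol in symbols
    )


if __name__ == "__main__":
    print(candidate_concepts("="))       # ambiguous: two candidates
    print(candidate_concepts("\u2254"))  # only one candidate
```

Nothing in the lookup depends on the Unicode names of the symbols, which is why a mismatch between those names and the concept name is harmless here.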

@polx

polx commented Jan 5, 2024

I think we talk about the "répertoire" of names that we put in intents as the general issue that Deyan is describing.
Unicode is certainly a good inspiration, but a better inspiration should be a common pronunciation in English, I feel.
So for ⟂, I'd suggest perpendicular-to.

Who judges this? The references we have found and some maths knowledge.

Clearly, too much orientation toward typography damages cognition. I'd even say that "ellipsis" is more cognitively demanding than "ta ta ta" or "little dots" (which is common in French: "trois petits points").
Similarly capital Sigma or capital Pi are Unicode names and could be used for the large-op of the sum or product but should be named with an intent of a "sum" or "product".

We have to keep in mind that the typography (the business of Unicode) should have a life independent of the intents.

@NSoiffer
Contributor Author

NSoiffer commented Jan 5, 2024

I'm sort of repeating what David and Paul said: the names we choose for intent are based on our group's consensus as to how they are commonly spoken. Sometimes that is the official Unicode descriptive name, but mostly it isn't, although the names tend to fall under Deyan's "rewording" category. When the same symbol is ambiguous, it will have several intents, and so some or all of them fall under the "complete divergence" category. There is no guarantee that we all agree on how it is commonly spoken either. Even for names in core, AT is free to choose different speech.

Paul's mention of the speech in French for "ellipsis" does bring up the point that "ellipsis" is not really the appropriate "common" speech. In MathCAT, "dot dot dot" or in some cases "and so on" are used. For the mid-line version, only "dot dot dot". Inconsistently(?), it also uses "vertical ellipsis", "upwards diagonal ellipsis" and "diagonal ellipsis" for other ellipsis characters.

I'm 100% sure that when we circle back to the core concept intent list (after we finish going through the extensive lists others have gathered), we will change a number of the names we provisionally added.

@polx

polx commented Jan 5, 2024

I fully agree.

That said, I find it useful that Deyan tries to minimize the cognitive price. At least we are aware that this is "booked": there will be some inconsistency with Unicode names.

@dginev
Contributor

dginev commented Jan 6, 2024

@NSoiffer I guess we have to (again) relitigate some fundamentals until all our voices sound sufficiently similar. I thought group consensus matched the curation document (which had both a group vote and an issue, #470):

https://w3c.github.io/mathml-docs/concept-lists/

"Naming. Each Core list concept is recorded via its English encyclopedic name. In cases of multiple known names, we strive to make a practical choice."


If the group has changed views, we should update/remove that document, and get it closer to the (in my opinion worse) principle that Neil stated:

"the names we choose for intent are based on our group's consensus as to how they are commonly spoken"

which to me reads as "make an ad-hoc choice based on the context of the group discussion". We just saw that play out with the debate on repeating-decimal, repeating-basimal, repeating-digits and repeating-block this week. An encyclopedic resource can be a very convenient tie-break, since repeating-decimal is notably documented, while the others aren't (but can still be reasonably spoken).

Here, the main concept is ellipsis. Whether we want to include the typographical detail of "horizontal-ellipsis" and "vertical-ellipsis" depends on whether the emitted speech can reasonably capture the up/down direction of a 2D construct, and whether we want an intent expression to convey that typographical information. I think that is a good general question to answer, as it shows up in other glyphs. For example, "maps-to" can branch into "maps-to-pointing-left-to-right", "maps-to-pointing-bottom-to-top", "maps-to-pointing-diagonally-up-right", and so on.

@davidcarlisle
Collaborator

@dginev I'd agree the concept-lists document would suggest the encyclopedic name "ellipsis" (for any of the directions), which would be fine by me. However, most of the discussion above relates to the fact that concept names don't match the Unicode names of possible symbols used in notations for the concept. As far as I can see, that document (rightly) says nothing about Unicode names at all.

@NSoiffer
Contributor Author

I have added both low and midline ellipsis as having intent names "ellipsis" and also added the vertical and diagonal ellipses.
