Argument order inside intent applications #478

dginev · 2023-10-27T17:18:19Z

Description

An intent application follows the grammar rule:

application        := expression '(' arguments? S ')'

A recent problem @brucemiller and I encountered is that we are currently under-specified on how argument order is determined. As a minimal example, we have the practical choice between power(2,k) and power(k,2), one of which stands for two to the power k and the other for k squared.
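To make the ambiguity concrete, here is a minimal sketch of a parser for a simplified form of the application rule above. It is not the grammar from the spec: the token pattern and the `Application` class are assumptions made for illustration. It shows that the syntax fixes only the positions of the arguments, not their roles.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Application:
    head: object                      # a name, or another Application
    args: list = field(default_factory=list)


# Simplified tokens (an assumption): names, $references, integers, punctuation.
TOKEN = re.compile(r"\s*([A-Za-z_$][\w.-]*|\d+|[(),])")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _parse_expr(tokens, i):
    if i >= len(tokens) or tokens[i] in {"(", ")", ","}:
        raise ValueError("expected a term")
    node = tokens[i]
    i += 1
    # An application may itself be applied again, e.g. f(a)(b).
    while i < len(tokens) and tokens[i] == "(":
        i += 1
        args = []
        if i < len(tokens) and tokens[i] != ")":
            while True:
                arg, i = _parse_expr(tokens, i)
                args.append(arg)
                if i < len(tokens) and tokens[i] == ",":
                    i += 1
                    continue
                break
        if i >= len(tokens) or tokens[i] != ")":
            raise ValueError("expected ')'")
        i += 1
        node = Application(node, args)
    return node, i


def parse(text):
    tokens = tokenize(text)
    node, end = _parse_expr(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing input")
    return node
```

Both `parse("power(2,k)")` and `parse("power(k,2)")` succeed and differ only in the order of `args`; nothing in the parse result says which position is the base.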

@polx mentioned in our WG meeting on Oct 26, 2023, that Content MathML had one solution to this problem. Personally, I think it is a bit too "manual" - there is English text documenting each and every content element and its arguments, each requiring an adopter to carefully consider the description and implement it correctly. As an example, cmml power states:

The power element represents the exponentiation operator. The first argument is raised to the power of the second argument.

A comparable approach, mentioned by @NSoiffer in the same meeting, is prescribing that the order of arguments in a listed "speech hint" for a concept becomes normative for applications of that concept. For example, in the application power($1,$2), a speech hint associated with power stating $1 to the power $2 would dictate that $1 must be the base, while an alternative $1-th power of $2 would dictate $1 to be the exponent.

This is workable, but manually walking a documentation of this nature would get overwhelming as we exceed one thousand concept entries. Is there a more automatic approach we could adopt?

Preliminary Discussion

Draft idea: I spent a little time considering if we can instead propose a notation-based convention for this argument order. MathML Intent annotates presentation expressions, which means we are annotating a known rendering, and we can pose a convention based on that rendering. Here is an example set of rules:

For expressions that have no arguments, or only a single argument, there is a unique choice, so no special rule.
For baseline expressions, arguments are listed left-to-right, based on their rendered order.
For scripted expressions, the base is always the first argument, and scripts are listed in the order prescribed by the mmultiscripts element - skipping anything non-applicable - such as empty script slots, or a base that isn't spoken
- e.g. the C is silent in ${}_n C_k$ "n choose k", hence intent="binomial-coefficient(n,k)"
For over/under expressions, the base is always the first argument, and annotations are listed in the order prescribed by the munderover element - skipping anything non-applicable - such as empty annotation slots, or a base that isn't spoken.
For vertical expressions (such as mfrac), the arguments are provided top-to-bottom, based on their rendered order.
For explicitly marked 2D structures, such as mtable annotated with system-of-equations($1,$2,$3), the argument order should match the reading order of the expression. While a system-of-equations flows top-to-bottom + left-to-right, a tabular diagram may have its reading start center-to-outermost-column, and the argument order should follow that.
- Tempted to add: In cases of multiple possible choices, any natural arrangement would be an acceptable candidate, as long as the narration describes the same conceptual structure, e.g. intent="maps-to(A,B,C,A)" and intent="maps-to(B,C,A,B)" are equivalent for a circular diagram showing directed arrows between A, B and C.
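The simpler rules in the draft above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, `mmultiscripts` and tables are left out on purpose, and the `silent_base` flag stands in for "a base that isn't spoken".

```python
import xml.etree.ElementTree as ET

# Layout elements whose child order already matches the draft convention:
# base first, then scripts or annotations; for mfrac, top to bottom.
LAYOUT = {"msub", "msup", "msubsup", "munder", "mover", "munderover", "mfrac"}


def text_of(element):
    """Concatenate all text inside an element."""
    return "".join(element.itertext()).strip()


def is_empty_slot(element):
    """Treat <none/> and an empty <mrow/> as an unused slot."""
    if element.tag == "none":
        return True
    return element.tag == "mrow" and len(element) == 0 and not text_of(element)


def arguments_by_convention(mathml, silent_base=False):
    """Return argument texts in the order the draft convention suggests."""
    element = ET.fromstring(mathml)
    children = list(element)
    if element.tag == "mrow":
        # Baseline expression: operands left to right, operators skipped.
        return [text_of(c) for c in children if c.tag != "mo"]
    if element.tag in LAYOUT:
        slots = children[1:] if silent_base else children
        return [text_of(c) for c in slots if not is_empty_slot(c)]
    raise ValueError(f"no convention sketched for <{element.tag}>")
```

For example, `arguments_by_convention("<msup><mi>k</mi><mn>2</mn></msup>")` yields the base before the exponent, which would correspond to power(k,2) for k squared.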

Tabulars actually provide a counter-example to a claim that a "notation convention" is good enough. It's true for simple arithmetic, but in advanced tabular cases the readout is not easy to infer from the presentation markup - and often not unique.

Another flaw in my proposal is the same concept having multiple contradictory notations. The binomial-coefficient is notorious here: all of ${}_n C_k$, $C^n_k$, $C^k_n$ and $\binom{n}{k}$ can be spoken "n choose k" (if that is the local convention).

For this kind of diversity it is hard to imagine any automatic approach that "optimally" determines argument order. An alternative focus on "order of speaking" may be tempting, but that is easy to invert even in English, and likely multiple orders are possible in foreign languages.

In conclusion, if there is no reliable automatic way to make the choice, our best current mechanism seems to be for the list containing the concept to make a manual, normative choice for adopters' sake. That likely requires a single (primary) speech hint for each entry with two-or-more arguments, as a brief self-documenting device.

One middle-ground thought: we may still use a notational convention as a guide when creating speech hints in the concept lists - so that we have a consistent collection of argument patterns within each list. One would hope they are common sense enough that we have been doing that by inertia already - but I doubt we've been thoroughly consistent.

Better ideas are most welcome; hopefully this description seeds a robust discussion.


davidcarlisle · 2023-10-27T18:29:54Z

how argument order is determined.

This is something that any application inferring intent (or inferring speech directly) needs to handle, but we do not specify that at all in the current spec, and I don't think we should (or that it's possible in a mathml4 timeframe)

... For over/under expressions, the base is always the first argument

The intent handling is specified without reference to any presentation, so the order of arguments is completely explicit: if you have intent="power(a,b,c)" the arguments are a, b and c in that order, and the implied speech is the same whether this is on an <msup> or on an <mspace>.

As an example, cmml power states:

The power element represents the exponentiation operator. The first argument is raised to the power of the second argument.

But Content MathML is about implied meaning, so this has to be given. intent as it has evolved is more or less purely about speech or braille or similar presentational aspects. We could add an additional column giving a hint about typical meaning (or a reference to some standard source), but again we explicitly dropped this earlier and I don't see an easy way to bring it back, especially as @NSoiffer was hinting at going to CR this year. Unlike the argument order issue, I think this could fit into the current framework, but it might jeopardise getting to CR any time soon. Perhaps an extra comment column with a comment suggesting typical meanings in some non-normative way might be possible?

dginev · 2023-10-27T19:56:30Z

To keep this constructive, I'll only address your example.

If a generator application emits power(a,b,c) on:

<msup intent="power(a,b,c)">
  <mi>c</mi>
  <mfrac>
    <mi>b</mi>
    <mi>a</mi>
  </mfrac>
</msup>

expecting an AT to narrate:

"c raised to the fractional-exponent of b over a end-exponent",

but the consumer AT instead narrates "a raised to the fractional-exponent of b over c end-exponent", the two systems have failed to interoperate, where the listener of the AT receives a completely broken readout (significantly worse than a purely presentational readout "c superscript b over a end-superscript"). Note that there may still be a second system AT2, which correctly meets the expectations of the generator.

Which system made a wrong choice is under-specified today - in fact, neither choice is "wrong", this brokenness is a predictable consequence from an incomplete Intent spec.

In such a world, any system-specific narration by a consumer AT would only be usable with generator systems specifically targeting the argument order supported by that singular AT system.

(for intent applications matching a "known concept" that is - applications using an _ head will interoperate without such issues, and will be even easier to motivate.)

P.S. It took me a minute, but finding a 3-argument use of power wasn't that hard after all - it just needed some extra imagination.

davidcarlisle · 2023-10-27T20:02:47Z

If a generator application emits power(a,b,c) on:

<msup intent="power(a,b,c)">
  <mi>c</mi>
  <mfrac>
    <mi>b</mi>
    <mi>a</mi>
  </mfrac>
</msup>

expecting an AT to narrate:

"c raised to the fractional-exponent <https://www.cuemath.com/algebra/fractional-exponents/> of b over a end-exponent",

No I would not expect AT to do that at all, the `intent` there will suppress inferring speech from the presentation, that's its purpose, so it will say

power of a comma b comma c

unless the system has a non-core rule for a three-argument power.

davidcarlisle · 2023-10-27T20:05:54Z

Note a system that does have a rule for a three argument power would say the same for

<mspace width="2pt" intent="power(a,b,c)"/>
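The behaviour described here can be sketched as a lookup keyed by concept name and arity, with a generic fallback. The table below is hypothetical (only the two-argument power wording is taken from this thread), and the function names are assumptions. Note that the host element never enters the computation.

```python
import re

# Hypothetical core table: keyed by (concept name, arity).
CORE_TEMPLATES = {
    ("power", 2): "$1 to the $2-th power",
}


def default_speech(head, args):
    """Generic reading used when no rule matches the name and arity."""
    return f"{head} of " + " comma ".join(args)


def speak(head, args, templates=CORE_TEMPLATES):
    """Fill the matching template, or fall back to the generic reading."""
    template = templates.get((head, len(args)))
    if template is None:
        return default_speech(head, args)
    # Substitute $1, $2, ... in a single pass so inserted text is never rescanned.
    return re.sub(r"\$(\d+)", lambda m: args[int(m.group(1)) - 1], template)
```

Under these assumptions `speak("power", ["a", "b", "c"])` gives the generic reading, whether the intent sat on an msup or an mspace, while a system that adds its own `("power", 3)` entry would read it differently.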

brucemiller · 2023-10-28T12:57:47Z

Any generator that creates intent="power($a,$b,$c)" should expect that most AT will speak something akin to "power of a,b and c". I think that it should not be forbidden that some particular AT may speak anything else it likes, but such usages & implementations are inherently not interoperable. So, you've got to choose extreme interoperability vs flexibility; you don't get both.

I also expect that we're over-complicating things; There will probably be only a handful of concepts in the core dictionary with more than 1 argument which have distinct roles. Add a phrase like "the first argument is the base, the second is the exponent" to a comments column of the Core dictionary, add a few months for arguing, and we're done!
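As a rough sketch of what such a comments column could enable, a generator might keep a small table of argument roles per concept and build the positional intent from named roles. The table and the `emit_intent` helper are hypothetical; the role names for power come from the phrase above, and those for root from the mroot order (radicand, index).

```python
# Hypothetical "comments column": the documented role of each argument position.
ARGUMENT_ROLES = {
    "power": ("base", "exponent"),
    "root": ("radicand", "index"),
}


def emit_intent(concept, **values):
    """Build a positional intent string from named roles."""
    roles = ARGUMENT_ROLES[concept]
    if set(values) != set(roles):
        raise ValueError(f"{concept} expects roles {roles}")
    return f"{concept}({','.join(values[role] for role in roles)})"
```

With this, `emit_intent("power", exponent="2", base="k")` produces power(k,2) regardless of the order in which the caller names the roles.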

davidcarlisle · 2023-10-28T17:11:04Z

I think that it should not be forbidden that some particular AT may speak anything else it likes, but such usages & implementations are inherently not interoperable.

exactly, yes agreed.

Add a phrase like "the first argument is the base, the second is the exponent" to a comments column

yes sure we could do this. power currently has an empty comments cell, are you suggesting adding such a note in there, or having a new "typical meaning" column? The former is easier and perhaps less controversial.

dginev · 2023-10-28T19:46:04Z

I think that it should not be forbidden that some particular AT may speak anything else it likes, but such usages & implementations are inherently not interoperable.

exactly, yes agreed.

No? The two of you even proceed to brainstorm an interoperability mechanism, so I am not even sure what you agreed to.

If there is a mechanism to ensure argument order interoperability for power($1,$2), then the same mechanism can be used to ensure argument order interop for power($1,$2,$3). A "comment column" in the Core list can be mirrored with a "comment column" in the Open list(s). That implies that each generator and consumer tool will have to be conformant with the conventions of specific lists.

The "comment column" suggestion is at least beginning to engage with the substance of the issue as opened. My view on that matches my comment for the CMML plain-text documentation, quoting from the issue description:

This is workable, but manually walking a documentation of this nature would get overwhelming as we exceed one thousand concept entries. Is there a more automatic approach we could adopt?

To me this is a good direction to brainstorm more on, and I welcome other group participants to join in. A simple convention could provide a nice intuition to both implementers and list curators.

davidcarlisle · 2023-10-28T20:42:26Z

If there is a mechanism to ensure argument order interoperability for power($1,$2), then the same mechanism can be used to ensure argument order interop for power($1,$2,$3). A "comment column" in the Core list can be mirrored with a "comment column" in the Open list(s). That implies that each generator and consumer tool will have to be conformant with the conventions of specific lists.

power(1,2,3) was just a random example of a term not in the lists. Obviously it could be added to open (but that would be weird), but if it were I would just change the example to something else.

Being in the open list has no effect on anything, it is a list of suggestions for things that implementers might consider implementing in addition to the things in core, but a given system can implement things not in the open list and may or may not implement the things that are in the list.

I am not really sure what "interoperability of argument order" for foo(a,b,c) means. The default interoperable thing is to read it as foo of a comma b comma c. If a system chooses to implement a rule for foo and give it a specific better reading, that is fine, but naturally that reading is different from the reading produced by systems without such a rule.

brucemiller · 2023-10-29T00:13:44Z

yes sure we could do this. power currently has an empty comments cell, are you suggesting adding such a note in there, or having a new "typical meaning" column? The former is easier and perhaps less controversial.

Probably more the former, at least initially; at least until we see what we've collected in the list and can assess the potential for confusion.

dginev · 2023-10-30T13:34:08Z

I am not really sure what "interoperability of argument order" for foo(a,b,c) means.

It means that for all systems where "foo" is a known concept, there is some shared deterministic mechanism for arranging the arguments of its applications (possibly parametric in which Intent Lists are supported).

For known concepts:

Generator tools need to decide whether to emit foo(a,b,c) or foo(c,a,b), or something else, as do human remediators.
Consumer AT systems need to decide how to narrate each of the variants - whether foo of a between b and c or b foo c with respect to a, or something else.
The two decisions need to match up, arguments shouldn't get "shuffled" when passed from one system into another.
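A toy check of that matching condition, assuming each side can state which role it assigns to each argument position (the role tuples and function names are hypothetical):

```python
def interpret(args, roles):
    """Pair each positional argument with the role a system assigns to it."""
    return dict(zip(roles, args))


def interoperates(args, generator_roles, consumer_roles):
    """True when both systems attach the same role to every argument."""
    return interpret(args, generator_roles) == interpret(args, consumer_roles)


# The generator means "two to the power k" and emits power(2,k).
emitted = ["2", "k"]
same_convention = interoperates(emitted, ("base", "exponent"), ("base", "exponent"))
swapped_convention = interoperates(emitted, ("base", "exponent"), ("exponent", "base"))
```

Here `same_convention` is true and `swapped_convention` is false: the string power(2,k) is unchanged in transit, but the second consumer would narrate k squared.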

I should again clarify: The Open realm is larger and needs more of this care, but this is a fundamental issue to intent expressions with today's spec. The example power($1,$2) already exhibits this problem in Core, as does any other Core application with 2-or-more arguments.

davidcarlisle · 2023-10-30T14:07:46Z

There is no problem here to fix.

The core entry says power(a,b) should be read as (something equivalent to)

a to the b-th power

It may be that the generator should have generated power(b,a), which would be unfortunate, but that's just wrong document data, not something that the spec can legislate against, any more than the html spec can legislate against <span>yes</span> in some context where it should be <span>no</span>.

arguments shouldn't get "shuffled" when passed from one system into another.

There is no "shuffling" of arguments possible, just as the order of words does not get shuffled. You are talking about explicit markup in a document: if the document says <mrow intent="power(a,b)"/> then that is the document content, and it is not going to change as it is moved to different systems.

If by "shuffling" you mean that different mathml generators should generate the same intent for the same mathematical expression, that's not a general problem, just a matter of describing (especially in the open list) the intended concept in sufficient detail.

Currently the open list at
https://w3c.github.io/mathml-docs/intent-open-concepts/
is pretty much just a sketch mostly automatically generated from google sheets of varying formats, the descriptions are mostly video links (your work, thanks, but we need to add text as well I think) and speech templates are missing completely.

But I think the core list has more or less sufficient descriptions that no implementer would really be in any doubt about which function was intended by each entry.

brucemiller · 2023-10-30T14:23:22Z

Firstly, I personally have been trying to get away from this notion of "known concept" preferring something more like a "known speech pattern", which includes arity (w/o requiring that a specific speech template be used). Power with 2 arguments is a known pattern, with 3 arguments is not. Whether the 3 argument form is a "known concept" is completely uninteresting to me.

Secondly, with regard to at least one generator, I can see no conceivable way that LaTeXML would try to generate power(a,b,c) (other than user code that forced it). If LaTeXML were tempted to create such a construct, it presumably has some idea of what it would mean, and would (hopefully) use appropriate concept symbols.

And finally, there already is a default speech for power(a,b,c), namely (something like) "power of a, b and c". Insisting on "interoperability" for such #$)# basically means requiring that default speech for that case, and forbidding AT to try to do anything else. That seems a very bad idea to me.

Clearly, for the Core list, and more so for the Open list, there will be concepts which take more than 1 argument and where the order matters; in such cases we need a way to document the expected order. Enforcing that order is out of scope, even assuming it were possible.

dginev · 2023-10-30T21:25:35Z

If I am reading the replies correctly, both of you understand the technical need I described and have agreed with each other to experiment with a list-specific documentation solution. Great.

The solution you've focused on so far is a similar approach to how CMML's <power> was documented, which Paul pointed out last Thursday and I recorded in the description of the issue.

As a supplement to that, I would like us to investigate a good convention for how to organize the arguments for common patterns of applications. You may not be interested in that, and that is OK, we can agree to disagree. If no one but me is interested, the issue can be closed.

I find the appeal of streamlining argument order in the Open realm quite attractive - it can simplify a lot (which means broader coverage for less work). For example: cartesian-product is commonly used as an n-ary infix operator? Great, then we can have a convention for all n-ary infix operators to list their arguments left-to-right, following the rendered notation, and never need to document any of the concrete concepts. Or maybe index is commonly used as a subscript notation? Great, then choose argument order in all subscript notations so that the intent application has the base as the first argument and the subscript as the second. etc.

Sadly this can't be so simple, due to competing notations, but all we may need is a tie-breaking clause (= ranking) which notation to use as a "reference notation for argument order" when there are multiple known notations. I suspect we have already been doing some of this "subconsciously" when specifying speech hints, following some common sense take on "taste". Making those choices transparent, and using them consistently, can make a big difference for adopters.

This kind of work is in scope to the current WG's efforts for the same reason the curation principles were in scope ( #470 ). As such, in my opinion, discussion here should be allowed to continue unencumbered.

davidcarlisle · 2023-10-30T21:38:59Z

If I am reading the replies correctly, both of you understand the technical need I described and have agreed with each other to experiment with a list-specific documentation solution. Great.

No, I really cannot understand your issue at all. I have no idea why you think an explicit attribute such as intent="power(x,y)" is in danger of being shuffled or what it would mean to specify its argument order.

The solution you've focused on so far is a similar approach to how CMML's <power> was documented, which Paul pointed out last Thursday and I recorded in the description of the issue.

As a supplement to that, I would like us to investigate a good convention for how to organize the arguments for common patterns of applications. You may not be interested in that, and that is OK, we can agree to disagree. If no one but me is interested, the issue can be closed.

It's not that I'm not interested; I just don't think it's relevant to the mathml spec. It's just general advice to implementers or contributors to the open list on good concept definitions.

I find the appeal of streamlining argument order in the Open realm quite attractive - it can simplify a lot (which means broader coverage for less work). For example: cartesian-product is commonly used as an n-ary infix operator? Great, then we can have a convention for all n-ary infix operators to list their arguments left-to-right, following the rendered notation, and never need to document any of the concrete concepts. Or maybe index is commonly used as a subscript notation? Great, then choose argument order in all subscript notations so that the intent application has the base as the first argument and the subscript as the second. etc.

You could perhaps put some such suggestions in the top of the open list or in the notes-on-mathml, but there is no testable assertion here and nothing that should go in the spec.

Sadly this can't be so simple, due to competing notations, but all we may need is a tie-breaking clause (= ranking) which notation to use as a "reference notation for argument order" when there are multiple known notations. I suspect we have already been doing some of this "subconsciously" when specifying speech hints, following some common sense take on "taste". Making those choices transparent, and using them consistently, can make a big difference for adopters.

No there is nothing to be specified here.

This kind of work is in scope to the current WG's efforts for the same reason the curation principles were in scope ( #470 ). As such, in my opinion, discussion here should be allowed to continue unencumbered.

I can't see that we can do anything other than close this with no action.

brucemiller · 2023-10-31T01:20:24Z

I agree that argument order is something to be addressed, but do not agree that it needs anything more than a comment in a few entries, or at most a separate column. Of the concepts in the current Core list, only intervals, quotient, remainder, power, 2 argument root, definite integrals, derivatives, sum, product (and other bigop, limitop) need any clarification (and most of those are already implied by the template). I do think we'll improve interoperability by being explicit and clear.

The open list will likely be more work, but that shouldn't be surprising. Without having community experience using the open list, I wouldn't expect a confusing algorithm to guess argument order from "standard" notation to pose any advantage, by the time you've added to every concept what you think the standard notation is, and what it means.

NSoiffer · 2023-10-31T05:50:45Z

I have to admit that I remain baffled/unconvinced that there will be confusion on argument order for almost all concepts core handles. power, quotient, etc., all have obvious argument order. If there is confusion, looking at the comment or speech template should resolve that.

Notwithstanding (a new favorite word of mine since you pack three words into one) the above, it certainly doesn't hurt to add some text to the top of the concepts lists that says something like:

Unless otherwise noted below, the argument order of the concepts follows the order used by the presentation MathML elements that are typically used to represent the concept. For linear notations such as "plus", this means the left-to-right order used in an mrow. For power, it means the order used in msup (base, exponent). And for "root", it means the order used in mroot (radicand, index). Some concepts such as "binomial-coefficient" have multiple notations ($\binom{n}{k}$, $C_n^k$, ${}_nC_k$, $C(n,k)$). Where the order might not be clear from the standard notation, the speech hint or comments should make clear what is the intended order of arguments.

I think this says what David and Bruce have assumed and is somewhat like something that Deyan proposed above. Note that this text is informative and that the speech hints and comments are likewise informative, not normative.

@dginev: does this address your concerns, or am I still not understanding why you think there is a problem with argument order?

I think a similar statement can be made for open concepts. I just did a quick scan through the open list and where arguments were indicated with $n, very few had $2 and those seemed to be mostly linear or subscripted. So maybe even for open, there aren't many cases where one needs speech hints and/or comments to clarify argument order.

dginev · 2023-10-31T12:39:44Z

@NSoiffer Yes, the phrasing you used would already auto-decide the vast majority of ordering cases. I think it is quite appropriate.

Even as an informative note, this is an improvement. But I am wondering if making it normative (in the main spec text) wouldn't reap the ultimate benefit - streamlining all intent applications, so that they follow the same ordering principles and are completely predictable in all uses of intent, cross-list.

Historical context: Recall I was also a voice for streamlining how concepts themselves are named, and wanted us to have some "encyclopedic" convention, also lowercase-dashed. These are ultimately small design tweaks that don't change the nature of what intent is, but make it more uniform and predictable to use.

davidcarlisle · 2023-10-31T12:45:50Z

You could have a normative testable statement that concept names have a specific form such as hyphenated lower case, but you can't have a normative statement that the names make sense. Argument order is of the latter type: there is no normative statement you can make, as the function (from a spec point of view) only exists as the concept, so there is no way you can normatively say the function's arguments have to be in any order. That is, you cannot make "typically used to represent the concept" in Neil's phrasing into anything normative.

This is basically just guidelines for submission to the open list and could be added to the top of that file.

davidcarlisle · 2023-10-31T12:59:18Z

also what does "order in presentation mathml" mean?

We have some open issues around intent for calculus, but if you have $\int_0^n f(x) dx$ then a possibly reasonable intent would be integral($integrand,$from,$to,$var) whereas the mathml presentation order would suggest integral($from,$to,$base,$var) and the content mathml order would suggest integral($var,$from,$to,$integrand)

<apply><int/>
  <bvar> x </bvar>
  <lowlimit> a </lowlimit>
  <uplimit> b </uplimit>
   expression-in-x 
</apply>

It really makes no sense to try to specify this order in the abstract for all functions.
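The three orders mentioned above can be put side by side in a short sketch. The tuples are copied from the comment as written (including the `$base` name used for the presentation order); the labels and helper names are only illustrative.

```python
# Candidate argument orders for a definite integral, as listed in the comment.
CANDIDATE_ORDERS = {
    "possibly reasonable intent": ("$integrand", "$from", "$to", "$var"),
    "presentation order": ("$from", "$to", "$base", "$var"),
    "content mathml order": ("$var", "$from", "$to", "$integrand"),
}


def as_intent(head, args):
    """Render a head and its positional arguments as an intent string."""
    return f"{head}({','.join(args)})"


def slot_of(role, order):
    """1-based position of a role within a candidate order."""
    return order.index(role) + 1


intents = {label: as_intent("integral", order) for label, order in CANDIDATE_ORDERS.items()}
var_slots = [slot_of("$var", order) for order in CANDIDATE_ORDERS.values()]
```

The bound variable lands in slot 4, 4 and 1 respectively, so a consumer cannot tell which slot holds it without a dictionary entry that fixes one of these choices.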

When integral is added to an (open or core) concept list it just needs to have enough comments or speech hints that it is clear what the arguments need to be.

But it's a judgement design call in each case, not a normative rule that should or can be followed.

dginev · 2023-10-31T13:00:54Z

@davidcarlisle Ok, sure, the nuance of what can be prescribed is something I need to learn more about. Maybe all that's really needed is some non-normative encouragement from the main text.

But I am not sure I understand the technicality. This is the same spec that documents the order of arguments of <power> and similar - 3.3.2.1 states syntax for mfrac is <mfrac> numerator denominator </mfrac>.

The spec hasn't provided a "testable" way to know if those were used correctly, I think. It can't prevent anyone from putting the denominator as the first arg of a fraction, or the exponent as the first arg of <power>, but it can clearly state the intended configuration, right? Why can't we (as an in-principle question) state the intended configuration of arguments for intent application?

brucemiller · 2023-10-31T13:30:50Z

Why can't we (as an in-principle question) state the intended configuration of arguments for intent application?

This is exactly what I've suggested (several times). But once we've specified the expected order of arguments, there's nothing we can do to enforce it.

I like @davidcarlisle suggestion to add Open list guidelines about how to choose argument order. That could indeed make the system more predictable. But there's no "correct" order; only common, conventional, convenient, etc, so nothing normative.

davidcarlisle · 2023-10-31T14:07:47Z

Why can't we (as an in-principle question) state the intended configuration of arguments for intent application?

You can possibly say something about a specific function (typically in the comments in its entry) but there are no general rules. You just need to make an arbitrary choice. That is why we need the concept dictionaries, so that once a function has been added, different implementations can make the same choice.

definite integration $\int_0^n f(x) dx$

could be any of

integral($integrand,$from,$to,$var)  # lowlimit uplimit in cml
integral($integrand,interval($from,$to),$var) # interval or domainofapplication
definite-integral($integrand,$var,$from,$to)  # different choice of name, and bvar  first

and dozens of other possibilities,

There is no way of specifying in advance a rule that tells you which concept name and argument structure to pick. Whoever is writing the concept dictionary entry needs to make an arbitrary choice. By adding it to the dictionary you are saying other systems should (for core) or may (for open) use the same choice to improve interoperability.

Some version of Neil's note above could be added to the open list as general guidelines used to help people choose names and structure of concept entries, but it can't be anything more than general vague hints. There is no way of phrasing anything that applies in general.

To phrase this another way, when describing mfrac's two arguments we can say the first argument is called numerator and the second is called denominator. But in general the only thing that can be said of a 2-argument intent concept is that the first argument is first and the second argument is second. The dictionary is the definition of the functions, so the entries are correct by definition; there is no previous notion of the arguments which the entries should or can follow.

dginev · 2023-11-02T13:18:17Z

To my understanding Neil's phrasing of:

Where the order might not be clear from the standard notation, the speech hint or comments should make clear what is the intended order of arguments.

addresses the integral example, and any other construct of such variability or complexity.

While an integral may need special treatment (or a more sophisticated general convention), I don't see it as a reason why we can't suggest that the simpler cases should be streamlined.

I.e. a convention where concept applications that are usually presented via <msubsup>, SHOULD order their applications as concept($base,$subscript,$superscript), omitting slots that aren't used for arguments.

The main difference I have with David is that even if we agree that sometimes a normative (SHOULD) order will appear arbitrary, having it arbitrary-yet-fixed can be an important upgrade for systems interoperating. Letting a concept curator do thousands of mutually contradictory arbitrary choices will hurt adoption of Intent, especially the Open realm.

If the curator was instead recommended to make the same fixed choice (even if its principle is arbitrary), it will avoid the need for documenting every entry, and will make implementation briefer and more predictable.

Aside: Currently my own taste leans towards integral($sub,$sup)($integrand), letting the d speak itself as-is. How many ATs will get all possible variations of integral implemented? Isn't this the reason we are trying to streamline derivatives in #473 ?

brucemiller · 2023-11-02T13:54:20Z

With my DLMF hat on, I like to encourage (but not enforce) standard notations wherever possible; with any other hat on, I have to point out that there is no such thing.

Making a rule such as proposed above be normative would actually guarantee lack of interoperability. Those who think "n choose m" is written C^n_m would feel required to write binomial(m,n), while those who write C_n^m would be required to write binomial(n,m)

I'm inclined to agree with @NSoiffer (& @davidcarlisle ?) that the speech hint (where given) is likely sufficient to clarify expected argument order; compared to adding comments to the dictionary, it's less work for the dictionary writer, perhaps more work for the dictionary user.

davidcarlisle · 2023-11-02T15:26:12Z

Letting a concept curator make thousands of mutually contradictory arbitrary choices will hurt adoption of Intent, especially in the Open realm.

The speech templates in the open list will have no effect unless implemented in AT systems, so at no point should a content creator be adding thousands of such things.

If a content creator is adding concepts that are not known to the system, they should use the order in which they want the arguments spoken, as foo(a,b,c) will be read as "foo of a comma b comma c". So there is no need to refer to any presentation order here, in normative or non-normative text.
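
The fallback reading described here can be sketched in a few lines of Python. The function name `default_reading` is hypothetical, and real AT systems may well phrase the fallback differently.

```python
def default_reading(concept, args):
    """Read an unknown concept application in functional form.

    The arguments are spoken in the order they appear in the intent,
    with no reference to the presentation markup.
    """
    if not args:
        return concept
    return f"{concept} of " + " comma ".join(args)


print(default_reading("foo", ["a", "b", "c"]))
```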

dginev · 2023-11-02T15:29:20Z

I said "concept curator", not "content creator"

davidcarlisle · 2023-11-02T15:36:01Z

Ah sorry, I misread, although the point still holds: the only people who can affect a non-default reading of any given concept expression are the implementers of AT systems such as mathcat. We chose not to define a default reading of presentation MathML, leaving that up to vendor experimentation, so you don't know in general how an expression will be read unless you add an intent. But if you do add an intent concept function expression, it will be read without reference to the presentation, so whoever is adding those will add them based on how they want it read. Vague hints in the dictionary about how the argument order should be different based on some possibly different notation doesn't help anyone.

dginev · 2023-11-05T16:22:35Z

Vague hints in the dictionary about how the argument order should be different based on some possibly different notation doesn't help anyone.

I agree. I prefer a normative SHOULD in the main spec, with a text similar to that proposed by Neil. If that appears too vague, I have a more explicit draft itemization in the issue description, which can be developed further.

I think the current active discussion here boils down to a design preference for how strict we should be with prescribing argument order.

If "list comments" are the only mechanism to decide it, then the "concept curator" (writing the list) has full freedom, and implementers of lists need to manually walk through each entry to find what decision the curators made. This makes it very low-friction for group members to curate the lists, but harder for people outside the group to implement them - since every entry is essentially a special case. power may have the base as the first argument, but index may have the base as the second argument, etc. Any consistency becomes an evolutionary accident, rather than a deliberate design choice. This is especially true when multiple people edit a list.

I will continue advocating for making some deliberate design choice, adding a cross-list mechanism that guides how argument order can be automatically chosen for intent applications. I am not particularly attached to using the presentation tree node order, but I have to admit it's a very tempting choice from a generator perspective - because we already have that information ready for reuse.

davidcarlisle · 2023-11-05T16:30:52Z

The entry is the definition of the concept function and its arguments, there is no pre-existing function, so certainly no normative statement can be made at all, and I don't really see how there is any general non normative statement either. I do not see there is any issue here and think we should close this with no action.

A motivating use case for intent is disambiguating notational differences, so whether you have $^nC_k$ or $C^k_n$ or $\binom{n}{k}$ you can give them all the same intent. In practice the author of the entry may have a notation in mind, but from the spec point of view intent does not depend on a preferred presentation layout, and we should not suggest at this point that it does.

davidcarlisle · 2023-11-06T11:00:04Z

Perhaps an explicit example might help show why argument order should not be tied to presentation order, other than at most as a vague hint about general considerations that one may take into account.

https://en.wikipedia.org/wiki/Coset#Notation

A reasonable, but probably non-core, pair of concept definitions would be:

left-cosets(G,H) "left cosets of $2 in $1"

right-cosets(G,H) "right cosets of $2 in $1"

The conventional notation of the first is G/H and of the second is H\G. Note that the presentation MathML has the arguments in opposite order, but the functional forms want to have them in the same order, as does the spoken form. The functional form could actually have the arguments in either order; you just need to choose, and then specify that choice in the concept dictionary in use.

There is no prior definition of these functional forms and no normative or non-normative test to say which argument order is correct. The concept dictionary entry forms the definition of the concept function and the argument order, whichever is chosen, is correct by definition.
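
How such entries drive speech can be sketched as follows, in Python. The `SPEECH_HINTS` table and the `speak` helper are hypothetical; only the two templates come from the example above.

```python
import re

# The two templates from the coset example; the table itself is illustrative.
SPEECH_HINTS = {
    "left-cosets": "left cosets of $2 in $1",
    "right-cosets": "right cosets of $2 in $1",
}


def speak(concept, args):
    """Fill a speech template, replacing $n with the n-th argument (1-based)."""
    template = SPEECH_HINTS[concept]

    def fill(match):
        return args[int(match.group(1)) - 1]

    return re.sub(r"\$(\d+)", fill, template)


print(speak("left-cosets", ["G", "H"]))
print(speak("right-cosets", ["G", "H"]))
```

Both calls take the arguments in the same order (G, H), even though the visual notations G/H and H\G list them in opposite orders.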

dginev · 2023-11-06T13:27:32Z

The good aspects about a SHOULD rule, and the way Neil wrote his sample text, is that they allow for exceptions to be handled outside of the rule. David's last rebuttal is a good reason not to use a MUST. The rest of the argument I am reading as a design preference.

My design preference is to think of "encyclopedic concepts" and to see https://en.wikipedia.org/wiki/Coset as one encyclopedic page defining the concept (itself summarizing use in actual mathematical practice). The intent lists should primarily aim to make transparent a list of names that systems may interoperate with, and avoid the trap of trying to become developed ontologies of discourse, with all the custom curation decisions that come along with them. The more focused the list - the smaller the friction will be for adopters. But if-and-only-if the crucial operational questions have been addressed by the main spec text.

I've explained the details above, though I still wish we were given the opportunity for proper discussion.

Absent that, I suggest a group meeting and vote on the questions posed.

davidcarlisle · 2023-11-06T15:15:11Z

It should not be a SHOULD, or in the spec; we could at most include it in the notes in the dictionary on design considerations that could be used when contributing new entries to the open list.

There is no SHOULD or testable assertion that can be made, concept dictionaries are conceptually (and in the current version of the core list, actually) totally independent of any visual layout.

When defining a dictionary entry for a function of more than one argument you might have various things in mind.

A standard function form argument order used in the literature
The visual order of a common notation
The markup order in presentation mathml required to get a common visual layout
The desired reading order

Of the four, I'd say that the two based on presentation mathml are perhaps the least useful, I'd probably use the 1st then the 4th before that. But the point is if I'm adding a dictionary entry, whatever is in my head really doesn't matter and it doesn't matter if someone else would have made a different choice. The dictionary is there to log choices and allow different systems to use the same set of definitions.

This issue is suggesting a SHOULD requirement to use the third bullet (as far as I understand the issue at all), but I don't think there can be any general rule, and certainly I do not think that would be a good rule. But in any case, as the concept entry does not mention the "common notation" that was in the author's mind, it is impossible to have any requirement on the order that means anything or is testable.

NSoiffer · 2023-11-28T08:02:54Z

Re-reading this after being away from the issue for a while, I think I see a subtle distinction between what Deyan is asking for and what David, Bruce, and myself were saying isn't needed. What the three of us keep saying is that the list is the definition. What I think Deyan is saying is that this makes everything a special case. It would be much better to have rules. We all agree that one can't state rules that are always going to work. In other words, we all agree there are special cases. I suspect the special cases comprise well less than 10% of the entries in core, and also in open (probably closer to 1% than 10%). A good part of this is because in both lists (at least so far in the spreadsheet Deyan created), there aren't too many entries with more than one argument, and where there is more than one argument, they are almost always pronounced left-to-right (hence following the stated default ordering).

I believe Deyan's main complaint is that if there are thousands of special cases, it is too much work to implement. But if in fact there are some common rules, then either due to a special column in the table or the presence of a speech hint or something else, an implementer or machine generator could recognize the 90+% cases that follow the general rule and an implementer would have much less work to do, or at least less cognitive load.
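
One way a machine could recognize the common case is to check whether the placeholders in a speech hint appear in ascending order. This is a sketch under that assumption only: the helper name `follows_default_order` is hypothetical, and the check is a heuristic rather than anything the lists define.

```python
import re


def follows_default_order(speech_hint):
    """Return True if the $n placeholders occur in ascending order.

    Entries that pass can be treated as following the left-to-right default.
    Entries that fail would need a per-entry look by the implementer.
    """
    indices = [int(n) for n in re.findall(r"\$(\d+)", speech_hint)]
    return indices == sorted(indices)


hints = {
    "power": "$1 to the power $2",
    "left-cosets": "left cosets of $2 in $1",
}
needs_review = [name for name, hint in hints.items()
                if not follows_default_order(hint)]
print(needs_review)
```

The check only says whether the spoken order matches the argument order. It cannot tell which argument is the base and which the exponent, so it complements a documented convention rather than replacing one.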

@dginev: did I capture what one of your main concerns is?

Is this (a note at the start of the concept list together with an entry in the table that says "this is special") something everyone can get on board with?

I know part of the discussion was normative vs informative. This puts the text in the list document(s) and not in the spec, so it makes it informative. On the other hand, it specifies how the table is to be interpreted, so in that sense, it is normative for people authoring the table and those reading it. Maybe it is easier for a camel to go through the eye of a needle than to make everyone happy :-)

dginev · 2023-11-28T14:40:38Z

@dginev: did I capture what one of your main concerns is?

You did, thank you for the summary.

I know part of the discussion was normative vs informative. This puts the text in the list document(s) and not in the spec, so it makes it informative.

Your middle-ground suggestion upgrades us from "every concept is a special case" to "every list has special argument order rules", which is certainly an improvement to the implementer workload.

If we are discussing 1% of list entries as needing special treatment, as you suggest, then to me it is easy to be tempted by the stronger "MathML Intent has a single convention for argument order" for the remaining 99%.

davidcarlisle · 2023-11-28T14:44:23Z

I don't think anything should be described as "special" here, although no objection for a note at the top of the open list suggesting that people making contributions should consider placing the arguments of functions in the standard reading order of a common notation for the concept.

polx · 2023-11-28T21:49:18Z

So we'd have a different order for binomial of k among n and vector with components x and y although they are displayed the same (at least in some cultures the bigger number is above in the binomial coefficient).
I guess that sounds right.

davidcarlisle · 2023-11-28T23:24:38Z

@polx

So we'd have a different order for binomial of k among n and vector with components x and y although they are displayed the same (at least in some cultures the bigger number is above in the binomial coefficient). I guess that sounds right.

well that more or less indicates why this requirement really shouldn't be here. I'd read $\binom{n}{k}$ as "n choose k" (normally), so I read the arguments in the opposite order to you, which is fine, but that doesn't say anything about whether it should be binomial-coeff(n,k) or binomial-coeff(k,n); there is no testable rule to say it must be in the order the author of the entry pronounces it, and both phrases can be given as speech hints, so you can't have a rule that says the order must be as in the supplied speech hint either.

basically you have to make an arbitrary choice, that is the whole point of having the dictionary, to record that choice.

Really we should close this with no action, or at most add some vague hint to take reading order into consideration when specifying a new entry.

dginev · 2023-11-29T14:58:13Z

@davidcarlisle

well that more or less indicates why this requirement really shouldn't be here.

I would have started by asking how Paul arrived at the English "binomial of k among n" and whether he considers it "the standard reading order of a common notation for the concept", before reaching such a strong conclusion.

I took it as another good reason to try and leverage encyclopedic resources. For example, the wiki page informs:

The symbol $\tbinom{n}{k}$ is usually read as "n choose k"

Also, from encyclopedia Britannica:

denoted by ${}_nC_k$, read “n choose k,”

This is a kind of testable evidence for "prevailing use" of at least one speech pattern.

Clearly, a language capable of active and passive voice is capable of order reversal for most phrases we can build. "n choose k" and "k chosen from n" are equally meaningful to a learned listener, but one is in common use and the other is not.

there is no testable rule

I think what you are trying to claim is that "there is no testable rule which uniformly covers the full domain of math syntax", which would be correct.

There are certainly a variety of testable rules which can be designed by us (including the notation-based rules in the issue description) which will cover most of the cases.

For the cases where multiple choices are possible and an arbitrary choice needs to be made (apologies for using "special" before) and documented in the list, that is fine. binomial-coefficient is an example of a case that needs an intervention, which I also raised in the issue description. power would be an example that should be covered by the rules, as we don't have competing notations to confuse the matter.

If the group prefers focusing on English speech as deciding argument order, I think we will need to produce some language rules, of the sort "prefer active voice and brief speech patterns". That would then serve to break the theoretical tie between $1 to the $2-th power and the $1-th power of $2, for power.

davidcarlisle · 2023-11-29T16:00:00Z

I think what you are trying to claim is that "there is no testable rule which uniformly covers the full domain of math syntax", which would be correct.

No I am saying there is no rule that can be stated that refers to any notation as the notation is not part of the entry.

There may be one standard notation, there may be many, and they may or may not have arguments in the same order. It does not matter, as the notation plays no part in this.

The concept dictionary entry has no entry for notation (except possibly as mentioned in a comment) and the instance the intent processor is trying to match may be

<mrow intent="binomial-coefficient(a,b)"/>

and the concept dictionary entry for binomial-coefficient has to match (or not) and if it matches, give some speech hints

Any rule (certainly any normative rule) can only be a rule about the data at hand.

The person who writes the entry for the dictionary may have a notation in mind, and that may inform their choice of argument order, and of the speech hints, but that's just in their mind.
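
The matching step described above can be sketched in Python. This assumes flat, comma-separated arguments with no nesting, and matches on concept name plus argument count. The `CONCEPTS` table, its single speech hint, and the helper names are hypothetical.

```python
import re

# Hypothetical dictionary keyed by (concept name, number of arguments).
CONCEPTS = {
    ("binomial-coefficient", 2): "$1 choose $2",
}


def parse_application(intent):
    """Split a flat application such as 'name(a,b)' into (name, [args])."""
    match = re.fullmatch(r"\s*([A-Za-z][\w.-]*)\s*\((.*)\)\s*", intent)
    if match is None:
        raise ValueError(f"not an intent application: {intent!r}")
    name, arg_text = match.groups()
    args = [a.strip() for a in arg_text.split(",")] if arg_text.strip() else []
    return name, args


def lookup(intent):
    """Return the speech hint for a matching entry, or None if none matches."""
    name, args = parse_application(intent)
    return CONCEPTS.get((name, len(args)))


print(lookup("binomial-coefficient(a,b)"))
```

Nothing in the lookup consults the presentation markup: the only data at hand are the concept name, the arguments, and the dictionary entry.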

davidcarlisle · 2023-11-29T16:07:08Z

@dginev although specifically "n choose k" (and the $\binom{n}{k}$ syntax) is in the UK at least not really used until University, at school I did this as "combinations and permutations" with $^nC_k$ and $^nP_k$ syntax pronounced "n C k" or "Combinations of k from n" so the spoken argument order is variable. Basically whether you are "speaking the concept" or "reading the notation".

It certainly makes sense to have some notes highlighting this kind of issue, but there is no rule that can be made, other than advice that the contributor of a concept dictionary entry should consider these things.

polx · 2023-11-29T16:41:37Z

It certainly makes sense to have some notes highlighting this kind of issue, but there is no rule that can be made, other than advice that the contributor of a concept dictionary entry should consider these things.

That'd be my preference: Warn that there may be interpretation differences, especially when going international.

The reading "k among n" comes from the French pronunciation, as I remember it.

NSoiffer · 2023-11-30T08:21:09Z

No great insights follow. Just a few reminders and an example...

Reminder: the speech template is not meant to force speech to be spoken a certain way for that notation. It is an example of how it might be spoken. As @davidcarlisle said, you may speak the same intent in different ways. His example is a good one for a terse and verbose way of speaking binomial-coefficient.

The speech template hopefully makes clear which argument means what. If it's not clear, the comments should clarify the order. However, it doesn't mean they are spoken in that manner. As an example, Asian languages speak the denominator and then the numerator ("b under a").

[Surprisingly, at the moment, the concept list doesn't have an entry corresponding to mfrac].

dginev added the intent Issues involving the proposed "intent" attr label Oct 27, 2023

dginev mentioned this issue Nov 27, 2023

extend concept matching description #477

Open

dginev mentioned this issue Jan 5, 2024

Intent for large operators #482

Open
