Need a scheme/philosophy/plan for unifying/breaking apart "level 0" names #254

Closed
NSoiffer opened this issue Jan 5, 2022 · 5 comments
Labels
intent Issues involving the proposed "intent" attr

Comments

@NSoiffer
Contributor

NSoiffer commented Jan 5, 2022

Currently we have "level 1" (or perhaps better called "core"?) names that are meant to be known by applications and may have specialized ways of speaking them (e.g., for fraction "one half", "3 over n", "meters per second"). There is also the wild west of names in "level 3" (or perhaps better called "open"?).

This issue is about coming up with a design philosophy for when to unify/split names for level 1. There might be a separate issue needed for how names should be chosen (short vs long, etc.), but the focus here is on when to name something. Here are some examples:

  • interval: should this be just one name with four arguments (the brackets are arguments) or four names (open-interval, open-close-interval, etc.) with two arguments?
  • square root/root: should there be a separate name for square root? cube root? What about for the positive and negative values of roots? Or the real-valued roots?
  • sets: should there be a special name for empty sets? If not, is it ok if "set" has a different number of arguments in the case of an empty set? Should there be separate versions for explicit sets ({1,2,3}) vs. sets with a "such that" symbol ({x | x^2 < 4})?
  • minus/negative and plus/positive: do we need to distinguish between unary and n-ary versions?
  • fraction/reciprocal: do we need to distinguish these, since sometimes you want to say "the reciprocal of x" instead of "1 over x" (or "fraction 1 over x end fraction")?
  • large operator (e.g., sum, union, integral) with limits: do we need special forms if the lower limit is of the form x=0 as opposed to just D or x∈D? What if there is only a lower limit?

We should develop a general plan such as "less is better" with exceptions such as "delimiters should not be arguments" so we can make consistent decisions. E.g., if we adopted the above two rules, then all of the above would have just one name with the exception of intervals which would have four.
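
To make the interval trade-off concrete, here is a minimal Python sketch of the two designs and a conversion between them. It is only an illustration: the tuple representation is an assumption, and apart from `open-interval` and `open-close-interval` (mentioned above) the names `interval`, `close-open-interval`, and `closed-interval` are hypothetical.

```python
# Hypothetical sketch: one "interval" name with four arguments versus
# four names with two arguments each. Names are illustrative only.

# Split-design names, keyed by (left delimiter, right delimiter).
SPLIT_NAMES = {
    ("(", ")"): "open-interval",
    ("(", "]"): "open-close-interval",
    ("[", ")"): "close-open-interval",
    ("[", "]"): "closed-interval",
}


def split_name(unified):
    """Convert ("interval", left, a, b, right) to (name, a, b)."""
    head, left, a, b, right = unified
    if head != "interval":
        raise ValueError("expected a unified interval")
    return (SPLIT_NAMES[(left, right)], a, b)


def unify_name(split):
    """Convert (name, a, b) back to ("interval", left, a, b, right)."""
    name, a, b = split
    for (left, right), candidate in SPLIT_NAMES.items():
        if candidate == name:
            return ("interval", left, a, b, right)
    raise ValueError(f"unknown interval name: {name}")
```

Both forms carry the same information. The difference is whether the delimiters travel as arguments or are folded into the name, which is exactly what a rule like "delimiters should not be arguments" would decide.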

@dginev
Contributor

dginev commented Jan 5, 2022

An underpinning question behind some of these examples is:

"Which types of syntax are acceptable to look up from the annotated presentation tree?"

For all other types of syntax, we need to invent new names in the "intent" lists.

Either AT can handle encountering an <mo>|</mo> when examining the tree carrying the attribute intent="set($arg, $condition)" or we need a special "set-builder" symbol which is only used for the "such that" construction.
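
As a rough sketch of the first option, this is what "looking up syntax from the annotated presentation tree" could look like using Python's standard `xml.etree.ElementTree`. The MathML fragment, the use of `arg` attributes to mark the referenced children, and the helper `separator_between_args` are all assumptions made for illustration, not something any AT is known to implement.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the two children referenced by the intent carry arg attributes.
MATHML = """
<mrow intent="set($arg, $condition)">
  <mo>{</mo>
  <mi arg="arg">x</mi>
  <mo>|</mo>
  <mrow arg="condition">
    <msup><mi>x</mi><mn>2</mn></msup><mo>&lt;</mo><mn>4</mn>
  </mrow>
  <mo>}</mo>
</mrow>
"""


def separator_between_args(root):
    """Return the text of the first <mo> between the two arg-marked children."""
    children = list(root)
    marked = [i for i, child in enumerate(children) if child.get("arg")]
    if len(marked) != 2:
        return None
    for child in children[marked[0] + 1 : marked[1]]:
        if child.tag == "mo":
            return (child.text or "").strip()
    return None


root = ET.fromstring(MATHML)
print(separator_between_args(root))
```

The lookup only works because the separator is a direct sibling of the marked arguments. With a dedicated "set-builder" name, the AT would not need to inspect the tree at all.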

@NSoiffer
Contributor Author

NSoiffer commented Jan 6, 2022

I strongly feel that anything not given by the value of intent or of its arguments is out of bounds. So for intervals, if only the start and end values are given as arguments, there have to be differently named interval intents. For sets, the | is inside the argument to set, so it is findable. It can be tricky, though, with something like { x | |x| < 2 } unless the absolute value has an intent.

FYI: MathCAT has the phases

  1. Canonicalization (which includes fixes to the MathML from poor generation): "MathML" -> "canonical" MathML
  2. Intent phase, including inferring intent when intent is not given: "canonical" MathML -> Intent (tree)
  3. Speech generation phase: Intent -> String

So in my implementation, it is actually impossible to know anything outside of the value of intent when generating speech. It also means that, in the set example with absolute value, finding the | that corresponds to "such that" is not a problem.
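
A toy Python stand-in for the last phase (Intent -> String) shows why nothing outside the intent value can matter there: the speech function receives only the intent tree. This is not MathCAT's code; the tuple representation, the intent names, and the speech templates are all hypothetical.

```python
# Toy stand-in for the speech phase only: Intent (tree) -> String.
# Intent trees are nested tuples (name, arg1, arg2, ...); leaves are strings.
# All names and templates below are hypothetical, not MathCAT's.

TEMPLATES = {
    "set": "the set of all {0} such that {1}",
    "absolute-value": "the absolute value of {0}",
    "less-than": "{0} is less than {1}",
    "open-interval": "the open interval from {0} to {1}",
}


def speak(intent):
    """Generate speech from an intent tree, using nothing but the tree itself."""
    if isinstance(intent, str):
        return intent
    name, *args = intent
    spoken_args = [speak(arg) for arg in args]
    template = TEMPLATES.get(name)
    if template is None:
        # Unknown ("open") name: fall back to reading the name and its arguments.
        if not spoken_args:
            return name
        return f"{name} of {', '.join(spoken_args)}"
    return template.format(*spoken_args)


tree = ("set", "x", ("less-than", ("absolute-value", "x"), "2"))
print(speak(tree))
# prints: the set of all x such that the absolute value of x is less than 2
```

Once the earlier phases have produced a tree like this, the vertical bars are gone, so there is nothing left to disambiguate at speech time.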

@davidfarmer
Contributor

I suggest that more is better. A set with items listed:

{1, 2, 3, 4, 5}

is different than a set constructed with "set builder" notation:

{x : x \in \Z, 0 < x < 6}.

Note that I used a colon as the separator, not a vertical line. I would not want to
be told that I had to use a vertical line or some special symbol.

I'd like to see (almost?) every previous item in this thread as its own separate
intent entity. For example:
\sum_{1 \le x \le M}
is mathematically the same as
\sum_{x=1}^M
but they are pronounced differently. I'd like to see the logic separating those
two cases happen before the MathML is generated. Maybe some AT is capable
of handling that, but why offload something which can be done with intent and
which is common enough to be in Level 0 or 1?

I think we can do this and also avoid the slippery slope of silliness like labeling
the number 4 with intent="4".
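
A rough sketch of what "separating those two cases before the MathML is generated" might look like on the authoring side, in Python. The function `choose_sum_intent` and the names `sum-from-to`, `sum-over`, and `sum-with-limits` are hypothetical, and the pattern match on the LaTeX source is deliberately naive.

```python
import re


def choose_sum_intent(lower, upper=None):
    """Pick a (hypothetical) intent name and arguments for a sum's limits.

    lower and upper are the LaTeX sources of the subscript and superscript.
    """
    # "x=1" style lower limit: a single-letter variable, "=", and a start value.
    match = re.fullmatch(r"\s*([A-Za-z])\s*=\s*(.+?)\s*", lower)
    if match and upper is not None:
        variable, start = match.groups()
        return ("sum-from-to", variable, start, upper.strip())
    if upper is None:
        # Only a lower limit, e.g. a condition or a domain.
        return ("sum-over", lower.strip())
    return ("sum-with-limits", lower.strip(), upper.strip())


print(choose_sum_intent("x=1", "M"))
print(choose_sum_intent(r"1 \le x \le M"))
```

With distinct names, the first call can be spoken along the lines of "the sum from x equals 1 to M" and the second as "the sum over 1 less than or equal to x less than or equal to M", without the AT having to inspect the limits itself.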

@NSoiffer added the "intent" label (Issues involving the proposed "intent" attr) on Jan 6, 2022
@NSoiffer
Contributor Author

Logging the WG's discussion summary:

Based on the WG meeting today, there was general consensus (but no official resolution) that more names are better than few names.

@NSoiffer
Contributor Author

No general philosophy, but more names are better as long as speech makes good use of them.

3 participants