
Support for accented characters (á,é,à,è, etc) in math mode. #555

Open
useredsa opened this issue Oct 20, 2020 · 9 comments

Comments

@useredsa

Description

Being able to type á instead of \acute{a} in math mode.


  • Compiler: lualatex
  • Usage of the package: default
\usepackage{fontspec,unicode-math} % Required for using utf8 characters in math mode
\setmathfont{texgyrepagella-math.otf}

Example

\documentclass{article}
\usepackage{unicode-math}
\setmathfont{texgyrepagella-math.otf}
\begin{document}
\[
  x = á
\]
\end{document}
@hpfr

hpfr commented Feb 18, 2022

It would be great if combining accents worked with this, too, e.g. \(p̂\) in the source expanding to \(\hat{p}\) internally (and of course still copyable as p̂ from the PDF output).

Note that GitHub's monospace fonts may mangle the above example, so feel free to override the fonts or paste elsewhere for viewing.

@ArchangeGabriel
Contributor

@hpfr That works with a bit of Lua (so using LuaLaTeX):

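% \afteracc [bot] <charcode>: wrap the last simple atom (noad) of the current math
% list in an accent noad that uses <charcode> as its top (or, with bot, bottom) accent.
\protected\def\afteracc{\directlua{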
    local nest = tex.nest[tex.nest.ptr]
    local last = nest.tail
    if not (last and last.id == 18) then
      error'I can only put accents on simple noads.'
    end
    if last.sub or last.sup then
      error'If you want accents on a superscript or subscript, please use braces.'
    end
    local acc = node.new(21, 1)
    acc.nucleus = last.nucleus
    last.nucleus = nil
    local is_bottom = token.scan_keyword'bot' and 'bot_accent' or 'accent'
    acc[is_bottom] = node.new(23)
    acc[is_bottom].fam, acc[is_bottom].char = 0, token.scan_int()
    nest.head = node.insert_after(node.remove(nest.head, last), nil, acc)
    nest.tail = acc
    node.flush_node(last)
}}
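% At \begin{document}, re-read unicode-math's symbol table and make each top accent
% (\mathaccent) and wide bottom accent (\mathbotaccentwide) character math-active,
% so that typing it calls \afteracc with its own code point.
\AtBeginDocument{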
\begingroup
  \def\UnicodeMathSymbol#1#2#3#4{%
    \ifx#3\mathaccent
      \def\mytmpmacro{\afteracc#1 }%
      \global\letcharcode#1=\mytmpmacro
      \global\mathcode#1="8000
    \else\ifx#3\mathbotaccentwide
      \def\mytmpmacro{\afteracc bot#1 }%
      \global\letcharcode#1=\mytmpmacro
      \global\mathcode#1="8000
    \fi\fi
  }
  \input{unicode-math-table}
\endgroup
}
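
A minimal usage sketch of the snippet above, assuming it is pasted into the preamble after unicode-math is loaded and the document is compiled with LuaLaTeX. The ^^^^ escapes (LuaTeX's notation for a character given by four hex digits) stand in for typing the combining characters directly, since editors and fonts may mangle them; this sketch has not been compiled here.

```latex
\documentclass{article}
\usepackage{unicode-math}
\setmathfont{texgyrepagella-math.otf}
% Paste the \afteracc definition and the \AtBeginDocument block from above here.
\begin{document}
% p followed by U+0302 COMBINING CIRCUMFLEX ACCENT; meant to typeset like \hat{p}
\(p^^^^0302\)
% x followed by U+0307 and U+0308; meant to typeset like \dot{x} and \ddot{x}
\(x^^^^0307 + x^^^^0308\)
\end{document}
```

If the accent follows an atom that already has a sub- or superscript (for example x^2 followed by U+0307), the snippet stops with its own error message asking for braces.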

@ArchangeGabriel
Contributor

And I use it extensively for things like ẋ and ẍ in mechanics (\dot{x} and \ddot{x}, entered as x followed by U+0307 and x followed by U+0308).

Using this á (U+61 U+301) would work, but note that á (U+E1) does not. If you find a way to make that work, it would indeed be nice, as I could then type ẋ and ẍ using my ̇ and ¨ dead keys instead of entering code points. I guess it should be possible with logic similar to https://github.com/wspr/unicode-math/blob/master/um-code-sscript.dtx, which maps some Unicode code points to actual code.

@hpfr

hpfr commented Feb 19, 2022

@ArchangeGabriel You should try https://www.ctan.org/pkg/inputnormalization with \Uinputnormalization=2. That should convert the single-character variants in your document to the separate forms, but I'm not sure if it will happen before your Lua processing.

Report back if that works for both the dotted letters and á!
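
A sketch of the inputnormalization route, assuming the package is loaded with \usepackage{inputnormalization} and that \Uinputnormalization=2 selects decomposition (NFD) as described above; neither detail has been checked here against the package documentation. With decomposition on, the precomposed á (U+00E1) should reach the accent snippet as a followed by U+0301.

```latex
% Compile with LuaLaTeX.
\documentclass{article}
\usepackage{unicode-math}
\usepackage{inputnormalization} % assumed loading command for the CTAN package
\Uinputnormalization=2 % assumed: decompose precomposed characters (NFD)
% Paste the \afteracc definition and the \AtBeginDocument block from above here.
\begin{document}
% U+00E1 in the source; after decomposition it is a + U+0301, i.e. \acute{a}
\(x = á\)
\end{document}
```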

@hpfr

hpfr commented Feb 19, 2022

Actually, I just tried it with your snippet and can confirm it works!

I'm not too familiar with Lua or LaTeX. Could you describe in prose what exactly the snippet you posted does and where it applies? Is it only active in math environments? I'm trying to determine whether this solves the entire issue. Also, what are "simple noads"? If you set a math font that included glyphs for accented characters and combining accents, would this snippet use them, or would it still replace them with TeX macros?

@davidcarlisle
Member

I'd like to add a dissenting voice here.

> It would be great if combining accents worked with this, too, e.g. \(p̂\) in the source expanding to \(\hat{p}\) internally (and of course still copyable as p̂ from the PDF output).

Actually, I think it is fundamentally wrong to associate the Unicode accented letters with the math accents. Classical TeX had good reason to use separate commands for text and math here, and using Unicode input doesn't really change that. á is the letter a-acute, and there is no reason why the math font in use should not have a glyph in the U+00E1 slot; if it does, that glyph would be accessed with the input á. However, \acute{a} is different: it is whatever math operation is being denoted by the acute accent applied to the letter a. It should never cut and paste as the letter á.

Accented letters already work in math now if the font supports them. Most math fonts don't, but the font used by \mathrm typically does, even in math mode:


\documentclass{article}

\usepackage{unicode-math}
\setmathfont{STIX Two Math}
\begin{document}

$\mathrm{á} \neq \acute{\mathrm{a}}$

\end{document}

It's not unreasonable to want accented letters (or more generally non-ASCII characters) for operator names etc., but these should be taken from a font that supports the characters, not built with the math accent mechanism.
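
Following that suggestion, a minimal sketch of an operator name whose accented letter comes from a font that has the glyph, rather than from a math accent. It reuses the \mathrm pattern from the \mathrm{á} example above; the macro name \Lim is hypothetical, and whether \mathrm covers a given letter depends on the fonts in use.

```latex
\documentclass{article}
\usepackage{unicode-math}
\setmathfont{STIX Two Math}
% Hypothetical operator: the name uses the precomposed letter í (U+00ED), set
% upright via \mathrm, so the accent belongs to the glyph, not to a math accent.
\newcommand{\Lim}{\mathop{\mathrm{lím}}}
\begin{document}
\[ \Lim\limits_{x \to 0} f(x) \]
\end{document}
```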

@hpfr

hpfr commented Feb 19, 2022

This is a good point. It's desirable for semantics to remain consistent between formats. However, I also like to have a more concise, readable source in my documents. I actually want to represent accent operators as closely as possible with Unicode.

It appears

\documentclass{article}
\usepackage[stixtwo]{fontsetup} % loads unicode-math
\begin{document}
\(\hat{p}\)
\end{document}

already copies as 𝑝 ̂ for me (mathematical italic small p, combining circumflex accent). So it seems unicode-math already lets the mathematical hat operation copy as the circumflex accent applied to the letter p.

Also, the two acutes in your example only look different because you're using Latin Modern (or equivalent) as the text font. If you use STIX Two Text, the two acutes you demonstrate will be visually indistinguishable and will copy out the same, although one will be the backwards-compatible LATIN SMALL LETTER A WITH ACUTE and the other LATIN SMALL LETTER A followed by COMBINING ACUTE ACCENT. They would be exactly the same if you used the separate characters in your source.

This seems to indicate that, on the copyable output side of things, unicode-math is fine with conflating math operator accents with accented letters.

What do you think about this? Personally, I like having Unicode approximations of these mathematical operations and think it's better for the hat to copy out as Unicode. On the input side, I think it is inadvisable for authors to mix accented letters (as symbols or as part of function/operator names) with letters carrying mathematical operator accents, because there is no way to distinguish them in the output. So I'm personally in favor of letting authors set an option to treat accents in the source as \hat, \acute, etc. in math, once they have decided not to use accented letters as symbols or in operator names, to avoid visual ambiguity.

@davidcarlisle
Member

> I think authors mixing using accented letters as symbols or as part of function/operator names with using letters with mathematical operator accents is not advisable because there is no way to distinguish them in the output, so I'm personally in favor of allowing authors to set an option to treat accents in the source as \hat or \acute, etc in math,

Yes, it would have to be an option at best, because saying it's not advisable doesn't help in some contexts. Take this Spanish example I just borrowed from StackExchange (I'm not sure which of the options here is actually in use): it demonstrates the end-user requirement to have lím as a math operator where the accent is not a math accent in the sense discussed here.


\documentclass{article}
\usepackage[spanish,es-ucroman,es-noindentfirst,es-nosectiondot,es-noenumerate,es-noitemize,es-noquoting,es-notilde,es-nodecimaldot,]{babel}

\spanishplainpercent

\begin{document}
$\lim x$
\end{document}

@hpfr

hpfr commented Feb 20, 2022

That's a good example. I'm definitely in favor of it being opt-in. But for the sake of argument, I think the only time when the internal representation of \mathrm{lím} being \mathrm{l\acute{i}m} would be noticeable is when you use a text font with an acute accent that's visually distinct from your math font's acute accent, as you did previously. That should be respected, of course, so it should be opt-in.

Actually, I think such an option should probably limit itself to non "text math" environments (meaning only math environments where text is interpreted as symbols, so plain math text and \sym environments) because I don't think there would be any difference in this case. Also, the only accents you would use in text math are probably already represented in your text math font, so there wouldn't be much to gain. @ArchangeGabriel is there a way to limit your Lua to only act on non "text math" math environments?
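
Since the thread converges on making this opt-in, one way to sketch that is to wrap the \AtBeginDocument part of the snippet above in a switch. The conditional name \ifmathsourceaccents is made up for this sketch, it assumes \afteracc from the snippet is already defined, and it does not address limiting the behavior to non-"text math" contexts.

```latex
% Hypothetical opt-in switch; \ifmathsourceaccents is a made-up name.
% Assumes \afteracc from the snippet above is already defined.
\newif\ifmathsourceaccents
\mathsourceaccentstrue % opt in; delete this line to leave accented input alone
\AtBeginDocument{%
  \ifmathsourceaccents
    \begingroup
      % Same table walk as in the snippet above, now only run when opted in
      \def\UnicodeMathSymbol#1#2#3#4{%
        \ifx#3\mathaccent
          \def\mytmpmacro{\afteracc#1 }%
          \global\letcharcode#1=\mytmpmacro
          \global\mathcode#1="8000
        \else\ifx#3\mathbotaccentwide
          \def\mytmpmacro{\afteracc bot#1 }%
          \global\letcharcode#1=\mytmpmacro
          \global\mathcode#1="8000
        \fi\fi
      }%
      \input{unicode-math-table}%
    \endgroup
  \fi
}
```

Because the test happens inside \AtBeginDocument, the switch can be set anywhere in the preamble.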
