Chemistry and math layout #92
On line-breaking, people do break in the middle of a formula, but a break should keep each element symbol together with its subscripts. So one might break "C6H12O6" as "C6-H12O6" or "C6H12-O6", but nowhere else. However, as you say, it's pretty unusual to break such cases: normally it's formal names that are 'fun'.
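The break rule above can be sketched in a few lines of Python. This is a minimal illustration, not taken from mhchem or any MathML renderer: it assumes a simple linear formula made only of element symbols (a capital letter plus an optional lowercase letter), each followed by an optional count, and the helper names `split_groups` and `break_candidates` are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: a "group" is an element symbol (capital letter plus an optional
# lowercase letter) followed by an optional numeric subscript.
GROUP = re.compile(r"[A-Z][a-z]?\d*")


def split_groups(formula: str) -> list[str]:
    """Split a simple linear formula such as 'C6H12O6' into element groups."""
    groups = GROUP.findall(formula)
    if "".join(groups) != formula:
        # Brackets, charges, dots, etc. are out of scope for this sketch.
        raise ValueError(f"unsupported formula: {formula!r}")
    return groups


def break_candidates(formula: str) -> list[str]:
    """List every single hyphenated break that keeps each group intact."""
    groups = split_groups(formula)
    return [
        "".join(groups[:i]) + "-" + "".join(groups[i:])
        for i in range(1, len(groups))
    ]


print(break_candidates("C6H12O6"))  # ['C6-H12O6', 'C6H12-O6']
```

A real line breaker would also need to weigh these candidates against each other; the sketch only enumerates the positions the rule permits.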
Perhaps worth noting on sub/superscripts that IUPAC has said that compound ions should have charges after any subscript numbers (https://iupac.org/wp-content/uploads/2015/07/Green-Book-PDF-Version-2011.pdf, p 51). Thus what in TeX-like terms might be expressed
I am the author of mhchem (for LaTeX, MathJax, KaTeX). First of all, math typesetting is well suited for chemistry, but there are many more fine details that you have not yet mentioned, like bonds, inner dashes and dots, italic prefixes, etc. Upright Greek characters are an important, but often missing, feature. You might want to take a look at https://mhchem.github.io/MathJax-mhchem/ to see a collection of examples. As there are so many fine details, a more structured approach than a thread of comments like this could be needed. From my experience, I would avoid semantic markup, i.e. giving each part of η²-C₂H₄ a description of why it is typeset as it is. The same notation (upright Greek, dash, dot, ...) can have several very special chemical meanings, depending on the field of chemistry, with new meanings being added (and forgotten) all the time. I can see in the examples above that you are suggesting using typographic semantics. This is exactly what I would recommend.
@mhchem thank you for your input. I hope we will not end up using deprecated versions this time.
"The same notation (upright greek, dash, dot, ...) can have several very special chemical meanings, depending on the field of chemistry, with new meanings being added (and forgotten) all the time."
Suppose I am authoring in mhchem in LaTeX. Is that markup semantic? The point of my question is that if the original source is semantic, then I would like to retain that information all the way to the browser.
I understand that the same notation (the same, visually) can have different meanings, but that is exactly the problem I want to address. A simple math example: what does |X| mean? The answer is that I can't tell without knowledge and context. And without knowing what it means, I can't pronounce it. But if the LaTeX source was \card{X}, then I know for sure that it means "cardinality of X", as opposed to absolute value or determinant. If the LaTeX source was |X|, then too bad. But my hope is that authors can be induced to write more semantically.
Hence my question: is mhchem semantic now? If not, how hard will it be to make it semantic? I am happy with an answer that applies most of the time, for the first half of the undergraduate curriculum.
The mhchem syntax is not semantic in your sense. Example: it says which part goes into a superscript, but it does not say why. I see that this leads to problems with speech output and machine interpretation. But I don't think users would be willing to use a dozen commands to create a semantic right-hand superscript when a simple
Great to see all this input! @davidcarlisle was tasked with finding out a bit more about the various TeX packages for chemistry and hence getting an expert like @mhchem providing input. I probably should have added those links into the original issue as looking at those packages can be instructive. Here are the links: He also included a link to siunitx, a package for units that is tangentially related to this thread. What struck me on skimming through them is how much they were focused on shortening the input. Here's an mhchem example from the pdf: One of the goals of the refresh effort is to be explicit about the layout rules. If we plan to include chemical layout in MathML (which I think we should), we need to make sure MathML can handle any differences. We also need to decide whether to add things to MathML such as attributes that make it easy to handle the differences or whether we require authoring tools to make the tweaks explicit. As a simple case,
Note that currently Hence, it is important to collect a list of the differences between math and chemistry. Once we have that list, we can look at various options for each and hopefully come up with a unified strategy for dealing with those differences. We may also find that MathML is missing a couple of features that need to be added.
You might also be interested in the mhchem for MathJax manual. It has more special syntax than the LaTeX version and a live "test-drive" at the end of the page where you can type in and see the results immediately. Most parts of chemical equations are in an upright font, but not all. If you think about extending Please think about chemistry-in-math and math-in-chemistry use cases, e.g. And here are a few examples that might need extended layout options:
I just went through the first dozen or so pages of the PDF documentation Reactions: There are a few other things, but in each case the notation seems to be unambiguous. Unless I am missing something, this situation is pretty similar to how I view LaTeX markup. Is an equals sign distinguished from a double bond by the spaces around it?
Well, what is semantic? The mhchem syntax is a typographic notation; the transformation mhchem->LaTeX is unambiguous. Yes, spaces are a very important semantic element in mhchem syntax. Don't expect a finite list of elements. Chemists make up new names all the time (D, T, M, THF); some are just conventions within a single article.
One interesting thing people can do with the live demo of mhchem for MathJax is to see the generated MathML 3 code, using the view-MathML-code option in the MathJax right-click menu. One thing I notice trying a few examples there is the use of the phantom X appearing in several places (acting as a \mathstrut to force the position of superscripts to a fixed height not depending on the base); it might be nice if we could make that simpler. In full, I picked this example
which generated this MathML
Let's use a different term than "semantic". Let's say "requires only
local context to infer meaning". And let's stick to the undergraduate
curriculum: of course some researcher can write a paper using some
crazy and inconsistent notation. That markup will be misinterpreted
by a screen reader no matter what we do.
By "local context" I mean: only that one equation, and little
knowledge of the subject.
My assertion is that mhchem syntax "requires only local context to
infer meaning". In particular, "^" does not mean "exponent". It means
"oxidation number" when followed by roman numerals, and it means
"charge" when followed by an integer and a plus or minus.
Thus, one could process mhchem syntax into a form that explicitly
(and verbosely) encodes the semantics.
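The two rules just stated can be made concrete. The following Python sketch is hypothetical (it is not part of mhchem): it classifies the text after `^` using nothing but that text, treating an optionally negative Roman numeral from I to IX as an oxidation number and an optional integer followed by `+` or `-` as a charge. Anything else is reported as needing more context.

```python
import re

# Hypothetical classifier, not mhchem's own code. Roman numerals I to IX,
# optionally negative, are read as oxidation numbers.
OXIDATION = re.compile(r"-?(?:I{1,3}|IV|VI{0,3}|IX)")
# An optional integer followed by + or -, e.g. "2-" or "+", is read as a charge.
CHARGE = re.compile(r"\d*[+-]")


def classify_superscript(sup: str) -> str:
    """Infer the meaning of a superscript from the superscript text alone."""
    if OXIDATION.fullmatch(sup):
        return "oxidation number"
    if CHARGE.fullmatch(sup):
        return "charge"
    return "unknown"  # this sketch has no rule for it


for source in ["Fe^III", "SO4^2-", "Na^+", "NO^*"]:
    base, sup = source.split("^", 1)
    print(base, "->", classify_superscript(sup))
```

Note that a bare `^0` comes out as "unknown": deciding between charge and oxidation number there would require looking at the base as well, which this sketch deliberately does not do.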
It is the same for math: "^" does not mean "superscript". It means
"upper limit" when following \int, and it means other things depending
on where it occurs, and one only needs local information to deduce
what it is. As long as one uses "^" only for those cases where local
context determines meaning, all is good and we have semantic source.
Of course, someone could write A^T for "the transpose of A". But they
shouldn't. They should write \transpose{A} . That has the added benefit
of making it easy to use the convention of writing the "T" on the left,
or as lower case. The markup is the same, but the macro definition is
different. For this to work, one needs to think in terms of encoding
the meaning, not encoding the appearance. And there may need to be an
extra preprocessing step before converting to the output format.
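The idea that the markup encodes the meaning while the macro definition chooses the appearance can be illustrated outside TeX. The Python sketch below is purely hypothetical: `Macro`, `make_macros` and the `transpose_style` values are made up for illustration, and the generated strings only imitate LaTeX. Two macros render identically as `|X|` yet carry different spoken forms, and switching the transpose convention changes the rendering without touching the markup or the speech.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Macro:
    render: Callable[[str], str]  # appearance, as a LaTeX-like string
    speak: Callable[[str], str]   # what a screen reader should say


def make_macros(transpose_style: str = "upper-right") -> dict[str, Macro]:
    """Build a hypothetical macro table; only the rendering depends on style."""
    transpose_render = {
        "upper-right": lambda a: f"{a}^{{\\mathrm{{T}}}}",
        "left": lambda a: f"{{}}^{{\\mathrm{{T}}}}{a}",
        "lowercase": lambda a: f"{a}^{{\\mathrm{{t}}}}",
    }[transpose_style]
    return {
        "card": Macro(lambda x: f"|{x}|", lambda x: f"the cardinality of {x}"),
        "abs": Macro(lambda x: f"|{x}|", lambda x: f"the absolute value of {x}"),
        "transpose": Macro(transpose_render, lambda a: f"the transpose of {a}"),
    }


macros = make_macros()
print(macros["card"].render("X"), "/", macros["card"].speak("X"))
print(macros["abs"].render("X"), "/", macros["abs"].speak("X"))
print(make_macros("left")["transpose"].render("A"))
```

The point of the design is that the author writes `transpose` once, and the choice between upper-right, left, or lowercase T lives in a single definition.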
I'd say, in more than 99% of the cases, one can infer the meaning by "local context". I skipped through the Green Book and the Red Book and found at least these meanings of right superscripts:
There are more, for sure. I don't see your point, why one should not be able to infer the meaning of
I like the list of 9 examples of "^" in mhchem. It seems like the
first 5 occur when an element is to the left of the "^", and what
occurs to the right is different. Thus, all 5 of those are unambiguous.
I don't know enough about the others to say anything.
Back to A^t, which I wrote this time, as some people do,
with a lower case "t". It is impossible to tell what A^t
means if that is all you see. It could be an exponential
function of the real variable t. I'd guess it is more often
that, than it is the transpose of the matrix A. And when all
you have is A^t, you don't even know whether or not A stands
for a matrix.
That is what can go wrong if a screen reader always reads A^t
as "the transpose of A". And that is why the meaning should be
encoded, instead of guessed.
On Fri, 31 May 2019, mhchem wrote:
I'd say, in more than 99% of the cases, one can infer the meaning by "local context". I skipped through the
Green Book and the Red Book and found at least these meanings of right superscripts:
* charge: ^- ^2- ^3- ^+ ^2+ ^3+ ^0 (when on a particle)...
* oxidation: ^I ^II ^III ^IV ^-I ^-II ^-III ^0 (when at an element) ^{(I)} ^{(II)} ^{(III)} ...
* excited: ^*
* radical: ^. ^2.
* radical and charge: ^.- ^(2.)- ^(2.)2+ ...
* Kroeger Vink notation has completely different semantics: ^x ^. ^.. ^2. ^' ^''
* hapticity: \eta^2 \eta^3 \eta^4
* number of donor atoms: \kappa^2
* (bonding number: \lambda^5)
There are more, for sure.
I don't see your point, why one should not be able to infer the meaning of A^{\mathrm{T}}. (A^T would
definitely be false.) What could go wrong if a screenreader read every instance of "latin uppercase italic
letter, with a right superscript upright T" as "the transpose of A" (or whatever letter)?
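The reading rule proposed in the quoted comment can be written down against presentation MathML. This Python sketch is an assumption-laden illustration, not a screen-reader API: it expects a namespace-free `<msup>` fragment, treats a single-character `<mi>` without `mathvariant` as italic (the MathML default), and reads the expression as a transpose only when the base is an italic uppercase Latin letter and the script is an upright `T` (`mathvariant="normal"`). Everything else falls back to a plain "superscript" reading.

```python
import xml.etree.ElementTree as ET


def speak_msup(mathml: str) -> str:
    """Read a namespace-free <msup> fragment aloud using only local context."""
    node = ET.fromstring(mathml)
    if node.tag != "msup" or len(node) != 2:
        raise ValueError("expected <msup> with exactly two children")
    base, script = node
    base_text, script_text = base.text or "", script.text or ""
    # A single-character <mi> is italic by default in MathML.
    base_is_italic_capital = (
        base.tag == "mi"
        and len(base_text) == 1
        and "A" <= base_text <= "Z"
        and base.get("mathvariant", "italic") == "italic"
    )
    script_is_upright_t = (
        script.tag == "mi"
        and script_text == "T"
        and script.get("mathvariant") == "normal"
    )
    if base_is_italic_capital and script_is_upright_t:
        return f"the transpose of {base_text}"
    return f"{base_text} superscript {script_text}"


print(speak_msup('<msup><mi>A</mi><mi mathvariant="normal">T</mi></msup>'))
print(speak_msup("<msup><mi>A</mi><mi>T</mi></msup>"))
```

The second call shows that an italic T (the `A^T` case the comment calls false) is not read as a transpose. Whether such a purely typographic rule is safe in general is exactly what this thread disputes.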
We are deviating too much here, I guess, but let me add 2 points. First, when you argue with "some people write it non-standard", that can be the case in chemistry too. Second, your mathematical notation is sloppy. An operator is to be set in an upright font, a variable in italics. (This is a universal scientific notation. Looking at StackExchange, the chemical community observes this much more strictly than the physics and mathematics communities.)
I appreciate that there are standards for typography, but it is a fact that most popular But the more important point I want to make is that no amount of typography addresses Actually, the motivation is making it possible to pronounce correctly without having The point of this thread is how to do similar encoding for chemistry, so that
I was talking about the T or t. These are operators, so they are to be typeset upright, so there is no confusion with a variable t. I'll use this opportunity to bring the thread back to the chemistry topic: I recommend the IUPAC document "On the use of italic and roman fonts for symbols in scientific text".
Lots of good discussion, but I don't see anything here that
Layout of chemical formulas is very similar to laying out math. People often use math editors to enter those formulas, which means they will show up as MathML on the web.
Here are some known differences:
1. … `mi` will normally use italics.
2. … `msubsup`).
Note: @davidcarlisle was tasked with looking into the TeX chemistry packages, and those packages might reveal other layout differences.
'1' can be solved with MathML today by using `mathvariant="normal"`. Alternatively, maybe the sans-serif math alphanumerics in Unicode can be used if chemical elements are always sans-serif. However, these solutions don't help with semantics (#92), so another solution might be preferable. For units (meters, seconds, etc.), the MathML WG came out with a note that suggests using `class="MathML-Unit"`. A similar thing could be done here. Alternatively, these might be tagged with some sort of "role" information, and that semantic info could be pulled into the rendering. Personally, I think semantics and display should be kept separate.
'2' can be solved with MathML today with a hack: using `mphantom` or `mspace` or something else for the empty script if only one real script is present. If '1' is solved via some semantic info that says this is a chemical element, then the layout rules can be adjusted using this info. As with '1', I personally think semantics and display should be kept separate. Allowing `none` as a value for `msubsup` is cleaner than `mphantom`, etc. Another alternative is to introduce an attribute on `msub` and `msup` that says "use `msubsup` layout rules", e.g., `msubsuplayout=true/false`, with default `false`.
Until we know more about linebreaking of chemical formulas (which should be really rare), I don't have any proposals.
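Options '1' and '2' can be tried with today's MathML. The Python sketch below (hypothetical helper names, simple formulas only) marks element symbols upright with `mathvariant="normal"` and, following the IUPAC convention mentioned earlier in the thread, places a charge after the count by putting both scripts on one `msubsup`. It does not use the proposed `none` value or `msubsuplayout` attribute, which are not part of MathML, and it attaches the charge to the last element's base, which is a layout choice rather than a claim that the charge belongs to that atom.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: formulas are element symbols with optional counts, e.g. "SO4".
GROUP = re.compile(r"([A-Z][a-z]?)(\d*)")


def _mi(symbol: str) -> ET.Element:
    mi = ET.Element("mi", mathvariant="normal")  # upright element symbol
    mi.text = symbol
    return mi


def _mn(number: str) -> ET.Element:
    mn = ET.Element("mn")
    mn.text = number
    return mn


def _charge(charge: str) -> ET.Element:
    match = re.fullmatch(r"(\d*)([+-])", charge)
    if not match:
        raise ValueError(f"unsupported charge: {charge!r}")
    row = ET.Element("mrow")
    if match.group(1):
        row.append(_mn(match.group(1)))
    sign = ET.SubElement(row, "mo")
    sign.text = "\u2212" if match.group(2) == "-" else "+"  # U+2212 MINUS SIGN
    return row


def formula_to_mathml(formula: str, charge: str = "") -> str:
    groups = GROUP.findall(formula)
    if "".join(sym + count for sym, count in groups) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    math = ET.Element("math")
    row = ET.SubElement(math, "mrow")
    for i, (sym, count) in enumerate(groups):
        if i == len(groups) - 1 and charge:
            # Count first, charge second: one base carries both scripts.
            node = ET.SubElement(row, "msubsup" if count else "msup")
            node.append(_mi(sym))
            if count:
                node.append(_mn(count))
            node.append(_charge(charge))
        elif count:
            node = ET.SubElement(row, "msub")
            node.append(_mi(sym))
            node.append(_mn(count))
        else:
            row.append(_mi(sym))
    return ET.tostring(math, encoding="unicode")


print(formula_to_mathml("H2O"))
print(formula_to_mathml("SO4", "2-"))
```

For an ion with only a charge, such as Na⁺, the sketch falls back to `msup`, which is where the script-position difference described under '2' would still show up.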
Using some of the above to put up a straw man proposal
Here's an alternative using "none":