Collect usage statistics for current MathML elements #55

Closed
physikerwelt opened this issue Feb 25, 2019 · 32 comments
Labels: compatibility (Issues affecting backward compatibility), group admin (Tracking agenda and other administrative issues)

Comments

@physikerwelt
Member

@physikerwelt I think in general it would be good to gather usage metrics of elements/attributes that are proposed for deprecation/removal. Maybe you can do that for wikipedia.

Originally posted by @fred-wang in #1 (comment)

@fred-wang

I think we should probably send an email to the Math WG mailing list to see if MathML users or developers of MathML authoring tools can provide more data.

@NSoiffer
Contributor

I suspect that we need to ask specific questions when we ask for data. For example:

  • what percentage and/or number of expressions in your database use mstyle?
  • what attributes are used on mstyle?
  • are small, normal, big used with mathsize?

@fred-wang

@dginev @kohlhase @brucemiller would you be able to provide info for LaTeXML / Arxiv?
@physikerwelt would you be able to provide info for Mathoid / Wikipedia?

@dginev
Contributor

dginev commented Mar 20, 2019

Sure. As @NSoiffer suggests, you could ask what specific statistics may interest you, and I could generate a report. We have been publishing our recent arXiv HTML5 datasets (1.2 million papers with ~500 million math elements) and it is easy to extract some information on frequencies of math elements and attributes. E.g. just counting the math elements and their attributes is somewhat direct.

That said, since the arXMLiv resource is generated via latexml, it may have better behaved MathML than an arbitrary web page, and Bruce can already directly tell you which of the removal suggestions would require latexml changes.

Edit: I've started a stats collection job, should take a couple of days to finish and report back.
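
A minimal sketch of the kind of frequency count described above, assuming the input is well-formed XML with MathML in its usual namespace. This is not the actual arXMLiv/llamapun job; the function name `count_mathml` and the sample markup are made up for illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def count_mathml(xml_text):
    """Return (element_counts, attribute_counts) for MathML-namespaced elements."""
    elements = Counter()
    attributes = Counter()
    prefix = "{" + MATHML_NS + "}"
    for node in ET.fromstring(xml_text).iter():
        # ElementTree spells namespaced tags as "{namespace}localname".
        if not node.tag.startswith(prefix):
            continue
        name = node.tag[len(prefix):]
        elements[name] += 1
        for attr in node.attrib:
            # Key attributes by (element, attribute) so mstyle@mathvariant
            # can be told apart from mi@mathvariant.
            attributes[(name, attr)] += 1
    return elements, attributes


# Hypothetical sample input, not taken from any of the datasets in this thread.
SAMPLE = """<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mstyle displaystyle="true"><msup><mi mathvariant="normal">e</mi><mn>2</mn></msup></mstyle>
</math>"""

elements, attributes = count_mathml(SAMPLE)
print(elements.most_common())
print(attributes.most_common())
```

Real HTML5 pages are not guaranteed to parse as XML, so a corpus-scale run would need an HTML parser in place of `ET.fromstring`.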

@fred-wang

Thanks. I'm going to prepare a set of questions. I think both are interesting as it's possible that tools can generate some specific MathML element/attribute but that they are not really used in practice.

@fred-wang added the compatibility label Mar 20, 2019
@fred-wang

fred-wang commented Mar 20, 2019

This survey intends to track usage statistics of MathML in order to get a better idea of what should belong to MathML Core, what should belong to MathML 4, and what should be deprecated. Please answer the following questions as accurately as you can:

  1. Description. Please describe the MathML database / authoring tool (e.g. Wiki, digital library, latex-to-mathml converter, WYSIWYG MathML editor, computer algebra system, etc):

  2. Native MathML. Does your database / tool serve MathML content to native web engines (e.g. Firefox, iOS WebView, ...)?

  3. MathML elements. Please provide usage percentage for MathML elements in your database / list of generated MathML elements by your tool. Does your database / tool rely on the following elements?
    munder, mover, msub, msup, msubsup, mlabeledtr, merror, mphantom, maction, mglyph, mfenced, mstyle, ms.

  4. MathML attributes. Please provide usage percentage for MathML attributes in your database / list of generated MathML attributes by your tool. Does your database / tool rely on the following attributes?
    mathvariant, numalign, denomalign, align (on munderover/munder/mover), bevelled, subscriptshift, superscriptshift, other, macros, mode, fontfamily, index, fontweight, fontstyle, fontsize, color, background, veryverythinmathspace, verythinmathspace, thinmathspace, mediummathspace, thickmathspace, verythickmathspace, veryverythickmathspace

  5. Attributes on the mstyle element. Does your database / tool use attributes on the mstyle element other than the following ones?
    displaystyle, dir, mathsize, mathbackground, mathcolor, mathvariant, scriptlevel

  6. Attribute values. Does your database / tool use any of the following attribute values?

  • linethickness attribute with value "thin", "thick" or "medium"
  • mathsize attribute with value "small", "normal" or "big"
  • attribute defined as a length whose value is a nonzero number without unit (e.g. "4"); this excludes mglyph@index, scriptlevel, mtd@rowspan, mtd@columnspan, maction@selection, msgroup@position, msgroup@shift, msrow@position, mscarries@position, msline@position, msline@length
  • attribute with value "veryverythinmathspace", "verythinmathspace", "thinmathspace", "mediummathspace", "thickmathspace", "verythickmathspace" or "veryverythickmathspace".
  • notation attribute containing the value "radical" (e.g. notation="radical circle")
  • attribute with leading or trailing white space characters (U+0020, U+0009, U+000A, U+000D or U+000C). For example width=" 5em ".
  7. Trailing/leading whitespace in token elements. Does your database / tool use any token elements (mi, mn, mo, mtext, ms) whose text content has leading or trailing white space characters (U+0020, U+0009, U+000A, U+000D or U+000C)? For example <mi> x </mi>.
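
The attribute-value and white space questions can be answered with a plain scan of the markup. The sketch below is one possible way to do it, assuming well-formed XML input; the function `survey_flags`, the reason strings, and the `LENGTH_ATTRS` set (a small, non-exhaustive sample of length-valued attributes) are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"
WS = " \t\n\r\f"  # the white space characters listed in the survey
NAMED_SPACES = {
    "veryverythinmathspace", "verythinmathspace", "thinmathspace",
    "mediummathspace", "thickmathspace", "verythickmathspace",
    "veryverythickmathspace",
}
TOKEN_ELEMENTS = {"mi", "mn", "mo", "mtext", "ms"}
# Assumption: a small, non-exhaustive sample of length-valued attributes.
LENGTH_ATTRS = {"width", "height", "depth", "voffset", "lspace", "rspace",
                "minsize", "maxsize", "linethickness"}
NUMBER = re.compile(r"-?(\d+(\.\d*)?|\.\d+)")


def is_unitless_nonzero(value):
    """True for values such as "4" or "-0.5", false for "0" or "4em"."""
    return bool(NUMBER.fullmatch(value)) and float(value) != 0


def survey_flags(xml_text):
    """Return a list of (element, attribute, reason) tuples."""
    flags = []
    for node in ET.fromstring(xml_text).iter():
        if not node.tag.startswith(MATHML_NS):
            continue
        name = node.tag[len(MATHML_NS):]
        for attr, value in node.attrib.items():
            stripped = value.strip(WS)
            if stripped != value:
                flags.append((name, attr, "leading/trailing white space"))
            if stripped in NAMED_SPACES:
                flags.append((name, attr, "named space"))
            if attr == "linethickness" and stripped in {"thin", "medium", "thick"}:
                flags.append((name, attr, "named linethickness"))
            if attr == "mathsize" and stripped in {"small", "normal", "big"}:
                flags.append((name, attr, "named mathsize"))
            if attr == "notation" and "radical" in stripped.split():
                flags.append((name, attr, "radical notation"))
            if attr in LENGTH_ATTRS and is_unitless_nonzero(stripped):
                flags.append((name, attr, "unitless length"))
        if name in TOKEN_ELEMENTS:
            text = "".join(node.itertext())
            if text != text.strip(WS):
                flags.append((name, None, "token white space"))
    return flags


# Hypothetical sample that triggers several of the checks.
SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mfrac linethickness="thick"><mi> x </mi><mn>2</mn></mfrac>'
    '<mspace width=" 5em "/><mo lspace="thinmathspace">+</mo>'
    '<mpadded width="4"><mi>y</mi></mpadded></math>'
)

for flag in survey_flags(SAMPLE):
    print(flag)
```

An XML parser rejects U+000C and normalizes tabs and newlines inside attribute values, so content served as HTML would need an HTML parser to see those cases faithfully.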

@fred-wang

I wrote a basic survey in #55 (comment); the data can be gathered with basic search features and does not require actual knowledge of MathML semantics.

@fred-wang

fred-wang commented Mar 20, 2019

  1. TeXZilla, LALR Javascript Unicode LaTeX-to-MathML converter
  2. Yes, it has a web page https://fred-wang.github.io/TeXZilla/ and a Firefox add-on.
  3. annotation, maction, math, menclose, merror, mfrac, mi, mmultiscripts, mn, mo, mover, mpadded, mphantom, mprescripts, mroot, mrow, ms, mspace, msqrt, mstyle, msub, msubsup, msup, mtable, mtd, mtext, mtr, munder, munderover, none, semantics
    It does not use mglyph, mfenced or mlabeledtr.
  4. actiontype, align, colspan, columnalign, columnlines, depth, dir, display, displaystyle, equalcolumns, equalrows, frame, height, linethickness, lspace, mathbackground, mathcolor, mathvariant, maxsize, minsize, notation, rowlines, rowspacing, rowspan, rspace, scriptlevel, stretchy, voffset, width, xmlns
    No, except mathvariant.
  5. It only uses displaystyle, scriptlevel, mathcolor, mathbackground ; dir/mathsize are used on the math element ; mathvariant is used on mstyle in some exceptional situations (always used prior to version 1.0.1).
  6. No (named spaces on mo and mspace prior to version 1.0.0)
  7. No.

@davidcarlisle
Collaborator

Results of survey for NAG manual (internal draft, but basically https://www.nag.co.uk/numeric/fl/nagdoc_fl26.2/html/frontmatter/manconts.html)

  1. essentially hand authored (with some XSLT post processing) Mostly using emacs nxml-mode

  2. mathml in HTML5 by default served as-is to firefox, via mathjax to other browsers.

  3. full detail at end, no use of mlabeledtr, merror, maction, mglyph

  4. full detail at end, uses mathvariant but not the others you list other than a few (removable) uses of other

  5. only displaystyle and mathcolor

  6. No, other than some use of lspace="thinmathspace"

  7. no

Details


436,262 math expressions
2,623,875 mathml elements

elements used

436262 instances
<math
 display="block"
 displaystyle="true"
>

58 instances
<menclose
  notation="bottom"
>

133002 instances
<mfenced
separators=","
separators=""
open="|"
open="'"
open=""
open="("
open="["
open="{"
open="&#x2016;"
open="&#x2308;"
open="&#x230a;"
open="&#x2329;"
close="|"
close="'"
close=""
close=")"
close="["
close="]"
close="}"
close="&#x2016;"
close="&#x2309;"
close="&#x230b;"
close="&#x232a;"
close="&#xa0;"
>

8653 instances
<mfrac
 other="display"
 other="small"
>

818436 instances
<mi
 href=< URL >
 mathcolor=< #hex >
 mathvariant= bold|bold-italic|italic|monospace|normal|script
>

244 instances
<mmultiscripts>

281635 instances
<mn
 href=< URL >
 mathcolor=< #hex >
 mathvariant= bold|bold-italic|italic|monospace|normal|script
>

469293 instances
<mo
 lspace="0pt"
 rspace="0pt"
 lspace="thinmathspace"
 mathvariant="bold|normal"
 minsize="< length >em"
 other="big"
>

9561 instances
<mover>

2026 instances
<mpadded
  width="< length >em"
  height="< length >em"
  depth="< length >em"
  voffset="< length >em"
>

3914 instances
<mphantom>

244 instances
<mprescripts>


36 instances
<mroot>

157210 instances
<mrow>

127 instances
<ms>

12604 instances
<mspace
 linebreak="newline"
 width=< length >em
 >
 

2637 instances
<msqrt>


940 instances
<mstyle
  displaystyle="true"
  mathcolor="#003399"
>

93239 instances
<msub>

6350 instances
<msubsup>

28049 instances
<msup>

5715 instances
<mtable
  rowlines="none none none solid none"
  columnlines="none none none solid none"
>

61578 instances
<mtd
columnalign="center|left|right"
>

68458 instances
<mtext
 mathvariant="italic"
>

18725 instances
<mtr
columnalign="center|left|right"
>

1260 instances
<munder>

3156 instances
<munderover
columnalign="center|left|right"
>

462 instances
<none>
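
Since the survey asks for usage percentages, the raw counts above can be turned into shares. The sketch below does this for the shortlisted elements that appear in the listing, assuming the denominator is the reported total of 2,623,875 MathML elements (a share per math expression would use 436,262 instead). The names `COUNTS` and `share` are made up for illustration.

```python
# Counts copied from the NAG listing above; only survey-shortlisted elements.
TOTAL_ELEMENTS = 2_623_875
COUNTS = {
    "mfenced": 133002,
    "msub": 93239,
    "msup": 28049,
    "mover": 9561,
    "msubsup": 6350,
    "mphantom": 3914,
    "munder": 1260,
    "mstyle": 940,
    "ms": 127,
}


def share(count, total=TOTAL_ELEMENTS):
    """Percentage of all MathML elements represented by `count`."""
    return 100.0 * count / total


# Print the elements from most to least frequent.
for name, count in sorted(COUNTS.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {count:8d} {share(count):6.2f}%")
```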

@sideshowbarker

I could add use counters to the W3C HTML checker to collect statistics for this.

@emilio

emilio commented Mar 21, 2019

Also let me know if you want use counters for some of these in Gecko, I can let you know how to add them or add them myself.

@physikerwelt
Member Author

physikerwelt commented Mar 21, 2019 via email

Sure. However the MathML is generated via MathJax. @AndreG-P did recently analyse the arxiv dataset. Can you share your results here? However this was also generated (by LaTeXML). Maybe the MathML in PubMed Central is more diverse. I am travelling and will look into the Wikipedia dataset next week.

@sideshowbarker

the MathML is generated via MathJax

Do you mean it’s generated on the client side (from JavaScript running in a browser)?

I guess I should note that for the case of the W3C HTML Checker, I won’t be able to collect use counters for any MathML markup that’s dynamically generated by JavaScript running on the client side in a browser. The HTML Checker sees only the source of the document, not the DOM.

@fred-wang

the MathML is generated via MathJax

Do you mean it’s generated on the client side (from JavaScript running in a browser)?

It's server-side: https://github.com/wikimedia/mathoid

@fred-wang

I think it's not a problem to have several replies relying on an analysis of a converter's source code and real content generated by the same converter.

@AndreG-P

I currently only have a sneak peek of 1001 mathematical arXiv documents (367,236 MathML expressions). We extracted these expressions from the arXMLiv dataset 08.2018 that @dginev mentioned.

I'm sorry it's not a list for the entire arXiv. We are currently working on a minimized MathML dataset to save resources, and therefore our distributions wouldn't be representative for your questions. However, we didn't apply any filters or changes to these 1001 documents.

Here is a list of the elements:

<entry count="1365568">mo</entry>
<entry count="1216642">ci</entry>
<entry count="1173720">mi</entry>
<entry count="1127832">apply</entry>
<entry count="889984">mrow</entry>
<entry count="733738">annotation</entry>
<entry count="451464">csymbol</entry>
<entry count="367236">math</entry>
<entry count="367236">semantics</entry>
<entry count="367236">annotation-xml</entry>
<entry count="353718">mn</entry>
<entry count="351962">cn</entry>
<entry count="232557">times</entry>
<entry count="225663">msub</entry>
<entry count="97191">msup</entry>
<entry count="81821">eq</entry>
<entry count="77647">minus</entry>
<entry count="53740">plus</entry>
<entry count="41812">divide</entry>
<entry count="34093">mtd</entry>
<entry count="31299">interval</entry>
<entry count="30881">mfrac</entry>
<entry count="29497">msubsup</entry>
<entry count="26657">in</entry>
<entry count="26254">list</entry>
<entry count="22185">share</entry>
<entry count="21059">mover</entry>
<entry count="20852">mtext</entry>
<entry count="18002">and</entry>
<entry count="17906">leq</entry>
<entry count="16014">abs</entry>
<entry count="13482">mpadded</entry>
<entry count="11809">geq</entry>
<entry count="11353">mstyle</entry>
<entry count="11037">mtr</entry>
<entry count="10250">set</entry>
<entry count="9130">sum</entry>
<entry count="8899">lt</entry>
<entry count="8698">matrixrow</entry>
<entry count="7599">vector</entry>
<entry count="7185">gt</entry>
<entry count="7100">infinity</entry>
<entry count="6491">munder</entry>
<entry count="6472">partialdiff</entry>
<entry count="5952">subset</entry>
<entry count="5018">munderover</entry>
<entry count="4457">intersect</entry>
<entry count="4409">int</entry>
<entry count="4050">mtable</entry>
<entry count="3677">union</entry>
<entry count="3642">root</entry>
<entry count="3622">cerror</entry>
<entry count="3554">msqrt</entry>
<entry count="3394">neq</entry>
<entry count="3118">matrix</entry>
<entry count="2840">setdiff</entry>
<entry count="2759">log</entry>
<entry count="2056">equivalent</entry>
<entry count="1547">factorial</entry>
<entry count="1464">g</entry>
<entry count="1327">emptyset</entry>
<entry count="1288">ln</entry>
<entry count="1287">compose</entry>
<entry count="1230">path</entry>
<entry count="1014">floor</entry>
<entry count="985">min</entry>
<entry count="981">limit</entry>
<entry count="977">max</entry>
<entry count="954">sin</entry>
<entry count="947">notin</entry>
<entry count="945">mspace</entry>
<entry count="861">cos</entry>
<entry count="824">approx</entry>
<entry count="750">span</entry>
<entry count="662">ceiling</entry>
<entry count="650">none</entry>
<entry count="521">exp</entry>
<entry count="412">a</entry>
<entry count="377">mmultiscripts</entry>
<entry count="336">or</entry>
<entry count="327">gcd</entry>
<entry count="294">circle</entry>
<entry count="239">determinant</entry>
<entry count="214">real</entry>
<entry count="214">tan</entry>
<entry count="211">mprescripts</entry>
<entry count="113">sinh</entry>
<entry count="103">cot</entry>
<entry count="102">cosh</entry>
<entry count="94">arg</entry>
<entry count="90">mroot</entry>
<entry count="90">degree</entry>
<entry count="75">tanh</entry>
<entry count="73">prsubset</entry>
<entry count="63">exists</entry>
<entry count="54">imaginary</entry>
<entry count="54">svg</entry>
<entry count="48">sec</entry>
<entry count="26">arctan</entry>
<entry count="23">not</entry>
<entry count="22">img</entry>
<entry count="18">implies</entry>
<entry count="13">arccos</entry>
<entry count="12">arcsin</entry>
<entry count="10">cite</entry>
<entry count="5">exponentiale</entry>
<entry count="4">menclose</entry>
<entry count="2">csc</entry>

And the list of the attributes:

<entry count="10433270">id</entry>
<entry count="8199861">xref</entry>
<entry count="1100974">encoding</entry>
<entry count="451464">cd</entry>
<entry count="427071">stretchy</entry>
<entry count="398867">class</entry>
<entry count="367236">alttext</entry>
<entry count="367236">display</entry>
<entry count="367236">kmcs-r</entry>
<entry count="351962">type</entry>
<entry count="198299">mathvariant</entry>
<entry count="31299">closure</entry>
<entry count="29269">columnalign</entry>
<entry count="22977">href</entry>
<entry count="20858">accent</entry>
<entry count="16990">rspace</entry>
<entry count="16368">largeop</entry>
<entry count="16368">symmetric</entry>
<entry count="14289">displaystyle</entry>
<entry count="14209">width</entry>
<entry count="13299">mathsize</entry>
<entry count="11920">movablelimits</entry>
<entry count="6199">lspace</entry>
<entry count="6134">maxsize</entry>
<entry count="6134">minsize</entry>
<entry count="3951">rowspacing</entry>
<entry count="3372">fence</entry>
<entry count="3273">linethickness</entry>
<entry count="2887">columnspacing</entry>
<entry count="2163">separator</entry>
<entry count="1524">stroke</entry>
<entry count="1524">stroke-width</entry>
<entry count="1524">fill</entry>
<entry count="1230">d</entry>
<entry count="1086">transform</entry>
<entry count="1001">major-collection</entry>
<entry count="1001">minor-collection</entry>
<entry count="1001">fine-collection</entry>
<entry count="852">accentunder</entry>
<entry count="616">height</entry>
<entry count="540">depth</entry>
<entry count="448">mathcolor</entry>
<entry count="434">style</entry>
<entry count="412">title</entry>
<entry count="338">voffset</entry>
<entry count="294">r</entry>
<entry count="294">cx</entry>
<entry count="294">cy</entry>
<entry count="54">version</entry>
<entry count="54">fragid</entry>
<entry count="54">viewbox</entry>
<entry count="54">overflow</entry>
<entry count="39">scriptlevel</entry>
<entry count="35">align</entry>
<entry count="25">columnspan</entry>
<entry count="22">src</entry>
<entry count="22">alt</entry>
<entry count="4">notation</entry>
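
The `<entry count="...">` lines above are easy to post-process. A minimal sketch, assuming each entry sits on its own line in exactly that shape, that parses the counts and normalizes them by the number of math expressions; the names `parse_entries` and `per_expression` are made up for illustration, and the sample lines are copied from the element list.

```python
import re

ENTRY = re.compile(r'<entry count="(\d+)">([^<]+)</entry>')


def parse_entries(text):
    """Map each entry name to its integer count."""
    return {name: int(count) for count, name in ENTRY.findall(text)}


def per_expression(counts, total_key="math"):
    """Average number of occurrences per math expression."""
    total = counts[total_key]
    return {name: count / total for name, count in counts.items()}


# A few lines copied from the element list above.
SAMPLE = """<entry count="367236">math</entry>
<entry count="30881">mfrac</entry>
<entry count="11353">mstyle</entry>
<entry count="4">menclose</entry>"""

counts = parse_entries(SAMPLE)
for name, ratio in per_expression(counts).items():
    print(f"{name:10s} {counts[name]:8d} {ratio:.5f}")
```

The attribute list has no `math` row, so a per-expression ratio there would need the total (367236) passed in separately.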

Here is the list of the 1001 document IDs. It's probably not helpful but you can check the documents manually if you wish.

0705.0012
0705.0175
0705.0179
0705.0194
0705.0457
0705.0528
0705.0698
0705.0768
0705.0908
0705.1220
0705.1273
0705.1732
0705.1806
0705.2109
0705.2182
0705.2422
0705.2578
0705.3171
0705.3241
0705.3273
0705.3310
0705.3443
0705.3457
0705.3673
0705.3693
0705.3715
0705.3929
0705.3953
0705.4015
0705.4111
0705.4123
0705.4178
0705.4483
0705.4536
0705.4573
0706.2433
0707.0035
0707.0111
0707.0229
0707.0491
0707.0518
0707.0699
0707.0907
0707.1102
0707.1108
0707.1111
0707.1177
0707.1790
0707.2121
0707.2122
0707.2123
0707.2124
0707.2221
0707.2259
0707.2563
0707.2591
0707.2870
0707.2995
0707.3052
0707.3364
0707.3371
0707.3373
0707.3394
0707.3426
0707.3450
0707.3590
0707.3615
0707.3903
0707.4034
0707.4112
0707.4261
0707.4328
0707.4499
0710.0143
0710.0144
0710.0163
0710.0193
0710.0234
0710.0464
0710.0813
0710.0886
0710.0943
0710.0967
0710.0989
0710.1019
0710.1147
0710.1234
0710.1295
0710.1360
0710.1468
0710.1521
0710.1886
0710.1911
0710.1929
0710.1981
0710.2088
0710.2123
0710.2216
0710.2296
0710.2304
0710.2310
0710.2379
0710.2388
0710.2625
0710.2627
0710.2645
0710.2685
0710.2973
0710.3001
0710.3177
0710.3188
0710.3389
0710.3409
0710.3413
0710.3451
0710.3531
0710.3595
0710.3718
0710.3857
0710.3882
0710.3928
0710.3947
0710.3956
0710.3964
0710.3997
0710.4347
0710.4437
0710.4586
0710.4605
0710.4991
0710.5148
0710.5328
0710.5478
0710.5518
0710.5648
0710.5683
0710.5799
0710.5863
0710.5894
0711.0071
0711.0111
0711.0225
0711.0417
0711.0445
0711.0560
0711.0717
0711.0915
0711.0947
0711.1132
0711.1153
0711.1185
0711.1333
0711.1417
0711.1479
0711.1753
0711.1943
0711.1956
0711.2054
0711.2223
0711.2269
0711.2443
0711.2502
0711.2673
0711.2876
0711.2938
0711.3221
0711.3269
0711.3485
0711.3488
0711.3512
0711.3656
0711.3678
0711.3711
0711.3940
0711.3974
0711.4074
0711.4322
0711.4357
0711.4394
0711.4412
0711.4426
0711.4456
0711.4480
0711.4595
0711.4648
0711.4949
0711.4985
0711.4986
0711.4999
0711.5004
0909.0083
0909.0106
0909.0113
0909.0240
0909.0301
0909.0303
0909.0335
0909.0339
0909.0362
0909.0471
0909.0684
0909.0710
0909.0783
0909.1050
0909.1162
0909.1437
0909.1452
0909.1616
0909.1620
0909.1665
0909.1900
0909.1965
0909.1994
0909.2101
0909.2304
0909.2497
0909.2640
0909.2696
0909.2744
0909.2817
0909.2983
0909.3354
0909.3453
0909.3459
0909.3566
0909.3653
0909.3682
0909.3763
0909.3928
0909.3968
0909.3972
0909.4111
0909.4246
0909.4329
0909.4396
0909.4591
0909.4718
0909.4760
0909.4774
0909.4865
0909.4913
0909.4960
0909.5071
0909.5072
0909.5199
0909.5512
0909.5623
0909.5652
0909.5664
1004.0033
1004.0154
1004.0167
1004.0197
1004.0200
1004.0253
1004.0290
1004.0394
1004.0582
1004.0674
1004.0713
1004.0723
1004.0759
1004.0904
1004.1068
1004.1084
1004.1244
1004.1326
1004.1661
1004.1883
1004.1934
1004.2214
1004.2285
1004.2511
1004.2639
1004.2759
1004.2946
1004.2983
1004.3038
1004.3259
1004.3358
1004.3376
1004.3552
1004.3799
1004.3826
1004.3866
1004.3904
1004.3938
1004.4194
1004.4293
1004.4374
1004.4539
1004.4832
1004.5183
1004.5273
1004.5434
1004.5510
1007.0115
1007.0157
1007.0225
1007.0257
1007.0259
1007.0316
1007.0353
1007.0567
1007.0568
1007.0677
1007.0688
1007.0713
1007.0804
1007.1027
1007.1175
1007.1441
1007.1553
1007.1615
1007.1734
1007.1786
1007.1839
1007.2054
1007.2239
1007.2295
1007.2521
1007.2822
1007.2959
1007.3072
1007.3399
1007.3401
1007.3406
1007.3460
1007.3467
1007.3659
1007.4022
1007.4030
1007.4283
1007.4285
1007.4757
1007.4811
1007.5197
1007.5273
1007.5335
1007.5350
1007.5426
1009.0065
1009.0098
1009.0285
1009.0392
1009.0468
1009.0487
1009.0568
1009.0575
1009.0793
1009.0821
1009.1160
1009.1219
1009.1419
1009.1429
1009.1439
1009.1467
1009.1500
1009.1670
1009.2152
1009.2199
1009.2644
1009.2973
1009.2984
1009.3061
1009.3383
1009.3608
1009.3973
1009.4059
1009.4322
1009.4440
1009.4454
1009.4750
1009.4814
1009.4995
1009.5245
1009.5296
1009.5366
1009.5783
1009.5835
1009.5842
1009.5893
1009.5912
1009.5970
1009.6023
1009.6138
1009.6225
1103.0255
1103.0324
1103.0533
1103.0868
1103.1041
1103.1152
1103.1272
1103.1295
1103.1310
1103.1354
1103.1418
1103.1776
1103.1801
1103.1906
1103.1920
1103.2043
1103.2087
1103.2202
1103.2470
1103.2513
1103.2576
1103.2600
1103.2629
1103.2657
1103.2825
1103.2959
1103.3136
1103.3365
1103.3428
1103.3533
1103.3576
1103.3803
1103.3810
1103.3858
1103.3945
1103.4068
1103.4508
1103.4514
1103.4518
1103.4725
1103.4752
1103.4796
1103.4994
1103.5137
1103.5227
1103.5406
1103.5473
1103.5505
1103.5728
1103.5826
1103.5960
1204.0109
1204.0287
1204.0362
1204.0530
1204.0609
1204.0620
1204.0705
1204.0712
1204.0930
1204.0994
1204.1090
1204.1351
1204.1600
1204.1841
1204.2001
1204.2057
1204.2568
1204.2595
1204.2709
1204.2963
1204.3112
1204.3193
1204.3215
1204.3222
1204.3313
1204.3387
1204.3549
1204.3937
1204.3947
1204.4516
1204.4641
1204.4648
1204.4953
1204.4963
1204.5014
1204.5134
1204.5141
1204.5160
1204.5166
1204.5192
1204.5490
1204.5494
1204.5510
1204.5565
1204.5956
1204.6131
1204.6443
1204.6457
1204.6520
1204.6569
1204.6589
1204.6681
1204.6731
1206.0098
1206.0128
1206.0320
1206.0407
1206.0455
1206.0779
1206.0860
1206.0892
1206.1107
1206.1136
1206.1167
1206.1170
1206.1175
1206.1342
1206.1474
1206.1535
1206.1613
1206.1761
1206.1811
1206.1823
1206.1941
1206.1945
1206.2023
1206.2259
1206.2376
1206.2409
1206.2576
1206.2815
1206.2849
1206.2880
1206.2955
1206.3011
1206.3020
1206.3057
1206.3082
1206.3139
1206.3396
1206.3409
1206.3544
1206.3652
1206.3703
1206.3744
1206.3947
1206.4177
1206.4186
1206.4227
1206.4353
1206.4530
1206.4731
1206.4740
1206.4950
1206.5012
1206.5167
1206.5449
1206.5523
1206.5867
1206.5868
1206.6143
1206.6174
1206.6212
1206.6327
1206.6340
1206.6638
1206.6690
1206.6708
1206.6731
1206.6743
1206.6904
1206.7001
1206.7074
1302.0044
1302.0048
1302.0078
1302.0125
1302.0144
1302.0276
1302.0348
1302.0472
1302.0571
1302.0778
1302.0872
1302.0917
1302.1038
1302.1058
1302.1167
1302.1218
1302.1244
1302.1247
1302.1384
1302.1439
1302.1454
1302.2039
1302.2100
1302.2294
1302.2315
1302.2329
1302.2338
1302.2405
1302.2639
1302.2784
1302.2789
1302.3149
1302.3192
1302.3207
1302.3212
1302.3531
1302.3678
1302.3811
1302.3840
1302.3899
1302.4042
1302.4192
1302.4396
1302.4401
1302.4434
1302.4513
1302.4626
1302.4825
1302.4902
1302.5020
1302.5038
1302.5210
1302.5304
1302.5588
1302.5591
1302.5719
1302.5976
1302.5987
1302.6042
1302.6046
1302.6097
1302.6116
1302.6375
1302.6583
1302.6950
1302.6954
1302.7066
1302.7249
1306.0033
1306.0107
1306.0136
1306.0143
1306.0167
1306.0204
1306.0280
1306.0403
1306.0819
1306.0822
1306.0943
1306.0988
1306.1113
1306.1114
1306.1117
1306.1138
1306.1172
1306.1174
1306.1376
1306.1477
1306.1524
1306.1558
1306.1715
1306.1728
1306.1900
1306.2012
1306.2032
1306.2254
1306.2382
1306.2383
1306.2741
1306.3073
1306.3103
1306.3508
1306.3513
1306.3648
1306.4006
1306.4046
1306.4179
1306.4290
1306.4299
1306.4344
1306.4386
1306.4387
1306.4416
1306.4481
1306.4504
1306.4559
1306.4573
1306.4850
1306.4891
1306.4943
1306.5225
1306.5283
1306.5403
1306.5497
1306.5635
1306.5645
1306.5656
1306.5732
1306.5872
1306.5952
1306.5956
1306.6391
1306.6398
1306.6409
1306.6786
1306.6821
1306.6902
1307.0259
1307.0554
1307.0625
1307.0630
1307.0900
1307.0960
1307.1036
1307.1047
1307.1054
1307.1065
1307.1455
1307.1521
1307.1600
1307.1664
1307.1768
1307.1801
1307.1981
1307.2069
1307.2127
1307.2131
1307.2163
1307.2527
1307.2604
1307.2666
1307.2770
1307.2833
1307.2895
1307.2976
1307.3042
1307.3047
1307.3096
1307.3215
1307.3287
1307.3462
1307.3693
1307.3716
1307.3809
1307.3815
1307.3971
1307.3983
1307.4006
1307.4047
1307.4111
1307.4203
1307.4245
1307.4320
1307.4328
1307.4387
1307.4393
1307.4439
1307.4679
1307.4884
1307.4936
1307.5033
1307.5088
1307.5115
1307.5401
1307.5407
1307.5413
1307.5417
1307.5453
1307.5509
1307.5836
1307.5927
1307.6029
1307.6054
1307.6076
1307.6443
1307.6502
1307.6693
1307.6944
1307.7363
1307.7431
1307.7455
1307.7778
1307.7794
1307.7797
1307.8030
1307.8135
1307.8161
1307.8236
1307.8321
1307.8347
1307.8370
1402.2703
1402.4005
1611.07204
1702.03425
1703.06195
1704.00273
1704.00487
1704.00600
1704.00657
1704.00779
1704.00851
1704.01109
1704.01156
1704.01303
1704.01418
1704.01658
1704.01726
1704.01892
1704.01907
1704.01951
1704.02459
1704.02480
1704.02611
1704.02634
1704.02871
1704.03066
1704.03378
1704.03434
1704.03510
1704.03637
1704.03771
1704.03842
1704.04143
1704.04150
1704.04262
1704.04318
1704.04388
1704.04540
1704.04640
1704.04665
1704.05535
1704.05666
1704.05994
1704.06068
1704.06132
1704.06401
1704.06585
1704.06667
1704.07022
1704.07090
1704.07159
1704.07200
1704.07209
1704.07264
1704.07311
1704.07328
1704.07634
1704.07902
1704.08037
1704.08060
1704.08184
1704.08417
1704.08474
1704.08483
1704.08952
1704.08959
1704.09016
1802.00339
1802.00556
1802.00558
1802.01099
1802.01260
1802.01324
1802.01330
1802.01608
1802.01711
1802.01944
1802.02027
1802.02321
1802.02478
1802.02533
1802.02630
1802.02726
1802.03073
1802.03078
1802.03087
1802.03382
1802.03387
1802.03443
1802.03444
1802.03552
1802.03553
1802.03579
1802.03618
1802.03754
1802.03846
1802.03947
1802.04022
1802.04481
1802.04531
1802.04677
1802.04689
1802.04921
1802.04984
1802.05026
1802.05061
1802.05158
1802.05222
1802.05331
1802.05468
1802.05582
1802.05704
1802.05724
1802.05770
1802.05953
1802.06031
1802.06097
1802.06170
1802.06200
1802.06298
1802.06499
1802.06696
1802.06985
1802.07046
1802.07519
1802.07609
1802.07646
1802.08001
1802.08015
1802.08443
1802.08556
1802.09003
1802.09039
1802.09250
1802.09309
1802.09521
1802.09858
1802.09969
1802.10075
1802.10239
1802.10486
math0008028
math0008029
math0008039
math0008044
math0008045
math0008052
math0008078
math0008096
math0008107
math0008146
math0008148
math0008152
math0008167
math0008172
math0008180
math0008186
math0008187
math0008210
math0008240
math0109106
math0109162
math0109166
math0109167
math0109168
math0109191
math0109196
math0109197
math0109220
math0109222
math0110028
math0110057
math0110062
math0110066
math0110078
math0110123
math0110157
math0110160
math0110174
math0110197
math0110218
math0111006
math0111065
math0111091
math0111128
math0111168
math0111173
math0111257
math0111282
math0208025
math0208038
math0208043
math0208057
math0208075
math0208077
math0208120
math0208125
math0208158
math0208210
math0208219
math0208221
math0208236
math0304062
math0304090
math0304125
math0304136
math0304137
math0304142
math0304149
math0304160
math0304252
math0304381
math0304399
math0304410
math0304433
math0304434
math0304458
math0304496
math9708216

@brucemiller
Contributor

Here are some data for LaTeXML as a converter. There are no statistics on usage, as that depends on the converted documents -- that'll probably follow.

(1) LaTeXML: authoring tool that converts TeX/LaTeX (full documents or fragments) to various forms of XML and HTML, including MathML.
(2) Native MathML intended, but users can configure polyfills such as MathJax when desired.
(3) Used (presentation) elements: annotation, annotation-xml, math, menclose, merror, mfrac, mi, mmultiscripts, mn, mo, mover, mpadded, mphantom, mprescripts, mroot, mrow, mspace, msqrt, mstyle, msub, msubsup, msup, mtable, mtd, mtext, mtr, munder, munderover, none, semantics.
Does NOT use: mlabeledtr, maction, mglyph, ms, mfenced (by default)
(4) Many MathML attributes are used.
Of the explicitly listed attributes, only mathvariant is used (but generally tries to map to Unicode).
(5) mstyle uses displaystyle, scriptlevel, mathcolor (potentially, but rarely, href)
(6) The listed named values for attributes are not used [but see note below]
(7) no leading/trailing whitespace

Note: currently there are a couple of stray "mathspace" values used that were overlooked. These will be replaced by explicit lengths in the next software update, so consider them as not used.

@fred-wang

Note: currently there are a couple of stray "mathspace" values used that were overlooked. These will be replaced by explicit lengths in the next software update, so consider them as not used.

I've just released a new version of TeXZilla that replaces named mathspace values with explicit lengths, and updated my reply accordingly.
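
For readers wondering what such a replacement involves: MathML 2 and 3 define the named spaces as multiples of 1/18em, from veryverythinmathspace (1/18em) up to veryverythickmathspace (7/18em), and MathML 3 adds "negative" variants. The sketch below is not the TeXZilla or LaTeXML change; the function names and the four-decimal output format are assumptions made for illustration.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

SIZES = ["veryverythin", "verythin", "thin", "medium",
         "thick", "verythick", "veryverythick"]
# veryverythinmathspace = 1/18em ... veryverythickmathspace = 7/18em
NAMED_SPACE_EM = {size + "mathspace": Fraction(i, 18)
                  for i, size in enumerate(SIZES, start=1)}


def explicit_length(value):
    """Return an em length for a named space, or the value unchanged."""
    name = value.strip()
    sign = ""
    if name.startswith("negative"):
        sign = "-"
        name = name[len("negative"):]
    if name not in NAMED_SPACE_EM:
        return value
    return f"{sign}{float(NAMED_SPACE_EM[name]):.4f}em"


def replace_named_spaces(xml_text):
    """Rewrite every attribute value that is a named space."""
    root = ET.fromstring(xml_text)
    for node in root.iter():
        for attr, value in list(node.attrib.items()):
            node.set(attr, explicit_length(value))
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment without a namespace declaration, to keep output short.
print(replace_named_spaces('<mo lspace="thinmathspace">+</mo>'))
```

A real converter would more likely emit the explicit length at generation time than patch serialized output afterwards.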

@dginev
Contributor

dginev commented Mar 23, 2019

Following up on @brucemiller 's comment, here is the data footprint of the Digital Library of Mathematical Functions

DLMF v1.0.20

  1. Description: DLMF v1.0.20 is a collection of 1828 HTML5 pages, converted from semantically-enriched LaTeX via LaTeXML 0.8.3. It contains 108,952 <math> elements.

  2. The DLMF is served at https://dlmf.nist.gov . It uses (metadata-enhanced) Presentation MathML for capable browser engines, and a MathJax polyfill for others (with client-side MathML rendering).

  3. MathML elements. A full report over the data can be seen here. It was generated by the llamapun toolkit. Comparing to the shortlist:

    • in use: munder, mover, msub, msup, msubsup, mphantom, mstyle.
    • not in use: mlabeledtr, merror, maction, mglyph, mfenced, ms.
  4. MathML attributes:

    • in use: mathvariant, align (on mtable),
    • not in use: numalign, denomalign, align (on munderover/munder/mover), bevelled, subscriptshift, superscriptshift, other, macros, mode, index, fontfamily, fontweight, fontstyle, fontsize, color,
      background, veryverythinmathspace, verythinmathspace, thinmathspace, mediummathspace, thickmathspace, verythickmathspace, veryverythickmathspace
  5. <mstyle> attributes:

    • in use (expected): displaystyle, scriptlevel
  6. Attribute values:

    • linethickness -- numeric pt value only, none of named keywords
    • mathsize -- numeric % values only, none of named keywords
    • N/A attribute with value a nonzero number without unit (e.g. "4") other than scriptlevel
    • one use of "veryverythickmathspace", as Bruce mentioned
    • notation with value "radical" - none. Only notation attribute value used is updiagonalstrike
    • attribute with leading or trailing white space characters - none
  7. No trailing/leading whitespace in token elements.

Notes: I find a couple of the reported <mtable> "align" attribute values curious -- unsure if the MathML 4 effort would like to simplify the syntax here. The data for align[1] come from align="baseline 1", as my report splits attributes by whitespace. (e.g. in DLMF 16.17.E1 ). The other curious entry (e.g. in DLMF 10.61.E3 ) is for an align="bottom1". Just reporting these as curious syntax to my untrained eye, I'm by no means an mtable expert.
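
To illustrate why the two forms show up differently in a report that splits on white space, here is a small sketch that reads an mtable align value as a keyword plus an optional row number. The regular expression is an assumption that tolerates both "baseline 1" and "bottom1"; it makes no claim about which spellings the specification allows, and `parse_align` is a made-up name.

```python
import re

# Keyword, optional white space, optional (possibly negative) row number.
ALIGN = re.compile(r"(top|bottom|center|baseline|axis)\s*(-?\d+)?")


def parse_align(value):
    """Return (keyword, rownumber or None), or None if the value is not recognized."""
    match = ALIGN.fullmatch(value.strip())
    if match is None:
        return None
    keyword, row = match.groups()
    return keyword, (int(row) if row is not None else None)


for value in ["baseline 1", "bottom1", "axis"]:
    # Splitting on white space yields two tokens for the first form
    # and a single fused token for the second.
    print(repr(value), value.split(), parse_align(value))
```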

P.S. Expect a similar report on the full arXiv data later today; walking the corpus for data collection ended up taking closer to 3 days than 2.

Edit: thanks for the clarification Frédéric! Definitely worth removing the confusion.

@fred-wang

@dginev Thanks for the detailed report, looking forward to the arXiv one. Two quick comments:

@dginev
Contributor

dginev commented Mar 24, 2019

arXMLiv 08.2018

  1. Description: arXMLiv 08.2018 is an HTML5 dataset of 1.2 million scientific articles from arXiv.org, created by me as part of our work at the KWARC research group. The data is converted from LaTeX via LaTeXML 0.8.3 and the CorTeX build system.
    The collection contains ~550 million <math> elements, with parallel Content MathML annotations.

  2. The dataset can be both downloaded and explored online. The CorTeX preview uses Presentation+Content MathML for capable browser engines, and a MathJax polyfill for others (with client-side MathML rendering).

  3. MathML elements. A de-noised, but otherwise exhaustive, report over the data can be seen here for presentation MathML, as well as here for content MathML.

    • Note that the arXMLiv dataset is not curated in any form and includes documents with known LaTeXML errors, so in some documents the MathML is polluted with elements from external namespaces. I have tried my best to remove all of these cases before reporting here, and have included the script so that it is transparent which data was discarded, for anyone interested.
    • As Frédéric initially requested, I have included pre-computed ratios for each report row, compared to the total <math> elements in arXiv. It's a curious report to study (again, generated via the llamapun toolkit).
    • Comparing to the shortlist:
      • in use: munder, mover, msub, msup, msubsup, mphantom, mstyle, merror,
      • not in use: mlabeledtr, maction, mglyph, mfenced, ms.
  4. MathML attributes:

    • in use: mathvariant, align (on mtable), mathbackground
    • not in use: numalign, denomalign, align (on munderover/munder/mover), bevelled, subscripshift, superscriptshift, other, macros, mode, index, fontfamily, fontweight, fontstyle, fontsize, color,
      background, veryverythinmathspace, verythinmathspace, thinmathspace, mediummathspace, thickmathspace, verythickmathspace, veryverythickmathspace
  5. <mstyle> attributes:

    • in use: displaystyle, scriptlevel, id, xref, mathcolor, class, style
  6. Attribute values:

    • linethickness -- numeric pt value only, none of named keywords. (some errors for us to fix, but no intentional other use)
    • mathsize -- numeric % values only, none of named keywords
    • attribute with value a nonzero number without unit -- yes, in two cases that I can see: e.g. mtd@rowspan[2], mtd@columnspan[8].
    • one use of "veryverythickmathspace", as Bruce mentioned
    • notation with value "radical" - none. Notation attribute values are:
      - box, downdiagonalstrike, updiagonalarrow, updiagonalstrike
    • attribute with leading or trailing white space characters - none
  7. No trailing/leading whitespace in token elements.
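For anyone wanting to reproduce this kind of frequency report on their own corpus, the following is a minimal sketch using only the Python standard library. It is not the llamapun code; the function names and the `element@attribute` key layout are assumptions for illustration. It counts MathML elements and their attributes, then expresses each count as a ratio of the total number of <math> elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def local_name(qualified):
    """Strip a '{namespace}' prefix as produced by ElementTree."""
    return qualified.rsplit("}", 1)[-1]


def count_mathml(documents):
    """Count MathML elements and element@attribute pairs in XML strings."""
    counts = Counter()
    for doc in documents:
        for el in ET.fromstring(doc).iter():
            # Skip anything outside the MathML namespace (noise).
            if not el.tag.startswith("{" + MATHML_NS + "}"):
                continue
            name = local_name(el.tag)
            counts[name] += 1
            for attr in el.attrib:
                counts[f"{name}@{local_name(attr)}"] += 1
    return counts


def ratios(counts):
    """Express every count relative to the number of <math> elements."""
    total = counts["math"]
    return {key: n / total for key, n in counts.items()} if total else {}


sample = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mstyle displaystyle="true"><mi>x</mi><mi>y</mi></mstyle>'
    "</math>"
)
counts = count_mathml([sample])
print(ratios(counts))
```

Filtering on the MathML namespace is one simple way to discard elements that leaked in from other namespaces; the de-noising script mentioned above may work differently.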

Thanks for the patience with the report; a big part of the delay was the slowdown caused by the incredibly noisy error subset of the articles. I've left more details at the Gist, for anyone curious.

@fred-wang

@dginev Thank you so much for this report, it's really cool to have such a big database of concrete MathML.

Regarding "attribute with value a nonzero number without unit", the survey should really say 'length attribute with value a nonzero number without unit'. However, I tried to make it understandable by anyone without detailed knowledge of the spec, and so that one could easily write a script to extract data. mtd@columnspan and mtd@rowspan are defined as "positive-integer" (https://mathml-refresh.github.io/mathml/chapter3.html#presm.mtdatts), so they are not included in #24; I'll try updating the survey.
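A script for the narrowed question could look roughly like the sketch below. The set of length-valued attributes is an illustrative subset chosen for this example, not a complete list from the spec, and the function name is hypothetical. Attributes such as columnspan and rowspan are deliberately absent because they take positive integers rather than lengths.

```python
import re

# Assumption: an illustrative subset of attributes whose values are lengths.
LENGTH_ATTRIBUTES = {
    "width", "height", "depth", "lspace", "rspace",
    "linethickness", "minsize", "maxsize", "mathsize",
}

# A bare number with an optional sign and no unit, e.g. "4", "-2.5", ".5".
UNITLESS_NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)$")


def is_unitless_nonzero_length(attribute, value):
    """True if a length attribute holds a nonzero number without a unit."""
    if attribute not in LENGTH_ATTRIBUTES:
        return False
    value = value.strip()
    if not UNITLESS_NUMBER.match(value):
        return False
    return float(value) != 0


print(is_unitless_nonzero_length("maxsize", "1"))        # length attribute, no unit
print(is_unitless_nonzero_length("columnspan", "8"))     # positive-integer, not a length
print(is_unitless_nonzero_length("linethickness", "0"))  # zero is excluded
```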

sideshowbarker added a commit to validator/validator that referenced this issue Mar 30, 2019
@sideshowbarker

sideshowbarker commented Apr 1, 2019

I added use counters to the W3C HTML checker. You can view the current results here:

https://validator.w3.org/nu/stats.html

(Scroll down and look at the rows that start with Math.)

@sideshowbarker

For the record here, the following is the relevant use-counter data collected so far from 2,316,780 documents checked by the W3C HTML checker:

Use-counter data for 2,316,780 documents
Counter Occurrences* Proportion
element <annotation> 82 0.000035
element <annotation-xml> 2 0.000001
element <maction> 0 0.000000
element <math> 208 0.000090
element <menclose> 10 0.000004
element <merror> 0 0.000000
element <mfenced> 28 0.000012
element <mfrac> 110 0.000047
element <mglyph> 0 0.000000
element <mi> 197 0.000085
element <mlabeledtr> 0 0.000000
element <mmultiscripts> 3 0.000001
element <mn> 197 0.000085
element <mo> 165 0.000071
element <mover> 13 0.000006
element <mpadded> 4 0.000002
element <mphantom> 2 0.000001
element <mprescripts> 2 0.000001
element <mroot> 32 0.000014
element <mrow> 157 0.000068
element <ms> 0 0.000000
element <mspace> 16 0.000007
element <msqrt> 76 0.000033
element <mstyle> 112 0.000048
element <msub> 112 0.000048
element <msubsup> 12 0.000005
element <msup> 100 0.000043
element <mtable> 42 0.000018
element <mtd> 42 0.000018
element <mtext> 53 0.000023
element <mtr> 42 0.000018
element <munder> 5 0.000002
element <munderover> 9 0.000004
element <none> 3 0.000001
element <semantics> 82 0.000035
attribute "actiontype" 0 0.000000
attribute "background" 0 0.000000
attribute "bevelled" 0 0.000000
attribute "color" 0 0.000000
attribute "colspan" 0 0.000000
attribute "columnalign" 35 0.000015
attribute "columnlines" 0 0.000000
attribute "denomalign" 0 0.000000
attribute "depth" 0 0.000000
attribute "dir" 0 0.000000
attribute "display" 32 0.000014
attribute "displaystyle" 104 0.000045
attribute "equalcolumns" 0 0.000000
attribute "equalrows" 0 0.000000
attribute "fontfamily" 0 0.000000
attribute "fontsize" 0 0.000000
attribute "fontstyle" 0 0.000000
attribute "fontweight" 0 0.000000
attribute "frame" 0 0.000000
attribute "height" 7 0.000003
attribute "index" 0 0.000000
attribute "linethickness" 6 0.000003
attribute "lspace" 5 0.000002
attribute "macros" 0 0.000000
attribute "mathbackground" 1 0.000000
attribute "mathcolor" 15 0.000006
attribute "mathvariant" 26 0.000011
attribute "maxsize" 2 0.000001
attribute "mediummathspace" 0 0.000000
attribute "minsize" 2 0.000001
attribute "mode" 0 0.000000
attribute "notation" 10 0.000004
attribute "numalign" 0 0.000000
attribute "other" 0 0.000000
attribute "rowlines" 0 0.000000
attribute "rowspacing" 1 0.000000
attribute "rowspan" 0 0.000000
attribute "rspace" 5 0.000002
attribute "scriptlevel" 83 0.000036
attribute "stretchy" 55 0.000024
attribute "subscripshift" 0 0.000000
attribute "superscriptshift" 2 0.000001
attribute "thickmathspace" 0 0.000000
attribute "thinmathspace" 0 0.000000
attribute "verythickmathspace" 0 0.000000
attribute "verythinmathspace" 0 0.000000
attribute "veryverythickmathspace" 0 0.000000
attribute "veryverythinmathspace" 0 0.000000
attribute "voffset" 0 0.000000
attribute "width" 16 0.000007
attribute "xmlns" 0 0.000000
element <mstyle> with attributes other than "dir", etc. 19 0.000008
attribute "linethickness" with value "thin", "thick" or "medium" 0 0.000000
attribute "mathsize" with value "small", "normal" or "big" 0 0.000000
attribute with unitless-length value 0 0.000000
attribute with "named space" value: "verythinmathspace", etc. 4 0.000002
attribute "notation" with "radical" in value 4 0.000002
attribute with leading/trailing whitespace in value 0 0.000000
element with leading/trailing whitespace in contents 6 0.000003

* out of 2,316,780 documents total

The final column is a proportion where 1.0 would mean 100%. So the 0.000090 number for the <math>-element counter means that 0.009% of documents checked had a math element.

And so, assuming all the MathML content checked had a math element, the numbers for the other counters can be considered relative to 208.

So the “element with leading/trailing whitespace in contents” means 6 out of 208 instances of math content — ~2.9% — had at least one element with leading/trailing whitespace in its text content.
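The arithmetic above can be checked with a few lines of Python. The function names are made up for this sketch; the two totals (2,316,780 documents checked, 208 of them containing a math element) are taken from the table.

```python
TOTAL_DOCUMENTS = 2_316_780  # all documents checked
MATH_DOCUMENTS = 208         # documents containing a <math> element


def share_of_all_documents(occurrences):
    """Proportion as listed in the table's last column (1.0 means 100%)."""
    return occurrences / TOTAL_DOCUMENTS


def share_of_math_documents(occurrences):
    """Proportion relative to documents that contain any MathML at all."""
    return occurrences / MATH_DOCUMENTS


# <math> counter: 208 occurrences, listed as 0.000090 in the table.
print(f"{share_of_all_documents(208):.6f}")
# Whitespace counter: 6 of 208 math documents.
print(f"{share_of_math_documents(6):.1%}")
# <mfenced> counter: 28 of 208 math documents.
print(f"{share_of_math_documents(28):.1%}")
```

Reading the counters relative to 208 rather than to 2,316,780 is what makes rarely used features such as mfenced visible at all.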

@NSoiffer
Contributor

It seems like mover/munder/munderover are big potential problems with respect to the accent rule. If we end up deciding that automatic determination of the value of the accent attr won't be part of core (can't use an ssty-like font attr or whatever), then it is important to get some usage stats as to how often the attr is specified and, if it isn't given, how often it should be an accent vs a limit. To do that, we need to know what the second (and third) arguments are, or at least those that are mo, and a count of the other cases.

@dginev's detailed data does provide us with those numbers (minus the characters that are accents) because the generator always uses the accent attrs. The really big arXMLiv's numbers are:

  • mover -- when it generates accent, it is always true. That's most of the time: 11.64% out of 11.89%
  • munder -- when it generates accentunder, it is always true. That's a smaller amount of time: .46% out of 2.24%
  • munderover -- when it generates accent and accentunder, it is always true. When one was true, the other was always true. Being true was rare: .03% out of 1.22%

For the smaller (but still substantial) DLMF dataset, there's a similar pattern (same generator):

  • mover -- 1050/1052 accent=true
  • munder -- 8/430 accentunder=true
  • munderover -- 0/2042 values true

So for this generator, we have a good indication that defaults for mover and munderover will work well. For munder, it will be wrong 20% of the time for arXiv, but only 2% of the time for DLMF. The first number is not great, but it's not awful.
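The percentages in this comment can be recomputed from the figures quoted above. This is only a sketch of the arithmetic: the dictionary layout and function name are invented for illustration, the arXMLiv pairs are percentages of all math elements, and the DLMF pairs are raw counts.

```python
# (accent-attribute-true, all uses) per element, as quoted in the comment.
ARXMLIV = {"mover": (11.64, 11.89), "munder": (0.46, 2.24), "munderover": (0.03, 1.22)}
DLMF = {"mover": (1050, 1052), "munder": (8, 430), "munderover": (0, 2042)}


def accent_true_share(accent_true, total):
    """Share of uses where the generator set the accent attribute to true."""
    return accent_true / total


for corpus_name, corpus in (("arXMLiv", ARXMLIV), ("DLMF", DLMF)):
    for element, (accent_true, total) in corpus.items():
        share = accent_true_share(accent_true, total)
        print(f"{corpus_name:8} {element:11} {share:.1%}")
```

For munder, the share of accentunder=true comes out at roughly a fifth for arXMLiv and about 2% for DLMF, which is where the "wrong 20% of the time" and "2% of the time" figures come from.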

These numbers are a great indication, but they come from a single generator. Having data from a different generator would add a lot more validity to them.

@dani31415

Some statistics extracted from the MathType Web / WIRIS services. Note that some attributes are invalid as MathML but that's what the users tried to use with MathType Web services.

1905001 Expressions

12082445 instances
<mo
  xmlns = "http://www.w3.org/1998/Math/MathML"
  lspace = "mediummathspace" | "thinmathspace" | "? em" | "? pt" | "? px"
  form = "postfix" | "prefix" | "infix"
  stretchy = "false" | "true"
  linebreak = "newline" | "nobreak" | "goodbreak" | "badbreak"
  mathsize = "? px" | "? em" | "big" | "? %" | "? pt"
  separator = "true"
  mathvariant = "bold" | "bold-italic" | "italic" | "normal" | "double-struck" | "fraktur" | "\"italic\""
  linebreakstyle = "before" | "after"
  indentshift = "? em"
  mathcolor = "#??????"
  symmetric = "true"
  fence = "true" | "false"
  accent = "false" | "true"
  class = ...
  movablelimits = "true" | "false"
  mathbackground = "#??????"
  id = ...
  dir = "rtl"
  style = ...
  minsize = "? em" | "? %"
  background = "violet"
  rspace = "mediummathspace" | "? em" | "? pt" | "? px"
  largeop = "true"
  fontstyle = "normal"
  maxsize = "? em" | "? %" | "1"
>

10973191 instances
<mi
  mathbackground = "#??????"
  style = ...
  background = "violet"
  id = ...
  dir = "rtl"
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  fontstyle = "normal" | "italic"
  mathsize = "? px" | "? %"
  title = ...
  mathcolor = "#??????"
  mathvariant = ...
>

6552976 instances
<mn
  style = "color:#ff0000" | "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;" | "color:#000000" | "font-size: 80%"
  mathbackground = "#??????"
  id = ...
  dir = "rtl"
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML" | "http://www.w3.org/1999/xhtml"
  fontsize = "? px" | "20"
  bold-italic = ""
  mathsize = "? px" | "0.5" | "? %"
  title = ...
  wrs:positionable = "true"
  mathcolor = "#??????" | "red"
  mathvariant = "bold" | "italic" | "bold-italic" | "normal" | "double-struck" | "bold>1</mn> </mrow><mrow> <mi mathvariant="
>

2026141 instances
<mrow
  wrs:positionable = "true" | "false"
  dir = "rtl"
  style = "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;" | "color:#ff0000" | "color:#c83740"
  class = ...
  mathcolor = "#??????"
  xmlns = "http://www.w3.org/1998/Math/MathML"
  id = ...
>

1777653 instances
<math
  width = "444"
  linebreak = "auto"
  xmls = "http://www.w3.org/1998/Math/MathML"
  mode = "inline" | "display"
  indentalign = "left" | "id" | "right"
  mathvariant = "italic"
  border = "1"
  class = ...
  displaystyle = "true"
  xmlns = ...
  indenttarget = "aaa1" | "aaa2"
  altimg = ...
  tex = "\Omega" | "{}^{2}" | "\boldsymbol{\mathsf{G_{max}}}"
  mathsize = "? em" | "? px" | "16px;" | "15px;" | "? pt" | "medium" | "17px;"
  xml:id = ...
  display = "block" | "inline" | "" | "block;" | "blockquote" | "inline-block"
  text = "Omega" | "^2" | "G _ max"
  mathcolor = "#??????" | "white" | "blue"
  http: = ""
  times = ""
  indentshiftfirst = "? em"
  title = ...
  displaystye = "true"
  id = ...
  float = "left"
  wrs:positionable = "false"
  alttext = ...
  overflow = "scroll" | "scale"
  style = ...
  scriptlevel = "-1"
  baseline = "-2.5"
  align = "center" | "left"
  indentshift = "? em"
  roman = ""
  dir = "rtl" | "\"rtl\""
>

1130582 instances
<mfrac
  style = ...
  id = ...
  linethickness = "0" | "? px" | "1" | "? pt"
  mpadded = "0"
  dir = "rtl"
  denomalign = "center"
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  bevelled = "true" | "\"true\""
  title = ...
  numalign = "center"
  mathcolor = "#??????"
  mathvariant = "bold"
>

955930 instances
<msup
  mathsize = "? em"
  dir = "rtl"
  style = "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"
  class = ...
  mathcolor = "#??????" | "blue"
  title = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  id = ...
>

685983 instances
<mspace
  id = ...
  width = "? em" | "- ? em" | "thickmathspace" | "negativethinmathspace" | "? px" | "? pt" | "thinmathspace" | "50" | "mediummathspace" | "? cm" | "? ex" | "3"
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  depth = "? ex" | "? em"
  mathsize = "? px"
  linebreak = "newline" | "\"newline\"" | "\"newline\"/" | "././newline" | ""newline"" | "nobreak"
  height = "? em" | "? ex" | "? pt"
  mathcolor = "#??????"
  mathvariant = "bold" | "italic"
>

596815 instances
<msub
  class = ...
  mathcolor = "#??????"
  mathbackground = "#??????"
  xmlns = "http://www.w3.org/1998/Math/MathML"
  id = ...
>

527555 instances
<mfenced
  style = ...
  id = ...
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  separators = "|" | "" | "?" | "|,"
  open = "[" | "{" | "|" | "" | "(" | "||" | "?" | "<" | "?" | "?" | "?" | "c" | " " | "¨{¨" | "¨|¨" | "?" | "a" | "]" | "open" | "{{lessthan}}" | "?" | "\"{\"" | ")" | "&#060;" | "¨||¨"
  openclosebrackets = ""
  columnspacing = "200 px;"
  wrs:valign = "middle-baseline" | "middle"
  close = "]" | "}" | "" | "|" | ">" | "||" | "?" | ")" | "?" | "?" | "?" | " " | "¨¨" | "¨|¨" | "?" | "[" | "{{greaterthan}}" | "?" | "\"}\"" | "&#062;" | "¨}¨" | "¨||¨"
  mathcolor = "#??????"
  mathvariant = "bold" | "normal" | "bold-italic"
>

438269 instances
<mtd
  columnalign = "left" | "center" | "right"
  class = ...
  columnspan = "1" | "3"
  id = ...
>

242575 instances
<msqrt
  dir = "rtl"
  mathcolor = "#??????"
  xmlns = "http://www.w3.org/1998/Math/MathML"
  id = ...
>

229436 instances
<mtr
  mathsize = "small"
  columnalign = "left" | "right"
  class = ...
  mathbackground = "#??????"
  id = ...
>

209278 instances
<mstyle
  xmlns = "http://www.w3.org/1998/Math/MathML"
  fontweight = "bold"
  indentalign = "left" | "center" | "right"
  mathsize = "? px" | "? pt" | "? em" | "normal" | "? %" | "\"18px\"" | "24" | "18" | "14" | "38" | "8"
  encoding = "LaTeX"
  displaystyle = "true" | "false" | "\"false\"" | "Ã?"falseÃ?"Ã?" | "false''" | ""true"" | "font-family:'Times New Roman' true" | "" | "¨false¨"
  mathvariant = "italic" | "bold" | "normal" | "bold-italic" | "sans-serif" | "fraktur" | "script"
  mathcolor = "#??????" | "red" | "green" | "blue" | "black" | "Black" | "Green" | "DarkGreen"
  denomalign = "center"
  numalign = "center"
  class = ...
  mathbackground = "#??????"
  id = ...
  rowspacing = "? ex"
  scriptsizemultiplier = ".85"
  style = "font-family: 'Euclid Fraktur';font-weight: normal;font-style: normal;" | "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"
  scriptlevel = "0" | "+1" | "-1"
  fontfamily = "Palatino, serif" | "Palatino, serif;" | "serif"
  lineleading = "? ex"
>

107766 instances
<mover
  accent = "true" | "false"
  wrs:positionable = "false"
  class = ...
  mathcolor = "#??????"
  mathbackground = "#??????"
  id = ...
  align = "center"
>

98094 instances
<mtext
  style = "border-color: black" | "font-size: larger;"
  mathbackground = "#??????"
  id = ...
  dir = "rtl"
  class = ...
  xmlns = "http://www.w3.org/1998/Math/MathML"
  mathsize = "? pt" | "? px"
  xml:lang = "es"
  label = "unit"
  mathcolor = "#??????" | "0d87c5"
  matcholor = "#??????"
  mathvariant = "bold" | "bold-italic" | "double-struck" | "italic" | "normal" | "script"
>

96791 instances
<mtable
  columnalign = ...
  mathsize = "? px"
  displaystyle = "true" | "false"
  wrs:columnalign = "relation" | "center center relation" | "relation center left" | "relation relation relation" | "center relation center" | "relation center relation relation" | "relation center relation" | "center relation"
  mathcolor = "#??????"
  columnspacing = ...
  columnlines = ...
  class = ...
  frame = "solid" | "none" | "dashed"
  equalcolumns = "true" | "false"
  rowalign = ...
  id = ...
  width = "? %"
  rowspacing = ...
  align = "center" | "axis" | "right" | "axis 3"
  style = "text-align:axis;" | "" | "text-align: axis;" | "display: block; margin-top: 1.0em; margin-bottom: 2.0em" | "text-align:axis"
  equalrows = "true" | "false"
  fontsize = "? px"
  rowlines = ...
  columnwidth = "auto fit"
>

55664 instances
<menclose
  border = "1"
  notation = ...
  class = ...
  mathcolor = "#??????" | "#ff000"
  xmlns = "http://www.w3.org/1998/Math/MathML"
  align = "center"
>

43909 instances
<msubsup
  class = ...
  mathcolor = "#??????"
  id = ...
>

36164 instances
<mroot
  mathcolor = "#??????"
  xmlns = "http://www.w3.org/1998/Math/MathML"
  id = ...
>

32125 instances
<munder
  wrs:positionable = "false"
  underaccent = "false"
  accentunder = "false" | "true"
  class = ...
  mathcolor = "#??????"
  id = ...
>

26913 instances
<semantics
  style = "line-height: 22.28px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"
  id = ...
>

13441 instances
<mmultiscripts
  mathcolor = "#??????"
>

13273 instances
<mprescripts>

12801 instances
<munderover
  accent = "false"
  accentunder = "false"
  mathcolor = "#??????"
>

7094 instances
<msrow>

2781 instances
<maction
  actiontype = "argument" | "\"argument\"" | "argumentvalue"
  mathcolor = "#??????"
>

2344 instances
<msline
  position = "2" | "1" | "3" | "4" | "6"
  length = "2" | "3" | "6" | "5" | "4" | "1" | "14"
  mathcolor = "#??????"
>

2169 instances
<mstack
  charspacing = "? px"
  mathcolor = "#??????"
  stackalign = "right"
  charalign = "center"
>

1446 instances
<mlongdiv
  longdivstyle = "shortstackedrightright"
  charspacing = "? px"
  mathcolor = "#??????"
  stackalign = "left"
  charalign = "center"
>

1423 instances
<msgroup>

883 instances
<mpadded
  height = "? pt"
  lspace = "- ? px" | "+ ? px"
  voffset = "+ ? px" | "- ? px"
  width = "+ ? pt" | "0"
  voffsett = "- ? em"
  mathcolor = "#??????"
  depth = "? pt"
>

252 instances
<maligngroup
  class = ...
>

197 instances
<mphantom
  font-style = "normal"
>

174 instances
<malignmark>

88 instances
<mlabeledtr>

49 instances
<ms
  mathcolor = "#??????"
>

15 instances
<merror
  class = ...
>

4 instances
<mscarries
  location = "nw" | "s"
>

3 instances
<matrixrow>
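The listing above collapses concrete values into patterns such as "? em" and "#??????". How that was done is not stated in this thread; the sketch below is one guess at a normalisation that yields the same shapes, with every name in it invented for illustration.

```python
import re

# Six-digit hex colours collapse to "#??????".
HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")
# Signed or unsigned numbers followed by a unit collapse to "? unit".
LENGTH = re.compile(r"^([+-]?)\s*(?:\d+\.?\d*|\.\d+)\s*(em|ex|px|pt|cm|%)$")


def normalize_value(value):
    """Replace the numeric part of lengths and colours with placeholders.

    Anything that does not match (keywords, malformed input, unitless
    numbers) is returned unchanged so that it stays visible in the report.
    """
    if HEX_COLOR.match(value):
        return "#??????"
    match = LENGTH.match(value)
    if match:
        sign, unit = match.groups()
        prefix = f"{sign} " if sign else ""
        return f"{prefix}? {unit}"
    return value


for raw in ["#ff0000", "1.5em", "-0.2em", "+3px", "80%", "thinmathspace", "#ff000", "1"]:
    print(repr(raw), "->", repr(normalize_value(raw)))
```

Leaving non-matching values untouched is what lets malformed input, such as a five-digit colour or a unitless "1", stand out in the summary.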

@NSoiffer
Contributor

NSoiffer commented Apr 30, 2019 via email

@NSoiffer
Contributor

NSoiffer commented May 6, 2019 via email

@NSoiffer
Contributor

NSoiffer commented May 6, 2019 via email

@fred-wang fred-wang added the group admin Tracking agenda and other administrative issues label May 16, 2019
@fred-wang

I had put this on GitHub, but I believe it would be really nice to have a better process for collecting replies to the survey, a page that presents the results in a consistent way, and a way for us to update the questions.

@runarberg

runarberg commented Dec 29, 2019

I’m a bit late to the game here, but I’m the author of Mathup (npm; GitHub), an authoring library that transforms an AsciiMath-like syntax into MathML. I had pretty much abandoned the project, but there are still a few users (mostly in their custom browser-based notebooks, where they take math notes on the fly). I revisited the project last month and am planning a complete rewrite. Below are my answers to the survey:

  1. Description: Ascii2MathML—an AsciiMath-like to MathML converter
  2. Native MathML: Yes. See website. Future versions plan to offer a .toString(), .toDOM(), and .toVirtualDOM() options all in MathML.
  3. MathML Elements: annotation, math, menclose, mfenced, mfrac, mi, mn, mo, mover, mroot, mrow, msqrt, msub, msubsup, msup, mtable, mtd, mtr, munder, munderover, and semantics.
  4. MathML Attributes: mathvariant, bevelled, veryverythickmathspace.
  5. Attributes on the mstyle Element: None. The tool does not use the mstyle element at all.
  6. Attribute Values: lspace and rspace with a value of veryverythickmathspace.
  7. Trailing/Leading Whitespace in Token Elements: No.

Note that in my current rewrite I plan to drop deprecated elements and attributes (such as mfenced).
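For tools that currently emit named spaces such as veryverythickmathspace on lspace/rspace, a migration could substitute explicit em lengths. The sketch below assumes the MathML 3 definitions of the named spaces (1/18em for veryverythinmathspace up to 7/18em for veryverythickmathspace, with "negative" variants); the function name and the rounding to four significant digits are choices made for this example only.

```python
from fractions import Fraction

# Named spaces in increasing size; the i-th name stands for i/18 em.
_NAMES = [
    "veryverythinmathspace", "verythinmathspace", "thinmathspace",
    "mediummathspace", "thickmathspace", "verythickmathspace",
    "veryverythickmathspace",
]
NAMED_SPACES = {name: Fraction(i, 18) for i, name in enumerate(_NAMES, start=1)}


def replace_named_space(value):
    """Turn a named space into an explicit em length; pass others through."""
    negative = value.startswith("negative")
    name = value[len("negative"):] if negative else value
    if name not in NAMED_SPACES:
        return value
    em = float(NAMED_SPACES[name])
    if negative:
        em = -em
    return f"{em:.4g}em"


print(replace_named_space("veryverythickmathspace"))
print(replace_named_space("negativethinmathspace"))
print(replace_named_space("0.5em"))
```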

@davidcarlisle
Collaborator

This survey proved useful but is now complete; closing.
