IFTB draft #151
Hi @skef, could you please go to your W3C account and link your GitHub account to it? That enables the IPR bot to recognize you. Thanks!
Here are some high-level thoughts; we can likely discuss a lot of this during the upcoming TPAC meeting:
IFTB.bs
Outdated
TR: https://www.w3.org/TR/IFTB/
ED: https://w3c.github.io/IFT/IFTB.html
Editor: Chris Lilley, W3C, https://svgees.us/, w3cid 1438
Editor: Myles C. Maxfield, Apple Inc., mmaxfield@apple.com, w3cid 77180
You should remove Myles and add yourself as an editor.
IFTB.bs
Outdated
"chunk set" (which could be a bitmap or std::vector<bool> indexed by chunk index.
5. The browser then look up each layout feature in the font subset description in the IFTB table
featureTable. That table maps the initial GID-mapped chunks to higher-indexed feature-specific
chunks. If any chunk in the set maps to a feature-specific-chunk the latter is added to the set.
Can you clarify here if chunks that are added by a feature are allowed to trigger further additions on later features? This process will need to be very precisely defined so that multiple implementations will always arrive at the same result.
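To make the ambiguity concrete, here is a minimal sketch of the two readings. The `feature_map` layout (feature tag mapped to a dict of base chunk to feature-specific chunk) and both function names are hypothetical and are not taken from the draft's featureTable encoding; the sketch also assumes features are processed in the order given.

```python
from typing import Dict, Iterable, Set

# Hypothetical in-memory form of the feature map:
# feature tag -> {base chunk index -> feature-specific chunk index}
FeatureMap = Dict[str, Dict[int, int]]


def add_feature_chunks_single_pass(
    gid_chunks: Set[int], features: Iterable[str], feature_map: FeatureMap
) -> Set[int]:
    """Reading 1: only the original GID-mapped chunks can trigger additions."""
    result = set(gid_chunks)
    for feature in features:
        for base, extra in feature_map.get(feature, {}).items():
            if base in gid_chunks:  # test against the initial set only
                result.add(extra)
    return result


def add_feature_chunks_cascading(
    gid_chunks: Set[int], features: Iterable[str], feature_map: FeatureMap
) -> Set[int]:
    """Reading 2: chunks added by one feature can trigger later features."""
    result = set(gid_chunks)
    for feature in features:
        for base, extra in feature_map.get(feature, {}).items():
            if base in result:  # test against the growing set
                result.add(extra)
    return result


feature_map = {"liga": {2: 10}, "dlig": {10: 11}}
single = add_feature_chunks_single_pass({2}, ["liga", "dlig"], feature_map)
cascading = add_feature_chunks_cascading({2}, ["liga", "dlig"], feature_map)
```

With this input the first reading loads chunks 2 and 10, while the second also loads chunk 11, so two conforming clients could disagree unless the text picks one. The second reading additionally depends on feature order, which would also need to be specified.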
Overview.bs
Outdated
===========================================================

<!-- TODO: remove obsolete tag once the separate range request spec is published -->
Range request incremental font transfer is specified in a separate document: [[RangeRequest obsolete]]
Binned Incremental Font Transfer is specified in a separate document: [[IFTB obsolete]]
[[IFTB]] won't work until we've committed the IFTB document. As a workaround, you can use a regular old link <a href="IFTB.html">...</a> until the document is committed.
Overview.bs
Outdated
static arrangement
of bins makes it more compatible with caching, including regional caching. ("More compatible" in the
sense that chunk files will see a higher cache hit rate compared with subset and patch files.) All IFTB
data is compressed at Brotli level 11 upfront. IFTB transfers all other tables
Brotli level 11 is an implementation detail of the current open source Brotli encoder, not something that is part of the Brotli standard. Maybe just say maximum quality instead?
@@ -229,6 +255,10 @@ Opt-In Mechanism {#opt-in}

<em>This section is general to both IFT methods.</em>

(XXX Because IFTP is a protocol and IFTB is a format, I suspect most of this section and the related
technology questions are superfluous.)
Agreed. I think we'll end up with tech(incremental-patch) used with patch subset and format(new-iftb-format-name) for IFTB. Then there's no need for the incremental-auto/incremental-range mechanism and related text.
I agree but am leaving this as-is for now.
Offset32 CFFCharStringsOffset - 0 if glyf-based
Offset32 gidMapOffset
Offset32 chunkOffsetListOffset
Offset32 featureMapOffset
Here and wherever else Offset fields are used, be sure to mention what those offsets are relative to.
IFTB.bs
Outdated
The chunk set is a bit array indicating whether the corresponding chunk is
present. The bits for chunks 0 through 7 are in chunkSet[0], those for 8
through 15 are in chunkSet[1], and so on. Within a byte the lowest chunk index
is represented by the 1s bit, then the 2s, then the 4s, and so on.
To be as unambiguous as possible, it's helpful to define the mapping to bits within a byte in terms of the most and least significant bits (e.g., the least significant bit is chunk 0, the most significant bit is chunk 7).
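As an illustration of that wording, here is a minimal sketch of the bit layout described in the draft text, with chunk 0 in the least significant bit of chunkSet[0]. The function names are hypothetical and only serve the example.

```python
def chunk_is_present(chunk_set: bytes, chunk_index: int) -> bool:
    """Return True if the bit for chunk_index is set (LSB-first within each byte)."""
    byte_index, bit_index = divmod(chunk_index, 8)
    return bool((chunk_set[byte_index] >> bit_index) & 1)


def mark_chunk_present(chunk_set: bytearray, chunk_index: int) -> None:
    """Set the bit for chunk_index in place."""
    byte_index, bit_index = divmod(chunk_index, 8)
    chunk_set[byte_index] |= 1 << bit_index


chunk_set = bytearray(2)  # room for chunks 0 through 15
for index in (0, 7, 9):
    mark_chunk_present(chunk_set, index)
```

After these calls chunkSet[0] is 0x81 (chunk 0 in the least significant bit, chunk 7 in the most significant bit) and chunkSet[1] is 0x02 (chunk 9).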
file. The string must contain substrings of "$1", "$2", "$3", "$4" and/or "$5", which must be replaced
with the corresponding hexidecimal digits of the chunk index ("$1" being the ones digit, "$2" being the
sixteens digit, and so on) to get the relative URI of the chunk. (This can then be combined with the
URL of the initial IFTB font file to produce the absolute URL of the chunk.)
What is the encoding of the strings? (e.g., ASCII? UTF-8?)
I think we should adopt whatever currently makes sense from a w3c perspective. I believe URIs were ASCII for a long time but I don't know if that's still true. We should also be careful about the dollar signs.
https://datatracker.ietf.org/doc/html/rfc6570 is probably relevant here. A standardized way to have template parameters in a URL and it also talks about encoding in section 1.6.
I looked through this. Given that IFTB would/will be a w3c spec maybe we should consult @svgeesus at a future meeting about the template format. The powers that be might prefer something more elegant than my dollar-sign expansion (although the spec allows for it).
For the time being I've been assuming the encoding will wind up being either ASCII or whatever corresponds to an "encoded" url.
It should be OK to reference an IETF RFC in this spec; we reference quite a few in the patch subset specification. You can add a reference like this: [[RFC6570]]. Agreed, we can discuss this further at the next meeting.
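For reference in that discussion, here is a minimal sketch of the dollar-sign expansion as the draft text currently describes it. The function name, the example template, and the example URL are hypothetical; emitting lowercase hex digits is an assumption, since the draft text does not say which case to use.

```python
from urllib.parse import urljoin


def expand_chunk_template(template: str, chunk_index: int) -> str:
    """Replace "$1".."$5" with hex digits of chunk_index ("$1" is the ones digit)."""
    result = template
    for position in range(1, 6):
        digit = (chunk_index >> (4 * (position - 1))) & 0xF
        # Lowercase hex is an assumption; the draft does not specify the case.
        result = result.replace(f"${position}", format(digit, "x"))
    return result


# Hypothetical template and font URL, for illustration only.
relative = expand_chunk_template("chunks/$3/$2$1.br", 0x1A2)
absolute = urljoin("https://example.com/fonts/myfont.iftb", relative)
```

Here `relative` is "chunks/1/a2.br" and `absolute` is "https://example.com/fonts/chunks/1/a2.br". An RFC 6570 template would presumably replace the "$n" placeholders with named variables, but the digit extraction would stay the same.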
IFTB.bs
Outdated
* The set of glyphs contained in the chunks loaded through the GID and feature maps must be a superset
of those in the GID closure of the font subset description.

The encoder has three options for addressing with joint dependencies on individual glyphs:
I would clearly mark this part (the 3 options below) as being non-normative since it's just providing advice on how an encoder might be built and not actually laying out requirements that an encoder must follow.
Or, as another option, this could be reframed as things that the encoder may do which are valid. For example, it's valid to place a glyph in more than one chunk if needed.
process, it can be moved to bin 0 where it will always be included in the initially loaded file.

Glyph Bin Locality {#iftb-bin-locality}
---------------------------------------
This section is probably also non-normative.
One additional thought regarding this: another approach that could be used to reduce the size of the initial file would be to split the font into a small number of subsets where possible and have each individual subset then augmented by IFTB (using unicode-range in CSS to select the subsets that are needed). For example, in the CJK case you could (if it wouldn't break layout rules) split the font into a high usage and a low usage subset. Then, if a client doesn't need anything from the low usage subset, it would only download and augment the high usage subset, thereby avoiding transmitting all of the layout and metric information for the low usage glyphs. I don't think spec changes are needed to accommodate this, but it's probably worth mentioning in the section talking about optimizing the encoder.
Starting to work through some of these comments ...
After thinking about this question for a while I've arrived at some philosophy, or perhaps ideology, for myself: the role or purpose of IFT is to try to improve the network transfer of font data. Accordingly, when considering local storage size we should generally be following developments in general font technology, or influencing such general development, but not worrying about or attempting unilateral improvements.

There are lots of potential ways of organizing a font file. If one were primarily concerned about persistent storage size, one could store the whole file compressed. If one were primarily concerned about in-memory size, one could store most of the per-glyph data together, perhaps on power-of-two boundaries, so that it could be easily paged in. If one were worried about both, one could do both: store the per-glyph data in compressed, possibly page-aligned chunks, loading and decompressing only those chunks needed at a given time.

The general direction of font development in recent years has gone the opposite way. CFF had the advance in the glyph, glyf has phantom points. OpenType has moved much of this data into separate tables for better access during shaping. And although there is currently a component proposal to the ISO ad-hoc group for reducing file size, it chose to leave hmtx and vmtx as they were.

But IFTB is saving substantial amounts of space: that of any un-merged glyph data. And that it doesn't do so for other tables like hmtx and vmtx is pretty inherent to its GID-preservation-based design. Changing GIDs would (I think) require building knowledge of basically all the shaping tables back into the client side, making it more complicated than range-request would have been. And experimenting with run-length-encoded h/vmtx, and therefore requiring shapers to be updated to support those formats, seems beyond the scope of our project.

I suppose the counter-argument would be that web fonts are more ephemeral on a given system than system fonts, so there is more to save space. But I'm not convinced on that basis.
You're right, this shouldn't be a requirement, nor (as I advocated several months ago) should recalculating the checksums be a requirement. How about we change the language to suggest that clients be enhanced to use the version directly and ignore the checksums, but indicate that in contexts where that is not possible/desirable the client implementation can make the necessary adjustments?
I am open to this discussion but we should think about the implications, not currently discussed in my added documentation (as far as I remember). With IFT you have initial loads and augmentations. Each of these involves retrieving data you need to render the page correctly. There is an already well-established related question of what to do in the meantime. What many browsers do is render with fallback fonts, resulting in the oft-complained-about web font "flash" or "flicker". If we continue to include the shaping data in IFTB, browsers will have more rendering options, particularly when it comes to augmentation, because they will be able to arrive at the final page layout once they have the base file. Backup glyphs might then be temporarily coerced into the metrics of the IFTB font and replaced when loaded. Beyond that specific consideration, I'm not sure IFTB will ever be the right format if you're worried about 40k here or 80k there out of initially huge font files like the Noto fonts, just because you will almost inevitably be loading many more glyphs than you'll need for rendering, which will overwhelm those numbers. And looking at your spreadsheet, the plurality of the initial compressed font size is still in the CFF (or glyf/gvar for ttf) table, which I suspect is mostly the result of my crappy encoder. We should think about these files as having a much smaller initial percentage of glyphs.
Seems like a huge mess but we can talk about it.
".otf" seems easy enough to replace if we want to. I'm more reluctant to move away from ".woff2" for the compressed files for two reasons:
For "traditional" ligature scenarios I'm not sure I'm convinced of the benefit. Are there cases where you'll really be able to group enough related ligatures together to warrant an independent chunk? Or are you thinking it would be worth having single-glyph chunks just for this case? Maybe looking at some specific ligature scenarios would help convince me. OTOH I can see how something like this might be more directly relevant to emoji fonts. I haven't dug into emoji specifics very much in thinking about IFTB.
I made some updates |
I went through my IFTB commits from earlier this year and reorganized them against the current main branch. Unfortunately the result doesn't quite build yet, I think because the separate IFTB document (which does build) isn't in W3C's bibliography.
A few notes: