Produce standalone fonts when subsetting #27

wezm · 2020-06-03T06:46:44Z

2022 Update:

cmap generation when subsetting, which the original text of this issue focused on, landed in Allsorts 0.9. However, this does not get us all the way to generating standalone fonts. This issue will act as a tracking issue for the things that are required to achieve that.

Original text:

The subsetting feature is currently tailored for the needs of subsetting fonts for embedding in PDFs, since that was our primary use case when developing Allsorts. The issue is that we don't include a cmap table in the subset font, which makes it invalid for use outside PDF. When a subset font is embedded in a PDF the cmap info is contained in the PDF directly, so we don't need to include it in the font.

In order to support more general subsetting, it would be convenient to have an entry point that takes a list of chars and produces a font with glyphs for just those chars. This would be an incremental improvement on what we have so far and would still have some limitations: with chars as input, there wouldn't be a way to include ligature glyphs. Doing so would require subsetting the GPOS and GSUB tables as well, which is a problem for another day.

The subsetting code lives in subset.rs. The new function signature could be along these lines:

```rust
/// Subset this font so that it only contains the glyphs for the supplied `chars`.
pub fn subset_chars(
    provider: &impl FontTableProvider,
    chars: &[char],
) -> Result<Vec<u8>, ReadWriteError>
```

The implementation would need to map chars to glyph ids using a technique similar to this. The subset font would need to include a new cmap table (probably using the Unicode platformID). There are a number of formats to choose from to encode the data. An initial implementation might just choose one of the simpler ones, at the cost of a larger resulting font. A more sophisticated implementation could examine the data to determine the best option.
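To make the format choice concrete, the sketch below shows one way to go from sorted (char, new glyph id) pairs to cmap data. Format 4 can only address BMP code points, so anything beyond U+FFFF needs format 12, whose subtable is a list of sequential map groups. `format12_groups` and `pick_format` are hypothetical helpers, and the format selection is a simple heuristic rather than a size-optimal choice.

```rust
use std::collections::BTreeMap;

/// One sequential map group as stored in a cmap format 12 subtable.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct SequentialMapGroup {
    pub start_char_code: u32,
    pub end_char_code: u32,
    pub start_glyph_id: u32,
}

/// Collapses (char, glyph id) pairs into format 12 groups: a run of consecutive
/// code points whose glyph ids are also consecutive becomes a single group.
pub fn format12_groups(mappings: &BTreeMap<char, u16>) -> Vec<SequentialMapGroup> {
    let mut groups: Vec<SequentialMapGroup> = Vec::new();
    for (&ch, &gid) in mappings {
        let code = ch as u32;
        let gid = u32::from(gid);
        if let Some(last) = groups.last_mut() {
            // The code point and glyph id that would extend the current group.
            let next_code = last.end_char_code + 1;
            let next_gid = last.start_glyph_id + (last.end_char_code - last.start_char_code) + 1;
            if code == next_code && gid == next_gid {
                last.end_char_code = code;
                continue;
            }
        }
        groups.push(SequentialMapGroup {
            start_char_code: code,
            end_char_code: code,
            start_glyph_id: gid,
        });
    }
    groups
}

/// Hypothetical heuristic: format 4 if every char is in the BMP, else format 12.
pub fn pick_format(mappings: &BTreeMap<char, u16>) -> u16 {
    if mappings.keys().all(|&c| (c as u32) <= 0xFFFF) {
        4
    } else {
        12
    }
}
```

When consecutive chars map to consecutive glyph ids (for example, if new glyph ids are assigned in code point order), each run collapses into one group, which keeps the subtable small.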


ebraminio · 2020-06-03T06:52:52Z

The issue is that we don't include a cmap table in the subset font, which makes it invalid for use outside PDF.

Some PDF readers also don't work properly without a valid cmap (https://crbug.com/1071958); I guess it is needed for their text selection to work.

yisibl · 2020-08-21T19:41:05Z

Looking forward to this feature.

wezm · 2022-03-29T05:58:19Z

I've just released 0.9, which implements building of a proper cmap table for subset fonts.

yisibl · 2022-03-29T06:33:39Z

@wezm Can you upgrade the dependency version in allsorts-tools?

Looks like it can be solved: yeslogic/allsorts-tools#16

wezm · 2022-03-29T07:12:24Z

Yes, I'm working on that next. I have a draft PR open for it: yeslogic/allsorts-tools#18

wezm · 2022-03-30T01:32:53Z

Reopening as we strip the OS/2 table which is required in OpenType fonts.
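For reference, the OpenType specification lists cmap, head, hhea, hmtx, maxp, name, OS/2 and post as required tables, in addition to the outline tables (glyf/loca or CFF/CFF2). A small check like the hypothetical `missing_required_tables` below, run on the table tags of a subset font, would catch the dropped OS/2 table; it is a sketch, not an Allsorts function.

```rust
/// Tables the OpenType specification requires in every font
/// (outline tables such as glyf/loca or CFF are checked separately).
const REQUIRED_TABLES: [[u8; 4]; 8] = [
    *b"cmap", *b"head", *b"hhea", *b"hmtx",
    *b"maxp", *b"name", *b"OS/2", *b"post",
];

/// Hypothetical helper: returns the required tags absent from `present`.
pub fn missing_required_tables(present: &[[u8; 4]]) -> Vec<String> {
    REQUIRED_TABLES
        .iter()
        .filter(|tag| !present.contains(*tag))
        .map(|tag| String::from_utf8_lossy(&tag[..]).into_owned())
        .collect()
}
```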

yisibl · 2022-03-30T04:44:07Z

@wezm I've submitted a PR to fix it, PTAL: #58

yisibl · 2023-01-03T11:41:18Z

Happy New Year! Any progress here?

wezm · 2023-01-03T23:41:35Z

No, sorry, it's a pretty big piece of work that has not been scheduled yet.

dnlmlr · 2023-03-02T19:55:07Z

Hey! I am also trying to use subsetting for embedded fonts in PDF documents. Since I want to avoid getting too deep into the low-level PDF structure, I am just using the genpdf -> printpdf -> lopdf stack. The plan was to embed the full subset font into the PDF files without touching the PDF's internal /Differences mappings.

I got it to work on all tested PDF readers and printers with the current implementation of subset, even though the OS/2 table is missing, but only if Unicode encoding records are used (mappings with CharExistence::BasicMultilingualPlane or CharExistence::AstralPlane). If CharExistence::MacRoman or CharExistence::DivinePlane is used, it doesn't work.

Would it be a sensible thing to allow forcing the default mode to be Unicode or are there any problems with this?

One workaround that I think I'll be using for now is to manually add a '€' character to the glyph_ids subset so that it can't be encoded with MacRoman, but this is not the nicest solution and will be a problem if a font doesn't actually have '€'.

wezm · 2023-03-06T00:42:50Z

Would it be a sensible thing to allow forcing the default mode to be Unicode or are there any problems with this?

I don't think that would make sense as a default as it would unnecessarily inflate the font. There is already an internal CmapStrategy enum used to drive some of the cmap generation behaviour. A new variant could be added to that and then some way to select that strategy could be added.

dnlmlr · 2023-03-06T11:01:11Z

Yeah I agree that it shouldn't be default, since this is kind of an edge case. What I meant was a way to externally change the encoding mode, for example as a parameter to the subset function. Basically any mechanism that would allow to optionally prevent encoding with MacRoman.
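The option discussed above could be sketched roughly as follows. `Encoding`, `choose_encoding` and the `force_unicode` flag are hypothetical: they only mirror the CharExistence categories mentioned earlier and are not Allsorts' internal CmapStrategy. The Mac Roman membership test is deliberately simplified to ASCII.

```rust
/// Hypothetical encodings, ordered from narrowest to widest coverage.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Encoding {
    MacRoman,
    BasicMultilingualPlane,
    AstralPlane,
}

/// Simplified stand-in: treats only ASCII as Mac Roman. The real Mac OS Roman
/// set also covers 128 non-ASCII chars, so a full version needs that table.
fn in_mac_roman(c: char) -> bool {
    c.is_ascii()
}

/// Narrowest encoding that can represent a single char.
fn narrowest(c: char) -> Encoding {
    if in_mac_roman(c) {
        Encoding::MacRoman
    } else if (c as u32) <= 0xFFFF {
        Encoding::BasicMultilingualPlane
    } else {
        Encoding::AstralPlane
    }
}

/// Picks the narrowest encoding covering every char. The hypothetical
/// `force_unicode` flag skips MacRoman so a Unicode subtable is always used.
pub fn choose_encoding(chars: &[char], force_unicode: bool) -> Encoding {
    let widest = chars
        .iter()
        .map(|&c| narrowest(c))
        .max()
        .unwrap_or(Encoding::MacRoman);
    if force_unicode && widest == Encoding::MacRoman {
        Encoding::BasicMultilingualPlane
    } else {
        widest
    }
}
```

Forcing Unicode only changes the result when every char would fit in Mac Roman; otherwise the chosen encoding is already a Unicode one, so the default behaviour (and font size) is unaffected.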

wezm added the subsetting label Jun 3, 2020

wezm mentioned this issue Jun 3, 2020

Does the subsetting feature support non-roman characters? #26

Closed

wezm mentioned this issue Jan 19, 2022

otf is not a valid font file #56

Closed

wezm closed this as completed Mar 29, 2022

wezm reopened this Mar 30, 2022

wezm mentioned this issue Apr 1, 2022

fix: OS/2 tables must be included when subsetting OpenType fonts #58

Closed

wezm mentioned this issue Sep 15, 2022

Subset command does not produce complete font yeslogic/allsorts-tools#16

Open

dnlmlr mentioned this issue Mar 2, 2023

Don't include unused fonts in the PDF document fschutt/printpdf#134

Merged

dnlmlr mentioned this issue Apr 3, 2023

Fix subsetting for fonts that dont have € fschutt/printpdf#136

Merged

wezm mentioned this issue Jul 13, 2023

xx.otf is not a valid font file after use subset #92

Open

wezm mentioned this issue Dec 21, 2023

Serializing into WOFF2? #101

Closed
