Proposal for modern handling of errata-corrige or greater changes in ePub3 publications #24

Open
P5music opened this issue Dec 13, 2021 · 13 comments

@P5music

P5music commented Dec 13, 2021

Proposal for modern handling of errata-corrige or greater changes in ePub3 publications

ePubs can have subsequent editions, but here I mean a different kind of change that can be made to an ePub:
I know that when you publish an ePub through some of the major self-publishing platforms, it is possible to upload a corrected version of the same ePub, which customers can download to update their copy if they want.
This means that the markup in the XHTML documents can change with no change to the ISBN or to any other publication identifier in the metadata.

The above-mentioned platforms also have annotation systems (I do not know what their approach to this problem is), and so do many Reading Systems (RSs).

This proposal aims to overcome the limitations of epubcfi when something changes.
Alternatively, it could be a general mechanism that does not deal with the epubcfi system at all, given its limitations.

This would encourage publishers to make errata-corrige changes within reasonable constraints, or to provide the old version of the modified elements.

I would like to hear from you whether it would be possible to devise a method of handling errata-corrige in ePubs that
(this is the proposal)
includes the old version of an HTML element or structure alongside the new one, in a way that does not break the epubcfi values (or it does) but that keeps the old element there, hidden, still in the DOM but not displayed,
and that also allows subsequent versions (like versioning, but inside the XHTML document, not file versioning as on a website).

I think this could be tricky, but maybe HTML5 has something for that.

It could be something "included" in the new element, like a special attribute string with HTML inside (tricky for escaping with subsequent versions maybe?).

Or an attribute that, if present, refers to another element (it means "this element has a previous version, please see ID='old_chap01_par27' at the end of the DOM").

Or a special enclosing, recursive tag; I do not know.

The main goal is that the old epubcfi (or other kind of position value) held by the annotation system can still be used to retrieve the new element, while the RS knows that the element has changed and can even work out what changed and whether the old position falls within the changed range.

It would also be useful when the publication was released with a mistake in its HTML layout or appearance.

The method should also be applicable to an HTML structure, if necessary (this is vague, but I mean something "bigger" than a single HTML element).

This proposal is mainly meant to let RSs recover the positions stored by an annotation system even if something changes.
Epubcfi values are very prone to disruption when such changes happen. I made a previous proposal about requiring publishers to put an id on every HTML element, which would make the XML indexes redundant. That proposal was abandoned, and in fact it would not have been enough anyway.

This new proposal is maybe better because
-it is not so expensive when single HTML elements change (that is the common case)
-it makes a sort of versioning system available.

It would be an improvement.
This is my idea. It may be quite simple, and other ideas are welcome, because I think this would be an important addition to the ePub3 (or ePub4) specs.

Regards

@iherman
Member

iherman commented Dec 14, 2021

In my reading, this is out of scope for this Working Group, for two reasons:

  • None of the current EPUB documents specify an annotation system for EPUB content; at the moment, this is fully under the purview of the Reading Systems, i.e., it is implementation-specific. Personally, I deplore that situation and would love to see a specification for an interoperable way to handle annotation by EPUB Readers, but that is the reality. Note that W3C has a Recommendation for annotation in general[1][2], but, alas!, it is not widely implemented (although it may be a first step for such interoperable EPUB annotations).
  • The WG's charter clearly says that any new feature should be first incubated in the Publishing Community Group. What you describe here (if my understanding is correct) is a typical case for a feature that would require incubation. I would suggest you join the CG and raise this issue there, and see if you can, together with others, incubate this feature to get to a maturity (implementation facility, etc) so that it could come to a standardization in this WG or its descendant.

[1] Web Annotation Data Model
[2] Web Annotation Protocol

@mattgarrish
Member

There is a specification for interoperable distribution/interchange of annotations in EPUBs, but it died with the IDPF and no one has shown any interest in reviving it: http://idpf.org/epub/oa/

The change from the CG's "Open Annotation" to the WG's "Web Annotation" didn't materially affect the EPUB spec., as far as I recall, but it did get CFIs into the list of selectors in the W3C version.
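
For illustration, a minimal sketch of what a Web Annotation that targets an EPUB location through a CFI might look like, built as a Python dictionary. The annotation id, publication URL, CFI value, and note text are made-up placeholders. The use of a `FragmentSelector` whose `conformsTo` IRI points at the EPUB CFI specification is an assumption that should be checked against the Web Annotation Data Model Recommendation.

```python
import json

# IRI assumed here to identify EPUB CFI fragments for a FragmentSelector.
EPUB_CFI_SPEC = "http://www.idpf.org/epub/linking/cfi/epub-cfi.html"


def make_cfi_annotation(annotation_id, publication_url, cfi, note):
    """Build a Web Annotation whose target is selected by an EPUB CFI fragment."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": note},
        "target": {
            "source": publication_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": EPUB_CFI_SPEC,
                "value": cfi,
            },
        },
    }


annotation = make_cfi_annotation(
    "https://example.org/annotations/1",      # hypothetical annotation id
    "https://example.org/books/sample.epub",  # hypothetical publication URL
    "epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)",
    "Check this sentence against the erratum.",
)
print(json.dumps(annotation, indent=2))
```

Nothing in such an annotation records which revision of the publication it was made against, which is the gap this issue is about.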

At any rate, instead of closing shouldn't this just be transferred to the CG's repository to let them decide how to pursue?

@P5music
Author

P5music commented Dec 14, 2021

@mattgarrish

I saw that, but it seems to be about annotations at publication level, correct me if I am wrong.

My proposal, by contrast, is mainly about users' annotations.
But it is more general in fact,
because it is about being able to have a sort of versioning of HTML elements inside the XHTML documents,
in a way that makes it possible to detect when a change happened within a certain range (the annotation range or point).

When it is not possible to put the annotation in the exact place, the RS could present the user a possible comparison between the two versions of the HTML element or structure.

This is also related to epubcfi values because they can point even to an x,y position on an image, for example, and some RSs can annotate at that level of precision.
The position of a character within an element's text can also be recovered from "xxx,yyy" text hints (the text before and after the position), which act as redundant information rather than relying on offsets alone.

But epubcfi values are very prone to being disrupted by changes in the HTML code, even with the hypothetical improvement of using IDs instead of XML indexes.

Regards

@iherman
Member

iherman commented Dec 14, 2021

because it is about being able to have a sort of versioning of HTML elements inside the XHTML documents, in a way that makes it possible to detect when a change happened within a certain range (the annotation range or point).

If this is what you are talking about, then this means introducing a mechanism within (X)HTML and not specifically to EPUB. The approach of this WG is to (try to) keep away from adding any mechanism into HTML and rely on, instead, the HTML standard as is. (Older versions of EPUB have gone down the route of EPUB-specific features, like switch, and that wasn't really a success...) All the more reason to say: this is out of scope for this WG.

At any rate, instead of closing shouldn't this just be transferred to the CG's repository to let them decide how to pursue?

of course, but only if @P5music agrees.

@P5music
Author

P5music commented Dec 14, 2021

@iherman

I do not mean adding new features to (X)HTML. This can be handled at the ePub level, I think.
My proposal has some suggestions in it, like attributes, ids and so on.

Maybe some methods could be devised and discussed informally here before submitting a proposal to another group.

For example, what about:
-ids pointing to hidden DOM elements appended at the end of the document, so XML indexes are not changed
-an attribute with the old HTML content inside (the old HTML can contain the attribute too, for even older HTML)
-JavaScript-queryable information about changed elements (like an array of previous versions) present as a script in the page
-using some HTML5 tags
-clever methods that come to mind later
-combination of methods

Thanks
Regards

@mattgarrish
Member

it seems to be about annotations at publication level

The open annotation specification isn't restricted to publisher-authored annotations, if that's what you're getting at. It allows anyone to author and distribute annotations and also allows for annotations to be interchanged between reading systems regardless of who they were authored by. It doesn't handle changes to the document content, of course (i.e., as in providing a revision history).

in a way that does not break the epubcfi values (or it does) but that keeps the old element there, hidden, still in the DOM but not displayed

That doesn't sound possible to me. Even if CFIs worked on the DOM (which they don't), you can't insert elements and then expect a CFI that is based on element position to still work. Whether you hide the element or use a comment, it's in the markup/DOM, so you're shifting the position of every sibling element that follows that new markup.
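
The shift can be seen in a small sketch. It assumes the usual CFI convention that element children are numbered with even steps (2 for the first element, 4 for the second, and so on) and uses Python's `xml.etree.ElementTree` on made-up markup; the ids, the hidden `a_old` paragraph, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def cfi_step(parent, child):
    """Return the CFI-style step of `child` among `parent`'s element children.

    Element children get even steps: 2 for the first, 4 for the second, ...
    """
    return 2 * (list(parent).index(child) + 1)


def steps_by_id(xhtml):
    """Map each child's id to its step within the root element."""
    root = ET.fromstring(xhtml)
    return {child.get("id"): cfi_step(root, child) for child in root}


ORIGINAL = "<body><p id='a'>One</p><p id='b'>Two</p><p id='c'>Three</p></body>"

# Hypothetical revision: the old version of paragraph 'a' is kept, hidden,
# right next to the corrected paragraph.
REVISED = (
    "<body><p id='a'>One (fixed)</p><p id='a_old' hidden='hidden'>One</p>"
    "<p id='b'>Two</p><p id='c'>Three</p></body>"
)

before = steps_by_id(ORIGINAL)
after = steps_by_id(REVISED)
for element_id in ("a", "b", "c"):
    print(element_id, before[element_id], "->", after[element_id])
```

Paragraph `a` keeps its step, but `b` and `c` each move up by two, so a purely positional path stored for either of them now lands on a different element.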

CFIs already have methods for correction, though. Beyond IDs, you can also add text locators to help reposition the annotation in case of changes. Web Annotations has similar selectors that can adapt to minor changes in the markup.
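
As a rough illustration of text-level repositioning, the sketch below re-finds a quoted passage by its exact text plus a little surrounding context, in the spirit of a text-quote selector. It is not the CFI correction algorithm or any library's API; the function names, the scoring rule, and the sample sentences are assumptions made for the example.

```python
def _common_prefix_len(a, b):
    """Number of leading characters that `a` and `b` share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def _common_suffix_len(a, b):
    """Number of trailing characters that `a` and `b` share."""
    return _common_prefix_len(a[::-1], b[::-1])


def reanchor(text, exact, prefix="", suffix=""):
    """Return the offset of the occurrence of `exact` whose context best
    matches `prefix` and `suffix`, or None if `exact` does not occur."""
    best, best_score = None, -1
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        score = (_common_suffix_len(text[:start], prefix)
                 + _common_prefix_len(text[end:], suffix))
        if score > best_score:
            best, best_score = start, score
        start = text.find(exact, start + 1)
    return best


# The annotation was made on the second "sat" of the old text.
OLD = "The cat sat. The dog sat on the mat."
# Hypothetical erratum: a word is added at the start, shifting every offset.
REVISED = "Yesterday the cat sat. The dog sat on the mat."

old_offset = reanchor(OLD, "sat", prefix="The dog ", suffix=" on")
new_offset = reanchor(REVISED, "sat", prefix="The dog ", suffix=" on")
print(old_offset, "->", new_offset)
```

The stored offset changes, but the quoted text and its context still identify the intended occurrence, which is why text-level hints tolerate small edits better than element positions do.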

Flexibility generally has to be found at the text level, as selectors solely based on element structure are always going to be brittle. Packing more elements into a document to retain its history sounds like it will lead to greater and greater brittleness with every change made to a publication.

the RS could present the user a possible comparison between the two versions of the HTML element or structure.

Maybe, but this sounds complicated to implement and is certainly beyond the scope of this group.

As I understand it, you're effectively asking publishers to make their documents a record of every change that has ever occurred to them. There's also a missing component of how all these current and past fragments are linked together so a machine can understand what it's processing.

That's really going beyond EPUB into devising a new model for HTML that allows you to view a document's change history over time. If that's the primary objective of your proposal, the proper route for proposing new HTML features is to go through WICG as it wouldn't be something this group could implement.

@P5music
Author

P5music commented Dec 14, 2021

@mattgarrish
@iherman

I am interested in the Open Annotation document; I will read it.

Even if CFIs worked on the DOM (which they don't), you can't insert elements and then expect a CFI that is based on element position to still work. Whether you hide the element or use a comment, it's in the markup/DOM, so you're shifting the position of every sibling element that follows that new markup.

There's also a missing component of how all these current and past fragments are linked together so a machine can understand what it's processing.

The most sensible of the methods proposed above could be appending further elements beyond the end of the official document. No change is required for old-style readers or publishers.
The appended elements are referred to by an ID.
The publisher knows that "beyond the end of the document" means that those elements are hidden and are not counted when XML indexes are computed, because they come after the last official element (which corresponds to the last possible official index).

That region of the document can also hold even older versions of the same element (that is, older than the old version). Indeed, every element can be the older version of another element, provided that the newer version carries the required special attributes, like
oldVersionID='chap01_par7_ver2' versionTimestamp='2014-03-20T09:32:30Z'
All the older elements (and those older than the older ones) are at the end of the document. They are in a sequence, so it is not particularly error-prone.
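
A minimal sketch of how an RS might follow such a chain of versions. It assumes the `oldVersionID` and `versionTimestamp` attributes proposed here (they are not part of any EPUB or HTML specification), that old versions are marked with the standard `hidden` attribute, and that `versionTimestamp` records when an element replaced the version it points to. The ids, the texts, and the `version_history` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up document: current content first, old versions appended at the end.
XHTML = """<body>
  <p id="chap01_par7" oldVersionID="chap01_par7_ver2"
     versionTimestamp="2014-03-20T09:32:30Z">Third wording.</p>
  <p id="chap01_par8">Unchanged paragraph.</p>
  <p id="chap01_par7_ver2" hidden="hidden" oldVersionID="chap01_par7_ver1"
     versionTimestamp="2013-11-02T10:00:00Z">Second wording.</p>
  <p id="chap01_par7_ver1" hidden="hidden">First wording.</p>
</body>"""


def version_history(root, element_id):
    """Return (id, versionTimestamp, text) tuples, newest version first."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    history, seen = [], set()
    current = by_id.get(element_id)
    # `seen` guards against a malformed document whose chain loops back.
    while current is not None and current.get("id") not in seen:
        seen.add(current.get("id"))
        history.append((
            current.get("id"),
            current.get("versionTimestamp"),
            "".join(current.itertext()),
        ))
        current = by_id.get(current.get("oldVersionID"))
    return history


root = ET.fromstring(XHTML)
history = version_history(root, "chap01_par7")
for entry in history:
    print(entry)
```

Because the old versions come after `chap01_par8`, the positions of the visible paragraphs among their siblings stay the same as before the correction.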

I see that annotations have timestamps, as expected, so RSs are able to handle that.

Packing more elements into a document to retain its history sounds like it will lead to greater and greater brittleness with every change made to a publication.

That is true, but in the common cases it is very lightweight: it is just a matter of keeping the old elements tidily at the end of the document.
The rare cases are also acceptable, because the value of the method is that it does not break annotations when publishers choose to implement it, and sometimes that matters a great deal.
The rare cases could include rewriting an entire section. That can be a large change, but not losing the annotation positions is very valuable, especially when the annotations are an important part of the publication while being a separate publication themselves, or when they are simply the user's content.
Even if it adds to the size of the publication, it is up to the publisher not to make many mistakes, or huge ones. This should only be a specification that makes it possible to implement the mechanism when necessary, as an emergency device.

Usually minor changes are involved.
If an entire section is changed, especially its structure, epubcfis are lost to a great extent, but, as I said, in that case the more powerful or institutional RSs could show the user a complete comparison and help recover the position.

That's really going beyond EPUB into devising a new model for HTML

I do not think it is about HTML pages at all, only ePub3. It can be optional, with an explanation of how to comply, that is:

for publishers
-put old elements as hidden elements beyond the end of the "official document"
-put two attributes like oldVersionID='chap01_par7_ver2' versionTimestamp='2014-03-20T09:32:30Z'

for readers or epubcfi processors
-when traversing the DOM (or simply as a step of the algorithm), be aware that once those attributes are encountered, epubcfis are no longer valid beyond that point (i.e., within that branch of the DOM tree); you have to use special features to read the old structure if your epubcfis or annotation positions fall there.
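
The reader-side rule could be sketched roughly as follows. This is only an illustration under stated assumptions: positions are reduced to a list of even, CFI-style element steps, `versionTimestamp` is taken to be the moment an element replaced its older version, and an annotation counts as affected when it was created before that moment. The document, the status labels, and all function names are hypothetical.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


def parse_ts(value):
    """Parse an ISO 8601 UTC timestamp such as 2014-03-20T09:32:30Z."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def resolve_steps(root, steps):
    """Follow even element steps (2 = first child, 4 = second, ...) from
    `root` and return the elements visited, or None if a step is invalid."""
    path, node = [], root
    for step in steps:
        children = list(node)
        index = step // 2 - 1
        if step % 2 or not 0 <= index < len(children):
            return None
        node = children[index]
        path.append(node)
    return path


def check_annotation(root, steps, created):
    """Classify a stored position as 'valid', 'stale', or 'unresolvable'.

    For 'stale', also return the id of the old version to consult.
    """
    path = resolve_steps(root, steps)
    if path is None:
        return ("unresolvable", None)
    created_at = parse_ts(created)
    for element in path:
        stamp = element.get("versionTimestamp")
        if stamp and parse_ts(stamp) > created_at:
            return ("stale", element.get("oldVersionID"))
    return ("valid", None)


DOC = ET.fromstring(
    '<body>'
    '<p id="par6">Untouched.</p>'
    '<p id="par7" oldVersionID="par7_ver1" '
    'versionTimestamp="2014-03-20T09:32:30Z">Corrected.</p>'
    '<p id="par7_ver1" hidden="hidden">Original.</p>'
    '</body>'
)

print(check_annotation(DOC, [4], "2014-01-01T00:00:00Z"))
print(check_annotation(DOC, [4], "2015-01-01T00:00:00Z"))
```

An annotation made before the correction is flagged together with the id of the old version, so the RS can compare the two texts; one made afterwards is treated as valid.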

I think that this would not be impossible for an epubcfi library. I only know the library from one of the Readium sibling projects: it is good at calculating epubcfis from elements but not the other way round, although it could be my fault for not understanding how it works.
An official JS library for handling epubcfis should be created, perhaps including the new improvements.

I know this is huge, it is just a proposal.

Regards

@lordt4ever

lordt4ever commented Dec 14, 2021 via email

@iherman
Member

iherman commented Dec 14, 2021

I know this is huge, it is just a proposal.

... and, therefore, out of scope for this WG at this moment (I think I have already said that before). Could you, please, move this to the Community Group? That is where incubation ought to happen!

@mattgarrish
Member

the Community Group ... is where incubation ought to happen

Right, we may want to note more prominently on the repository main page that this WG is not the correct place for incubating new ideas.

@P5music this Working Group was created to standardize the core EPUB 3 specifications in W3C. We're using the Community Group to develop new ideas and find traction for IDPF specifications without much adoption before bringing them to this group to standardize.

I'm going to transfer this issue across to the CG's repository.

@mattgarrish mattgarrish transferred this issue from w3c/epub-specs Dec 14, 2021