Continuity of E-Library #55
Comments
Hello. Many years ago I created an ePub reader that does what you seem to want. It can annotate the ePub with text decorations (such as underlining), short text notes, or even documents of any format. If you are curious the app is at: It also has other advanced features, like diacritic text search and an advanced navigation system. But where a particular problem is addressed. The annotation positioning is sort of "proprietary" in my app, so it is not standard. What do you think? |
Ah Ah |
Thanks for your interest in the subject. Yes, I hope that W3C will be able to develop a solution like Open Annotation. Otherwise, it might encourage readers to convert their books to PDF. |
@Okan-Ozcelik |
@P5music |
This issue is puzzling to me. @P5music Who did you write to? There are hundreds of W3C mailing lists. What do annotations have to do with e-libraries? What does chatGPT have to do with any of this? Please do not confuse writing a specification with adoption of standards. Is this issue about the ability to take notes in ebooks? share notes across platforms? |
@TzviyaSiegman |
@TzviyaSiegman |
Hi @Okan-Ozcelik I think I understand better now. But let me try to summarize and you can confirm. Annotations are just as important to the text for a user as the text itself. You are looking for a way we can clearly attach annotations to an ebook that persists. I assume it would also be interoperable across platforms (like if I annotate my book on Apple Books, and then open the file in Thorium, my highlights and notes carry across). Am I understanding this correctly? |
@wareid this is how I have understood the issue and I will add my vote for the importance of preserving notes. Currently I go to a lot of trouble to save my notes by exporting them to Calibre and making backups of the database from my reader, but they are dissociated from the ebook file and therefore of limited usefulness, and this method is dependent on a plugin developed specifically for one brand of reader so not universal. I recently had to repair the corrupted database of my Kobo, by replacing it with a previous backup. I made sure to restore my annotations as well, however some of them were lost (even though only the database was corrupted, none of my ebooks were damaged). This process was fastidious and probably not accessible to many people as it involved using SQL database editors to manipulate specific tables of the database before copying it back to the reader, which I had to learn to do specifically for this issue. Another problem when attempting to save notes manually / independently of the book file is that text encoding can be improperly preserved, so that special characters are not correctly displayed. Example
A standardised annotation system, integrated into the epub file, would have prevented all of these problems and would allow me to keep those notes for use on future ebook readers even if the Calibre plugin is no longer developed, or the database of my reader becomes corrupted again or my reader is otherwise damaged or lost, or Kobo modifies their software or the proprietary method of annotations currently in use, or I switch to a different kind of reader. |
Agreed. If you can add meta information to a JPG file, it should be easy to include an XML file in the manifest for notes.
Best Regards,
Dale
Dale R Rogers, M.Ed, CIW
Digital Creative Entrepreneur
Instructional Designer, eLearning Developer
Web: dalerogers.me<https://dalerogers.me/>
From: AudreyLBE ***@***.***>
Date: Friday, February 17, 2023 at 9:34 AM
To: w3c/publishingcg ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [w3c/publishingcg] Continuity of E-Library (Issue #55)
@wareid<https://github.com/wareid> this is how I have understood the issue and I will add my vote for the importance of preserving notes. Currently I go to a lot of trouble to save my notes by exporting them to Calibre and making backups of the database from my reader, but they are dissociated from the ebook file and therefore of limited usefulness, and this method is dependent on a plugin developed specifically for one brand of reader so not universal.
I recently had to repair the corrupted database of my Kobo, by replacing it with a previous backup. I made sure to restore my annotations as well, however some of them were lost (even though only the database was corrupted, none of my ebooks were damaged). This process was fastidious and probably not accessible to many people as it involved using SQL database editors to manipulate specific tables of the database before copying it back to the reader, which I had to learn to do specifically for this issue.
Another problem when attempting to save notes manually / independently of the book file is that text encoding can be improperly preserved, so that special characters are not correctly displayed.
Example
Saved note (highlighted text):
Une seule chose est sûre, toutes ces maisons nâ€Ǧexisteront plus, aussi mes efforts sont-ils infimes, ils peuvent tenir sur une tÃȞte dâ€Ǧépingle, tout comme ma vie. Et ça, il ne faut jamais lâ€Ǧoublier.
Original text of this highlight:
Une seule chose est sûre, toutes ces maisons n’existeront plus, aussi mes efforts sont-ils infimes, ils peuvent tenir sur une tête d’épingle, tout comme ma vie. Et ça, il ne faut jamais l’oublier.
A standardised annotation system, integrated into the epub file, would have prevented all of these problems and would allow me to keep those notes for use on future ebook readers even if the Calibre plugin is no longer developed, or the database of my reader becomes corrupted again or my reader is otherwise damaged or lost, or I switch to a different kind of reader.
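The garbling in the saved note above resembles the usual pattern where UTF-8 bytes are decoded with a single-byte code page somewhere along the export path. A minimal sketch in Python, assuming the wrong code page is Windows-1252 (the thread does not say which tool or encoding was actually involved, so the exact garbled characters may differ from the ones quoted):

```python
# Sketch: how UTF-8 text turns into mojibake when its bytes are decoded
# with the wrong code page, and how to reverse it if no bytes were lost.
# Assumption: the wrong code page is Windows-1252 (cp1252).

original = "n’existeront plus, tête d’épingle"

# A writer stores UTF-8 bytes; a later tool wrongly reads them as cp1252.
garbled = original.encode("utf-8").decode("cp1252")

# Reversing the mistake: re-encode with the wrong code page, decode as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)
print(repaired == original)
```

The repair only works while every garbled character still maps back to its original byte. An interchange format that fixes the encoding (for instance, JSON in UTF-8) avoids the guesswork entirely.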
|
@Okan-Ozcelik I couldn't agree more: such an annotation/note system would be great to have. Let me add one more thing: you describe a set of use cases around e-books but, in fact, those use cases may also be valid for the Web in general (I know that many pages on the Web are ephemeral, unlike books, but a large percentage of Web content is just as stable as books). The relationship between e-books and the Web is all the stronger because the underlying technology is identical; an e-book in EPUB format is, as a rough approximation, a Website in a package. However, your description refers to W3C as being the organization that can achieve that. The good/bad news is that W3C has already done what it could do in this respect; indeed, consider these three specifications: the Web Annotation Data Model, the Web Annotation Vocabulary, and the Web Annotation Protocol.
These specifications exist. They may not be perfect and may require improvements, no doubt about that, but they have the merit of existing. However, they have not been adopted, either by browser vendors or by EPUB readers. The closest thing to adoption is the system offered by Hypothes.is; it is an extension that works with the major browsers on Web sites, and you can share Web page annotations (although, last I checked, you cannot set up your own server to store those annotations), but the major EPUB readers have not adopted it. There may be some smaller readers that have incorporated Hypothes.is (with the aforementioned restriction), but what would be a game changer is for Apple, Google, Amazon, and the other big ones, as well as browser vendors (Chrome, Firefox, etc.), to pick the specifications up and implement them natively. That is where the influence of W3C ends, though: it can shepherd groups to develop specifications, but it cannot force any company to implement them. |
Hi @wareid, |
Saving the annotations in the ePub itself is not good because it would work only for minor annotations, while sometimes, for example, the work of a scholar could be to comment on a huge book of thousands of pages (translate that into ePub length), producing a lot of text or even attaching a lot of documents. |
You are partly right. Still, the issue I'm talking about is notes that are already as advanced as notes that can be taken on PDF. |
OK, now I understand, but please consider that PDF is sort of the digital version of a printed book, so you are limited to the kind of annotations you can put on the pages of a book. More refined annotation systems include attachments too, but everything goes either inside a new PDF (a copy) that is bloated or inside the same original document, which is saved again on disk. Then a major issue comes to my mind: how would you undo the modifications to the ePub XHTML? Does the reader keep track of the annotations so that it can remove them, or do they become part of the book? These are the concerns I had in mind when creating my app. |
I think the more sustainable and likely option would be generating an accompanying annotations file that can live alongside the EPUB file and that reading systems can ingest to display annotations. The web annotations framework mentioned by Ivan is all we really need, but the challenge comes in implementation. Reading systems would need to both generate and ingest a web annotations file, which is not something we've seen in practice yet. We'd also need an agreed-upon and universally supported location method like CFI. We've long taken the approach in this industry of never modifying the source file, especially not a version that could be pulled out of the system it's living in (obviously we all know individual platforms do modify files to ensure optimal rendering on their platform, but those modifications never travel outside of the platform). An accompanying file is the best solution. |
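The accompanying-file idea can be sketched roughly as follows. This is only an illustration: the `<book>.annotations.json` naming rule, the function names, and the book identifier are hypothetical, while the JSON shape follows my reading of the W3C Web Annotation Data Model, with the location expressed as an EPUB CFI inside a `FragmentSelector`.

```python
import json
import tempfile
from pathlib import Path

# URI the Web Annotation Data Model uses to say "this fragment is an EPUB CFI".
EPUB_CFI_SPEC = "http://www.idpf.org/epub/linking/cfi/epub-cfi.html"


def sidecar_path(epub_path: Path) -> Path:
    """Hypothetical naming rule: book.epub -> book.annotations.json."""
    return epub_path.with_suffix(".annotations.json")


def make_note(book_id: str, cfi: str, note: str) -> dict:
    """Build one annotation whose target location is given as an EPUB CFI."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": note, "format": "text/plain"},
        "target": {
            "source": book_id,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": EPUB_CFI_SPEC,
                "value": cfi,
            },
        },
    }


def save_annotations(epub_path: Path, annotations: list) -> Path:
    """Write annotations beside the EPUB; the EPUB itself is never modified."""
    out = sidecar_path(epub_path)
    out.write_text(json.dumps(annotations, ensure_ascii=False, indent=2),
                   encoding="utf-8")
    return out


def load_annotations(epub_path: Path) -> list:
    """Read the sidecar back; an absent file simply means no annotations."""
    path = sidecar_path(epub_path)
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


# Round trip in a temporary folder (the EPUB file does not need to exist).
with tempfile.TemporaryDirectory() as tmp:
    epub = Path(tmp) / "book.epub"
    note = make_note(
        "urn:uuid:00000000-0000-0000-0000-000000000000",  # placeholder book id
        "epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)",
        "tout comme ma vie",
    )
    written = save_annotations(epub, [note])
    restored = load_annotations(epub)
```

Because the sidecar is plain UTF-8 JSON, copying it next to the same EPUB on another device would be enough for a reading system that understands the format to restore the notes.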
@wareid As to ePub CFI, it is interesting because it allows ranges of text to be annotated, underlined, and so on. Images can also be annotated with x-y coordinates, if I am not wrong. Moreover, if my memory does not fail me, ePub CFI has a useful path prefix rule so that other ePub publications can also be cited or linked (although not necessarily in an automatic way). The ePub CFI values would be useful as a reference to an exact point in an ePub, but the issue remains of what happens when the HTML changes, even for minor modifications that break the DOM tree (that is: the tree is now different, maybe from the beginning). This is important not only for the minor changes that publishers make to an ebook to fix bugs or errata, but also for subsequent editions. |
Though technically feasible, I do not see the option of extending the EPUB file coming to the fore in implementations, at least in the way the current EPUB implementation strategies work. I think the main obstacle is what @wareid noted in #55 (comment): implementations frown upon modifying the EPUB file itself to add content. There are technical issues (e.g., today, each user has his/her own "copy" of the EPUB file on a local storage, how would the update of these files work to get everyone's annotation added to the local copies?) and non-technical (e.g., who would have the copyright on the annotations, how would that be handled alongside the rights of a specific publication?). The beauty of the aforementioned W3C Web Annotation Framework is that these issues do not arise. There is no modification of the original text (Website or EPUB instance) whatsoever. Everything is stored in an annotation server; each annotation stored by the server has a reference (via a URL and a description of where the annotation resides within the referenced content) and the annotation itself. There is a standard protocol to communicate with the server to get/modify annotations. An annotation server can be private, can be managed by a university or indeed a publisher for its own publications, and the "only" thing a browser or an EPUB reading system ought to do is to communicate with a server (possibly of the reader's choice) to get, display, and create annotations. As I mentioned before, the W3C did "its job", the problem is that the non-trivial implementation work has not materialized. The reason, in my view, is mostly non-technical: what is the business incentive to get EPUB Reading Systems (and browsers) to implement such an annotation system? I am not an expert of business, I do not know which communities could create a strong enough pressure to provide such an incentive. But I believe that is really the core of the issue. 
The rest is doable, the possible deficiencies in the standards are solvable (and I believe W3C would be more than happy to pick up the standardization work for that), but the pressure should come from the implementers. There is a certain analogy with the recent story around activity streams. There are similarities between the respective standard families around activity streams and annotations (actually, there were contacts between the groups, so this is not entirely by coincidence) and activity streams had a very low implementation and usage level for many years. It needed the recent turmoil around Twitter to suddenly bring Mastodon to the fore, and now everybody talks about activity streams, web mentions, etc. We may need some similar storm around annotations... |
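To make the server-based architecture described above a little more concrete, here is a sketch of the request a reading system might build to store one annotation. The container URL and the book identifier are hypothetical, and the request is only constructed, not sent; the media type and profile are the ones I understand the W3C Web Annotation Protocol to use for creating an annotation in a container.

```python
import json
import urllib.request

# Hypothetical annotation container; a real server would publish its own URL.
CONTAINER_URL = "https://annotations.example.org/my-library/"

# Media type with the Web Annotation JSON-LD profile.
ANNO_MEDIA_TYPE = 'application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"'

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "Remember this passage."},
    "target": {
        # Placeholder identifier for the publication being annotated.
        "source": "urn:uuid:00000000-0000-0000-0000-000000000000",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "tout comme ma vie",
        },
    },
}


def build_create_request(container_url: str, anno: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST that asks the server to store `anno`."""
    return urllib.request.Request(
        container_url,
        data=json.dumps(anno).encode("utf-8"),
        method="POST",
        headers={"Content-Type": ANNO_MEDIA_TYPE},
    )


request = build_create_request(CONTAINER_URL, annotation)
# Sending would be: urllib.request.urlopen(request)
# It is left out here because the server above does not exist.
```

The original publication is never touched: the server only stores the annotation together with a description of where it points.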
@iherman Since many considerations have been raised along this thread, pointing out different aspects that a single person may not be aware of, or may not recall at the time ideas are freely expressed, it seems that leaving the different parties to decide how to manage annotations is the best option. This is because many technologies have been introduced and users have shown that they like them, for example collaborative annotation processes and other workflows. New ways of working and annotating could also be in use very soon. But this technology is a huge advantage until you realize that PDFs have pages, which can bear many modifications without breaking that structure, while the HTML structure is easily broken even with a single tiny element added or removed. We also know that many documents can stay in draft form for a long time, and someone could still need or decide to annotate them. So I have a proposal. Of course, before processing the annotation references a backup would be advised, and only the user or an admin could decide to process the annotations in a certain community, based on the fact that a new ePub version has been "released". So when an ePub is purchased or downloaded it could already contain a "diff" information file that provides the change history, a sort of standard and structured change log. Then, if the user has no annotations yet, the references will be based on the current version because there is no need to process or update anything. If there are annotations, the RS compares the version of the ePub that the present annotations refer to with the currently available or used one, and it can inform the user of the need to process the references, provided that the user agrees to use a modified version. Indeed, the user could also refuse, having realized that a new version is not wanted.
I do not know the W3C Web Annotations standard (I have not studied it yet) and I do not know if it has anything to do with ePub CFI, but once a standard is chosen and any point of text or image (or video timestamp, and so on) can be annotated precisely, then the diff file should also be introduced, at least for ePubs, which is what we are dealing with here. What do you all think? |
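The decision flow in this proposal could look something like the sketch below. Everything in it is hypothetical (the class names, the idea of a version string stored with the annotations); it only illustrates the comparison step, not how references would actually be recalculated from a change log.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Action(Enum):
    NO_ANNOTATIONS = "anchor future annotations to the current version"
    UP_TO_DATE = "annotations already refer to the current version"
    ASK_USER = "offer to back up and re-process the references"


@dataclass
class AnnotationSet:
    # Version of the ePub that the stored references were made against.
    book_version: str
    # Opaque location strings, for example EPUB CFIs.
    references: List[str] = field(default_factory=list)


def decide(current_version: str, annotations: Optional[AnnotationSet]) -> Action:
    """Compare the annotated version with the current one, as proposed above."""
    if annotations is None or not annotations.references:
        return Action.NO_ANNOTATIONS
    if annotations.book_version == current_version:
        return Action.UP_TO_DATE
    # Versions differ: the user (or an admin) decides whether to proceed.
    return Action.ASK_USER


existing = AnnotationSet("1.0", ["epubcfi(/6/4!/4/10/3:10)"])
print(decide("1.1", existing).value)
```

The hard part, turning old references into new ones from the change history, is deliberately left out here.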
Let me repeat one thing: there is no clear business interest among publishers and/or reading systems in offering the kind of open annotation system that we are discussing here. Until such an interest is clear, all discussions remain purely academic. And W3C is not an academic institution; it standardizes industry practice, or close to industry practice, rather than inventing its own solutions. It is of course perfectly fine to discuss these things in a W3C Community Group like this one; this is where some level of incubation could and should happen. That is exactly what a CG is for. But if such a discussion aims at eventual standardization, the constraints must also be clear. (And, to be clear, I am the first to deplore the situation!) A few comments on your comments:
I am not sure what you mean by original flaw, but I do not see it happening here. The strength of EPUB is that it relies on HTML and friends, meaning that we do not have to reinvent any wheel (neither on paper nor in practice; an EPUB reading system these days relies on browser engines for rendering). HTML is used on billions of pages and used by most of the people around the world (through browsers) and I do not see any move to change or drop the DOM. This is one of those constraints. The issue of references within an HTML text breaking when the structure changes is of course known. Please look at the way the Annotation data model handles this issue: one stores a reference that is based on various mechanisms, and it is possible to describe references like "this annotation refers to the first paragraph of section 12" without modifying the DOM. And yes, EPUB CFI is, sort of, part of the system insofar as this may be one of the ways to express that statement. If new referencing systems (e.g., to point to a specific pixel in an image) come forward, it is possible to add them to the generic framework in the model (see the Selector concept).
I presume you propose this to keep the references, or at least to be able to recalculate them. This is possible, but would become extremely complex very quickly. The aforementioned model of Selectors makes this unnecessary and, I believe, it is much simpler.
I think you should look at the W3C Web Annotation standard. With all the caveats that I already expressed: the technology has been defined, it is the large scale implementation that is lacking. The very existence and existing user base of Hypothes.is, based on the standard, proves its feasibility in practice, but we get back to the business interest issue (i.e., the lack thereof). |
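As a rough illustration of why a Selector can survive structural changes, here is a sketch built around the data model's `TextQuoteSelector` (fields `exact`, `prefix`, `suffix`). The matching strategy in `resolve` is a simplification of my own, not something the specification prescribes, and the sample text is made up.

```python
from typing import Optional, Tuple


def make_text_quote_selector(text: str, start: int, end: int, context: int = 20) -> dict:
    """Describe text[start:end] by its own words plus a little surrounding context."""
    return {
        "type": "TextQuoteSelector",
        "exact": text[start:end],
        "prefix": text[max(0, start - context):start],
        "suffix": text[end:end + context],
    }


def resolve(selector: dict, text: str) -> Optional[Tuple[int, int]]:
    """Find the quoted passage again; returns (start, end) or None."""
    exact = selector["exact"]
    prefix = selector.get("prefix", "")
    suffix = selector.get("suffix", "")
    # First try the quote together with its context, which disambiguates repeats.
    pos = text.find(prefix + exact + suffix)
    if pos != -1:
        start = pos + len(prefix)
        return start, start + len(exact)
    # Fall back to the quote alone if the context was edited.
    pos = text.find(exact)
    if pos == -1:
        return None
    return pos, pos + len(exact)


first_edition = "Chapter 1. All these houses will vanish. My efforts are tiny."
quote = "My efforts are tiny"
begin = first_edition.index(quote)
selector = make_text_quote_selector(first_edition, begin, begin + len(quote))

# A later edition inserts new material before the passage, shifting everything.
second_edition = "Preface to the second edition. " + first_edition
found = resolve(selector, second_edition)
```

Nothing in the selector depends on element counts or offsets from the start of the file, so an added preface moves the match without breaking it. If the quoted words themselves are rewritten, `resolve` returns `None` and the annotation would need human attention.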
@iherman I will have a look at the W3C Web Annotations standard, because I do not know how ePub books come into the picture, I mean how a single copy of an ePub is referenced as a web resource in order to annotate some of its parts. I hope I will find information confirming that possibility in the official documentation. I know indeed that an ePub, although it resembles a website, is not one, because it is not on the internet. I have noticed that this metaphor often tricks people. In regard to the original flaw, I should have written it as "original flaw", because I use the term only in this thread, to refer to the issue I pointed out. I am a strong supporter of the ePub format and have done some development with it. So, you say that the "change log" file method would be too complex. I do not agree, because when you know that your annotations are based on a certain version of the ePub book, because you have the changes.xml file, let's say, you are able to check what happened. Such a change.xml file or change.graph (and so on; we can discuss what it should contain) would be a useful improvement to the ePub standard. Regards |
I feel like a broken record when these annotation discussions come up, but we worked with Rob and Paolo to develop the Open Annotation in EPUB standard way back when. It was part of the edupub work and kind of died off with it not because of technical challenges getting it over the finish line but because of lack of interest in implementing it by reading systems. They do their own thing internally to store and maintain annotations, and having an interchange format did not garner any interest. |
@iherman @mattgarrish
and a
We see our mission as
However, a few reading systems monopolize the eBook industry. Fewer and fewer large publishers monopolize the publishing industry. It makes me wonder: are we doing the best service to web users and readers if we let the reading system monopolies determine which features and tech are possible? Are we, in some small way, supporting monopolies and stifling competition and innovation? |
I share your frustration (I was fairly active in the development of the Web Annotation standards...). This is a situation where W3C found itself before (and I am sure it will happen again); it does not, and cannot, have a direct influence on how the market evolves. The pressure should come from other organizations, user communities, etc. But there may be sudden changes that may trigger a change. As I wrote in #55 (comment):
I am not sure what that "storm" could be, but any good ideas are welcome! |
Classic books have been reprinted and reprinted for centuries. The paper books on the shelves wear out and disappear. But the content of that book remains permanent because it is printed over and over again. What matters is the content. Users change their computers every 3-5 years. But they keep the personal content they have created. They carry their documents and music to their new computers. They are permanent. It is continuous. E-Book Readers get old like paper books on the shelf. But their content needs to be preserved. Readers may want or even have to change their devices every now and then. And of course, they will want to move the content of their e-books to the new device. They use Adobe Digital Editions to send the books to the new device.
W3C is developing EPUB, the universal book format. EPUB becomes the e-book standard. So it can be read on any e-book reader that supports that standard. Now that it has become a standard, it brings to mind MP3s, which can be listened to on almost every computer, music player. But perhaps something is missing.
What makes a book valuable are the parts that the reader likes. They highlight their favorite texts. Maybe they take notes on the page. With effort, they create their personal content. Of course, they will hope for the permanence of this content. New books are archived in the library after they have been read. But most of the books in the library will never be reread from beginning to end. Instead, only notes taken and texts highlighted are reviewed. Personal libraries last for decades. But devices, unfortunately, only last a few years. EBooks are superior because they don't wear out, because their data never disappears. But does the e-library really never disappear when it needs to be moved to a new e-reader?
When a note is made on a book, the note is not actually saved in the book. It is saved in a note file linked to the book. Later, when the book is opened, the relevant parts of the note file are parsed and displayed on the book again. But different devices have different ways of saving notes. There is no harmony. E-books can be moved to the new device, but notes cannot be transferred to the new device. On the new device, the book is as if the cover has never been opened. All the notes taken are gone!
For text highlighting to be more than a momentary technological entertainment, it must be permanent. It would be a waste of time to underline an important text while reading it if it cannot be permanent.
This is why notes can also be saved in a universal format. It can be a note file with the same name in the same folder as the book. W3C could set standards for saving notes. Perhaps this could be the EPUBNotes file type.
The device settings can now offer the following options: Notes can be saved according to the device's own note standard, or they can be saved according to the EPUB note standard. If the reader chooses to save according to the EPUB standard, the notes will be saved according to the rules set by W3C.
Imagine a future reader replacing his e-book reader: He sends the old EPUB books he bought to the device. But the device recognizes these books as new. There are no more notes. But when the reader copies the note files of the books from the old device to the new device, he will get his notes back. Any device that supports the universal EPUB format will also support these note files. EBook reader devices are ephemeral. The e-library and the personal content that the reader creates in the book must be sustained.
In fact, a perfect solution would be for W3C to develop a standard for the e-reader to save book notes on top of the EPUB file. Notes taken on PDF files are saved in the PDF file. This ensures the permanence of the notes. This is how the notes for EPUB books should be! Perhaps W3C could collaborate with Adobe on this.
Publishers will be pleased to see the design of the e-book improved. But why should they care about the quality of the reader generated content? Why should they ask W3C to improve that too? The reader gets more out of the book with the content they create. It is the reader's favorite parts that make the book valuable. They underline their favorite texts. They remember more parts of the book. So he can talk about the book at length to his friends. He can often bring up different parts of it. The reader will, of course, be advertising the book! The more the reader remembers about the book, the longer he/she will keep it on the agenda. Some of their friends will want to own the book. Then they will advertise it to their friends. The process starts to work. Now we can hope that the sales of that book will increase. It will make the publisher of the book happy.
In fact, this also makes it easier for e-reader manufacturers. They don't have to design new software algorithms for taking notes on the book. The standard rules are already in place. Manufacturers just need to write the appropriate program.