Machine-readable metainformation for manifest items? #1675
I'm pinging @llemeurfr, who leads the Text and Data Mining Reservation Protocol Community Group.
Isn't this just a case of resource-specific linked records?
Currently, rdf+xml is not listed as a core media type. Example 20 is surprising as well. However, without mentioning such a use case in an extended example, I think the chance is low that authors will use it and that programmers of upload filters will implement it.

Open question: how does one refine a fragment of a resource?

Another surprise is that a linked resource must not be listed in the manifest. Wouldn't it be better to say that linked files that are not core media types must not be listed in the manifest without a fallback, and are not expected to be of interest to the human audience?
Doesn't matter. Link elements are not publication resources, so they are also not subject to fallback rules. You can link to whatever works if this becomes a reality and the information isn't better stored in each resource.
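To make the "resource-specific linked records" idea concrete, here is a minimal Python sketch of how an upload filter might collect `link rel="record"` elements whose `refines` attribute points at a manifest item. The package snippet, the file names and the function name `resource_records` are hypothetical. The check that a linked record is not also listed in the manifest mirrors the draft rule discussed in this thread and should be verified against the current EPUB 3.3 text.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package document: meta/cover-rights.rdf is a linked record
# that refines the manifest item "cover-img".
PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <link rel="record" href="meta/cover-rights.rdf"
          media-type="application/rdf+xml" refines="#cover-img"/>
  </metadata>
  <manifest>
    <item id="cover-img" href="images/cover.jpg" media-type="image/jpeg"/>
  </manifest>
</package>"""


def resource_records(package_xml):
    """Map manifest item ids to the hrefs of linked records that refine them."""
    root = ET.fromstring(package_xml)
    items = list(root.iterfind("opf:manifest/opf:item", OPF_NS))
    item_ids = {item.get("id") for item in items}
    item_hrefs = {item.get("href") for item in items}
    records = {}
    for link in root.iterfind("opf:metadata/opf:link", OPF_NS):
        if "record" not in (link.get("rel") or "").split():
            continue  # only linked records are of interest here
        target = link.get("refines", "")
        if not target.startswith("#") or target[1:] not in item_ids:
            continue  # whole-publication record, or refines something else
        href = link.get("href")
        if href in item_hrefs:
            # Draft rule quoted in this thread (assumption): linked
            # resources must not also be listed in the manifest.
            raise ValueError(f"linked record {href} is listed in the manifest")
        records.setdefault(target[1:], []).append(href)
    return records


print(resource_records(PACKAGE))  # {'cover-img': ['meta/cover-rights.rdf']}
```

Keeping the rights data in a separate linked record means the filter never has to parse the image itself; the trade-off is that `refines` only reaches a whole manifest item, not a fragment of it.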
Do it inline if the format allows. Otherwise, use a relative path with fragment in the
All RDF serializations (RDF/XML, Turtle, or JSON-LD) define fragment identifiers, so that should not be a problem.
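Because RDF subjects can carry fragment identifiers, metadata about a quote or an SVG fragment could name a subject such as `text/ch1.xhtml#quote-3`. A filter then has to split that IRI into the manifest item and the fragment. The following is a minimal sketch; the base IRI `http://example.invalid/OEBPS/content.opf`, the manifest contents and the function name `resolve_subject` are hypothetical.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urldefrag, urljoin

# Hypothetical base IRI standing in for the package document's location.
PACKAGE_BASE = "http://example.invalid/OEBPS/content.opf"


def resolve_subject(
    subject: str, manifest: Dict[str, str], base: str = PACKAGE_BASE
) -> Tuple[Optional[str], Optional[str]]:
    """Split an RDF subject IRI into (manifest item id, fragment id).

    Relative IRIs are resolved against the package base, the fragment is
    removed, and the remaining document IRI is compared with each manifest
    href resolved against the same base.
    """
    document, fragment = urldefrag(urljoin(base, subject))
    for item_id, href in manifest.items():
        if urljoin(base, href) == document:
            return item_id, fragment or None
    return None, fragment or None


manifest = {"ch1": "text/ch1.xhtml", "fig1": "images/fig1.svg"}
print(resolve_subject("text/ch1.xhtml#quote-3", manifest))  # ('ch1', 'quote-3')
print(resolve_subject("images/fig1.svg", manifest))         # ('fig1', None)
```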
The discussion moved very quickly from a use case (the rise of upload filters, via Article 17 of the EU DSM Directive) to tentative techniques (e.g. links to RDF metadata or embedding RDF in HTML), but the link between the two is not clear to me. The primary use case seems to be: a person or organization wants to upload an EPUB 3 publication to some platform that has an EPUB 3 upload filter. The rights holder of the publication has notified the platform that this publication, or fragments of it, should not be allowed on the platform, so the filter blocks it. This means that the publication must be unambiguously identified; that the notion of a "fragment" must be clearly defined and each fragment properly identified; that the way rights holders notify platforms must be clear and easy; and that an efficient mechanism must be put in place to stop bad actors from falsely claiming rights to content (which is exactly what happens in the music industry today).

This is a large project, and metadata in EPUB files is only a small part of it. Robust content identification (of publications and fragments) can be achieved via ISCC, and the most interesting use of this technology I've seen is the Content Blockchain Project, presented at a previous Digital Publishing Summit. What is interesting about ISCC is that there is no requirement to embed the ID in the publication: it can be computed easily.
The TDM Reservation Protocol will be used in a totally different use case: rights holders who freely give access to content on the Web will notify TDM actors whether or not they accept having that content "mined", using simple means like HTTP headers, HTML metadata or a JSON-LD file on the origin server.
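As an illustration of the HTTP-header route, here is a minimal sketch that reads a TDM reservation from response headers. It assumes the header names `tdm-reservation` (value `1` or `0`) and `tdm-policy` (a policy URL), as I understand them from the TDMRep Community Group work; check them against the current report. The function name is hypothetical, and headers are passed as a plain mapping so no network access is needed.

```python
from typing import Mapping, Optional, Tuple


def tdm_reservation(headers: Mapping[str, str]) -> Tuple[Optional[bool], Optional[str]]:
    """Return (reserved, policy_url) from HTTP response headers.

    reserved is True for "1", False for "0", and None when the header is
    absent or has an unexpected value. Header names are matched
    case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    reserved = {"1": True, "0": False}.get(lowered.get("tdm-reservation"))
    return reserved, lowered.get("tdm-policy")


print(tdm_reservation({"TDM-Reservation": "1", "TDM-Policy": "https://example.com/tdm-policy.json"}))
# (True, 'https://example.com/tdm-policy.json')
```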
Yes, ISCC (https://content-blockchain.org/) sounds interesting. If a platform that provides media for other uses publishes a list of identifiers for the licences it grants, EPUB creators can link their book, or their creator/contributor identifier, to such a list of accepted usage licences. For this to work automatically after publication, EPUB still needs some normative structure as a reliable interface for providing this information at creation time, so that creators can indicate, in an automatically checkable way, that such a licence exists for a specific book fragment. A first step could be a normative method for providing at least such an indication for content under a CC licence or a similar licence.
The issue was discussed in a meeting on 2021-05-21.
4. Machine-readable meta information for manifest items?
See GitHub issue #1675.
Dave Cramer: Proposal to put metadata in the package file, particularly about copyright on specific manifest items.
Brady Duga: I don't understand the use case.
Matt Garrish: This seems half-baked.
Tzviya Siegman: Seems like this might be about the EU copyright directive.
Tzviya Siegman: Might be about the scholarly world, where you can detect if something is shareable.
Hadrien Gardeur: I think in general, when we get these requests, we should point out the current extensibility.
Dave Cramer: We should close the issue, nothing to see here.
Dave Cramer: That was all our issues.
Due to new efforts, for example in the EU, platforms for digital books may introduce upload filters: AI programs that detect possible copyright issues in EPUBs.
It could therefore be very helpful to relate the corresponding metainformation to each manifest item directly inside the OPF document, to reduce the probability of false positives, which cause a lot of annoyance for authors.
This can be relevant for fonts, images, graphics (SVG documents and SVG fragments inside XHTML documents), articles, quotes, etc.
Is there currently a way to do this in a normative, unambiguous way?
If it is not already possible to provide such information within the OPF file, what needs to be added to EPUB 3.3 to allow authors to provide additional information about manifest items, and even about fragments of manifest items?
An alternative could be to use meta elements with a refines attribute within the metadata element to associate metadata with documents (and document fragments) referenced in the manifest.
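A minimal Python sketch of the `refines` approach: it groups `meta` elements by the manifest item they refine and skips refinements of other metadata (such as the `dc:creator` role below). The package snippet, the choice of `dcterms:creator` and `dcterms:rights` as properties, and the function name `metadata_by_item` are illustrative assumptions, not a normative pattern.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package document: two meta elements refine the manifest item
# "fig1"; one refines the dc:creator element and must be ignored here.
PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000001</dc:identifier>
    <dc:creator id="author">A. Writer</dc:creator>
    <meta refines="#author" property="role" scheme="marc:relators">aut</meta>
    <meta refines="#fig1" property="dcterms:creator">Jane Example</meta>
    <meta refines="#fig1" property="dcterms:rights">CC BY 4.0</meta>
  </metadata>
  <manifest>
    <item id="fig1" href="images/fig1.svg" media-type="image/svg+xml"/>
  </manifest>
</package>"""


def metadata_by_item(package_xml):
    """Map manifest item ids to (property, value) pairs from refining meta elements."""
    root = ET.fromstring(package_xml)
    item_ids = {item.get("id") for item in root.iterfind("opf:manifest/opf:item", OPF_NS)}
    result = {}
    for meta in root.iterfind("opf:metadata/opf:meta", OPF_NS):
        target = meta.get("refines", "")
        if target.startswith("#") and target[1:] in item_ids:
            value = (meta.text or "").strip()
            result.setdefault(target[1:], []).append((meta.get("property"), value))
    return result


print(metadata_by_item(PACKAGE))
# {'fig1': [('dcterms:creator', 'Jane Example'), ('dcterms:rights', 'CC BY 4.0')]}
```

Note that `refines` here can only point at a whole manifest item, which is exactly the fragment limitation this issue raises.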
However, this metadata element currently has a poor structure; it would be much better to use RDF within it to provide such metadata.
In this case, it would be helpful for both authors and programmers of such upload filters to have some (normative?) advice on how to correlate metainformation about licences, authors, external sources, etc. of items with such parts of the book.
Maybe similar to
https://www.w3.org/TR/epub-33/#example-4
But this might still be a problem if only a fragment of a manifest item is the target of the information (a quote, an SVG fragment, an article).
An explicit example of how to correlate the metadata would be helpful in the 3.3 draft.
Currently, RDF can already be used within svg:metadata for such information.
HTML5 also suggests RDF as an option for such information within the XHTML head element:
https://www.w3.org/TR/html52/dom.html#metadata-content-2
(I think epubcheck 4.2 currently does not accept this in XHTML, even though HTML5 makes it valid for the XML serialization; within svg:metadata, however, epubcheck 4.2 accepts it.)
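To illustrate the svg:metadata route, here is a minimal Python sketch that reads `dc:rights` statements from an RDF block embedded in an SVG document. The SVG content, the `rdf:about="#logo"` subject and the function name `svg_rights` are illustrative assumptions. Only Dublin Core `dc:rights` inside `rdf:Description` is handled, not arbitrary RDF/XML.

```python
import xml.etree.ElementTree as ET

NS = {
    "svg": "http://www.w3.org/2000/svg",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

# Hypothetical SVG whose metadata describes the fragment "#logo".
SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="#logo">
        <dc:creator>Jane Example</dc:creator>
        <dc:rights>CC BY-SA 4.0</dc:rights>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <g id="logo"><rect width="10" height="10"/></g>
</svg>"""


def svg_rights(svg_xml):
    """Map each rdf:about subject in svg:metadata to its dc:rights text."""
    root = ET.fromstring(svg_xml)
    result = {}
    for desc in root.iterfind("svg:metadata/rdf:RDF/rdf:Description", NS):
        rights = desc.findtext("dc:rights", namespaces=NS)
        if rights is not None:
            result[desc.get(RDF_ABOUT, "")] = rights.strip()
    return result


print(svg_rights(SVG))  # {'#logo': 'CC BY-SA 4.0'}
```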
With regard to this possible upload-filter problem, providing information about such relevant metadata within the OPF file might increase the probability that it is recognised by AI programs.