What does modification date mean? #73

Closed
lrosenthol opened this issue Sep 25, 2017 · 14 comments

@lrosenthol
commented Sep 25, 2017

If we are going to have a modification date, especially one that is a "SHOULD", we need to be extremely clear what it means.

  • Does it only apply to the content or also to metadata (such as the accessibility status)?
  • Does it apply when a template (e.g., a CMS) is changed but the content itself isn't?
  • Does it only apply to PWP and not to WP (and if so, should we wait to add it till we get there)?
iherman added a commit that referenced this issue Sep 26, 2017
@llemeurfr

Contributor

commented Oct 4, 2017

In a previous standard I worked on (NewsML-G2), the IPTC defined two different metadata properties: versionCreated, a timestamp of the current version of the "news item" (including metadata), and contentModified, a timestamp of the last edit to the news content.

Considering WP and PWP as two variants of an interchange format, I would recommend that the modification date apply to the "item", i.e., the last time the content and metadata were updated on the publishing site.
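
A minimal sketch of the two-timestamp distinction described above, assuming a hypothetical `ItemTimestamps` record. The field names only echo the NewsML-G2 property names mentioned in the comment; this is not the NewsML-G2 schema or any WP API.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class ItemTimestamps:
    """Hypothetical record holding the two timestamps discussed above."""
    version_created: datetime   # last change to content OR metadata
    content_modified: datetime  # last change to the content only


def record_change(ts: ItemTimestamps, when: datetime, content_changed: bool) -> ItemTimestamps:
    """Return new timestamps after a change made at `when`."""
    if content_changed:
        # A content edit produces a new version and a new content timestamp.
        return replace(ts, version_created=when, content_modified=when)
    # A metadata-only edit produces a new version but leaves the content timestamp alone.
    return replace(ts, version_created=when)


# Illustrative dates only.
t0 = datetime(2017, 10, 1, tzinfo=timezone.utc)
t1 = datetime(2017, 10, 4, tzinfo=timezone.utc)
ts = record_change(ItemTimestamps(t0, t0), t1, content_changed=False)
```

Under the recommendation in this comment, a single WP modification date would behave like `version_created`: it moves on metadata-only edits as well as content edits.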

@BillKasdorf

commented Oct 4, 2017

@lrosenthol

Author

commented Oct 4, 2017

@atyposh

Contributor

commented May 31, 2018

Let's check whether the mapping we (@iherman) recently made to schema.org includes an acceptable value for dcterms:modified.

@GarthConboy

Contributor

commented May 31, 2018

This is generally viewed as untrustworthy by Reading Systems, but there does not seem to be consensus on simply removing it from our basic infoset. It could well be used in various workflows.

@RachelComerford

commented Jun 1, 2018

To add to @GarthConboy's comment: this (modification date) solves a problem for publishers and their vendors by providing a quick and easy way to check the "version" of the ebook they are viewing.

Sample use case/personal experience: A student is using the Psychology 12e EPUB from Bookshare. They complain to the publisher that section 3.2 is being read by their AT before section 3.1. The publisher opens the version in their CMS, sees that it is dated 12/1/17, then checks the version in Bookshare and finds it is dated 10/1/17.
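
The check in this use case can be sketched as below. This is only an illustration: it assumes the date is exposed under a `dateModified` key (the schema.org name, used here as an assumption), that both copies carry it as an ISO 8601 string, and that the dates above are in US month/day order. The `is_stale` helper and both dictionaries are hypothetical.

```python
from datetime import datetime


def parse_modified(manifest: dict) -> datetime:
    """Read the (assumed) `dateModified` field as a timezone-aware datetime."""
    return datetime.fromisoformat(manifest["dateModified"])


def is_stale(distributed: dict, source: dict) -> bool:
    """True if the distributed copy is older than the publisher's copy."""
    return parse_modified(distributed) < parse_modified(source)


# Hypothetical manifests standing in for the two copies in the use case.
cms_copy = {"dateModified": "2017-12-01T00:00:00+00:00"}
bookshare_copy = {"dateModified": "2017-10-01T00:00:00+00:00"}

print(is_stale(bookshare_copy, cms_copy))  # prints True
```

The comparison only says which copy carries the later date; it says nothing about what changed between them.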

@atyposh

Contributor

commented Jun 1, 2018

This metadata (dcterms:modified or equivalent) may be more useful for a packaged WP (i.e., when it was packaged). In the case of an unpackaged WP, it would be quite problematic for the manifest to assert that none of its constituent resources were modified after a particular date specified therein.

Alternative protocols (e.g., expiration headers, ETags, etc.) are more appropriate for (unpackaged) WP, methinks.
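
As a rough sketch of the per-resource alternative mentioned here, the function below compares the HTTP validators (`ETag`, `Last-Modified`) stored for one resource against freshly received ones. The `resource_changed` name and the plain-dict representation of headers are assumptions for illustration; no network request is made, and a real user agent would normally send conditional requests instead.

```python
from email.utils import parsedate_to_datetime


def resource_changed(cached: dict, fresh: dict) -> bool:
    """Decide whether one resource changed, using HTTP validators only."""
    # Prefer ETag when both sides have one: any difference means a new representation.
    if cached.get("ETag") and fresh.get("ETag"):
        return cached["ETag"] != fresh["ETag"]
    # Otherwise fall back to comparing Last-Modified dates (HTTP-date format).
    if cached.get("Last-Modified") and fresh.get("Last-Modified"):
        old = parsedate_to_datetime(cached["Last-Modified"])
        new = parsedate_to_datetime(fresh["Last-Modified"])
        return new > old
    # With no usable validator, conservatively assume the resource changed.
    return True


cached_headers = {"Last-Modified": "Fri, 01 Jun 2018 10:00:00 GMT"}
fresh_headers = {"Last-Modified": "Mon, 04 Jun 2018 10:00:00 GMT"}
changed = resource_changed(cached_headers, fresh_headers)
```

Nothing here consults a manifest-level date, which is the point of the comment: freshness of an unpackaged WP is decided resource by resource.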

@baldurbjarnason

Contributor

commented Jun 1, 2018

(TL;DR: modification date is editorial metadata, more akin to cover or authorship than to the more functional metadata such as the manifest or HTTP caching headers. Having it makes OPDS catalogues for web publications more reliable.)

Modification dates in the context of publications are a very different beast from the modification dates transmitted by HTTP headers.

The headers are for caching and have to take into account a variety of assets, user session, and a bunch of ephemeral contextual situations that can vary a lot—even though the publication stays virtually the same.

Modification date in publication metadata is editorial and can't be derived in the same way that a caching header would.

The best analogy is the updated field in the Atom spec (emphasis mine):

The "atom:updated" element is a Date construct indicating the most
recent instant in time when an entry or feed was modified in a way
the publisher considers significant. Therefore, not all
modifications necessarily result in a changed atom:updated value.

atomUpdated = element atom:updated { atomDateConstruct }

Publishers MAY change the value of this element over time.

This is very different from how you'd set most update-related HTTP headers.

This is a very important field for both packaged and unpackaged publications as without it syndication and distribution of both publications and notifications about said publications becomes harder.

If you don't include this in the infoset, in most cases publishers, authoring systems, distributors, etc. will have to create an updated/modified date out of band. And in those cases you won't be able to match the notification/syndication/federation/whatever back to the original publication.

While this isn't important for viewing or authoring a publication, it is very useful for distributing both the publication and information about the publication. (The Atom spec even goes as far as to make atom:updated a required element in feeds.)

This doesn't have to be included in the publication infoset. People are likely to use Atom, Activity Streams, JSON Feed, OPDS and the like for distribution. And those all have updated or modified fields.

But not having it in the publication infoset as well means that there's a bit of a disconnect between the distribution protocols and the publication and that can make referring back to the publication a bit less reliable.

Omitting a modified date from the infoset is a bit like not being able to add an ebook's ISBN to its metadata. Sure, the ISBN has no real bearing on reading the ebook but it's a very important piece of metadata from a sales and distribution perspective and can be quite useful for matching ONIX data back to the ebook file (or matching both back to a common database).

If we keep a modification date (which I'd prefer), my suggestion would be to define it in exactly the same way as the atom spec defines atom:updated. That way we have guaranteed compatibility with both feeds and OPDS (which is based on atom).
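
A minimal sketch of what an Atom-style definition would mean in practice, assuming a manifest represented as a plain dictionary with a `dateModified` key (an assumed name) and a hypothetical `apply_edit` helper. The `significant` flag stands in for the publisher's judgement, which no code can derive.

```python
from datetime import datetime, timezone


def apply_edit(manifest: dict, edited_at: datetime, significant: bool) -> dict:
    """Return a copy of the manifest after an edit.

    The modification date moves only when the publisher considers
    the edit significant, mirroring the atom:updated wording.
    """
    updated = dict(manifest)
    if significant:
        updated["dateModified"] = edited_at.isoformat()
    return updated


# Illustrative values only.
manifest = {"name": "Example", "dateModified": "2018-06-01T00:00:00+00:00"}

typo_fix = apply_edit(manifest, datetime(2018, 6, 4, tzinfo=timezone.utc), significant=False)
new_chapter = apply_edit(manifest, datetime(2018, 6, 15, tzinfo=timezone.utc), significant=True)
```

Here `typo_fix` keeps the original date while `new_chapter` carries the new one, so a feed entry generated from either manifest can reuse the same value for its own updated field.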

(Apologies for the long comment. I didn't have time to edit it down to a short one.)

@atyposh

Contributor

commented Jun 4, 2018

Thanks @baldurbjarnason!

It should be made clear that this metadata is completely editorial and therefore cannot be depended upon to determine whether content/resources have changed since the given date.

Certainly most content management systems (scholarly, trade, blog, etc.) will have such a date readily available. But there are less formal use cases (e.g., simple ad hoc web pages promoted to WPs) for which such a date will be less obvious.

@atyposh

Contributor

commented Jun 15, 2018

The text in section 3.3.7 Last Modification Date of the draft seems to make it clear enough.

This date does not necessarily reflect all changes to the Web Publication (e.g., third-party content could change without the author being aware). User agents SHOULD check the last modification date of individual resources to determine if they have changed and need updating.

I'll reiterate the initial questions from @lrosenthol with answers derived from guidance above and await any last-chance feedback before closing.

Does it only apply to the content or also to metadata (such as the accessibility status)?

It applies to both content and metadata.

Does it apply when a template (e.g., a CMS) is changed but the content itself isn't?

This one seems tricky to me. When ScienceDirect (assuming it implements WP site-wide) relaunches with a fresh design (every few years) should all 12 million WPs get a new lastModified date? I just don't know.

Does it only apply to PWP and not to WP (and if so, should we wait to add it till we get there)?

It applies to WP and any implementation of PWP.

@baldurbjarnason

Contributor

commented Jun 15, 2018

Does it apply when a template (e.g., a CMS) is changed but the content itself isn't?

This one seems tricky to me. When ScienceDirect (assuming it implements WP site-wide) relaunches with a fresh design (every few years) should all 12 million WPs get a new lastModified date? I just don't know.

Well, if we're following the precedent set by atom and similar formats then this would be an authorial decision—a judgement call on the part of the author. I.e. to reuse the phrase from the atom spec: was it "modified in a way the publisher considers significant"?

My suggestion would be to—like the atom spec does—make it clear in the spec that this hinges on author/publisher judgment on what a meaningful change is in the context of their publication.

@wareid

Contributor

commented Feb 5, 2019

@mattgarrish Could you add clarification to https://w3c.github.io/wpub/#last-modification-date regarding this issue, then I can close.

@iherman

Member

commented Mar 18, 2019

Done in the reference above

@iherman iherman closed this Mar 18, 2019

@iherman

Member

commented Mar 18, 2019

This issue was discussed in a meeting.

  • No actions or resolutions
Transcript: What does modification date mean?
Wendy Reid: #73
Wendy Reid: What does modification date mean?
… Matt’s change has been made; go ahead and close?
… yes.
9 participants