Must the primary entry page be part of the web publication? #386

Closed
dauwhe opened this issue Dec 13, 2018 · 27 comments

Comments

@dauwhe
Contributor

dauwhe commented Dec 13, 2018

So... the primary entry page doesn't need to be part of readingOrder:

It is not required that the primary entry page be included in the default reading order, nor that it be the first document listed when it is included. This specification leaves the exact nature of the document intentionally underspecified to provide flexibility for different approaches (e.g., the primary entry page could be a marketing document for the Web Publication instead of a specific page of content). If a default reading order is not provided, however, the primary entry page will be used as the default entry.

Does the primary entry page need to be part of the web publication? What if the manifest is embedded in the primary entry page, and not linked from any resource of the publication? Are we allowing all the information that defines a web publication to be entirely outside the bounds of the publication? If such a web publication were to be packaged, how would the content of the manifest be included?

How does this affect the initialization of a web publication? A user navigates to the primary entry page. A user agent must load at least enough of the page to find the manifest, and then determine if the entry page is part of the publication. Should the user agent initiate publication mode for the entry page, even though it's not part of the publication? Should it automatically navigate to the first item in readingOrder? Or does the user have to initiate that navigation, as well as opting in to publication mode?

I don't suppose it matters, but it seems odd that the resource returned by the publication's URL might not be part of the publication.

@iherman
Member

iherman commented Dec 13, 2018

I do not remember the history of this, to be honest. I guess the idea was that the PEP may only contain a TOC, for example, that is taken care of by the User Agent, hence it is not necessarily in the reading order.

But I agree that it should be part either of the reading order or the resources. We have defined the "borders" of the publication to be derived from these two, and I think it would be a reasonable requirement to say that the PEP is in one of these two.
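The requirement suggested here can be sketched as a simple membership check. This is only an illustration, assuming a manifest already parsed into a dict whose `readingOrder` and `resources` entries are either URL strings or objects with a `url` key; the function name `pep_in_bounds` and the example URLs are hypothetical and not taken from the specification.

```python
from urllib.parse import urljoin, urldefrag


def _urls(entries, base):
    """Yield absolute, fragment-free URLs for a list of manifest entries."""
    for entry in entries:
        # Assumption: an entry is either a bare URL string or a dict with a "url" key.
        raw = entry if isinstance(entry, str) else entry["url"]
        yield urldefrag(urljoin(base, raw)).url


def pep_in_bounds(manifest, pep_url):
    """Return True if the primary entry page is listed in readingOrder or resources."""
    bounds = set(_urls(manifest.get("readingOrder", []), pep_url))
    bounds |= set(_urls(manifest.get("resources", []), pep_url))
    return urldefrag(pep_url).url in bounds


# Hypothetical manifest: the PEP (index.html) is a resource but not in the reading order.
manifest = {
    "readingOrder": ["chapter1.html", {"url": "chapter2.html"}],
    "resources": ["index.html", "style.css"],
}
print(pep_in_bounds(manifest, "https://example.org/book/index.html"))
print(pep_in_bounds(manifest, "https://example.org/book/promo.html"))
```

Relative entries are resolved against the PEP's own URL here, which is a simplification chosen for the sketch.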

(I suspect that this fuzziness around the PEP came from the time when there was also a fuzziness about the bounds of a publication, actually.)

@mattgarrish
Member

I guess the idea was that the PEP may only contain a TOC, for example, that is taken care of by the User Agent, hence it is not necessarily in the reading order.

As I recall, there was a question of whether the PEP would be a marketing page or page from which you could buy the publication, for example, but not actually be a resource that you would read as part of the publication.

I agree it's confusing that it doesn't have to be a resource, but should we be requiring it as a resource if there's no intention of rendering it?

@iherman
Member

iherman commented Dec 13, 2018

I agree it's confusing that it doesn't have to be a resource, but should we be requiring it as a resource if there's no intention of rendering it?

Resources in resources (sic!) are not necessarily there to be rendered. They can be, for example, research data carried with the publication and processed on the fly by some JavaScript (also part of resources).

@mattgarrish
Member

Resources in resources (sic!) are not necessarily there to be rendered. They can be, for example, research data carried with the publication and processed on the fly by some JavaScript (also part of resources).

Fair enough, but those still have some purpose to the publication itself. Rendering probably wasn't the best angle to address this, so to take a different tack, what if the publication gets pulled offline or packaged, do you want marketing fluff brought along, too? It's effectively bloat in any of those scenarios.

I'm honestly not all that heavily invested in this particular issue. If it makes for less confusion I'm fine with requiring it as a resource and living with the bloat, but we should consider why we didn't require it in the first place, too. Otherwise, we'll just come back to this again and again.

@llemeurfr
Contributor

Isn't the PEP intrinsically part of the Web Publication? After all, this is "where the game starts".

We can then make clear that it is an "implicit" resource and required to be packaged along with all other "explicit" resources of the publication.

@dauwhe
Contributor Author

dauwhe commented Dec 13, 2018

Is there a use case for the marketing page also serving as the primary entry page? Wouldn't it be simpler and more functional for a marketing page to link to the primary entry page (possibly after some purchase or borrow action)?

As currently defined, we can have the complete contents of a web publication, but not know if it is a web publication.

@mattgarrish
Member

Although, going back and reading the actual section, haven't we already solved this (i.e., it is a resource of the publication but not necessarily in the reading order):

The paragraph above the one that @dauwhe cites says:

The primary entry page is a key [HTML] document required of every Web Publication. It represents the preferred starting resource for discovery of the Web Publication and enables discovery of the manifest.

It can't be the starting resource and not be a resource.

It's also automatically injected into the reading order if the reading order is not specified:

The default reading order is specified directly in the manifest, but MAY be omitted when it only consists of the primary entry page.

So is perhaps all we need a little further clarification that although it doesn't have to be in the default reading order, it definitely has to be in the resources at a minimum?
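The fallback in the quoted sentence can be sketched as follows. This is a minimal illustration, assuming a manifest parsed into a dict; the function name `default_reading_order` is hypothetical, and treating an empty list the same as an omitted one is an assumption of this sketch rather than something the quoted text states.

```python
def default_reading_order(manifest, pep_url):
    """Return the reading order, falling back to the primary entry page alone."""
    reading_order = manifest.get("readingOrder")
    if reading_order:
        return list(reading_order)
    # Per the quoted text, an omitted reading order consists only of the PEP.
    # Assumption: an empty list is handled the same way as a missing key.
    return [pep_url]


print(default_reading_order({}, "index.html"))
print(default_reading_order({"readingOrder": ["a.html", "b.html"]}, "index.html"))
```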

@mattgarrish
Member

For the record, here are the lengthy discussions we had on this previously:

@BigBlueHat
Member

As I recall, there was a question of whether the PEP would be a marketing page or page from which you could buy the publication, for example, but not actually be a resource that you would read as part of the publication.

FWIW, we renamed the "PEP" to Primary Entry Page from "landing page" to avoid the confusion that it might be a marketing page. The publication address should return the publication--not simply a way of discovering it elsewhere.

@mattgarrish
Member

FWIW, we renamed the "PEP" to Primary Entry Page from "landing page" to avoid the confusion that it might be a marketing page.

How would we enforce content requirements on the page, beyond forcing the document into the reading order which might make people think twice about its content? Nothing stops anyone from making the page whatever they want it to be.

(I don't think we'll ever enforce it in the reading order given that there would likely be opposition for audiobooks, image-based books, etc., but it has never struck me as that bad an idea.)

The publication address should return the publication--not simply a way of discovering it elsewhere.

How doesn't it? The primary entry page is just the first page you encounter, and is a resource of the publication. I don't see in the prose we're saying it's a paywall page that you have to enter through to get to the publication -- that was just part of the original discussion. We're only saying we don't specify what material has to be on the page.

But we don't really need the parenthetical that mentions what it might be, if that makes everyone happier. We can just say that we intentionally under-specify and leave it at that. If someone wants it to be their cover, or their toc or the first page of content, or a piece of marketing fluff, that's not our role to police.

@mattgarrish
Member

Interestingly, this is what Google recommends for indicating where to purchase:

    "potentialAction": {
        "@type": "ReadAction",
        "target": {
            "@type": "EntryPoint",
            "urlTemplate": "http://www.barnesandnoble.com/store/info/offer/031676947?purchase=true",
            "actionPlatform": [
                "http://schema.org/DesktopWebPlatform",
                "http://schema.org/IOSPlatform",
                "http://schema.org/AndroidPlatform"
            ]
        },
        "expectsAcceptanceOf": {
            "@type": "Offer",
            "Price": 1.99,
            "priceCurrency": "USD",
            "eligibleRegion": {
                "@type": "Country",
                "name": "UK"
            },
            "availability": "http://schema.org/InStock"
        }
    }

https://developers.google.com/search/docs/data-types/book

@BigBlueHat
Member

@mattgarrish sorry, my intention wasn't to dictate the page's content, just to clarify the purpose of what one gets back from requesting a publication address (i.e. the publication).

There could certainly be "intermediary" pages along the way to accomplishing a "ReadAction" which might require purchasing a license, unlocking a paywall, etc., but once one retrieved the results of a "publication address," I'd expect they had "the publication" (or conceptually its "binding").

I'd be happy to write a revision of https://w3c.github.io/wpub/#wp-primary-entry-page if you think that might help make things clearer.

@mattgarrish
Member

but once one retrieved the results of a "publication address," I'd expect they had "the publication"

Right, I think that's generally where we ended up earlier. At least two issues so far are:

  1. it's maybe not clear enough that the primary entry page is a resource - the lack of caps on "required" is certainly a culprit and the first sentence could be rewritten to say so more directly
  2. we don't need to get into what the content might be - the reason for saying that much was to be clear that it's not under-defined by omission

Feel free to create a PR if you want to jump in on this, though. I always welcome other input.

What I found interesting/awkward about the ReadAction is that it identifies the paywall as the "entry point". I'm not sure we've really gotten much further afield from a "landing page" if the goal was not to cause confusion with where you buy the publication.

@BigBlueHat
Member

I'll try and work up a revision later today. Thanks, @mattgarrish.

When you say...

What I found interesting/awkward about the ReadAction is that it identifies the paywall as the "entry point"

Are you referring to a specific chunk of text? I've been eager to dispel that conflation and continue to be baffled as to why it keeps resurfacing...so happy to help eradicate it. 😄

@mattgarrish
Member

Are you referring to a specific chunk of text?

Not in our specification, no. But the naming has been a general source of confusion from the start. If the entry point is defined by schema.org as where to go to purchase, as appears to be the case from this snippet:

   "target": {
       "@type": "EntryPoint",
       "urlTemplate": "http://www.barnesandnoble.com/store/info/offer/031676947?purchase=true",

Are we helping or hindering our cause to have an entry "page" as something that isn't the entry point that Google uses for search discovery?

It's a descent into bikeshedding madness, I know, but maybe a name like primary "initialization" point/page might strike a better balance of what we expect to happen when the resource is loaded?

@BigBlueHat
Member

Not in our specification, no. But the naming has been a general source of confusion from the start. If the entry point is defined by schema.org as where to go to purchase, as appears to be the case from this snippet:

EntryPoint is defined far more broadly than that particular example shows. The textual definition from Schema.org is: "An entry point, within some Web-based protocol." Additionally a ReadAction (the context of the example above) is: "The act of consuming written content."

In the example above, there's an "expectsAcceptanceOf" connected to an "Offer"--which is a required first step for the "ReadAction" to be fulfilled. The "target" pointed to is then not the "publication address" (as we call it), but simply the next step in taking the "ReadAction" (i.e. you might fail to meet the offer acceptance and not ever get to the "act of reading," etc).
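The distinction drawn here, between a `target` that is the next step and a `target` that is the content itself, can be sketched as a small check on the action object. This is only an illustration, assuming the ReadAction has been parsed into a dict shaped like the snippet quoted earlier in the thread; the function name `describe_read_action` and the example.org URLs are hypothetical.

```python
def describe_read_action(action):
    """Return (target_url, gated) for a schema.org ReadAction-like dict.

    gated is True when the action carries an expectsAcceptanceOf offer,
    meaning the target is a step toward reading rather than the reading itself.
    """
    target_url = action.get("target", {}).get("urlTemplate")
    gated = action.get("expectsAcceptanceOf") is not None
    return target_url, gated


# Hypothetical action gated by an offer (mirrors the shape of the quoted example).
gated_action = {
    "@type": "ReadAction",
    "target": {"@type": "EntryPoint", "urlTemplate": "https://example.org/store/offer"},
    "expectsAcceptanceOf": {"@type": "Offer"},
}
# Hypothetical action whose target is the publication's own address.
open_action = {
    "@type": "ReadAction",
    "target": {"@type": "EntryPoint", "urlTemplate": "https://example.org/book/"},
}
print(describe_read_action(gated_action))
print(describe_read_action(open_action))
```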

Are we helping or hindering our cause to have an entry "page" as something that isn't the entry point that Google uses for search discovery?

"EntryPoint" in the defined generic sense is certainly still an accurate term. That "ReadAction" could just as easily have pointed to the actual publication's address as its "target." It didn't, because that particular resource has to be purchased first.

The term "entry point" (or "entry page") also has precedent in the world of Progressive Web Apps and Single Page Web Apps (SPAs) where it serves as the foundation of the Web App.

All of these are informed by the general "entry point" concept from computer science writ large (quoting from that link):

In computer programming, an entry point is where the first instructions in a computer program are executed, and where the program then processes command line arguments

In the PWA, SPA, and WPUB sense, the entry page/point is "where the game starts" as @llemeurfr put it.

It's a descent into bikeshedding madness, I know, but maybe a name like primary "initialization" point/page might strike a better balance of what we expect to happen when the resource is loaded?

Much of the "bikeshedding madness" seems to stem from conflating the concept of an "entry point" with the marketing term "landing page."

The response to a publication address (whatever its name) is what I'm concerned about properly defining. Conceptually, the response to a publication's address must return "the publication" (or conceptually its "binding")--otherwise, it's not the publication's address but the address of something else entirely.

Does that make it any clearer? 😕

@mattgarrish
Member

conflating the concept of an "entry point" with the marketing term "landing page."

But a landing page is an entry point/page into a web site. It's one of many. Entry page still lacks any specific meaning beyond being one doorway.

Conceptually, the response to a publication's address must return "the publication"

I'm not sure I buy into this. The primary entry page is one page that you as the author think represents a good point to enter the publication. The page itself is largely irrelevant except that it's ensured to get you to the manifest, which does define the publication.

@mattgarrish
Member

otherwise, it's not the publication's address but the address of something else entirely.

I do still find this part ambiguous. What does it mean that the address of the publication is the address of the PEP? When a publication is one html page, this isn't a problematic statement, otherwise it is.

I appreciate the consistency you're trying to find in making the page the publication, but since nothing makes any one page more publication-y than any other, it's too random. I also understand where you're going with requiring embedding to this end, but it still feels a bit arbitrary to assign this meaning to one page.

@mattgarrish
Member

And just to finish my rambling, I still think of the address as setting more of a scope for the publication, from which a resource has to be returned that can initiate the publication. The resource itself that does this isn't particularly important and isn't the publication.

@BigBlueHat
Member

@mattgarrish I think we're getting to the heart of our differing perspectives, which hopefully means we're getting close to finding the bridge across the chasm. 😉

We already define the PEP as the "fallback" location for the title, language and language direction, and it's the primary expected location of the Table of Contents. It's also currently the only place the manifest can be embedded--which is where this issue came from. 😃

All of those expectations map pretty cleanly to what can be seen in current Web publications (lowercase on purpose):

Even more documentation-style Web publications (note case again) follow a similar pattern:

Consequently, retrieving something called a "publication address" does already provide an expected result: a means to navigate that publication and understand its "bounds."

Embedding the manifest shortens the distance for understanding those bounds, provides a clearer purpose for a "publication address," and ultimately provides a clearer foundation for building Web-based publications from existing tooling.

Additionally, "off the Web" reading systems have it easier as they now have a canonical address for the publication which results in a document containing "the publication" (the manifest, the fallbacks for title/language, probably the ToC). The parsing is simpler because those UAs are now parsing for a single manifest location (not jumping between DOM nodes, possibly requesting other docs, possibly extracting a data block). It all becomes much simpler and more clearly defined.

Lastly, embedding JSON-LD in HTML will be normatively defined in JSON-LD 1.1--which makes the extraction process and expectations even clearer (and not unique to WPUB).
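The simpler parsing described above can be sketched with the standard library alone. This is a rough illustration, assuming the manifest is embedded in the entry page as a single `<script type="application/ld+json">` element; the class and function names, the sample page, and the choice to take the first such script are all assumptions of the sketch, not rules taken from the WPUB or JSON-LD specifications.

```python
import json
from html.parser import HTMLParser


class ManifestExtractor(HTMLParser):
    """Collect and parse the first <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self._chunks = []
        self.manifest = None

    def handle_starttag(self, tag, attrs):
        # Only the first JSON-LD script is considered (an assumption of this sketch).
        if tag == "script" and self.manifest is None:
            if dict(attrs).get("type") == "application/ld+json":
                self._in_script = True

    def handle_data(self, data):
        if self._in_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self._in_script = False
            self.manifest = json.loads("".join(self._chunks))


def extract_manifest(html):
    """Return the embedded manifest as a dict, or None if the page has none."""
    parser = ManifestExtractor()
    parser.feed(html)
    parser.close()
    return parser.manifest


# Hypothetical entry page with an embedded manifest.
page = """<!DOCTYPE html>
<html><head><title>Example Book</title>
<script type="application/ld+json">
{"name": "Example Book", "readingOrder": ["chapter1.html"]}
</script>
</head><body><h1>Example Book</h1></body></html>"""

print(extract_manifest(page))
```

A single pass over one document is enough here, which is the property the comment above is arguing for.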

So, in sum 😉, the contents of the response to the publication's address (whatever its name) seem clearly defined in our spec already and also have existing analogs in existing Web publications (note the case one last time). Hope that's clearer yet. 😺

@mattgarrish
Member

Sure, I think we're largely just working on conceptual differences of what constitutes "the publication". You can swap out what page is returned from those URLs without meaningfully changing that you've arrived at a publication. Our specification is equally flexible in that regard.

That's all I'm getting at. I don't think the resource that happens to be returned is primary in any special way, or that the publication's address and that page's address are equatable. Nothing requires them to be. What happens at that address is that the publication can be initiated.

@mattgarrish
Member

Or if it helps, what does it mean if I serve out different PEPs from the same publication address, one for desktop and one for mobile? It makes no difference to the address of the publication that there are multiple representations, but is only one validly the publication?

@BigBlueHat
Member

Or if it helps, what does it mean if I serve out different PEPs from the same publication address, one for desktop and one for mobile? It makes no difference to the address of the publication that there are multiple representations, but is only one validly the publication?

If all the resources you serve for those two responses are different, then you probably do have two different renditions of the same conceptual publication. You'd have the same result if you're publishing a "mobile friendly manifest" and a "desktop friendly manifest"--you'd end up with two publications. (This scenario also highlights that we're leaving a lot of "responsive design" possibilities behind by pushing readingOrder and resources into JSON...i.e. no <picture>, etc.)

Raising the value of the initial response means the publication address actually has meaning and value. Its current place is strange, as it's one of (possibly several) ways to find the manifest's location, which is ultimately used by something to "bind" the publication...but can't itself do so.

It should be clear by now that I'm (still) eager for a progressive Web approach to Web publications. 😃

If it's really important to some use case(s) that the manifest can be used separately (which has its own host of concerns and risks...), then @iherman's suggestion (in #327) to make rel="publication" relate to the publication address and to use some other rel value to point to the manifest seems like a reasonable compromise.
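Discovery by link relation, as discussed here, can be sketched as collecting every `<link>` element's `rel` tokens and resolved `href`. This is only an illustration: the class and function names and the sample markup are hypothetical, and what `rel="publication"` should point to (the manifest or the publication address) is precisely the open question, so the sketch takes no position on it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkRelCollector(HTMLParser):
    """Map each rel token found on <link> elements to its absolute href values."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel is a space-separated token list; compare case-insensitively.
        for rel in (attributes.get("rel") or "").lower().split():
            self.links.setdefault(rel, []).append(urljoin(self.base_url, href))


def collect_link_rels(html, base_url):
    """Return a dict of rel token -> list of absolute URLs."""
    parser = LinkRelCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical entry page head; the href target is an example only.
head = '<html><head><link rel="publication" href="/book/"></head></html>'
links = collect_link_rels(head, "https://example.org/book/index.html")
print(links.get("publication"))
```

A second rel value for the manifest, which the thread does not name, would be looked up in the same dict.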

@dauwhe
Contributor Author

dauwhe commented Dec 17, 2018

Or if it helps, what does it mean if I serve out different PEPs from the same publication address, one for desktop and one for mobile? It makes no difference to the address of the publication that there are multiple representations, but is only one validly the publication?

Multiple renditions!

I think that underlying web technologies such as responsive design and the picture element have made the idea of serving different publications to different users much less common.

@mattgarrish
Member

Multiple renditions!

Beauty, eh!

I think that underlying web technologies such as responsive design and the picture element have made the idea of serving different publications to different users much less common.

It's hard to gauge. I get so locked in to thinking books sometimes that I wonder what people will actually do with this technology. All the times I get "Canadian" or "World" editions of content makes me think it's something we need to consider. How or what that means I'm not entirely sure at this point.

But when I think of scenarios like that, for example, then what does it mean in terms of understanding when or how a publication has been updated if the page is dynamic and the manifest is always generated with it?

If the page was meaningfully the publication, I'd fully agree with @BigBlueHat, but right now we just have plain old html pages, and none of the pages (to me) is special to the publication in any way. Yes, you might be able to harvest some bits of information from the page, but like we discussed today there's no guarantee the page is anything of interest.

Now if we had something fantastical like html imports, and we could say the page really is the publication all the time, either containing the content directly or pulling it in, now that would make for a whole new ball game... then the html page would be more like a functioning package document.

@dauwhe
Contributor Author

dauwhe commented Dec 18, 2018

At least from the first half of the discussion, we seem to agree that the spec implies that the primary entry page is a resource of the publication, and is therefore inside the bounds of the publication.

Are we OK with that? Or do we want to pursue the possibility that the primary entry page is outside of the publication, and then deal with the resulting issues?

@dauwhe
Contributor Author

dauwhe commented Dec 19, 2018

Closed via #389

@dauwhe dauwhe closed this as completed Dec 19, 2018