Next, Prev, First, Last in AnnotationCollection/AnnotationPage #371

Open
gsergiu opened this issue Nov 1, 2016 · 12 comments

Comments

@gsergiu

gsergiu commented Nov 1, 2016

Next, prev, first, and last are specified as IRIs; consequently, different requests to the same IRI should return the same annotations.

However, the number of annotations included in a page is not specified in the WA standard. Typically this is an (implementation-specific) request parameter, as can also be seen in the jsonapi specifications (where it still appears to be an open issue):
http://jsonapi.org/format/1.1/#fetching-pagination

I would recommend including the pageSize parameter within the IRIs given for the next, prev, first, and last properties.

This is a follow-up ticket to #233, which focused mainly on the existence of the properties, but not on the verification of the annotations included in these resources.

@iherman iherman added this to the V1 PR milestone Nov 1, 2016
@azaroth42
Collaborator

typically this is an (implementation specific) request parameter

Typically, but not in this case.

From the protocol spec, section 4.2:

The number of IRIs or Annotation descriptions included on each page is at the server's discretion, and may be inconsistent between pages.

We started off with the typical offset/limit pattern but decided against it for the following reasons:

  • It's much harder to cache when every client requests different page sizes
  • It's impossible to implement as a static set of files on disk
  • The size of annotations can vary wildly, particularly if the server doesn't return all of the information for an annotation in the page response. Guessing how many to retrieve requires knowledge of the dataset, which is not feasible to obtain.
  • It's trivial to either request pages until the number the client wants is reached (if per page size is less than desired), or to ignore annotations if the page size is greater than the number wanted.
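
A minimal sketch of the last point, assuming page representations that expose `items` and `next` as in the Web Annotation data model. The `fetch_page` callable and the in-memory `PAGES` table are hypothetical stand-ins for real HTTP requests; the page sizes are deliberately uneven, since the server chooses them.

```python
from typing import Callable, List, Optional

def collect_annotations(first_page_iri: str, wanted: int,
                        fetch_page: Callable[[str], dict]) -> List[str]:
    """Follow `next` links until `wanted` items are gathered or pages run out."""
    collected: List[str] = []
    page_iri: Optional[str] = first_page_iri
    while page_iri is not None and len(collected) < wanted:
        page = fetch_page(page_iri)
        collected.extend(page.get("items", []))
        page_iri = page.get("next")  # absent on the last page
    # Ignore any surplus items from the final page that was fetched.
    return collected[:wanted]

# Hypothetical server state: three pages of sizes 3, 1 and 2.
FIRST = "http://example.org/annotations/?page=0"
PAGES = {
    FIRST: {"items": ["a1", "a2", "a3"],
            "next": "http://example.org/annotations/?page=1"},
    "http://example.org/annotations/?page=1": {
        "items": ["a4"],
        "next": "http://example.org/annotations/?page=2"},
    "http://example.org/annotations/?page=2": {"items": ["a5", "a6"]},
}
fetch_page = PAGES.__getitem__

first_five = collect_annotations(FIRST, 5, fetch_page)
```

The client never needs to know the page size in advance: it stops requesting once it has enough and truncates the surplus.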

So ... as this has been discussed, unless there is some new information that makes the current approach un-implementable (which would be strange, considering there are implementations), or some significant benefit that was overlooked in the original discussion, I propose close wontfix.

@gsergiu
Author

gsergiu commented Nov 2, 2016

Yes, I do agree with what you say; this is still an open issue, and with this ticket I want to raise awareness of it.

My concern and my claim is that individual pages are considered to be resources; consequently, accessing the URIs/IRIs of these resources must return the same content at different points in time and for different users (possibly using additional request parameters).

In practice, this will not be the case, because I expect that most implementations will be required to support a pageSize. Almost all existing JSON-based APIs are implemented like that.

Offset + Limit is an equivalent representation to PageNr + PageSize, so we are talking about the same thing. My point is that the current specifications are incomplete in this respect.
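
A small sketch of that correspondence, assuming zero-based page numbers (the function names are illustrative only). Note that the reverse mapping is only defined when the offset is a multiple of the limit.

```python
from typing import Tuple

def page_to_offset(page: int, page_size: int) -> Tuple[int, int]:
    """Convert (page, pageSize) to (offset, limit), with pages counted from 0."""
    return page * page_size, page_size

def offset_to_page(offset: int, limit: int) -> Tuple[int, int]:
    """Convert (offset, limit) back; only defined for page-aligned offsets."""
    if limit < 1 or offset % limit != 0:
        raise ValueError("offset must be a multiple of limit")
    return offset // limit, limit

offset, limit = page_to_offset(3, 10)      # fourth page of 10 items each
page, page_size = offset_to_page(offset, limit)
```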

The page number is a unique identifier of the Page resource only in the case that the server doesn't allow the user to set the page size in their request.

Concretely, we cannot say that the following resources are the same or even equivalent.
http://example.org/annotations/?iris=1&page=1&pageSize=1
and
http://example.org/annotations/?iris=1&page=1&pageSize=2
Or
http://example.org/annotations/?iris=1&page=1&pageSize=10

Alternatively, we can say that these are actually different resources because they have different URIs. That would also be fine, except that the current specifications don't mention anything like that.

In this case, another thing is unclear. Suppose the default pageSize used by an implementation is 10. How can we indicate that the following resources are the same?
http://example.org/annotations/?iris=1&page=1&pageSize=10
and
http://example.org/annotations/?iris=1&page=1
One possible solution is to require the links in the output to include the pageSize, which would mean that the pageSize must be normative.

If you don't want to solve this issue in V1, that is OK with me, but I think we should recognize it as an open issue.

@gsergiu
Author

gsergiu commented Nov 2, 2016

@azaroth42

It's much harder to cache when every client requests different page sizes
I consider caching to be an implementation-only concern, which is out of scope for the specifications (as it currently is). Also, I'm not convinced that the page size is really a problem for caching.

It's impossible to implement as a static set of files on disk
I see no problem with that. Of course &pageSize=1 and &pageSize=2 are two different resources, and if an implementation wants to support this kind of caching, it will need to generate and store two different files on first access. (Again, these are implementation issues and not normative aspects.)

The size of annotations can vary wildly, particularly if the server doesn't return all of the information for an annotation in the page response. Guessing how many to retrieve requires knowledge of the dataset, which is not feasible to obtain.
I would disagree with this. The jsonapi specifications transmit the pageSize as a query parameter. This has nothing to do with guessing; common implementations impose a maximum value for the pageSize. Again, this is an implementation concern and not a specification concern, unless you want to define an error code for when the requested pageSize is greater than the maxPageSize. Still, I don't feel there is a need to specify something like that.

It's trivial to either request pages until the number the client wants is reached (if per page size is less than desired), or to ignore annotations if the page size is greater than the number wanted.
This is exactly why implementations use a per-page size, and we have the perfect example in the specifications. It even makes sense for implementations to use a different defaultPageSize and maxPageSize for pages where &iris is set to 0 or 1.
You don't want to kill the server by sending 1000 requests to get the IRIs of 1000 annotations. But one could typically include 100 IRIs in a response when iris=1 and 10 annotations when iris=0.
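
A sketch of how a server might resolve the effective page size under this scheme. The defaults of 100 and 10 are the figures mentioned above; the maximum values and the function name are hypothetical, and clamping (rather than returning an error) is just one possible policy.

```python
from typing import Optional

# Defaults taken from the figures above: 100 IRIs or 10 full annotations.
DEFAULT_PAGE_SIZE = {True: 100, False: 10}
# Hypothetical upper bounds, chosen only for illustration.
MAX_PAGE_SIZE = {True: 1000, False: 100}

def effective_page_size(requested: Optional[int], iris_only: bool) -> int:
    """Pick the page size to serve for a request with or without pageSize."""
    if requested is None:
        return DEFAULT_PAGE_SIZE[iris_only]
    if requested < 1:
        raise ValueError("pageSize must be a positive integer")
    # Clamp oversized requests to the maximum instead of rejecting them.
    return min(requested, MAX_PAGE_SIZE[iris_only])

size_for_iris = effective_page_size(None, iris_only=True)
size_for_full = effective_page_size(5000, iris_only=False)
```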

@iherman
Member

iherman commented Nov 2, 2016

(admin comment)

@gsergiu, I have the impression that what you want to change is not editorial but substantial. At this point, with (as we plan) only a few weeks to go before a PR transition, any substantial change would seriously set back the progress of the work (restarting CR, etc). Of course, if there were a really serious technical issue then we would have no choice, but I do not have the impression that this is the case.

I would propose to:

  1. set the issue as 'Postponed', ie, possible improvement for a next version of the document
  2. remove this issue from the V1 PR (or V1 Rec) milestone.

You did hint at this possibility; do you agree for me to proceed?

@gsergiu
Author

gsergiu commented Nov 2, 2016

@iherman
Yes, I do recognize that a compromise needs to be made in order to reach PR status.
I also think that this is a change request with broader implications, and it is not an editorial-only action.
Therefore I find it OK to postpone this ticket to V2.

BR,
Sergiu

@gsergiu
Author

gsergiu commented Nov 2, 2016

PS: as a general comment/strategy, I find it more appropriate to postpone open/known issues that cannot be solved within the scope of V1/PR. I do not support closing tickets as "won't fix" (in V1) as long as the issues are not actually solved. (Open issues for V2, and the discussions within these issues, are still valuable information for implementors.)

@iherman iherman removed this from the V1 PR milestone Nov 2, 2016
@azaroth42
Collaborator

My concern and my claim is that individual pages are considered to be resources, consequently accessing the URIs/IRIs of these resources must return the same content at different points in time and by different users

This is an impossible requirement that we should not attempt to solve. A trivial example that would break this: Delete the first annotation from the first page. Now every annotation shifts up one position, and every page's representation changes.
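
A toy illustration of that shift, assuming pages are simply fixed-size slices of an ordered list (the `paginate` helper is hypothetical, not something the spec defines):

```python
from typing import List

def paginate(items: List[str], page_size: int) -> List[List[str]]:
    """Split an ordered list into consecutive pages of at most page_size items."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]

annotations = ["a1", "a2", "a3", "a4", "a5", "a6"]

before = paginate(annotations, 2)      # pages before the deletion
after = paginate(annotations[1:], 2)   # pages after deleting "a1"

# Every page's content differs once the first annotation is gone.
all_pages_changed = all(b != a for b, a in zip(before, after))
```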

Caching is not just an implementation concern, it's a concern for any practical web standard. If the client has control of the page size, then the server needs to cache all possible sets of pages. That is not impossible, but it is completely impractical.
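
A rough way to see the growth, assuming each allowed page size yields its own full set of page resources for the collection (the function names are illustrative):

```python
from typing import Iterable

def pages_for(total: int, page_size: int) -> int:
    """Number of pages needed to cover `total` items at a given page size."""
    return (total + page_size - 1) // page_size

def distinct_page_resources(total: int, allowed_sizes: Iterable[int]) -> int:
    """Count cacheable page resources across every allowed page size."""
    return sum(pages_for(total, size) for size in allowed_sizes)

# 10 annotations: one server-chosen size versus client-chosen sizes 1..3.
server_chosen = distinct_page_resources(10, [3])
client_chosen = distinct_page_resources(10, range(1, 4))
```

With a single server-chosen size the count is just the number of pages; every additional size a client may request adds another complete set.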

If it's all just an implementation concern ... then feel free to add your own pagesize parameter. No need to standardize. If it's not, then you need to actually address the questions.

@gsergiu
Author

gsergiu commented Nov 2, 2016

  1. Actually, I'm already using the pageSize parameter, and I'm tempted to say that most implementations will do so sooner or later.
  2. Yes, you are right: the content of the collections changes over time, in which case the content of the individual pages will also change. My comments were about the time period in which the collection doesn't change. In fact, any caching system needs to use timestamps to validate the freshness of the information stored in the cache.
  3. However, my concern is not about the implementation but about the semantics and consistency.
    If we consider AnnotationPages to be dereferenceable resources that have their own id, then the information retrieved by different clients when accessing the same resource must be consistent. And I claim that this is not ensured by the current specifications.

@BigBlueHat
Member

👎 to adding anything with regard to page size. This is far too idiosyncratic, implementation specific, and not necessary for interoperability--and would in fact inhibit it.

It should be noted that existing feed/collection/syndication formats (RSS, Atom, ActivityStreams) do not attempt this--nor should we here.

Our pagination vocabulary is from ActivityStreams 2.0. They do not (nor should they) attempt to specify page size as it only inhibits interoperability and unnecessarily raises the bar of implementation.

@gsergiu
Author

gsergiu commented Nov 3, 2016

OK, it seems that you consider only the case in which the client is not allowed to set the number of items included in a page itself.

This is standard functionality in most CMS systems and in JSON APIs. I assumed that it would be beneficial to include it in the WA specifications as well.

It seems that the tendency is to leave this decision to the implementations.

I assume then that the solution would be to omit the page size from the URIs when the default page size is used, and keep it in the URIs otherwise.
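
A minimal sketch of that normalization, assuming a default page size of 10 and the `pageSize` query parameter used in the example URIs earlier in this thread (the function name is hypothetical):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PAGE_SIZE = 10  # assumed server default, as in the earlier example

def canonical_page_iri(iri: str, default_page_size: int = DEFAULT_PAGE_SIZE) -> str:
    """Drop pageSize from the query when it equals the default; keep it otherwise."""
    parts = urlsplit(iri)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not (key == "pageSize" and value == str(default_page_size))
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))

with_default = canonical_page_iri(
    "http://example.org/annotations/?iris=1&page=1&pageSize=10")
with_custom = canonical_page_iri(
    "http://example.org/annotations/?iris=1&page=1&pageSize=2")
```

Under this convention both spellings of the default-size page map to one IRI, while non-default sizes remain distinct resources.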

I'm not sure if there is a possibility to document this in some non-normative document, or in the "primer" document that was recommended in some older tickets.

BR,
Sergiu

@BigBlueHat
Member

@gsergiu everyone likely does (and will) do it differently, and as such, it's not really necessary for us to give pointers, even non-normative ones, about how one might do it. If one needs it, one does it.

Given that the feed formats mentioned earlier haven't addressed the issue (and likely avoided it), I don't think we should venture in "where angels fear to tread."

@gsergiu
Author

gsergiu commented Nov 3, 2016

I just find this to be useful information for implementers, even if there is no recommendation on how to deal with it.
I think that it is hard for implementers to find and follow our discussions.
But at least interested persons might be able to find this discussion at a later point in time...


4 participants