Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rate limiting the ability to set a badge #68

Open
mgiuca opened this issue Feb 18, 2020 · 19 comments
Open

Rate limiting the ability to set a badge #68

mgiuca opened this issue Feb 18, 2020 · 19 comments

Comments

@mgiuca
Copy link
Collaborator

mgiuca commented Feb 18, 2020

Raised by @grorg on webkit-dev:

I'd also like to see some specification text describing how the browser should ignore multiple set/clear operations executed in rapid succession (e.g. to create a blinking badge) - maybe the limit is one badge operation per minute or something?

We should address this; it might be a recommendation rather than a requirement, since it relates to the UI of the user agent.

I'd like to see some sort of "eventual consistency" guarantee which says that the final value you write to the badge will eventually be the one displayed to the user. This prevents a situation where you set "3" then 10 seconds later set "12", and due to rate limiting, the "12" never gets set, so the user just sees "3" forever. Instead, the rule should be "If the UI hasn't been updated in > N seconds, update it to the new value. Otherwise, set a timer to update the UI to the new value in N - (however long it has been since the last update) seconds."

@marcoscaceres
Copy link
Member

I think I also raised this previously: while the promise to update the badge is held, we cam just return rejected promises. We then leave it to the UA to release the badge as it sees fit (including a timing protection if it likes).

@marcoscaceres
Copy link
Member

Unreleated to this bug, but should we raise an issue regarding:

Also, given that the main use case for this would be alerting the user to a notification, it seems like it should be able to link it directly to that. This would provide the ability for a push notification to trigger the badge without ever firing up the page context.

I know we've talked about this previously - but might worth discussing again, if it's possible to come up with a technical solution. It be nice to badge things without needing to actually run the SW or anything.

@mgiuca
Copy link
Collaborator Author

mgiuca commented Feb 19, 2020

I think I also raised this previously: while the promise to update the badge is held, we cam just return rejected promises. We then leave it to the UA to release the badge as it sees fit (including a timing protection if it likes).

Yeah I think I remember you raising this, but can't find a bug. A problem with that is that up until now, our promises don't reject for unpredictable reasons, and our examples don't include try/catch around the calls to setXXXBadge. So if we suddenly started rejecting promises, it would break existing code.

Regardless of compatibility questions, I would prefer if users don't have to try/catch every call to the badge, and just expect it to make a best effort to work (i.e., if it isn't supported, it does nothing; if it's supported but without any text, it just shows a dot; if it's within a spam timer, it sets the badge eventually). Seems like a more developer-friendly API if it does the right thing eventually, than just occasionally rejecting, requiring the developer to write retry code.

I know we've talked about this previously - but might worth discussing again, if it's possible to come up with a technical solution. It be nice to badge things without needing to actually run the SW or anything.

That is #28.

@marcoscaceres
Copy link
Member

So if we suddenly started rejecting promises, it would break existing code.

That's ok. It's an early incubation, so one should be using this in production so we are not breaking anyone.

I would prefer if users don't have to try/catch every call to the badge, and just expect it to make a best effort to work

I don't know. This would make setting badges unreliable. For a chat application, I'd definitely want a high level of reliability. Like if two people message me on Slack, and I only see "1", then I might miss the other person's message. That wouldn't be great.

@mgiuca
Copy link
Collaborator Author

mgiuca commented Feb 19, 2020

That's ok. It's an early incubation, so one should be using this in production so we are not breaking anyone.

We (Chrome) are about to ship the API to general availability in mid-March, after more than a year and a half of incubation.

At the time we ship, we will be explicitly breaking the API (we called it setExperimentalAppBadge during incubation), so developers will need to update their code. But we won't be adding the timer error before then, which means there'll be real sites using setAppBadge in production by the time we get around to adding this feature.

At the very least, if we do this, we should update all the documentation (including the examples in the spec) to include a try/catch. But as I said, I would prefer if users don't have to always do that.

It would mean basically every usage of the Badge API would need to be done like this:

function trySetBadge(unreadCount) {
  try {
    navigator.setAppBadge(unreadCount);
  } catch (...) {
    setTimeout(trySetBadge.bind(this, [unreadCount]), 20000);
  }
}

(And probably should do an exponential backoff, in case the error is not something that goes away after a timer.)

If everyone has to do that, I would prefer that the user agent just does it.

I would prefer if users don't have to try/catch every call to the badge, and just expect it to make a best effort to work

I don't know. This would make setting badges unreliable. For a chat application, I'd definitely want a high level of reliability. Like if two people message me on Slack, and I only see "1", then I might miss the other person's message. That wouldn't be great.

I don't understand that concern. I am proposing an API where the badge is eventually correct. Your counter-proposal lets the developer know that it hasn't worked, but fundamentally doesn't let them do anything about it other than retry, thus making the badge eventually correct. (And with strictly more latency than my proposal, since you won't know exactly when the API is allowed to be used again.)

@marcoscaceres
Copy link
Member

We (Chrome) are about to ship the API to general availability in mid-March, after more than a year and a half of incubation.

hhmmm... I thought we (browser vendors) had a general agreement not to ship incubations on the web until we had at least moved things into a WG and had had a chance to at least prototype in two implementations (@cwilso, @yoavweiss, @RByers?)... shipping single browser incubations kinda defeats the purpose of doing incubation at all, as it means other browser vendors don't get sufficient change to provide feedback (e.g., @grorg only gave feedback 3 days ago, and I doubt he's had much of a chance to think much about the API shape or even opinions about what we are discussing here).

At Mozilla, I'd we'd like to prototype this before we settle on the API (as is our position). We'd like to prototype is sometime this year 🤞.

@cwilso
Copy link

cwilso commented Feb 19, 2020

I don't think that general agreement is something anyone in Chrome would have agreed to. It is certainly less desirable to ship single-vendor incubations; this has been an incubation for 17 months now, though, and this has been a clearly missing feature when comparing web vs native platforms for longer than that.

No one should refer to WICG incubations as "standards" - on that we're agreed. (There have certainly been slipups here, and I try to correct and prevent in the future, so please point me at instances.) If there are ways we can work better together with other vendors - by prioritizing better publicly, by providing expected timeframes, better documentation - please weigh in, as we're actively working on how to improve. However, just waiting to ship anything at all until other vendors decide it's time to permit us to do so is not tenable.

Of course, we have a responsibility to maintain and correct any incubation we've shipped when other vendors are ready to pick it up and work to make it a true interoperable standard.

@bzbarsky
Copy link

we have a responsibility to maintain and correct any incubation we've shipped

@mgiuca above explicitly says that once this ships he does not plan to change it in ways that people have already requested, even before it's standards-track, due to compat constraints, no?

@cwilso
Copy link

cwilso commented Feb 19, 2020

@bzbarsky I have to claim some amount of ignorance of the subtleties of this feature. Of course we would have concerns about compat-breaking changes of something we've shipped; that should never equate to "we will not ever make changes," only that we need to mitigate compat pain. At a brief glance above, I think @mgiuca doesn't agree with the requested changes for technical reasons, not just "we can't change this." (cf "Regardless of compatibility questions" in #68 (comment).) That sounds like a technical discussion that needs to continue, or am I misreading this?

@bzbarsky
Copy link

@cwilso What @marcoscaceres and I both understood from the comments above was that while there is a technical discussion that is ongoing Google plans to preempt it by just shipping the current state of its implementation and then ruling out various other technical options based on compat concerns. If that is not Google's plan, please make that explicit.

@marcoscaceres
Copy link
Member

I'd respectfully also like to point out that it's somewhat disingenuous to say that "this has been an incubation for 17 months": The current shape of the API was the result of discussions we had at the last TPAC, where we managed to gather a good representative group of folks to give the API its current shape (with a huge thanks to @hober!). Since then (xmas vacation period aside), we (Mozilla) have been doing our best to engage with this specification with the limited resources that we have (i.e., just me).

I understand it's super frustrating for Chrome folks to be waiting on us smaller/less-well-resourced engines all the time. But Chrome just shipping stuff makes Mozilla call into question the point of incubating at all. That is really sad as I personally think what we are doing in the WICG is of tremendous value - specially when we are working together on refining APIs.

cc @othermaciej, who always has valuable input on these matters.

@yoavweiss
Copy link

yoavweiss commented Feb 19, 2020

As a Blink API owner, I disagree with @mgiuca that there are compatibility concerns with API changes at this point.

If there's merit to changing the API to throw (and I have no strong opinion on that question one way or the other), we could and should delay shipping if needed (and figure out a way to revert/backport whatever changes are needed to the M81 branch).

@mgiuca
Copy link
Collaborator Author

mgiuca commented Feb 20, 2020

@cwilso :

Of course we would have concerns about compat-breaking changes of something we've shipped; that should never equate to "we will not ever make changes," only that we need to mitigate compat pain. At a brief glance above, I think @mgiuca doesn't agree with the requested changes for technical reasons, not just "we can't change this." (cf "Regardless of compatibility questions" in #68 (comment).) That sounds like a technical discussion that needs to continue, or am I misreading this?

That's correct. As I said, disregarding any compat issue, I would prefer an "eventually consistent" design over one that requires the developer to manually catch errors and retry. See my previous comment about this.

If we did have an agreement that the API needed to change in a backwards-breaking way, I would be happy to work through the change. We've done this many times before with other APIs we've shipped in Chrome. Once it's shipped, I would have to be a lot more careful about the change, getting a full understanding of the breakage scenario, preparing blog posts for developers to update their code, etc. And I think that means the barrier for making such a change should be higher. (For example, I would be opposed to flippantly renaming "setAppBadge" to "setBadgeForApp", which I might have been OK with before we shipped.) But if there is a demonstrable need for a change, I am open to making it.

In this particular case, I think the proposal to add a new error condition is technically worse than my eventual consistency proposal. We can continue discussing the technical merits of the two proposals.

@bzbarsky :

while there is a technical discussion that is ongoing Google plans to preempt it by just shipping the current state of its implementation and then ruling out various other technical options based on compat concerns. If that is not Google's plan, please make that explicit.

There wasn't a technical discussion ongoing until yesterday. We announced plans to ship the API to stable a month ago, and we flipped the flag to put it on track to launch. I wasn't anticipating further breaking changes to arise, though I acknowledge that it is always a possibility since there has been no technical standards agreement. That is a risk we (Chrome) take by shipping APIs before they reach standardization. Again, I do not want to "rule out" other options based on compat concerns, but as I was saying above, I think the fact that we have shipped (or are in the process of shipping) the API does raise the bar for breaking changes somewhat.

@marcoscaceres :

I'd respectfully also like to point out that it's somewhat disingenuous to say that "this has been an incubation for 17 months": The current shape of the API was the result of discussions we had at the last TPAC, where we managed to gather a good representative group of folks to give the API its current shape (with a huge thanks to @hober!). Since then (xmas vacation period aside), we (Mozilla) have been doing our best to engage with this specification with the limited resources that we have (i.e., just me).

Thanks, you've been doing a great job helping out with the spec. But it has been a really really long time, from our end. TPAC 2019 was just one milestone in a very long process for us that's seen a lot of churn on the API design. We have clients that want this API (they are blocking their app launches on it). They have been wanting it for years and they can't understand why it has taken this long. At some point, we just have to say "ship it". We aren't "circumventing" the process. We are engaging publicly early and as much as we can. But we can't block our launch on multiple implementations and W3C standardization.

I understand it's super frustrating for Chrome folks to be waiting on us smaller/less-well-resourced engines all the time. But Chrome just shipping stuff makes Mozilla call into question the point of incubating at all. That is really sad as I personally think what we are doing in the WICG is of tremendous value - specially when we are working together on refining APIs.

I think it's a false dichotomy to see this as "wasted" effort. We've done rounds of public consultation, multiple TPACs, many explainer drafts, TAG review, a public spec draft that's been out for at least 6 months, a Chrome origin trial with good user feedback. There has been a lot of good ideas and changes made from the incubation process. That wouldn't have happened if we just shipped the first thing we designed, and documented it later. Our work is a fine balance between velocity and public consultation, and I think the API is better for it.

@yoavweiss :

If there's merit to changing the API to throw (and I have no strong opinion on that question one way or the other), we could and should delay shipping if needed (and figure out a way to revert/backport whatever changes are needed to the M81 branch).

I agree that if we do come to a consensus to add a new throw condition, we should do it before shipping the API. Therefore, the decision should be made in the next month -- the sooner the better, since we would have to land and merge changes in Chrome to revert back to an origin trial.


I propose that we put aside the compatibility issue (pretend we aren't shipping), and discuss the technical merits of both proposals. Namely:

  • Option A ("rejection"): If the badge has been set too recently, the API returns a rejected promise, requiring the developer to catch and set up retry logic if they want the most recent badge to be the one eventually seen by the user.
  • Option B ("eventual consistency"): If the badge has been set too recently, the user agent silently queues up a task to set the badge to its new value at some point in the future, effectively rate-limiting the use of the API without the developer needing to do anything.

Of these two options, I am in favour of B, based on the argument I outlined in this comment.

Another argument in favour of B is that I would like to make this rate limiting optional for the user agent (i.e., it would be a SHOULD rather than a MUST). If we went with A, then developers testing the feature against a browser that does not implement rate limiting would find their software breaks against a browser that does implement rate limiting promise rejection. Whereas B "just works" without the developer needing to care about the rate limiting logic in the UA.

@marcoscaceres
Copy link
Member

An intrinsic aspect of all promises is that they reject, even for unspecified reasons: the point of returning the promise in the first place, as opposed to a "fire and forget API", was to give us the flexibility to reject the promise as needed and as cases arise (such as this one).

It is still possible to implement this as B (queue), but with the ability to reject if abuse is detected (or for any other reason). Thus, there is no getting around a developer having to put a try/catch around an await p, because that's the nature of awaiting promises asynchronously.

@mgiuca
Copy link
Collaborator Author

mgiuca commented Feb 20, 2020

I am OK with the promise rejecting for "reasons"* if it's something that isn't going to work again after a retry. That means if the developer really cares about whether it worked or not, they can catch it and take appropriate measures (like maybe sending a notification, or showing an in-page prompt "Badges don't seem to be working, do you want to turn them on in your OS setting?"). Developers who just want a best-effort can just call the API and ignore errors.

What I don't want is for the badge to unexpectedly fail, but it would have succeeded if you called it a minute later, due to rate limiting. That means there is something the developer can do but they have to work hard to do it. I would rather that we do it for them. That means B.

I am happy to reserve the right (which the spec already does, by returning a promise) to fail for other, more permanent, reasons in the future.

* "reasons" might be user-agent-specific reasons, or future specced reasons after we ship.

@othermaciej
Copy link
Member

I agree with the concerns about interaction with the standards process, but these seem to have been settled reasonable. A not-quite-shipping-yet implementation of an incubation should not, in an ideal world, pre-empt breaking changes.

As for the substance of the issue:

  • I think the spec should design in rate limiting, not just allow for it. This shouldn't be something every browser invents and does differently, because that makes it hard for sites to have a reasonable expectation. This should be done by defining some sort of delay/coalescing where the badge can't be updated more frequently than , perhaps 60 seconds or 5 minutes. This makes things more predictable for web content authors. Perhaps the rate limit rule could be a minimum, if we want to leave freedom to prevent future kinds of abuse that we can't think of at present.

  • I think coalescing is easier on web content authors than rejecting the promise during a holdoff period. Otherwise they have to do coalescing themselves that predicts the browser's limits. I don't think there is a CPU usage (and thus battery life) advantage with the reject model either.

  • I think it would be appropriate to reject the promise if the site has done something so abusive that it causes the browser to revoke its badging permission entirely, or if the site was refused permission by the user when originally requested. (I see that the spec does not yet have a concept of permissions, but there is issue Badge permission...  #44 discussing it.)

@marcoscaceres
Copy link
Member

Thanks @othermaciej. About rate limiting, that definitely needs experimentation to understand what is a reasonable baseline. I'd be hesitant to bake anything into the spec at this point, but I'd be ok having wording to the effect that we are currently investigating what.

About coalescing, I concede that's true. Let's go with recommending coalescing the badging requests in the spec.

On the final point about the abuse case and permission revocation, absolutely. And very good points.

@mgiuca
Copy link
Collaborator Author

mgiuca commented Feb 21, 2020

OK great, I think we are all in agreement (mostly). I will send a PR to add this recommendation. I am happy if we add true (non-recoverable) failure conditions later.

I think the spec should design in rate limiting, not just allow for it. This shouldn't be something every browser invents and does differently, because that makes it hard for sites to have a reasonable expectation. This should be done by defining some sort of delay/coalescing where the badge can't be updated more frequently than , perhaps 60 seconds or 5 minutes. This makes things more predictable for web content authors. Perhaps the rate limit rule could be a minimum, if we want to leave freedom to prevent future kinds of abuse that we can't think of at present.

I disagree on this point. This is a user interface concern for the browser. Different browsers will make different considerations as to what an appropriate timeout should be. Developers should not need to care what the minimum time between badge sets is; with the proposed change, they should be able to send updates at any rate they like, and the user agent will rate limit as it sees fit. Therefore, I think this should be a RECOMMENDATION, not a REQUIREMENT.

@othermaciej
Copy link
Member

I'll have to think about whether websites could intentionally or inadvertently depend on the size of the throttling interval. (If they could, then we aren't doing any favors to web developers or browser implementors by leaving it open-ended).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants