
Add section for BFCache eviction in cache clearing #77

Open: wants to merge 2 commits into main


@lozy219 (Member) commented Aug 15, 2023

As discussed in #73 (comment), we should add a section to #clear-cache that specs the steps for back/forward cache (BFCache) removal.

@domenic @fergald could you take a look at this? Thanks.


@domenic left a comment

Sorry for not mentioning this earlier, but I would feel better if we maintained the abstraction boundary, instead of having this spec poke into HTML's internals.

That is, we should ideally add some operation to HTML like "destroy all bfcached documents for an origin |origin|", that is publicly exported, and this spec can call that.

However, I guess that's only worth doing if this change has cross-browser consensus, which seems unlikely since it doesn't account for storage partitioning.

index.src.html Outdated
1. For each |entry| in the |traversable|'s [=session history entries=]:

1. Let |state| be |entry|'s <a>document state</a> whose `origin` attribute is identical
to |host|.
This sentence doesn't make sense to me.

I think you want to look at |entry|'s [=session history entry/document state=]'s [=document state/origin=].

Then, you need to compare it to |origin| (not |host|, I don't think??). But you can't just say identical to; you need to use [=same origin=].

And then the idea is you want to [=continue=] the loop if there's no match. You can't just assume there's always a match.
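A rough Python model of the loop described above may help: read each entry's document state's origin, compare it with |origin| using same-origin semantics (not string identity), and move on when there is no match. Every class and function name here is hypothetical, "destroy" is reduced to a flag, and skipping the traversable's current entry is an added assumption (the active document is not in the BFCache). This is a sketch, not HTML's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class TupleOrigin:
    scheme: str
    host: str
    port: Optional[int]  # HTML allows a null port


class OpaqueOrigin:
    """Hypothetical opaque origin: only same origin with itself."""


Origin = Union[TupleOrigin, OpaqueOrigin]


@dataclass
class Document:
    url: str
    destroyed: bool = False


@dataclass
class DocumentState:
    origin: Origin
    document: Optional[Document]


@dataclass
class SessionHistoryEntry:
    document_state: DocumentState


def same_origin(a: Origin, b: Origin) -> bool:
    # Tuple origins match when scheme, host and port are all equal.
    if isinstance(a, TupleOrigin) and isinstance(b, TupleOrigin):
        return (a.scheme, a.host, a.port) == (b.scheme, b.host, b.port)
    # Opaque origins only match themselves.
    return a is b


def evict_bfcached_documents(entries: List[SessionHistoryEntry],
                             current: SessionHistoryEntry,
                             origin: Origin) -> List[Document]:
    evicted: List[Document] = []
    for entry in entries:
        if entry is current:
            continue  # assumption: the active document is not BFCached
        state = entry.document_state
        if not same_origin(state.origin, origin):
            continue  # no match, so move on to the next entry
        document = state.document
        if document is None:
            continue  # nothing cached for this entry
        document.destroyed = True  # stand-in for HTML's destroy steps
        state.document = None
        evicted.append(document)
    return evicted
```

Using a real same-origin check rather than comparing a host string means `https://a.example` and `http://a.example` are treated as different origins, which is the distinction the review comment is asking for.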

@lozy219 (Member Author)

2.  Let |cache list| be the set of entries from the <a>network cache</a> whose `target URI`
        [=url/host=] is identical to |host|.

I copied the terms from line 583 above, which aren't accurate for this BFCache clearing case. I have updated the content following your suggestions.

Do we need to explicitly [=continue=] the iteration in the spec if we don't have any steps to skip over below?


No need to do so.

index.src.html Outdated
1. Let |state| be |entry|'s <a>document state</a> whose `origin` attribute is identical
to |host|.

2. Let |document| be |state|'s `document` attribute.

There is no such thing as a "document attribute". I think you want the document state's [=document state/document=].

@lozy219 (Member Author) Aug 21, 2023

I'm a bit confused by the autolink: [=document state/document=] doesn't seem to find the right dfn, so I added a dfn entry for it.


Yeah, it's not exported; thus my comment about reaching into the internals in this way.

@lozy219 (Member Author) commented Aug 21, 2023

Thanks @domenic for the review.

> Sorry for not mentioning this earlier, but I would feel better if we maintained the abstraction boundary, instead of having this spec poke into HTML's internals.

@fergal suggested just leaving a high-level description of BFCache clearing in this spec, and I thought about just listing "BFCache" alongside the other types of cache that are only mentioned (like prerender pages and script caches). Do you think that's good enough?

> However, I guess that's only worth doing if this change has cross-browser consensus, which seems unlikely since it doesn't account for storage partitioning.

I'm still wondering if we could just have a section in the HTML spec. Even though there is no consensus on the BFCache clearing behavior that this PR specs, the flushing algorithm itself is standalone, right?

@domenic commented Aug 21, 2023

> suggested just leaving a high-level description of BFCache clearing in this spec, and I thought about just listing "BFCache" alongside the other types of cache that are only mentioned (like prerender pages and script caches). Do you think that's good enough?

I'm not sure I fully understand the suggestion, but I don't think that's good enough. What you've done here, with a fully specified algorithm, is great. I'm just making a comment about the location of the algorithm.

> I'm still wondering if we could just have a section in the HTML spec. Even though there is no consensus on the BFCache clearing behavior that this PR specs, the flushing algorithm itself is standalone, right?

I don't think we could get consensus to add a privacy-violating algorithm like this one to the HTML spec.

@fergald commented Aug 28, 2023

> I don't think we could get consensus to add a privacy-violating algorithm like this one to the HTML spec.

Are you referring to the storage partitioning side of this? Specifically, this method of attack?

@domenic commented Aug 28, 2023

Yes.

@fergald commented Aug 28, 2023

Can that attack work with BFCache eviction? Unlike CSD (Clear-Site-Data) for data, BFCache eviction is not persistent. Also, as I understand it, if we have a tree of frames like

  • a.com
    • b.com

and the b.com subframe sends a CSD header, it will only evict pages that have b.com as their top-level frame. This means that, in the attack above, the user would need to have opened bucket$i.example as a top-level page and put it into BFCache to even have a chance of detecting that CSD had occurred.

That does not seem like an effective attack.

I'm not sure that evicting only when the header is on the top-level frame is actually the right approach (what does Safari do?). But even if we changed to evicting all BFCached pages that had a b.com frame anywhere in the tree, the attack above with 32 buckets still would not work, since the only signal you get is that at least one bucket had CSD on it. You don't know which bucket, so you can't gather 32 bits in a single action. I think you could gather 1 bit per back-navigation, although even that would be unreliable unless you can distinguish a CSD flush from all other types of flush.
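The "only an OR of the buckets" argument can be checked with a toy model. It assumes the broader rule discussed above (any bucket frame in the tree triggers eviction), that bucket i sends CSD exactly when bit i of a 32-bit user ID is 1, and that the attacker's back-navigation observes nothing but whether its page was evicted. The function name is hypothetical.

```python
def eviction_observed(user_id: int, num_buckets: int = 32) -> bool:
    """Hypothetical model: bucket i sends Clear-Site-Data iff bit i of user_id is 1.

    The back-navigating page only learns whether *some* eviction happened,
    which is the OR of all bucket bits.
    """
    mask = (1 << num_buckets) - 1
    return (user_id & mask) != 0


# Thousands of distinct IDs collapse into just two possible observations.
observations = {eviction_observed(uid) for uid in range(1 << 12)}
print(sorted(observations))  # [False, True]
```

With only two distinguishable outcomes per back-navigation, each observation carries at most log2(2) = 1 bit, and that is before accounting for evictions caused by something other than CSD.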

@annevk (Member) commented Aug 29, 2023

Even a single bit seems problematic though.

@fergald commented Aug 29, 2023

> Even a single bit seems problematic though.

Can you explain how that 1 bit can be communicated? I can't see it.

Using the terminology of privacycg/storage-partitioning#11: when the user goes to site.example, what should it do so that the next time the user visits news.example, news.example can receive even 1 bit of info from site.example?

If the user goes to news.example, navigates away, navigates back, and the page is restored from BFCache, news.example cannot tell whether

  • site.example loaded but did not CSD
  • site.example just was not loaded while the page was in BFCache

I think this makes communication impossible.

@johnwilander

@annevk (Member) commented Aug 30, 2023

  1. User goes to site.example.
  2. site.example navigates once so it can go back.
  3. User clicks a link to news.example that opens in a popup.
  4. User browses around there and happens to hit a page on news.example that embeds site.example. It clears the cache.
  5. User ends their news.example session and this is made clear somehow to site.example (perhaps through visibility changes or focus changes).
  6. site.example navigates back to determine if it was seen while the user was away.

@fergald commented Aug 30, 2023

In the previous attack, the communication was from site.example to news.example. The attack is to pass site.example's global ID for the user to news.example (or any other site that embeds site.example's JS).

I think your example is trying to communicate in the other direction. So let's assume that news.example has a boolean bit and is trying to pass it to site.example:

> 1. User goes to site.example.
> 2. site.example navigates once so it can go back.

To be specific, let's say it navigates to site.example/nav.

> 3. User clicks a link to news.example that opens in a popup.

Do you mean a link in site.example/nav? If so, these now have an opener relationship and they have no need for CSD as a way of communicating.

The interesting case is where the user arrives at news.example entirely independently of having been on site.example. That is what makes the original attack interesting.

So from now on, I assume that the arrival on news.example occurs at some random time in the future after being on site.example.

> 4. User browses around there and happens to hit a page on news.example that embeds site.example. It clears the cache.

The goal is to communicate a bit of information. You need to say what it should do if bit == 0 and what it should do if bit == 1. Let's make the arbitrary choice:

  • 0 => does clear the cache
  • 1 => does not clear the cache

> 5. User ends their news.example session and this is made clear somehow to site.example (perhaps through visibility changes or focus changes).

This cannot happen anymore when the navigation is unconnected.

> 6. site.example navigates back to determine if it was seen while the user was away.

How can site.example determine the value of bit?

  • If it sees that the page was not evicted then either
    • the user didn't go to news.example
    • or bit==1
  • If it sees that the page was evicted due to cache clear then either
    • the user went to news.example and bit==0
    • the user went to news.example and bit==1 but something else caused a cache clear
    • the user didn't go to news.example but something else caused a cache clear

It is impossible to reliably determine the value of bit. It's not even possible to assign probabilities based on whether the page was evicted or not because at the time of observation you have no idea whether the user has been to news.example during the BFCache period.
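The case analysis above can be brute-forced over every combination of "user visited news.example", the bit, and "something else caused a cache clear". The function names are hypothetical; the encoding (CSD is sent only when bit == 0) is the arbitrary choice made above. The model only shows that the bit cannot be determined with certainty; it says nothing about probabilities.

```python
from itertools import product
from typing import Set


def page_evicted(visited_news: bool, bit: int, other_clear: bool) -> bool:
    # Encoding chosen above: news.example sends CSD only when bit == 0.
    csd_sent = visited_news and bit == 0
    return csd_sent or other_clear


def bits_consistent_with(observed_eviction: bool) -> Set[int]:
    # Every bit value that can produce this observation in some world.
    worlds = product([False, True], [0, 1], [False, True])
    return {bit for visited, bit, other in worlds
            if page_evicted(visited, bit, other) == observed_eviction}


for observed in (False, True):
    print(observed, sorted(bits_consistent_with(observed)))
# False [0, 1]
# True [0, 1]
```

Both observations remain consistent with both bit values, which matches the enumeration in the comment.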

@fergald commented Sep 6, 2023

Since there is no agreement on the subframe case, can we just spec that if the header is delivered on a top-level frame, it evicts any other top-level frame in BFCache from that origin?
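A minimal sketch of the top-level-only rule proposed here, assuming the eviction hangs off the `"cache"` (or `"*"`) Clear-Site-Data type, that subframe responses are ignored, and that origins are plain scheme/host/port tuples. All names are hypothetical, and the sketch does not model storage partitioning or the "same storage" condition raised later in the thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Origin:
    scheme: str
    host: str
    port: int


@dataclass
class BFCacheEntry:
    top_level_origin: Origin
    evicted: bool = False


def requests_cache_clear(header_value: str) -> bool:
    # Clear-Site-Data is a comma-separated list of quoted strings, e.g. "cache", "cookies".
    types = [t.strip() for t in header_value.split(",")]
    return '"cache"' in types or '"*"' in types


def on_clear_site_data(response_origin: Origin, is_top_level: bool,
                       header_value: str, bfcache: List[BFCacheEntry]) -> int:
    """Hypothetical hook: returns how many BFCache entries were evicted."""
    if not is_top_level or not requests_cache_clear(header_value):
        return 0  # subframe responses (or other types) leave the BFCache alone
    count = 0
    for entry in bfcache:
        if not entry.evicted and entry.top_level_origin == response_origin:
            entry.evicted = True
            count += 1
    return count
```

Under this rule, the a.com page embedding a b.com subframe from the earlier example is never evicted by the subframe's header, which is the behavior WebKit is described as shipping later in the thread.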

@smaug---- commented

@annevk, I'm a bit confused given #73 (comment). What has WebKit shipped?

We discussed this with @petervanderbeken, and our initial reaction is that supporting this for top level only (and only when the same storage is used) seems reasonable.

(But privacycg/storage-partitioning#11 is still open and that was the reason for https://bugzilla.mozilla.org/show_bug.cgi?id=1671182 )

@fergald commented Nov 8, 2023

What do you mean by "and only in case same storage is used"? No storage partitioning? If so, please explain how this could be used to communicate across partitions. As far as I can tell, the argument in privacycg/storage-partitioning#11 requires persistent storage. You cannot simply replace persistent storage with "detect whether BFCaching occurred", as explained above.

@annevk (Member) commented Nov 8, 2023

> If so, these now have an opener relationship and they have no need for CSD as a way of communicating.

I don't think that matters for my scenario. It could have been noopener through a policy. This is why I later don't use the opener connection to communicate data back, as that would indeed make much of this unnecessary. It does matter, though, that site.example knows the user navigated to news.example.

@smaug---- I'll check.

@annevk (Member) commented Nov 8, 2023

In WebKit, only a first-party origin can clear bfcache entries, so in #73 (comment), A would not be cleared.

(This makes sense to me as session history is tied to top-level navigable for now as all traversable navigables are top-level navigables.)

@fergald commented Nov 10, 2023

@annevk, does that mean you support a top-level CSD header on example.com causing BFCache eviction of entries whose top level is example.com?

@annevk (Member) commented Nov 10, 2023

Correct.


5 participants