Facebook's new Rotating IDs break replayWeb.page #111

Open
halmos opened this issue Oct 18, 2022 · 9 comments

@halmos

halmos commented Oct 18, 2022

ReplayWeb.page cannot replay Facebook posts that use the new rotating IDs scheme (https://about.fb.com/news/2022/09/deterring-scraping-by-protecting-facebook-identifiers/).

The archive works as expected at first, but stops working after a week or more.

To reproduce, make an archive of a Facebook page while logged in. Then try to replay the archive again after 10 or so days.

@ikreymer
Member

@halmos do you have an example that's currently breaking at the moment?
I'm curious whether this is indeed the issue or something else, as the archive should not have any interaction with actual identifiers, and we're also updating the Date on the replay to match the time of archive creation.

@halmos
Author

halmos commented Oct 24, 2022

Starting the timer on:
https://inkdroid.org/web-archives/archive/?source=https%3A%2F%2Fedsu-webarchives.s3.amazonaws.com%2Frandom.wacz#view=pages&urlSearchType=prefix&url=https%3A%2F%2Fwww.facebook.com%2Fgeorgeclintonpfunk&ts=20221018150735

I'm not sure if this test will work. I think the problem will only occur on FB archives where the user was logged in at the time of archiving. I also don't see an indication that the post is using the new pfbid system. I think Facebook is rolling that new system out in stages, so not all posts currently use it.
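
For anyone checking back on the timer later, here is a rough Python sketch that pulls the captured page URL and capture time out of a replay link like the one above and reports how old the capture is. It assumes the `ts` value in the link's hash is a 14-digit `YYYYMMDDhhmmss` timestamp in UTC; the names `ReplayLink`, `parse_replay_link` and `capture_age_days` are made up for illustration and are not part of ReplayWeb.page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


@dataclass
class ReplayLink:
    wacz_source: str       # value of the ?source= query parameter
    page_url: str          # value of url= in the hash
    captured_at: datetime  # ts= in the hash, assumed to be UTC


def parse_replay_link(link: str) -> ReplayLink:
    """Split a replay link into its WACZ source, page URL and capture time."""
    parts = urlsplit(link)
    query = parse_qs(parts.query)
    fragment = parse_qs(parts.fragment)  # the hash uses key=value&key=value
    # Assumption: ts is YYYYMMDDhhmmss in UTC.
    captured_at = datetime.strptime(
        fragment["ts"][0], "%Y%m%d%H%M%S"
    ).replace(tzinfo=timezone.utc)
    return ReplayLink(query["source"][0], fragment["url"][0], captured_at)


def capture_age_days(link: ReplayLink, now: datetime) -> float:
    """Days elapsed between the capture time and `now`."""
    return (now - link.captured_at).total_seconds() / 86400


LINK = (
    "https://inkdroid.org/web-archives/archive/"
    "?source=https%3A%2F%2Fedsu-webarchives.s3.amazonaws.com%2Frandom.wacz"
    "#view=pages&urlSearchType=prefix"
    "&url=https%3A%2F%2Fwww.facebook.com%2Fgeorgeclintonpfunk"
    "&ts=20221018150735"
)

if __name__ == "__main__":
    info = parse_replay_link(LINK)
    age = capture_age_days(info, datetime.now(timezone.utc))
    print(f"{info.page_url} captured {age:.1f} days ago")
```

Replaying once the reported age passes the 10 or so days mentioned in the original report would show whether this capture is affected.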

@halmos
Author

halmos commented Oct 24, 2022

> @halmos do you have an example that's currently breaking at the moment?
> I'm curious if this is indeed the issue or something else, as the archive should not have any interaction with actual identifiers, and we're also updating the Date on the replay to match the time of archive creation.

I'm worried that posting an archive from a logged-in FB session is a security issue, since the auth tokens could be captured in the cookies. I'm trying to find a way to do this securely.

@ikreymer
Member

> I'm worried that there is a security issue posting an archive from a logged-in FB session since the auth tokens could be captured in the cookies. I'm trying to find a way to do this securely.

Yes, don't post it here, but you can upload it somewhere and send us a link at dev [at] webrecorder.net.
I'm hoping that it's something that could be fixed with rewriting improvements.

@edsu
Collaborator

edsu commented Oct 24, 2022

Do the rotating IDs only work for logged in FB users?

@halmos
Author

halmos commented Oct 24, 2022

> Do the rotating IDs only work for logged in FB users?

That's a good question. I believe they are also used for public / signed-out posts, but I think the IDs on signed-in pages are used with authenticated API requests, which may have additional side effects. Unfortunately, this is a hard thing to test, but so far my experience is that the WACZ files seem to be affected only for logged-in sessions. More testing is needed, however.

@ikreymer
Member

@halmos can you try archiving and replaying with the latest versions? We may have fixed some issues related to this.
I'm not sure the rotating ID is involved.

@halmos
Author

halmos commented Mar 28, 2023

I am seeing fewer problems with the latest version of the extension. However, I do see at least one example where images are not loading. Facebook seems to be using some pretty obscure code to load images dynamically. For example, here is the markup for an image that is not loading in the archive, though I can see that the image does exist in the web archive:

<a href="/replay/w/n1h09qy637kujzpdcndwzs/20221018145215mp_/https://m.facebook.com/aalisarem/photos/pcb.167216991371363/167216948038034/?type=3&amp;av=1498090096&amp;eav=AfaXUKGW9nRapv2ODNFrM9HwoFikbQp2ymZmtVBrqUKedbbDenWGVsYjKDos9_vuUYc&amp;source=48&amp;__tn__=EH-R&amp;paipv=0" class="_39pi _26ih" style="top:162px; left:162px; width: 158px; height: 158px;">
  <div class="_50xr _403j" style="width:158px;height:158px;">
    <i class="img _5sgi img _2sxw" style="top:-26px;background-image: url('https\3a //scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/86179586_167216951371367_1117819982437154816_n.jpg?stp\3d cp0_dst-jpg_e15_p320x320_q65\26 _nc_cat\3d 101\26 ccb\3d 1-7\26 _nc_sid\3d 110474\26 efg\3d eyJpIjoidCJ9\26 _nc_ohc\3d 2DPAfgJ8OVcAX--T7xA\26 tn\3d 2K8adAyEtjKShIqL\26 _nc_ht\3d scontent-lga3-2.xx\26 oh\3d 00_AT9uwRTs_EvIEYpqKKYaliFQAqixxEp7QFFdx_Ywm-HINg\26 oe\3d 63732404');background-repeat:no-repeat;background-size:100% 100%;-webkit-background-size:100% 100%;width:158px;height:211px;" aria-label="No photo description available." role="img"></i>
  </div>
</a>

And here is how the image URL is listed in the archive:
https://scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/86179586_167216951371367_1117819982437154816_n.jpg?stp=cp0_dst-jpg_e15_p320x320_q65&_nc_cat=101&ccb=1-7&_nc_sid=110474&efg=eyJpIjoidCJ9&_nc_ohc=2DPAfgJ8OVcAX--T7xA&tn=2K8adAyEtjKShIqL&_nc_ht=scontent-lga3-2.xx&oh=00_AT9uwRTs_EvIEYpqKKYaliFQAqixxEp7QFFdx_Ywm-HINg&oe=63732404
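
One detail that may help when comparing the two by hand: the `\3a `, `\3d ` and `\26 ` sequences in the `style` attribute are CSS hex escapes for `:`, `=` and `&`. Below is a minimal Python sketch of decoding them, so the `background-image` value can be checked against the URL stored in the archive. The helper name `css_unescape` is made up for illustration, it is a simplified decoder rather than a full CSS parser, and it is not a description of how ReplayWeb.page rewrites pages.

```python
import re

# A CSS escape is a backslash followed by either 1-6 hex digits (plus one
# optional whitespace character that terminates the escape) or a single
# literal character.
_CSS_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})[ \t\n\r\f]?|(.))")


def css_unescape(value: str) -> str:
    """Decode CSS backslash escapes in `value` (simplified sketch)."""

    def _replace(match: re.Match) -> str:
        hex_digits, literal = match.group(1), match.group(2)
        if hex_digits is None:
            return literal  # e.g. \' becomes '
        code = int(hex_digits, 16)
        # Null, surrogates and out-of-range values map to U+FFFD.
        if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return "\ufffd"
        return chr(code)

    return _CSS_ESCAPE.sub(_replace, value)


# Shortened copy of the background-image URL from the markup above.
escaped = (
    r"https\3a //scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/"
    r"86179586_167216951371367_1117819982437154816_n.jpg"
    r"?stp\3d cp0_dst-jpg_e15_p320x320_q65\26 _nc_cat\3d 101"
)

if __name__ == "__main__":
    print(css_unescape(escaped))
```

Decoded this way, the URL in the markup is the same string as the one listed in the archive, so the escaped form in the inline style may be where the lookup goes wrong rather than the image being missing.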
