Facebook's new Rotating IDs break replayWeb.page #111
Comments
@halmos do you have an example that's currently breaking?
I'm not sure if this test will work. I think the problem will only occur on FB archives where the user was logged in at the time of archiving. I also don't see an indication that the post is using the new pfbid system. I think Facebook is rolling that new system out in stages, so not all posts currently use it.
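One rough way to check whether a given post URL uses the new scheme is to look for a `pfbid`-prefixed token in the URL. The sketch below assumes that such tokens appear either as a whole path segment or as a whole query parameter value, and that they consist of `pfbid` followed by alphanumeric characters; both are assumptions, not documented Facebook behavior. The function name `uses_pfbid` and the first example URL are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Assumption: a rotating ID is "pfbid" followed by one or more alphanumerics.
PFBID_TOKEN = re.compile(r"^pfbid[0-9A-Za-z]+$")


def uses_pfbid(url: str) -> bool:
    """Return True if any path segment or query value looks like a pfbid token."""
    parts = urlsplit(url)
    path_segments = [segment for segment in parts.path.split("/") if segment]
    query_values = [value for _, value in parse_qsl(parts.query)]
    return any(PFBID_TOKEN.match(candidate) for candidate in path_segments + query_values)


# Hypothetical URL shaped like a post that uses a rotating ID.
print(uses_pfbid("https://www.facebook.com/example/posts/pfbid0AbC123xyz"))
# A photo URL with plain numeric IDs.
print(uses_pfbid("https://m.facebook.com/aalisarem/photos/pcb.167216991371363/167216948038034/?type=3"))
```

Running this over the URLs recorded in an archive would give a quick indication of whether the capture involves rotating IDs at all, which may help narrow down test cases.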
I'm worried that there is a security issue with posting an archive from a logged-in FB session, since the auth tokens could be captured in the cookies. I'm trying to find a way to do this securely.
Yes, don't post it here, but you can upload it somewhere and send us a link to dev [at] webrecorder.net
Do the rotating IDs only work for logged-in FB users?
That's a good question. I believe they are also used for public / signed-out posts, but I think the IDs on signed-in pages are used with authenticated API requests, which may have additional side effects. Unfortunately, this is a hard thing to test, but so far my experience is that the WACZ files seem to be affected only on logged-in sessions. More testing is needed, however.
@halmos can you try archiving and replaying with the latest versions? We may have fixed some issues related to this.
I am seeing fewer problems with the latest version of the extension. However, I do see at least one example where images are not loading. Facebook seems to be using some pretty obscure code to load images dynamically. For example, here is the markup for an image that is not loading in the archive, though I can see that the image does exist in the web archive:
<a href="/replay/w/n1h09qy637kujzpdcndwzs/20221018145215mp_/https://m.facebook.com/aalisarem/photos/pcb.167216991371363/167216948038034/?type=3&av=1498090096&eav=AfaXUKGW9nRapv2ODNFrM9HwoFikbQp2ymZmtVBrqUKedbbDenWGVsYjKDos9_vuUYc&source=48&__tn__=EH-R&paipv=0" class="_39pi _26ih" style="top:162px; left:162px; width: 158px; height: 158px;">
<div class="_50xr _403j" style="width:158px;height:158px;">
<i class="img _5sgi img _2sxw" style="top:-26px;background-image: url('https\3a //scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/86179586_167216951371367_1117819982437154816_n.jpg?stp\3d cp0_dst-jpg_e15_p320x320_q65\26 _nc_cat\3d 101\26 ccb\3d 1-7\26 _nc_sid\3d 110474\26 efg\3d eyJpIjoidCJ9\26 _nc_ohc\3d 2DPAfgJ8OVcAX--T7xA\26 tn\3d 2K8adAyEtjKShIqL\26 _nc_ht\3d scontent-lga3-2.xx\26 oh\3d 00_AT9uwRTs_EvIEYpqKKYaliFQAqixxEp7QFFdx_Ywm-HINg\26 oe\3d 63732404');background-repeat:no-repeat;background-size:100% 100%;-webkit-background-size:100% 100%;width:158px;height:211px;" aria-label="No photo description available." role="img"></i>
</div>
</a>
and here is how the image URL is listed in the archive:
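When comparing against archive entries, note that the `background-image` URL in the markup above is written with CSS hex escapes (`\3a ` for `:`, `\3d ` for `=`, `\26 ` for `&`), so it will not match a plain-text URL until it is decoded. The following is a minimal sketch of decoding such a value, following the general CSS escape rules (a backslash, one to six hex digits, and one optional trailing whitespace character that is consumed). The helper name `css_unescape` is hypothetical, the sample string is a shortened prefix of the URL above, and this is not a description of how ReplayWeb.page itself handles these values.

```python
import re

# A CSS escape is either:
#   - a backslash, 1-6 hex digits, then one optional whitespace character, or
#   - a backslash followed by any single character other than a newline.
_CSS_ESCAPE = re.compile(
    r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?|\\([^\r\n\f])"
)


def css_unescape(value: str) -> str:
    """Decode CSS escapes in a string or url() value."""

    def _replace(match: re.Match) -> str:
        hex_digits = match.group(1)
        if hex_digits is not None:
            code_point = int(hex_digits, 16)
            # Null, surrogate, and out-of-range code points map to U+FFFD.
            if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
                return "\ufffd"
            return chr(code_point)
        # Simple escape such as \' or \\ : keep the escaped character.
        return match.group(2)

    return _CSS_ESCAPE.sub(_replace, value)


# Shortened prefix of the escaped URL from the markup above.
escaped = (
    r"https\3a //scontent-lga3-2.xx.fbcdn.net/v/t1.6435-9/"
    r"86179586_167216951371367_1117819982437154816_n.jpg"
    r"?stp\3d cp0_dst-jpg_e15_p320x320_q65\26 _nc_cat\3d 101"
)
print(css_unescape(escaped))
```

The space after each hex escape matters: in a value such as `\3d 2DPAfgJ8OVcAX`, it marks the end of the escape so that the following `2D` is not read as additional hex digits.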
ReplayWeb.page cannot replay Facebook posts that use their new rotating IDs scheme (https://about.fb.com/news/2022/09/deterring-scraping-by-protecting-facebook-identifiers/).
The archive works as expected at first, but stops working after a week or more.
To reproduce, make an archive of a Facebook page while logged in. Then try to replay the archive again after 10 or so days.