Some URLs not playing back in pywb #29
Comments
@anjackson are you able to share the page that this URL came from?
Thanks for the context.
Thanks @N0taN3rd - I've asked OutbackCDX about it.
Having discussed nla/outbackcdx#44 with @ato, we think the OpenSearch API implementation (here) is the issue (your test probably didn't use that, @N0taN3rd?). To quote @ato:
i.e. the problem is that the OpenSearch API expects everything to be doubly-escaped.
The original Java Wayback code, which is the reference implementation of the protocol that OutbackCDX is trying to be compatible with, is here. You can see the double escaping with URLEncoder (once inside the loop, once outside it).
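To make the double escaping concrete, here is a minimal Python sketch of the pattern described above: each value is encoded once inside the loop, then the joined query is encoded again outside it. The helper names (`build_query`, `parse_query`) and the `url:` field prefix are assumptions for illustration only, not the actual Wayback or OutbackCDX code, and `quote_plus` is used as a rough stand-in for Java's URLEncoder.

```python
from urllib.parse import quote_plus, unquote_plus


def build_query(fields):
    """Hypothetical client side: escape each value, then escape the whole query."""
    parts = []
    for key, value in fields.items():
        # First escape: inside the loop, per value.
        parts.append(f"{key}:{quote_plus(value)}")
    # Second escape: outside the loop, over the joined string.
    return quote_plus(" ".join(parts))


def parse_query(q):
    """Hypothetical server side: undo both levels of escaping."""
    fields = {}
    for part in unquote_plus(q).split(" "):
        key, _, value = part.partition(":")
        fields[key] = unquote_plus(value)
    return fields


url = ("http://3.bp.blogspot.com/-W8IWj9tFz-I/UTCS2D5Pt-I/"
       "AAAAAAAAAI4/8BCbTLsJ3tI/s320/African+women4.png")

q = build_query({"url": url})

# Decoding both levels recovers the original URL, including the literal '+'.
round_tripped = parse_query(q)["url"]

# Decoding only one level leaves the value still escaped ('+' stays as %2B),
# so a lookup keyed on the real URL would not match.
decoded_once = unquote_plus(q)
```

If the two sides disagree on how many times to decode, the `+` in `African+women4.png` is the kind of character that ends up different on each side, which fits the symptom in this issue.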
@anjackson yes my test did not use that.
Thanks all. I'll tag and roll-out ASAP.
We have an oddity, in that some URLs, like this one:
Play back fine in OpenWayback: https://www.webarchive.org.uk/wayback/archive/20130307094428im_/http://3.bp.blogspot.com/-W8IWj9tFz-I/UTCS2D5Pt-I/AAAAAAAAAI4/8BCbTLsJ3tI/s320/African+women4.png
But do not play back in pywb: https://alpha.webarchive.org.uk/wayback/en/archive/20130307094428im_/http://3.bp.blogspot.com/-W8IWj9tFz-I/UTCS2D5Pt-I/AAAAAAAAAI4/8BCbTLsJ3tI/s320/African+women4.png
In the back-end OutbackCDX service (an old version, which might matter here), the lookup checks out:
From the pywb logs we see:
So perhaps the issue is the `+` getting escaped?
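For reference, a small Python sketch of why a literal `+` is fragile. It only uses the standard library's `urllib.parse` and says nothing about what pywb or OutbackCDX actually do internally; it just shows that the same file name can come out three different ways depending on which escaping rules a component applies.

```python
from urllib.parse import quote, unquote, unquote_plus

name = "African+women4.png"

# Path-style decoding leaves '+' alone.
as_path = unquote(name)

# Form-style decoding treats '+' as an encoded space.
as_form = unquote_plus(name)

# Percent-encoding the name turns '+' into %2B.
escaped = quote(name)
```

Any two components that pick different rules here will compute different lookup keys for the same capture.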