Website with special characters in URL not saved #1235
Comments
Thanks Thomas, I've made a note to look into this some more for the next version of Full-Text RSS.

The URL encoding I sent you on Twitter is called punycode (https://en.wikipedia.org/wiki/Punycode). It uses only ASCII characters and is apparently how such domains are stored in the DNS system. It would be good if Wallabag and FTR handled these URLs automatically, without the user having to enter the punycode version, but I'm not really sure how big a problem this is (see below). There's more information about the conversion here: https://en.wikipedia.org/wiki/Internationalized_domain_name

Interestingly, when I copy such a URL from Chrome, it copies the punycode version, not the original. So when I copy your original URL and paste it into the form in Full-Text RSS, it works fine because it pastes the punycode version. Similarly, if I load the original URL http://pérotin.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien and then use the Push to Kindle extension button, Chrome automatically sends the punycode version in the URL request parameter.

I also tested with the Wallabag Chrome extension, and the same thing happens. The base64-encoded URL sent to the Wallabag server, if I decode it, is the punycode version: http://xn--protin-bva.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien

Of course this might not be the same on all platforms/browsers.
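To make the conversion described above concrete, here is a minimal sketch using Python's built-in `idna` codec (which follows the older IDNA 2003 rules). It is only an illustration: Wallabag and Full-Text RSS are PHP projects, and the helper names `host_to_punycode` and `host_from_punycode` are made up for this example.

```python
def host_to_punycode(host: str) -> str:
    """Return the ASCII (punycode) form of a hostname via Python's built-in IDNA codec."""
    return host.encode("idna").decode("ascii")


def host_from_punycode(host: str) -> str:
    """Return the Unicode form of a punycode-encoded hostname."""
    return host.encode("ascii").decode("idna")


# Hostname taken from the URL discussed in this issue.
print(host_to_punycode("pérotin.com"))
print(host_from_punycode("xn--protin-bva.com"))
```

The first call should produce the `xn--protin-bva.com` form quoted in this thread, and the second should turn it back into the original domain.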
Thanks a lot for all the information. Do you think passing all URLs through the idn_to_ascii() function (if available) or this PHP lib would be enough? As for FTRSS, it doesn't seem to matter whether the global txt config file is named with the original domain name or the punycode one.
I'm not sure whether you need to test first if the domain needs to be punycode-encoded, although perhaps the library doesn't change domains that don't need encoding. I have no experience with it, so I can't really say.
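One way to sidestep the "does it need encoding?" question is to convert only the host part of the URL and return early when the host is already ASCII. The sketch below does this in Python, not in the PHP that Wallabag uses; `to_ascii_url` is a hypothetical helper, and whether PHP's idn_to_ascii() or the PHP library mentioned above behaves the same way on ASCII domains would still need testing.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Convert an internationalized host to punycode, leaving the rest of the URL as-is."""
    parts = urlsplit(url)
    host = parts.hostname
    # Nothing to do if there is no host or it is already plain ASCII
    # (this also covers hosts that are already in punycode form).
    if host is None or host.isascii():
        return url

    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo = f"{userinfo}:{parts.password}"
        netloc = f"{userinfo}@{netloc}"

    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_ascii_url("http://pérotin.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien"))
```

Only the domain is touched here. Non-ASCII characters in the path or query would need percent-encoding, which is a separate step this sketch does not attempt.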
That's what I was thinking. Will do tests.
ping @j0k3r & graby.
The URL
I don't know if I am bringing up an already solved issue, but if my URL has CJK characters, it only works through the browser extension, not via the webpage or mobile app. e.g.
Hello @zoenglinghou, what you observe is a different issue from this closed one. Could you please open a new issue?
The webpage is http://pérotin.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien
Thanks to @fivefilters, it works if you enter http://xn--protin-bva.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien instead.
But the conversion isn't done automatically.