Website with special characters in URL not saved #1235

Closed
tcitworld opened this issue Jul 8, 2015 · 8 comments

@tcitworld
Member

The webpage is http://pérotin.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien

Thanks to @fivefilters, it works if you enter http://xn--protin-bva.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien instead.

But the conversion isn't done automatically.

@fivefilters

Thanks Thomas, I've made a note to look into this some more for the next version of Full-Text RSS. The URL encoding I sent you on Twitter is called Punycode (https://en.wikipedia.org/wiki/Punycode). It uses only ASCII characters and is apparently how such domains are stored in the DNS. It would be good if Wallabag and FTR handled these URLs automatically, without the user having to enter the Punycode version, but I'm not really sure how big a problem this is (see below). There's more information about the conversion here: https://en.wikipedia.org/wiki/Internationalized_domain_name

Interestingly, when I copy such a URL from Chrome, it copies the punycode version, not the original. So when I copy your original URL and paste it into the form in Full-Text RSS, it works fine because it pastes the punycode version. Similarly, if I load the original URL http://pérotin.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien and then use the Push to Kindle extension button, Chrome automatically sends the punycode version in the URL request parameter. I also tested with the Wallabag Chrome extension, and the same thing happens. The base64 encoded URL sent to the Wallabag server, if I decode it, is the punycode version: http://xn--protin-bva.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien

Of course this might not be the same on all platforms/browsers.
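To make the conversion described above concrete, here is a minimal Python sketch that rewrites only the host part of a URL into its Punycode form. It is an illustration, not what Wallabag or Full-Text RSS actually do: the function name `host_to_punycode` is made up for this example, and Python's built-in `idna` codec implements the older IDNA 2003 rules, which may differ from other implementations for some characters.

```python
from urllib.parse import urlsplit, urlunsplit


def host_to_punycode(url: str) -> str:
    """Return `url` with its host converted to the ASCII (Punycode) form.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    host = parts.hostname
    # Leave the URL alone if there is no host, or if the host looks like
    # an IPv6 literal (this sketch does not handle those).
    if host is None or ":" in host:
        return url

    # Python's built-in "idna" codec performs the ToASCII conversion.
    ascii_host = host.encode("idna").decode("ascii")

    # Rebuild the network location, keeping optional userinfo and port.
    netloc = ascii_host
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo = f"{userinfo}:{parts.password}"
        netloc = f"{userinfo}@{netloc}"

    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(host_to_punycode(
    "http://p\u00e9rotin.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien"
))
```

With the URL from this issue, the printed result should match the Punycode URL quoted earlier in the thread. The path is left untouched; only the host is converted.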

@tcitworld
Member Author

Thanks a lot for all the information. Do you think passing all URLs through the idn_to_ascii() function (if available) or this PHP lib would be enough?

As for FTRSS, it doesn't seem to matter if the global txt config file is named with the original domain name or the punycode one.

@tcitworld tcitworld self-assigned this Jul 8, 2015
@fivefilters

Not sure if you need to test if the domain needs to be punycode encoded first - although perhaps the library doesn't change domains that don't need to be encoded. I have no experience with it, so can't really say.

@tcitworld
Member Author

although perhaps the library doesn't change domains that don't need to be encoded

That's what I was thinking. Will do tests.
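As a rough illustration of the kind of test meant here, the sketch below checks that a ToASCII conversion leaves plain ASCII domains and already-encoded domains unchanged. It uses Python's built-in `idna` codec as a stand-in; it says nothing definite about how PHP's idn_to_ascii() or the PHP library behave, which would need the same checks run against them. The helper name `to_ascii` is made up for this example.

```python
def to_ascii(domain: str) -> str:
    """Convert a domain to its ASCII form with Python's "idna" codec.

    Hypothetical helper; a stand-in for the PHP functions discussed above.
    """
    return domain.encode("idna").decode("ascii")


# A plain ASCII domain passes through unchanged.
assert to_ascii("example.com") == "example.com"

# A domain that is already Punycode-encoded is not encoded a second time.
assert to_ascii("xn--protin-bva.com") == "xn--protin-bva.com"

# Only the label containing non-ASCII characters is rewritten.
assert to_ascii("p\u00e9rotin.com") == "xn--protin-bva.com"
```

If the PHP side behaves the same way, every URL could be passed through the conversion without first testing whether it needs encoding.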

@nicosomb
Member

ping @j0k3r & graby.

@nicosomb
Member

nicosomb commented May 2, 2016

The URL http://pérotin.com/post/2009/06/09/SAV-Free-un-sketch-kafkaien works with v2.wallabag.org.

@nicosomb nicosomb closed this as completed May 2, 2016
@zoenglinghou

I don't know if I am bringing up an already solved issue, but if my URL has CJK characters, it only works through the browser extension, not via the webpage or mobile app.

e.g.
https://matters.news/@loveyou_rabbit/做一個優雅的自由派中國人指南-保持清醒-bafyreihq7gxir656qvvycxkazs37odq5ow26hrgk65nlvr4dkf4pei53ii

@Kdecherf
Member

Hello @zoenglinghou,

What you observe is a different issue from this closed one.

Could you please open a new issue?
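For readers comparing the two reports: the original issue concerns non-ASCII characters in the host name, while the example URL above has them in the path. Path characters are normally percent-encoded as UTF-8 bytes rather than converted to Punycode. The Python sketch below shows that encoding; the helper name `percent_encode_path` and the `example.org` URL are made up for illustration, and nothing in this thread establishes that a missing encoding step is the cause of the behaviour reported here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_path(url: str) -> str:
    """Percent-encode non-ASCII characters in the path of `url`.

    Hypothetical helper for illustration only. "%" is treated as safe so
    that a path which is already percent-encoded is not encoded twice.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/@%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


# Assumed example URL, not taken from the report above.
print(percent_encode_path("https://example.org/@user/\u4e2d-abc"))
```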
