Force answer content to be always UTF-8 encoded #11

Closed
weppos opened this issue Feb 6, 2010 · 8 comments

@weppos
Owner

weppos commented Feb 6, 2010

Internally, WHOIS should always prefer UTF-8 encoding, regardless of the server's encoding.

@axic
Contributor

axic commented Feb 21, 2010

You might want to check this: http://github.com/axic/whois/commit/955d5157c3b92679e62cca57d469713dedcbe5d1

It implements this feature.

@weppos
Owner Author

weppos commented Feb 23, 2010

I checked the commit, but it doesn't really close this issue. It only provides a limited solution for two specific TLDs.
I know there are many other TLDs that would benefit from this feature. I would prefer to apply a single patch/commit rather than a separate list of per-TLD fixes.

Also, this feature definitely needs an extensive test suite.

@weppos
Owner Author

weppos commented Feb 23, 2010

If you want to work on this feature, I suggest you move to a dedicated branch.
Let me know if you have any updates; I'll be more than happy to integrate your changes into the mainstream repository.

Thanks for your contribution.

@semaperepelitsa
Contributor

I stopped getting encoding errors in my app once I passed the whois answer through ActiveSupport::Multibyte::Unicode.tidy_bytes. (It replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalents, resulting in a valid UTF-8 string.)
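
For anyone who wants to try the same idea without ActiveSupport, here is a minimal standard-library sketch. It is not the ActiveSupport implementation, only an approximation of the behaviour described above. `tidy_to_utf8` is a hypothetical helper name, and the sketch assumes Ruby 2.1+ (for `String#scrub`) and that every byte that is not valid UTF-8 was meant as CP1252.

```ruby
# Hypothetical helper, not part of whois or ActiveSupport.
# Keeps byte sequences that are already valid UTF-8 and reinterprets
# every invalid sequence as CP1252 (assumption: the stray bytes are CP1252).
def tidy_to_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)

  # String#scrub yields each invalid byte sequence to the block and
  # substitutes the block's return value.
  text.scrub do |bad_bytes|
    bad_bytes.encode(Encoding::UTF_8, Encoding::Windows_1252,
                     invalid: :replace, undef: :replace)
  end
end

mixed = "na\xC3\xAFve caf\xE9".b # valid UTF-8 followed by a lone Latin-1 byte
puts tidy_to_utf8(mixed)
```

Like tidy_bytes, this only helps when the stray bytes really are Latin-1/CP1252. Responses in other encodings would come out as valid UTF-8 but with the wrong characters.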

@weppos
Owner Author

weppos commented Feb 23, 2011

Very interesting method. Unfortunately, it's not that simple. There's a very wide range of possible encodings (so far, I counted more than 10) and there are cases where a multipart whois record is returned with several different encodings.

I would love to find a solution that doesn't rely on third-party gems. It will probably be compatible with Ruby 1.9 only, because Ruby 1.8 doesn't have encoding support.

tpalmer pushed a commit to tpalmer/whois that referenced this issue Oct 10, 2012
Create whois.fastdomain.com parser
@woodrow

woodrow commented Jan 29, 2013

Hi @weppos. I was hoping to revive this thread and ask what strategy you think is reasonable for properly handling the various character encodings received from Whois servers around the Internet. In particular, I was wondering about providing a list of hints for character encodings returned by well-known Whois servers (i.e. those in lib/definitions), or if you had other thoughts?
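
To make the hint-list idea concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the `ENCODING_HINTS` table, the `to_utf8_with_hint` helper, and the example hostnames are hypothetical and are not taken from lib/definitions.

```ruby
# Hypothetical hint table: WHOIS host => encoding the server is believed to use.
# The hosts and encodings below are placeholders, not real registry data.
ENCODING_HINTS = {
  "whois.example.de" => "ISO-8859-1",
  "whois.example.jp" => "Shift_JIS"
}.freeze

# Convert a raw response to UTF-8 using the hint for its host.
# Hosts without a hint are assumed to speak UTF-8 already.
def to_utf8_with_hint(raw, host, hints = ENCODING_HINTS)
  source = Encoding.find(hints.fetch(host, "UTF-8"))
  text = raw.dup.force_encoding(source)
  unless source == Encoding::UTF_8
    text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  end
  text.scrub # replace any byte sequences that are still invalid with U+FFFD
end

puts to_utf8_with_hint("M\xFCller".b, "whois.example.de")
```

The obvious cost is that the table has to be maintained by hand, and a wrong hint silently produces wrong characters.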

@weppos
Owner Author

weppos commented Jan 29, 2013

> In particular I was wondering about providing a list of hints for character encodings returned by well-known Whois servers (i.e. those in lib/definitions), or if you had other thoughts?

I tried this approach in the past. Unfortunately, it's not very effective. Maintaining that list is such a pain in the ***, because it can be very long and changes made by a registry might not be reflected in it immediately. Also, a few registries are able to return a response in more than one encoding (this is insane, I know).

The only solution is to guess the encoding at runtime.
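
As a rough illustration of what guessing at runtime could look like with the standard library only, here is a minimal sketch. It is a heuristic under stated assumptions, not something the library implements: `guess_and_convert` and `CANDIDATE_ENCODINGS` are hypothetical names, and the candidate list is only an example.

```ruby
# Hypothetical candidate list, tried in order. ISO-8859-1 accepts any byte
# sequence, so it must come last or later candidates are never reached.
CANDIDATE_ENCODINGS = [Encoding::UTF_8, Encoding::ISO_8859_1].freeze

# Return the response as UTF-8, using the first candidate encoding under
# which the raw bytes are valid and convertible.
def guess_and_convert(raw, candidates = CANDIDATE_ENCODINGS)
  candidates.each do |encoding|
    attempt = raw.dup.force_encoding(encoding)
    next unless attempt.valid_encoding?

    begin
      return attempt.encode(Encoding::UTF_8)
    rescue EncodingError
      next # valid bytes, but not convertible to UTF-8: try the next one
    end
  end

  # No candidate matched: keep the ASCII part, replace the rest with U+FFFD.
  raw.dup.force_encoding(Encoding::BINARY)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

puts guess_and_convert("caf\xC3\xA9".b) # bytes are valid UTF-8
puts guess_and_convert("caf\xE9".b)     # falls through to ISO-8859-1
```

Validity is a weak signal: many byte sequences are valid in several encodings, so the order of the candidates decides the result. It also treats the response as a single unit, so a multipart record with several encodings would have to be split and converted part by part.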

@weppos
Owner Author

weppos commented Jun 9, 2016

Closing, as this is a very old topic and there is no viable solution the client can implement at this time.

@weppos weppos closed this as completed Jun 9, 2016