decoding failure for Ç #3
Comments
I'm unable to reproduce this. In
If you can supply a minimal test case, I'll investigate.
Do you know of a way for me to be absolutely certain that I am feeding UTF-8 to the decoder? I have a unit test on that section of my app that passes; however, when I run the app in full, I get this error. I do not for an instant discount a problem further down the line, but my investigations so far have led me to suspect the decoder or what I'm feeding it.
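One way to check the input, as a minimal sketch: it assumes Ruby 1.9 or later (the Ruby 1.8.7 mentioned in this thread has no `String#valid_encoding?`), and `utf8_bytes?` is a hypothetical helper, not part of htmlentities. It only tells you whether the raw bytes form valid UTF-8. A pure-ASCII string passes too, so treat it as a sanity check rather than proof.

```ruby
# Hypothetical helper (not part of htmlentities): reports whether the raw
# bytes of a string form valid UTF-8. Assumes Ruby 1.9+ for force_encoding.
def utf8_bytes?(str)
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# String#b (Ruby 2.0+) gives a binary copy, so the literals are treated as raw bytes.
utf8_input   = "FRAN\xC3\x87OIS".b  # Ç encoded as UTF-8 (two bytes: C3 87)
latin1_input = "FRAN\xC7OIS".b      # Ç encoded as Latin-1 (one byte: C7)

puts utf8_bytes?(utf8_input)    # prints true
puts utf8_bytes?(latin1_input)  # prints false: C7 must be followed by a continuation byte
```

Calling the check right before the string is handed to the decoder, in the full app rather than in the unit test, should show which encoding is actually arriving there.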
UPDATE: In script/console, I get this: in irb, the require 'htmlentities' didn't work. I probably need the full path. We're using htmlentities 4.2.0 under Rails 2.3.8 and Ruby 1.8.7.
It looks like your terminal is not UTF-8. That's a separate problem.
That's what I thought too, but locale seems to think otherwise.
UPDATE: I found that further down the line in our code, I am trying to decode hex entities created by Nokogiri. It is Ç that is failing to decode properly. I'm getting à instead of Ç.
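For anyone following along, a small sketch of why an à can show up where a Ç was expected. It assumes Ruby 1.9 or later and that the entity in question refers to U+00C7. It does not call htmlentities at all; it only looks at the bytes a correctly decoded Ç would have in UTF-8 and at what those bytes look like when read as Latin-1.

```ruby
# U+00C7 is the code point for Ç; pack("U") builds its UTF-8 encoding.
c_cedilla = [0xC7].pack("U")

# Raw bytes as integers, independent of what the terminal displays.
utf8_bytes = c_cedilla.unpack("C*")
puts utf8_bytes.map { |b| format("%02X", b) }.join(" ")  # prints C3 87

# Reading the same two bytes as ISO-8859-1 turns each byte into its own
# character: U+00C3 (Ã) followed by the control character U+0087.
misread = c_cedilla.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts misread.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")  # prints U+00C3 U+0087
```

So a leading à is usually a sign that UTF-8 output is being mixed with, or displayed as, Latin-1 somewhere in the pipeline.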
However, let's try a different tack. Regardless of what your terminal is doing, the bytes should be the same. In
What results do you get?
I'm using PuTTY in xterm mode. I ran your example and got the same results. I've looked into this further and have determined that my tests are inadequate and misleading. It looks like my problem is that I have been feeding Latin-1 encoded text to your decoder, and this is the source of my difficulties. If I make sure that it is UTF-8, I get the proper decoding. Thank you for your time. Sorry to have wasted it.
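A minimal sketch of that fix, under the assumption that the incoming bytes really are ISO-8859-1 and that Ruby 1.9 or later is available (on the Ruby 1.8.7 used here, Iconv was the usual tool instead of `String#encode`). `latin1_to_utf8` is a hypothetical helper name, not something htmlentities provides.

```ruby
# Hypothetical helper: returns a UTF-8 copy of a string whose bytes are
# assumed to be ISO-8859-1 (Latin-1). The original string is not modified.
def latin1_to_utf8(str)
  str.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

latin1_input = "FRAN\xC7OIS".b          # Ç as the single Latin-1 byte C7
utf8_input   = latin1_to_utf8(latin1_input)

puts utf8_input.valid_encoding?         # prints true
puts utf8_input.bytes.length            # prints 9: Ç now takes two bytes
```

The converted string is what should be passed on to the decoder. Transcoding input that is already UTF-8 this way would corrupt it, so the conversion belongs only on the path where Latin-1 is known to arrive.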
Ah, yes, doing UTF-8 on Windows is usually harder than it should be. Glad you've got a bit closer to the problem.
I seem to be having a problem with Ç. For example: 'FRANÇOIS' is being decoded as 'FRANÇOIS'. However, 'François' is correctly handled as 'François'. I thought that it might be a case of the input string being latin1, but I'm pretty sure that's not the case and your documentation seems to imply that it won't decode things that it doesn't understand.