From on 2005-03-16 08:25:18:
Noticed on Debian Sarge:
libexpat1 1.95.8-1
libxml-parser-perl 2.34-3
perl 5.8.4-6
and Gentoo:
expat-1.95.8
XML-Parser-2.34
perl-5.8.5
XML::Parser mangles non-ASCII characters: most of the time, accented characters are converted from UTF-8 down to ISO-8859-1. After much debugging, I determined that it is Parser.pm doing this, not Expat.so, even though the documentation says that all text is returned as UTF-8.
The attached tar file contains two XML files and a sample Perl script. The two XML files differ by only one character, but Perl handles them differently. The perlunicode manpage says:
If strings operating under byte semantics and strings with Unicode
character data are concatenated, the new string will be created by
decoding the byte strings as ISO 8859-1 (Latin-1) [...]
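The rule quoted above can be reproduced in isolation. The following is a minimal sketch that uses only core Perl and the Encode module; the variable names are illustrative and nothing here is taken from Parser.pm or from the attached files.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 octets for "e acute" (0xC3 0xA9): a byte string.
my $bytes = "\xC3\xA9";

# The same text decoded into a character string: one character, U+00E9.
my $chars = decode('UTF-8', $bytes);

# A character string holding a code point above 0xFF (U+263A).
my $wide = "\x{263A}";

# Concatenating the byte string with a Unicode character string:
# the two bytes are interpreted as Latin-1, so they become the two
# characters U+00C3 and U+00A9 instead of the single character U+00E9.
my $mixed = $wide . $bytes;

printf "length(bytes) = %d\n", length $bytes;   # 2
printf "length(chars) = %d\n", length $chars;   # 1
printf "length(mixed) = %d\n", length $mixed;   # 3
printf "mixed code points: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $mixed;
```

Whether this is what happens inside the Char handler path is not established here; the sketch only shows how a byte string and a character string can end up with different interpretations of the same accented character.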
Anyway, putting "use encoding 'utf8';" at the top of XML::Parser made Perl keep the string as UTF-8 instead of munging the accented characters. Putting it at the top of the script that defines the Char handler also worked, but I think it really belongs in XML::Parser if the module is to always return UTF-8 as it claims.
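One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the difference lies in how the text is written out, not in what the parser returns: a character string whose characters all fit below 0x100 is written as Latin-1 bytes when the filehandle has no encoding layer. The sketch below illustrates that behaviour with in-memory filehandles and an explicit output layer instead of the `encoding` pragma. It does not use XML::Parser, and the sample text and variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A character string such as a Char handler might receive
# (hypothetical sample text: "caf" followed by U+00E9).
my $text = decode('UTF-8', "caf\xC3\xA9");

# 1. Print to a handle with no encoding layer.
my $raw_out = '';
open my $fh_raw, '>', \$raw_out or die "open failed: $!";
print {$fh_raw} $text;
close $fh_raw;

# 2. Print to a handle with an explicit UTF-8 output layer.
my $utf8_out = '';
open my $fh_utf8, '>:encoding(UTF-8)', \$utf8_out or die "open failed: $!";
print {$fh_utf8} $text;
close $fh_utf8;

# Compare the number of bytes actually written in each case.
printf "no layer:    %d bytes\n", length $raw_out;   # 4 (e acute as 0xE9)
printf "UTF-8 layer: %d bytes\n", length $utf8_out;  # 5 (e acute as 0xC3 0xA9)
```

If this is the cause, calling `binmode(STDOUT, ':encoding(UTF-8)')` in the calling script would be a narrower alternative to adding the pragma to the module itself.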
John McPherson
Migrated from rt.cpan.org#11899 (status was 'new')
Requestors:
Attachments: