::filter doesn't work well #7
Comments
This is the expected behavior, but the documentation is a bit lacking. I slightly updated the README on this point; see the penultimate paragraph in the Usage section. The reasoning is the following:
So just to clarify, and I don’t mean to sound like a prick, but the expected behaviour is that my perfectly encoded UTF-8 word gets mangled when there is some trailing invalid UTF-8 by the Having read S3.6.1 I can see why you wouldn't want to remove the invalid bytes. But why does Again, sorry if I'm coming across as a prick asking these questions.
This is a tricky point, and you are right to ask; no problem at all. Your word is perfectly valid UTF-8, but the whole string is not, and u::filter() works on the string as a whole. Do you have a real case where this string can come up in your data flow?
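To make the whole-string point concrete, here is a minimal Python sketch of an all-or-nothing filter. It is not the library's PHP code: `filter_text` is a hypothetical name, the Windows-1252 fallback is an assumption, and the trailing bytes are invented for illustration.

```python
def filter_text(raw: bytes, fallback: str = "cp1252") -> str:
    """All-or-nothing filter: the buffer is either valid UTF-8 as a whole,
    or the whole buffer is reinterpreted in the fallback encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: Windows-1252 stands in for the legacy encoding.
        # errors="replace" covers the few bytes cp1252 leaves undefined.
        return raw.decode(fallback, errors="replace")


word = "Iñtërnâtiônàlizætiøn".encode("utf-8")
bad_tail = b" \xfc\xa1\xa1\xa1\xa1\xa1"  # hypothetical invalid trailing bytes

clean = filter_text(word)               # valid as a whole: left untouched
mangled = filter_text(word + bad_tail)  # one bad byte: everything is reinterpreted

print(clean)
print(mangled)
```

Because the decision is made once per string, the valid word is reinterpreted along with the invalid tail, which is the kind of mangling reported in this issue.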
I’m just playing around trying to understand how UTF-8 works and am writing a little script to hex-dump the byte values of a UTF-8 string: https://gist.github.com/jonnybarnes/6951138 So I suppose it’s not a real case of invalid UTF-8 coming up in my data flow. And to be honest, other than manually creating some invalid UTF-8 a la If I set the default value of But as you said, the only sensible way of dealing with an invalid UTF-8 string is to convert the characters into UTF-8, which is causing the valid portion of the string to get converted as well.
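For readers who want to try the same experiment, a hex dump of a string's UTF-8 bytes can be sketched in a few lines of Python. This is an independent illustration, not the linked gist; `hex_dump` is a hypothetical helper name.

```python
def hex_dump(text: str) -> list[tuple[str, str]]:
    """Pair each character with the hex values of its UTF-8 bytes."""
    return [(ch, ch.encode("utf-8").hex(" ")) for ch in text]


for char, encoded in hex_dump("Iñt€"):
    print(f"{char!r}: {encoded}")
```

ASCII characters take one byte, while `ñ` takes two and `€` takes three, which is why a single stray byte can break the validity of the string as a whole.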
I hope I answered your question. BTW, you should understand now that you shouldn't call isUtf8 before calling filter.
So would a decent workflow be filtering inputs then
Filtering your input with u::filter() guarantees that you will get UTF-8, so the exception will never be thrown.
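The filter-at-the-boundary workflow can be sketched in Python as follows. Everything here is illustrative: `to_utf8_text` and `filter_inputs` are hypothetical helpers, the Windows-1252 fallback is an assumption, and the sample request data is made up. It is not a port of the library's request filtering.

```python
def to_utf8_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 if the whole buffer is valid, else use the fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")  # assumed legacy encoding


def filter_inputs(value):
    """Recursively filter every byte string in a nested input structure."""
    if isinstance(value, bytes):
        return to_utf8_text(value)
    if isinstance(value, dict):
        return {filter_inputs(k): filter_inputs(v) for k, v in value.items()}
    if isinstance(value, list):
        return [filter_inputs(v) for v in value]
    return value


# Made-up request data: one Windows-1252 value, one UTF-8 value, one stray byte.
request = {b"name": b"J\xf6rg", b"tags": [b"caf\xc3\xa9", b"\xff"]}
safe = filter_inputs(request)
print(safe)
```

Once every input has passed through the filter at the entry point, code further down can assume valid text and does not need a separate validity check.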
In fact, this is what \Patchwork\Utf8\Bootup::filterRequestInputs(); does for all autoglobals (
I was just about to say I'm using Thanks for the help :)
This is where someone tells me I'm doing this completely wrong, but given the following code
I'd like the i18n word to be preserved. Instead the output is
Iñtërnâtiônà lizætiøn ü¡¡¡¡¡
I'd like the output to be more like
Iñtërnâtiônàlizætiøn
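For comparison, the behavior asked for here (keep the valid word, deal only with the bad bytes) corresponds to lenient decoding rather than whole-string conversion. The Python sketch below uses the standard `errors="replace"` handler, which substitutes U+FFFD for each undecodable byte. This is a different technique from what u::filter() does, and the input bytes are hypothetical.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

word = "Iñtërnâtiônàlizætiøn"
raw = word.encode("utf-8") + b" \xfc\xa1\xa1\xa1\xa1\xa1"  # hypothetical bad tail

# Lenient decoding: valid sequences are kept, invalid bytes become U+FFFD.
lenient = raw.decode("utf-8", errors="replace")

print(lenient)
print(lenient.rstrip(REPLACEMENT).rstrip() == word)
```

The trade-off is that lenient decoding assumes the data was meant to be UTF-8 and discards whatever the bad bytes represented, whereas whole-string conversion assumes the data came from a legacy encoding and keeps every byte.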