Skip to content

Modified the to_json filter to allow UTF8-encoded character strings #288

Merged
merged 4 commits into from May 7, 2012

4 participants

@fcardinaux

With these changes, it is possible to specify that the input term contains UTF8-encoded character strings.

Here is an example with the French word "zèbre", which means zebra:

{{ ["zèbre"]|to_json:"utf8" }}

will be encoded in ["z\u00e8bre"], which will be displayed correctly: zèbre.

It is still possible to use to_json without parameter. However, it will break the unicode encoding:

{{ ["zèbre"]|to_json }}

will be encoded in ["z\u00c3\u00a8bre"], which will be incorrectly displayed: zèbre

@arjan
Zotonic member
arjan commented on b7985d1 Feb 16, 2012

The pull request looks fine but I think we should indeed assume the default input is UTF-8, as Zotonic stores all its data in this encoding.

If you modify your filter accordingly I'll happily merge it.

@kaos kaos commented on an outdated diff Feb 20, 2012
modules/mod_base/filters/filter_to_json.erl
+%% @spec to_json(ErlangTerm, Encoding, Context) -> Json
+%% Where:
+%% * ErlangTerm, Context and Json: like in to_json/2
+%% * Encoding =
+%% * "utf8": ErlangTerm contains UTF8 strings
+%% * "unicode": ErlangTerm contains unicode strings, as defined in the mochijson module
+to_json(Value, "utf8", _Context) ->
+ Encoder = mochijson:encoder([{input_encoding, utf8}]),
+ Encoder(z_convert:to_json(Value));
+
+to_json(Value, "unicode", _Context) ->
+ Encoder = mochijson:encoder([{input_encoding, unicode}]),
+ Encoder(z_convert:to_json(Value));
+
+to_json(Value, _Encoding, Context) ->
+ to_json(Value, Context).
@kaos
Zotonic member
kaos added a note Feb 20, 2012

I think that if a unknown value for encoding is given, I would rather have it fail, than to give a default encoding, which might not be at all compatible with what was requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fcardinaux

@arjan The default input is now UTF-8, as you suggested. More in detail:

  • If value is a term that contains UTF-8-encoded strings: {{ value|to_json }}
  • If value is a term that contains Latin-1 strings (ISO 8859-1): {{ value|to_json:"latin-1" }}
  • If value is a term that contains Unicode-encoded strings as defined in the mochijson module: {{ value|to_json:"unicode" }}
  • Any other parameter is equivalent to no parameter at all, so one can also use {{ value|to_json:"utf8" }} to specify that the input contains UTF-8-encoded strings.

@kaos If Arjan agrees with your suggestion, I will change the code again.

@kaos
Zotonic member
kaos commented Feb 21, 2012

Or, we can merge it, and change it later if more people think it is a good idea...

@arjan
Zotonic member
arjan commented Feb 21, 2012

I agree with kaos: if it crashes for unknown encodings, people will know that those encodings are not supported, instead of failing silently.

@fcardinaux

I will implement the change suggested by kaos this evening.

By the way, is there a place where I can implement some unit tests of this function? This would allow to make sure that the different character encodings work the way we believe, and give us a reference. Especially, Mochiweb doesn't clearly explain what the third "unicode" encoding is (UCS-2, UTF-16 or something else).

@kaos
Zotonic member
kaos commented Feb 21, 2012

You bring up a IMHO much needed topic to address: unit tests.

There are some tests, but from what I recall, they look more like integration tests, and are slightly too much trouble to setup and run to be used effectively as unit tests (I've not actually tried to run them, so I may be wrong).

Writing eunit tests for modules is quite easy, and I don't think we need to implement too much supporting code to make it easy to use through out the whole zotonic code base.

As an exercise, I wrote unit tests for my mod_gazonk module while writing my article series on writing your own module.

I'm willing to look into it, when I get a chance (which might not be too soon... :/ )

@fcardinaux

Please note that I have temporarily removed the third encoding that existed in the previous versions of this filter. This is because I don't know what Mochiweb's 'unicode' encoder exactly does.

@kaos
Zotonic member
kaos commented Feb 28, 2012

I miss the to_json/2 which ought to work the same as |to_json:"utf8"...

@fcardinaux

Sorry, I see now that I have misunderstood the meaning of your first comment. I'm going to fix this and publish the correction today.

@kaos
Zotonic member
kaos commented Feb 28, 2012

+1

@mworrell
Zotonic member
mworrell commented May 4, 2012

Can we merge this one? Anyone any objections?

@fcardinaux

It's fine with me.

@kaos kaos merged commit 0537a12 into zotonic:master May 7, 2012
@kaos
Zotonic member
kaos commented May 7, 2012
@fcardinaux

Thanks for having updated the documentation.

@kaos
Zotonic member
kaos commented May 7, 2012

:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.