Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Modified the to_json filter to allow UTF8-encoded character strings #288

Merged
merged 4 commits into from May 7, 2012

Conversation

Projects
None yet
4 participants
Contributor

fcardinaux commented Jan 31, 2012

With these changes, it is possible to specify that the input term contains UTF8-encoded character strings.

Here is an example with the French word "zèbre", which means zebra:

{{ ["zèbre"]|to_json:"utf8" }}

will be encoded in ["z\u00e8bre"], which will be displayed correctly: zèbre.

It is still possible to use to_json without parameter. However, it will break the unicode encoding:

{{ ["zèbre"]|to_json }}

will be encoded in ["z\u00c3\u00a8bre"], which will be incorrectly displayed: zèbre

arjan commented on b7985d1 Feb 16, 2012

The pull request looks fine but I think we should indeed assume the default input is UTF-8, as Zotonic stores all its data in this encoding.

If you modify your filter accordingly I'll happily merge it.

@kaos kaos commented on an outdated diff Feb 20, 2012

modules/mod_base/filters/filter_to_json.erl
+%% @spec to_json(ErlangTerm, Encoding, Context) -> Json
+%% Where:
+%% * ErlangTerm, Context and Json: like in to_json/2
+%% * Encoding =
+%% * "utf8": ErlangTerm contains UTF8 strings
+%% * "unicode": ErlangTerm contains unicode strings, as defined in the mochijson module
+to_json(Value, "utf8", _Context) ->
+ Encoder = mochijson:encoder([{input_encoding, utf8}]),
+ Encoder(z_convert:to_json(Value));
+
+to_json(Value, "unicode", _Context) ->
+ Encoder = mochijson:encoder([{input_encoding, unicode}]),
+ Encoder(z_convert:to_json(Value));
+
+to_json(Value, _Encoding, Context) ->
+ to_json(Value, Context).
@kaos

kaos Feb 20, 2012

Owner

I think that if a unknown value for encoding is given, I would rather have it fail, than to give a default encoding, which might not be at all compatible with what was requested.

Contributor

fcardinaux commented Feb 20, 2012

@arjan The default input is now UTF-8, as you suggested. More in detail:

  • If value is a term that contains UTF-8-encoded strings: {{ value|to_json }}
  • If value is a term that contains Latin-1 strings (ISO 8859-1): {{ value|to_json:"latin-1" }}
  • If value is a term that contains Unicode-encoded strings as defined in the mochijson module: {{ value|to_json:"unicode" }}
  • Any other parameter is equivalent to no parameter at all, so one can also use {{ value|to_json:"utf8" }} to specify that the input contains UTF-8-encoded strings.

@kaos If Arjan agrees with your suggestion, I will change the code again.

Owner

kaos commented Feb 21, 2012

Or, we can merge it, and change it later if more people think it is a good idea...

Owner

arjan commented Feb 21, 2012

I agree with kaos: if it crashes for unknown encodings, people will know that those encodings are not supported, instead of failing silently.

Contributor

fcardinaux commented Feb 21, 2012

I will implement the change suggested by kaos this evening.

By the way, is there a place where I can implement some unit tests of this function? This would allow to make sure that the different character encodings work the way we believe, and give us a reference. Especially, Mochiweb doesn't clearly explain what the third "unicode" encoding is (UCS-2, UTF-16 or something else).

Owner

kaos commented Feb 21, 2012

You bring up a IMHO much needed topic to address: unit tests.

There are some tests, but from what I recall, they look more like integration tests, and are slightly too much trouble to setup and run to be used effectively as unit tests (I've not actually tried to run them, so I may be wrong).

Writing eunit tests for modules is quite easy, and I don't think we need to implement too much supporting code to make it easy to use through out the whole zotonic code base.

As an exercise, I wrote unit tests for my mod_gazonk module while writing my article series on writing your own module.

I'm willing to look into it, when I get a chance (which might not be too soon... :/ )

Contributor

fcardinaux commented Feb 22, 2012

Please note that I have temporarily removed the third encoding that existed in the previous versions of this filter. This is because I don't know what Mochiweb's 'unicode' encoder exactly does.

Owner

kaos commented Feb 28, 2012

I miss the to_json/2 which ought to work the same as |to_json:"utf8"...

Contributor

fcardinaux commented Feb 28, 2012

Sorry, I see now that I have misunderstood the meaning of your first comment. I'm going to fix this and publish the correction today.

Owner

kaos commented Feb 28, 2012

+1

Owner

mworrell commented May 4, 2012

Can we merge this one? Anyone any objections?

Contributor

fcardinaux commented May 4, 2012

It's fine with me.

kaos added a commit that referenced this pull request May 7, 2012

Merge pull request #288 from fcardinaux/dev_filter_to_json_utf8
Modified the to_json filter to support different encoded character strings

@kaos kaos merged commit 0537a12 into zotonic:master May 7, 2012

Owner

kaos commented May 7, 2012

Contributor

fcardinaux commented May 7, 2012

Thanks for having updated the documentation.

Owner

kaos commented May 7, 2012

:)

rpip pushed a commit to rpip/zotonic that referenced this pull request Aug 12, 2013

Merge pull request #288 from fcardinaux/dev_filter_to_json_utf8
Modified the to_json filter to support different encoded character strings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment