Skip to content

z_html:sanitize/2 eats whitespace. #335

Closed
mworrell opened this Issue May 23, 2012 · 8 comments

3 participants

@mworrell
Zotonic member

When you pull something like:

<a href='#'>a</a> <a href='#'>b</a>

Through z_html:sanitize/2 then the space between the two anchor tags is removed.
This seems to be a problem within mochiweb_html:parse/1

@mworrell mworrell closed this in 5eb80d3 May 23, 2012
@mworrell mworrell reopened this May 23, 2012
@mworrell
Zotonic member

Reopening as reminder, as we need to push this upstream.

@mmzeeman
Zotonic member

The fix also causes a mochiweb_html unittest to fail.

@mworrell
Zotonic member

Indeed:


  mochiweb_html: parse_test...*failed*
::error:{assertEqual_failed,[{module,mochiweb_html},
                           {line,972},
                           {expression,"parse ( D0 )"},
                           {expected,{<<"html">>,[],
                                      [{<<"head">>,[],
                                        [{<<"meta">>,[{...}|...],[]},
                                         {<<"titl"...>>,[],...},
                                         {<<...>>,...},
                                         {...}|...]},
                                       {<<"body">>,
                                        [{<<"id">>,<<"home">>},
                                         {<<"clas"...>>,<<...>>}],
                                        [<<"&lt;<thi"...>>]}]}},
                           {value,{<<"html">>,[],
                                   [<<"\n ">>,
                                    {<<"head">>,[],[<<"\n   ">>,{...}|...]},
                                    <<"\n ">>,
                                    {<<"body">>,[{...}|...],[...]},
                                    <<"\n">>]}}]}
  in function mochiweb_html:'-parse_test/0-fun-0-'/1
  in call from mochiweb_html:parse_test/0

This fails on the whitespace between the block level elements.
In an ideal world we should drop the whitespace between block level elements and keep the whitespace of the 'inline' elements.

That is quite a bit of knowledge to add to the parser though.

I suggest that we change the test so that it accounts for the newly returned whitespace.

@mmzeeman
Zotonic member

:-)

Would be nice to have add that "knowledge" anyway. Maybe I'll add it later.

@hanikesn

You can't reliably detect wether an element is an inline or block level element, as that can be changed via CSS.

Does this still need to be pushed upstream?

@mworrell
Zotonic member

@mmzeeman do you know if this is pushed upstream?

@mworrell mworrell added pending reply and removed needs work labels Feb 16, 2015
@mmzeeman
Zotonic member

No sorry. Some of our fixes are pushed upstream, some not.

@mworrell mworrell added efficiency and removed pending reply labels Feb 16, 2015
@mworrell
Zotonic member

Closing - we need to do a complete check on the upstream push anyway,

@mworrell mworrell closed this Apr 22, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.