bug when parsing <script> tag using some template system #29

Open
ghostoy opened this Issue Sep 1, 2011 · 3 comments

ghostoy commented Sep 1, 2011

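var htmlparser = require('htmlparser'),
    util = require('util'),
    // DefaultHandler builds the DOM; the callback runs once parsing is done.
    handler = new htmlparser.DefaultHandler(function(err, dom){}),
    parser = new htmlparser.Parser(handler),
    // The template markup starts immediately after the opening <script> tag.
    rawHtml = '<script type="text/template"><h1>Heading1</h1></script>';

parser.parseComplete(rawHtml);
// Print the whole DOM tree (depth = null means no depth limit).
console.log(util.inspect(handler.dom, false, null));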

This code drops the leading "<" of <h1> inside the script tag and outputs:

[ { raw: 'script type="text/template"',
    data: 'script type="text/template"',
    type: 'script',
    name: 'script',
    attribs: { type: 'text/template' },
    children: 
     [ { raw: 'h1>Heading1</h1>',  // the leading "<" has been discarded
         data: 'h1>Heading1</h1>',
         type: 'text' } ] } ]

fb55 commented Oct 25, 2011

The funny thing is that, if you add a space between the script and the h1-tag, it actually works: https://github.com/FB55/node-htmlparser/blob/master/tests/23-template_script_tags.js
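Until a fix is available, the observation above suggests a workaround: pad the script contents with one space before parsing. The snippet below is only a minimal sketch of that idea. `padScriptContents` is a hypothetical helper, not part of htmlparser, and its regular expression assumes that attribute values contain no ">" character.

```javascript
// Hypothetical helper (not part of htmlparser): insert one space after an
// opening <script ...> tag whenever the very next character is "<".
// Assumption: no attribute value inside the opening tag contains ">".
function padScriptContents(html) {
  // (?=<(?!\/script)) skips empty script tags, where the next "<" already
  // belongs to the closing </script>.
  return html.replace(/(<script\b[^>]*>)(?=<(?!\/script))/gi, '$1 ');
}

var rawHtml = '<script type="text/template"><h1>Heading1</h1></script>';
console.log(padScriptContents(rawHtml));
```

Note that the inserted space becomes part of the template text, so the consumer of the template may want to trim it.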

Nothing funny about it, @fb55. The problem is deep inside parseTags(), where it consumes the first less-than symbol following any tag, including the script tag, but then correctly goes back into text-parsing mode to handle all of the template.
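To make that description concrete, here is a toy tokenizer that reproduces the symptom. It is not the actual parseTags() source: `toyTokenize` and its `skipLtAfterTag` flag are invented for illustration, and the real parser may differ in every detail. The sketch only assumes what is stated above, namely that the "<" right after a tag gets consumed before the parser switches to text mode for the script body.

```javascript
// Toy tokenizer, NOT the real htmlparser code. It mimics the behaviour
// described above so the lost character is easy to see.
function toyTokenize(html, skipLtAfterTag) {
  var tokens = [];
  var i = 0;
  var rawTextUntil = null; // closing tag that ends raw-text mode

  while (i < html.length) {
    if (rawTextUntil !== null) {
      // Raw-text mode: everything up to the closing tag is one text token.
      var end = html.indexOf(rawTextUntil, i);
      if (end === -1) end = html.length;
      if (end > i) tokens.push({ type: 'text', data: html.slice(i, end) });
      i = end;
      rawTextUntil = null;
      continue;
    }
    if (html[i] === '<') {
      var close = html.indexOf('>', i);
      if (close === -1) close = html.length;
      var data = html.slice(i + 1, close);
      tokens.push({ type: 'tag', data: data });
      i = close + 1;
      if (/^script\b/i.test(data)) rawTextUntil = '</script>';
      // The assumed bug: step over a "<" that directly follows any tag,
      // even though the script body should be treated as plain text.
      if (skipLtAfterTag && html[i] === '<') i += 1;
      continue;
    }
    // Ordinary text outside of script tags.
    var next = html.indexOf('<', i);
    if (next === -1) next = html.length;
    tokens.push({ type: 'text', data: html.slice(i, next) });
    i = next;
  }
  return tokens;
}

var rawHtml = '<script type="text/template"><h1>Heading1</h1></script>';
console.log(toyTokenize(rawHtml, true)[1].data);  // buggy variant
console.log(toyTokenize(rawHtml, false)[1].data); // without the skip
```

With the flag on, the text token loses its first character. With a space after the opening script tag, the character following the tag is not "<", so nothing is skipped, which would explain why the padded test passes.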

fb55 commented Nov 12, 2011

I fixed the bug in my own fork; the test linked above now passes with the additional space removed.

kirbysayshi pushed a commit to kirbysayshi/node-htmlparser that referenced this issue Dec 19, 2013

fb55 Added a test dc5fe9c

kirbysayshi pushed a commit to kirbysayshi/node-htmlparser that referenced this issue Dec 19, 2013

fb55 Replaced _tagStack with _contentFlags, tweaked DefaultHandler
That fixed tautologistics#29.
bc12cd8

darobin pushed a commit to darobin/jsdom that referenced this issue Sep 14, 2015

elfsternberg Added test to ensure that the contents of <script type="text/..."> tags
were left unmolested.  This is to ensure that script tags can be used
for some language other than Javascript.

The test data has a unit of whitespace at the very front to work around
tautologistics/node-htmlparser#29
d3b2afe