
invalid html in metadata causes #12

Open
hanleybrand opened this Issue Dec 3, 2018 · 1 comment


hanleybrand commented Dec 3, 2018

This isn't a real showstopper, just an annoyance: HTML tags in a metadata field can cause the PDF generator to throw an error and fail to produce a PDF (`paraparser: syntax error: invalid attribute name rel attrMap=['backColor', 'backcolor', 'bgcolor', 'color', 'fg', 'fontName', 'fontSize', 'fontname', 'fontsize', 'href', 'name', 'textColor', 'textcolor']`).

The record can be edited to remove the tags, but I'll see if I can figure out a fix and submit a pull request for the next version.

Here's an example of markup that breaks it; in this case the culprit is the `rel="nofollow"` attribute on the link:

```
<para>
<b>Title</b>: Brancusi, The Kiss, 1916<br />
<b>Description</b>: Constantin Brancusi, The Kiss, 1916, limestone, 58.4 x 33.7 x 25.4 cm (Philadelphia Museum of Art)
<a href="http://smarthistory.org/constantin-brancusi-the-kiss/" rel="nofollow">Learn More on Smarthistory</a><br />
<b>Date</b>: 2014-02-27 00:57:17<br /><b>Identifier</b>: 19434583879<br /><b>Contributor</b>: profzucker<br />
<b>Contributor</b>: Steven Zucker<br />
<b>Subject</b>: brancusi<br />
...
<b>Subject</b>: embrace<br />
<b>Subject</b>: brancusikiss<br />
<b>Source</b>: https://www.flickr.com/photos/profzucker/19434583879/<br /></para>
```
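Since the workaround for now is editing the affected record, it can help to find such records before printing. A minimal Python 3 sketch: it scans a field's markup with the standard-library `html.parser` and reports any `<a>` attribute that is not in the list quoted in the error message. `ANCHOR_ATTRS` and `find_unsupported_attrs` are hypothetical names, not part of MDID or ReportLab, and the traceback comes from Python 2.7, where the module is named `HTMLParser`.

```python
from html.parser import HTMLParser

# <a> attribute names ReportLab accepted, per the error message above.
# html.parser lowercases attribute names, so lowercase entries suffice.
ANCHOR_ATTRS = {
    "backcolor", "bgcolor", "color", "fg", "fontname", "fontsize",
    "href", "name", "textcolor",
}


class AttrScanner(HTMLParser):
    """Collects (tag, attribute) pairs on <a> tags that ReportLab would reject."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, _value in attrs:
                if name not in ANCHOR_ATTRS:
                    self.problems.append((tag, name))


def find_unsupported_attrs(markup):
    scanner = AttrScanner()
    scanner.feed(markup)
    scanner.close()
    return scanner.problems


sample = ('<a href="http://smarthistory.org/constantin-brancusi-the-kiss/" '
          'rel="nofollow">Learn More on Smarthistory</a>')
print(find_unsupported_attrs(sample))  # [('a', 'rel')]
```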

Here's a sample traceback:

```
[                django.request]   ERROR 2018-12-03 10:04:57,722 1687 Internal Server Error: /viewers/printviewviewer/5596/ [base.py:256]
Traceback (most recent call last):
  File "/opt/venv/mdid32/lib/python2.7/site-packages/django/core/handlers/base.py", line 132, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/gallery/rooibos/viewers/views.py", line 24, in viewer_shell
    response = viewer.view(request)
  File "/opt/gallery/rooibos/presentation/viewers.py", line 317, in view
    p = Paragraph(''.join(text), styles['Normal'])
  File "/opt/venv/mdid32/lib/python2.7/site-packages/reportlab/platypus/paragraph.py", line 1151, in __init__
    self._setup(text, style, bulletText or getattr(style,'bulletText',None), frags, cleanBlockQuotedText)
  File "/opt/venv/mdid32/lib/python2.7/site-packages/reportlab/platypus/paragraph.py", line 1173, in _setup
    style, frags, bulletTextFrags = _parser.parse(text,style)
  File "/opt/venv/mdid32/lib/python2.7/site-packages/reportlab/platypus/paraparser.py", line 1247, in parse
    annotateException('\nparagraph text %s caused exception' % ascii(text))
  File "/opt/venv/mdid32/lib/python2.7/site-packages/reportlab/lib/utils.py", line 1390, in annotateException
    rl_reraise(t,v,b)
  File "/opt/venv/mdid32/lib/python2.7/site-packages/reportlab/platypus/paraparser.py", line 1245, in parse
    self.feed(text)
  File "/usr/lib64/python2.7/HTMLParser.py", line 114, in feed
    self.goahead(0)
  File "/usr/lib64/python2.7/HTMLParser.py", line 158, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib64/python2.7/HTMLParser.py", line 324, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "/opt/venv/mdid32/lib/python2.7/site-packages/reportlab/platypus/paraparser.py", line 1268, in handle_starttag
    start(attrs or {})
  File "/opt/venv/mdid32/lib/python2.7/site-packages/reportlab/platypus/paraparser.py", line 724, in start_a
    A = self.getAttributes(attributes,_anchorAttrMap)
  File "/opt/venv/mdid32/lib/python2.7/site-packages/reportlab/platypus/paraparser.py", line 1100, in getAttributes
    self._syntax_error('invalid attribute name %s attrMap=%r'% (k,list(sorted(attrMap.keys()))))
  File "/opt/venv/mdid32/lib/python2.7/site-packages/reportlab/platypus/paraparser.py", line 824, in _syntax_error
    raise ValueError('paraparser: syntax error: %s' % message)
ValueError: (u"paraparser: syntax error: invalid attribute name rel attrMap=['backColor', 'backcolor', 'bgcolor', 'color', 'fg', 'fontName', 'fontSize', 'fontname', 'fontsize', 'href', 'name', 'textColor', 'textcolor']", '\nparagraph text u\'<para><b>Title</b>: Brancusi, The Kiss, 1916<br /><b>Description</b>: Constantin Brancusi, The Kiss, 1916, limestone, 58.4 x 33.7 x 25.4 cm (Philadelphia Museum of Art) <a href="http://smarthistory.org/constantin-brancusi-the-kiss/" rel="nofollow">Learn More on Smarthistory</a><br /><b>Date</b>: 2014-02-27 00:57:17<br /><b>Identifier</b>: 19434583879<br /><b>Contributor</b>: profzucker<br /><b>Contributor</b>: Steven Zucker<br /><b>Subject</b>: brancusi<br /><b>Subject</b>: thekiss<br /><b>Subject</b>: philadelphia<br /><b>Subject</b>: sculpture<br /><b>Subject</b>: art<br /><b>Subject</b>: modern<br /><b>Subject</b>: abstraction<br /><b>Subject</b>: cubic<br /><b>Subject</b>: block<br /><b>Subject</b>: couple<br /><b>Subject</b>: embrace<br /><b>Subject</b>: brancusikiss<br /><b>Source</b>: https://www.flickr.com/photos/profzucker/19434583879/<br /></para>\' caused exception')
```
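The traceback shows the failure surfacing as a `ValueError` from the `Paragraph(''.join(text), styles['Normal'])` call in `rooibos/presentation/viewers.py`. One possible stopgap is to catch that error and retry with the text escaped, so the record prints as plain text instead of aborting the whole PDF. A sketch, assuming the paragraph constructor is passed in as `factory` (in MDID that would be ReportLab's `Paragraph`); `build_paragraph` and `strict_factory` are hypothetical names used only for illustration.

```python
from xml.sax.saxutils import escape


def build_paragraph(text, style, factory):
    """Build a paragraph, retrying with escaped text if the markup is rejected.

    `factory` is any callable taking (text, style), e.g. ReportLab's
    Paragraph. Per the traceback, ReportLab's markup parser reports
    unsupported tags/attributes by raising ValueError.
    """
    try:
        return factory(text, style)
    except ValueError:
        # escape() rewrites &, < and > as entities, so every tag becomes
        # literal text: formatting is lost, but the PDF still renders.
        return factory(escape(text), style)


# Hypothetical stand-in for Paragraph, used only to exercise the fallback:
# it rejects an <a> tag carrying rel=, as ReportLab did above.
def strict_factory(text, style):
    if "<a " in text and "rel=" in text:
        raise ValueError("paraparser: syntax error: invalid attribute name rel")
    return (text, style)


print(build_paragraph('<a href="x" rel="nofollow">More</a>', "Normal", strict_factory))
# ('&lt;a href="x" rel="nofollow"&gt;More&lt;/a&gt;', 'Normal')
```

This trades formatting for robustness: the failing record shows its raw tags as text in the PDF, so cleaning the markup before it reaches ReportLab remains the nicer fix.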

hanleybrand commented Dec 3, 2018

In my experience this only happens with Flickr imports, but fixing the ReportLab PDF generation seems like the better approach, since these attributes could come from any source.
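Fixing this at PDF-generation time could mean rebuilding each field's markup from an allowlist before it reaches ReportLab: keep the tags these records use, keep only the `<a>` attributes named in the error message, and escape all text. The traceback shows ReportLab's `paraparser` is itself driven by Python's `HTMLParser`, so parsing with the same standard-library module should see the tags the same way. This is a minimal Python 3 sketch; the tag allowlist and the name `sanitize_for_reportlab` are assumptions, not MDID or ReportLab code, and the allowlist would need to match whatever markup MDID actually emits.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Assumed allowlist: tag -> attributes to keep. The <a> attributes are the
# lowercase names from the ReportLab error message; html.parser lowercases
# tag and attribute names, so lowercase entries are sufficient.
ALLOWED = {
    "para": set(),
    "b": set(),
    "i": set(),
    "u": set(),
    "br": set(),
    "a": {"backcolor", "bgcolor", "color", "fg", "fontname", "fontsize",
          "href", "name", "textcolor"},
}


class _Sanitizer(HTMLParser):
    """Re-emits markup, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the unknown tag but keep its text content
        if tag == "br":
            self.out.append("<br/>")
            return
        kept = "".join(
            " %s=%s" % (name, quoteattr(value or ""))
            for name, value in attrs
            if name in ALLOWED[tag]
        )
        self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        # <br/> was already emitted self-closed; unknown end tags are dropped.
        if tag in ALLOWED and tag != "br":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text arrives with entities decoded; re-escape &, < and >.
        self.out.append(escape(data))


def sanitize_for_reportlab(markup):
    parser = _Sanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize_for_reportlab(
    '<b>Description</b>: The Kiss '
    '<a href="http://smarthistory.org/constantin-brancusi-the-kiss/" rel="nofollow">'
    'Learn More on Smarthistory</a><br />'
))
# <b>Description</b>: The Kiss <a href="http://smarthistory.org/constantin-brancusi-the-kiss/">Learn More on Smarthistory</a><br/>
```

Dropping unknown attributes (rather than whole tags) keeps the link working while removing `rel`. The sketch does not repair unbalanced tags, so ReportLab may still reject markup whose tags are not properly nested.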
