[shadow] Bug with Links using javascript:... in href attribute #18

trentm · 2011-03-07T05:08:35Z

_This is a **shadow issue** for Issue 18 on Google Code (from which this project was moved).
Added 2008-04-30T09:28:35.000Z by joh...@gmail.com. Closed (Fixed).
Labels: Type-Defect, Priority-High.
Please make updates to the bug there._

Original description

Hello,

I produce text files from HTML files using the html2text Python script from
Aaron Swartz (http://www.aaronsw.com/2002/html2text/). When using it on the
page http://www.sptimes.ru/index.php?action_id=1&i_number=1241 there are
some links your regexp doesn't match (see attached file, e.g. links number
10 and 86).

I have slightly adapted the expression:
  a) the line with # url = \2: 
      Allow any character in the URL (including spaces)
  b) the line with # title = \3: 
      Allow empty title strings (i.e. empty brackets)

My version of the regular expression:
_link_def_re = re.compile(r"""
            ^[ ]{0,%d}\[(.+)\]: # id = \1
              [ \t]*
              \n?               # maybe *one* newline
              [ \t]*
            <?(.+?)>?          # url = \2
              [ \t]*
              \n?               # maybe one newline
              [ \t]*
            (?:
                (?<=\s)         # lookbehind for whitespace
                ['"(]
                (.*?)           # title = \3 (allow empty titles)
                ['")]
                [ \t]*
            )?  # title is optional
            (?:\n+|\Z)
            """ % less_than_tab, re.X | re.M)

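A minimal standalone sketch of how the adapted expression behaves. It assumes `less_than_tab` is 3 (a tab width of 4 minus one), and the two link definitions are hypothetical samples, not lines from the attached file. The first has a space inside the URL, the second has an empty title.

```python
import re

# Assumption: a tab width of 4, so less_than_tab is 3.
less_than_tab = 3

link_def_re = re.compile(r"""
    ^[ ]{0,%d}\[(.+)\]: # id = \1
      [ \t]*
      \n?               # maybe *one* newline
      [ \t]*
    <?(.+?)>?           # url = \2 (any character, spaces included)
      [ \t]*
      \n?               # maybe one newline
      [ \t]*
    (?:
        (?<=\s)         # lookbehind for whitespace
        ['"(]
        (.*?)           # title = \3 (may be empty)
        ['")]
        [ \t]*
    )?  # title is optional
    (?:\n+|\Z)
    """ % less_than_tab, re.X | re.M)

# Hypothetical link definitions, made up for illustration.
text = (
    "[10]: javascript:openWin('a b.html')\n"
    "[86]: http://example.com/page \"\"\n"
)

# Each tuple is (id, url, title); title is None when no title is present.
matches = [m.groups() for m in link_def_re.finditer(text)]
for link_id, url, title in matches:
    print(repr(link_id), repr(url), repr(title))
```

Both definitions match: the URL of the first keeps its embedded space, and the second yields an empty string as its title.
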
Maybe you could consider whether it is worth adding this to the code.

regards
Johannes Fitz
