Improve RawHtmlProcessor to have linear iso quadratic performance #454

Griffon26 · 2016-02-26T19:48:03Z

I noticed a significant slowdown as my input documents became larger.
It turned out to be caused by the RawHtmlProcessor doing a full text replace for every placeholder.
My patch changes this code to construct a single regular expression that matches every placeholder and then replaces each match with the appropriate replacement text, so only one pass over the input text is needed.
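
The two approaches can be sketched roughly as follows. This is a minimal illustration and not the patch itself: the function names and the `\x02raw:N\x03` placeholder format are assumptions made for the example.

```python
import re


def replace_per_placeholder(text, replacements):
    """Old approach: one full str.replace pass over the text per placeholder."""
    for placeholder, html in replacements.items():
        text = text.replace(placeholder, html)
    return text


def replace_single_pass(text, replacements):
    """New approach: one alternation regex, one pass over the text."""
    if not replacements:
        return text
    pattern = re.compile("|".join(re.escape(key) for key in replacements))
    return pattern.sub(lambda m: replacements[m.group(0)], text)


# Hypothetical placeholders; the real stash format may differ.
stash = {"\x02raw:%d\x03" % i: "<div>block %d</div>" % i for i in range(3)}
# The placeholders appear in the document in a different order than in the stash.
doc = "intro \x02raw:0\x03 middle \x02raw:2\x03 \x02raw:1\x03 end"

print(replace_single_pass(doc, stash) == replace_per_placeholder(doc, stash))
```

Because the lookup is keyed on the matched text, the order of placeholders in the document does not have to match the order in the stash. One difference worth keeping in mind: a single pass does not re-scan inserted text, so a placeholder that sits inside a replacement string is left untouched.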

waylan · 2016-02-26T20:16:22Z

It would be interesting to see some benchmarks before and after this change. Mostly, I'm just curious how much of a difference it makes.

However, before it will be accepted, all the tests need to pass. Currently, 3 tests are failing related to syntax highlighting with the CodeHilite extension. That extension actually stores its output in the HtmlStash, so I'm assuming this change is causing the highlighted code blocks to not be swapped in for their placeholders properly. Unfortunately, due to inconsistencies in output from different versions of Pygments, our tests are a little strange (we use startswith to verify a Pygments HTML block was returned without actually checking the contents of the block) and the tests' error messages are less than helpful in identifying the problem.

However, it occurs to me that if your patch relies on the placeholders being in a specific order (both in the HtmlStash and in the document), then with the CodeHilite extension that order is not necessarily preserved between document and stash. Any such patch needs to support that possibility.

mitya57 · 2016-02-26T20:26:03Z

Is there some error in the HtmlOutput plugin? The traceback from Python 3.x log looks like an internal testsuite error in that plugin.

Griffon26 · 2016-02-26T20:32:26Z

I've done some informal benchmarking. The big1.md file is a little under 10k lines long; big2 is twice that file, big3 three times, and so on. The result is not entirely linear because the size of the regexp also counts as things get larger.

Before:
big1.md: 0.79s
big2.md: 2.51s
big3.md: 5.99s
big4.md: 11.32s
big5.md: 18.65s

After:
big1.md: 0.47s
big2.md: 1.00s
big3.md: 1.69s
big4.md: 2.51s
big5.md: 3.54s
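
As a rough reading aid, the growth relative to big1.md can be computed from the numbers above. The helper name `growth` is made up for this sketch; the timings are the ones listed.

```python
# Timings in seconds, copied from the lists above (big1.md .. big5.md).
before = [0.79, 2.51, 5.99, 11.32, 18.65]
after = [0.47, 1.00, 1.69, 2.51, 3.54]


def growth(timings):
    """Return each timing as a multiple of the timing for the smallest input."""
    return [round(t / timings[0], 1) for t in timings]


before_growth = growth(before)
after_growth = growth(after)

# Compare against ideal linear (n) and quadratic (n * n) growth.
for n, (b, a) in enumerate(zip(before_growth, after_growth), start=1):
    print("size x%d: before x%.1f, after x%.1f (linear x%d, quadratic x%d)"
          % (n, b, a, n, n * n))
```

For the 5x input this gives roughly x23.6 before (close to the quadratic x25) and roughly x7.5 after (closer to the linear x5).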

I don't know why the tests are failing. They are all successful on my machine with Python 2.7 and 3.4. I am unable to reproduce the failure seen for pypy/2.7, even manually, and the errors for 3.3/3.4 seem to come from inside nose.

Are you referring to the use of an OrderedDict when you say that the patch relies on a specific order? The reason for the OrderedDict is to make sure that the placeholder with <p> around it ends up before the same placeholder without it in the regexp, otherwise the latter would match first and the <p> tags would remain in the output.
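
The ordering concern can be illustrated in general terms. This is only a sketch: the placeholder format and the `abc`/`ab` keys are made up, and it does not reproduce the patch's actual pattern.

```python
import re

PLACEHOLDER = "\x02raw:0\x03"  # hypothetical placeholder format
WRAPPED = "<p>%s</p>" % PLACEHOLDER
HTML = "<div>raw block</div>"

# Sequential str.replace: handling the bare placeholder first strands the <p> tags.
wrapped_first = WRAPPED.replace(WRAPPED, HTML).replace(PLACEHOLDER, HTML)
bare_first = WRAPPED.replace(PLACEHOLDER, HTML).replace(WRAPPED, HTML)

# Regex alternation: at each position the alternatives are tried left to right,
# so listing the longer key first is the safe order when keys share a start.
longer_first = re.sub("abc|ab", "X", "abc")
shorter_first = re.sub("ab|abc", "X", "abc")

print(wrapped_first)   # <div>raw block</div>
print(bare_first)      # <p><div>raw block</div></p>
print(longer_first)    # X
print(shorter_first)   # Xc
```

Keeping the keys in an ordered mapping makes the order of the alternatives deterministic, which is what the OrderedDict provides here.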

waylan · 2016-02-26T23:23:24Z

> Is there some error in the HtmlOutput plugin? The traceback from Python 3.x log looks like an internal testsuite error in that plugin.

Wow, I missed that. Not sure what is going on there. Haven't seen that before.

Griffon26 · 2016-02-26T23:57:22Z

I added an assertStartsWith function to the tests, so it now shows exactly why the assertions fail (at least for 2.7 and pypy):

AssertionError: u'<div class="codehilite"><pre><span class="hll"' not found at the start of u'<div class="codehilite"><pre><span></span><span cla...'

There's an additional <span></span> fragment in the output. Any idea how that could happen at TravisCI but not on my or Dmitry's machines?
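
For readers following along, a helper of this kind could look roughly like the sketch below. The mixin name, the truncation length, and the exact message format are assumptions; the helper actually added to the test suite may differ.

```python
import unittest


class StartsWithMixin(object):
    """Hypothetical mixin; not the code from the pull request."""

    def assertStartsWith(self, expected_start, text):
        """Fail with a message that shows both the prefix and the actual text."""
        if not text.startswith(expected_start):
            # Show only a little more than the expected prefix to keep it short.
            limit = len(expected_start) + 5
            shown = text[:limit] + "..." if len(text) > limit else text
            self.fail("%r not found at the start of %r" % (expected_start, shown))


class Demo(StartsWithMixin, unittest.TestCase):
    def runTest(self):
        self.assertStartsWith("<div", '<div class="codehilite">')
```

Compared with `self.assertTrue(text.startswith(...))`, the failure message includes the beginning of the string that was actually produced.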

waylan · 2016-02-27T00:40:54Z

Hmm, I'd guess a different version of Pygments. Although, if you are running tox locally, then each test run should be using the most up-to-date versions. If that's not it, I don't know.

mitya57 · 2016-02-27T05:10:37Z

OK, it passed for me with Pygments 2.1 but now I upgraded to 2.1.1 and it fails. Looks like this commit: https://bitbucket.org/birkenfeld/pygments-main/commits/164574c13533.

This also happens without this pull request, so it's not a regression.

waylan · 2016-02-28T01:34:26Z

Groan. Yep, that would be it. Thanks for finding it @mitya57.

Griffon26 · 2016-02-28T19:53:28Z

I've checked why the failure caused by Pygments is not reported properly for Python 3 and I think it's this issue. Unfortunately that got closed with "upgrade to nose2".

I do have a work-around (similar to what they are doing here):

diff --git a/tests/plugins.py b/tests/plugins.py
index 90c5c0d..4e7af97 100644
--- a/tests/plugins.py
+++ b/tests/plugins.py
@@ -97,6 +97,8 @@ class HtmlOutput(Plugin):

     def formatErr(self, err):
         exctype, value, tb = err
+        if not isinstance(value, exctype):
+            value = exctype(value)
         return ''.join(traceback.format_exception(exctype, value, tb))

     def startContext(self, ctx):

If you'd like me to commit this, I'll add it to this pull request.

Is there anything else I can do?

mitya57 · 2016-03-03T10:29:45Z

@waylan: ping? I can take care of this pull request myself if you prefer (i.e. make sure the tests pass and merge it).

waylan · 2016-03-03T13:39:37Z

@mitya57 go for it. I've been a little short on time lately.

mitya57 · 2016-03-03T14:31:38Z

Merged. The tests pass everywhere according to Travis. Thanks @Griffon26 :)

Improve RawHtmlProcessor to have linear iso quadratic performance

429cc98

Griffon26 added 2 commits February 27, 2016 00:34

Added assertStartsWith to tests to give better failure messages

e56f0cd

The failure printed when self.assertTrue was used with str.startswith did not show the string that was being searched.

Added a few empty lines in the test to satisfy flake8

5966013

mitya57 merged commit 5966013 into Python-Markdown:master Mar 3, 2016

cgabard pushed a commit to zestedesavoir/Python-ZMarkdown that referenced this pull request Apr 25, 2016

Fix test for Python3

5b1b9b6

cf Python-Markdown/markdown#454 (comment)

waylan mentioned this pull request Jan 25, 2017

Placeholder in output for fenced code blocks nested in raw HTML blocks #458

Closed
