Griffon26
Contributor

I noticed a significant slowdown as my input documents became larger.
It turned out to be caused by the RawHtmlProcessor doing a full text replace for every placeholder.
My patch changes this code to construct a regular expression that matches all items to be replaced and then replace each with the appropriate replacement text, requiring only one pass over the input text.
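
A minimal sketch of the single-pass idea described above, not the actual patch: all keys are escaped, joined into one alternation, and a callback picks the replacement for whichever key matched. The `replace_all` helper and the `\x02stash:N\x03` placeholder strings are hypothetical names chosen for illustration.

```python
import re


def replace_all(text, replacements):
    """Replace every key of `replacements` found in `text` in one pass."""
    if not replacements:
        return text
    # Escape each key so it is matched literally, then build one alternation.
    pattern = re.compile("|".join(re.escape(key) for key in replacements))
    # The callback looks up the replacement for whichever key matched.
    return pattern.sub(lambda match: replacements[match.group(0)], text)


# Hypothetical placeholders standing in for stashed raw HTML.
stash = {
    "\x02stash:0\x03": "<div>raw block</div>",
    "\x02stash:1\x03": "<span>inline</span>",
}
text = "before \x02stash:0\x03 middle \x02stash:1\x03 after"
result = replace_all(text, stash)
print(result)
```

Compared with calling `str.replace` once per placeholder, the input text is scanned once, at the cost of a regular expression that grows with the number of placeholders.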

@waylan
Member

waylan commented Feb 26, 2016

It would be interesting to see some benchmarks before and after this change. Mostly, I'm just curious how much of a difference it makes.

However, before it will be accepted, all the tests need to pass. Currently, 3 tests are failing related to syntax highlighting with the CodeHilite extension. That extension actually stores its output in the HtmlStash, so I'm assuming this change is causing the highlighted code blocks not to be swapped in for their placeholders properly. Unfortunately, due to inconsistencies in output from different versions of Pygments, our tests are a little strange (we use startswith to verify a Pygments HTML block was returned without actually checking the contents of the block) and the tests' error messages are less than helpful in identifying the problem.

However, it occurs to me that if your patch is relying on the placeholders being in a specific order (both in the HTMLStash and in the document), with the CodeHilite extension that order is not necessarily preserved between document and stash. Any such patch needs to support that possibility.

@mitya57
Collaborator

mitya57 commented Feb 26, 2016

Is there some error in the HtmlOutput plugin? The traceback from Python 3.x log looks like an internal testsuite error in that plugin.

@Griffon26
Contributor Author

I've done some informal benchmarking. The big1.md file is a little under 10k lines long; big2 is twice that size, big3 three times, and so on. The scaling is not entirely linear because the size of the regexp also starts to count as things get larger.

Before:
big1.md: 0.79s
big2.md: 2.51s
big3.md: 5.99s
big4.md: 11.32s
big5.md: 18.65s

After:
big1.md: 0.47s
big2.md: 1.00s
big3.md: 1.69s
big4.md: 2.51s
big5.md: 3.54s
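
For readers who want the ratios, here is a small sketch that turns the timings listed above into speedup factors. The numbers are copied from this comment; the variable names are only illustrative.

```python
# Timings in seconds, copied from the informal benchmark above.
before = {"big1.md": 0.79, "big2.md": 2.51, "big3.md": 5.99,
          "big4.md": 11.32, "big5.md": 18.65}
after = {"big1.md": 0.47, "big2.md": 1.00, "big3.md": 1.69,
         "big4.md": 2.51, "big5.md": 3.54}

# Speedup factor per file: old time divided by new time.
speedups = {name: before[name] / after[name] for name in before}

for name in sorted(speedups):
    print("%s: %.2fx faster" % (name, speedups[name]))
```

The factor grows with input size, which is consistent with the old code doing one full pass per placeholder.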

I don't know why the tests are failing. They are all successful on my machine with Python 2.7 and 3.4. I am unable to reproduce the failure seen for pypy/2.7, even manually, and the errors for 3.3/3.4 seem to come from inside nose.

Are you referring to the use of an OrderedDict when you say that the patch relies on a specific order? The reason for the OrderedDict is to make sure that the placeholder with <p> around it ends up before the same placeholder without it in the regexp, otherwise the latter would match first and the <p> tags would remain in the output.
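
A sketch of the general regex behaviour behind this ordering concern: Python's `re` tries the alternatives left to right at each position and takes the first one that matches, so order matters whenever two keys can match at the same starting position. The keys `KEY1` and `KEY10` are hypothetical and are not the patch's actual placeholders; `replace_all` is an illustrative helper.

```python
import re
from collections import OrderedDict


def replace_all(text, replacements):
    """One-pass replacement; alternatives follow the dict's iteration order."""
    pattern = re.compile("|".join(re.escape(key) for key in replacements))
    return pattern.sub(lambda match: replacements[match.group(0)], text)


# Hypothetical keys where one is a prefix of the other.
longer_first = OrderedDict([("KEY10", "<ten>"), ("KEY1", "<one>")])
shorter_first = OrderedDict([("KEY1", "<one>"), ("KEY10", "<ten>")])

# With the longer key first, the whole of "KEY10" is replaced.
print(replace_all("KEY10", longer_first))
# With the shorter key first, "KEY1" matches and a stray "0" is left behind.
print(replace_all("KEY10", shorter_first))
```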

@waylan
Member

waylan commented Feb 26, 2016

> Is there some error in the HtmlOutput plugin? The traceback from Python 3.x log looks like an internal testsuite error in that plugin.

Wow, I missed that. Not sure what is going on there. Haven't seen that before.

The failure printed when self.assertTrue was used with str.startswith
did not show the string that was being searched.
@Griffon26
Contributor Author

I added an assertStartsWith function to the tests, so it now shows exactly why the assertions fail (at least for 2.7 and pypy):

AssertionError: u'<div class="codehilite"><pre><span class="hll"' not found at the start of u'<div class="codehilite"><pre><span></span><span cla...'

There's an additional <span></span> fragment in the output. Any idea how that could happen at TravisCI but not on my or Dmitry's machines?
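
A minimal sketch of what such a helper could look like, assuming a plain function rather than the test-suite method that was actually committed. The name `assert_starts_with`, the `context` parameter, and the sample `actual` string are made up for illustration.

```python
def assert_starts_with(expected_start, text, context=5):
    """Fail with a message that shows both strings if `text` does not
    begin with `expected_start`."""
    if not text.startswith(expected_start):
        # Show just enough of the actual text to see where it diverges.
        shown = text[:len(expected_start) + context]
        if len(text) > len(shown):
            shown += "..."
        raise AssertionError(
            "%r not found at the start of %r" % (expected_start, shown)
        )


expected = '<div class="codehilite"><pre><span class="hll"'
# Hypothetical output with an extra <span></span> before the highlighted line.
actual = ('<div class="codehilite"><pre><span></span>'
          '<span class="hll">code</span></pre></div>')

try:
    assert_starts_with(expected, actual)
    message = None
except AssertionError as exc:
    message = str(exc)
print(message)
```

Unlike `assertTrue(text.startswith(...))`, the failure message includes the start of the string that was searched, which is what made the extra fragment visible.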

@waylan
Member

waylan commented Feb 27, 2016

Hmm, I'd guess a different version of Pygments. Although, if you are running tox locally, then each test run should be using the most up-to-date versions. If that's not it, I don't know.

@mitya57
Collaborator

mitya57 commented Feb 27, 2016

OK, it passed for me with Pygments 2.1 but now I upgraded to 2.1.1 and it fails. Looks like this commit: https://bitbucket.org/birkenfeld/pygments-main/commits/164574c13533.

This also happens without this pull request, so it's not a regression.

@waylan
Member

waylan commented Feb 28, 2016

Groan. Yep, that would be it. Thanks for finding it @mitya57.

@Griffon26
Contributor Author

I've checked why the failure caused by pygments is not reported properly for python3 and I think it's this issue. Unfortunately that got closed with "upgrade to nose2".

I do have a work-around (similar to what they are doing here):

diff --git a/tests/plugins.py b/tests/plugins.py
index 90c5c0d..4e7af97 100644
--- a/tests/plugins.py
+++ b/tests/plugins.py
@@ -97,6 +97,8 @@ class HtmlOutput(Plugin):

     def formatErr(self, err):
         exctype, value, tb = err
+        if not isinstance(value, exctype):
+            value = exctype(value)
         return ''.join(traceback.format_exception(exctype, value, tb))

     def startContext(self, ctx):

If you'd like me to commit this, I'll add it to this pull request.

Is there anything else I can do?
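
The workaround in the diff above can be tried in isolation with a sketch like the following. `format_err` here is a standalone function that mirrors the patched method, not the nose plugin itself, and the `(ValueError, "boom", None)` tuple is a made-up example of an error whose value arrives as a plain string.

```python
import traceback


def format_err(err):
    """Format an (exctype, value, tb) tuple, tolerating a non-exception value."""
    exctype, value, tb = err
    # If the value is not an exception instance (for example a plain
    # string), wrap it so traceback can format it.
    if not isinstance(value, exctype):
        value = exctype(value)
    return "".join(traceback.format_exception(exctype, value, tb))


# Hypothetical error tuple whose value is a string, not an exception.
text = format_err((ValueError, "boom", None))
print(text)
```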

@mitya57
Collaborator

mitya57 commented Mar 3, 2016

@waylan: ping? I can take care of this pull request myself if you prefer (i.e. make sure the tests pass and merge it).

@waylan
Member

waylan commented Mar 3, 2016

@mitya57 go for it. I've been a little short on time lately.

@mitya57 mitya57 merged commit 5966013 into Python-Markdown:master Mar 3, 2016
@mitya57
Collaborator

mitya57 commented Mar 3, 2016

Merged. The tests pass everywhere according to Travis. Thanks @Griffon26 :)

cgabard pushed a commit to zestedesavoir/Python-ZMarkdown that referenced this pull request Apr 25, 2016