Fix infinite loop #430 #432

facelessuser · 2015-09-05T01:46:12Z

This should fix the remaining corner cases that can cause infinite
loops. Previous iterations did not account for scenarios where the
“end” index was less than the “start” index. If the “end” index is
ever less than or equal to the “start” index, the “end” will be
adjusted to to be “start” + 1 allow the full range to be extracted and
replaced.

This should fix the remaining corner cases that can cause infinite loops. Previous iterations did not account for scenarios where the “end” index was less than the “start” index. If the “end” index is ever less than or equal to the “start” index, the “end” will be adjusted to to be “start” + 1 allow the full range to be extracted and replaced.

ryneeverett · 2015-09-05T02:24:16Z

Nice.

ryneeverett · 2015-09-05T03:15:22Z

Another option would be:

                 else:  # raw html
-                    if len(items) - right_listindex <= 1:  # last element
+                    if len(items) - right_listindex <= 1 and right_listindex > i:  # last element
                         right_listindex -= 1

facelessuser · 2015-09-05T03:25:04Z

That doesn't look like a similar solution.

facelessuser · 2015-09-05T03:29:11Z

Let me rephrase; that doesn't look equivalent. I would have to break down all the cases to see if they were equivalent for sure.

ryneeverett · 2015-09-05T03:32:01Z

I believe the only way the right_listindex can be less than i is if it is decremented here. Regardless, I suppose I like your solution better since it essentially reverts 5d91369. 😛

facelessuser · 2015-09-05T03:36:26Z

It's not really a revert at all. If anything, it is doubling down on the same concept. It is actually just increasing the range that 5d91369 was normalizing without the extra variable.

facelessuser · 2015-09-05T03:52:48Z

@ryneeverett You are probably right about the decrement being done improperly. I would need to verify that the issue is because of the decrement, and test out your suggestion to verify that it solves all past problems as well and doesn't cause a regression. If you have evidence that supports it being equivalent, I wouldn't mind moving to it. I just don't have much time right now to look into this further than what I already have.

ryneeverett · 2015-09-05T03:53:10Z

It's not really a revert at all.

Oh, I see.

I'm starting to think it just doesn't make much sense any more to assign right_listindex prior to the if 'markdown' in attrs.keys():/else blocks since it needs to be so different.

ryneeverett · 2015-09-05T04:01:24Z

@facelessuser I believe it's equivalent but not better because it requires the confusing offset be left in. I'm +1 on this PR as is.

ryneeverett · 2015-09-05T04:21:43Z

Ok, here's a suggested diff on this PR:

                 else:  # raw html
-                    if len(items) - right_listindex <= 1:  # last element
+                    if len(items) == right_listindex:  # last element
                         right_listindex -= 1
-                    if right_listindex <= i:
+                    if right_listindex == i:
                         right_listindex = i + 1

This avoids the unnecessary decrement, not to mention making the intent a lot clearer.

facelessuser · 2015-09-05T04:30:43Z

Unfortunately, that isn't going to work like you think. The following code:

if len(items) == right_listindex:  # last element
    right_listindex -= 1

This will augment an index range of (2, 3) for an array of length 3 and make it (2,2):

>>> [1, 2, 3][2:3]
[3]

>>> [1, 2, 3][2:2]
[]

So now we have a regression in the content extraction. HTML tags will go MIA.

facelessuser · 2015-09-05T04:39:27Z

I should add that while your suggestion would actually fix my example above because:

if right_listindex == i:
    right_listindex = i + 1

would make the adjustment to normalize the end, it would not fix the following example where i is not the last index.

>>> [1, 2, 3][1:3]
[2, 3]

>>> [1, 2, 3][1:2]
[2]

Hope that is a little more clear.

facelessuser · 2015-09-05T17:45:08Z

@ryneeverett I revisited the code this morning. To be honest, the whole conditional decrementing logic is odd. It appears the right_listindex is always calculated bigger than needed, but there is a quirk where it becomes too small. The logic below is actually a cleaner way to look at the whole index adjustments and works just as well.

                 else:  # raw html
-                    if len(items) - right_listindex <= 1:  # last element
-                        right_listindex -= 1
-                    if right_listindex <= i:
+                    if right_listindex < (i + 1):
                         right_listindex = i + 1
+                    elif right_listindex > (i + 1):
+                        right_listindex -= 1
                     placeholder = self.markdown.htmlStash.store('\n\n'.join(
                         items[i:right_listindex]))
                     del items[i:right_listindex]

So you see, it appears that right_listindex is always bigger than needed, but right_listindex should never be less that the start i + 1 or you will not acquire the content during the slicing of the array. It's just a quirk of slicing vs indexing and how right_listindex is calculated. I think the above code might be a bit more clear.

What I suspect is that _stringindex_to_listindex isn't quite calculating indexes correct and we are basically correcting for the quirks of that method. I imagine everything under this is also bad, but it appears it never gets exercised either which is why we haven't had an issue with it:

                if 'markdown' in attrs.keys():
                    items[i] = items[i][left_index:]  # remove opening tag
                    placeholder = self.markdown.htmlStash.store_tag(
                        left_tag, attrs, i + 1, right_listindex + 1)
                    items.insert(i, placeholder)
                    if len(items) - right_listindex <= 1:  # last nest, no tail
                        right_listindex -= 1
                    items[right_listindex] = items[right_listindex][
                        :-len(right_tag) - 2]  # remove closing tag

Maybe the real work needs to be done here:

    def _stringindex_to_listindex(self, stringindex, items):
        """
        Same effect as concatenating the strings in items,
        finding the character to which stringindex refers in that string,
        and returning the index of the item in which that character resides.
        """
        items.append('dummy')
        i, count = 0, 0
        while count <= stringindex:
            count += len(items[i])
            i += 1
        return i - 1

waylan · 2015-09-05T17:58:15Z

I imagine everything under this is also bad, but it appears it never gets exercised either which is why we haven't had an issue with it

The coverage report definitely shows that some parts of this code never get run in the tests. It would be nice to clean that up. If a few of the code branches will never get executed perhaps they could be removed. Or if there are edge cases which they need to be there for, then we should add tests to account for them so we don't break things again later.

In any event, the provided PR fixes the known issue, so I'm going to accept it. Thank you @facelessuser for the work and @ryneeverett for reviewing it. If either of you want to clean up this code some more, I'd be happy to accept that as well.

Fix infinite loop #430

ryneeverett · 2015-09-05T21:38:12Z

It appears the right_listindex is always calculated bigger than needed, but there is a quirk where it becomes too small.

I think it was a lot more clear what right_listindex was supposed to represent before you started improving this section of the code. (Don't get me wrong, I appreciate it.) Before your first modification we were always ending the slice with right_listindex + 1 because a slice does not include the last element and right_listindex was supposed to be the final element. This fact was kind of lost when we started trying to accommodate more edge cases.

But yes, "a quirk where it becomes too small" is at the bottom of these infinite regressions.

it would not fix the following example where i is not the last index.

[1, 2, 3][1:3]
[2, 3]

[1, 2, 3][1:2]
[2]

Hope that is a little more clear.

I was able to follow your examples and reasoning, but I believe this behavior would be correct in those circumstances. Can you give a concrete markdown example of when this would be undesirable? There is no such example in the test suite.

The coverage report definitely shows that some parts of this code never get run in the tests. It would be nice to clean that up.

Agreed. I'll try to have a look at that some time.

facelessuser · 2015-09-05T22:08:50Z

I was able to follow your examples and reasoning, but I believe this behavior would be correct in those circumstances. Can you give a concrete markdown example of when this would be undesirable? There is no such example in the test suite.

I believe my current pull request will hang with your changes. I think I even tried it to confirm. We we would still have an infinite loop, as it didn't acquire the full extent of the tag.

Basically as I mentioned earlier we need this current logic as shown below or an equivalent (like in this pull request) that achieves the same end result.

                    if right_listindex < (i + 1):
                         right_listindex = i + 1
                    elif right_listindex > (i + 1):
                        right_listindex -= 1

What you suggested is not the same, and in some circumstances will still cause infinite loops as it will insert content over and over before the end without fully removing the end. That is my experience when looking into this issue.

ryneeverett · 2015-09-05T22:16:46Z

I believe my current pull request will hang with your changes. I think I even tried it to confirm. We we would still have an infinite loop, as it didn't acquire the full extent of the tag.

The tests run fine for me.

facelessuser · 2015-09-05T22:25:40Z

Okay, it didn't hang but the results were very different for me:

--- /Users/facelessuser/Desktop/Python-Markdown/tests/extensions/extra/raw-html.html
+++ actual_output.html
@@ -10,23 +10,29 @@
 <div name="Example">
 <p>The text of the <code>Example</code> element.</p>
 <div name="DefaultBlockMode">
-<p>This text gets wrapped in <code>p</code> tags.</p>
+<p>This text gets wrapped in <code>p</code> tags.
+</div></p>
+<p>The tail of the <code>DefaultBlockMode</code> subel</p>
 </div>
-<p>The tail of the <code>DefaultBlockMode</code> subelement.</p>
 <p name="DefaultSpanMode">
-This text <em>is not</em> wrapped in additional <code>p</code> tags.</p>
-<p>The tail of the <code>DefaultSpanMode</code> subelement.</p>
+This text <em>is not</em> wrapped in additional <code>p</code> tags.
+</p>
+The tail of the <code>DefaultSpanMode</code> subelem</p>
 <div name="SpanModeOverride">
 This <code>div</code> block is not wrapped in paragraph tags.
-Note: Subelements are not required to have tail text.</div>
+Note: Subelements are not required to have tail text.
+</div>
+hzzhzkh:7</div>
 <p name="BlockModeOverride">
-<p>This <code>p</code> block <em>is</em> foolishly wrapped in further paragraph tags.</p>
-</p>
+<p>This <code>p</code> block <em>is</em> foolishly wrapped in further paragraph tags</p>
 <p>The tail of the <code>BlockModeOverride</code> subelement.</p>
 <div name="RawHtml">
 Raw html blocks may also be nested.
 </div>

+
+
+</p>
 </div>
 <p>This text is after the markdown in html.</p>
 <div name="issue308">
@@ -39,7 +45,8 @@
 Raw html blocks may also be nested.
 </div>

-<p>Markdown is <em>still</em> active here.</p>
+Markdown is *still* active here.
+
 </div>
 <p>Markdown is <em>active again</em> here.</p>
 <div>

Are you doing something different than just replacing these lines?

                 else:  # raw html
-                    if len(items) - right_listindex <= 1:  # last element
+                    if len(items) == right_listindex:  # last element
                         right_listindex -= 1
-                    if right_listindex <= i:
+                    if right_listindex == i:
                         right_listindex = i + 1

Maybe I'm missing something, but this looks very wrong...

ryneeverett · 2015-09-05T22:39:28Z

Strange, that's exactly what I'm doing and I get no errors.

facelessuser · 2015-09-05T22:52:19Z

Never mind, you're right, I had one other experimental change that I hadn't removed. Your changes do pass.

Interesting. Maybe you're right and my understanding is off. I think we need to stress this portion of the code more to be sure.

facelessuser · 2015-09-06T03:12:57Z

Interestingly, all you need to pass all tests is:

                else:  # raw html
                    right_listindex = i + 1
                    placeholder = self.markdown.htmlStash.store('\n\n'.join(
                        items[i:right_listindex]))
                    del items[i:right_listindex]
                    items.insert(i, placeholder)

What this says is one of two things, 1: all this extra complexity isn't needed, or 2: we are not testing things nearly well enough. Food for thought...I'm not sure which it is.

waylan added a commit that referenced this pull request Sep 5, 2015

Merge pull request #432 from facelessuser/master

f3a68fa

Fix infinite loop #430

waylan merged commit f3a68fa into Python-Markdown:master Sep 5, 2015

waylan mentioned this pull request Sep 5, 2015

Infinite loop when beginning a paragraph with a HTML tag within a markdown="1" tag #430

Closed

Fix infinite loop #430 #432

Fix infinite loop #430 #432

Uh oh!

Conversation

facelessuser commented Sep 5, 2015

Uh oh!

ryneeverett commented Sep 5, 2015

Uh oh!

ryneeverett commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 5, 2015

Uh oh!

ryneeverett commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 5, 2015

Uh oh!

ryneeverett commented Sep 5, 2015

Uh oh!

ryneeverett commented Sep 5, 2015

Uh oh!

ryneeverett commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 5, 2015

Uh oh!

waylan commented Sep 5, 2015

Uh oh!

ryneeverett commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 5, 2015

Uh oh!

ryneeverett commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 5, 2015

Uh oh!

ryneeverett commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 5, 2015

Uh oh!

facelessuser commented Sep 6, 2015

Uh oh!

Uh oh!