Skip to content

Conversation

facelessuser
Copy link
Collaborator

This should fix the remaining corner cases that can cause infinite
loops. Previous iterations did not account for scenarios where the
“end” index was less than the “start” index. If the “end” index is
ever less than or equal to the “start” index, the “end” will be
adjusted to to be “start” + 1 allow the full range to be extracted and
replaced.

This should fix the remaining corner cases that can cause infinite
loops.  Previous iterations did not account for scenarios where the
“end” index was less than the “start” index.  If the “end” index is
ever less than or equal to the “start” index, the “end” will be
adjusted to to be “start” + 1 allow the full range to be extracted and
replaced.
@ryneeverett
Copy link
Contributor

Nice.

@ryneeverett
Copy link
Contributor

Another option would be:

                 else:  # raw html
-                    if len(items) - right_listindex <= 1:  # last element
+                    if len(items) - right_listindex <= 1 and right_listindex > i:  # last element
                         right_listindex -= 1

@facelessuser
Copy link
Collaborator Author

That doesn't look like a similar solution.

@facelessuser
Copy link
Collaborator Author

Let me rephrase; that doesn't look equivalent. I would have to break down all the cases to see if they were equivalent for sure.

@ryneeverett
Copy link
Contributor

I believe the only way the right_listindex can be less than i is if it is decremented here. Regardless, I suppose I like your solution better since it essentially reverts 5d91369. 😛

@facelessuser
Copy link
Collaborator Author

It's not really a revert at all. If anything, it is doubling down on the same concept. It is actually just increasing the range that 5d91369 was normalizing without the extra variable.

@facelessuser
Copy link
Collaborator Author

@ryneeverett You are probably right about the decrement being done improperly. I would need to verify that the issue is because of the decrement, and test out your suggestion to verify that it solves all past problems as well and doesn't cause a regression. If you have evidence that supports it being equivalent, I wouldn't mind moving to it. I just don't have much time right now to look into this further than what I already have.

@ryneeverett
Copy link
Contributor

It's not really a revert at all.

Oh, I see.

I'm starting to think it just doesn't make much sense any more to assign right_listindex prior to the if 'markdown' in attrs.keys():/else blocks since it needs to be so different.

@ryneeverett
Copy link
Contributor

@facelessuser I believe it's equivalent but not better because it requires the confusing offset be left in. I'm +1 on this PR as is.

@ryneeverett
Copy link
Contributor

Ok, here's a suggested diff on this PR:

                 else:  # raw html
-                    if len(items) - right_listindex <= 1:  # last element
+                    if len(items) == right_listindex:  # last element
                         right_listindex -= 1
-                    if right_listindex <= i:
+                    if right_listindex == i:
                         right_listindex = i + 1

This avoids the unnecessary decrement, not to mention making the intent a lot clearer.

@facelessuser
Copy link
Collaborator Author

Unfortunately, that isn't going to work like you think. The following code:

if len(items) == right_listindex:  # last element
    right_listindex -= 1

This will augment an index range of (2, 3) for an array of length 3 and make it (2,2):

>>> [1, 2, 3][2:3]
[3]

>>> [1, 2, 3][2:2]
[]

So now we have a regression in the content extraction. HTML tags will go MIA.

@facelessuser
Copy link
Collaborator Author

I should add that while your suggestion would actually fix my example above because:

if right_listindex == i:
    right_listindex = i + 1

would make the adjustment to normalize the end, it would not fix the following example where i is not the last index.

>>> [1, 2, 3][1:3]
[2, 3]

>>> [1, 2, 3][1:2]
[2]

Hope that is a little more clear.

@facelessuser
Copy link
Collaborator Author

@ryneeverett I revisited the code this morning. To be honest, the whole conditional decrementing logic is odd. It appears the right_listindex is always calculated bigger than needed, but there is a quirk where it becomes too small. The logic below is actually a cleaner way to look at the whole index adjustments and works just as well.

                 else:  # raw html
-                    if len(items) - right_listindex <= 1:  # last element
-                        right_listindex -= 1
-                    if right_listindex <= i:
+                    if right_listindex < (i + 1):
                         right_listindex = i + 1
+                    elif right_listindex > (i + 1):
+                        right_listindex -= 1
                     placeholder = self.markdown.htmlStash.store('\n\n'.join(
                         items[i:right_listindex]))
                     del items[i:right_listindex]

So you see, it appears that right_listindex is always bigger than needed, but right_listindex should never be less that the start i + 1 or you will not acquire the content during the slicing of the array. It's just a quirk of slicing vs indexing and how right_listindex is calculated. I think the above code might be a bit more clear.

What I suspect is that _stringindex_to_listindex isn't quite calculating indexes correct and we are basically correcting for the quirks of that method. I imagine everything under this is also bad, but it appears it never gets exercised either which is why we haven't had an issue with it:

                if 'markdown' in attrs.keys():
                    items[i] = items[i][left_index:]  # remove opening tag
                    placeholder = self.markdown.htmlStash.store_tag(
                        left_tag, attrs, i + 1, right_listindex + 1)
                    items.insert(i, placeholder)
                    if len(items) - right_listindex <= 1:  # last nest, no tail
                        right_listindex -= 1
                    items[right_listindex] = items[right_listindex][
                        :-len(right_tag) - 2]  # remove closing tag

Maybe the real work needs to be done here:

    def _stringindex_to_listindex(self, stringindex, items):
        """
        Same effect as concatenating the strings in items,
        finding the character to which stringindex refers in that string,
        and returning the index of the item in which that character resides.
        """
        items.append('dummy')
        i, count = 0, 0
        while count <= stringindex:
            count += len(items[i])
            i += 1
        return i - 1

@waylan
Copy link
Member

waylan commented Sep 5, 2015

I imagine everything under this is also bad, but it appears it never gets exercised either which is why we haven't had an issue with it

The coverage report definitely shows that some parts of this code never get run in the tests. It would be nice to clean that up. If a few of the code branches will never get executed perhaps they could be removed. Or if there are edge cases which they need to be there for, then we should add tests to account for them so we don't break things again later.

In any event, the provided PR fixes the known issue, so I'm going to accept it. Thank you @facelessuser for the work and @ryneeverett for reviewing it. If either of you want to clean up this code some more, I'd be happy to accept that as well.

waylan added a commit that referenced this pull request Sep 5, 2015
@waylan waylan merged commit f3a68fa into Python-Markdown:master Sep 5, 2015
@ryneeverett
Copy link
Contributor

It appears the right_listindex is always calculated bigger than needed, but there is a quirk where it becomes too small.

I think it was a lot more clear what right_listindex was supposed to represent before you started improving this section of the code. (Don't get me wrong, I appreciate it.) Before your first modification we were always ending the slice with right_listindex + 1 because a slice does not include the last element and right_listindex was supposed to be the final element. This fact was kind of lost when we started trying to accommodate more edge cases.

But yes, "a quirk where it becomes too small" is at the bottom of these infinite regressions.

it would not fix the following example where i is not the last index.

[1, 2, 3][1:3]
[2, 3]

[1, 2, 3][1:2]
[2]

Hope that is a little more clear.

I was able to follow your examples and reasoning, but I believe this behavior would be correct in those circumstances. Can you give a concrete markdown example of when this would be undesirable? There is no such example in the test suite.

The coverage report definitely shows that some parts of this code never get run in the tests. It would be nice to clean that up.

Agreed. I'll try to have a look at that some time.

@facelessuser
Copy link
Collaborator Author

I was able to follow your examples and reasoning, but I believe this behavior would be correct in those circumstances. Can you give a concrete markdown example of when this would be undesirable? There is no such example in the test suite.

I believe my current pull request will hang with your changes. I think I even tried it to confirm. We we would still have an infinite loop, as it didn't acquire the full extent of the tag.

Basically as I mentioned earlier we need this current logic as shown below or an equivalent (like in this pull request) that achieves the same end result.

                    if right_listindex < (i + 1):
                         right_listindex = i + 1
                    elif right_listindex > (i + 1):
                        right_listindex -= 1

What you suggested is not the same, and in some circumstances will still cause infinite loops as it will insert content over and over before the end without fully removing the end. That is my experience when looking into this issue.

@ryneeverett
Copy link
Contributor

I believe my current pull request will hang with your changes. I think I even tried it to confirm. We we would still have an infinite loop, as it didn't acquire the full extent of the tag.

The tests run fine for me.

@facelessuser
Copy link
Collaborator Author

Okay, it didn't hang but the results were very different for me:

--- /Users/facelessuser/Desktop/Python-Markdown/tests/extensions/extra/raw-html.html
+++ actual_output.html
@@ -10,23 +10,29 @@
 <div name="Example">
 <p>The text of the <code>Example</code> element.</p>
 <div name="DefaultBlockMode">
-<p>This text gets wrapped in <code>p</code> tags.</p>
+<p>This text gets wrapped in <code>p</code> tags.
+</div></p>
+<p>The tail of the <code>DefaultBlockMode</code> subel</p>
 </div>
-<p>The tail of the <code>DefaultBlockMode</code> subelement.</p>
 <p name="DefaultSpanMode">
-This text <em>is not</em> wrapped in additional <code>p</code> tags.</p>
-<p>The tail of the <code>DefaultSpanMode</code> subelement.</p>
+This text <em>is not</em> wrapped in additional <code>p</code> tags.
+</p>
+The tail of the <code>DefaultSpanMode</code> subelem</p>
 <div name="SpanModeOverride">
 This <code>div</code> block is not wrapped in paragraph tags.
-Note: Subelements are not required to have tail text.</div>
+Note: Subelements are not required to have tail text.
+</div>
+hzzhzkh:7</div>
 <p name="BlockModeOverride">
-<p>This <code>p</code> block <em>is</em> foolishly wrapped in further paragraph tags.</p>
-</p>
+<p>This <code>p</code> block <em>is</em> foolishly wrapped in further paragraph tags</p>
 <p>The tail of the <code>BlockModeOverride</code> subelement.</p>
 <div name="RawHtml">
 Raw html blocks may also be nested.
 </div>

+
+
+</p>
 </div>
 <p>This text is after the markdown in html.</p>
 <div name="issue308">
@@ -39,7 +45,8 @@
 Raw html blocks may also be nested.
 </div>

-<p>Markdown is <em>still</em> active here.</p>
+Markdown is *still* active here.
+
 </div>
 <p>Markdown is <em>active again</em> here.</p>
 <div>

Are you doing something different than just replacing these lines?

                 else:  # raw html
-                    if len(items) - right_listindex <= 1:  # last element
+                    if len(items) == right_listindex:  # last element
                         right_listindex -= 1
-                    if right_listindex <= i:
+                    if right_listindex == i:
                         right_listindex = i + 1

Maybe I'm missing something, but this looks very wrong...

@ryneeverett
Copy link
Contributor

Strange, that's exactly what I'm doing and I get no errors.

@facelessuser
Copy link
Collaborator Author

Never mind, you're right, I had one other experimental change that I hadn't removed. Your changes do pass.

Interesting. Maybe you're right and my understanding is off. I think we need to stress this portion of the code more to be sure.

@facelessuser
Copy link
Collaborator Author

Interestingly, all you need to pass all tests is:

                else:  # raw html
                    right_listindex = i + 1
                    placeholder = self.markdown.htmlStash.store('\n\n'.join(
                        items[i:right_listindex]))
                    del items[i:right_listindex]
                    items.insert(i, placeholder)

What this says is one of two things, 1: all this extra complexity isn't needed, or 2: we are not testing things nearly well enough. Food for thought...I'm not sure which it is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants