Max nesting limit #243

mareeo · 2016-04-20T05:15:37Z

Some Markdown parsers include nesting limits which causes elements past a certain depth to be omitted. This is useful allowing user input and wanting to put certain restrictions on it. Thoughts on adding this as an configuration parameter?

colinodell · 2016-05-06T20:35:28Z

This could either be implemented in core or as an external Document Processor which users can include as needed.

Would you mind sharing why this particular feature would be useful for you? In other words, what's the major use case it solves?

mareeo · 2016-05-06T22:37:19Z

Mainly just useful to prevent abuse via user input. For example, Reddit comments allow up to 10000 characters. If nearly all 10,000 of these characters were '>', then you would end up with incredibly deeply nested quotes and a large AST that could take several seconds to render to HTML, assuming it doesn't run out of memory first.

I don't think a custom Document Processor would prevent this abuse, since the issue is related to parsing and creation of a massive AST. Snudown considers this limit while parsing.

cebe · 2016-05-07T17:58:08Z

@colinodell here is a little demo about the problem:

Here is the gist: https://gist.github.com/cebe/615bef8086c9a11d81f568284c18fec5
Run it:

git clone https://gist.github.com/cebe/615bef8086c9a11d81f568284c18fec5 demo
cd demo
composer install
php demo.php

This should result in a segmentation fault of PHP.

cebe · 2016-05-07T17:59:25Z

Here is how I implemented this in my markdown parser: https://github.com/cebe/markdown/blob/e2a490ceec590bf5bfd1b43bd424fb9dceceb7c5/Parser.php#L157-L160

colinodell · 2016-05-07T19:00:40Z

Ah okay, that makes sense. Thanks for helping to clarify that.

The parsing and AST traversing is done without using any type of recursion to avoid segfaults. The rendering process doesn't use this same approach though. We could potentially change that to avoid segfaults, but we'd still the memory limit that @mareeo mentioned.

I think adding this to the core is a great idea, and I like the approach you're using @cebe - if the limit is reached, simply encode the remaining Markdown as text.

colinodell · 2016-05-07T22:47:28Z

I've got the nesting limits working for blocks - those were easy.

Limiting the inlines (while adhering to the CommonMark spec) is proving to be quite difficult though. We essentially can't reliably know how deep the inlines will be until after they're processed (because of how delimiter runs need to be processed according to the spec), and by then it's too late.

I found one possible approach that almost works, but the output isn't even close to what you'd expect, doesn't fit with the spec, and it severely reduces performance. IMO a feature to ensure performance in 0.01% of cases shouldn't negatively impact performance for the other 99.99%. And even if it did, it would violate our top priority of maintaining spec compliance.

I think we ultimately have four options:

Do nothing.
Implement nesting limits for block parsing only (segfaults and memory exhaustion less likely for blocks)
Implement nesting limits for the AST after parsing but before rendering (segaults and memory exhaustion less likely for both)
Modify the rendering process to not be recursive (segfaults much less likely for both; no impact to memory exhaustion)

Option 4 would be nice, but I simply don't have the time to implement a major refactoring of that size right now.

Option 3 seems like the next-best choice IMO.

What are your thoughts?

mareeo · 2016-05-15T05:32:32Z

I don't think option 3 solves this problem, since if the issues are going to happen, they're likely to happen during parsing. Options 2 is what I've done in the meantime, but I don't think it's very good (use this configuration options to prevent issues...sometimes!).

From what you've said and the testing I've done, it just doesn't seem like limiting inline element depth while parsing is going to be possible while adhering to the CommonMark spec. Personally, I like the idea that if you set a max nesting limit and it is reached, then there's no guarantee the element containing the deep nesting will be according to spec. Garbage in, garbage out.

But this may conflict with the '100% compliance' goal of this project. A simple warning that if a nesting limit is used and reached that 100% compliance is no longer guaranteed would be fine for me. This lets the user decide between safety and compliance. Other Markdown parsers just automatically make this choice for you.

Implement new max_nesting_level setting (#243)

colinodell added the enhancement New functionality or behavior label May 6, 2016

colinodell added feedback wanted We need your input! performance Something could be made faster or more efficient labels May 7, 2016

nitely mentioned this issue Jul 24, 2016

RecursionError on nested quotes lepture/mistune#114

Closed

colinodell added a commit that referenced this issue Dec 30, 2017

Implement new max_nesting_level setting (#243)

57107ed

colinodell added a commit that referenced this issue Dec 30, 2017

Implement new max_nesting_level setting (#243)

ef3c3f7

colinodell mentioned this issue Dec 30, 2017

Implement new max_nesting_level setting (#243) #307

Merged

colinodell added a commit that referenced this issue Dec 30, 2017

Implement new max_nesting_level setting (#243)

4ca1c44

colinodell added a commit that referenced this issue Dec 30, 2017

Merge pull request #307 from thephpleague/max-nesting-level

d39b2d2

Implement new max_nesting_level setting (#243)

colinodell closed this as completed Sep 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Max nesting limit #243

Max nesting limit #243

mareeo commented Apr 20, 2016

colinodell commented May 6, 2016

mareeo commented May 6, 2016

cebe commented May 7, 2016

cebe commented May 7, 2016

colinodell commented May 7, 2016

colinodell commented May 7, 2016

mareeo commented May 15, 2016

Max nesting limit #243

Max nesting limit #243

Comments

mareeo commented Apr 20, 2016

colinodell commented May 6, 2016

mareeo commented May 6, 2016

cebe commented May 7, 2016

cebe commented May 7, 2016

colinodell commented May 7, 2016

colinodell commented May 7, 2016

mareeo commented May 15, 2016