New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Max nesting limit #243

Closed
mareeo opened this Issue Apr 20, 2016 · 7 comments

Comments

Projects
None yet
3 participants
@mareeo

mareeo commented Apr 20, 2016

Some Markdown parsers include nesting limits which causes elements past a certain depth to be omitted. This is useful allowing user input and wanting to put certain restrictions on it. Thoughts on adding this as an configuration parameter?

@colinodell

This comment has been minimized.

Show comment
Hide comment
@colinodell

colinodell May 6, 2016

Member

This could either be implemented in core or as an external Document Processor which users can include as needed.

Would you mind sharing why this particular feature would be useful for you? In other words, what's the major use case it solves?

Member

colinodell commented May 6, 2016

This could either be implemented in core or as an external Document Processor which users can include as needed.

Would you mind sharing why this particular feature would be useful for you? In other words, what's the major use case it solves?

@mareeo

This comment has been minimized.

Show comment
Hide comment
@mareeo

mareeo May 6, 2016

Mainly just useful to prevent abuse via user input. For example, Reddit comments allow up to 10000 characters. If nearly all 10,000 of these characters were '>', then you would end up with incredibly deeply nested quotes and a large AST that could take several seconds to render to HTML, assuming it doesn't run out of memory first.

I don't think a custom Document Processor would prevent this abuse, since the issue is related to parsing and creation of a massive AST. Snudown considers this limit while parsing.

mareeo commented May 6, 2016

Mainly just useful to prevent abuse via user input. For example, Reddit comments allow up to 10000 characters. If nearly all 10,000 of these characters were '>', then you would end up with incredibly deeply nested quotes and a large AST that could take several seconds to render to HTML, assuming it doesn't run out of memory first.

I don't think a custom Document Processor would prevent this abuse, since the issue is related to parsing and creation of a massive AST. Snudown considers this limit while parsing.

@cebe

This comment has been minimized.

Show comment
Hide comment
@cebe

cebe May 7, 2016

@colinodell here is a little demo about the problem:

Here is the gist: https://gist.github.com/cebe/615bef8086c9a11d81f568284c18fec5
Run it:

git clone https://gist.github.com/cebe/615bef8086c9a11d81f568284c18fec5 demo
cd demo
composer install
php demo.php

This should result in a segmentation fault of PHP.

cebe commented May 7, 2016

@colinodell here is a little demo about the problem:

Here is the gist: https://gist.github.com/cebe/615bef8086c9a11d81f568284c18fec5
Run it:

git clone https://gist.github.com/cebe/615bef8086c9a11d81f568284c18fec5 demo
cd demo
composer install
php demo.php

This should result in a segmentation fault of PHP.

@cebe

This comment has been minimized.

Show comment
Hide comment
@cebe

cebe commented May 7, 2016

@colinodell

This comment has been minimized.

Show comment
Hide comment
@colinodell

colinodell May 7, 2016

Member

Ah okay, that makes sense. Thanks for helping to clarify that.

The parsing and AST traversing is done without using any type of recursion to avoid segfaults. The rendering process doesn't use this same approach though. We could potentially change that to avoid segfaults, but we'd still the memory limit that @mareeo mentioned.

I think adding this to the core is a great idea, and I like the approach you're using @cebe - if the limit is reached, simply encode the remaining Markdown as text.

Member

colinodell commented May 7, 2016

Ah okay, that makes sense. Thanks for helping to clarify that.

The parsing and AST traversing is done without using any type of recursion to avoid segfaults. The rendering process doesn't use this same approach though. We could potentially change that to avoid segfaults, but we'd still the memory limit that @mareeo mentioned.

I think adding this to the core is a great idea, and I like the approach you're using @cebe - if the limit is reached, simply encode the remaining Markdown as text.

@colinodell

This comment has been minimized.

Show comment
Hide comment
@colinodell

colinodell May 7, 2016

Member

I've got the nesting limits working for blocks - those were easy.

Limiting the inlines (while adhering to the CommonMark spec) is proving to be quite difficult though. We essentially can't reliably know how deep the inlines will be until after they're processed (because of how delimiter runs need to be processed according to the spec), and by then it's too late.

I found one possible approach that almost works, but the output isn't even close to what you'd expect, doesn't fit with the spec, and it severely reduces performance. IMO a feature to ensure performance in 0.01% of cases shouldn't negatively impact performance for the other 99.99%. And even if it did, it would violate our top priority of maintaining spec compliance.

I think we ultimately have four options:

  1. Do nothing.
  2. Implement nesting limits for block parsing only (segfaults and memory exhaustion less likely for blocks)
  3. Implement nesting limits for the AST after parsing but before rendering (segaults and memory exhaustion less likely for both)
  4. Modify the rendering process to not be recursive (segfaults much less likely for both; no impact to memory exhaustion)

Option 4 would be nice, but I simply don't have the time to implement a major refactoring of that size right now.

Option 3 seems like the next-best choice IMO.

What are your thoughts?

Member

colinodell commented May 7, 2016

I've got the nesting limits working for blocks - those were easy.

Limiting the inlines (while adhering to the CommonMark spec) is proving to be quite difficult though. We essentially can't reliably know how deep the inlines will be until after they're processed (because of how delimiter runs need to be processed according to the spec), and by then it's too late.

I found one possible approach that almost works, but the output isn't even close to what you'd expect, doesn't fit with the spec, and it severely reduces performance. IMO a feature to ensure performance in 0.01% of cases shouldn't negatively impact performance for the other 99.99%. And even if it did, it would violate our top priority of maintaining spec compliance.

I think we ultimately have four options:

  1. Do nothing.
  2. Implement nesting limits for block parsing only (segfaults and memory exhaustion less likely for blocks)
  3. Implement nesting limits for the AST after parsing but before rendering (segaults and memory exhaustion less likely for both)
  4. Modify the rendering process to not be recursive (segfaults much less likely for both; no impact to memory exhaustion)

Option 4 would be nice, but I simply don't have the time to implement a major refactoring of that size right now.

Option 3 seems like the next-best choice IMO.

What are your thoughts?

@mareeo

This comment has been minimized.

Show comment
Hide comment
@mareeo

mareeo May 15, 2016

I don't think option 3 solves this problem, since if the issues are going to happen, they're likely to happen during parsing. Options 2 is what I've done in the meantime, but I don't think it's very good (use this configuration options to prevent issues...sometimes!).

From what you've said and the testing I've done, it just doesn't seem like limiting inline element depth while parsing is going to be possible while adhering to the CommonMark spec. Personally, I like the idea that if you set a max nesting limit and it is reached, then there's no guarantee the element containing the deep nesting will be according to spec. Garbage in, garbage out.

But this may conflict with the '100% compliance' goal of this project. A simple warning that if a nesting limit is used and reached that 100% compliance is no longer guaranteed would be fine for me. This lets the user decide between safety and compliance. Other Markdown parsers just automatically make this choice for you.

mareeo commented May 15, 2016

I don't think option 3 solves this problem, since if the issues are going to happen, they're likely to happen during parsing. Options 2 is what I've done in the meantime, but I don't think it's very good (use this configuration options to prevent issues...sometimes!).

From what you've said and the testing I've done, it just doesn't seem like limiting inline element depth while parsing is going to be possible while adhering to the CommonMark spec. Personally, I like the idea that if you set a max nesting limit and it is reached, then there's no guarantee the element containing the deep nesting will be according to spec. Garbage in, garbage out.

But this may conflict with the '100% compliance' goal of this project. A simple warning that if a nesting limit is used and reached that 100% compliance is no longer guaranteed would be fine for me. This lets the user decide between safety and compliance. Other Markdown parsers just automatically make this choice for you.

colinodell added a commit that referenced this issue Dec 30, 2017

colinodell added a commit that referenced this issue Dec 30, 2017

colinodell added a commit that referenced this issue Dec 30, 2017

colinodell added a commit that referenced this issue Dec 30, 2017

Merge pull request #307 from thephpleague/max-nesting-level
Implement new max_nesting_level setting (#243)

@colinodell colinodell closed this Sep 18, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment