Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow when parsing a deeply nested toml_edit::Document #206

Closed
5225225 opened this issue Sep 13, 2021 · 8 comments · Fixed by #404
Closed

Stack overflow when parsing a deeply nested toml_edit::Document #206

5225225 opened this issue Sep 13, 2021 · 8 comments · Fixed by #404
Labels
A-parse Area: Parsing TOML C-bug Category: Things not working as expected

Comments

@5225225
Copy link

5225225 commented Sep 13, 2021

Basically the same error as seen as toml-rs/toml-rs#428

Reproduction code:

fn main() {
    let brackets = "[".repeat(20000);
    let input_string = format!("x={}", &brackets);
    input_string.parse::<toml_edit::Document>().unwrap();
}
@epage epage added the C-bug Category: Things not working as expected label Sep 13, 2021
@epage
Copy link
Member

epage commented Sep 13, 2021

I assume this is also true for { (inline tables).

Likewise, it'd take longer but I assume we'd also hit OOM for many dotted keys, many tables, and many key-values.

For dotted keys, I'm actually unsure whether we would hit OOM (too many keys for memory) or stack overflow (recursively walking data structures) first.

@ordian
Copy link
Member

ordian commented Sep 14, 2021

The issue is about stackoverflow, not OOM.

A solution to this problem is to use an explicit stack (using a vec) instead of recursion.

@epage
Copy link
Member

epage commented Sep 14, 2021

I brought up OOM because

  • Stackoverflow is a special case of OOM
  • Because we generally don't handle OOM errors in Rust, they lead to the same behavior (crash)
  • Changing recursion to explicit stack just converts the problem from stackoverflow to OOM, it will just take longer to crash without the stack frame boilerplate

The main reason I can think of to handle stackoverflows specially is to provide a good error on accidental infinite recursion so people can get out of that case, which is not applicable here.

We can easily change descend-path from recursion but parsing of inline tables and arrays will be more difficult.

@ordian
Copy link
Member

ordian commented Sep 14, 2021

Well, by OOM, people usually mean Out Of (heap) Memory.
And one of the differences between the stack and the heap is that the default stack size is fixed and relatively small (8MiB), whereas the heap size is usually limited by the amount of RAM a computer has (yeah, it's more complicated than that).

So I would say, ensuring no stackoverflow is almost always lib's responsibility, as opposed to ensure no OO(heap)M.

epage added a commit to epage/toml_edit that referenced this issue Sep 24, 2021
This is a part of toml-rs#206.  This wasn't even my goal but to work out what
was going on with lifetimes which led to the loosening of lifetimes on
`*Table::entry_format`.

Hopefully this will also help with performance.
epage added a commit to epage/toml_edit that referenced this issue Sep 24, 2021
This is a part of toml-rs#206.  This wasn't even my goal but to work out what
was going on with lifetimes which led to the loosening of lifetimes on
`*Table::entry_format`.

Hopefully this will also help with performance.
epage added a commit that referenced this issue Sep 24, 2021
This is a part of #206.  This wasn't even my goal but to work out what
was going on with lifetimes which led to the loosening of lifetimes on
`*Table::entry_format`.

Hopefully this will also help with performance.
@ordian
Copy link
Member

ordian commented Jul 31, 2022

We could use something like https://github.com/rust-lang/stacker for this.

@manunio
Copy link
Contributor

manunio commented Sep 19, 2022

Hi @ordian @epage While working on issue:#335 and running fuzzer locally for sometime, i found a similar stackoverflow crash. Following is the reproduction code.

#[test]
fn test_stackoverflow() {
    let testcase = "8x=[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
{8x=[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[{8x=[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
{7x=[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[{8x=[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[{8x=[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[{8x=[[[[[[[[[[[[[[[[[[[
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[{8x=[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
[{0x=[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[8x0={22={2[";
    testcase.parse::<toml_edit::Document>().unwrap();    
}
running 1 test

thread 'test_stackoverflow' has overflowed its stack
fatal runtime error: stack overflow
error: test failed, to rerun pass `--test test_stackoverflow`

Caused by:
  process didn't exit successfully: `/home/maxx/dev/security/oss-fuzz-projects/toml_edit/target/debug/deps/test_stackoverflow-e48339b99633f4c2` (signal: 6, SIGABRT: process abort signal)

Just thought of sharing this.

@epage
Copy link
Member

epage commented Sep 19, 2022

Yes, anything that will cause too much recursion will cause an overflow.

Speaking of ways of solving this, I've been thinking of switching this to nom once some ergoomic issues are resolved. I suspect I could more easily write a heap-allocated recursive parser with it. We'll see.

@5225225
Copy link
Author

5225225 commented Sep 19, 2022

That being said, it might also be worth adding recursion limits by default, since if you have a deeply nested object, you might run into issues working with it (drop being recursive and blowing the stack without taking care, for example).

I would be surprised if any object nested more than, say, 100 deep is actually legitimate. And if someone really does have objects nested that deep, they can disable the limits and be careful themselves.

@epage epage added the A-parse Area: Parsing TOML label Sep 23, 2022
epage added a commit to epage/toml_edit that referenced this issue Dec 27, 2022
Without the macros of the old parser, it was much easier to add new
state for this.  I chose the limit of 128 as that is what serde_json
does.

Technically, this may break someone but the likelihood is extremely low,
especially with how toml isn't exactly designed for arbitrary depth and
with how recently this release was out.

Fixes toml-rs#206
epage added a commit to epage/toml_edit that referenced this issue Dec 27, 2022
Without the macros of the old parser, it was much easier to add new
state for this.  I chose the limit of 128 as that is what serde_json
does.

I didn't both exposing more configuration for this than the `unbounded`
feature.  We can add more later if needed but I doubt that.
Technically, this may break someone but the likelihood is extremely low,
especially with how toml isn't exactly designed for arbitrary depth and
with how recently this release was out.

Fixes toml-rs#206
epage added a commit to epage/toml_edit that referenced this issue Dec 27, 2022
Without the macros of the old parser, it was much easier to add new
state for this.  I chose the limit of 128 as that is what serde_json
does.

I didn't both exposing more configuration for this than the `unbounded`
feature.  We can add more later if needed but I doubt that.
Technically, this may break someone but the likelihood is extremely low,
especially with how toml isn't exactly designed for arbitrary depth and
with how recently this release was out.

Fixes toml-rs#206
epage added a commit to epage/toml_edit that referenced this issue Dec 27, 2022
Without the macros of the old parser, it was much easier to add new
state for this.  I chose the limit of 128 as that is what serde_json
does.

I didn't both exposing more configuration for this than the `unbounded`
feature.  We can add more later if needed but I doubt that.
Technically, this may break someone but the likelihood is extremely low,
especially with how toml isn't exactly designed for arbitrary depth and
with how recently this release was out.

Fixes toml-rs#206
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parse Area: Parsing TOML C-bug Category: Things not working as expected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants