JSON: the parser overflows stack on deeply nested files (modified by @masatake) #1682
Comments
Interesting. I will take a look when I can find time.
This is fairly tricky to fix completely, because there's a lot of recursion involved in this parser (recursion is usually the simplest way of implementing recursive parsing).
```python
#!/usr/bin/env python3
import sys
limit = 750000
sys.stdout.write('{"a":')
for i in range(0, limit):
    sys.stdout.write('[')
sys.stdout.write('\n"foo"\n')
for i in range(0, limit):
    sys.stdout.write(']')
sys.stdout.write('}\n')
```
I agree that it would be nice -- if not very important -- to fix this kind of problem, but I'm afraid it's not so trivial. Basically, you'd want no arbitrary recursion at all, plus an overall resource limit enforced in a way that stops only the parsers, not ctags's core. I don't know how easy that would be. Maybe most OSes have soft resource limits that could be applied to only part of the program, but I'm not very knowledgeable about that.
For the part highlighted in the Geany report (handling of large malformed input), this could be a fix. It handles invalid input differently, which probably doesn't matter: the input is invalid anyway, and there's no way to know the "best" way to handle it.

```diff
diff --git a/parsers/json.c b/parsers/json.c
index 5098daa1..7646dd8a 100644
--- a/parsers/json.c
+++ b/parsers/json.c
@@ -242,21 +242,28 @@ static void skipToOneOf3 (tokenInfo *const token,
 			   const tokenType type2,
 			   const tokenType type3)
 {
+	unsigned int depth = 0;
+
 	while (token->type != TOKEN_EOF &&
-	       token->type != type1 &&
-	       token->type != type2 &&
-	       token->type != type3)
+	       (depth > 0 ||
+	        (token->type != type1 &&
+	         token->type != type2 &&
+	         token->type != type3)))
 	{
 		readToken (token);
-		if (token->type == TOKEN_OPEN_CURLY)
+		switch (token->type)
 		{
-			skipTo (token, TOKEN_CLOSE_CURLY);
-			readToken (token);
-		}
-		else if (token->type == TOKEN_OPEN_SQUARE)
-		{
-			skipTo (token, TOKEN_CLOSE_SQUARE);
-			readToken (token);
+		case TOKEN_OPEN_CURLY:
+		case TOKEN_OPEN_SQUARE:
+			depth++;
+			break;
+		case TOKEN_CLOSE_CURLY:
+		case TOKEN_CLOSE_SQUARE:
+			if (depth > 0)
+				depth--;
+			break;
+		default:
+			break;
 		}
 	}
 }
```

However, it doesn't fix the issue at hand, which happens when parsing valid, enormous input.
Just brainstorming, but one thing that comes to mind is to use GCC's instrumentation hooks as a way to keep track of recursion depth. It would be really hacky, but it would require minimal changes to any parsing code and could easily be guarded out for unsupported compilers or for highly optimized builds. Just a thought; not sure how practical it really is.
I just noticed the
I did a quick test, the
Close universal-ctags#2107. Related to universal-ctags#1682. To avoid stack overflow, this change provides a way to terminate the parsing. The recursion is closely tied to square brackets and curly brackets. Instead of limiting the function recursion, this change tracks the depth of brackets, on the assumption that deep brackets in the input stream cause deep function-call recursion. readTokenFull is changed to return EOF when it detects too-deep (> 512) brackets in the current input stream. The EOF terminates the parsing. Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Could you look at #2111?
…airs Close universal-ctags#2107, reported by @hanwen. Related to universal-ctags#1682. To avoid stack overflow, this change provides a way to terminate the parsing if recursion could grow too deep. The recursion is closely tied to square brackets and curly brackets in the input stream. Instead of limiting the depth of function recursion itself, this change tracks the depth of brackets, on the assumption that deeply nested brackets increase the recursion depth of function calls. readTokenFull is changed to return an EOF token when it detects too deeply nested (>= 512) brackets in the current input stream. The EOF token terminates the parsing gently. Signed-off-by: Masatake YAMATO <yamato@redhat.com>
For JSON specifically, the issue itself is fixed via #2111.
Not sure if this is considered a bug, but the JSON parser uses recursion while parsing (as I'm sure other parsers do), so if a large enough, deeply nested file is processed, the parser runs out of stack and crashes.
It would be great if it used iteration instead of recursion, or at least limited the recursion depth so it doesn't crash. This came up as a Geany bug.
Backtrace:
I generated a test file to reproduce the crash by piping the output of the script shown earlier in this thread to a file and opening it in Geany.